Mail::SpamAssassin::Plugin::TxRep(3)  User Contributed Perl Documentation  Mail::SpamAssassin::Plugin::TxRep(3)

NAME
Mail::SpamAssassin::Plugin::TxRep - Normalize scores with sender
reputation records

SYNOPSIS
The TxRep (Reputation) plugin is designed as an improved replacement for
the AWL (Auto-Welcomelist) plugin. It adjusts the final message spam
score by looking up the sender's reputation and taking it into account.

To try TxRep out, you first have to disable the AWL plugin (if
enabled) and back up its database. AWL is loaded in v310.pre and can
be disabled by commenting out the loadplugin line:

    # loadplugin Mail::SpamAssassin::Plugin::AWL

TxRep will refuse to run while AWL is enabled.

TxRep is enabled by uncommenting the following line in v341.pre:

    loadplugin Mail::SpamAssassin::Plugin::TxRep

Use the supplied 60_txreputation.cf file or add these lines to a .cf
file:

    header   TXREP eval:check_senders_reputation()
    describe TXREP Score normalizing based on sender's reputation
    tflags   TXREP userconf noautolearn
    priority TXREP 1000

DESCRIPTION
This plugin is intended to replace the former AWL (AutoWelcomeList).
Although the concept and the scope differ, the purpose remains the
same: normalizing spam scores based on the sender's previous history.
The name was intentionally changed from "whitelist" to "reputation" to
avoid confusion, since the resulting score can be adjusted in both
directions.

The TxRep plugin keeps track of the average SpamAssassin score for
senders. Senders are tracked using multiple identifiers, or their
combinations: the From: email address, the originating IP and/or an
originating block of IPs, the sender's domain name, the DKIM signature,
and the HELO name. TxRep then uses the average score to reduce the
variability in scoring from message to message, and modifies the final
score by pushing the result towards the historical average. This
improves the accuracy of filtering for most email.

In comparison with the original AWL plugin, several conceptual changes
were implemented in TxRep:

1. Scoring - although AWL tracks the number of messages received from
each sender, it does not take that count into account when calculating
the corrective score for a new message. For example, if a sender
previously sent a single ham message with a score of -5 and then sends
a second one with a score of +10, AWL issues a corrective score that
pulls the result towards -5. With the default
"auto_welcomelist_factor" of 0.5, the resulting score would be only
2.5, and it would be exactly the same even if the sender had previously
sent 1,000 messages with an average of -5. TxRep tries to take maximal
advantage of the collected data, and adjusts the final score using not
only the mean reputation score stored in the database, but also the
number of messages already seen from the sender. The exact formula is
given in the section "txrep_factor".

2. Learning - AWL ignores any spam/ham learning. In fact it works
against it, which often leads to a frustrating situation where a user
repeatedly tags all messages from a given sender as spam (or ham), but
for every new message from that sender AWL adjusts the score back
towards the historical average, which does not include the learned
scores. TxRep changes this: every spam/ham learning is recorded in the
reputation database, and hence taken into consideration for future
email from the respective sender. See the section
"LEARNING SPAM / HAM" for more details.

3. Auto-Learning - in certain situations SpamAssassin may declare a
message an obvious spam or ham, and launch the auto-learning process,
so that the message can be re-evaluated. AWL, by design, did not
perform any auto-learning adjustments. TxRep readjusts the stored
reputation by the value defined by "txrep_learn_penalty" or
"txrep_learn_bonus", respectively. Auto-learning score thresholds may
be tuned, or auto-learning completely disabled, through the setting
"txrep_autolearn".

4. Relearning - messages that were wrongly learned or auto-learned can
be relearned. The old reputations are removed from the database and new
ones are added in their place. Relearning works better when message
tracking is enabled through the "txrep_track_messages" option. Without
it, the relearned score is simply added to the reputation, without
removing the old one.

5. Aging - with AWL, every historical record of a given sender has the
same weight. This means that changes in a sender's behavior, or
modified SA rules, may take a long time to show, or may be virtually
negated by the AWL normalization, especially for senders with a high
count of past messages and a low recent frequency. It is particularly
counterproductive when the administrator detects new patterns in
certain messages and applies new rules to better tag such messages as
spam or ham. AWL will practically eliminate the effect of the new
rules by adjusting the score back towards the (wrong) historical
average. Only lowering the "auto_welcomelist_factor" would help, but at
the same time it would also reduce the overall impact of AWL and call
its purpose into question. Besides "txrep_factor" (the replacement for
"auto_welcomelist_factor"), TxRep also introduces
"txrep_dilution_factor" to help cope with this issue by progressively
reducing the impact of past records. More details can be found in the
description of the factor below.

6. Blocklisting and Welcomelisting - when welcomelisting or
blocklisting is requested through SpamAssassin's API, AWL adjusts the
historical total score of the plain email address without IP (and
deletes records bound to an IP), but since new records with an IP are
added during reception, the blocklisted entry would cease to act during
scanning. TxRep always uses the record of the plain email address
without IP together with the one bound to an IP address, DKIM
signature, or SPF pass (unless the weight factor for the EMAIL
reputation is set to zero). AWL uses a score of 100 (or -100) for
blocklisting (or welcomelisting). TxRep scales the value according to
the weight factor of the EMAIL reputation. This is explained in detail
in the section "BLOCKLISTING / WELCOMELISTING". TxRep can also
blocklist or welcomelist IP addresses, domain names, and dotless HELO
names.

7. Sender Identification - AWL identifies a sender by the email address
used and the originating IP address (more precisely, the part of it
defined by the mask setting). The main purpose of this measure is to
avoid assigning falsely good scores to spammers who spoof known email
addresses. The disadvantage shows with senders who send from frequently
changing locations, or who connect through dynamic IP addresses that
are not within the block defined by the mask setting. Their score is
difficult or sometimes impossible to track. Another disadvantage shows,
for example, with a spammer persistently sending spam from the same IP
address, just under different email addresses. AWL will not find their
previous scores unless they reuse the same email address. TxRep uses
several identifiers and creates a separate database entry for each of
them. It tracks not only the email/IP address combination like AWL, but
also the standalone email address (regardless of the originating IP),
the standalone IP (regardless of the email address used), the domain
name of the email address, the DKIM signature, and the HELO name of the
connecting host. The influence of each individual identifier may be
tuned with the weight factors described in the section
"REPUTATION WEIGHTS".

8. Message Tracking - TxRep (optionally) keeps track of the IDs of
already scanned and/or learned messages. This prevents the reputation
score from being strengthened simply by rescanning or relearning the
same message multiple times. At the same time it also allows proper
relearning of messages that were once learned wrongly, or relearning
them after the learn penalty or bonus was changed. See the option
"txrep_track_messages".

9. User and Global Storages - the per-user setup of SpamAssassin is
usually recommended, because each user may have quite different
requirements and may receive quite different sorts of email. Especially
when using the Bayesian and AWL plugins, efficiency is much better when
SpamAssassin is taught spam and ham separately for each user. The
disadvantage, however, is that senders and emails already learned many
times by different users need to be relearned without any recognized
history whenever they arrive at another user. TxRep uses the advantages
of both systems. It can use dual storages: the global common storage,
where all email processed by SpamAssassin is recorded, and a local
storage separate for each user, with reputation data from that user's
email only. See more details at the setting "txrep_user2global_ratio".

10. Outbound Welcomelisting - when a local user sends messages to an
email address, we assume that the user needs to see the eventual answer
too, hence the recipient's address should be welcomelisted. When
SpamAssassin is also used for scanning outgoing email, local users send
email through the SMTP server where SA is installed, and internal
networks are defined, TxRep improves the reputation of all 'To:' and
'CC' addresses from messages originating in the internal networks.
Details can be found at the setting "txrep_welcomelist_out".

The two plugins (AWL and TxRep) cannot coexist. AWL has to be disabled
to allow TxRep to run. TxRep reuses the database handling of the
original AWL module, and some of its parameters bound to the database
handler modules. By default, TxRep creates its own database, but the
original auto-welcomelist can be reused as a starting point. The AWL
database can be renamed to the name defined in the TxRep settings, and
TxRep will start using it. The original auto-welcomelist database has
to be backed up to allow switching back to the original state.

The spamassassin/Plugin/TxRep.pm file replaces both
spamassassin/Plugin/AWL.pm and spamassassin/AutoWelcomelist.pm. Two
other AWL files, spamassassin/DBBasedAddrList.pm and
spamassassin/SQLBasedAddrList.pm, are still needed.

TEMPLATE TAGS
This plugin module adds the following "tags" that can be used as
placeholders in certain options. See Mail::SpamAssassin::Conf for more
information on TEMPLATE TAGS.

    _TXREPXXXY_          TXREP modifier
    _TXREPXXXYMEAN_      Mean score on which TXREP modification is based
    _TXREPXXXYCOUNT_     Number of messages on which TXREP modification is based
    _TXREPXXXYPRESCORE_  Score before TXREP
    _TXREPXXXYUNKNOWN_   New sender (not found in the TXREP list)

The XXX part of the tag takes the form of one of the following IDs,
depending on the reputation checked: EMAIL, EMAILIP, IP, DOMAIN, or
HELO. The Y suffix is used only in the case of dual storage, and takes
the form of either U (for user storage reputations) or G (for global
storage reputations).

USER PREFERENCES
The following options can be used in both site-wide ("local.cf") and
user-specific ("user_prefs") configuration files to customize how
SpamAssassin handles incoming email messages.

use_txrep
    0 | 1 (default: 0)

    Whether to use the TxRep reputation system. TxRep tracks the
    long-term average score for each sender and then shifts the score
    of new messages toward that long-term average. This can increase or
    decrease the score for messages, depending on the long-term
    behavior of the particular correspondent.

    Note that certain tests are ignored when determining the final
    message score:

    - rules with tflags set to 'noautolearn'

txrep_factor
    range [0..1] (default: 0.5)

    How much towards the long-term mean for the sender to regress a
    message. Basically, the algorithm is to track the long-term total
    score and the count of messages for the sender ("total" and
    "count"), and then once we have otherwise fully calculated the
    score for this message ("score"), we calculate the final score for
    the message as:

        finalscore = score + factor * ((total + score)/(count + 1) - score)

    So if "factor" = 0.5, then we'll move to half way between the
    calculated score and the new mean value. If "factor" = 0.3, then
    we'll move about 1/3 of the way from the score toward the mean.
    "factor" = 1 means use the long-term mean, including the new
    unadjusted score; "factor" = 0 means just use the calculated score,
    which disables the score averaging, though the reputation is still
    recorded to the database.
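
The behavior described for "factor" can be checked with a minimal
sketch. It assumes the adjustment moves the calculated score toward the
new mean (which includes the new score) by the fraction "factor", as
the paragraph above states. The function name is hypothetical and not
part of the plugin.

```python
def txrep_final_score(score, total, count, factor=0.5):
    """Shift `score` toward the sender's new mean by `factor`.

    Hypothetical helper: `total` and `count` are the stored sum of
    scores and number of messages for the sender.
    """
    # New mean includes the unadjusted score of the current message.
    new_mean = (total + score) / (count + 1)
    return score + factor * (new_mean - score)


# One earlier ham message at -5, then a new message scoring +10.
one_message = txrep_final_score(10.0, total=-5.0, count=1)

# Same new message, but 1,000 earlier messages averaging -5.
long_history = txrep_final_score(10.0, total=-5000.0, count=1000)
```

With a single earlier message the new score is corrected only
moderately, while a long history pulls the result much closer to the
sender's average. This is the count-sensitivity that the Scoring item
in the DESCRIPTION contrasts with AWL.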

txrep_dilution_factor
    range [0.7..1.0] (default: 0.98)

    With every new email from a given sender, the historical reputation
    records are "diluted", or "watered down", by a certain fraction
    given by this factor. This means that the influence of old records
    progressively diminishes with every new message from the given
    sender. This is important to allow more flexible handling of
    changes in a sender's behavior, or of improvements or changes to
    local SA rules.

    Without any dilution (dilution factor set to 1), the new message
    score is simply added to the total score of the given sender in the
    reputation database. When dilution is used (factor < 1), the impact
    of the historical reputation average is reduced by the factor
    before calculating the new average, which in turn is then used to
    adjust the new total score to be stored in the database.

        newtotal = (oldcount + 1) * (newscore + dilution * oldtotal) / (dilution * oldcount + 1)

    In other words, the older a message is, the less impact its
    original spam score has on the new average. For example, if we set
    the factor to 0.9 (meaning dilution by 10%), the score of the new
    message will be counted at 100%, the previous score from the same
    sender at 90%, the one before that at 81% (0.9 * 0.9 = 0.81), and
    the 10th last message at just 35%.

    On stable systems, we recommend keeping the factor close to 1 (but
    still lower than 1). On systems where SA rule tuning and spam
    learning are still in progress, lower factors will help the
    reputation adapt more quickly to any modifications. At the same
    time, this will also reduce the impact of the historical
    reputation.
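
As an illustration, the update formula above can be written out
directly. This is a sketch only; the function names are hypothetical,
and "nominal_weight" merely reproduces the percentages quoted in the
example (powers of the dilution factor).

```python
def diluted_total(old_total, old_count, new_score, dilution=0.98):
    """Return the new stored total after one message, per the formula above."""
    new_mean = (new_score + dilution * old_total) / (dilution * old_count + 1)
    return (old_count + 1) * new_mean


def nominal_weight(dilution, age):
    """Weight quoted in the text for a message `age` messages back."""
    return dilution ** age


# With dilution = 1 the new score is simply added to the old total.
undiluted = diluted_total(-5.0, 1, 10.0, dilution=1.0)

# With dilution = 0.9 the older score of -5 counts for less.
diluted = diluted_total(-5.0, 1, 10.0, dilution=0.9)
```

Here "undiluted" is the plain sum of both scores, while "diluted" ends
up higher because the earlier -5 has been watered down before
averaging.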

txrep_learn_penalty
    range [0..200] (default: 20)

    When SpamAssassin is trained with a SPAM message, the given penalty
    score is added to the total reputation score of the sender,
    regardless of the real spam score. The more messages the sender
    already has in the TxRep database, the smaller the impact of the
    penalty.

txrep_learn_bonus
    range [0..200] (default: 20)

    When SpamAssassin is trained with a HAM message, the given bonus
    score is deducted from the total reputation score of the sender,
    regardless of the real spam score. The more messages the sender
    already has in the TxRep database, the smaller the impact of the
    bonus.
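
Why the impact shrinks with a longer history can be seen in a rough
sketch. It ASSUMES that a learned message is counted as one additional
message and that the mean is simply total divided by count; the text
above only states that the penalty or bonus is applied to the total, so
this is an illustration and not the plugin's actual bookkeeping. The
function name is hypothetical.

```python
def mean_after_learning(total, count, adjustment):
    """Mean reputation after adding `adjustment` to the stored total.

    ASSUMPTION: the learned message increments the message count by one.
    Pass a positive penalty for spam, or a negated bonus for ham.
    """
    return (total + adjustment) / (count + 1)


# Default penalty of 20 applied to a neutral sender (mean 0).
new_sender = mean_after_learning(total=0.0, count=1, adjustment=20)
known_sender = mean_after_learning(total=0.0, count=99, adjustment=20)

# Default bonus of 20 applied after learning ham.
ham_sender = mean_after_learning(total=0.0, count=1, adjustment=-20)
```

Under these assumptions the same penalty moves a nearly unknown sender
by 10 points, but a sender with 99 recorded messages by only 0.2.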

txrep_autolearn
    range [0..5] (default: 0)

    When SpamAssassin declares a message a clear spam or ham during the
    message scan, and launches the auto-learn process, the sender
    reputation scores of the given message are adjusted by the value of
    the option "txrep_learn_penalty" or "txrep_learn_bonus",
    respectively, in the same way as during manual learning. The value
    0 disables the auto-learn reputation adjustment; only the score
    calculated before the auto-learn is then stored in the reputation
    database.

txrep_track_messages
    0 | 1 (default: 1)

    Whether TxRep should keep track of already scanned and/or learned
    messages. When enabled, an additional record is created in the
    reputation database to avoid false score adjustments due to
    repeated scanning of the same message, and to allow proper
    relearning of messages that were either previously wrongly learned,
    or need to be relearned after modifying the learn penalty or bonus.

txrep_welcomelist_out
    range [0..200] (default: 10)

    Previously txrep_whitelist_out, which will work interchangeably
    until 4.1.

    When the value of this setting is greater than zero, recipients of
    messages sent from within the internal networks are welcomelisted
    by improving their total reputation score by the number of points
    defined by this setting. Since the IP address and other sender
    identifiers are not known when sending the email, only the
    reputation of the standalone email address is welcomelisted. The
    domain name is intentionally left unaffected as well. Outbound
    welcomelisting can only work when SpamAssassin is set up to also
    scan outgoing email, when local users use the SMTP server for
    sending email, and when "internal_networks" are defined in the
    SpamAssassin configuration. The reputation is improved with every
    message sent from the internal networks, so the more messages are
    sent to the recipient, the better the reputation of the recipient's
    email address becomes.

txrep_ipv4_mask_len
    range [0..32] (default: 16)

    The AWL database keeps only the specified number of
    most-significant bits of an IPv4 address in its fields, so that
    different individual IP addresses within a subnet belonging to the
    same owner are managed under a single database record. As we have
    no information available on the allocated address ranges of
    senders, this CIDR mask length is only an approximation. The
    default is 16 bits, corresponding to a former class B. Increase the
    number if a finer granularity is desired, e.g. to 24 (class C) or
    32. A value of 0 is allowed but is not particularly useful, as it
    would treat the whole internet as a single organization. The number
    need not be a multiple of 8; any split is allowed.

txrep_ipv6_mask_len
    range [0..128] (default: 48)

    The AWL database keeps only the specified number of
    most-significant bits of an IPv6 address in its fields, so that
    different individual IP addresses within a subnet belonging to the
    same owner are managed under a single database record. As we have
    no information available on the allocated address ranges of
    senders, this CIDR mask length is only an approximation. The
    default is 48 bits, corresponding to an address range commonly
    allocated to individual (smaller) organizations. Increase the
    number for a finer granularity, e.g. to 64 or 96 or 128, or
    decrease it for wider ranges, e.g. 32. A value of 0 is allowed but
    is not particularly useful, as it would treat the whole internet as
    a single organization. The number need not be a multiple of 4; any
    split is allowed.
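
The effect of the two mask lengths can be illustrated with Python's
standard ipaddress module. This sketch only shows which addresses fall
under the same record for a given mask length; the function name is
hypothetical, and the way the plugin actually stores the masked address
is not shown here.

```python
import ipaddress


def masked_prefix(address, ipv4_mask_len=16, ipv6_mask_len=48):
    """Keep only the most-significant bits selected by the mask length."""
    ip = ipaddress.ip_address(address)
    bits = ipv4_mask_len if ip.version == 4 else ipv6_mask_len
    # strict=False zeroes the host bits instead of raising an error.
    network = ipaddress.ip_network(f"{ip}/{bits}", strict=False)
    return str(network.network_address)


# Two hosts in the same /16 share one key with the default mask ...
same_block = masked_prefix("192.0.2.55") == masked_prefix("192.0.200.7")

# ... but are kept apart with a 24-bit mask.
split_block = (masked_prefix("192.0.2.55", ipv4_mask_len=24)
               != masked_prefix("192.0.200.7", ipv4_mask_len=24))
```

A mask length that is not a multiple of 8 (IPv4) or 4 (IPv6) works the
same way, since the split is done on bits rather than on octets or hex
digits.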

user_awl_sql_override_username
    string (default: undefined)

    Used by the SQLBasedAddrList storage implementation.

    If this option is set, the SQLBasedAddrList module will override
    the set username with the value given. This can be useful for
    implementing global or group-based TxRep databases.

txrep_user2global_ratio
    range [0..10] (default: 0)

    When the option txrep_user2global_ratio is set to a value greater
    than zero, and if the server configuration allows it, two data
    storages will be used: the user storage and the global
    (server-wide) storage.

    The user storage keeps only senders who send messages to the
    respective recipient, and also reflects the corrected/learned
    scores when messages are marked by the user as spam or ham, or when
    the sender is welcomelisted or blocklisted through the API of
    SpamAssassin.

    The global storage keeps the reputation data of all messages
    processed by SpamAssassin, with their spam scores and spam/ham
    learning data from all users on the server. Hence, the module will
    return a reputation value even for senders not known to the current
    recipient, as long as they have already sent email to anyone else
    on the server.

    The value of the txrep_user2global_ratio parameter controls the
    impact of each of the two reputations. When equal to 1, the global
    and the user score have the same impact on the result. When set to
    2, the reputation taken from the user storage has twice the impact
    of the global value. The final value of the TXREP tag is calculated
    as follows:

        total = ( ratio * user + global ) / ( ratio + 1 )

    When no reputation is found in the user storage, and a global
    reputation is available, the global storage is used fully, without
    applying the ratio.
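
The combination rule and its fallback can be sketched as follows. The
function name is hypothetical. Returning the user value when no global
reputation exists is an ASSUMPTION of this sketch; the text above only
describes the case of a missing user reputation.

```python
def combined_reputation(user, global_, ratio):
    """Blend user and global reputation; `None` means "not found".

    Intended for ratio > 0, i.e. when dual storage is enabled.
    """
    if user is None:
        # Documented fallback: use the global value fully.
        return global_
    if global_ is None:
        # ASSUMPTION: not stated in the text above.
        return user
    return (ratio * user + global_) / (ratio + 1)


equal_impact = combined_reputation(4.0, -2.0, ratio=1)
user_favored = combined_reputation(4.0, -2.0, ratio=2)
global_only = combined_reputation(None, -2.0, ratio=2)
```

With a ratio of 1 the result is the plain average of both values, and
with a ratio of 2 it moves closer to the user's own reputation.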

    When the ratio is set to zero, only the default storage will be
    used. Whether that is the global or the local user storage is in
    turn controlled either by the parameter
    user_awl_sql_override_username (in case of SQL storage), or by the
    "auto_welcomelist_path" parameter (in case of Berkeley database).

    When this dual storage is enabled, and no global storage is defined
    by the above mentioned parameters for the Berkeley or SQL
    databases, TxRep will attempt to use a generic storage: user
    'GLOBAL' in case of SQL, and in the case of Berkeley database the
    path defined by '__local_state_dir__/tx-reputation', which
    typically renders into /var/db/spamassassin/tx-reputation. When the
    default storages are not available, or are not writable, you have
    to set the global storage with the help of the
    "user_awl_sql_override_username" or "auto_welcomelist_path"
    setting, respectively.

    Please note that some SpamAssassin installations always run under
    the same user ID. In such a case enabling the dual storage is
    pointless, because at most it would lead to two identical global
    storages in different locations.

    This feature is disabled by default.

auto_welcomelist_distinguish_signed (default: 1 - enabled)
    Previously auto_whitelist_distinguish_signed, which will work
    interchangeably until 4.1.

    Used by the SQLBasedAddrList storage implementation.

    If this option is set, the SQLBasedAddrList module will keep
    separate database entries for DKIM-validated e-mail addresses and
    for non-validated ones. Without this option, or for domains that do
    not use a DKIM signature, the reputation of legitimate email can
    get mixed with the reputation of forgeries. A prerequisite for
    setting this option is that a field txrep.signedby exists in the
    SQL table, otherwise SQL operations will fail. A DKIM plugin must
    also be enabled in order for this option to take effect. This
    option is highly recommended. Unless you are using a pre-3.3.0
    database schema and cannot upgrade, there is no reason to disable
    this option. If you are upgrading from AWL and using a pre-3.3.0
    schema, the txrep.signedby column will not exist. It is recommended
    that you add this column, but if that is not possible you must set
    this option to 0 to avoid SQL errors.

txrep_spf
    0 | 1 (default: 1)

    When enabled, and the message passes SPF, TxRep will treat any IP
    address using a given email address as the same authorized
    identity, and will not associate any IP address with it. (The same
    happens with valid DKIM signatures. No option is available for
    DKIM.)

    Note: for domains that define the useless SPF +all (pass all), no
    IP would ever be associated with the email address, and all
    addresses (incl. the forged ones) would be treated as coming from
    the authorized source. However, such domains are hopefully rare,
    and ask for this kind of treatment anyway.

REPUTATION WEIGHTS
The overall reputation of the sender comprises several elements:

1) The reputation of the 'From' email address bound to the originating
   IP address fraction (see the mask parameters for details)
2) The reputation of the 'From' email address alone (regardless of the
   IP address currently being used)
3) The reputation of the domain name of the 'From' email address
4) The reputation of the originating IP address, regardless of the
   sender's email address
5) The reputation of the HELO name of the originating computer (if
   available)

Each of these partial reputations is weighted with the help of the
parameters below, and the overall reputation is calculated as the sum
of the weighted individual reputations divided by the sum of all their
weights (total_weight):

    sender_reputation = ( weight_email    * rep_email    +
                          weight_email_ip * rep_email_ip +
                          weight_domain   * rep_domain   +
                          weight_ip       * rep_ip       +
                          weight_helo     * rep_helo ) / total_weight

You can disable the individual partial reputations by setting their
respective weight to zero. This will also reduce the size of the
database, since each partial reputation requires a separate entry in
the database table. Disabling some of the partial reputations in this
way may also help with the performance on busy servers, because the
respective database lookups and processing will be skipped too.
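
The weighting described above amounts to a weighted average. The
following sketch uses the default weights listed in the option
descriptions below; the dictionary keys and the function name are
hypothetical labels chosen for illustration.

```python
# Defaults of txrep_weight_email, _email_ip, _domain, _ip and _helo.
DEFAULT_WEIGHTS = {"email": 3, "email_ip": 10, "domain": 2, "ip": 4, "helo": 0.5}


def sender_reputation(reps, weights=DEFAULT_WEIGHTS):
    """Weighted sum of partial reputations divided by the sum of weights."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    # Partial reputations with a zero weight are skipped entirely.
    weighted = sum(w * reps[name] for name, w in weights.items() if w)
    return weighted / total_weight


# Only the email/IP pair has a bad history; the rest is neutral.
reps = {"email": 0.0, "email_ip": 10.0, "domain": 0.0, "ip": 0.0, "helo": 0.0}
overall = sender_reputation(reps)
```

Because the email_ip reputation carries roughly half of the total
default weight, a score of 10 there yields an overall reputation of a
little over 5.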

txrep_weight_email
    range [0..10] (default: 3)

    This weight factor controls the influence of the reputation of the
    standalone email address, regardless of the originating IP address.
    When adjusting the weight, keep in mind that an email address can
    be easily spoofed, and hence spammers can use 'from' email
    addresses belonging to senders with a good reputation. From this
    point of view, the email address bound to the originating IP
    address is a more reliable indicator for the overall reputation.

    On the other hand, some reputable senders may be sending from a
    larger number of IP addresses, so looking at the reputation of the
    standalone email address without regard to the originating IP
    makes some sense too.

    We recommend using a relatively low value for this partial
    reputation.

txrep_weight_email_ip
    range [0..10] (default: 10)

    This is the standard reputation, used in the same way as it was by
    the original AWL plugin. Each sender's email address is bound to
    the originating IP, or its part as defined by the
    txrep_ipv4_mask_len or txrep_ipv6_mask_len parameters.

    For a user sending from multiple locations, diverse mail servers,
    or from a dynamic IP range outside the masked block, the email
    address will have a separate reputation value for each of the
    different (partial) IP addresses.

    When the option auto_welcomelist_distinguish_signed is enabled,
    TxRep, in contrast to the original AWL module, does not record the
    IP address when a DKIM signature is detected. The email address is
    then not bound to any IP address, but rather just to the DKIM
    signature, since the signature is considered to authenticate the
    sender more reliably than the IP address (which can also vary).

    This is by design the most relevant reputation, and its weight
    should be kept high.

txrep_weight_domain
    range [0..10] (default: 2)

    Some spammers may always use their real domain name in the email
    address, just with multiple or changing local parts. This
    reputation records the spam scores of all messages sent from the
    respective domain, regardless of the local part (user name) used.

    As with the email_ip reputation, the domain reputation is also
    bound to the originating address (or a masked block, if the mask
    parameters are used). This avoids giving a false reputation based
    on spoofed email addresses.

    When a DKIM signature is detected, the signer of the signature is
    used instead of the domain name extracted from the email address.
    The signing authority is considered responsible for sending email
    of any domain name, hence the same reputation applies here.

    The domain reputation gives a relevant picture of the owner of the
    domain in the case of small servers, or corporations with strict
    policies, but is less relevant for freemailers like Gmail, Hotmail,
    and similar, because both ham and spam may be sent by their users.

    The default value is set relatively low. Higher weight values may
    be useful, but we recommend caution and observing the scores before
    increasing it.

txrep_weight_ip
    range [0..10] (default: 4)

    Spammers can send through the same relay (incl. compromised hosts)
    under a multitude of email addresses. This is exactly the case
    where the IP reputation can help. This reputation is a kind of
    local RBL.

    The weight is set lower by default than for the email_ip
    reputation, because there may be cases when the same IP address
    hosts both spammers and acceptable senders (for example, the
    marketing department of a company sends you spam, but you still
    need to get messages from their billing address).

txrep_weight_helo
    range [0..10] (default: 0.5)

    A large number of spam messages come from compromised hosts, often
    personal computers or set-top boxes. Their NetBIOS names are
    usually used as the HELO name when connecting to your mail server.
    Some of the names are pretty generic and hence may be shared by a
    large number of hosts, but often the names are quite unique and may
    be a good indicator for detecting a spammer, even when different
    email and IP addresses are used (spam can also come from portable
    devices).

    No IP address is bound to the HELO name when it is stored in the
    reputation database. This is intentional, despite the possibility
    that numerous devices may share some of the HELO names.

    This option is still considered experimental, hence the low weight
    value, but after some testing it could likely be increased at least
    slightly.
611
613 These settings differ from the ones above, in that they are considered
614 'more privileged' -- even more than the ones in the PRIVILEGED SETTINGS
615 section. No matter what "allow_user_rules" is set to, these can never
616 be set from a user's "user_prefs" file.
617
618 txrep_factory module
619 (default: Mail::SpamAssassin::DBBasedAddrList)
620
621 Select alternative database factory module for the TxRep database.
622
623 auto_welcomelist_path /path/filename
624 (default: ~/.spamassassin/tx-reputation)
625
626 Previously auto_whitelist_path which will work interchangeably
627 until 4.1.
628
629 This is the TxRep directory and filename. By default, each user
630 has their own reputation database in their "~/.spamassassin"
631 directory with mode 0700. For system-wide SpamAssassin use, you
632 may want to share this across all users.
633
634 auto_welcomelist_db_modules Module ...
635 (default: see below)
636
637 Previously auto_whitelist_db_modules which will work
638 interchangeably until 4.1.
639
640 What database modules should be used for the TxRep storage database
641 file. The first named module that can be loaded from the Perl
642 include path will be used. The format is:
643
644 PreferredModuleName SecondBest ThirdBest ...
645
i.e. a space-separated list of Perl module names. The default is:
647
648 DB_File GDBM_File SDBM_File
649
650 NDBM_File is not supported (see SpamAssassin bug 4353).
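
The selection rule described above (the first module in the list that
can be loaded wins) can be sketched in Python. This only illustrates
the ordering logic and is not SpamAssassin's own code; the names
pick_db_module, can_load and available are hypothetical stand-ins for
Perl's module loading.

```python
def pick_db_module(preferred, can_load):
    """Return the first name in `preferred` for which can_load() is true."""
    for name in preferred:
        if can_load(name):
            return name
    return None  # nothing in the list could be loaded


# Hypothetical host where DB_File is not installed.
available = {"GDBM_File", "SDBM_File"}
chosen = pick_db_module(
    ["DB_File", "GDBM_File", "SDBM_File"],
    lambda name: name in available,
)
print(chosen)  # GDBM_File
```

Listing a module first only expresses a preference; a later entry is
used whenever the earlier ones are missing.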
651
652 auto_welcomelist_file_mode
653 (default: 0700)
654
Previously named auto_whitelist_file_mode; the old name will work
interchangeably until 4.1.
657
658 The file mode bits used for the TxRep directory or file.
659
Make sure you specify this with the 'x' mode bits set, as it may
also be used to create directories. However, if a file is created,
the resulting file will not have any execute bits set (the umask is
set to 0111).
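
As a small worked illustration of the note above, assuming the 0111
umask is applied as an ordinary bit mask: a configured mode of 0700
results in a file with mode 0600. The helper name created_file_mode is
hypothetical.

```python
def created_file_mode(configured_mode, umask=0o111):
    """Clear the bits named in `umask` from the configured mode."""
    return configured_mode & ~umask


print(oct(created_file_mode(0o700)))  # 0o600
print(oct(created_file_mode(0o770)))  # 0o660
```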
664
665 user_awl_dsn DBI:databasetype:databasename:hostname:port
666 Used by the SQLBasedAddrList storage implementation.
667
668 This will set the DSN used to connect. Example:
669 "DBI:mysql:spamassassin:localhost"
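
The DSN layout shown above can be illustrated with a short Python
sketch that splits the string into its colon-separated parts. The
function split_dsn and its field names are assumptions made for
illustration only. In practice the string is handed to Perl's DBI, and
driver-specific DSN syntax may differ from this simple layout (for
example, a host containing colons would not split correctly here).
Treating the trailing hostname and port as optional is also an
assumption.

```python
def split_dsn(dsn):
    """Split 'DBI:databasetype:databasename:hostname:port' into named parts."""
    fields = ("scheme", "databasetype", "databasename", "hostname", "port")
    parts = dsn.split(":")
    if len(parts) < 3 or parts[0].upper() != "DBI":
        raise ValueError("not a DBI-style DSN: %r" % dsn)
    # Pad missing trailing parts (hostname, port) with None.
    parts += [None] * (len(fields) - len(parts))
    return dict(zip(fields, parts))


print(split_dsn("DBI:mysql:spamassassin:localhost"))
```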
670
671 user_awl_sql_username username
672 Used by the SQLBasedAddrList storage implementation.
673
674 The authorized username to connect to the above DSN.
675
676 user_awl_sql_password password
677 Used by the SQLBasedAddrList storage implementation.
678
679 The password for the database username, for the above DSN.
680
681 user_awl_sql_table tablename
682 (default: txrep)
683
684 Used by the SQLBasedAddrList storage implementation.
685
The name of the table in which reputation is to be stored, for the
above DSN.
688
When asked by SpamAssassin to blocklist or welcomelist a user, the
TxRep plugin adds a score of 100 (for blocklisting) or -100 (for
welcomelisting) to the given sender's email address. For a plain address
without any IP address, the value is multiplied by the ratio of the total
reputation weight to the EMAIL reputation weight, to account for the
reduced impact of the standalone EMAIL reputation when the overall
reputation is calculated.
697
698 total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo
699 blocklisted_reputation = 100 * total_weight / weight_email
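
A worked instance of the formula above, as a Python sketch. The weight
values are illustrative assumptions, not necessarily the values in
your configuration, and the function name listed_reputation is
hypothetical. The welcomelisting case simply uses -100 in place of
100, as described in the text.

```python
def listed_reputation(weights, blocklist=True):
    """Score stored for a standalone email address on block/welcomelisting."""
    total_weight = sum(weights.values())
    base = 100 if blocklist else -100
    return base * total_weight / weights["email"]


# Illustrative weights (assumed values; check your own settings).
weights = {"email": 3, "email_ip": 10, "domain": 2, "ip": 4, "helo": 0.5}

print(listed_reputation(weights))                   # 650.0
print(listed_reputation(weights, blocklist=False))  # -650.0
```

With these numbers the total weight is 19.5, so the standalone EMAIL
record is stored as 650 rather than 100, which compensates for EMAIL
contributing only 3 of the 19.5 weight units to the overall reputation.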
700
When a standalone email address is blocklisted/welcomelisted, all
records of the email address bound to an IP address, a DKIM signature,
or an SPF pass will be removed from the database, and only the
standalone record is kept.
705
Besides blocklisting/welcomelisting of standalone email addresses, the
same method may also be used for blocklisting/welcomelisting of IP
addresses, domain names, and HELO names (only dotless NetBIOS HELO
names can be used).
710
711 When welcomelisting/blocklisting an email address or domain name, you
712 can bind them to a specified DKIM signature or SPF record by appending
713 the DKIM signing domain or the tag 'spf' after the ID in the following
714 way:
715
716 spamassassin --add-addr-to-blocklist=spamming.biz,spf
717 spamassassin --add-addr-to-welcomelist=friend@good.org,good.org
718
719 When a message contains both a DKIM signature and an SPF pass, the DKIM
720 signature takes the priority, so the record bound to the 'spf' tag
721 won't be checked. Only email addresses and domains can be bound to DKIM
722 or SPF. Records of IP addresses and HELO names are always without
723 DKIM/SPF.
724
725 In case of dual storage, the block/welcomelisting is performed only in
726 the default storage.
727
1.  As in AWL, the most significant sender identifier is the
    combination of the email address and the originating IP address,
    or the part of it defined by the IPv4 or IPv6 mask setting.

2.  No IP checking for standalone EMAIL address reputation.

3.  No signature checking for IP reputation or for HELO name
    reputation.

4.  The EMAIL_IP weight, and not the standalone EMAIL weight, is used
    when no IP address is available (EMAIL_IP is the main indicator
    and has the highest weight).

5.  No IP checking for signed emails (the signature authenticates the
    email instead of the IP address).

6.  No IP checking on an SPF pass (we assume the domain owner is
    responsible for all IPs authorized to send for the domain, hence
    we use the same identity for all of them).

7.  No signature used for standalone EMAIL reputation (it would be
    redundant, since no IP is used for signed EMAIL_IP reputation, and
    we would store two identical hits).

8.  When available, the DKIM signer is used instead of the domain name
    for the DOMAIN reputation.

9.  No IP and no signature used for HELO reputation (even though
    multiple computers may exist with the same HELO).

10. The full (unmasked) IP address is used (in the address field
    instead of the IP field) for the standalone IP reputation.
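
The lookup rules above can be sketched roughly in Python as a function
that builds one (address, qualifier) pair per reputation type. This is
a simplified reading of the rules, not the plugin's implementation:
the names lookup_keys and mask_ip, the pair layout, and the mask
lengths of 16 and 48 bits are all assumptions. The weighting in rule 4
is not modelled, and the DOMAIN record is left unqualified for
simplicity.

```python
import ipaddress


def mask_ip(ip, v4_len=16, v6_len=48):
    """Return the network address of `ip` under the assumed mask lengths."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_len if addr.version == 4 else v6_len
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def lookup_keys(email, ip=None, helo=None, dkim_signer=None, spf_pass=False):
    """Build (address, qualifier) pairs for each reputation type."""
    domain = email.rsplit("@", 1)[1]
    # Rules 5 and 6: a signature or an SPF pass replaces the IP address.
    # DKIM takes priority over SPF when both are present.
    if dkim_signer:
        qualifier = dkim_signer
    elif spf_pass:
        qualifier = "spf"
    elif ip:
        qualifier = mask_ip(ip)   # rule 1: masked originating IP
    else:
        qualifier = None
    keys = {
        "EMAIL_IP": (email, qualifier),
        "EMAIL": (email, None),                   # rules 2 and 7
        "DOMAIN": (dkim_signer or domain, None),  # rule 8
    }
    if ip:
        keys["IP"] = (ip, None)                   # rules 3 and 10
    if helo:
        keys["HELO"] = (helo, None)               # rule 9
    return keys


print(lookup_keys("bob@example.com", ip="192.0.2.77", helo="DESKTOP-1"))
```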
770
When SpamAssassin is told to learn (or relearn) a given message as spam
or ham, all reputations relevant to the message (email, email_ip,
domain, ip, helo) in both the global and user storages will be updated
using the "txrep_learn_penalty" or the "txrep_learn_bonus" value,
respectively. The new reputation of a given sender property (email,
domain, ...) will be the result of one of the following formulas:
778
779 new_reputation = old_reputation + learn_penalty
780 new_reputation = old_reputation - learn_bonus
781
The TxRep plugin currently does not track each message individually,
hence it does not detect when you learn the same message repeatedly. It
will add/subtract the penalty/bonus score each time the message is fed
to the spam learner.
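
The two formulas and the effect of repeated learning can be shown with
a few lines of Python. The penalty and bonus values of 20 and the
function name learn are assumptions chosen for illustration.

```python
def learn(reputation, as_spam, learn_penalty=20.0, learn_bonus=20.0):
    """Apply one learning step to a stored reputation value."""
    if as_spam:
        return reputation + learn_penalty
    return reputation - learn_bonus


rep = 5.0
# Feeding the same spam message three times applies the penalty three times.
for _ in range(3):
    rep = learn(rep, as_spam=True)
print(rep)  # 65.0

# Relearning it once as ham subtracts the bonus a single time.
print(learn(rep, as_spam=False))  # 45.0
```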
786
788 TxRep can be optimized for speed and simplicity, or for the precision
789 in assigning the reputation scores.
790
First of all, TxRep can be quickly disabled and re-enabled through the
option "use_txrep". This can be done globally, or individually in each
respective "user_prefs". Disabling TxRep will not destroy the database,
so it can be re-enabled at any later time.
795
796 On many systems, SQL-based storage may perform faster than the default
797 Berkeley DB storage, so you should consider setting it up.
798
799 Then there are multiple settings that can reduce the number of records
800 stored in the database, hence reducing the size of the storage, and
801 also the processing time:
802
1. Setting "txrep_user2global_ratio" to zero will disable the dual
storage, thus halving the disk space requirements and the processing
time of this plugin.

2. You can disable all but one of the "REPUTATION WEIGHTS". EMAIL_IP
is the most specific option, so it is the most likely choice in such a
case, but you could base the reputation system on any of the remaining
scores. Each enabled reputation adds a new entry to the database for
each new identifier. So while, for example, the number of recorded and
scored domains may be large, the number of stored IP addresses will
probably be higher and would require more storage space.

3. Disabling "txrep_track_messages" avoids storing a separate entry
for every scanned message, which also reduces the disk space
requirements and the processing time.

4. Disabling the option "txrep_autolearn" will save processing time
for messages that trigger the auto-learning process.

5. Disabling "txrep_welcomelist_out" will reduce the processing time
for outbound connections.

6. Keeping the option "auto_welcomelist_distinguish_signed" enabled
may help to slightly reduce the size of the database, because for
signed messages the originating IP address is ignored, hence no
additional database entries are needed for each separate IP address
(or masked block of IP addresses).
831
Since TxRep reuses the storage architecture of the former AWL plugin,
the same instructions for initializing the SQL storage also apply to
TxRep. Although the old AWL table can be reused for TxRep, by default
TxRep expects the SQL table to be named "txrep".
836
837 To install a new SQL table for TxRep, run the appropriate SQL file for
838 your system under the /sql directory.
839
If you get a syntax error on an older version of MySQL, use TYPE=MyISAM
instead of ENGINE=MyISAM at the end of the command. You can also use
other types of ENGINE (depending on what is available on your system).
For example, the MEMORY engine stores the entire table in the server
memory, achieving performance similar to Redis. You would need to take
care of replicating the RAM table to disk through a cron job, to avoid
loss of data on reboot. The InnoDB engine is used by default, offering
high scalability (database size and concurrent access). In conjunction
with a high value of innodb_buffer_pool or with the memcached plugin
(MySQL v5.6+), it can also offer performance comparable to Redis.
851
852
853
perl v5.36.0                2023-01-21     Mail::SpamAssassin::Plugin::TxRep(3)