Mail::SpamAssassin::Plugin::TxRep(3pm)

Mail::SpamAssassin::Plugin::TxRep(3)  User Contributed Perl Documentation  Mail::SpamAssassin::Plugin::TxRep(3)

NAME

       Mail::SpamAssassin::Plugin::TxRep - Normalize scores with sender
       reputation records

SYNOPSIS

       The TxRep (Reputation) plugin is designed as an improved replacement
       for the AWL (Auto-Welcomelist) plugin. It adjusts the final message
       spam score by looking up and taking into consideration the reputation
       of the sender.

       To try TxRep out, you first have to disable the AWL plugin (if
       enabled) and back up its database. AWL is loaded in v310.pre and can
       be disabled by commenting out the loadplugin line:

        # loadplugin   Mail::SpamAssassin::Plugin::AWL

       If AWL is not disabled, TxRep will refuse to run.

       TxRep is enabled by uncommenting the following line in v341.pre:

         loadplugin   Mail::SpamAssassin::Plugin::TxRep

       Use the supplied 60_txreputation.cf file or add these lines to a .cf
       file:

        header         TXREP   eval:check_senders_reputation()
        describe       TXREP   Score normalizing based on sender's reputation
        tflags         TXREP   userconf noautolearn
        priority       TXREP   1000

DESCRIPTION

       This plugin is intended to replace the former AWL (AutoWelcomeList)
       plugin. Although the concept and the scope differ, the purpose
       remains the same: normalizing spam scores based on the sender's
       previous history. The name was intentionally changed from "whitelist"
       to "reputation" to avoid confusion, since the resulting score can be
       adjusted in both directions.

       The TxRep plugin keeps track of the average SpamAssassin score for
       senders. Senders are tracked using multiple identifiers, or their
       combinations: the From: email address, the originating IP and/or an
       originating block of IPs, the sender's domain name, the DKIM
       signature, and the HELO name. TxRep then uses the average score to
       reduce the variability in scoring from message to message, and
       modifies the final score by pushing the result towards the historical
       average. This improves the accuracy of filtering for most email.

       In comparison with the original AWL plugin, several conceptual changes
       were implemented in TxRep:

       1. Scoring - although AWL tracks the number of messages received from
       each sender, it does not take that count into account in any way when
       calculating the corrective score for a new message. For example, if a
       sender previously sent a single ham message with a score of -5 and
       then sends a second one with a score of +10, AWL will issue a
       corrective score bringing the score towards -5. With the default
       "auto_welcomelist_factor" of 0.5, the resulting score would be only
       2.5. It would be exactly the same even if the sender had previously
       sent 1,000 messages with an average of -5. TxRep tries to take
       maximal advantage of the collected data, and adjusts the final score
       using not only the mean reputation score stored in the database, but
       also the number of messages already seen from the sender. You can see
       the exact formula in the section "txrep_factor".

       2. Learning - AWL ignores any spam/ham learning. In fact it acts
       against it, which often leads to a frustrating situation where a user
       repeatedly tags all messages from a given sender as spam (or ham), but
       for every new message from that sender AWL adjusts the score back
       towards the historical average, which does not include the learned
       scores. TxRep changes this: every spam/ham learning is recorded in the
       reputation database, and hence taken into consideration for future
       email from the respective sender. See the section "LEARNING SPAM /
       HAM" for more details.

       3. Auto-Learning - in certain situations SpamAssassin may declare a
       message obvious spam (or ham) and launch the auto-learning process,
       so that the message can be re-evaluated. AWL, by design, did not
       perform any auto-learning adjustments. This plugin will readjust the
       stored reputation by the value defined by "txrep_learn_penalty" or
       "txrep_learn_bonus", respectively. Auto-learning score thresholds may
       be tuned, or auto-learning completely disabled, through the setting
       "txrep_autolearn".

       4. Relearning - messages that were wrongly learned or auto-learned can
       be relearned. Old reputations are removed from the database, and new
       ones are added in their place. Relearning works better when message
       tracking is enabled through the "txrep_track_messages" option.
       Without it, the relearned score is simply added to the reputation,
       without removing the old ones.

       5. Aging - with AWL, every historical record of a given sender has the
       same weight. This means that changes in a sender's behavior, or
       modified SA rules, may take a long time to have an effect, or may be
       virtually negated by the AWL normalization, especially for senders
       with a high count of past messages and a low recent frequency. It
       also turns out to be particularly counterproductive when the
       administrator detects new patterns in certain messages and applies new
       rules to better tag such messages as spam or ham. AWL will practically
       eliminate the effect of the new rules by adjusting the score back
       towards the (wrong) historical average. Only setting the
       "auto_welcomelist_factor" lower would help, but at the same time it
       would also reduce the overall impact of AWL and cast doubt on its
       purpose. Besides "txrep_factor" (the replacement for
       "auto_welcomelist_factor"), TxRep also introduces
       "txrep_dilution_factor" to help cope with this issue by progressively
       reducing the impact of past records. More details can be found in the
       description of the factor below.

       6. Blocklisting and Welcomelisting - when welcomelisting or
       blocklisting is requested through SpamAssassin's API, AWL adjusts the
       historical total score of the plain email address without IP (and
       deletes records bound to an IP), but since new records with IP will be
       added during reception, the blocklisted entry would cease to act
       during scanning. TxRep always uses the record of the plain email
       address without IP together with the one bound to an IP address, DKIM
       signature, or SPF pass (unless the weight factor for the EMAIL
       reputation is set to zero). AWL uses a score of 100 (or -100) for
       blocklisting (or welcomelisting) purposes. TxRep increases the value
       proportionally to the weight factor of the EMAIL reputation. This is
       explained in detail in the section on blocklisting and welcomelisting.
       TxRep can also blocklist or welcomelist IP addresses, domain names,
       and dotless HELO names.

       7. Sender Identification - AWL identifies a sender on the basis of the
       email address used and the originating IP address (more precisely,
       the part of it defined by the mask setting). The main purpose of this
       measure is to avoid assigning false good scores to spammers who spoof
       known email addresses. The disadvantage shows with senders who send
       from frequently changing locations, or who connect through dynamic IP
       addresses that are not within the block defined by the mask setting.
       Their score is difficult or sometimes impossible to track. Another
       disadvantage shows, for example, with a spammer persistently sending
       spam from the same IP address, just under different email addresses.
       AWL will not find the spammer's previous scores unless the same email
       address is reused. TxRep uses several identifiers, and creates
       separate database entries for each of them. It tracks not only the
       email/IP address combination like AWL, but also the standalone email
       address (regardless of the originating IP), the standalone IP
       (regardless of the email address used), the domain name of the email
       address, the DKIM signature, and the HELO name of the connecting
       host. The influence of each individual identifier may be tuned with
       the help of the weight factors described in the section "REPUTATION
       WEIGHTS".

       8. Message Tracking - TxRep (optionally) keeps track of the message
       IDs of already scanned and/or learned messages. This avoids
       strengthening the reputation score by simply rescanning or relearning
       the same message multiple times. At the same time it also allows
       proper relearning of messages that were once wrongly learned, or
       relearning them after the learn penalty or bonus has been changed. See
       the option "txrep_track_messages".

       9. User and Global Storages - a per-user setup of SpamAssassin is
       usually recommended, because each user may have quite different
       requirements and may receive quite different sorts of email.
       Especially when using the Bayesian and AWL plugins, efficiency is much
       better when SpamAssassin is taught spam and ham separately for each
       user. The disadvantage, however, is that senders and emails already
       learned many times by different users need to be relearned without
       any recognized history whenever they arrive at another user. TxRep
       uses the advantages of both systems. It can use dual storages: the
       global common storage, where all email processed by SpamAssassin is
       recorded, and a local storage separate for each user, with reputation
       data from that user's email only. See more details at the setting
       "txrep_user2global_ratio".

       10. Outbound Welcomelisting - when a local user sends messages to an
       email address, we assume that the user needs to see any answer too,
       hence the recipient's address should be welcomelisted. When
       SpamAssassin is also used for scanning outgoing email, when local
       users send email through the SMTP server where SA is installed, and
       when internal networks are defined, TxRep will improve the reputation
       of all 'To:' and 'CC' addresses from messages originating in the
       internal networks. Details can be found at the setting
       "txrep_welcomelist_out".

       The two plugins (AWL and TxRep) cannot coexist. It is necessary to
       disable AWL to allow TxRep to run. TxRep reuses the database handling
       of the original AWL module, and some of its parameters bound to the
       database handler modules. By default, TxRep creates its own database,
       but the original auto-welcomelist can be reused as a starting point.
       The AWL database can be renamed to the name defined in the TxRep
       settings, and TxRep will start using it. The original auto-welcomelist
       database has to be backed up to allow switching back to the original
       state.

       The spamassassin/Plugin/TxRep.pm file replaces both
       spamassassin/Plugin/AWL.pm and spamassassin/AutoWelcomelist.pm. Two
       other AWL files, spamassassin/DBBasedAddrList.pm and
       spamassassin/SQLBasedAddrList.pm, are still needed.

TEMPLATE TAGS

       This plugin module adds the following "tags" that can be used as
       placeholders in certain options.  See Mail::SpamAssassin::Conf for more
       information on TEMPLATE TAGS.

        _TXREPXXXY_         TXREP modifier
        _TXREPXXXYMEAN_     Mean score on which TXREP modification is based
        _TXREPXXXYCOUNT_    Number of messages on which TXREP modification is based
        _TXREPXXXYPRESCORE_ Score before TXREP
        _TXREPXXXYUNKNOWN_  New sender (not found in the TXREP list)

       The XXX part of the tag takes the form of one of the following IDs,
       depending on the reputation checked: EMAIL, EMAILIP, IP, DOMAIN, or
       HELO. The Y suffix is used only in the case of dual storage, and takes
       the form of either U (for user storage reputations) or G (for global
       storage reputations).

USER PREFERENCES

       The following options can be used in both site-wide ("local.cf") and
       user-specific ("user_prefs") configuration files to customize how
       SpamAssassin handles incoming email messages.

       use_txrep
             0 | 1                 (default: 0)

           Whether to use the TxRep reputation system.  TxRep tracks the
           long-term average score for each sender and then shifts the score
           of new messages toward that long-term average.  This can increase
           or decrease the score for messages, depending on the long-term
           behavior of the particular correspondent.

           Note that certain tests are ignored when determining the final
           message score:

            - rules with tflags set to 'noautolearn'

       txrep_factor
            range [0..1]           (default: 0.5)

           How much towards the long-term mean for the sender to regress a
           message.  Basically, the algorithm is to track the long-term total
           score and the count of messages for the sender ("total" and
           "count"), and then, once we have otherwise fully calculated the
           score for this message ("score"), to calculate the final score for
           the message as:

            finalscore = score + factor * ((total + score)/(count + 1) - score)

           So if "factor" = 0.5, then we'll move to half way between the
           calculated score and the new mean value.  If "factor" = 0.3, then
           we'll move about 1/3 of the way from the score toward the mean.
           "factor" = 1 means use the long-term mean, including the new
           unadjusted score; "factor" = 0 means just use the calculated
           score, which disables the score averaging, though the reputation
           is still recorded in the database.

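           As a rough illustration of the regression described above, the
           following Python sketch (not the plugin's Perl code) assumes that
           a message is moved "factor" of the way from its own score toward
           the new mean. The function name adjusted_score is hypothetical.

```python
def adjusted_score(score: float, total: float, count: int,
                   factor: float = 0.5) -> float:
    """Move `score` toward the sender's new mean by `factor` (assumed reading)."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be within [0..1]")
    # The new mean includes the current, unadjusted message score.
    new_mean = (total + score) / (count + 1)
    return score + factor * (new_mean - score)


# One earlier message scored -5, the new one scores +10.
print(adjusted_score(10.0, total=-5.0, count=1))        # 6.25
# 1,000 earlier messages averaging -5 pull the result much harder.
print(adjusted_score(10.0, total=-5000.0, count=1000))  # about 2.51
```

           Under this reading, a long history moves the result further than
           a single earlier message does, which is the behavior the Scoring
           item in the DESCRIPTION asks for.
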
       txrep_dilution_factor
            range [0.7..1.0]               (default: 0.98)

           With every new email from a given sender, the historical
           reputation records are "diluted", or "watered down", by a certain
           fraction given by this factor. This means that the influence of
           old records progressively diminishes with every new message from
           that sender. This is important to allow more flexible handling of
           changes in a sender's behavior, or of improvements or changes to
           local SA rules.

           Without any dilution (dilution factor set to 1), the new message
           score is simply added to the total score of the given sender in
           the reputation database. When dilution is used (factor < 1), the
           impact of the historical reputation average is reduced by the
           factor before calculating the new average, which in turn is then
           used to adjust the new total score to be stored in the database.

            newtotal = (oldcount + 1) * (newscore + dilution * oldtotal) / (dilution * oldcount + 1)

           In other words, the older a message is, the less impact its
           original spam score has on the new average. For example, if we
           set the factor to 0.9 (meaning dilution by 10%), the score of the
           new message will be recorded at 100%, the previous score of the
           same sender at 90%, the second to last at 81% (0.9 * 0.9 = 0.81),
           and, for example, the 10th to last message at just 35%.

           On stable systems, we recommend keeping the factor close to 1 (but
           still lower than 1). On systems where SA rule tuning and spam
           learning are still in progress, lower factors will help the
           reputation adapt more quickly to any modifications. At the same
           time, however, they will also reduce the impact of the historical
           reputation.

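           The update can be tried out with a small Python sketch. It
           implements the newtotal formula exactly as written above; the
           function name diluted_total is hypothetical, and this is not the
           plugin's own code. The loop at the end only reproduces the
           percentages quoted in the example, which are successive powers of
           the factor.

```python
def diluted_total(old_total: float, old_count: int, new_score: float,
                  dilution: float = 0.98) -> float:
    """Return the new total according to the newtotal formula in the text."""
    new_mean = (new_score + dilution * old_total) / (dilution * old_count + 1)
    return (old_count + 1) * new_mean


# With dilution = 1 the new score is simply added to the old total.
print(diluted_total(20.0, 4, 0.0, dilution=1.0))

# With dilution = 0.9 the same history counts for less.
print(diluted_total(20.0, 4, 0.0, dilution=0.9))

# Percentages from the example: powers of the factor 0.9.
for k in (0, 1, 2, 10):
    print(k, round(0.9 ** k, 2))
```
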
       txrep_learn_penalty
            range [0..200]         (default: 20)

           When SpamAssassin is trained on a SPAM message, the given penalty
           score will be added to the total reputation score of the sender,
           regardless of the real spam score. The more messages the sender
           already has in the TxRep database, the smaller the impact of the
           penalty.

       txrep_learn_bonus
            range [0..200]         (default: 20)

           When SpamAssassin is trained on a HAM message, the given bonus
           score will be deducted from the total reputation score of the
           sender, regardless of the real spam score. The more messages the
           sender already has in the TxRep database, the smaller the impact
           of the bonus.

       txrep_autolearn
            range [0..5]                   (default: 0)

           When SpamAssassin declares a message clear spam (or ham) during
           the message scan and launches the auto-learn process, the sender
           reputation scores of the given message will be adjusted by the
           value of the option "txrep_learn_penalty" (or "txrep_learn_bonus")
           in the same way as during manual learning.  The value 0 disables
           the auto-learn reputation adjustment; only the score calculated
           before the auto-learn will be stored in the reputation database.

       txrep_track_messages
             0 | 1                 (default: 1)

           Whether TxRep should keep track of already scanned and/or learned
           messages.  When enabled, an additional record is created in the
           reputation database to avoid false score adjustments due to
           repeated scanning of the same message, and to allow proper
           relearning of messages that were either previously wrongly
           learned, or need to be relearned after modifying the learn penalty
           or bonus.

       txrep_welcomelist_out
            range [0..200]         (default: 10)

           Previously txrep_whitelist_out, which will work interchangeably
           until 4.1.

           When the value of this setting is greater than zero, recipients of
           messages sent from within the internal networks will be
           welcomelisted by improving their total reputation score by the
           number of points defined by this setting. Since the IP address and
           other sender identifiers are not known when sending the email,
           only the reputation of the standalone email address is
           welcomelisted. The domain name is intentionally also left
           unaffected. Outbound welcomelisting can only work when
           SpamAssassin is set up to also scan outgoing email, when local
           users use the SMTP server for sending email, and when
           "internal_networks" are defined in the SpamAssassin configuration.
           The reputation is improved with every message sent from the
           internal networks, so the more messages are sent to the recipient,
           the better the reputation of the recipient's email address will
           be.

       txrep_ipv4_mask_len
            range [0..32]          (default: 16)

           The AWL database keeps only the specified number of
           most-significant bits of an IPv4 address in its fields, so that
           different individual IP addresses within a subnet belonging to the
           same owner are managed under a single database record. As we have
           no information available on the allocated address ranges of
           senders, this CIDR mask length is only an approximation.  The
           default is 16 bits, corresponding to a former class B. Increase
           the number if a finer granularity is desired, e.g. to 24 (class C)
           or 32.  A value of 0 is allowed but is not particularly useful, as
           it would treat the whole internet as a single organization. The
           number need not be a multiple of 8; any split is allowed.

       txrep_ipv6_mask_len
            range [0..128]         (default: 48)

           The AWL database keeps only the specified number of
           most-significant bits of an IPv6 address in its fields, so that
           different individual IP addresses within a subnet belonging to the
           same owner are managed under a single database record. As we have
           no information available on the allocated address ranges of
           senders, this CIDR mask length is only an approximation. The
           default is 48 bits, corresponding to an address range commonly
           allocated to individual (smaller) organizations. Increase the
           number for a finer granularity, e.g. to 64, 96, or 128, or
           decrease it for wider ranges, e.g. to 32.  A value of 0 is allowed
           but is not particularly useful, as it would treat the whole
           internet as a single organization. The number need not be a
           multiple of 4; any split is allowed.

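           To see what the two mask lengths do to an address, the following
           Python sketch applies a CIDR mask with the standard ipaddress
           module. It only illustrates the masking itself; the function name
           masked_ip is hypothetical, and how the plugin formats the masked
           value in its database is not shown here.

```python
import ipaddress


def masked_ip(ip: str, ipv4_mask_len: int = 16, ipv6_mask_len: int = 48) -> str:
    """Keep only the most-significant bits of `ip`, zeroing the rest."""
    addr = ipaddress.ip_address(ip)
    mask_len = ipv4_mask_len if addr.version == 4 else ipv6_mask_len
    # strict=False lets ip_network() accept an address with host bits set.
    network = ipaddress.ip_network(f"{addr}/{mask_len}", strict=False)
    return str(network.network_address)


print(masked_ip("192.0.2.77"))                    # 192.0.0.0
print(masked_ip("192.0.2.77", ipv4_mask_len=24))  # 192.0.2.0
print(masked_ip("2001:db8:1234:5678::1"))         # 2001:db8:1234::
```

           With the defaults, every sender in 192.0.0.0/16 shares one
           record, which is why a finer mask may be preferable where
           neighboring addresses belong to unrelated senders.
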
       user_awl_sql_override_username
             string                (default: undefined)

           Used by the SQLBasedAddrList storage implementation.

           If this option is set, the SQLBasedAddrList module will override
           the set username with the value given.  This can be useful for
           implementing global or group-based TxRep databases.

       txrep_user2global_ratio
            range [0..10]          (default: 0)

           When the option txrep_user2global_ratio is set to a value greater
           than zero, and if the server configuration allows it, two data
           storages will be used: user and global (server-wide) storages.

           User storage keeps only senders who send messages to the
           respective recipient, and also reflects the corrected/learned
           scores when messages are marked by the user as spam or ham, or
           when the sender is welcomelisted or blocklisted through the API of
           SpamAssassin.

           Global storage keeps the reputation data of all messages processed
           by SpamAssassin, with their spam scores and spam/ham learning data
           from all users on the server.  Hence, the module will return a
           reputation value even for senders not known to the current
           recipient, as long as they have already sent email to anyone else
           on the server.

           The value of the txrep_user2global_ratio parameter controls the
           impact of each of the two reputations. When equal to 1, both the
           global and the user score will have the same impact on the result.
           When set to 2, the reputation taken from the user storage will
           have twice the impact of the global value. The final value of the
           TXREP tag will be calculated as follows:

            total = ( ratio * user + global ) / ( ratio + 1 )

           When no reputation is found in the user storage, and a global
           reputation is available, the global storage is used fully, without
           applying the ratio.

           When the ratio is set to zero, only the default storage will be
           used. Whether that is the global or the local user storage is
           controlled either by the parameter user_awl_sql_override_username
           (in case of SQL storage), or by the "auto_welcomelist_path"
           parameter (in case of a Berkeley database).

           When this dual storage is enabled, and no global storage is
           defined by the above mentioned parameters for the Berkeley or SQL
           databases, TxRep will attempt to use a generic storage: the user
           'GLOBAL' in case of SQL, and, in the case of a Berkeley database,
           the path defined by '__local_state_dir__/tx-reputation', which
           typically renders into /var/db/spamassassin/tx-reputation. When
           the default storages are not available, or are not writable, you
           have to set the global storage with the help of the
           "user_awl_sql_override_username" or "auto_welcomelist_path"
           setting, respectively.

           Please note that some SpamAssassin installations always run under
           the same user ID. In such a case it is pointless to enable the
           dual storage, because at best it would lead to two identical
           global storages in different locations.

           This feature is disabled by default.

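           The blending of the two storages can be sketched in Python as
           follows. The function name combined_reputation is hypothetical,
           and the fallback for a missing global record is an assumption
           that the text does not spell out.

```python
from typing import Optional


def combined_reputation(user: Optional[float], global_: Optional[float],
                        ratio: float) -> Optional[float]:
    """Blend user and global reputations: (ratio*user + global)/(ratio + 1)."""
    if user is None:
        # No user record: the global value is used fully, without the ratio.
        return global_
    if global_ is None:
        # Assumption: with no global record, fall back to the user value.
        return user
    return (ratio * user + global_) / (ratio + 1)


print(combined_reputation(4.0, 2.0, ratio=1))   # equal impact: 3.0
print(combined_reputation(6.0, 0.0, ratio=2))   # user counts twice: 4.0
print(combined_reputation(None, 3.0, ratio=2))  # global only: 3.0
```
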
       auto_welcomelist_distinguish_signed  (default: 1 - enabled)
           Previously auto_whitelist_distinguish_signed, which will work
           interchangeably until 4.1.

           Used by the SQLBasedAddrList storage implementation.

           If this option is set, the SQLBasedAddrList module will keep
           separate database entries for DKIM-validated e-mail addresses and
           for non-validated ones. Without this option, or for domains that
           do not use a DKIM signature, the reputation of legitimate email
           can get mixed with the reputation of forgeries. A prerequisite
           when setting this option is that a field txrep.signedby exists in
           the SQL table, otherwise SQL operations will fail.  A DKIM plugin
           must also be enabled in order for this option to take effect.
           This option is highly recommended. Unless you are using a
           pre-3.3.0 database schema and cannot upgrade, there is no reason
           to disable it. If you are upgrading from AWL and using a pre-3.3.0
           schema, the txrep.signedby column will not exist. It is
           recommended that you add this column, but if that is not possible
           you must set this option to 0 to avoid SQL errors.

       txrep_spf
             0 | 1                 (default: 1)

           When enabled, TxRep will treat any IP address using a given email
           address as the same authorized identity, and will not associate
           any IP address with it.  (The same happens with valid DKIM
           signatures; no option is available for DKIM.)

           Note: for domains that define the useless SPF +all (pass all), no
           IP would ever be associated with the email address, and all
           addresses (incl. forged ones) would be treated as coming from the
           authorized source. However, such domains are hopefully rare, and
           ask for this kind of treatment anyway.

   REPUTATION WEIGHTS
       The overall reputation of the sender comprises several elements:

       1) The reputation of the 'From' email address bound to the originating
       IP address fraction (see the mask parameters for details)
       2) The reputation of the 'From' email address alone (regardless of the
       IP address currently being used)
       3) The reputation of the domain name of the 'From' email address
       4) The reputation of the originating IP address, regardless of the
       sender's email address
       5) The reputation of the HELO name of the originating computer (if
       available)

       Each of these partial reputations is weighted with the help of the
       parameters below, and the overall reputation is calculated as the sum
       of the weighted individual reputations divided by the sum of all
       their weights:

        sender_reputation = ( weight_email    * rep_email    +
                              weight_email_ip * rep_email_ip +
                              weight_domain   * rep_domain   +
                              weight_ip       * rep_ip       +
                              weight_helo     * rep_helo     ) /
                            ( weight_email + weight_email_ip +
                              weight_domain + weight_ip + weight_helo )

       You can disable individual partial reputations by setting their
       respective weight to zero. This will also reduce the size of the
       database, since each partial reputation requires a separate entry in
       the database table. Disabling some of the partial reputations in this
       way may also help performance on busy servers, because the respective
       database lookups and processing will be skipped too.

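       The weighted combination can be sketched in Python as shown below.
       This is an illustration only, not the plugin's code: the name
       sender_reputation and the dictionary keys are hypothetical, the helo
       weight of 0.5 is a placeholder because its default is not given in
       this section, and leaving out reputations that are missing or have a
       zero weight is an assumption.

```python
# Defaults for email, email_ip, domain and ip are the documented ones;
# the helo value is only a placeholder (its default is not stated here).
DEFAULT_WEIGHTS = {"email": 3, "email_ip": 10, "domain": 2, "ip": 4, "helo": 0.5}


def sender_reputation(reps: dict, weights: dict = DEFAULT_WEIGHTS):
    """Weighted mean of the partial reputations that are available."""
    # Assumption: missing reputations and zero weights are left out entirely.
    used = [(weights[k], r) for k, r in reps.items()
            if r is not None and weights.get(k, 0) > 0]
    total_weight = sum(w for w, _ in used)
    if total_weight == 0:
        return None
    return sum(w * r for w, r in used) / total_weight


reps = {"email": 1.0, "email_ip": -2.0, "domain": 0.0, "ip": 5.0, "helo": None}
print(sender_reputation(reps))
```

       Setting a weight to zero in this sketch drops the corresponding
       reputation from both the numerator and the denominator, so the
       remaining reputations keep their relative influence.
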
       txrep_weight_email
            range [0..10]          (default: 3)

           This weight factor controls the influence of the reputation of the
           standalone email address, regardless of the originating IP
           address. When adjusting the weight, keep in mind that an email
           address can be easily spoofed, and hence spammers can use 'From'
           email addresses belonging to senders with a good reputation. From
           this point of view, the email address bound to the originating IP
           address is a more reliable indicator for the overall reputation.

           On the other hand, some reputable senders may be sending from a
           larger number of IP addresses, so looking up the reputation of the
           standalone email address without regard to the originating IP
           makes some sense too.

           We recommend using a relatively low value for this partial
           reputation.

       txrep_weight_email_ip
            range [0..10]          (default: 10)

           This is the standard reputation, used in the same way as it was by
           the original AWL plugin. Each sender's email address is bound to
           the originating IP, or to the part of it defined by the
           txrep_ipv4_mask_len or txrep_ipv6_mask_len parameters.

           For a user sending from multiple locations, diverse mail servers,
           or a dynamic IP range outside the masked block, the email address
           will have a separate reputation value for each of the different
           (partial) IP addresses.

           When the option auto_welcomelist_distinguish_signed is enabled,
           TxRep, in contrast to the original AWL module, does not record the
           IP address when a DKIM signature is detected. The email address is
           then not bound to any IP address, but rather just to the DKIM
           signature, since the signature is considered to authenticate the
           sender more reliably than the IP address (which can also vary).

           This is by design the most relevant reputation, and its weight
           should be kept high.

549       txrep_weight_domain
550            range [0..10]          (default: 2)
551
           Some spammers may always use their real domain name in the email
           address, just with multiple or changing local parts. This
           reputation records the spam scores of all messages sent from the
           respective domain, regardless of the local part (user name) used.
557
           As with the email_ip reputation, the domain reputation is also
           bound to the originating address (or a masked block, if the mask
           parameters are used).  This avoids giving false reputation based
           on spoofed email addresses.
562
           When a DKIM signature is detected, the signer is used instead of
           the domain name extracted from the email address. The signing
           authority is considered responsible for the email it signs,
           whatever the domain name, hence the same reputation applies here.
567
           The domain reputation gives a relevant picture of the owner of the
           domain in the case of small servers or corporations with strict
           policies, but is less relevant for freemailers like Gmail,
           Hotmail, and similar, because both ham and spam may be sent by
           their users.
573
574           The default value is set relatively low. Higher weight values may
575           be useful, but we recommend caution and observing the scores before
576           increasing it.
577
578       txrep_weight_ip
579            range [0..10]          (default: 4)
580
           Spammers can send through the same relay (incl. compromised hosts)
           under a multitude of email addresses. This is exactly the case
           where the IP reputation can help. This reputation is a kind of
           local RBL.
585
           By default, the weight is set lower than for the email_ip
           reputation, because there may be cases when the same IP address
           hosts both spammers and acceptable senders (for example, the
           marketing department of a company sends you spam, but you still
           need to get messages from their billing address).
591
592       txrep_weight_helo
593            range [0..10]          (default: 0.5)
594
           A large number of spam messages come from compromised hosts, often
           personal computers or set-top boxes. Their NetBIOS names are
           usually used as the HELO name when connecting to your mail server.
           Some of the names are quite generic and hence may be shared by a
           large number of hosts, but often the names are fairly unique and
           may be a good indicator for detecting a spammer, even when
           different email and IP addresses are used (spam can also come from
           portable devices).
603
           No IP address is bound to the HELO name when it is stored in the
           reputation database.  This is intentional, despite the possibility
           that numerous devices may share some of the HELO names.

           This option is still considered experimental, hence the low weight
           value; after some testing it could probably be increased at least
           slightly.
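
       The relative influence of the weights above can be illustrated with a
       small Python sketch. It assumes that the partial reputations are
       combined as a weighted average, which is an interpretation and not a
       quote of the plugin code. The values for email_ip, domain, ip and helo
       are the defaults listed above; the standalone email weight of 3 and
       the partial reputation values are assumed for illustration.

```python
# Defaults quoted in this section; the "email" weight is an assumed value.
WEIGHTS = {"email": 3.0, "email_ip": 10.0, "domain": 2.0, "ip": 4.0, "helo": 0.5}


def weight_shares(weights):
    """Fraction of the total weight carried by each reputation type."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def combined_reputation(partials, weights):
    """Weighted average over the identifiers that have a partial reputation."""
    used = {n: w for n, w in weights.items() if n in partials and w > 0}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(partials[n] * w for n, w in used.items()) / total


shares = weight_shares(WEIGHTS)
# A sender with a good email_ip history on an IP with a poor history
example = combined_reputation({"email_ip": -2.0, "ip": 6.0}, WEIGHTS)
```

       With these numbers the email_ip reputation carries slightly more than
       half of the total weight, which matches the recommendation to keep it
       the dominant indicator.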
611

ADMINISTRATOR SETTINGS

613       These settings differ from the ones above, in that they are considered
614       'more privileged' -- even more than the ones in the PRIVILEGED SETTINGS
615       section.  No matter what "allow_user_rules" is set to, these can never
616       be set from a user's "user_prefs" file.
617
618       txrep_factory module
619            (default: Mail::SpamAssassin::DBBasedAddrList)
620
           Selects an alternative database factory module for the TxRep
           database.
622
623       auto_welcomelist_path /path/filename
624            (default: ~/.spamassassin/tx-reputation)
625
           Previously auto_whitelist_path, which will work interchangeably
           until 4.1.
628
629           This is the TxRep directory and filename.  By default, each user
630           has their own reputation database in their "~/.spamassassin"
631           directory with mode 0700.  For system-wide SpamAssassin use, you
632           may want to share this across all users.
633
634       auto_welcomelist_db_modules Module ...
635            (default: see below)
636
           Previously auto_whitelist_db_modules, which will work
           interchangeably until 4.1.
639
           Which database modules should be used for the TxRep storage
           database file.  The first named module that can be loaded from the
           Perl include path will be used.  The format is:
643
644             PreferredModuleName SecondBest ThirdBest ...
645
           i.e. a space-separated list of Perl module names.  The default is:
647
648             DB_File GDBM_File SDBM_File
649
650           NDBM_File is not supported (see SpamAssassin bug 4353).
651
652       auto_welcomelist_file_mode
653            (default: 0700)
654
           Previously auto_whitelist_file_mode, which will work
           interchangeably until 4.1.
657
658           The file mode bits used for the TxRep directory or file.
659
660           Make sure you specify this using the 'x' mode bits set, as it may
661           also be used to create directories.  However, if a file is created,
662           the resulting file will not have any execute bits set (the umask is
663           set to 0111).
664
665       user_awl_dsn DBI:databasetype:databasename:hostname:port
666           Used by the SQLBasedAddrList storage implementation.
667
668           This will set the DSN used to connect.  Example:
669           "DBI:mysql:spamassassin:localhost"
670
671       user_awl_sql_username username
672           Used by the SQLBasedAddrList storage implementation.
673
674           The authorized username to connect to the above DSN.
675
676       user_awl_sql_password password
677           Used by the SQLBasedAddrList storage implementation.
678
679           The password for the database username, for the above DSN.
680
681       user_awl_sql_table tablename
682            (default: txrep)
683
684           Used by the SQLBasedAddrList storage implementation.
685
           The name of the table in which the reputation is stored, for the
           above DSN.
688

BLOCKLISTING / WELCOMELISTING

       When asked by SpamAssassin to blocklist or welcomelist a user, the
       TxRep plugin adds a score of 100 (for blocklisting) or -100 (for
       welcomelisting) to the given sender's email address. For a plain
       address without any IP address, the value is multiplied by the ratio
       of the total reputation weight to the EMAIL reputation weight, to
       account for the reduced impact of the standalone EMAIL reputation when
       calculating the overall reputation.
697
698          total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo
699          blocklisted_reputation = 100 * total_weight / weight_email
700
       When a standalone email address is blocklisted/welcomelisted, all
       records of the email address bound to an IP address, DKIM signature, or
       an SPF pass will be removed from the database, and only the standalone
       record is kept.
705
       Besides blocklisting/welcomelisting of standalone email addresses, the
       same method may also be used for blocklisting/welcomelisting of IP
       addresses, domain names, and HELO names (only dotless NetBIOS HELO
       names can be used).
710
711       When welcomelisting/blocklisting an email address or domain name, you
712       can bind them to a specified DKIM signature or SPF record by appending
713       the DKIM signing domain or the tag 'spf' after the ID in the following
714       way:
715
716        spamassassin --add-addr-to-blocklist=spamming.biz,spf
717        spamassassin --add-addr-to-welcomelist=friend@good.org,good.org
718
719       When a message contains both a DKIM signature and an SPF pass, the DKIM
720       signature takes the priority, so the record bound to the 'spf' tag
721       won't be checked. Only email addresses and domains can be bound to DKIM
722       or SPF.  Records of IP addresses and HELO names are always without
723       DKIM/SPF.
724
725       In case of dual storage, the block/welcomelisting is performed only in
726       the default storage.
727

REPUTATION LOGICS

       1. The most significant sender identifier is, as with AWL, the
          combination of the email address and the originating IP address,
          or its part defined by the IPv4 or IPv6 mask setting.

       2. No IP checking for standalone EMAIL address reputation

       3. No signature checking for IP reputation, and for HELO name
          reputation

       4. The EMAIL_IP weight, and not the standalone EMAIL weight, is used
          when no IP address is available (EMAIL_IP is the main indicator,
          and has the highest weight)

       5. No IP checking at signed emails (signature authenticates the email
          instead of the IP address)

       6. No IP checking at SPF pass (we assume the domain owner is
          responsible for all IPs he authorizes to send from, hence we use
          the same identity for all of them)

       7. No signature used for standalone EMAIL reputation (would be
          redundant, since no IP is used at signed EMAIL_IP reputation, and
          we would store two identical hits)

       8. When available, the DKIM signer is used instead of the domain name
          for the DOMAIN reputation

       9. No IP and no signature used for HELO reputation (despite the
          possible existence of multiple computers with the same HELO)

       10. The full (unmasked) IP address is used (in the address field,
           instead of the IP field) for the standalone IP reputation
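
       The rules above can be summarized in a short Python sketch that
       derives, for one message, the identifier and binding of each
       reputation record. This is an illustration under stated assumptions,
       not the plugin's implementation: the function name, the tuple layout
       and the binding chosen for the DOMAIN record are hypothetical.

```python
def reputation_keys(email, ip_block, full_ip, helo, dkim_signer=None, spf_pass=False):
    """Return {reputation type: (identifier, binding)} for one message."""
    # Rules 5 and 6: a DKIM signer or an SPF pass replaces the IP binding;
    # a DKIM signature takes priority over an SPF pass.
    if dkim_signer:
        binding = dkim_signer
    elif spf_pass:
        binding = "spf"
    else:
        binding = ip_block  # rule 1: masked originating IP
    domain = email.rsplit("@", 1)[1]
    return {
        "email_ip": (email, binding),
        "email": (email, None),  # rules 2 and 7: no IP, no signature
        # Rule 8: signer replaces the domain; reusing the same binding
        # for this record is an assumption of this sketch.
        "domain": (dkim_signer or domain, binding),
        "ip": (full_ip, None),  # rules 3 and 10: full address, unbound
        "helo": (helo, None),  # rule 9: no IP, no signature
    }


plain = reputation_keys("bob@example.com", "203.0.0.0", "203.0.113.9", "DESKTOP-ABC")
signed = reputation_keys("bob@example.com", "203.0.0.0", "203.0.113.9",
                         "DESKTOP-ABC", dkim_signer="example.net", spf_pass=True)
spf_only = reputation_keys("bob@example.com", "203.0.0.0", "203.0.113.9",
                           "DESKTOP-ABC", spf_pass=True)
```

       In the signed case the EMAIL_IP record is bound to the signer rather
       than to the IP block, so the same sender keeps a single record
       regardless of the originating address.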
770

LEARNING SPAM / HAM

       When SpamAssassin is told to learn (or relearn) a given message as spam
       or ham, all reputations relevant to the message (email, email_ip,
       domain, ip, helo) in both global and user storages will be updated
       using the "txrep_learn_penalty" or the "txrep_learn_bonus" value,
       respectively. The new reputation of the given sender property (email,
       domain,...) will be the result of one of the following formulas:
778
779          new_reputation = old_reputation + learn_penalty
780          new_reputation = old_reputation - learn_bonus
781
       Unless message tracking ("txrep_track_messages") is enabled, the TxRep
       plugin does not track each message individually, hence it does not
       detect when you learn the same message repeatedly. It will then
       add/subtract the penalty/bonus score each time the message is fed to
       the spam learner.
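
       The two formulas, and the effect of feeding the same message to the
       learner twice without per-message tracking, can be sketched in Python
       as follows. The penalty and bonus of 20 and the starting reputation
       are illustrative assumptions; the function name is hypothetical.

```python
def learn(reputation, as_spam, learn_penalty=20.0, learn_bonus=20.0):
    """Apply one learning step to a single partial reputation."""
    if as_spam:
        return reputation + learn_penalty
    return reputation - learn_bonus


start = 1.5
once = learn(start, as_spam=True)
# Learning the same message again adds the penalty a second time
twice = learn(once, as_spam=True)
as_ham = learn(start, as_spam=False)
```

       The same step is applied to every reputation relevant to the message
       (email, email_ip, domain, ip, helo).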
786

OPTIMIZING TXREP

788       TxRep can be optimized for speed and simplicity, or for the precision
789       in assigning the reputation scores.
790
       First of all, TxRep can be quickly disabled and re-enabled through the
       option "use_txrep". This can be done globally, or individually in each
       respective "user_prefs". Disabling TxRep will not destroy the database,
       so it can be re-enabled at any later time.
795
796       On many systems, SQL-based storage may perform faster than the default
797       Berkeley DB storage, so you should consider setting it up.
798
799       Then there are multiple settings that can reduce the number of records
800       stored in the database, hence reducing the size of the storage, and
801       also the processing time:
802
       1. Setting "txrep_user2global_ratio" to zero will disable the dual
       storage, thus halving the disk space requirements and the processing
       time of this plugin.

       2. You can disable all but one of the "REPUTATION WEIGHTS". The
       EMAIL_IP is the most specific option, so it is the most likely choice
       in such a case, but you could base the reputation system on any of the
       remaining scores. Each of the enabled reputations adds a new entry to
       the database for each new identifier.  So while, for example, the
       number of recorded and scored domains may be large, the number of
       stored IP addresses will probably be higher, and would require more
       space in the storage.

       3. Disabling "txrep_track_messages" avoids storing a separate entry
       for every scanned message, hence also reducing the disk space
       requirements and the processing time.

       4. Disabling the option "txrep_autolearn" will save processing time
       for messages that trigger the auto-learning process.

       5. Disabling "txrep_welcomelist_out" will reduce the processing time
       for outbound connections.

       6. Keeping the option "auto_welcomelist_distinguish_signed" enabled
       may help to slightly reduce the size of the database, because for
       signed messages the originating IP address is ignored, hence no
       additional database entries are needed for each separate IP address
       (or masked block of IP addresses).
831
832       Since TxRep reuses the storage architecture of the former AWL plugin,
833       for initializing the SQL storage, the same instructions apply also to
834       TxRep.  Although the old AWL table can be reused for TxRep, by default
835       TxRep expects the SQL table to be named "txrep".
836
837       To install a new SQL table for TxRep, run the appropriate SQL file for
838       your system under the /sql directory.
839
       If you get a syntax error on an older version of MySQL, use TYPE=MyISAM
       instead of ENGINE=MyISAM at the end of the command. You can also use
       other types of ENGINE (depending on what is available on your system).
       For example, the MEMORY engine stores the entire table in the server
       memory, achieving performance similar to Redis. You would need to take
       care of replicating the RAM table to disk through a cronjob, to avoid
       loss of data at reboot.  The InnoDB engine is used by default, offering
       high scalability (database size and concurrent access). In conjunction
       with a high value of innodb_buffer_pool_size or with the memcached
       plugin (MySQL v5.6+) it can also offer performance comparable to
       Redis.
851
perl v5.36.0                      2023-01-21  Mail::SpamAssassin::Plugin::TxRep(3)