Mail::SpamAssassin::Plugin::TxRep(3pm)


NAME

       Mail::SpamAssassin::Plugin::TxRep - Normalize scores with sender
       reputation records

SYNOPSIS

       The TxRep (Reputation) plugin is designed as an improved replacement
       for the AWL (Auto-Whitelist) plugin. It adjusts the final spam score
       of a message by looking up the reputation of the sender and taking it
       into consideration.

       To try TxRep out, you first have to disable the AWL plugin (if
       enabled) and back up its database. AWL is loaded in v310.pre and can
       be disabled by commenting out the loadplugin line:

        # loadplugin   Mail::SpamAssassin::Plugin::AWL

       TxRep will refuse to run while AWL is enabled.

       TxRep is enabled by uncommenting the following line in v341.pre:

         loadplugin   Mail::SpamAssassin::Plugin::TxRep

       Use the supplied 60_txreputation.cf file or add these lines to a .cf
       file:

        header         TXREP   eval:check_senders_reputation()
        describe       TXREP   Score normalizing based on sender's reputation
        tflags         TXREP   userconf noautolearn
        priority       TXREP   1000

DESCRIPTION

       This plugin is intended to replace the former AWL (AutoWhiteList).
       Although the concept and the scope differ, the purpose remains the
       same: normalizing spam scores based on the sender's previous history.
       The name was intentionally changed from "whitelist" to "reputation"
       to avoid confusion, since the resulting score can be adjusted in both
       directions.

       The TxRep plugin keeps track of the average SpamAssassin score for
       senders. Senders are tracked using multiple identifiers, or their
       combinations: the From: email address, the originating IP and/or an
       originating block of IPs, the sender's domain name, the DKIM
       signature, and the HELO name. TxRep then uses the average score to
       reduce the variability in scoring from message to message, and
       modifies the final score by pushing the result towards the historical
       average. This improves the accuracy of filtering for most email.

       In comparison with the original AWL plugin, several conceptual
       changes were implemented in TxRep:

       1. Scoring - although AWL tracks the number of messages received from
       each sender, it does not take that count into account in any way when
       calculating the corrective score for a new message. For example, if a
       sender previously sent a single ham message with a score of -5 and
       then sends a second one with a score of +10, AWL will issue a
       corrective score pulling the result towards -5. With the default
       "auto_whitelist_factor" of 0.5, the resulting score would be only
       2.5. It would be exactly the same even if the sender had previously
       sent 1,000 messages with an average of -5. TxRep tries to take
       maximal advantage of the collected data, and adjusts the final score
       not only by the mean reputation score stored in the database, but
       also with respect to the number of messages already seen from the
       sender. You can see the exact formula in the section "txrep_factor".

       2. Learning - AWL ignores any spam/ham learning. In fact it acts
       against it, which often leads to a frustrating situation where a user
       repeatedly tags all messages from a given sender as spam (or ham),
       but for every new message from that sender AWL adjusts the score back
       towards the historical average, which does not include the learned
       scores. TxRep changes this: every spam/ham learning is recorded in
       the reputation database, and hence taken into consideration for
       future email from the respective sender. See the section "LEARNING
       SPAM / HAM" for more details.

       3. Auto-Learning - in certain situations SpamAssassin may declare a
       message an obvious spam or ham, and launch the auto-learning process
       so that the message can be re-evaluated. AWL, by design, did not
       perform any auto-learning adjustments. This plugin readjusts the
       stored reputation by the value defined by "txrep_learn_penalty" or
       "txrep_learn_bonus", respectively. Auto-learning score thresholds may
       be tuned, or auto-learning completely disabled, through the setting
       "txrep_autolearn".

       4. Relearning - messages that were wrongly learned or auto-learned
       can be relearned. Old reputations are removed from the database, and
       new ones are added in their place. Relearning works better when
       message tracking is enabled through the "txrep_track_messages"
       option. Without it, the relearned score is simply added to the
       reputation, without removing the old one.

       5. Aging - with AWL, every historical record of a given sender has
       the same weight. This means that changes in a sender's behavior, or
       modified SA rules, may take a long time to show an effect, or may be
       virtually negated by the AWL normalization, especially for senders
       with a high count of past messages and a low recent frequency. It
       also turns out to be particularly counterproductive when the
       administrator detects new patterns in certain messages, and applies
       new rules to better tag such messages as spam or ham. AWL will
       practically eliminate the effect of the new rules by adjusting the
       score back towards the (wrong) historical average. Only lowering the
       "auto_whitelist_factor" would help, but at the same time it would
       also reduce the overall impact of AWL, and cast doubt on its purpose.
       Besides "txrep_factor" (the replacement of "auto_whitelist_factor"),
       TxRep also introduces "txrep_dilution_factor" to help cope with this
       issue by progressively reducing the impact of past records. More
       details can be found in the description of the factor below.

       6. Blacklisting and Whitelisting - when whitelisting or blacklisting
       is requested through SpamAssassin's API, AWL adjusts the historical
       total score of the plain email address without IP (and deletes
       records bound to an IP), but since new records with an IP are added
       during reception, the blacklisted entry would cease to act during
       scanning. TxRep always uses the record of the plain email address
       without IP together with the one bound to an IP address, DKIM
       signature, or SPF pass (unless the weight factor for the EMAIL
       reputation is set to zero). AWL uses a score of 100 (or -100) for
       blacklisting (or whitelisting) purposes. TxRep increases the value
       proportionally to the weight factor of the EMAIL reputation. This is
       explained in detail in the section "BLACKLISTING / WHITELISTING".
       TxRep can also blacklist or whitelist IP addresses, domain names, and
       dotless HELO names.

       7. Sender Identification - AWL identifies a sender on the basis of
       the email address used and the originating IP address (more
       precisely, the part of it defined by the mask setting). The main
       purpose of this measure is to avoid assigning false good scores to
       spammers who spoof known email addresses. The disadvantage appears
       with senders who send from frequently changing locations, or who
       connect through dynamic IP addresses that are not within the block
       defined by the mask setting. Their score is difficult or sometimes
       impossible to track. Another disadvantage appears, for example, with
       a spammer persistently sending spam from the same IP address, just
       under different email addresses. AWL will not find the previous
       scores unless the spammer reuses the same email address. TxRep uses
       several identifiers, and creates separate database entries for each
       of them. It tracks not only the email/IP address combination like
       AWL, but also the standalone email address (regardless of the
       originating IP), the standalone IP (regardless of the email address
       used), the domain name of the email address, the DKIM signature, and
       the HELO name of the connecting host. The influence of each
       individual identifier may be tuned with the help of the weight
       factors described in the section "REPUTATION WEIGHTS".

       8. Message Tracking - TxRep (optionally) keeps track of the message
       IDs of already scanned and/or learned messages. This avoids
       strengthening the reputation score by simply rescanning or relearning
       the same message multiple times. At the same time it also allows the
       proper relearning of messages that were once wrongly learned, or
       relearning them after the learn penalty or bonus was changed. See the
       option "txrep_track_messages".

       9. User and Global Storages - a per-user setup of SpamAssassin is
       usually recommended, because each user may have quite different
       requirements, and may receive quite different sorts of email.
       Especially when using the Bayesian and AWL plugins, the efficiency is
       much better when SpamAssassin is trained on spam and ham separately
       for each user. The disadvantage, however, is that senders and emails
       already learned many times by different users need to be relearned
       without any recognized history whenever they arrive at another user.
       TxRep uses the advantages of both systems. It can use dual storages:
       the global common storage, where all email processed by SpamAssassin
       is recorded, and a local storage separate for each user, with
       reputation data from that user's email only. See more details at the
       setting "txrep_user2global_ratio".

       10. Outbound Whitelisting - when a local user sends messages to an
       email address, we assume that the user needs to see an eventual
       answer too, hence the recipient's address should be whitelisted. When
       SpamAssassin is also used for scanning outgoing email, when local
       users send email through the SMTP server where SA is installed, and
       when internal networks are defined, TxRep will improve the reputation
       of all 'To:' and 'CC' addresses from messages originating in the
       internal networks. Details can be found at the setting
       "txrep_whitelist_out".

       The two plugins (AWL and TxRep) cannot coexist. AWL must be disabled
       to allow TxRep to run. TxRep reuses the database handling of the
       original AWL module, and some of its parameters bound to the database
       handler modules. By default, TxRep creates its own database, but the
       original auto-whitelist can be reused as a starting point. The AWL
       database can be renamed to the name defined in the TxRep settings,
       and TxRep will start using it. The original auto-whitelist database
       should be backed up first, to allow switching back to the original
       state.

       The spamassassin/Plugin/TxRep.pm file replaces both
       spamassassin/Plugin/AWL.pm and spamassassin/AutoWhitelist.pm. Two
       other AWL files, spamassassin/DBBasedAddrList.pm and
       spamassassin/SQLBasedAddrList.pm, are still needed.

TEMPLATE TAGS

       This plugin module adds the following "tags" that can be used as
       placeholders in certain options.  See Mail::SpamAssassin::Conf for more
       information on TEMPLATE TAGS.

        _TXREPXXXY_         TXREP modifier
        _TXREPXXXYMEAN_     Mean score on which TXREP modification is based
        _TXREPXXXYCOUNT_    Number of messages on which TXREP modification is based
        _TXREPXXXYPRESCORE_ Score before TXREP
        _TXREPXXXYUNKNOWN_  New sender (not found in the TXREP list)

       The XXX part of the tag takes the form of one of the following IDs,
       depending on the reputation checked: EMAIL, EMAILIP, IP, DOMAIN, or
       HELO. The Y suffix is used only in the case of dual storage, and
       takes the form of either U (for user storage reputations) or G (for
       global storage reputations).

USER PREFERENCES

       The following options can be used in both site-wide ("local.cf") and
       user-specific ("user_prefs") configuration files to customize how
       SpamAssassin handles incoming email messages.

       use_txrep
             0 | 1                 (default: 0)

           Whether to use the TxRep reputation system.  TxRep tracks the
           long-term average score for each sender and then shifts the score
           of new messages toward that long-term average.  This can increase
           or decrease the score for messages, depending on the long-term
           behavior of the particular correspondent.

           Note that certain tests are ignored when determining the final
           message score:

            - rules with tflags set to 'noautolearn'

       txrep_factor
            range [0..1]           (default: 0.5)

           How much towards the long-term mean for the sender to regress a
           message.  Basically, the algorithm is to track the long-term total
           score and the count of messages for the sender ("total" and
           "count"), and then once we have otherwise fully calculated the
           score for this message ("score"), we calculate the final score for
           the message as:

            newmean    = (total + score) / (count + 1)
            finalscore = score + factor * (newmean - score)

           So if "factor" = 0.5, then we'll move half way between the
           calculated score and the new mean value.  If "factor" = 0.3, then
           we'll move about 1/3 of the way from the score toward the mean.
           "factor" = 1 means use the long-term mean, including the new
           unadjusted score; "factor" = 0 means just use the calculated
           score, thereby disabling the score averaging, though still
           recording the reputation in the database.

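           The behavior described for txrep_factor can be sketched in a few
           lines of Python. This is only an illustration, assuming the
           reading in which the score moves a fraction "factor" of the way
           toward the new mean (so that factor 1 yields the mean and factor 0
           leaves the score unchanged); the function name is hypothetical and
           is not part of the plugin.

```python
def txrep_final_score(score, total, count, factor=0.5):
    """Move `score` a fraction `factor` of the way toward the new mean.

    Assumption: the new mean includes the current, unadjusted score,
    as the txrep_factor description states.
    """
    new_mean = (total + score) / (count + 1)
    return score + factor * (new_mean - score)


# One earlier message at -5, new message scoring +10.
print(txrep_final_score(10, total=-5, count=1))         # 6.25
# 1,000 earlier messages averaging -5, same new message:
# the larger history pulls the result much closer to -5.
print(txrep_final_score(10, total=-5000, count=1000))
```

           Unlike the AWL example in the DESCRIPTION, the two calls give
           different results, because the message count enters the new mean.
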
       txrep_dilution_factor
            range [0.7..1.0]               (default: 0.98)

           With every new email from a given sender, the historical
           reputation records are "diluted", or "watered down", by a certain
           fraction given by this factor. This means that the influence of
           old records progressively diminishes with every new message from
           the sender. This is important to allow more flexible handling of
           changes in the sender's behavior, or of improvements and changes
           to local SA rules.

           Without any dilution (dilution factor set to 1), the new message
           score is simply added to the total score of the sender in the
           reputation database. When dilution is used (factor < 1), the
           impact of the historical reputation average is reduced by the
           factor before calculating the new average, which in turn is then
           used to adjust the new total score to be stored in the database.

            newtotal = (oldcount + 1) * (newscore + dilution * oldtotal) / (dilution * oldcount + 1)

           In other words, the older a message is, the less impact its
           original spam score has on the new average. For example, if we
           set the factor to 0.9 (meaning dilution by 10%), the score of the
           new message is recorded at 100%, the last score of the same
           sender at 90%, the second last at 81% (0.9 * 0.9 = 0.81), and the
           10th last message at just 35%.

           On stable systems, we recommend keeping the factor close to 1
           (but still lower than 1). On systems where SA rule tuning and
           spam learning are still in progress, lower factors will help the
           reputation adapt more quickly to any modifications. At the same
           time, though, they will also reduce the impact of the historical
           reputation.

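           As a rough illustration of the dilution arithmetic, the newtotal
           formula and the geometric weights from the 0.9 example can be
           written as follows. The function names are hypothetical; only the
           formula itself comes from the text.

```python
def diluted_total(old_total, old_count, new_score, dilution=0.98):
    """Return the new total to store, following the newtotal formula."""
    return ((old_count + 1) * (new_score + dilution * old_total)
            / (dilution * old_count + 1))


def relative_weight(age, dilution):
    """Weight of a score `age` messages back (0 = the newest message)."""
    return dilution ** age


# With dilution 1 the new score is simply added to the old total.
print(diluted_total(-50, 10, 10, dilution=1.0))   # -40.0
# With dilution 0.9 the history counts for less.
print(diluted_total(-50, 10, 10, dilution=0.9))
# The 10th last message keeps about 35% of its weight.
print(round(relative_weight(10, 0.9), 2))         # 0.35
```
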
       txrep_learn_penalty
            range [0..200]         (default: 20)

           When SpamAssassin is trained on a SPAM message, the given penalty
           score is added to the total reputation score of the sender,
           regardless of the real spam score. The more messages the sender
           already has in the TxRep database, the smaller the impact of the
           penalty.

       txrep_learn_bonus
            range [0..200]         (default: 20)

           When SpamAssassin is trained on a HAM message, the given bonus
           score is deducted from the total reputation score of the sender,
           regardless of the real spam score. The more messages the sender
           already has in the TxRep database, the smaller the impact of the
           bonus.

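           The following sketch shows why a penalty or bonus matters less
           for senders with a long history. It rests on an assumption that
           the text does not state: that a learned message is counted as one
           additional message when the mean is recomputed. The function name
           is hypothetical.

```python
def mean_shift_after_learning(total, count, adjustment):
    """Change in the stored mean after one learned message.

    Assumption (not stated in the text): the learned message is
    counted as one additional message. `adjustment` is +penalty
    for spam or -bonus for ham.
    """
    old_mean = total / count if count else 0.0
    new_mean = (total + adjustment) / (count + 1)
    return new_mean - old_mean


# Sender with a single neutral message: the default penalty of 20
# shifts the mean by 10 points.
print(mean_shift_after_learning(0, 1, 20))    # 10.0
# Sender with 99 neutral messages: the same penalty shifts it by 0.2.
print(mean_shift_after_learning(0, 99, 20))   # 0.2
```
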
       txrep_autolearn
            range [0..5]                   (default: 0)

           When SpamAssassin declares a message clear spam or ham during the
           message scan, and launches the auto-learn process, the sender
           reputation scores of the message are adjusted by the value of the
           option "txrep_learn_penalty" or "txrep_learn_bonus",
           respectively, in the same way as during manual learning.  A value
           of 0 disables the auto-learn reputation adjustment: only the
           score calculated before the auto-learn is stored in the
           reputation database.

       txrep_track_messages
             0 | 1                 (default: 1)

           Whether TxRep should keep track of already scanned and/or learned
           messages.  When enabled, an additional record is created in the
           reputation database to avoid false score adjustments due to
           repeated scanning of the same message, and to allow proper
           relearning of messages that were either previously wrongly
           learned, or need to be relearned after modifying the learn
           penalty or bonus.

       txrep_whitelist_out
            range [0..200]         (default: 10)

           When the value of this setting is greater than zero, recipients of
           messages sent from within the internal networks are whitelisted
           by improving their total reputation score by the number of points
           defined by this setting. Since the IP address and other sender
           identifiers are not known when sending the email, only the
           reputation of the standalone email address is whitelisted. The
           domain name is intentionally left unaffected as well. Outbound
           whitelisting only works when SpamAssassin is set up to scan
           outgoing email too, when local users use the SMTP server for
           sending email, and when "internal_networks" are defined in the
           SpamAssassin configuration. The reputation is improved with every
           message sent from the internal networks, so the more messages are
           sent to the recipient, the better the reputation of that email
           address will be.

       txrep_ipv4_mask_len
            range [0..32]          (default: 16)

           The reputation database keeps only the specified number of
           most-significant bits of an IPv4 address in its fields, so that
           different individual IP addresses within a subnet belonging to the
           same owner are managed under a single database record. As we have
           no information available on the allocated address ranges of
           senders, this CIDR mask length is only an approximation.  The
           default is 16 bits, corresponding to a former class B. Increase the
           number if a finer granularity is desired, e.g. to 24 (class C) or
           32.  A value of 0 is allowed but is not particularly useful, as it
           would treat the whole internet as a single organization. The number
           need not be a multiple of 8; any split is allowed.

       txrep_ipv6_mask_len
            range [0..128]         (default: 48)

           The reputation database keeps only the specified number of
           most-significant bits of an IPv6 address in its fields, so that
           different individual IP addresses within a subnet belonging to the
           same owner are managed under a single database record. As we have
           no information available on the allocated address ranges of
           senders, this CIDR mask length is only an approximation. The
           default is 48 bits, corresponding to an address range commonly
           allocated to individual (smaller) organizations. Increase the
           number for a finer granularity, e.g. to 64, 96, or 128, or
           decrease it for wider ranges, e.g. to 32.  A value of 0 is allowed
           but is not particularly useful, as it would treat the whole
           internet as a single organization. The number need not be a
           multiple of 4; any split is allowed.

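           To see which addresses end up sharing a record for a given mask
           length, the masking can be reproduced with Python's standard
           ipaddress module. This sketch only demonstrates the CIDR
           arithmetic; the helper name is hypothetical, and the way TxRep
           actually formats the masked address in its database is not shown
           here.

```python
import ipaddress


def masked_block(address, ipv4_mask_len=16, ipv6_mask_len=48):
    """Return the CIDR block that keeps only the most-significant bits."""
    ip = ipaddress.ip_address(address)
    mask_len = ipv4_mask_len if ip.version == 4 else ipv6_mask_len
    # strict=False zeroes the host bits instead of raising an error.
    network = ipaddress.ip_network(f"{address}/{mask_len}", strict=False)
    return str(network)


print(masked_block("192.0.2.77"))                       # 192.0.0.0/16
print(masked_block("198.51.100.9", ipv4_mask_len=24))   # 198.51.100.0/24
print(masked_block("2001:db8:1234:5678::1"))            # 2001:db8:1234::/48
```

           Two senders whose addresses map to the same block would share one
           IP-bound reputation record under the corresponding setting.
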
       user_awl_sql_override_username
             string                (default: undefined)

           Used by the SQLBasedAddrList storage implementation.

           If this option is set, the SQLBasedAddrList module will override
           the set username with the value given.  This can be useful for
           implementing global or group-based TxRep databases.

       txrep_user2global_ratio
            range [0..10]          (default: 0)

           When the option txrep_user2global_ratio is set to a value greater
           than zero, and if the server configuration allows it, two data
           storages are used: a user storage and a global (server-wide)
           storage.

           The user storage keeps only senders who send messages to the
           respective recipient, and also reflects the corrected/learned
           scores when messages are marked by the user as spam or ham, or
           when the sender is whitelisted or blacklisted through the
           SpamAssassin API.

           The global storage keeps the reputation data of all messages
           processed by SpamAssassin, with their spam scores and spam/ham
           learning data from all users on the server.  Hence, the module
           returns a reputation value even for senders not known to the
           current recipient, as long as they have already sent email to
           anyone else on the server.

           The value of the txrep_user2global_ratio parameter controls the
           impact of each of the two reputations. When equal to 1, the
           global and the user score have the same impact on the result.
           When set to 2, the reputation taken from the user storage has
           twice the impact of the global value. The final value of the TXREP
           tag is calculated as follows:

            total = ( ratio * user + global ) / ( ratio + 1 )

           When no reputation is found in the user storage, and a global
           reputation is available, the global storage is used fully, without
           applying the ratio.

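           The combination rule can be sketched as below. The function name
           is hypothetical, None stands for "no reputation found", and the
           handling of a missing global reputation is an assumption, since
           the text only covers the case of a missing user reputation.

```python
def combined_reputation(user, global_, ratio):
    """Blend user and global reputations per the ratio formula.

    `user` and `global_` are reputation values, or None when the
    respective storage has no record for the sender.
    """
    if user is None:
        # Stated in the text: the global value is used fully.
        return global_
    if global_ is None:
        # Assumption: fall back to the user value alone.
        return user
    return (ratio * user + global_) / (ratio + 1)


print(combined_reputation(-4, 2, ratio=1))     # -1.0
print(combined_reputation(-4, 2, ratio=2))     # -2.0
print(combined_reputation(None, 2, ratio=2))   # 2
```
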
           When the ratio is set to zero, only the default storage is used.
           Whether that is the global or the local user storage is in turn
           controlled either by the parameter user_awl_sql_override_username
           (in the case of SQL storage), or by the "auto_whitelist_path"
           parameter (in the case of a Berkeley database).

           When dual storage is enabled, and no global storage is defined
           by the above-mentioned parameters for the Berkeley or SQL
           databases, TxRep attempts to use a generic storage: the user
           'GLOBAL' in the case of SQL, and in the case of a Berkeley
           database the path defined by '__local_state_dir__/tx-reputation',
           which typically resolves to /var/db/spamassassin/tx-reputation.
           When the default storages are not available, or are not writable,
           you have to set the global storage with the help of the
           "user_awl_sql_override_username" or "auto_whitelist_path"
           setting, respectively.

           Please note that some SpamAssassin installations always run under
           the same user ID. In that case it is pointless to enable dual
           storage, because at most it would lead to two identical global
           storages in different locations.

           This feature is disabled by default.

       auto_whitelist_distinguish_signed
            (default: 1 - enabled)

           Used by the SQLBasedAddrList storage implementation.

           If this option is set, the SQLBasedAddrList module keeps separate
           database entries for DKIM-validated email addresses and for
           non-validated ones. Without this option, or for domains that do
           not use a DKIM signature, the reputation of legitimate email can
           get mixed with the reputation of forgeries. A prerequisite for
           setting this option is that a field txrep.signedby exists in the
           SQL table; otherwise SQL operations will fail.  A DKIM plugin must
           also be enabled in order for this option to take effect.  This
           option is highly recommended. Unless you are using a pre-3.3.0
           database schema and cannot upgrade, there is no reason to disable
           it. If you are upgrading from AWL and using a pre-3.3.0 schema,
           the txrep.signedby column will not exist. It is recommended that
           you add this column, but if that is not possible you must set this
           option to 0 to avoid SQL errors.

       txrep_spf
             0 | 1                 (default: 1)

           When enabled, for messages that pass the SPF check TxRep treats
           any IP address using a given email address as the same authorized
           identity, and does not associate any IP address with it.  (The
           same happens with valid DKIM signatures; no option is available
           for DKIM.)

           Note: for domains that define the useless SPF +all (pass all), no
           IP would ever be associated with the email address, and all
           addresses (including forged ones) would be treated as coming from
           the authorized source. However, such domains are hopefully rare,
           and ask for this kind of treatment anyway.

   REPUTATION WEIGHTS
       The overall reputation of the sender comprises several elements:

       1) The reputation of the 'From' email address bound to the originating
       IP address fraction (see the mask parameters for details)
       2) The reputation of the 'From' email address alone (regardless of the
       IP address currently being used)
       3) The reputation of the domain name of the 'From' email address
       4) The reputation of the originating IP address, regardless of the
       sender's email address
       5) The reputation of the HELO name of the originating computer (if
       available)

       Each of these partial reputations is weighted with the help of the
       parameters below, and the overall reputation is calculated as the sum
       of the weighted individual reputations divided by the sum of all
       their weights:

        sender_reputation = ( weight_email    * rep_email    +
                              weight_email_ip * rep_email_ip +
                              weight_domain   * rep_domain   +
                              weight_ip       * rep_ip       +
                              weight_helo     * rep_helo     )
                            / ( weight_email + weight_email_ip + weight_domain +
                                weight_ip + weight_helo )

       You can disable the individual partial reputations by setting their
       respective weight to zero. This will also reduce the size of the
       database, since each partial reputation requires a separate entry in
       the database table. Disabling some of the partial reputations in this
       way may also help with performance on busy servers, because the
       respective database lookups and processing will be skipped too.

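       The weighting can be sketched as a weighted mean, following the prose
       description (weighted sum divided by the sum of the weights). The
       function name, the dictionary keys, the sample reputation values, and
       the HELO weight of 1 are illustrative assumptions; skipping partial
       reputations that have no record is likewise an assumption.

```python
def sender_reputation(reputations, weights):
    """Weighted mean of the partial reputations.

    Entries with a zero weight are skipped, mirroring the note that a
    zero weight disables that partial reputation. Entries without a
    reputation value are skipped too (an assumption).
    """
    active = {key: weight for key, weight in weights.items()
              if weight > 0 and key in reputations}
    weight_sum = sum(active.values())
    if weight_sum == 0:
        return 0.0
    weighted = sum(weight * reputations[key] for key, weight in active.items())
    return weighted / weight_sum


# Weights for email, email_ip, domain, and ip follow the documented
# defaults; the helo weight is a made-up value for this example.
weights = {"email": 3, "email_ip": 10, "domain": 2, "ip": 4, "helo": 1}
reputations = {"email": -2.0, "email_ip": -4.0, "domain": 0.0,
               "ip": 1.0, "helo": 0.0}
print(sender_reputation(reputations, weights))
```

       Setting a weight to zero removes that term from both the numerator
       and the denominator, so the remaining reputations keep their relative
       influence.
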
       txrep_weight_email
            range [0..10]          (default: 3)

           This weight factor controls the influence of the reputation of the
           standalone email address, regardless of the originating IP address.
           When adjusting the weight, keep in mind that an email address can
           easily be spoofed, and hence spammers can use 'From' email
           addresses belonging to senders with a good reputation. From this
           point of view, the email address bound to the originating IP
           address is a more reliable indicator of the overall reputation.

           On the other hand, some reputable senders may be sending from a
           larger number of IP addresses, so looking up the reputation of the
           standalone email address without regard to the originating IP
           makes some sense too.

           We recommend using a relatively low value for this partial
           reputation.

       txrep_weight_email_ip
            range [0..10]          (default: 10)

           This is the standard reputation, used in the same way as by the
           original AWL plugin. Each sender's email address is bound to the
           originating IP, or to the part of it defined by the
           txrep_ipv4_mask_len or txrep_ipv6_mask_len parameters.

           For a user sending from multiple locations, diverse mail servers,
           or a dynamic IP range outside the masked block, the email address
           will have a separate reputation value for each of the different
           (partial) IP addresses.

           When the option auto_whitelist_distinguish_signed is enabled,
           TxRep, in contrast to the original AWL module, does not record
           the IP address when a DKIM signature is detected. The email
           address is then not bound to any IP address, but rather just to
           the DKIM signature, since the signature is considered to
           authenticate the sender more reliably than the IP address (which
           can also vary).

           This is by design the most relevant reputation, and its weight
           should be kept high.

544       txrep_weight_domain
545            range [0..10]          (default: 2)
546
           Some spammers may always use their real domain name in the email
           address, just with multiple or changing local parts. This
           reputation records the spam scores of all messages sent from the
           respective domain, regardless of the local part (user name) used.
552
           As with the email_ip reputation, the domain reputation is also
           bound to the originating address (or to a masked block, if mask
           parameters are used). This avoids assigning a false reputation
           based on spoofed email addresses.
557
           When a DKIM signature is detected, the signer is used instead of
           the domain name extracted from the email address. The signing
           authority is considered responsible for the email it sends under
           any domain name, hence the same reputation applies here.
562
           The domain reputation gives a relevant picture of the domain owner
           in the case of small servers or corporations with strict policies,
           but it is less relevant for freemailers like Gmail, Hotmail, and
           similar, because their users may send both ham and spam.
568
569           The default value is set relatively low. Higher weight values may
570           be useful, but we recommend caution and observing the scores before
571           increasing it.
572
573       txrep_weight_ip
574            range [0..10]          (default: 4)
575
           Spammers can send through the same relay (incl. compromised hosts)
           under a multitude of email addresses. This is exactly the case
           where the IP reputation can help. This reputation is a kind of
           local RBL.
580
           By default, the weight is set lower than for the email_ip
           reputation, because the same IP address may host both spammers and
           acceptable senders (for example, the marketing department of a
           company sends you spam, but you still need to get messages from
           their billing address).
586
587       txrep_weight_helo
588            range [0..10]          (default: 0.5)
589
           A large number of spam messages come from compromised hosts, often
           personal computers or set-top boxes. Their NetBIOS names are
           usually used as the HELO name when connecting to your mail server.
           Some of the names are pretty generic and hence may be shared by a
           large number of hosts, but often the names are quite unique and
           may be a good indicator for detecting a spammer, even when
           different email and IP addresses are used (spam can also come from
           portable devices).
598
           No IP address is bound to the HELO name when it is stored in the
           reputation database. This is intentional, despite the possibility
           that numerous devices may share some of the HELO names.
602
           This option is still considered experimental, hence the low weight
           value, but after some testing it could likely be increased at
           least slightly.
606

ADMINISTRATOR SETTINGS

608       These settings differ from the ones above, in that they are considered
609       'more privileged' -- even more than the ones in the PRIVILEGED SETTINGS
610       section.  No matter what "allow_user_rules" is set to, these can never
611       be set from a user's "user_prefs" file.
612
613       txrep_factory module
614            (default: Mail::SpamAssassin::DBBasedAddrList)
615
616           Select alternative database factory module for the TxRep database.
617
618       auto_whitelist_path /path/filename
619            (default: ~/.spamassassin/tx-reputation)
620
621           This is the TxRep directory and filename.  By default, each user
622           has their own reputation database in their "~/.spamassassin"
623           directory with mode 0700.  For system-wide SpamAssassin use, you
624           may want to share this across all users.
625
626       auto_whitelist_db_modules Module ...
627            (default: see below)
628
           Which database modules should be used for the TxRep storage
           database file. The first named module that can be loaded from the
           Perl include path will be used. The format is:
632
633             PreferredModuleName SecondBest ThirdBest ...
634
           i.e. a space-separated list of Perl module names.  The default is:
636
637             DB_File GDBM_File SDBM_File
638
639           NDBM_File is not supported (see SpamAssassin bug 4353).
640
641       auto_whitelist_file_mode
642            (default: 0700)
643
644           The file mode bits used for the TxRep directory or file.
645
           Make sure you specify this with the 'x' mode bits set, as it may
           also be used to create directories. However, if a file is created,
           the resulting file will not have any execute bits set (the umask
           is set to 0111).
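
           The effect of that umask can be checked with a few lines of
           Python. This is only an arithmetic sketch of the rule stated
           above; the helper name effective_file_mode is hypothetical and is
           not part of SpamAssassin.

```python
def effective_file_mode(configured_mode, umask=0o111):
    """Mode bits left on a newly created file after the umask is applied.

    0o111 is the umask quoted in the text; it clears every execute bit.
    """
    return configured_mode & ~umask


# The default mode 0700 yields a directory with 0700 but a file with 0600.
default_file_mode = effective_file_mode(0o700)

# A shared, system-wide setup using 0755 would yield files with 0644.
shared_file_mode = effective_file_mode(0o755)
```

           This is why the configured value keeps the 'x' bits: directories
           need them, while database files silently lose them.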
650
651       user_awl_dsn DBI:databasetype:databasename:hostname:port
652           Used by the SQLBasedAddrList storage implementation.
653
654           This will set the DSN used to connect.  Example:
655           "DBI:mysql:spamassassin:localhost"
656
657       user_awl_sql_username username
658           Used by the SQLBasedAddrList storage implementation.
659
660           The authorized username to connect to the above DSN.
661
662       user_awl_sql_password password
663           Used by the SQLBasedAddrList storage implementation.
664
665           The password for the database username, for the above DSN.
666
667       user_awl_sql_table tablename
668            (default: txrep)
669
670           Used by the SQLBasedAddrList storage implementation.
671
           The name of the table in which the reputation is to be stored, for
           the above DSN.
674

BLACKLISTING / WHITELISTING

       When asked by SpamAssassin to blacklist or whitelist a user, the TxRep
       plugin adds a score of 100 (for blacklisting) or -100 (for
       whitelisting) to the given sender's email address. For a plain address
       without any IP address, the value is multiplied by the ratio of the
       total reputation weight to the EMAIL reputation weight, to account for
       the reduced impact of the standalone EMAIL reputation when calculating
       the overall reputation.
683
684          total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo
685          blacklisted_reputation = 100 * total_weight / weight_email
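
       As a worked example of the two formulas above, the following Python
       sketch computes the stored value for one set of weights. The EMAIL
       weight of 3 is an assumed value chosen for illustration; the other
       four weights are the defaults listed under REPUTATION WEIGHTS. The
       function name listing_score is hypothetical.

```python
# Illustrative weights. "email" is an assumption made for this example;
# the remaining values are the documented defaults.
weights = {
    "email": 3.0,
    "email_ip": 10.0,
    "domain": 2.0,
    "ip": 4.0,
    "helo": 0.5,
}


def listing_score(weights, base=100.0):
    """Scale the base score by total_weight / weight_email.

    Use base=100 for blacklisting and base=-100 for whitelisting.
    """
    if weights["email"] == 0:
        # The ratio is undefined when the EMAIL reputation is disabled.
        raise ValueError("weight_email must be non-zero")
    total_weight = sum(weights.values())
    return base * total_weight / weights["email"]


blacklisted_reputation = listing_score(weights)               # 100 * 19.5 / 3
whitelisted_reputation = listing_score(weights, base=-100.0)  # -100 * 19.5 / 3
```

       The lower the EMAIL weight is relative to the other weights, the larger
       the stored value has to be for the listing to dominate the combined
       reputation.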
686
       When a standalone email address is blacklisted/whitelisted, all records
       of the email address bound to an IP address, DKIM signature, or an SPF
       pass will be removed from the database, and only the standalone record
       is kept.
691
       Besides blacklisting/whitelisting of standalone email addresses, the
       same method may also be used for blacklisting/whitelisting of IP
       addresses, domain names, and HELO names (only dotless NetBIOS HELO
       names can be used).
696
697       When whitelisting/blacklisting an email address or domain name, you can
698       bind them to a specified DKIM signature or SPF record by appending the
699       DKIM signing domain or the tag 'spf' after the ID in the following way:
700
701        spamassassin --add-addr-to-blacklist=spamming.biz,spf
702        spamassassin --add-addr-to-whitelist=friend@good.org,good.org
703
       When a message contains both a DKIM signature and an SPF pass, the DKIM
       signature takes priority, so the record bound to the 'spf' tag won't be
       checked. Only email addresses and domains can be bound to DKIM or SPF.
       Records of IP addresses and HELO names are always without DKIM/SPF.
709
710       In case of dual storage, the black/whitelisting is performed only in
711       the default storage.
712

REPUTATION LOGICS

       1. As with AWL, the most significant sender identifier is the
          combination of the email address and the originating IP address,
          or its part defined by the IPv4 or IPv6 mask setting.

       2. No IP checking for the standalone EMAIL address reputation.

       3. No signature checking for the IP reputation or for the HELO name
          reputation.

       4. The EMAIL_IP weight, not the standalone EMAIL weight, is used when
          no IP address is available (EMAIL_IP is the main indicator and has
          the highest weight).

       5. No IP checking for signed emails (the signature authenticates the
          email instead of the IP address).

       6. No IP checking on an SPF pass (we assume the domain owner is
          responsible for all IPs authorized to send for the domain, hence
          we use the same identity for all of them).

       7. No signature used for the standalone EMAIL reputation (it would be
          redundant, since no IP is used for the signed EMAIL_IP reputation,
          and we would store two identical hits).

       8. When available, the DKIM signer is used instead of the domain name
          for the DOMAIN reputation.

       9. No IP and no signature used for the HELO reputation (despite the
          possible existence of multiple computers with the same HELO).

       10. The full (unmasked) IP address is used (in the address field
           instead of the IP field) for the standalone IP reputation.
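
       Rules 1, 4, 5 and 6 can be summarized as a small decision function.
       The Python sketch below is only a reading of the rules listed here,
       not the plugin's Perl code: the function name email_ip_binding, its
       parameters, and the way the distinguish_signed flag gates only the
       DKIM case are assumptions.

```python
def email_ip_binding(ip_block, dkim_signer=None, spf_pass=False,
                     distinguish_signed=True):
    """Return what the EMAIL_IP record of a message is bound to.

    ip_block    -- masked originating address, or None if unavailable
    dkim_signer -- DKIM signing domain, or None for unsigned mail
    spf_pass    -- True when the message passed SPF
    """
    if distinguish_signed and dkim_signer:
        # Rule 5: the signature replaces the IP address. DKIM is tested
        # first because it takes priority over an SPF pass.
        return dkim_signer
    if spf_pass:
        # Rule 6: one identity for every IP the domain owner authorizes.
        return "spf"
    # Rule 1: bind to the masked block. Under rule 4 the record is still
    # weighted as EMAIL_IP even when ip_block is None.
    return ip_block


signed = email_ip_binding("203.0.0.0", dkim_signer="good.org", spf_pass=True)
spf_only = email_ip_binding("203.0.0.0", spf_pass=True)
plain = email_ip_binding("203.0.0.0")
```

       The 'spf' tag and the use of the signing domain match the identifiers
       accepted by the blacklisting/whitelisting commands shown earlier.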
755

LEARNING SPAM / HAM

       When SpamAssassin is told to learn (or relearn) a given message as spam
       or ham, all reputations relevant to the message (email, email_ip,
       domain, ip, helo) in both global and user storages will be updated
       using the "txrep_learn_penalty" or the "txrep_learn_bonus" value,
       respectively. The new reputation of a given sender property (email,
       domain, ...) will be the result of one of the following formulas:
763
764          new_reputation = old_reputation + learn_penalty
765          new_reputation = old_reputation - learn_bonus
766
       When messages are not tracked individually (see "txrep_track_messages"),
       TxRep cannot detect that the same message is being learned repeatedly.
       It will add/subtract the penalty/bonus score each time the message is
       fed to the spam learner.
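
       The cumulative effect of repeated learning can be sketched in Python.
       The function name learn and the penalty and bonus values of 20 are
       assumptions chosen for illustration; only the two formulas come from
       the text above.

```python
def learn(reputation, as_spam, learn_penalty=20.0, learn_bonus=20.0):
    """Apply one learning step to a single reputation value.

    Spam adds the penalty and ham subtracts the bonus, following the
    two formulas above.
    """
    if as_spam:
        return reputation + learn_penalty
    return reputation - learn_bonus


# Feeding the same spam message to the learner three times applies the
# penalty three times when repeated learning is not detected.
reputation = 1.5
for _ in range(3):
    reputation = learn(reputation, as_spam=True)

# Learning one message as ham afterwards only undoes a single step.
after_ham = learn(reputation, as_spam=False)
```

       Scripts that feed whole folders to the learner may therefore want to
       avoid submitting the same message more than once.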
771

OPTIMIZING TXREP

773       TxRep can be optimized for speed and simplicity, or for the precision
774       in assigning the reputation scores.
775
       First of all, TxRep can be quickly disabled and re-enabled through the
       option "use_txrep". This can be done globally, or individually in each
       respective "user_prefs". Disabling TxRep will not destroy the database,
       so it can be re-enabled at any later time.
780
781       On many systems, SQL-based storage may perform faster than the default
782       Berkeley DB storage, so you should consider setting it up.
783
784       Then there are multiple settings that can reduce the number of records
785       stored in the database, hence reducing the size of the storage, and
786       also the processing time:
787
       1. Setting "txrep_user2global_ratio" to zero will disable the dual
       storage, thereby halving the disk space requirements and the
       processing time of this plugin.
791
       2. You can disable all but one of the "REPUTATION WEIGHTS". The
       EMAIL_IP is the most specific option, so it is the most likely choice
       in such a case, but you could base the reputation system on any of the
       remaining scores. Each of the enabled reputations adds a new entry to
       the database for each new identifier. So while, for example, the
       number of recorded and scored domains may be big, the number of stored
       IP addresses will probably be higher and would require more space in
       the storage.
800
       3. Disabling "txrep_track_messages" avoids storing a separate entry
       for every scanned message, hence also reducing the disk space
       requirements and the processing time.
804
       4. Disabling the option "txrep_autolearn" will save processing time
       for messages that trigger the auto-learning process.
807
       5. Disabling "txrep_whitelist_out" will reduce the processing time for
       outbound connections.
810
       6. Keeping the option "auto_whitelist_distinguish_signed" enabled may
       help slightly reduce the size of the database, because for signed
       messages the originating IP address is ignored, hence no additional
       database entries are needed for each separate IP address (or masked
       block of IP addresses).
816
       Since TxRep reuses the storage architecture of the former AWL plugin,
       the same instructions for initializing the SQL storage also apply to
       TxRep. Although the old AWL table can be reused for TxRep, by default
       TxRep expects the SQL table to be named "txrep".
821
822       To install a new SQL table for TxRep, run the appropriate SQL file for
823       your system under the /sql directory.
824
       If you get a syntax error with an older version of MySQL, use
       TYPE=MyISAM instead of ENGINE=MyISAM at the end of the command. You can
       also use other types of ENGINE (depending on what is available on your
       system). For example, the MEMORY engine stores the entire table in the
       server memory, achieving performance similar to Redis. You would need
       to take care of replicating the RAM table to disk through a cronjob,
       to avoid loss of data at reboot. The InnoDB engine is used by default,
       offering high scalability (database size and concurrent access). In
       conjunction with a high value of innodb_buffer_pool_size or with the
       memcached plugin (MySQL v5.6+), it can also offer performance
       comparable to Redis.
836
837
838
perl v5.34.0                      2022-01-22  Mail::SpamAssassin::Plugin::TxRep(3)