PROCMAILSC(5) File Formats Manual PROCMAILSC(5)

NAME
procmailsc - procmail weighted scoring technique

SYNOPSIS
[*] w^x condition

DESCRIPTION
In addition to the traditional true or false conditions you can specify
on a recipe, you can use a weighted scoring technique to decide whether
a recipe matches. When weighted scoring is used in a recipe, the final
score for that recipe must be positive for it to match.

A condition can contribute to the score if you allocate it a `weight'
(w) and an `exponent' (x). You do this by preceding the condition (on
the same line) with:
    w^x
Both w and x are real numbers between -2147483647.0 and 2147483647.0
inclusive.

Weighted regular expression conditions
The first time the regular expression is found, it will add w to the
score. The second time it is found, w*x will be added. The third time
it is found, w*x*x will be added. The fourth time, w*x*x*x will be
added. And so forth.

This can be described by the following concise formula (valid for any
x other than 1):

                           n
         n    k-1         x  - 1
    w * Sum  x     = w * --------
        k=1               x - 1

It represents the total added score for this condition if n matches are
found.
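
As a rough cross-check of the formula, the following Python sketch adds
up the per-match contributions one by one and compares the result with
the closed form. The function names score_by_summation and
score_closed_form are made up for this illustration; they are not part
of procmail.

```python
def score_by_summation(w, x, n):
    """Add w, w*x, w*x*x, ... for n matches, one term per match."""
    total = 0.0
    term = w
    for _ in range(n):
        total += term
        term *= x
    return total


def score_closed_form(w, x, n):
    """Closed form w * (x**n - 1) / (x - 1); x == 1 is handled separately."""
    if x == 1:
        # The fraction is undefined for x == 1; every match simply adds w.
        return float(w * n)
    return w * (x ** n - 1) / (x - 1)


if __name__ == "__main__":
    for n in range(6):
        print(n, score_by_summation(1000, 0.75, n),
              score_closed_form(1000, 0.75, n))
```

Both columns printed by the loop should agree up to floating point
rounding.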

Note that the following case distinctions can be made:

x=0     Only the first match will contribute w to the score. Any
        subsequent matches are ignored.

x=1     Every match will contribute the same w to the score. The score
        grows linearly with the number of matches found.

0<x<1   Every match will contribute less to the score than the previous
        one. The score will asymptotically approach a certain value
        (see the NOTES section below).

1<x     Every match will contribute more to the score than the previous
        one. The score will grow exponentially.

x<0     Successive contributions alternate in sign, which can be
        utilised to favour an odd or even number of matches.
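
To get a feel for these cases, the sketch below lists the running total
after each of the first five matches for one sample exponent per case.
The helper name running_scores and the sample weight of 100 are
assumptions chosen for the illustration only.

```python
from itertools import accumulate


def running_scores(w, x, n):
    """Return the total added score after 1, 2, ..., n matches."""
    # The k-th match (counting from zero) contributes w * x**k.
    contributions = [w * x ** k for k in range(n)]
    return list(accumulate(contributions))


if __name__ == "__main__":
    for x in (0, 1, 0.5, 2, -1):
        print(f"x={x}: {running_scores(100, x, 5)}")
```

With x=-1 the total alternates between w and 0, which is one way an odd
number of matches can be favoured over an even number.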

If the regular expression is negated (i.e., it matches if it isn't
found), then n can obviously only be zero or one.

Weighted program conditions
If the program returns an exitcode of EXIT_SUCCESS (=0), then the total
added score will be w. If it returns any other exitcode (indicating
failure), the total added score will be x.

If the exitcode of the program is negated, then the exitcode will be
considered as if it were a virtual number of matches. Calculation of
the added score then proceeds as if it had been a normal regular
expression with n=`exitcode' matches.
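
The two rules for program conditions can be summarised in a few lines
of Python. This is only a sketch of the behaviour as described here, not
procmail's actual code; the function name program_condition_score and
its parameters are hypothetical.

```python
def program_condition_score(w, x, exitcode, negated=False):
    """Score added by a weighted program condition, per the description above."""
    if not negated:
        # EXIT_SUCCESS adds w, any other exit code adds x.
        return w if exitcode == 0 else x
    # Negated: the exit code acts as a virtual number of matches n,
    # scored like a regular expression condition (w + w*x + w*x*x + ...).
    return sum(w * x ** k for k in range(exitcode))


if __name__ == "__main__":
    print(program_condition_score(10, -5, 0))                # program succeeded
    print(program_condition_score(10, -5, 3))                # program failed
    print(program_condition_score(2, 1, 3, negated=True))    # three virtual matches
```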

Weighted length conditions
If the length of the actual mail is M then:

    * w^x > L

will generate an additional score of:

                  x
          /  M  \
      w * | --- |
          \  L  /

And:

    * w^x < L

will generate an additional score of:

                  x
          /  L  \
      w * | --- |
          \  M  /

In both cases, if L=M, this will add w to the score. In the former
case, however, larger mails will be favoured; in the latter case,
smaller mails will be favoured. Although x can be varied to fine-tune
the steepness of the function, typical usage sets x=1.
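
A small Python sketch of both length formulas follows. The function
name length_condition_score and the parameter names are invented for
this example; only the arithmetic is taken from the text.

```python
def length_condition_score(w, x, limit, mail_length, op):
    """Score added by `* w^x > L` or `* w^x < L` for a mail of M bytes.

    limit is L, mail_length is M, op is the comparison character.
    """
    if op == ">":
        return w * (mail_length / limit) ** x
    if op == "<":
        return w * (limit / mail_length) ** x
    raise ValueError("op must be '>' or '<'")


if __name__ == "__main__":
    # The condition `* -100^3 > 2000` applied to mails of 2000 and 4000 bytes.
    print(length_condition_score(-100, 3, 2000, 2000, ">"))
    print(length_condition_score(-100, 3, 2000, 4000, ">"))
```

Raising x makes the score react more sharply once M moves away from L,
which is the steepness mentioned above.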

MISCELLANEOUS
You can query the final score of all the conditions on a recipe from
the environment variable $=. This variable is set every time just
after procmail has parsed all conditions on a recipe (even if the
recipe is not being executed).

EXAMPLES
The following recipe will ditch all mails having more than 150 lines in
the body. The first condition contains an empty regular expression
which, because it always matches, is used to give our score a negative
offset. The second condition then matches every line in the mail, and
uses up the negative offset we gave it (one point per line). In the
end, the score will only be positive if the mail contained more than
150 lines.

    :0 Bh
    * -150^0
    * 1^1 ^.*$
    /dev/null
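
The arithmetic behind this recipe can be imitated in Python. The sketch
below assumes that counting the lines returned by splitlines() is a fair
stand-in for counting matches of ^.*$ in the body; procmail's own
matcher may treat edge cases (such as a missing final newline)
differently. The names line_count_score and would_ditch are
hypothetical.

```python
def line_count_score(body, threshold=150):
    """Score of the recipe: -threshold once, plus 1 for every body line."""
    return -threshold + len(body.splitlines())


def would_ditch(body, threshold=150):
    """The recipe matches (and the mail is ditched) only for a positive score."""
    return line_count_score(body, threshold) > 0


if __name__ == "__main__":
    print(would_ditch("line\n" * 150))   # exactly 150 lines
    print(would_ditch("line\n" * 151))   # one line more
```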

Suppose you have a priority folder which you always read first. The
next recipe picks out the priority mail and files it in this special
folder. The first condition is a regular one, i.e., it doesn't
contribute to the score, but simply has to be satisfied. The other
conditions describe things like: john and claire usually have something
important to say; meetings are usually important; replies are favoured
a bit; mails about Elvis (this is merely an example :-) are favoured
(the more he is mentioned, the more the mail is favoured, but the
maximum extra score due to Elvis will be 4000, no matter how often he
is mentioned); lots of quoted lines are disliked; smileys are
appreciated (the score for those will reach a maximum of 3500); those
three people usually don't send interesting mails; and the mails should
preferably be small (e.g., 2000-byte mails will score -100, 4000-byte
mails -800). As you see, if one of the uninteresting people sends mail,
then the mail still has a chance of landing in the priority folder,
e.g., if it is about a meeting, or if it contains at least two smileys.

    :0 HB
    * !^Precedence:.*(junk|bulk)
    * 2000^0 ^From:.*(john@home|claire@work)
    * 2000^0 ^Subject:.*meeting
    * 300^0 ^Subject:.*Re:
    * 1000^.75 elvis|presley
    * -100^1 ^>
    * 350^.9 :-\)
    * -500^0 ^From:.*(boss|jane|henry)@work
    * -100^3 > 2000
    priority_folder
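
The numbers quoted for this recipe can be checked with a little Python.
The sketch below is only an illustration: the helper names are made up,
and the sample mail (sent by boss@work, 1000 bytes long, containing two
smileys and matching no other weighted condition) is a hypothetical
case.

```python
def capped_total(w, x, n):
    """Total added by n matches of a w^x condition with 0 < x < 1."""
    return w * (1 - x ** n) / (1 - x)


def upper_bound(w, x):
    """Value that the total approaches as the number of matches grows."""
    return w / (1 - x)


elvis_limit = upper_bound(1000, 0.75)    # condition 1000^.75
smiley_limit = upper_bound(350, 0.9)     # condition 350^.9

# Hypothetical mail: -500 for the sender, two smileys, 1000 bytes long.
sample_score = (-500
                + capped_total(350, 0.9, 2)
                + -100 * (1000 / 2000) ** 3)

if __name__ == "__main__":
    print(elvis_limit, smiley_limit, sample_score)
```

Under these assumptions the sample score stays positive, so such a mail
would still be filed in the priority folder.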

If you are subscribed to a mailing list, and would just like to read
the quality mails, then the following recipes could do the trick. First
we make sure that the mail is coming from the mailing list. Then we
check if it is from certain persons whose opinion we value, or about a
subject we absolutely want to know everything about. If it is, file
it. Otherwise, check if the ratio of quoted lines to original lines is
at most 1:2. If it exceeds that, ditch the mail. Everything that
survived the previous test is filed.

    :0
    * ^From mailinglist-request@some.where
    {
      :0:
      * ^(From:.*(paula|bill)|Subject:.*skiing)
      mailinglist

      :0 Bh
      * 20^1 ^>
      * -10^1 ^[^>]
      /dev/null

      :0:
      mailinglist
    }
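
The quoting test in the middle recipe boils down to 20 points per quoted
line and -10 points per original line. A rough Python equivalent is
sketched below. It assumes that empty lines count for neither
condition, which may not be exactly how procmail's matcher treats them;
the function name quoting_score is hypothetical.

```python
def quoting_score(body):
    """Return 20 per line starting with '>' and -10 per other non-empty line."""
    quoted = 0
    original = 0
    for line in body.splitlines():
        if line.startswith(">"):
            quoted += 1
        elif line:
            # Assumption: ^[^>] needs a first character, so empty lines are skipped.
            original += 1
    return 20 * quoted - 10 * original


if __name__ == "__main__":
    acceptable = "> quoted\nreply one\nreply two\n"       # ratio 1:2
    too_quoted = "> quoted\n> quoted\nreply\nreply\n"     # ratio 1:1
    print(quoting_score(acceptable) > 0, quoting_score(too_quoted) > 0)
```

A positive score means the mail is ditched; a ratio of exactly 1:2 gives
a score of zero, so the mail survives.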

For further examples you should look in the procmailex(5) man page.

CAVEATS
Because this speeds up the search by an order of magnitude, the
procmail internal egrep will always search for the leftmost shortest
match, unless it is determining what to assign to MATCH, in which case
it searches for the leftmost longest match. E.g., for the leftmost
shortest match, by itself, the regular expression:

.*      will always match a zero length string at the same spot.

.+      will always match one character (except newlines of course).
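
Python's re module is a different engine and does not implement
procmail's matching strategy, but its non-greedy quantifiers give a
rough analogy for the difference between shortest and longest matches.
The sketch below is only that analogy, not a model of procmail's
internal egrep.

```python
import re

text = "hello world"

# Non-greedy quantifiers stop as early as possible (compare: leftmost shortest).
shortest_star = re.match(r".*?", text).group()
shortest_plus = re.match(r".+?", text).group()

# Greedy quantifiers extend as far as possible (compare: leftmost longest).
longest_star = re.match(r".*", text).group()

if __name__ == "__main__":
    print(repr(shortest_star), repr(shortest_plus), repr(longest_star))
```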

SEE ALSO
procmail(1), procmailrc(5), procmailex(5), sh(1), csh(1), egrep(1),
grep(1)

BUGS
If, in a length condition, you specify an x that causes an overflow,
procmail is at the mercy of the pow(3) function in your mathematical
library.

Floating point numbers in `engineering' format (e.g., 12e5) are not
accepted.

MISCELLANEOUS
As soon as `plus infinity' (2147483647) is reached, any subsequent
weighted conditions will simply be skipped.

As soon as `minus infinity' (-2147483647) is reached, the condition
will be considered as `no match' and the recipe will terminate early.
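
One way to picture these two rules is the Python sketch below. It is a
loose model written from the description above, not a transcription of
procmail's source; the function name evaluate_weighted and the idea of
passing in a list of already computed per-condition scores are
assumptions made for the illustration.

```python
PLUS_INFINITY = 2147483647
MINUS_INFINITY = -2147483647


def evaluate_weighted(contributions):
    """Accumulate per-condition scores; return (score, matched)."""
    score = 0.0
    for added in contributions:
        if score >= PLUS_INFINITY:
            # Plus infinity reached: remaining weighted conditions are skipped.
            break
        score += added
        if score <= MINUS_INFINITY:
            # Minus infinity reached: no match, stop evaluating early.
            return MINUS_INFINITY, False
        score = min(score, PLUS_INFINITY)
    return score, score > 0


if __name__ == "__main__":
    print(evaluate_weighted([10, -3]))
    print(evaluate_weighted([PLUS_INFINITY, -5]))
    print(evaluate_weighted([MINUS_INFINITY, PLUS_INFINITY]))
```

In this model, a weight of 2147483647 on an early condition forces a
match, and a weight of -2147483647 forces a rejection, whatever the
later conditions would have added.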

NOTES
If in a regular expression weighted formula 0<x<1, the total added
score for this condition will asymptotically approach:

       w
    -------
     1 - x

In order to reach half the maximum value you need

         - ln 2
    n = --------
          ln x

matches.
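
Both expressions are easy to evaluate numerically. In the Python sketch
below the function names asymptotic_limit and matches_for_half are
invented for the example, and n is treated as a real number, so in
practice the result has to be rounded up to a whole number of matches.

```python
import math


def asymptotic_limit(w, x):
    """Value w / (1 - x) that the total approaches for 0 < x < 1."""
    return w / (1 - x)


def matches_for_half(x):
    """Number of matches n = -ln 2 / ln x needed to reach half the limit."""
    return -math.log(2) / math.log(x)


if __name__ == "__main__":
    w, x = 1000, 0.75
    n = matches_for_half(x)
    # Total after n matches, using the closed form with a real-valued n.
    reached = w * (1 - x ** n) / (1 - x)
    print(asymptotic_limit(w, x), n, reached)
```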

AUTHORS
Stephen R. van den Berg
    <srb@cuci.nl>
Philip A. Guenther
    <guenther@sendmail.com>

BuGless 2001/08/04 PROCMAILSC(5)