Mail::SpamAssassin::Plugin::TextCat(3)  User Contributed Perl Documentation  Mail::SpamAssassin::Plugin::TextCat(3)

Mail::SpamAssassin::Plugin::TextCat - TextCat language guesser

    loadplugin Mail::SpamAssassin::Plugin::TextCat

This plugin tries to guess the language used in the message body text.

Use the "ok_languages" directive to set which languages are considered
okay for incoming mail. If the guessed language is not okay, the
"UNWANTED_LANGUAGE_BODY" rule is triggered.

The plugin always adds the results to an "X-Language" name-value pair in
the message metadata data structure. This may be useful as Bayes tokens
and can also be used in rules for scoring. The results can also be added
to marked-up messages using "add_header", with the _LANGUAGES_ tag. See
Mail::SpamAssassin::Conf for details.

Note: the language cannot always be recognized with sufficient
confidence. In that case, no action is taken.
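
As an illustration, the directives mentioned above might be combined in a site configuration along the following lines. This is a sketch, not a recommended setup: the score values, the rule name LOCAL_LANG_RU, and the choice of Russian are hypothetical. Matching "X-Language" from a header rule assumes that message metadata is visible to header rules in the same way as the RelayCountry plugin's "X-Relay-Countries"; verify this on your version before relying on it.

```
ifplugin Mail::SpamAssassin::Plugin::TextCat
  # Report the guessed languages in an X-Spam-Languages header on all mail.
  add_header all Languages _LANGUAGES_

  # Illustrative score for mail whose guessed language is not in ok_languages.
  score UNWANTED_LANGUAGE_BODY 2.0

  # Hypothetical rule keyed on the X-Language metadata (assumption, see above).
  header   LOCAL_LANG_RU  X-Language =~ /\bru\b/
  describe LOCAL_LANG_RU  Message body appears to be written in Russian
  score    LOCAL_LANG_RU  0.5
endif
```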

ok_languages xx [ yy zz ... ] (default: all)
This option specifies which languages are considered okay for incoming
mail. SpamAssassin will try to detect the language used in the message
body text.

Note that the language cannot always be recognized with sufficient
confidence. In that case, no action is taken.

The rule "UNWANTED_LANGUAGE_BODY" is triggered if none of the languages
detected are in the "ok" list. Note that this is the only effect of the
"ok" list. It does not act as a whitelist against any other form of spam
scanning.

In your configuration, you must use the two- or three-letter language
specifier in lowercase, not the English name for the language. You may
also specify "all" if a desired language is not listed, or if you want
to allow any language. The default setting is "all".

Examples:

    ok_languages all         (allow all languages)
    ok_languages en          (only allow English)
    ok_languages en ja zh    (allow English, Japanese, and Chinese)

Note: if there are multiple ok_languages lines, only the last one is
used.
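
Because only the last ok_languages line takes effect, settings do not accumulate across lines. A short sketch of the consequence (the language choices are arbitrary):

```
# Overridden by the later line:
ok_languages en

# This line wins, so only Japanese and Chinese are considered okay.
ok_languages ja zh

# To allow all three, list them on a single line instead:
# ok_languages en ja zh
```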

Select the languages to allow from the list below:

    af - Afrikaans
    am - Amharic
    ar - Arabic
    be - Byelorussian
    bg - Bulgarian
    bs - Bosnian
    ca - Catalan
    cs - Czech
    cy - Welsh
    da - Danish
    de - German
    el - Greek
    en - English
    eo - Esperanto
    es - Spanish
    et - Estonian
    eu - Basque
    fa - Persian
    fi - Finnish
    fr - French
    fy - Frisian
    ga - Irish Gaelic
    gd - Scottish Gaelic
    he - Hebrew
    hi - Hindi
    hr - Croatian
    hu - Hungarian
    hy - Armenian
    id - Indonesian
    is - Icelandic
    it - Italian
    ja - Japanese
    ka - Georgian
    ko - Korean
    la - Latin
    lt - Lithuanian
    lv - Latvian
    mr - Marathi
    ms - Malay
    ne - Nepali
    nl - Dutch
    no - Norwegian
    pl - Polish
    pt - Portuguese
    qu - Quechua
    rm - Rhaeto-Romance
    ro - Romanian
    ru - Russian
    sa - Sanskrit
    sco - Scots
    sk - Slovak
    sl - Slovenian
    sq - Albanian
    sr - Serbian
    sv - Swedish
    sw - Swahili
    ta - Tamil
    th - Thai
    tl - Tagalog
    tr - Turkish
    uk - Ukrainian
    vi - Vietnamese
    yi - Yiddish
    zh - Chinese (both Traditional and Simplified)
    zh.big5 - Chinese (Traditional only)
    zh.gb2312 - Chinese (Simplified only)

inactive_languages xx [ yy zz ... ] (default: see below)
This option is used to specify which languages will not be considered
when trying to guess the language. For performance reasons, supported
languages that have fewer than about 5 million speakers are disabled by
default. Note that listing a language in "ok_languages" automatically
enables it for that user.

The default setting is:

    bs cy eo et eu fy ga gd is la lt lv rm sa sco sl yi

That list is Bosnian, Welsh, Esperanto, Estonian, Basque, Frisian, Irish
Gaelic, Scottish Gaelic, Icelandic, Latin, Lithuanian, Latvian,
Rhaeto-Romance, Sanskrit, Scots, Slovenian, and Yiddish.
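
For example, a site that receives legitimate Welsh mail could re-enable Welsh in either of two ways. Both are sketches based on the defaults listed above; the choice of languages is an arbitrary illustration.

```
# Option 1: restate the default inactive list with "cy" removed.
inactive_languages bs eo et eu fy ga gd is la lt lv rm sa sco sl yi

# Option 2: list Welsh in ok_languages, which enables it automatically.
# Note that this also makes UNWANTED_LANGUAGE_BODY fire when none of the
# detected languages is English or Welsh.
ok_languages en cy
```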

textcat_max_languages N (default: 3)
The maximum number of languages before the classification is considered
unknown.

textcat_optimal_ngrams N (default: 0)
If the number of ngrams is lower than this number then they will be
removed. This can be used to speed up the program for longer inputs. For
shorter inputs, this should be set to 0.

textcat_max_ngrams N (default: 400)
The maximum number of ngrams that should be compared with each of the
language models (note that each of those models is used completely).

textcat_acceptable_score N (default: 1.02)
Include any language that scores at least "textcat_acceptable_score" in
the returned list of languages.
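
The four tuning options can be set together in one place. The sketch below simply writes out the documented defaults, so on its own it should not change behavior; it is meant as a starting point for adjustment rather than a recommended configuration.

```
ifplugin Mail::SpamAssassin::Plugin::TextCat
  textcat_max_languages    3
  textcat_optimal_ngrams   0
  textcat_max_ngrams       400
  textcat_acceptable_score 1.02
endif
```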

perl v5.12.4                      2011-06-06                      Mail::SpamAssassin::Plugin::TextCat(3)