Mail::SpamAssassin::Plugin::TextCat(3)    User Contributed Perl Documentation    Mail::SpamAssassin::Plugin::TextCat(3)

Mail::SpamAssassin::Plugin::TextCat - TextCat language guesser

    loadplugin Mail::SpamAssassin::Plugin::TextCat

This plugin will try to guess the language used in the message body
text.

You can use the "ok_languages" directive to set which languages are
considered okay for incoming mail. If the guessed language is not
okay, "UNWANTED_LANGUAGE_BODY" is triggered.

The plugin always adds the results to an "X-Language" name-value pair
in the message metadata data structure. This may be useful as Bayes
tokens and can also be used in rules for scoring. The results can also
be added to marked-up messages using "add_header" with the _LANGUAGES_
tag. See Mail::SpamAssassin::Conf for details.

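As a minimal sketch of the "add_header" usage mentioned above, the line
below would add the detected languages to every processed message. The
header name "Languages" is an arbitrary choice made for this
illustration; SpamAssassin prefixes added headers with "X-Spam-".

    add_header all Languages _LANGUAGES_
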
Note: the language cannot always be recognized with sufficient
confidence. In that case, no action is taken.

You can use the _TEXTCATRESULTS_ tag to view the internal ngram
scoring, which might help when fine-tuning settings.

ok_languages xx [ yy zz ... ]    (default: all)
    This option is used to specify which languages are considered okay
    for incoming mail. SpamAssassin will try to detect the language
    used in the message body text.

    Note that the language cannot always be recognized with sufficient
    confidence. In that case, no action is taken.

    The rule "UNWANTED_LANGUAGE_BODY" is triggered if none of the
    languages detected are in the "ok" list. Note that this is the only
    effect of the "ok" list. It does not act as a whitelist against any
    other form of spam scanning.

    In your configuration, you must use the two- or three-letter
    language specifier in lowercase, not the English name for the
    language. You may also specify "all" if a desired language is not
    listed, or if you want to allow any language. The default setting
    is "all".

    Examples:

        ok_languages all         (allow all languages)
        ok_languages en          (only allow English)
        ok_languages en ja zh    (allow English, Japanese, and Chinese)

    Note: if there are multiple ok_languages lines, only the last one
    is used.

    Select the languages to allow from the list below:

        af  - Afrikaans
        am  - Amharic
        ar  - Arabic
        be  - Byelorussian
        bg  - Bulgarian
        bs  - Bosnian
        ca  - Catalan
        cs  - Czech
        cy  - Welsh
        da  - Danish
        de  - German
        el  - Greek
        en  - English
        eo  - Esperanto
        es  - Spanish
        et  - Estonian
        eu  - Basque
        fa  - Persian
        fi  - Finnish
        fr  - French
        fy  - Frisian
        ga  - Irish Gaelic
        gd  - Scottish Gaelic
        he  - Hebrew
        hi  - Hindi
        hr  - Croatian
        hu  - Hungarian
        hy  - Armenian
        id  - Indonesian
        is  - Icelandic
        it  - Italian
        ja  - Japanese
        ka  - Georgian
        ko  - Korean
        la  - Latin
        lt  - Lithuanian
        lv  - Latvian
        mr  - Marathi
        ms  - Malay
        ne  - Nepali
        nl  - Dutch
        no  - Norwegian
        pl  - Polish
        pt  - Portuguese
        qu  - Quechua
        rm  - Rhaeto-Romance
        ro  - Romanian
        ru  - Russian
        sa  - Sanskrit
        sco - Scots
        sk  - Slovak
        sl  - Slovenian
        sq  - Albanian
        sr  - Serbian
        sv  - Swedish
        sw  - Swahili
        ta  - Tamil
        th  - Thai
        tl  - Tagalog
        tr  - Turkish
        uk  - Ukrainian
        vi  - Vietnamese
        yi  - Yiddish
        zh  - Chinese (both Traditional and Simplified)
        zh.big5   - Chinese (Traditional only)
        zh.gb2312 - Chinese (Simplified only)

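    As an illustration of how "ok_languages" and the rule it triggers
    fit together, the lines below accept English and German mail and
    adjust the weight of the rule with the standard "score" directive.
    The value 2.0 is a hypothetical choice for this sketch, not a
    recommended or default score:

        ok_languages en de
        score UNWANTED_LANGUAGE_BODY 2.0
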
inactive_languages xx [ yy zz ... ]    (default: see below)
    This option is used to specify which languages will not be
    considered when trying to guess the language. For performance
    reasons, supported languages that have fewer than about 5 million
    speakers are disabled by default. Note that listing a language in
    "ok_languages" automatically enables it for that user.

    The default setting is:

        bs cy eo et eu fy ga gd is la lt lv rm sa sco sl yi

    That list is Bosnian, Welsh, Esperanto, Estonian, Basque, Frisian,
    Irish Gaelic, Scottish Gaelic, Icelandic, Latin, Lithuanian,
    Latvian, Rhaeto-Romance, Sanskrit, Scots, Slovenian, and Yiddish.

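    For example, a site that receives Estonian and Lithuanian mail could
    restate the list without "et" and "lt". This is a sketch that
    assumes the setting replaces the default list rather than adding to
    it, which the text above does not state explicitly:

        inactive_languages bs cy eo eu fy ga gd is la lv rm sa sco sl yi
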
textcat_max_languages N    (default: 3)
    The maximum number of languages any one message can simultaneously
    match before its classification is considered unknown. You can try
    reducing this to 2 or possibly even 1 for more confident results,
    as it's unusual for a message to contain multiple languages.

    Also read the description of textcat_acceptable_score, as these
    settings are closely related. Scoring affects how many languages
    might be matched, and this option sets the "false positive limit"
    beyond which the engine is assumed unable to decide which languages
    the message really contains.

textcat_optimal_ngrams N    (default: 0)
    If the number of occurrences of an ngram is lower than this number,
    that ngram is removed. This can be used to speed up the program for
    longer inputs. For shorter inputs, this should be set to 0.

textcat_max_ngrams N    (default: 400)
    The maximum number of ngrams that should be compared with each of
    the language models (note that each of those models is used
    completely).

textcat_acceptable_score N    (default: 1.02)
    Include any language that scores at least
    "textcat_acceptable_score" in the returned list of languages.

    This setting is basically a percentage range. Any language whose
    internal ngram score is within N percent of the best score is
    included in the results; with the default of 1.02, that means
    within 2 percent. Values larger than 1.05 are not recommended, as
    they can generate many false matches. A setting of 1.00 would mean
    the single best-scoring language is always forcibly selected, but
    this is not recommended, as textcat_max_languages then can't do its
    job of classifying the language as uncertain.

    Also read the description of textcat_max_languages, as these
    settings are closely related.

    You can use the _TEXTCATRESULTS_ tag to view the internal ngram
    scoring, which might help when fine-tuning settings.

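    A minimal tuning sketch, combining the two related settings with the
    _TEXTCATRESULTS_ tag. The values are the suggestion and default
    given above; the header name "TextCat-Results" is a hypothetical
    choice for this illustration:

        textcat_max_languages 2
        textcat_acceptable_score 1.02
        add_header all TextCat-Results _TEXTCATRESULTS_

    With these lines, each processed message should carry the internal
    scores in an "X-Spam-TextCat-Results" header, which can be checked
    against the two thresholds before changing them further.
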
perl v5.26.3                    2018-09-14                    Mail::SpamAssassin::Plugin::TextCat(3)