Mail::SpamAssassin::Plugin::TextCat(3)  User Contributed Perl Documentation  Mail::SpamAssassin::Plugin::TextCat(3)

NAME
Mail::SpamAssassin::Plugin::TextCat - TextCat language guesser

SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::TextCat

DESCRIPTION
This plugin tries to guess the language used in the message text.

You can then specify which languages are considered okay for incoming
mail. If the guessed language is not okay, "UNWANTED_LANGUAGE_BODY"
is triggered.

The plugin always adds the results to an "X-Language" name-value pair
in the message metadata data structure. This may be useful as Bayes
tokens. The results can also be added to marked-up messages using
"add_header" with the _LANGUAGES_ tag. See Mail::SpamAssassin::Conf
for details.

Note: the language cannot always be recognized with sufficient
confidence. In that case, "UNWANTED_LANGUAGE_BODY" will not trigger.
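
The pieces described above can be combined in a site configuration
roughly as follows. This is a minimal sketch, not a recommended setup:
the language choice, the score of 2.5 and the header name "Languages"
are illustrative assumptions, and the "loadplugin" line is commonly
placed in a .pre file rather than alongside the other settings.

```
# Load the language guesser.
loadplugin Mail::SpamAssassin::Plugin::TextCat

# Accept only English and German (illustrative choice).
ok_languages en de

# Weight given to mail in any other recognized language (illustrative value).
score UNWANTED_LANGUAGE_BODY 2.5

# Expose the guessed languages in a header on all marked-up messages.
add_header all Languages _LANGUAGES_
```

Because "add_header" prefixes header names with "X-Spam-", the last
line should produce a header named X-Spam-Languages.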

USER OPTIONS
ok_languages xx [ yy zz ... ] (default: all)
This option specifies which languages are considered okay for
incoming mail. SpamAssassin will try to detect the language used in
the message text.

Note that the language cannot always be recognized with sufficient
confidence. In that case, no points will be assigned.

Whether the rule "UNWANTED_LANGUAGE_BODY" triggers depends on this
setting.

In your configuration, you must use the two- or three-letter language
specifier in lowercase, not the English name for the language. You
may also specify "all" if a desired language is not listed, or if you
want to allow any language. The default setting is "all".

Examples:

ok_languages all (allow all languages)
ok_languages en (only allow English)
ok_languages en ja zh (allow English, Japanese, and Chinese)

Note: if there are multiple ok_languages lines, only the last one is
used.
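
The last-line-wins behaviour is easy to trip over when settings are
split across lines. A short illustration, with an arbitrary choice of
languages:

```
# Only the second line takes effect, so German alone is accepted.
ok_languages en fr
ok_languages de
```

To accept all three languages, list them on a single line instead,
for example "ok_languages en fr de".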

Select the languages to allow from the list below:

af - Afrikaans
am - Amharic
ar - Arabic
be - Byelorussian
bg - Bulgarian
bs - Bosnian
ca - Catalan
cs - Czech
cy - Welsh
da - Danish
de - German
el - Greek
en - English
eo - Esperanto
es - Spanish
et - Estonian
eu - Basque
fa - Persian
fi - Finnish
fr - French
fy - Frisian
ga - Irish Gaelic
gd - Scottish Gaelic
he - Hebrew
hi - Hindi
hr - Croatian
hu - Hungarian
hy - Armenian
id - Indonesian
is - Icelandic
it - Italian
ja - Japanese
ka - Georgian
ko - Korean
la - Latin
lt - Lithuanian
lv - Latvian
mr - Marathi
ms - Malay
ne - Nepali
nl - Dutch
no - Norwegian
pl - Polish
pt - Portuguese
qu - Quechua
rm - Rhaeto-Romance
ro - Romanian
ru - Russian
sa - Sanskrit
sco - Scots
sk - Slovak
sl - Slovenian
sq - Albanian
sr - Serbian
sv - Swedish
sw - Swahili
ta - Tamil
th - Thai
tl - Tagalog
tr - Turkish
uk - Ukrainian
vi - Vietnamese
yi - Yiddish
zh - Chinese (both Traditional and Simplified)
zh.big5 - Chinese (Traditional only)
zh.gb2312 - Chinese (Simplified only)

inactive_languages xx [ yy zz ... ] (default: see below)
This option is used to specify which languages will not be considered
when trying to guess the language. For performance reasons, supported
languages that have fewer than about 5 million speakers are disabled
by default. Note that listing a language in "ok_languages"
automatically enables it for that user.

The default setting is:

bs cy eo et eu fy ga gd is la lt lv rm sa sco sl yi

That list is Bosnian, Welsh, Esperanto, Estonian, Basque, Frisian,
Irish Gaelic, Scottish Gaelic, Icelandic, Latin, Lithuanian, Latvian,
Rhaeto-Romance, Sanskrit, Scots, Slovenian, and Yiddish.
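
As an illustration, the following line would make Welsh and Estonian
eligible for guessing by leaving "cy" and "et" out of the default
list. It assumes that a new "inactive_languages" line replaces the
default list rather than adding to it, which this page does not state
explicitly.

```
# Default list minus cy (Welsh) and et (Estonian).
inactive_languages bs eo eu fy ga gd is la lt lv rm sa sco sl yi
```

If the goal is only to accept mail in such a language, naming it in
"ok_languages" is enough, since that enables it automatically.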

textcat_max_languages N (default: 5)
The maximum number of languages before the classification is
considered unknown.

textcat_optimal_ngrams N (default: 0)
If the number of ngrams is lower than this number, then they will be
removed. This can be used to speed up the program for longer inputs.
For shorter inputs, this should be set to 0.

textcat_max_ngrams N (default: 400)
The maximum number of ngrams that should be compared with each of the
language models (note that each of those models is used completely).

textcat_acceptable_score N (default: 1.05)
Include any language that scores at least "textcat_acceptable_score"
in the returned list of languages.
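
The four tuning options can be set together. The values below are
purely illustrative, not recommendations. The comments assume, as in
the original TextCat algorithm, that "textcat_acceptable_score" is a
ratio relative to the best-matching language; this page does not
confirm that interpretation.

```
# Give up sooner when many languages look equally plausible.
textcat_max_languages 3

# Leave ngram pruning off, as advised for shorter inputs.
textcat_optimal_ngrams 0

# Keep the default number of ngrams compared per language model.
textcat_max_ngrams 400

# Assumed to narrow the candidate list to near-ties with the best match.
textcat_acceptable_score 1.02
```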

perl v5.8.8  2008-01-05  Mail::SpamAssassin::Plugin::TextCat(3)