Mail::SpamAssassin::Plugin::TextCat(3pm)

Mail::SpamAssassin::Plugin::TextCat(3)  User Contributed Perl Documentation  Mail::SpamAssassin::Plugin::TextCat(3)

NAME

       Mail::SpamAssassin::Plugin::TextCat - TextCat language guesser

SYNOPSIS

         loadplugin     Mail::SpamAssassin::Plugin::TextCat

DESCRIPTION

       This plugin will try to guess the language used in the message body
       text.

       You can use the "ok_languages" directive to set which languages are
       considered okay for incoming mail. If the guessed language is not
       okay, "UNWANTED_LANGUAGE_BODY" is triggered.

       It will always add the results to an "X-Language" name-value pair in
       the message metadata. This may be useful as Bayes tokens and can also
       be used in rules for scoring. The results can also be added to
       marked-up messages using "add_header", with the _LANGUAGES_ tag. See
       Mail::SpamAssassin::Conf for details.

       Note: the language cannot always be recognized with sufficient
       confidence.  In that case, no action is taken.

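       As a rough illustration of the directives mentioned above, a
       configuration might look like the following sketch. The choice of
       languages, the score of 2.0, and the header name "Languages" are
       arbitrary assumptions made for this example, not recommended values.

```
loadplugin     Mail::SpamAssassin::Plugin::TextCat

# Hypothetical policy: only English and German mail is expected
ok_languages   en de

# Example score only; choose a value that suits your site
score          UNWANTED_LANGUAGE_BODY 2.0

# Report the guessed languages in a header on every scanned message
# (add_header prefixes the name, so this appears as X-Spam-Languages)
add_header     all Languages _LANGUAGES_
```

       The "loadplugin" line is commonly kept in a .pre file, while the
       other lines would typically go in a regular .cf file.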

USER OPTIONS

       ok_languages xx [ yy zz ... ]      (default: all)
           This option is used to specify which languages are considered okay
           for incoming mail.  SpamAssassin will try to detect the language
           used in the message body text.

           Note that the language cannot always be recognized with sufficient
           confidence. In that case, no action is taken.

           The rule "UNWANTED_LANGUAGE_BODY" is triggered if none of the
           languages detected are in the "ok" list. Note that this is the only
           effect of the "ok" list. It does not act as a whitelist against any
           other form of spam scanning.

           In your configuration, you must use the two- or three-letter
           language specifier in lowercase, not the English name for the
           language.  You may also specify "all" if a desired language is not
           listed, or if you want to allow any language.  The default setting
           is "all".

           Examples:

             ok_languages all         (allow all languages)
             ok_languages en          (only allow English)
             ok_languages en ja zh    (allow English, Japanese, and Chinese)

           Note: if there are multiple ok_languages lines, only the last one
           is used.

           Select the languages to allow from the list below:

           af   - Afrikaans
           am   - Amharic
           ar   - Arabic
           be   - Byelorussian
           bg   - Bulgarian
           bs   - Bosnian
           ca   - Catalan
           cs   - Czech
           cy   - Welsh
           da   - Danish
           de   - German
           el   - Greek
           en   - English
           eo   - Esperanto
           es   - Spanish
           et   - Estonian
           eu   - Basque
           fa   - Persian
           fi   - Finnish
           fr   - French
           fy   - Frisian
           ga   - Irish Gaelic
           gd   - Scottish Gaelic
           he   - Hebrew
           hi   - Hindi
           hr   - Croatian
           hu   - Hungarian
           hy   - Armenian
           id   - Indonesian
           is   - Icelandic
           it   - Italian
           ja   - Japanese
           ka   - Georgian
           ko   - Korean
           la   - Latin
           lt   - Lithuanian
           lv   - Latvian
           mr   - Marathi
           ms   - Malay
           ne   - Nepali
           nl   - Dutch
           no   - Norwegian
           pl   - Polish
           pt   - Portuguese
           qu   - Quechua
           rm   - Rhaeto-Romance
           ro   - Romanian
           ru   - Russian
           sa   - Sanskrit
           sco  - Scots
           sk   - Slovak
           sl   - Slovenian
           sq   - Albanian
           sr   - Serbian
           sv   - Swedish
           sw   - Swahili
           ta   - Tamil
           th   - Thai
           tl   - Tagalog
           tr   - Turkish
           uk   - Ukrainian
           vi   - Vietnamese
           yi   - Yiddish
           zh   - Chinese (both Traditional and Simplified)
           zh.big5   - Chinese (Traditional only)
           zh.gb2312 - Chinese (Simplified only)

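           To illustrate the note above about multiple ok_languages lines,
           the following sketch assumes both lines end up in the same
           configuration. Only the second line would take effect, so all
           wanted languages are listed together on one line rather than
           accumulated across several. The language choice is arbitrary.

```
# Overridden by the later line
ok_languages   en

# Effective setting: English, Japanese, and Traditional Chinese only
ok_languages   en ja zh.big5
```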

       inactive_languages xx [ yy zz ... ]          (default: see below)
           This option is used to specify which languages will not be
           considered when trying to guess the language.  For performance
           reasons, supported languages that have fewer than about 5 million
           speakers are disabled by default.  Note that listing a language in
           "ok_languages" automatically enables it for that user.

           The default setting is:

           bs cy eo et eu fy ga gd is la lt lv rm sa sco sl yi

           That list is Bosnian, Welsh, Esperanto, Estonian, Basque, Frisian,
           Irish Gaelic, Scottish Gaelic, Icelandic, Latin, Lithuanian,
           Latvian, Rhaeto-Romance, Sanskrit, Scots, Slovenian, and Yiddish.

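           As a sketch of how the default might be adjusted, the line below
           keeps Estonian and Lithuanian available to the guesser by leaving
           "et" and "lt" out of the list. It assumes that an explicit
           inactive_languages line replaces the default list rather than
           adding to it, which the text above does not state outright.

```
# Default list minus "et" and "lt" (assumption: this replaces the default)
inactive_languages bs cy eo eu fy ga gd is la lv rm sa sco sl yi
```

           Since listing a language in "ok_languages" enables it
           automatically, a line such as "ok_languages en et lt" would also
           activate those two languages for that user, while additionally
           marking them as acceptable.
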
       textcat_max_languages N (default: 3)
           The maximum number of candidate languages allowed before the
           classification is considered unknown.

       textcat_optimal_ngrams N (default: 0)
           Ngrams that occur fewer than this number of times are removed.
           This can be used to speed up the program for longer inputs.  For
           shorter inputs, this should be set to 0.

       textcat_max_ngrams N (default: 400)
           The maximum number of ngrams that should be compared with each of
           the language models (note that each of those models is used
           completely).

       textcat_acceptable_score N (default: 1.02)
           Include any language that scores at least
           "textcat_acceptable_score" in the returned list of languages.

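       For reference, the four tuning options above could be written out
       explicitly as follows. All values shown are simply the documented
       defaults, so this sketch would not change behavior; it only shows the
       syntax.

```
textcat_max_languages     3
textcat_optimal_ngrams    0
textcat_max_ngrams        400
textcat_acceptable_score  1.02
```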

perl v5.12.4                      2011-06-06      Mail::SpamAssassin::Plugin::TextCat(3)