Mail::SpamAssassin::Plugin::TextCat(3pm)

Mail::SpamAssassin::Plugin::TextCat(3)  User Contributed Perl Documentation  Mail::SpamAssassin::Plugin::TextCat(3)

NAME

       Mail::SpamAssassin::Plugin::TextCat - TextCat language guesser

SYNOPSIS

         loadplugin     Mail::SpamAssassin::Plugin::TextCat

DESCRIPTION

       This plugin tries to guess the language used in the message text.

       You can then specify which languages are considered okay for incoming
       mail.  If the guessed language is not okay, the rule
       "UNWANTED_LANGUAGE_BODY" is triggered.

       The plugin always adds the results to an "X-Language" name-value pair
       in the message metadata structure.  This may be useful as Bayes tokens.
       The results can also be added to marked-up messages using "add_header"
       with the _LANGUAGES_ tag.  See Mail::SpamAssassin::Conf for details.

       Note: the language cannot always be recognized with sufficient
       confidence.  In that case, "UNWANTED_LANGUAGE_BODY" will not trigger.
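
       As a minimal sketch of how these pieces fit together, the following
       configuration loads the plugin, allows two languages, weights the
       rule, and exposes the guessed languages in a header.  The language
       choice, the score value, and the header name "Languages" are
       illustrative assumptions, not defaults taken from this page:

         loadplugin     Mail::SpamAssassin::Plugin::TextCat

         # allow only English and German (illustrative choice)
         ok_languages   en de

         # weight given to the rule (illustrative value)
         score          UNWANTED_LANGUAGE_BODY 2.0

         # add the guessed languages to every processed message
         add_header     all Languages _LANGUAGES_

       With "add_header", the header name is prefixed with "X-Spam-", so the
       last line would be expected to produce an "X-Spam-Languages" header.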

USER OPTIONS

       ok_languages xx [ yy zz ... ]      (default: all)
           This option specifies which languages are considered okay for
           incoming mail.  SpamAssassin will try to detect the language used
           in the message text.

           Note that the language cannot always be recognized with sufficient
           confidence.  In that case, no points will be assigned.

           The rule "UNWANTED_LANGUAGE_BODY" is triggered according to this
           setting.

           In your configuration, you must use the two- or three-letter
           language specifier in lowercase, not the English name of the
           language.  You may also specify "all" if a desired language is not
           listed, or if you want to allow any language.  The default setting
           is "all".

           Examples:

             ok_languages all         (allow all languages)
             ok_languages en          (only allow English)
             ok_languages en ja zh    (allow English, Japanese, and Chinese)

           Note: if there are multiple ok_languages lines, only the last one
           is used.
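
           As an illustration of the last-line rule, suppose a configuration
           contained the two lines below (the languages are chosen only for
           the example).  Only German and French would then be allowed; the
           earlier English-only line would be ignored:

             ok_languages en
             ok_languages de fr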

           Select the languages to allow from the list below:

           af   - Afrikaans
           am   - Amharic
           ar   - Arabic
           be   - Byelorussian
           bg   - Bulgarian
           bs   - Bosnian
           ca   - Catalan
           cs   - Czech
           cy   - Welsh
           da   - Danish
           de   - German
           el   - Greek
           en   - English
           eo   - Esperanto
           es   - Spanish
           et   - Estonian
           eu   - Basque
           fa   - Persian
           fi   - Finnish
           fr   - French
           fy   - Frisian
           ga   - Irish Gaelic
           gd   - Scottish Gaelic
           he   - Hebrew
           hi   - Hindi
           hr   - Croatian
           hu   - Hungarian
           hy   - Armenian
           id   - Indonesian
           is   - Icelandic
           it   - Italian
           ja   - Japanese
           ka   - Georgian
           ko   - Korean
           la   - Latin
           lt   - Lithuanian
           lv   - Latvian
           mr   - Marathi
           ms   - Malay
           ne   - Nepali
           nl   - Dutch
           no   - Norwegian
           pl   - Polish
           pt   - Portuguese
           qu   - Quechua
           rm   - Rhaeto-Romance
           ro   - Romanian
           ru   - Russian
           sa   - Sanskrit
           sco  - Scots
           sk   - Slovak
           sl   - Slovenian
           sq   - Albanian
           sr   - Serbian
           sv   - Swedish
           sw   - Swahili
           ta   - Tamil
           th   - Thai
           tl   - Tagalog
           tr   - Turkish
           uk   - Ukrainian
           vi   - Vietnamese
           yi   - Yiddish
           zh   - Chinese (both Traditional and Simplified)
           zh.big5   - Chinese (Traditional only)
           zh.gb2312 - Chinese (Simplified only)

       inactive_languages xx [ yy zz ... ]          (default: see below)
           This option is used to specify which languages will not be
           considered when trying to guess the language.  For performance
           reasons, supported languages that have fewer than about 5 million
           speakers are disabled by default.  Note that listing a language in
           "ok_languages" automatically enables it for that user.

           The default setting is:

           bs cy eo et eu fy ga gd is la lt lv rm sa sco sl yi

           That list is Bosnian, Welsh, Esperanto, Estonian, Basque, Frisian,
           Irish Gaelic, Scottish Gaelic, Icelandic, Latin, Lithuanian,
           Latvian, Rhaeto-Romance, Sanskrit, Scots, Slovenian, and Yiddish.
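
           For example, because listing a language in "ok_languages" enables
           it, the following sketch (Welsh is chosen purely for illustration)
           would allow English and Welsh without touching
           "inactive_languages", even though "cy" is in the default inactive
           list:

             ok_languages en cy

           To have Welsh considered during guessing without accepting it, one
           could instead restate the default list without "cy".  This assumes
           that setting "inactive_languages" replaces the default list rather
           than extending it, which this page does not state explicitly:

             inactive_languages bs eo et eu fy ga gd is la lt lv rm sa sco sl yi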

       textcat_max_languages N (default: 5)
           The maximum number of languages that may match before the
           classification is considered unknown.

       textcat_optimal_ngrams N (default: 0)
           Ngrams that occur fewer times than this number are removed.  This
           can be used to speed up the program for longer inputs.  For
           shorter inputs, this should be set to 0.

       textcat_max_ngrams N (default: 400)
           The maximum number of ngrams that should be compared with each of
           the language models (note that each of those models is used
           completely).

       textcat_acceptable_score N (default: 1.05)
           Include any language that scores at least
           "textcat_acceptable_score" in the returned list of languages.
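
       The four tuning options above could be written out explicitly as
       follows.  The values shown are simply the documented defaults, so this
       fragment should not change behavior; it is meant only as a starting
       point for experimentation:

         textcat_max_languages     5
         textcat_optimal_ngrams    0
         textcat_max_ngrams        400
         textcat_acceptable_score  1.05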

perl v5.8.8                       2008-01-05  Mail::SpamAssassin::Plugin::TextCat(3)