AI::Categorizer::FeatureSelector::ChiSquare(3)  User Contributed Perl Documentation  AI::Categorizer::FeatureSelector::ChiSquare(3)

NAME

       AI::Categorizer::FeatureSelector::ChiSquare - ChiSquare Feature
       Selection class

SYNOPSIS

         # The recommended way to use this class is to let the KnowledgeSet
         # instantiate it.

         use AI::Categorizer::KnowledgeSetSMART;
         my $ksetCHI = AI::Categorizer::KnowledgeSetSMART->new(
           tfidf_notation    => 'Categorizer',
           feature_selection => 'chi_square',
           # ...other parameters...
         );

         # However, it is also possible to pass an instance to the KnowledgeSet.

         use AI::Categorizer::KnowledgeSet;
         use AI::Categorizer::FeatureSelector::ChiSquare;
         my $ksetCHI = AI::Categorizer::KnowledgeSet->new(
           feature_selector => AI::Categorizer::FeatureSelector::ChiSquare->new(
             features_kept => 2000,
             verbose       => 1,
           ),
           # ...other parameters...
         );

DESCRIPTION

       Feature selection with the ChiSquare function.

         Chi-Square(t,c) =       N.(AD-CB)^2
                           -----------------------
                           (A+C).(B+D).(A+B).(C+D)

       where t = term
             c = category
             N = number of documents in the collection
             A = number of times t and c co-occur
             B = number of times t occurs without c
             C = number of times c occurs without t
             D = number of times neither c nor t occurs
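
       As an illustration of the formula above, the score can be computed
       directly from the four counts. This is only a minimal sketch:
       chi_square() is a hypothetical helper, not part of the
       AI::Categorizer API. It assumes that A, B, C and D are document
       counts that partition the collection, so that N = A+B+C+D, and it
       returns 0 when the denominator is 0 (an assumption, since the
       formula is undefined in that case). The counts used are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AI::Categorizer): chi-square score of
# a term t for a category c, computed from the four contingency counts.
sub chi_square {
    my ($A, $B, $C, $D) = @_;

    # Assumption: the four counts partition the collection.
    my $N = $A + $B + $C + $D;

    my $denominator = ($A + $C) * ($B + $D) * ($A + $B) * ($C + $D);

    # Assumption: treat the undefined case (empty row or column) as 0.
    return 0 if $denominator == 0;

    return $N * ($A * $D - $C * $B) ** 2 / $denominator;
}

# Made-up counts: t and c co-occur 40 times, t without c 10 times,
# c without t 20 times, neither 130 times.
my $score = chi_square(40, 10, 20, 130);
printf "chi-square = %.2f\n", $score;
```

       A term that is independent of the category (AD = CB) scores 0, and
       the score grows as the term becomes more strongly associated with
       the presence or absence of the category.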

       For more details, see: Yiming Yang, Jan O. Pedersen, A Comparative
       Study on Feature Selection in Text Categorization, in Proceedings of
       ICML-97, 14th International Conference on Machine Learning, 1997.
       (available on citeseer.nj.nec.com)

METHODS

AUTHOR

       Francois Paradis, paradifr@iro.umontreal.ca, with inspiration from Ken
       Williams' AI::Categorizer code

perl v5.34.0                    2022-01-20                    AI::Categorizer::FeatureSelector::ChiSquare(3)