AI::Categorizer::FeatureSelector::ChiSquare(3)  User Contributed Perl Documentation  AI::Categorizer::FeatureSelector::ChiSquare(3)

NAME
       AI::Categorizer::FeatureSelector::ChiSquare - ChiSquare Feature
       Selection class

SYNOPSIS
           # The recommended way to use this class is to let the KnowledgeSet
           # instantiate it:

           use AI::Categorizer::KnowledgeSetSMART;
           my $ksetCHI = AI::Categorizer::KnowledgeSetSMART->new(
               tfidf_notation    => 'Categorizer',
               feature_selection => 'chi_square',
               ...other parameters...
           );

           # However, it is also possible to pass an instance to the KnowledgeSet:

           use AI::Categorizer::KnowledgeSet;
           use AI::Categorizer::FeatureSelector::ChiSquare;
           my $ksetCHI = AI::Categorizer::KnowledgeSet->new(
               # the class must be named in full; a bare "ChiSquare" package does not exist
               feature_selector => AI::Categorizer::FeatureSelector::ChiSquare->new(
                   features_kept => 2000,
                   verbose       => 1,
               ),
               ...other parameters...
           );

DESCRIPTION
       Feature selection using the chi-square statistic, computed for a
       term t and a category ci as:

                                   N.(AD-CB)^2
           Chi-Square(t,ci) = -----------------------
                              (A+C).(B+D).(A+B).(C+D)

           where t  = term
                 ci = category i
                 N  = number of documents in the collection
                 A  = number of times where t and ci co-occur
                 B  = number of times where t occurs without ci
                 C  = number of times where ci occurs without t
                 D  = number of times where neither ci nor t occurs

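       As an illustration of the formula above, the following is a minimal
       sketch and not the module's actual code. The chi_square() helper,
       the %counts table and the sample counts are hypothetical. It scores
       each term against a single category and keeps the highest-scoring
       terms, in the spirit of the features_kept parameter. Returning 0
       when the denominator is 0 is an assumption, and how scores are
       combined across several categories is not covered here.

```perl
use strict;
use warnings;

# Chi-square score of one term/category pair, computed from the four
# contingency counts A, B, C, D defined above (so N = A+B+C+D).
sub chi_square {
    my ($A, $B, $C, $D) = @_;
    my $N     = $A + $B + $C + $D;
    my $denom = ($A + $C) * ($B + $D) * ($A + $B) * ($C + $D);
    return 0 if $denom == 0;    # assumption: degenerate tables score 0
    return $N * ($A * $D - $C * $B) ** 2 / $denom;
}

# Hypothetical counts [A, B, C, D] for three terms against one
# category, in a collection of 100 documents.
my %counts = (
    goal    => [ 40, 10, 10, 40 ],
    the     => [ 25, 25, 25, 25 ],
    penalty => [ 20,  5, 30, 45 ],
);

my %score = map { $_ => chi_square(@{ $counts{$_} }) } keys %counts;

# Rank terms by descending score (ties broken alphabetically) and
# keep the $features_kept best ones.
my $features_kept = 2;
my @ranked = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
my @kept   = @ranked[ 0 .. $features_kept - 1 ];

printf "%-8s %.2f\n", $_, $score{$_} for @ranked;
```

       With these counts, "goal" scores 36, "penalty" scores 12 and "the",
       which is spread evenly across the category and its complement,
       scores 0. With $features_kept set to 2, "goal" and "penalty" are
       kept.
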
       For more details, see: Yiming Yang and Jan O. Pedersen, "A
       Comparative Study on Feature Selection in Text Categorization", in
       Proceedings of ICML-97, 14th International Conference on Machine
       Learning, 1997 (available on citeseer.nj.nec.com).

AUTHOR
       Francois Paradis, paradifr@iro.umontreal.ca, with inspiration from
       Ken Williams' AI::Categorizer code.

perl v5.34.0                      2021-07-22                      AI::Categorizer::FeatureSelector::ChiSquare(3)