Statistics::Contingency(3)  User Contributed Perl Documentation  Statistics::Contingency(3)

NAME
    Statistics::Contingency - Calculate precision, recall, F1, accuracy,
    etc.

VERSION
    version 0.09

SYNOPSIS
        use Statistics::Contingency;
        my $s = Statistics::Contingency->new(categories => \@all_categories);

        while (...something...) {
            ...
            $s->add_result($assigned_categories, $correct_categories);
        }

        print "Micro F1: ", $s->micro_F1, "\n"; # Access a single statistic
        print $s->stats_table; # Show several stats in table form

DESCRIPTION
    The "Statistics::Contingency" class helps you calculate several useful
    statistical measures based on 2x2 "contingency tables". I use these
    measures to help judge the results of automatic text categorization
    experiments, but they are useful in other situations as well.

    The general usage flow is to tally a whole bunch of results in the
    "Statistics::Contingency" object, then query that object to obtain the
    measures you are interested in. When all results have been collected,
    you can get a report on accuracy, precision, recall, F1, and so on,
    with both macro-averaging and micro-averaging over categories.

  Macro vs. Micro Statistics
    All of the statistics offered by this module can be calculated for each
    category and then averaged, or can be calculated once over all
    decisions pooled together. The former is called macro-averaging
    (specifically, macro-averaging with respect to category), and the
    latter is called micro-averaging. The two procedures bias the results
    differently - micro-averaging tends to over-emphasize the performance
    on the largest categories, while macro-averaging over-emphasizes the
    performance on the smallest. It's often best to look at both of them
    to get a good idea of how your results are distributed across
    categories.
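
To make the difference concrete, here is a minimal sketch in plain Perl that does not use this module. The counts and the precision() helper are hypothetical, invented for illustration; returning 0 when nothing was assigned is an assumption of the sketch, not a statement about how the module treats that case.

```perl
use strict;
use warnings;

# Hypothetical per-category counts:
# a = assigned and correct, b = assigned but incorrect.
my %tables = (
    large => { a => 90, b => 10 },   # precision 90/100
    small => { a => 1,  b => 4  },   # precision 1/5
);

# Precision for one table; 0 for an empty denominator (sketch assumption).
sub precision {
    my ($t) = @_;
    my $assigned = $t->{a} + $t->{b};
    return $assigned ? $t->{a} / $assigned : 0;
}

# Macro-averaging: compute the statistic per category, then average.
my @per_category = map { precision($_) } values %tables;
my $macro = 0;
$macro += $_ for @per_category;
$macro /= @per_category;

# Micro-averaging: pool the counts, then compute the statistic once.
my %pooled = (a => 0, b => 0);
for my $t (values %tables) {
    $pooled{$_} += $t->{$_} for qw(a b);
}
my $micro = precision(\%pooled);

printf "macro precision: %.3f\n", $macro;   # (0.9 + 0.2) / 2
printf "micro precision: %.3f\n", $micro;   # 91 / 105
```

The small category drags the macro average down sharply, while the pooled micro figure stays close to the large category's score.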

  Statistics available
    All of the statistics are calculated based on a so-called "contingency
    table", which looks like this:

                      Correct=Y   Correct=N
                    +-----------+-----------+
         Assigned=Y |     a     |     b     |
                    +-----------+-----------+
         Assigned=N |     c     |     d     |
                    +-----------+-----------+

    a, b, c, and d are counts that reflect how the assigned categories
    matched the correct categories. Depending on whether a macro-statistic
    or a micro-statistic is being calculated, these numbers will be tallied
    per-category or for the entire result set.
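
As an illustration of how the four cells fill up, the following plain-Perl sketch tallies a, b, c, and d for a single category. The decision list is made-up data, and the code is independent of this module's internals.

```perl
use strict;
use warnings;

# Hypothetical decisions for one category: each pair is
# [assigned?, correct?] for one document.
my @decisions = ([1, 1], [1, 0], [0, 1], [0, 0], [1, 1]);

my %table = (a => 0, b => 0, c => 0, d => 0);
for my $pair (@decisions) {
    my ($assigned, $correct) = @$pair;
    # Pick the cell from the table layout shown above.
    my $cell = $assigned ? ($correct ? 'a' : 'b')
                         : ($correct ? 'c' : 'd');
    $table{$cell}++;
}

print join(', ', map { "$_=$table{$_}" } qw(a b c d)), "\n";
# prints "a=2, b=1, c=1, d=1"
```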

    The following statistics are available:

    • accuracy

      This measures the portion of all decisions that were correct
      decisions. It is defined as "(a+d)/(a+b+c+d)". It falls in the
      range from 0 to 1, with 1 being the best score.

      Note that macro-accuracy and micro-accuracy will always give the
      same number.

    • error

      This measures the portion of all decisions that were incorrect
      decisions. It is defined as "(b+c)/(a+b+c+d)". It falls in the
      range from 0 to 1, with 0 being the best score.

      Note that macro-error and micro-error will always give the same
      number.

    • precision

      This measures the portion of the assigned categories that were
      correct. It is defined as "a/(a+b)". It falls in the range from 0
      to 1, with 1 being the best score.

    • recall

      This measures the portion of the correct categories that were
      assigned. It is defined as "a/(a+c)". It falls in the range from
      0 to 1, with 1 being the best score.

    • F1

      This measures an even combination of precision and recall. It is
      defined as "2*p*r/(p+r)". In terms of a, b, and c, it may be
      expressed as "2a/(2a+b+c)". It falls in the range from 0 to 1,
      with 1 being the best score.
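
The definitions above can be checked by hand with a short plain-Perl sketch. The counts are hypothetical, and the code deliberately skips the zero-denominator edge cases, so it is only safe for tables like this one where every denominator is non-zero.

```perl
use strict;
use warnings;

# Hypothetical counts for a single contingency table.
my %t = (a => 40, b => 10, c => 20, d => 30);

my $total     = $t{a} + $t{b} + $t{c} + $t{d};
my $accuracy  = ($t{a} + $t{d}) / $total;
my $error     = ($t{b} + $t{c}) / $total;
my $precision = $t{a} / ($t{a} + $t{b});
my $recall    = $t{a} / ($t{a} + $t{c});

# The two equivalent forms of F1 given above.
my $f1        = 2 * $precision * $recall / ($precision + $recall);
my $f1_counts = 2 * $t{a} / (2 * $t{a} + $t{b} + $t{c});

printf "accuracy=%.3f error=%.3f precision=%.3f recall=%.3f\n",
    $accuracy, $error, $precision, $recall;
printf "F1=%.3f (from p and r), F1=%.3f (from counts)\n", $f1, $f1_counts;
```

Both F1 expressions agree (80/110 for these counts), and accuracy and error sum to 1.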

    The F1 measure is often the only simple measure that is worth trying to
    maximize on its own - consider the fact that you can get a perfect
    precision score by always assigning zero categories, or a perfect
    recall score by always assigning every category. A truly smart system
    will assign the correct categories and only the correct categories,
    maximizing precision and recall at the same time, and therefore
    maximizing the F1 score.

    Sometimes it's worth trying to maximize the accuracy score, but
    accuracy (and its counterpart error) are considered fairly crude scores
    that don't give much information about the performance of a
    categorizer.
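
A small worked example may help show why accuracy is crude and why F1 resists trivial strategies. The sketch below is plain Perl with invented numbers: one rare category that applies to 5 of 100 documents, scored under two degenerate strategies. The helper names are hypothetical, and F1 is computed from counts as "2a/(2a+b+c)" so that no 0/0 precision needs to be evaluated.

```perl
use strict;
use warnings;

# Hypothetical rare category: 5 of 100 documents truly belong to it.
my @strategies = (
    [ 'assign to every document' => { a => 5, b => 95, c => 0, d => 0  } ],
    [ 'never assign'             => { a => 0, b => 0,  c => 5, d => 95 } ],
);

sub accuracy {
    my ($t) = @_;
    return ($t->{a} + $t->{d}) / ($t->{a} + $t->{b} + $t->{c} + $t->{d});
}

sub f1_from_counts {
    my ($t) = @_;
    return 2 * $t->{a} / (2 * $t->{a} + $t->{b} + $t->{c});
}

my %score;
for my $s (@strategies) {
    my ($name, $t) = @$s;
    $score{$name} = { accuracy => accuracy($t), F1 => f1_from_counts($t) };
    printf "%-26s accuracy=%.3f F1=%.3f\n",
        $name, $score{$name}{accuracy}, $score{$name}{F1};
}
```

With these numbers, never assigning the category scores 0.95 on accuracy but 0 on F1, while assigning it everywhere reaches perfect recall yet only 10/105 on F1.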

METHODS
    The general execution flow when using this class is to create a
    "Statistics::Contingency" object, add a bunch of results to it, and
    then report on the results.

    • $e = Statistics::Contingency->new()

      Returns a new "Statistics::Contingency" object. Expects a
      "categories" parameter specifying the entire set of categories that
      may be assigned during this experiment. Also accepts a "verbose"
      parameter - if true, some diagnostic status information will be
      displayed when certain actions are performed.

    • $e->add_result($assigned_categories, $correct_categories, $name)

      Adds a new result to the experiment. The lists of assigned and
      correct categories can be given as an array of category names
      (strings), as a hash whose keys are the category names and whose
      values are anything logically true, or as a single string if there
      is only one category.

      If you've already got the lists in hash form, this will be the
      fastest way to pass them. Otherwise, the current implementation
      will convert them to hash form internally in order to make its
      calculations efficient.

      The $name parameter is an optional name for this result. It will
      only be used in error messages or debugging/progress output.

      In the current implementation, we only store the contingency tables
      per category, as well as a table for the entire result set. This
      means that you can't recover information about any particular
      single result from the "Statistics::Contingency" object.
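
The following plain-Perl sketch illustrates the idea of accepting all three input forms, normalizing them to hash form, and updating one table per category. It is not the module's actual code: to_hash(), add_result_sketch(), and the category names are hypothetical, and array and hash inputs are assumed to be passed as references.

```perl
use strict;
use warnings;

my @categories = qw(sports politics tech);
my %tables = map { $_ => { a => 0, b => 0, c => 0, d => 0 } } @categories;

# Hypothetical helper: accept a hash ref, an array ref, or a single
# string, and return a hash ref keyed by category name.
sub to_hash {
    my ($cats) = @_;
    return $cats if ref $cats eq 'HASH';
    my @names = ref $cats eq 'ARRAY' ? @$cats : ($cats);
    my %set = map { $_ => 1 } @names;
    return \%set;
}

# Hypothetical stand-in for add_result(): only the per-category tables
# are updated, so the individual result is not kept anywhere.
sub add_result_sketch {
    my ($assigned, $correct) = map { to_hash($_) } @_;
    for my $cat (@categories) {
        my $cell = $assigned->{$cat} ? ($correct->{$cat} ? 'a' : 'b')
                                     : ($correct->{$cat} ? 'c' : 'd');
        $tables{$cat}{$cell}++;
    }
}

add_result_sketch([qw(sports tech)], { sports => 1 });   # array and hash forms
add_result_sketch('politics', 'politics');                # single-string form

for my $cat (@categories) {
    my $t = $tables{$cat};
    print "$cat: ", join(' ', map { "$_=$t->{$_}" } qw(a b c d)), "\n";
}
```

Every result contributes exactly one count to each category's table, which is why each table ends up with the same total.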

    • $e->set_entries($a, $b, $c, $d)

      If you don't wish to use the "add_result()" interface, but still
      want to take advantage of the calculation methods and the various
      edge cases they handle, you can directly set the four elements of
      the contingency table with this method.

    • $e->micro_accuracy

      Returns the micro-averaged accuracy for the data set.

    • $e->micro_error

      Returns the micro-averaged error for the data set.

    • $e->micro_precision

      Returns the micro-averaged precision for the data set.

    • $e->micro_recall

      Returns the micro-averaged recall for the data set.

    • $e->micro_F1

      Returns the micro-averaged F1 for the data set.

    • $e->macro_accuracy

      Returns the macro-averaged accuracy for the data set.

    • $e->macro_error

      Returns the macro-averaged error for the data set.

    • $e->macro_precision

      Returns the macro-averaged precision for the data set.

    • $e->macro_recall

      Returns the macro-averaged recall for the data set.

    • $e->macro_F1

      Returns the macro-averaged F1 for the data set.

    • $e->stats_table

      Returns a string combining several statistics in one graphic table.
      Because accuracy is 1 minus error, only error is reported; it
      takes less space to print. An optional argument specifies the
      number of significant digits to show in the data - the default is 3
      significant digits.

    • $e->category_stats

      Returns a hash reference whose keys are the names of each category,
      and whose values contain the various statistical measures
      (accuracy, error, precision, recall, or F1) about each category as
      a hash reference. For example, to print a single statistic:

          print $e->category_stats->{sports}{recall}, "\n";

      Or to print certain statistics for all categories:

          my $stats = $e->category_stats;
          while (my ($cat, $value) = each %$stats) {
              print "Category '$cat': \n";
              print "  Accuracy: $value->{accuracy}\n";
              print "  Precision: $value->{precision}\n";
              print "  F1: $value->{F1}\n";
          }

AUTHOR
    Ken Williams <kwilliams@cpan.org>

COPYRIGHT
    Copyright 2002-2008 Ken Williams. All rights reserved.

    This distribution is free software; you can redistribute it and/or
    modify it under the same terms as Perl itself.

perl v5.36.0                      2022-07-22        Statistics::Contingency(3)