1Fsdb::Filter::dbcolstats(3) User Contributed Perl Documentation Fsdb::Filter::dbcolstats(3)
2
3
4
6 dbcolstats - compute statistics on a fsdb column
7
9 dbcolstats [-amS] [-c ConfidenceFraction] [-q NumberOfQuantiles]
10 column
11
13 Compute statistics over a COLUMN of data. Records containing non-
14 numeric data are considered null and do not contribute to the stats
15 (with the "-a" option they are treated as zeros).
16
17 Confidence intervals are a t-test (+/- (t_{a/2})*s/sqrt(n)) and assume
18 the population takes a normal distribution with a small number of
19 samples (< 100).
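
The interval formula above can be checked by hand. The following is a minimal Python sketch, not the dbcolstats implementation: it uses the six absdiff values from the sample input in this page and hard-codes the Student's t critical value for 5 degrees of freedom from a standard t table (an assumption; dbcolstats may obtain the value differently).

```python
import math
import statistics

# The absdiff column from the sample input in this page.
data = [0, 0.046953, 0.072074, 0.075413, 0.094088, 0.096602]

# Assumption: two-sided 95% critical value of Student's t for
# n - 1 = 5 degrees of freedom, read from a standard t table.
T_CRIT_95_DF5 = 2.571

n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)                       # sample stddev (n-1 term)
conf_range = T_CRIT_95_DF5 * s / math.sqrt(n)    # t_{a/2} * s / sqrt(n)
conf_low = mean - conf_range
conf_high = mean + conf_range

print(f"mean={mean:.5g} conf_range={conf_range:.5g} "
      f"conf_low={conf_low:.5g} conf_high={conf_high:.5g}")
```

The result should agree closely with the conf_range, conf_low and conf_high fields of the sample output in this page.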
20
21 By default, all statistics are computed as for a sample of a population
22 (with an ``n-1'' term), not as representing the whole population (using
23 ``n''). Select between them with --sample or --nosample. When you
24 measure the entire population, use --nosample.
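
To illustrate the difference between the two conventions, here is a small Python sketch using the standard statistics module (an independent illustration, not dbcolstats code). The data is the absdiff column from the sample input in this page.

```python
import math
import statistics

# The absdiff column from the sample input in this page.
data = [0, 0.046953, 0.072074, 0.075413, 0.094088, 0.096602]
n = len(data)

sample_sd = statistics.stdev(data)        # divides by n-1, like --sample
population_sd = statistics.pstdev(data)   # divides by n, like --nosample

# The two differ only by the factor sqrt((n-1)/n).
ratio = math.sqrt((n - 1) / n)

print(f"sample stddev:     {sample_sd:.5g}")
print(f"population stddev: {population_sd:.5g}")
print(f"sample * sqrt((n-1)/n): {sample_sd * ratio:.5g}")
```

The gap between the two values shrinks as n grows, so the choice matters most for small inputs such as this one.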
25
26 The output of this program is probably best looked at after
27 reformatting with dblistize.
28
29 Dbcolstats runs in O(1) memory. Median or quantile requires sorting
30 the data and invokes dbsort. Sorting will run in constant RAM but
31 O(number of records) disk space. If median or quantile is required and
32 the data is already sorted, dbcolstats will run more efficiently with
33 the -S option.
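
Constant-memory operation is possible because mean and standard deviation can be derived from a few running totals. The Python sketch below shows one such single-pass approach built on n, sum and sum_squared (fields that also appear in the dbcolstats output). That dbcolstats computes its results exactly this way is an assumption, and the function name running_stats is hypothetical.

```python
import math

def running_stats(values):
    """One pass over an iterable, keeping only a handful of scalars."""
    n = 0
    total = 0.0
    total_sq = 0.0
    lo = math.inf
    hi = -math.inf
    for x in values:          # values can be a generator; nothing is stored
        n += 1
        total += x
        total_sq += x * x
        lo = min(lo, x)
        hi = max(hi, x)
    if n < 2:
        raise ValueError("need at least two values for a sample stddev")
    mean = total / n
    # Sample variance (n-1 term); clamp tiny negative rounding errors.
    variance = max((total_sq - n * mean * mean) / (n - 1), 0.0)
    return {
        "n": n, "sum": total, "sum_squared": total_sq,
        "min": lo, "max": hi, "mean": mean, "stddev": math.sqrt(variance),
    }

# The absdiff column from the sample input in this page, fed as an iterator.
data = [0, 0.046953, 0.072074, 0.075413, 0.094088, 0.096602]
stats = running_stats(iter(data))
print(stats)
```

Median and quantiles do not fit this pattern, since they depend on the order of all values. That is why they require the sort described above.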
34
36 -a or --include-non-numeric
37 Compute stats over all records (treat non-numeric records as zero
38 rather than just ignoring them).
39
40 -c FRACTION or --confidence FRACTION
41 Specify FRACTION for the confidence interval. Defaults to 0.95 for
42 a 95% confidence factor.
43
44 -f FORMAT or --format FORMAT
45 Specify a printf(3)-style format for output statistics. Defaults
46 to "%.5g".
47
48 -m or --median
49 Compute median value. (Will sort data if necessary.) (Median is
50 the quantile for N=2.)
51
52 -q N or --quantile N
53 Compute quantiles (quartiles when N is 4), or arbitrary quantiles
54 for other values of N: the scores that lie at each 1/Nth of the way
55 across the population.
56
57 --sample
58 Compute sample population statistics (e.g., the sample standard
59 deviation), assuming n-1 degrees of freedom.
60
61 --nosample
62 Compute whole population statistics (e.g., the population standard
63 deviation).
64
65 -S or --pre-sorted
66 Assume data is already sorted. With one -S, we check and confirm
67 this precondition. When repeated, we skip the check. (This flag
68 is ignored if quantiles are not requested.)
69
70 --parallelism=N or "-j N"
71 Allow sorting to happen in parallel. Defaults on. (Only relevant
72 if using non-pre-sorted data with quantiles.)
73
74 -F or --fs or --fieldseparator S
75 Specify the field (column) separator as "S". See dbfilealter for
76 valid field separators.
77
78 -T TmpDir
79 Where to put temporary data. Only used if median or quantiles are
80 requested. If -T is not specified, the environment variable TMPDIR
81 is used. Default is /tmp.
82
83 -k KeyField
84 Do multi-stats, grouped by each key. Assumes keys are sorted.
85 (Use dbmultistats to guarantee sorting order.)
86
87 --output-on-no-input
88 Enables null output (all fields are "-", n is 0) if we get input
89 with a schema but no records. Without this option, just output the
90 schema but no rows. Default: no output if no input.
91
92 This module also supports the standard fsdb options:
93
94 -d Enable debugging output.
95
96 -i or --input InputSource
97 Read from InputSource, typically a file name, or "-" for standard
98 input, or (if in Perl) an IO::Handle, Fsdb::IO or Fsdb::BoundedQueue
99 object.
100
101 -o or --output OutputDestination
102 Write to OutputDestination, typically a file name, or "-" for
103 standard output, or (if in Perl) an IO::Handle, Fsdb::IO or
104 Fsdb::BoundedQueue object.
105
106 --autorun or --noautorun
107 By default, programs process automatically, but Fsdb::Filter
108 objects in Perl do not run until you invoke the run() method. The
109 "--(no)autorun" option controls that behavior within Perl.
110
111 --help
112 Show help.
113
114 --man
115 Show full manual.
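
As an illustration of how the median (-m) relates to quantiles (-q N), the Python sketch below uses the standard statistics module on the absdiff values from the sample input in this page. The interpolation rule here is the Python library's default. dbcolstats may choose its cut points differently, so quartile values from the two need not match.

```python
import statistics

# The absdiff column from the sample input in this page, sorted.
data = sorted([0, 0.046953, 0.072074, 0.075413, 0.094088, 0.096602])

median = statistics.median(data)
halves = statistics.quantiles(data, n=2)     # N=2: a single cut point
quartiles = statistics.quantiles(data, n=4)  # N=4: three cut points

print("median:   ", median)
print("N=2 cut:  ", halves[0])
print("quartiles:", quartiles)
```

With N=2 the single cut point coincides with the median, which is the relationship the -m description states.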
116
118 Input:
119 #fsdb absdiff
120 0
121 0.046953
122 0.072074
123 0.075413
124 0.094088
125 0.096602
126 # | /home/johnh/BIN/DB/dbrow
127 # | /home/johnh/BIN/DB/dbcol event clock
128 # | dbrowdiff clock
129 # | /home/johnh/BIN/DB/dbcol absdiff
130
131 Command:
132 cat data.fsdb | dbcolstats absdiff
133
134 Output:
135 #fsdb mean stddev pct_rsd conf_range conf_low conf_high conf_pct sum sum_squared min max n
136 0.064188 0.036194 56.387 0.037989 0.026199 0.10218 0.95 0.38513 0.031271 0 0.096602 6
137 # | /home/johnh/BIN/DB/dbrow
138 # | /home/johnh/BIN/DB/dbcol event clock
139 # | dbrowdiff clock
140 # | /home/johnh/BIN/DB/dbcol absdiff
141 # | dbcolstats absdiff
142 # 0.95 confidence intervals assume normal distribution and small n.
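
To consume this output from a script, the header and the data row can be paired up by position. The Python sketch below is one way to do that under simplifying assumptions: the header carries no extra options after "#fsdb", fields are separated by whitespace, and every value is numeric. The helper name parse_fsdb_rows is hypothetical and is not part of Fsdb.

```python
# Output copied from the example in this page (provenance comments shortened).
output = """\
#fsdb mean stddev pct_rsd conf_range conf_low conf_high conf_pct sum sum_squared min max n
0.064188 0.036194 56.387 0.037989 0.026199 0.10218 0.95 0.38513 0.031271 0 0.096602 6
# | dbcolstats absdiff
"""

def parse_fsdb_rows(text):
    """Return one dict per data row, keyed by the column names in the header."""
    lines = text.splitlines()
    header = lines[0].split()
    if not header or header[0] != "#fsdb":
        raise ValueError("not an fsdb stream")
    columns = header[1:]   # assumption: no header options such as a separator code
    rows = []
    for line in lines[1:]:
        if not line or line.startswith("#"):
            continue       # skip blank lines and trailing comment lines
        rows.append(dict(zip(columns, (float(v) for v in line.split()))))
    return rows

row = parse_fsdb_rows(output)[0]
# conf_pct is reported as a fraction, so scale it to get a percentage.
conf_percent = round(row["conf_pct"] * 100)
print(f"mean={row['mean']} n={int(row['n'])} confidence={conf_percent}%")
```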
143
145 dbmultistats(1), handles multiple experiments in a single file.
146
147 dblistize(1), to pretty-print the output of dbcolstats.
148
149 dbcolpercentile(1), to compute an even more general version of
150 median/quantiles.
151
152 dbcolstatscores(1), to compute z-scores or t-scores for each row.
153
154 dbrvstatdiff(1), to see if two sample populations are statistically
155 different.
156
157 Fsdb.
158
160 The algorithms used to compute variance have not been audited to check
161 for numerical stability. (See
162 http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance.)
163 Variance may be incorrect when standard deviation is small relative to
164 the mean.
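
To make this concern concrete, the Python sketch below contrasts a sum-of-squares formula with Welford's online algorithm, one of the methods described on the Wikipedia page cited above. It is an independent illustration, and the function names are hypothetical. Which formula dbcolstats actually uses is not stated here.

```python
def naive_variance(values):
    """Sample variance from sum and sum of squares (prone to cancellation)."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)

def welford_variance(values):
    """Sample variance via Welford's online update (one pass, O(1) memory)."""
    n = 0
    mean = 0.0
    m2 = 0.0                      # running sum of squared deviations
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)

# Small spread around a very large mean: the true sample variance is 30.
shifted = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("naive:  ", naive_variance(shifted))
print("welford:", welford_variance(shifted))
```

With double-precision floats the naive result can be far from 30 (it can even come out negative), because two nearly equal large numbers are subtracted. The Welford update works with deviations from the running mean and avoids that subtraction.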
165
166 The field "conf_pct" implies percentage, but it's actually reported as
167 a fraction (0.95 means 95%).
168
169 Because of limits of floating point, statistics on numbers of widely
170 different scales may be incorrect. See the test cases
171 dbcolstats_extrema for examples.
172
174 new
175 $filter = new Fsdb::Filter::dbcolstats(@arguments);
176
177 Create a new dbcolstats object, taking command-line arguments.
178
179 set_defaults
180 $filter->set_defaults();
181
182 Internal: set up defaults.
183
184 parse_options
185 $filter->parse_options(@ARGV);
186
187 Internal: parse command-line arguments.
188
189 setup
190 $filter->setup();
191
192 Internal: setup, parse headers.
193
194 _round_up
195 $i = _round_up($x);
196
197 Internal: Round up to the next integer.
198
199 _compute_quantile
200 ($median, $quantile_aref) = _compute_quantile($n, $mean);
201
202 Internal: Compute quantile from the saved data. Not generalizable. We
203 assume the saved output is closed before we enter.
204
205 run_one_key
206 $filter->run_one_key();
207
208 Internal: run over each row, for a given key.
209
210 run
211 $filter->run();
212
213 Internal: run over each row, for one or many keys.
214
216 Copyright (C) 1991-2018 by John Heidemann <johnh@isi.edu>
217
218 This program is distributed under terms of the GNU general public
219 license, version 2. See the file COPYING with the distribution for
220 details.
221
222
223
224perl v5.30.0 2019-09-19 Fsdb::Filter::dbcolstats(3)