DBCOLSTATS(1)         User Contributed Perl Documentation        DBCOLSTATS(1)

NAME

       dbcolstats - compute statistics on a fsdb column

SYNOPSIS

       dbcolstats [-amS] [-c ConfidenceFraction] [-q NumberOfQuantiles]
       column

DESCRIPTION

       Compute statistics over a COLUMN of data.  Records containing
       non-numeric data are considered null and do not contribute to the
       stats (with the "-a" option they are treated as zeros).

       Confidence intervals are computed with a t-test
       (+/- (t_{a/2})*s/sqrt(n)) and assume that the population is normally
       distributed and that the number of samples is small (< 100).
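
       As a rough illustration of the formula above, the following Python
       sketch computes the interval for the six values shown in the SAMPLE
       USAGE input.  It is not dbcolstats's own code.  The critical value
       2.571 (two-sided 95%, 5 degrees of freedom) is an assumption taken
       from a standard t table and hard-coded here.

```python
import math
import statistics

# The six absdiff values from the SAMPLE USAGE input.
data = [0.0, 0.046953, 0.072074, 0.075413, 0.094088, 0.096602]

n = len(data)
mean = statistics.fmean(data)
s = statistics.stdev(data)  # sample standard deviation (n-1 term)

# Assumption: t_{a/2} for a = 0.05 and n-1 = 5 degrees of freedom,
# rounded as printed in common t tables.
T_CRIT = 2.571

conf_range = T_CRIT * s / math.sqrt(n)  # half-width of the interval
conf_low = mean - conf_range
conf_high = mean + conf_range

print("%.5g %.5g %.5g" % (conf_range, conf_low, conf_high))
```

       The three printed values correspond to the conf_range, conf_low and
       conf_high fields of the dbcolstats output.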

       By default, all statistics are computed as for a sample of the
       population (with an ``n-1'' term), not as representing the whole
       population (using ``n'').  Select between them with --sample or
       --nosample.  When you measure the entire population, use --nosample.
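
       The difference between the two modes can be sketched with Python's
       statistics module, which offers both divisors.  This only illustrates
       the n-1 versus n distinction on the SAMPLE USAGE values; it does not
       reproduce how dbcolstats computes them.

```python
import statistics

# The six absdiff values from the SAMPLE USAGE input.
data = [0.0, 0.046953, 0.072074, 0.075413, 0.094088, 0.096602]

sample_sd = statistics.stdev(data)       # divides by n-1, as with --sample
population_sd = statistics.pstdev(data)  # divides by n, as with --nosample

print("%.5g" % sample_sd)
print("%.5g" % population_sd)
```

       The population figure is always the smaller of the two, and the gap
       shrinks as n grows.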

       The output of this program is probably best looked at after
       reformatting with dblistize.

       Dbcolstats runs in O(1) memory.  Computing a median or quantiles
       requires sorting the data and invokes dbsort.  Sorting runs in
       constant RAM but uses O(number of records) disk space.  If a median
       or quantiles are required and the data is already sorted, dbcolstats
       runs more efficiently with the -S option.

OPTIONS

       -a or --include-non-numeric
           Compute stats over all records (treat non-numeric records as zero
           rather than just ignoring them).

       -c FRACTION or --confidence FRACTION
           Specify FRACTION for the confidence interval.  Defaults to 0.95 for
           a 95% confidence level.

       -f FORMAT or --format FORMAT
           Specify a printf(3)-style format for output statistics.  Defaults
           to "%.5g".
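
           To see what the default format does to a value, here is a small
           Python sketch.  Python's % operator follows the same printf-style
           rules for %g; the helper name fmt is hypothetical and used only
           for illustration.

```python
def fmt(value, spec="%.5g"):
    """Format a number the way a printf-style spec would."""
    return spec % value

# Five significant digits, trailing zeros dropped.
print(fmt(0.38513 / 6))

# Large values switch to exponent notation.
print(fmt(1234567.0))
```

           This is why the sample output below shows values such as 0.064188
           rather than the full double-precision result.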

       -m or --median
           Compute median value.  (Will sort data if necessary.)  (Median is
           the quantile for N=2.)

       -q N or --quantile N
           Compute quantiles (quartiles when N is 4), or arbitrary quantiles
           for other values of N: the scores that are each 1/Nth of the way
           across the population.
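
           The relationship between N and the number of cut points can be
           sketched in Python.  Note the assumption: statistics.quantiles
           applies its own interpolation rule, which may not match the
           values dbcolstats reports, so treat this only as an illustration
           of the idea.

```python
import statistics

# The six absdiff values from the SAMPLE USAGE input, sorted.
data = sorted([0.0, 0.046953, 0.072074, 0.075413, 0.094088, 0.096602])

quartiles = statistics.quantiles(data, n=4)  # N=4 gives three cut points
halves = statistics.quantiles(data, n=2)     # N=2 gives one: the median

print(quartiles)
print(halves[0], statistics.median(data))
```

           In general N quantiles yield N-1 cut points, which is why the
           median is the single cut point for N=2.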

       --sample
           Compute sample population statistics (e.g., the sample standard
           deviation), assuming n-1 degrees of freedom.

       --nosample
           Compute whole population statistics (e.g., the population standard
           deviation).

       -S or --pre-sorted
           Assume data is already sorted.  With one -S, we check and confirm
           this precondition.  When repeated, we skip the check.  (This flag
           is ignored if quantiles are not requested.)

       --parallelism=N or "-j N"
           Allow sorting to happen in parallel.  Defaults on.  (Only relevant
           if using non-pre-sorted data with quantiles.)

       -F or --fs or --fieldseparator S
           Specify the field (column) separator as "S".  See dbfilealter for
           valid field separators.

       -T TmpDir
           Where to put temporary data.  Only used if median or quantiles are
           requested.  Uses the environment variable TMPDIR if -T is not
           specified.  Default is /tmp.

       -k KeyField
           Do multi-stats, grouped by each key.  Assumes keys are sorted.
           (Use dbmultistats to guarantee sorting order.)

       --output-on-no-input
           Enables null output (all fields are "-", n is 0) if we get input
           with a schema but no records.  Without this option, just output the
           schema but no rows.  Default: no output if no input.

       This module also supports the standard fsdb options:

       -d  Enable debugging output.

       -i or --input InputSource
           Read from InputSource, typically a file name, or "-" for standard
           input, or (if in Perl) an IO::Handle, Fsdb::IO or
           Fsdb::BoundedQueue object.

       -o or --output OutputDestination
           Write to OutputDestination, typically a file name, or "-" for
           standard output, or (if in Perl) an IO::Handle, Fsdb::IO or
           Fsdb::BoundedQueue object.

       --autorun or --noautorun
           By default, programs process automatically, but Fsdb::Filter
           objects in Perl do not run until you invoke the run() method.  The
           "--(no)autorun" option controls that behavior within Perl.

       --help
           Show help.

       --man
           Show full manual.

SAMPLE USAGE

   Input:
           #fsdb      absdiff
           0
           0.046953
           0.072074
           0.075413
           0.094088
           0.096602
           #  | /home/johnh/BIN/DB/dbrow
           #  | /home/johnh/BIN/DB/dbcol event clock
           #  | dbrowdiff clock
           #  | /home/johnh/BIN/DB/dbcol absdiff

   Command:
           cat data.fsdb | dbcolstats absdiff

   Output:
           #fsdb mean:d stddev:d pct_rsd:d conf_range:d conf_low:d conf_high:d conf_pct:d sum:d sum_squared:d min:d max:d n:q
           0.064188        0.036194        56.387  0.037989        0.026199        0.10218 0.95    0.38513 0.031271        0       0.096602        6
           #  | /home/johnh/BIN/DB/dbrow
           #  | /home/johnh/BIN/DB/dbcol event clock
           #  | dbrowdiff clock
           #  | /home/johnh/BIN/DB/dbcol absdiff
           #  | dbcolstats absdiff
           #               0.95 confidence intervals assume normal distribution and small n.
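
       Because the single output row is wide, it can help to pair each value
       with its column name.  The Python sketch below does that for the
       output shown above.  It assumes the data row is tab-separated and
       that each header entry has the form name:type; parse_fsdb_row is a
       hypothetical helper, not part of Fsdb.

```python
HEADER = ("#fsdb mean:d stddev:d pct_rsd:d conf_range:d conf_low:d "
          "conf_high:d conf_pct:d sum:d sum_squared:d min:d max:d n:q")
ROW = ("0.064188\t0.036194\t56.387\t0.037989\t0.026199\t0.10218\t0.95\t"
       "0.38513\t0.031271\t0\t0.096602\t6")


def parse_fsdb_row(header, row):
    """Map column names from an fsdb header line onto one data row."""
    # Skip the leading "#fsdb" token and drop the ":type" suffix.
    names = [spec.split(":")[0] for spec in header.split()[1:]]
    values = row.split("\t")  # assumption: tab is the field separator
    return dict(zip(names, values))


stats = parse_fsdb_row(HEADER, ROW)
for name, value in stats.items():
    print(f"{name}: {value}")
```

       Within Fsdb itself, dblistize serves the same purpose.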

SEE ALSO

       dbmultistats(1), handles multiple experiments in a single file.

       dblistize(1), to pretty-print the output of dbcolstats.

       dbcolpercentile(1), to compute an even more general version of
       median/quantiles.

       dbcolstatscores(1), to compute z-scores or t-scores for each row.

       dbrvstatdiff(1), to see if two sample populations are statistically
       different.

       Fsdb.

BUGS

       The algorithms used to compute variance have not been audited to check
       for numerical stability.  (See
       http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance.)
       Variance may be incorrect when standard deviation is small relative to
       the mean.

       The field "conf_pct" implies percentage, but it's actually reported as
       a fraction (0.95 means 95%).

       Because of limits of floating point, statistics on numbers of widely
       different scales may be incorrect.  See the test cases
       dbcolstats_extrema for examples.
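
       To illustrate the kind of instability described above, this Python
       sketch compares a textbook sum and sum-of-squares formula with
       Welford's online algorithm, one of the methods covered on the
       Wikipedia page cited above.  It makes no claim about which formula
       dbcolstats uses; both function names are hypothetical.

```python
def naive_variance(values, sample=True):
    """Variance from a running sum and sum of squares."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total * total / n) / (n - 1 if sample else n)


def welford_variance(values, sample=True):
    """Variance using Welford's single-pass update."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    divisor = n - 1 if sample else n
    if divisor <= 0:
        raise ValueError("not enough values")
    return m2 / divisor


# Standard deviation is tiny relative to the mean; the true sample
# variance of these four values is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print(welford_variance(data))
print(naive_variance(data))  # may differ noticeably from 30
```

       Both approaches use O(1) memory, so the more stable update does not
       change the memory behavior.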

       Copyright (C) 1991-2018 by John Heidemann <johnh@isi.edu>

       This program is distributed under terms of the GNU General Public
       License, version 2.  See the file COPYING with the distribution for
       details.

perl v5.34.1                      2022-04-04                     DBCOLSTATS(1)