Fsdb::Filter::dbmultistats(3)  User Contributed Perl Documentation  Fsdb::Filter::dbmultistats(3)

NAME
       dbmultistats - run dbcolstats over each group of inputs identified by
       some key

SYNOPSIS
       dbmultistats [-dm] [-c ConfidencePercent] [-f FormatForm]
                    [-q NumberOfQuartiles] -k KeyField ValueField

DESCRIPTION
       The input table is grouped by KeyField, and a separate set of column
       statistics on ValueField is computed for each group with a unique key.

       Assumptions and requirements are the same as for dbmapreduce (this
       program is just a wrapper around that program):

       By default, data can be provided in arbitrary order, and the program
       consumes O(number of unique tags) memory and O(size of data) disk
       space.

       With the -S option, data must arrive grouped by tag (not necessarily
       sorted), and the program consumes O(number of tags) memory and no disk
       space.  The program checks this precondition and aborts if it is not
       met.

       With two -S's, the program consumes O(1) memory, but does not verify
       that the data-arrival precondition is met.
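
       The difference between one -S and two can be illustrated with a short
       Python sketch.  This is not dbmapreduce's implementation;
       check_grouped is a hypothetical helper.  It shows why verifying the
       grouped-arrival precondition needs memory proportional to the number
       of keys (every finished key must be remembered), whereas skipping the
       check only requires tracking the current key.

```python
def check_grouped(keys):
    """Count the groups in keys; raise ValueError if a finished key reappears."""
    finished = set()   # keys whose group has ended: grows with the number of keys
    current = None
    started = False
    for key in keys:
        if started and key == current:
            continue                      # still inside the current group
        if key in finished:
            raise ValueError(f"key {key!r} reappeared after its group ended")
        if started:
            finished.add(current)         # the previous group is now closed
        current = key
        started = True
    return len(finished) + (1 if started else 0)


# Grouped but not sorted: accepted.
print(check_grouped(["b", "b", "a", "c", "c"]))

# Key "a" comes back after its group ended: rejected.
try:
    check_grouped(["a", "b", "a"])
except ValueError as err:
    print("abort:", err)
```

       Dropping the finished set (and with it the check) leaves only the
       current key in memory, which corresponds to the O(1) behavior
       described for two -S's.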

       (Note that these semantics are exactly like

           dbmapreduce -k KeyField -- dbcolstats ValueField

       dbmultistats provides a simpler API that passes through
       statistics-specific arguments and is optimized when data is pre-sorted
       and there are no quartiles or medians.)

OPTIONS
       Options are the same as for dbcolstats.

       -k or --key KeyField
           Specify which column is the key for grouping (default: the first
           column).

       --output-on-no-input
           Enables null output (all fields are "-", n is 0) if we get input
           with a schema but no records.  Without this option, just output the
           schema but no rows.  Default: no output if no input.

       -a or --include-non-numeric
           Compute stats over all records (treat non-numeric records as zero
           rather than just ignoring them).
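
           The effect of -a on a statistic such as the mean can be sketched
           as follows.  numeric_values is a hypothetical helper, and using
           Python's float() to decide what counts as numeric is an
           assumption; dbcolstats' own rule may differ.

```python
def numeric_values(fields, include_non_numeric=False):
    """Return the values that would enter the statistics."""
    values = []
    for field in fields:
        try:
            values.append(float(field))
        except ValueError:
            if include_non_numeric:
                values.append(0.0)   # with -a: keep the record, valued as zero
            # without -a: the record is skipped entirely
    return values


column = ["10", "-", "20"]
default_values = numeric_values(column)
all_values = numeric_values(column, include_non_numeric=True)

print(sum(default_values) / len(default_values))  # mean over 2 records
print(sum(all_values) / len(all_values))          # mean over 3 records
```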

       -c FRACTION or --confidence FRACTION
           Specify FRACTION for the confidence interval.  Defaults to 0.95 for
           a 95% confidence factor.

       -f FORMAT or --format FORMAT
           Specify a printf(3)-style format for output statistics.  Defaults
           to "%.5g".

       -m or --median
           Compute median value.  (Will sort data if necessary.)  (Median is
           the quantile for N=2.)

       -q N or --quantile N
           Compute quantiles (quartiles when N is 4, or arbitrary quantiles
           for other values of N): the scores that lie at each 1/Nth of the
           way across the population.
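
           As an illustration of what "N quantiles" means, the sketch below
           uses Python's statistics.quantiles, which returns the N-1 cut
           points that split the data into N equal-probability intervals.
           This page does not state which interpolation rule dbcolstats
           applies, so the exact cut points it reports may differ from
           these.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8]

# N = 4 gives the three quartile cut points.
quartiles = statistics.quantiles(values, n=4)

# N = 2 gives a single cut point, which is the median.
halves = statistics.quantiles(values, n=2)

print(quartiles)
print(halves, statistics.median(values))
```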

       -S or --pre-sorted
           Assume data is already sorted.  With one -S, we check and confirm
           this precondition.  When repeated, we skip the check.

       -T TmpDir
           Where to put temporary data.  Only used if median or quantiles are
           requested.  Also uses environment variable TMPDIR, if -T is not
           specified.  Default is /tmp.

       --parallelism=N or -j N
           Allow up to N reducers to run in parallel.  Default is the number
           of CPUs in the machine.

       -F or --fs or --fieldseparator S
           Specify the field (column) separator as "S".  See dbfilealter for
           valid field separators.

       This module also supports the standard fsdb options:

       -d  Enable debugging output.

       -i or --input InputSource
           Read from InputSource, typically a file name, or "-" for standard
           input, or (if in Perl) an IO::Handle, Fsdb::IO, or
           Fsdb::BoundedQueue object.

       -o or --output OutputDestination
           Write to OutputDestination, typically a file name, or "-" for
           standard output, or (if in Perl) an IO::Handle, Fsdb::IO, or
           Fsdb::BoundedQueue object.

       --autorun or --noautorun
           By default, programs process automatically, but Fsdb::Filter
           objects in Perl do not run until you invoke the run() method.  The
           "--(no)autorun" option controls that behavior within Perl.

       --help
           Show help.

       --man
           Show full manual.

SAMPLE USAGE
   Input:
           #fsdb experiment duration
           ufs_mab_sys 37.2
           ufs_mab_sys 37.3
           ufs_rcp_real 264.5
           ufs_rcp_real 277.9

   Command:
           cat DATA/stats.fsdb | dbmultistats -k experiment duration

   Output:
           #fsdb experiment mean stddev pct_rsd conf_range conf_low conf_high conf_pct sum sum_squared min max n
           ufs_mab_sys 37.25 0.070711 0.18983 0.6353 36.615 37.885 0.95 74.5 2775.1 37.2 37.3 2
           ufs_rcp_real 271.2 9.4752 3.4938 85.13 186.07 356.33 0.95 542.4 1.4719e+05 264.5 277.9 2
           # | /home/johnh/BIN/DB/dbmultistats experiment duration
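
       The sample rows above can be reproduced with the following Python
       sketch.  It is not the Fsdb implementation: multistats, FIELDS and
       T_95 are hypothetical names, and the formulas (sample standard
       deviation with an n-1 divisor, and a confidence range based on
       Student's t) are inferred only from the fact that they match the
       numbers shown.  T_95 holds standard two-sided 95% critical values
       for a few degrees of freedom, so the sketch handles groups of 2 to 6
       records only.

```python
import math
from collections import defaultdict

# Two-sided 95% critical values of Student's t, by degrees of freedom
# (standard table values; an assumption about how conf_range is derived).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

FIELDS = ["mean", "stddev", "pct_rsd", "conf_range", "conf_low", "conf_high",
          "conf_pct", "sum", "sum_squared", "min", "max", "n"]


def multistats(rows, fmt="%.5g"):
    """Group (key, value) rows by key and format per-group statistics."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(float(value))

    result = {}
    for key, values in groups.items():
        n = len(values)               # this sketch needs 2 <= n <= 6
        total = sum(values)
        mean = total / n
        stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        conf_range = T_95[n - 1] * stddev / math.sqrt(n)
        stats = [mean, stddev, 100 * stddev / mean, conf_range,
                 mean - conf_range, mean + conf_range, 0.95, total,
                 sum(v * v for v in values), min(values), max(values), n]
        # Apply the printf-style format (the -f default) to every statistic.
        result[key] = dict(zip(FIELDS, (fmt % s for s in stats)))
    return result


rows = [("ufs_mab_sys", "37.2"), ("ufs_mab_sys", "37.3"),
        ("ufs_rcp_real", "264.5"), ("ufs_rcp_real", "277.9")]

for key, stats in multistats(rows).items():
    print(key, *stats.values())
```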

SEE ALSO
       Fsdb.  dbmapreduce.  dbcolstats.

CLASS FUNCTIONS
   new
           $filter = new Fsdb::Filter::dbmultistats(@arguments);

       Create a new dbmultistats object, taking command-line arguments.

   set_defaults
           $filter->set_defaults();

       Internal: set up defaults.

   parse_options
           $filter->parse_options(@ARGV);

       Internal: parse command-line arguments.

   setup
           $filter->setup();

       Internal: set up, parse headers.

       Pass the right options to dbmapreduce and dbcolstats.

   run
           $filter->run();

       Internal: run over each row.

   finish
           $filter->finish();

       Internal: write trailer.

AUTHOR and COPYRIGHT
       Copyright (C) 1991-2015 by John Heidemann <johnh@isi.edu>

       This program is distributed under terms of the GNU General Public
       License, version 2.  See the file COPYING with the distribution for
       details.

perl v5.30.1                      2020-01-30     Fsdb::Filter::dbmultistats(3)