DBMULTISTATS(1)       User Contributed Perl Documentation      DBMULTISTATS(1)

NAME

       dbmultistats - run dbcolstats over each group of inputs identified by
       some key

SYNOPSIS

       dbmultistats [-dm] [-c ConfidencePercent] [-f FormatForm]
       [-q NumberOfQuartiles] -k KeyField ValueField

DESCRIPTION

       The input table is grouped by KeyField, and a separate set of column
       statistics is computed on ValueField for each group with a unique key.

       Assumptions and requirements are the same as dbmapreduce (this program
       is just a wrapper around that program):

       By default, data can be provided in arbitrary order and the program
       consumes O(number of unique tags) memory and O(size of data) disk
       space.

       With the -S option, data must arrive grouped by tags (not necessarily
       sorted), and the program consumes O(number of tags) memory and no disk
       space.  The program will check and abort if this precondition is not
       met.

       With two -S's, the program consumes O(1) memory, but doesn't verify
       that the data-arrival precondition is met.

       (Note that these semantics are exactly like

           dbmapreduce -k KeyField -- dbcolstats ValueField

       dbmultistats provides a simpler API that passes through
       statistics-specific arguments and is optimized when data is pre-sorted
       and there are no quartiles or medians.)

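       The single -S precondition described above (each key's rows arrive
       as one contiguous run, in any key order) can be illustrated with a
       short Python sketch.  This is not the dbmapreduce implementation;
       the function name check_grouped is hypothetical, and the sketch
       only shows why remembering finished keys costs O(number of tags)
       memory.

```python
def check_grouped(keys):
    """Yield keys unchanged; raise ValueError if a key's rows are not contiguous."""
    finished = set()   # keys whose group has already ended: O(number of keys) memory
    current = None     # key of the group currently being read
    started = False    # becomes True once the first row has been seen
    for key in keys:
        if not started or key != current:
            # A new group begins; it must not be a key we already finished.
            if key in finished:
                raise ValueError(f"key {key!r} reappears after its group ended")
            if started:
                finished.add(current)
            current, started = key, True
        yield key


# Grouped but not sorted: accepted.
print(list(check_grouped(["b", "b", "a", "c", "c"])))

# Interleaved: rejected when "a" shows up a second time.
try:
    list(check_grouped(["a", "b", "a"]))
except ValueError as err:
    print("abort:", err)
```

       Dropping the finished set removes the check and leaves only the
       current key in memory, which corresponds to the O(1) behavior
       described for two -S's.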

OPTIONS

       Options are the same as dbcolstats.

       -k or --key KeyField
           specify which column is the key for grouping (default: the first
           column)

       --output-on-no-input
           Enables null output (all fields are "-", n is 0) if we get input
           with a schema but no records.  Without this option, just output the
           schema but no rows.  Default: no output if no input.

       -a or --include-non-numeric
           Compute stats over all records (treat non-numeric records as zero
           rather than just ignoring them).

       -c FRACTION or --confidence FRACTION
           Specify FRACTION for the confidence interval.  Defaults to 0.95 for
           a 95% confidence factor.

       -f FORMAT or --format FORMAT
           Specify a printf(3)-style format for output statistics.  Defaults
           to "%.5g".

       -m or --median
           Compute median value.  (Will sort data if necessary.)  (Median is
           the quantile for N=2.)

       -q N or --quantile N
           Compute quantiles (quartiles when N is 4, or arbitrary quantiles
           for other values of N): the scores that are each 1/Nth of the way
           across the population.

       -S or --pre-sorted
           Assume data is already sorted.  With one -S, we check and confirm
           this precondition.  When repeated, we skip the check.

       -T TmpDir
           where to put temporary data.  Only used if median or quantiles are
           requested.  Also uses environment variable TMPDIR, if -T is not
           specified.  Default is /tmp.

       --parallelism=N or -j N
           Allow up to N reducers to run in parallel.  Default is the number
           of CPUs in the machine.

       -F or --fs or --fieldseparator S
           Specify the field (column) separator as "S".  See dbfilealter for
           valid field separators.

       This module also supports the standard fsdb options:

       -d  Enable debugging output.

       -i or --input InputSource
           Read from InputSource, typically a file name, or "-" for standard
           input, or (if in Perl) an IO::Handle, Fsdb::IO or
           Fsdb::BoundedQueue object.

       -o or --output OutputDestination
           Write to OutputDestination, typically a file name, or "-" for
           standard output, or (if in Perl) an IO::Handle, Fsdb::IO or
           Fsdb::BoundedQueue object.

       --autorun or --noautorun
           By default, programs process automatically, but Fsdb::Filter
           objects in Perl do not run until you invoke the run() method.  The
           "--(no)autorun" option controls that behavior within Perl.

       --header H
           Use H as the full Fsdb header, rather than reading a header from
           the input.

       --help
           Show help.

       --man
           Show full manual.

SAMPLE USAGE

   Input:
           #fsdb experiment duration
           ufs_mab_sys 37.2
           ufs_mab_sys 37.3
           ufs_rcp_real 264.5
           ufs_rcp_real 277.9

   Command:
           cat DATA/stats.fsdb | dbmultistats -k experiment duration

   Output:
           #fsdb      experiment      mean    stddev  pct_rsd conf_range      conf_low       conf_high        conf_pct        sum     sum_squared     min     max     n
           ufs_mab_sys     37.25 0.070711 0.18983 0.6353 36.615 37.885 0.95 74.5 2775.1 37.2 37.3 2
           ufs_rcp_real    271.2 9.4752 3.4938 85.13 186.07 356.33 0.95 542.4 1.4719e+05 264.5 277.9 2
           #  | /home/johnh/BIN/DB/dbmultistats experiment duration

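       To make the output columns concrete, the following Python sketch
       recomputes the sample rows above.  It is an illustration, not the
       dbcolstats code.  The column definitions are inferred from the
       sample output and are assumptions: stddev is the sample standard
       deviation (n-1 divisor), pct_rsd is 100*stddev/mean, and
       conf_range is t*stddev/sqrt(n).  The Student's t value 12.706
       (two-sided 95%, one degree of freedom) is hard-coded, so the
       sketch is valid only for groups of two values at the default
       confidence.  The names ROWS and group_stats are hypothetical.

```python
import math
from collections import defaultdict

# Two-sided 95% Student's t critical value for 1 degree of freedom (n = 2).
# Hard-coded assumption: other group sizes or confidence levels need another value.
T_95_DF1 = 12.706

# The sample input from this section.
ROWS = [
    ("ufs_mab_sys", 37.2),
    ("ufs_mab_sys", 37.3),
    ("ufs_rcp_real", 264.5),
    ("ufs_rcp_real", 277.9),
]


def group_stats(rows, t_crit=T_95_DF1, conf=0.95, fmt="%.5g"):
    """Group (key, value) rows by key and return formatted statistics per key."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    result = {}
    for key, values in groups.items():
        n = len(values)
        total = sum(values)
        mean = total / n
        # Sample standard deviation (divides by n - 1).
        stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        conf_range = t_crit * stddev / math.sqrt(n)
        stats = {
            "mean": mean,
            "stddev": stddev,
            "pct_rsd": 100.0 * stddev / mean,
            "conf_range": conf_range,
            "conf_low": mean - conf_range,
            "conf_high": mean + conf_range,
            "conf_pct": conf,
            "sum": total,
            "sum_squared": sum(v * v for v in values),
            "min": min(values),
            "max": max(values),
            "n": n,
        }
        # Apply the printf-style format, mirroring the default -f "%.5g".
        result[key] = {name: fmt % value for name, value in stats.items()}
    return result


for key, stats in group_stats(ROWS).items():
    print(key, " ".join(stats.values()))
```

       The default "%.5g" format is why sum_squared for ufs_rcp_real
       prints as 1.4719e+05 rather than as a plain decimal.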

SEE ALSO

       Fsdb.  dbmapreduce.  dbcolstats.

       Copyright (C) 1991-2015 by John Heidemann <johnh@isi.edu>

       This program is distributed under terms of the GNU general public
       license, version 2.  See the file COPYING with the distribution for
       details.



perl v5.34.1                      2022-04-04                   DBMULTISTATS(1)