PMLOGEXTRACT(1) General Commands Manual PMLOGEXTRACT(1)

NAME
pmlogextract - reduce, extract, concatenate and merge Performance Co-Pilot
archives

SYNOPSIS
pmlogextract [-dfmwz] [-c configfile] [-S starttime] [-s samples]
[-T endtime] [-v volsamples] [-Z timezone] input [...] output

DESCRIPTION
pmlogextract reads one or more Performance Co-Pilot (PCP) archive logs
identified by input and creates a temporally merged and/or reduced PCP
archive log in output. input is a comma-separated list of names, each of
which may be the base name of an archive or the name of a directory
containing one or more archives. The nature of merging is controlled by
the number of input archive logs, while the nature of data reduction is
controlled by the command line arguments. The input(s) must be sets of PCP
archive logs created by pmlogger(1) with performance data collected from
the same host, but usually over different time periods and possibly
(although not usually) with different performance metrics being logged.

If only one input is specified, the default behavior simply copies the
input set of PCP archive logs into the output PCP archive log. When two or
more sets of PCP archive logs are specified as input, the sets of logs are
merged (or concatenated) and written to output.

In the output archive log a <mark> record may be inserted at a time just
past the end of each of the input archive logs to indicate a possible
temporal discontinuity between the end of one input archive log and the
start of the next input archive log. See the MARK RECORDS section below
for more information. There is no <mark> record after the end of the last
(in temporal order) of the input archive logs.

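The idea of a temporal merge can be illustrated with a minimal sketch. It
is not how pmlogextract is implemented; the (timestamp, payload) record
shape and the merge_archives name are assumptions made only for this
illustration.

```python
import heapq

def merge_archives(*archives):
    """Merge time-sorted record lists into one time-sorted list.

    Each record is a hypothetical (timestamp, payload) tuple; real PCP
    archive records carry far more structure than this.
    """
    return list(heapq.merge(*archives, key=lambda record: record[0]))

# Overlapping inputs interleave in time order ("merge") ...
first = [(10.0, "a1"), (20.0, "a2")]
second = [(15.0, "b1"), (30.0, "b2")]
merged = merge_archives(first, second)

# ... while inputs covering disjoint periods simply follow one another
# ("concatenate").
earlier = [(1.0, "x1"), (2.0, "x2")]
later = [(5.0, "y1"), (6.0, "y2")]
joined = merge_archives(later, earlier)

print(merged)
print(joined)
```

The order in which the inputs are named does not matter in this sketch,
because the result is ordered by timestamp alone.
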
OPTIONS
The command line options for pmlogextract are as follows:

-c configfile
       Extract only the metrics specified in configfile from the input
       PCP archive log(s). The configfile syntax accepted by pmlogextract
       is explained in more detail in the CONFIGURATION FILE SYNTAX
       section below.

-d     Desperate mode. Normally if a fatal error occurs, all trace of the
       partially written PCP archive output is removed. With the -d
       option, the output archive log is not removed.

-f     For most common uses, all of the input archive logs will have been
       collected in the same timezone. If this is not the case,
       pmlogextract must choose one of the timezones from the input
       archive logs to be used as the timezone for the output archive
       log. The default is to use the timezone from the last input
       archive log. The -f option forces the timezone from the first
       input archive log to be used.

-m     As described in the MARK RECORDS section below, sometimes it is
       possible to safely omit <mark> records from the output archive. If
       the -m option is specified, the epilogue and prologue test is
       skipped and a <mark> record will always be inserted at the end of
       each input archive (except the last). This is the original
       behaviour for pmlogextract.

-S starttime
       Define the start of a time window to restrict the samples
       retrieved or specify a "natural" alignment of the output sample
       times; refer to PCPIntro(1). See also the -w option.

-s samples
       The argument samples defines the number of samples to be written
       to output. If samples is 0 or -s is not specified, pmlogextract
       will sample until the end of the PCP archive log, or the end of
       the time window as specified by -T, whichever comes first. If the
       sample limit is reached before endtime, -s takes precedence over
       -T.

-T endtime
       Define the termination of a time window to restrict the samples
       retrieved or specify a "natural" alignment of the output sample
       times; refer to PCPIntro(1). See also the -w option.

-v volsamples
       The output archive log is potentially a multi-volume data set, and
       the -v option causes pmlogextract to start a new volume after
       volsamples log records have been written to the archive log.

       Independent of any -v option, each volume of an archive is limited
       to no more than 2^31 bytes, so pmlogextract will automatically
       create a new volume for the archive before this limit is reached.

-w     When -S and -T specify a time window within a single day, the -w
       flag causes the data within that window to be extracted for every
       day in the archive log. For example, the options
       -w -S @11:00 -T @15:00 specify that pmlogextract should include
       archive log records only for the periods from 11am to 3pm on each
       day. When -w is used, the output archive log will contain <mark>
       records to indicate the temporal discontinuity between the end of
       one time window and the start of the next.

-Z timezone
       Use timezone when displaying the date and time. timezone is in the
       format of the environment variable TZ as described in environ(7).

-z     Use the local timezone of the host from the input archive logs.
       The default is to initially use the timezone of the local host.

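The combined effect of a per-day window (-w with -S and -T) and a sample
limit (-s) can be sketched as follows. This is an illustrative model only:
the daily_window name, the (datetime, value) record shape and the
inclusive window bounds are assumptions, and the <mark> records that -w
adds between windows are not modelled.

```python
from datetime import datetime, time

def daily_window(records, start, end, max_samples=0):
    """Keep records whose time of day lies in [start, end] on any day.

    records     -- iterable of hypothetical (datetime, value) tuples
    start, end  -- datetime.time bounds, in the spirit of -S and -T
    max_samples -- stop after this many records; 0 means no limit (like -s)
    """
    selected = []
    for stamp, value in records:
        if start <= stamp.time() <= end:
            selected.append((stamp, value))
            if max_samples and len(selected) == max_samples:
                break
    return selected

# Two days of samples at 10:00, 12:00, 14:00 and 16:00.
records = [(datetime(2024, 1, day, hour), hour)
           for day in (1, 2) for hour in (10, 12, 14, 16)]

kept = daily_window(records, time(11, 0), time(15, 0))
capped = daily_window(records, time(11, 0), time(15, 0), max_samples=3)

print([stamp.strftime("%d %H:%M") for stamp, _ in kept])
print(len(capped))
```

With the 11:00 to 15:00 window, only the 12:00 and 14:00 samples of each
day survive; adding a limit of three samples stops the selection early, in
the same way that -s takes effect before -T when it is reached first.
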
CONFIGURATION FILE SYNTAX
The configfile lists the metrics of interest; only those metrics (or
instances) mentioned explicitly or implicitly in the configuration file
will be included in the output archive. Each specification must begin on a
new line, and may span multiple lines in the configuration file. Instances
may also be specified, but they are optional. The format for each
specification is

       metric [[instance[,instance...]]]

where metric may be a leaf or a non-leaf name in the Performance Metrics
Name Space (PMNS, see pmns(5)). If a metric refers to a non-leaf node in
the PMNS, pmlogextract will recursively descend the PMNS and include all
metrics corresponding to descendant leaf nodes.

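The recursive descent from a non-leaf name to its leaf metrics can be
pictured with a toy name space. The nested dictionary below is only a
stand-in for the PMNS, and the expand function is a hypothetical helper,
not part of PCP.

```python
def expand(pmns, name):
    """Return every leaf metric name at or below `name`.

    pmns is a toy name space: nested dicts for non-leaf nodes and None for
    leaves. A KeyError is raised if `name` is not present.
    """
    node = pmns
    for part in name.split("."):
        node = node[part]
    if not isinstance(node, dict):
        return [name]                      # already a leaf
    leaves = []
    for child in sorted(node):
        leaves.extend(expand(pmns, f"{name}.{child}"))
    return leaves

# A tiny, made-up slice of a name space.
toy_pmns = {
    "kernel": {
        "all": {
            "cpu": {"user": None, "sys": None, "idle": None},
            "load": None,
        }
    }
}

print(expand(toy_pmns, "kernel.all.cpu"))
print(expand(toy_pmns, "kernel.all.load"))
```

Naming a non-leaf node such as kernel.all.cpu therefore selects every
metric beneath it, while naming a leaf selects just that metric.
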
Instances are optional, and may be specified as a list of one or more
space (or comma) separated names, numbers or strings (enclosed in single
or double quotes). Elements in the list that are numbers are assumed to be
internal instance identifiers; see pmGetInDom(3) for more information. If
no instances are given, then all instances of the associated metric(s)
will be extracted.

Any additional white space is ignored and comments may be added with a "#"
prefix.

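A rough sketch of how one specification could be split into a metric name
and its instance list is shown here. It is a simplification, not the
parser used by pmlogextract: it handles only single-line specifications
and treats every "#" as the start of a comment, even inside quotes. The
parse_spec name is hypothetical.

```python
import re

# A double-quoted string, a single-quoted string, or a bare word/number.
TOKEN = re.compile(r'"([^"]*)"|\'([^\']*)\'|([^\s,\[\]]+)')

def parse_spec(line):
    """Parse 'metric [instance, ...]'; return None for blank/comment lines.

    Returns (metric, instances). Bare numbers become ints (internal
    instance identifiers); everything else stays a string. An empty list
    means "all instances".
    """
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    metric, _, rest = line.partition("[")
    instances = []
    for double_quoted, single_quoted, bare in TOKEN.findall(rest):
        if bare:
            instances.append(int(bare) if bare.isdigit() else bare)
        else:
            instances.append(double_quoted or single_quoted)
    return metric.strip(), instances

print(parse_spec('kernel.percpu.cpu.sys ["cpu0","cpu1"]'))
print(parse_spec("disk.dev [0 'dks0d1', sda]  # mixed separators"))
print(parse_spec("kernel.all.cpu"))
```

Note that a quoted "0" would remain a string, whereas an unquoted 0 is
read as an internal instance identifier, mirroring the distinction drawn
above.
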
CONFIGURATION FILE EXAMPLE
This is an example of a valid configfile:

       #
       # config file for pmlogextract
       #

       kernel.all.cpu
       kernel.percpu.cpu.sys ["cpu0","cpu1"]
       disk.dev ["dks0d1"]

MARK RECORDS
When more than one input archive log contributes performance data to the
output archive log, <mark> records may be inserted to indicate a possible
discontinuity in the performance data.

A <mark> record contains a timestamp and no performance data and is used
to indicate that there is a time period in the PCP archive log where we do
not know the values of any performance metrics, because there was no
pmlogger(1) collecting performance data during this period. Since these
periods are often associated with the restart of a service or pmcd(1) or a
system, there may be considerable doubt as to the continuity of
performance data across this time period.

Most current archives are created with a prologue record at the beginning
and an epilogue record at the end. These records identify the state of
pmcd(1) at the time, and may be used by pmlogextract to determine that
there is no discontinuity between the end of one archive and the next
output record, and as a consequence the <mark> record can safely be
omitted from the output archive.

The rationale behind <mark> records may be demonstrated with an example.
Consider one input archive log that starts at 00:10 and ends at 09:15 on
the same day, and another input archive log that starts at 09:20 on the
same day and ends at 01:10 the following morning. This would be a very
common case for archives managed and rotated by pmlogger_check(1) and
pmlogger_daily(1).

The output archive log created by pmlogextract would contain:

       00:10.000    first record from first input archive log
       ...
       09:15.000    last record from first input archive log
       09:15.001    <mark> record
       09:20.000    first record from second input archive log
       ...
       01:10.000    last record from second input archive log

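The layout above can be reproduced with a small sketch that places a
<mark> one millisecond after the last record of every input except the
last. It corresponds to the unconditional behaviour described for -m; the
prologue and epilogue test is not modelled. The record shape, the hm
helper and the concatenate_with_marks name are assumptions for
illustration only.

```python
MARK = "<mark>"

def hm(hours, minutes, day=0):
    """Milliseconds since midnight of day 0 for a given hour and minute."""
    return ((day * 24 + hours) * 60 + minutes) * 60 * 1000

def concatenate_with_marks(archives, gap_ms=1):
    """Concatenate non-overlapping archives, adding a mark after each but the last.

    archives -- list of time-ordered lists of (milliseconds, payload)
    """
    ordered = sorted((a for a in archives if a), key=lambda a: a[0][0])
    output = []
    for position, archive in enumerate(ordered):
        output.extend(archive)
        if position < len(ordered) - 1:
            output.append((archive[-1][0] + gap_ms, MARK))
    return output

first = [(hm(0, 10), "first record"), (hm(9, 15), "last record")]
second = [(hm(9, 20), "first record"), (hm(1, 10, day=1), "last record")]

output = concatenate_with_marks([second, first])
for stamp, payload in output:
    print(stamp, payload)
```

The mark lands at 09:15.001 regardless of the order in which the inputs
are given, and nothing is appended after the temporally last archive.
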
The time period where the performance data is missing starts just after
09:15 and ends just before 09:20. When the output archive log is processed
with any of the PCP reporting tools, the <mark> record is used to indicate
a period of missing data. For example, using the output archive above,
suppose one were reporting the average I/O rate at 30 minute intervals
aligned on the hour and half-hour. The I/O count metric is a counter, so
the average I/O rate requires two valid values from consecutive sample
times. There would be values for all the intervals ending at 09:00, then
no values at 09:30 because of the <mark> record, then no values at 10:00
because the "prior" value at 09:30 is not available, then the rate would
be reported again at 10:30 and continue every 30 minutes until the last
reported value at 01:00.

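The reporting pattern in this example can be mimicked with a toy rule. It
is a simplified model written to match the narrative above, not the
interpolation logic of the PCP reporting tools: a value at a report time
is treated as unusable when a <mark> falls in the preceding interval, and
a rate needs usable values at both ends of its interval. All names are
hypothetical.

```python
def rate_reported(grid, marks, step):
    """Map each report time to True if a rate would be shown there.

    grid  -- report times in seconds
    marks -- timestamps of <mark> records in seconds
    step  -- reporting interval in seconds
    """
    def usable(t):
        # Toy rule: a mark anywhere in (t - step, t] spoils the value at t.
        return not any(t - step < mark <= t for mark in marks)

    return {t: usable(t) and usable(t - step) for t in grid}

def hm(hours, minutes):
    return hours * 3600 + minutes * 60

step = 30 * 60
grid = [hm(8, 30), hm(9, 0), hm(9, 30), hm(10, 0), hm(10, 30), hm(11, 0)]
marks = [hm(9, 15) + 0.001]

result = rate_reported(grid, marks, step)
missing = [f"{t // 3600:02d}:{t % 3600 // 60:02d}"
           for t in grid if not result[t]]
print(missing)
```

Under this rule the only report times without a rate are 09:30 and 10:00,
and reporting resumes at 10:30, as described above.
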
The presence of <mark> records in a PCP archive log can be established
using pmdumplog(1), where a timestamp and the annotation <mark> are used
to indicate a <mark> record.

METADATA CHECKS
When more than one input archive set is specified, pmlogextract performs a
number of checks to ensure the metadata is consistent for metrics
appearing in more than one of the input archive sets. These checks
include:

*  metric data type is the same
*  metric semantics are the same
*  metric units are the same
*  metric is either always singular or always has the same instance
   domain
*  metrics with the same name have the same PMID
*  metrics with the same PMID have the same name

If any of these checks fail, pmlogextract reports the details and aborts
without creating the output archive.

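The checks listed above amount to comparing metric descriptors across the
inputs. A minimal sketch follows; the Desc tuple, its field names, the
check_metadata function and the PMID strings are invented for this
illustration and do not reflect PCP data structures.

```python
from collections import namedtuple

# indom is None for a singular metric.
Desc = namedtuple("Desc", "name pmid type sem units indom")

def check_metadata(descs):
    """Return a list of conflicts found among metric descriptors."""
    problems = []
    by_name, by_pmid = {}, {}
    for desc in descs:
        first = by_name.setdefault(desc.name, desc)
        for field in ("pmid", "type", "sem", "units", "indom"):
            old, new = getattr(first, field), getattr(desc, field)
            if old != new:
                problems.append(f"{desc.name}: {field} differs ({old} vs {new})")
        owner = by_pmid.setdefault(desc.pmid, desc.name)
        if owner != desc.name:
            problems.append(f"PMID {desc.pmid}: names differ ({owner} vs {desc.name})")
    return problems

# Hypothetical descriptors for the same metric taken from two inputs.
from_first = Desc("disk.dev.read", "60.0.4", "u32", "counter", "count", "60.1")
from_second = Desc("disk.dev.read", "60.0.4", "u64", "counter", "count", "60.1")

problems = check_metadata([from_first, from_second])
print(problems)
```

An empty result means the inputs agree; any entry corresponds to a
situation in which pmlogextract would report the details and abort.
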
To address these semantic issues, use pmlogrewrite(1) to translate the
input archives into equivalent archives with consistent metadata before
using pmlogextract.

FILES
For each of the input and output archive logs, several physical files are
used.

archive.meta
       metadata (metric descriptions, instance domains, etc.) for the
       archive log
archive.0
       initial volume of metric values (subsequent volumes have suffixes
       1, 2, ...); for input these files may have been previously
       compressed with bzip2(1) or gzip(1) and thus may have an
       additional .bz2 or .gz suffix
archive.index
       temporal index to support rapid random access to the other files
       in the archive log

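The naming scheme above can be recognised mechanically. The sketch below
follows only what is stated here (a .meta file, an .index file and
numbered volumes that may carry a .bz2 or .gz suffix); the classify
function and the file names are hypothetical.

```python
import re

PATTERN = re.compile(
    r"^(?P<base>.+)\.(?:(?P<part>meta|index)|(?P<vol>\d+)(?P<zip>\.bz2|\.gz)?)$"
)

def classify(filename):
    """Return (base, part, compressed) or None if the name does not fit.

    part is "meta", "index", or the volume number as an int.
    """
    match = PATTERN.match(filename)
    if match is None:
        return None
    if match.group("part"):
        return match.group("base"), match.group("part"), False
    return match.group("base"), int(match.group("vol")), match.group("zip") is not None

for name in ("20240101.meta", "20240101.index", "20240101.0.gz", "20240101.1"):
    print(name, classify(name))
```

Grouping the results by base name gives the set of files that together
make up one archive.
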
PCP ENVIRONMENT
Environment variables with the prefix PCP_ are used to parameterize the
file and directory names used by PCP. On each installation, the file
/etc/pcp.conf contains the local values for these variables. The $PCP_CONF
variable may be used to specify an alternative configuration file, as
described in pcp.conf(5).

SEE ALSO
PCPIntro(1), pmdumplog(1), pmlc(1), pmlogger(1), pmlogreduce(1),
pmlogrewrite(1), pcp.conf(5) and pcp.env(5).

DIAGNOSTICS
All error conditions detected by pmlogextract are reported on stderr with
a textual (if sometimes terse) explanation.

Should one of the input archive logs be corrupted (this can happen if the
pmlogger instance writing the log suddenly dies), pmlogextract will detect
and report the position of the corruption in the file, and any subsequent
information from that archive log will not be processed.

If any error is detected, pmlogextract will exit with a non-zero status.

CAVEATS
The preamble metrics (pmcd.pmlogger.archive, pmcd.pmlogger.host, and
pmcd.pmlogger.port), which are automatically recorded by pmlogger at the
start of the archive, may not be present in the archive output by
pmlogextract. These metrics are only relevant while the archive is being
created, and have no significance once recording has finished.

Performance Co-Pilot PCP PMLOGEXTRACT(1)