PMLOGEXTRACT(1)          General Commands Manual          PMLOGEXTRACT(1)

NAME
pmlogextract - reduce, extract, concatenate and merge Performance
Co-Pilot archives

SYNOPSIS
pmlogextract [-dfmwxz] [-c configfile] [-S starttime] [-s samples]
[-T endtime] [-v volsamples] [-Z timezone] input [...] output

DESCRIPTION
pmlogextract reads one or more Performance Co-Pilot (PCP) archive logs
identified by input and creates a temporally merged and/or reduced PCP
archive log in output. input is a comma-separated list of names, each
of which may be the base name of an archive or the name of a directory
containing one or more archives. The nature of merging is controlled
by the number of input archive logs, while the nature of data reduction
is controlled by the command line arguments. The input(s) must be sets
of PCP archive logs created by pmlogger(1) with performance data
collected from the same host, but usually over different time periods
and possibly (although not usually) with different performance metrics
being logged.

If only one input is specified, the default behavior simply copies the
input set of PCP archive logs into the output PCP archive log. When
two or more sets of PCP archive logs are specified as input, the sets
of logs are merged (or concatenated) and written to output.

In the output archive log a <mark> record may be inserted at a time
just past the end of each of the input archive logs to indicate a
possible temporal discontinuity between the end of one input archive
log and the start of the next input archive log. See the MARK RECORDS
section below for more information. There is no <mark> record after
the end of the last (in temporal order) of the input archive logs.
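
The placement of <mark> records described above can be modelled with a
short Python sketch. It is an illustration only, not the pmlogextract
implementation. The assumptions are: archives are represented as
hypothetical lists of (timestamp, payload) tuples, the archives do not
overlap in time, a mark is always inserted (as with the -m option), and
the 1 ms offset stands in for "just past the end".

```python
from datetime import datetime, timedelta

MARK = "<mark>"

def merge_archives(archives, gap=timedelta(milliseconds=1)):
    """Concatenate archives in temporal order, adding a <mark> record
    just past the end of every archive except the last one.

    Each archive is a time-sorted list of (timestamp, payload) tuples.
    This is a toy model, not the PCP archive format.
    """
    # Order the non-empty archives by the timestamp of their first record.
    ordered = sorted((a for a in archives if a), key=lambda a: a[0][0])
    merged = []
    for i, records in enumerate(ordered):
        merged.extend(records)
        if i < len(ordered) - 1:
            # No mark follows the last archive in temporal order.
            merged.append((records[-1][0] + gap, MARK))
    return merged

day = datetime(2024, 1, 1)  # arbitrary date, assumed for the example
first = [(day.replace(hour=0, minute=10), "a"),
         (day.replace(hour=9, minute=15), "b")]
second = [(day.replace(hour=9, minute=20), "c"),
          (day.replace(hour=23, minute=50), "d")]

# The order in which the archives are given does not matter here.
merged = merge_archives([second, first])
for ts, payload in merged:
    print(ts.strftime("%H:%M:%S.%f")[:-3], payload)
```

The single mark lands between the last record of the earlier archive
and the first record of the later one, and nothing follows the final
record.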

COMMAND LINE OPTIONS
The command line options for pmlogextract are as follows:

-c configfile
       Extract only the metrics specified in configfile from the input
       PCP archive log(s). The configfile syntax accepted by
       pmlogextract is explained in more detail in the CONFIGURATION
       FILE SYNTAX section.

-d     Desperate mode. Normally if a fatal error occurs, all trace of
       the partially written PCP archive output is removed. With the
       -d option, the output archive log is not removed.

-f     For most common uses, all of the input archive logs will have
       been collected in the same timezone. But if this is not the
       case, then pmlogextract must choose one of the timezones from
       the input archive logs to be used as the timezone for the output
       archive log. The default is to use the timezone from the last
       input archive log. The -f option forces the timezone from the
       first input archive log to be used.

-m     As described in the MARK RECORDS section below, sometimes it is
       possible to safely omit <mark> records from the output archive.
       If the -m option is specified, then the epilogue and prologue
       test is skipped and a <mark> record will always be inserted at
       the end of each input archive (except the last). This is the
       original behavior for pmlogextract.

-S starttime
       Define the start of a time window to restrict the samples
       retrieved or specify a "natural" alignment of the output sample
       times; refer to PCPIntro(1). See also the -w option.

-s samples
       The argument samples defines the number of samples to be written
       to output. If samples is 0 or -s is not specified, pmlogextract
       will sample until the end of the PCP archive log, or the end of
       the time window as specified by -T, whichever comes first. When
       both -s and -T are given, extraction stops at whichever limit is
       reached first.

-T endtime
       Define the termination of a time window to restrict the samples
       retrieved or specify a "natural" alignment of the output sample
       times; refer to PCPIntro(1). See also the -w option.

-v volsamples
       The output archive log is potentially a multi-volume data set,
       and the -v option causes pmlogextract to start a new volume
       after volsamples log records have been written to the archive
       log.

       Independent of any -v option, each volume of an archive is
       limited to no more than 2^31 bytes, so pmlogextract will
       automatically create a new volume for the archive before this
       limit is reached.

-w     Where -S and -T specify a time window within the same day, the
       -w flag will cause the data within the time window to be
       extracted, for every day in the archive log. For example, the
       options -w -S @11:00 -T @15:00 specify that pmlogextract should
       include archive log records only for the periods from 11am to
       3pm on each day. When -w is used, the output archive log will
       contain <mark> records to indicate the temporal discontinuity
       between the end of one time window and the start of the next.

-x     It is expected that the metadata (name, PMID, type, semantics
       and units) for each metric will be consistent across all of the
       input PCP archive log(s) in which that metric appears. In rare
       cases, e.g. in development, in QA and when a PMDA is upgraded,
       this may not be the case and pmlogextract will report the issue
       and abort without creating the output archive log. This is done
       so the problem can be fixed with pmlogrewrite(1) before retrying
       the merge. In unattended or QA environments it may be preferable
       to force the merge and omit the metrics with the mismatched
       metadata. The -x option does this.

-Z timezone
       Use timezone when displaying the date and time. Timezone is in
       the format of the environment variable TZ as described in
       environ(7).

-z     Use the local timezone of the host from the input archive logs.
       The default is to initially use the timezone of the local host.
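
The effect of the -w option with -S @11:00 -T @15:00 can be pictured
with the following Python sketch. It only models the selection rule
described above and is not how pmlogextract is implemented. The
function names and sample timestamps are hypothetical, and treating
both ends of the window as inclusive is an assumption.

```python
from datetime import datetime, time

def in_daily_window(ts, start=time(11, 0), end=time(15, 0)):
    """True when the time of day of ts lies inside [start, end]."""
    return start <= ts.time() <= end

def extract_daily_windows(timestamps, start=time(11, 0), end=time(15, 0)):
    """Keep only the timestamps inside the window on each day.

    Also return the dates on which a new window begins after an earlier
    one; these are the points where the text says a <mark> record
    indicates the temporal discontinuity.
    """
    kept = [ts for ts in sorted(timestamps)
            if in_daily_window(ts, start, end)]
    breaks = [b.date() for a, b in zip(kept, kept[1:])
              if a.date() != b.date()]
    return kept, breaks

# Two days of hypothetical samples at 09:00, 11:00, 13:00, 15:00, 17:00.
samples = [datetime(2024, 1, d, h)
           for d in (1, 2) for h in (9, 11, 13, 15, 17)]
kept, breaks = extract_daily_windows(samples)
for ts in kept:
    print(ts)
print("discontinuity before the window on:", breaks)
```

Only the samples between 11am and 3pm survive on each day, and one
discontinuity is recorded between the two daily windows.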

CONFIGURATION FILE SYNTAX
The configfile contains metrics of interest - only those metrics (or
instances) mentioned explicitly or implicitly in the configuration file
will be included in the output archive. Each specification must begin
on a new line, and may span multiple lines in the configuration file.
Instances may also be specified, but they are optional. The format for
each specification is

       metric [[instance[,instance...]]]

where metric may be a leaf or a non-leaf name in the Performance
Metrics Name Space (PMNS, see PMNS(5)). If a metric refers to a
non-leaf node in the PMNS, pmlogextract will recursively descend the
PMNS and include all metrics corresponding to descendant leaf nodes.

Instances are optional, and may be specified within square brackets as
a list of one or more space (or comma) separated names, numbers or
strings (enclosed in single or double quotes). Elements in the list
that are numbers are assumed to be internal instance identifiers - see
pmGetInDom(3) for more information. If no instances are given, then
all instances of the associated metric(s) will be extracted.

Any additional white space is ignored and comments may be added with a
"#" prefix.

CONFIGURATION EXAMPLE
This is an example of a valid configfile:

       #
       # config file for pmlogextract
       #

       kernel.all.cpu
       kernel.percpu.cpu.sys ["cpu0","cpu1"]
       disk.dev ["dks0d1"]
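
To make the syntax concrete, here is a minimal Python sketch that reads
specifications of this form. It is not the parser used by pmlogextract,
and it simplifies in several places (all assumptions): each
specification is expected on a single line, a "#" always starts a
comment even inside quotes, and non-leaf metric names are not expanded
because no PMNS is consulted. The name parse_config is hypothetical.

```python
import re

# metric name, optionally followed by a bracketed instance list
SPEC = re.compile(r"^([A-Za-z][\w.]*)\s*(?:\[(.*)\])?$")
# double-quoted string, single-quoted string, or a bare name/number
TOKEN = re.compile(r'''"[^"]*"|'[^']*'|[^\s,]+''')

def parse_config(text):
    """Return {metric: [instances]}; an empty list means all instances."""
    specs = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        m = SPEC.match(line)
        if m is None:
            raise ValueError(f"cannot parse specification: {raw!r}")
        metric, inst_list = m.group(1), m.group(2)
        instances = []
        for tok in TOKEN.findall(inst_list or ""):
            if tok[0] in "\"'":
                instances.append(tok[1:-1])   # quoted instance name
            elif tok.isdigit():
                instances.append(int(tok))    # internal instance identifier
            else:
                instances.append(tok)         # unquoted instance name
        specs[metric] = instances
    return specs

example = """
#
# config file for pmlogextract
#

kernel.all.cpu
kernel.percpu.cpu.sys ["cpu0","cpu1"]
disk.dev ["dks0d1"]
"""

specs = parse_config(example)
for metric, instances in specs.items():
    print(metric, instances or "(all instances)")
```

Applied to the example file, the sketch yields one entry per metric,
with an empty instance list standing for "all instances".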

MARK RECORDS
When more than one input archive log contributes performance data to
the output archive log, then <mark> records may be inserted to indicate
a possible discontinuity in the performance data.

A <mark> record contains a timestamp and no performance data and is
used to indicate that there is a time period in the PCP archive log
where we do not know the values of any performance metrics, because
there was no pmlogger(1) collecting performance data during this
period. Since these periods are often associated with the restart of a
service or pmcd(1) or a system, there may be considerable doubt as to
the continuity of performance data across this time period.

Most current archives are created with a prologue record at the
beginning and an epilogue record at the end. These records identify the
state of pmcd(1) at the time, and may be used by pmlogextract to
determine that there is no discontinuity between the end of one archive
and the next output record, and as a consequence the <mark> record can
safely be omitted from the output archive.

The rationale behind <mark> records may be demonstrated with an
example. Consider one input archive log that starts at 00:10 and ends
at 09:15 on the same day, and another input archive log that starts at
09:20 on the same day and ends at 01:10 the following morning. This
would be a very common case for archives managed and rotated by
pmlogger_check(1) and pmlogger_daily(1).

The output archive log created by pmlogextract would contain:

       00:10.000    first record from first input archive log
       ...
       09:15.000    last record from first input archive log
       09:15.001    <mark> record
       09:20.000    first record from second input archive log
       ...
       01:10.000    last record from second input archive log

The time period where the performance data is missing starts just after
09:15 and ends just before 09:20. When the output archive log is
processed with any of the PCP reporting tools, the <mark> record is
used to indicate a period of missing data. For example, using the
output archive above, imagine one was reporting the average I/O rate at
30 minute intervals aligned on the hour and half-hour. The I/O count
metric is a counter, so the average I/O rate requires two valid values
from consecutive sample times. There would be values for all the
intervals up to and including the one ending at 09:00, then no values
at 09:30 because of the <mark> record, then no values at 10:00 because
the "prior" value at 09:30 is not available, then the rate would be
reported again at 10:30 and continue every 30 minutes until the last
reported value at 01:00.

The presence of <mark> records in a PCP archive log can be established
using pmdumplog(1) where a timestamp and the annotation <mark> are used
to indicate a <mark> record.
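
The reporting example above can be checked with a small Python model.
It encodes only the reasoning in the text and is not how the PCP
reporting tools are implemented. The rule that a sample is unusable
when a <mark> record falls in the preceding 30 minutes is an assumption
made for this sketch. The calendar dates are arbitrary, and the second
archive is taken to end at 01:10 on the following morning, as in the
listing.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=30)

# Times taken from the example; the dates themselves are assumptions.
start = datetime(2024, 1, 1, 0, 10)             # first record
end = datetime(2024, 1, 2, 1, 10)               # last record
marks = [datetime(2024, 1, 1, 9, 15, 0, 1000)]  # <mark> at 09:15.001

def sample_ok(t):
    """A sample at t is usable if t lies inside the archive and no
    <mark> record falls in the interval (t - STEP, t]."""
    if not (start <= t <= end):
        return False
    return not any(t - STEP < m <= t for m in marks)

def reported_rate_times():
    """Sample times at which a rate can be reported: both the sample at
    t and the prior sample at t - STEP must be usable."""
    times = []
    t = datetime(2024, 1, 1, 0, 0)  # aligned on the hour and half-hour
    while t <= end:
        if sample_ok(t) and sample_ok(t - STEP):
            times.append(t)
        t += STEP
    return times

reported = reported_rate_times()
for t in reported:
    print(t.strftime("%d %H:%M"))
```

In this model a rate is reported at 09:00, is missing at 09:30 and
10:00, resumes at 10:30, and is last reported at 01:00 on the second
day.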

METADATA CHECKS
When more than one input archive set is specified, pmlogextract
performs a number of checks to ensure the metadata is consistent for
metrics appearing in more than one of the input archive sets. These
checks include:

*   metric data type is the same
*   metric semantics are the same
*   metric units are the same
*   metric is either always singular or always has the same instance
    domain
*   metrics with the same name have the same PMID
*   metrics with the same PMID have the same name

If any of these checks fails, pmlogextract reports the details and
aborts without creating the output archive, unless the -x option is
given.

To address these semantic issues, use pmlogrewrite(1) to translate the
input archives into equivalent archives with consistent metadata before
using pmlogextract.
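
The checks listed above amount to comparing metric descriptors across
archives. The Python sketch below shows one way to express them. It is
an illustration under stated assumptions, not the code in pmlogextract:
the Desc record, its field names and the sample values (including the
PMID and instance domain strings) are all hypothetical.

```python
from typing import NamedTuple, Optional

class Desc(NamedTuple):
    """Hypothetical stand-in for a metric descriptor."""
    name: str
    pmid: str
    dtype: str
    sem: str
    units: str
    indom: Optional[str]  # None models a singular metric

def check_metadata(archives):
    """Return a list of human-readable problems found when comparing
    the descriptors of metrics that appear in more than one archive."""
    problems = []
    by_name, by_pmid = {}, {}
    for descs in archives:
        for d in descs:
            # Same name: PMID, type, semantics, units and instance
            # domain must all agree with the first occurrence.
            seen = by_name.setdefault(d.name, d)
            for field in ("pmid", "dtype", "sem", "units", "indom"):
                old, new = getattr(seen, field), getattr(d, field)
                if old != new:
                    problems.append(
                        f"{d.name}: {field} differs ({old} vs {new})")
            # Same PMID: the name must agree with the first occurrence.
            other = by_pmid.setdefault(d.pmid, d)
            if other.name != d.name:
                problems.append(
                    f"PMID {d.pmid}: name differs "
                    f"({other.name} vs {d.name})")
    return problems

a1 = [Desc("disk.dev.read", "60.0.4", "U64", "counter", "count", "60.1")]
a2 = [Desc("disk.dev.read", "60.0.4", "U64", "counter", "Kbyte", "60.1")]

problems = check_metadata([a1, a2])
for p in problems:
    print(p)
```

An empty result corresponds to the case where the merge may proceed. A
non-empty result corresponds to the case where pmlogrewrite(1) would be
used to reconcile the archives first.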

FILES
For each of the input and output archive logs, several physical files
are used.

archive.meta
       metadata (metric descriptions, instance domains, etc.) for the
       archive log

archive.0
       initial volume of metric values (subsequent volumes have
       suffixes 1, 2, ...) - for input these files may have been
       previously compressed with bzip2(1) or gzip(1) and thus may
       have an additional .bz2 or .gz suffix.

archive.index
       temporal index to support rapid random access to the other
       files in the archive log.

PCP ENVIRONMENT
Environment variables with the prefix PCP_ are used to parameterize the
file and directory names used by PCP. On each installation, the file
/etc/pcp.conf contains the local values for these variables. The
$PCP_CONF variable may be used to specify an alternative configuration
file, as described in pcp.conf(5).

SEE ALSO
PCPIntro(1), pmdumplog(1), pmlc(1), pmlogger(1), pmlogreduce(1),
pmlogrewrite(1), pcp.conf(5), pcp.env(5) and PMNS(5).

DIAGNOSTICS
All error conditions detected by pmlogextract are reported on stderr
with a textual (if sometimes terse) explanation.

Should one of the input archive logs be corrupted (this can happen if
the pmlogger instance writing the log suddenly dies), then pmlogextract
will detect and report the position of the corruption in the file, and
any subsequent information from that archive log will not be processed.

If any error is detected, pmlogextract will exit with a non-zero
status.

CAVEATS
The preamble metrics (pmcd.pmlogger.archive, pmcd.pmlogger.host, and
pmcd.pmlogger.port), which are automatically recorded by pmlogger at
the start of the archive, may not be present in the archive output by
pmlogextract. These metrics are only relevant while the archive is
being created, and have no significance once recording has finished.

Performance Co-Pilot                  PCP                  PMLOGEXTRACT(1)