PMLOGEXTRACT(1)             General Commands Manual            PMLOGEXTRACT(1)

NAME

       pmlogextract - reduce, extract, concatenate and merge Performance Co-Pilot
       archives

SYNOPSIS

       pmlogextract [-dfmwxz?] [-c configfile] [-S starttime] [-s samples]
       [-T endtime] [-V version] [-v volsamples] [-Z timezone] input [...]
       output

DESCRIPTION

       pmlogextract reads one or more Performance Co-Pilot (PCP) archives
       identified by input and creates a merged and/or reduced PCP archive in
       output.  Each input argument is either a name or a comma-separated list
       of names, and each name is the name of one file from an archive, the
       base name of an archive, or the name of a directory containing one or
       more archives.  The nature of merging is controlled by the number of
       input archives, while the nature of data reduction is controlled by the
       command line arguments.  The input arguments must be archives created
       by pmlogger(1) with performance data collected from the same host, but
       usually over different time periods and possibly (although not usually)
       with different performance metrics being logged.
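       The input naming rules in the paragraph above can be modelled with a
       short Python sketch.  It is illustrative only, not pmlogextract's own
       code: expand_inputs and archive_base are hypothetical helpers, and
       treating a directory as holding one archive per .meta file is an
       assumption.

```python
import glob
import os
import re

# Physical-file suffixes assumed from the FILES section: .meta, .index and
# numbered volumes, where volumes may carry a .gz or .bz2 suffix.
_SUFFIX = re.compile(r"\.(meta|index|\d+)(\.gz|\.bz2)?$")


def archive_base(name):
    """Map one input name to an archive base name (hypothetical helper)."""
    if os.path.exists(name + ".meta"):
        # Already a base name.  Base names such as 20240101.00.10 end in
        # digits, so a suffix is only stripped when name is not a base.
        return name
    return _SUFFIX.sub("", name)


def expand_inputs(args):
    """Expand input arguments, each possibly a comma-separated list."""
    bases = []
    for arg in args:
        for name in arg.split(","):
            if os.path.isdir(name):
                # Assumption: every .meta file in the directory is one archive.
                metas = sorted(glob.glob(os.path.join(name, "*.meta")))
                candidates = [archive_base(m) for m in metas]
            else:
                candidates = [archive_base(name)]
            for base in candidates:
                if base not in bases:  # keep the first occurrence only
                    bases.append(base)
    return bases
```

       Checking for an existing .meta file before stripping a suffix avoids
       mangling date-stamped base names whose last component is numeric.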

       If only one input is specified, then the default behavior simply copies
       the input PCP archive (with possible conversion to a newer version of
       the archive format, see -V below) into the output PCP archive.  When
       two or more PCP archives are specified as input, the archives are
       merged (or concatenated) and written to output.

       In the output archive a <mark> record may be inserted at a time just
       past the end of each of the input archives to indicate a possible
       temporal discontinuity between the end of one input archive and the
       start of the next input archive.  See the MARK RECORDS section below
       for more information.  There is no <mark> record after the end of the
       last (in temporal order) of the records from the input archive(s).

OPTIONS

       The available command line options are:

       -c config, --config=config
            Extract only the metrics specified in config from the input PCP
            archive(s).  The config syntax accepted by pmlogextract is
            explained in more detail in the CONFIGURATION FILE SYNTAX section.

       -d, --desperate
            Desperate mode.  Normally if a fatal error occurs, all trace of
            the partially written PCP archive output is removed.  With the -d
            option, the output archive is not removed.
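       The clean-up rule for -d can be sketched as follows, assuming the
       caller keeps a list of the output files created so far; write_archive
       and partial_files are hypothetical names, not part of any PCP API.

```python
import os


def write_archive(write_output, partial_files, desperate=False):
    """Run write_output(); on a fatal error, remove every file listed in
    partial_files unless desperate mode (-d) is in effect, then re-raise."""
    try:
        write_output()
    except Exception:
        if not desperate:
            for path in partial_files:
                if os.path.exists(path):
                    os.remove(path)  # remove all trace of the partial output
        raise  # the error is still fatal in both modes
```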

       -f, --first
            For most common uses, all of the input archives will have been
            collected in the same timezone.  But if this is not the case, then
            pmlogextract must choose one of the timezones from the input
            archives to be used as the timezone for the output archive.  The
            default is to use the timezone from the last input archive.  The -f
            option forces the timezone from the first input archive to be
            used.
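       A minimal sketch of the timezone choice, assuming "first" and "last"
       refer to the order in which the input archives are given and that
       each timezone is a TZ-style string; output_timezone is a hypothetical
       helper.

```python
def output_timezone(input_timezones, first=False):
    """Choose the output archive timezone.

    input_timezones lists each input archive's timezone in input order
    (an assumption); the last one wins unless first (-f) is set.
    """
    if not input_timezones:
        raise ValueError("at least one input archive is required")
    return input_timezones[0] if first else input_timezones[-1]
```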

       -m, --mark
            As described in the MARK RECORDS section below, sometimes it is
            possible to safely omit <mark> records from the output archive.
            If the -m option is specified, then the epilogue and prologue test
            is skipped and a <mark> record will always be inserted at the end
            of each input archive (except the last).  This is the original
            behaviour for pmlogextract.

       -S starttime, --start=starttime
            Define the start of a time window to restrict the records
            processed; refer to PCPIntro(1).  See also the -w option.

       -s samples, --samples=samples
            The argument samples defines the number of samples (or records) to
            be written to output.  If samples is 0 or -s is not specified,
            pmlogextract will continue until the end of all the input archives
            or until the end of the time window as specified by -T, whichever
            comes first.  The -s option overrides the -T option if the sample
            limit is reached before the end of the time window.

       -T endtime, --finish=endtime
            Define the end of a time window to restrict the records processed;
            refer to PCPIntro(1).  See also the -w option.
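       The interaction of -S, -T and -s can be sketched as a single pass over
       record timestamps.  This is a model, not the tool's code: timestamps
       are plain comparable numbers here, and treating both window bounds as
       inclusive is an assumption.

```python
def select_records(timestamps, start=None, end=None, samples=0):
    """Return the timestamps that would be written under -S/-T/-s.

    timestamps must be ascending; samples == 0 means "no limit".
    Both window bounds are treated as inclusive (an assumption).
    """
    out = []
    for t in timestamps:
        if start is not None and t < start:
            continue  # before the -S start of the window
        if end is not None and t > end:
            break  # past the -T end of the window
        out.append(t)
        if samples and len(out) == samples:
            break  # the -s limit was reached first
    return out
```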

       -V version, --outputversion=version
            Each PCP archive has a version for the physical record format,
            currently 2 or 3.  By default, the output archive is created with
            a version equal to the maximum of the versions of the input
            archives.  The -V option may be used to explicitly force the
            version for output, provided version is no smaller than the
            archive version that would have been chosen by the default rule.

            For example, specifying -V 3 may be used to produce a version 3
            output archive from input archives that could be a mixture of
            version 2 and/or version 3.
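       The version rule for -V reduces to a maximum and a lower-bound check.
       The sketch below is illustrative; output_version is a hypothetical
       helper and the error wording is invented.

```python
SUPPORTED_VERSIONS = (2, 3)  # archive record format versions named above


def output_version(input_versions, requested=None):
    """Return the output archive version for the given input versions."""
    default = max(input_versions)  # default rule: maximum input version
    if requested is None:
        return default
    if requested not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported archive version {requested}")
    if requested < default:
        raise ValueError(f"-V {requested} is below the default version {default}")
    return requested
```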

       -v volsamples
            The output archive is potentially a multi-volume data set, and the
            -v option causes pmlogextract to start a new volume after
            volsamples log records have been written to the archive.

            Independent of any -v option, each volume of an archive is limited
            to no more than 2^31 bytes, so pmlogextract will automatically
            create a new volume for the archive before this limit is reached.
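       The two volume-switching conditions can be sketched as follows,
       assuming each record's size in bytes is known in advance and ignoring
       any per-volume header overhead; assign_volumes is a hypothetical
       helper.

```python
VOLUME_LIMIT = 2 ** 31  # maximum bytes per volume


def assign_volumes(record_sizes, volsamples=0, limit=VOLUME_LIMIT):
    """Return the volume number (0, 1, ...) for each record.

    A new volume starts after volsamples records (if volsamples > 0) or
    when the next record would push the volume past limit bytes.
    Per-volume header bytes are ignored in this sketch.
    """
    volumes = []
    vol, count, size = 0, 0, 0
    for rsize in record_sizes:
        full_by_count = volsamples > 0 and count >= volsamples
        full_by_size = count > 0 and size + rsize > limit
        if full_by_count or full_by_size:
            vol, count, size = vol + 1, 0, 0
        volumes.append(vol)
        count += 1
        size += rsize
    return volumes
```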

       -w   Where -S and -T specify a time window within the same day, the -w
            flag will cause the data within the time window to be extracted,
            for every day in the archive.  For example, the options -w -S
            @11:00 -T @15:00 specify that pmlogextract should include archive
            records only for the periods from 11am to 3pm on each day.  When
            -w is used, the output archive will contain <mark> records to
            indicate the temporal discontinuity between the end of one time
            window and the start of the next.
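       A sketch of the daily window filter, using the -S @11:00 -T @15:00
       example.  For simplicity the mark is emitted as a "<mark>" string just
       before the first record of the next day's window, rather than with its
       own timestamp; daily_window is a hypothetical helper.

```python
from datetime import datetime, time


def daily_window(timestamps, start, end):
    """Keep records whose time of day lies in [start, end] on every day,
    inserting "<mark>" between windows that fall on different days."""
    out = []
    last_day = None
    for ts in timestamps:
        if not (start <= ts.time() <= end):
            continue  # outside today's window
        if last_day is not None and ts.date() != last_day:
            out.append("<mark>")  # gap between consecutive daily windows
        out.append(ts)
        last_day = ts.date()
    return out


if __name__ == "__main__":
    stamps = [datetime(2024, 1, 1, 12), datetime(2024, 1, 2, 12)]
    print(daily_window(stamps, time(11), time(15)))
```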

       -x   It is expected that the metadata (name, PMID, type, semantics and
            units) for each metric will be consistent across all of the input
            PCP archive(s) in which that metric appears.  In rare cases, e.g.
            in development, in QA and when a PMDA is upgraded, this may not be
            the case and pmlogextract will report the issue and abort without
            creating the output archive.  This is done so the problem can be
            fixed with pmlogrewrite(1) before retrying the merge.  In
            unattended or QA environments it may be preferable to force the
            merge and omit the metrics with the mismatched metadata.  The -x
            option does this.

       -Z timezone, --timezone=timezone
            Use timezone when displaying the date and time in diagnostics.
            Timezone is in the format of the environment variable TZ as
            described in environ(7).  The default is to initially use the
            timezone of the local host.

       -z, --hostzone
            Use the local timezone of the host from the input archive(s) when
            displaying the date and time in diagnostics.  The default is to
            initially use the timezone of the local host.

       -?, --help
            Display usage message and exit.

CONFIGURATION FILE SYNTAX

       The configfile contains metrics of interest; only those metrics (or
       instances) mentioned explicitly or implicitly in the configuration file
       will be included in the output archive.  Each specification must begin
       on a new line, and may span multiple lines in the configuration file.
       Instances may also be specified, but they are optional.  The format for
       each specification is

               metric
       or
               metric [ instance ... ]

       where metric may be a leaf or a non-leaf name of a metric in the
       Performance Metrics Name Space (PMNS, see PMNS(5)).  If a metric refers
       to a non-leaf node in the PMNS, pmlogextract will recursively descend
       the PMNS and include all metrics corresponding to descendant leaf nodes.
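       Expanding a non-leaf name can be sketched with the PMNS modelled as a
       flat set of dotted leaf names, which is a simplifying assumption;
       expand_metric is a hypothetical helper.

```python
def expand_metric(name, pmns_leaves):
    """Return the leaf metrics selected by name: the name itself if it is a
    leaf, otherwise every leaf below it in the (flattened) PMNS."""
    if name in pmns_leaves:
        return [name]
    # Match whole components: "kernel.all.cpu." must not select
    # "kernel.all.cpux.user".
    prefix = name + "."
    return sorted(leaf for leaf in pmns_leaves if leaf.startswith(prefix))
```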

       Instances are optional and are specified as a space (or comma)
       separated list of instance identifiers, with the list enclosed by
       square brackets.  Each instance identifier may be a number or a string
       (enclosed in single or double quotes).  Instance identifiers that are
       numbers are assumed to be internal instance identifiers, else the
       string values are assumed to be external instance identifiers; see
       pmGetInDom(3) for more information.  If no instances are given, then
       all instances of the associated metric(s) will be extracted.

       Any additional white space is ignored and comments may be added with a
       `#' prefix.

CONFIGURATION FILE EXAMPLE

       This is an example of a valid configfile:

               #
               # config file for pmlogextract
               #

               kernel.all.cpu
               kernel.percpu.cpu.sys ["cpu0","cpu1"]
               disk.dev ["dks0d1"]
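       The configuration grammar can be sketched as a small tokenizer and
       parser.  This is an illustration of the syntax described above, not
       pmlogextract's parser: parse_config is hypothetical, it does not allow
       '#' inside quoted names, and it does not enforce that each
       specification begins on a new line.

```python
import re

# A token is a quoted string, a bracket or comma, or a bare word
# (a metric name or a numeric instance identifier).
TOKEN = re.compile(r'"[^"]*"' r"|'[^']*'" r"|[\[\],]" r"|[^\s\[\],'\"]+")


def parse_config(text):
    """Parse config text into (metric, instances) pairs.

    instances is None when no list is given (all instances), otherwise a
    list of ("internal", int) or ("external", str) tuples.
    """
    # Drop '#' comments; this sketch does not allow '#' inside quotes.
    body = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
    tokens = TOKEN.findall(body)
    specs, i = [], 0
    while i < len(tokens):
        metric = tokens[i]
        if metric in ("[", "]", ","):
            raise ValueError(f"expected a metric name, got {metric!r}")
        i += 1
        instances = None
        if i < len(tokens) and tokens[i] == "[":
            instances, i = [], i + 1
            while i < len(tokens) and tokens[i] != "]":
                tok = tokens[i]
                if tok[0] in "\"'":
                    instances.append(("external", tok[1:-1]))  # quoted name
                elif tok != ",":
                    instances.append(("internal", int(tok)))  # bare number
                i += 1
            if i == len(tokens):
                raise ValueError(f"unterminated instance list for {metric}")
            i += 1  # step past the closing ']'
        specs.append((metric, instances))
    return specs
```

       Applied to the example file above, the sketch yields kernel.all.cpu
       with no instance list (all instances) and external instance names for
       the other two metrics.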

MARK RECORDS

       When more than one input archive contributes performance data to the
       output archive, then <mark> records may be inserted to indicate a
       possible temporal discontinuity in the performance data.

       A <mark> record contains a timestamp and no performance data and is
       used to indicate that there is a time period in the PCP archive where
       we do not know the values of any performance metrics, because there was
       no pmlogger(1) collecting performance data during this period.  Since
       these periods are often associated with the restart of a service or
       pmcd(1) or a system reboot, there may be considerable doubt as to the
       continuity of performance data across this time period.

       Most current archives are created with a prologue record at the
       beginning and an epilogue record at the end.  These records identify
       the state of pmcd(1) at the time, and may be used by pmlogextract to
       determine that there is no discontinuity between the end of one
       archive and the next output record, and as a consequence the <mark>
       record can safely be omitted from the output archive.

       The rationale behind <mark> records may be demonstrated with an
       example.  Consider one input archive that starts at 00:10 and ends at
       09:15 on the same day, and another input archive that starts at 09:20
       on the same day and ends at 00:10 the following morning.  This would be
       a very common case for archives managed and rotated by
       pmlogger_check(1) and pmlogger_daily(1).

       The output archive created by pmlogextract would contain:
       00:10.000    first record from first input archive
       ...
       09:15.000    last record from first input archive
       09:15.001    <mark> record
       09:20.000    first record from second input archive
       ...
       00:10.000    last record from second input archive
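       The mark placement in the listing above can be sketched as follows.
       The sketch assumes the input archives do not overlap in time (so they
       can simply be concatenated) and takes the 1 millisecond offset from
       the example; can_omit_mark is a hypothetical stand-in for the
       prologue/epilogue test, whose details are not described here.

```python
from datetime import datetime, timedelta

# Assumption taken from the example: the mark sits 1 ms past the last
# record of an archive (09:15.000 -> 09:15.001).
MARK_OFFSET = timedelta(milliseconds=1)


def merge_with_marks(archives, can_omit_mark=lambda prev, nxt: False):
    """Concatenate non-overlapping archives in temporal order.

    Each archive is an ascending list of datetimes.  A "<mark>" entry is
    added just past the end of every archive except the last, unless the
    hypothetical can_omit_mark(prev, nxt) predicate says the gap is safe;
    the default never omits marks, which corresponds to -m.
    """
    ordered = sorted((a for a in archives if a), key=lambda a: a[0])  # skip empty
    out = []
    for i, arch in enumerate(ordered):
        out.extend((t, "record") for t in arch)
        if i < len(ordered) - 1 and not can_omit_mark(arch, ordered[i + 1]):
            out.append((arch[-1] + MARK_OFFSET, "<mark>"))
    return out


if __name__ == "__main__":
    first = [datetime(2024, 1, 1, 0, 10), datetime(2024, 1, 1, 9, 15)]
    second = [datetime(2024, 1, 1, 9, 20), datetime(2024, 1, 2, 0, 10)]
    for stamp, kind in merge_with_marks([second, first]):
        print(stamp, kind)
```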

       The time period where the performance data is missing starts just after
       09:15 and ends just before 09:20.  When the output archive is processed
       with any of the PCP reporting tools, the <mark> record is used to
       indicate a period of missing data.  For example, using the output
       archive above, imagine one was reporting the average I/O rate at 30
       minute intervals aligned on the hour and half-hour.  The I/O count
       metric is a counter, so the average I/O rate requires two valid values
       from consecutive sample times.  There would be values for all the
       intervals ending at or before 09:00, then no values at 09:30 because of
       the <mark> record, then no values at 10:00 because the ``prior'' value
       at 09:30 is not available, then the rate would be reported again at
       10:30 and continue every 30 minutes until the last reported value at
       00:00 (midnight).
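       The counter-rate reasoning above can be checked with a short sketch.
       It assumes a counter value is already available at each 30 minute
       boundary (None where the <mark> leaves no value); real reporting tools
       derive these values from the archive, which is not modelled here.
       interval_rates is a hypothetical helper and the counter values are
       made up.

```python
def interval_rates(points, interval_s=1800):
    """points: list of (label, counter value or None) at consecutive
    interval boundaries.  A rate needs valid values at both ends."""
    rates = []
    for (_, prev), (label, cur) in zip(points, points[1:]):
        if prev is None or cur is None:
            rates.append((label, None))  # a missing endpoint: no rate
        else:
            rates.append((label, (cur - prev) / interval_s))
    return rates


# 09:30 has no value because of the <mark>, so both the 09:30 and the
# 10:00 rates are missing; reporting resumes at 10:30.
print(interval_rates([("09:00", 2800), ("09:30", None),
                      ("10:00", 5000), ("10:30", 6800)]))
```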

       The presence of <mark> records in a PCP archive can be established
       using pmdumplog(1), where a timestamp and the annotation <mark> are
       used to indicate a <mark> record.

METADATA CHECKS

       When more than one input archive is specified, pmlogextract performs a
       number of checks to ensure the metadata is consistent for metrics
       appearing in more than one of the input archives.  These checks
       include:

       * metric data type is the same
       * metric semantics are the same
       * metric units are the same
       * metric is either always singular or always has the same instance
         domain
       * metrics with the same name have the same PMID
       * metrics with the same PMID have the same name

       If any of these checks fail, pmlogextract reports the details and
       aborts without creating the output archive.

       To address these semantic issues, use pmlogrewrite(1) to translate the
       input archives into equivalent archives with consistent metadata
       before using pmlogextract.

       Refer to the -x and -d command line options above for alternatives to
       the default handling of errors during metadata checks.
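       The metadata checks listed above, together with the -x behaviour, can
       be sketched as follows.  The Desc record and its field names and
       string formats are illustrative assumptions, not libpcp's descriptor
       structure, and check_metadata is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Desc:
    """Illustrative metric descriptor; field names are assumptions."""
    name: str
    pmid: str
    type: str
    sem: str
    units: str
    indom: Optional[str]  # None for a singular metric


# Fields that must agree for metrics with the same name.
SAME_NAME_FIELDS = ("pmid", "type", "sem", "units", "indom")


def check_metadata(archives, omit_mismatched=False):
    """archives is a list of descriptor lists, one per input archive.

    Returns the names of the metrics to keep.  A mismatch raises ValueError,
    or with omit_mismatched (modelling -x) drops the metrics involved.
    """
    by_name, by_pmid, bad = {}, {}, set()

    def mismatch(msg, names):
        if not omit_mismatched:
            raise ValueError(msg)
        bad.update(names)

    for descs in archives:
        for d in descs:
            ref = by_name.setdefault(d.name, d)
            for field in SAME_NAME_FIELDS:
                if getattr(ref, field) != getattr(d, field):
                    mismatch(f"{d.name}: {field} differs", {d.name})
            ref = by_pmid.setdefault(d.pmid, d)
            if ref.name != d.name:
                mismatch(f"PMID {d.pmid}: {ref.name} vs {d.name}", {ref.name, d.name})
    return set(by_name) - bad
```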

CAVEATS

       The prologue metrics (pmcd.pmlogger.archive, pmcd.pmlogger.host, and
       pmcd.pmlogger.port), which are automatically recorded by pmlogger at
       the start of the archive, may not be present in the archive output by
       pmlogextract.  These metrics are only relevant while the archive is
       being created, and have no significance once recording has finished.

DIAGNOSTICS

       All error conditions detected by pmlogextract are reported on stderr
       with a textual (if sometimes terse) explanation.

       If one of the input archives contains no archive records then an
       ``empty archive'' warning is issued and that archive is skipped.

       Should one of the input archives be corrupted (this can happen if the
       pmlogger instance writing the archive suddenly dies), then pmlogextract
       will detect and report the position of the corruption in the file, and
       any subsequent information from that archive will not be processed.

       If any error is detected, pmlogextract will exit with a non-zero
       status.
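       The handling of empty and corrupted inputs can be sketched per
       archive.  This is a model only: CORRUPT is a hypothetical sentinel for
       a record that fails to decode, and the message texts are illustrative,
       not pmlogextract's actual wording.

```python
import sys

# Hypothetical sentinel standing in for a record that fails to decode.
CORRUPT = object()


def usable_records(name, records, log=sys.stderr):
    """Return the records of one input archive that would be processed."""
    if not records:
        print(f"{name}: empty archive, skipped", file=log)
        return []
    for pos, rec in enumerate(records):
        if rec is CORRUPT:
            # Report where the corruption starts; ignore the rest of the archive.
            print(f"{name}: corrupted at record {pos}", file=log)
            return list(records[:pos])
    return list(records)
```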

FILES

       For each of the input and output archives, several physical files are
       used.

       archive.meta
            metadata (metric descriptions, instance domains, etc.) for the
            archive

       archive.0
            initial volume of metrics values (subsequent volumes have suffixes
            1, 2, ...) - for input these files may have been previously
            compressed with bzip2(1) or gzip(1) and thus may have an
            additional .bz2 or .gz suffix.

       archive.index
            temporal index to support rapid random access to the other files
            in the archive.
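       The file naming scheme above can be sketched with two small helpers;
       archive_files and find_input_volume are hypothetical names, and only
       the .gz and .bz2 compression suffixes mentioned above are considered.

```python
import os

# Input volumes may be uncompressed or carry a .gz or .bz2 suffix.
COMPRESSED_SUFFIXES = ("", ".gz", ".bz2")


def archive_files(base, nvolumes):
    """Physical file names for an output archive with nvolumes volumes."""
    volumes = [f"{base}.{n}" for n in range(nvolumes)]
    return [base + ".meta", base + ".index"] + volumes


def find_input_volume(base, n):
    """Path of input volume n, allowing a compression suffix, or None."""
    for suffix in COMPRESSED_SUFFIXES:
        path = f"{base}.{n}{suffix}"
        if os.path.exists(path):
            return path
    return None
```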

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the
       file and directory names used by PCP.  On each installation, the file
       /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration
       file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).
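       A sketch of locating and reading the PCP configuration file, assuming
       simple NAME=value lines with optional '#' comments; load_pcp_conf is a
       hypothetical helper, and the precedence between existing environment
       variables and the file is not modelled.

```python
import os


def load_pcp_conf(path=None):
    """Read PCP_* settings from path, $PCP_CONF, or /etc/pcp.conf."""
    path = path or os.environ.get("PCP_CONF", "/etc/pcp.conf")
    settings = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments and malformed lines
            key, value = line.split("=", 1)
            if key.startswith("PCP_"):
                settings[key] = value.strip("'\"")  # drop optional quotes
    return settings
```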

SEE ALSO

       PCPIntro(1), pmdumplog(1), pmlc(1), pmlogger(1), pmlogreduce(1),
       pmlogrewrite(1), pcp.conf(5), pcp.env(5) and PMNS(5).

Performance Co-Pilot                  PCP                      PMLOGEXTRACT(1)