PMLOGEXTRACT(1)             General Commands Manual            PMLOGEXTRACT(1)

NAME

       pmlogextract - reduce, extract, concatenate and merge Performance
       Co-Pilot archives

SYNOPSIS

       pmlogextract [-dfmwxz] [-c configfile] [-S starttime] [-s samples]
       [-T endtime] [-v volsamples] [-Z timezone] input [...] output

DESCRIPTION

       pmlogextract reads one or more Performance Co-Pilot (PCP) archive logs
       identified by input and creates a temporally merged and/or reduced PCP
       archive log in output.  input is a comma-separated list of names, each
       of which may be the base name of an archive or the name of a directory
       containing one or more archives.  The nature of merging is controlled
       by the number of input archive logs, while the nature of data reduction
       is controlled by the command line arguments.  The input(s) must be sets
       of PCP archive logs created by pmlogger(1) with performance data
       collected from the same host, but usually over different time periods
       and possibly (although not usually) with different performance metrics
       being logged.

       If only one input is specified, then the default behavior simply copies
       the input set of PCP archive logs into the output PCP archive log.
       When two or more sets of PCP archive logs are specified as input, the
       sets of logs are merged (or concatenated) and written to output.

       In the output archive log a <mark> record may be inserted at a time
       just past the end of each of the input archive logs to indicate a
       possible temporal discontinuity between the end of one input archive
       log and the start of the next input archive log.  See the MARK RECORDS
       section below for more information.  There is no <mark> record after
       the end of the last (in temporal order) of the input archive logs.
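
       The temporal merge described above can be pictured with a small
       Python sketch.  It is only an illustration under stated assumptions,
       not the pmlogextract implementation: records are modelled as
       (timestamp, payload) tuples with integer timestamps, the helper
       merge_archives and its tick parameter are hypothetical, and a <mark>
       is always placed after every input except the one that ends last
       (the prologue and epilogue test is not modelled).

```python
import heapq

MARK = "<mark>"


def merge_archives(inputs, tick=1):
    """Merge non-empty, time-sorted record lists into one time-ordered list.

    inputs: list of lists of (timestamp, payload) tuples (hypothetical model).
    tick:   how far past the end of an input its <mark> is placed (assumed).
    """
    # The input that ends last (in temporal order) gets no trailing <mark>.
    last_end = max(records[-1][0] for records in inputs)

    streams = []
    for records in inputs:
        stream = list(records)
        end = records[-1][0]
        if end < last_end:
            # Flag a possible discontinuity just past the end of this input.
            stream.append((end + tick, MARK))
        streams.append(stream)

    # Each stream is already sorted, so a k-way merge keeps time order.
    return list(heapq.merge(*streams, key=lambda record: record[0]))


if __name__ == "__main__":
    first = [(10, "first record"), (555, "last record")]
    second = [(560, "first record"), (1510, "last record")]
    for timestamp, payload in merge_archives([first, second]):
        print(timestamp, payload)
```

       Because the merge is keyed on timestamps, the order in which the
       inputs are listed does not change the result in this model.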

OPTIONS

       The command line options for pmlogextract are as follows:

       -c configfile
              Extract only the metrics specified in configfile from the input
              PCP archive log(s).  The configfile syntax accepted by
              pmlogextract is explained in more detail in the Configuration
              File Syntax section.

       -d     Desperate mode.  Normally if a fatal error occurs, all trace of
              the partially written PCP archive output is removed.  With the
              -d option, the output archive log is not removed.

       -f     For most common uses, all of the input archive logs will have
              been collected in the same timezone.  But if this is not the
              case, then pmlogextract must choose one of the timezones from
              the input archive logs to be used as the timezone for the output
              archive log.  The default is to use the timezone from the last
              input archive log.  The -f option forces the timezone from the
              first input archive log to be used.

       -m     As described in the MARK RECORDS section below, sometimes it is
              possible to safely omit <mark> records from the output archive.
              If the -m option is specified, then the epilogue and prologue
              test is skipped and a <mark> record will always be inserted at
              the end of each input archive (except the last).  This is the
              original behaviour for pmlogextract.

       -S starttime
              Define the start of a time window to restrict the samples
              retrieved or specify a ``natural'' alignment of the output
              sample times; refer to PCPIntro(1).  See also the -w option.

       -s samples
              The argument samples defines the number of samples to be written
              to output.  If samples is 0 or -s is not specified, pmlogextract
              will sample until the end of the PCP archive log, or the end of
              the time window as specified by -T, whichever comes first.  The
              -s option will override the -T option if it occurs sooner.

       -T endtime
              Define the termination of a time window to restrict the samples
              retrieved or specify a ``natural'' alignment of the output
              sample times; refer to PCPIntro(1).  See also the -w option.

       -v volsamples
              The output archive log is potentially a multi-volume data set,
              and the -v option causes pmlogextract to start a new volume
              after volsamples log records have been written to the archive
              log.

              Independent of any -v option, each volume of an archive is
              limited to no more than 2^31 bytes, so pmlogextract will
              automatically create a new volume for the archive before this
              limit is reached.

       -w     Where -S and -T specify a time window within the same day, the
              -w flag will cause the data within the time window to be
              extracted, for every day in the archive log.  For example, the
              options -w -S @11:00 -T @15:00 specify that pmlogextract should
              include archive log records only for the periods from 11am to
              3pm on each day.  When -w is used, the output archive log will
              contain <mark> records to indicate the temporal discontinuity
              between the end of one time window and the start of the next.

       -x     It is expected that the metadata (name, PMID, type, semantics
              and units) for each metric will be consistent across all of the
              input PCP archive log(s) in which that metric appears.  In rare
              cases, e.g. in development, in QA and when a PMDA is upgraded,
              this may not be the case and pmlogextract will report the issue
              and abort without creating the output archive log.  This is done
              so the problem can be fixed with pmlogrewrite(1) before retrying
              the merge.  In unattended or QA environments it may be
              preferable to force the merge and omit the metrics with the
              mismatched metadata.  The -x option does this.

       -Z timezone
              Use timezone when displaying the date and time.  timezone is in
              the format of the environment variable TZ as described in
              environ(7).

       -z     Use the local timezone of the host from the input archive logs.
              The default is to initially use the timezone of the local host.
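
       The effect of combining -w with -S and -T can be sketched in Python.
       This is a rough model, not the tool's code: the function
       extract_daily_window is hypothetical, records are (datetime, payload)
       tuples, both ends of the window are assumed to be inclusive, and the
       <mark> entries carry no timestamp because the sketch does not model
       where pmlogextract places them.

```python
from datetime import datetime, time

MARK = (None, "<mark>")  # placeholder; the real timestamp is not modelled


def extract_daily_window(records, start, end):
    """Keep records whose time of day lies in [start, end] on every day.

    records: time-sorted list of (datetime, payload) tuples (assumed model).
    A MARK is emitted between the windows of two different days.
    """
    kept = []
    previous_day = None
    for timestamp, payload in records:
        if not (start <= timestamp.time() <= end):
            continue  # outside the daily window, so not extracted
        if previous_day is not None and timestamp.date() != previous_day:
            kept.append(MARK)  # discontinuity between two daily windows
        kept.append((timestamp, payload))
        previous_day = timestamp.date()
    return kept


if __name__ == "__main__":
    sample = [
        (datetime(2024, 1, 1, 10, 0), "too early"),
        (datetime(2024, 1, 1, 12, 0), "kept"),
        (datetime(2024, 1, 2, 11, 30), "kept"),
        (datetime(2024, 1, 2, 15, 30), "too late"),
    ]
    # Mirrors the options -w -S @11:00 -T @15:00 from the text above.
    for entry in extract_daily_window(sample, time(11, 0), time(15, 0)):
        print(entry)
```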

CONFIGURATION FILE SYNTAX

       The configfile contains metrics of interest - only those metrics (or
       instances) mentioned explicitly or implicitly in the configuration file
       will be included in the output archive.  Each specification must begin
       on a new line, and may span multiple lines in the configuration file.
       Instances may also be specified, but they are optional.  The format for
       each specification is

               metric [[instance[,instance...]]]

       where metric may be a leaf or a non-leaf name in the Performance
       Metrics Name Space (PMNS, see PMNS(5)).  If a metric refers to a
       non-leaf node in the PMNS, pmlogextract will recursively descend the
       PMNS and include all metrics corresponding to descendant leaf nodes.

       Instances are optional, and may be specified as a list of one or more
       space (or comma) separated names, numbers or strings (enclosed in
       single or double quotes).  Elements in the list that are numbers are
       assumed to be internal instance identifiers - see pmGetInDom(3) for
       more information.  If no instances are given, then all instances of the
       associated metric(s) will be extracted.

       Any additional white space is ignored and comments may be added with a
       `#' prefix.

CONFIGURATION FILE EXAMPLE

       This is an example of a valid configfile:

               #
               # config file for pmlogextract
               #

               kernel.all.cpu
               kernel.percpu.cpu.sys ["cpu0","cpu1"]
               disk.dev ["dks0d1"]
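
       To make the syntax concrete, here is a simplified Python reader for
       such a file.  It is a sketch, not the parser pmlogextract uses: the
       function parse_config is hypothetical, and it assumes that metric
       names start with a letter, that `#' never appears inside a quoted
       instance name, and that instance names never contain `]'.  An empty
       instance list stands for "all instances".

```python
import re

COMMENT = re.compile(r"#.*")
# A metric name, optionally followed by a bracketed instance list that may
# start on a later line.
SPEC = re.compile(r"([A-Za-z][\w.]*)\s*(?:\[([^\]]*)\])?")
# One instance: double-quoted, single-quoted, or a bare name or number.
INSTANCE = re.compile(r'"([^"]*)"|\'([^\']*)\'|([^\s,]+)')


def parse_config(text):
    """Return a list of (metric, instances) pairs from configfile text."""
    # Drop comments line by line, then scan the remaining text as a whole so
    # that a specification may span several lines.
    cleaned = "\n".join(COMMENT.sub("", line) for line in text.splitlines())

    specs = []
    for match in SPEC.finditer(cleaned):
        metric, inside = match.group(1), match.group(2)
        instances = []
        if inside is not None:
            for double, single, bare in INSTANCE.findall(inside):
                if bare.isdigit():
                    # Unquoted numbers are internal instance identifiers.
                    instances.append(int(bare))
                else:
                    instances.append(double or single or bare)
        specs.append((metric, instances))
    return specs


if __name__ == "__main__":
    example = """
    # config file for pmlogextract
    kernel.all.cpu
    kernel.percpu.cpu.sys ["cpu0","cpu1"]
    disk.dev ["dks0d1"]
    """
    for metric, instances in parse_config(example):
        print(metric, instances or "(all instances)")
```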

MARK RECORDS

       When more than one input archive log contributes performance data to
       the output archive log, then <mark> records may be inserted to indicate
       a possible discontinuity in the performance data.

       A <mark> record contains a timestamp and no performance data and is
       used to indicate that there is a time period in the PCP archive log
       where we do not know the values of any performance metrics, because
       there was no pmlogger(1) collecting performance data during this
       period.  Since these periods are often associated with the restart of a
       service or pmcd(1) or a system, there may be considerable doubt as to
       the continuity of performance data across this time period.

       Most current archives are created with a prologue record at the
       beginning and an epilogue record at the end.  These records identify
       the state of pmcd(1) at the time, and may be used by pmlogextract to
       determine that there is no discontinuity between the end of one archive
       and the next output record, and as a consequence the <mark> record can
       safely be omitted from the output archive.

       The rationale behind <mark> records may be demonstrated with an
       example.  Consider one input archive log that starts at 00:10 and ends
       at 09:15 on the same day, and another input archive log that starts at
       09:20 on the same day and ends at 01:10 the following morning.  This
       would be a very common case for archives managed and rotated by
       pmlogger_check(1) and pmlogger_daily(1).

       The output archive log created by pmlogextract would contain:
       00:10.000   first record from first input archive log
       ...
       09:15.000   last record from first input archive log
       09:15.001   <mark> record
       09:20.000   first record from second input archive log
       ...
       01:10.000   last record from second input archive log

       The time period where the performance data is missing starts just after
       09:15 and ends just before 09:20.  When the output archive log is
       processed with any of the PCP reporting tools, the <mark> record is
       used to indicate a period of missing data.  For example, using the
       output archive above, imagine one was reporting the average I/O rate at
       30 minute intervals aligned on the hour and half-hour.  The I/O count
       metric is a counter, so the average I/O rate requires two valid values
       from consecutive sample times.  There would be values for all the
       intervals ending at 09:00, then no values at 09:30 because of the
       <mark> record, then no values at 10:00 because the ``prior'' value at
       09:30 is not available, then the rate would be reported again at 10:30
       and continue every 30 minutes until the last reported value at 01:00.

       The presence of <mark> records in a PCP archive log can be established
       using pmdumplog(1) where a timestamp and the annotation <mark> is used
       to indicate a <mark> record.
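
       The reporting gap around a <mark> record can be reproduced with a toy
       Python model.  It only encodes the rule stated in the example (a
       sample is unusable when a <mark> falls in the interval leading up to
       it, and a rate needs two usable consecutive samples); it does not
       model how the PCP reporting tools actually interpolate archive data.
       The function names are hypothetical and times are minutes since
       midnight of the first day.

```python
def hhmm(minutes):
    """Format minutes since midnight of the first day as HH:MM."""
    return f"{minutes // 60 % 24:02d}:{minutes % 60:02d}"


def value_available(t, step, start, end, marks):
    """A sample at t is usable if t is inside the archive and no <mark>
    lies in the interval (t - step, t] (simplifying assumption)."""
    if t < start or t > end:
        return False
    return not any(t - step < mark <= t for mark in marks)


def rate_reported(report_times, step, start, end, marks):
    """Map each report time to True when a rate can be reported there.

    A counter rate needs usable values at both t - step and t.
    """
    result = {}
    for t in report_times:
        result[hhmm(t)] = (
            value_available(t, step, start, end, marks)
            and value_available(t - step, step, start, end, marks)
        )
    return result


if __name__ == "__main__":
    start = 10                  # 00:10, first record of the first input
    end = 24 * 60 + 70          # 01:10 the following morning
    marks = [9 * 60 + 15.001]   # the <mark> just past 09:15
    times = range(8 * 60 + 30, 11 * 60 + 1, 30)  # 08:30 to 11:00
    for label, ok in rate_reported(times, 30, start, end, marks).items():
        print(label, "rate" if ok else "no value")
```

       Under these assumptions the model reports rates up to 09:00, nothing
       at 09:30 and 10:00, and rates again from 10:30 onwards.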

METADATA CHECKS

       When more than one input archive set is specified, pmlogextract
       performs a number of checks to ensure the metadata is consistent for
       metrics appearing in more than one of the input archive sets.  These
       checks include:

       * metric data type is the same
       * metric semantics are the same
       * metric units are the same
       * metric is either always singular or always has the same instance
         domain
       * metrics with the same name have the same PMID
       * metrics with the same PMID have the same name

       If any of these checks fail, pmlogextract reports the details and
       aborts without creating the output archive.

       To address these semantic issues, use pmlogrewrite(1) to translate the
       input archives into equivalent archives with consistent metadata
       before using pmlogextract.
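
       The checks listed above amount to comparing metric descriptors across
       the input archive sets.  The following Python sketch shows one way to
       express them; the Desc record, its string-valued fields and the
       function metadata_conflicts are all hypothetical stand-ins for
       illustration and do not reflect the data structures used by
       pmlogextract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Desc:
    """Hypothetical metric descriptor; every field is a plain string."""
    name: str
    pmid: str
    type: str
    sem: str
    units: str
    indom: str  # an empty string would stand for a singular metric


def metadata_conflicts(archives):
    """Return (key, field) pairs for every inconsistency that is found.

    archives: list of lists of Desc, one list per input archive set.
    """
    by_name = {}   # first descriptor seen for each metric name
    by_pmid = {}   # first metric name seen for each PMID
    conflicts = []
    for descs in archives:
        for desc in descs:
            first = by_name.setdefault(desc.name, desc)
            # Same name: type, semantics, units, instance domain and PMID
            # must all agree with the first sighting.
            for field in ("type", "sem", "units", "indom", "pmid"):
                if getattr(first, field) != getattr(desc, field):
                    conflicts.append((desc.name, field))
            # Same PMID: the name must agree with the first sighting.
            owner = by_pmid.setdefault(desc.pmid, desc.name)
            if owner != desc.name:
                conflicts.append((desc.pmid, "name"))
    return conflicts


if __name__ == "__main__":
    # Made-up descriptors: the type differs between the two archive sets.
    old = [Desc("disk.dev.read", "60.0.4", "u32", "counter", "count", "60.1")]
    new = [Desc("disk.dev.read", "60.0.4", "u64", "counter", "count", "60.1")]
    print(metadata_conflicts([old, new]))
```

       An empty result corresponds to the case where the merge may proceed;
       any entry corresponds to a case where pmlogextract would report the
       details and abort, unless -x is used to omit the offending metrics.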

FILES

       For each of the input and output archive logs, several physical files
       are used.
       archive.meta
                 metadata (metric descriptions, instance domains, etc.) for
                 the archive log
       archive.0 initial volume of metrics values (subsequent volumes have
                 suffixes 1, 2, ...) - for input these files may have been
                 previously compressed with bzip2(1) or gzip(1) and thus may
                 have an additional .bz2 or .gz suffix.
       archive.index
                 temporal index to support rapid random access to the other
                 files in the archive log.

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the
       file and directory names used by PCP.  On each installation, the file
       /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration
       file, as described in pcp.conf(5).

SEE ALSO

       PCPIntro(1), pmdumplog(1), pmlc(1), pmlogger(1), pmlogreduce(1),
       pmlogrewrite(1), pcp.conf(5), pcp.env(5) and PMNS(5).

DIAGNOSTICS

       All error conditions detected by pmlogextract are reported on stderr
       with a textual (if sometimes terse) explanation.

       Should one of the input archive logs be corrupted (this can happen if
       the pmlogger instance writing the log suddenly dies), then pmlogextract
       will detect and report the position of the corruption in the file, and
       any subsequent information from that archive log will not be processed.

       If any error is detected, pmlogextract will exit with a non-zero
       status.

CAVEATS

       The preamble metrics (pmcd.pmlogger.archive, pmcd.pmlogger.host, and
       pmcd.pmlogger.port), which are automatically recorded by pmlogger at
       the start of the archive, may not be present in the archive output by
       pmlogextract.  These metrics are only relevant while the archive is
       being created, and have no significance once recording has finished.



Performance Co-Pilot                  PCP                      PMLOGEXTRACT(1)