PMLOGEXTRACT(1)             General Commands Manual            PMLOGEXTRACT(1)

NAME

       pmlogextract - reduce, extract, concatenate and merge
       Performance Co-Pilot archives

SYNOPSIS

       pmlogextract [-dfmwz] [-c configfile] [-S starttime] [-s samples]
       [-T endtime] [-v volsamples] [-Z timezone] input [...] output

DESCRIPTION

       pmlogextract reads one or more Performance Co-Pilot (PCP) archive logs
       identified by input and creates a temporally merged and/or reduced PCP
       archive log in output.  input is a comma-separated list of names, each
       of which may be the base name of an archive or the name of a directory
       containing one or more archives.  The nature of merging is controlled
       by the number of input archive logs, while the nature of data reduction
       is controlled by the command line arguments.  The input(s) must be sets
       of PCP archive logs created by pmlogger(1) with performance data
       collected from the same host, but usually over different time periods
       and possibly (although not usually) with different performance metrics
       being logged.

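       The following Python sketch shows one way the input names might be
       expanded.  It is an illustration only.  The function expand_inputs is
       hypothetical.  The rule that an archive inside a directory is
       recognised by its .meta file is an assumption; this page does not
       specify it.

```python
from pathlib import Path


def expand_inputs(arg):
    """Expand one comma-separated input argument into archive base names.

    Hypothetical helper. Assumption: inside a directory, each archive is
    recognised by its .meta file, and the base name is that file's path
    with the .meta suffix removed.
    """
    bases = []
    for name in filter(None, arg.split(",")):  # skip empty fields
        path = Path(name)
        if path.is_dir():
            # Directory: one base name per .meta file, in sorted order.
            for meta in sorted(path.glob("*.meta")):
                bases.append(str(meta.with_suffix("")))
        else:
            # Anything else is taken to be an archive base name as given.
            bases.append(name)
    return bases
```

       For example, expand_inputs("dir,host/20240101") would list one base
       name per .meta file found in dir, followed by host/20240101.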
       If only one input is specified, then the default behavior simply copies
       the input set of PCP archive logs into the output PCP archive log.
       When two or more sets of PCP archive logs are specified as input, the
       sets of logs are merged (or concatenated) and written to output.

       In the output archive log a <mark> record may be inserted at a time
       just past the end of each of the input archive logs to indicate a
       possible temporal discontinuity between the end of one input archive
       log and the start of the next input archive log.  See the MARK RECORDS
       section below for more information.  There is no <mark> record after
       the end of the last (in temporal order) of the input archive logs.


OPTIONS

       The command line options for pmlogextract are as follows:

       -c configfile
              Extract only the metrics specified in configfile from the input
              PCP archive log(s).  The configfile syntax accepted by
              pmlogextract is explained in more detail in the CONFIGURATION
              FILE SYNTAX section.

       -d     Desperate mode.  Normally if a fatal error occurs, all trace of
              the partially written PCP archive output is removed.  With the
              -d option, the output archive log is not removed.

       -f     For most common uses, all of the input archive logs will have
              been collected in the same timezone.  But if this is not the
              case, then pmlogextract must choose one of the timezones from
              the input archive logs to be used as the timezone for the output
              archive log.  The default is to use the timezone from the last
              input archive log.  The -f option forces the timezone from the
              first input archive log to be used.

       -m     As described in the MARK RECORDS section below, sometimes it is
              possible to safely omit <mark> records from the output archive.
              If the -m option is specified, then the prologue and epilogue
              test described there is skipped and a <mark> record will always
              be inserted at the end of each input archive (except the last).
              This is the original behaviour for pmlogextract.

       -S starttime
              Define the start of a time window to restrict the samples
              retrieved or specify a ``natural'' alignment of the output
              sample times; refer to PCPIntro(1).  See also the -w option.

       -s samples
              The argument samples defines the number of samples to be written
              to output.  If samples is 0 or -s is not specified, pmlogextract
              will sample until the end of the PCP archive log, or the end of
              the time window as specified by -T, whichever comes first.  If
              the samples limit is reached before endtime, output stops there;
              that is, -s overrides -T when it occurs sooner.

       -T endtime
              Define the termination of a time window to restrict the samples
              retrieved or specify a ``natural'' alignment of the output
              sample times; refer to PCPIntro(1).  See also the -w option.

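       The interaction between -S, -T and -s can be modelled with a small
       Python function.  This is a simplified sketch.  select_records is a
       hypothetical name, timestamps are plain numbers, both window ends are
       assumed to be inclusive, and the ``natural'' alignment of sample times
       is not modelled.

```python
def select_records(timestamps, start=None, end=None, samples=0):
    """Return the timestamps that would be written, in order.

    start and end stand in for -S and -T (None means the archive's own
    start or end); samples stands in for -s (0 means no limit).
    Window ends are treated as inclusive, which is an assumption.
    """
    out = []
    for t in sorted(timestamps):
        if start is not None and t < start:
            continue              # before the start of the -S window
        if end is not None and t > end:
            break                 # past -T: the window has ended
        out.append(t)
        if samples and len(out) == samples:
            break                 # the -s limit was reached first
    return out


print(select_records(range(10), start=2, end=6))             # [2, 3, 4, 5, 6]
print(select_records(range(10), start=2, end=6, samples=3))  # [2, 3, 4]
```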
       -v volsamples
              The output archive log is potentially a multi-volume data set,
              and the -v option causes pmlogextract to start a new volume
              after volsamples log records have been written to the archive
              log.

              Independent of any -v option, each volume of an archive is
              limited to no more than 2^31 bytes, so pmlogextract will
              automatically create a new volume for the archive before this
              limit is reached.

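       The two rules for starting a new volume can be combined in a short
       Python sketch.  This is a model built on assumptions, not
       pmlogextract's own code.  assign_volumes and LIMIT are hypothetical
       names, the caller supplies the record sizes, and the header overhead
       in each volume is ignored.

```python
LIMIT = 2 ** 31  # per-volume cap in bytes, as stated above


def assign_volumes(record_sizes, volsamples=0):
    """Return the volume number (0, 1, 2, ...) for each record.

    A new volume is started after volsamples records (when volsamples > 0,
    as with -v) or whenever the next record would take the current volume
    past LIMIT bytes. Per-volume header overhead is ignored.
    """
    volumes = []
    vol = count = size = 0
    for rsize in record_sizes:
        full = (volsamples and count >= volsamples) or size + rsize > LIMIT
        if count > 0 and full:
            vol, count, size = vol + 1, 0, 0   # switch to the next volume
        volumes.append(vol)
        count += 1
        size += rsize
    return volumes


print(assign_volumes([100] * 7, volsamples=3))   # [0, 0, 0, 1, 1, 1, 2]
print(assign_volumes([2 ** 30, 2 ** 30, 1]))     # [0, 0, 1]
```

       The count > 0 guard means an empty volume always accepts its first
       record, so a single oversized record is never split.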
       -w     Where -S and -T specify a time window within the same day, the
              -w flag will cause the data within the time window to be
              extracted for every day in the archive log.  For example, the
              options -w -S @11:00 -T @15:00 specify that pmlogextract should
              include archive log records only for the periods from 11am to
              3pm on each day.  When -w is used, the output archive log will
              contain <mark> records to indicate the temporal discontinuity
              between the end of one time window and the start of the next.

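       A Python sketch of the per-day selection that -w describes follows.
       It rests on assumptions that this page does not state:

       * the window ends are inclusive;
       * timestamps are naive local times, so -z and -Z handling is ignored;
       * the function name daily_window is hypothetical;
       * the <mark> is placed 1 ms after the last record of each window,
         mirroring the example in MARK RECORDS.

```python
from datetime import datetime, time, timedelta


def daily_window(timestamps, start=time(11, 0), end=time(15, 0)):
    """Keep records whose time of day lies in [start, end] on every day.

    Returns (kind, timestamp) pairs. A ("<mark>", ...) entry is placed
    1 ms after the last kept record of one day's window, before the first
    kept record of a later day. Timestamps are naive local times.
    """
    out = []
    last_kept = None
    for ts in sorted(timestamps):
        if not start <= ts.time() <= end:
            continue  # outside the daily window
        if last_kept is not None and ts.date() != last_kept.date():
            # A new day's window begins: flag the gap since the last one.
            out.append(("<mark>", last_kept + timedelta(milliseconds=1)))
        out.append(("record", ts))
        last_kept = ts
    return out


hourly = [datetime(2024, 1, 1) + timedelta(hours=h) for h in range(48)]
for kind, ts in daily_window(hourly):
    print(kind, ts.isoformat(timespec="milliseconds"))
```

       With two days of hourly records, this prints the 11:00 to 15:00
       records for each day, with one <mark> at 15:00:00.001 on the first
       day.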
       -Z timezone
              Use timezone when displaying the date and time.  Timezone is in
              the format of the environment variable TZ as described in
              environ(7).

       -z     Use the local timezone of the host from the input archive logs.
              The default is to initially use the timezone of the local host.

CONFIGURATION FILE SYNTAX

       The configfile contains metrics of interest: only those metrics (or
       instances) mentioned explicitly or implicitly in the configuration file
       will be included in the output archive.  Each specification must begin
       on a new line, and may span multiple lines in the configuration file.
       Instances may also be specified, but they are optional.  The format for
       each specification is

               metric [[instance[,instance...]]]

       where metric may be a leaf or a non-leaf name in the Performance
       Metrics Name Space (PMNS, see pmns(5)).  If a metric refers to a
       non-leaf node in the PMNS, pmlogextract will recursively descend the
       PMNS and include all metrics corresponding to descendant leaf nodes.

       Instances are optional, and may be specified as a list of one or more
       space (or comma) separated names, numbers or strings (enclosed in
       single or double quotes).  Elements in the list that are numbers are
       assumed to be internal instance identifiers; see pmGetInDom(3) for
       more information.  If no instances are given, then all instances of
       the associated metric(s) will be extracted.

       Any additional white space is ignored and comments may be added with a
       `#' prefix.

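       The instance-list rules above can be illustrated with a Python sketch
       that parses a single specification.  This is not the parser used by
       pmlogextract.  parse_spec and TOKEN are hypothetical names.  The
       sketch expects a specification already joined onto one line with
       comments removed.  It also assumes that a quoted number such as "42"
       is an instance name rather than an internal identifier.

```python
import re

# One token: a double-quoted string, a single-quoted string, or a bare word.
TOKEN = re.compile(r'"([^"]*)"|\'([^\']*)\'|([^\s,"\']+)')


def parse_spec(spec):
    """Parse one 'metric [instance ...]' specification.

    Returns (metric, instances). instances is None when no list is given
    (all instances); bare numbers become int internal identifiers and
    everything else is kept as an external instance name (str).
    """
    metric, bracket, rest = spec.strip().partition("[")
    if not bracket:
        return metric.strip(), None
    body, close, _ = rest.rpartition("]")
    if not close:
        raise ValueError(f"missing ']' in {spec!r}")
    instances = []
    for dquoted, squoted, bare in TOKEN.findall(body):
        if bare.isdigit():
            instances.append(int(bare))                   # internal identifier
        else:
            instances.append(dquoted or squoted or bare)  # instance name
    return metric.strip(), instances


print(parse_spec('kernel.percpu.cpu.sys ["cpu0","cpu1"]'))
# ('kernel.percpu.cpu.sys', ['cpu0', 'cpu1'])
```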

CONFIGURATION FILE EXAMPLE

       This is an example of a valid configfile:

               #
               # config file for pmlogextract
               #

               kernel.all.cpu
               kernel.percpu.cpu.sys ["cpu0","cpu1"]
               disk.dev ["dks0d1"]

MARK RECORDS

       When more than one input archive log contributes performance data to
       the output archive log, then <mark> records may be inserted to indicate
       a possible discontinuity in the performance data.

       A <mark> record contains a timestamp and no performance data and is
       used to indicate that there is a time period in the PCP archive log
       where we do not know the values of any performance metrics, because
       there was no pmlogger(1) collecting performance data during this
       period.  Since these periods are often associated with the restart of a
       service or pmcd(1) or a system, there may be considerable doubt as to
       the continuity of performance data across this time period.

       Most current archives are created with a prologue record at the
       beginning and an epilogue record at the end.  These records identify
       the state of pmcd(1) at the time, and may be used by pmlogextract to
       determine that there is no discontinuity between the end of one
       archive and the next output record, and as a consequence the <mark>
       record can safely be omitted from the output archive.

       The rationale behind <mark> records may be demonstrated with an
       example.  Consider one input archive log that starts at 00:10 and ends
       at 09:15 on the same day, and another input archive log that starts at
       09:20 on the same day and ends at 01:10 the following morning.  This
       would be a very common case for archives managed and rotated by
       pmlogger_check(1) and pmlogger_daily(1).

       The output archive log created by pmlogextract would contain:

               00:10.000   first record from first input archive log
               ...
               09:15.000   last record from first input archive log
               09:15.001   <mark> record
               09:20.000   first record from second input archive log
               ...
               01:10.000   last record from second input archive log

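       The merge in this example can be reproduced with a Python sketch.  It
       simplifies the real tool in several ways:

       * the helper names are hypothetical;
       * records are bare timestamps in milliseconds;
       * the inputs are assumed not to overlap in time;
       * the prologue/epilogue test is not modelled, so a <mark> is always
         inserted, as with -m.

```python
def ms(hours, minutes):
    """Milliseconds since midnight of the first day (hypothetical helper)."""
    return (hours * 60 + minutes) * 60 * 1000


def fmt(t):
    """Format a millisecond timestamp as HH:MM:SS.mmm (wrapping at 24h)."""
    h, m, s = t // 3_600_000 % 24, t // 60_000 % 60, t // 1000 % 60
    return f"{h:02d}:{m:02d}:{s:02d}.{t % 1000:03d}"


def merge_with_marks(archives):
    """Concatenate archives in temporal order, adding a <mark> record 1 ms
    after the last record of every input except the last one.

    Each archive is a sorted list of record timestamps in milliseconds.
    Assumes the inputs do not overlap in time, and always inserts the
    <mark> (the behaviour that -m forces).
    """
    ordered = sorted((a for a in archives if a), key=lambda a: a[0])
    out = []
    for i, records in enumerate(ordered):
        out.extend((t, "record") for t in records)
        if i < len(ordered) - 1:          # no <mark> after the last input
            out.append((records[-1] + 1, "<mark>"))
    return out


first = [ms(0, 10), ms(9, 15)]            # 00:10 ... 09:15
second = [ms(9, 20), ms(24 + 1, 10)]      # 09:20 ... 01:10 next day
for t, kind in merge_with_marks([second, first]):
    print(fmt(t), kind)
# 00:10:00.000 record
# 09:15:00.000 record
# 09:15:00.001 <mark>
# 09:20:00.000 record
# 01:10:00.000 record
```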
       The time period where the performance data is missing starts just after
       09:15 and ends just before 09:20.  When the output archive log is
       processed with any of the PCP reporting tools, the <mark> record is
       used to indicate a period of missing data.  For example, using the
       output archive above, imagine one was reporting the average I/O rate
       at 30 minute intervals aligned on the hour and half-hour.  The I/O
       count metric is a counter, so the average I/O rate requires two valid
       values from consecutive sample times.  There would be values for all
       the intervals ending up to and including 09:00, then no value at 09:30
       because of the <mark> record, then no value at 10:00 because the
       ``prior'' value at 09:30 is not available, then the rate would be
       reported again at 10:30 and continue every 30 minutes until the last
       reported value at 01:00.

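       The reasoning in the paragraph above can be checked with a small
       Python model.  It is deliberately simplified and does not reproduce
       how the PCP reporting tools interpolate values.  STEP, START, END,
       MARKS and the functions are hypothetical names.  Times are minutes
       since midnight of the first day.  A value at a sample time is
       treated as missing whenever a <mark> falls in the interval leading
       up to it.

```python
STEP = 30                          # reporting interval in minutes
START = 10                         # 00:10, first record of the first input
END = 24 * 60 + 70                 # 01:10 next day, last record of the second
MARKS = [9 * 60 + 15 + 1 / 60000]  # <mark> 1 ms after 09:15, in minutes


def value_available(t):
    """Simplified rule: a value exists at sample time t if t is inside the
    output archive and no <mark> falls in the interval (t - STEP, t]."""
    if t < START or t > END:
        return False
    return not any(t - STEP < m <= t for m in MARKS)


def rate_times():
    """Sample times at which an average rate can be reported: the values
    at t and at the prior sample time t - STEP must both be available."""
    return [t for t in range(0, END + 1, STEP)
            if value_available(t) and value_available(t - STEP)]


def hhmm(t):
    """Format minutes as HH:MM, wrapping past midnight."""
    return f"{t // 60 % 24:02d}:{t % 60:02d}"


times = rate_times()
print([hhmm(t) for t in times if 8 * 60 <= t <= 11 * 60])
# ['08:00', '08:30', '09:00', '10:30', '11:00']
print(hhmm(times[-1]))  # 01:00 (the following morning)
```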
       The presence of <mark> records in a PCP archive log can be established
       using pmdumplog(1), where a timestamp and the annotation <mark> are
       used to indicate a <mark> record.

METADATA CHECKS

       When more than one input archive set is specified, pmlogextract
       performs a number of checks to ensure the metadata is consistent for
       metrics appearing in more than one of the input archive sets.  These
       checks include:

       * metric data type is the same
       * metric semantics are the same
       * metric units are the same
       * metric is either always singular or always has the same instance
         domain
       * metrics with the same name have the same PMID
       * metrics with the same PMID have the same name

       If any of these checks fail, pmlogextract reports the details and
       aborts without creating the output archive.

       To address these semantic issues, use pmlogrewrite(1) to translate the
       input archives into equivalent archives with consistent metadata
       before using pmlogextract.

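       The checks listed above can be expressed as a short Python sketch.
       Desc and check_metadata are hypothetical simplifications, and so are
       the string encodings of PMIDs, types, semantics, units and instance
       domains.  None of them is the PCP metadata format or API.  The sketch
       only shows which fields are compared across input archive sets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Desc:
    """Illustrative metric descriptor; field values are plain strings."""
    name: str
    pmid: str
    type: str
    sem: str
    units: str
    indom: Optional[str]  # None for a singular metric


def check_metadata(archives):
    """archives: one list of Desc per input archive set.
    Returns a list of problem descriptions (empty if consistent)."""
    by_name, by_pmid, problems = {}, {}, []
    for descs in archives:
        for d in descs:
            # Same name: PMID, type, semantics, units and indom must match.
            seen = by_name.setdefault(d.name, d)
            for field in ("pmid", "type", "sem", "units", "indom"):
                if getattr(seen, field) != getattr(d, field):
                    problems.append(f"{d.name}: {field} differs")
            # Same PMID: the name must match.
            owner = by_pmid.setdefault(d.pmid, d)
            if owner.name != d.name:
                problems.append(f"PMID {d.pmid}: {owner.name} vs {d.name}")
    return problems


a = Desc("disk.dev.read", "60.0.4", "U64", "counter", "count", "60.1")
b = Desc("disk.dev.read", "60.0.4", "U64", "counter", "Kbyte", "60.1")
print(check_metadata([[a], [b]]))  # ['disk.dev.read: units differs']
```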

FILES

       For each of the input and output archive logs, several physical files
       are used.

       archive.meta
                 metadata (metric descriptions, instance domains, etc.) for
                 the archive log
       archive.0 initial volume of metric values (subsequent volumes have
                 suffixes 1, 2, ...); for input, these files may have been
                 previously compressed with bzip2(1) or gzip(1) and thus may
                 have an additional .bz2 or .gz suffix.
       archive.index
                 temporal index to support rapid random access to the other
                 files in the archive log.

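       The naming scheme above can be illustrated with a Python sketch that
       groups a directory listing into the parts of one archive.
       archive_parts is a hypothetical helper and the listing is made up.
       The sketch only handles the suffixes named above.

```python
import re


def archive_parts(base, names):
    """Split a directory listing into the files of the archive `base`.

    Returns (meta, index, volumes): the .meta and .index file names (or
    None if absent) and the data volumes sorted by volume number. A volume
    may carry a .bz2 or .gz suffix, as input archives may be compressed.
    """
    volume = re.compile(re.escape(base) + r"\.(\d+)(?:\.bz2|\.gz)?")
    meta = base + ".meta" if base + ".meta" in names else None
    index = base + ".index" if base + ".index" in names else None
    found = []
    for name in names:
        match = volume.fullmatch(name)
        if match:
            found.append((int(match.group(1)), name))  # (volume number, file)
    volumes = [name for _, name in sorted(found)]
    return meta, index, volumes


listing = ["20240101.meta", "20240101.index", "20240101.0.gz",
           "20240101.10", "20240101.2.bz2", "other.0"]
print(archive_parts("20240101", listing))
# ('20240101.meta', '20240101.index', ['20240101.0.gz', '20240101.2.bz2', '20240101.10'])
```

       The volumes are sorted numerically rather than by name, so volume 10
       follows volume 2.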

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the
       file and directory names used by PCP.  On each installation, the file
       /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration
       file, as described in pcp.conf(5).

SEE ALSO

       PCPIntro(1), pmdumplog(1), pmlc(1), pmlogger(1), pmlogreduce(1),
       pmlogrewrite(1), pcp.conf(5) and pcp.env(5).

DIAGNOSTICS

       All error conditions detected by pmlogextract are reported on stderr
       with textual (if sometimes terse) explanation.

       Should one of the input archive logs be corrupted (this can happen if
       the pmlogger instance writing the log suddenly dies), then pmlogextract
       will detect and report the position of the corruption in the file, and
       any subsequent information from that archive log will not be processed.

       If any error is detected, pmlogextract will exit with a non-zero
       status.

CAVEATS

       The preamble metrics (pmcd.pmlogger.archive, pmcd.pmlogger.host, and
       pmcd.pmlogger.port), which are automatically recorded by pmlogger at
       the start of the archive, may not be present in the archive output by
       pmlogextract.  These metrics are only relevant while the archive is
       being created, and have no significance once recording has finished.

Performance Co-Pilot                  PCP                      PMLOGEXTRACT(1)