PMLOGEXTRACT(1)             General Commands Manual            PMLOGEXTRACT(1)

NAME

       pmlogextract - reduce, extract, concatenate and merge Performance
       Co-Pilot archives

SYNOPSIS

       pmlogextract [-dfmwxz?] [-c configfile] [-S starttime] [-s samples]
       [-T endtime] [-v volsamples] [-Z timezone] input [...] output

DESCRIPTION

       pmlogextract reads one or more Performance Co-Pilot (PCP) archive logs
       identified by input and creates a temporally merged and/or reduced PCP
       archive log in output. input is a comma-separated list of names, each
       of which may be the base name of an archive or the name of a directory
       containing one or more archives. The nature of merging is controlled
       by the number of input archive logs, while the nature of data reduction
       is controlled by the command line arguments. The input(s) must be sets
       of PCP archive logs created by pmlogger(1) with performance data
       collected from the same host, but usually over different time periods
       and possibly (although not usually) with different performance metrics
       being logged.

       If only one input is specified, then the default behavior simply copies
       the input set of PCP archive logs into the output PCP archive log.
       When two or more sets of PCP archive logs are specified as input, the
       sets of logs are merged (or concatenated) and written to output.
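
       As an illustration only, the temporal merge described above can be
       modelled in Python with toy records. The record tuples and the
       merge_archives name are assumptions made for this sketch; it does not
       read real PCP archives and is not pmlogextract's implementation.

```python
import heapq

# Each toy "archive" is a list of (timestamp_seconds, payload) records,
# already sorted by time, standing in for real PCP archive records.
archive_a = [(0, "a0"), (60, "a1"), (120, "a2")]
archive_b = [(30, "b0"), (90, "b1"), (300, "b2")]


def merge_archives(*archives):
    """Return all records from the inputs as one time-ordered list."""
    return list(heapq.merge(*archives, key=lambda record: record[0]))


merged = merge_archives(archive_a, archive_b)
print([payload for _, payload in merged])
```

       With a single input the same function simply returns a copy of that
       input, which mirrors the single-archive case described above.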

       In the output archive log a <mark> record may be inserted at a time
       just past the end of each of the input archive logs to indicate a
       possible temporal discontinuity between the end of one input archive
       log and the start of the next input archive log. See the MARK RECORDS
       section below for more information. There is no <mark> record after
       the end of the last (in temporal order) of the input archive logs.

OPTIONS

       The available command line options are:

       -c config, --config=config
            Extract only the metrics specified in config from the input PCP
            archive log(s). The config syntax accepted by pmlogextract is
            explained in more detail in the CONFIGURATION FILE SYNTAX section
            below.

       -d, --desperate
            Desperate mode. Normally if a fatal error occurs, all trace of
            the partially written PCP archive output is removed. With the -d
            option, the output archive log is not removed.

       -f, --first
            For most common uses, all of the input archive logs will have been
            collected in the same timezone. But if this is not the case, then
            pmlogextract must choose one of the timezones from the input
            archive logs to be used as the timezone for the output archive
            log. The default is to use the timezone from the last input
            archive log. The -f option forces the timezone from the first
            input archive log to be used.

       -m, --mark
            As described in the MARK RECORDS section below, sometimes it is
            possible to safely omit <mark> records from the output archive.
            If the -m option is specified, then the epilogue and prologue test
            is skipped and a <mark> record will always be inserted at the end
            of each input archive (except the last). This is the original
            behavior for pmlogextract.

       -S starttime, --start=starttime
            Define the start of a time window to restrict the samples
            retrieved or specify a ``natural'' alignment of the output sample
            times; refer to PCPIntro(1). See also the -w option.

       -s samples, --samples=samples
            The argument samples defines the number of samples to be written
            to output. If samples is 0 or -s is not specified, pmlogextract
            will sample until the end of the PCP archive log, or the end of
            the time window as specified by -T, whichever comes first. The -s
            option overrides the -T option if the requested number of samples
            is reached before endtime.

       -T endtime, --finish=endtime
            Define the termination of a time window to restrict the samples
            retrieved or specify a ``natural'' alignment of the output sample
            times; refer to PCPIntro(1). See also the -w option.

       -v volsamples
            The output archive log is potentially a multi-volume data set, and
            the -v option causes pmlogextract to start a new volume after
            volsamples log records have been written to the archive log.

            Independent of any -v option, each volume of an archive is limited
            to no more than 2^31 bytes, so pmlogextract will automatically
            create a new volume for the archive before this limit is reached.

       -w   Where -S and -T specify a time window within the same day, the -w
            flag will cause the data within the time window to be extracted,
            for every day in the archive log. For example, the options -w -S
            @11:00 -T @15:00 specify that pmlogextract should include archive
            log records only for the periods from 11am to 3pm on each day.
            When -w is used, the output archive log will contain <mark>
            records to indicate the temporal discontinuity between the end of
            one time window and the start of the next.

       -x   It is expected that the metadata (name, PMID, type, semantics and
            units) for each metric will be consistent across all of the input
            PCP archive log(s) in which that metric appears. In rare cases,
            e.g. in development, in QA and when a PMDA is upgraded, this may
            not be the case and pmlogextract will report the issue and abort
            without creating the output archive log. This is done so the
            problem can be fixed with pmlogrewrite(1) before retrying the
            merge. In unattended or QA environments it may be preferable to
            force the merge and omit the metrics with the mismatched metadata.
            The -x option does this.

       -Z timezone, --timezone=timezone
            Use timezone when displaying the date and time. timezone is in
            the format of the environment variable TZ as described in
            environ(7). The default is to initially use the timezone of the
            local host.

       -z   Use the local timezone of the host from the input archive logs.
            The default is to initially use the timezone of the local host.

       -?   Display usage message and exit.
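
       The effect of -w with -S @11:00 -T @15:00, as described above, can be
       pictured with the following Python sketch. The function names, the
       sample dates and the treatment of both window boundaries as inclusive
       are assumptions made for illustration; the sketch filters plain
       timestamps rather than real archive records.

```python
from datetime import datetime, time


def in_daily_window(stamp, start, end):
    """True if the time of day of stamp lies within [start, end]."""
    return start <= stamp.time() <= end


def extract_daily_windows(stamps, start, end):
    """Keep timestamps inside the window on every day, and emit the string
    "<mark>" where one day's window ends and the next day's begins."""
    out = []
    previous_day = None
    for stamp in stamps:
        if not in_daily_window(stamp, start, end):
            continue
        if previous_day is not None and stamp.date() != previous_day:
            out.append("<mark>")
        out.append(stamp)
        previous_day = stamp.date()
    return out


stamps = [
    datetime(2024, 1, 1, 10, 30),  # before the window: dropped
    datetime(2024, 1, 1, 11, 0),
    datetime(2024, 1, 1, 14, 59),
    datetime(2024, 1, 1, 16, 0),   # after the window: dropped
    datetime(2024, 1, 2, 12, 0),   # next day: preceded by a <mark>
]
selected = extract_daily_windows(stamps, time(11, 0), time(15, 0))
print(selected)
```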
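
       The volume switching described for -v can be sketched in the same
       hedged way. The assign_volumes function is hypothetical, and counting
       volsamples per volume is an assumption of this sketch; only the 2^31
       byte limit comes from the text above.

```python
MAX_VOLUME_BYTES = 2**31  # per-volume size limit stated for -v above


def assign_volumes(record_sizes, volsamples=None):
    """Return the volume number for each record, given record sizes in bytes.

    A new volume starts once volsamples records have been written to the
    current volume, or earlier if adding the next record would reach the
    size limit.
    """
    volumes = []
    volume, count, size = 0, 0, 0
    for record_size in record_sizes:
        full_by_count = volsamples is not None and count >= volsamples
        full_by_size = size + record_size >= MAX_VOLUME_BYTES
        if count > 0 and (full_by_count or full_by_size):
            volume, count, size = volume + 1, 0, 0
        volumes.append(volume)
        count += 1
        size += record_size
    return volumes


print(assign_volumes([100] * 5, volsamples=2))
print(assign_volumes([2**30, 2**30, 10]))
```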

CONFIGURATION FILE SYNTAX

       The configfile lists the metrics of interest; only those metrics (or
       instances) mentioned explicitly or implicitly in the configuration file
       will be included in the output archive. Each specification must begin
       on a new line, and may span multiple lines in the configuration file.
       Instances may also be specified, but they are optional. The format for
       each specification is

               metric [[instance[,instance...]]]

       where metric may be a leaf or a non-leaf name in the Performance
       Metrics Name Space (PMNS, see PMNS(5)). If a metric refers to a
       non-leaf node in the PMNS, pmlogextract will recursively descend the
       PMNS and include all metrics corresponding to descendant leaf nodes.
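
       The recursive descent from a non-leaf name can be illustrated with a
       toy name space. The TOY_PMNS dictionary and the expand_metric function
       are assumptions made for this sketch, and the metric names in it are
       illustrative rather than taken from a real PMNS.

```python
# Toy name space: nested dicts are non-leaf nodes, None marks a leaf metric.
TOY_PMNS = {
    "kernel": {
        "all": {"cpu": {"user": None, "sys": None}, "load": None},
    },
    "disk": {"dev": {"read": None, "write": None}},
}


def expand_metric(name, pmns=TOY_PMNS):
    """Return the leaf metric names at or below the dotted name."""
    node = pmns
    for part in name.split("."):
        node = node[part]  # KeyError if the name is not in the name space

    def descend(prefix, subtree):
        if subtree is None:  # a leaf, i.e. an actual metric
            return [prefix]
        leaves = []
        for child, below in subtree.items():
            leaves.extend(descend(prefix + "." + child, below))
        return leaves

    return descend(name, node)


print(expand_metric("kernel.all.cpu"))
```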

       Instances are optional, and may be specified as a list of one or more
       space (or comma) separated names, numbers or strings (enclosed in
       single or double quotes). Elements in the list that are numbers are
       assumed to be internal instance identifiers; see pmGetInDom(3) for
       more information. If no instances are given, then all instances of the
       associated metric(s) will be extracted.

       Any additional white space is ignored and comments may be added with a
       `#' prefix.

CONFIGURATION FILE EXAMPLE

       This is an example of a valid configfile:

               #
               # config file for pmlogextract
               #

               kernel.all.cpu
               kernel.percpu.cpu.sys ["cpu0","cpu1"]
               disk.dev ["dks0d1"]
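
       To make the syntax concrete, the following Python sketch parses a
       configfile like the one above into (metric, instances) pairs. It is a
       simplified reading of the rules in the CONFIGURATION FILE SYNTAX
       section, not the parser used by pmlogextract: parse_config and both
       regular expressions are assumptions, and a `#' inside a quoted
       instance name is not handled.

```python
import re

# A metric name, optionally followed by a bracketed instance list that may
# span several lines.
SPEC_RE = re.compile(r"([A-Za-z][\w.]*)\s*(?:\[(.*?)\])?", re.DOTALL)
# One instance: a double-quoted string, a single-quoted string, or a bare
# token delimited by white space or commas.
INSTANCE_RE = re.compile(r"\"([^\"]*)\"|'([^']*)'|([^\s,]+)")


def parse_config(text):
    """Return a list of (metric, instances) pairs. instances holds names
    (str) and internal identifiers (int); an empty list means "all"."""
    # Drop comments first.
    text = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
    specs = []
    for metric, inst_text in SPEC_RE.findall(text):
        instances = []
        for dq, sq, bare in INSTANCE_RE.findall(inst_text):
            if bare.isdigit():
                instances.append(int(bare))  # internal instance identifier
            else:
                instances.append(dq or sq or bare)
        specs.append((metric, instances))
    return specs


example = """
#
# config file for pmlogextract
#

kernel.all.cpu
kernel.percpu.cpu.sys ["cpu0","cpu1"]
disk.dev ["dks0d1"]
"""
for spec in parse_config(example):
    print(spec)
```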

MARK RECORDS

       When more than one input archive log contributes performance data to
       the output archive log, then <mark> records may be inserted to indicate
       a possible discontinuity in the performance data.

       A <mark> record contains a timestamp and no performance data and is
       used to indicate that there is a time period in the PCP archive log
       where we do not know the values of any performance metrics, because
       there was no pmlogger(1) collecting performance data during this
       period. Since these periods are often associated with the restart of a
       service or pmcd(1) or a system, there may be considerable doubt as to
       the continuity of performance data across this time period.

       Most current archives are created with a prologue record at the
       beginning and an epilogue record at the end. These records identify
       the state of pmcd(1) at the time, and may be used by pmlogextract to
       determine that there is no discontinuity between the end of one archive
       and the next output record, and as a consequence the <mark> record can
       safely be omitted from the output archive.

       The rationale behind <mark> records may be demonstrated with an
       example. Consider one input archive log that starts at 00:10 and ends
       at 09:15 on the same day, and another input archive log that starts at
       09:20 on the same day and ends at 01:10 the following morning. This
       would be a very common case for archives managed and rotated by
       pmlogger_check(1) and pmlogger_daily(1).

       The output archive log created by pmlogextract would contain:
       00:10.000    first record from first input archive log
       ...
       09:15.000    last record from first input archive log
       09:15.001    <mark> record
       09:20.000    first record from second input archive log
       ...
       01:10.000    last record from second input archive log

       The time period where the performance data is missing starts just after
       09:15 and ends just before 09:20. When the output archive log is
       processed with any of the PCP reporting tools, the <mark> record is
       used to indicate a period of missing data. For example, using the
       output archive above, imagine reporting the average I/O rate at 30
       minute intervals aligned on the hour and half-hour. The I/O count
       metric is a counter, so the average I/O rate requires two valid values
       from consecutive sample times. There would be values for all the
       intervals ending at 09:00, then no values at 09:30 because of the
       <mark> record, then no values at 10:00 because the ``prior'' value at
       09:30 is not available, then the rate would be reported again at 10:30
       and continue every 30 minutes until the last reported value at 01:00.
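
       The reasoning in this example can be checked with a small Python
       model. It is a toy restatement of the rule given above (a counter
       rate needs valid values at two consecutive report times, and a
       <mark> invalidates the value at the next report time), not a
       description of how the PCP reporting tools work internally. The
       function names are assumptions, and the end time of the second
       archive is taken from the listing above (01:10).

```python
INTERVAL = 30  # reporting interval in minutes


def to_minutes(hhmm, day=0):
    """Convert "HH:MM" on the given day to minutes since the first midnight."""
    hours, minutes = hhmm.split(":")
    return day * 24 * 60 + int(hours) * 60 + int(minutes)


# Coverage of the two input archives, and the <mark> just after the first.
archives = [
    (to_minutes("00:10"), to_minutes("09:15")),
    (to_minutes("09:20"), to_minutes("01:10", day=1)),
]
marks = [to_minutes("09:15")]  # the extra millisecond is ignored here


def value_available(t):
    """A value exists at report time t if an archive covers t and no
    <mark> falls in the interval (t - INTERVAL, t]."""
    covered = any(start <= t <= end for start, end in archives)
    crossed_mark = any(t - INTERVAL < m <= t for m in marks)
    return covered and not crossed_mark


def rate_reported(t):
    """A counter rate at t needs valid values at t and at t - INTERVAL."""
    return value_available(t) and value_available(t - INTERVAL)


for label in ("09:00", "09:30", "10:00", "10:30"):
    print(label, rate_reported(to_minutes(label)))
```

       Under these assumptions the model reports a rate at 09:00, none at
       09:30 or 10:00, and a rate again from 10:30 onwards.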

       The presence of <mark> records in a PCP archive log can be established
       using pmdumplog(1) where a timestamp and the annotation <mark> is used
       to indicate a <mark> record.

METADATA CHECKS

       When more than one input archive set is specified, pmlogextract
       performs a number of checks to ensure the metadata is consistent for
       metrics appearing in more than one of the input archive sets. These
       checks include:

       * metric data type is the same
       * metric semantics are the same
       * metric units are the same
       * metric is either always singular or always has the same instance
         domain
       * metrics with the same name have the same PMID
       * metrics with the same PMID have the same name

       If any of these checks fail, pmlogextract reports the details and
       aborts without creating the output archive.
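
       The checks listed above amount to comparing descriptors field by
       field. The following Python sketch shows the idea; the Desc tuple,
       its field names and the PMID strings are hypothetical simplifications,
       with indom set to None standing for a singular metric.

```python
from collections import namedtuple

# Hypothetical, simplified descriptor; real PCP metadata carries more detail.
Desc = namedtuple("Desc", "name pmid type sem units indom")


def check_metadata(archive_a, archive_b):
    """Return a list of readable mismatches between two lists of Desc."""
    problems = []
    by_name = {d.name: d for d in archive_a}
    by_pmid = {d.pmid: d for d in archive_a}
    for d in archive_b:
        same_name = by_name.get(d.name)
        if same_name is not None:
            # Same name: type, semantics, units, instance domain and PMID
            # must all agree.
            for field in ("type", "sem", "units", "indom", "pmid"):
                if getattr(same_name, field) != getattr(d, field):
                    problems.append(f"{d.name}: {field} differs")
        same_pmid = by_pmid.get(d.pmid)
        if same_pmid is not None and same_pmid.name != d.name:
            # Same PMID: the name must agree.
            problems.append(f"PMID {d.pmid}: name differs")
    return problems


first = [Desc("disk.dev.read", "60.0.4", "U64", "counter", "count", "60.1")]
second = [Desc("disk.dev.read", "60.0.4", "U32", "counter", "count", "60.1")]
print(check_metadata(first, second))
```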

       To address these semantic issues, use pmlogrewrite(1) to translate the
       input archives into equivalent archives with consistent metadata
       before using pmlogextract.

CAVEATS

       The preamble metrics (pmcd.pmlogger.archive, pmcd.pmlogger.host, and
       pmcd.pmlogger.port), which are automatically recorded by pmlogger at
       the start of the archive, may not be present in the archive output by
       pmlogextract. These metrics are only relevant while the archive is
       being created, and have no significance once recording has finished.

DIAGNOSTICS

       All error conditions detected by pmlogextract are reported on stderr
       with a textual (if sometimes terse) explanation.

       If one of the input archives contains no archive records then an
       ``empty archive'' warning is issued and that archive is skipped.

       Should one of the input archive logs be corrupted (this can happen if
       the pmlogger instance writing the log suddenly dies), then pmlogextract
       will detect and report the position of the corruption in the file, and
       any subsequent information from that archive log will not be processed.

       If any error is detected, pmlogextract will exit with a non-zero
       status.

FILES

       For each of the input and output archive logs, several physical files
       are used.

       archive.meta
            metadata (metric descriptions, instance domains, etc.) for the
            archive log

       archive.0
            initial volume of metric values (subsequent volumes have suffixes
            1, 2, ...); for input these files may have been previously
            compressed with bzip2(1) or gzip(1) and thus may have an
            additional .bz2 or .gz suffix.

       archive.index
            temporal index to support rapid random access to the other files
            in the archive log.
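
       As a small illustration of this naming scheme, the Python sketch below
       sorts a list of file names into the components of one archive. The
       classify_archive_files function and the sample base name are
       assumptions made for the example; only the suffixes come from the
       list above.

```python
import re


def classify_archive_files(base, filenames):
    """Group the file names belonging to archive base into metadata, index
    and data volumes (volume number -> file name)."""
    parts = {"meta": None, "index": None, "volumes": {}}
    # Data volumes: base.N, optionally followed by .bz2 or .gz.
    volume_re = re.compile(re.escape(base) + r"\.(\d+)(\.bz2|\.gz)?$")
    for name in filenames:
        if name == base + ".meta":
            parts["meta"] = name
        elif name == base + ".index":
            parts["index"] = name
        else:
            match = volume_re.match(name)
            if match:
                parts["volumes"][int(match.group(1))] = name
    return parts


listing = ["20240101.meta", "20240101.index", "20240101.0.gz",
           "20240101.1", "other.0"]
print(classify_archive_files("20240101", listing))
```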

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the
       file and directory names used by PCP. On each installation, the file
       /etc/pcp.conf contains the local values for these variables. The
       $PCP_CONF variable may be used to specify an alternative configuration
       file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO

       PCPIntro(1), pmdumplog(1), pmlc(1), pmlogger(1), pmlogreduce(1),
       pmlogrewrite(1), pcp.conf(5), pcp.env(5) and PMNS(5).

Performance Co-Pilot                  PCP                      PMLOGEXTRACT(1)