PMLOGEXTRACT(1)             General Commands Manual            PMLOGEXTRACT(1)

NAME

       pmlogextract - reduce, extract, concatenate and merge Performance
       Co-Pilot archives

SYNOPSIS

       pmlogextract [-dfmwxz?] [-c configfile] [-S starttime] [-s samples]
       [-T endtime] [-v volsamples] [-Z timezone] input [...] output

DESCRIPTION

       pmlogextract reads one or more Performance Co-Pilot (PCP) archive logs
       identified by input and creates a temporally merged and/or reduced PCP
       archive log in output. input is a comma-separated list of names, each
       of which may be the base name of an archive or the name of a directory
       containing one or more archives. The nature of merging is controlled
       by the number of input archive logs, while the nature of data reduction
       is controlled by the command line arguments. The input(s) must be sets
       of PCP archive logs created by pmlogger(1) with performance data
       collected from the same host, but usually over different time periods
       and possibly (although not usually) with different performance metrics
       being logged.

       If only one input is specified, then the default behavior simply copies
       the input set of PCP archive logs into the output PCP archive log.
       When two or more sets of PCP archive logs are specified as input, the
       sets of logs are merged (or concatenated) and written to output.

       In the output archive log a <mark> record may be inserted at a time
       just past the end of each of the input archive logs to indicate a
       possible temporal discontinuity between the end of one input archive
       log and the start of the next input archive log. See the MARK RECORDS
       section below for more information. There is no <mark> record after
       the end of the last (in temporal order) of the input archive logs.

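       The merge-and-mark behavior described above can be modeled with a
       short Python sketch. It is a hypothetical illustration, not the
       pmlogextract implementation: each input is assumed to be a list of
       record timestamps in seconds, the source labels (archive0, archive1,
       ...) are invented, and the 1 ms offset for each <mark> follows the
       09:15.000/09:15.001 spacing in the MARK RECORDS example. The
       prologue/epilogue test that can suppress a <mark> is not modeled, so
       the result corresponds to the -m behavior.

```python
import heapq
from typing import List, Tuple

MARK = "<mark>"
Entry = Tuple[float, str]


def merge_archives(inputs: List[List[float]]) -> List[Entry]:
    """Merge per-archive record timestamps (in seconds) into one
    time-ordered stream of (timestamp, source) entries."""
    # Tag every record with a hypothetical name for its input archive;
    # empty inputs are skipped, as described under DIAGNOSTICS.
    streams = [[(t, f"archive{i}") for t in sorted(ts)]
               for i, ts in enumerate(inputs) if ts]
    # A <mark> goes 1 ms past the end of each input, except the input
    # that ends last in temporal order.
    ends = sorted(stream[-1][0] for stream in streams)
    marks = [(end + 0.001, MARK) for end in ends[:-1]]
    return list(heapq.merge(*streams, marks))
```

       For example, merge_archives([[0.0, 600.0], [900.0]]) yields the two
       records from archive0, a <mark> at roughly 600.001, and then the
       record from archive1.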

OPTIONS

       The available command line options are:

       -c config, --config=config
            Extract only the metrics specified in config from the input PCP
            archive log(s). The config syntax accepted by pmlogextract is
            explained in more detail in the CONFIGURATION FILE SYNTAX section.

       -d, --desperate
            Desperate mode. Normally if a fatal error occurs, all trace of
            the partially written PCP archive output is removed. With the -d
            option, the output archive log is not removed.

       -f, --first
            For most common uses, all of the input archive logs will have been
            collected in the same timezone. But if this is not the case, then
            pmlogextract must choose one of the timezones from the input
            archive logs to be used as the timezone for the output archive
            log. The default is to use the timezone from the last input
            archive log. The -f option forces the timezone from the first
            input archive log to be used.

       -m, --mark
            As described in the MARK RECORDS section below, sometimes it is
            possible to safely omit <mark> records from the output archive.
            If the -m option is specified, then the epilogue and prologue test
            is skipped and a <mark> record will always be inserted at the end
            of each input archive (except the last). This is the original
            behavior of pmlogextract.

       -S starttime, --start=starttime
            Define the start of a time window to restrict the samples
            retrieved or specify a ``natural'' alignment of the output sample
            times; refer to PCPIntro(1). See also the -w option.

       -s samples, --samples=samples
            The argument samples defines the number of samples to be written
            to output. If samples is 0 or -s is not specified, pmlogextract
            will sample until the end of the PCP archive log, or the end of
            the time window as specified by -T, whichever comes first. If
            both -s and -T are given, extraction stops at whichever limit is
            reached first.

       -T endtime, --finish=endtime
            Define the termination of a time window to restrict the samples
            retrieved or specify a ``natural'' alignment of the output sample
            times; refer to PCPIntro(1). See also the -w option.

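       The interaction of -S, -T and -s can be sketched as a filter over
       record timestamps. This Python model is an assumption for
       illustration only: times are plain numbers rather than PCP time
       specifications, the ``natural'' alignment of sample times is ignored,
       and select_samples is a hypothetical helper, not part of any PCP API.

```python
from typing import List, Optional


def select_samples(timestamps: List[float],
                   start: float = float("-inf"),
                   end: float = float("inf"),
                   samples: Optional[int] = None) -> List[float]:
    """Return the timestamps inside [start, end], stopping early once
    `samples` records have been taken (0 or None means no limit)."""
    out: List[float] = []
    for t in sorted(timestamps):
        if t < start:
            continue              # before the -S window
        if t > end:
            break                 # past the -T window
        out.append(t)
        if samples and len(out) == samples:
            break                 # the -s limit was reached first
    return out
```

       With the timestamps [0, 10, 20, 30, 40], end=25 keeps [0, 10, 20];
       adding samples=2 stops earlier, at [0, 10].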
       -v volsamples
            The output archive log is potentially a multi-volume data set, and
            the -v option causes pmlogextract to start a new volume after
            volsamples log records have been written to the archive log.

            Independent of any -v option, each volume of an archive is limited
            to no more than 2^31 bytes, so pmlogextract will automatically
            create a new volume for the archive before this limit is reached.

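       The volume-switching rule for -v and the 2^31-byte limit can be
       sketched as follows. This is a hypothetical Python model: record sizes
       are assumed to be known in advance, per-volume header overhead is
       ignored, and assign_volumes is an illustrative name rather than a PCP
       interface.

```python
from typing import List

VOLUME_LIMIT = 2 ** 31  # per-volume byte ceiling described under -v


def assign_volumes(record_sizes: List[int], volsamples: int = 0) -> List[int]:
    """Return the volume number (0, 1, 2, ...) each record is written to."""
    volumes: List[int] = []
    vol, count, size = 0, 0, 0
    for rec in record_sizes:
        full_by_count = volsamples > 0 and count == volsamples
        full_by_size = size + rec > VOLUME_LIMIT
        # Never switch before the current volume holds at least one record.
        if count > 0 and (full_by_count or full_by_size):
            vol, count, size = vol + 1, 0, 0
        volumes.append(vol)
        count += 1
        size += rec
    return volumes
```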
       -w   Where -S and -T specify a time window within the same day, the -w
            flag will cause the data within the time window to be extracted
            for every day in the archive log. For example, the options
            -w -S @11:00 -T @15:00 specify that pmlogextract should include
            archive log records only for the periods from 11am to 3pm on each
            day. When -w is used, the output archive log will contain <mark>
            records to indicate the temporal discontinuity between the end of
            one time window and the start of the next.

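       The -w selection can be sketched in Python under stated assumptions:
       record timestamps are naive datetime values, the window does not
       cross midnight (matching the same-day requirement above), and the
       <mark> is placed 1 ms after the last record of each window, which is
       an illustrative choice rather than documented pmlogextract behavior.
       The helper daily_window is hypothetical.

```python
from datetime import datetime, time, timedelta
from typing import List, Tuple

Entry = Tuple[datetime, str]


def daily_window(records: List[datetime], start: time, end: time) -> List[Entry]:
    """Keep records whose time of day lies in [start, end] on every day,
    adding a <mark> 1 ms after each window that is followed by another."""
    kept = [t for t in sorted(records) if start <= t.time() <= end]
    out: List[Entry] = []
    for i, t in enumerate(kept):
        out.append((t, "record"))
        # A change of calendar day means the next record opens a new window.
        if i + 1 < len(kept) and kept[i + 1].date() != t.date():
            out.append((t + timedelta(milliseconds=1), "<mark>"))
    return out
```

       Calling daily_window(records, time(11), time(15)) mirrors the
       -w -S @11:00 -T @15:00 example.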
       -x   It is expected that the metadata (name, PMID, type, semantics and
            units) for each metric will be consistent across all of the input
            PCP archive log(s) in which that metric appears. In rare cases,
            e.g. in development, in QA and when a PMDA is upgraded, this may
            not be the case, and pmlogextract will report the issue and abort
            without creating the output archive log. This is done so the
            problem can be fixed with pmlogrewrite(1) before retrying the
            merge. In unattended or QA environments it may be preferable to
            force the merge and omit the metrics with the mismatched metadata.
            The -x option does this.

       -Z timezone, --timezone=timezone
            Use timezone when displaying the date and time. Timezone is in
            the format of the environment variable TZ as described in
            environ(7). The default is to initially use the timezone of the
            local host.

       -z, --hostzone
            Use the local timezone of the host from the input archive logs.
            The default is to initially use the timezone of the local host.

       -?   Display usage message and exit.

CONFIGURATION FILE SYNTAX

       The configfile contains metrics of interest; only those metrics (or
       instances) mentioned explicitly or implicitly in the configuration file
       will be included in the output archive. Each specification must begin
       on a new line, and may span multiple lines in the configuration file.
       Instances may also be specified, but they are optional. The format for
       each specification is

               metric [[instance[,instance...]]]

       where metric may be a leaf or a non-leaf name in the Performance
       Metrics Name Space (PMNS, see PMNS(5)). If a metric refers to a
       non-leaf node in the PMNS, pmlogextract will recursively descend the
       PMNS and include all metrics corresponding to descendant leaf nodes.

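       The recursive descent through the PMNS can be approximated by prefix
       matching on dotted names. This Python sketch assumes the PMNS is
       available as a flat list of leaf names (the names used with it are
       hypothetical), and it does not use any PCP library call.

```python
from typing import Iterable, List


def expand_metric(name: str, leaves: Iterable[str]) -> List[str]:
    """Return the leaf metric names selected by `name`.

    A leaf name selects itself; a non-leaf name selects every leaf
    below it in the dot-separated hierarchy.
    """
    return sorted(leaf for leaf in leaves
                  if leaf == name or leaf.startswith(name + "."))
```

       Appending "." before matching keeps a partial component such as
       kernel.all.c from selecting kernel.all.cpu.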
       Instances are optional, and may be specified as a list of one or more
       space (or comma) separated names, numbers or strings (enclosed in
       single or double quotes). Elements in the list that are numbers are
       assumed to be internal instance identifiers; see pmGetInDom(3) for
       more information. If no instances are given, then all instances of
       the associated metric(s) will be extracted.

       Any additional white space is ignored and comments may be added with a
       `#' prefix.

CONFIGURATION FILE EXAMPLE

       This is an example of a valid configfile:

               #
               # config file for pmlogextract
               #

               kernel.all.cpu
               kernel.percpu.cpu.sys ["cpu0","cpu1"]
               disk.dev ["dks0d1"]

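       The syntax rules and the example above can be exercised with a small
       parser. This Python sketch is a hypothetical re-implementation, not
       pmlogextract's own parser: it maps each metric to its list of
       instances (None meaning all instances), keeps quoted names as
       strings, turns bare numbers into internal instance identifiers, and
       neither enforces that each specification starts on a new line nor
       resolves names against the PMNS.

```python
import re
from typing import Dict, List, Optional, Union

Instance = Union[int, str]
# White space, a # comment, a "string", a 'string', [ ] or , and bare words.
TOKEN = re.compile(r"""\s+|#[^\n]*|"([^"]*)"|'([^']*)'|([\[\],])|([^\s\[\],"'#]+)""")


def parse_config(text: str) -> Dict[str, Optional[List[Instance]]]:
    """Map each metric to its instance list; None means all instances."""
    specs: Dict[str, Optional[List[Instance]]] = {}
    metric: Optional[str] = None
    current: List[Instance] = []
    in_list = False
    for m in TOKEN.finditer(text):
        dq, sq, punct, word = m.groups()
        quoted = dq if dq is not None else sq
        if punct == "[" and metric is not None:
            in_list = True
            current = specs[metric] = []
        elif punct == "]":
            in_list = False
        elif in_list and quoted is not None:
            current.append(quoted)                 # quoted instance name
        elif in_list and word is not None:
            # Bare numbers are internal instance identifiers.
            current.append(int(word) if word.isdigit() else word)
        elif word is not None:
            metric = word                          # start of a new specification
            specs[metric] = None
        # Anything else (white space, comments, commas) only separates tokens.
    return specs
```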

MARK RECORDS

       When more than one input archive log contributes performance data to
       the output archive log, <mark> records may be inserted to indicate a
       possible discontinuity in the performance data.

       A <mark> record contains a timestamp and no performance data, and is
       used to indicate that there is a time period in the PCP archive log
       where we do not know the values of any performance metrics, because
       there was no pmlogger(1) collecting performance data during this
       period. Since these periods are often associated with the restart of
       a service or pmcd(1) or a system, there may be considerable doubt as
       to the continuity of performance data across this time period.

       Most current archives are created with a prologue record at the
       beginning and an epilogue record at the end. These records identify
       the state of pmcd(1) at the time, and may be used by pmlogextract to
       determine that there is no discontinuity between the end of one
       archive and the next output record; in that case the <mark> record
       can safely be omitted from the output archive.

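       The decision described above can be outlined as follows. The
       Boundary type and its pmcd_state field are hypothetical stand-ins:
       this manual page does not say which fields of the prologue and
       epilogue records pmlogextract compares. Treating a missing prologue
       or epilogue as a reason to insert a <mark> is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Boundary:
    """Hypothetical summary of a prologue or epilogue record."""
    timestamp: float
    pmcd_state: str  # stand-in for whatever identifies the pmcd(1) state


def needs_mark(epilogue: Optional[Boundary], prologue: Optional[Boundary],
               always_mark: bool = False) -> bool:
    """Decide whether a <mark> is needed between two adjacent inputs."""
    if always_mark:                    # -m: skip the test entirely
        return True
    if epilogue is None or prologue is None:
        return True                    # assumed: continuity cannot be shown
    return epilogue.pmcd_state != prologue.pmcd_state
```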
       The rationale behind <mark> records may be demonstrated with an
       example. Consider one input archive log that starts at 00:10 and ends
       at 09:15 on the same day, and another input archive log that starts at
       09:20 on the same day and ends at 01:10 the following morning. This
       would be a very common case for archives managed and rotated by
       pmlogger_check(1) and pmlogger_daily(1).

       The output archive log created by pmlogextract would contain:

       00:10.000    first record from first input archive log
       ...
       09:15.000    last record from first input archive log
       09:15.001    <mark> record
       09:20.000    first record from second input archive log
       ...
       01:10.000    last record from second input archive log

       The time period where the performance data is missing starts just
       after 09:15 and ends just before 09:20. When the output archive log is
       processed with any of the PCP reporting tools, the <mark> record is
       used to indicate a period of missing data. For example, using the
       output archive above, imagine one was reporting the average I/O rate
       at 30 minute intervals aligned on the hour and half-hour. The I/O
       count metric is a counter, so the average I/O rate requires two valid
       values from consecutive sample times. There would be values for all
       the intervals up to and including the one ending at 09:00, then no
       value at 09:30 because of the <mark> record, then no value at 10:00
       because the ``prior'' value at 09:30 is not available; the rate would
       then be reported again at 10:30 and continue every 30 minutes until
       the last reported value at 01:00.

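       The reasoning in the I/O rate example can be checked with a
       deliberately simplified model. It is an assumption, not PCP's
       interpolation algorithm: the value at a grid time t is treated as
       unavailable when a <mark> falls in the interval (t - step, t], and a
       rate at t needs valid values at both t - step and t. The grid and the
       <mark> time are taken from the example above; reported_rates is a
       hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List


def reported_rates(grid: List[datetime], marks: List[datetime],
                   step: timedelta) -> List[datetime]:
    """Return the grid times at which a counter rate can be reported."""
    def valid(t: datetime) -> bool:
        # Simplified rule: a <mark> inside (t - step, t] voids the value at t.
        return not any(t - step < m <= t for m in marks)

    # A rate needs valid values at both ends of the interval.
    return [t for t in grid if valid(t) and valid(t - step)]
```

       With a 30 minute grid from 08:30 to 11:00 and the <mark> at
       09:15.001, this reports rates at 08:30, 09:00, 10:30 and 11:00,
       skipping 09:30 and 10:00 as the text describes.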
       The presence of <mark> records in a PCP archive log can be established
       using pmdumplog(1), where a timestamp and the annotation <mark> are
       used to indicate a <mark> record.

METADATA CHECKS

       When more than one input archive set is specified, pmlogextract
       performs a number of checks to ensure the metadata is consistent for
       metrics appearing in more than one of the input archive sets. These
       checks include:

       * metric data type is the same
       * metric semantics are the same
       * metric units are the same
       * metric is either always singular or always has the same instance
         domain
       * metrics with the same name have the same PMID
       * metrics with the same PMID have the same name

       If any of these checks fail, pmlogextract reports the details and
       aborts without creating the output archive (unless the -x option is
       specified, in which case the affected metrics are omitted).

       To address these semantic issues, use pmlogrewrite(1) to translate the
       input archives into equivalent archives with consistent metadata
       before using pmlogextract.

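       The checks listed above, together with the abort-or-omit choice
       controlled by -x, can be sketched as follows. The Desc record, its
       field names and the PMID strings used with it are hypothetical; this
       illustrates the rules, not the libpcp data structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Desc:
    """Hypothetical metadata for one metric as read from one archive."""
    pmid: str
    dtype: str
    sem: str
    units: str
    indom: Optional[str]  # None for a singular metric


def conflicting_metrics(archives: List[Dict[str, Desc]]) -> Set[str]:
    """Return the names of metrics whose metadata disagrees across archives."""
    first_desc: Dict[str, Desc] = {}  # metric name -> first descriptor seen
    first_name: Dict[str, str] = {}   # PMID -> first metric name seen
    bad: Set[str] = set()
    for archive in archives:
        for name, desc in archive.items():
            # Same name: PMID, type, semantics, units and indom must match.
            if first_desc.setdefault(name, desc) != desc:
                bad.add(name)
            # Same PMID: the metric name must match.
            other = first_name.setdefault(desc.pmid, name)
            if other != name:
                bad.update((name, other))
    return bad


def check_or_omit(archives: List[Dict[str, Desc]], omit_mismatched: bool) -> Set[str]:
    """Without -x, abort on any mismatch; with -x, return the metrics to omit."""
    bad = conflicting_metrics(archives)
    if bad and not omit_mismatched:
        raise ValueError(f"inconsistent metadata for {sorted(bad)}")
    return bad
```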

CAVEATS

       The preamble metrics (pmcd.pmlogger.archive, pmcd.pmlogger.host, and
       pmcd.pmlogger.port), which are automatically recorded by pmlogger at
       the start of the archive, may not be present in the archive output by
       pmlogextract. These metrics are only relevant while the archive is
       being created, and have no significance once recording has finished.

DIAGNOSTICS

       All error conditions detected by pmlogextract are reported on stderr
       with textual (if sometimes terse) explanation.

       If one of the input archives contains no archive records, then an
       ``empty archive'' warning is issued and that archive is skipped.

       Should one of the input archive logs be corrupted (this can happen if
       the pmlogger instance writing the log suddenly dies), then
       pmlogextract will detect and report the position of the corruption in
       the file, and any subsequent information from that archive log will
       not be processed.

       If any error is detected, pmlogextract will exit with a non-zero
       status.

FILES

       For each of the input and output archive logs, several physical files
       are used.

       archive.meta
            metadata (metric descriptions, instance domains, etc.) for the
            archive log

       archive.0
            initial volume of metric values (subsequent volumes have suffixes
            1, 2, ...); for input these files may have been previously
            compressed with bzip2(1) or gzip(1) and thus may have an
            additional .bz2 or .gz suffix.

       archive.index
            temporal index to support rapid random access to the other files
            in the archive log.

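       The file naming scheme above can be illustrated with a helper that
       picks out the data volumes of an archive from a list of file names.
       The helper data_volumes is hypothetical: it does not open or validate
       any file, and it recognizes only the .bz2 and .gz suffixes mentioned
       above.

```python
import re
from typing import Iterable, List, Tuple

# base name, a numeric volume suffix, then an optional compression suffix
VOLUME = re.compile(r"^(?P<base>.+)\.(?P<vol>\d+)(?:\.(?:bz2|gz))?$")


def data_volumes(base: str, filenames: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (volume number, file name) pairs for the data volumes of
    `base`, in numeric volume order."""
    found: List[Tuple[int, str]] = []
    for name in filenames:
        m = VOLUME.match(name)
        # .meta and .index files never match because their suffix is not numeric.
        if m and m.group("base") == base:
            found.append((int(m.group("vol")), name))
    return sorted(found)
```

       Sorting on the integer volume number keeps archive.10 after
       archive.2, which a plain string sort would not.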

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the
       file and directory names used by PCP. On each installation, the file
       /etc/pcp.conf contains the local values for these variables. The
       $PCP_CONF variable may be used to specify an alternative configuration
       file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO

       PCPIntro(1), pmdumplog(1), pmlc(1), pmlogger(1), pmlogreduce(1),
       pmlogrewrite(1), pcp.conf(5), pcp.env(5) and PMNS(5).

Performance Co-Pilot                  PCP                      PMLOGEXTRACT(1)