pmlogextract(1)

PMLOGEXTRACT(1)             General Commands Manual            PMLOGEXTRACT(1)

NAME

       pmlogextract - reduce, extract, concatenate and merge Performance
       Co-Pilot archives

SYNOPSIS

       pmlogextract [-dfmwxz?] [-c configfile] [-S starttime] [-s samples]
       [-T endtime] [-v volsamples] [-Z timezone] input [...] output

DESCRIPTION

       pmlogextract reads one or more Performance Co-Pilot (PCP) archive logs
       identified by input and creates a temporally merged and/or reduced PCP
       archive log in output.  input is a comma-separated list of names, each
       of which may be the base name of an archive or the name of a directory
       containing one or more archives.  The nature of merging is controlled
       by the number of input archive logs, while the nature of data reduction
       is controlled by the command line arguments.  The input(s) must be sets
       of PCP archive logs created by pmlogger(1) with performance data
       collected from the same host, but usually over different time periods
       and possibly (although not usually) with different performance metrics
       being logged.

       If only one input is specified, then the default behavior simply copies
       the input set of PCP archive logs into the output PCP archive log.
       When two or more sets of PCP archive logs are specified as input, the
       sets of logs are merged (or concatenated) and written to output.

       In the output archive log a <mark> record may be inserted at a time
       just past the end of each of the input archive logs to indicate a
       possible temporal discontinuity between the end of one input archive
       log and the start of the next input archive log.  See the MARK RECORDS
       section below for more information.  There is no <mark> record after
       the end of the last (in temporal order) of the input archive logs.
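
       The concatenation behaviour described above can be pictured with a
       small Python model.  This is only a sketch under stated assumptions,
       not the pmlogextract implementation: archives are non-empty lists of
       (timestamp, label) pairs that do not overlap in time, timestamps are
       integer milliseconds, a <mark> is always inserted (as with -m), and
       the 1 ms offset is borrowed from the timeline in the MARK RECORDS
       section.  The function name merge_with_marks is hypothetical.

```python
def merge_with_marks(archives, gap_ms=1):
    """Concatenate archives in temporal order and add a <mark> record just
    past the end of every archive except the last one.

    Assumption: each archive is a non-empty list of (timestamp_ms, label)
    pairs, already sorted, and archives do not overlap in time.
    """
    ordered = sorted(archives, key=lambda records: records[0][0])
    merged = []
    for position, records in enumerate(ordered):
        merged.extend(records)
        if position < len(ordered) - 1:
            merged.append((records[-1][0] + gap_ms, "<mark>"))
    return merged


first = [(0, "a0"), (1000, "a1")]
second = [(5000, "b0"), (6000, "b1")]
# The order of the inputs does not matter; output is in temporal order.
for record in merge_with_marks([second, first]):
    print(record)
```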

OPTIONS

       The available command line options are:

       -c config, --config=config
            Extract only the metrics specified in config from the input PCP
            archive log(s).  The config syntax accepted by pmlogextract is
            explained in more detail in the CONFIGURATION FILE SYNTAX section.

       -d, --desperate
            Desperate mode.  Normally if a fatal error occurs, all trace of
            the partially written PCP archive output is removed.  With the -d
            option, the output archive log is not removed.

       -f, --first
            For most common uses, all of the input archive logs will have been
            collected in the same timezone.  But if this is not the case, then
            pmlogextract must choose one of the timezones from the input
            archive logs to be used as the timezone for the output archive
            log.  The default is to use the timezone from the last input
            archive log.  The -f option forces the timezone from the first
            input archive log to be used.

       -m, --mark
            As described in the MARK RECORDS section below, sometimes it is
            possible to safely omit <mark> records from the output archive.
            If the -m option is specified, then the epilogue and prologue test
            is skipped and a <mark> record will always be inserted at the end
            of each input archive (except the last).  This is the original
            behaviour for pmlogextract.

       -S starttime, --start=starttime
            Define the start of a time window to restrict the samples
            retrieved or specify a ``natural'' alignment of the output sample
            times; refer to PCPIntro(1).  See also the -w option.

       -s samples, --samples=samples
            The argument samples defines the number of samples to be written
            to output.  If samples is 0 or -s is not specified, pmlogextract
            will sample until the end of the PCP archive log, or the end of
            the time window as specified by -T, whichever comes first.  The -s
            option will override the -T option if it occurs sooner.

       -T endtime, --finish=endtime
            Define the termination of a time window to restrict the samples
            retrieved or specify a ``natural'' alignment of the output sample
            times; refer to PCPIntro(1).  See also the -w option.

       -v volsamples
            The output archive log is potentially a multi-volume data set, and
            the -v option causes pmlogextract to start a new volume after
            volsamples log records have been written to the archive log.

            Independent of any -v option, each volume of an archive is limited
            to no more than 2^31 bytes, so pmlogextract will automatically
            create a new volume for the archive before this limit is reached.

       -w   Where -S and -T specify a time window within the same day, the -w
            flag will cause the data within the time window to be extracted,
            for every day in the archive log.  For example, the options -w -S
            @11:00 -T @15:00 specify that pmlogextract should include archive
            log records only for the periods from 11am to 3pm on each day.
            When -w is used, the output archive log will contain <mark>
            records to indicate the temporal discontinuity between the end of
            one time window and the start of the next.

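            The selection made by -w can be illustrated with a short Python
            sketch.  It models only the daily-window idea from the -w -S
            @11:00 -T @15:00 example; the record timestamps are invented, the
            helper name in_daily_window is hypothetical, and treating both
            window edges as inclusive is an assumption rather than documented
            behaviour.

```python
from datetime import datetime, time


def in_daily_window(timestamp, start, end):
    """True when the time of day of timestamp lies in [start, end].

    Assumption: both edges are inclusive and the window does not cross
    midnight, matching "a time window within the same day".
    """
    return start <= timestamp.time() <= end


# Invented record timestamps spanning two days.
records = [
    datetime(2024, 1, 1, 10, 30),
    datetime(2024, 1, 1, 11, 0),
    datetime(2024, 1, 1, 14, 59),
    datetime(2024, 1, 1, 15, 30),
    datetime(2024, 1, 2, 9, 0),
    datetime(2024, 1, 2, 12, 0),
]

selected = [r for r in records if in_daily_window(r, time(11, 0), time(15, 0))]
for r in selected:
    print(r.isoformat())
```
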
       -x   It is expected that the metadata (name, PMID, type, semantics and
            units) for each metric will be consistent across all of the input
            PCP archive log(s) in which that metric appears.  In rare cases,
            e.g. in development, in QA and when a PMDA is upgraded, this may
            not be the case and pmlogextract will report the issue and abort
            without creating the output archive log.  This is done so the
            problem can be fixed with pmlogrewrite(1) before retrying the
            merge.  In unattended or QA environments it may be preferable to
            force the merge and omit the metrics with the mismatched metadata.
            The -x option does this.

       -Z timezone, --timezone=timezone
            Use timezone when displaying the date and time.  Timezone is in
            the format of the environment variable TZ as described in
            environ(7).  The default is to initially use the timezone of the
            local host.

       -z, --hostzone
            Use the local timezone of the host from the input archive logs.
            The default is to initially use the timezone of the local host.

       -?, --help
            Display usage message and exit.

CONFIGURATION FILE SYNTAX

       The configfile contains metrics of interest - only those metrics (or
       instances) mentioned explicitly or implicitly in the configuration file
       will be included in the output archive.  Each specification must begin
       on a new line, and may span multiple lines in the configuration file.
       Instances may also be specified, but they are optional.  The format for
       each specification is

               metric [[instance[,instance...]]]

       where metric may be a leaf or a non-leaf name in the Performance
       Metrics Name Space (PMNS, see PMNS(5)).  If a metric refers to a
       non-leaf node in the PMNS, pmlogextract will recursively descend the
       PMNS and include all metrics corresponding to descendent leaf nodes.

       Instances are optional, and may be specified as a list of one or more
       space (or comma) separated names, numbers or strings (enclosed in
       single or double quotes).  Elements in the list that are numbers are
       assumed to be internal instance identifiers - see pmGetInDom(3) for
       more information.  If no instances are given, then all instances of the
       associated metric(s) will be extracted.

       Any additional white space is ignored and comments may be added with a
       `#' prefix.

CONFIGURATION FILE EXAMPLE

       This is an example of a valid configfile:

               #
               # config file for pmlogextract
               #

               kernel.all.cpu
               kernel.percpu.cpu.sys ["cpu0","cpu1"]
               disk.dev ["dks0d1"]
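
       A configfile like the one above could also be generated by a script.
       The following Python sketch is one possible way to do that, assuming
       instance names contain no quote characters.  The helper format_spec
       and the file and archive names are hypothetical, and the command line
       is only printed, never executed.

```python
import shlex


def format_spec(metric, instances=None):
    """Format one specification: a metric name, optionally followed by a
    bracketed, comma-separated list of double-quoted instance names.

    Assumption: instance names contain no quote characters.
    """
    if not instances:
        return metric
    quoted = ",".join('"%s"' % name for name in instances)
    return "%s [%s]" % (metric, quoted)


specs = [
    ("kernel.all.cpu", None),
    ("kernel.percpu.cpu.sys", ["cpu0", "cpu1"]),
    ("disk.dev", ["dks0d1"]),
]
config_text = "# config file for pmlogextract\n" + "\n".join(
    format_spec(metric, instances) for metric, instances in specs
) + "\n"

# Hypothetical file and archive names, shown only to illustrate -c.
argv = ["pmlogextract", "-c", "extract.config", "input_archive", "output_archive"]
print(config_text, end="")
print(shlex.join(argv))
```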

MARK RECORDS

       When more than one input archive log contributes performance data to
       the output archive log, then <mark> records may be inserted to indicate
       a possible discontinuity in the performance data.

       A <mark> record contains a timestamp and no performance data and is
       used to indicate that there is a time period in the PCP archive log
       where we do not know the values of any performance metrics, because
       there was no pmlogger(1) collecting performance data during this
       period.  Since these periods are often associated with the restart of a
       service or pmcd(1) or a system, there may be considerable doubt as to
       the continuity of performance data across this time period.

       Most current archives are created with a prologue record at the
       beginning and an epilogue record at the end.  These records identify
       the state of pmcd(1) at the time, and may be used by pmlogextract to
       determine that there is no discontinuity between the end of one archive
       and the next output record, and as a consequence the <mark> record can
       safely be omitted from the output archive.

       The rationale behind <mark> records may be demonstrated with an
       example.  Consider one input archive log that starts at 00:10 and ends
       at 09:15 on the same day, and another input archive log that starts at
       09:20 on the same day and ends at 01:10 the following morning.  This
       would be a very common case for archives managed and rotated by
       pmlogger_check(1) and pmlogger_daily(1).

       The output archive log created by pmlogextract would contain:
       00:10.000    first record from first input archive log
       ...
       09:15.000    last record from first input archive log
       09:15.001    <mark> record
       09:20.000    first record from second input archive log
       ...
       01:10.000    last record from second input archive log

       The time period where the performance data is missing starts just after
       09:15 and ends just before 09:20.  When the output archive log is
       processed with any of the PCP reporting tools, the <mark> record is
       used to indicate a period of missing data.  For example using the
       output archive above, imagine one was reporting the average I/O rate at
       30 minute intervals aligned on the hour and half-hour.  The I/O count
       metric is a counter, so the average I/O rate requires two valid values
       from consecutive sample times.  There would be values for all the
       intervals ending at 09:00, then no values at 09:30 because of the
       <mark> record, then no values at 10:00 because the ``prior'' value at
       09:30 is not available, then the rate would be reported again at 10:30
       and continue every 30 minutes until the last reported value at 01:00.

       The presence of <mark> records in a PCP archive log can be established
       using pmdumplog(1) where a timestamp and the annotation <mark> is used
       to indicate a <mark> record.
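
       The 30 minute reporting example above can be checked with a small
       Python model.  It is a simplified reading of the text, not how the PCP
       reporting tools are implemented: times are whole minutes since 00:00
       on the first day, the <mark> is placed at 09:15, a value counts as
       usable at a sample time only if that time lies inside an input
       archive's span and no <mark> falls in the preceding interval, and a
       rate needs usable values at two consecutive sample times.  All names
       in the sketch are hypothetical.

```python
STEP = 30  # reporting interval in minutes

# Minutes since 00:00 on the first day, taken from the timeline above.
first = (10, 9 * 60 + 15)                 # 00:10 to 09:15
second = (9 * 60 + 20, 24 * 60 + 70)      # 09:20 to 01:10 the next day
mark = 9 * 60 + 15                        # <mark> just past 09:15


def has_value(t):
    """Usable value at t: inside an archive's span, and no <mark> in the
    interval (t - STEP, t].  This rule is an assumption of the model."""
    inside = any(lo <= t <= hi for lo, hi in (first, second))
    crossed_mark = t - STEP < mark <= t
    return inside and not crossed_mark


def has_rate(t):
    """A counter rate needs usable values at t and at the prior sample."""
    return has_value(t) and has_value(t - STEP)


def label(t):
    """Render minutes as 'dayN HH:MM'."""
    return "day%d %02d:%02d" % (t // 1440 + 1, (t // 60) % 24, t % 60)


sample_times = range(0, 26 * 60, STEP)
reported = [t for t in sample_times if has_rate(t)]
missing = [label(t) for t in range(9 * 60, 11 * 60, STEP) if not has_rate(t)]
print("first rate:", label(reported[0]))
print("last rate:", label(reported[-1]))
print("no rate at:", missing)
```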

METADATA CHECKS

       When more than one input archive set is specified, pmlogextract
       performs a number of checks to ensure the metadata is consistent for
       metrics appearing in more than one of the input archive sets.  These
       checks include:

       * metric data type is the same
       * metric semantics are the same
       * metric units are the same
       * metric is either always singular or always has the same instance
         domain
       * metrics with the same name have the same PMID
       * metrics with the same PMID have the same name

       If any of these checks fail, pmlogextract reports the details and
       aborts without creating the output archive.

       To address these semantic issues, use pmlogrewrite(1) to translate the
       input archives into equivalent archives with consistent metadata
       before using pmlogextract.
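
       The checks listed above amount to a field-by-field comparison of the
       descriptors for a metric seen in two archives.  The Python sketch
       below illustrates that idea only.  The Desc tuple is a hypothetical,
       simplified stand-in for real PCP metadata, and the metric name, PMID
       and instance domain values are invented for the example.

```python
from collections import namedtuple

# Hypothetical, simplified descriptor; indom is None for singular metrics.
Desc = namedtuple("Desc", "name pmid type sem units indom")


def metadata_conflicts(a, b):
    """Return the names of the fields that differ between two descriptors
    that claim to describe the same metric (same name or same PMID)."""
    problems = []
    if a.name == b.name or a.pmid == b.pmid:
        for field in Desc._fields:
            if getattr(a, field) != getattr(b, field):
                problems.append(field)
    return problems


# Invented values: the same metric logged before and after a type change.
old = Desc("disk.dev.read", "60.0.4", "U32", "counter", "count", "60.1")
new = Desc("disk.dev.read", "60.0.4", "U64", "counter", "count", "60.1")
print(metadata_conflicts(old, new))
```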

CAVEATS

       The preamble metrics (pmcd.pmlogger.archive, pmcd.pmlogger.host, and
       pmcd.pmlogger.port), which are automatically recorded by pmlogger at
       the start of the archive, may not be present in the archive output by
       pmlogextract.  These metrics are only relevant while the archive is
       being created, and have no significance once recording has finished.

DIAGNOSTICS

       All error conditions detected by pmlogextract are reported on stderr
       with a textual (if sometimes terse) explanation.

       If one of the input archives contains no archive records then an
       ``empty archive'' warning is issued and that archive is skipped.

       Should one of the input archive logs be corrupted (this can happen if
       the pmlogger instance writing the log suddenly dies), then pmlogextract
       will detect and report the position of the corruption in the file, and
       any subsequent information from that archive log will not be processed.

       If any error is detected, pmlogextract will exit with a non-zero
       status.

FILES

       For each of the input and output archive logs, several physical files
       are used.

       archive.meta
            metadata (metric descriptions, instance domains, etc.) for the
            archive log

       archive.0
            initial volume of metrics values (subsequent volumes have suffixes
            1, 2, ...) - for input these files may have been previously
            compressed with bzip2(1) or gzip(1) and thus may have an
            additional .bz2 or .gz suffix.

       archive.index
            temporal index to support rapid random access to the other files
            in the archive log.
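
       The naming scheme above can be expressed as a small classifier.  This
       Python sketch assumes that only the data volumes carry the optional
       .bz2 or .gz suffix, which is one reading of the description above.
       The function name classify and the archive base name 20240101 are
       hypothetical.

```python
import re


def classify(filename, base):
    """Describe a file that belongs to the archive named base, or return
    None when the name does not match the scheme.

    Assumption: only data volumes may have a .bz2 or .gz suffix.
    """
    if filename == base + ".meta":
        return "metadata"
    if filename == base + ".index":
        return "temporal index"
    match = re.fullmatch(re.escape(base) + r"\.(\d+)(\.bz2|\.gz)?", filename)
    if match:
        return "data volume %d" % int(match.group(1))
    return None


names = ["20240101.meta", "20240101.index", "20240101.0",
         "20240101.1.gz", "notes.txt"]
for name in names:
    print(name, "->", classify(name, "20240101"))
```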

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the
       file and directory names used by PCP.  On each installation, the file
       /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration
       file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).