PCP2SPARK(1)                General Commands Manual               PCP2SPARK(1)

NAME

       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS

       pcp2spark [-5CGHIjLmnrRvV?] [-4 action] [-8|-9 limit] [-a archive]
       [-A align] [--archive-folio folio] [-b|-B space-scale] [-c config]
       [--container container] [--daemonize] [-e derived] [-g server]
       [-h host] [-i instances] [-J rank] [-K spec] [-N predicate]
       [-O origin] [-p port] [-P|-0 precision] [-q|-Q count-scale]
       [-s samples] [-S starttime] [-t interval] [-T endtime]
       [-y|-Y time-scale] metricspec [...]

DESCRIPTION

       pcp2spark is a customizable performance metrics exporter tool from PCP
       to Apache Spark.  Any available performance metric, live or archived,
       system and/or application, can be selected for exporting using either
       command line arguments or a configuration file.

       pcp2spark acts as a bridge: it provides a network socket stream on a
       given address and port to which an Apache Spark worker task can
       connect and pull the configured PCP metrics from pcp2spark, ingesting
       them through the streaming extensions of the Apache Spark API.

       pcp2spark is a close relative of pmrep(1).  Refer to pmrep(1) for a
       description of the metricspec accepted on the pcp2spark command line.
       See pmrep.conf(5) for a description of the pcp2spark.conf configuration
       file syntax.  This page describes pcp2spark-specific options and the
       configuration file differences from pmrep.conf(5).  pmrep(1) also lists
       some usage examples, most of which are applicable to pcp2spark as well.

       Only the command line options listed on this page are supported; other
       options available for pmrep(1) are not supported.

       Options set via environment variables (see pmGetOptions(3)) override
       the corresponding built-in default values (if any).  Configuration file
       options override the corresponding environment variables (if any).
       Command line options override the corresponding configuration file
       options (if any).

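       The precedence rule above amounts to a layered lookup in which the
       first layer that defines an option wins.  The Python sketch below
       models it with collections.ChainMap; the function name and option
       values are hypothetical, environment variables are assumed to have
       already been mapped to option names, and this is not pcp2spark's own
       code.

```python
from collections import ChainMap


def effective_options(defaults, environment, config_file, command_line):
    """Merge option layers; maps listed first in the ChainMap win lookups.

    Hypothetical helper.  Order mirrors the documented rule:
    command line > configuration file > environment > built-in defaults.
    Environment variables are assumed to be already keyed by option name.
    """
    return ChainMap(command_line, config_file, environment, defaults)
```

       ChainMap resolves each lookup on demand, so a later change to any
       layer is reflected without re-merging the maps.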

GENERAL USAGE

       A typical setup involves configuring pcp2spark with the PCP metrics to
       export and then starting the pcp2spark application.  pcp2spark then
       listens on the given address and port, waiting for an Apache Spark
       worker thread to be started and connect to it.

       Once an Apache Spark worker thread has connected, pcp2spark streams PCP
       metric data to Apache Spark until the worker thread completes or the
       connection is interrupted.  If the connection is interrupted, or the
       socket is closed by the Apache Spark worker thread, pcp2spark exits.

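       To check that pcp2spark is serving data before writing a Spark job, a
       plain socket client can stand in for the worker thread.  The Python
       sketch below is hypothetical: it assumes the stream consists of
       newline-terminated UTF-8 text records, which this page does not
       specify.  A real Spark job would more likely use a socket text source
       such as PySpark's StreamingContext.socketTextStream().

```python
import socket


def read_metric_stream(host="127.0.0.1", port=44325, max_records=None):
    """Collect newline-terminated text records from a pcp2spark-style socket.

    Hypothetical helper.  ASSUMPTION: each record is one UTF-8 line; the
    exact record format emitted by pcp2spark is not described on this page.
    Reads until the server closes the connection or max_records is reached.
    """
    records = []
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\n") as stream:
            for line in stream:
                records.append(line.rstrip("\n"))
                if max_records is not None and len(records) >= max_records:
                    # Leaving both with-blocks closes the socket, which the
                    # text above says makes pcp2spark exit.
                    break
    return records
```

       For example, read_metric_stream(max_records=5) reads five records from
       an instance on the default address and port and then disconnects,
       which also ends that pcp2spark run.
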
       For an example Apache Spark worker job that connects to a pcp2spark
       instance on a given address and port and pulls in PCP metric data, see
       the pcp2spark example in the PCP examples directory (often provided by
       the PCP development package) or the online version at
       https://github.com/performancecopilot/pcp/blob/main/src/pcp2spark/.

CONFIGURATION FILE

       pcp2spark uses a configuration file with the syntax described in
       pmrep.conf(5).  The following options are common with pmrep.conf:
       version, source, speclocal, derived, header, globals, samples,
       interval, type, type_prefer, ignore_incompat, names_change, instances,
       live_filter, rank, limit_filter, limit_filter_force, invert_filter,
       predicate, omit_flat, include_labels, precision, precision_force,
       count_scale, count_scale_force, space_scale, space_scale_force,
       time_scale, time_scale_force.  The rest of the pmrep.conf options are
       recognized but ignored for compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify the address on which pcp2spark will listen for connections
           from an Apache Spark worker thread.  Corresponding command line
           option is -g.  Defaults to 127.0.0.1.

       spark_port (integer)
           Specify the port on which pcp2spark will listen for connections.
           Corresponding command line option is -p.  Defaults to 44325.

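       As an illustration, both options can be set in the [options] section
       of a pcp2spark.conf file written in pmrep.conf(5) syntax.  The Python
       sketch below reads them back with the standard configparser module
       and applies the defaults documented above; the file contents and the
       helper name are hypothetical, and pcp2spark's own parsing may differ.

```python
import configparser

# Hypothetical pcp2spark.conf contents using a pmrep.conf(5)-style section.
EXAMPLE_CONF = """
[options]
spark_server = 0.0.0.0
spark_port = 44325
"""

DEFAULT_SERVER = "127.0.0.1"  # documented default for spark_server
DEFAULT_PORT = 44325          # documented default for spark_port


def spark_endpoint(conf_text):
    """Return (server, port) from configuration text, falling back to defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("options"):
        return DEFAULT_SERVER, DEFAULT_PORT
    options = parser["options"]
    server = options.get("spark_server", DEFAULT_SERVER)
    port = options.getint("spark_port", DEFAULT_PORT)
    return server, port
```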

OPTIONS

       The available command line options are:

       -0 precision, --precision-force=precision
            Like -P but this option will override per-metric specifications.

       -4 action, --names-change=action
            Specify which action to take on receiving a metric names change
            event during sampling.  These events occur when a PMDA discovers
            new metrics some time after starting up, and informs running
            client tools like pcp2spark.  Valid values for action are update
            (refresh metrics being sampled), ignore (do nothing - the default
            behaviour) and abort (exit the program if such an event occurs).

       -5, --ignore-unknown
            Silently ignore any metric name that cannot be resolved.  At least
            one metric must be found for the tool to start.

       -8 limit, --limit-filter=limit
            Limit results to instances with values above/below limit.  A
            positive integer will include instances with values at or above
            the limit in reporting.  A negative integer will include instances
            with values at or below the limit in reporting.  A value of zero
            performs no limit filtering.  This option will not override
            possible per-metric specifications.  See also -J and -N.

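            The -8 rule can be sketched as a filter over one metric's
            instance values.  In this hypothetical Python helper, a negative
            limit is compared by its absolute value (values at or below
            abs(limit) are kept); that reading is an assumption, so check
            pmrep(1) for the authoritative behaviour.

```python
def limit_filter(values, limit):
    """Keep instances whose values pass a -8/--limit-filter style threshold.

    values: mapping of instance name -> numeric value (hypothetical layout).
    limit > 0 keeps values at or above limit; limit == 0 keeps everything.
    ASSUMPTION: for limit < 0 the threshold is abs(limit), keeping values
    at or below it.
    """
    if limit > 0:
        return {inst: v for inst, v in values.items() if v >= limit}
    if limit < 0:
        return {inst: v for inst, v in values.items() if v <= abs(limit)}
    return dict(values)
```
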
       -9 limit, --limit-filter-force=limit
            Like -8 but this option will override per-metric specifications.

       -a archive, --archive=archive
            Performance metric values are retrieved from the set of
            Performance Co-Pilot (PCP) archive log files identified by the
            archive argument, which is a comma-separated list of names, each
            of which may be the base name of an archive or the name of a
            directory containing one or more archives.

       -A align, --align=align
            Force the initial sample to be aligned on the boundary of a
            natural time unit align.  Refer to PCPIntro(1) for a complete
            description of the syntax for align.

       --archive-folio=folio
            Read metric source archives from the PCP archive folio created by
            tools like pmchart(1) or, less often, manually with mkaf(1).

       -b scale, --space-scale=scale
            Unit/scale for space (byte) metrics, possible values include
            bytes, Kbytes, KB, Mbytes, MB, and so forth.  This option will not
            override possible per-metric specifications.  See also
            pmParseUnitsStr(3).

       -B scale, --space-scale-force=scale
            Like -b but this option will override per-metric specifications.

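            As a worked example of -b, 3145728 bytes reported with -b MB is
            3145728 / 1024^2 = 3 MB.  The hypothetical Python sketch below
            performs that conversion for a few unit spellings, assuming the
            binary multiples (1 Kbyte = 1024 bytes) used by PCP space units;
            pmParseUnitsStr(3) accepts more spellings than are listed here.

```python
# ASSUMPTION: PCP space units are binary multiples (1 Kbyte = 1024 bytes).
# Only a few unit spellings are listed; this table is illustrative.
SPACE_UNITS = {
    "bytes": 1,
    "Kbytes": 1024, "KB": 1024,
    "Mbytes": 1024 ** 2, "MB": 1024 ** 2,
    "Gbytes": 1024 ** 3, "GB": 1024 ** 3,
}


def scale_space(value_in_bytes, unit):
    """Express a byte count in the requested -b/--space-scale unit."""
    try:
        return value_in_bytes / SPACE_UNITS[unit]
    except KeyError:
        raise ValueError(f"unsupported space unit: {unit!r}") from None
```
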
       -c config, --config=config
            Specify the config file or directory to use.  If config is a
            directory, all files in it ending in .conf will be included.  The
            default is the first found of: ./pcp2spark.conf,
            $HOME/.pcp2spark.conf, $HOME/pcp/pcp2spark.conf, and
            $PCP_SYSCONF_DIR/pcp2spark.conf.  For details, see the
            CONFIGURATION FILE section above and pmrep.conf(5).

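            The lookup order for -c can be sketched as follows.  This
            hypothetical Python helper expands $HOME and $PCP_SYSCONF_DIR from
            the process environment and, as an assumption not stated on this
            page, reads the .conf files of a config directory in sorted name
            order.

```python
import os

DEFAULT_CANDIDATES = (
    "./pcp2spark.conf",
    "$HOME/.pcp2spark.conf",
    "$HOME/pcp/pcp2spark.conf",
    "$PCP_SYSCONF_DIR/pcp2spark.conf",
)


def resolve_config(config=None, candidates=DEFAULT_CANDIDATES):
    """Return the configuration files that -c/--config would select.

    Hypothetical helper; directory entries are returned in sorted order
    (an assumption), and only the first existing default file is used.
    """
    if config is not None:
        if os.path.isdir(config):
            # Directory: include every entry whose name ends in ".conf".
            names = sorted(n for n in os.listdir(config) if n.endswith(".conf"))
            return [os.path.join(config, n) for n in names]
        return [config]
    for candidate in candidates:
        path = os.path.expandvars(candidate)
        if os.path.isfile(path):
            return [path]  # first match wins
    return []
```
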
       --container=container
            Fetch performance metrics from the specified container, either
            local or remote (see -h).

       -C, --check
            Exit before reporting any values, but after parsing the
            configuration and metrics and printing possible headers.

       --daemonize
            Daemonize on startup.

       -e derived, --derived=derived
            Specify derived performance metrics.  If derived starts with a
            slash (``/'') or with a dot (``.'') it will be interpreted as a
            PCP derived metrics configuration file, otherwise it will be
            interpreted as comma- or semicolon-separated derived metric
            expressions.  For a complete description of derived metrics and
            PCP derived metrics configuration files see pmLoadDerivedConfig(3)
            and pmRegisterDerived(3).  Alternatively, using pmrep.conf(5)
            configuration syntax allows defining derived metrics as part of
            metricsets.

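       -g server, --spark-server=server
            Local address on which pcp2spark listens for connections from an
            Apache Spark worker thread (configuration option spark_server).
            Defaults to 127.0.0.1.
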
       -G, --no-globals
            Do not include global metrics in reporting (see pmrep.conf(5)).

       -h host, --host=host
            Fetch performance metrics from pmcd(1) on host, rather than from
            the default localhost.

       -H, --no-header
            Do not print any headers.

       -i instances, --instances=instances
            Retrieve and report only the specified metric instances.  By
            default all instances, present and future, are reported.

            Refer to pmrep(1) for a complete description of this option.

       -I, --ignore-incompat
            Ignore incompatible metrics.  By default incompatible metrics
            (that is, their type is unsupported or they cannot be scaled as
            requested) will cause pcp2spark to terminate with an error
            message.  With this option all incompatible metrics are silently
            omitted from reporting.  This may be especially useful when
            requesting non-leaf nodes of the PMNS tree for reporting.

       -j, --live-filter
            Perform instance live filtering.  This allows capturing all named
            instances even if processes are restarted at some point, which is
            not possible without live filtering.  Performing live filtering
            over a huge number of instances will add some internal overhead,
            so some caution is advised.  See also -n.

       -J rank, --rank=rank
            Limit results to highest/lowest ranked instances of set-valued
            metrics.  A positive integer will include highest valued instances
            in reporting.  A negative integer will include lowest valued
            instances in reporting.  A value of zero performs no ranking.
            Ranking does not imply sorting, see -6.  See also -8.

       -K spec, --spec-local=spec
            When fetching metrics from a local context (see -L), the -K option
            may be used to control the DSO PMDAs that should be made
            accessible.  The spec argument conforms to the syntax described in
            pmSpecLocalPMDA(3).  More than one -K option may be used.

       -L, --local-PMDA
            Use a local context to collect metrics from DSO PMDAs on the local
            host without PMCD.  See also -K.

       -m, --include-labels
            Include PCP metric labels in the output.

       -n, --invert-filter
            Perform ranking before live filtering.  By default instance live
            filtering (when requested, see -j) happens before instance ranking
            (when requested, see -J).  With this option the logic is inverted
            and ranking happens before live filtering.

       -N predicate, --predicate=predicate
            Specify a comma-separated list of predicate filter reference
            metrics.  By default ranking (see -J) happens for each metric
            individually.  With predicates, ranking is done only for the
            specified predicate metrics.  When reporting, the rest of the
            metrics sharing the same instance domain (see PCPIntro(1)) as the
            predicate will include only the highest/lowest ranking instances
            of the corresponding predicate.  Ranking does not imply sorting,
            see -6.

            For example, using proc.memory.rss (resident memory size of a
            process) as the predicate metric together with proc.io.total_bytes
            and mem.util.used as metrics to be reported, only the processes
            using the most/least (as per -J) memory will be included when
            reporting total bytes written by processes.  Since mem.util.used
            is a single-valued metric (thus not sharing the same instance
            domain as the process related metrics), it will be reported as
            usual.

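            The interplay of -J and -N can be sketched in Python as below,
            using the metric names from the example above.  The helpers and
            the data layout (instance name to value per metric, plus an
            instance domain label per metric, None for single-valued metrics)
            are hypothetical, and ties are broken arbitrarily.

```python
def rank_instances(values, rank):
    """Return the instance names kept by -J/--rank, in their original order.

    rank > 0 keeps the `rank` highest values, rank < 0 the abs(rank) lowest,
    rank == 0 keeps all.  Ranking does not imply sorting, so the result
    keeps the input order.  Tie handling here is arbitrary (an assumption).
    """
    if rank == 0:
        return list(values)
    ordered = sorted(values, key=values.get, reverse=rank > 0)
    chosen = set(ordered[:abs(rank)])
    return [inst for inst in values if inst in chosen]


def apply_predicate(metrics, indoms, predicate, rank):
    """Filter metrics sharing the predicate's instance domain (-N/--predicate).

    metrics: metric name -> {instance name: value}
    indoms:  metric name -> instance domain label (None if single-valued)
    """
    keep = set(rank_instances(metrics[predicate], rank))
    result = {}
    for name, values in metrics.items():
        if indoms[name] is not None and indoms[name] == indoms[predicate]:
            result[name] = {i: v for i, v in values.items() if i in keep}
        else:
            result[name] = dict(values)  # other domains: reported as usual
    return result
```
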
       -O origin, --origin=origin
            When reporting archived metrics, start reporting at origin within
            the time window (see -S and -T).  Refer to PCPIntro(1) for a
            complete description of the syntax for origin.

       -p port, --spark-port=port
            Local port on which pcp2spark listens for connections
            (configuration option spark_port).  Defaults to 44325.

       -P precision, --precision=precision
            Use precision for numeric non-integer output values.  The default
            is to use 3 decimal places (when applicable).  This option will
            not override possible per-metric specifications.

       -q scale, --count-scale=scale
            Unit/scale for count metrics, possible values include count x
            10^-1, count, count x 10, count x 10^2, and so forth from 10^-8 to
            10^7.  (These values are currently space-sensitive.)  This option
            will not override possible per-metric specifications.  See also
            pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
            Like -q but this option will override per-metric specifications.

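            A count scale of count x 10^N means one reported unit stands for
            10^N counts, so 12345 counts reported with -q "count x 10^3" is
            12.345.  The hypothetical Python sketch below parses such strings,
            accepting only single spaces around the x as a stand-in for the
            space-sensitivity noted above (the exact strings accepted by
            pmParseUnitsStr(3) may differ).

```python
import re

# Exact match required, including single spaces around "x": an assumed
# stand-in for the documented space-sensitivity.
_COUNT_SCALE = re.compile(r"count(?: x 10(?:\^(-?\d+))?)?")


def count_exponent(scale):
    """Return N for a -q/--count-scale string such as "count x 10^N"."""
    match = _COUNT_SCALE.fullmatch(scale)
    if match is None:
        raise ValueError(f"unrecognized count scale: {scale!r}")
    if scale == "count":
        exponent = 0
    elif match.group(1) is None:
        exponent = 1  # "count x 10"
    else:
        exponent = int(match.group(1))
    if not -8 <= exponent <= 7:  # documented range 10^-8 .. 10^7
        raise ValueError(f"count scale out of range: {scale!r}")
    return exponent


def scale_count(value, scale):
    """Express a raw count in units of the requested count scale."""
    exponent = count_exponent(scale)
    # Multiply for negative exponents to avoid inexact division by 0.1 etc.
    return value / 10 ** exponent if exponent >= 0 else value * 10 ** -exponent
```
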
       -r, --raw
            Output raw metric values, do not convert cumulative counters to
            rates.  This option will override possible per-metric
            specifications.

       -R, --raw-prefer
            Like -r but this option will not override per-metric
            specifications.

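            Without -r, a cumulative counter is reported as a rate: the change
            in value divided by the elapsed time between two samples.  For
            example, a counter that reads 1000 and then 1500 five seconds
            later is reported as 100 per second, whereas -r reports 1500.  A
            minimal Python sketch with a hypothetical helper name, which does
            not handle counter wrap or reset:

```python
def counter_rates(samples):
    """Convert (timestamp_seconds, counter_value) samples to per-second rates.

    Hypothetical helper mirroring the default (non -r) conversion; n samples
    yield n - 1 rates.  Counter wrap-around and resets are not handled.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((v1 - v0) / elapsed)
    return rates
```
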
       -s samples, --samples=samples
            The samples argument defines the number of samples to be retrieved
            and reported.  If samples is 0 or -s is not specified, pcp2spark
            will sample and report continuously (in real time mode) or until
            the end of the set of PCP archives (in archive mode).  See also
            -T.

       -S starttime, --start=starttime
            When reporting archived metrics, the report will be restricted to
            those records logged at or after starttime.  Refer to PCPIntro(1)
            for a complete description of the syntax for starttime.

       -t interval, --interval=interval
            Set the reporting interval to something other than the default 1
            second.  The interval argument follows the syntax described in
            PCPIntro(1), and in the simplest form may be an unsigned integer
            (the implied units in this case are seconds).  See also the -T
            option.

       -T endtime, --finish=endtime
            When reporting archived metrics, the report will be restricted to
            those records logged before or at endtime.  Refer to PCPIntro(1)
            for a complete description of the syntax for endtime.

            When -T is used to define the runtime before pcp2spark exits and
            no sample count is given (see -s), the number of reported samples
            depends on the interval (see -t).  If a sample count is given, the
            interval is adjusted so that the requested samples are reported
            within the runtime.  When -T, -s, and -t are all given, endtime
            determines how long pcp2spark actually runs.

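            The relationship between -T, -s, and -t is simple arithmetic: a
            60 second runtime with -t 5 yields 12 samples, and a 60 second
            runtime with -s 12 implies a 5 second interval regardless of -t.
            The hypothetical Python sketch below captures this; pcp2spark's
            exact rounding and edge handling may differ.

```python
def plan_run(runtime, samples=None, interval=1.0):
    """Relate a -T runtime, -s samples and -t interval (all in seconds).

    Hypothetical helper.  Without a sample count (None or 0), the count
    follows from the interval; with one, the interval is adjusted to fit the
    runtime, so the runtime wins when all three options are given.
    Returns (samples, interval).
    """
    if runtime <= 0:
        raise ValueError("runtime must be positive")
    if samples:
        return samples, runtime / samples
    return int(runtime // interval), interval
```
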
       -v, --omit-flat
            Report only set-valued metrics with instances (e.g.
            disk.dev.read) and omit single-valued ``flat'' metrics without
            instances (e.g. kernel.all.sysfork).  See -i and -I.

       -V, --version
            Display version number and exit.

       -y scale, --time-scale=scale
            Unit/scale for time metrics, possible values include nanosec, ns,
            microsec, us, millisec, ms, and so forth up to hour, hr.  This
            option will not override possible per-metric specifications.  See
            also pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
            Like -y but this option will override per-metric specifications.

       -?, --help
            Display usage message and exit.

FILES

       pcp2spark.conf
            pcp2spark configuration file (see -c)

       $PCP_SYSCONF_DIR/pmrep/*.conf
            system-provided default pmrep configuration files

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the
       file and directory names used by PCP.  On each installation, the file
       /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration
       file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO

       PCPIntro(1), mkaf(1), pcp(1), pcp2elasticsearch(1), pcp2graphite(1),
       pcp2influxdb(1), pcp2json(1), pcp2xlsx(1), pcp2xml(1), pcp2zabbix(1),
       pmcd(1), pminfo(1), pmrep(1), pmGetOptions(3), pmLoadDerivedConfig(3),
       pmParseUnitsStr(3), pmRegisterDerived(3), pmSpecLocalPMDA(3),
       LOGARCHIVE(5), pcp.conf(5), pmrep.conf(5) and PMNS(5).

Performance Co-Pilot                  PCP                         PCP2SPARK(1)