PCP2SPARK(1)                General Commands Manual               PCP2SPARK(1)


NAME
       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS
       pcp2spark  [-5CGHIjLnrRvV?]  [-4 action]  [-8|-9 limit]  [-a archive]
       [--archive-folio folio]  [-A align]  [-b|-B space-scale]  [-c config]
       [--container container]  [--daemonize]  [-e derived]  [-g server]
       [-h host]  [-i instances]  [-J rank]  [-K spec]  [-N predicate]
       [-O origin]  [-p port]  [-P|-0 precision]  [-q|-Q count-scale]
       [-s samples]  [-S starttime]  [-t interval]  [-T endtime]
       [-y|-Y time-scale]  metricspec  [...]

DESCRIPTION
       pcp2spark is a customizable performance metrics exporter tool from PCP
       to Apache Spark.  Any available performance metric, live or archived,
       system and/or application, can be selected for exporting using either
       command line arguments or a configuration file.

       pcp2spark acts as a bridge: it provides a network socket stream on a
       given address/port to which an Apache Spark worker task can connect
       and pull the configured PCP metrics from pcp2spark, exporting them
       using the streaming extensions of the Apache Spark API.

       pcp2spark is a close relative of pmrep(1).  Please refer to pmrep(1)
       for the metricspec description accepted on the pcp2spark command line
       and to pmrep.conf(5) for a description of the overall syntax of the
       pcp2spark.conf configuration file.  This page describes the pcp2spark
       specific options and the configuration file differences with
       pmrep.conf(5).  pmrep(1) also lists some usage examples, most of which
       are applicable with pcp2spark as well.

       Only the command line options listed on this page are supported;
       other options recognized by pmrep(1) are not supported.

       Options via environment values (see pmGetOptions(3)) override the
       corresponding built-in default values (if any).  Configuration file
       options override the corresponding environment variables (if any).
       Command line options override the corresponding configuration file
       options (if any).

GENERAL USAGE
       A general setup for making use of pcp2spark involves the user
       configuring pcp2spark for the PCP metrics to export, followed by
       starting the pcp2spark application.  pcp2spark will then wait and
       listen on the given address/port for a connection from an Apache
       Spark worker thread.  Once started, the worker thread connects to
       pcp2spark.
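
       For example, a simple invocation exporting two standard PCP metrics
       every five seconds, while listening on the default address and port
       for an Apache Spark worker connection, might be (the metric names
       below are illustrative only):

           $ pcp2spark -t 5 mem.util.used kernel.all.load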

       When an Apache Spark worker thread has connected, pcp2spark will begin
       streaming PCP metric data to Apache Spark until the worker thread
       completes or the connection is interrupted.  If the connection is
       interrupted or the socket is closed by the Apache Spark worker thread,
       pcp2spark will exit.

       For an example Apache Spark worker job which will connect to a
       pcp2spark instance on a given address/port and pull in PCP metric
       data, please see the example provided in the PCP examples directory
       for pcp2spark (often provided by the PCP development package) or the
       online version at
       ⟨https://github.com/performancecopilot/pcp/blob/master/src/pcp2spark/⟩.

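       A minimal PySpark sketch of such a worker job is shown below; it
       assumes the classic Spark Streaming socketTextStream API and the
       default pcp2spark address/port, and the bundled example remains the
       authoritative reference:

           # Connect to a running pcp2spark instance and print the text
           # records it streams, in 5 second batches.
           from pyspark import SparkContext
           from pyspark.streaming import StreamingContext

           sc = SparkContext(appName="pcp2spark-reader")
           ssc = StreamingContext(sc, 5)

           # pcp2spark listens on 127.0.0.1:44325 by default (see -g and -p)
           lines = ssc.socketTextStream("127.0.0.1", 44325)
           lines.pprint()

           ssc.start()
           ssc.awaitTermination()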

CONFIGURATION FILE
       pcp2spark uses a configuration file with overall syntax described in
       pmrep.conf(5).  The following options are common with pmrep.conf:
       version, source, speclocal, derived, header, globals, samples,
       interval, type, type_prefer, ignore_incompat, names_change, instances,
       live_filter, rank, limit_filter, limit_filter_force, invert_filter,
       predicate, omit_flat, precision, precision_force, count_scale,
       count_scale_force, space_scale, space_scale_force, time_scale,
       time_scale_force.  The output option is recognized but ignored for
       pmrep.conf compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify the address on which pcp2spark will listen for
           connections from an Apache Spark worker thread.  Corresponding
           command line option is -g.  Default is 127.0.0.1.

       spark_port (integer)
           Specify the port to run pcp2spark on.  Corresponding command line
           option is -p.  Default is 44325.

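       As an illustration, a minimal pcp2spark.conf sketch using the
       [options] section syntax described in pmrep.conf(5) might look like
       the following (the values shown are examples only):

           [options]
           # listen on all interfaces instead of the default 127.0.0.1
           spark_server = 0.0.0.0
           spark_port = 44325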

OPTIONS
       The available command line options are:

       -0 precision, --precision-force=precision
            Like -P but this option will override per-metric specifications.

       -4 action, --names-change=action
            Specify which action to take on receiving a metric names
            change event during sampling.  These events occur when a PMDA
            discovers new metrics sometime after starting up, and informs
            running client tools like pcp2spark.  Valid values for action
            are update (refresh metrics being sampled), ignore (do nothing -
            the default behaviour) and abort (exit the program if such an
            event happens).

       -5, --ignore-unknown
            Silently ignore any metric name that cannot be resolved.  At
            least one metric must be found for the tool to start.

       -8 limit, --limit-filter=limit
            Limit results to instances with values above/below limit.  A
            positive integer will include instances with values at or above
            the limit in reporting.  A negative integer will include
            instances with values at or below the limit in reporting.  A
            value of zero performs no limit filtering.  This option will not
            override possible per-metric specifications.  See also -J and
            -N.

       -9 limit, --limit-filter-force=limit
            Like -8 but this option will override per-metric specifications.

       -a archive, --archive=archive
            Performance metric values are retrieved from the set of
            Performance Co-Pilot (PCP) archive log files identified by the
            argument archive, which is a comma-separated list of names, each
            of which may be the base name of an archive or the name of a
            directory containing one or more archives.

       --archive-folio=folio
            Read metric source archives from the PCP archive folio created
            by tools like pmchart(1) or, less often, manually with mkaf(1).

       -A align, --align=align
            Force the initial sample to be aligned on the boundary of a
            natural time unit align.  Refer to PCPIntro(1) for a complete
            description of the syntax for align.

       -b scale, --space-scale=scale
            Unit/scale for space (byte) metrics, possible values include
            bytes, Kbytes, KB, Mbytes, MB, and so forth.  This option will
            not override possible per-metric specifications.  See also
            pmParseUnitsStr(3).

       -B scale, --space-scale-force=scale
            Like -b but this option will override per-metric specifications.

       -c config, --config=config
            Specify the config file to use.  The default is the first found
            of: ./pcp2spark.conf, $HOME/.pcp2spark.conf,
            $HOME/pcp/pcp2spark.conf, and $PCP_SYSCONF_DIR/pcp2spark.conf.
            For details, see the above section and pmrep.conf(5).

       --container=container
            Fetch performance metrics from the specified container, either
            local or remote (see -h).

       -C, --check
            Exit before reporting any values, but after parsing the
            configuration and metrics and printing possible headers.

       --daemonize
            Daemonize on startup.

       -e derived, --derived=derived
            Specify derived performance metrics.  If derived starts with a
            slash (``/'') or with a dot (``.'') it will be interpreted as a
            derived metrics configuration file, otherwise it will be
            interpreted as comma- or semicolon-separated derived metric
            expressions.  For details see pmLoadDerivedConfig(3) and
            pmRegisterDerived(3).

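            For example, a single derived metric expression could be given
            on the command line as follows (the derived metric name shown is
            an arbitrary illustration):

                -e 'mem.util.allcache = mem.util.cached + mem.util.bufmem'
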
       -g server, --spark-server=server
            Spark server to send the metrics to.

       -G, --no-globals
            Do not include global metrics in reporting (see pmrep.conf(5)).

       -h host, --host=host
            Fetch performance metrics from pmcd(1) on host, rather than from
            the default localhost.

       -H, --no-header
            Do not print any headers.

       -i instances, --instances=instances
            Report only the listed instances from current instances (if
            present, see also -j).  By default all instances, present and
            future, are reported.  This is a global option that is used for
            all metrics unless a metric-specific instance definition is
            provided as part of a metricspec.  By default single-valued
            ``flat'' metrics without multiple instances are still reported
            as usual, use -v to change this.  Please refer to pmrep(1) for
            more details on this option.

       -I, --ignore-incompat
            Ignore incompatible metrics.  By default incompatible metrics
            (that is, their type is unsupported or they cannot be scaled as
            requested) will cause pcp2spark to terminate with an error
            message.  With this option all incompatible metrics are silently
            omitted from reporting.  This may be especially useful when
            requesting non-leaf nodes of the PMNS tree for reporting.

       -j, --live-filter
            Perform instance live filtering.  This allows capturing all
            filtered instances even if processes are restarted at some point
            (unlike without live filtering).  Doing live filtering over a
            huge amount of instances naturally comes with some overhead so a
            bit of user caution is advised.

       -J rank, --rank=rank
            Limit results to highest/lowest rank instances of set-valued
            metrics.  A positive integer will include highest valued
            instances in reporting.  A negative integer will include lowest
            valued instances in reporting.  A value of zero performs no
            ranking.  See also -8.

       -K spec, --spec-local=spec
            When fetching metrics from a local context (see -L), the -K
            option may be used to control the DSO PMDAs that should be made
            accessible.  The spec argument conforms to the syntax described
            in pmSpecLocalPMDA(3).  More than one -K option may be used.

       -L, --local-PMDA
            Use a local context to collect metrics from DSO PMDAs on the
            local host without PMCD.  See also -K.

       -n, --invert-filter
            Perform ranking before live filtering.  By default instance live
            filtering (when requested, see -j) happens before instance
            ranking (when requested, see -J).  With this option the logic is
            inverted and ranking happens before live filtering.

       -N predicate, --predicate=predicate
            Specify a comma-separated list of predicate filter reference
            metrics.  By default ranking (see -J) happens for each metric
            individually.  With predicate filter reference metrics, ranking
            is done only for the specified metrics.  When reporting, the
            rest of the metrics sharing the same instance domain (see
            PCPIntro(1)) as the predicates will include only the
            highest/lowest ranking instances of the corresponding
            predicates.

            So for example, when using proc.memory.rss (the resident size of
            a process) as the predicate and including proc.io.total_bytes
            and mem.util.used as metrics to be reported, only the processes
            using the most/least memory (as per -J) will be included when
            reporting total bytes written by processes.  Since mem.util.used
            is a single-valued metric (thus not sharing the same instance
            domain as the process-related metrics), it will be reported as
            usual.

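            A hypothetical command line for the above scenario could look
            like the following (the rank value 3 is arbitrary):

                $ pcp2spark -J 3 -N proc.memory.rss \
                      proc.memory.rss proc.io.total_bytes mem.util.used
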
       -O origin, --origin=origin
            When reporting archived metrics, start reporting at origin
            within the time window (see -S and -T).  Refer to PCPIntro(1)
            for a complete description of the syntax for origin.

       -p port, --spark-port=port
            Spark server port.

       -P precision, --precision=precision
            Use precision for numeric non-integer output values.  The
            default is to use 3 decimal places (when applicable).  This
            option will not override possible per-metric specifications.

       -q scale, --count-scale=scale
            Unit/scale for count metrics, possible values include count x
            10^-1, count, count x 10, count x 10^2, and so forth from 10^-8
            to 10^7.  (These values are currently space-sensitive.)  This
            option will not override possible per-metric specifications.
            See also pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
            Like -q but this option will override per-metric specifications.

       -r, --raw
            Output raw metric values, do not convert cumulative counters to
            rates.  This option will override possible per-metric
            specifications.

       -R, --raw-prefer
            Like -r but this option will not override per-metric
            specifications.

       -s samples, --samples=samples
            The argument samples defines the number of samples to be
            retrieved and reported.  If samples is 0 or -s is not specified,
            pcp2spark will sample and report continuously (in real time
            mode) or until the end of the set of PCP archives (in archive
            mode).  See also -T.

       -S starttime, --start=starttime
            When reporting archived metrics, the report will be restricted
            to those records logged at or after starttime.  Refer to
            PCPIntro(1) for a complete description of the syntax for
            starttime.

       -t interval, --interval=interval
            The update interval may be set to something other than the
            default 1 second.  The interval argument follows the syntax
            described in PCPIntro(1), and in the simplest form may be an
            unsigned integer (the implied units in this case are seconds).
            See also the -T option.

       -T endtime, --finish=endtime
            When reporting archived metrics, the report will be restricted
            to those records logged before or at endtime.  Refer to
            PCPIntro(1) for a complete description of the syntax for
            endtime.

            When used to define the runtime before pcp2spark will exit, if
            no samples is given (see -s) then the number of reported samples
            depends on interval (see -t).  If samples is given then interval
            will be adjusted to allow reporting of samples during runtime.
            In case all of -T, -s, and -t are given, endtime determines the
            actual time pcp2spark will run.

       -v, --omit-flat
            Omit single-valued ``flat'' metrics from reporting, only
            consider set-valued metrics (i.e., metrics with multiple values)
            for reporting.  See -i and -I.

       -V, --version
            Display version number and exit.

       -y scale, --time-scale=scale
            Unit/scale for time metrics, possible values include nanosec,
            ns, microsec, us, millisec, ms, and so forth up to hour, hr.
            This option will not override possible per-metric
            specifications.  See also pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
            Like -y but this option will override per-metric specifications.

       -?, --help
            Display usage message and exit.

FILES
       pcp2spark.conf
              pcp2spark configuration file (see -c)

PCP ENVIRONMENT
       Environment variables with the prefix PCP_ are used to parameterize
       the file and directory names used by PCP.  On each installation, the
       file /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration
       file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO
       mkaf(1), PCPIntro(1), pcp(1), pcp2elasticsearch(1), pcp2graphite(1),
       pcp2influxdb(1), pcp2json(1), pcp2xlsx(1), pcp2xml(1), pcp2zabbix(1),
       pmcd(1), pminfo(1), pmrep(1), pmGetOptions(3), pmSpecLocalPMDA(3),
       pmLoadDerivedConfig(3), pmParseUnitsStr(3), pmRegisterDerived(3),
       LOGARCHIVE(5), pcp.conf(5), pmns(5) and pmrep.conf(5).



Performance Co-Pilot                  PCP                         PCP2SPARK(1)