PCP2SPARK(1)                General Commands Manual               PCP2SPARK(1)

NAME

       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS

       pcp2spark [-5CGHIjLmnrRvV?] [-4 action] [-8|-9 limit] [-a archive] [-A
       align] [--archive-folio folio] [-b|-B space-scale] [-c config]
       [--container container] [--daemonize] [-e derived] [-g server] [-h host]
       [-i instances] [-J rank] [-K spec] [-N predicate] [-O origin] [-p port]
       [-P|-0 precision] [-q|-Q count-scale] [-s samples] [-S starttime] [-t
       interval] [-T endtime] [-y|-Y time-scale] metricspec [...]

DESCRIPTION

       pcp2spark is a customizable performance metrics exporter tool from PCP
       to Apache Spark.  Any available performance metric, live or archived,
       system and/or application, can be selected for exporting using either
       command line arguments or a configuration file.

       pcp2spark acts as a bridge: it provides a network socket stream on a
       given address/port, to which an Apache Spark worker task can connect
       and pull the configured PCP metrics using the streaming extensions of
       the Apache Spark API.

       pcp2spark is a close relative of pmrep(1).  Please refer to pmrep(1)
       for the metricspec description accepted on the pcp2spark command line
       and to pmrep.conf(5) for a description of the overall syntax of the
       pcp2spark.conf configuration file.  This page describes pcp2spark
       specific options and configuration file differences with
       pmrep.conf(5).  pmrep(1) also lists some usage examples, most of which
       are applicable to pcp2spark as well.

       Only the command line options listed on this page are supported; other
       options recognized by pmrep(1) are not supported.

       Options via environment values (see pmGetOptions(3)) override the
       corresponding built-in default values (if any).  Configuration file
       options override the corresponding environment variables (if any).
       Command line options override the corresponding configuration file
       options (if any).
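
       The precedence order described above can be illustrated with a minimal
       Python sketch.  It is not taken from the pcp2spark sources: the
       function name resolve_options and the sample values are hypothetical,
       and each option source is assumed to have been parsed into a
       dictionary already.

```python
from collections import ChainMap


def resolve_options(defaults, environment, config_file, command_line):
    """Merge option sources so that later stages override earlier ones."""
    # ChainMap looks keys up left to right, so the highest-priority
    # source (the command line) is listed first.
    return dict(ChainMap(command_line, config_file, environment, defaults))


# Hypothetical values, for illustration only.
defaults = {"interval": "1", "spark_port": "44325"}
environment = {"interval": "5"}
config_file = {"interval": "10", "spark_port": "50000"}
command_line = {"spark_port": "44330"}

merged = resolve_options(defaults, environment, config_file, command_line)
print(merged["interval"], merged["spark_port"])
```

       Here the configuration file wins over the environment for interval,
       and the command line wins over the configuration file for spark_port.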

GENERAL USAGE

       A general setup for making use of pcp2spark involves the user
       configuring pcp2spark for the PCP metrics to export and then starting
       the pcp2spark application.  The pcp2spark application then listens on
       the given address/port and waits for an Apache Spark worker thread to
       be started.  The worker thread then connects to pcp2spark.

       Once an Apache Spark worker thread has connected, pcp2spark begins
       streaming PCP metric data to Apache Spark until the worker thread
       completes or the connection is interrupted.  If the connection is
       interrupted or the socket is closed by the Apache Spark worker thread,
       pcp2spark will exit.

       For an example Apache Spark worker job which connects to a pcp2spark
       instance on a given address/port and pulls in PCP metric data, please
       see the example provided in the PCP examples directory for pcp2spark
       (often provided by the PCP development package) or the online version
       at https://github.com/performancecopilot/pcp/blob/master/src/pcp2spark/.
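
       The connection flow described above can be tried out without Spark by
       using a plain socket client as a stand-in for the worker thread.  The
       following Python sketch is not the example shipped with PCP.  The
       function name read_records is hypothetical, and the record format is
       an assumption: the sketch treats the stream as newline-terminated
       UTF-8 text and does not interpret it.

```python
import socket


def read_records(host="127.0.0.1", port=44325, max_records=5, timeout=10.0):
    """Connect to a listening pcp2spark and return up to max_records lines."""
    records = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Assumption: each record is one newline-terminated line of text.
        with conn.makefile("r", encoding="utf-8", newline="\n") as stream:
            for line in stream:
                records.append(line.rstrip("\n"))
                if len(records) >= max_records:
                    break
    # Leaving the with-blocks closes the socket; as described above,
    # pcp2spark exits once the peer closes the connection.
    return records
```

       Calling read_records() with no arguments targets the documented
       default address and port (127.0.0.1 and 44325).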

CONFIGURATION FILE

       pcp2spark uses a configuration file with overall syntax described in
       pmrep.conf(5).  The following options are common with pmrep.conf:
       version, source, speclocal, derived, header, globals, samples,
       interval, type, type_prefer, ignore_incompat, names_change, instances,
       live_filter, rank, limit_filter, limit_filter_force, invert_filter,
       predicate, omit_flat, include_labels, precision, precision_force,
       count_scale, count_scale_force, space_scale, space_scale_force,
       time_scale, time_scale_force.  The output option is recognized but
       ignored for pmrep.conf compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify the address on which pcp2spark will listen for connections
           from an Apache Spark worker thread.  Corresponding command line
           option is -g.  Default is 127.0.0.1.

       spark_port (integer)
           Specify the port to run pcp2spark on.  Corresponding command line
           option is -p.  Default is 44325.
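
       As an illustration of the two pcp2spark specific options and their
       defaults, the following Python sketch reads them from configuration
       text.  It is a sketch under assumptions, not the parser pcp2spark
       uses: the function name listen_address is hypothetical, the [options]
       section name is assumed to follow pmrep.conf(5), and the sample
       values are made up.

```python
import configparser

DEFAULT_SERVER = "127.0.0.1"  # documented default for spark_server
DEFAULT_PORT = 44325          # documented default for spark_port


def listen_address(config_text):
    """Return (address, port), falling back to the documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback= also covers the case where the section itself is missing.
    server = parser.get("options", "spark_server", fallback=DEFAULT_SERVER)
    port = parser.getint("options", "spark_port", fallback=DEFAULT_PORT)
    return server, port


# Hypothetical configuration fragment: only spark_server is overridden.
example = """
[options]
interval = 5
spark_server = 0.0.0.0
"""

print(listen_address(example))
```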

OPTIONS

       The available command line options are:

       -0 precision, --precision-force=precision
            Like -P but this option will override per-metric specifications.

       -4 action, --names-change=action
            Specify which action to take on receiving a metric names change
            event during sampling.  These events occur when a PMDA discovers
            new metrics sometime after starting up, and informs running client
            tools like pcp2spark.  Valid values for action are update (refresh
            metrics being sampled), ignore (do nothing - the default
            behaviour) and abort (exit the program if such an event happens).

       -5, --ignore-unknown
            Silently ignore any metric name that cannot be resolved.  At least
            one metric must be found for the tool to start.

       -8 limit, --limit-filter=limit
            Limit results to instances with values above/below limit.  A
            positive integer will include instances with values at or above
            the limit in reporting.  A negative integer will include instances
            with values at or below the limit in reporting.  A value of zero
            performs no limit filtering.  This option will not override
            possible per-metric specifications.  See also -J and -N.

       -9 limit, --limit-filter-force=limit
            Like -8 but this option will override per-metric specifications.

       -a archive, --archive=archive
            Performance metric values are retrieved from the set of
            Performance Co-Pilot (PCP) archive log files identified by the
            archive argument, which is a comma-separated list of names, each
            of which may be the base name of an archive or the name of a
            directory containing one or more archives.

       -A align, --align=align
            Force the initial sample to be aligned on the boundary of a
            natural time unit align.  Refer to PCPIntro(1) for a complete
            description of the syntax for align.

       --archive-folio=folio
            Read metric source archives from the PCP archive folio created by
            tools like pmchart(1) or, less often, manually with mkaf(1).

       -b scale, --space-scale=scale
            Unit/scale for space (byte) metrics, possible values include
            bytes, Kbytes, KB, Mbytes, MB, and so forth.  This option will not
            override possible per-metric specifications.  See also
            pmParseUnitsStr(3).

       -B scale, --space-scale-force=scale
            Like -b but this option will override per-metric specifications.

       -c config, --config=config
            Specify the config file to use.  The default is the first found
            of: ./pcp2spark.conf, $HOME/.pcp2spark.conf,
            $HOME/pcp/pcp2spark.conf, and $PCP_SYSCONF_DIR/pcp2spark.conf.
            For details, see the CONFIGURATION FILE section above and
            pmrep.conf(5).

       --container=container
            Fetch performance metrics from the specified container, either
            local or remote (see -h).

       -C, --check
            Exit before reporting any values, but after parsing the
            configuration and metrics and printing possible headers.

       --daemonize
            Daemonize on startup.

       -e derived, --derived=derived
            Specify derived performance metrics.  If derived starts with a
            slash (``/'') or with a dot (``.'') it will be interpreted as a
            derived metrics configuration file, otherwise it will be
            interpreted as comma- or semicolon-separated derived metric
            expressions.  For details see pmLoadDerivedConfig(3) and
            pmRegisterDerived(3).

       -g server, --spark-server=server
            Address on which pcp2spark listens for connections from an Apache
            Spark worker thread.  Corresponding configuration file option is
            spark_server.  Default is 127.0.0.1.

       -G, --no-globals
            Do not include global metrics in reporting (see pmrep.conf(5)).

       -h host, --host=host
            Fetch performance metrics from pmcd(1) on host, rather than from
            the default localhost.

       -H, --no-header
            Do not print any headers.

       -i instances, --instances=instances
            Report only the listed instances from current instances (if
            present, see also -j).  By default all instances, present and
            future, are reported.  This is a global option that is used for
            all metrics unless a metric-specific instance definition is
            provided as part of a metricspec.  By default single-valued
            ``flat'' metrics without multiple instances are still reported as
            usual; use -v to change this.  Please refer to pmrep(1) for more
            details on this option.

       -I, --ignore-incompat
            Ignore incompatible metrics.  By default incompatible metrics
            (that is, their type is unsupported or they cannot be scaled as
            requested) will cause pcp2spark to terminate with an error
            message.  With this option all incompatible metrics are silently
            omitted from reporting.  This may be especially useful when
            requesting non-leaf nodes of the PMNS tree for reporting.

       -j, --live-filter
            Perform instance live filtering.  This allows capturing all
            filtered instances even if processes are restarted at some point
            (unlike without live filtering).  Performing live filtering over a
            large number of instances will add some internal overhead, so a
            bit of user caution is advised.  See also -n.

       -J rank, --rank=rank
            Limit results to highest/lowest ranked instances of set-valued
            metrics.  A positive integer will include highest valued instances
            in reporting.  A negative integer will include lowest valued
            instances in reporting.  A value of zero performs no ranking.
            Ranking does not imply sorting; the -6 sorting option of pmrep(1)
            is not available in pcp2spark.  See also -8.

       -K spec, --spec-local=spec
            When fetching metrics from a local context (see -L), the -K option
            may be used to control the DSO PMDAs that should be made
            accessible.  The spec argument conforms to the syntax described in
            pmSpecLocalPMDA(3).  More than one -K option may be used.

       -L, --local-PMDA
            Use a local context to collect metrics from DSO PMDAs on the local
            host without PMCD.  See also -K.

       -m, --include-labels
            Include metric labels in the output.

       -n, --invert-filter
            Perform ranking before live filtering.  By default instance live
            filtering (when requested, see -j) happens before instance ranking
            (when requested, see -J).  With this option the logic is inverted
            and ranking happens before live filtering.

       -N predicate, --predicate=predicate
            Specify a comma-separated list of predicate filter reference
            metrics.  By default ranking (see -J) happens for each metric
            individually.  With predicates, ranking is done only for the
            specified predicate metrics.  When reporting, the rest of the
            metrics sharing the same instance domain (see PCPIntro(1)) as the
            predicate will include only the highest/lowest ranking instances
            of the corresponding predicate.  Ranking does not imply sorting.

            So for example, using proc.memory.rss (resident memory size of
            process) as the predicate metric together with proc.io.total_bytes
            and mem.util.used as metrics to be reported, only the processes
            using most/least (as per -J) memory will be included when
            reporting total bytes written by processes.  Since mem.util.used
            is a single-valued metric (thus not sharing the same instance
            domain as the process-related metrics), it will be reported as
            usual.

       -O origin, --origin=origin
            When reporting archived metrics, start reporting at origin within
            the time window (see -S and -T).  Refer to PCPIntro(1) for a
            complete description of the syntax for origin.

       -p port, --spark-port=port
            Port on which pcp2spark listens for connections from an Apache
            Spark worker thread.  Corresponding configuration file option is
            spark_port.  Default is 44325.

       -P precision, --precision=precision
            Use precision for numeric non-integer output values.  The default
            is to use 3 decimal places (when applicable).  This option will
            not override possible per-metric specifications.

       -q scale, --count-scale=scale
            Unit/scale for count metrics, possible values include count x
            10^-1, count, count x 10, count x 10^2, and so forth from 10^-8 to
            10^7.  (These values are currently space-sensitive.)  This option
            will not override possible per-metric specifications.  See also
            pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
            Like -q but this option will override per-metric specifications.

       -r, --raw
            Output raw metric values; do not convert cumulative counters to
            rates.  This option will override possible per-metric
            specifications.

       -R, --raw-prefer
            Like -r but this option will not override per-metric
            specifications.

       -s samples, --samples=samples
            The samples argument defines the number of samples to be retrieved
            and reported.  If samples is 0 or -s is not specified, pcp2spark
            will sample and report continuously (in real time mode) or until
            the end of the set of PCP archives (in archive mode).  See also
            -T.

       -S starttime, --start=starttime
            When reporting archived metrics, the report will be restricted to
            those records logged at or after starttime.  Refer to PCPIntro(1)
            for a complete description of the syntax for starttime.

       -t interval, --interval=interval
            Set the reporting interval to something other than the default 1
            second.  The interval argument follows the syntax described in
            PCPIntro(1), and in the simplest form may be an unsigned integer
            (the implied units in this case are seconds).  See also the -T
            option.

       -T endtime, --finish=endtime
            When reporting archived metrics, the report will be restricted to
            those records logged before or at endtime.  Refer to PCPIntro(1)
            for a complete description of the syntax for endtime.

            When used to define the runtime before pcp2spark will exit, if no
            samples is given (see -s) then the number of reported samples
            depends on interval (see -t).  If samples is given then interval
            will be adjusted to allow reporting of samples during runtime.  In
            case all of -T, -s, and -t are given, endtime determines the
            actual time pcp2spark will run.

       -v, --omit-flat
            Omit single-valued ``flat'' metrics from reporting; only consider
            set-valued metrics (i.e., metrics with multiple values) for
            reporting.  See -i and -I.

       -V, --version
            Display version number and exit.

       -y scale, --time-scale=scale
            Unit/scale for time metrics, possible values include nanosec, ns,
            microsec, us, millisec, ms, and so forth up to hour, hr.  This
            option will not override possible per-metric specifications.  See
            also pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
            Like -y but this option will override per-metric specifications.

       -?, --help
            Display usage message and exit.
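
       The sign conventions of the -8 and -J options described above can be
       summarized in a short Python sketch.  This is an illustration, not the
       pcp2spark implementation: the function names and sample values are
       hypothetical, and the sketch assumes that a negative limit is compared
       by its magnitude.

```python
def limit_instances(values, limit):
    """Apply -8 style limit filtering to a {instance: value} mapping."""
    if limit > 0:
        return {k: v for k, v in values.items() if v >= limit}
    if limit < 0:
        # Assumption: a negative limit is compared by its magnitude.
        return {k: v for k, v in values.items() if v <= abs(limit)}
    return dict(values)  # zero performs no limit filtering


def rank_instances(values, rank):
    """Keep the |rank| highest (rank > 0) or lowest (rank < 0) instances."""
    if rank == 0:
        return dict(values)  # zero performs no ranking
    ordered = sorted(values, key=values.get, reverse=rank > 0)
    keep = set(ordered[:abs(rank)])
    # Ranking does not imply sorting: the original order is preserved.
    return {k: v for k, v in values.items() if k in keep}


# Hypothetical per-instance values, for illustration only.
sample = {"sda": 10, "sdb": 50, "sdc": 30, "sdd": 5}
print(limit_instances(sample, 30))
print(rank_instances(sample, -1))
```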
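
       The rule that the -e option uses to interpret its argument can be
       sketched in Python as follows.  The function name parse_derived and
       the metric expressions are hypothetical.  The choice of separator is
       an assumption: the sketch prefers a semicolon when one is present, so
       that commas inside an expression are left alone.

```python
def parse_derived(derived):
    """Classify a -e argument as a config file or inline expressions."""
    # A leading slash or dot marks a derived metrics configuration file.
    if derived.startswith(("/", ".")):
        return "file", derived
    # Assumption: a semicolon, when present, is the separator; otherwise
    # expressions are separated by commas.
    separator = ";" if ";" in derived else ","
    parts = [part.strip() for part in derived.split(separator)]
    return "expressions", [part for part in parts if part]


print(parse_derived("./derived.conf"))
print(parse_derived("a = x + y, b = x - y"))
```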

FILES

       pcp2spark.conf
            pcp2spark configuration file (see -c)
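
       The default search order for the configuration file (see -c) can be
       expressed as a small Python helper.  This is a sketch rather than the
       code pcp2spark runs: the function name find_config is hypothetical,
       and it assumes that HOME and PCP_SYSCONF_DIR are available in the
       environment mapping passed in.

```python
import os


def find_config(name="pcp2spark.conf", environ=None, cwd="."):
    """Return the first existing default config path, or None."""
    environ = os.environ if environ is None else environ
    home = environ.get("HOME")
    sysconf = environ.get("PCP_SYSCONF_DIR")
    candidates = [os.path.join(cwd, name)]
    if home:
        candidates.append(os.path.join(home, "." + name))
        candidates.append(os.path.join(home, "pcp", name))
    if sysconf:
        candidates.append(os.path.join(sysconf, name))
    # The first candidate that exists wins, in the documented order.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

       An explicit -c config argument bypasses this search entirely.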

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the
       file and directory names used by PCP.  On each installation, the file
       /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration
       file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO

       mkaf(1), PCPIntro(1), pcp(1), pcp2elasticsearch(1), pcp2graphite(1),
       pcp2influxdb(1), pcp2json(1), pcp2xlsx(1), pcp2xml(1), pcp2zabbix(1),
       pmcd(1), pminfo(1), pmrep(1), pmGetOptions(3), pmSpecLocalPMDA(3),
       pmLoadDerivedConfig(3), pmParseUnitsStr(3), pmRegisterDerived(3),
       LOGARCHIVE(5), pcp.conf(5), PMNS(5) and pmrep.conf(5).

Performance Co-Pilot                  PCP                         PCP2SPARK(1)