PCP2SPARK(1)                General Commands Manual               PCP2SPARK(1)

NAME

       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS

       pcp2spark [-5CGHIjLmnrRvV?] [-4 action] [-8|-9 limit] [-a archive]
       [-A align] [--archive-folio folio] [-b|-B space-scale] [-c config]
       [--container container] [--daemonize] [-e derived] [-g server]
       [-h host] [-i instances] [-J rank] [-K spec] [-N predicate]
       [-O origin] [-p port] [-P|-0 precision] [-q|-Q count-scale]
       [-s samples] [-S starttime] [-t interval] [-T endtime]
       [-y|-Y time-scale] metricspec [...]

DESCRIPTION

       pcp2spark is a customizable performance metrics exporter tool from
       PCP to Apache Spark.  Any available performance metric, live or
       archived, system and/or application, can be selected for exporting
       using either command line arguments or a configuration file.

       pcp2spark acts as a bridge which provides a network socket stream on
       a given address/port to which an Apache Spark worker task can connect
       and pull the configured PCP metrics from pcp2spark, which exports
       them using the streaming extensions of the Apache Spark API.

       pcp2spark is a close relative of pmrep(1).  Refer to pmrep(1) for the
       metricspec description accepted on the pcp2spark command line.  See
       pmrep.conf(5) for a description of the overall syntax of the
       pcp2spark.conf configuration file.  This page describes
       pcp2spark-specific options and configuration file differences from
       pmrep.conf(5).  pmrep(1) also lists some usage examples, most of
       which are applicable to pcp2spark as well.

       Only the command line options listed on this page are supported;
       other options recognized by pmrep(1) are not supported.

       Options via environment values (see pmGetOptions(3)) override the
       corresponding built-in default values (if any).  Configuration file
       options override the corresponding environment variables (if any).
       Command line options override the corresponding configuration file
       options (if any).

GENERAL USAGE

       A general setup for making use of pcp2spark involves configuring
       pcp2spark with the PCP metrics to export and then starting the
       pcp2spark application.  pcp2spark will then wait and listen on the
       given address/port for a connection from an Apache Spark worker
       thread.  Once started, the worker thread connects to pcp2spark.
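
       For example, the following invocation (the metric names are
       arbitrary, commonly available Linux metrics chosen purely for
       illustration) exports two metrics every five seconds while listening
       on the default address and port, 127.0.0.1:44325:

         $ pcp2spark -t 5 mem.util.used disk.dev.read
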
       When an Apache Spark worker thread has connected, pcp2spark will
       begin streaming PCP metric data to Apache Spark until the worker
       thread completes or the connection is interrupted.  If the connection
       is interrupted or the socket is closed by the Apache Spark worker
       thread, pcp2spark will exit.

       For an example Apache Spark worker job which will connect to a
       pcp2spark instance on a given address/port and pull in PCP metric
       data, see the example provided in the PCP examples directory for
       pcp2spark (often provided by the PCP development package) or the
       online version at
       https://github.com/performancecopilot/pcp/blob/main/src/pcp2spark/.
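
       For illustration only, a minimal PySpark consumer along the following
       lines could be used.  This sketch is not the bundled PCP example: it
       assumes Spark's Structured Streaming socket source, the default
       pcp2spark address/port, and an arbitrary application name, and it
       simply echoes the received records to the console.

         # consumer.py - illustrative sketch of a pcp2spark consumer
         from pyspark.sql import SparkSession

         spark = (SparkSession.builder
                  .appName("pcp2spark-consumer")
                  .getOrCreate())

         # The socket source connects to the listening pcp2spark instance
         # and yields one row per received line of text ('value' column).
         lines = (spark.readStream
                  .format("socket")
                  .option("host", "127.0.0.1")
                  .option("port", 44325)
                  .load())

         # Echo the raw records to the console as they arrive.
         query = (lines.writeStream
                  .outputMode("append")
                  .format("console")
                  .start())
         query.awaitTermination()

       Such a script could be run with spark-submit while pcp2spark is
       waiting for a connection; the streamed records then appear on the
       console in micro-batches.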

CONFIGURATION FILE

       pcp2spark uses a configuration file with overall syntax described in
       pmrep.conf(5).  The following options are common with pmrep.conf:
       version, source, speclocal, derived, header, globals, samples,
       interval, type, type_prefer, ignore_incompat, names_change,
       instances, live_filter, rank, limit_filter, limit_filter_force,
       invert_filter, predicate, omit_flat, include_labels, precision,
       precision_force, count_scale, count_scale_force, space_scale,
       space_scale_force, time_scale, time_scale_force.  The output option
       is recognized but ignored for pmrep.conf compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify the address on which pcp2spark will listen for
           connections from an Apache Spark worker thread.  Corresponding
           command line option is -g.  Default is 127.0.0.1.

       spark_port (integer)
           Specify the port to run pcp2spark on.  Corresponding command line
           option is -p.  Default is 44325.
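
       For illustration, a minimal pcp2spark.conf could look like the
       following.  Only the [options] section syntax described in
       pmrep.conf(5) is assumed here; the address and port simply repeat the
       documented defaults and the sampling interval is an arbitrary
       example.

         [options]
         spark_server = 127.0.0.1
         spark_port = 44325
         interval = 5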

OPTIONS

       The available command line options are:

       -0 precision, --precision-force=precision
            Like -P but this option will override per-metric specifications.

       -4 action, --names-change=action
            Specify which action to take on receiving a metric names change
            event during sampling.  These events occur when a PMDA discovers
            new metrics sometime after starting up, and informs running
            client tools like pcp2spark.  Valid values for action are update
            (refresh metrics being sampled), ignore (do nothing - the
            default behaviour) and abort (exit the program if such an event
            happens).

       -5, --ignore-unknown
            Silently ignore any metric name that cannot be resolved.  At
            least one metric must be found for the tool to start.

       -8 limit, --limit-filter=limit
            Limit results to instances with values above/below limit.  A
            positive integer will include instances with values at or above
            the limit in reporting.  A negative integer will include
            instances with values at or below the limit in reporting.  A
            value of zero performs no limit filtering.  This option will not
            override possible per-metric specifications.  See also -J and
            -N.

       -9 limit, --limit-filter-force=limit
            Like -8 but this option will override per-metric specifications.

       -a archive, --archive=archive
            Performance metric values are retrieved from the set of
            Performance Co-Pilot (PCP) archive log files identified by the
            archive argument, which is a comma-separated list of names, each
            of which may be the base name of an archive or the name of a
            directory containing one or more archives.

       -A align, --align=align
            Force the initial sample to be aligned on the boundary of a
            natural time unit align.  Refer to PCPIntro(1) for a complete
            description of the syntax for align.

       --archive-folio=folio
            Read metric source archives from the PCP archive folio created
            by tools like pmchart(1) or, less often, manually with mkaf(1).

       -b scale, --space-scale=scale
            Unit/scale for space (byte) metrics, possible values include
            bytes, Kbytes, KB, Mbytes, MB, and so forth.  This option will
            not override possible per-metric specifications.  See also
            pmParseUnitsStr(3).

       -B scale, --space-scale-force=scale
            Like -b but this option will override per-metric specifications.

       -c config, --config=config
            Specify the config file or directory to use.  In case config is
            a directory, all files under it ending in .conf will be
            included.  The default is the first found of: ./pcp2spark.conf,
            $HOME/.pcp2spark.conf, $HOME/pcp/pcp2spark.conf, and
            $PCP_SYSCONF_DIR/pcp2spark.conf.  For details, see the above
            section and pmrep.conf(5).

       --container=container
            Fetch performance metrics from the specified container, either
            local or remote (see -h).

       -C, --check
            Exit before reporting any values, but after parsing the
            configuration and metrics and printing possible headers.

       --daemonize
            Daemonize on startup.

       -e derived, --derived=derived
            Specify derived performance metrics.  If derived starts with a
            slash (``/'') or with a dot (``.'') it will be interpreted as a
            derived metrics configuration file, otherwise it will be
            interpreted as comma- or semicolon-separated derived metric
            expressions.  For details see pmLoadDerivedConfig(3) and
            pmRegisterDerived(3).
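
            As an illustration, the following invocation registers a derived
            metric built from two standard Linux memory metrics and then
            exports it (the name mem.util.allcache is arbitrary):

              $ pcp2spark \
                    -e "mem.util.allcache = mem.util.bufmem + mem.util.cached" \
                    mem.util.allcache
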
       -g server, --spark-server=server
            Spark server to send the metrics to.

       -G, --no-globals
            Do not include global metrics in reporting (see pmrep.conf(5)).

       -h host, --host=host
            Fetch performance metrics from pmcd(1) on host, rather than from
            the default localhost.

       -H, --no-header
            Do not print any headers.

       -i instances, --instances=instances
            Retrieve and report only the specified metric instances.  By
            default all instances, present and future, are reported.

            Refer to pmrep(1) for a complete description of this option.

       -I, --ignore-incompat
            Ignore incompatible metrics.  By default incompatible metrics
            (that is, their type is unsupported or they cannot be scaled as
            requested) will cause pcp2spark to terminate with an error
            message.  With this option all incompatible metrics are silently
            omitted from reporting.  This may be especially useful when
            requesting non-leaf nodes of the PMNS tree for reporting.

       -j, --live-filter
            Perform instance live filtering.  This allows capturing all
            named instances even if processes are restarted at some point
            (unlike without live filtering).  Performing live filtering over
            a huge number of instances will add some internal overhead so a
            bit of user caution is advised.  See also -n.

       -J rank, --rank=rank
            Limit results to highest/lowest ranked instances of set-valued
            metrics.  A positive integer will include highest valued
            instances in reporting.  A negative integer will include lowest
            valued instances in reporting.  A value of zero performs no
            ranking.  Ranking does not imply sorting.  See also -8.

       -K spec, --spec-local=spec
            When fetching metrics from a local context (see -L), the -K
            option may be used to control the DSO PMDAs that should be made
            accessible.  The spec argument conforms to the syntax described
            in pmSpecLocalPMDA(3).  More than one -K option may be used.

       -L, --local-PMDA
            Use a local context to collect metrics from DSO PMDAs on the
            local host without PMCD.  See also -K.

       -m, --include-labels
            Include metric labels in the output.

       -n, --invert-filter
            Perform ranking before live filtering.  By default instance live
            filtering (when requested, see -j) happens before instance
            ranking (when requested, see -J).  With this option the logic is
            inverted and ranking happens before live filtering.

       -N predicate, --predicate=predicate
            Specify a comma-separated list of predicate filter reference
            metrics.  By default ranking (see -J) happens for each metric
            individually.  With predicates, ranking is done only for the
            specified predicate metrics.  When reporting, the rest of the
            metrics sharing the same instance domain (see PCPIntro(1)) as
            the predicate will include only the highest/lowest ranking
            instances of the corresponding predicate.  Ranking does not
            imply sorting.

            So for example, using proc.memory.rss (resident memory size of a
            process) as the predicate metric together with
            proc.io.total_bytes and mem.util.used as metrics to be reported,
            only the processes using the most/least (as per -J) memory will
            be included when reporting total bytes written by processes.
            Since mem.util.used is a single-valued metric (thus not sharing
            the same instance domain as the process related metrics), it
            will be reported as usual.

       -O origin, --origin=origin
            When reporting archived metrics, start reporting at origin
            within the time window (see -S and -T).  Refer to PCPIntro(1)
            for a complete description of the syntax for origin.

       -p port, --spark-port=port
            Spark server port.

       -P precision, --precision=precision
            Use precision for numeric non-integer output values.  The
            default is to use 3 decimal places (when applicable).  This
            option will not override possible per-metric specifications.

       -q scale, --count-scale=scale
            Unit/scale for count metrics, possible values include count x
            10^-1, count, count x 10, count x 10^2, and so forth from 10^-8
            to 10^7.  (These values are currently space-sensitive.)  This
            option will not override possible per-metric specifications.
            See also pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
            Like -q but this option will override per-metric specifications.

       -r, --raw
            Output raw metric values, do not convert cumulative counters to
            rates.  This option will override possible per-metric
            specifications.

       -R, --raw-prefer
            Like -r but this option will not override per-metric
            specifications.

       -s samples, --samples=samples
            The samples argument defines the number of samples to be
            retrieved and reported.  If samples is 0 or -s is not specified,
            pcp2spark will sample and report continuously (in real time
            mode) or until the end of the set of PCP archives (in archive
            mode).  See also -T.

       -S starttime, --start=starttime
            When reporting archived metrics, the report will be restricted
            to those records logged at or after starttime.  Refer to
            PCPIntro(1) for a complete description of the syntax for
            starttime.

       -t interval, --interval=interval
            Set the reporting interval to something other than the default 1
            second.  The interval argument follows the syntax described in
            PCPIntro(1), and in the simplest form may be an unsigned integer
            (the implied units in this case are seconds).  See also the -T
            option.

       -T endtime, --finish=endtime
            When reporting archived metrics, the report will be restricted
            to those records logged before or at endtime.  Refer to
            PCPIntro(1) for a complete description of the syntax for
            endtime.

            When used to define the runtime before pcp2spark will exit, if
            no samples is given (see -s) then the number of reported samples
            depends on interval (see -t).  If samples is given then interval
            will be adjusted to allow reporting of samples during runtime.
            In case all of -T, -s, and -t are given, endtime determines the
            actual time pcp2spark will run.

       -v, --omit-flat
            Report only set-valued metrics with instances (e.g.
            disk.dev.read) and omit single-valued ``flat'' metrics without
            instances (e.g. kernel.all.sysfork).  See -i and -I.

       -V, --version
            Display version number and exit.

       -y scale, --time-scale=scale
            Unit/scale for time metrics, possible values include nanosec,
            ns, microsec, us, millisec, ms, and so forth up to hour, hr.
            This option will not override possible per-metric
            specifications.  See also pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
            Like -y but this option will override per-metric specifications.

       -?, --help
            Display usage message and exit.

FILES

       pcp2spark.conf
            pcp2spark configuration file (see -c)

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize
       the file and directory names used by PCP.  On each installation, the
       file /etc/pcp.conf contains the local values for these variables.
       The $PCP_CONF variable may be used to specify an alternative
       configuration file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO

       mkaf(1), PCPIntro(1), pcp(1), pcp2elasticsearch(1), pcp2graphite(1),
       pcp2influxdb(1), pcp2json(1), pcp2xlsx(1), pcp2xml(1), pcp2zabbix(1),
       pmcd(1), pminfo(1), pmrep(1), pmGetOptions(3), pmSpecLocalPMDA(3),
       pmLoadDerivedConfig(3), pmParseUnitsStr(3), pmRegisterDerived(3),
       LOGARCHIVE(5), pcp.conf(5), PMNS(5) and pmrep.conf(5).

Performance Co-Pilot                  PCP                         PCP2SPARK(1)