PCP2SPARK(1)              General Commands Manual              PCP2SPARK(1)

NAME
       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS
       pcp2spark [-5CGHIjLnrRvV?] [-4 action] [-8|-9 limit] [-a archive]
       [-A align] [--archive-folio folio] [-b|-B space-scale] [-c config]
       [--container container] [--daemonize] [-e derived] [-g server]
       [-h host] [-i instances] [-J rank] [-K spec] [-N predicate]
       [-O origin] [-p port] [-P|-0 precision] [-q|-Q count-scale]
       [-s samples] [-S starttime] [-t interval] [-T endtime]
       [-y|-Y time-scale] metricspec [...]

DESCRIPTION
       pcp2spark is a customizable performance metrics exporter tool from PCP
       to Apache Spark. Any available performance metric, live or archived,
       system and/or application, can be selected for exporting using either
       command line arguments or a configuration file.

       pcp2spark acts as a bridge: it provides a network socket stream on a
       given address/port to which an Apache Spark worker task can connect
       and pull the configured PCP metrics, ingesting them through the
       streaming extensions of the Apache Spark API.

       pcp2spark is a close relative of pmrep(1). Refer to pmrep(1) for the
       description of the metricspec accepted on the pcp2spark command line,
       and to pmrep.conf(5) for the overall syntax of the pcp2spark.conf
       configuration file. This page describes the options specific to
       pcp2spark and the configuration file differences from pmrep.conf(5).
       pmrep(1) also lists usage examples, most of which are applicable to
       pcp2spark as well.

       Only the command line options listed on this page are supported;
       other options recognized by pmrep(1) are not supported.

       Options set via environment variables (see pmGetOptions(3)) override
       the corresponding built-in default values (if any). Configuration
       file options override the corresponding environment variables (if
       any). Command line options override the corresponding configuration
       file options (if any).
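       The precedence order above can be modelled with a short Python
       sketch. It is not pcp2spark code. It only illustrates the documented
       layering using collections.ChainMap, which returns the value from the
       first mapping that defines a key. The option values are made up.

```python
from collections import ChainMap


def effective_options(builtin, environment, config_file, command_line):
    """Resolve options using the documented precedence.

    ChainMap looks keys up in the first mapping that contains them, so the
    mappings are listed from highest to lowest precedence:
    command line > configuration file > environment > built-in defaults.
    """
    layered = ChainMap(command_line, config_file, environment, builtin)
    return dict(layered)


if __name__ == "__main__":
    resolved = effective_options(
        builtin={"interval": "1", "spark_server": "127.0.0.1", "spark_port": 44325},
        environment={"interval": "5"},
        config_file={"interval": "10", "spark_server": "10.0.0.1"},
        command_line={"spark_port": 5000},
    )
    print(resolved)
```

       In this sketch, interval comes from the configuration file because it
       overrides the environment value, and spark_port comes from the command
       line.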

GENERAL USAGE
       A typical setup for pcp2spark involves configuring the PCP metrics to
       export and then starting pcp2spark. pcp2spark then listens on the
       given address/port and waits for an Apache Spark worker thread to
       connect to it.

       Once an Apache Spark worker thread has connected, pcp2spark streams
       PCP metric data to Apache Spark until the worker thread completes or
       the connection is interrupted. If the connection is interrupted or
       the socket is closed by the Apache Spark worker thread, pcp2spark
       exits.
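       You can test the connection model above without Spark by using a plain
       TCP client. The Python sketch below connects to the default pcp2spark
       address and port and collects what the exporter sends until the
       connection closes. It assumes the stream is newline-delimited text.
       This page does not describe the record format, so treat the parsing
       as an assumption. As noted above, closing the client connection makes
       pcp2spark exit.

```python
import socket

DEFAULT_HOST = "127.0.0.1"  # documented default listen address (spark_server, -g)
DEFAULT_PORT = 44325        # documented default listen port (spark_port, -p)


def read_records(host=DEFAULT_HOST, port=DEFAULT_PORT, max_records=None, timeout=30.0):
    """Connect to a listening pcp2spark and return the lines it streams.

    Assumption: each record is newline-terminated text. Reading stops when
    the server closes the connection or max_records lines have been read.
    Leaving the outer with-block closes the socket, which ends the session.
    """
    records = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\n") as stream:
            for line in stream:
                line = line.rstrip("\n")
                if not line:
                    continue  # skip empty lines between records
                records.append(line)
                if max_records is not None and len(records) >= max_records:
                    break
    return records


if __name__ == "__main__":
    for record in read_records(max_records=5):
        print(record)
```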

       For an example Apache Spark worker job that connects to a pcp2spark
       instance on a given address/port and pulls in PCP metric data, see
       the example in the PCP examples directory for pcp2spark (often
       provided by the PCP development package) or the online version at
       ⟨https://github.com/performancecopilot/pcp/blob/master/src/pcp2spark/⟩.

CONFIGURATION FILE
       pcp2spark uses a configuration file with the overall syntax described
       in pmrep.conf(5). The following options are common with pmrep.conf:
       version, source, speclocal, derived, header, globals, samples,
       interval, type, type_prefer, ignore_incompat, names_change, instances,
       live_filter, rank, limit_filter, limit_filter_force, invert_filter,
       predicate, omit_flat, precision, precision_force, count_scale,
       count_scale_force, space_scale, space_scale_force, time_scale,
       time_scale_force. The output option is recognized but ignored for
       pmrep.conf compatibility.

   pcp2spark specific options
       spark_server (string)
              Specify the address on which pcp2spark listens for
              connections from an Apache Spark worker thread.
              Corresponding command line option is -g. Default is
              127.0.0.1.

       spark_port (integer)
              Specify the port on which pcp2spark listens. Corresponding
              command line option is -p. Default is 44325.
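       pmrep.conf(5) style files are INI-like. Tooling that needs to know
       where pcp2spark will listen can therefore read the two options above
       with Python's configparser. This sketch assumes the options live in
       the [options] section, as in pmrep.conf(5), and falls back to the
       documented defaults. It is illustrative and does not reproduce how
       pcp2spark itself parses the file.

```python
import configparser

DEFAULT_SPARK_SERVER = "127.0.0.1"
DEFAULT_SPARK_PORT = 44325


def spark_endpoint(conf_text):
    """Return (spark_server, spark_port) from pcp2spark.conf-style text.

    A missing [options] section or missing keys fall back to the documented
    defaults. Interpolation is disabled so '%' in values is read literally.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    server = parser.get("options", "spark_server", fallback=DEFAULT_SPARK_SERVER)
    port = parser.getint("options", "spark_port", fallback=DEFAULT_SPARK_PORT)
    return server, port


if __name__ == "__main__":
    example = """
[options]
interval = 5
spark_server = 0.0.0.0
spark_port = 44325
"""
    print(spark_endpoint(example))
```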

OPTIONS
       The available command line options are:

       -0 precision, --precision-force=precision
              Like -P but this option will override per-metric
              specifications.

       -4 action, --names-change=action
              Specify which action to take on receiving a metric names
              change event during sampling. These events occur when a PMDA
              discovers new metrics some time after starting up and informs
              running client tools like pcp2spark. Valid values for action
              are update (refresh the metrics being sampled), ignore (do
              nothing, the default behaviour) and abort (exit the program if
              such an event happens).

       -5, --ignore-unknown
              Silently ignore any metric name that cannot be resolved. At
              least one metric must be found for the tool to start.

       -8 limit, --limit-filter=limit
              Limit results to instances with values above/below limit. A
              positive integer will include instances with values at or
              above the limit in reporting. A negative integer will include
              instances with values at or below the limit in reporting. A
              value of zero performs no limit filtering. This option will
              not override possible per-metric specifications. See also -J
              and -N.
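              The limit rule can be sketched in Python as a filter over one
              metric's instance values. This is an interpretation for
              illustration only. In particular, it assumes that a negative
              limit is compared by its absolute value, so -10 keeps values
              at or below 10. This page does not state that explicitly.

```python
def limit_filter(values, limit):
    """Filter one metric's {instance: value} mapping by a -8 style limit.

    limit > 0:  keep instances with value >= limit
    limit < 0:  keep instances with value <= abs(limit)  (assumed reading)
    limit == 0: no filtering
    """
    if limit == 0:
        return dict(values)
    if limit > 0:
        return {inst: v for inst, v in values.items() if v >= limit}
    threshold = abs(limit)
    return {inst: v for inst, v in values.items() if v <= threshold}
```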

       -9 limit, --limit-filter-force=limit
              Like -8 but this option will override per-metric
              specifications.

       -a archive, --archive=archive
              Performance metric values are retrieved from the set of
              Performance Co-Pilot (PCP) archive log files identified by the
              argument archive, which is a comma-separated list of names,
              each of which may be the base name of an archive or the name
              of a directory containing one or more archives.

       -A align, --align=align
              Force the initial sample to be aligned on the boundary of a
              natural time unit align. Refer to PCPIntro(1) for a complete
              description of the syntax for align.

       --archive-folio=folio
              Read metric source archives from the PCP archive folio created
              by tools like pmchart(1) or, less often, manually with
              mkaf(1).

       -b scale, --space-scale=scale
              Unit/scale for space (byte) metrics; possible values include
              bytes, Kbytes, KB, Mbytes, MB, and so forth. This option will
              not override possible per-metric specifications. See also
              pmParseUnitsStr(3).
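              Space scaling converts byte counts into larger units. PCP
              space units are powers of 1024 (one Kbyte is 1024 bytes), and
              the Python sketch below assumes the same. The unit table is a
              hypothetical subset of the names pmParseUnitsStr(3) accepts,
              not its full grammar.

```python
# Hypothetical subset of space unit names, mapped to a power of 1024.
SPACE_UNITS = {
    "bytes": 0,
    "Kbytes": 1, "KB": 1,
    "Mbytes": 2, "MB": 2,
    "Gbytes": 3, "GB": 3,
    "Tbytes": 4, "TB": 4,
}


def scale_space(value_in_bytes, unit):
    """Express a byte value in the requested unit (1 Kbyte = 1024 bytes)."""
    try:
        power = SPACE_UNITS[unit]
    except KeyError:
        raise ValueError(f"unsupported space unit: {unit!r}") from None
    return value_in_bytes / (1024 ** power)
```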

       -B scale, --space-scale-force=scale
              Like -b but this option will override per-metric
              specifications.

       -c config, --config=config
              Specify the config file to use. The default is the first
              found of: ./pcp2spark.conf, $HOME/.pcp2spark.conf,
              $HOME/pcp/pcp2spark.conf, and $PCP_SYSCONF_DIR/pcp2spark.conf.
              For details, see the configuration file description above and
              pmrep.conf(5).
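              The default lookup order can be written as a first-match
              search. The Python sketch below takes the current directory,
              the home directory and the value of $PCP_SYSCONF_DIR as
              parameters, so it can be tested without touching the real
              environment. It mirrors only the order listed above and is not
              pcp2spark's own code. The /etc/pcp fallback used in the usage
              block is an assumption.

```python
import os


def find_default_config(cwd, home, pcp_sysconf_dir):
    """Return the first existing pcp2spark.conf in the documented order, or None."""
    candidates = [
        os.path.join(cwd, "pcp2spark.conf"),
        os.path.join(home, ".pcp2spark.conf"),
        os.path.join(home, "pcp", "pcp2spark.conf"),
        os.path.join(pcp_sysconf_dir, "pcp2spark.conf"),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_default_config(
        os.getcwd(),
        os.path.expanduser("~"),
        # Assumed fallback when PCP_SYSCONF_DIR is not exported.
        os.environ.get("PCP_SYSCONF_DIR", "/etc/pcp"),
    ))
```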

       --container=container
              Fetch performance metrics from the specified container, either
              local or remote (see -h).

       -C, --check
              Exit before reporting any values, but after parsing the
              configuration and metrics and printing possible headers.

       --daemonize
              Daemonize on startup.

       -e derived, --derived=derived
              Specify derived performance metrics. If derived starts with a
              slash (``/'') or with a dot (``.''), it will be interpreted as
              a derived metrics configuration file; otherwise it will be
              interpreted as comma- or semicolon-separated derived metric
              expressions. For details see pmLoadDerivedConfig(3) and
              pmRegisterDerived(3).
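              The rule for interpreting the -e argument can be sketched in
              Python. The split rule is an assumption. This page only says
              expressions are comma- or semicolon-separated. The sketch
              splits on semicolons when any are present, so expressions that
              contain commas stay intact, and on commas otherwise.

```python
def classify_derived(arg):
    """Classify a -e/--derived argument.

    Returns ("file", path) when the argument starts with '/' or '.',
    otherwise ("expressions", [expr, ...]).
    """
    if arg.startswith(("/", ".")):
        return ("file", arg)
    # Assumed rule: prefer ';' as the separator if present, else ','.
    separator = ";" if ";" in arg else ","
    expressions = [part.strip() for part in arg.split(separator) if part.strip()]
    return ("expressions", expressions)
```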

       -g server, --spark-server=server
              Address on which pcp2spark listens for connections from an
              Apache Spark worker thread (see spark_server). Default is
              127.0.0.1.

       -G, --no-globals
              Do not include global metrics in reporting (see
              pmrep.conf(5)).

       -h host, --host=host
              Fetch performance metrics from pmcd(1) on host, rather than
              from the default localhost.

       -H, --no-header
              Do not print any headers.

       -i instances, --instances=instances
              Report only the listed instances from current instances (if
              present, see also -j). By default all instances, present and
              future, are reported. This is a global option that is used
              for all metrics unless a metric-specific instance definition
              is provided as part of a metricspec. By default, single-valued
              ``flat'' metrics without multiple instances are still reported
              as usual; use -v to change this. Refer to pmrep(1) for more
              details on this option.

       -I, --ignore-incompat
              Ignore incompatible metrics. By default, incompatible metrics
              (that is, metrics whose type is unsupported or that cannot be
              scaled as requested) will cause pcp2spark to terminate with an
              error message. With this option, all incompatible metrics are
              silently omitted from reporting. This may be especially useful
              when requesting non-leaf nodes of the PMNS tree for reporting.

       -j, --live-filter
              Perform instance live filtering. This allows capturing all
              filtered instances even if processes are restarted at some
              point (unlike without live filtering). Live filtering over a
              huge number of instances adds some internal overhead, so some
              caution is advised. See also -n.

       -J rank, --rank=rank
              Limit results to the highest/lowest ranked instances of
              set-valued metrics. A positive integer will include the
              highest valued instances in reporting. A negative integer will
              include the lowest valued instances in reporting. A value of
              zero performs no ranking. See also -8.
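              The ranking rule for a single metric can be sketched in
              Python. This page does not specify how ties between equal
              values are broken. This sketch simply keeps the order produced
              by Python's stable sort.

```python
def rank_instances(values, rank):
    """Keep the abs(rank) highest (rank > 0) or lowest (rank < 0) valued instances.

    values is a {instance: value} mapping for one set-valued metric;
    rank == 0 performs no ranking.
    """
    if rank == 0:
        return dict(values)
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=rank > 0)
    return dict(ordered[:abs(rank)])
```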

       -K spec, --spec-local=spec
              When fetching metrics from a local context (see -L), the -K
              option may be used to control the DSO PMDAs that should be
              made accessible. The spec argument conforms to the syntax
              described in pmSpecLocalPMDA(3). More than one -K option may
              be used.

       -L, --local-PMDA
              Use a local context to collect metrics from DSO PMDAs on the
              local host without PMCD. See also -K.

       -n, --invert-filter
              Perform ranking before live filtering. By default, instance
              live filtering (when requested, see -j) happens before
              instance ranking (when requested, see -J). With this option
              the logic is inverted and ranking happens before live
              filtering.

       -N predicate, --predicate=predicate
              Specify a comma-separated list of predicate filter reference
              metrics. By default, ranking (see -J) happens for each metric
              individually. With predicates, ranking is done only for the
              specified predicate metrics. When reporting, the rest of the
              metrics sharing the same instance domain (see PCPIntro(1)) as
              the predicate will include only the highest/lowest ranking
              instances of the corresponding predicate.

              For example, suppose proc.memory.rss (resident memory size of
              process) is the predicate metric and proc.io.total_bytes and
              mem.util.used are the metrics to be reported. Only the
              processes using the most/least memory (as per -J) will then be
              included when reporting total bytes written by processes.
              Since mem.util.used is a single-valued metric (and thus does
              not share the instance domain of the process-related metrics),
              it will be reported as usual.
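              The example above can be modelled in Python with made-up
              values. The data layout is hypothetical: each metric name maps
              to an instance domain label and its instance values. So is the
              "proc" label, since PCP identifies real instance domains
              differently. The sketch handles a single predicate metric
              only.

```python
def apply_predicate(metrics, predicate, rank):
    """Rank only the predicate metric, then filter metrics in its instance domain.

    metrics maps metric name -> (indom, {instance: value}); indom is None for
    single-valued metrics. Metrics outside the predicate's instance domain
    (including single-valued ones) are returned unchanged. rank follows -J:
    positive keeps the highest values, negative the lowest, zero disables it.
    """
    if rank == 0:
        return {name: dict(values) for name, (_, values) in metrics.items()}

    pred_indom, pred_values = metrics[predicate]
    ordered = sorted(pred_values.items(), key=lambda item: item[1], reverse=rank > 0)
    keep = {instance for instance, _ in ordered[:abs(rank)]}

    result = {}
    for name, (indom, values) in metrics.items():
        if indom is not None and indom == pred_indom:
            result[name] = {i: v for i, v in values.items() if i in keep}
        else:
            result[name] = dict(values)
    return result


if __name__ == "__main__":
    # Hypothetical sample values; "proc" is an illustrative indom label.
    sample = {
        "proc.memory.rss": ("proc", {"1234 bash": 4000, "2345 java": 900000, "3456 sshd": 7000}),
        "proc.io.total_bytes": ("proc", {"1234 bash": 10, "2345 java": 5000, "3456 sshd": 20}),
        "mem.util.used": (None, {None: 123456}),
    }
    print(apply_predicate(sample, "proc.memory.rss", 1))
```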

       -O origin, --origin=origin
              When reporting archived metrics, start reporting at origin
              within the time window (see -S and -T). Refer to PCPIntro(1)
              for a complete description of the syntax for origin.

       -p port, --spark-port=port
              Port on which pcp2spark listens for connections from an Apache
              Spark worker thread (see spark_port). Default is 44325.

       -P precision, --precision=precision
              Use precision for numeric non-integer output values. The
              default is to use 3 decimal places (when applicable). This
              option will not override possible per-metric specifications.

       -q scale, --count-scale=scale
              Unit/scale for count metrics; possible values include
              count x 10^-1, count, count x 10, count x 10^2, and so forth
              from 10^-8 to 10^7. (These values are currently
              space-sensitive.) This option will not override possible
              per-metric specifications. See also pmParseUnitsStr(3).
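              The count scale names are space-sensitive, so generating the
              exact spellings can help. The Python sketch below builds the
              names for exponents -8 to 7 and rescales a raw count. The
              naming pattern is inferred from the examples on this page, not
              taken from pmParseUnitsStr(3).

```python
def count_scale_name(exponent):
    """Return the spelling used on this page for a count scale of 10**exponent."""
    if exponent == 0:
        return "count"
    if exponent == 1:
        return "count x 10"
    return f"count x 10^{exponent}"


def count_scale_names(lowest=-8, highest=7):
    """All names from 10**lowest to 10**highest, in ascending order."""
    return [count_scale_name(e) for e in range(lowest, highest + 1)]


def scale_count(value, exponent):
    """Express a raw count in units of 10**exponent."""
    if exponent >= 0:
        return value / (10 ** exponent)
    # Multiply for negative exponents to avoid inexact division by 0.1, 0.01, ...
    return value * (10 ** -exponent)
```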

       -Q scale, --count-scale-force=scale
              Like -q but this option will override per-metric
              specifications.

       -r, --raw
              Output raw metric values; do not convert cumulative counters
              to rates. This option will override possible per-metric
              specifications.
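              To convert a cumulative counter to a rate, divide the change
              in value between two samples by the elapsed time. The Python
              sketch below shows that arithmetic for two samples. It
              deliberately ignores counter wrap and counter resets, which a
              real exporter must handle. It is not pcp2spark's
              implementation.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Per-second rate of a cumulative counter between two samples.

    Times are in seconds. Counter wrap/reset handling is intentionally
    omitted from this sketch.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_value - prev_value) / elapsed
```

              For example, counter values of 1000 and 1500 sampled 2 seconds
              apart give a rate of 250 per second. With -r the raw values
              would be reported instead.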

       -R, --raw-prefer
              Like -r but this option will not override per-metric
              specifications.

       -s samples, --samples=samples
              The argument samples defines the number of samples to be
              retrieved and reported. If samples is 0 or -s is not
              specified, pcp2spark will sample and report continuously (in
              real time mode) or until the end of the set of PCP archives
              (in archive mode). See also -T.

       -S starttime, --start=starttime
              When reporting archived metrics, the report will be restricted
              to those records logged at or after starttime. Refer to
              PCPIntro(1) for a complete description of the syntax for
              starttime.

       -t interval, --interval=interval
              The default update interval may be set to something other than
              the default 1 second. The interval argument follows the syntax
              described in PCPIntro(1), and in the simplest form may be an
              unsigned integer (the implied units in this case are seconds).
              See also the -T option.

       -T endtime, --finish=endtime
              When reporting archived metrics, the report will be restricted
              to those records logged before or at endtime. Refer to
              PCPIntro(1) for a complete description of the syntax for
              endtime.

              When -T is used to define the runtime before pcp2spark exits
              and samples is not given (see -s), the number of reported
              samples depends on interval (see -t). If samples is given,
              interval will be adjusted to allow reporting of samples during
              runtime. If all of -T, -s, and -t are given, endtime
              determines the actual time pcp2spark will run.

       -v, --omit-flat
              Omit single-valued ``flat'' metrics from reporting; only
              consider set-valued metrics (i.e., metrics with multiple
              values) for reporting. See -i and -I.

       -V, --version
              Display version number and exit.

       -y scale, --time-scale=scale
              Unit/scale for time metrics; possible values include nanosec,
              ns, microsec, us, millisec, ms, and so forth up to hour, hr.
              This option will not override possible per-metric
              specifications. See also pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
              Like -y but this option will override per-metric
              specifications.

       -?, --help
              Display usage message and exit.

FILES
       pcp2spark.conf
              pcp2spark configuration file (see -c)

PCP ENVIRONMENT
       Environment variables with the prefix PCP_ are used to parameterize
       the file and directory names used by PCP. On each installation, the
       file /etc/pcp.conf contains the local values for these variables. The
       $PCP_CONF variable may be used to specify an alternative
       configuration file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO
       mkaf(1), PCPIntro(1), pcp(1), pcp2elasticsearch(1), pcp2graphite(1),
       pcp2influxdb(1), pcp2json(1), pcp2xlsx(1), pcp2xml(1), pcp2zabbix(1),
       pmcd(1), pminfo(1), pmrep(1), pmGetOptions(3), pmSpecLocalPMDA(3),
       pmLoadDerivedConfig(3), pmParseUnitsStr(3), pmRegisterDerived(3),
       LOGARCHIVE(5), pcp.conf(5), pmns(5) and pmrep.conf(5).

Performance Co-Pilot                  PCP                      PCP2SPARK(1)