PCP2SPARK(1) General Commands Manual PCP2SPARK(1)

NAME
pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS
pcp2spark [-5CGHIjLmnrRvV?] [-4 action] [-8|-9 limit] [-a archive] [-A
align] [--archive-folio folio] [-b|-B space-scale] [-c config]
[--container container] [--daemonize] [-e derived] [-g server] [-h host]
[-i instances] [-J rank] [-K spec] [-N predicate] [-O origin] [-p port]
[-P|-0 precision] [-q|-Q count-scale] [-s samples] [-S starttime] [-t
interval] [-T endtime] [-y|-Y time-scale] metricspec [...]

DESCRIPTION
pcp2spark is a customizable performance metrics exporter tool from PCP
to Apache Spark. Any available performance metric, live or archived,
system and/or application, can be selected for exporting using either
command line arguments or a configuration file.

pcp2spark acts as a bridge: it provides a network socket stream on a
given address/port to which an Apache Spark worker task can connect in
order to pull the configured PCP metrics. pcp2spark exports the metrics
over this stream, and the worker consumes them using the streaming
extensions of the Apache Spark API.

pcp2spark is a close relative of pmrep(1). Refer to pmrep(1) for the
metricspec description accepted on the pcp2spark command line. See
pmrep.conf(5) for a description of the pcp2spark.conf configuration file
syntax. This page describes pcp2spark specific options and configuration
file differences with pmrep.conf(5). pmrep(1) also lists some usage
examples, most of which apply to pcp2spark as well.

Only the command line options listed on this page are supported; other
options available for pmrep(1) are not.

Options set via environment variables (see pmGetOptions(3)) override the
corresponding built-in default values (if any). Configuration file
options override the corresponding environment variables (if any).
Command line options override the corresponding configuration file
options (if any).
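
The override order described above can be pictured as a layered lookup.
The following Python sketch is only an illustration: the option names
and values are hypothetical, and it does not claim to show how pcp2spark
resolves options internally.

```python
from collections import ChainMap


def resolve_options(defaults, environment, config_file, command_line):
    """Merge option layers so that later layers override earlier ones.

    Precedence, lowest to highest: built-in defaults, environment,
    configuration file, command line. ChainMap searches its maps from
    left to right, so the highest-priority layer is listed first.
    """
    return dict(ChainMap(command_line, config_file, environment, defaults))


# Hypothetical values, for illustration only.
defaults = {"interval": "1s", "spark_port": 44325}
environment = {"interval": "5s"}
config_file = {"interval": "10s", "spark_port": 50000}
command_line = {"spark_port": 44400}

resolved = resolve_options(defaults, environment, config_file, command_line)
```

In this sketch the interval comes from the configuration file, because
it overrides the environment, and the port comes from the command line.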

GENERAL USAGE
A general setup for making use of pcp2spark involves configuring
pcp2spark for the PCP metrics to export and then starting the pcp2spark
application. pcp2spark will then wait and listen on the given
address/port for a connection from an Apache Spark worker thread. Once
the worker thread has been started, it connects to pcp2spark.

When an Apache Spark worker thread has connected, pcp2spark will begin
streaming PCP metric data to Apache Spark until the worker thread
completes or the connection is interrupted. If the connection is
interrupted or the socket is closed by the Apache Spark worker thread,
pcp2spark will exit.
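
The connect-then-stream sequence can be tried without Spark by using a
plain TCP client. The sketch below is a minimal illustration and rests
on assumptions: that records arrive as newline-terminated text (the wire
format is not specified on this page), and that a small stand-in server
takes the place of pcp2spark. The names read_stream and
start_stand_in_server are hypothetical.

```python
import socket
import threading


def read_stream(host, port):
    """Connect as a worker would and collect lines until the peer closes."""
    lines = []
    with socket.create_connection((host, port), timeout=5) as conn:
        with conn.makefile("r", encoding="utf-8") as stream:
            for line in stream:
                lines.append(line.rstrip("\n"))
    return lines


def start_stand_in_server(records):
    """Hypothetical stand-in for pcp2spark: serve records to one client."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)

    def serve():
        conn, _ = server.accept()  # wait for the single client
        with conn:
            for record in records:
                conn.sendall((record + "\n").encode("utf-8"))
        server.close()

    threading.Thread(target=serve, daemon=True).start()
    return server.getsockname()[1]
```

A call such as read_stream("127.0.0.1", port) returns once the sending
side closes the socket, which mirrors the end of the stream from the
client's point of view.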

For an example Apache Spark worker job which connects to a pcp2spark
instance on a given address/port and pulls in PCP metric data, see the
example provided in the PCP examples directory for pcp2spark (often
provided by the PCP development package) or the online version at
https://github.com/performancecopilot/pcp/blob/main/src/pcp2spark/.

CONFIGURATION FILE
pcp2spark uses a configuration file with syntax described in
pmrep.conf(5). The following options are common with pmrep.conf:
version, source, speclocal, derived, header, globals, samples, interval,
type, type_prefer, ignore_incompat, names_change, instances,
live_filter, rank, limit_filter, limit_filter_force, invert_filter,
predicate, omit_flat, include_labels, precision, precision_force,
count_scale, count_scale_force, space_scale, space_scale_force,
time_scale, time_scale_force. The rest of the pmrep.conf options are
recognized but ignored for compatibility.

pcp2spark specific options
spark_server (string)
Specify the address on which pcp2spark will listen for connections
from an Apache Spark worker thread. Corresponding command line
option is -g. Defaults to 127.0.0.1.

spark_port (integer)
Specify the port on which pcp2spark will listen for connections.
Corresponding command line option is -p. Defaults to 44325.
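
As an illustration of how the two options and their defaults combine,
the Python sketch below reads them from configuration text. It assumes
the options live in an [options] section, following the layout described
in pmrep.conf(5); the example values and the function name
listen_address are hypothetical.

```python
import configparser

DEFAULT_SERVER = "127.0.0.1"  # documented default for spark_server
DEFAULT_PORT = 44325          # documented default for spark_port

# Hypothetical configuration text, for illustration only.
EXAMPLE_CONF = """
[options]
spark_server = 0.0.0.0
spark_port = 44400
"""


def listen_address(conf_text):
    """Return (server, port) from configuration text, applying defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    server = parser.get("options", "spark_server", fallback=DEFAULT_SERVER)
    port = parser.getint("options", "spark_port", fallback=DEFAULT_PORT)
    return server, port
```

When either option is absent, the sketch falls back to the documented
default for that option.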

OPTIONS
The available command line options are:

-0 precision, --precision-force=precision
Like -P but this option will override per-metric specifications.

-4 action, --names-change=action
Specify which action to take on receiving a metric names change
event during sampling. These events occur when a PMDA discovers
new metrics sometime after starting up, and informs running client
tools like pcp2spark. Valid values for action are update (refresh
metrics being sampled), ignore (do nothing - the default behaviour)
and abort (exit the program if such an event occurs).

-5, --ignore-unknown
Silently ignore any metric name that cannot be resolved. At least
one metric must be found for the tool to start.

-8 limit, --limit-filter=limit
Limit results to instances with values above/below limit. A positive
integer will include instances with values at or above the limit in
reporting. A negative integer will include instances with values at
or below the limit in reporting. A value of zero performs no limit
filtering. This option will not override possible per-metric
specifications. See also -J and -N.
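
The three cases of the limit filter can be sketched in Python as below.
This is an illustration only. In particular, comparing a negative limit
by its magnitude is an assumption, since the description above does not
say how the sign is applied; the function name and sample values are
hypothetical.

```python
def limit_filter(instances, limit):
    """Filter a {instance: value} mapping by limit.

    limit > 0: keep values at or above limit.
    limit < 0: keep values at or below abs(limit) (assumption).
    limit == 0: keep everything.
    """
    if limit > 0:
        return {name: v for name, v in instances.items() if v >= limit}
    if limit < 0:
        return {name: v for name, v in instances.items() if v <= abs(limit)}
    return dict(instances)


# Hypothetical per-instance values.
values = {"sda": 120, "sdb": 40, "sdc": 75}
at_or_above = limit_filter(values, 75)
at_or_below = limit_filter(values, -75)
```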

-9 limit, --limit-filter-force=limit
Like -8 but this option will override per-metric specifications.

-a archive, --archive=archive
Performance metric values are retrieved from the set of Performance
Co-Pilot (PCP) archive log files identified by the archive argument,
which is a comma-separated list of names, each of which may be the
base name of an archive or the name of a directory containing one or
more archives.

-A align, --align=align
Force the initial sample to be aligned on the boundary of a natural
time unit align. Refer to PCPIntro(1) for a complete description of
the syntax for align.

--archive-folio=folio
Read metric source archives from the PCP archive folio created by
tools like pmchart(1) or, less often, manually with mkaf(1).

-b scale, --space-scale=scale
Unit/scale for space (byte) metrics; possible values include
bytes, Kbytes, KB, Mbytes, MB, and so forth. This option will not
override possible per-metric specifications. See also
pmParseUnitsStr(3).

-B scale, --space-scale-force=scale
Like -b but this option will override per-metric specifications.

-c config, --config=config
Specify the config file or directory to use. If config is a
directory, all files in it ending in .conf will be included. The
default is the first found of: ./pcp2spark.conf,
$HOME/.pcp2spark.conf, $HOME/pcp/pcp2spark.conf, and
$PCP_SYSCONF_DIR/pcp2spark.conf. For details, see the configuration
file section above and pmrep.conf(5).
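
The "first found" rule for the default configuration file can be
sketched as follows. This is a minimal illustration in Python; the
function names are hypothetical, and the existence check is passed in as
a parameter so that the search order can be examined without touching
the file system.

```python
import os


def default_config_candidates(environ=os.environ):
    """List the default locations in the documented search order."""
    home = environ.get("HOME", "")
    sysconf = environ.get("PCP_SYSCONF_DIR", "")
    return [
        "./pcp2spark.conf",
        f"{home}/.pcp2spark.conf",
        f"{home}/pcp/pcp2spark.conf",
        f"{sysconf}/pcp2spark.conf",
    ]


def find_config(candidates, exists=os.path.exists):
    """Return the first candidate that exists, or None if none do."""
    for path in candidates:
        if exists(path):
            return path
    return None
```

With the real os.path.exists, find_config(default_config_candidates())
returns the first of the four locations present on the local system.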

--container=container
Fetch performance metrics from the specified container, either local
or remote (see -h).

-C, --check
Exit before reporting any values, but after parsing the configuration
and metrics and printing possible headers.

--daemonize
Daemonize on startup.

-e derived, --derived=derived
Specify derived performance metrics. If derived starts with a slash
("/") or with a dot (".") it will be interpreted as a PCP derived
metrics configuration file, otherwise it will be interpreted as
comma- or semicolon-separated derived metric expressions. For a
complete description of derived metrics and PCP derived metrics
configuration files see pmLoadDerivedConfig(3) and
pmRegisterDerived(3). Alternatively, using pmrep.conf(5)
configuration syntax allows defining derived metrics as part of
metricsets.
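
The file-or-expressions rule for the derived argument can be sketched in
Python as below. This is a deliberately naive illustration: the function
name is hypothetical, and the simple split would mishandle an expression
that itself contains a comma or semicolon.

```python
import re


def classify_derived(derived):
    """Return ("file", path) or ("expressions", [expression, ...])."""
    # A leading slash or dot marks a derived metrics configuration file.
    if derived.startswith(("/", ".")):
        return ("file", derived)
    # Otherwise treat the argument as comma- or semicolon-separated
    # expressions (naive split, see the caveat in the text above).
    expressions = [e.strip() for e in re.split(r"[,;]", derived) if e.strip()]
    return ("expressions", expressions)
```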

-g server, --spark-server=server
Local address on which pcp2spark listens for connections from an
Apache Spark worker thread. The default is 127.0.0.1.

-G, --no-globals
Do not include global metrics in reporting (see pmrep.conf(5)).

-h host, --host=host
Fetch performance metrics from pmcd(1) on host, rather than from
the default localhost.

-H, --no-header
Do not print any headers.

-i instances, --instances=instances
Retrieve and report only the specified metric instances. By default
all instances, present and future, are reported.

Refer to pmrep(1) for a complete description of this option.

-I, --ignore-incompat
Ignore incompatible metrics. By default incompatible metrics (that
is, their type is unsupported or they cannot be scaled as requested)
will cause pcp2spark to terminate with an error message. With this
option all incompatible metrics are silently omitted from reporting.
This may be especially useful when requesting non-leaf nodes of the
PMNS tree for reporting.

-j, --live-filter
Perform instance live filtering. This allows capturing all named
instances even if processes are restarted at some point (unlike
without live filtering). Performing live filtering over a huge
number of instances will add some internal overhead so a bit of
user caution is advised. See also -n.

-J rank, --rank=rank
Limit results to highest/lowest ranked instances of set-valued
metrics. A positive integer will include highest valued instances
in reporting. A negative integer will include lowest valued
instances in reporting. A value of zero performs no ranking.
Ranking does not imply sorting (the sorting option -6 of pmrep(1) is
not listed on this page and so is not supported here). See also -8.
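
The difference between ranking and sorting can be seen in a short Python
sketch. It is an illustration only, with hypothetical names and values:
the highest or lowest valued instances are selected, but they are
returned in their original order.

```python
import heapq


def rank_instances(instances, rank):
    """Select the highest (rank > 0) or lowest (rank < 0) valued instances.

    instances is a list of (name, value) pairs with unique names. The
    selected pairs keep their original relative order, since ranking
    does not imply sorting. A rank of zero selects everything.
    """
    if rank == 0:
        return list(instances)
    pick = heapq.nlargest if rank > 0 else heapq.nsmallest
    chosen = pick(abs(rank), instances, key=lambda item: item[1])
    keep = {name for name, _ in chosen}
    return [item for item in instances if item[0] in keep]


# Hypothetical per-instance values.
data = [("sda", 10), ("sdb", 50), ("sdc", 30), ("sdd", 5)]
top_two = rank_instances(data, 2)
bottom_two = rank_instances(data, -2)
```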

-K spec, --spec-local=spec
When fetching metrics from a local context (see -L), the -K option
may be used to control the DSO PMDAs that should be made accessible.
The spec argument conforms to the syntax described in
pmSpecLocalPMDA(3). More than one -K option may be used.

-L, --local-PMDA
Use a local context to collect metrics from DSO PMDAs on the local
host without PMCD. See also -K.

-m, --include-labels
Include PCP metric labels in the output.

-n, --invert-filter
Perform ranking before live filtering. By default instance live
filtering (when requested, see -j) happens before instance ranking
(when requested, see -J). With this option the logic is inverted
and ranking happens before live filtering.

-N predicate, --predicate=predicate
Specify a comma-separated list of predicate filter reference
metrics. By default ranking (see -J) happens for each metric
individually. With predicates, ranking is done only for the
specified predicate metrics. When reporting, the rest of the metrics
sharing the same instance domain (see PCPIntro(1)) as the predicate
will include only the highest/lowest ranking instances of the
corresponding predicate. Ranking does not imply sorting; the -6
sorting option of pmrep(1) is not listed on this page.

For example, using proc.memory.rss (resident memory size of
process) as the predicate metric together with proc.io.total_bytes
and mem.util.used as metrics to be reported, only the processes
using most/least (as per -J) memory will be included when reporting
total bytes written by processes. Since mem.util.used is a
single-valued metric (thus not sharing the same instance domain as
the process related metrics), it will be reported as usual.
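
The predicate example above can be mirrored in a small Python sketch.
It is an illustration under simplifying assumptions: metric values are
plain dictionaries keyed by instance name, the instance names and
numbers are invented, and the function name is hypothetical.

```python
def predicate_filter(predicate_values, other_metrics, rank):
    """Restrict metrics to the instances selected by ranking a predicate.

    predicate_values: {instance: value} for the predicate metric.
    other_metrics: {metric: {instance: value}} sharing its instance domain.
    rank: positive for highest valued instances, negative for lowest,
          zero for no ranking.
    """
    if rank == 0:
        return {metric: dict(values) for metric, values in other_metrics.items()}
    ordered = sorted(predicate_values, key=predicate_values.get, reverse=rank > 0)
    keep = set(ordered[:abs(rank)])
    return {
        metric: {inst: val for inst, val in values.items() if inst in keep}
        for metric, values in other_metrics.items()
    }


# Invented sample data: resident memory as the predicate metric.
rss = {"1001 bash": 3000, "1002 firefox": 900000, "1003 postgres": 400000}
io = {"proc.io.total_bytes": {"1001 bash": 10, "1002 firefox": 500, "1003 postgres": 7000}}
largest_two = predicate_filter(rss, io, 2)
```

A single-valued metric such as mem.util.used would simply not be passed
through this filter, as it does not share the instance domain.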

-O origin, --origin=origin
When reporting archived metrics, start reporting at origin within
the time window (see -S and -T). Refer to PCPIntro(1) for a complete
description of the syntax for origin.

-p port, --spark-port=port
Local port on which pcp2spark listens for connections. The default
is 44325.

-P precision, --precision=precision
Use precision for numeric non-integer output values. The default
is to use 3 decimal places (when applicable). This option will
not override possible per-metric specifications.

-q scale, --count-scale=scale
Unit/scale for count metrics; possible values include count x
10^-1, count, count x 10, count x 10^2, and so forth from 10^-8 to
10^7. (These values are currently space-sensitive.) This option
will not override possible per-metric specifications. See also
pmParseUnitsStr(3).

-Q scale, --count-scale-force=scale
Like -q but this option will override per-metric specifications.

-r, --raw
Output raw metric values; do not convert cumulative counters to
rates. This option will override possible per-metric
specifications.
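
For readers unfamiliar with rate conversion, the Python sketch below
shows the usual arithmetic for turning two samples of a cumulative
counter into a per-second rate, which is the step that raw output skips.
It is an illustration only: the function name is hypothetical, and
counter wrap-around and unit scaling are ignored.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Per-second rate of a cumulative counter between two samples."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_value - prev_value) / elapsed


# A counter that moved from 1000 to 1600 over two seconds.
rate = counter_rate(1000, 0.0, 1600, 2.0)
```

With raw output, the two stored values (1000 and 1600 in this sketch)
would be reported as they are instead of the computed rate.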

-R, --raw-prefer
Like -r but this option will not override per-metric specifications.

-s samples, --samples=samples
The samples argument defines the number of samples to be retrieved
and reported. If samples is 0 or -s is not specified, pcp2spark
will sample and report continuously (in real time mode) or until
the end of the set of PCP archives (in archive mode). See also
-T.

-S starttime, --start=starttime
When reporting archived metrics, the report will be restricted to
those records logged at or after starttime. Refer to PCPIntro(1)
for a complete description of the syntax for starttime.

-t interval, --interval=interval
Set the reporting interval to something other than the default 1
second. The interval argument follows the syntax described in
PCPIntro(1), and in the simplest form may be an unsigned integer
(the implied units in this case are seconds). See also the -T
option.

-T endtime, --finish=endtime
When reporting archived metrics, the report will be restricted to
those records logged before or at endtime. Refer to PCPIntro(1)
for a complete description of the syntax for endtime.

When used to define the runtime before pcp2spark will exit, if no
samples is given (see -s) then the number of reported samples
depends on interval (see -t). If samples is given then interval
will be adjusted to allow reporting of samples during runtime. In
case all of -T, -s, and -t are given, endtime determines the actual
time pcp2spark will run.

-v, --omit-flat
Report only set-valued metrics with instances (e.g. disk.dev.read)
and omit single-valued "flat" metrics without instances (e.g.
kernel.all.sysfork). See -i and -I.

-V, --version
Display version number and exit.

-y scale, --time-scale=scale
Unit/scale for time metrics; possible values include nanosec, ns,
microsec, us, millisec, ms, and so forth up to hour, hr. This
option will not override possible per-metric specifications. See
also pmParseUnitsStr(3).

-Y scale, --time-scale-force=scale
Like -y but this option will override per-metric specifications.

-?, --help
Display usage message and exit.

FILES
pcp2spark.conf
pcp2spark configuration file (see -c)

$PCP_SYSCONF_DIR/pmrep/*.conf
system provided default pmrep configuration files

PCP ENVIRONMENT
Environment variables with the prefix PCP_ are used to parameterize the
file and directory names used by PCP. On each installation, the file
/etc/pcp.conf contains the local values for these variables. The
$PCP_CONF variable may be used to specify an alternative configuration
file, as described in pcp.conf(5).

For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO
PCPIntro(1), mkaf(1), pcp(1), pcp2elasticsearch(1), pcp2graphite(1),
pcp2influxdb(1), pcp2json(1), pcp2xlsx(1), pcp2xml(1), pcp2zabbix(1),
pmcd(1), pminfo(1), pmrep(1), pmGetOptions(3), pmLoadDerivedConfig(3),
pmParseUnitsStr(3), pmRegisterDerived(3), pmSpecLocalPMDA(3),
LOGARCHIVE(5), pcp.conf(5), pmrep.conf(5) and PMNS(5).

Performance Co-Pilot PCP PCP2SPARK(1)