PCP2SPARK(1)              General Commands Manual             PCP2SPARK(1)


NAME
       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS
       pcp2spark [-5CGHIjLmnrRvV?] [-4 action] [-8|-9 limit] [-a archive] [-A
       align] [--archive-folio folio] [-b|-B space-scale] [-c config] [--con‐
       tainer container] [--daemonize] [-e derived] [-g server] [-h host] [-i
       instances] [-J rank] [-K spec] [-N predicate] [-O origin] [-p port]
       [-P|-0 precision] [-q|-Q count-scale] [-s samples] [-S starttime] [-t
       interval] [-T endtime] [-y|-Y time-scale] metricspec [...]

DESCRIPTION
       pcp2spark is a customizable performance metrics exporter tool from PCP
       to Apache Spark. Any available performance metric, live or archived,
       system and/or application, can be selected for exporting using either
       command line arguments or a configuration file.

       pcp2spark acts as a bridge: it provides a network socket stream on a
       given address/port to which an Apache Spark worker task can connect
       and from which it can pull the configured PCP metrics, using the
       streaming extensions of the Apache Spark API.

       pcp2spark is a close relative of pmrep(1). Refer to pmrep(1) for the
       metricspec description accepted on the pcp2spark command line, and to
       pmrep.conf(5) for a description of the overall syntax of the
       pcp2spark.conf configuration file. This page describes the
       pcp2spark-specific options and the configuration file differences from
       pmrep.conf(5). pmrep(1) also lists some usage examples, most of which
       apply to pcp2spark as well.
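
       For example, the following (illustrative) invocation exports two
       kernel metrics every five seconds, listening on the default address
       and port until an Apache Spark worker connects:

            $ pcp2spark -t 5 kernel.all.load kernel.all.sysfork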

       Only the command line options listed on this page are supported;
       other options recognized by pmrep(1) are not supported.

       Options via environment values (see pmGetOptions(3)) override the cor‐
       responding built-in default values (if any). Configuration file op‐
       tions override the corresponding environment variables (if any). Com‐
       mand line options override the corresponding configuration file options
       (if any).
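
       As an illustration of this precedence (the values are hypothetical),
       a configuration file containing

            [options]
            spark_port = 44330

       combined with -p 44340 on the command line results in pcp2spark
       listening on port 44340.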

GENERAL SETUP
       A general setup for using pcp2spark involves configuring pcp2spark
       with the PCP metrics to export and then starting the pcp2spark
       application. pcp2spark then waits and listens on the given
       address/port for a connection from an Apache Spark worker thread.
       Once started, the worker thread connects to pcp2spark.

       When an Apache Spark worker thread has connected, pcp2spark begins
       streaming PCP metric data to Apache Spark until the worker thread
       completes or the connection is interrupted. If the connection is
       interrupted or the socket is closed by the Apache Spark worker
       thread, pcp2spark will exit.

       For an example Apache Spark worker job which connects to a pcp2spark
       instance on a given address/port and pulls in PCP metric data, see
       the example provided in the PCP examples directory for pcp2spark
       (often provided by the PCP development package) or the online version
       at https://github.com/performancecopilot/pcp/blob/main/src/pcp2spark/.
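
       The following is a minimal sketch of such a worker job, written
       against the PySpark DStream API; it is not the bundled example, and
       it simply assumes pcp2spark is already running with the default
       address and port:

            #!/usr/bin/env python3
            # Minimal sketch: read the text stream published by a running
            # pcp2spark instance and print a sample of each batch.
            from pyspark import SparkContext
            from pyspark.streaming import StreamingContext

            sc = SparkContext(appName="pcp2spark-sketch")
            ssc = StreamingContext(sc, 5)        # 5 second batches

            # pcp2spark serves a plain text stream on its address/port;
            # each batch becomes an RDD of the lines received.
            lines = ssc.socketTextStream("127.0.0.1", 44325)
            lines.pprint()

            ssc.start()
            ssc.awaitTermination()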

CONFIGURATION FILE
       pcp2spark uses a configuration file with overall syntax described in
       pmrep.conf(5). The following options are common with pmrep.conf: ver‐
       sion, source, speclocal, derived, header, globals, samples, interval,
       type, type_prefer, ignore_incompat, names_change, instances, live_fil‐
       ter, rank, limit_filter, limit_filter_force, invert_filter, predicate,
       omit_flat, include_labels, precision, precision_force, count_scale,
       count_scale_force, space_scale, space_scale_force, time_scale,
       time_scale_force. The output option is recognized but ignored for pm‐
       rep.conf compatibility.

   pcp2spark specific options
       spark_server (string)
            Specify the address on which pcp2spark will listen for connections
            from an Apache Spark worker thread. Corresponding command line op‐
            tion is -g. Default is 127.0.0.1.

       spark_port (integer)
            Specify the port to run pcp2spark on. Corresponding command line
            option is -p. Default is 44325.
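
       An example pcp2spark.conf fragment combining these options with some
       of the common pmrep.conf(5) options listed above (the values are
       illustrative only):

            [options]
            spark_server = 0.0.0.0
            spark_port = 44330
            interval = 5s
            samples = 300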

OPTIONS
       The available command line options are:

       -0 precision, --precision-force=precision
            Like -P but this option will override per-metric specifications.

       -4 action, --names-change=action
            Specify which action to take on receiving a metric names change
            event during sampling. These events occur when a PMDA discovers
            new metrics sometime after starting up, and informs running client
            tools like pcp2spark. Valid values for action are update (refresh
            metrics being sampled), ignore (do nothing - the default behav‐
            iour) and abort (exit the program if such an event happens).

       -5, --ignore-unknown
            Silently ignore any metric name that cannot be resolved. At least
            one metric must be found for the tool to start.

       -8 limit, --limit-filter=limit
            Limit results to instances with values above/below limit. A posi‐
            tive integer will include instances with values at or above the
            limit in reporting. A negative integer will include instances
            with values at or below the limit in reporting. A value of zero
            performs no limit filtering. This option will not override possi‐
            ble per-metric specifications. See also -J and -N.

       -9 limit, --limit-filter-force=limit
            Like -8 but this option will override per-metric specifications.

       -a archive, --archive=archive
            Performance metric values are retrieved from the set of Perfor‐
            mance Co-Pilot (PCP) archive log files identified by the archive
            argument, which is a comma-separated list of names, each of which
            may be the base name of an archive or the name of a directory con‐
            taining one or more archives.

       -A align, --align=align
            Force the initial sample to be aligned on the boundary of a natu‐
            ral time unit align. Refer to PCPIntro(1) for a complete descrip‐
            tion of the syntax for align.

       --archive-folio=folio
            Read metric source archives from the PCP archive folio created by
            tools like pmchart(1) or, less often, manually with mkaf(1).

       -b scale, --space-scale=scale
            Unit/scale for space (byte) metrics, possible values include
            bytes, Kbytes, KB, Mbytes, MB, and so forth. This option will not
            override possible per-metric specifications. See also pmParseU‐
            nitsStr(3).

       -B scale, --space-scale-force=scale
            Like -b but this option will override per-metric specifications.

       -c config, --config=config
            Specify the config file or directory to use. If config is a
            directory, all files in it ending in .conf will be included. The
            default is the first found of: ./pcp2spark.conf,
            $HOME/.pcp2spark.conf, $HOME/pcp/pcp2spark.conf, and
            $PCP_SYSCONF_DIR/pcp2spark.conf. For details, see the
            CONFIGURATION FILE section above and pmrep.conf(5).

       --container=container
            Fetch performance metrics from the specified container, either lo‐
            cal or remote (see -h).

       -C, --check
            Exit before reporting any values, but after parsing the configura‐
            tion and metrics and printing possible headers.

       --daemonize
            Daemonize on startup.

       -e derived, --derived=derived
            Specify derived performance metrics. If derived starts with a
            slash (``/'') or with a dot (``.'') it will be interpreted as a
            derived metrics configuration file, otherwise it will be inter‐
            preted as comma- or semicolon-separated derived metric expres‐
            sions. For details see pmLoadDerivedConfig(3) and pmRegister‐
            Derived(3).
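
            For example, a derived metric could be defined directly on the
            command line (the new metric name and the referenced metrics are
            illustrative):

                 -e "mem.util.bufcache = mem.util.bufmem + mem.util.cached"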

       -g server, --spark-server=server
            Address on which pcp2spark will listen for Apache Spark worker
            connections (see spark_server above). Default is 127.0.0.1.
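
            For example, to accept connections from Apache Spark workers
            running on other hosts rather than only on the local host, a
            sketch such as the following could be used:

                 $ pcp2spark -g 0.0.0.0 kernel.all.load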

       -G, --no-globals
            Do not include global metrics in reporting (see pmrep.conf(5)).

       -h host, --host=host
            Fetch performance metrics from pmcd(1) on host, rather than from
            the default localhost.

       -H, --no-header
            Do not print any headers.

       -i instances, --instances=instances
            Retrieve and report only the specified metric instances. By de‐
            fault all instances, present and future, are reported.

            Refer to pmrep(1) for a complete description of this option.

       -I, --ignore-incompat
            Ignore incompatible metrics. By default incompatible metrics
            (that is, their type is unsupported or they cannot be scaled as
            requested) will cause pcp2spark to terminate with an error mes‐
            sage. With this option all incompatible metrics are silently
            omitted from reporting. This may be especially useful when re‐
            questing non-leaf nodes of the PMNS tree for reporting.

       -j, --live-filter
            Perform instance live filtering. This allows capturing all named
            instances even if processes are restarted at some point (unlike
            without live filtering). Performing live filtering over a huge
            number of instances will add some internal overhead so a bit of
            user caution is advised. See also -n.

       -J rank, --rank=rank
            Limit results to highest/lowest ranked instances of set-valued
            metrics. A positive integer will include highest valued instances
            in reporting. A negative integer will include lowest valued in‐
            stances in reporting. A value of zero performs no ranking. Rank‐
            ing does not imply sorting, see -6. See also -8.

       -K spec, --spec-local=spec
            When fetching metrics from a local context (see -L), the -K option
            may be used to control the DSO PMDAs that should be made accessi‐
            ble. The spec argument conforms to the syntax described in pm‐
            SpecLocalPMDA(3). More than one -K option may be used.

       -L, --local-PMDA
            Use a local context to collect metrics from DSO PMDAs on the local
            host without PMCD. See also -K.

       -m, --include-labels
            Include metric labels in the output.

       -n, --invert-filter
            Perform ranking before live filtering. By default instance live
            filtering (when requested, see -j) happens before instance ranking
            (when requested, see -J). With this option the logic is inverted
            and ranking happens before live filtering.

       -N predicate, --predicate=predicate
            Specify a comma-separated list of predicate filter reference met‐
            rics. By default ranking (see -J) happens for each metric indi‐
            vidually. With predicates, ranking is done only for the specified
            predicate metrics. When reporting, the rest of the metrics sharing
            the same instance domain (see PCPIntro(1)) as the predicate will
            include only the highest/lowest ranking instances of the corre‐
            sponding predicate. Ranking does not imply sorting, see -6.

            So for example, using proc.memory.rss (resident memory size of
            process) as the predicate metric together with proc.io.total_bytes
            and mem.util.used as metrics to be reported, only the processes
            using most/least (as per -J) memory will be included when report‐
            ing total bytes written by processes. Since mem.util.used is a
            single-valued metric (thus not sharing the same instance domain as
            the process-related metrics), it will be reported as usual.
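
            Continuing the example above, the corresponding (illustrative)
            command line could look like:

                 $ pcp2spark -J 3 -N proc.memory.rss \
                       proc.memory.rss proc.io.total_bytes mem.util.used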

       -O origin, --origin=origin
            When reporting archived metrics, start reporting at origin within
            the time window (see -S and -T). Refer to PCPIntro(1) for a com‐
            plete description of the syntax for origin.

       -p port, --spark-port=port
            Port on which pcp2spark will listen for Apache Spark worker
            connections (see spark_port above). Default is 44325.

       -P precision, --precision=precision
            Use precision for numeric non-integer output values. The default
            is to use 3 decimal places (when applicable). This option will
            not override possible per-metric specifications.

       -q scale, --count-scale=scale
            Unit/scale for count metrics, possible values include count x
            10^-1, count, count x 10, count x 10^2, and so forth from 10^-8 to
            10^7. (These values are currently space-sensitive.) This option
            will not override possible per-metric specifications. See also
            pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
            Like -q but this option will override per-metric specifications.

       -r, --raw
            Output raw metric values, do not convert cumulative counters to
            rates. This option will override possible per-metric specifica‐
            tions.

       -R, --raw-prefer
            Like -r but this option will not override per-metric specifica‐
            tions.

       -s samples, --samples=samples
            The samples argument defines the number of samples to be retrieved
            and reported. If samples is 0 or -s is not specified, pcp2spark
            will sample and report continuously (in real time mode) or until
            the end of the set of PCP archives (in archive mode). See also
            -T.

       -S starttime, --start=starttime
            When reporting archived metrics, the report will be restricted to
            those records logged at or after starttime. Refer to PCPIntro(1)
            for a complete description of the syntax for starttime.

       -t interval, --interval=interval
            Set the reporting interval to something other than the default 1
            second. The interval argument follows the syntax described in
            PCPIntro(1), and in the simplest form may be an unsigned integer
            (the implied units in this case are seconds). See also the -T op‐
            tion.

       -T endtime, --finish=endtime
            When reporting archived metrics, the report will be restricted to
            those records logged before or at endtime. Refer to PCPIntro(1)
            for a complete description of the syntax for endtime.

            When used to define the runtime before pcp2spark will exit, if no
            samples is given (see -s), the number of reported samples depends
            on interval (see -t). If samples is given, interval will be
            adjusted to allow reporting of samples during the runtime. If all
            of -T, -s, and -t are given, endtime determines the actual time
            pcp2spark will run.
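
            For example, the following (illustrative) invocation replays a
            hypothetical archive, starting ten minutes into it and covering
            the following hour:

                 $ pcp2spark -a ./20240101 -S 10min -T 1hour kernel.all.load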

       -v, --omit-flat
            Report only set-valued metrics with instances (e.g. disk.dev.read)
            and omit single-valued ``flat'' metrics without instances (e.g.
            kernel.all.sysfork). See -i and -I.

       -V, --version
            Display version number and exit.

       -y scale, --time-scale=scale
            Unit/scale for time metrics, possible values include nanosec, ns,
            microsec, us, millisec, ms, and so forth up to hour, hr. This op‐
            tion will not override possible per-metric specifications. See
            also pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
            Like -y but this option will override per-metric specifications.

       -?, --help
            Display usage message and exit.

FILES
       pcp2spark.conf
            pcp2spark configuration file (see -c)

PCP ENVIRONMENT
       Environment variables with the prefix PCP_ are used to parameterize the
       file and directory names used by PCP. On each installation, the file
       /etc/pcp.conf contains the local values for these variables. The
       $PCP_CONF variable may be used to specify an alternative configuration
       file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO
       mkaf(1), PCPIntro(1), pcp(1), pcp2elasticsearch(1), pcp2graphite(1),
       pcp2influxdb(1), pcp2json(1), pcp2xlsx(1), pcp2xml(1), pcp2zabbix(1),
       pmcd(1), pminfo(1), pmrep(1), pmGetOptions(3), pmSpecLocalPMDA(3), pm‐
       LoadDerivedConfig(3), pmParseUnitsStr(3), pmRegisterDerived(3), LOGA‐
       RCHIVE(5), pcp.conf(5), PMNS(5) and pmrep.conf(5).



Performance Co-Pilot                    PCP                      PCP2SPARK(1)