PCP2SPARK(1)               General Commands Manual              PCP2SPARK(1)

NAME
       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS
       pcp2spark [-5CGHIjLmnrRvV?] [-4 action] [-8|-9 limit] [-a archive]
       [-A align] [--archive-folio folio] [-b|-B space-scale] [-c config]
       [--container container] [--daemonize] [-e derived] [-g server]
       [-h host] [-i instances] [-J rank] [-K spec] [-N predicate]
       [-O origin] [-p port] [-P|-0 precision] [-q|-Q count-scale]
       [-s samples] [-S starttime] [-t interval] [-T endtime]
       [-y|-Y time-scale] metricspec [...]

DESCRIPTION
       pcp2spark is a customizable performance metrics exporter tool from
       PCP to Apache Spark.  Any available performance metric, live or
       archived, system and/or application, can be selected for exporting
       using either command line arguments or a configuration file.

       pcp2spark acts as a bridge: it provides a network socket stream on a
       given address/port to which an Apache Spark worker task can connect
       and pull the configured PCP metrics from pcp2spark, using the
       streaming extensions of the Apache Spark API.

       pcp2spark is a close relative of pmrep(1).  Please refer to pmrep(1)
       for the metricspec description accepted on the pcp2spark command
       line, and to pmrep.conf(5) for the overall syntax of the
       pcp2spark.conf configuration file.  This page describes the
       pcp2spark specific options and the configuration file differences
       from pmrep.conf(5).  pmrep(1) also lists some usage examples, most
       of which are applicable with pcp2spark as well.

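       For instance, a hypothetical invocation exporting two metrics from
       the local host every five seconds (metric names and interval chosen
       purely for illustration) could look like:

              $ pcp2spark -t 5 mem.util.used kernel.all.load
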
       Only the command line options listed on this page are supported;
       other options recognized by pmrep(1) are not supported.

       Options via environment values (see pmGetOptions(3)) override the
       corresponding built-in default values (if any).  Configuration file
       options override the corresponding environment variables (if any).
       Command line options override the corresponding configuration file
       options (if any).

GENERAL SETUP
       A general setup for pcp2spark involves configuring the PCP metrics
       to export and then starting the pcp2spark application.  pcp2spark
       then listens on the given address/port and waits for an Apache Spark
       worker thread to connect to it.

       Once an Apache Spark worker thread has connected, pcp2spark begins
       streaming PCP metric data to Apache Spark until the worker thread
       completes or the connection is interrupted.  If the connection is
       interrupted or the socket is closed by the Apache Spark worker
       thread, pcp2spark will exit.

       For an example Apache Spark worker job which connects to a pcp2spark
       instance on a given address/port and pulls in PCP metric data,
       please see the example provided in the PCP examples directory for
       pcp2spark (often provided by the PCP development package) or the
       online version at
       https://github.com/performancecopilot/pcp/blob/master/src/pcp2spark/.

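       The essential structure of such a worker job is a Spark Streaming
       client reading lines from the pcp2spark socket.  The following is a
       minimal illustrative PySpark sketch, not the shipped example; the
       application name, batch interval and processing step are arbitrary,
       and the address/port are the pcp2spark defaults:

              # Minimal sketch: read the pcp2spark socket stream with
              # Spark Streaming (assumes pyspark is installed and pcp2spark
              # is listening on its defaults, 127.0.0.1:44325).
              from pyspark import SparkContext
              from pyspark.streaming import StreamingContext

              sc = SparkContext(appName="pcp2spark-example")
              ssc = StreamingContext(sc, 5)   # 5 second batch interval

              # Each received line carries one sample of the configured
              # PCP metrics as emitted by pcp2spark.
              lines = ssc.socketTextStream("127.0.0.1", 44325)
              lines.pprint()                  # replace with real processing

              ssc.start()
              ssc.awaitTermination()
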
CONFIGURATION FILE
       pcp2spark uses a configuration file with the overall syntax
       described in pmrep.conf(5).  The following options are common with
       pmrep.conf: version, source, speclocal, derived, header, globals,
       samples, interval, type, type_prefer, ignore_incompat, names_change,
       instances, live_filter, rank, limit_filter, limit_filter_force,
       invert_filter, predicate, omit_flat, include_labels, precision,
       precision_force, count_scale, count_scale_force, space_scale,
       space_scale_force, time_scale, time_scale_force.  The output option
       is recognized but ignored for pmrep.conf compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify the address on which pcp2spark will listen for
           connections from an Apache Spark worker thread.  Corresponding
           command line option is -g.  Default is 127.0.0.1.

       spark_port (integer)
           Specify the port to run pcp2spark on.  Corresponding command
           line option is -p.  Default is 44325.

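       As an illustration, a minimal pcp2spark.conf using these two options
       (shown here with their default values, and assuming the [options]
       section syntax of pmrep.conf(5)) could contain:

              [options]
              spark_server = 127.0.0.1
              spark_port = 44325
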
OPTIONS
       The available command line options are:

       -0 precision, --precision-force=precision
              Like -P but this option will override per-metric
              specifications.

       -4 action, --names-change=action
              Specify which action to take on receiving a metric names
              change event during sampling.  These events occur when a PMDA
              discovers new metrics sometime after starting up, and informs
              running client tools like pcp2spark.  Valid values for action
              are update (refresh metrics being sampled), ignore (do
              nothing - the default behaviour) and abort (exit the program
              if such an event happens).

       -5, --ignore-unknown
              Silently ignore any metric name that cannot be resolved.  At
              least one metric must be found for the tool to start.

       -8 limit, --limit-filter=limit
              Limit results to instances with values above/below limit.  A
              positive integer will include instances with values at or
              above the limit in reporting.  A negative integer will
              include instances with values at or below the limit in
              reporting.  A value of zero performs no limit filtering.
              This option will not override possible per-metric
              specifications.  See also -J and -N.

       -9 limit, --limit-filter-force=limit
              Like -8 but this option will override per-metric
              specifications.

       -a archive, --archive=archive
              Performance metric values are retrieved from the set of
              Performance Co-Pilot (PCP) archive log files identified by
              the archive argument, which is a comma-separated list of
              names, each of which may be the base name of an archive or
              the name of a directory containing one or more archives.

       -A align, --align=align
              Force the initial sample to be aligned on the boundary of a
              natural time unit align.  Refer to PCPIntro(1) for a complete
              description of the syntax for align.

       --archive-folio=folio
              Read metric source archives from the PCP archive folio
              created by tools like pmchart(1) or, less often, manually
              with mkaf(1).

       -b scale, --space-scale=scale
              Unit/scale for space (byte) metrics, possible values include
              bytes, Kbytes, KB, Mbytes, MB, and so forth.  This option
              will not override possible per-metric specifications.  See
              also pmParseUnitsStr(3).

       -B scale, --space-scale-force=scale
              Like -b but this option will override per-metric
              specifications.

       -c config, --config=config
              Specify the config file to use.  The default is the first
              found of: ./pcp2spark.conf, $HOME/.pcp2spark.conf,
              $HOME/pcp/pcp2spark.conf, and $PCP_SYSCONF_DIR/pcp2spark.conf.
              For details, see the above section and pmrep.conf(5).

       --container=container
              Fetch performance metrics from the specified container,
              either local or remote (see -h).

       -C, --check
              Exit before reporting any values, but after parsing the
              configuration and metrics and printing possible headers.

       --daemonize
              Daemonize on startup.

       -e derived, --derived=derived
              Specify derived performance metrics.  If derived starts with
              a slash (``/'') or with a dot (``.'') it will be interpreted
              as a derived metrics configuration file, otherwise it will be
              interpreted as comma- or semicolon-separated derived metric
              expressions.  For details see pmLoadDerivedConfig(3) and
              pmRegisterDerived(3).

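              For example, a single derived metric could be given directly
              on the command line as an expression of the form ``name =
              expression'' (metric names chosen for illustration; see
              pmRegisterDerived(3) for the full expression syntax):

                     -e "mem.util.allcache = mem.util.bufmem + mem.util.cached"
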
       -g server, --spark-server=server
              Spark server to send the metrics to.

       -G, --no-globals
              Do not include global metrics in reporting (see
              pmrep.conf(5)).

       -h host, --host=host
              Fetch performance metrics from pmcd(1) on host, rather than
              from the default localhost.

       -H, --no-header
              Do not print any headers.

       -i instances, --instances=instances
              Report only the listed instances from current instances (if
              present, see also -j).  By default all instances, present and
              future, are reported.  This is a global option that is used
              for all metrics unless a metric-specific instance definition
              is provided as part of a metricspec.  By default single-
              valued ``flat'' metrics without multiple instances are still
              reported as usual, use -v to change this.  Please refer to
              pmrep(1) for more details on this option.

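              For example, limiting a per-disk metric to two instances
              could look like (instance and metric names chosen for
              illustration):

                     -i "sda,sdb" disk.dev.read
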
       -I, --ignore-incompat
              Ignore incompatible metrics.  By default incompatible metrics
              (that is, their type is unsupported or they cannot be scaled
              as requested) will cause pcp2spark to terminate with an error
              message.  With this option all incompatible metrics are
              silently omitted from reporting.  This may be especially
              useful when requesting non-leaf nodes of the PMNS tree for
              reporting.

       -j, --live-filter
              Perform instance live filtering.  This allows capturing all
              filtered instances even if processes are restarted at some
              point (unlike without live filtering).  Performing live
              filtering over a huge amount of instances will add some
              internal overhead so a bit of user caution is advised.  See
              also -n.

       -J rank, --rank=rank
              Limit results to highest/lowest ranked instances of
              set-valued metrics.  A positive integer will include highest
              valued instances in reporting.  A negative integer will
              include lowest valued instances in reporting.  A value of
              zero performs no ranking.  Ranking does not imply sorting.
              See also -8.

       -K spec, --spec-local=spec
              When fetching metrics from a local context (see -L), the -K
              option may be used to control the DSO PMDAs that should be
              made accessible.  The spec argument conforms to the syntax
              described in pmSpecLocalPMDA(3).  More than one -K option may
              be used.

       -L, --local-PMDA
              Use a local context to collect metrics from DSO PMDAs on the
              local host without PMCD.  See also -K.

       -m, --include-labels
              Include metric labels in the output.

       -n, --invert-filter
              Perform ranking before live filtering.  By default instance
              live filtering (when requested, see -j) happens before
              instance ranking (when requested, see -J).  With this option
              the logic is inverted and ranking happens before live
              filtering.

       -N predicate, --predicate=predicate
              Specify a comma-separated list of predicate filter reference
              metrics.  By default ranking (see -J) happens for each metric
              individually.  With predicates, ranking is done only for the
              specified predicate metrics.  When reporting, the rest of the
              metrics sharing the same instance domain (see PCPIntro(1)) as
              the predicate will include only the highest/lowest ranking
              instances of the corresponding predicate.  Ranking does not
              imply sorting.

              So for example, using proc.memory.rss (resident memory size
              of a process) as the predicate metric together with
              proc.io.total_bytes and mem.util.used as metrics to be
              reported, only the processes using the most/least (as per -J)
              memory will be included when reporting the total bytes
              written by processes.  Since mem.util.used is a single-valued
              metric (thus not sharing the same instance domain as the
              process-related metrics), it will be reported as usual.

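              Continuing that example, a hypothetical invocation reporting
              only the three processes with the largest resident memory
              (option values chosen purely for illustration) could be:

                     $ pcp2spark -J 3 -N proc.memory.rss \
                           proc.memory.rss proc.io.total_bytes mem.util.used
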
       -O origin, --origin=origin
              When reporting archived metrics, start reporting at origin
              within the time window (see -S and -T).  Refer to PCPIntro(1)
              for a complete description of the syntax for origin.

       -p port, --spark-port=port
              Spark server port.

       -P precision, --precision=precision
              Use precision for numeric non-integer output values.  The
              default is to use 3 decimal places (when applicable).  This
              option will not override possible per-metric specifications.

       -q scale, --count-scale=scale
              Unit/scale for count metrics, possible values include count x
              10^-1, count, count x 10, count x 10^2, and so forth from
              10^-8 to 10^7.  (These values are currently space-sensitive.)
              This option will not override possible per-metric
              specifications.  See also pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
              Like -q but this option will override per-metric
              specifications.

       -r, --raw
              Output raw metric values, do not convert cumulative counters
              to rates.  This option will override possible per-metric
              specifications.

       -R, --raw-prefer
              Like -r but this option will not override per-metric
              specifications.

       -s samples, --samples=samples
              The samples argument defines the number of samples to be
              retrieved and reported.  If samples is 0 or -s is not
              specified, pcp2spark will sample and report continuously (in
              real time mode) or until the end of the set of PCP archives
              (in archive mode).  See also -T.

       -S starttime, --start=starttime
              When reporting archived metrics, the report will be
              restricted to those records logged at or after starttime.
              Refer to PCPIntro(1) for a complete description of the syntax
              for starttime.

       -t interval, --interval=interval
              Set the reporting interval to something other than the
              default 1 second.  The interval argument follows the syntax
              described in PCPIntro(1), and in the simplest form may be an
              unsigned integer (the implied units in this case are
              seconds).  See also the -T option.

       -T endtime, --finish=endtime
              When reporting archived metrics, the report will be
              restricted to those records logged before or at endtime.
              Refer to PCPIntro(1) for a complete description of the syntax
              for endtime.

              When used to define the runtime before pcp2spark will exit,
              if no samples is given (see -s) then the number of reported
              samples depends on interval (see -t).  If samples is given
              then interval will be adjusted to allow reporting of samples
              during runtime.  In case all of -T, -s, and -t are given,
              endtime determines the actual time pcp2spark will run.

       -v, --omit-flat
              Omit single-valued ``flat'' metrics from reporting; only
              consider set-valued metrics (i.e., metrics with multiple
              values) for reporting.  See -i and -I.

       -V, --version
              Display version number and exit.

       -y scale, --time-scale=scale
              Unit/scale for time metrics, possible values include nanosec,
              ns, microsec, us, millisec, ms, and so forth up to hour, hr.
              This option will not override possible per-metric
              specifications.  See also pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
              Like -y but this option will override per-metric
              specifications.

       -?, --help
              Display usage message and exit.

FILES
       pcp2spark.conf
              pcp2spark configuration file (see -c)

PCP ENVIRONMENT
       Environment variables with the prefix PCP_ are used to parameterize
       the file and directory names used by PCP.  On each installation, the
       file /etc/pcp.conf contains the local values for these variables.
       The $PCP_CONF variable may be used to specify an alternative
       configuration file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO
       mkaf(1), PCPIntro(1), pcp(1), pcp2elasticsearch(1), pcp2graphite(1),
       pcp2influxdb(1), pcp2json(1), pcp2xlsx(1), pcp2xml(1), pcp2zabbix(1),
       pmcd(1), pminfo(1), pmrep(1), pmGetOptions(3), pmSpecLocalPMDA(3),
       pmLoadDerivedConfig(3), pmParseUnitsStr(3), pmRegisterDerived(3),
       LOGARCHIVE(5), pcp.conf(5), PMNS(5) and pmrep.conf(5).



Performance Co-Pilot                   PCP                      PCP2SPARK(1)