PMIE(1)                  General Commands Manual                 PMIE(1)

NAME
6 pmie - inference engine for performance metrics
7
SYNOPSIS
9 pmie [-bCdeFfPqvVWxXz?] [-a archive] [-A align] [-c filename] [-h
10 host] [-l logfile] [-m note] [-j stompfile] [-n pmnsfile] [-o format]
11 [-O offset] [-S starttime] [-t interval] [-T endtime] [-U username] [-Z
12 timezone] [filename ...]
13
DESCRIPTION
15 pmie accepts a collection of arithmetic, logical, and rule expressions
16 to be evaluated at specified frequencies. The base data for the ex‐
17 pressions consists of performance metrics values delivered in real-time
18 from any host running the Performance Metrics Collection Daemon (PMCD),
19 or using historical data from Performance Co-Pilot (PCP) archives.
20
As well as computing arithmetic and logical values, pmie can execute
actions (raise popup alarms, write system log messages, and launch
programs) in response to specified conditions. Such actions are useful
for detecting, monitoring and correcting performance-related problems.
25
26 The expressions to be evaluated are read from configuration files spec‐
27 ified by one or more filename arguments. In the absence of any file‐
28 name, expressions are read from standard input.
29
30 Output from pmie is directed to standard output and standard error as
31 follows:
32
33 stdout
34 Expression values printed in the verbose -v mode and the output of
35 print actions.
36
37 stderr
38 Error and warning messages for any syntactic or semantic problems
39 during expression parsing, and any semantic or performance metrics
40 availability problems during expression evaluation.
41
OPTIONS
43 The available command line options are:
44
45 -a archive, --archive=archive
46 archive which is a comma-separated list of names, each of which
47 may be the base name of an archive or the name of a directory con‐
48 taining one or more archives written by pmlogger(1). Multiple in‐
49 stances of the -a flag may appear on the command line to specify a
50 list of sets of archives. In this case, it is required that only
51 one set of archives be present for any one host. Also, any ex‐
52 plicit host names occurring in a pmie expression must match the
53 host name recorded in one of the archive labels. In the case of
54 multiple sets of archives, timestamps recorded in the archives are
55 used to ensure temporal consistency.
56
57 -A align, --align=align
58 Force the initial time window to be aligned on the boundary of a
59 natural time unit align. Refer to PCPIntro(1) for a complete de‐
60 scription of the syntax for align.
61
62 -b, --buffer
63 Output will be line buffered and standard output is attached to
64 standard error. This is most useful for background execution in
65 conjunction with the -l option. The -b option is always used for
66 pmie instances launched from pmie_check(1).
67
68 -c config, --config=config
69 An alternative to specifying filename at the end of the command
70 line.
71
72 -C, --check
73 Parse the configuration file(s) and exit before performing any
74 evaluations. Any errors in the configuration file are reported.
75
76 -d, --interact
77 Normally pmie would be launched as a non-interactive process to
78 monitor and manage the performance of one or more hosts. Given
79 the -d flag however, execution is interactive and the user is pre‐
80 sented with a menu of options. Interactive mode is useful mainly
81 for debugging new expressions.
82
83 -e, --timestamp
When used with -V, -v or -W, this option forces timestamps to be
reported with each expression. The timestamps are in ctime(3)
format, enclosed in parentheses, and appear after the expression
name and before the expression value, e.g.
88 expr_1 (Tue Feb 6 19:55:10 2001): 12
89
90 -f, --foreground
91 If the -l option is specified and there is no -a option (i.e.
92 real-time monitoring) then pmie is run as a daemon in the back‐
93 ground (in all other cases foreground is the default). The -f
94 (and -F, see below) options force pmie to be run in the fore‐
95 ground, independent of any other options.
96
97 -F, --systemd
Like -f, the -F option runs pmie in the foreground, but also does
some housekeeping (such as creating a pid file, changing user id and
notifying systemd(1) when pmie has started or is shutting down). This
is intended for use when pmie is launched from systemd(1) and the
daemonizing has already been done. The -f and -F options are
mutually exclusive.
104
105 -h host, --host=host
106 By default performance data is fetched from the local host (in
107 real-time mode) or the host for the first named set of archives on
108 the command line (in archive mode). The host argument overrides
109 this default. It does not override hosts explicitly named in the
110 expressions being evaluated. The host argument is interpreted as
111 a connection specification for pmNewContext, and is later mapped
112 to the remote pmcd's self-reported host name for reporting pur‐
113 poses. See also the %h vs. %c substitutions in rule action
114 strings below.
115
-j stompfile
An alternative STOMP protocol configuration is loaded from
stompfile. If this option is not used, and the stomp action is used
in any rule, the default location $PCP_SYSCONF_DIR/pmie/config/stomp
will be used.
121
122 -l logfile, --logfile=logfile
123 Standard error is sent to logfile.
124
125 -m note, --note=note
126 Used to indicate where pmie has been launched from, e.g.
127 pmie_check(1) and pmie_daily(1) use -m pmie_check and this is used
128 by pmie to determine if it needs to be restarted should the PMCD
129 hostname change, as described in the HOSTNAME CHANGES section be‐
130 low.
131
132 -n pmnsfile, --namespace=pmnsfile
133 An alternative Performance Metrics Name Space (PMNS) is loaded
134 from the file pmnsfile.
135
136 -o format, --format=format
When processing performance data from an archive, the -o option
may be used to specify an alternate output format when a rule
action is executed. See the DIFFERENCES IN HOST AND ARCHIVE MODES
section for a description of how the output format may be
constructed.
142
143 -O origin, --origin=origin
Specify the origin of the time window. See PCPIntro(1) for a
complete description of this option.
146
147 -P, --primary
148 Identifies this as the primary pmie instance for a host. See the
149 ``AUTOMATIC RESTART'' section below for further details.
150
151 -q, --quiet
152 Suppresses diagnostic messages that would be printed to standard
153 output by default, especially the "evaluator exiting" message as
154 this can confuse scripts.
155
156 -S starttime, --start=starttime
Specify the starttime of the time window. See PCPIntro(1) for a
complete description of this option.
159
160 -t interval, --interval=interval
161 The interval argument follows the syntax described in PCPIntro(1),
162 and in the simplest form may be an unsigned integer (the implied
163 units in this case are seconds). The value is used to determine
164 the sample interval for expressions that do not explicitly set
165 their sample interval using the pmie variable delta described be‐
166 low. The default is 10.0 seconds.
167
168 -T endtime, --finish=endtime
Specify the endtime of the time window. See PCPIntro(1) for a
complete description of this option.
171
172 -U username, --username=username
173 User account under which to run pmie. The default is the current
174 user account for interactive use. When run as a daemon, the un‐
175 privileged "pcp" account is used in current versions of PCP, but
176 in older versions the superuser account ("root") was used by de‐
177 fault.
178
-v  Unless one of the verbose options -V, -v or -W appears on the
command line, expressions are evaluated silently; the only output is
the result of any actions being executed. In the verbose mode,
specified using the -v flag, the value of each expression is
183 printed as it is evaluated. The values are in canonical units;
184 bytes in the dimension of ``space'', seconds in the dimension of
185 ``time'' and events in the dimension of ``count''. See pm‐
186 LookupDesc(3) for details of the supported dimension and scaling
187 mechanisms for performance metrics. The verbose mode is useful in
188 monitoring the value of given expressions, evaluating derived per‐
189 formance metrics, passing these values on to other tools for fur‐
190 ther processing and in debugging new expressions.
191
192 -V, --verbose
193 This option has the same effect as the -v option, except that the
194 name of the host and instance (if applicable) are printed as well
195 as expression values.
196
197 -W This option has the same effect as the -V option described above,
198 except that for boolean expressions, only those names and values
199 that make the expression true are printed. These are the same
200 names and values accessible to rule actions as the %h, %i, %c and
201 %v bindings, as described below.
202
203 -x, --secret-agent
204 Execute in domain agent mode. This mode is used within the Per‐
205 formance Co-Pilot product to derive values for summary metrics,
206 see pmdasummary(1). Only restricted functionality is available in
207 this mode (expressions with actions may not be used).
208
209 -X, --secret-applet
210 Run in secret applet mode (thin client).
211
212 -z, --hostzone
213 Change the reporting timezone to the timezone of the host that is
214 the source of the performance metrics, as identified via either
215 the -h option or the first named set of archives (as described
216 above for the -a option).
217
218 -Z timezone, --timezone=timezone
219 Change the reporting timezone to timezone in the format of the en‐
220 vironment variable TZ as described in environ(7).
221
222 -?, --help
223 Display usage message and exit.
224
EXAMPLES
226 The following example expressions demonstrate some of the capabilities
227 of the inference engine.
228
229 The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
230 examples of pmie expressions.
231
232 The variable delta controls expression evaluation frequency. Specify
233 that subsequent expressions be evaluated once a second, until further
234 notice:
235
236 delta = 1 sec;
237
238 If the total context switch rate exceeds 10000 per second per CPU, then
239 display an alarm notifier:
240
    kernel.all.pswitch / hinv.ncpu > 10000 count/sec
        -> alarm "high context switch rate %v";
243
244 If the high context switch rate is sustained for 10 consecutive sam‐
245 ples, then launch top(1) in an xterm(1) window to monitor processes,
246 but do this at most once every 5 minutes:
247
    all_sample (
        kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
    ) -> shell 5 min "xterm -e 'top'";
251
252 The following rules are evaluated once every 20 seconds:
253
254 delta = 20 sec;
255
256 If any disk is performing more than 60 I/Os per second, then print a
257 message identifying the busy disk to standard output and launch
258 dkvis(1):
259
    some_inst (
        disk.dev.total > 60 count/sec
    ) -> print "busy disks:" " %i" &
         shell 5 min "dkvis";
264
265 Refine the preceding rule to apply only between the hours of 9am and
266 5pm, and to require 3 of 4 consecutive samples to exceed the threshold
267 before executing the action:
268
    $hour >= 9 && $hour <= 17 &&
    some_inst (
        75 %_sample (
            disk.dev.total @0..3 > 60 count/sec
        )
    ) -> print "disks busy for 20 sec:" " [%h]%i";
275
276 The following two rules are evaluated once every 10 minutes:
277
278 delta = 10 min;
279
280 If either the / or the /usr filesystem is more than 95% full, display
281 an alarm popup, but not if it has already been displayed during the
282 last 4 hours:
283
    filesys.free #'/dev/root' /
        filesys.capacity #'/dev/root' < 0.05
    -> alarm 4 hour "root filesystem (almost) full";

    filesys.free #'/dev/usr' /
        filesys.capacity #'/dev/usr' < 0.05
    -> alarm 4 hour "/usr filesystem (almost) full";
291
292 The following rule requires a machine that supports the lmsensors met‐
293 rics. If the machine environment temperature rises more than 2 degrees
294 over a 10 minute interval, write an entry in the system log:
295
    lmsensors.coretemp_isa.temp1 @0 - lmsensors.coretemp_isa.temp1 @1 > 2
        -> alarm "temperature rising fast" &
           syslog "machine room temperature rise alarm";
299
300 And something interesting if you have performance problems with your
301 Oracle database:
302
    // back to 30sec evaluations
    delta = 30 sec;
    sid = "ptg1";    // $ORACLE_SID setting
    lid = "223";     // latch ID from v$latch
    lru = "#'$sid/$lid cache buffers lru chain'";
    host = ":moomba.melbourne.sgi.com";
    gets = "oracle.latch.gets $host $lru";
    total = "oracle.latch.gets $host $lru +
             oracle.latch.misses $host $lru +
             oracle.latch.immisses $host $lru";

    $total > 100 && $gets / $total < 0.2
        -> alarm "high lru latch contention in database $sid";
316
317 The following ruleset will emit exactly one message depending on the
318 availability and value of the 1-minute load average.
319
    delta = 1 minute;
    ruleset
        kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
            print "extreme load average %v"
    else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
            print "moderate load average %v"
    unknown ->
            print "load average unavailable"
    otherwise ->
            print "load average OK"
    ;
331
332 The following rule will emit a message when some filesystem is more
333 than 75% full and is filling at a rate that if sustained would fill the
334 filesystem to 100% in less than 30 minutes.
335
    some_inst (
        100 * filesys.used / filesys.capacity > 75 &&
        filesys.used + 30min * (rate filesys.used) > filesys.capacity
    ) -> print "filesystem will be full within 30 mins:" " %i";
340
341 If the metric mypmda.errors counts errors then the following rule will
342 emit a message if the rate of errors exceeds 1 per second provided the
343 error count is less than 100.
344
    mypmda.errors > 1 && instant mypmda.errors < 100
        -> print "high error rate: %v";
347
349 The pmie specification language is powerful and large.
350
To speed up development of pmie rules, the pmieconf(1) tool provides
a facility for generating a pmie configuration file from a set of
generalized pmie rules. The supplied set of rules covers a wide range
of performance scenarios.
355
356 The Performance Co-Pilot User's and Administrator's Guide provides a
357 detailed tutorial-style chapter covering pmie.
358
360 This description is terse and informal. For a more comprehensive de‐
361 scription see the Performance Co-Pilot User's and Administrator's
362 Guide.
363
364 A pmie specification is a sequence of semicolon terminated expressions.
365
366 Basic operators are modeled on the arithmetic, relational and Boolean
367 operators of the C programming language. Precedence rules are as ex‐
368 pected, although the use of parentheses is encouraged to enhance read‐
369 ability and remove ambiguity.
370
371 Operands are performance metric names (see PMNS(5)) and the normal lit‐
372 eral constants.
373
374 Operands involving performance metrics may produce sets of values, as a
375 result of enumeration in the dimensions of hosts, instances and time.
376 Special qualifiers may appear after a performance metric name to define
377 the enumeration in each dimension. For example,
378
379 kernel.percpu.cpu.user :foo :bar #cpu0 @0..2
380
381 defines 6 values corresponding to the time spent executing in user mode
382 on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3 consecutive
383 samples. The default interpretation in the absence of : (host), # (in‐
384 stance) and @ (time) qualifiers is all instances at the most recent
385 sample time for the default source of PCP performance metrics.
386
387 Host and instance names that do not follow the rules for variables in
388 programming languages, i.e. alphabetic optionally followed by alphanu‐
389 merics, should be enclosed in single quotes.
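
As an illustrative sketch of the qualifier and quoting rules above, a
rule on the 1-minute load average of one named host might look like the
following. The host name db-01.example.com is hypothetical, chosen
because it contains characters that call for quoting, and the threshold
is arbitrary:

    // host and instance names are quoted because they are not
    // plain identifiers
    kernel.all.load :'db-01.example.com' #'1 minute' > 8
        -> print "load average is high on %h";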
390
391 Expression evaluation follows the law of ``least surprises''. Where
392 performance metrics have the semantics of a counter, pmie will automat‐
393 ically convert to a rate based upon consecutive samples and the time
394 interval between these samples. All numeric expressions are evaluated
395 in double precision, and where appropriate, automatically scaled into
396 canonical units of ``bytes'', ``seconds'' and ``counts''.
397
398 A rule is a special form of expression that specifies a condition or
399 logical expression, a special operator (->) and actions to be performed
400 when the condition is found to be true.
401
402 The following table summarizes the basic pmie operators:
403
404 ┌────────────────┬────────────────────────────────────────────────┐
405 │ Operators │ Explanation │
406 ├────────────────┼────────────────────────────────────────────────┤
407 │+ - * / │ Arithmetic │
408 │< <= == >= > != │ Relational (value comparison) │
409 │! && || │ Boolean │
410 │-> │ Rule │
411 │rising │ Boolean, false to true transition │
412 │falling │ Boolean, true to false transition │
413 │rate │ Explicit rate conversion (rarely required) │
414 │instant │ No automatic rate conversion (rarely required) │
415 └────────────────┴────────────────────────────────────────────────┘
416 All operators are supported for numeric-valued operands and expres‐
417 sions. For string-valued operands, namely literal string constants en‐
418 closed in double quotes or metrics with a data type of string
419 (PM_TYPE_STRING), only the operators == and != are supported.
420
421 The rate and instant operators are the logical inverse of one another,
422 so an arithmetic expression expr is equal to rate instant expr. The
423 more useful cases involve using rate with a metric that is not a
424 counter to determine the rate of change over time or instant with a
425 metric that is a counter to determine if the current value is above or
426 below some threshold.
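
A sketch of the two useful cases, assuming the metrics mem.util.used (an
instantaneous value, not a counter) and network.interface.in.errors (a
counter) are available on the monitored host; the thresholds are
arbitrary:

    // rate applied to a non-counter: how quickly is memory use growing?
    (rate mem.util.used) > 10 Mbyte/sec
        -> print "memory use is growing quickly";

    // instant applied to a counter: test the raw count, not its rate
    some_inst (
        instant network.interface.in.errors > 1000
    ) -> print "interfaces with more than 1000 input errors:" " %i";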
427
428 Aggregate operators may be used to aggregate or summarize along one di‐
429 mension of a set-valued expression. The following aggregate operators
430 map from a logical expression to a logical expression of lower dimen‐
431 sion.
432
433 ┌─────────────────────────┬─────────────┬──────────────────────────┐
434 │ Operators │ Type │ Explanation │
435 ├─────────────────────────┼─────────────┼──────────────────────────┤
436 │some_inst │ Existential │ True if at least one set │
437 │some_host │ │ member is true in the │
438 │some_sample │ │ associated dimension │
439 ├─────────────────────────┼─────────────┼──────────────────────────┤
440 │all_inst │ Universal │ True if all set members │
441 │all_host │ │ are true in the associ‐ │
442 │all_sample │ │ ated dimension │
443 ├─────────────────────────┼─────────────┼──────────────────────────┤
444 │N%_inst │ Percentile │ True if at least N per‐ │
445 │N%_host │ │ cent of set members are │
446 │N%_sample │ │ true in the associated │
447 │ │ │ dimension │
448 └─────────────────────────┴─────────────┴──────────────────────────┘
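
For illustration only, aggregate operators can be nested to collapse
more than one dimension. This sketch assumes two hosts named foo and
bar, and assumes that the time counter kernel.percpu.cpu.user is rate
converted to a utilization between 0 and 1; the 0.9 threshold is
arbitrary:

    // true if, on at least one of the hosts, every CPU spent more
    // than 90% of the last interval in user mode
    some_host (
        all_inst (
            kernel.percpu.cpu.user :foo :bar > 0.9
        )
    ) -> print "all CPUs busy on:" " %h";
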
449 The following instantial operators may be used to filter or limit a
450 set-valued logical expression, based on regular expression matching of
451 instance names. The logical expression must be a set involving the di‐
452 mension of instances, and the regular expression is of the form used by
453 egrep(1) or the Extended Regular Expressions of regcomp(3).
454
455 ┌─────────────┬──────────────────────────────────────────┐
456 │ Operators │ Explanation │
457 ├─────────────┼──────────────────────────────────────────┤
458 │match_inst │ For each value of the logical expression │
459 │ │ that is ``true'', the result is ``true'' │
460 │ │ if the associated instance name matches │
461 │ │ the regular expression. Otherwise the │
462 │ │ result is ``false''. │
463 ├─────────────┼──────────────────────────────────────────┤
464 │nomatch_inst │ For each value of the logical expression │
465 │ │ that is ``true'', the result is ``true'' │
466 │ │ if the associated instance name does not │
467 │ │ match the regular expression. Otherwise │
468 │ │ the result is ``false''. │
469 └─────────────┴──────────────────────────────────────────┘
470 For example, the expression below will be ``true'' for disks attached
471 to controllers 2 or 3 performing more than 20 operations per second:
472 match_inst "^dks[23]d" disk.dev.total > 20;
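
The complementary operator can be sketched the same way. Assuming that
instance names beginning with /dev/loop identify loop devices that
should be ignored, the following rule reports any other filesystem with
less than 5% free space:

    some_inst (
        nomatch_inst "^/dev/loop" ( filesys.free / filesys.capacity < 0.05 )
    ) -> print "filesystems nearly full:" " %i";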
473
474 The following aggregate operators map from an arithmetic expression to
475 an arithmetic expression of lower dimension.
476
477 ┌─────────────────────────┬───────────┬──────────────────────────┐
478 │ Operators │ Type │ Explanation │
479 ├─────────────────────────┼───────────┼──────────────────────────┤
480 │min_inst │ Extrema │ Minimum value across all │
481 │min_host │ │ set members in the asso‐ │
482 │min_sample │ │ ciated dimension │
483 ├─────────────────────────┼───────────┼──────────────────────────┤
484 │max_inst │ Extrema │ Maximum value across all │
485 │max_host │ │ set members in the asso‐ │
486 │max_sample │ │ ciated dimension │
487 ├─────────────────────────┼───────────┼──────────────────────────┤
488 │sum_inst │ Aggregate │ Sum of values across all │
489 │sum_host │ │ set members in the asso‐ │
490 │sum_sample │ │ ciated dimension │
491 ├─────────────────────────┼───────────┼──────────────────────────┤
492 │avg_inst │ Aggregate │ Average value across all │
493 │avg_host │ │ set members in the asso‐ │
494 │avg_sample │ │ ciated dimension │
495 └─────────────────────────┴───────────┴──────────────────────────┘
496 The aggregate operators count_inst, count_host and count_sample map
497 from a logical expression to an arithmetic expression of lower dimen‐
498 sion by counting the number of set members for which the expression is
499 true in the associated dimension.
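
A sketch combining the counting operators with the arithmetic aggregates
tabulated above; the thresholds are arbitrary:

    // at least two disks each exceeding 60 I/Os per second
    count_inst ( disk.dev.total > 60 count/sec ) >= 2
        -> print "two or more busy disks";

    // 1-minute load average, averaged over the last 5 samples
    avg_sample ( kernel.all.load #'1 minute' @0..4 ) > 4
        -> print "sustained high load average";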
500
501 For action rules, the following actions are defined:
502
503 ┌──────────┬────────────────────────────────────────┐
504 │Operators │ Explanation │
505 ├──────────┼────────────────────────────────────────┤
506 │alarm │ Raise a visible alarm with xconfirm(1) │
507 │print │ Display on standard output │
508 │shell │ Execute with sh(1) │
509 │stomp │ Send a STOMP message to a JMS server │
510 │syslog │ Append a message to system log file │
511 └──────────┴────────────────────────────────────────┘
Multiple actions may be separated by the & and | operators to specify,
respectively, sequential execution (both actions are executed) and
alternate execution (the second action will only be executed if the
execution of the first action returns a non-zero error status).
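
As a hedged example of alternate execution, the rule below tries the
stomp action first and falls back to syslog only if the stomp action
returns a non-zero status (for instance because the connection to the
JMS server has been lost). The message text is illustrative:

    some_inst (
        disk.dev.total > 60 count/sec
    ) -> stomp "busy disks:" " %i" |
         syslog "busy disks:" " %i";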
516
517 Arguments to actions are an optional suppression time, and then one or
518 more expressions (a string is an expression in this context). Strings
519 appearing as arguments to an action may include the following special
520 selectors that will be replaced at the time the action is executed.
521
522 %h Host name(s) that make the left-most top-level expression in the
523 condition true.
524
525 %c Connection specification string(s) or files for a PCP tool to reach
526 the hosts or archives that make the left-most top-level expression
527 in the condition true.
528
529 %i Instance(s) that make the left-most top-level expression in the
530 condition true.
531
532 %v One value from the left-most top-level expression in the condition
533 for each host and instance pair that makes the condition true.
534
535 Note that expansion of the special selectors is done by repeating the
536 whole argument once for each unique binding to any of the qualifying
537 special selectors. For example if a rule were true for the host mumble
538 with instances grunt and snort, and for host fumble the instance puff
539 makes the rule true, then the action
540 ...
541 -> shell myscript "Warning: %h:%i busy ";
542 will execute myscript with the argument string "Warning: mumble:grunt
543 busy Warning: mumble:snort busy Warning: fumble:puff busy".
544
545 By comparison, if the action
546 ...
547 -> shell myscript "Warning! busy:" " %h:%i";
548 were executed under the same circumstances, then myscript would be exe‐
549 cuted with the argument string "Warning! busy: mumble:grunt mum‐
550 ble:snort fumble:puff".
551
552 The semantics of the expansion of the special selectors leads to a com‐
553 mon usage pattern in an action, where one argument is a constant (con‐
554 tains no special selectors) the second argument contains the desired
555 special selectors with minimal separator characters, and an optional
556 third argument provides a constant postscript (e.g. to terminate any
557 argument quoting from the first argument). If necessary post-process‐
558 ing (e.g. in myscript) can provide the necessary enumeration over each
559 unique expansion of the string containing just the special selectors.
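
A minimal sketch of this pattern, reusing the hypothetical myscript from
the examples above. The first and third arguments open and close a
single-quoted shell word, and only the middle argument is repeated for
each binding:

    some_inst (
        disk.dev.total > 60 count/sec
    ) -> shell 5 min "myscript 'busy:" " %h:%i" "'";

With the bindings from the earlier example, the command passed to sh(1)
would be myscript 'busy: mumble:grunt mumble:snort fumble:puff'.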
560
For complex conditions, the bindings to these selectors are not always
obvious. It is strongly recommended that pmie be used in the debugging
mode (specify the -W command line option in particular) during rule
development.
565
567 pmie expressions that have the semantics of a Boolean, e.g. foo.bar >
568 10 or some_inst ( my.table < 0 ) are assigned the values true or false
569 or unknown. A value is unknown if one or more of the underlying metric
570 values is unavailable, e.g. pmcd(1) on the host cannot be contacted,
571 the metric is not in the PCP archive, no values are currently avail‐
572 able, insufficient values have been fetched to allow a rate converted
573 value to be computed or insufficient values have been fetched to in‐
574 stantiate the required number of samples in the temporal domain.
575
576 Boolean operators follow the normal rules of Kleene logic (aka 3-valued
577 logic) when combining values that include unknown:
578
579 ┌────────────┬───────────────────────────┐
580 │ │ B │
581 │ A and B ├─────────┬───────┬─────────┤
582 │ │ true │ false │ unknown │
583 ├──┬─────────┼─────────┼───────┼─────────┤
584 │ │ true │ true │ false │ unknown │
585 │ ├─────────┼─────────┼───────┼─────────┤
586 │A │ false │ false │ false │ false │
587 │ ├─────────┼─────────┼───────┼─────────┤
588 │ │ unknown │ unknown │ false │ unknown │
589 └──┴─────────┴─────────┴───────┴─────────┘
590 ┌────────────┬──────────────────────────┐
591 │ │ B │
592 │ A or B ├──────┬─────────┬─────────┤
593 │ │ true │ false │ unknown │
594 ├──┬─────────┼──────┼─────────┼─────────┤
595 │ │ true │ true │ true │ true │
596 │ ├─────────┼──────┼─────────┼─────────┤
597 │A │ false │ true │ false │ unknown │
598 │ ├─────────┼──────┼─────────┼─────────┤
599 │ │ unknown │ true │ unknown │ unknown │
600 └──┴─────────┴──────┴─────────┴─────────┘
601 ┌────────┬─────────┐
602 │ A │ not A │
603 ├────────┼─────────┤
604 │ true │ false │
605 ├────────┼─────────┤
606 │ false │ true │
607 ├────────┼─────────┤
608 │unknown │ unknown │
609 └────────┴─────────┘
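
One practical consequence, sketched with two hypothetical hosts foo and
bar: if pmcd(1) on bar cannot be contacted, the second operand below is
unknown, but the rule still fires whenever the first operand is true,
because true or unknown is true. The threshold is arbitrary:

    kernel.all.load :foo #'1 minute' > 8 ||
    kernel.all.load :bar #'1 minute' > 8
        -> print "load average is high on foo or bar";
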
611 The ruleset clause is used to define a set of rules and actions that
612 are evaluated in order until some action is executed, at which point
613 the remaining rules and actions are skipped until the ruleset is again
614 scheduled for evaluation. The keyword else is used to separate rules.
615 After one or more regular rules (with a predicate and an action), a
616 ruleset may include an optional
617 unknown -> action
clause, optionally followed by an
619 otherwise -> action
620 clause.
621
If all of the predicates in the rules evaluate to unknown and an
unknown clause has been specified, then the action associated with the
unknown clause will be executed.
625
626 If no rule predicate is true and the unknown action is either not spec‐
627 ified or not executed and an otherwise clause has been specified, then
628 the action associated with the otherwise clause will be executed.
629
631 Scale factors may be appended to arithmetic expressions and force lin‐
632 ear scaling of the value to canonical units. Simple scale factors are
633 constructed from the keywords: nanosecond, nanosec, nsec, microsecond,
634 microsec, usec, millisecond, millisec, msec, second, sec, minute, min,
635 hour, byte, Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount and Mcount, and
636 the operator /, for example ``Kbytes / hour''.
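
For example, assuming the counter metric network.interface.in.bytes is
available, a scale factor lets the threshold be written in convenient
units while pmie compares values in canonical units (bytes per second
in this case). The threshold is arbitrary:

    some_inst (
        network.interface.in.bytes > 50 Mbyte/sec
    ) -> print "busy network interfaces:" " %i";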
637
639 Macros are defined using expressions of the form:
640
641 name = constexpr;
642
643 Where name follows the normal rules for variables in programming lan‐
644 guages, i.e. alphabetic optionally followed by alphanumerics. const‐
645 expr must be a constant expression, either a string (enclosed in double
646 quotes) or an arithmetic expression optionally followed by a scale fac‐
647 tor.
648
649 Macros are expanded when their name, prefixed by a dollar ($) appears
650 in an expression, and macros may be nested within a constexpr string.
651
652 The following reserved macro names are understood.
653
654 minute Current minute of the hour.
655
656 hour Current hour of the day, in the range 0 to 23.
657
658 day Current day of the month, in the range 1 to 31.
659
660 month Current month of the year, in the range 0 (January) to 11
661 (December).
662
663 year Current year.
664
665 day_of_week
666 Current day of the week, in the range 0 (Sunday) to 6 (Satur‐
667 day).
668
669 delta Sample interval in effect for this expression.
670
671 Dates and times are presented in the reporting time zone (see descrip‐
672 tion of -Z and -z command line options above).
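
A short sketch using a user-defined macro together with reserved macros.
The macro name busy and its value are illustrative only:

    busy = 60 count/sec;

    // Monday to Friday, during the hours 9 to 17
    $day_of_week >= 1 && $day_of_week <= 5 &&
    $hour >= 9 && $hour <= 17 &&
    some_inst (
        disk.dev.total > $busy
    ) -> print "busy disks during working hours:" " %i";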
673
AUTOMATIC RESTART
It is often useful for pmie processes to be started and stopped when
the local host is booted or shut down, or when they have been detected
as no longer running (when they have unexpectedly exited for some
reason). Refer to pmie_check(1) for details on automating this process.
679
680 Optionally, each system running pmcd(1) may also be configured to run a
681 ``primary'' pmie instance. This pmie instance is launched by
682 $PCP_RC_DIR/pmie, and is affected by the files
683 $PCP_SYSCONF_DIR/pmie/control, $PCP_SYSCONF_DIR/pmie/control.d (use
684 chkconfig(8), systemctl(1) or similar platform-specific commands to ac‐
685 tivate or disable the primary pmie instance) and $PCP_VAR_DIR/con‐
686 fig/pmie/config.default (the default initial configuration file for the
687 primary pmie).
688
689 The primary pmie instance is identified by the -P option. There may be
690 at most one ``primary'' pmie instance on each system. The primary pmie
691 instance (if any) must be running on the same host as the pmcd(1) to
692 which it connects (if any), so the -h and -P options are mutually ex‐
693 clusive.
694
696 It is common for production systems to be monitored in a central loca‐
697 tion. Traditionally on UNIX systems this has been performed by the
698 system log facilities - see logger(1), and syslogd(1). On Windows,
699 communication with the system event log is handled by pcp-eventlog(1).
700
701 pmie fits into this model when rules use the syslog action. Note that
702 if the action string begins with -p (priority) and/or -t (tag) then
703 these are extracted from the string and treated in the same way as in
704 logger(1) and pcp-eventlog(1).
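
A hedged illustration follows. The priority daemon.warning and the tag
pmie_load are example values in the style accepted by logger(1), not
defaults, and the 10 minute suppression time is arbitrary:

    kernel.all.load #'1 minute' > 10 * hinv.ncpu
        -> syslog 10 min "-p daemon.warning -t pmie_load"
                         " extreme load average %v";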
705
However, it is common to have other event monitoring frameworks also,
into which you may wish to incorporate performance events from pmie.
You can often use the shell action to send events to these frameworks,
as they usually provide their own program for injecting events into
the framework from external sources.
711
A final option is use of the stomp (Streaming Text Oriented Messaging
Protocol) action, which allows pmie to connect to a central JMS (Java
Message Service) server and send events to the PMIE topic. Tools can
be written to extract these text messages and present them to
operations people (via desktop popup windows, etc). Use of the stomp
action requires a stomp configuration file to be set up, which
specifies the location of the JMS server host, port number, and
username/password.
719
720 The format of this file is as follows:
721
722 host=messages.sgi.com # this is the JMS server (required)
723 port=61616 # and it's listening here (required)
724 timeout=2 # seconds to wait for server (optional)
725 username=joe # (required)
726 password=j03ST0MP # (required)
727 topic=PMIE # JMS topic for pmie messages (optional)
728
729 The timeout value specifies the time (in seconds) that pmie should wait
730 for acknowledgements from the JMS server after sending a message (as
731 required by the STOMP protocol). Note that on startup, pmie will wait
732 indefinitely for a connection, and will not begin rule evaluation until
733 that initial connection has been established. Should the connection to
734 the JMS server be lost at any time while pmie is running, pmie will at‐
735 tempt to reconnect on each subsequent truthful evaluation of a rule
736 with a stomp action, but not more than once per minute. This is to
737 avoid contributing to network congestion. In this situation, where the
738 STOMP connection to the JMS server has been severed, the stomp action
739 will return a non-zero error value.
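
The file layout described above (key=value pairs with trailing # comments) can be checked before handing the file to pmie. The following Python sketch is illustrative only and is not pmie's own parser: the function name parse_stomp_config and the rule that everything after a # is a comment are assumptions, so a password containing # would be truncated by this sketch.

```python
# Hypothetical validator for a stomp configuration file.
# Assumption: '#' always starts a comment; pmie's real parser may differ.
REQUIRED_KEYS = ("host", "port", "username", "password")

SAMPLE = """\
host=messages.sgi.com   # this is the JMS server (required)
port=61616              # and it's listening here (required)
timeout=2               # seconds to wait for server (optional)
username=joe            # (required)
password=j03ST0MP       # (required)
topic=PMIE              # JMS topic for pmie messages (optional)
"""


def parse_stomp_config(text):
    """Return a dict of settings, raising ValueError on malformed input."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comment and whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw!r}")
        settings[key.strip()] = value.strip()
    missing = [k for k in REQUIRED_KEYS if k not in settings]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    return settings


if __name__ == "__main__":
    for key, value in parse_stomp_config(SAMPLE).items():
        print(key, "=", value)
```

Checking the required keys up front is useful because pmie waits indefinitely for the initial connection, so a mistyped host or port would otherwise show up only as rules that never start evaluating.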
740
742 When running in host mode, the delta interval for each rule determines
743 a real-time delay between rule evaluations, so pmie spends most of its
744 time sleeping and waiting for the next scheduled rule evaluation.
745
746 When running in archive mode, pmie uses the delta interval for each
747 rule to determine how frequently the rules are evaluated against the
748 archive data, but unlike host mode there are no real-time delays as the
749 archive is ``replayed'' as fast as possible.
750
751 In archive mode, when a rule predicate evaluates to true the action is
752 modified: rather than posting to syslog, raising a visible alarm,
753 running a shell command or sending a stomp message, pmie prints the
754 name of the action, the timestamp from the archive when the rule
755 predicate triggering the action was true, and all of the arguments
756 that would have been passed to the real action in host mode.
757
758 For example, given the rule:
759 delta = 10 sec;
760 kernel.all.nprocs > 10 * hinv.ncpu -> print "lotsaprocs:" " %v";
761 when run against an archive, the output appears as:
762 print Mon Sep 4 00:10:21 2017: lotsaprocs: 1292
763 print Mon Sep 4 00:10:31 2017: lotsaprocs: 1294
764 print Mon Sep 4 00:10:41 2017: lotsaprocs: 1291
765 ...
766
767 The rationale is that the context in which the action would have been
768 executed (in host mode) was at a time in the past and possibly on a
769 different host (if the archive was collected from one host, but pmie is
770 being run on a different host). So flooding syslog with misleading
771 messages or an avalanche of visual alarms or a lot of STOMP messages or a
772 shell command that might not even work on the host where pmie is being
773 run, are all examples of ``badness'' to be avoided. Rather, the output
774 is text in a regular format suitable for post-processing with a range
775 of filters and performance analysis tools.
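
As an illustration of such post-processing, the Python sketch below splits archive-mode output lines in the default layout (action, ctime-style date, colon, message) into their parts. It is not part of PCP: the names LINE_RE and parse_action_line are hypothetical, and it assumes English day and month names in the date.

```python
# Hypothetical post-processing filter for pmie archive-mode output.
# Assumption: lines look like "<action> <ctime date>: <message>".
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(?P<action>\S+)\s+"
    r"(?P<when>\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4}):\s"
    r"(?P<message>.*)$"
)


def parse_action_line(line):
    """Return (action, datetime, message), or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    # Collapse runs of spaces so a space-padded day of month also parses.
    when_text = " ".join(match.group("when").split())
    when = datetime.strptime(when_text, "%a %b %d %H:%M:%S %Y")
    return match.group("action"), when, match.group("message")


if __name__ == "__main__":
    sample = "print Mon Sep 4 00:10:21 2017: lotsaprocs: 1292"
    print(parse_action_line(sample))
```

Lines that do not fit the layout return None rather than raising, so a filter built on this can skip unrelated output.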
776
777 The output format can be changed using the -o option which consists of
778 literal characters with the following embedded ``meta-field'' tokens:
779
780 %a The name of the action, e.g. print, syslog, etc.
781
782 %d The date and time in ctime(3) format when the action would have
783 been executed.
784
785 %f The name of the configuration file containing the action being exe‐
786 cuted, else <stdin> if the rules were read from standard input.
787
788 %l The (approximate) line number in the configuration file for the ac‐
789 tion being executed.
790
791 %m The message component of the action.
792
793 %u The date and time when the action would have been executed in ex‐
794 tended ctime(3) format with microsecond precision for the time.
795
796 %% A literal percent character.
797
798 The default output format is equivalent to a format of %a %d: %m.
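
To make the meta-field substitution concrete, here is a small Python sketch of how a format string could be expanded. It only models the tokens listed above and is not pmie's implementation: the function expand_format and the choice to leave unknown tokens untouched are assumptions, and the field values are supplied by the caller rather than taken from an archive.

```python
# Hypothetical model of -o format expansion; not pmie's actual code.
import re

DEFAULT_FORMAT = "%a %d: %m"  # stated above as the default


def expand_format(fmt, fields):
    """Expand %a, %d, %f, %l, %m, %u and %% in fmt using the fields dict.

    fields maps a single-letter token (without '%') to its text.
    Assumption: tokens missing from fields are left as they are.
    """
    def substitute(match):
        token = match.group(1)
        if token == "%":
            return "%"
        return fields.get(token, match.group(0))

    return re.sub(r"%(.)", substitute, fmt)


if __name__ == "__main__":
    fields = {
        "a": "print",
        "d": "Mon Sep 4 00:10:21 2017",
        "m": "lotsaprocs: 1292",
    }
    print(expand_format(DEFAULT_FORMAT, fields))
```

For example, a format of %d|%a|%m would produce delimiter-separated fields that are easier to split reliably than the default layout.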
799
801 If pmie is sent a SIGHUP signal, the logfile will be closed, unlinked
802 and re-opened. This is used by pmie_daily(1) to achieve nightly log
803 rotation.
804
805 Most of the time pmie is sleeping, waiting until the next set of rules
806 needs to be evaluated. Sending pmie a SIGUSR1 signal will cause the
807 details for the next set of rules to be dumped on logfile, including
808 how long the current sleep is and how much time remains. The schedul‐
809 ing of rules is not changed by this action.
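
Both signals can be sent from a script as well as from the shell. The Python sketch below is a minimal illustration for UNIX-like systems: the function names are hypothetical, and the caller must already know the process ID of the pmie instance (how that is discovered is outside this sketch).

```python
# Hypothetical helpers for signalling a running pmie process.
# Assumption: pid is the process ID of a pmie instance you may signal.
import os
import signal


def request_log_rotation(pid):
    """Send SIGHUP so the logfile is closed, unlinked and re-opened."""
    os.kill(pid, signal.SIGHUP)


def request_schedule_dump(pid):
    """Send SIGUSR1 so details of the next set of rules are written to the logfile."""
    os.kill(pid, signal.SIGUSR1)
```

Neither helper waits for pmie to act, so any output triggered by SIGUSR1 should be read from the logfile afterwards.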
810
812 The hostname of the PMCD that is providing metrics to pmie is used in
813 several ways.
814
815 PMCD's hostname is used internally to provide a value for the %h
816 substitutions in rule action strings.
817
818 For pmie instances using a local PMCD that are launched and managed by
819 pmie_check(1) and pmie_daily(1), (or the systemd(1) or cron(8) services
820 that use these scripts), the local hostname may also be used to con‐
821 struct the name of a directory where the pmie logs for one host are
822 stored, e.g. $PCP_LOG_DIR/pmie/<hostname>.
823
824 The hostname of the PMCD host may change during boot time when the sys‐
825 tem transitions from a temporary hostname to a persistent hostname, or
826 by explicit administrative action anytime after the system has been
827 booted. When this happens, pmie may need to take special action:
828 specifically, if the pmie instance was launched from pmie_check(1) or
829 pmie_daily(1), then pmie must exit. Under normal circumstances
830 systemd(1) or cron(8) will launch a new pmie shortly thereafter, and this
831 new pmie instance will be operating in the context of the new hostname
832 for the host where PMCD is running.
833
835 The lexical scanner and parser will attempt to recover after an error
836 in the input expressions. Parsing resumes after skipping input up to
837 the next semi-colon (;). However, during this skipping process the
838 scanner is ignorant of comments and strings, so an embedded semi-colon may
839 cause parsing to resume at an unexpected place. This behavior is
840 largely benign, as until the initial syntax error is corrected, pmie
841 will not attempt any expression evaluation.
842
844 $PCP_DEMOS_DIR/pmie/*
845 annotated example rules
846
847 $PCP_VAR_DIR/pmns/*
848 default PMNS specification files
849
850 $PCP_TMP_DIR/pmie
851 pmie maintains files in this directory to identify the running
852 pmie instances and to export runtime information about each in‐
853 stance - this data forms the basis of the pmcd.pmie performance
854 metrics
855
856 $PCP_PMIECONTROL_PATH
857 the default set of pmie instances to start at boot time - refer to
858 pmie_check(1) for details
859
861 Environment variables with the prefix PCP_ are used to parameterize the
862 file and directory names used by PCP. On each installation, the file
863 /etc/pcp.conf contains the local values for these variables. The
864 $PCP_CONF variable may be used to specify an alternative configuration
865 file, as described in pcp.conf(5).
866
867 When executing shell actions, pmie overrides two variables - IFS and
868 PATH - in the environment of the child process. IFS is set to "\t\n".
869 The PATH is set to a combination of a default path for all platforms
870 ("/usr/sbin:/sbin:/usr/bin:/bin") and several configurable components.
871 These are (in this order): $PCP_BIN_DIR, $PCP_BINADM_DIR and $PCP_PLAT‐
872 FORM_PATHS.
873
874 When executing popup alarm actions, pmie will use the value of
875 $PCP_XCONFIRM_PROG as the visual notification program to run. This is
876 typically set to pmconfirm(1), a cross-platform dialog box.
877
879 logger(1).
880
882 pcp-eventlog(1).
883
885 PCPIntro(1), pmcd(1), pmconfirm(1), pmdumplog(1), pmie_check(1),
886 pmieconf(1), pmie_daily(1), pminfo(1), pmlogger(1), pmval(1), sys‐
887 temd(1), ctime(3), PMAPI(3), pcp.conf(5), pcp.env(5) and PMNS(5).
888
890 For a more complete description of the pmie language, refer to the Per‐
891 formance Co-Pilot Users and Administrators Guide. This is available
892 online from:
893 https://pcp.readthedocs.io/en/latest/UAG/PerformanceMetricsInferenceEngine.html
894
895
896
897Performance Co-Pilot PCP PMIE(1)