PMIE(1)                     General Commands Manual                    PMIE(1)

NAME
pmie - inference engine for performance metrics

SYNOPSIS
pmie [-bCdeFfPqvVWxXz?] [-a archive] [-A align] [-c filename] [-h host]
[-l logfile] [-j stompfile] [-n pmnsfile] [-O offset] [-S starttime]
[-t interval] [-T endtime] [-U username] [-Z timezone] [filename ...]

DESCRIPTION
pmie accepts a collection of arithmetic, logical, and rule expressions
to be evaluated at specified frequencies. The base data for the
expressions consists of performance metric values delivered in real-time
from any host running the Performance Metrics Collection Daemon (PMCD),
or historical data from Performance Co-Pilot (PCP) archive logs.
20
21 As well as computing arithmetic and logical values, pmie can execute
22 actions (popup alarms, write system log messages, and launch programs)
23 in response to specified conditions. Such actions are extremely useful
24 in detecting, monitoring and correcting performance related problems.
25
26 The expressions to be evaluated are read from configuration files spec‐
27 ified by one or more filename arguments. In the absence of any file‐
28 name, expressions are read from standard input.
29
30 Output from pmie is directed to standard output and standard error as
31 follows:
32
33 stdout
34 Expression values printed in the verbose -v mode and the output of
35 print actions.
36
37 stderr
38 Error and warning messages for any syntactic or semantic problems
39 during expression parsing, and any semantic or performance metrics
40 availability problems during expression evaluation.
41
OPTIONS
The available command line options are:
44
-a archive, --archive=archive
archive is a comma-separated list of names, each of which may be
the base name of an archive or the name of a directory containing
one or more archives written by pmlogger(1). Multiple instances of
the -a flag may appear on the command line to specify a list of
sets of archives. In this case, only one set of archives may be
present for any one host. Also, any explicit host names occurring
in a pmie expression must match the host name recorded in one of
the archive labels. In the case of multiple sets of archives,
timestamps recorded in the archives are used to ensure temporal
consistency.
56
57 -A align, --align=align
58 Force the initial time window to be aligned on the boundary of a
59 natural time unit align. Refer to PCPIntro(1) for a complete de‐
60 scription of the syntax for align.
61
62 -b, --buffer
63 Output will be line buffered and standard output is attached to
64 standard error. This is most useful for background execution in
65 conjunction with the -l option. The -b option is always used for
66 pmie instances launched from pmie_check(1).
67
68 -c config, --config=config
69 An alternative to specifying filename at the end of the command
70 line.
71
72 -C, --check
73 Parse the configuration file(s) and exit before performing any
74 evaluations. Any errors in the configuration file are reported.
75
76 -d, --interact
77 Normally pmie would be launched as a non-interactive process to
78 monitor and manage the performance of one or more hosts. Given
79 the -d flag however, execution is interactive and the user is pre‐
80 sented with a menu of options. Interactive mode is useful mainly
81 for debugging new expressions.
82
-e, --timestamp
When used with -V, -v or -W, this option forces timestamps to be
reported with each expression. The timestamps are in ctime(3)
format, enclosed in parentheses, and appear after the expression
name and before the expression value, e.g.
    expr_1 (Tue Feb 6 19:55:10 2001): 12
89
-f, --foreground
If the -l option is specified and there is no -a option (i.e.
real-time monitoring) then pmie is run as a daemon in the background
(in all other cases foreground is the default). The -f (and -F,
see below) options force pmie to be run in the foreground,
independent of any other options.
96
97 -F, --systemd
98 Like -f, the -F option runs pmie in the foreground, but also does
99 some housekeeping (like create a pid file, change user id and no‐
100 tify systemd(1) when pmie has started or is shutting down). This
101 is intended for use when pmie is launched from systemd(1) and the
102 daemonizing has already been done. The -f and -F options are mu‐
103 tually exclusive.
104
105 -h host, --host=host
106 By default performance data is fetched from the local host (in
107 real-time mode) or the host for the first named set of archives on
108 the command line (in archive mode). The host argument overrides
109 this default. It does not override hosts explicitly named in the
110 expressions being evaluated. The host argument is interpreted as
111 a connection specification for pmNewContext, and is later mapped
112 to the remote pmcd's self-reported host name for reporting pur‐
113 poses. See also the %h vs. %c substitutions in rule action
114 strings below.
115
116 -l logfile, --logfile=logfile
117 Standard error is sent to logfile.
118
-j stompfile
An alternative STOMP protocol configuration is loaded from
stompfile. If this option is not used, and the stomp action is used
in any rule, the default location $PCP_SYSCONF_DIR/pmie/config/stomp
will be used.
124
125 -n pmnsfile, --namespace=pmnsfile
126 An alternative Performance Metrics Name Space (PMNS) is loaded
127 from the file pmnsfile.
128
-O origin, --origin=origin
Specify the origin of the time window. See PCPIntro(1) for a
complete description of this option.
132
133 -P, --primary
134 Identifies this as the primary pmie instance for a host. See the
135 ``AUTOMATIC RESTART'' section below for further details.
136
137 -q, --quiet
138 Suppresses diagnostic messages that would be printed to standard
139 output by default, especially the "evaluator exiting" message as
140 this can confuse scripts.
141
-S starttime, --start=starttime
Specify the starttime of the time window. See PCPIntro(1) for a
complete description of this option.
145
146 -t interval, --interval=interval
147 The interval argument follows the syntax described in PCPIntro(1),
148 and in the simplest form may be an unsigned integer (the implied
149 units in this case are seconds). The value is used to determine
150 the sample interval for expressions that do not explicitly set
151 their sample interval using the pmie variable delta described be‐
152 low. The default is 10.0 seconds.
153
-T endtime, --finish=endtime
Specify the endtime of the time window. See PCPIntro(1) for a
complete description of this option.
157
158 -U username, --username=username
159 User account under which to run pmie. The default is the current
160 user account for interactive use. When run as a daemon, the un‐
161 privileged "pcp" account is used in current versions of PCP, but
162 in older versions the superuser account ("root") was used by de‐
163 fault.
164
-v
Unless one of the verbose options -V, -v or -W appears on the
command line, expressions are evaluated silently; the only output
is the result of any actions being executed. In the verbose mode,
specified using the -v flag, the value of each expression is
printed as it is evaluated. The values are in canonical units:
bytes in the dimension of ``space'', seconds in the dimension of
``time'' and events in the dimension of ``count''. See
pmLookupDesc(3) for details of the supported dimension and scaling
mechanisms for performance metrics. The verbose mode is useful for
monitoring the value of given expressions, evaluating derived
performance metrics, passing these values on to other tools for
further processing, and debugging new expressions.
177
178 -V, --verbose
179 This option has the same effect as the -v option, except that the
180 name of the host and instance (if applicable) are printed as well
181 as expression values.
182
183 -W This option has the same effect as the -V option described above,
184 except that for boolean expressions, only those names and values
185 that make the expression true are printed. These are the same
186 names and values accessible to rule actions as the %h, %i, %c and
187 %v bindings, as described below.
188
189 -x, --secret-agent
190 Execute in domain agent mode. This mode is used within the Per‐
191 formance Co-Pilot product to derive values for summary metrics,
192 see pmdasummary(1). Only restricted functionality is available in
193 this mode (expressions with actions may not be used).
194
195 -X, --secret-applet
196 Run in secret applet mode (thin client).
197
198 -z, --hostzone
199 Change the reporting timezone to the timezone of the host that is
200 the source of the performance metrics, as identified via either
201 the -h option or the first named set of archives (as described
202 above for the -a option).
203
204 -Z timezone, --timezone=timezone
205 Change the reporting timezone to timezone in the format of the en‐
206 vironment variable TZ as described in environ(7).
207
208 -?, --help
209 Display usage message and exit.
210
EXAMPLES
The following example expressions demonstrate some of the capabilities
of the inference engine.
214
215 The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
216 examples of pmie expressions.
217
218 The variable delta controls expression evaluation frequency. Specify
219 that subsequent expressions be evaluated once a second, until further
220 notice:
221
    delta = 1 sec;
223
224 If the total context switch rate exceeds 10000 per second per CPU, then
225 display an alarm notifier:
226
    kernel.all.pswitch / hinv.ncpu > 10000 count/sec
        -> alarm "high context switch rate %v";
229
230 If the high context switch rate is sustained for 10 consecutive sam‐
231 ples, then launch top(1) in an xterm(1) window to monitor processes,
232 but do this at most once every 5 minutes:
233
    all_sample (
        kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
    ) -> shell 5 min "xterm -e 'top'";
237
238 The following rules are evaluated once every 20 seconds:
239
    delta = 20 sec;
241
242 If any disk is performing more than 60 I/Os per second, then print a
243 message identifying the busy disk to standard output and launch
244 dkvis(1):
245
    some_inst (
        disk.dev.total > 60 count/sec
    ) -> print "busy disks:" " %i" &
         shell 5 min "dkvis";
250
251 Refine the preceding rule to apply only between the hours of 9am and
252 5pm, and to require 3 of 4 consecutive samples to exceed the threshold
253 before executing the action:
254
    $hour >= 9 && $hour <= 17 &&
    some_inst (
        75 %_sample (
            disk.dev.total @0..3 > 60 count/sec
        )
    ) -> print "disks busy for 20 sec:" " [%h]%i";
261
262 The following two rules are evaluated once every 10 minutes:
263
    delta = 10 min;
265
266 If either the / or the /usr filesystem is more than 95% full, display
267 an alarm popup, but not if it has already been displayed during the
268 last 4 hours:
269
    filesys.free #'/dev/root' /
        filesys.capacity #'/dev/root' < 0.05
    -> alarm 4 hour "root filesystem (almost) full";

    filesys.free #'/dev/usr' /
        filesys.capacity #'/dev/usr' < 0.05
    -> alarm 4 hour "/usr filesystem (almost) full";
277
The following rule requires a machine that supports the lmsensors
metrics. If the machine environment temperature rises more than 2
degrees over a 10 minute interval, raise an alarm and write an entry in
the system log:

    lmsensors.coretemp_isa.temp1 @0 - lmsensors.coretemp_isa.temp1 @1 > 2
        -> alarm "temperature rising fast" &
           syslog "machine room temperature rise alarm";
285
286 And something interesting if you have performance problems with your
287 Oracle database:
288
    // back to 30sec evaluations
    delta = 30 sec;
    sid = "ptg1";       # $ORACLE_SID setting
    lid = "223";        # latch ID from v$latch
    lru = "#'$sid/$lid cache buffers lru chain'";
    host = ":moomba.melbourne.sgi.com";
    gets = "oracle.latch.gets $host $lru";
    total = "oracle.latch.gets $host $lru +
             oracle.latch.misses $host $lru +
             oracle.latch.immisses $host $lru";

    $total > 100 && $gets / $total < 0.2
        -> alarm "high lru latch contention in database $sid";
302
303 The following ruleset will emit exactly one message depending on the
304 availability and value of the 1-minute load average.
305
    delta = 1 minute;
    ruleset
        kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
            print "extreme load average %v"
    else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
            print "moderate load average %v"
    unknown ->
            print "load average unavailable"
    otherwise ->
            print "load average OK"
    ;
317
318 The following rule will emit a message when some filesystem is more
319 than 75% full and is filling at a rate that if sustained would fill the
320 filesystem to 100% in less than 30 minutes.
321
    some_inst (
        100 * filesys.used / filesys.capacity > 75 &&
        filesys.used + 30min * (rate filesys.used) > filesys.capacity
    ) -> print "filesystem will be full within 30 mins:" " %i";
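
The arithmetic behind the preceding rule can be sketched in Python. This is only a model of the two conditions, not pmie code: the function name will_fill and its parameters are hypothetical, and fill_rate stands for the value that rate filesys.used would yield, expressed in the same space units per second.

```python
def will_fill(used, capacity, fill_rate, horizon_sec=30 * 60):
    """Model of the rule: more than 75% full AND projected to exceed capacity.

    used, capacity: space in the same units (for example bytes)
    fill_rate:      growth of `used` in those units per second
    horizon_sec:    projection window, 30 minutes as in the rule above
    """
    percent_full = 100 * used / capacity
    projected = used + horizon_sec * fill_rate
    return percent_full > 75 and projected > capacity


# 80% full and growing by 0.2 units/sec: 800 + 1800 * 0.2 = 1160 > 1000
print(will_fill(800, 1000, 0.2))
# Same filesystem growing by 0.1 units/sec: 800 + 180 = 980, still fits
print(will_fill(800, 1000, 0.1))
```

Both conditions must hold, so a filesystem that is filling quickly but is still below 75% does not trigger the action.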
326
327 If the metric mypmda.errors counts errors then the following rule will
328 emit a message if the rate of errors exceeds 1 per second provided the
329 error count is less than 100.
330
    mypmda.errors > 1 && instant mypmda.errors < 100
        -> print "high error rate: %v";
333
335 The pmie specification language is powerful and large.
336
337 To expedite rapid development of pmie rules, the pmieconf(1) tool pro‐
338 vides a facility for generating a pmie configuration file from a set of
339 generalized pmie rules. The supplied set of rules covers a wide range
340 of performance scenarios.
341
342 The Performance Co-Pilot User's and Administrator's Guide provides a
343 detailed tutorial-style chapter covering pmie.
344
346 This description is terse and informal. For a more comprehensive de‐
347 scription see the Performance Co-Pilot User's and Administrator's
348 Guide.
349
350 A pmie specification is a sequence of semicolon terminated expressions.
351
352 Basic operators are modeled on the arithmetic, relational and Boolean
353 operators of the C programming language. Precedence rules are as ex‐
354 pected, although the use of parentheses is encouraged to enhance read‐
355 ability and remove ambiguity.
356
357 Operands are performance metric names (see PMNS(5)) and the normal lit‐
358 eral constants.
359
360 Operands involving performance metrics may produce sets of values, as a
361 result of enumeration in the dimensions of hosts, instances and time.
362 Special qualifiers may appear after a performance metric name to define
363 the enumeration in each dimension. For example,
364
365 kernel.percpu.cpu.user :foo :bar #cpu0 @0..2
366
367 defines 6 values corresponding to the time spent executing in user mode
368 on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3 consecutive
369 samples. The default interpretation in the absence of : (host), # (in‐
370 stance) and @ (time) qualifiers is all instances at the most recent
371 sample time for the default source of PCP performance metrics.
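
To see where the count of 6 comes from, the enumeration can be modeled as a cross product of the three dimensions. This is an illustrative Python sketch, not how pmie stores its values; the variable names are hypothetical.

```python
from itertools import product

# Model of: kernel.percpu.cpu.user :foo :bar #cpu0 @0..2
hosts = ["foo", "bar"]   # :foo :bar
instances = ["cpu0"]     # #cpu0
samples = range(0, 3)    # @0..2, the 3 most recent samples

# One value per (host, instance, sample) combination.
values = list(product(hosts, instances, samples))
for host, inst, sample in values:
    print(f"{host}:{inst}@{sample}")
print(len(values))
```

Two hosts, one instance and three samples give 2 x 1 x 3 = 6 values.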
372
Host and instance names that do not follow the rules for variables in
programming languages, i.e. alphabetic optionally followed by
alphanumerics, should be enclosed in single quotes.
376
377 Expression evaluation follows the law of ``least surprises''. Where
378 performance metrics have the semantics of a counter, pmie will automat‐
379 ically convert to a rate based upon consecutive samples and the time
380 interval between these samples. All numeric expressions are evaluated
381 in double precision, and where appropriate, automatically scaled into
382 canonical units of ``bytes'', ``seconds'' and ``counts''.
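
The automatic rate conversion for counters amounts to dividing the change in value by the time between two consecutive samples. The following Python sketch shows that arithmetic only; the function name is hypothetical, and counter wrap-around is deliberately ignored here.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Rate of change of a counter between two consecutive samples.

    Times are in seconds. Returns None when no usable interval exists,
    which corresponds to a value that cannot yet be computed.
    """
    interval = curr_time - prev_time
    if interval <= 0:
        return None
    return (curr_value - prev_value) / interval


# A counter that moved from 120000 to 150000 over a 10 second interval.
print(counter_rate(120000.0, 0.0, 150000.0, 10.0))
```

A single sample is never enough to compute a rate, which is one reason an expression may have no value on the first evaluation.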
383
384 A rule is a special form of expression that specifies a condition or
385 logical expression, a special operator (->) and actions to be performed
386 when the condition is found to be true.
387
388 The following table summarizes the basic pmie operators:
389
390 ┌────────────────┬────────────────────────────────────────────────┐
391 │ Operators │ Explanation │
392 ├────────────────┼────────────────────────────────────────────────┤
393 │+ - * / │ Arithmetic │
394 │< <= == >= > != │ Relational (value comparison) │
395 │! && || │ Boolean │
396 │-> │ Rule │
397 │rising │ Boolean, false to true transition │
398 │falling │ Boolean, true to false transition │
399 │rate │ Explicit rate conversion (rarely required) │
400 │instant │ No automatic rate conversion (rarely required) │
401 └────────────────┴────────────────────────────────────────────────┘
402 All operators are supported for numeric-valued operands and expres‐
403 sions. For string-valued operands, namely literal string constants en‐
404 closed in double quotes or metrics with a data type of string
405 (PM_TYPE_STRING), only the operators == and != are supported.
406
407 The rate and instant operators are the logical inverse of one another,
408 so an arithmetic expression expr is equal to rate instant expr. The
409 more useful cases involve using rate with a metric that is not a
410 counter to determine the rate of change over time or instant with a
411 metric that is a counter to determine if the current value is above or
412 below some threshold.
413
414 Aggregate operators may be used to aggregate or summarize along one di‐
415 mension of a set-valued expression. The following aggregate operators
416 map from a logical expression to a logical expression of lower dimen‐
417 sion.
418
419 ┌─────────────────────────┬─────────────┬──────────────────────────┐
420 │ Operators │ Type │ Explanation │
421 ├─────────────────────────┼─────────────┼──────────────────────────┤
422 │some_inst │ Existential │ True if at least one set │
423 │some_host │ │ member is true in the │
424 │some_sample │ │ associated dimension │
425 ├─────────────────────────┼─────────────┼──────────────────────────┤
426 │all_inst │ Universal │ True if all set members │
427 │all_host │ │ are true in the associ‐ │
428 │all_sample │ │ ated dimension │
429 ├─────────────────────────┼─────────────┼──────────────────────────┤
430 │N%_inst │ Percentile │ True if at least N per‐ │
431 │N%_host │ │ cent of set members are │
432 │N%_sample │ │ true in the associated │
433 │ │ │ dimension │
434 └─────────────────────────┴─────────────┴──────────────────────────┘
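
The three families of logical aggregate operators can be modeled in Python as follows. This is a sketch under a simplifying assumption: every set member is either true or false, so the unknown value is not handled. The function names are hypothetical.

```python
def some(values):
    """some_inst, some_host, some_sample: at least one member is true."""
    return any(values)


def all_members(values):
    """all_inst, all_host, all_sample: every member is true."""
    return all(values)


def percent(n, values):
    """N%_inst, N%_host, N%_sample: at least n percent of members are true."""
    values = list(values)
    return bool(values) and 100 * sum(values) >= n * len(values)


# Four consecutive samples, three of which exceed a threshold.
samples = [True, True, False, True]
print(some(samples), all_members(samples), percent(75, samples))
```

With 3 of 4 samples true, a 75 %_sample aggregate is satisfied because 3/4 is exactly 75 percent.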
435 The following instantial operators may be used to filter or limit a
436 set-valued logical expression, based on regular expression matching of
437 instance names. The logical expression must be a set involving the di‐
438 mension of instances, and the regular expression is of the form used by
439 egrep(1) or the Extended Regular Expressions of regcomp(3).
440
441 ┌─────────────┬──────────────────────────────────────────┐
442 │ Operators │ Explanation │
443 ├─────────────┼──────────────────────────────────────────┤
444 │match_inst │ For each value of the logical expression │
445 │ │ that is ``true'', the result is ``true'' │
446 │ │ if the associated instance name matches │
447 │ │ the regular expression. Otherwise the │
448 │ │ result is ``false''. │
449 ├─────────────┼──────────────────────────────────────────┤
450 │nomatch_inst │ For each value of the logical expression │
451 │ │ that is ``true'', the result is ``true'' │
452 │ │ if the associated instance name does not │
453 │ │ match the regular expression. Otherwise │
454 │ │ the result is ``false''. │
455 └─────────────┴──────────────────────────────────────────┘
456 For example, the expression below will be ``true'' for disks attached
457 to controllers 2 or 3 performing more than 20 operations per second:
458 match_inst "^dks[23]d" disk.dev.total > 20;
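
The filtering performed by match_inst and nomatch_inst can be modeled in Python. This sketch uses Python's re module, which is an assumption: it agrees with Extended Regular Expressions for simple patterns like the one above, but is not identical in general. The instance names and function signatures are hypothetical.

```python
import re


def match_inst(pattern, truth_by_instance):
    """Keep a value true only if its instance name matches the pattern."""
    regex = re.compile(pattern)
    return {inst: bool(value and regex.search(inst))
            for inst, value in truth_by_instance.items()}


def nomatch_inst(pattern, truth_by_instance):
    """Keep a value true only if its instance name does not match."""
    regex = re.compile(pattern)
    return {inst: bool(value and not regex.search(inst))
            for inst, value in truth_by_instance.items()}


# Result of "disk.dev.total > 20" for three hypothetical disks.
busy = {"dks1d1": True, "dks2d1": True, "dks3d4": False}
print(match_inst("^dks[23]d", busy))
print(nomatch_inst("^dks[23]d", busy))
```

In both cases a value that was already false stays false, regardless of the instance name.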
459
460 The following aggregate operators map from an arithmetic expression to
461 an arithmetic expression of lower dimension.
462
463 ┌─────────────────────────┬───────────┬──────────────────────────┐
464 │ Operators │ Type │ Explanation │
465 ├─────────────────────────┼───────────┼──────────────────────────┤
466 │min_inst │ Extrema │ Minimum value across all │
467 │min_host │ │ set members in the asso‐ │
468 │min_sample │ │ ciated dimension │
469 ├─────────────────────────┼───────────┼──────────────────────────┤
470 │max_inst │ Extrema │ Maximum value across all │
471 │max_host │ │ set members in the asso‐ │
472 │max_sample │ │ ciated dimension │
473 ├─────────────────────────┼───────────┼──────────────────────────┤
474 │sum_inst │ Aggregate │ Sum of values across all │
475 │sum_host │ │ set members in the asso‐ │
476 │sum_sample │ │ ciated dimension │
477 ├─────────────────────────┼───────────┼──────────────────────────┤
478 │avg_inst │ Aggregate │ Average value across all │
479 │avg_host │ │ set members in the asso‐ │
480 │avg_sample │ │ ciated dimension │
481 └─────────────────────────┴───────────┴──────────────────────────┘
482 The aggregate operators count_inst, count_host and count_sample map
483 from a logical expression to an arithmetic expression of lower dimen‐
484 sion by counting the number of set members for which the expression is
485 true in the associated dimension.
486
487 For action rules, the following actions are defined:
488
489 ┌──────────┬────────────────────────────────────────┐
490 │Operators │ Explanation │
491 ├──────────┼────────────────────────────────────────┤
492 │alarm │ Raise a visible alarm with xconfirm(1) │
493 │print │ Display on standard output │
494 │shell │ Execute with sh(1) │
495 │stomp │ Send a STOMP message to a JMS server │
496 │syslog │ Append a message to system log file │
497 └──────────┴────────────────────────────────────────┘
Multiple actions may be separated by the & and | operators to specify
respectively sequential execution (both actions are executed) and
alternate execution (the second action will only be executed if the
execution of the first action returns a non-zero error status).
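
The difference between & and | can be modeled in Python, treating each action as a callable that returns an exit status. This is an illustrative sketch; the helper names and the example statuses are hypothetical.

```python
log = []


def action(name, status):
    """Build a fake action that records its name and returns `status`."""
    def run():
        log.append(name)
        return status
    return run


def run_sequential(first, second):
    """a & b: both actions are executed."""
    return [first(), second()]


def run_alternate(first, second):
    """a | b: the second action runs only if the first returns non-zero."""
    status = first()
    if status != 0:
        return [status, second()]
    return [status]


# A failing first action falls through to the second one.
run_alternate(action("stomp", 1), action("syslog", 0))
print(log)
```

The alternate form is useful as a fallback, for example writing to the system log only when delivery by another action has failed.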
502
503 Arguments to actions are an optional suppression time, and then one or
504 more expressions (a string is an expression in this context). Strings
505 appearing as arguments to an action may include the following special
506 selectors that will be replaced at the time the action is executed.
507
508 %h Host name(s) that make the left-most top-level expression in the
509 condition true.
510
511 %c Connection specification string(s) or files for a PCP tool to reach
512 the hosts or archives that make the left-most top-level expression
513 in the condition true.
514
515 %i Instance(s) that make the left-most top-level expression in the
516 condition true.
517
518 %v One value from the left-most top-level expression in the condition
519 for each host and instance pair that makes the condition true.
520
521 Note that expansion of the special selectors is done by repeating the
522 whole argument once for each unique binding to any of the qualifying
523 special selectors. For example if a rule were true for the host mumble
524 with instances grunt and snort, and for host fumble the instance puff
525 makes the rule true, then the action
526 ...
527 -> shell myscript "Warning: %h:%i busy ";
528 will execute myscript with the argument string "Warning: mumble:grunt
529 busy Warning: mumble:snort busy Warning: fumble:puff busy".
530
531 By comparison, if the action
532 ...
533 -> shell myscript "Warning! busy:" " %h:%i";
534 were executed under the same circumstances, then myscript would be exe‐
535 cuted with the argument string "Warning! busy: mumble:grunt mum‐
536 ble:snort fumble:puff".
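
The expansion rule can be modeled in Python as follows. This is a sketch of the behavior described above, not pmie's implementation: it assumes that arguments are concatenated without a separator and that each binding is a plain mapping from selector to string. The function names are hypothetical.

```python
SELECTORS = ("%h", "%c", "%i", "%v")


def expand_argument(arg, bindings):
    """Repeat `arg` once per unique binding of the selectors it contains."""
    used = [s for s in SELECTORS if s in arg]
    if not used:
        return arg  # constant argument, emitted once
    unique = []
    for binding in bindings:
        key = tuple(binding[s] for s in used)
        if key not in unique:
            unique.append(key)
    pieces = []
    for key in unique:
        text = arg
        for selector, value in zip(used, key):
            text = text.replace(selector, value)
        pieces.append(text)
    return "".join(pieces)


def expand_action(args, bindings):
    """Expand every argument and join the results into one string."""
    return "".join(expand_argument(arg, bindings) for arg in args)


bindings = [
    {"%h": "mumble", "%i": "grunt"},
    {"%h": "mumble", "%i": "snort"},
    {"%h": "fumble", "%i": "puff"},
]
print(expand_action(["Warning! busy:", " %h:%i"], bindings))
```

In this model, splitting the constant prefix into its own argument is what prevents it from being repeated for every host and instance pair.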
537
The semantics of the expansion of the special selectors leads to a
common usage pattern in an action, where the first argument is a
constant (contains no special selectors), the second argument contains
the desired special selectors with minimal separator characters, and an
optional third argument provides a constant postscript (e.g. to
terminate any argument quoting from the first argument). If necessary,
post-processing (e.g. in myscript) can provide the necessary enumeration
over each unique expansion of the string containing just the special
selectors.

For complex conditions, the bindings to these selectors are not obvious.
It is strongly recommended that pmie be used in the debugging mode
(specify the -W command line option in particular) during rule
development.
551
553 pmie expressions that have the semantics of a Boolean, e.g. foo.bar >
554 10 or some_inst ( my.table < 0 ) are assigned the values true or false
555 or unknown. A value is unknown if one or more of the underlying metric
556 values is unavailable, e.g. pmcd(1) on the host cannot be contacted,
557 the metric is not in the PCP archive, no values are currently avail‐
558 able, insufficient values have been fetched to allow a rate converted
559 value to be computed or insufficient values have been fetched to in‐
560 stantiate the required number of samples in the temporal domain.
561
562 Boolean operators follow the normal rules of Kleene logic (aka 3-valued
563 logic) when combining values that include unknown:
564
565 ┌────────────┬───────────────────────────┐
566 │ │ B │
567 │ A and B ├─────────┬───────┬─────────┤
568 │ │ true │ false │ unknown │
569 ├──┬─────────┼─────────┼───────┼─────────┤
570 │ │ true │ true │ false │ unknown │
571 │ ├─────────┼─────────┼───────┼─────────┤
572 │A │ false │ false │ false │ false │
573 │ ├─────────┼─────────┼───────┼─────────┤
574 │ │ unknown │ unknown │ false │ unknown │
575 └──┴─────────┴─────────┴───────┴─────────┘
576 ┌────────────┬──────────────────────────┐
577 │ │ B │
578 │ A or B ├──────┬─────────┬─────────┤
579 │ │ true │ false │ unknown │
580 ├──┬─────────┼──────┼─────────┼─────────┤
581 │ │ true │ true │ true │ true │
582 │ ├─────────┼──────┼─────────┼─────────┤
583 │A │ false │ true │ false │ unknown │
584 │ ├─────────┼──────┼─────────┼─────────┤
585 │ │ unknown │ true │ unknown │ unknown │
586 └──┴─────────┴──────┴─────────┴─────────┘
587 ┌────────┬─────────┐
588 │ A │ not A │
589 ├────────┼─────────┤
590 │ true │ false │
591 ├────────┼─────────┤
592 │ false │ true │
593 ├────────┼─────────┤
594 │unknown │ unknown │
595 └────────┴─────────┘
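
The three truth tables above can be expressed compactly in Python, using None to stand for unknown. This is a model of the logic only; the function names are hypothetical.

```python
def k_and(a, b):
    """Kleene AND: false dominates, then unknown, otherwise true."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a, b):
    """Kleene OR: true dominates, then unknown, otherwise false."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def k_not(a):
    """Kleene NOT: unknown stays unknown."""
    return None if a is None else not a


# Print the "A and B" table row by row.
for a in (True, False, None):
    print(a, [k_and(a, b) for b in (True, False, None)])
```

The key property is that a definite operand can settle the result on its own: false and unknown is false, and true or unknown is true.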
597 The ruleset clause is used to define a set of rules and actions that
598 are evaluated in order until some action is executed, at which point
599 the remaining rules and actions are skipped until the ruleset is again
600 scheduled for evaluation. The keyword else is used to separate rules.
After one or more regular rules (with a predicate and an action), a
ruleset may include an optional
    unknown -> action
clause, optionally followed by an
    otherwise -> action
clause.

If all of the predicates in the rules evaluate to unknown and an
unknown clause has been specified, then the action associated with the
unknown clause will be executed.
611
612 If no rule predicate is true and the unknown action is either not spec‐
613 ified or not executed and an otherwise clause has been specified, then
614 the action associated with the otherwise clause will be executed.
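
The selection order described above can be modeled in Python. This is a sketch of the decision only, with None standing for unknown; the function name and the action labels are hypothetical, and actions are represented by names rather than executed.

```python
def evaluate_ruleset(rules, unknown_action=None, otherwise_action=None):
    """Pick the single action a ruleset would execute, or None.

    rules: list of (predicate_value, action) in ruleset order, where
           predicate_value is True, False or None (unknown).
    """
    # Rules are tried in order; the first true predicate wins.
    for value, action in rules:
        if value is True:
            return action
    # The unknown clause applies only when every predicate is unknown.
    if rules and unknown_action is not None and all(v is None for v, _ in rules):
        return unknown_action
    # Otherwise fall back to the otherwise clause, if any.
    return otherwise_action


# Model of the load average ruleset from the examples.
print(evaluate_ruleset([(False, "extreme"), (True, "moderate")],
                       "unavailable", "OK"))
print(evaluate_ruleset([(None, "extreme"), (None, "moderate")],
                       "unavailable", "OK"))
print(evaluate_ruleset([(False, "extreme"), (False, "moderate")],
                       "unavailable", "OK"))
```

Note that a mix of false and unknown predicates reaches the otherwise clause in this model, because the unknown clause requires all predicates to be unknown.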
615
617 Scale factors may be appended to arithmetic expressions and force lin‐
618 ear scaling of the value to canonical units. Simple scale factors are
619 constructed from the keywords: nanosecond, nanosec, nsec, microsecond,
620 microsec, usec, millisecond, millisec, msec, second, sec, minute, min,
621 hour, byte, Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount and Mcount, and
622 the operator /, for example ``Kbytes / hour''.
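
The effect of a scale factor can be illustrated in Python as a multiplication into canonical units (bytes, seconds, counts). This is a sketch under stated assumptions: the space prefixes are taken to be powers of 1024 and the count prefixes powers of 1000, only the short keyword spellings are listed, and the function name is hypothetical.

```python
# Multipliers into canonical units; assumed values, see the text above.
SPACE = {"byte": 1, "Kbyte": 2**10, "Mbyte": 2**20, "Gbyte": 2**30, "Tbyte": 2**40}
TIME = {"nsec": 1e-9, "usec": 1e-6, "msec": 1e-3, "sec": 1, "min": 60, "hour": 3600}
COUNT = {"count": 1, "Kcount": 10**3, "Mcount": 10**6}
FACTORS = {**SPACE, **TIME, **COUNT}


def to_canonical(value, numerator, denominator=None):
    """Scale `value` given in numerator/denominator units to canonical units."""
    scaled = value * FACTORS[numerator]
    if denominator is not None:
        scaled /= FACTORS[denominator]
    return scaled


print(to_canonical(10, "Kcount", "sec"))    # 10 Kcount/sec in count/sec
print(to_canonical(3600, "Kbyte", "hour"))  # 3600 Kbyte/hour in byte/sec
print(to_canonical(5, "min"))               # 5 min in seconds
```

This is why a threshold written as 10 Kcount/sec compares correctly against a metric whose values are held in count/sec.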
623
625 Macros are defined using expressions of the form:
626
627 name = constexpr;
628
Here, name follows the normal rules for variables in programming
languages, i.e. alphabetic optionally followed by alphanumerics.
constexpr must be a constant expression, either a string (enclosed in
double quotes) or an arithmetic expression optionally followed by a
scale factor.
634
635 Macros are expanded when their name, prefixed by a dollar ($) appears
636 in an expression, and macros may be nested within a constexpr string.
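
Nested macro expansion can be modeled in Python as repeated textual substitution. This is an illustrative sketch, not pmie's parser: the function name is hypothetical, the name pattern (including underscores) is an assumption, and a macro that refers to itself would recurse without end in this model.

```python
import re

MACRO_NAME = re.compile(r"\$([A-Za-z][A-Za-z0-9_]*)")


def expand(text, macros):
    """Replace each $name in `text`, expanding nested macros recursively."""
    def replace(match):
        name = match.group(1)
        if name not in macros:
            return match.group(0)  # leave unknown names untouched
        return expand(macros[name], macros)
    return MACRO_NAME.sub(replace, text)


# Macros taken from the Oracle example in the EXAMPLES section.
macros = {
    "sid": "ptg1",
    "lid": "223",
    "lru": "#'$sid/$lid cache buffers lru chain'",
}
print(expand("oracle.latch.gets $lru", macros))
```

Here $lru expands to a string that itself contains $sid and $lid, which are then expanded in turn.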
637
638 The following reserved macro names are understood.
639
640 minute Current minute of the hour.
641
642 hour Current hour of the day, in the range 0 to 23.
643
644 day Current day of the month, in the range 1 to 31.
645
646 month Current month of the year, in the range 0 (January) to 11
647 (December).
648
649 year Current year.
650
651 day_of_week
652 Current day of the week, in the range 0 (Sunday) to 6 (Satur‐
653 day).
654
655 delta Sample interval in effect for this expression.
656
657 Dates and times are presented in the reporting time zone (see descrip‐
658 tion of -Z and -z command line options above).
659
AUTOMATIC RESTART
It is often useful for pmie processes to be started and stopped when
the local host is booted or shut down, or restarted when they have been
detected as no longer running (when they have unexpectedly exited for
some reason). Refer to pmie_check(1) for details on automating this
process.
665
666 Optionally, each system running pmcd(1) may also be configured to run a
667 ``primary'' pmie instance. This pmie instance is launched by
668 $PCP_RC_DIR/pmie, and is affected by the files
669 $PCP_SYSCONF_DIR/pmie/control, $PCP_SYSCONF_DIR/pmie/control.d (use
670 chkconfig(8), systemctl(1) or similar platform-specific commands to ac‐
671 tivate or disable the primary pmie instance) and $PCP_VAR_DIR/con‐
672 fig/pmie/config.default (the default initial configuration file for the
673 primary pmie).
674
675 The primary pmie instance is identified by the -P option. There may be
676 at most one ``primary'' pmie instance on each system. The primary pmie
677 instance (if any) must be running on the same host as the pmcd(1) to
678 which it connects (if any), so the -h and -P options are mutually ex‐
679 clusive.
680
682 It is common for production systems to be monitored in a central loca‐
683 tion. Traditionally on UNIX systems this has been performed by the
684 system log facilities - see logger(1), and syslogd(1). On Windows,
685 communication with the system event log is handled by pcp-eventlog(1).
686
687 pmie fits into this model when rules use the syslog action. Note that
688 if the action string begins with -p (priority) and/or -t (tag) then
689 these are extracted from the string and treated in the same way as in
690 logger(1) and pcp-eventlog(1).
691
However, it is common to have other event monitoring frameworks as
well, into which you may wish to incorporate performance events from
pmie. You can often use the shell action to send events to these
frameworks, as they usually provide their own program for injecting
events into the framework from external sources.
697
A final option is use of the stomp (Streaming Text Oriented Messaging
Protocol) action, which allows pmie to connect to a central JMS (Java
Message Service) server and send events to the PMIE topic. Tools can
be written to extract these text messages and present them to
operations people (via desktop popup windows, etc). Use of the stomp
action requires a stomp configuration file to be set up, which
specifies the location of the JMS server host, port number, and
username/password.
705
706 The format of this file is as follows:
707
    host=messages.sgi.com  # this is the JMS server (required)
    port=61616             # and it is listening here (required)
    timeout=2              # seconds to wait for server (optional)
    username=joe           # (required)
    password=j03ST0MP      # (required)
    topic=PMIE             # JMS topic for pmie messages (optional)
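
A file in this format is simple to check before handing it to pmie. The following Python sketch reads key=value lines and reports missing required keys. It assumes that # starts a comment, as in the example above, which would mishandle a value that itself contains #; the function name is hypothetical and this is not pmie's own parser.

```python
REQUIRED = ("host", "port", "username", "password")


def parse_stomp_config(text):
    """Parse key=value lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    missing = [key for key in REQUIRED if key not in settings]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return settings


example = """
host=messages.sgi.com  # this is the JMS server (required)
port=61616
username=joe
password=j03ST0MP
topic=PMIE
"""
print(parse_stomp_config(example))
```

The optional timeout and topic settings are simply absent from the result when they are not given.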
714
715 The timeout value specifies the time (in seconds) that pmie should wait
716 for acknowledgements from the JMS server after sending a message (as
717 required by the STOMP protocol). Note that on startup, pmie will wait
718 indefinitely for a connection, and will not begin rule evaluation until
719 that initial connection has been established. Should the connection to
720 the JMS server be lost at any time while pmie is running, pmie will at‐
721 tempt to reconnect on each subsequent truthful evaluation of a rule
722 with a stomp action, but not more than once per minute. This is to
723 avoid contributing to network congestion. In this situation, where the
724 STOMP connection to the JMS server has been severed, the stomp action
725 will return a non-zero error value.
726
728 The lexical scanner and parser will attempt to recover after an error
729 in the input expressions. Parsing resumes after skipping input up to
730 the next semi-colon (;), however during this skipping process the scan‐
731 ner is ignorant of comments and strings, so an embedded semi-colon may
732 cause parsing to resume at an unexpected place. This behavior is
733 largely benign, as until the initial syntax error is corrected, pmie
734 will not attempt any expression evaluation.
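
The recovery behavior can be modeled in Python to show why an embedded semi-colon is a problem. This sketch only mimics the skipping described above, it is not the actual scanner, and the function name is hypothetical.

```python
def resume_offset(text, error_offset):
    """Index just past the next ';' at or after error_offset.

    Strings and comments are deliberately not recognized, which is the
    limitation described above. Returns len(text) if no ';' remains.
    """
    semi = text.find(";", error_offset)
    return len(text) if semi < 0 else semi + 1


# A broken rule whose string argument contains a semi-colon.
text = 'bad rule -> print "a; b";\ndelta = 1 sec;'
resume = resume_offset(text, 0)
print(repr(text[resume:]))
```

In this model parsing resumes in the middle of the quoted string rather than at the start of the next expression, so further spurious errors are likely to be reported.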
735
737 $PCP_DEMOS_DIR/pmie/*
738 annotated example rules
739
740 $PCP_VAR_DIR/pmns/*
741 default PMNS specification files
742
743 $PCP_TMP_DIR/pmie
744 pmie maintains files in this directory to identify the running
745 pmie instances and to export runtime information about each in‐
746 stance - this data forms the basis of the pmcd.pmie performance
747 metrics
748
749 $PCP_PMIECONTROL_PATH
750 the default set of pmie instances to start at boot time - refer to
751 pmie_check(1) for details
752
754 Environment variables with the prefix PCP_ are used to parameterize the
755 file and directory names used by PCP. On each installation, the file
756 /etc/pcp.conf contains the local values for these variables. The
757 $PCP_CONF variable may be used to specify an alternative configuration
758 file, as described in pcp.conf(5).
759
When executing shell actions, pmie overrides two variables, IFS and
PATH, in the environment of the child process. IFS is set to "\t\n".
The PATH is set to a combination of a default path for all platforms
("/usr/sbin:/sbin:/usr/bin:/bin") and several configurable components.
These are (in this order): $PCP_BIN_DIR, $PCP_BINADM_DIR and
$PCP_PLATFORM_PATHS.
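
The resulting environment can be sketched in Python. The text above does not say whether the default path comes before or after the configurable components, so this model assumes the default comes first. The function name and the directory values in the example are hypothetical.

```python
DEFAULT_PATH = "/usr/sbin:/sbin:/usr/bin:/bin"
CONFIGURABLE = ("PCP_BIN_DIR", "PCP_BINADM_DIR", "PCP_PLATFORM_PATHS")


def shell_action_env(pcp_env):
    """Model of the IFS and PATH overrides applied to shell actions."""
    parts = [DEFAULT_PATH]  # assumed to come first
    for name in CONFIGURABLE:
        value = pcp_env.get(name)
        if value:
            parts.append(value)
    return {"IFS": "\t\n", "PATH": ":".join(parts)}


# Hypothetical values, as they might appear in /etc/pcp.conf.
env = shell_action_env({
    "PCP_BIN_DIR": "/usr/bin",
    "PCP_BINADM_DIR": "/usr/libexec/pcp/bin",
})
print(env["PATH"])
```

One practical consequence is that a script launched by a shell action should not rely on the PATH of the user who started pmie.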
766
767 When executing popup alarm actions, pmie will use the value of
768 $PCP_XCONFIRM_PROG as the visual notification program to run. This is
769 typically set to pmconfirm(1), a cross-platform dialog box.
770
UNIX SEE ALSO
logger(1).

WINDOWS SEE ALSO
pcp-eventlog(1).

SEE ALSO
PCPIntro(1), pmcd(1), pmconfirm(1), pmdumplog(1), pmieconf(1),
pmie_check(1), pminfo(1), pmlogger(1), pmval(1), systemd(1), PMAPI(3),
pcp.conf(5), pcp.env(5) and PMNS(5).
781
For a more complete description of the pmie language, refer to the
Performance Co-Pilot User's and Administrator's Guide. This is available
online from:
https://pcp.readthedocs.io/en/latest/UAG/PerformanceMetricsInferenceEngine.html

Performance Co-Pilot                  PCP                          PMIE(1)