PMIE(1) General Commands Manual PMIE(1)

pmie - inference engine for performance metrics

pmie [-bCdeFfHPqvVWxXz?] [-a archive] [-A align] [-c filename]
     [-h host] [-l logfile] [-j stompfile] [-n pmnsfile] [-O origin]
     [-S starttime] [-t interval] [-T endtime] [-U username]
     [-Z timezone] [filename ...]

pmie accepts a collection of arithmetic, logical, and rule expressions
to be evaluated at specified frequencies. The base data for the
expressions consists of performance metric values delivered in
real-time from any host running the Performance Metrics Collection
Daemon (PMCD), or historical data from Performance Co-Pilot (PCP)
archive logs.

As well as computing arithmetic and logical values, pmie can execute
actions (raise popup alarms, write system log messages, and launch
programs) in response to specified conditions. Such actions are
extremely useful in detecting, monitoring and correcting
performance-related problems.

The expressions to be evaluated are read from configuration files
specified by one or more filename arguments. In the absence of any
filename, expressions are read from standard input.

Output from pmie is directed to standard output and standard error as
follows:

stdout
    Expression values printed in the verbose -v mode and the output of
    print actions.

stderr
    Error and warning messages for any syntactic or semantic problems
    during expression parsing, and any semantic or performance metrics
    availability problems during expression evaluation.

The available command line options are:

-a archive, --archive=archive
    archive is a comma-separated list of names, each of which may be
    the base name of an archive or the name of a directory containing
    one or more archives written by pmlogger(1). Multiple instances of
    the -a flag may appear on the command line to specify a list of
    sets of archives. In this case, only one set of archives may be
    present for any one host. Also, any explicit host names occurring
    in a pmie expression must match the host name recorded in one of
    the archive labels. In the case of multiple sets of archives,
    timestamps recorded in the archives are used to ensure temporal
    consistency.

-A align, --align=align
    Force the initial time window to be aligned on the boundary of a
    natural time unit align. Refer to PCPIntro(1) for a complete
    description of the syntax for align.

-b, --buffer
    Output will be line buffered and standard output is attached to
    standard error. This is most useful for background execution in
    conjunction with the -l option. The -b option is always used for
    pmie instances launched from pmie_check(1).

-c config, --config=config
    An alternative to specifying filename at the end of the command
    line.

-C, --check
    Parse the configuration file(s) and exit before performing any
    evaluations. Any errors in the configuration file are reported.

-d, --interact
    Normally pmie is launched as a non-interactive process to monitor
    and manage the performance of one or more hosts. Given the -d
    flag, however, execution is interactive and the user is presented
    with a menu of options. Interactive mode is useful mainly for
    debugging new expressions.

-e, --timestamp
    When used with -V, -v or -W, this option forces timestamps to be
    reported with each expression. The timestamps are in ctime(3)
    format, enclosed in parentheses, and appear after the expression
    name and before the expression value, e.g.
        expr_1 (Tue Feb 6 19:55:10 2001): 12

-f, --foreground
    If the -l option is specified and there is no -a option (i.e.
    real-time monitoring), then pmie is run as a daemon in the
    background (in all other cases foreground is the default). The -f
    (and -F, see below) options force pmie to be run in the
    foreground, independent of any other options.

-F, --systemd
    Like -f, the -F option runs pmie in the foreground, but also does
    some housekeeping (such as creating a pid file, changing the user
    id, and notifying systemd(1) when pmie has started or is shutting
    down). This is intended for use when pmie is launched from
    systemd(1) and the daemonizing has already been done. The -f and
    -F options are mutually exclusive.

-h host, --host=host
    By default performance data is fetched from the local host (in
    real-time mode) or the host for the first named set of archives on
    the command line (in archive mode). The host argument overrides
    this default. It does not override hosts explicitly named in the
    expressions being evaluated. The host argument is interpreted as a
    connection specification for pmNewContext(3), and is later mapped
    to the remote pmcd's self-reported host name for reporting
    purposes. See also the %h vs. %c substitutions in rule action
    strings below.

-l logfile, --logfile=logfile
    Standard error is sent to logfile.

-j stompfile
    An alternative STOMP protocol configuration is loaded from
    stompfile. If this option is not used, and the stomp action is
    used in any rule, the default location
    $PCP_SYSCONF_DIR/pmie/config/stomp will be used.

-n pmnsfile, --namespace=pmnsfile
    An alternative Performance Metrics Name Space (PMNS) is loaded
    from the file pmnsfile.

-O origin, --origin=origin
    Specify the origin of the time window. See PCPIntro(1) for a
    complete description of this option.

-P, --primary
    Identifies this as the primary pmie instance for a host. See the
    ``AUTOMATIC RESTART'' section below for further details.

-q, --quiet
    Suppresses diagnostic messages that would be printed to standard
    output by default, especially the "evaluator exiting" message, as
    this can confuse scripts.

-S starttime, --start=starttime
    Specify the starttime of the time window. See PCPIntro(1) for a
    complete description of this option.

-t interval, --interval=interval
    The interval argument follows the syntax described in
    PCPIntro(1), and in the simplest form may be an unsigned integer
    (the implied units in this case are seconds). The value is used to
    determine the sample interval for expressions that do not
    explicitly set their sample interval using the pmie variable delta
    described below. The default is 10.0 seconds.

-T endtime, --finish=endtime
    Specify the endtime of the time window. See PCPIntro(1) for a
    complete description of this option.

-U username, --username=username
    User account under which to run pmie. The default is the current
    user account for interactive use. When run as a daemon, the
    unprivileged "pcp" account is used in current versions of PCP, but
    in older versions the superuser account ("root") was used by
    default.

-v
    Unless one of the verbose options -V, -v or -W appears on the
    command line, expressions are evaluated silently; the only output
    is as a result of any actions being executed. In the verbose mode,
    specified using the -v flag, the value of each expression is
    printed as it is evaluated. The values are in canonical units:
    bytes in the dimension of ``space'', seconds in the dimension of
    ``time'' and events in the dimension of ``count''. See
    pmLookupDesc(3) for details of the supported dimension and scaling
    mechanisms for performance metrics. The verbose mode is useful for
    monitoring the value of given expressions, evaluating derived
    performance metrics, passing these values on to other tools for
    further processing, and debugging new expressions.

-V, --verbose
    This option has the same effect as the -v option, except that the
    name of the host and instance (if applicable) are printed as well
    as expression values.

-W
    This option has the same effect as the -V option described above,
    except that for boolean expressions, only those names and values
    that make the expression true are printed. These are the same
    names and values accessible to rule actions as the %h, %i, %c and
    %v bindings, as described below.

-x, --secret-agent
    Execute in domain agent mode. This mode is used within the
    Performance Co-Pilot product to derive values for summary metrics;
    see pmdasummary(1). Only restricted functionality is available in
    this mode (expressions with actions may not be used).

-X, --secret-applet
    Run in secret applet mode (thin client).

-z, --hostzone
    Change the reporting timezone to the timezone of the host that is
    the source of the performance metrics, as identified via either
    the -h option or the first named set of archives (as described
    above for the -a option).

-Z timezone, --timezone=timezone
    Change the reporting timezone to timezone in the format of the
    environment variable TZ as described in environ(7).

-?, --help
    Display usage message and exit.

The following example expressions demonstrate some of the capabilities
of the inference engine.

The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
examples of pmie expressions.

The variable delta controls expression evaluation frequency. Specify
that subsequent expressions be evaluated once a second, until further
notice:

    delta = 1 sec;

If the total context switch rate exceeds 10000 per second per CPU, then
display an alarm notifier:

    kernel.all.pswitch / hinv.ncpu > 10000 count/sec
        -> alarm "high context switch rate %v";

If the high context switch rate is sustained for 10 consecutive
samples, then launch top(1) in an xterm(1) window to monitor processes,
but do this at most once every 5 minutes:

    all_sample (
        kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
    ) -> shell 5 min "xterm -e 'top'";
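
The suppression time (``5 min'' above) stops the action from being
repeated while it is still inside that window after its previous
execution. The Python sketch below models this hold-off for a
condition that is true on every evaluation. The class name and the
exact behaviour at the window boundary are assumptions for
illustration, not pmie's implementation.

```python
class SuppressedAction:
    """Hypothetical model of an action with a suppression time (seconds)."""

    def __init__(self, suppress_secs):
        self.suppress_secs = suppress_secs
        self.last_fired = None  # time of the most recent execution, if any

    def on_true(self, now):
        """Called when the rule's condition is true at time `now`.

        Returns True if the action would be executed at this evaluation.
        """
        if self.last_fired is not None and now - self.last_fired < self.suppress_secs:
            return False  # still inside the suppression window
        self.last_fired = now
        return True


# delta = 1 sec, condition true every second, action "shell 5 min ..."
action = SuppressedAction(suppress_secs=5 * 60)
fired_at = [t for t in range(700) if action.on_true(t)]
print(fired_at)  # [0, 300, 600]
```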
238
239 The following rules are evaluated once every 20 seconds:
240
241 delta = 20 sec;
242
243 If any disk is performing more than 60 I/Os per second, then print a
244 message identifying the busy disk to standard output and launch
245 dkvis(1):
246
247 some_inst (
248 disk.dev.total > 60 count/sec
249 ) -> print "busy disks:" " %i" &
250 shell 5 min "dkvis";
251
252 Refine the preceding rule to apply only between the hours of 9am and
253 5pm, and to require 3 of 4 consecutive samples to exceed the threshold
254 before executing the action:
255
256 $hour >= 9 && $hour <= 17 &&
257 some_inst (
258 75 %_sample (
259 disk.dev.total @0..3 > 60 count/sec
260 )
261 ) -> print "disks busy for 20 sec:" " [%h]%i";
262
263 The following two rules are evaluated once every 10 minutes:
264
265 delta = 10 min;
266
267 If either the / or the /usr filesystem is more than 95% full, display
268 an alarm popup, but not if it has already been displayed during the
269 last 4 hours:
270
271 filesys.free #'/dev/root' /
272 filesys.capacity #'/dev/root' < 0.05
273 -> alarm 4 hour "root filesystem (almost) full";
274
275 filesys.free #'/dev/usr' /
276 filesys.capacity #'/dev/usr' < 0.05
277 -> alarm 4 hour "/usr filesystem (almost) full";
278
The following rule requires a machine that supports the lmsensors
metrics. If the machine environment temperature rises more than 2
degrees over one 10 minute sample interval, raise an alarm and write
an entry in the system log:

    lmsensors.coretemp_isa.temp1 @0 - lmsensors.coretemp_isa.temp1 @1 > 2
        -> alarm "temperature rising fast" &
           syslog "machine room temperature rise alarm";

The following rules may be of interest if you have performance
problems with an Oracle database:

    // back to 30sec evaluations
    delta = 30 sec;
    sid = "ptg1";   # $ORACLE_SID setting
    lid = "223";    # latch ID from v$latch
    lru = "#'$sid/$lid cache buffers lru chain'";
    host = ":moomba.melbourne.sgi.com";
    gets = "oracle.latch.gets $host $lru";
    total = "oracle.latch.gets $host $lru +
             oracle.latch.misses $host $lru +
             oracle.latch.immisses $host $lru";

    $total > 100 && $gets / $total < 0.2
        -> alarm "high lru latch contention in database $sid";

The following ruleset will emit exactly one message depending on the
availability and value of the 1-minute load average.

    delta = 1 minute;
    ruleset
        kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
            print "extreme load average %v"
    else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
            print "moderate load average %v"
    unknown ->
            print "load average unavailable"
    otherwise ->
            print "load average OK"
    ;

The following rule will emit a message when some filesystem is more
than 75% full and is filling at a rate that, if sustained, would fill
the filesystem to 100% in less than 30 minutes.

    some_inst (
        100 * filesys.used / filesys.capacity > 75 &&
        filesys.used + 30min * (rate filesys.used) > filesys.capacity
    ) -> print "filesystem will be full within 30 mins:" " %i";
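
For a single filesystem instance, the arithmetic of the rule above
works out as in this sketch. The function name is hypothetical, and the
sketch assumes that "rate filesys.used" yields a change per second and
that "30min" scales to 1800 seconds. It ignores unknown values.

```python
def will_fill_within(used, capacity, used_rate, horizon_secs, pct_threshold=75.0):
    """Mirror of the filesystem rule for one instance.

    used, capacity: current values in the same units (e.g. Kbytes)
    used_rate: change in `used` per second (what `rate` computes)
    horizon_secs: look-ahead window, e.g. 30 * 60 for "30min"
    """
    over_threshold = 100.0 * used / capacity > pct_threshold
    projected = used + horizon_secs * used_rate
    return over_threshold and projected > capacity


# 80% full, growing 10 units/sec: 800 + 1800 * 10 = 18800 > 1000
print(will_fill_within(800, 1000, 10, 30 * 60))   # True
# 80% full, growing slowly: 800 + 1800 * 0.1 = 980, not > 1000
print(will_fill_within(800, 1000, 0.1, 30 * 60))  # False
```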

If the metric mypmda.errors counts errors, then the following rule will
emit a message if the rate of errors exceeds 1 per second, provided the
error count is less than 100.

    mypmda.errors > 1 && instant mypmda.errors < 100
        -> print "high error rate: %v";

The pmie specification language is powerful and large.

To expedite the development of pmie rules, the pmieconf(1) tool
provides a facility for generating a pmie configuration file from a set
of generalized pmie rules. The supplied set of rules covers a wide
range of performance scenarios.

The Performance Co-Pilot User's and Administrator's Guide provides a
detailed tutorial-style chapter covering pmie.

This description is terse and informal. For a more comprehensive
description see the Performance Co-Pilot User's and Administrator's
Guide.

A pmie specification is a sequence of semicolon-terminated expressions.

Basic operators are modeled on the arithmetic, relational and Boolean
operators of the C programming language. Precedence rules are as
expected, although the use of parentheses is encouraged to enhance
readability and remove ambiguity.

Operands are performance metric names (see PMNS(5)) and the normal
literal constants.

Operands involving performance metrics may produce sets of values, as a
result of enumeration in the dimensions of hosts, instances and time.
Special qualifiers may appear after a performance metric name to define
the enumeration in each dimension. For example,

    kernel.percpu.cpu.user :foo :bar #cpu0 @0..2

defines 6 values corresponding to the time spent executing in user mode
on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3 consecutive
samples. The default interpretation in the absence of : (host), #
(instance) and @ (time) qualifiers is all instances at the most recent
sample time for the default source of PCP performance metrics.
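
The count of 6 follows from taking every combination of the qualifiers
(2 hosts x 1 instance x 3 samples). The tuples in this sketch are only
labels for the values and do not reflect how pmie stores them.

```python
from itertools import product

hosts = ["foo", "bar"]   # :foo :bar
instances = ["cpu0"]     # #cpu0
samples = [0, 1, 2]      # @0..2, where @0 is the most recent sample

# One value per (host, instance, sample) combination
values = list(product(hosts, instances, samples))
print(len(values))  # 6
```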

Host and instance names that do not follow the rules for variables in
programming languages, i.e. alphabetic optionally followed by
alphanumerics, should be enclosed in single quotes.
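
The quoting rule can be applied mechanically, as in this hypothetical
helper. It quotes conservatively, following only the rule stated above.
The host name in the Oracle example shows that some dotted names are
accepted unquoted. The helper also assumes the name contains no single
quote, since escaping is not covered here.

```python
import re

# Alphabetic, optionally followed by alphanumerics (the rule stated above)
_BARE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def qualifier_name(name):
    """Return `name` as it could follow ':' or '#' in a pmie expression."""
    if _BARE_NAME.fullmatch(name):
        return name
    return "'" + name + "'"


print(qualifier_name("cpu0"))       # cpu0
print(qualifier_name("/dev/root"))  # '/dev/root'
print(qualifier_name("1 minute"))   # '1 minute'
```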

Expression evaluation follows the principle of ``least surprise''.
Where performance metrics have the semantics of a counter, pmie will
automatically convert to a rate based upon consecutive samples and the
time interval between these samples. All numeric expressions are
evaluated in double precision and, where appropriate, automatically
scaled into canonical units of ``bytes'', ``seconds'' and ``counts''.
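
The automatic rate conversion for counters amounts to the difference
between two consecutive samples divided by the time between them. This
sketch shows that calculation with made-up sample values. It returns
None ("unknown") until a previous sample exists, and it does not handle
counter wrap or resets.

```python
def counter_rate(prev_value, prev_time, cur_value, cur_time):
    """Per-second rate of a counter from two consecutive samples.

    Returns None ("unknown") until a previous sample is available.
    Counter wrap and resets are not handled in this sketch.
    """
    if prev_value is None or prev_time is None:
        return None
    return (cur_value - prev_value) / (cur_time - prev_time)


# kernel.all.pswitch sampled 10 seconds apart (made-up values)
print(counter_rate(1_000_000, 0.0, 1_150_000, 10.0))  # 15000.0
print(counter_rate(None, None, 1_000_000, 0.0))       # None
```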

A rule is a special form of expression that specifies a condition or
logical expression, a special operator (->) and actions to be performed
when the condition is found to be true.

The following table summarizes the basic pmie operators:

┌────────────────┬────────────────────────────────────────────────┐
│ Operators      │ Explanation                                    │
├────────────────┼────────────────────────────────────────────────┤
│+ - * /         │ Arithmetic                                     │
│< <= == >= > != │ Relational (value comparison)                  │
│! && ||         │ Boolean                                        │
│->              │ Rule                                           │
│rising          │ Boolean, false to true transition              │
│falling         │ Boolean, true to false transition              │
│rate            │ Explicit rate conversion (rarely required)     │
│instant         │ No automatic rate conversion (rarely required) │
└────────────────┴────────────────────────────────────────────────┘

All operators are supported for numeric-valued operands and
expressions. For string-valued operands, namely literal string
constants enclosed in double quotes or metrics with a data type of
string (PM_TYPE_STRING), only the operators == and != are supported.

The rate and instant operators are the logical inverse of one another,
so an arithmetic expression expr is equal to rate instant expr. The
more useful cases involve using rate with a metric that is not a
counter to determine the rate of change over time, or instant with a
metric that is a counter to determine whether the current value is
above or below some threshold.

Aggregate operators may be used to aggregate or summarize along one
dimension of a set-valued expression. The following aggregate
operators map from a logical expression to a logical expression of
lower dimension.

┌─────────────────────────┬─────────────┬──────────────────────────┐
│ Operators               │ Type        │ Explanation              │
├─────────────────────────┼─────────────┼──────────────────────────┤
│some_inst                │ Existential │ True if at least one set │
│some_host                │             │ member is true in the    │
│some_sample              │             │ associated dimension     │
├─────────────────────────┼─────────────┼──────────────────────────┤
│all_inst                 │ Universal   │ True if all set members  │
│all_host                 │             │ are true in the          │
│all_sample               │             │ associated dimension     │
├─────────────────────────┼─────────────┼──────────────────────────┤
│N%_inst                  │ Percentile  │ True if at least N       │
│N%_host                  │             │ percent of set members   │
│N%_sample                │             │ are true in the          │
│                         │             │ associated dimension     │
└─────────────────────────┴─────────────┴──────────────────────────┘
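
Restricted to plain true/false members, the three kinds of logical
aggregation behave like this sketch. It assumes non-empty sets with no
unknown members, and the function names are hypothetical. The final
lines re-check the "3 of 4 consecutive samples" rule from the examples
for one disk.

```python
def some(values):
    """some_inst / some_host / some_sample: at least one member is true."""
    return any(values)


def all_(values):
    """all_inst / all_host / all_sample: every member is true."""
    return all(values)


def percent(n, values):
    """N%_inst / N%_host / N%_sample: at least n percent of members are true."""
    values = list(values)
    return 100.0 * sum(1 for v in values if v) / len(values) >= n


# disk.dev.total @0..3 > 60 count/sec, for one disk (made-up truth values)
busy = [True, False, True, True]
print(percent(75, busy))  # True: 3 of 4 samples is 75%
```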

The following instantial operators may be used to filter or limit a
set-valued logical expression, based on regular expression matching of
instance names. The logical expression must be a set involving the
dimension of instances, and the regular expression is of the form used
by egrep(1) or the Extended Regular Expressions of regcomp(3).

┌─────────────┬──────────────────────────────────────────┐
│ Operators   │ Explanation                              │
├─────────────┼──────────────────────────────────────────┤
│match_inst   │ For each value of the logical expression │
│             │ that is ``true'', the result is ``true'' │
│             │ if the associated instance name matches  │
│             │ the regular expression. Otherwise the    │
│             │ result is ``false''.                     │
├─────────────┼──────────────────────────────────────────┤
│nomatch_inst │ For each value of the logical expression │
│             │ that is ``true'', the result is ``true'' │
│             │ if the associated instance name does not │
│             │ match the regular expression. Otherwise  │
│             │ the result is ``false''.                 │
└─────────────┴──────────────────────────────────────────┘

For example, the expression below will be ``true'' for disks attached
to controllers 2 or 3 performing more than 20 operations per second:

    match_inst "^dks[23]d" disk.dev.total > 20;
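
The filtering rule for match_inst and nomatch_inst can be sketched over
a mapping from instance name to truth value. The instance names and
values here are made up. Python's re module stands in for POSIX
Extended Regular Expressions, which agree for simple patterns like this
one but differ in some corners.

```python
import re


def match_inst(pattern, truth_by_inst):
    """True only where the value is true AND the instance name matches."""
    rx = re.compile(pattern)
    return {inst: bool(v and rx.search(inst)) for inst, v in truth_by_inst.items()}


def nomatch_inst(pattern, truth_by_inst):
    """True only where the value is true AND the instance name does not match."""
    rx = re.compile(pattern)
    return {inst: bool(v and not rx.search(inst)) for inst, v in truth_by_inst.items()}


# Truth of "disk.dev.total > 20" per disk (hypothetical instances)
busy = {"dks2d1": True, "dks3d4": False, "dks1d2": True}
print(match_inst(r"^dks[23]d", busy))
# {'dks2d1': True, 'dks3d4': False, 'dks1d2': False}
```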

The following aggregate operators map from an arithmetic expression to
an arithmetic expression of lower dimension.

┌─────────────────────────┬───────────┬──────────────────────────┐
│ Operators               │ Type      │ Explanation              │
├─────────────────────────┼───────────┼──────────────────────────┤
│min_inst                 │ Extrema   │ Minimum value across all │
│min_host                 │           │ set members in the       │
│min_sample               │           │ associated dimension     │
├─────────────────────────┼───────────┼──────────────────────────┤
│max_inst                 │ Extrema   │ Maximum value across all │
│max_host                 │           │ set members in the       │
│max_sample               │           │ associated dimension     │
├─────────────────────────┼───────────┼──────────────────────────┤
│sum_inst                 │ Aggregate │ Sum of values across all │
│sum_host                 │           │ set members in the       │
│sum_sample               │           │ associated dimension     │
├─────────────────────────┼───────────┼──────────────────────────┤
│avg_inst                 │ Aggregate │ Average value across all │
│avg_host                 │           │ set members in the       │
│avg_sample               │           │ associated dimension     │
└─────────────────────────┴───────────┴──────────────────────────┘

The aggregate operators count_inst, count_host and count_sample map
from a logical expression to an arithmetic expression of lower
dimension by counting the number of set members for which the
expression is true in the associated dimension.
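
Applied along the instance dimension of one sample, these aggregates
reduce to ordinary reductions. The per-CPU values in this sketch are
hypothetical. count_inst is shown counting the instances for which a
comparison holds.

```python
from statistics import mean

# Hypothetical per-CPU values of one metric at one sample time
user_time = {"cpu0": 0.40, "cpu1": 0.10, "cpu2": 0.70, "cpu3": 0.20}

min_inst = min(user_time.values())
max_inst = max(user_time.values())
sum_inst = sum(user_time.values())
avg_inst = mean(user_time.values())

# count_inst: number of instances for which a logical expression is true
count_inst = sum(1 for v in user_time.values() if v > 0.25)

print(min_inst, max_inst, count_inst)  # 0.1 0.7 2
```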

For action rules, the following actions are defined:

┌──────────┬────────────────────────────────────────┐
│Operators │ Explanation                            │
├──────────┼────────────────────────────────────────┤
│alarm     │ Raise a visible alarm with xconfirm(1) │
│print     │ Display on standard output             │
│shell     │ Execute with sh(1)                     │
│stomp     │ Send a STOMP message to a JMS server   │
│syslog    │ Append a message to system log file    │
└──────────┴────────────────────────────────────────┘

Multiple actions may be separated by the & and | operators to specify
respectively sequential execution (both actions are executed) and
alternate execution (the second action will only be executed if the
execution of the first action returns a non-zero error status).
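
The two composition operators can be modelled with functions that
return an exit status. This sketch shows the control flow only. What
status pmie reports for the combined action is not stated here, so the
return values are an assumption. The actions are hypothetical
stand-ins.

```python
def run_sequential(first, second):
    """'first & second': both actions are executed."""
    first()
    return second()


def run_alternate(first, second):
    """'first | second': second runs only if first returns non-zero."""
    status = first()
    if status != 0:
        return second()
    return status


log = []


def stomp_down():   # hypothetical action that fails (e.g. lost JMS link)
    log.append("stomp")
    return 1


def syslog_ok():    # hypothetical fallback action that succeeds
    log.append("syslog")
    return 0


run_alternate(stomp_down, syslog_ok)
print(log)  # ['stomp', 'syslog']
```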

Arguments to actions are an optional suppression time, and then one or
more expressions (a string is an expression in this context). Strings
appearing as arguments to an action may include the following special
selectors that will be replaced at the time the action is executed.

%h  Host name(s) that make the left-most top-level expression in the
    condition true.

%c  Connection specification string(s) or files for a PCP tool to
    reach the hosts or archives that make the left-most top-level
    expression in the condition true.

%i  Instance(s) that make the left-most top-level expression in the
    condition true.

%v  One value from the left-most top-level expression in the condition
    for each host and instance pair that makes the condition true.

Note that expansion of the special selectors is done by repeating the
whole argument once for each unique binding to any of the qualifying
special selectors. For example, if a rule were true for the host
mumble with instances grunt and snort, and for host fumble the
instance puff makes the rule true, then the action

    ...
    -> shell myscript "Warning: %h:%i busy ";

will execute myscript with the argument string "Warning: mumble:grunt
busy Warning: mumble:snort busy Warning: fumble:puff busy".

By comparison, if the action

    ...
    -> shell myscript "Warning! busy:" " %h:%i";

were executed under the same circumstances, then myscript would be
executed with the argument string "Warning! busy: mumble:grunt
mumble:snort fumble:puff".
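
The expansion rule can be reproduced with a small model. An argument
without selectors is copied once. An argument with selectors is
repeated once per unique binding and then concatenated. Here "unique"
is interpreted as unique with respect to the selectors that appear in
that argument, which is an assumption. The function name and binding
format are also hypothetical. The model recovers both argument strings
from the example above.

```python
import re

_SELECTOR = re.compile(r"%[hciv]")


def expand_args(args, bindings):
    """Expand action arguments by repeating each one per unique binding.

    bindings: list of dicts such as {"h": "mumble", "i": "grunt"}
    """
    out = []
    for arg in args:
        used = sorted({m[1] for m in _SELECTOR.findall(arg)})
        if not used:
            out.append(arg)  # constant argument: copied once
            continue
        seen = set()
        for b in bindings:
            key = tuple(b[s] for s in used)
            if key in seen:
                continue  # repeat only for unique bindings of these selectors
            seen.add(key)
            out.append(_SELECTOR.sub(lambda m: b[m.group(0)[1]], arg))
    return "".join(out)


bindings = [
    {"h": "mumble", "i": "grunt"},
    {"h": "mumble", "i": "snort"},
    {"h": "fumble", "i": "puff"},
]
print(expand_args(["Warning! busy:", " %h:%i"], bindings))
# Warning! busy: mumble:grunt mumble:snort fumble:puff
```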

The semantics of the expansion of the special selectors leads to a
common usage pattern in an action: one argument is a constant
(contains no special selectors), the second argument contains the
desired special selectors with minimal separator characters, and an
optional third argument provides a constant postscript (e.g. to
terminate any argument quoting from the first argument). If necessary,
post-processing (e.g. in myscript) can provide the necessary
enumeration over each unique expansion of the string containing just
the special selectors.

For complex conditions, the bindings to these selectors are not
obvious. It is strongly recommended that pmie be used in the debugging
mode (specify the -W command line option in particular) during rule
development.

pmie expressions that have the semantics of a Boolean, e.g.
foo.bar > 10 or some_inst ( my.table < 0 ), are assigned the values
true, false or unknown. A value is unknown if one or more of the
underlying metric values is unavailable, e.g. pmcd(1) on the host
cannot be contacted, the metric is not in the PCP archive, no values
are currently available, insufficient values have been fetched to
allow a rate converted value to be computed, or insufficient values
have been fetched to instantiate the required number of samples in the
temporal domain.

Boolean operators follow the normal rules of Kleene logic (also known
as three-valued logic) when combining values that include unknown:

┌────────────┬───────────────────────────┐
│            │             B             │
│  A and B   ├─────────┬───────┬─────────┤
│            │  true   │ false │ unknown │
├──┬─────────┼─────────┼───────┼─────────┤
│  │  true   │  true   │ false │ unknown │
│  ├─────────┼─────────┼───────┼─────────┤
│A │  false  │  false  │ false │  false  │
│  ├─────────┼─────────┼───────┼─────────┤
│  │ unknown │ unknown │ false │ unknown │
└──┴─────────┴─────────┴───────┴─────────┘
┌────────────┬──────────────────────────┐
│            │            B             │
│   A or B   ├──────┬─────────┬─────────┤
│            │ true │  false  │ unknown │
├──┬─────────┼──────┼─────────┼─────────┤
│  │  true   │ true │  true   │  true   │
│  ├─────────┼──────┼─────────┼─────────┤
│A │  false  │ true │  false  │ unknown │
│  ├─────────┼──────┼─────────┼─────────┤
│  │ unknown │ true │ unknown │ unknown │
└──┴─────────┴──────┴─────────┴─────────┘
┌────────┬─────────┐
│   A    │  not A  │
├────────┼─────────┤
│  true  │  false  │
├────────┼─────────┤
│ false  │  true   │
├────────┼─────────┤
│unknown │ unknown │
└────────┴─────────┘
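
The three truth tables can be encoded directly, with Python's None
standing in for unknown. This sketch is a plain transcription of Kleene
logic as tabulated above, not pmie code.

```python
# None stands for "unknown"; True/False are the other two truth values.

def k_and(a, b):
    if a is False or b is False:
        return False  # false dominates "and"
    if a is None or b is None:
        return None
    return True


def k_or(a, b):
    if a is True or b is True:
        return True   # true dominates "or"
    if a is None or b is None:
        return None
    return False


def k_not(a):
    return None if a is None else not a


VALUES = (True, False, None)
for a in VALUES:
    print(a, [k_and(a, b) for b in VALUES], [k_or(a, b) for b in VALUES])
```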

The ruleset clause is used to define a set of rules and actions that
are evaluated in order until some action is executed, at which point
the remaining rules and actions are skipped until the ruleset is again
scheduled for evaluation. The keyword else is used to separate rules.
After one or more regular rules (with a predicate and an action), a
ruleset may include an optional
    unknown -> action
clause, optionally followed by an
    otherwise -> action
clause.

If all of the predicates in the rules evaluate to unknown and an
unknown clause has been specified, then the action associated with the
unknown clause will be executed.

If no rule predicate is true, the unknown action is either not
specified or not executed, and an otherwise clause has been specified,
then the action associated with the otherwise clause will be executed.
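
The selection logic just described can be written as a small function.
This sketch returns the name of the action that would run and applies
it to the load average ruleset from the examples, using made-up inputs.
evaluate_ruleset, gt and load_message are hypothetical names.

```python
def evaluate_ruleset(rules, unknown_action=None, otherwise_action=None):
    """Return the action a ruleset would execute, or None if none applies.

    rules: (predicate, action) pairs in order; predicate is True, False
    or None, where None means "unknown".
    """
    for predicate, action in rules:
        if predicate is True:
            return action  # first true rule wins; later rules are skipped
    if rules and all(p is None for p, _ in rules) and unknown_action is not None:
        return unknown_action
    return otherwise_action


def gt(x, y):
    """Three-valued '>': unknown if either operand is unknown."""
    return None if x is None or y is None else x > y


def load_message(load1, ncpu):
    rules = [
        (gt(load1, 10 * ncpu), "extreme load average"),
        (gt(load1, 2 * ncpu), "moderate load average"),
    ]
    return evaluate_ruleset(rules, "load average unavailable", "load average OK")


for load in (50.0, 10.0, None, 1.0):
    print(load, "->", load_message(load, ncpu=4))
```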

Scale factors may be appended to arithmetic expressions and force
linear scaling of the value to canonical units. Simple scale factors
are constructed from the keywords: nanosecond, nanosec, nsec,
microsecond, microsec, usec, millisecond, millisec, msec, second, sec,
minute, min, hour, byte, Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount and
Mcount, and the operator /, for example ``Kbytes / hour''.
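
A scale factor multiplies the value into canonical units (seconds,
bytes, counts). This sketch assumes that space scales by powers of 1024
and count by powers of 10, matching PCP's pmUnits conventions. That
assumption is consistent with the examples above, where "10
Kcount/sec" plays the role of "10000 count/sec". Plural forms such as
"Kbytes" are not modelled.

```python
# Multipliers into canonical units (assumed to follow PCP's pmUnits scaling)
SCALE = {
    "nanosecond": 1e-9, "nanosec": 1e-9, "nsec": 1e-9,
    "microsecond": 1e-6, "microsec": 1e-6, "usec": 1e-6,
    "millisecond": 1e-3, "millisec": 1e-3, "msec": 1e-3,
    "second": 1.0, "sec": 1.0, "minute": 60.0, "min": 60.0, "hour": 3600.0,
    "byte": 1.0, "Kbyte": 1024.0, "Mbyte": 1024.0**2,
    "Gbyte": 1024.0**3, "Tbyte": 1024.0**4,
    "count": 1.0, "Kcount": 1e3, "Mcount": 1e6,
}


def to_canonical(value, unit):
    """Scale `value` by a simple unit or a 'numerator / denominator' pair."""
    if "/" in unit:
        num, den = (u.strip() for u in unit.split("/"))
        return value * SCALE[num] / SCALE[den]
    return value * SCALE[unit.strip()]


print(to_canonical(10, "Kcount / sec"))  # 10000.0
print(to_canonical(4, "hour"))           # 14400.0
```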

Macros are defined using expressions of the form:

    name = constexpr;

Here name follows the normal rules for variables in programming
languages, i.e. alphabetic optionally followed by alphanumerics.
constexpr must be a constant expression, either a string (enclosed in
double quotes) or an arithmetic expression optionally followed by a
scale factor.

Macros are expanded when their name, prefixed by a dollar ($), appears
in an expression, and macros may be nested within a constexpr string.
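
Nested string macros can be illustrated with a recursive substitution
over the Oracle example above. This sketch expands references at the
point of use. Whether pmie expands nested references at definition time
or at use time is not stated here. The regular expression also accepts
underscores so that names like day_of_week match. The expand function
is hypothetical.

```python
import re

_MACRO_REF = re.compile(r"\$([A-Za-z][A-Za-z0-9_]*)")


def expand(text, macros, depth=0):
    """Recursively replace $name references using the `macros` dict."""
    if depth > 20:
        raise ValueError("macro expansion too deep (recursive definition?)")
    return _MACRO_REF.sub(
        lambda m: expand(macros[m.group(1)], macros, depth + 1), text
    )


# String macros from the Oracle example
macros = {
    "sid": "ptg1",
    "lid": "223",
    "lru": "#'$sid/$lid cache buffers lru chain'",
    "host": ":moomba.melbourne.sgi.com",
    "gets": "oracle.latch.gets $host $lru",
}
print(expand("$gets", macros))
# oracle.latch.gets :moomba.melbourne.sgi.com #'ptg1/223 cache buffers lru chain'
```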

The following reserved macro names are understood.

minute       Current minute of the hour.

hour         Current hour of the day, in the range 0 to 23.

day          Current day of the month, in the range 1 to 31.

month        Current month of the year, in the range 0 (January) to 11
             (December).

year         Current year.

day_of_week  Current day of the week, in the range 0 (Sunday) to 6
             (Saturday).

delta        Sample interval in effect for this expression.

Dates and times are presented in the reporting time zone (see the
description of the -Z and -z command line options above).
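
The ranges above differ from some common library conventions. For
example, Python numbers months from 1 and starts weekdays on Monday.
This sketch derives the reserved date and time values from a datetime
that is assumed to be in the reporting time zone already. The function
name is hypothetical.

```python
from datetime import datetime


def reserved_time_macros(now):
    """Reserved date/time macro values for `now` (already in reporting TZ)."""
    return {
        "minute": now.minute,
        "hour": now.hour,
        "day": now.day,
        "month": now.month - 1,                  # 0 = January
        "year": now.year,
        "day_of_week": (now.weekday() + 1) % 7,  # 0 = Sunday
    }


m = reserved_time_macros(datetime(2001, 2, 6, 19, 55, 10))  # a Tuesday
print(m["month"], m["day_of_week"], m["hour"])  # 1 2 19
in_office_hours = 9 <= m["hour"] < 17  # 9am up to (not including) 5pm
```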

It is often useful for pmie processes to be started and stopped when
the local host is booted or shut down, or when they have been detected
as no longer running (when they have unexpectedly exited for some
reason). Refer to pmie_check(1) for details on automating this process.

Optionally, each system running pmcd(1) may also be configured to run a
``primary'' pmie instance. This pmie instance is launched by
$PCP_RC_DIR/pmie, and is affected by the files
$PCP_SYSCONF_DIR/pmie/control, $PCP_SYSCONF_DIR/pmie/control.d (use
chkconfig(8), systemctl(1) or similar platform-specific commands to
activate or disable the primary pmie instance) and
$PCP_VAR_DIR/config/pmie/config.default (the default initial
configuration file for the primary pmie).

The primary pmie instance is identified by the -P option. There may be
at most one ``primary'' pmie instance on each system. The primary pmie
instance (if any) must be running on the same host as the pmcd(1) to
which it connects (if any), so the -h and -P options are mutually
exclusive.
681
683 It is common for production systems to be monitored in a central loca‐
684 tion. Traditionally on UNIX systems this has been performed by the
685 system log facilities - see logger(1), and syslogd(1). On Windows,
686 communication with the system event log is handled by pcp-eventlog(1).
687
688 pmie fits into this model when rules use the syslog action. Note that
689 if the action string begins with -p (priority) and/or -t (tag) then
690 these are extracted from the string and treated in the same way as in
691 logger(1) and pcp-eventlog(1).
692
693 However, it is common to have other event monitoring frameworks also,
694 into which you may wish to incorporate performance events from pmie.
695 You can often use the shell action to send events to these frameworks,
696 as they usually provide their a program for injecting events into the
697 framework from external sources.
698
699 A final option is use of the stomp (Streaming Text Oriented Messaging
700 Protocol) action, which allows pmie to connect to a central JMS (Java
701 Messaging System) server and send events to the PMIE topic. Tools can
702 be written to extract these text messages and present them to opera‐
703 tions people (via desktop popup windows, etc). Use of the stomp action
704 requires a stomp configuration file to be setup, which specifies the
705 location of the JMS server host, port number, and username/password.
706
707 The format of this file is as follows:
708
709 host=messages.sgi.com # this is the JMS server (required)
710 port=61616 # and its listening here (required)
711 timeout=2 # seconds to wait for server (optional)
712 username=joe # (required)
713 password=j03ST0MP # (required)
714 topic=PMIE # JMS topic for pmie messages (optional)
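
The key=value format can be checked before pmie is started with a small
parser like the one below. It is a hypothetical sketch, not pmie's own
reader. It assumes "#" always starts a comment, so a password
containing "#" would be truncated. It also assumes the topic defaults
to PMIE when omitted, which is inferred from the text above.

```python
REQUIRED = ("host", "port", "username", "password")


def parse_stomp_config(text):
    """Parse key=value lines, dropping '#' comments and blank lines."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        config[key.strip()] = value.strip()
    missing = [k for k in REQUIRED if k not in config]
    if missing:
        raise ValueError(f"missing required keys: {', '.join(missing)}")
    config["port"] = int(config["port"])
    if "timeout" in config:
        config["timeout"] = int(config["timeout"])
    config.setdefault("topic", "PMIE")  # assumed default topic
    return config


SAMPLE = """\
host=messages.sgi.com   # this is the JMS server (required)
port=61616              # and it is listening here (required)
timeout=2               # seconds to wait for server (optional)
username=joe            # (required)
password=j03ST0MP       # (required)
"""
cfg = parse_stomp_config(SAMPLE)
print(cfg["host"], cfg["port"], cfg["topic"])  # messages.sgi.com 61616 PMIE
```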

The timeout value specifies the time (in seconds) that pmie should wait
for acknowledgements from the JMS server after sending a message (as
required by the STOMP protocol). Note that on startup, pmie will wait
indefinitely for a connection, and will not begin rule evaluation until
that initial connection has been established. Should the connection to
the JMS server be lost at any time while pmie is running, pmie will
attempt to reconnect each time a rule with a stomp action subsequently
evaluates to true, but not more than once per minute. This is to avoid
contributing to network congestion. In this situation, where the STOMP
connection to the JMS server has been severed, the stomp action will
return a non-zero error value.
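
The reconnect behaviour can be sketched as a small state machine. It
makes at most one attempt per 60 seconds, only when a rule with a stomp
action is true, and returns a non-zero status while disconnected. The
class, method names and status values are hypothetical; only the
once-per-minute limit and the non-zero error come from the text.

```python
class StompReconnectPolicy:
    """Hypothetical model of the reconnect rule described above."""

    MIN_GAP = 60.0  # seconds: "not more than once per minute"

    def __init__(self):
        self.connected = True
        self.last_attempt = None

    def on_true_evaluation(self, now, try_connect):
        """Called when a rule with a stomp action evaluates to true.

        try_connect: callable returning True if reconnecting succeeds.
        Returns 0 if a message could be sent, non-zero while disconnected.
        """
        if not self.connected:
            if self.last_attempt is None or now - self.last_attempt >= self.MIN_GAP:
                self.last_attempt = now
                self.connected = try_connect()
        return 0 if self.connected else 1


attempts = []


def failing_connect():
    attempts.append(True)
    return False


policy = StompReconnectPolicy()
policy.connected = False  # simulate a lost connection
statuses = [policy.on_true_evaluation(t, failing_connect) for t in range(0, 130, 10)]
print(len(attempts))  # 3 (at t = 0, 60 and 120)
```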
727
729 The lexical scanner and parser will attempt to recover after an error
730 in the input expressions. Parsing resumes after skipping input up to
731 the next semi-colon (;), however during this skipping process the scan‐
732 ner is ignorant of comments and strings, so an embedded semi-colon may
733 cause parsing to resume at an unexpected place. This behavior is
734 largely benign, as until the initial syntax error is corrected, pmie
735 will not attempt any expression evaluation.
736
738 $PCP_DEMOS_DIR/pmie/*
739 annotated example rules
740
741 $PCP_VAR_DIR/pmns/*
742 default PMNS specification files
743
744 $PCP_TMP_DIR/pmie
745 pmie maintains files in this directory to identify the running
746 pmie instances and to export runtime information about each
747 instance - this data forms the basis of the pmcd.pmie performance
748 metrics
749
750 $PCP_PMIECONTROL_PATH
751 the default set of pmie instances to start at boot time - refer to
752 pmie_check(1) for details

Environment variables with the prefix PCP_ are used to parameterize the
file and directory names used by PCP. On each installation, the file
/etc/pcp.conf contains the local values for these variables. The
$PCP_CONF variable may be used to specify an alternative configuration
file, as described in pcp.conf(5).

When executing shell actions, pmie overrides two variables, IFS and
PATH, in the environment of the child process. IFS is set to "\t\n".
PATH is set to a combination of a default path for all platforms
("/usr/sbin:/sbin:/usr/bin:/usr/sbin") and several configurable
components. These are (in this order): $PCP_BIN_DIR, $PCP_BINADM_DIR
and $PCP_PLATFORM_PATHS.

When executing popup alarm actions, pmie will use the value of
$PCP_XCONFIRM_PROG as the visual notification program to run. This is
typically set to pmconfirm(1), a cross-platform dialog box.

logger(1).

pcp-eventlog(1).

PCPIntro(1), pmcd(1), pmconfirm(1), pmdumplog(1), pmieconf(1),
pmie_check(1), pminfo(1), pmlogger(1), pmval(1), systemd(1), PMAPI(3),
pcp.conf(5), pcp.env(5) and PMNS(5).

For a more complete description of the pmie language, refer to the
Performance Co-Pilot User's and Administrator's Guide. This is
available online from:
https://pcp.readthedocs.io/en/latest/UAG/PerformanceMetricsInferenceEngine.html

Performance Co-Pilot PCP PMIE(1)