PMIE(1)                     General Commands Manual                    PMIE(1)

NAME
pmie - inference engine for performance metrics

SYNOPSIS
pmie [-bCdefHqVvWxz] [-A align] [-a archive] [-c filename] [-h host]
     [-l logfile] [-j stompfile] [-n pmnsfile] [-O offset] [-S starttime]
     [-T endtime] [-t interval] [-U username] [-Z timezone] [filename ...]

DESCRIPTION
pmie accepts a collection of arithmetic, logical, and rule expressions
to be evaluated at specified frequencies. The base data for the
expressions consists of performance metric values, either delivered in
real time from any host running the Performance Metrics Collection
Daemon (PMCD) or read as historical data from Performance Co-Pilot
(PCP) archive logs.
20
As well as computing arithmetic and logical values, pmie can execute
actions (raise popup alarms, write system log messages, and launch
programs) in response to specified conditions. Such actions are useful
in detecting, monitoring and correcting performance-related problems.
25
26 The expressions to be evaluated are read from configuration files spec‐
27 ified by one or more filename arguments. In the absence of any file‐
28 name, expressions are read from standard input.
29
30 A description of the command line options specific to pmie follows:
31
-a The archive argument is a comma-separated list of names, each of
which may be the base name of an archive or the name of a directory
containing one or more archives written by pmlogger(1). Multiple
instances of the -a flag may appear on the command line to specify
a list of sets of archives. In this case, only one set of archives
may be present for any one host. Also, any explicit host names
occurring in a pmie expression must match the host name recorded in
one of the archive labels. In the case of multiple sets of archives,
timestamps recorded in the archives are used to ensure temporal
consistency.
42
43 -b Output will be line buffered and standard output is attached to
44 standard error. This is most useful for background execution in
45 conjunction with the -l option. The -b option is always used for
46 pmie instances launched from pmie_check(1).
47
48 -C Parse the configuration file(s) and exit before performing any
49 evaluations. Any errors in the configuration file are reported.
50
51 -c An alternative to specifying filename at the end of the command
52 line.
53
54 -d Normally pmie would be launched as a non-interactive process to
55 monitor and manage the performance of one or more hosts. Given
56 the -d flag however, execution is interactive and the user is pre‐
57 sented with a menu of options. Interactive mode is useful mainly
58 for debugging new expressions.
59
-e When used with -V, -v or -W, this option forces timestamps to be
reported with each expression. The timestamps are in ctime(3)
format, enclosed in parentheses, and appear after the expression
name and before the expression value, e.g.
     expr_1 (Tue Feb 6 19:55:10 2001): 12
65
-f If the -l option is specified and there is no -a option (i.e.
real-time monitoring) then pmie is run as a daemon in the background
(in all other cases foreground is the default). The -f option
forces pmie to be run in the foreground, independent of any other
options.
71
72 -h By default performance data is fetched from the local host (in
73 real-time mode) or the host for the first named set of archives on
74 the command line (in archive mode). The host argument overrides
75 this default. It does not override hosts explicitly named in the
76 expressions being evaluated. The host argument is interpreted as
77 a connection specification for pmNewContext, and is later mapped
78 to the remote pmcd's self-reported host name for reporting pur‐
79 poses. See also the %h vs. %c substitutions in rule action
80 strings below.
81
82 -l Standard error is sent to logfile.
83
84 -j An alternative STOMP protocol configuration is loaded from stomp‐
85 file. If this option is not used, and the stomp action is used in
86 any rule, the default location $PCP_SYSCONF_DIR/pmie/config/stomp
87 will be used.
88
89 -n An alternative Performance Metrics Name Space (PMNS) is loaded
90 from the file pmnsfile.
91
92 -q Suppresses diagnostic messages that would be printed to standard
93 output by default, especially the "evaluator exiting" message as
94 this can confuse scripts.
95
96 -t The interval argument follows the syntax described in PCPIntro(1),
97 and in the simplest form may be an unsigned integer (the implied
98 units in this case are seconds). The value is used to determine
99 the sample interval for expressions that do not explicitly set
100 their sample interval using the pmie variable delta described
101 below. The default is 10.0 seconds.
102
103 -U username
104 User account under which to run pmie. The default is the current
105 user account for interactive use. When run as a daemon, the
106 unprivileged "pcp" account is used in current versions of PCP, but
107 in older versions the superuser account ("root") was used by
108 default.
109
-v Unless one of the verbose options -V, -v or -W appears on the
command line, expressions are evaluated silently; the only output
is the result of any actions being executed. In the verbose mode,
specified using the -v flag, the value of each expression is
printed as it is evaluated. The values are in canonical units:
bytes in the dimension of ``space'', seconds in the dimension of
``time'' and events in the dimension of ``count''. See
pmLookupDesc(3) for details of the supported dimension and scaling
mechanisms for performance metrics. The verbose mode is useful for
monitoring the value of given expressions, evaluating derived
performance metrics, passing these values on to other tools for
further processing, and debugging new expressions.
122
123 -V This option has the same effect as the -v option, except that the
124 name of the host and instance (if applicable) are printed as well
125 as expression values.
126
127 -W This option has the same effect as the -V option described above,
128 except that for boolean expressions, only those names and values
129 that make the expression true are printed. These are the same
130 names and values accessible to rule actions as the %h, %i, %c and
131 %v bindings, as described below.
132
-x Execute in domain agent mode. This mode is used within the
Performance Co-Pilot product to derive values for summary metrics
(see pmdasummary(1)). Only restricted functionality is available in
this mode; expressions with actions may not be used.
137
138 -Z Change the reporting timezone to timezone in the format of the
139 environment variable TZ as described in environ(7).
140
141 -z Change the reporting timezone to the timezone of the host that is
142 the source of the performance metrics, as identified via either
143 the -h option or the first named set of archives (as described
144 above for the -a option).
145
146 The -S, -T, -O, and -A options may be used to define a time window to
147 restrict the samples retrieved, set an initial origin within the time
148 window, or specify a ``natural'' alignment of the sample times; refer
149 to PCPIntro(1) for a complete description of these options.
150
151 Output from pmie is directed to standard output and standard error as
152 follows:
153
154 stdout
155 Expression values printed in the verbose -v mode and the output of
156 print actions.
157
158 stderr
159 Error and warning messages for any syntactic or semantic problems
160 during expression parsing, and any semantic or performance metrics
161 availability problems during expression evaluation.
162
EXAMPLES
164 The following example expressions demonstrate some of the capabilities
165 of the inference engine.
166
167 The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
168 examples of pmie expressions.
169
170 The variable delta controls expression evaluation frequency. Specify
171 that subsequent expressions be evaluated once a second, until further
172 notice:
173
    delta = 1 sec;
175
176 If the total context switch rate exceeds 10000 per second per CPU, then
177 display an alarm notifier:
178
    kernel.all.pswitch / hinv.ncpu > 10000 count/sec
    -> alarm "high context switch rate %v";
181
182 If the high context switch rate is sustained for 10 consecutive sam‐
183 ples, then launch top(1) in an xterm(1) window to monitor processes,
184 but do this at most once every 5 minutes:
185
    all_sample (
        kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
    ) -> shell 5 min "xterm -e 'top'";
189
190 The following rules are evaluated once every 20 seconds:
191
    delta = 20 sec;
193
194 If any disk is performing more than 60 I/Os per second, then print a
195 message identifying the busy disk to standard output and launch
196 dkvis(1):
197
    some_inst (
        disk.dev.total > 60 count/sec
    ) -> print "busy disks:" " %i" &
         shell 5 min "dkvis";
202
203 Refine the preceding rule to apply only between the hours of 9am and
204 5pm, and to require 3 of 4 consecutive samples to exceed the threshold
205 before executing the action:
206
    $hour >= 9 && $hour <= 17 &&
    some_inst (
        75 %_sample (
            disk.dev.total @0..3 > 60 count/sec
        )
    ) -> print "disks busy for 20 sec:" " [%h]%i";
213
214 The following two rules are evaluated once every 10 minutes:
215
    delta = 10 min;
217
218 If either the / or the /usr filesystem is more than 95% full, display
219 an alarm popup, but not if it has already been displayed during the
220 last 4 hours:
221
    filesys.free #'/dev/root' /
        filesys.capacity #'/dev/root' < 0.05
    -> alarm 4 hour "root filesystem (almost) full";

    filesys.free #'/dev/usr' /
        filesys.capacity #'/dev/usr' < 0.05
    -> alarm 4 hour "/usr filesystem (almost) full";
229
230 The following rule requires a machine that supports the PCP environment
231 metrics. If the machine environment temperature rises more than 2
232 degrees over a 10 minute interval, write an entry in the system log:
233
    environ.temp @0 - environ.temp @1 > 2
    -> alarm "temperature rising fast" &
       syslog "machine room temperature rise alarm";
237
238 And something interesting if you have performance problems with your
239 Oracle database:
240
    // back to 30sec evaluations
    delta = 30 sec;
    sid = "ptg1";       # $ORACLE_SID setting
    lid = "223";        # latch ID from v$latch
    lru = "#'$sid/$lid cache buffers lru chain'";
    host = ":moomba.melbourne.sgi.com";
    gets = "oracle.latch.gets $host $lru";
    total = "oracle.latch.gets $host $lru +
             oracle.latch.misses $host $lru +
             oracle.latch.immisses $host $lru";

    $total > 100 && $gets / $total < 0.2
    -> alarm "high lru latch contention in database $sid";
254
255 The following ruleset will emit exactly one message depending on the
256 availability and value of the 1-minute load average.
257
    delta = 1 minute;
    ruleset
        kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
            print "extreme load average %v"
    else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
            print "moderate load average %v"
    unknown ->
            print "load average unavailable"
    otherwise ->
            print "load average OK"
    ;
269
270 The following rule will emit a message when some filesystem is more
271 than 75% full and is filling at a rate that if sustained would fill the
272 filesystem to 100% in less than 30 minutes.
273
    some_inst (
        100 * filesys.used / filesys.capacity > 75 &&
        filesys.used + 30min * (rate filesys.used) > filesys.capacity
    ) -> print "filesystem will be full within 30 mins:" " %i";
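
The arithmetic behind this rule can be checked by hand. The following
Python sketch is not part of pmie; it mirrors the two conditions for a
single filesystem, assuming that used and capacity are in the same unit
and that the fill rate is expressed in that unit per second. The
function name and the sample figures are made up for illustration.

```python
def will_fill_within(used, capacity, fill_rate, horizon_sec=30 * 60):
    """Mirror the rule's two conditions for one filesystem.

    used and capacity share one unit; fill_rate is that unit per second,
    standing in for what `rate filesys.used` would yield.
    """
    more_than_75_percent_full = 100 * used / capacity > 75
    projected_usage = used + horizon_sec * fill_rate
    return more_than_75_percent_full and projected_usage > capacity


# 80% full, growing by 150 units/sec: 800000 + 1800 * 150 = 1070000
print(will_fill_within(800_000, 1_000_000, 150))  # True
# 80% full, growing by 100 units/sec: 800000 + 1800 * 100 = 980000
print(will_fill_within(800_000, 1_000_000, 100))  # False
```

A filesystem that is only half full fails the first condition, so it
is not reported however fast it is growing.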
278
279 If the metric mypmda.errors counts errors then the following rule will
280 emit a message if the rate of errors exceeds 1 per second provided the
281 error count is less than 100.
282
    mypmda.errors > 1 && instant mypmda.errors < 100
    -> print "high error rate: %v";
285
QUICK START
287 The pmie specification language is powerful and large.
288
289 To expedite rapid development of pmie rules, the pmieconf(1) tool pro‐
290 vides a facility for generating a pmie configuration file from a set of
291 generalized pmie rules. The supplied set of rules covers a wide range
292 of performance scenarios.
293
294 The Performance Co-Pilot User's and Administrator's Guide provides a
295 detailed tutorial-style chapter covering pmie.
296
EXPRESSION SYNTAX
298 This description is terse and informal. For a more comprehensive
299 description see the Performance Co-Pilot User's and Administrator's
300 Guide.
301
302 A pmie specification is a sequence of semicolon terminated expressions.
303
304 Basic operators are modeled on the arithmetic, relational and Boolean
305 operators of the C programming language. Precedence rules are as
306 expected, although the use of parentheses is encouraged to enhance
307 readability and remove ambiguity.
308
309 Operands are performance metric names (see pmns(5)) and the normal lit‐
310 eral constants.
311
312 Operands involving performance metrics may produce sets of values, as a
313 result of enumeration in the dimensions of hosts, instances and time.
314 Special qualifiers may appear after a performance metric name to define
315 the enumeration in each dimension. For example,
316
    kernel.percpu.cpu.user :foo :bar #cpu0 @0..2
318
319 defines 6 values corresponding to the time spent executing in user mode
320 on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3 consecutive
321 samples. The default interpretation in the absence of : (host), #
322 (instance) and @ (time) qualifiers is all instances at the most recent
323 sample time for the default source of PCP performance metrics.
324
Host and instance names that do not follow the rules for variables in
programming languages, i.e. alphabetic optionally followed by
alphanumerics, should be enclosed in single quotes.
328
329 Expression evaluation follows the law of ``least surprises''. Where
330 performance metrics have the semantics of a counter, pmie will automat‐
331 ically convert to a rate based upon consecutive samples and the time
332 interval between these samples. All numeric expressions are evaluated
333 in double precision, and where appropriate, automatically scaled into
334 canonical units of ``bytes'', ``seconds'' and ``counts''.
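
The automatic rate conversion for counters amounts to dividing the
change in value by the change in time between two consecutive samples.
The Python sketch below illustrates that calculation only; it is not
pmie code, the function name is hypothetical, and it assumes timestamps
in seconds and ignores counter wrap and unit scaling.

```python
def counter_rate(t0, v0, t1, v1):
    """Per-second rate of a counter between two consecutive samples."""
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


# A counter that moves from 500000 to 620000 over 10 seconds.
print(counter_rate(t0=100.0, v0=500_000, t1=110.0, v1=620_000))  # 12000.0
```

At least two samples are needed before a rate exists, which is why a
rate-converted expression can have no value on its first evaluation.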
335
336 A rule is a special form of expression that specifies a condition or
337 logical expression, a special operator (->) and actions to be performed
338 when the condition is found to be true.
339
340 The following table summarizes the basic pmie operators:
341
342 ┌────────────────┬────────────────────────────────────────────────┐
343 │ Operators │ Explanation │
344 ├────────────────┼────────────────────────────────────────────────┤
345 │+ - * / │ Arithmetic │
346 │< <= == >= > != │ Relational (value comparison) │
347 │! && || │ Boolean │
348 │-> │ Rule │
349 │rising │ Boolean, false to true transition │
350 │falling │ Boolean, true to false transition │
351 │rate │ Explicit rate conversion (rarely required) │
352 │instant │ No automatic rate conversion (rarely required) │
353 └────────────────┴────────────────────────────────────────────────┘
354 All operators are supported for numeric-valued operands and expres‐
355 sions. For string-valued operands, namely literal string constants
356 enclosed in double quotes or metrics with a data type of string
357 (PM_TYPE_STRING), only the operators == and != are supported.
358
359 The rate and instant operators are the logical inverse of one another,
360 so an arithmetic expression expr is equal to rate instant expr. The
361 more useful cases involve using rate with a metric that is not a
362 counter to determine the rate of change over time or instant with a
363 metric that is a counter to determine if the current value is above or
364 below some threshold.
365
366 Aggregate operators may be used to aggregate or summarize along one
367 dimension of a set-valued expression. The following aggregate opera‐
368 tors map from a logical expression to a logical expression of lower
369 dimension.
370
371 ┌─────────────────────────┬─────────────┬──────────────────────────┐
372 │ Operators │ Type │ Explanation │
373 ├─────────────────────────┼─────────────┼──────────────────────────┤
374 │some_inst │ Existential │ True if at least one set │
375 │some_host │ │ member is true in the │
376 │some_sample │ │ associated dimension │
377 ├─────────────────────────┼─────────────┼──────────────────────────┤
378 │all_inst │ Universal │ True if all set members │
379 │all_host │ │ are true in the associ‐ │
380 │all_sample │ │ ated dimension │
381 ├─────────────────────────┼─────────────┼──────────────────────────┤
382 │N%_inst │ Percentile │ True if at least N per‐ │
383 │N%_host │ │ cent of set members are │
384 │N%_sample │ │ true in the associated │
385 │ │ │ dimension │
386 └─────────────────────────┴─────────────┴──────────────────────────┘
387 The following instantial operators may be used to filter or limit a
388 set-valued logical expression, based on regular expression matching of
389 instance names. The logical expression must be a set involving the
390 dimension of instances, and the regular expression is of the form used
391 by egrep(1) or the Extended Regular Expressions of regcomp(3).
392
393 ┌─────────────┬──────────────────────────────────────────┐
394 │ Operators │ Explanation │
395 ├─────────────┼──────────────────────────────────────────┤
396 │match_inst │ For each value of the logical expression │
397 │ │ that is ``true'', the result is ``true'' │
398 │ │ if the associated instance name matches │
399 │ │ the regular expression. Otherwise the │
400 │ │ result is ``false''. │
401 ├─────────────┼──────────────────────────────────────────┤
402 │nomatch_inst │ For each value of the logical expression │
403 │ │ that is ``true'', the result is ``true'' │
404 │ │ if the associated instance name does not │
405 │ │ match the regular expression. Otherwise │
406 │ │ the result is ``false''. │
407 └─────────────┴──────────────────────────────────────────┘
For example, the expression below will be ``true'' for disks attached
to controllers 2 or 3 performing more than 20 operations per second:
     match_inst "^dks[23]d" disk.dev.total > 20;
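
The filtering done by match_inst and nomatch_inst can be modelled as a
function over a mapping from instance name to truth value. The Python
sketch below is an illustration only: the function names simply echo
the operators, the instance names are invented, unknown values are not
modelled, and Python's re module stands in for POSIX extended regular
expressions (the two agree for a simple pattern such as this one).

```python
import re


def match_inst(pattern, truth_by_instance):
    """Keep a value true only if its instance name matches pattern."""
    rx = re.compile(pattern)
    return {inst: bool(value and rx.search(inst))
            for inst, value in truth_by_instance.items()}


def nomatch_inst(pattern, truth_by_instance):
    """Keep a value true only if its instance name does not match."""
    rx = re.compile(pattern)
    return {inst: bool(value and not rx.search(inst))
            for inst, value in truth_by_instance.items()}


# Hypothetical result of `disk.dev.total > 20` for four disks.
busy = {"dks1d1": True, "dks2d1": True, "dks3d4": False, "dks4d2": True}
print(match_inst("^dks[23]d", busy))
print(nomatch_inst("^dks[23]d", busy))
```

In this model only dks2d1 survives match_inst: dks3d4 matches the
pattern but was not true to begin with, so it stays false under both
operators.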
411
412 The following aggregate operators map from an arithmetic expression to
413 an arithmetic expression of lower dimension.
414
415 ┌─────────────────────────┬───────────┬──────────────────────────┐
416 │ Operators │ Type │ Explanation │
417 ├─────────────────────────┼───────────┼──────────────────────────┤
418 │min_inst │ Extrema │ Minimum value across all │
419 │min_host │ │ set members in the asso‐ │
420 │min_sample │ │ ciated dimension │
421 ├─────────────────────────┼───────────┼──────────────────────────┤
422 │max_inst │ Extrema │ Maximum value across all │
423 │max_host │ │ set members in the asso‐ │
424 │max_sample │ │ ciated dimension │
425 ├─────────────────────────┼───────────┼──────────────────────────┤
426 │sum_inst │ Aggregate │ Sum of values across all │
427 │sum_host │ │ set members in the asso‐ │
428 │sum_sample │ │ ciated dimension │
429 ├─────────────────────────┼───────────┼──────────────────────────┤
430 │avg_inst │ Aggregate │ Average value across all │
431 │avg_host │ │ set members in the asso‐ │
432 │avg_sample │ │ ciated dimension │
433 └─────────────────────────┴───────────┴──────────────────────────┘
434 The aggregate operators count_inst, count_host and count_sample map
435 from a logical expression to an arithmetic expression of lower dimen‐
436 sion by counting the number of set members for which the expression is
437 true in the associated dimension.
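
The logical aggregate operators reduce a set of truth values along one
dimension. The following Python sketch models the existential,
universal, percentile and counting forms over a plain list of booleans.
It is not pmie code: the function names are hypothetical, unknown
values are not modelled, and the behaviour for an empty set is left
unspecified.

```python
def some(truths):
    """some_*: true if at least one member is true."""
    return any(truths)


def every(truths):
    """all_*: true if every member is true."""
    return all(truths)


def percent(n, truths):
    """N%_*: true if at least n percent of the members are true."""
    return 100 * sum(truths) >= n * len(truths)


def count(truths):
    """count_*: number of members that are true."""
    return sum(truths)


# Four consecutive samples, three of which exceed a threshold.
samples = [True, True, False, True]
print(some(samples), every(samples), percent(75, samples), count(samples))
# True False True 3
```

Three true samples out of four is exactly 75 percent, which is why a
75 %_sample condition over @0..3 is satisfied by 3 of 4 samples.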
438
439 For action rules, the following actions are defined:
440
441 ┌──────────┬────────────────────────────────────────┐
442 │Operators │ Explanation │
443 ├──────────┼────────────────────────────────────────┤
444 │alarm │ Raise a visible alarm with xconfirm(1) │
445 │print │ Display on standard output │
446 │shell │ Execute with sh(1) │
447 │stomp │ Send a STOMP message to a JMS server │
448 │syslog │ Append a message to system log file │
449 └──────────┴────────────────────────────────────────┘
Multiple actions may be separated by the & and | operators to specify
respectively sequential execution (both actions are executed) and
alternate execution (the second action will only be executed if the
execution of the first action returns a non-zero error status).
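
The difference between the two action operators can be modelled with
ordinary function calls. In the Python sketch below each action is a
callable returning an exit status; the helper names are hypothetical,
and returning the list of statuses is a convenience of this model, not
a statement about what pmie itself records.

```python
def run_sequential(first, second):
    """Model of `first & second`: both actions always run."""
    return [first(), second()]


def run_alternate(first, second):
    """Model of `first | second`: second runs only if first fails."""
    statuses = [first()]
    if statuses[0] != 0:
        statuses.append(second())
    return statuses


def make_action(name, status, log):
    """Build a stand-in action that records its name and returns status."""
    def action():
        log.append(name)
        return status
    return action


log = []
run_alternate(make_action("alarm", 1, log), make_action("syslog", 0, log))
print(log)  # ['alarm', 'syslog']
```

Here the alarm stand-in reports failure, so the syslog stand-in runs as
a fallback; had the alarm returned 0, only the alarm would have run.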
454
455 Arguments to actions are an optional suppression time, and then one or
456 more expressions (a string is an expression in this context). Strings
457 appearing as arguments to an action may include the following special
458 selectors that will be replaced at the time the action is executed.
459
460 %h Host name(s) that make the left-most top-level expression in the
461 condition true.
462
463 %c Connection specification string(s) or files for a PCP tool to reach
464 the hosts or archives that make the left-most top-level expression
465 in the condition true.
466
467 %i Instance(s) that make the left-most top-level expression in the
468 condition true.
469
470 %v One value from the left-most top-level expression in the condition
471 for each host and instance pair that makes the condition true.
472
Note that expansion of the special selectors is done by repeating the
whole argument once for each unique binding to any of the qualifying
special selectors. For example, if a rule were true for the host mumble
with instances grunt and snort, and for host fumble the instance puff
makes the rule true, then the action
     ...
     -> shell myscript "Warning: %h:%i busy ";
will execute myscript with the argument string "Warning: mumble:grunt
busy Warning: mumble:snort busy Warning: fumble:puff busy".
482
By comparison, if the action
     ...
     -> shell myscript "Warning! busy:" " %h:%i";
were executed under the same circumstances, then myscript would be
executed with the argument string "Warning! busy: mumble:grunt
mumble:snort fumble:puff".
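
The two expansions above can be reproduced with a small model. The
Python sketch below is one reading of the rule, not pmie's
implementation: each argument is repeated once per unique binding of
the selectors it actually contains, and the expanded arguments are then
concatenated. The function names and the bindings structure (one dict
per host and instance pair) are assumptions made for illustration.

```python
SELECTORS = ("%h", "%c", "%i", "%v")


def expand_argument(arg, bindings):
    """Repeat arg once per unique binding of the selectors it uses."""
    used = [s for s in SELECTORS if s in arg]
    if not used:
        return arg  # constant argument, emitted once
    seen = set()
    parts = []
    for binding in bindings:
        key = tuple(binding[s] for s in used)
        if key in seen:
            continue  # this combination has already been emitted
        seen.add(key)
        text = arg
        for s in used:
            text = text.replace(s, binding[s])
        parts.append(text)
    return "".join(parts)


def expand_action(args, bindings):
    """Expand every argument and concatenate the results."""
    return "".join(expand_argument(a, bindings) for a in args)


bindings = [
    {"%h": "mumble", "%i": "grunt"},
    {"%h": "mumble", "%i": "snort"},
    {"%h": "fumble", "%i": "puff"},
]
print(expand_action(["Warning: %h:%i busy "], bindings))
print(expand_action(["Warning! busy:", " %h:%i"], bindings))
```

The first call repeats the whole warning three times (each copy keeps
the trailing space that is part of the argument), while the second
prints the constant prefix once followed by the three host and instance
pairs.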
489
The semantics of the expansion of the special selectors leads to a
common usage pattern in an action, where one argument is a constant
(contains no special selectors), the second argument contains the
desired special selectors with minimal separator characters, and an
optional third argument provides a constant postscript (e.g. to
terminate any argument quoting from the first argument). If necessary,
post-processing (e.g. in myscript) can provide the necessary
enumeration over each unique expansion of the string containing just
the special selectors.
498
For complex conditions, the bindings to these selectors are not
obvious. It is strongly recommended that pmie be used in the debugging
mode (specify the -W command line option in particular) during rule
development.
503
BOOLEAN EXPRESSIONS
505 pmie expressions that have the semantics of a Boolean, e.g. foo.bar >
506 10 or some_inst ( my.table < 0 ) are assigned the values true or false
507 or unknown. A value is unknown if one or more of the underlying metric
508 values is unavailable, e.g. pmcd(1) on the host cannot be contacted,
509 the metric is not in the PCP archive, no values are currently avail‐
510 able, insufficient values have been fetched to allow a rate converted
511 value to be computed or insufficient values have been fetched to
512 instantiate the required number of samples in the temporal domain.
513
514 Boolean operators follow the normal rules of Kleene logic (aka 3-valued
515 logic) when combining values that include unknown:
516
517 ┌────────────┬───────────────────────────┐
518 │ │ B │
519 │ A and B ├─────────┬───────┬─────────┤
520 │ │ true │ false │ unknown │
521 ├──┬─────────┼─────────┼───────┼─────────┤
522 │ │ true │ true │ false │ unknown │
523 │ ├─────────┼─────────┼───────┼─────────┤
524 │A │ false │ false │ false │ false │
525 │ ├─────────┼─────────┼───────┼─────────┤
526 │ │ unknown │ unknown │ false │ unknown │
527 └──┴─────────┴─────────┴───────┴─────────┘
528 ┌────────────┬──────────────────────────┐
529 │ │ B │
530 │ A or B ├──────┬─────────┬─────────┤
531 │ │ true │ false │ unknown │
532 ├──┬─────────┼──────┼─────────┼─────────┤
533 │ │ true │ true │ true │ true │
534 │ ├─────────┼──────┼─────────┼─────────┤
535 │A │ false │ true │ false │ unknown │
536 │ ├─────────┼──────┼─────────┼─────────┤
537 │ │ unknown │ true │ unknown │ unknown │
538 └──┴─────────┴──────┴─────────┴─────────┘
539 ┌────────┬─────────┐
540 │ A │ not A │
541 ├────────┼─────────┤
542 │ true │ false │
543 ├────────┼─────────┤
544 │ false │ true │
545 ├────────┼─────────┤
546 │unknown │ unknown │
547 └────────┴─────────┘
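
The three truth tables above can be written compactly in code. The
Python sketch below uses None to stand for unknown; this encoding and
the function names are choices made for illustration and are not part
of pmie.

```python
def k_and(a, b):
    """Kleene AND: false dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a, b):
    """Kleene OR: true dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def k_not(a):
    """Kleene NOT: unknown stays unknown."""
    return None if a is None else not a


print(k_and(None, False))  # False
print(k_or(None, True))    # True
print(k_not(None))         # None
```

The practical consequence is that an unknown operand does not always
poison a condition: a false operand still decides an AND, and a true
operand still decides an OR.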

RULESET CLAUSES
The ruleset clause is used to define a set of rules and actions that
are evaluated in order until some action is executed, at which point
the remaining rules and actions are skipped until the ruleset is again
scheduled for evaluation. The keyword else is used to separate rules.
After one or more regular rules (with a predicate and an action), a
ruleset may include an optional
     unknown -> action
clause, optionally followed by an
     otherwise -> action
clause.
559
If all of the predicates in the rules evaluate to unknown and an
unknown clause has been specified, then the action associated with the
unknown clause will be executed.
563
564 If no rule predicate is true and the unknown action is either not spec‐
565 ified or not executed and an otherwise clause has been specified, then
566 the action associated with the otherwise clause will be executed.
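
The selection logic of a ruleset can be summarised in a few lines. The
Python sketch below takes predicate values that have already been
evaluated (True, False, or None for unknown) and returns the action
that would be chosen; it is a model of the description above, with a
hypothetical function name, and does not reproduce how pmie evaluates
the predicates themselves.

```python
def select_action(rules, unknown_action=None, otherwise_action=None):
    """Pick the action for one evaluation of a ruleset.

    rules is an ordered list of (predicate_value, action) pairs, where
    predicate_value is True, False or None (unknown).
    """
    for value, action in rules:
        if value is True:
            return action  # first true rule wins, the rest are skipped
    all_unknown = bool(rules) and all(value is None for value, _ in rules)
    if all_unknown and unknown_action is not None:
        return unknown_action
    return otherwise_action  # may be None if no otherwise clause exists


# Load average example: the "extreme" rule is false, "moderate" is true.
print(select_action(
    [(False, "extreme"), (True, "moderate")],
    unknown_action="unavailable",
    otherwise_action="OK",
))  # moderate
```

When one predicate is false and another is unknown, the unknown clause
does not apply (not all predicates are unknown), so the otherwise
action is selected.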
567
SCALE FACTORS
569 Scale factors may be appended to arithmetic expressions and force lin‐
570 ear scaling of the value to canonical units. Simple scale factors are
571 constructed from the keywords: nanosecond, nanosec, nsec, microsecond,
572 microsec, usec, millisecond, millisec, msec, second, sec, minute, min,
573 hour, byte, Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount and Mcount, and
574 the operator /, for example ``Kbytes / hour''.
575
MACROS
577 Macros are defined using expressions of the form:
578
    name = constexpr;
580
Where name follows the normal rules for variables in programming
languages, i.e. alphabetic optionally followed by alphanumerics.
constexpr must be a constant expression, either a string (enclosed in
double quotes) or an arithmetic expression optionally followed by a
scale factor.
586
Macros are expanded when their name, prefixed by a dollar sign ($),
appears in an expression, and macros may be nested within a constexpr
string.
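
Nested macro expansion can be modelled as repeated textual
substitution. The Python sketch below expands string macros taken from
the Oracle example in the examples section; it is an illustration, not
pmie's parser. The function name is hypothetical, underscores are
accepted in names (as in day_of_week), unknown names are left
untouched, and self-referential macros are not guarded against.

```python
import re

MACRO_REF = re.compile(r"\$([A-Za-z][A-Za-z0-9_]*)")


def expand(text, macros):
    """Replace each $name in text, expanding nested macros recursively."""
    def substitute(match):
        name = match.group(1)
        if name not in macros:
            return match.group(0)  # leave unknown names as written
        return expand(macros[name], macros)
    return MACRO_REF.sub(substitute, text)


macros = {
    "sid": "ptg1",
    "lid": "223",
    "lru": "#'$sid/$lid cache buffers lru chain'",
    "host": ":moomba.melbourne.sgi.com",
    "gets": "oracle.latch.gets $host $lru",
}
print(expand("$gets", macros))
```

The printed text is the metric specification with the host and the
fully expanded instance name substituted in, which is what the rule
condition then operates on.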
589
590 The following reserved macro names are understood.
591
592 minute Current minute of the hour.
593
594 hour Current hour of the day, in the range 0 to 23.
595
596 day Current day of the month, in the range 1 to 31.
597
598 month Current month of the year, in the range 0 (January) to 11
599 (December).
600
601 year Current year.
602
603 day_of_week
604 Current day of the week, in the range 0 (Sunday) to 6 (Satur‐
605 day).
606
607 delta Sample interval in effect for this expression.
608
609 Dates and times are presented in the reporting time zone (see descrip‐
610 tion of -Z and -z command line options above).
611
AUTOMATIC RESTART
It is often useful for pmie processes to be started and stopped when
the local host is booted or shut down, or restarted when they are
detected as no longer running (when they have unexpectedly exited for
some reason). Refer to pmie_check(1) for details on automating this
process.
617
EVENT MONITORING
It is common for production systems to be monitored in a central
location. Traditionally on UNIX systems this has been performed by the
system log facilities (see logger(1) and syslogd(1)). On Windows,
communication with the system event log is handled by pcp-eventlog(1).
623
624 pmie fits into this model when rules use the syslog action. Note that
625 if the action string begins with -p (priority) and/or -t (tag) then
626 these are extracted from the string and treated in the same way as in
627 logger(1) and pcp-eventlog(1).
628
However, it is common to have other event monitoring frameworks as
well, into which you may wish to incorporate performance events from
pmie. You can often use the shell action to send events to these
frameworks, as they usually provide their own program for injecting
events from external sources.
634
A final option is use of the stomp (Streaming Text Oriented Messaging
Protocol) action, which allows pmie to connect to a central JMS (Java
Message Service) server and send events to the PMIE topic. Tools can
be written to extract these text messages and present them to
operations staff (via desktop popup windows, etc). Use of the stomp
action requires a stomp configuration file to be set up, which
specifies the location of the JMS server host, port number, and
username/password.
642
643 The format of this file is as follows:
644
     host=messages.sgi.com   # this is the JMS server (required)
     port=61616              # and it's listening here (required)
     timeout=2               # seconds to wait for server (optional)
     username=joe            # (required)
     password=j03ST0MP       # (required)
     topic=PMIE              # JMS topic for pmie messages (optional)
651
652 The timeout value specifies the time (in seconds) that pmie should wait
653 for acknowledgements from the JMS server after sending a message (as
654 required by the STOMP protocol). Note that on startup, pmie will wait
655 indefinitely for a connection, and will not begin rule evaluation until
656 that initial connection has been established. Should the connection to
657 the JMS server be lost at any time while pmie is running, pmie will
658 attempt to reconnect on each subsequent truthful evaluation of a rule
659 with a stomp action, but not more than once per minute. This is to
660 avoid contributing to network congestion. In this situation, where the
661 STOMP connection to the JMS server has been severed, the stomp action
662 will return a non-zero error value.
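
For tools that need to read the same stomp configuration file, the
key=value layout shown above is simple to parse. The Python sketch
below is an assumption-laden illustration rather than pmie's own
parser: it treats everything from a # to the end of the line as a
comment (so a # inside a value would be cut off), keeps all values as
strings, and checks only for the four keys marked as required. The
function name is hypothetical.

```python
REQUIRED_KEYS = ("host", "port", "username", "password")


def parse_stomp_config(text):
    """Parse key=value lines, ignoring blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError("expected key=value, got: " + raw_line.strip())
        settings[key.strip()] = value.strip()
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    return settings


sample = """
host=messages.sgi.com   # this is the JMS server (required)
port=61616              # and it's listening here (required)
timeout=2               # seconds to wait for server (optional)
username=joe            # (required)
password=j03ST0MP       # (required)
topic=PMIE              # JMS topic for pmie messages (optional)
"""
print(parse_stomp_config(sample)["host"])  # messages.sgi.com
```

Optional keys such as timeout and topic are returned only when present;
any defaults are left to the caller, since the text above does not
state them.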
663
FILES
665 $PCP_DEMOS_DIR/pmie/*
666 annotated example rules
667 $PCP_VAR_DIR/pmns/*
668 default PMNS specification files
669 $PCP_TMP_DIR/pmie
670 pmie maintains files in this directory to identify the run‐
671 ning pmie instances and to export runtime information about
672 each instance - this data forms the basis of the pmcd.pmie
673 performance metrics
674 $PCP_PMIECONTROL_PATH
675 the default set of pmie instances to start at boot time -
676 refer to pmie_check(1) for details
677
BUGS
679 The lexical scanner and parser will attempt to recover after an error
680 in the input expressions. Parsing resumes after skipping input up to
681 the next semi-colon (;), however during this skipping process the scan‐
682 ner is ignorant of comments and strings, so an embedded semi-colon may
683 cause parsing to resume at an unexpected place. This behavior is
684 largely benign, as until the initial syntax error is corrected, pmie
685 will not attempt any expression evaluation.
686
PCP ENVIRONMENT
688 Environment variables with the prefix PCP_ are used to parameterize the
689 file and directory names used by PCP. On each installation, the file
690 /etc/pcp.conf contains the local values for these variables. The
691 $PCP_CONF variable may be used to specify an alternative configuration
692 file, as described in pcp.conf(5).
693
694 When executing shell actions, pmie overrides two variables - IFS and
695 PATH - in the environment of the child process. IFS is set to "\t\n".
696 The PATH is set to a combination of a default path for all platforms
697 ("/usr/sbin:/sbin:/usr/bin:/usr/sbin") and several configurable compo‐
698 nents. These are (in this order): $PCP_BIN_DIR, $PCP_BINADM_DIR and
699 $PCP_PLATFORM_PATHS.
700
701 When executing popup alarm actions, pmie will use the value of
702 $PCP_XCONFIRM_PROG as the visual notification program to run. This is
703 typically set to pmconfirm(1), a cross-platform dialog box.
704
UNIX SEE ALSO
706 logger(1).
707
WINDOWS SEE ALSO
709 pcp-eventlog(1).
710
SEE ALSO
712 PCPIntro(1), pmcd(1), pmconfirm(1), pmdumplog(1), pmieconf(1),
713 pmie_check(1), pminfo(1), pmlogger(1), pmval(1), PMAPI(3), pcp.conf(5)
714 and pcp.env(5).
715
USER GUIDE
For a more complete description of the pmie language, refer to the
Performance Co-Pilot User's and Administrator's Guide. This is
available online from:
     https://pcp.io/doc/pcp-users-and-administrators-guide.pdf

Performance Co-Pilot                  PCP                              PMIE(1)