PMIE(1)                  General Commands Manual                  PMIE(1)

NAME
pmie - inference engine for performance metrics

SYNOPSIS
pmie [-bCdefHqVvWxz] [-A align] [-a archive] [-c filename] [-h host]
[-l logfile] [-j stompfile] [-n pmnsfile] [-O offset] [-S starttime]
[-T endtime] [-t interval] [-U username] [-Z timezone] [filename ...]

DESCRIPTION
14 pmie accepts a collection of arithmetic, logical, and rule expressions
15 to be evaluated at specified frequencies. The base data for the
16 expressions consists of performance metrics values delivered in real-
17 time from any host running the Performance Metrics Collection Daemon
18 (PMCD), or using historical data from Performance Co-Pilot (PCP) ar‐
19 chive logs.
20
As well as computing arithmetic and logical values, pmie can execute
actions (raise popup alarms, write system log messages, and launch
programs) in response to specified conditions. Such actions are useful
in detecting, monitoring and correcting performance related problems.
25
26 The expressions to be evaluated are read from configuration files spec‐
27 ified by one or more filename arguments. In the absence of any file‐
28 name, expressions are read from standard input.
29
30 A description of the command line options specific to pmie follows:
31
-a The archive argument is a comma-separated list of names, each of which
33 may be the base name of an archive or the name of a directory con‐
34 taining one or more archives written by pmlogger(1). Multiple
35 instances of the -a flag may appear on the command line to specify
36 a list of sets of archives. In this case, it is required that
37 only one set of archives be present for any one host. Also, any
38 explicit host names occurring in a pmie expression must match the
39 host name recorded in one of the archive labels. In the case of
40 multiple sets of archives, timestamps recorded in the archives are
41 used to ensure temporal consistency.
42
43 -b Output will be line buffered and standard output is attached to
44 standard error. This is most useful for background execution in
45 conjunction with the -l option. The -b option is always used for
46 pmie instances launched from pmie_check(1).
47
48 -C Parse the configuration file(s) and exit before performing any
49 evaluations. Any errors in the configuration file are reported.
50
51 -c An alternative to specifying filename at the end of the command
52 line.
53
54 -d Normally pmie would be launched as a non-interactive process to
55 monitor and manage the performance of one or more hosts. Given
56 the -d flag however, execution is interactive and the user is pre‐
57 sented with a menu of options. Interactive mode is useful mainly
58 for debugging new expressions.
59
60 -e When used with -V, -v or -W, this option forces timestamps to be
61 reported with each expression. The timestamps are in ctime(3)
format, enclosed in parentheses, and appear after the expression
63 name and before the expression value, e.g.
64 expr_1 (Tue Feb 6 19:55:10 2001): 12
65
-f If the -l option is specified and there is no -a option (i.e. real-time
monitoring) then pmie is run as a daemon in the background
68 (in all other cases foreground is the default). The -f option
69 forces pmie to be run in the foreground, independent of any other
70 options.
71
72 -h By default performance data is fetched from the local host (in
73 real-time mode) or the host for the first named set of archives on
74 the command line (in archive mode). The host argument overrides
75 this default. It does not override hosts explicitly named in the
76 expressions being evaluated. The host argument is interpreted as
77 a connection specification for pmNewContext, and is later mapped
78 to the remote pmcd's self-reported host name for reporting pur‐
79 poses. See also the %h vs. %c substitutions in rule action
80 strings below.
81
82 -l Standard error is sent to logfile.
83
84 -j An alternative STOMP protocol configuration is loaded from stomp‐
85 file. If this option is not used, and the stomp action is used in
86 any rule, the default location $PCP_SYSCONF_DIR/pmie/config/stomp
87 will be used.
88
89 -n An alternative Performance Metrics Name Space (PMNS) is loaded
90 from the file pmnsfile.
91
92 -P Identifies this as the primary pmie instance for a host. See the
93 ``AUTOMATIC RESTART'' section below for further details.
94
95 -q Suppresses diagnostic messages that would be printed to standard
96 output by default, especially the "evaluator exiting" message as
97 this can confuse scripts.
98
99 -t The interval argument follows the syntax described in PCPIntro(1),
100 and in the simplest form may be an unsigned integer (the implied
101 units in this case are seconds). The value is used to determine
102 the sample interval for expressions that do not explicitly set
103 their sample interval using the pmie variable delta described
104 below. The default is 10.0 seconds.
105
106 -U username
107 User account under which to run pmie. The default is the current
108 user account for interactive use. When run as a daemon, the
109 unprivileged "pcp" account is used in current versions of PCP, but
110 in older versions the superuser account ("root") was used by
111 default.
112
-v Unless one of the verbose options -V, -v or -W appears on the command
line, expressions are evaluated silently; the only output is that
produced by any actions being executed. In the verbose mode,
116 specified using the -v flag, the value of each expression is
117 printed as it is evaluated. The values are in canonical units;
118 bytes in the dimension of ``space'', seconds in the dimension of
119 ``time'' and events in the dimension of ``count''. See
120 pmLookupDesc(3) for details of the supported dimension and scaling
121 mechanisms for performance metrics. The verbose mode is useful in
122 monitoring the value of given expressions, evaluating derived per‐
123 formance metrics, passing these values on to other tools for fur‐
124 ther processing and in debugging new expressions.
125
126 -V This option has the same effect as the -v option, except that the
127 name of the host and instance (if applicable) are printed as well
128 as expression values.
129
130 -W This option has the same effect as the -V option described above,
131 except that for boolean expressions, only those names and values
132 that make the expression true are printed. These are the same
133 names and values accessible to rule actions as the %h, %i, %c and
134 %v bindings, as described below.
135
136 -x Execute in domain agent mode. This mode is used within the Per‐
137 formance Co-Pilot product to derive values for summary metrics,
138 see pmdasummary(1). Only restricted functionality is available in
139 this mode (expressions with actions may not be used).
140
141 -Z Change the reporting timezone to timezone in the format of the
142 environment variable TZ as described in environ(7).
143
144 -z Change the reporting timezone to the timezone of the host that is
145 the source of the performance metrics, as identified via either
146 the -h option or the first named set of archives (as described
147 above for the -a option).
148
149 The -S, -T, -O, and -A options may be used to define a time window to
150 restrict the samples retrieved, set an initial origin within the time
151 window, or specify a ``natural'' alignment of the sample times; refer
152 to PCPIntro(1) for a complete description of these options.
153
154 Output from pmie is directed to standard output and standard error as
155 follows:
156
157 stdout
158 Expression values printed in the verbose -v mode and the output of
159 print actions.
160
161 stderr
162 Error and warning messages for any syntactic or semantic problems
163 during expression parsing, and any semantic or performance metrics
164 availability problems during expression evaluation.

EXAMPLES
167 The following example expressions demonstrate some of the capabilities
168 of the inference engine.
169
170 The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
171 examples of pmie expressions.
172
173 The variable delta controls expression evaluation frequency. Specify
174 that subsequent expressions be evaluated once a second, until further
175 notice:
176
177 delta = 1 sec;
178
179 If the total context switch rate exceeds 10000 per second per CPU, then
180 display an alarm notifier:
181
182 kernel.all.pswitch / hinv.ncpu > 10000 count/sec
183 -> alarm "high context switch rate %v";
184
185 If the high context switch rate is sustained for 10 consecutive sam‐
186 ples, then launch top(1) in an xterm(1) window to monitor processes,
187 but do this at most once every 5 minutes:
188
189 all_sample (
190 kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
191 ) -> shell 5 min "xterm -e 'top'";
192
193 The following rules are evaluated once every 20 seconds:
194
195 delta = 20 sec;
196
197 If any disk is performing more than 60 I/Os per second, then print a
198 message identifying the busy disk to standard output and launch
199 dkvis(1):
200
201 some_inst (
202 disk.dev.total > 60 count/sec
203 ) -> print "busy disks:" " %i" &
204 shell 5 min "dkvis";
205
206 Refine the preceding rule to apply only between the hours of 9am and
207 5pm, and to require 3 of 4 consecutive samples to exceed the threshold
208 before executing the action:
209
210 $hour >= 9 && $hour <= 17 &&
211 some_inst (
212 75 %_sample (
213 disk.dev.total @0..3 > 60 count/sec
214 )
215 ) -> print "disks busy for 20 sec:" " [%h]%i";
216
217 The following two rules are evaluated once every 10 minutes:
218
219 delta = 10 min;
220
221 If either the / or the /usr filesystem is more than 95% full, display
222 an alarm popup, but not if it has already been displayed during the
223 last 4 hours:
224
225 filesys.free #'/dev/root' /
226 filesys.capacity #'/dev/root' < 0.05
227 -> alarm 4 hour "root filesystem (almost) full";
228
229 filesys.free #'/dev/usr' /
230 filesys.capacity #'/dev/usr' < 0.05
231 -> alarm 4 hour "/usr filesystem (almost) full";
232
233 The following rule requires a machine that supports the PCP environment
234 metrics. If the machine environment temperature rises more than 2
235 degrees over a 10 minute interval, write an entry in the system log:
236
237 environ.temp @0 - environ.temp @1 > 2
238 -> alarm "temperature rising fast" &
239 syslog "machine room temperature rise alarm";
240
241 And something interesting if you have performance problems with your
242 Oracle database:
243
244 // back to 30sec evaluations
245 delta = 30 sec;
246 sid = "ptg1"; # $ORACLE_SID setting
247 lid = "223"; # latch ID from v$latch
248 lru = "#'$sid/$lid cache buffers lru chain'";
249 host = ":moomba.melbourne.sgi.com";
250 gets = "oracle.latch.gets $host $lru";
251 total = "oracle.latch.gets $host $lru +
252 oracle.latch.misses $host $lru +
253 oracle.latch.immisses $host $lru";
254
255 $total > 100 && $gets / $total < 0.2
256 -> alarm "high lru latch contention in database $sid";
257
258 The following ruleset will emit exactly one message depending on the
259 availability and value of the 1-minute load average.
260
261 delta = 1 minute;
262 ruleset
263 kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
264 print "extreme load average %v"
265 else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
266 print "moderate load average %v"
267 unknown ->
268 print "load average unavailable"
269 otherwise ->
270 print "load average OK"
271 ;
272
273 The following rule will emit a message when some filesystem is more
274 than 75% full and is filling at a rate that if sustained would fill the
275 filesystem to 100% in less than 30 minutes.
276
277 some_inst (
278 100 * filesys.used / filesys.capacity > 75 &&
279 filesys.used + 30min * (rate filesys.used) > filesys.capacity
280 ) -> print "filesystem will be full within 30 mins:" " %i";
281
282 If the metric mypmda.errors counts errors then the following rule will
283 emit a message if the rate of errors exceeds 1 per second provided the
284 error count is less than 100.
285
286 mypmda.errors > 1 && instant mypmda.errors < 100
287 -> print "high error rate: %v";
288
290 The pmie specification language is powerful and large.
291
292 To expedite rapid development of pmie rules, the pmieconf(1) tool pro‐
293 vides a facility for generating a pmie configuration file from a set of
294 generalized pmie rules. The supplied set of rules covers a wide range
295 of performance scenarios.
296
297 The Performance Co-Pilot User's and Administrator's Guide provides a
298 detailed tutorial-style chapter covering pmie.
299
301 This description is terse and informal. For a more comprehensive
302 description see the Performance Co-Pilot User's and Administrator's
303 Guide.
304
305 A pmie specification is a sequence of semicolon terminated expressions.
306
307 Basic operators are modeled on the arithmetic, relational and Boolean
308 operators of the C programming language. Precedence rules are as
309 expected, although the use of parentheses is encouraged to enhance
310 readability and remove ambiguity.
311
312 Operands are performance metric names (see pmns(5)) and the normal lit‐
313 eral constants.
314
315 Operands involving performance metrics may produce sets of values, as a
316 result of enumeration in the dimensions of hosts, instances and time.
317 Special qualifiers may appear after a performance metric name to define
318 the enumeration in each dimension. For example,
319
320 kernel.percpu.cpu.user :foo :bar #cpu0 @0..2
321
322 defines 6 values corresponding to the time spent executing in user mode
323 on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3 consecutive
324 samples. The default interpretation in the absence of : (host), #
325 (instance) and @ (time) qualifiers is all instances at the most recent
326 sample time for the default source of PCP performance metrics.
327
Host and instance names that do not follow the rules for variables in
programming languages, i.e. an alphabetic character optionally followed
by alphanumerics, should be enclosed in single quotes.
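
As an illustrative sketch of the qualifiers and quoting rules above, the following rule quotes a host name that contains a hyphen and an instance name that contains a space, and enumerates the three most recent samples. The host name web-01 and the threshold are assumptions made for the example only.

```
// web-01 is a hypothetical host; the hyphen forces single quotes
some_sample (
    kernel.all.load :'web-01' #'1 minute' @0..2 > 4
) -> print "load average above 4 in at least one of the last 3 samples";
```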
331
332 Expression evaluation follows the law of ``least surprises''. Where
333 performance metrics have the semantics of a counter, pmie will automat‐
334 ically convert to a rate based upon consecutive samples and the time
335 interval between these samples. All numeric expressions are evaluated
336 in double precision, and where appropriate, automatically scaled into
337 canonical units of ``bytes'', ``seconds'' and ``counts''.
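
A minimal sketch of the automatic rate conversion, assuming that network.interface.in.bytes is exported as a counter in units of bytes (as it is on typical Linux hosts). The threshold is arbitrary. Because the metric is a counter, the comparison is made against a per-second rate and the scale factor is reduced to canonical bytes per second.

```
// counter metric: pmie compares the per-second rate, not the raw count
some_inst (
    network.interface.in.bytes > 10 Mbyte/sec
) -> print "busy network interfaces:" " %i";
```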
338
339 A rule is a special form of expression that specifies a condition or
340 logical expression, a special operator (->) and actions to be performed
341 when the condition is found to be true.
342
343 The following table summarizes the basic pmie operators:
344
345 ┌────────────────┬────────────────────────────────────────────────┐
346 │ Operators │ Explanation │
347 ├────────────────┼────────────────────────────────────────────────┤
348 │+ - * / │ Arithmetic │
349 │< <= == >= > != │ Relational (value comparison) │
350 │! && || │ Boolean │
351 │-> │ Rule │
352 │rising │ Boolean, false to true transition │
353 │falling │ Boolean, true to false transition │
354 │rate │ Explicit rate conversion (rarely required) │
355 │instant │ No automatic rate conversion (rarely required) │
356 └────────────────┴────────────────────────────────────────────────┘
357 All operators are supported for numeric-valued operands and expres‐
358 sions. For string-valued operands, namely literal string constants
359 enclosed in double quotes or metrics with a data type of string
360 (PM_TYPE_STRING), only the operators == and != are supported.
361
362 The rate and instant operators are the logical inverse of one another,
363 so an arithmetic expression expr is equal to rate instant expr. The
364 more useful cases involve using rate with a metric that is not a
365 counter to determine the rate of change over time or instant with a
366 metric that is a counter to determine if the current value is above or
367 below some threshold.
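
The two useful cases described above might look as follows. This is a sketch only: mypmda.errors is the hypothetical counter metric used elsewhere in this page, and both thresholds are arbitrary.

```
// filesys.used is not a counter, so a rate must be requested explicitly
some_inst (
    rate filesys.used > 1 Mbyte/sec
) -> print "filesystems growing faster than 1 Mbyte/sec:" " %i";

// compare the raw counter value, suppressing automatic rate conversion
instant mypmda.errors > 1000
    -> print "more than 1000 errors recorded so far";
```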
368
369 Aggregate operators may be used to aggregate or summarize along one
370 dimension of a set-valued expression. The following aggregate opera‐
371 tors map from a logical expression to a logical expression of lower
372 dimension.
373
374 ┌─────────────────────────┬─────────────┬──────────────────────────┐
375 │ Operators │ Type │ Explanation │
376 ├─────────────────────────┼─────────────┼──────────────────────────┤
377 │some_inst │ Existential │ True if at least one set │
378 │some_host │ │ member is true in the │
379 │some_sample │ │ associated dimension │
380 ├─────────────────────────┼─────────────┼──────────────────────────┤
381 │all_inst │ Universal │ True if all set members │
382 │all_host │ │ are true in the associ‐ │
383 │all_sample │ │ ated dimension │
384 ├─────────────────────────┼─────────────┼──────────────────────────┤
385 │N%_inst │ Percentile │ True if at least N per‐ │
386 │N%_host │ │ cent of set members are │
387 │N%_sample │ │ true in the associated │
388 │ │ │ dimension │
389 └─────────────────────────┴─────────────┴──────────────────────────┘
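
As a sketch of how these operators reduce a dimension, the rules below collapse the instance dimension and the time dimension respectively. The thresholds are arbitrary and chosen only for illustration.

```
// true when at least half of the disks exceed 60 I/Os per second
50 %_inst (
    disk.dev.total > 60 count/sec
) -> print "at least half of the disks are busy";

// true only when every one of the last 4 samples exceeds the threshold
all_sample (
    kernel.all.load #'1 minute' @0..3 > 2 * hinv.ncpu
) -> print "load average sustained above 2 per CPU";
```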
390 The following instantial operators may be used to filter or limit a
391 set-valued logical expression, based on regular expression matching of
392 instance names. The logical expression must be a set involving the
393 dimension of instances, and the regular expression is of the form used
394 by egrep(1) or the Extended Regular Expressions of regcomp(3).
395
396 ┌─────────────┬──────────────────────────────────────────┐
397 │ Operators │ Explanation │
398 ├─────────────┼──────────────────────────────────────────┤
399 │match_inst │ For each value of the logical expression │
400 │ │ that is ``true'', the result is ``true'' │
401 │ │ if the associated instance name matches │
402 │ │ the regular expression. Otherwise the │
403 │ │ result is ``false''. │
404 ├─────────────┼──────────────────────────────────────────┤
405 │nomatch_inst │ For each value of the logical expression │
406 │ │ that is ``true'', the result is ``true'' │
407 │ │ if the associated instance name does not │
408 │ │ match the regular expression. Otherwise │
409 │ │ the result is ``false''. │
410 └─────────────┴──────────────────────────────────────────┘
411 For example, the expression below will be ``true'' for disks attached
412 to controllers 2 or 3 performing more than 20 operations per second:
413 match_inst "^dks[23]d" disk.dev.total > 20;
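
A complementary sketch using nomatch_inst, assuming a platform where the loopback network interface instance is named lo and network.interface.in.bytes is a counter in bytes. The pattern and threshold are illustrative.

```
// skip instances whose names start with "lo" (e.g. the loopback interface)
some_inst (
    nomatch_inst "^lo" network.interface.in.bytes > 10 Mbyte/sec
) -> print "busy non-loopback interfaces:" " %i";
```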
414
415 The following aggregate operators map from an arithmetic expression to
416 an arithmetic expression of lower dimension.
417
418 ┌─────────────────────────┬───────────┬──────────────────────────┐
419 │ Operators │ Type │ Explanation │
420 ├─────────────────────────┼───────────┼──────────────────────────┤
421 │min_inst │ Extrema │ Minimum value across all │
422 │min_host │ │ set members in the asso‐ │
423 │min_sample │ │ ciated dimension │
424 ├─────────────────────────┼───────────┼──────────────────────────┤
425 │max_inst │ Extrema │ Maximum value across all │
426 │max_host │ │ set members in the asso‐ │
427 │max_sample │ │ ciated dimension │
428 ├─────────────────────────┼───────────┼──────────────────────────┤
429 │sum_inst │ Aggregate │ Sum of values across all │
430 │sum_host │ │ set members in the asso‐ │
431 │sum_sample │ │ ciated dimension │
432 ├─────────────────────────┼───────────┼──────────────────────────┤
433 │avg_inst │ Aggregate │ Average value across all │
434 │avg_host │ │ set members in the asso‐ │
435 │avg_sample │ │ ciated dimension │
436 └─────────────────────────┴───────────┴──────────────────────────┘
437 The aggregate operators count_inst, count_host and count_sample map
438 from a logical expression to an arithmetic expression of lower dimen‐
439 sion by counting the number of set members for which the expression is
440 true in the associated dimension.
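
The arithmetic aggregates and the count operators can be combined with ordinary relational operators, as in the following sketch (thresholds are arbitrary):

```
// count_inst turns a per-disk truth value into a single number
count_inst ( disk.dev.total > 60 count/sec ) >= 2
    -> print "two or more disks are busy";

// avg_sample reduces 6 samples to one value before the comparison
avg_sample ( kernel.all.load #'1 minute' @0..5 ) > 2 * hinv.ncpu
    -> print "average load over the last 6 samples is high";
```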
441
442 For action rules, the following actions are defined:
443
444 ┌──────────┬────────────────────────────────────────┐
445 │Operators │ Explanation │
446 ├──────────┼────────────────────────────────────────┤
447 │alarm │ Raise a visible alarm with xconfirm(1) │
448 │print │ Display on standard output │
449 │shell │ Execute with sh(1) │
450 │stomp │ Send a STOMP message to a JMS server │
451 │syslog │ Append a message to system log file │
452 └──────────┴────────────────────────────────────────┘
Multiple actions may be separated by the & and | operators to specify
respectively sequential execution (both actions are executed) and
alternate execution (the second action is executed only if the
execution of the first action returns a non-zero error status).
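
A short sketch contrasting the two operators. The rules and messages are illustrative; the second rule assumes that the alarm action can fail, for example when no display is available.

```
// sequential: both actions run whenever the rule is true
kernel.all.pswitch / hinv.ncpu > 10 Kcount/sec
    -> syslog "high context switch rate %v" &
       print "high context switch rate %v";

// alternate: print only if the alarm action returns a non-zero status
kernel.all.load #'1 minute' > 10 * hinv.ncpu
    -> alarm "extreme load average %v" |
       print "extreme load average %v";
```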
457
458 Arguments to actions are an optional suppression time, and then one or
459 more expressions (a string is an expression in this context). Strings
460 appearing as arguments to an action may include the following special
461 selectors that will be replaced at the time the action is executed.
462
463 %h Host name(s) that make the left-most top-level expression in the
464 condition true.
465
466 %c Connection specification string(s) or files for a PCP tool to reach
467 the hosts or archives that make the left-most top-level expression
468 in the condition true.
469
470 %i Instance(s) that make the left-most top-level expression in the
471 condition true.
472
473 %v One value from the left-most top-level expression in the condition
474 for each host and instance pair that makes the condition true.
475
476 Note that expansion of the special selectors is done by repeating the
477 whole argument once for each unique binding to any of the qualifying
478 special selectors. For example if a rule were true for the host mumble
479 with instances grunt and snort, and for host fumble the instance puff
480 makes the rule true, then the action
481 ...
482 -> shell myscript "Warning: %h:%i busy ";
483 will execute myscript with the argument string "Warning: mumble:grunt
484 busy Warning: mumble:snort busy Warning: fumble:puff busy".
485
486 By comparison, if the action
487 ...
488 -> shell myscript "Warning! busy:" " %h:%i";
489 were executed under the same circumstances, then myscript would be exe‐
490 cuted with the argument string "Warning! busy: mumble:grunt mum‐
491 ble:snort fumble:puff".
492
The semantics of the expansion of the special selectors lead to a common
usage pattern in an action: the first argument is a constant (it
contains no special selectors), the second argument contains the desired
special selectors with minimal separator characters, and an optional
third argument provides a constant postscript (e.g. to terminate any
argument quoting from the first argument). If necessary, post-processing
(e.g. in myscript) can provide the necessary enumeration over each
unique expansion of the string containing just the special selectors.
501
For complex conditions, the bindings to these selectors are not obvious.
It is strongly recommended that pmie be used in the debugging mode
(specify the -W command line option in particular) during rule
development.
506
508 pmie expressions that have the semantics of a Boolean, e.g. foo.bar >
509 10 or some_inst ( my.table < 0 ) are assigned the values true or false
510 or unknown. A value is unknown if one or more of the underlying metric
511 values is unavailable, e.g. pmcd(1) on the host cannot be contacted,
512 the metric is not in the PCP archive, no values are currently avail‐
513 able, insufficient values have been fetched to allow a rate converted
514 value to be computed or insufficient values have been fetched to
515 instantiate the required number of samples in the temporal domain.
516
517 Boolean operators follow the normal rules of Kleene logic (aka 3-valued
518 logic) when combining values that include unknown:
519
520 ┌────────────┬───────────────────────────┐
521 │ │ B │
522 │ A and B ├─────────┬───────┬─────────┤
523 │ │ true │ false │ unknown │
524 ├──┬─────────┼─────────┼───────┼─────────┤
525 │ │ true │ true │ false │ unknown │
526 │ ├─────────┼─────────┼───────┼─────────┤
527 │A │ false │ false │ false │ false │
528 │ ├─────────┼─────────┼───────┼─────────┤
529 │ │ unknown │ unknown │ false │ unknown │
530 └──┴─────────┴─────────┴───────┴─────────┘
531 ┌────────────┬──────────────────────────┐
532 │ │ B │
533 │ A or B ├──────┬─────────┬─────────┤
534 │ │ true │ false │ unknown │
535 ├──┬─────────┼──────┼─────────┼─────────┤
536 │ │ true │ true │ true │ true │
537 │ ├─────────┼──────┼─────────┼─────────┤
538 │A │ false │ true │ false │ unknown │
539 │ ├─────────┼──────┼─────────┼─────────┤
540 │ │ unknown │ true │ unknown │ unknown │
541 └──┴─────────┴──────┴─────────┴─────────┘
542 ┌────────┬─────────┐
543 │ A │ not A │
544 ├────────┼─────────┤
545 │ true │ false │
546 ├────────┼─────────┤
547 │ false │ true │
548 ├────────┼─────────┤
549 │unknown │ unknown │
550 └────────┴─────────┘
552 The ruleset clause is used to define a set of rules and actions that
553 are evaluated in order until some action is executed, at which point
554 the remaining rules and actions are skipped until the ruleset is again
555 scheduled for evaluation. The keyword else is used to separate rules.
556 After one or more regular rules (with a predicate and an action), a
557 ruleset may include an optional
558 unknown -> action
clause, optionally followed by an
560 otherwise -> action
561 clause.
562
If all of the predicates in the rules evaluate to unknown and an
unknown clause has been specified, then the action associated with the
unknown clause will be executed.
566
567 If no rule predicate is true and the unknown action is either not spec‐
568 ified or not executed and an otherwise clause has been specified, then
569 the action associated with the otherwise clause will be executed.
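
The clauses described above can be sketched as a complete ruleset. The thresholds and messages are illustrative; at most one of the four actions is executed on each evaluation.

```
delta = 1 min;
ruleset
    kernel.all.pswitch / hinv.ncpu > 20 Kcount/sec ->
        syslog "context switch rate critical: %v"
else kernel.all.pswitch / hinv.ncpu > 10 Kcount/sec ->
        print "context switch rate elevated: %v"
unknown ->
        print "context switch rate unavailable"
otherwise ->
        print "context switch rate normal"
;
```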
570
572 Scale factors may be appended to arithmetic expressions and force lin‐
573 ear scaling of the value to canonical units. Simple scale factors are
574 constructed from the keywords: nanosecond, nanosec, nsec, microsecond,
575 microsec, usec, millisecond, millisec, msec, second, sec, minute, min,
576 hour, byte, Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount and Mcount, and
577 the operator /, for example ``Kbytes / hour''.
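
A sketch of simple and compound scale factors, assuming that mem.util.free is available on the monitored host (it is exported in Kbyte on typical Linux systems). Both sides of each comparison are reduced to canonical units before the test is made.

```
// 100 Mbyte is scaled to bytes before the comparison
mem.util.free < 100 Mbyte
    -> print "free memory below 100 Mbyte";

// compound scale factor: a growth threshold of 50 Mbyte per hour
some_inst (
    rate filesys.used > 50 Mbyte/hour
) -> print "filesystems growing quickly:" " %i";
```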
578
580 Macros are defined using expressions of the form:
581
582 name = constexpr;
583
Here name follows the normal rules for variables in programming
languages, i.e. an alphabetic character optionally followed by
alphanumerics. constexpr must be a constant expression, either a string
(enclosed in double quotes) or an arithmetic expression optionally
followed by a scale factor.
589
590 Macros are expanded when their name, prefixed by a dollar ($) appears
591 in an expression, and macros may be nested within a constexpr string.
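
A minimal sketch of both kinds of macro. The macro names busy and io are arbitrary choices for the example.

```
// arithmetic macro with a scale factor
busy = 60 count/sec;
// string macro holding a metric name
io = "disk.dev.total";

some_inst ( $io > $busy ) -> print "busy disks:" " %i";
```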
592
593 The following reserved macro names are understood.
594
595 minute Current minute of the hour.
596
597 hour Current hour of the day, in the range 0 to 23.
598
599 day Current day of the month, in the range 1 to 31.
600
601 month Current month of the year, in the range 0 (January) to 11
602 (December).
603
604 year Current year.
605
606 day_of_week
607 Current day of the week, in the range 0 (Sunday) to 6 (Satur‐
608 day).
609
610 delta Sample interval in effect for this expression.
611
612 Dates and times are presented in the reporting time zone (see descrip‐
613 tion of -Z and -z command line options above).
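
As an illustration of the reserved macros, the following sketch restricts a rule to weekdays between 9am and 5pm in the reporting time zone. The load threshold is arbitrary. The message avoids %v because that selector binds to the left-most top-level expression, which here is a time test.

```
// Monday (1) to Friday (5), from 9am up to but not including 5pm
$day_of_week >= 1 && $day_of_week <= 5 &&
$hour >= 9 && $hour < 17 &&
kernel.all.load #'1 minute' > 2 * hinv.ncpu
    -> print "high load average during business hours";
```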

AUTOMATIC RESTART
616 It is often useful for pmie processes to be started and stopped when
the local host is booted or shut down, or when they have been detected
618 as no longer running (when they have unexpectedly exited for some rea‐
619 son). Refer to pmie_check(1) for details on automating this process.
620
621 Optionally, each system running pmcd(1) may also be configured to run a
622 ``primary'' pmie instance. This pmie instance is launched by
623 $PCP_RC_DIR/pmie, and is affected by the files
624 $PCP_SYSCONF_DIR/pmie/control, $PCP_SYSCONF_DIR/pmie/control.d (use
625 chkconfig(8), systemctl(1) or similar platform-specific commands to
626 activate or disable the primary pmie instance) and $PCP_VAR_DIR/con‐
627 fig/pmie/config.default (the default initial configuration file for the
628 primary pmie).
629
630 The primary pmie instance is identified by the -P option. There may be
631 at most one ``primary'' pmie instance on each system. The primary pmie
632 instance (if any) must be running on the same host as the pmcd(1) to
633 which it connects (if any), so the -h and -P options are mutually
634 exclusive.
635
637 It is common for production systems to be monitored in a central loca‐
638 tion. Traditionally on UNIX systems this has been performed by the
639 system log facilities - see logger(1), and syslogd(1). On Windows,
640 communication with the system event log is handled by pcp-eventlog(1).
641
642 pmie fits into this model when rules use the syslog action. Note that
643 if the action string begins with -p (priority) and/or -t (tag) then
644 these are extracted from the string and treated in the same way as in
645 logger(1) and pcp-eventlog(1).
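
A sketch of a syslog action with a leading priority and tag. The values daemon.warning and pmie are assumptions chosen for the example, written in the facility.level form accepted by logger(1).

```
kernel.all.load #'1 minute' > 2 * hinv.ncpu
    -> syslog 10 min "-p daemon.warning -t pmie high load average %v";
```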
646
647 However, it is common to have other event monitoring frameworks also,
648 into which you may wish to incorporate performance events from pmie.
649 You can often use the shell action to send events to these frameworks,
as they usually provide their own program for injecting events into the
651 framework from external sources.
652
A final option is use of the stomp (Streaming Text Oriented Messaging
Protocol) action, which allows pmie to connect to a central JMS (Java
Message Service) server and send events to the PMIE topic. Tools can
be written to extract these text messages and present them to operations
people (via desktop popup windows, etc). Use of the stomp action
requires a stomp configuration file to be set up, which specifies the
location of the JMS server host, port number, and username/password.
660
661 The format of this file is as follows:
662
host=messages.sgi.com   # this is the JMS server (required)
port=61616              # and it's listening here (required)
timeout=2               # seconds to wait for server (optional)
username=joe            # (required)
password=j03ST0MP       # (required)
topic=PMIE              # JMS topic for pmie messages (optional)
669
670 The timeout value specifies the time (in seconds) that pmie should wait
671 for acknowledgements from the JMS server after sending a message (as
672 required by the STOMP protocol). Note that on startup, pmie will wait
673 indefinitely for a connection, and will not begin rule evaluation until
674 that initial connection has been established. Should the connection to
675 the JMS server be lost at any time while pmie is running, pmie will
676 attempt to reconnect on each subsequent truthful evaluation of a rule
677 with a stomp action, but not more than once per minute. This is to
678 avoid contributing to network congestion. In this situation, where the
679 STOMP connection to the JMS server has been severed, the stomp action
680 will return a non-zero error value.
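
Because a severed connection makes the stomp action return a non-zero status, one possible pattern is to pair it with a fallback action using the | operator. The following is a sketch with an arbitrary threshold and illustrative messages.

```
// fall back to the system log when the STOMP message cannot be sent
kernel.all.load #'1 minute' > 2 * hinv.ncpu
    -> stomp "high load average %v" |
       syslog "high load average %v (STOMP delivery failed)";
```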

FILES
683 $PCP_DEMOS_DIR/pmie/*
684 annotated example rules
685 $PCP_VAR_DIR/pmns/*
686 default PMNS specification files
687 $PCP_TMP_DIR/pmie
688 pmie maintains files in this directory to identify the run‐
689 ning pmie instances and to export runtime information about
690 each instance - this data forms the basis of the pmcd.pmie
691 performance metrics
692 $PCP_PMIECONTROL_PATH
693 the default set of pmie instances to start at boot time -
694 refer to pmie_check(1) for details
695
697 The lexical scanner and parser will attempt to recover after an error
698 in the input expressions. Parsing resumes after skipping input up to
699 the next semi-colon (;), however during this skipping process the scan‐
700 ner is ignorant of comments and strings, so an embedded semi-colon may
701 cause parsing to resume at an unexpected place. This behavior is
702 largely benign, as until the initial syntax error is corrected, pmie
703 will not attempt any expression evaluation.
704
706 Environment variables with the prefix PCP_ are used to parameterize the
707 file and directory names used by PCP. On each installation, the file
708 /etc/pcp.conf contains the local values for these variables. The
709 $PCP_CONF variable may be used to specify an alternative configuration
710 file, as described in pcp.conf(5).
711
712 When executing shell actions, pmie overrides two variables - IFS and
713 PATH - in the environment of the child process. IFS is set to "\t\n".
714 The PATH is set to a combination of a default path for all platforms
715 ("/usr/sbin:/sbin:/usr/bin:/usr/sbin") and several configurable compo‐
716 nents. These are (in this order): $PCP_BIN_DIR, $PCP_BINADM_DIR and
717 $PCP_PLATFORM_PATHS.
718
719 When executing popup alarm actions, pmie will use the value of
720 $PCP_XCONFIRM_PROG as the visual notification program to run. This is
721 typically set to pmconfirm(1), a cross-platform dialog box.
722
724 logger(1).
725
727 pcp-eventlog(1).

SEE ALSO
730 PCPIntro(1), pmcd(1), pmconfirm(1), pmdumplog(1), pmieconf(1),
731 pmie_check(1), pminfo(1), pmlogger(1), pmval(1), PMAPI(3), pcp.conf(5)
732 and pcp.env(5).
733
For a more complete description of the pmie language, refer to the
Performance Co-Pilot User's and Administrator's Guide. This is available
online from:
738 https://pcp.io/doc/pcp-users-and-administrators-guide.pdf

Performance Co-Pilot                  PCP                  PMIE(1)