PMIE(1)                  General Commands Manual                 PMIE(1)

6 pmie - inference engine for performance metrics
7
9 pmie [-bCdefHqVvWxz] [-A align] [-a archive] [-c filename] [-h host]
10 [-l logfile] [-j stompfile] [-n pmnsfile] [-O offset] [-S starttime]
11 [-T endtime] [-t interval] [-U username] [-Z timezone] [filename ...]
12
14 pmie accepts a collection of arithmetic, logical, and rule expressions
15 to be evaluated at specified frequencies. The base data for the
16 expressions consists of performance metrics values delivered in real-
17 time from any host running the Performance Metrics Collection Daemon
18 (PMCD), or using historical data from Performance Co-Pilot (PCP) ar‐
19 chive logs.
20
21 As well as computing arithmetic and logical values, pmie can execute
22 actions (popup alarms, write system log messages, and launch programs)
23 in response to specified conditions. Such actions are extremely useful
24 in detecting, monitoring and correcting performance related problems.
25
26 The expressions to be evaluated are read from configuration files spec‐
27 ified by one or more filename arguments. In the absence of any file‐
28 name, expressions are read from standard input.
29
30 A description of the command line options specific to pmie follows:
31
32 -a archive is the base name of a PCP archive log written by pmlog‐
33 ger(1). Multiple instances of the -a flag may appear on the com‐
34 mand line to specify a set of archives. In this case, it is
35 required that only one archive be present for any one host. Also,
36 any explicit host names occurring in a pmie expression must match
37 the host name recorded in one of the archive labels. In the case
38 of multiple archives, timestamps recorded in the archives are used
39 to ensure temporal consistency.
40
41 -b Output will be line buffered and standard output is attached to
42 standard error. This is most useful for background execution in
43 conjunction with the -l option. The -b option is always used for
44 pmie instances launched from pmie_check(1).
45
46 -C Parse the configuration file(s) and exit before performing any
47 evaluations. Any errors in the configuration file are reported.
48
49 -c An alternative to specifying filename at the end of the command
50 line.
51
52 -d Normally pmie would be launched as a non-interactive process to
53 monitor and manage the performance of one or more hosts. Given
54 the -d flag however, execution is interactive and the user is pre‐
55 sented with a menu of options. Interactive mode is useful mainly
56 for debugging new expressions.
57
-e When used with -V, -v or -W, this option forces timestamps to be
reported with each expression. The timestamps are in ctime(3)
format, enclosed in parentheses, and appear after the expression
name and before the expression value, e.g.
62 expr_1 (Tue Feb 6 19:55:10 2001): 12
63
-f If the -l option is specified and there is no -a option (i.e. real-time
monitoring) then pmie is run as a daemon in the background
(in all other cases foreground is the default). The -f option
forces pmie to be run in the foreground, independent of any other
options.
69
70 -h By default performance data is fetched from the local host (in
71 real-time mode) or the host for the first named archive on the
72 command line (in archive mode). The host argument overrides this
73 default. It does not override hosts explicitly named in the
74 expressions being evaluated. The host argument is interpreted as
75 a connection specification for pmNewContext, and is later mapped
76 to the remote pmcd's self-reported host name for reporting pur‐
77 poses. See also the %h vs. %c substitutions in rule action
78 strings below.
79
80 -l Standard error is sent to logfile.
81
82 -j An alternative STOMP protocol configuration is loaded from stomp‐
83 file. If this option is not used, and the stomp action is used in
84 any rule, the default location $PCP_SYSCONF_DIR/pmie/config/stomp
85 will be used.
86
87 -n An alternative Performance Metrics Name Space (PMNS) is loaded
88 from the file pmnsfile.
89
90 -q Suppresses diagnostic messages that would be printed to standard
91 output by default, especially the "evaluator exiting" message as
92 this can confuse scripts.
93
94 -t The interval argument follows the syntax described in PCPIntro(1),
95 and in the simplest form may be an unsigned integer (the implied
96 units in this case are seconds). The value is used to determine
97 the sample interval for expressions that do not explicitly set
98 their sample interval using the pmie variable delta described
99 below. The default is 10.0 seconds.
100
101 -U username
102 User account under which to run pmie. The default is the current
103 user account for interactive use. When run as a daemon, the
104 unprivileged "pcp" account is used in current versions of PCP, but
105 in older versions the superuser account ("root") was used by
106 default.
107
-v Unless one of the verbose options -V, -v or -W appears on the command
line, expressions are evaluated silently; the only output is
the result of any actions being executed. In the verbose mode,
specified using the -v flag, the value of each expression is
printed as it is evaluated. The values are in canonical units:
bytes in the dimension of ``space'', seconds in the dimension of
``time'' and events in the dimension of ``count''. See
pmLookupDesc(3) for details of the supported dimension and scaling
mechanisms for performance metrics. The verbose mode is useful for
monitoring the value of given expressions, evaluating derived performance
metrics, passing these values on to other tools for further
processing, and debugging new expressions.
120
121 -V This option has the same effect as the -v option, except that the
122 name of the host and instance (if applicable) are printed as well
123 as expression values.
124
125 -W This option has the same effect as the -V option described above,
126 except that for boolean expressions, only those names and values
127 that make the expression true are printed. These are the same
128 names and values accessible to rule actions as the %h, %i, %c and
129 %v bindings, as described below.
130
131 -x Execute in domain agent mode. This mode is used within the Per‐
132 formance Co-Pilot product to derive values for summary metrics,
133 see pmdasummary(1). Only restricted functionality is available in
134 this mode (expressions with actions may not be used).
135
136 -Z Change the reporting timezone to timezone in the format of the
137 environment variable TZ as described in environ(7).
138
139 -z Change the reporting timezone to the timezone of the host that is
140 the source of the performance metrics, as identified via either
141 the -h option or the first named archive (as described above for
142 the -a option).
143
144 The -S, -T, -O, and -A options may be used to define a time window to
145 restrict the samples retrieved, set an initial origin within the time
146 window, or specify a ``natural'' alignment of the sample times; refer
147 to PCPIntro(1) for a complete description of these options.
148
149 Output from pmie is directed to standard output and standard error as
150 follows:
151
152 stdout
153 Expression values printed in the verbose -v mode and the output of
154 print actions.
155
156 stderr
157 Error and warning messages for any syntactic or semantic problems
158 during expression parsing, and any semantic or performance metrics
159 availability problems during expression evaluation.
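
Because verbose values go to standard output, other tools can consume them. The following Python sketch is one possible way to parse a line of the form shown for the -e option. It is an illustration, not part of PCP: the helper name parse_line is hypothetical, and it only assumes the "name (timestamp): value" layout with an English-locale ctime(3) timestamp.

```python
import re
from datetime import datetime

# Hypothetical helper: split "name (timestamp): value" as shown for -e.
LINE_RE = re.compile(r"^(?P<name>\S+) \((?P<stamp>[^)]*)\): (?P<value>.*)$")


def parse_line(line):
    """Return (name, datetime, value_text), or None if the line does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    # ctime(3) layout; %a and %b assume the C/English locale.
    stamp = datetime.strptime(m.group("stamp"), "%a %b %d %H:%M:%S %Y")
    return m.group("name"), stamp, m.group("value")


if __name__ == "__main__":
    print(parse_line("expr_1 (Tue Feb 6 19:55:10 2001): 12"))
```

The value is returned as text because a set-valued expression may print more than one value on a line.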
160
162 The following example expressions demonstrate some of the capabilities
163 of the inference engine.
164
165 The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
166 examples of pmie expressions.
167
168 The variable delta controls expression evaluation frequency. Specify
169 that subsequent expressions be evaluated once a second, until further
170 notice:
171
172 delta = 1 sec;
173
174 If the total context switch rate exceeds 10000 per second per CPU, then
175 display an alarm notifier:
176
177 kernel.all.pswitch / hinv.ncpu > 10000 count/sec
178 -> alarm "high context switch rate %v";
179
180 If the high context switch rate is sustained for 10 consecutive sam‐
181 ples, then launch top(1) in an xwsh(1) window to monitor processes, but
182 do this at most once every 5 minutes:
183
184 all_sample (
185 kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
186 ) -> shell 5 min "xwsh -e 'top'";
187
188 The following rules are evaluated once every 20 seconds:
189
190 delta = 20 sec;
191
192 If any disk is performing more than 60 I/Os per second, then print a
193 message identifying the busy disk to standard output and launch
194 dkvis(1):
195
196 some_inst (
197 disk.dev.total > 60 count/sec
198 ) -> print "busy disks:" " %i" &
199 shell 5 min "dkvis";
200
201 Refine the preceding rule to apply only between the hours of 9am and
202 5pm, and to require 3 of 4 consecutive samples to exceed the threshold
203 before executing the action:
204
    $hour >= 9 && $hour < 17 &&
206 some_inst (
207 75 %_sample (
208 disk.dev.total @0..3 > 60 count/sec
209 )
210 ) -> print "disks busy for 20 sec:" " [%h]%i";
211
212 The following two rules are evaluated once every 10 minutes:
213
214 delta = 10 min;
215
216 If either the / or the /usr filesystem is more than 95% full, display
217 an alarm popup, but not if it has already been displayed during the
218 last 4 hours:
219
220 filesys.free #'/dev/root' /
221 filesys.capacity #'/dev/root' < 0.05
222 -> alarm 4 hour "root filesystem (almost) full";
223
224 filesys.free #'/dev/usr' /
225 filesys.capacity #'/dev/usr' < 0.05
226 -> alarm 4 hour "/usr filesystem (almost) full";
227
The following rule requires a machine that supports the PCP environment
metrics. If the machine environment temperature rises more than 2
degrees over a 10 minute interval, raise an alarm and write an entry in
the system log:
231
232 environ.temp @0 - environ.temp @1 > 2
233 -> alarm "temperature rising fast" &
234 syslog "machine room temperature rise alarm";
235
236 And something interesting if you have performance problems with your
237 Oracle database:
238
239 // back to 30sec evaluations
240 delta = 30 sec;
241 db = "oracle.ptg1";
242 host = ":moomba.melbourne.sgi.com";
243 lru = "#'cache buffers lru chain'";
244 gets = "$db.latch.gets $host $lru";
245 total = "$db.latch.gets $host $lru +
246 $db.latch.misses $host $lru +
247 $db.latch.immisses $host $lru";
248
249 $total > 100 && $gets / $total < 0.2
250 -> alarm "high lru latch contention";
251
252 The following ruleset will emit exactly one message depending on the
253 availability and value of the 1-minute load average.
254
    delta = 1 minute;
    ruleset
         kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
             print "extreme load average %v"
    else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
             print "moderate load average %v"
    unknown ->
             print "load average unavailable"
    otherwise ->
             print "load average OK"
    ;
266
267 The following rule will emit a message when some filesystem is more
268 than 75% full and is filling at a rate that if sustained would fill the
269 filesystem to 100% in less than 30 minutes.
270
271 some_inst (
272 100 * filesys.used / filesys.capacity > 75 &&
273 filesys.used + 30min * (rate filesys.used) > filesys.capacity
274 ) -> print "filesystem will be full within 30 mins:" " %i";
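
The arithmetic behind the rule above can be checked by hand. The Python sketch below mirrors the two conditions for a single filesystem. The function name will_fill and the sample numbers are hypothetical, and the units only need to be consistent (here Kbytes and Kbytes per second).

```python
def will_fill(used, capacity, rate_per_sec, horizon_sec=30 * 60):
    """Mirror the rule: more than 75% full AND projected usage exceeds capacity."""
    over_threshold = 100 * used / capacity > 75
    projected = used + horizon_sec * rate_per_sec
    return over_threshold and projected > capacity


if __name__ == "__main__":
    # 80% full, growing at 150 Kbytes/sec: 800000 + 1800 * 150 = 1070000
    print(will_fill(800_000, 1_000_000, 150))
    # Same filesystem growing at 100 Kbytes/sec: 800000 + 1800 * 100 = 980000
    print(will_fill(800_000, 1_000_000, 100))
```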
275
276 If the metric mypmda.errors counts errors then the following rule will
277 emit a message if the rate of errors exceeds 1 per second provided the
278 error count is less than 100.
279
280 mypmda.errors > 1 && instant mypmda.errors < 100
281 -> print "high error rate: %v";
282
284 The pmie specification language is powerful and large.
285
286 To expedite rapid development of pmie rules, the pmieconf(1) tool pro‐
287 vides a facility for generating a pmie configuration file from a set of
288 generalized pmie rules. The supplied set of rules covers a wide range
289 of performance scenarios.
290
291 The Performance Co-Pilot User's and Administrator's Guide provides a
292 detailed tutorial-style chapter covering pmie.
293
295 This description is terse and informal. For a more comprehensive
296 description see the Performance Co-Pilot User's and Administrator's
297 Guide.
298
299 A pmie specification is a sequence of semicolon terminated expressions.
300
301 Basic operators are modeled on the arithmetic, relational and Boolean
302 operators of the C programming language. Precedence rules are as
303 expected, although the use of parentheses is encouraged to enhance
304 readability and remove ambiguity.
305
306 Operands are performance metric names (see pmns(5)) and the normal lit‐
307 eral constants.
308
309 Operands involving performance metrics may produce sets of values, as a
310 result of enumeration in the dimensions of hosts, instances and time.
311 Special qualifiers may appear after a performance metric name to define
312 the enumeration in each dimension. For example,
313
314 kernel.percpu.cpu.user :foo :bar #cpu0 @0..2
315
316 defines 6 values corresponding to the time spent executing in user mode
317 on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3 consecutive
318 samples. The default interpretation in the absence of : (host), #
319 (instance) and @ (time) qualifiers is all instances at the most recent
320 sample time for the default source of PCP performance metrics.
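
The count of 6 values is the product of the sizes of the three dimensions. A minimal Python sketch of that enumeration follows. The function name enumerate_values is hypothetical, and this models only the counting, not how pmie stores or fetches values.

```python
from itertools import product


def enumerate_values(hosts, instances, samples):
    """One (host, instance, sample) triple per value in the set."""
    return list(product(hosts, instances, samples))


if __name__ == "__main__":
    # :foo :bar  #cpu0  @0..2
    values = enumerate_values(["foo", "bar"], ["cpu0"], range(0, 3))
    print(len(values))
    for v in values:
        print(v)
```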
321
Host and instance names that do not follow the rules for variables in
programming languages, i.e. an alphabetic character optionally followed by
alphanumerics, should be enclosed in single quotes.
325
326 Expression evaluation follows the law of ``least surprises''. Where
327 performance metrics have the semantics of a counter, pmie will automat‐
328 ically convert to a rate based upon consecutive samples and the time
329 interval between these samples. All expressions are evaluated in dou‐
330 ble precision, and where appropriate, automatically scaled into canoni‐
331 cal units of ``bytes'', ``seconds'' and ``counts''.
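
The rate conversion described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not pmie's implementation: the function name counter_rate is hypothetical, counter wrap is ignored, and None stands for an unknown value when no usable interval exists.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Rate from two consecutive samples; None (unknown) without a usable interval."""
    dt = curr_time - prev_time
    if dt <= 0:
        return None
    return (curr_value - prev_value) / dt


if __name__ == "__main__":
    # A counter that moved from 1000 to 1500 over 10 seconds
    print(counter_rate(1000, 0.0, 1500, 10.0))
```

This also shows why a rate-converted expression is unknown on the first evaluation: two samples are needed before a rate can be computed.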
332
333 A rule is a special form of expression that specifies a condition or
334 logical expression, a special operator (->) and actions to be performed
335 when the condition is found to be true.
336
337 The following table summarizes the basic pmie operators:
338
339 ┌────────────────┬────────────────────────────────────────────────┐
340 │ Operators │ Explanation │
341 ├────────────────┼────────────────────────────────────────────────┤
342 │+ - * / │ Arithmetic │
343 │< <= == >= > != │ Relational (value comparison) │
344 │! && || │ Boolean │
345 │-> │ Rule │
346 │rising │ Boolean, false to true transition │
347 │falling │ Boolean, true to false transition │
348 │rate │ Explicit rate conversion (rarely required) │
349 │instant │ No automatic rate conversion (rarely required) │
350 └────────────────┴────────────────────────────────────────────────┘
351 The rate and instant operators are the logical inverse of one another,
352 so an arithmetic expression expr is equal to rate instant expr. The
353 more useful cases involve using rate with a metric that is not a
354 counter to determine the rate of change over time or instant with a
355 metric that is a counter to determine if the current value is above or
356 below some threshold.
357
358 Aggregate operators may be used to aggregate or summarize along one
359 dimension of a set-valued expression. The following aggregate opera‐
360 tors map from a logical expression to a logical expression of lower
361 dimension.
362
363 ┌─────────────────────────┬─────────────┬──────────────────────────┐
364 │ Operators │ Type │ Explanation │
365 ├─────────────────────────┼─────────────┼──────────────────────────┤
366 │some_inst │ Existential │ True if at least one set │
367 │some_host │ │ member is true in the │
368 │some_sample │ │ associated dimension │
369 ├─────────────────────────┼─────────────┼──────────────────────────┤
370 │all_inst │ Universal │ True if all set members │
371 │all_host │ │ are true in the associ‐ │
372 │all_sample │ │ ated dimension │
373 ├─────────────────────────┼─────────────┼──────────────────────────┤
374 │N%_inst │ Percentile │ True if at least N per‐ │
375 │N%_host │ │ cent of set members are │
376 │N%_sample │ │ true in the associated │
377 │ │ │ dimension │
378 └─────────────────────────┴─────────────┴──────────────────────────┘
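
As an informal model of the three families in the table above, the Python sketch below reduces a list of Boolean values along one dimension. The function names are hypothetical, unknown values are not modelled, and the treatment of an empty set in percent is an assumption.

```python
def some(values):
    """Existential: true if at least one member is true."""
    return any(values)


def every(values):
    """Universal: true if all members are true."""
    return all(values)


def percent(n, values):
    """Percentile: true if at least n percent of the members are true."""
    values = list(values)
    # Assumption: an empty set is treated as false.
    return bool(values) and 100 * sum(values) >= n * len(values)


if __name__ == "__main__":
    # 3 of 4 consecutive samples true, as in a 75 %_sample condition
    print(percent(75, [True, True, False, True]))
    print(percent(75, [True, False, False, True]))
```
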
379 The following instantial operators may be used to filter or limit a
380 set-valued logical expression, based on regular expression matching of
381 instance names. The logical expression must be a set involving the
382 dimension of instances, and the regular expression is of the form used
383 by egrep(1) or the Extended Regular Expressions of regcomp(3).
384
385 ┌─────────────┬──────────────────────────────────────────┐
386 │ Operators │ Explanation │
387 ├─────────────┼──────────────────────────────────────────┤
388 │match_inst │ For each value of the logical expression │
389 │ │ that is ``true'', the result is ``true'' │
390 │ │ if the associated instance name matches │
391 │ │ the regular expression. Otherwise the │
392 │ │ result is ``false''. │
393 ├─────────────┼──────────────────────────────────────────┤
394 │nomatch_inst │ For each value of the logical expression │
395 │ │ that is ``true'', the result is ``true'' │
396 │ │ if the associated instance name does not │
397 │ │ match the regular expression. Otherwise │
398 │ │ the result is ``false''. │
399 └─────────────┴──────────────────────────────────────────┘
400 For example, the expression below will be ``true'' for disks attached
401 to controllers 2 or 3 performing more than 20 operations per second:
402 match_inst "^dks[23]d" disk.dev.total > 20;
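
The filtering behaviour of match_inst and nomatch_inst can be modelled in Python as below. This is a sketch only: the Python function names and the sample instance names are hypothetical, and Python's re module is not identical to POSIX Extended Regular Expressions, although simple patterns such as the one above behave the same.

```python
import re


def match_inst(pattern, truth_by_instance):
    """Keep 'true' only where the instance name matches the pattern."""
    rx = re.compile(pattern)
    return {inst: bool(val and rx.search(inst))
            for inst, val in truth_by_instance.items()}


def nomatch_inst(pattern, truth_by_instance):
    """Keep 'true' only where the instance name does not match the pattern."""
    rx = re.compile(pattern)
    return {inst: bool(val and not rx.search(inst))
            for inst, val in truth_by_instance.items()}


if __name__ == "__main__":
    # Result of "disk.dev.total > 20" per disk (made-up instance names)
    busy = {"dks1d1": True, "dks2d1": True, "dks3d2": False}
    print(match_inst("^dks[23]d", busy))
    print(nomatch_inst("^dks[23]d", busy))
```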
403
404 The following aggregate operators map from an arithmetic expression to
405 an arithmetic expression of lower dimension.
406
407 ┌─────────────────────────┬───────────┬──────────────────────────┐
408 │ Operators │ Type │ Explanation │
409 ├─────────────────────────┼───────────┼──────────────────────────┤
410 │min_inst │ Extrema │ Minimum value across all │
411 │min_host │ │ set members in the asso‐ │
412 │min_sample │ │ ciated dimension │
413 ├─────────────────────────┼───────────┼──────────────────────────┤
414 │max_inst │ Extrema │ Maximum value across all │
415 │max_host │ │ set members in the asso‐ │
416 │max_sample │ │ ciated dimension │
417 ├─────────────────────────┼───────────┼──────────────────────────┤
418 │sum_inst │ Aggregate │ Sum of values across all │
419 │sum_host │ │ set members in the asso‐ │
420 │sum_sample │ │ ciated dimension │
421 ├─────────────────────────┼───────────┼──────────────────────────┤
422 │avg_inst │ Aggregate │ Average value across all │
423 │avg_host │ │ set members in the asso‐ │
424 │avg_sample │ │ ciated dimension │
425 └─────────────────────────┴───────────┴──────────────────────────┘
426 The aggregate operators count_inst, count_host and count_sample map
427 from a logical expression to an arithmetic expression of lower dimen‐
428 sion by counting the number of set members for which the expression is
429 true in the associated dimension.
430
431 For action rules, the following actions are defined:
432
433 ┌──────────┬────────────────────────────────────────┐
434 │Operators │ Explanation │
435 ├──────────┼────────────────────────────────────────┤
436 │alarm │ Raise a visible alarm with xconfirm(1) │
437 │print │ Display on standard output │
438 │shell │ Execute with sh(1) │
439 │stomp │ Send a STOMP message to a JMS server │
440 │syslog │ Append a message to system log file │
441 └──────────┴────────────────────────────────────────┘
Multiple actions may be separated by the & and | operators to specify
respectively sequential execution (both actions are executed) and
alternate execution (the second action will only be executed if the
execution of the first action returns a non-zero error status).
446
447 Arguments to actions are an optional suppression time, and then one or
448 more expressions (a string is an expression in this context). Strings
449 appearing as arguments to an action may include the following special
450 selectors that will be replaced at the time the action is executed.
451
452 %h Host name(s) that make the left-most top-level expression in the
453 condition true.
454
455 %c Connection specification string(s) or files for a PCP tool to reach
456 the hosts or archives that make the left-most top-level expression
457 in the condition true.
458
459 %i Instance(s) that make the left-most top-level expression in the
460 condition true.
461
462 %v One value from the left-most top-level expression in the condition
463 for each host and instance pair that makes the condition true.
464
465 Note that expansion of the special selectors is done by repeating the
466 whole argument once for each unique binding to any of the qualifying
467 special selectors. For example if a rule were true for the host mumble
468 with instances grunt and snort, and for host fumble the instance puff
469 makes the rule true, then the action
470 ...
471 -> shell myscript "Warning: %h:%i busy ";
472 will execute myscript with the argument string "Warning: mumble:grunt
473 busy Warning: mumble:snort busy Warning: fumble:puff busy".
474
475 By comparison, if the action
476 ...
477 -> shell myscript "Warning! busy:" " %h:%i";
478 were executed under the same circumstances, then myscript would be exe‐
479 cuted with the argument string "Warning! busy: mumble:grunt mum‐
480 ble:snort fumble:puff".
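
The two expansions above can be reproduced with a small Python model of the rule "repeat the whole argument once for each unique binding". This is a sketch under stated assumptions, not pmie's code: the function names are hypothetical, only %h and %i are handled, and the trailing blank of a repeated argument is kept in the result.

```python
def expand(argument, bindings):
    """Repeat `argument` once per unique (host, instance) binding it refers to."""
    uses_host = "%h" in argument
    uses_inst = "%i" in argument
    if not uses_host and not uses_inst:
        return argument  # constant argument: emitted once
    seen, pieces = set(), []
    for host, inst in bindings:
        key = (host if uses_host else None, inst if uses_inst else None)
        if key in seen:
            continue
        seen.add(key)
        pieces.append(argument.replace("%h", host).replace("%i", inst))
    return "".join(pieces)


def action_string(arguments, bindings):
    """Concatenate the expansion of every argument of one action."""
    return "".join(expand(arg, bindings) for arg in arguments)


if __name__ == "__main__":
    bindings = [("mumble", "grunt"), ("mumble", "snort"), ("fumble", "puff")]
    print(action_string(["Warning: %h:%i busy "], bindings))
    print(action_string(["Warning! busy:", " %h:%i"], bindings))
```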
481
The semantics of the expansion of the special selectors leads to a common
usage pattern in an action, where the first argument is a constant (contains
no special selectors), the second argument contains the desired
special selectors with minimal separator characters, and an optional
third argument provides a constant postscript (e.g. to terminate any
argument quoting from the first argument). If necessary, post-processing
(e.g. in myscript) can provide the necessary enumeration over each
unique expansion of the string containing just the special selectors.
490
For complex conditions, the bindings to these selectors are not obvious.
It is strongly recommended that pmie be used in the debugging mode
(specify the -W command line option in particular) during rule development.
495
497 pmie expressions that have the semantics of a Boolean, e.g. foo.bar >
498 10 or some_inst ( my.table < 0 ) are assigned the values true or false
499 or unknown. A value is unknown if one or more of the underlying metric
500 values is unavailable, e.g. pmcd(1) on the host cannot be contacted,
501 the metric is not in the PCP archive, no values are currently avail‐
502 able, insufficient values have been fetched to allow a rate converted
503 value to be computed or insufficient values have been fetched to
504 instantiate the required number of samples in the temporal domain.
505
506 Boolean operators follow the normal rules of Kleene logic (aka 3-valued
507 logic) when combining values that include unknown:
508
509 ┌────────────┬───────────────────────────┐
510 │ │ B │
511 │ A and B ├─────────┬───────┬─────────┤
512 │ │ true │ false │ unknown │
513 ├──┬─────────┼─────────┼───────┼─────────┤
514 │ │ true │ true │ false │ unknown │
515 │ ├─────────┼─────────┼───────┼─────────┤
516 │A │ false │ false │ false │ false │
517 │ ├─────────┼─────────┼───────┼─────────┤
518 │ │ unknown │ unknown │ false │ unknown │
519 └──┴─────────┴─────────┴───────┴─────────┘
520 ┌────────────┬──────────────────────────┐
521 │ │ B │
522 │ A or B ├──────┬─────────┬─────────┤
523 │ │ true │ false │ unknown │
524 ├──┬─────────┼──────┼─────────┼─────────┤
525 │ │ true │ true │ true │ true │
526 │ ├─────────┼──────┼─────────┼─────────┤
527 │A │ false │ true │ false │ unknown │
528 │ ├─────────┼──────┼─────────┼─────────┤
529 │ │ unknown │ true │ unknown │ unknown │
530 └──┴─────────┴──────┴─────────┴─────────┘
531 ┌────────┬─────────┐
532 │ A │ not A │
533 ├────────┼─────────┤
534 │ true │ false │
535 ├────────┼─────────┤
536 │ false │ true │
537 ├────────┼─────────┤
538 │unknown │ unknown │
539 └────────┴─────────┘
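
The three truth tables above can be written as a few lines of Python, using None to stand for unknown. The function names are hypothetical, and the sketch models only the logic, not pmie's evaluator.

```python
def k_not(a):
    """Kleene negation: unknown stays unknown."""
    return None if a is None else not a


def k_and(a, b):
    """Kleene conjunction: false dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a, b):
    """Kleene disjunction: true dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


if __name__ == "__main__":
    for a in (True, False, None):
        for b in (True, False, None):
            print(a, b, k_and(a, b), k_or(a, b))
```
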
541 The ruleset clause is used to define a set of rules and actions that
542 are evaluated in order until some action is executed, at which point
543 the remaining rules and actions are skipped until the ruleset is again
544 scheduled for evaluation. The keyword else is used to separate rules.
545 After one or more regular rules (with a predicate and an action), a
546 ruleset may include an optional
547 unknown -> action
clause, optionally followed by an
549 otherwise -> action
550 clause.
551
If all of the predicates in the rules evaluate to unknown and an
unknown clause has been specified then the action associated with the
unknown clause will be executed.
555
556 If no rule predicate is true and the unknown action is either not spec‐
557 ified or not executed and an otherwise clause has been specified, then
558 the action associated with the otherwise clause will be executed.
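
The selection order described above can be modelled in Python as follows. This is a simplified sketch, not pmie's scheduler: the function name run_ruleset is hypothetical, predicate values are True, False or None (unknown), action suppression times are ignored, and the function returns the label of the chosen action rather than executing anything.

```python
def run_ruleset(rules, unknown=None, otherwise=None):
    """rules: list of (predicate_value, action_label) in evaluation order."""
    for value, action in rules:
        if value is True:
            return action  # first true predicate wins; the rest are skipped
    if unknown is not None and all(value is None for value, _ in rules):
        return unknown  # every predicate was unknown
    return otherwise  # may be None when no otherwise clause is given


if __name__ == "__main__":
    print(run_ruleset([(False, "extreme"), (True, "moderate")],
                      unknown="unavailable", otherwise="OK"))
    print(run_ruleset([(None, "extreme"), (None, "moderate")],
                      unknown="unavailable", otherwise="OK"))
    print(run_ruleset([(False, "extreme"), (None, "moderate")],
                      unknown="unavailable", otherwise="OK"))
```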
559
561 Scale factors may be appended to arithmetic expressions and force lin‐
562 ear scaling of the value to canonical units. Simple scale factors are
563 constructed from the keywords: nanosecond, nanosec, nsec, microsecond,
564 microsec, usec, millisecond, millisec, msec, second, sec, minute, min,
565 hour, byte, Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount and Mcount, and
566 the operator /, for example ``Kbytes / hour''.
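
To illustrate what linear scaling to canonical units means, the Python sketch below turns a simple scale factor into a multiplier. It is a model under stated assumptions rather than pmie's own table: the function names are hypothetical, Kbyte is assumed to be 1024 bytes (and so on in powers of 1024), Kcount and Mcount are assumed to be 1e3 and 1e6, and plural spellings such as "Kbytes" are assumed to be accepted.

```python
# Multipliers to canonical units (bytes, seconds, counts).
# The byte and count multipliers are assumptions, see the text above.
UNITS = {
    "nanosecond": 1e-9, "nanosec": 1e-9, "nsec": 1e-9,
    "microsecond": 1e-6, "microsec": 1e-6, "usec": 1e-6,
    "millisecond": 1e-3, "millisec": 1e-3, "msec": 1e-3,
    "second": 1.0, "sec": 1.0,
    "minute": 60.0, "min": 60.0,
    "hour": 3600.0,
    "byte": 1.0, "Kbyte": 1024.0, "Mbyte": 1024.0 ** 2,
    "Gbyte": 1024.0 ** 3, "Tbyte": 1024.0 ** 4,
    "count": 1.0, "Kcount": 1e3, "Mcount": 1e6,
}


def unit_factor(word):
    """Multiplier for one keyword, tolerating a plural 's' (assumption)."""
    word = word.strip()
    if word not in UNITS and word.endswith("s"):
        word = word[:-1]
    return UNITS[word]


def scale_factor(text):
    """Multiplier for 'unit' or 'unit / unit'."""
    numerator, slash, denominator = text.partition("/")
    factor = unit_factor(numerator)
    if slash:
        factor /= unit_factor(denominator)
    return factor


if __name__ == "__main__":
    print(scale_factor("Kbytes / hour"))    # bytes per second
    print(10 * scale_factor("Kcount/sec"))  # counts per second
```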
567
569 Macros are defined using expressions of the form:
570
571 name = constexpr;
572
Where name follows the normal rules for variables in programming languages,
i.e. an alphabetic character optionally followed by alphanumerics. constexpr
must be a constant expression, either a string (enclosed in double
quotes) or an arithmetic expression optionally followed by a scale factor.
578
579 Macros are expanded when their name, prefixed by a dollar ($) appears
580 in an expression, and macros may be nested within a constexpr string.
581
582 The following reserved macro names are understood.
583
584 minute Current minute of the hour.
585
586 hour Current hour of the day, in the range 0 to 23.
587
588 day Current day of the month, in the range 1 to 31.
589
590 month Current month of the year, in the range 0 (January) to 11
591 (December).
592
593 year Current year.
594
595 day_of_week
596 Current day of the week, in the range 0 (Sunday) to 6 (Satur‐
597 day).
598
599 delta Sample interval in effect for this expression.
600
601 Dates and times are presented in the reporting time zone (see descrip‐
602 tion of -Z and -z command line options above).
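
The ranges of the reserved macros differ from some common conventions (months start at 0 and the week starts on Sunday). The Python sketch below computes the same fields from a datetime to make the ranges concrete. The function name reserved_macros is hypothetical, and treating year as the full four-digit year is an assumption.

```python
from datetime import datetime


def reserved_macros(now, delta_seconds):
    """Values of the reserved macro names for a given moment (illustrative)."""
    return {
        "minute": now.minute,
        "hour": now.hour,                        # 0 .. 23
        "day": now.day,                          # 1 .. 31
        "month": now.month - 1,                  # 0 (January) .. 11 (December)
        "year": now.year,                        # assumed to be the full year
        "day_of_week": (now.weekday() + 1) % 7,  # 0 (Sunday) .. 6 (Saturday)
        "delta": delta_seconds,
    }


if __name__ == "__main__":
    # Tue Feb 6 19:55:10 2001
    print(reserved_macros(datetime(2001, 2, 6, 19, 55, 10), 20.0))
```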
603
It is often useful for pmie processes to be started and stopped when
the local host is booted or shut down, or when they have been detected
as no longer running (when they have unexpectedly exited for some reason).
Refer to pmie_check(1) for details on automating this process.
609
611 It is common for production systems to be monitored in a central loca‐
612 tion. Traditionally on UNIX systems this has been performed by the
613 system log facilities - see logger(1), and syslogd(1). On Windows,
614 communication with the system event log is handled by pcp-eventlog(1).
615
616 pmie fits into this model when rules use the syslog action. Note that
617 if the action string begins with -p (priority) and/or -t (tag) then
618 these are extracted from the string and treated in the same way as in
619 logger(1) and pcp-eventlog(1).
620
However, it is common to have other event monitoring frameworks also,
into which you may wish to incorporate performance events from pmie.
You can often use the shell action to send events to these frameworks,
as they usually provide a program for injecting events into the
framework from external sources.
626
A final option is use of the stomp (Streaming Text Oriented Messaging
Protocol) action, which allows pmie to connect to a central JMS (Java
Message Service) server and send events to the PMIE topic. Tools can
be written to extract these text messages and present them to operations
people (via desktop popup windows, etc). Use of the stomp action
requires a stomp configuration file to be set up, which specifies the
location of the JMS server host, port number, and username/password.
634
635 The format of this file is as follows:
636
    host=messages.sgi.com   # this is the JMS server (required)
    port=61616              # and it's listening here (required)
    timeout=2               # seconds to wait for server (optional)
    username=joe            # (required)
    password=j03ST0MP       # (required)
    topic=PMIE              # JMS topic for pmie messages (optional)
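
A file in this format can be checked before pmie is started. The Python sketch below is one possible validator, not pmie's own parser: the function name parse_stomp_config is hypothetical, it assumes that '#' always starts a comment (so a value containing '#' would be truncated), and it assumes PMIE as the topic when none is given.

```python
REQUIRED = ("host", "port", "username", "password")


def parse_stomp_config(text):
    """Parse key=value lines, check required keys, return a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value, got: " + line)
        settings[key.strip()] = value.strip()
    missing = [key for key in REQUIRED if key not in settings]
    if missing:
        raise ValueError("missing required setting(s): " + ", ".join(missing))
    settings["port"] = int(settings["port"])
    settings.setdefault("topic", "PMIE")  # assumed default topic
    return settings


if __name__ == "__main__":
    sample = """host=messages.sgi.com   # this is the JMS server
port=61616
timeout=2
username=joe
password=j03ST0MP
"""
    print(parse_stomp_config(sample))
```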
643
644 The timeout value specifies the time (in seconds) that pmie should wait
645 for acknowledgements from the JMS server after sending a message (as
646 required by the STOMP protocol). Note that on startup, pmie will wait
647 indefinitely for a connection, and will not begin rule evaluation until
648 that initial connection has been established. Should the connection to
649 the JMS server be lost at any time while pmie is running, pmie will
650 attempt to reconnect on each subsequent truthful evaluation of a rule
651 with a stomp action, but not more than once per minute. This is to
652 avoid contributing to network congestion. In this situation, where the
653 STOMP connection to the JMS server has been severed, the stomp action
654 will return a non-zero error value.
655
657 $PCP_DEMOS_DIR/pmie/*
658 annotated example rules
659 $PCP_VAR_DIR/pmns/*
660 default PMNS specification files
661 $PCP_TMP_DIR/pmie
662 pmie maintains files in this directory to identify the run‐
663 ning pmie instances and to export runtime information about
664 each instance - this data forms the basis of the pmcd.pmie
665 performance metrics
666 $PCP_PMIECONTROL_PATH
667 the default set of pmie instances to start at boot time -
668 refer to pmie_check(1) for details
669
671 The lexical scanner and parser will attempt to recover after an error
672 in the input expressions. Parsing resumes after skipping input up to
673 the next semi-colon (;), however during this skipping process the scan‐
674 ner is ignorant of comments and strings, so an embedded semi-colon may
675 cause parsing to resume at an unexpected place. This behavior is
676 largely benign, as until the initial syntax error is corrected, pmie
677 will not attempt any expression evaluation.
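
The effect of this recovery strategy can be seen with a small Python model. It is an illustration of "skip to the next semi-colon while ignoring strings and comments", not the actual scanner: the function name resume_offset and the sample input are hypothetical.

```python
def resume_offset(text, error_pos):
    """Index just past the next ';' at or after error_pos, or len(text)."""
    i = text.find(";", error_pos)
    return len(text) if i < 0 else i + 1


if __name__ == "__main__":
    # The first ';' is inside a string, so parsing resumes mid-string.
    text = 'oops -> print "a; b"; delta = 1 sec;'
    print(repr(text[resume_offset(text, 0):]))
```

Here parsing would resume inside the quoted string, which is why a single syntax error can be followed by further, misleading error messages.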
678
680 Environment variables with the prefix PCP_ are used to parameterize the
681 file and directory names used by PCP. On each installation, the file
682 /etc/pcp.conf contains the local values for these variables. The
683 $PCP_CONF variable may be used to specify an alternative configuration
684 file, as described in pcp.conf(5).
685
687 logger(1).
688
690 pcp-eventlog(1).
691
693 PCPIntro(1), pmcd(1), pmdumplog(1), pmieconf(1), pmie_check(1),
694 pminfo(1), pmlogger(1), pmval(1), PMAPI(3), pcp.conf(5) and pcp.env(5).
695
For a more complete description of the pmie language, refer to the Performance
Co-Pilot User's and Administrator's Guide. This is available
online from:
    http://www.pcp.io/doc/pcp-users-and-administrators-guide.pdf

Performance Co-Pilot                  PCP                          PMIE(1)