PMIE(1)                  General Commands Manual                  PMIE(1)

pmie - inference engine for performance metrics

pmie [-bCdefHPqvVWxXz?] [-a archive] [-A align] [-c config] [-h host]
[-l logfile] [-j stompfile] [-n pmnsfile] [-O origin] [-S starttime]
[-t interval] [-T endtime] [-U username] [-Z timezone] [filename ...]

pmie accepts a collection of arithmetic, logical, and rule expressions
to be evaluated at specified frequencies. The base data for the
expressions consists of performance metric values, either delivered in
real time from any host running the Performance Metrics Collection
Daemon (PMCD) or read from Performance Co-Pilot (PCP) archive logs.

As well as computing arithmetic and logical values, pmie can execute
actions (raise popup alarms, write system log messages, and launch
programs) in response to specified conditions. Such actions are useful
in detecting, monitoring and correcting performance-related problems.

The expressions to be evaluated are read from configuration files
specified by one or more filename arguments. In the absence of any
filename, expressions are read from standard input.

Output from pmie is directed to standard output and standard error as
follows:

stdout
    Expression values printed in the verbose -v mode and the output of
    print actions.

stderr
    Error and warning messages for any syntactic or semantic problems
    during expression parsing, and any semantic or performance metrics
    availability problems during expression evaluation.

The available command line options are:

-a archive, --archive=archive
    archive is a comma-separated list of names, each of which may be
    the base name of an archive or the name of a directory containing
    one or more archives written by pmlogger(1). Multiple instances of
    the -a flag may appear on the command line to specify a list of
    sets of archives. In this case, only one set of archives may be
    present for any one host. Also, any explicit host names occurring
    in a pmie expression must match the host name recorded in one of
    the archive labels. In the case of multiple sets of archives,
    timestamps recorded in the archives are used to ensure temporal
    consistency.

-A align, --align=align
    Force the initial time window to be aligned on the boundary of a
    natural time unit align. Refer to PCPIntro(1) for a complete
    description of the syntax for align.

-b, --buffer
    Output will be line buffered and standard output is attached to
    standard error. This is most useful for background execution in
    conjunction with the -l option. The -b option is always used for
    pmie instances launched from pmie_check(1).

-c config, --config=config
    An alternative to specifying filename at the end of the command
    line.

-C, --check
    Parse the configuration file(s) and exit before performing any
    evaluations. Any errors in the configuration file are reported.

-d, --interact
    Normally pmie would be launched as a non-interactive process to
    monitor and manage the performance of one or more hosts. Given
    the -d flag, however, execution is interactive and the user is
    presented with a menu of options. Interactive mode is useful
    mainly for debugging new expressions.

-e, --timestamp
    When used with -V, -v or -W, this option forces timestamps to be
    reported with each expression. The timestamps are in ctime(3)
    format, enclosed in parentheses, and appear after the expression
    name and before the expression value, e.g.
        expr_1 (Tue Feb  6 19:55:10 2001): 12

-f, --foreground
    If the -l option is specified and there is no -a option (i.e.
    real-time monitoring) then pmie is run as a daemon in the
    background (in all other cases foreground is the default). The -f
    option forces pmie to be run in the foreground, independent of any
    other options.

-h host, --host=host
    By default performance data is fetched from the local host (in
    real-time mode) or the host for the first named set of archives on
    the command line (in archive mode). The host argument overrides
    this default. It does not override hosts explicitly named in the
    expressions being evaluated. The host argument is interpreted as
    a connection specification for pmNewContext, and is later mapped
    to the remote pmcd's self-reported host name for reporting
    purposes. See also the %h vs. %c substitutions in rule action
    strings below.

-l logfile, --logfile=logfile
    Standard error is sent to logfile.

-j stompfile
    An alternative STOMP protocol configuration is loaded from
    stompfile. If this option is not used, and the stomp action is
    used in any rule, the default location
    $PCP_SYSCONF_DIR/pmie/config/stomp will be used.

-n pmnsfile, --namespace=pmnsfile
    An alternative Performance Metrics Name Space (PMNS) is loaded
    from the file pmnsfile.

-O origin, --origin=origin
    Specify the origin of the time window. See PCPIntro(1) for a
    complete description of this option.

-P, --primary
    Identifies this as the primary pmie instance for a host. See the
    ``AUTOMATIC RESTART'' section below for further details.

-q, --quiet
    Suppresses diagnostic messages that would be printed to standard
    output by default, especially the "evaluator exiting" message as
    this can confuse scripts.

-S starttime, --start=starttime
    Specify the starttime of the time window. See PCPIntro(1) for a
    complete description of this option.

-t interval, --interval=interval
    The interval argument follows the syntax described in PCPIntro(1),
    and in the simplest form may be an unsigned integer (the implied
    units in this case are seconds). The value is used to determine
    the sample interval for expressions that do not explicitly set
    their sample interval using the pmie variable delta described
    below. The default is 10.0 seconds.

-T endtime, --finish=endtime
    Specify the endtime of the time window. See PCPIntro(1) for a
    complete description of this option.

-U username, --username=username
    User account under which to run pmie. The default is the current
    user account for interactive use. When run as a daemon, the
    unprivileged "pcp" account is used in current versions of PCP, but
    in older versions the superuser account ("root") was used by
    default.

-v
    Unless one of the verbose options -V, -v or -W appears on the
    command line, expressions are evaluated silently; the only output
    is the result of any actions being executed. In the verbose mode,
    specified using the -v flag, the value of each expression is
    printed as it is evaluated. The values are in canonical units:
    bytes in the dimension of ``space'', seconds in the dimension of
    ``time'' and events in the dimension of ``count''. See
    pmLookupDesc(3) for details of the supported dimension and scaling
    mechanisms for performance metrics. The verbose mode is useful for
    monitoring the value of given expressions, evaluating derived
    performance metrics, passing these values on to other tools for
    further processing, and debugging new expressions.

-V, --verbose
    This option has the same effect as the -v option, except that the
    name of the host and instance (if applicable) are printed as well
    as expression values.

-W
    This option has the same effect as the -V option described above,
    except that for boolean expressions, only those names and values
    that make the expression true are printed. These are the same
    names and values accessible to rule actions as the %h, %i, %c and
    %v bindings, as described below.

-x, --secret-agent
    Execute in domain agent mode. This mode is used within the
    Performance Co-Pilot product to derive values for summary metrics,
    see pmdasummary(1). Only restricted functionality is available in
    this mode (expressions with actions may not be used).

-X, --secret-applet
    Run in secret applet mode (thin client).

-z, --hostzone
    Change the reporting timezone to the timezone of the host that is
    the source of the performance metrics, as identified via either
    the -h option or the first named set of archives (as described
    above for the -a option).

-Z timezone, --timezone=timezone
    Change the reporting timezone to timezone in the format of the
    environment variable TZ as described in environ(7).

-?, --help
    Display usage message and exit.

The following example expressions demonstrate some of the capabilities
of the inference engine.

The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
examples of pmie expressions.

The variable delta controls expression evaluation frequency. Specify
that subsequent expressions be evaluated once a second, until further
notice:

    delta = 1 sec;

If the total context switch rate exceeds 10000 per second per CPU, then
display an alarm notifier:

    kernel.all.pswitch / hinv.ncpu > 10000 count/sec
    -> alarm "high context switch rate %v";

If the high context switch rate is sustained for 10 consecutive
samples, then launch top(1) in an xterm(1) window to monitor processes,
but do this at most once every 5 minutes:

    all_sample (
        kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
    ) -> shell 5 min "xterm -e 'top'";

The following rules are evaluated once every 20 seconds:

    delta = 20 sec;

If any disk is performing more than 60 I/Os per second, then print a
message identifying the busy disk to standard output and launch
dkvis(1):

    some_inst (
        disk.dev.total > 60 count/sec
    ) -> print "busy disks:" " %i" &
         shell 5 min "dkvis";

Refine the preceding rule to apply only between the hours of 9am and
5pm, and to require 3 of 4 consecutive samples to exceed the threshold
before executing the action:

    $hour >= 9 && $hour <= 17 &&
    some_inst (
        75 %_sample (
            disk.dev.total @0..3 > 60 count/sec
        )
    ) -> print "disks busy for 20 sec:" " [%h]%i";

The following two rules are evaluated once every 10 minutes:

    delta = 10 min;

If either the / or the /usr filesystem is more than 95% full, display
an alarm popup, but not if it has already been displayed during the
last 4 hours:

    filesys.free #'/dev/root' /
        filesys.capacity #'/dev/root' < 0.05
    -> alarm 4 hour "root filesystem (almost) full";

    filesys.free #'/dev/usr' /
        filesys.capacity #'/dev/usr' < 0.05
    -> alarm 4 hour "/usr filesystem (almost) full";

The following rule requires a machine that supports the lmsensors
metrics. If the machine environment temperature rises more than 2
degrees over a 10 minute interval, raise an alarm and write an entry in
the system log:

    lmsensors.coretemp_isa.temp1 @0 - lmsensors.coretemp_isa.temp1 @1 > 2
    -> alarm "temperature rising fast" &
       syslog "machine room temperature rise alarm";

And something interesting if you have performance problems with your
Oracle database:

    // back to 30sec evaluations
    delta = 30 sec;
    sid = "ptg1";       # $ORACLE_SID setting
    lid = "223";        # latch ID from v$latch
    lru = "#'$sid/$lid cache buffers lru chain'";
    host = ":moomba.melbourne.sgi.com";
    gets = "oracle.latch.gets $host $lru";
    total = "oracle.latch.gets $host $lru +
             oracle.latch.misses $host $lru +
             oracle.latch.immisses $host $lru";

    $total > 100 && $gets / $total < 0.2
    -> alarm "high lru latch contention in database $sid";

The following ruleset will emit exactly one message depending on the
availability and value of the 1-minute load average.

    delta = 1 minute;
    ruleset
        kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
            print "extreme load average %v"
    else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
            print "moderate load average %v"
    unknown ->
            print "load average unavailable"
    otherwise ->
            print "load average OK"
    ;

The following rule will emit a message when some filesystem is more
than 75% full and is filling at a rate that, if sustained, would fill
the filesystem to 100% in less than 30 minutes.

    some_inst (
        100 * filesys.used / filesys.capacity > 75 &&
        filesys.used + 30min * (rate filesys.used) > filesys.capacity
    ) -> print "filesystem will be full within 30 mins:" " %i";
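
As a worked check of the arithmetic in the filesystem rule above, the
following Python sketch evaluates the same two conditions for a single
filesystem. The function name and the sample numbers are hypothetical;
the sketch models only the inequality, not how pmie fetches or scales
metric values.

```python
def will_fill_within(used, capacity, rate_per_sec, horizon_sec=30 * 60):
    """Return True when both conditions of the rule hold.

    used and capacity share one unit (e.g. Kbytes); rate_per_sec is the
    growth of used in that unit per second.
    """
    over_threshold = 100 * used / capacity > 75
    projected = used + horizon_sec * rate_per_sec
    return over_threshold and projected > capacity


# 80% full and growing by 150 Kbytes/sec: 800000 + 1800 * 150 = 1070000
print(will_fill_within(800_000, 1_000_000, 150))   # True
# Same usage, slower growth: 800000 + 1800 * 50 = 890000
print(will_fill_within(800_000, 1_000_000, 50))    # False
```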

If the metric mypmda.errors counts errors, then the following rule will
emit a message if the rate of errors exceeds 1 per second, provided the
error count is less than 100.

    mypmda.errors > 1 && instant mypmda.errors < 100
    -> print "high error rate: %v";

The pmie specification language is powerful and large.

To expedite rapid development of pmie rules, the pmieconf(1) tool
provides a facility for generating a pmie configuration file from a set
of generalized pmie rules. The supplied set of rules covers a wide
range of performance scenarios.

The Performance Co-Pilot User's and Administrator's Guide provides a
detailed tutorial-style chapter covering pmie.

This description is terse and informal. For a more comprehensive
description see the Performance Co-Pilot User's and Administrator's
Guide.

A pmie specification is a sequence of semicolon terminated expressions.

Basic operators are modeled on the arithmetic, relational and Boolean
operators of the C programming language. Precedence rules are as
expected, although the use of parentheses is encouraged to enhance
readability and remove ambiguity.

Operands are performance metric names (see PMNS(5)) and the normal
literal constants.

Operands involving performance metrics may produce sets of values, as a
result of enumeration in the dimensions of hosts, instances and time.
Special qualifiers may appear after a performance metric name to define
the enumeration in each dimension. For example,

    kernel.percpu.cpu.user :foo :bar #cpu0 @0..2

defines 6 values corresponding to the time spent executing in user mode
on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3 consecutive
samples. The default interpretation in the absence of : (host), #
(instance) and @ (time) qualifiers is all instances at the most recent
sample time for the default source of PCP performance metrics.
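
The count of 6 in the example above is the product of the sizes of the
three dimensions. A minimal Python sketch of that enumeration follows;
the variable names are illustrative only and nothing here is part of
pmie itself.

```python
from itertools import product

hosts = ["foo", "bar"]   # from the :foo :bar qualifiers
instances = ["cpu0"]     # from the #cpu0 qualifier
samples = [0, 1, 2]      # from the @0..2 qualifier

# One value per (host, instance, sample) combination.
values = list(product(hosts, instances, samples))
for host, inst, sample in values:
    print(f"{host}:{inst}@{sample}")
print(len(values))  # 6
```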

Host and instance names that do not follow the rules for variables in
programming languages, i.e. alphabetic optionally followed by
alphanumerics, should be enclosed in single quotes.

Expression evaluation follows the law of ``least surprises''. Where
performance metrics have the semantics of a counter, pmie will
automatically convert to a rate based upon consecutive samples and the
time interval between these samples. All numeric expressions are
evaluated in double precision, and where appropriate, automatically
scaled into canonical units of ``bytes'', ``seconds'' and ``counts''.
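
The rate conversion described above amounts to dividing the change in
a counter by the elapsed time between two samples. The Python sketch
below illustrates that calculation under simplifying assumptions: the
function name is hypothetical, counter wrap is ignored, and None stands
in for an unavailable result.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Per-second rate between two consecutive counter samples.

    Times are in seconds. Returns None when the interval is not
    positive, since no rate can be computed in that case.
    """
    interval = curr_time - prev_time
    if interval <= 0:
        return None
    return (curr_value - prev_value) / interval


# A counter that advances by 100000 over 10 seconds.
print(counter_rate(150_000, 0.0, 250_000, 10.0))  # 10000.0
```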

A rule is a special form of expression that specifies a condition or
logical expression, a special operator (->) and actions to be performed
when the condition is found to be true.

The following table summarizes the basic pmie operators:

┌────────────────┬────────────────────────────────────────────────┐
│ Operators      │ Explanation                                    │
├────────────────┼────────────────────────────────────────────────┤
│+ - * /         │ Arithmetic                                     │
│< <= == >= > != │ Relational (value comparison)                  │
│! && ||         │ Boolean                                        │
│->              │ Rule                                           │
│rising          │ Boolean, false to true transition              │
│falling         │ Boolean, true to false transition              │
│rate            │ Explicit rate conversion (rarely required)     │
│instant         │ No automatic rate conversion (rarely required) │
└────────────────┴────────────────────────────────────────────────┘

All operators are supported for numeric-valued operands and
expressions. For string-valued operands, namely literal string
constants enclosed in double quotes or metrics with a data type of
string (PM_TYPE_STRING), only the operators == and != are supported.

The rate and instant operators are the logical inverse of one another,
so an arithmetic expression expr is equal to rate instant expr. The
more useful cases involve using rate with a metric that is not a
counter to determine the rate of change over time or instant with a
metric that is a counter to determine if the current value is above or
below some threshold.

Aggregate operators may be used to aggregate or summarize along one
dimension of a set-valued expression. The following aggregate
operators map from a logical expression to a logical expression of
lower dimension.

┌─────────────────────────┬─────────────┬──────────────────────────┐
│ Operators               │ Type        │ Explanation              │
├─────────────────────────┼─────────────┼──────────────────────────┤
│some_inst                │ Existential │ True if at least one set │
│some_host                │             │ member is true in the    │
│some_sample              │             │ associated dimension     │
├─────────────────────────┼─────────────┼──────────────────────────┤
│all_inst                 │ Universal   │ True if all set members  │
│all_host                 │             │ are true in the          │
│all_sample               │             │ associated dimension     │
├─────────────────────────┼─────────────┼──────────────────────────┤
│N%_inst                  │ Percentile  │ True if at least N       │
│N%_host                  │             │ percent of set members   │
│N%_sample                │             │ are true in the          │
│                         │             │ associated dimension     │
└─────────────────────────┴─────────────┴──────────────────────────┘
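
The three families of logical aggregate operators in the table above
can be modelled over a plain list of truth values for one dimension.
The Python sketch below is such a model; the function names are
hypothetical, unknown values are not handled, and the treatment of an
empty set is an assumption of the sketch.

```python
def some(values):
    """Existential: true if at least one set member is true."""
    return any(values)


def every(values):
    """Universal: true if all set members are true."""
    return all(values)


def percent(n, values):
    """Percentile: true if at least n percent of set members are true.

    An empty set yields False here (an assumption of this sketch).
    """
    values = list(values)
    return bool(values) and 100 * sum(values) >= n * len(values)


# Four consecutive samples, three of which exceed a threshold.
samples = [True, True, False, True]
print(some(samples), every(samples), percent(75, samples))
```

With 3 of 4 samples true, the 75 percent test succeeds, which is the
situation the earlier disk example requires before its action runs.
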
The following instantial operators may be used to filter or limit a
set-valued logical expression, based on regular expression matching of
instance names. The logical expression must be a set involving the
dimension of instances, and the regular expression is of the form used
by egrep(1) or the Extended Regular Expressions of regcomp(3).

┌─────────────┬──────────────────────────────────────────┐
│ Operators   │ Explanation                              │
├─────────────┼──────────────────────────────────────────┤
│match_inst   │ For each value of the logical expression │
│             │ that is ``true'', the result is ``true'' │
│             │ if the associated instance name matches  │
│             │ the regular expression. Otherwise the    │
│             │ result is ``false''.                     │
├─────────────┼──────────────────────────────────────────┤
│nomatch_inst │ For each value of the logical expression │
│             │ that is ``true'', the result is ``true'' │
│             │ if the associated instance name does not │
│             │ match the regular expression. Otherwise  │
│             │ the result is ``false''.                 │
└─────────────┴──────────────────────────────────────────┘

For example, the expression below will be ``true'' for disks attached
to controllers 2 or 3 performing more than 20 operations per second:

    match_inst "^dks[23]d" disk.dev.total > 20;
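
The filtering done by match_inst and nomatch_inst can be sketched in
Python as follows. This is a model only: the function names mirror the
operators, the disk names are hypothetical, and Python's re module is
used in place of POSIX Extended Regular Expressions, which differ in
some details.

```python
import re


def match_inst(pattern, truth_by_instance):
    """Keep a value true only if its instance name matches pattern."""
    regex = re.compile(pattern)
    return {name: bool(value and regex.search(name))
            for name, value in truth_by_instance.items()}


def nomatch_inst(pattern, truth_by_instance):
    """Keep a value true only if its instance name does not match."""
    regex = re.compile(pattern)
    return {name: bool(value and not regex.search(name))
            for name, value in truth_by_instance.items()}


# Hypothetical result of "disk.dev.total > 20" for three disks.
busy = {"dks1d1": True, "dks2d1": True, "dks3d4": False}
print(match_inst("^dks[23]d", busy))
print(nomatch_inst("^dks[23]d", busy))
```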

The following aggregate operators map from an arithmetic expression to
an arithmetic expression of lower dimension.

┌─────────────────────────┬───────────┬──────────────────────────┐
│ Operators               │ Type      │ Explanation              │
├─────────────────────────┼───────────┼──────────────────────────┤
│min_inst                 │ Extrema   │ Minimum value across all │
│min_host                 │           │ set members in the       │
│min_sample               │           │ associated dimension     │
├─────────────────────────┼───────────┼──────────────────────────┤
│max_inst                 │ Extrema   │ Maximum value across all │
│max_host                 │           │ set members in the       │
│max_sample               │           │ associated dimension     │
├─────────────────────────┼───────────┼──────────────────────────┤
│sum_inst                 │ Aggregate │ Sum of values across all │
│sum_host                 │           │ set members in the       │
│sum_sample               │           │ associated dimension     │
├─────────────────────────┼───────────┼──────────────────────────┤
│avg_inst                 │ Aggregate │ Average value across all │
│avg_host                 │           │ set members in the       │
│avg_sample               │           │ associated dimension     │
└─────────────────────────┴───────────┴──────────────────────────┘

The aggregate operators count_inst, count_host and count_sample map
from a logical expression to an arithmetic expression of lower
dimension by counting the number of set members for which the
expression is true in the associated dimension.

For action rules, the following actions are defined:

┌──────────┬────────────────────────────────────────┐
│Action    │ Explanation                            │
├──────────┼────────────────────────────────────────┤
│alarm     │ Raise a visible alarm with xconfirm(1) │
│print     │ Display on standard output             │
│shell     │ Execute with sh(1)                     │
│stomp     │ Send a STOMP message to a JMS server   │
│syslog    │ Append a message to system log file    │
└──────────┴────────────────────────────────────────┘

Multiple actions may be separated by the & and | operators to specify
respectively sequential execution (both actions are executed) and
alternate execution (the second action will only be executed if the
execution of the first action returns a non-zero error status).

Arguments to actions are an optional suppression time, and then one or
more expressions (a string is an expression in this context). Strings
appearing as arguments to an action may include the following special
selectors that will be replaced at the time the action is executed.

%h  Host name(s) that make the left-most top-level expression in the
    condition true.

%c  Connection specification string(s) or files for a PCP tool to reach
    the hosts or archives that make the left-most top-level expression
    in the condition true.

%i  Instance(s) that make the left-most top-level expression in the
    condition true.

%v  One value from the left-most top-level expression in the condition
    for each host and instance pair that makes the condition true.

Note that expansion of the special selectors is done by repeating the
whole argument once for each unique binding to any of the qualifying
special selectors. For example if a rule were true for the host mumble
with instances grunt and snort, and for host fumble the instance puff
makes the rule true, then the action

    ...
    -> shell myscript "Warning: %h:%i busy ";

will execute myscript with the argument string "Warning: mumble:grunt
busy Warning: mumble:snort busy Warning: fumble:puff busy".

By comparison, if the action

    ...
    -> shell myscript "Warning! busy:" " %h:%i";

were executed under the same circumstances, then myscript would be
executed with the argument string "Warning! busy: mumble:grunt
mumble:snort fumble:puff".
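
The repetition rule described above can be modelled in a few lines of
Python. The sketch below is an illustration only: the expand function
is hypothetical, it handles just the %h and %i selectors, and it
assumes the bindings arrive as an ordered list of (host, instance)
pairs.

```python
def expand(arguments, bindings):
    """Expand %h and %i in each action argument.

    An argument with no selector is copied once. An argument with
    selectors is repeated once per unique binding of the selectors it
    actually uses. The results are concatenated in order.
    """
    parts = []
    for arg in arguments:
        used = [s for s in ("%h", "%i") if s in arg]
        if not used:
            parts.append(arg)
            continue
        seen = []
        for host, inst in bindings:
            key = tuple(v for s, v in (("%h", host), ("%i", inst))
                        if s in used)
            if key in seen:
                continue
            seen.append(key)
            parts.append(arg.replace("%h", host).replace("%i", inst))
    return "".join(parts)


bindings = [("mumble", "grunt"), ("mumble", "snort"), ("fumble", "puff")]
print(expand(["Warning: %h:%i busy "], bindings))
print(expand(["Warning! busy:", " %h:%i"], bindings))
```

The two calls reproduce the two argument strings discussed above (the
first keeps the trailing space that is part of the action string).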

The semantics of the expansion of the special selectors leads to a
common usage pattern in an action, where one argument is a constant
(contains no special selectors), the second argument contains the
desired special selectors with minimal separator characters, and an
optional third argument provides a constant postscript (e.g. to
terminate any argument quoting from the first argument). If necessary,
post-processing (e.g. in myscript) can provide the necessary
enumeration over each unique expansion of the string containing just
the special selectors.

For complex conditions, the bindings to these selectors are not
obvious. It is strongly recommended that pmie be used in the debugging
mode (specify the -W command line option in particular) during rule
development.

pmie expressions that have the semantics of a Boolean, e.g. foo.bar >
10 or some_inst ( my.table < 0 ), are assigned the values true, false
or unknown. A value is unknown if one or more of the underlying metric
values is unavailable, e.g. pmcd(1) on the host cannot be contacted,
the metric is not in the PCP archive, no values are currently
available, insufficient values have been fetched to allow a rate
converted value to be computed or insufficient values have been fetched
to instantiate the required number of samples in the temporal domain.

Boolean operators follow the normal rules of Kleene logic (aka 3-valued
logic) when combining values that include unknown:

┌────────────┬───────────────────────────┐
│            │             B             │
│  A and B   ├─────────┬───────┬─────────┤
│            │  true   │ false │ unknown │
├──┬─────────┼─────────┼───────┼─────────┤
│  │  true   │  true   │ false │ unknown │
│  ├─────────┼─────────┼───────┼─────────┤
│A │  false  │  false  │ false │  false  │
│  ├─────────┼─────────┼───────┼─────────┤
│  │ unknown │ unknown │ false │ unknown │
└──┴─────────┴─────────┴───────┴─────────┘
┌────────────┬──────────────────────────┐
│            │            B             │
│   A or B   ├──────┬─────────┬─────────┤
│            │ true │  false  │ unknown │
├──┬─────────┼──────┼─────────┼─────────┤
│  │  true   │ true │  true   │  true   │
│  ├─────────┼──────┼─────────┼─────────┤
│A │  false  │ true │  false  │ unknown │
│  ├─────────┼──────┼─────────┼─────────┤
│  │ unknown │ true │ unknown │ unknown │
└──┴─────────┴──────┴─────────┴─────────┘
┌────────┬─────────┐
│   A    │  not A  │
├────────┼─────────┤
│  true  │  false  │
├────────┼─────────┤
│ false  │  true   │
├────────┼─────────┤
│unknown │ unknown │
└────────┴─────────┘
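
The truth tables above can be written compactly in Python by letting
None stand for unknown. This is a sketch of the logic only; the
function names are hypothetical and unrelated to pmie's internals.

```python
def k_and(a, b):
    """Kleene AND: false dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a, b):
    """Kleene OR: true dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def k_not(a):
    """Kleene NOT: unknown stays unknown."""
    return None if a is None else not a


print(k_and(None, False), k_or(None, True), k_not(None))
```
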
The ruleset clause is used to define a set of rules and actions that
are evaluated in order until some action is executed, at which point
the remaining rules and actions are skipped until the ruleset is again
scheduled for evaluation. The keyword else is used to separate rules.
After one or more regular rules (with a predicate and an action), a
ruleset may include an optional

    unknown -> action

clause, optionally followed by an

    otherwise -> action

clause.

If all of the predicates in the rules evaluate to unknown and an
unknown clause has been specified, then the action associated with the
unknown clause will be executed.

If no rule predicate is true and the unknown action is either not
specified or not executed and an otherwise clause has been specified,
then the action associated with the otherwise clause will be executed.
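
The selection logic of a ruleset, as described above, can be sketched
in Python as follows. The evaluate_ruleset function is hypothetical:
it takes already-evaluated predicate values (True, False, or None for
unknown) and returns the action that would run, rather than running
anything.

```python
def evaluate_ruleset(rules, unknown_action=None, otherwise_action=None):
    """Pick the action for one evaluation of a ruleset.

    rules is an ordered list of (predicate_value, action) pairs.
    Returns the chosen action, or None if nothing applies.
    """
    # The first rule whose predicate is true wins.
    for value, action in rules:
        if value is True:
            return action
    # The unknown clause applies only when every predicate is unknown.
    if unknown_action is not None and all(v is None for v, _ in rules):
        return unknown_action
    # Otherwise fall through to the otherwise clause, if any.
    return otherwise_action


rules = [(False, "extreme load average"), (True, "moderate load average")]
print(evaluate_ruleset(rules, "load average unavailable", "load average OK"))
```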

Scale factors may be appended to arithmetic expressions and force
linear scaling of the value to canonical units. Simple scale factors
are constructed from the keywords: nanosecond, nanosec, nsec,
microsecond, microsec, usec, millisecond, millisec, msec, second, sec,
minute, min, hour, byte, Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount and
Mcount, and the operator /, for example ``Kbytes / hour''.
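
To illustrate what scaling to canonical units means, the Python sketch
below converts a value with a simple scale factor into bytes, seconds
and counts. The multipliers are assumptions of this sketch (powers of
1024 for space, powers of 1000 for counts) and only a subset of the
keywords is covered; the function name is hypothetical.

```python
# Multipliers to canonical units (assumed values, see the text above).
UNITS = {
    "byte": 1, "Kbyte": 1024, "Mbyte": 1024 ** 2,
    "Gbyte": 1024 ** 3, "Tbyte": 1024 ** 4,
    "nsec": 1e-9, "usec": 1e-6, "msec": 1e-3,
    "sec": 1, "min": 60, "hour": 3600,
    "count": 1, "Kcount": 1000, "Mcount": 1000000,
}


def to_canonical(value, numerator, denominator=None):
    """Scale value from numerator[/denominator] to canonical units."""
    scaled = value * UNITS[numerator]
    if denominator is not None:
        scaled = scaled / UNITS[denominator]
    return scaled


print(to_canonical(10, "Kcount", "sec"))  # 10000.0
print(to_canonical(30, "min"))            # 1800
```

Under these assumptions, 10 Kcount/sec and 10000 count/sec denote the
same threshold.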

Macros are defined using expressions of the form:

    name = constexpr;

Where name follows the normal rules for variables in programming
languages, i.e. alphabetic optionally followed by alphanumerics.
constexpr must be a constant expression, either a string (enclosed in
double quotes) or an arithmetic expression optionally followed by a
scale factor.

Macros are expanded when their name, prefixed by a dollar ($), appears
in an expression, and macros may be nested within a constexpr string.

The following reserved macro names are understood.

minute  Current minute of the hour.

hour    Current hour of the day, in the range 0 to 23.

day     Current day of the month, in the range 1 to 31.

month   Current month of the year, in the range 0 (January) to 11
        (December).

year    Current year.

day_of_week
        Current day of the week, in the range 0 (Sunday) to 6
        (Saturday).

delta   Sample interval in effect for this expression.

Dates and times are presented in the reporting time zone (see
description of -Z and -z command line options above).

It is often useful for pmie processes to be started and stopped when
the local host is booted or shut down, or when they have been detected
as no longer running (when they have unexpectedly exited for some
reason). Refer to pmie_check(1) for details on automating this process.

Optionally, each system running pmcd(1) may also be configured to run a
``primary'' pmie instance. This pmie instance is launched by
$PCP_RC_DIR/pmie, and is affected by the files
$PCP_SYSCONF_DIR/pmie/control, $PCP_SYSCONF_DIR/pmie/control.d (use
chkconfig(8), systemctl(1) or similar platform-specific commands to
activate or disable the primary pmie instance) and
$PCP_VAR_DIR/config/pmie/config.default (the default initial
configuration file for the primary pmie).

The primary pmie instance is identified by the -P option. There may be
at most one ``primary'' pmie instance on each system. The primary pmie
instance (if any) must be running on the same host as the pmcd(1) to
which it connects (if any), so the -h and -P options are mutually
exclusive.

It is common for production systems to be monitored in a central
location. Traditionally on UNIX systems this has been performed by the
system log facilities; see logger(1) and syslogd(1). On Windows,
communication with the system event log is handled by pcp-eventlog(1).

pmie fits into this model when rules use the syslog action. Note that
if the action string begins with -p (priority) and/or -t (tag) then
these are extracted from the string and treated in the same way as in
logger(1) and pcp-eventlog(1).

However, it is common to have other event monitoring frameworks also,
into which you may wish to incorporate performance events from pmie.
You can often use the shell action to send events to these frameworks,
as they usually provide their own program for injecting events into the
framework from external sources.

A final option is use of the stomp (Streaming Text Oriented Messaging
Protocol) action, which allows pmie to connect to a central JMS (Java
Message Service) server and send events to the PMIE topic. Tools can
be written to extract these text messages and present them to
operations people (via desktop popup windows, etc). Use of the stomp
action requires a stomp configuration file to be set up, which
specifies the location of the JMS server host, port number, and
username/password.

The format of this file is as follows:

    host=messages.sgi.com   # this is the JMS server (required)
    port=61616              # and it's listening here (required)
    timeout=2               # seconds to wait for server (optional)
    username=joe            # (required)
    password=j03ST0MP       # (required)
    topic=PMIE              # JMS topic for pmie messages (optional)

The timeout value specifies the time (in seconds) that pmie should wait
for acknowledgements from the JMS server after sending a message (as
required by the STOMP protocol). Note that on startup, pmie will wait
indefinitely for a connection, and will not begin rule evaluation until
that initial connection has been established. Should the connection to
the JMS server be lost at any time while pmie is running, pmie will
attempt to reconnect each time a rule with a stomp action subsequently
evaluates to true, but not more than once per minute. This is to
avoid contributing to network congestion. In this situation, where the
STOMP connection to the JMS server has been severed, the stomp action
will return a non-zero error value.
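
When preparing a stomp configuration file, it can help to check that
the required keys are present before starting pmie. The Python sketch
below does this for the key=value layout shown above. It is not pmie's
parser: the function name is hypothetical, and treating everything
after a # as a comment is an assumption drawn from the annotated
sample, so a value that itself contains # would be cut short.

```python
REQUIRED = ("host", "port", "username", "password")


def parse_stomp_config(text):
    """Parse key=value lines and check that required keys exist."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comment
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        settings[key.strip()] = value.strip()
    missing = [k for k in REQUIRED if k not in settings]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    return settings


sample = """\
host=messages.sgi.com   # this is the JMS server (required)
port=61616              # and it's listening here (required)
username=joe            # (required)
password=j03ST0MP       # (required)
"""
print(parse_stomp_config(sample))
```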

The lexical scanner and parser will attempt to recover after an error
in the input expressions. Parsing resumes after skipping input up to
the next semi-colon (;). During this skipping process, however, the
scanner is ignorant of comments and strings, so an embedded semi-colon
may cause parsing to resume at an unexpected place. This behavior is
largely benign, as until the initial syntax error is corrected, pmie
will not attempt any expression evaluation.

$PCP_DEMOS_DIR/pmie/*
    annotated example rules

$PCP_VAR_DIR/pmns/*
    default PMNS specification files

$PCP_TMP_DIR/pmie
    pmie maintains files in this directory to identify the running
    pmie instances and to export runtime information about each
    instance - this data forms the basis of the pmcd.pmie performance
    metrics

$PCP_PMIECONTROL_PATH
    the default set of pmie instances to start at boot time - refer to
    pmie_check(1) for details

Environment variables with the prefix PCP_ are used to parameterize the
file and directory names used by PCP. On each installation, the file
/etc/pcp.conf contains the local values for these variables. The
$PCP_CONF variable may be used to specify an alternative configuration
file, as described in pcp.conf(5).

When executing shell actions, pmie overrides two variables - IFS and
PATH - in the environment of the child process. IFS is set to "\t\n".
The PATH is set to a combination of a default path for all platforms
("/usr/sbin:/sbin:/usr/bin:/usr/sbin") and several configurable
components. These are (in this order): $PCP_BIN_DIR, $PCP_BINADM_DIR
and $PCP_PLATFORM_PATHS.

When executing popup alarm actions, pmie will use the value of
$PCP_XCONFIRM_PROG as the visual notification program to run. This is
typically set to pmconfirm(1), a cross-platform dialog box.

logger(1).

pcp-eventlog(1).

PCPIntro(1), pmcd(1), pmconfirm(1), pmdumplog(1), pmieconf(1),
pmie_check(1), pminfo(1), pmlogger(1), pmval(1), PMAPI(3), pcp.conf(5),
pcp.env(5) and PMNS(5).

For a more complete description of the pmie language, refer to the
Performance Co-Pilot User's and Administrator's Guide. This is
available online from:
    https://pcp.io/doc/pcp-users-and-administrators-guide.pdf

Performance Co-Pilot                  PCP                         PMIE(1)