PMIE(1) General Commands Manual PMIE(1)

6 pmie - inference engine for performance metrics
7
9 pmie [-bCdefHPqvVWxXz?] [-a archive] [-A align] [-c filename] [-h
10 host] [-l logfile] [-j stompfile] [-n pmnsfile] [-O offset] [-S start‐
11 time] [-t interval] [-T endtime] [-U username] [-Z timezone] [filename
12 ...]
13
pmie accepts a collection of arithmetic, logical, and rule expressions
to be evaluated at specified frequencies. The base data for the
expressions consists of performance metric values delivered in real
time from any host running the Performance Metrics Collection Daemon
(PMCD), or historical data from Performance Co-Pilot (PCP) archive
logs.
21
22 As well as computing arithmetic and logical values, pmie can execute
23 actions (popup alarms, write system log messages, and launch programs)
24 in response to specified conditions. Such actions are extremely useful
25 in detecting, monitoring and correcting performance related problems.
26
27 The expressions to be evaluated are read from configuration files spec‐
28 ified by one or more filename arguments. In the absence of any file‐
29 name, expressions are read from standard input.
30
31 Output from pmie is directed to standard output and standard error as
32 follows:
33
34 stdout
35 Expression values printed in the verbose -v mode and the output of
36 print actions.
37
38 stderr
39 Error and warning messages for any syntactic or semantic problems
40 during expression parsing, and any semantic or performance metrics
41 availability problems during expression evaluation.
42
44 The available command line options are:
45
46 -a archive, --archive=archive
archive is a comma-separated list of names, each of which
48 may be the base name of an archive or the name of a directory con‐
49 taining one or more archives written by pmlogger(1). Multiple
50 instances of the -a flag may appear on the command line to specify
51 a list of sets of archives. In this case, it is required that
52 only one set of archives be present for any one host. Also, any
53 explicit host names occurring in a pmie expression must match the
54 host name recorded in one of the archive labels. In the case of
55 multiple sets of archives, timestamps recorded in the archives are
56 used to ensure temporal consistency.
57
58 -A align, --align=align
59 Force the initial time window to be aligned on the boundary of a
60 natural time unit align. Refer to PCPIntro(1) for a complete
61 description of the syntax for align.
62
63 -b, --buffer
64 Output will be line buffered and standard output is attached to
65 standard error. This is most useful for background execution in
66 conjunction with the -l option. The -b option is always used for
67 pmie instances launched from pmie_check(1).
68
-c filename, --config=filename
An alternative to specifying filename at the end of the command
line.
72
73 -C, --check
74 Parse the configuration file(s) and exit before performing any
75 evaluations. Any errors in the configuration file are reported.
76
77 -d, --interact
78 Normally pmie would be launched as a non-interactive process to
79 monitor and manage the performance of one or more hosts. Given
80 the -d flag however, execution is interactive and the user is pre‐
81 sented with a menu of options. Interactive mode is useful mainly
82 for debugging new expressions.
83
84 -e, --timestamp
85 When used with -V, -v or -W, this option forces timestamps to be
reported with each expression. The timestamps are in ctime(3)
format, enclosed in parentheses, and appear after the expression
name and before the expression value, e.g.
89 expr_1 (Tue Feb 6 19:55:10 2001): 12
90
91 -f, --foreground
If the -l option is specified and there is no -a option (i.e.
real-time monitoring) then pmie is run as a daemon in the background
94 (in all other cases foreground is the default). The -f option
95 forces pmie to be run in the foreground, independent of any other
96 options.
97
98 -h host, --host=host
99 By default performance data is fetched from the local host (in
100 real-time mode) or the host for the first named set of archives on
101 the command line (in archive mode). The host argument overrides
102 this default. It does not override hosts explicitly named in the
103 expressions being evaluated. The host argument is interpreted as
104 a connection specification for pmNewContext, and is later mapped
105 to the remote pmcd's self-reported host name for reporting pur‐
106 poses. See also the %h vs. %c substitutions in rule action
107 strings below.
108
109 -l logfile, --logfile=logfile
110 Standard error is sent to logfile.
111
-j stompfile
An alternative STOMP protocol configuration is loaded from
stompfile. If this option is not used, and the stomp action is used
in any rule, the default location $PCP_SYSCONF_DIR/pmie/config/stomp
will be used.
117
118 -n pmnsfile, --namespace=pmnsfile
119 An alternative Performance Metrics Name Space (PMNS) is loaded
120 from the file pmnsfile.
121
122 -O origin, --origin=origin
Specify the origin of the time window. See PCPIntro(1) for a
complete description of this option.
125
126 -P, --primary
127 Identifies this as the primary pmie instance for a host. See the
128 ``AUTOMATIC RESTART'' section below for further details.
129
130 -q, --quiet
131 Suppresses diagnostic messages that would be printed to standard
132 output by default, especially the "evaluator exiting" message as
133 this can confuse scripts.
134
135 -S starttime, --start=starttime
Specify the starttime of the time window. See PCPIntro(1) for a
complete description of this option.
138
139 -t interval, --interval=interval
140 The interval argument follows the syntax described in PCPIntro(1),
141 and in the simplest form may be an unsigned integer (the implied
142 units in this case are seconds). The value is used to determine
143 the sample interval for expressions that do not explicitly set
144 their sample interval using the pmie variable delta described
145 below. The default is 10.0 seconds.
146
147 -T endtime, --finish=endtime
Specify the endtime of the time window. See PCPIntro(1) for a
complete description of this option.
150
151 -U username, --username=username
152 User account under which to run pmie. The default is the current
153 user account for interactive use. When run as a daemon, the
154 unprivileged "pcp" account is used in current versions of PCP, but
155 in older versions the superuser account ("root") was used by
156 default.
157
-v Unless one of the verbose options -V, -v or -W appears on the
command line, expressions are evaluated silently; the only output is
the result of any actions being executed. In the verbose mode,
161 specified using the -v flag, the value of each expression is
162 printed as it is evaluated. The values are in canonical units;
163 bytes in the dimension of ``space'', seconds in the dimension of
164 ``time'' and events in the dimension of ``count''. See
165 pmLookupDesc(3) for details of the supported dimension and scaling
166 mechanisms for performance metrics. The verbose mode is useful in
167 monitoring the value of given expressions, evaluating derived per‐
168 formance metrics, passing these values on to other tools for fur‐
169 ther processing and in debugging new expressions.
170
171 -V, --verbose
172 This option has the same effect as the -v option, except that the
173 name of the host and instance (if applicable) are printed as well
174 as expression values.
175
176 -W This option has the same effect as the -V option described above,
177 except that for boolean expressions, only those names and values
178 that make the expression true are printed. These are the same
179 names and values accessible to rule actions as the %h, %i, %c and
180 %v bindings, as described below.
181
182 -x, --secret-agent
183 Execute in domain agent mode. This mode is used within the Per‐
184 formance Co-Pilot product to derive values for summary metrics,
185 see pmdasummary(1). Only restricted functionality is available in
186 this mode (expressions with actions may not be used).
187
188 -X, --secret-applet
189 Run in secret applet mode (thin client).
190
191 -z, --hostzone
192 Change the reporting timezone to the timezone of the host that is
193 the source of the performance metrics, as identified via either
194 the -h option or the first named set of archives (as described
195 above for the -a option).
196
197 -Z timezone, --timezone=timezone
198 Change the reporting timezone to timezone in the format of the
199 environment variable TZ as described in environ(7).
200
201 -?, --help
202 Display usage message and exit.
203
205 The following example expressions demonstrate some of the capabilities
206 of the inference engine.
207
208 The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
209 examples of pmie expressions.
210
211 The variable delta controls expression evaluation frequency. Specify
212 that subsequent expressions be evaluated once a second, until further
213 notice:
214
215 delta = 1 sec;
216
217 If the total context switch rate exceeds 10000 per second per CPU, then
218 display an alarm notifier:
219
220 kernel.all.pswitch / hinv.ncpu > 10000 count/sec
221 -> alarm "high context switch rate %v";
222
223 If the high context switch rate is sustained for 10 consecutive sam‐
224 ples, then launch top(1) in an xterm(1) window to monitor processes,
225 but do this at most once every 5 minutes:
226
227 all_sample (
228 kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
229 ) -> shell 5 min "xterm -e 'top'";
230
231 The following rules are evaluated once every 20 seconds:
232
233 delta = 20 sec;
234
235 If any disk is performing more than 60 I/Os per second, then print a
236 message identifying the busy disk to standard output and launch
237 dkvis(1):
238
239 some_inst (
240 disk.dev.total > 60 count/sec
241 ) -> print "busy disks:" " %i" &
242 shell 5 min "dkvis";
243
244 Refine the preceding rule to apply only between the hours of 9am and
245 5pm, and to require 3 of 4 consecutive samples to exceed the threshold
246 before executing the action:
247
$hour >= 9 && $hour < 17 &&
249 some_inst (
250 75 %_sample (
251 disk.dev.total @0..3 > 60 count/sec
252 )
253 ) -> print "disks busy for 20 sec:" " [%h]%i";
254
255 The following two rules are evaluated once every 10 minutes:
256
257 delta = 10 min;
258
259 If either the / or the /usr filesystem is more than 95% full, display
260 an alarm popup, but not if it has already been displayed during the
261 last 4 hours:
262
263 filesys.free #'/dev/root' /
264 filesys.capacity #'/dev/root' < 0.05
265 -> alarm 4 hour "root filesystem (almost) full";
266
267 filesys.free #'/dev/usr' /
268 filesys.capacity #'/dev/usr' < 0.05
269 -> alarm 4 hour "/usr filesystem (almost) full";
270
271 The following rule requires a machine that supports the lmsensors met‐
272 rics. If the machine environment temperature rises more than 2 degrees
273 over a 10 minute interval, write an entry in the system log:
274
275 lmsensors.coretemp_isa.temp1 @0 - lmsensors.coretemp_isa.temp1 @1 > 2
276 -> alarm "temperature rising fast" &
277 syslog "machine room temperature rise alarm";
278
279 And something interesting if you have performance problems with your
280 Oracle database:
281
282 // back to 30sec evaluations
283 delta = 30 sec;
284 sid = "ptg1"; # $ORACLE_SID setting
285 lid = "223"; # latch ID from v$latch
286 lru = "#'$sid/$lid cache buffers lru chain'";
287 host = ":moomba.melbourne.sgi.com";
288 gets = "oracle.latch.gets $host $lru";
289 total = "oracle.latch.gets $host $lru +
290 oracle.latch.misses $host $lru +
291 oracle.latch.immisses $host $lru";
292
293 $total > 100 && $gets / $total < 0.2
294 -> alarm "high lru latch contention in database $sid";
295
296 The following ruleset will emit exactly one message depending on the
297 availability and value of the 1-minute load average.
298
299 delta = 1 minute;
300 ruleset
301 kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
302 print "extreme load average %v"
303 else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
304 print "moderate load average %v"
305 unknown ->
306 print "load average unavailable"
307 otherwise ->
308 print "load average OK"
309 ;
310
311 The following rule will emit a message when some filesystem is more
312 than 75% full and is filling at a rate that if sustained would fill the
313 filesystem to 100% in less than 30 minutes.
314
315 some_inst (
316 100 * filesys.used / filesys.capacity > 75 &&
317 filesys.used + 30min * (rate filesys.used) > filesys.capacity
318 ) -> print "filesystem will be full within 30 mins:" " %i";
319
320 If the metric mypmda.errors counts errors then the following rule will
321 emit a message if the rate of errors exceeds 1 per second provided the
322 error count is less than 100.
323
324 mypmda.errors > 1 && instant mypmda.errors < 100
325 -> print "high error rate: %v";
326
328 The pmie specification language is powerful and large.
329
330 To expedite rapid development of pmie rules, the pmieconf(1) tool pro‐
331 vides a facility for generating a pmie configuration file from a set of
332 generalized pmie rules. The supplied set of rules covers a wide range
333 of performance scenarios.
334
335 The Performance Co-Pilot User's and Administrator's Guide provides a
336 detailed tutorial-style chapter covering pmie.
337
339 This description is terse and informal. For a more comprehensive
340 description see the Performance Co-Pilot User's and Administrator's
341 Guide.
342
343 A pmie specification is a sequence of semicolon terminated expressions.
344
345 Basic operators are modeled on the arithmetic, relational and Boolean
346 operators of the C programming language. Precedence rules are as
347 expected, although the use of parentheses is encouraged to enhance
348 readability and remove ambiguity.
349
350 Operands are performance metric names (see PMNS(5)) and the normal lit‐
351 eral constants.
352
353 Operands involving performance metrics may produce sets of values, as a
354 result of enumeration in the dimensions of hosts, instances and time.
355 Special qualifiers may appear after a performance metric name to define
356 the enumeration in each dimension. For example,
357
358 kernel.percpu.cpu.user :foo :bar #cpu0 @0..2
359
360 defines 6 values corresponding to the time spent executing in user mode
361 on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3 consecutive
362 samples. The default interpretation in the absence of : (host), #
363 (instance) and @ (time) qualifiers is all instances at the most recent
364 sample time for the default source of PCP performance metrics.
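
As a hedged illustration of these qualifiers, the sketch below restricts
a metric to one host, one instance and the three most recent samples.
The host name foo is taken from the example above and must be reachable
for the rule to evaluate. The 0.9 threshold is an assumption: it treats
the rate-converted CPU time as the fraction of each second spent in user
mode.

```pmie
// cpu0 on host foo, samples @0 (most recent) through @2
all_sample (
    kernel.percpu.cpu.user :foo #cpu0 @0..2 > 0.9
) -> print "cpu0 on foo busy in user mode for 3 samples";
```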
365
Host and instance names that do not follow the rules for variables in
programming languages, i.e. alphabetic optionally followed by
alphanumerics, should be enclosed in single quotes.
369
370 Expression evaluation follows the law of ``least surprises''. Where
371 performance metrics have the semantics of a counter, pmie will automat‐
372 ically convert to a rate based upon consecutive samples and the time
373 interval between these samples. All numeric expressions are evaluated
374 in double precision, and where appropriate, automatically scaled into
375 canonical units of ``bytes'', ``seconds'' and ``counts''.
376
377 A rule is a special form of expression that specifies a condition or
378 logical expression, a special operator (->) and actions to be performed
379 when the condition is found to be true.
380
381 The following table summarizes the basic pmie operators:
382
383 ┌────────────────┬────────────────────────────────────────────────┐
384 │ Operators │ Explanation │
385 ├────────────────┼────────────────────────────────────────────────┤
386 │+ - * / │ Arithmetic │
387 │< <= == >= > != │ Relational (value comparison) │
388 │! && || │ Boolean │
389 │-> │ Rule │
390 │rising │ Boolean, false to true transition │
391 │falling │ Boolean, true to false transition │
392 │rate │ Explicit rate conversion (rarely required) │
393 │instant │ No automatic rate conversion (rarely required) │
394 └────────────────┴────────────────────────────────────────────────┘
395 All operators are supported for numeric-valued operands and expres‐
396 sions. For string-valued operands, namely literal string constants
397 enclosed in double quotes or metrics with a data type of string
398 (PM_TYPE_STRING), only the operators == and != are supported.
399
400 The rate and instant operators are the logical inverse of one another,
401 so an arithmetic expression expr is equal to rate instant expr. The
402 more useful cases involve using rate with a metric that is not a
403 counter to determine the rate of change over time or instant with a
404 metric that is a counter to determine if the current value is above or
405 below some threshold.
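
A minimal sketch of the first of these cases, assuming filesys.used is
not a counter on the monitored host (as the filesystem example earlier
in this page implies). The 1 Mbyte/sec threshold is an arbitrary value
chosen for illustration.

```pmie
// explicit rate conversion of a non-counter metric:
// how fast is each filesystem growing?
some_inst (
    (rate filesys.used) > 1 Mbyte/sec
) -> print "filesystems growing quickly:" " %i";
```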
406
407 Aggregate operators may be used to aggregate or summarize along one
408 dimension of a set-valued expression. The following aggregate opera‐
409 tors map from a logical expression to a logical expression of lower
410 dimension.
411
412 ┌─────────────────────────┬─────────────┬──────────────────────────┐
413 │ Operators │ Type │ Explanation │
414 ├─────────────────────────┼─────────────┼──────────────────────────┤
415 │some_inst │ Existential │ True if at least one set │
416 │some_host │ │ member is true in the │
417 │some_sample │ │ associated dimension │
418 ├─────────────────────────┼─────────────┼──────────────────────────┤
419 │all_inst │ Universal │ True if all set members │
420 │all_host │ │ are true in the associ‐ │
421 │all_sample │ │ ated dimension │
422 ├─────────────────────────┼─────────────┼──────────────────────────┤
423 │N%_inst │ Percentile │ True if at least N per‐ │
424 │N%_host │ │ cent of set members are │
425 │N%_sample │ │ true in the associated │
426 │ │ │ dimension │
427 └─────────────────────────┴─────────────┴──────────────────────────┘
428 The following instantial operators may be used to filter or limit a
429 set-valued logical expression, based on regular expression matching of
430 instance names. The logical expression must be a set involving the
431 dimension of instances, and the regular expression is of the form used
432 by egrep(1) or the Extended Regular Expressions of regcomp(3).
433
434 ┌─────────────┬──────────────────────────────────────────┐
435 │ Operators │ Explanation │
436 ├─────────────┼──────────────────────────────────────────┤
437 │match_inst │ For each value of the logical expression │
438 │ │ that is ``true'', the result is ``true'' │
439 │ │ if the associated instance name matches │
440 │ │ the regular expression. Otherwise the │
441 │ │ result is ``false''. │
442 ├─────────────┼──────────────────────────────────────────┤
443 │nomatch_inst │ For each value of the logical expression │
444 │ │ that is ``true'', the result is ``true'' │
445 │ │ if the associated instance name does not │
446 │ │ match the regular expression. Otherwise │
447 │ │ the result is ``false''. │
448 └─────────────┴──────────────────────────────────────────┘
449 For example, the expression below will be ``true'' for disks attached
450 to controllers 2 or 3 performing more than 20 operations per second:
451 match_inst "^dks[23]d" disk.dev.total > 20;
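
The complementary operator can be sketched the same way. The rule below
is an illustration only: it reuses the instance naming pattern from the
example above, which may not match the disk names on a given platform.

```pmie
// busy disks that are NOT attached to controllers 2 or 3
some_inst (
    nomatch_inst "^dks[23]d" disk.dev.total > 20
) -> print "busy disks on other controllers:" " %i";
```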
452
453 The following aggregate operators map from an arithmetic expression to
454 an arithmetic expression of lower dimension.
455
456 ┌─────────────────────────┬───────────┬──────────────────────────┐
457 │ Operators │ Type │ Explanation │
458 ├─────────────────────────┼───────────┼──────────────────────────┤
459 │min_inst │ Extrema │ Minimum value across all │
460 │min_host │ │ set members in the asso‐ │
461 │min_sample │ │ ciated dimension │
462 ├─────────────────────────┼───────────┼──────────────────────────┤
463 │max_inst │ Extrema │ Maximum value across all │
464 │max_host │ │ set members in the asso‐ │
465 │max_sample │ │ ciated dimension │
466 ├─────────────────────────┼───────────┼──────────────────────────┤
467 │sum_inst │ Aggregate │ Sum of values across all │
468 │sum_host │ │ set members in the asso‐ │
469 │sum_sample │ │ ciated dimension │
470 ├─────────────────────────┼───────────┼──────────────────────────┤
471 │avg_inst │ Aggregate │ Average value across all │
472 │avg_host │ │ set members in the asso‐ │
473 │avg_sample │ │ ciated dimension │
474 └─────────────────────────┴───────────┴──────────────────────────┘
475 The aggregate operators count_inst, count_host and count_sample map
476 from a logical expression to an arithmetic expression of lower dimen‐
477 sion by counting the number of set members for which the expression is
478 true in the associated dimension.
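
The two rules below sketch how the counting and arithmetic aggregate
operators might be combined with a threshold. The thresholds are
arbitrary values chosen for illustration.

```pmie
// count_inst turns a set of truth values into a number
count_inst ( disk.dev.total > 60 count/sec ) >= 2
    -> print "two or more busy disks";

// avg_inst reduces the per-disk rates to a single value
avg_inst ( disk.dev.total ) > 60 count/sec
    -> print "average disk activity is high";
```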
479
480 For action rules, the following actions are defined:
481
482 ┌──────────┬────────────────────────────────────────┐
483 │Operators │ Explanation │
484 ├──────────┼────────────────────────────────────────┤
485 │alarm │ Raise a visible alarm with xconfirm(1) │
486 │print │ Display on standard output │
487 │shell │ Execute with sh(1) │
488 │stomp │ Send a STOMP message to a JMS server │
489 │syslog │ Append a message to system log file │
490 └──────────┴────────────────────────────────────────┘
Multiple actions may be separated by the & and | operators to specify
respectively sequential execution (both actions are executed) and
alternate execution (the second action will only be executed if the
execution of the first action returns a non-zero error status).
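
A short sketch of alternate execution, using only actions from the
table above. The message text is illustrative; the print action is
intended to run only when the syslog action reports a failure.

```pmie
kernel.all.load #'1 minute' > 2 * hinv.ncpu
    -> syslog "load average high"
     | print "load average high (syslog action failed)";
```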
495
496 Arguments to actions are an optional suppression time, and then one or
497 more expressions (a string is an expression in this context). Strings
498 appearing as arguments to an action may include the following special
499 selectors that will be replaced at the time the action is executed.
500
501 %h Host name(s) that make the left-most top-level expression in the
502 condition true.
503
504 %c Connection specification string(s) or files for a PCP tool to reach
505 the hosts or archives that make the left-most top-level expression
506 in the condition true.
507
508 %i Instance(s) that make the left-most top-level expression in the
509 condition true.
510
511 %v One value from the left-most top-level expression in the condition
512 for each host and instance pair that makes the condition true.
513
514 Note that expansion of the special selectors is done by repeating the
515 whole argument once for each unique binding to any of the qualifying
516 special selectors. For example if a rule were true for the host mumble
517 with instances grunt and snort, and for host fumble the instance puff
518 makes the rule true, then the action
519 ...
520 -> shell myscript "Warning: %h:%i busy ";
521 will execute myscript with the argument string "Warning: mumble:grunt
522 busy Warning: mumble:snort busy Warning: fumble:puff busy".
523
524 By comparison, if the action
525 ...
526 -> shell myscript "Warning! busy:" " %h:%i";
527 were executed under the same circumstances, then myscript would be exe‐
528 cuted with the argument string "Warning! busy: mumble:grunt mum‐
529 ble:snort fumble:puff".
530
The semantics of the expansion of the special selectors leads to a
common usage pattern in an action, where one argument is a constant
(contains no special selectors), the second argument contains the
desired special selectors with minimal separator characters, and an
optional third argument provides a constant postscript (e.g. to
terminate any argument quoting from the first argument). If necessary,
post-processing (e.g. in myscript) can provide the necessary enumeration
over each unique expansion of the string containing just the special
selectors.
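
The three-argument pattern might look like the sketch below. myscript is
the same hypothetical program used in the examples above, and the single
quotes are an assumption about how that script wants its argument
delivered by sh(1).

```pmie
// argument 1: constant prefix that opens a single quote
// argument 2: repeated once per host and instance binding
// argument 3: constant postscript that closes the quote
some_inst (
    disk.dev.total > 60 count/sec
) -> shell "myscript 'busy disks:" " %h:%i" "'";
```

Under the circumstances described earlier (host mumble with instances
grunt and snort, host fumble with instance puff), this would be expected
to run myscript with the single argument "busy disks: mumble:grunt
mumble:snort fumble:puff".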
539
For complex conditions, the bindings to these selectors are not obvious.
It is strongly recommended that pmie be used in the debugging mode
(specify the -W command line option in particular) during rule
development.
544
546 pmie expressions that have the semantics of a Boolean, e.g. foo.bar >
547 10 or some_inst ( my.table < 0 ) are assigned the values true or false
548 or unknown. A value is unknown if one or more of the underlying metric
549 values is unavailable, e.g. pmcd(1) on the host cannot be contacted,
550 the metric is not in the PCP archive, no values are currently avail‐
551 able, insufficient values have been fetched to allow a rate converted
552 value to be computed or insufficient values have been fetched to
553 instantiate the required number of samples in the temporal domain.
554
555 Boolean operators follow the normal rules of Kleene logic (aka 3-valued
556 logic) when combining values that include unknown:
557
┌────────────┬───────────────────────────┐
│            │             B             │
│  A and B   ├─────────┬───────┬─────────┤
│            │  true   │ false │ unknown │
├──┬─────────┼─────────┼───────┼─────────┤
│  │  true   │  true   │ false │ unknown │
│  ├─────────┼─────────┼───────┼─────────┤
│A │  false  │  false  │ false │  false  │
│  ├─────────┼─────────┼───────┼─────────┤
│  │ unknown │ unknown │ false │ unknown │
└──┴─────────┴─────────┴───────┴─────────┘
568 ┌────────────┬──────────────────────────┐
569 │ │ B │
570 │ A or B ├──────┬─────────┬─────────┤
571 │ │ true │ false │ unknown │
572 ├──┬─────────┼──────┼─────────┼─────────┤
573 │ │ true │ true │ true │ true │
574 │ ├─────────┼──────┼─────────┼─────────┤
575 │A │ false │ true │ false │ unknown │
576 │ ├─────────┼──────┼─────────┼─────────┤
577 │ │ unknown │ true │ unknown │ unknown │
578 └──┴─────────┴──────┴─────────┴─────────┘
579 ┌────────┬─────────┐
580 │ A │ not A │
581 ├────────┼─────────┤
582 │ true │ false │
583 ├────────┼─────────┤
584 │ false │ true │
585 ├────────┼─────────┤
586 │unknown │ unknown │
587 └────────┴─────────┘
589 The ruleset clause is used to define a set of rules and actions that
590 are evaluated in order until some action is executed, at which point
591 the remaining rules and actions are skipped until the ruleset is again
592 scheduled for evaluation. The keyword else is used to separate rules.
593 After one or more regular rules (with a predicate and an action), a
594 ruleset may include an optional
595 unknown -> action
clause, optionally followed by an
597 otherwise -> action
598 clause.
599
If all of the predicates in the rules evaluate to unknown and an
unknown clause has been specified, then the action associated with the
unknown clause will be executed.
603
604 If no rule predicate is true and the unknown action is either not spec‐
605 ified or not executed and an otherwise clause has been specified, then
606 the action associated with the otherwise clause will be executed.
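
The sketch below applies these clauses to disk activity. The thresholds
and messages are illustrative; at most one of the four print actions
should run each time the ruleset is evaluated.

```pmie
delta = 1 minute;
ruleset
    some_inst ( disk.dev.total > 100 count/sec ) ->
        print "very busy disks:" " %i"
else some_inst ( disk.dev.total > 60 count/sec ) ->
        print "busy disks:" " %i"
unknown ->
        print "disk activity unavailable"
otherwise ->
        print "disks OK"
;
```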
607
609 Scale factors may be appended to arithmetic expressions and force lin‐
610 ear scaling of the value to canonical units. Simple scale factors are
611 constructed from the keywords: nanosecond, nanosec, nsec, microsecond,
612 microsec, usec, millisecond, millisec, msec, second, sec, minute, min,
613 hour, byte, Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount and Mcount, and
614 the operator /, for example ``Kbytes / hour''.
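
As an illustrative sketch, the rule below uses a scale factor on the
constant so that the threshold can be written in thousands. It is
intended to be equivalent to comparing against 10000 count/sec per CPU.

```pmie
delta = 1 min;

// 10 Kcount/sec is scaled to canonical units (count/sec)
// before the comparison is made
kernel.all.pswitch > 10 Kcount/sec * hinv.ncpu
    -> print "high context switch rate";
```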
615
617 Macros are defined using expressions of the form:
618
619 name = constexpr;
620
Where name follows the normal rules for variables in programming
languages, i.e. alphabetic optionally followed by alphanumerics.
constexpr must be a constant expression, either a string (enclosed in
double quotes) or an arithmetic expression optionally followed by a
scale factor.

Macros are expanded when their name, prefixed by a dollar ($), appears
in an expression, and macros may be nested within a constexpr string.
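
A small sketch of string macros, modeled on the Oracle example earlier
in this page. The macro names iops and limit are hypothetical and chosen
only for illustration.

```pmie
iops  = "disk.dev.total";
limit = "60 count/sec";

// expands to: some_inst ( disk.dev.total > 60 count/sec )
some_inst ( $iops > $limit )
    -> print "busy disks:" " %i";
```

Keeping the metric name and the threshold in macros means each needs to
be changed in only one place when several rules share them.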
629
630 The following reserved macro names are understood.
631
632 minute Current minute of the hour.
633
634 hour Current hour of the day, in the range 0 to 23.
635
636 day Current day of the month, in the range 1 to 31.
637
638 month Current month of the year, in the range 0 (January) to 11
639 (December).
640
641 year Current year.
642
643 day_of_week
644 Current day of the week, in the range 0 (Sunday) to 6 (Satur‐
645 day).
646
647 delta Sample interval in effect for this expression.
648
649 Dates and times are presented in the reporting time zone (see descrip‐
650 tion of -Z and -z command line options above).
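
The reserved macros can be combined to limit when a rule may fire. The
sketch below is an illustration that assumes business hours of Monday to
Friday, 09:00 to 16:59, in the reporting time zone.

```pmie
// day_of_week: 0 is Sunday, so 1..5 is Monday..Friday
$day_of_week >= 1 && $day_of_week <= 5 &&
$hour >= 9 && $hour < 17 &&
kernel.all.load #'1 minute' > 2 * hinv.ncpu
    -> print "high load average during business hours";
```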
651
AUTOMATIC RESTART
It is often useful for pmie processes to be started and stopped when
the local host is booted or shut down, or when they have been detected
as no longer running (when they have unexpectedly exited for some
reason). Refer to pmie_check(1) for details on automating this process.
657
658 Optionally, each system running pmcd(1) may also be configured to run a
659 ``primary'' pmie instance. This pmie instance is launched by
660 $PCP_RC_DIR/pmie, and is affected by the files
661 $PCP_SYSCONF_DIR/pmie/control, $PCP_SYSCONF_DIR/pmie/control.d (use
662 chkconfig(8), systemctl(1) or similar platform-specific commands to
663 activate or disable the primary pmie instance) and $PCP_VAR_DIR/con‐
664 fig/pmie/config.default (the default initial configuration file for the
665 primary pmie).
666
667 The primary pmie instance is identified by the -P option. There may be
668 at most one ``primary'' pmie instance on each system. The primary pmie
669 instance (if any) must be running on the same host as the pmcd(1) to
670 which it connects (if any), so the -h and -P options are mutually
671 exclusive.
672
674 It is common for production systems to be monitored in a central loca‐
675 tion. Traditionally on UNIX systems this has been performed by the
676 system log facilities - see logger(1), and syslogd(1). On Windows,
677 communication with the system event log is handled by pcp-eventlog(1).
678
679 pmie fits into this model when rules use the syslog action. Note that
680 if the action string begins with -p (priority) and/or -t (tag) then
681 these are extracted from the string and treated in the same way as in
682 logger(1) and pcp-eventlog(1).
683
However, it is common to have other event monitoring frameworks as
well, into which you may wish to incorporate performance events from
pmie. You can often use the shell action to send events to these
frameworks, as they usually provide their own program for injecting
events into the framework from external sources.
689
A final option is use of the stomp (Streaming Text Oriented Messaging
Protocol) action, which allows pmie to connect to a central JMS (Java
Message Service) server and send events to the PMIE topic. Tools can
be written to extract these text messages and present them to
operations people (via desktop popup windows, etc). Use of the stomp
action requires a stomp configuration file to be set up, which
specifies the location of the JMS server host, port number, and
username/password.
697
698 The format of this file is as follows:
699
700 host=messages.sgi.com # this is the JMS server (required)
port=61616 # and it is listening here (required)
702 timeout=2 # seconds to wait for server (optional)
703 username=joe # (required)
704 password=j03ST0MP # (required)
705 topic=PMIE # JMS topic for pmie messages (optional)
706
707 The timeout value specifies the time (in seconds) that pmie should wait
708 for acknowledgements from the JMS server after sending a message (as
709 required by the STOMP protocol). Note that on startup, pmie will wait
710 indefinitely for a connection, and will not begin rule evaluation until
711 that initial connection has been established. Should the connection to
712 the JMS server be lost at any time while pmie is running, pmie will
713 attempt to reconnect on each subsequent truthful evaluation of a rule
714 with a stomp action, but not more than once per minute. This is to
715 avoid contributing to network congestion. In this situation, where the
716 STOMP connection to the JMS server has been severed, the stomp action
717 will return a non-zero error value.
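
Because a severed connection makes the stomp action return a non-zero
error value, a rule can fall back to another action with the | operator.
The sketch below is one way this might be written; the message text is
illustrative.

```pmie
kernel.all.load #'1 minute' > 2 * hinv.ncpu
    -> stomp "high load average"
     | syslog "high load average (STOMP delivery failed)";
```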
718
The lexical scanner and parser will attempt to recover after an error
in the input expressions. Parsing resumes after skipping input up to
the next semi-colon (;). However, during this skipping process the
scanner is ignorant of comments and strings, so an embedded semi-colon
may cause parsing to resume at an unexpected place. This behavior is
largely benign because pmie will not attempt any expression evaluation
until the initial syntax error is corrected.
727
729 $PCP_DEMOS_DIR/pmie/*
730 annotated example rules
731 $PCP_VAR_DIR/pmns/*
732 default PMNS specification files
733 $PCP_TMP_DIR/pmie
734 pmie maintains files in this directory to identify the run‐
735 ning pmie instances and to export runtime information about
736 each instance - this data forms the basis of the pmcd.pmie
737 performance metrics
738 $PCP_PMIECONTROL_PATH
739 the default set of pmie instances to start at boot time -
740 refer to pmie_check(1) for details
741
743 Environment variables with the prefix PCP_ are used to parameterize the
744 file and directory names used by PCP. On each installation, the file
745 /etc/pcp.conf contains the local values for these variables. The
746 $PCP_CONF variable may be used to specify an alternative configuration
747 file, as described in pcp.conf(5).
748
749 When executing shell actions, pmie overrides two variables - IFS and
750 PATH - in the environment of the child process. IFS is set to "\t\n".
751 The PATH is set to a combination of a default path for all platforms
752 ("/usr/sbin:/sbin:/usr/bin:/usr/sbin") and several configurable compo‐
753 nents. These are (in this order): $PCP_BIN_DIR, $PCP_BINADM_DIR and
754 $PCP_PLATFORM_PATHS.
755
756 When executing popup alarm actions, pmie will use the value of
757 $PCP_XCONFIRM_PROG as the visual notification program to run. This is
758 typically set to pmconfirm(1), a cross-platform dialog box.
759
761 logger(1).
762
764 pcp-eventlog(1).
765
767 PCPIntro(1), pmcd(1), pmconfirm(1), pmdumplog(1), pmieconf(1),
768 pmie_check(1), pminfo(1), pmlogger(1), pmval(1), PMAPI(3), pcp.conf(5),
769 pcp.env(5) and PMNS(5).
770
For a more complete description of the pmie language, refer to the
Performance Co-Pilot User's and Administrator's Guide. This is available
online from:
https://pcp.io/doc/pcp-users-and-administrators-guide.pdf

Performance Co-Pilot PCP PMIE(1)