PMIE(1)                     General Commands Manual                    PMIE(1)

NAME

       pmie - inference engine for performance metrics

SYNOPSIS

       pmie [-bCdefHqVvWxz] [-A align] [-a archive] [-c filename] [-h host]
       [-l logfile] [-j stompfile] [-n pmnsfile] [-O offset] [-S starttime]
       [-T endtime] [-t interval] [-U username] [-Z timezone] [filename ...]

DESCRIPTION

       pmie accepts a collection of arithmetic, logical, and rule expressions
       to be evaluated at specified frequencies.  The base data for the
       expressions consists of performance metric values delivered in real
       time from any host running the Performance Metrics Collection Daemon
       (PMCD), or of historical data from Performance Co-Pilot (PCP) archive
       logs.

       As well as computing arithmetic and logical values, pmie can execute
       actions (popup alarms, write system log messages, and launch programs)
       in response to specified conditions.  Such actions are useful for
       detecting, monitoring and correcting performance-related problems.

       The expressions to be evaluated are read from configuration files
       specified by one or more filename arguments.  In the absence of any
       filename, expressions are read from standard input.

       A description of the command line options specific to pmie follows:

       -a   archive is a comma-separated list of names, each of which may be
            the base name of an archive or the name of a directory containing
            one or more archives written by pmlogger(1).  Multiple instances
            of the -a flag may appear on the command line to specify a list
            of sets of archives.  In this case, it is required that only one
            set of archives be present for any one host.  Also, any explicit
            host names occurring in a pmie expression must match the host
            name recorded in one of the archive labels.  In the case of
            multiple sets of archives, timestamps recorded in the archives
            are used to ensure temporal consistency.

       -b   Output will be line buffered and standard output is attached to
            standard error.  This is most useful for background execution in
            conjunction with the -l option.  The -b option is always used for
            pmie instances launched from pmie_check(1).

       -C   Parse the configuration file(s) and exit before performing any
            evaluations.  Any errors in the configuration file are reported.

       -c   An alternative to specifying filename at the end of the command
            line.

       -d   Normally pmie would be launched as a non-interactive process to
            monitor and manage the performance of one or more hosts.  Given
            the -d flag however, execution is interactive and the user is
            presented with a menu of options.  Interactive mode is useful
            mainly for debugging new expressions.

       -e   When used with -V, -v or -W, this option forces timestamps to be
            reported with each expression.  The timestamps are in ctime(3)
            format, enclosed in parentheses, and appear after the expression
            name and before the expression value, e.g.
                 expr_1 (Tue Feb  6 19:55:10 2001): 12

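       The layout of a timestamped line can be reproduced outside pmie,
       which may help when writing a script that parses -e output.  The
       following is only a Python sketch; the function name verbose_line is
       hypothetical and is not part of PCP.

```python
import time


def verbose_line(name, when, value):
    """Format one line in the style shown for the -e option."""
    # time.asctime() uses the same layout as ctime(3), without the newline.
    return f"{name} ({time.asctime(when)}): {value}"


# 19:55:10 on Tuesday 6 February 2001, given as an explicit struct_time so
# the result does not depend on the local timezone.
when = time.struct_time((2001, 2, 6, 19, 55, 10, 1, 37, 0))
print(verbose_line("expr_1", when, 12))
```
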
       -f   If the -l option is specified and there is no -a option (i.e.
            real-time monitoring) then pmie is run as a daemon in the
            background (in all other cases foreground is the default).  The
            -f option forces pmie to be run in the foreground, independent of
            any other options.

       -h   By default performance data is fetched from the local host (in
            real-time mode) or the host for the first named set of archives
            on the command line (in archive mode).  The host argument
            overrides this default.  It does not override hosts explicitly
            named in the expressions being evaluated.  The host argument is
            interpreted as a connection specification for pmNewContext, and
            is later mapped to the remote pmcd's self-reported host name for
            reporting purposes.  See also the %h vs. %c substitutions in rule
            action strings below.

       -l   Standard error is sent to logfile.

       -j   An alternative STOMP protocol configuration is loaded from
            stompfile.  If this option is not used, and the stomp action is
            used in any rule, the default location
            $PCP_SYSCONF_DIR/pmie/config/stomp will be used.

       -n   An alternative Performance Metrics Name Space (PMNS) is loaded
            from the file pmnsfile.

       -P   Identifies this as the primary pmie instance for a host.  See the
            ``AUTOMATIC RESTART'' section below for further details.

       -q   Suppresses diagnostic messages that would be printed to standard
            output by default, especially the "evaluator exiting" message as
            this can confuse scripts.

       -t   The interval argument follows the syntax described in
            PCPIntro(1), and in the simplest form may be an unsigned integer
            (the implied units in this case are seconds).  The value is used
            to determine the sample interval for expressions that do not
            explicitly set their sample interval using the pmie variable
            delta described below.  The default is 10.0 seconds.

       -U username
            User account under which to run pmie.  The default is the current
            user account for interactive use.  When run as a daemon, the
            unprivileged "pcp" account is used in current versions of PCP,
            but in older versions the superuser account ("root") was used by
            default.

       -v   Unless one of the verbose options -V, -v or -W appears on the
            command line, expressions are evaluated silently; the only output
            is the result of any actions being executed.  In the verbose
            mode, specified using the -v flag, the value of each expression
            is printed as it is evaluated.  The values are in canonical
            units: bytes in the dimension of ``space'', seconds in the
            dimension of ``time'' and events in the dimension of ``count''.
            See pmLookupDesc(3) for details of the supported dimension and
            scaling mechanisms for performance metrics.  The verbose mode is
            useful in monitoring the value of given expressions, evaluating
            derived performance metrics, passing these values on to other
            tools for further processing and in debugging new expressions.

       -V   This option has the same effect as the -v option, except that the
            name of the host and instance (if applicable) are printed as well
            as expression values.

       -W   This option has the same effect as the -V option described above,
            except that for boolean expressions, only those names and values
            that make the expression true are printed.  These are the same
            names and values accessible to rule actions as the %h, %i, %c and
            %v bindings, as described below.

       -x   Execute in domain agent mode.  This mode is used within the
            Performance Co-Pilot product to derive values for summary
            metrics, see pmdasummary(1).  Only restricted functionality is
            available in this mode (expressions with actions may not be
            used).

       -Z   Change the reporting timezone to timezone in the format of the
            environment variable TZ as described in environ(7).

       -z   Change the reporting timezone to the timezone of the host that is
            the source of the performance metrics, as identified via either
            the -h option or the first named set of archives (as described
            above for the -a option).

       The -S, -T, -O, and -A options may be used to define a time window to
       restrict the samples retrieved, set an initial origin within the time
       window, or specify a ``natural'' alignment of the sample times; refer
       to PCPIntro(1) for a complete description of these options.

       Output from pmie is directed to standard output and standard error as
       follows:

       stdout
            Expression values printed in the verbose -v mode and the output of
            print actions.

       stderr
            Error and warning messages for any syntactic or semantic problems
            during expression parsing, and any semantic or performance metrics
            availability problems during expression evaluation.

EXAMPLES

       The following example expressions demonstrate some of the capabilities
       of the inference engine.

       The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
       examples of pmie expressions.

       The variable delta controls expression evaluation frequency.  Specify
       that subsequent expressions be evaluated once a second, until further
       notice:

            delta = 1 sec;

       If the total context switch rate exceeds 10000 per second per CPU, then
       display an alarm notifier:

            kernel.all.pswitch / hinv.ncpu > 10000 count/sec
            -> alarm "high context switch rate %v";

       If the high context switch rate is sustained for 10 consecutive
       samples, then launch top(1) in an xterm(1) window to monitor processes,
       but do this at most once every 5 minutes:

            all_sample (
                kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
            ) -> shell 5 min "xterm -e 'top'";

       The following rules are evaluated once every 20 seconds:

            delta = 20 sec;

       If any disk is performing more than 60 I/Os per second, then print a
       message identifying the busy disk to standard output and launch
       dkvis(1):

            some_inst (
                disk.dev.total > 60 count/sec
            ) -> print "busy disks:" " %i" &
                 shell 5 min "dkvis";

       Refine the preceding rule to apply only between the hours of 9am and
       5pm, and to require 3 of 4 consecutive samples to exceed the threshold
       before executing the action:

            $hour >= 9 && $hour <= 17 &&
            some_inst (
              75 %_sample (
                disk.dev.total @0..3 > 60 count/sec
              )
            ) -> print "disks busy for 20 sec:" " [%h]%i";

       The following two rules are evaluated once every 10 minutes:

            delta = 10 min;

       If either the / or the /usr filesystem is more than 95% full, display
       an alarm popup, but not if it has already been displayed during the
       last 4 hours:

            filesys.free #'/dev/root' /
                filesys.capacity #'/dev/root' < 0.05
            -> alarm 4 hour "root filesystem (almost) full";

            filesys.free #'/dev/usr' /
                filesys.capacity #'/dev/usr' < 0.05
            -> alarm 4 hour "/usr filesystem (almost) full";

       The following rule requires a machine that supports the PCP environment
       metrics.  If the machine environment temperature rises more than 2
       degrees over a 10 minute interval, raise an alarm and write an entry in
       the system log:

            environ.temp @0 - environ.temp @1 > 2
            -> alarm "temperature rising fast" &
               syslog "machine room temperature rise alarm";

       The following rule may be of interest when investigating performance
       problems with an Oracle database:

            // back to 30sec evaluations
            delta = 30 sec;
            sid = "ptg1";       # $ORACLE_SID setting
            lid = "223";        # latch ID from v$latch
            lru = "#'$sid/$lid cache buffers lru chain'";
            host = ":moomba.melbourne.sgi.com";
            gets = "oracle.latch.gets $host $lru";
            total = "oracle.latch.gets $host $lru +
                     oracle.latch.misses $host $lru +
                     oracle.latch.immisses $host $lru";

            $total > 100 && $gets / $total < 0.2
            -> alarm "high lru latch contention in database $sid";

       The following ruleset will emit exactly one message depending on the
       availability and value of the 1-minute load average.

            delta = 1 minute;
            ruleset
                 kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
                     print "extreme load average %v"
            else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
                     print "moderate load average %v"
            unknown ->
                     print "load average unavailable"
            otherwise ->
                     print "load average OK"
            ;

       The following rule will emit a message when some filesystem is more
       than 75% full and is filling at a rate that if sustained would fill the
       filesystem to 100% in less than 30 minutes.

            some_inst (
                100 * filesys.used / filesys.capacity > 75 &&
                filesys.used + 30min * (rate filesys.used) > filesys.capacity
            ) -> print "filesystem will be full within 30 mins:" " %i";

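       The arithmetic behind the fill prediction can be checked by hand.  The
       Python sketch below mirrors the two conditions of the rule for a
       single filesystem; the function name and the sample figures are
       hypothetical, and all quantities are assumed to be in the same space
       unit, with the growth rate expressed per second.

```python
def will_fill(used, capacity, rate_per_sec, horizon_sec=30 * 60):
    """Return True when both conditions of the rule hold."""
    over_threshold = 100 * used / capacity > 75
    # Linear projection: current usage plus growth over the horizon.
    projected = used + horizon_sec * rate_per_sec
    return over_threshold and projected > capacity


# 80% full and growing by 150 units/sec: 8000000 + 1800 * 150 = 8270000
print(will_fill(8_000_000, 10_000_000, 150))
# Same filesystem growing by 1200 units/sec: projection is 10160000
print(will_fill(8_000_000, 10_000_000, 1200))
```

       The first call reports no alarm because the projection stays below
       the capacity; the second crosses it within the 30 minute horizon.
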
       If the metric mypmda.errors counts errors then the following rule will
       emit a message if the rate of errors exceeds 1 per second provided the
       error count is less than 100.

            mypmda.errors > 1 && instant mypmda.errors < 100
            -> print "high error rate: %v";

QUICK START

       The pmie specification language is powerful and large.

       To speed up the development of pmie rules, the pmieconf(1) tool
       provides a facility for generating a pmie configuration file from a
       set of generalized pmie rules.  The supplied set of rules covers a
       wide range of performance scenarios.

       The Performance Co-Pilot User's and Administrator's Guide provides a
       detailed tutorial-style chapter covering pmie.

EXPRESSION SYNTAX

       This description is terse and informal.  For a more comprehensive
       description see the Performance Co-Pilot User's and Administrator's
       Guide.

       A pmie specification is a sequence of semicolon terminated expressions.

       Basic operators are modeled on the arithmetic, relational and Boolean
       operators of the C programming language.  Precedence rules are as
       expected, although the use of parentheses is encouraged to enhance
       readability and remove ambiguity.

       Operands are performance metric names (see pmns(5)) and the normal
       literal constants.

       Operands involving performance metrics may produce sets of values, as a
       result of enumeration in the dimensions of hosts, instances and time.
       Special qualifiers may appear after a performance metric name to define
       the enumeration in each dimension.  For example,

           kernel.percpu.cpu.user :foo :bar #cpu0 @0..2

       defines 6 values corresponding to the time spent executing in user mode
       on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3 consecutive
       samples.  The default interpretation in the absence of : (host), #
       (instance) and @ (time) qualifiers is all instances at the most recent
       sample time for the default source of PCP performance metrics.

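       The count of 6 values is the product of the sizes of the three
       dimensions: 2 hosts, 1 instance and 3 samples.  The Python sketch
       below only illustrates that enumeration; the ordering of the tuples is
       an assumption and says nothing about how pmie stores its values.

```python
from itertools import product

hosts = ["foo", "bar"]   # from :foo :bar
instances = ["cpu0"]     # from #cpu0
samples = [0, 1, 2]      # from @0..2, where 0 is the most recent sample

# One value per (host, instance, sample) combination.
values = list(product(hosts, instances, samples))
print(len(values))
```
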
       Host and instance names that do not follow the rules for variables in
       programming languages, i.e. alphabetic optionally followed by
       alphanumerics, should be enclosed in single quotes.

       Expression evaluation follows the law of ``least surprises''.  Where
       performance metrics have the semantics of a counter, pmie will
       automatically convert to a rate based upon consecutive samples and the
       time interval between these samples.  All numeric expressions are
       evaluated in double precision, and where appropriate, automatically
       scaled into canonical units of ``bytes'', ``seconds'' and ``counts''.

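       Rate conversion of a counter amounts to dividing the change in value
       by the elapsed time between two consecutive samples.  The following
       Python sketch shows the arithmetic only; the function name is
       hypothetical and counter wrap handling is deliberately left out.

```python
def counter_rate(prev, curr):
    """Per-second rate between two (timestamp_sec, value) samples."""
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


# A counter that moves from 150000 to 250000 in 10 seconds.
print(counter_rate((100.0, 150_000), (110.0, 250_000)))
```
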
       A rule is a special form of expression that specifies a condition or
       logical expression, a special operator (->) and actions to be performed
       when the condition is found to be true.

       The following table summarizes the basic pmie operators:

         ┌────────────────┬────────────────────────────────────────────────┐
         │   Operators    │                  Explanation                   │
         ├────────────────┼────────────────────────────────────────────────┤
         │+ - * /         │ Arithmetic                                     │
         │< <= == >= > != │ Relational (value comparison)                  │
         │! && ||         │ Boolean                                        │
         │->              │ Rule                                           │
         │rising          │ Boolean, false to true transition              │
         │falling         │ Boolean, true to false transition              │
         │rate            │ Explicit rate conversion (rarely required)     │
         │instant         │ No automatic rate conversion (rarely required) │
         └────────────────┴────────────────────────────────────────────────┘
       All operators are supported for numeric-valued operands and
       expressions.  For string-valued operands, namely literal string
       constants enclosed in double quotes or metrics with a data type of
       string (PM_TYPE_STRING), only the operators == and != are supported.

       The rate and instant operators are the logical inverse of one another,
       so an arithmetic expression expr is equal to rate instant expr.  The
       more useful cases involve using rate with a metric that is not a
       counter to determine the rate of change over time or instant with a
       metric that is a counter to determine if the current value is above or
       below some threshold.

       Aggregate operators may be used to aggregate or summarize along one
       dimension of a set-valued expression.  The following aggregate
       operators map from a logical expression to a logical expression of
       lower dimension.

         ┌─────────────────────────┬─────────────┬──────────────────────────┐
         │       Operators         │    Type     │       Explanation        │
         ├─────────────────────────┼─────────────┼──────────────────────────┤
         │some_inst                │ Existential │ True if at least one set │
         │some_host                │             │ member is true in the    │
         │some_sample              │             │ associated dimension     │
         ├─────────────────────────┼─────────────┼──────────────────────────┤
         │all_inst                 │ Universal   │ True if all set members  │
         │all_host                 │             │ are true in the          │
         │all_sample               │             │ associated dimension     │
         ├─────────────────────────┼─────────────┼──────────────────────────┤
         │N%_inst                  │ Percentile  │ True if at least N       │
         │N%_host                  │             │ percent of set members   │
         │N%_sample                │             │ are true in the          │
         │                         │             │ associated dimension     │
         └─────────────────────────┴─────────────┴──────────────────────────┘
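       The three kinds of logical aggregate can be modelled over a plain list
       of truth values for one dimension.  This Python sketch uses
       hypothetical function names and assumes every member is either true
       or false; unknown values are not modelled here.

```python
def some(members):
    """Existential: at least one member is true."""
    return any(members)


def every(members):
    """Universal: all members are true."""
    return all(members)


def percent(n, members):
    """Percentile: at least n percent of the members are true."""
    return 100 * sum(members) >= n * len(members)


# 75 %_sample over four samples needs at least 3 of them to be true.
print(percent(75, [True, True, False, True]))
print(percent(75, [True, False, False, True]))
```

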
       The following instantial operators may be used to filter or limit a
       set-valued logical expression, based on regular expression matching of
       instance names.  The logical expression must be a set involving the
       dimension of instances, and the regular expression is of the form used
       by egrep(1) or the Extended Regular Expressions of regcomp(3).

              ┌─────────────┬──────────────────────────────────────────┐
              │ Operators   │               Explanation                │
              ├─────────────┼──────────────────────────────────────────┤
              │match_inst   │ For each value of the logical expression │
              │             │ that is ``true'', the result is ``true'' │
              │             │ if the associated instance name matches  │
              │             │ the regular expression.  Otherwise the   │
              │             │ result is ``false''.                     │
              ├─────────────┼──────────────────────────────────────────┤
              │nomatch_inst │ For each value of the logical expression │
              │             │ that is ``true'', the result is ``true'' │
              │             │ if the associated instance name does not │
              │             │ match the regular expression.  Otherwise │
              │             │ the result is ``false''.                 │
              └─────────────┴──────────────────────────────────────────┘
       For example, the expression below will be ``true'' for disks attached
       to controllers 2 or 3 performing more than 20 operations per second:
            match_inst "^dks[23]d" disk.dev.total > 20;

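       The filtering described in the table can be sketched in Python with a
       mapping from instance name to truth value.  The function names mirror
       the operators but are otherwise hypothetical, the instance names are
       made up, and Python's re module is only an approximation of the POSIX
       Extended Regular Expressions that pmie uses.

```python
import re


def match_inst(pattern, truth_by_instance):
    """Keep true values only for instance names matching the pattern."""
    regex = re.compile(pattern)
    return {inst: bool(value and regex.search(inst))
            for inst, value in truth_by_instance.items()}


def nomatch_inst(pattern, truth_by_instance):
    """Keep true values only for instance names not matching the pattern."""
    regex = re.compile(pattern)
    return {inst: bool(value and not regex.search(inst))
            for inst, value in truth_by_instance.items()}


# Result of disk.dev.total > 20 for three hypothetical disks.
busy = {"dks1d1": True, "dks2d1": True, "dks3d2": False}
print(match_inst("^dks[23]d", busy))
```

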
       The following aggregate operators map from an arithmetic expression to
       an arithmetic expression of lower dimension.

          ┌─────────────────────────┬───────────┬──────────────────────────┐
          │       Operators         │   Type    │       Explanation        │
          ├─────────────────────────┼───────────┼──────────────────────────┤
          │min_inst                 │ Extrema   │ Minimum value across all │
          │min_host                 │           │ set members in the       │
          │min_sample               │           │ associated dimension     │
          ├─────────────────────────┼───────────┼──────────────────────────┤
          │max_inst                 │ Extrema   │ Maximum value across all │
          │max_host                 │           │ set members in the       │
          │max_sample               │           │ associated dimension     │
          ├─────────────────────────┼───────────┼──────────────────────────┤
          │sum_inst                 │ Aggregate │ Sum of values across all │
          │sum_host                 │           │ set members in the       │
          │sum_sample               │           │ associated dimension     │
          ├─────────────────────────┼───────────┼──────────────────────────┤
          │avg_inst                 │ Aggregate │ Average value across all │
          │avg_host                 │           │ set members in the       │
          │avg_sample               │           │ associated dimension     │
          └─────────────────────────┴───────────┴──────────────────────────┘
       The aggregate operators count_inst, count_host and count_sample map
       from a logical expression to an arithmetic expression of lower
       dimension by counting the number of set members for which the
       expression is true in the associated dimension.

       For action rules, the following actions are defined:

                ┌──────────┬────────────────────────────────────────┐
                │Operators │              Explanation               │
                ├──────────┼────────────────────────────────────────┤
                │alarm     │ Raise a visible alarm with xconfirm(1) │
                │print     │ Display on standard output             │
                │shell     │ Execute with sh(1)                     │
                │stomp     │ Send a STOMP message to a JMS server   │
                │syslog    │ Append a message to system log file    │
                └──────────┴────────────────────────────────────────┘
       Multiple actions may be separated by the & and | operators to specify
       respectively sequential execution (both actions are executed) and
       alternate execution (the second action will only be executed if the
       execution of the first action returns a non-zero error status).

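       The difference between the two separators can be modelled with
       callables that return an exit status.  This Python sketch is an
       illustration under that assumption, with hypothetical function names;
       it does not reflect how pmie launches actions internally.

```python
def run_sequential(first, second):
    """Model of `first & second`: both actions are always executed."""
    return [first(), second()]


def run_alternate(first, second):
    """Model of `first | second`: second runs only if first fails."""
    statuses = [first()]
    if statuses[0] != 0:
        statuses.append(second())
    return statuses


def ok():
    return 0


def fail():
    return 1


print(run_sequential(fail, ok))  # both statuses are collected
print(run_alternate(ok, fail))   # the second action is never run
```

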
       Arguments to actions are an optional suppression time, and then one or
       more expressions (a string is an expression in this context).  Strings
       appearing as arguments to an action may include the following special
       selectors that will be replaced at the time the action is executed.

       %h  Host name(s) that make the left-most top-level expression in the
           condition true.

       %c  Connection specification string(s) or files for a PCP tool to reach
           the hosts or archives that make the left-most top-level expression
           in the condition true.

       %i  Instance(s) that make the left-most top-level expression in the
           condition true.

       %v  One value from the left-most top-level expression in the condition
           for each host and instance pair that makes the condition true.

       Note that expansion of the special selectors is done by repeating the
       whole argument once for each unique binding to any of the qualifying
       special selectors.  For example if a rule were true for the host mumble
       with instances grunt and snort, and for host fumble the instance puff
       makes the rule true, then the action
            ...
            -> shell myscript "Warning: %h:%i busy ";
       will execute myscript with the argument string "Warning: mumble:grunt
       busy Warning: mumble:snort busy Warning: fumble:puff busy".

       By comparison, if the action
            ...
            -> shell myscript "Warning! busy:" " %h:%i";
       were executed under the same circumstances, then myscript would be
       executed with the argument string "Warning! busy: mumble:grunt
       mumble:snort fumble:puff".

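       The expansion rule can be sketched in Python as follows.  This is one
       reading of the description above, not pmie's implementation: the
       function name expand and the representation of bindings as
       dictionaries are assumptions made for the illustration.

```python
SELECTORS = ("%h", "%i", "%c", "%v")


def expand(arguments, bindings):
    """Build the action string from its arguments and the true bindings."""
    out = []
    for arg in arguments:
        used = [s for s in SELECTORS if s in arg]
        if not used:
            out.append(arg)  # constant argument: emitted once
            continue
        seen = []
        for binding in bindings:
            key = tuple(binding.get(s, "") for s in used)
            if key in seen:
                continue     # repeat only for unique bindings
            seen.append(key)
            text = arg
            for s in used:
                text = text.replace(s, binding.get(s, ""))
            out.append(text)
    return "".join(out)


bindings = [
    {"%h": "mumble", "%i": "grunt"},
    {"%h": "mumble", "%i": "snort"},
    {"%h": "fumble", "%i": "puff"},
]
print(expand(["Warning! busy:", " %h:%i"], bindings))
```

       With the same bindings, an argument that uses only %h is repeated
       twice rather than three times, because mumble is a single unique
       binding for that selector.

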
       The semantics of the expansion of the special selectors leads to a
       common usage pattern in an action, where one argument is a constant
       (contains no special selectors), the second argument contains the
       desired special selectors with minimal separator characters, and an
       optional third argument provides a constant postscript (e.g. to
       terminate any argument quoting from the first argument).  If necessary,
       post-processing (e.g. in myscript) can provide the necessary
       enumeration over each unique expansion of the string containing just
       the special selectors.

       For complex conditions, the bindings to these selectors are not
       obvious.  It is strongly recommended that pmie be used in the debugging
       mode (specify the -W command line option in particular) during rule
       development.

BOOLEAN EXPRESSIONS

       pmie expressions that have the semantics of a Boolean, e.g. foo.bar >
       10 or some_inst ( my.table < 0 ), are assigned the values true or false
       or unknown.  A value is unknown if one or more of the underlying metric
       values is unavailable, e.g. pmcd(1) on the host cannot be contacted,
       the metric is not in the PCP archive, no values are currently
       available, insufficient values have been fetched to allow a rate
       converted value to be computed or insufficient values have been fetched
       to instantiate the required number of samples in the temporal domain.

       Boolean operators follow the normal rules of Kleene logic (aka 3-valued
       logic) when combining values that include unknown:

                      ┌────────────┬───────────────────────────┐
                      │            │             B             │
                      │  A and B   ├─────────┬───────┬─────────┤
                      │            │  true   │ false │ unknown │
                      ├──┬─────────┼─────────┼───────┼─────────┤
                      │  │  true   │  true   │ false │ unknown │
                      │  ├─────────┼─────────┼───────┼─────────┤
                      │A │  false  │  false  │ false │  false  │
                      │  ├─────────┼─────────┼───────┼─────────┤
                      │  │ unknown │ unknown │ false │ unknown │
                      └──┴─────────┴─────────┴───────┴─────────┘
                      ┌────────────┬──────────────────────────┐
                      │            │            B             │
                      │  A or B    ├──────┬─────────┬─────────┤
                      │            │ true │  false  │ unknown │
                      ├──┬─────────┼──────┼─────────┼─────────┤
                      │  │  true   │ true │  true   │  true   │
                      │  ├─────────┼──────┼─────────┼─────────┤
                      │A │  false  │ true │  false  │ unknown │
                      │  ├─────────┼──────┼─────────┼─────────┤
                      │  │ unknown │ true │ unknown │ unknown │
                      └──┴─────────┴──────┴─────────┴─────────┘
                                 ┌────────┬─────────┐
                                 │   A    │  not A  │
                                 ├────────┼─────────┤
                                 │  true  │  false  │
                                 ├────────┼─────────┤
                                 │ false  │  true   │
                                 ├────────┼─────────┤
                                 │unknown │ unknown │
                                 └────────┴─────────┘
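       The truth tables can be expressed compactly in code.  The Python
       sketch below represents unknown as None, which is a choice made for
       this illustration; the function names are hypothetical.

```python
def k_and(a, b):
    """Kleene AND: false dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a, b):
    """Kleene OR: true dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def k_not(a):
    """Kleene NOT: unknown stays unknown."""
    return None if a is None else not a


print(k_and(None, False), k_or(None, True), k_not(None))
```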

RULESETS

       The ruleset clause is used to define a set of rules and actions that
       are evaluated in order until some action is executed, at which point
       the remaining rules and actions are skipped until the ruleset is again
       scheduled for evaluation.  The keyword else is used to separate rules.
       After one or more regular rules (with a predicate and an action), a
       ruleset may include an optional
            unknown -> action
       clause, optionally followed by an
            otherwise -> action
       clause.

       If all of the predicates in the rules evaluate to unknown and an
       unknown clause has been specified then the action associated with the
       unknown clause will be executed.

       If no rule predicate is true and the unknown action is either not
       specified or not executed and an otherwise clause has been specified,
       then the action associated with the otherwise clause will be executed.
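
       The selection order described above can be sketched in Python.  The
       function name and the use of None for unknown are assumptions of this
       illustration, which returns the chosen action rather than running it.

```python
def evaluate_ruleset(rules, unknown_action=None, otherwise_action=None):
    """Pick the action for one evaluation of a ruleset.

    rules is a list of (predicate_value, action) pairs in order, where
    predicate_value is True, False or None (unknown).
    """
    for value, action in rules:
        if value is True:
            return action  # first true predicate wins, the rest are skipped
    if unknown_action is not None and rules and all(
        value is None for value, _ in rules
    ):
        return unknown_action
    return otherwise_action


rules = [(False, "extreme load average"), (True, "moderate load average")]
print(evaluate_ruleset(rules, "load average unavailable", "load average OK"))
```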

SCALE FACTORS

       Scale factors may be appended to arithmetic expressions and force
       linear scaling of the value to canonical units.  Simple scale factors
       are constructed from the keywords: nanosecond, nanosec, nsec,
       microsecond, microsec, usec, millisecond, millisec, msec, second, sec,
       minute, min, hour, byte, Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount and
       Mcount, and the operator /, for example ``Kbytes / hour''.

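       For illustration, a scale factor can follow a constant so that a
       threshold is stated in convenient units.  In this sketch the metric
       name and the threshold are assumptions, not taken from this section:

            // disk.all.total is assumed to be a counter of disk operations
            disk.all.total > 500 count / sec
                 -> print "more than 500 disk operations per second";

       Assuming Kcount denotes one thousand counts, the same threshold
       could equally be written with other keywords from the list above,
       as ``30 Kcount / minute''.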

MACROS

       Macros are defined using expressions of the form:

            name = constexpr;

       Here name follows the normal rules for variables in programming
       languages, i.e. an alphabetic character optionally followed by
       alphanumerics.  constexpr must be a constant expression, either a
       string (enclosed in double quotes) or an arithmetic expression
       optionally followed by a scale factor.

       Macros are expanded when their name, prefixed by a dollar sign ($),
       appears in an expression, and macros may be nested within a constexpr
       string.

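       A short sketch of definition and expansion, using a string macro to
       abbreviate a metric name prefix (the macro name and the metrics are
       assumptions chosen for illustration):

            disk = "disk.all";
            $disk.write > $disk.read
                 -> print "disk writes outnumber disk reads";

       After expansion the predicate reads disk.all.write > disk.all.read.
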
       The following reserved macro names are understood.

       minute    Current minute of the hour.

       hour      Current hour of the day, in the range 0 to 23.

       day       Current day of the month, in the range 1 to 31.

       month     Current month of the year, in the range 0 (January) to 11
                 (December).

       year      Current year.

       day_of_week
                 Current day of the week, in the range 0 (Sunday) to 6
                 (Saturday).

       delta     Sample interval in effect for this expression.

       Dates and times are presented in the reporting time zone (see the
       description of the -Z and -z command line options above).

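       The reserved macros can restrict a rule to particular times.  The
       sketch below is illustrative only (the metric, threshold and hours
       are assumptions); it is intended to fire only on weekdays between
       09:00 and 16:59 in the reporting time zone:

            $day_of_week >= 1 && $day_of_week <= 5 &&
            $hour >= 9 && $hour <= 16 &&
            kernel.all.load #'1 minute' > 5
                 -> print "high load during business hours";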

AUTOMATIC RESTART

       It is often useful for pmie processes to be started and stopped when
       the local host is booted or shut down, or restarted when they have
       been detected as no longer running (having unexpectedly exited for
       some reason).  Refer to pmie_check(1) for details on automating this
       process.

       Optionally, each system running pmcd(1) may also be configured to run
       a ``primary'' pmie instance.  This pmie instance is launched by
       $PCP_RC_DIR/pmie, and is affected by the files
       $PCP_SYSCONF_DIR/pmie/control, $PCP_SYSCONF_DIR/pmie/control.d (use
       chkconfig(8), systemctl(1) or similar platform-specific commands to
       activate or disable the primary pmie instance) and
       $PCP_VAR_DIR/config/pmie/config.default (the default initial
       configuration file for the primary pmie).

       The primary pmie instance is identified by the -P option.  There may
       be at most one ``primary'' pmie instance on each system.  The primary
       pmie instance (if any) must be running on the same host as the
       pmcd(1) to which it connects (if any), so the -h and -P options are
       mutually exclusive.

EVENT MONITORING

       It is common for production systems to be monitored in a central
       location.  Traditionally on UNIX systems this has been performed by
       the system log facilities; see logger(1) and syslogd(1).  On Windows,
       communication with the system event log is handled by
       pcp-eventlog(1).

       pmie fits into this model when rules use the syslog action.  Note
       that if the action string begins with -p (priority) and/or -t (tag)
       then these are extracted from the string and treated in the same way
       as in logger(1) and pcp-eventlog(1).

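       As a sketch, a rule that passes a priority and a tag through to the
       system log might look like this (the metric, threshold, priority and
       tag values are assumptions for illustration):

            kernel.all.load #'1 minute' > 5
                 -> syslog "-p daemon.warning -t pmie_load high load average";
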
       However, it is also common to have other event monitoring frameworks
       into which you may wish to incorporate performance events from pmie.
       You can often use the shell action to send events to these
       frameworks, as they usually provide their own program for injecting
       events into the framework from external sources.

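       For example, assuming a framework that ships a command line injector
       (the program /opt/monitor/send-event and its argument below are
       hypothetical), a rule could forward an event like this:

            // send-event is a hypothetical program, not part of PCP
            kernel.all.load #'1 minute' > 5
                 -> shell "/opt/monitor/send-event 'high load average'";

       An absolute path is used here because pmie replaces PATH in the
       environment of shell actions.
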
       A final option is use of the stomp (Streaming Text Oriented Messaging
       Protocol) action, which allows pmie to connect to a central JMS (Java
       Message Service) server and send events to the PMIE topic.  Tools can
       be written to extract these text messages and present them to
       operations people (via desktop popup windows, etc).  Use of the stomp
       action requires a stomp configuration file to be set up, which
       specifies the location of the JMS server host, port number, and
       username/password.

       The format of this file is as follows:

            host=messages.sgi.com   # this is the JMS server (required)
            port=61616              # and it is listening here (required)
            timeout=2               # seconds to wait for server (optional)
            username=joe            # (required)
            password=j03ST0MP       # (required)
            topic=PMIE              # JMS topic for pmie messages (optional)

       The timeout value specifies the time (in seconds) that pmie should
       wait for acknowledgements from the JMS server after sending a message
       (as required by the STOMP protocol).  Note that on startup, pmie will
       wait indefinitely for a connection, and will not begin rule
       evaluation until that initial connection has been established.
       Should the connection to the JMS server be lost at any time while
       pmie is running, pmie will attempt to reconnect each time a rule with
       a stomp action subsequently evaluates to true, but not more than once
       per minute.  This is to avoid contributing to network congestion.  In
       this situation, where the STOMP connection to the JMS server has been
       severed, the stomp action will return a non-zero error value.

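       As a sketch, a rule using the stomp action could fall back to the
       system log when the message cannot be delivered.  It assumes the |
       (alternate) action operator, under which the second action is only
       executed if the first returns a non-zero status; the metric,
       threshold and messages are assumptions for illustration:

            kernel.all.load #'1 minute' > 5
                 -> stomp "high load average" |
                    syslog "high load average (stomp delivery failed)";

       The stomp configuration file is named with the -j option when pmie
       is started.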

FILES

       $PCP_DEMOS_DIR/pmie/*
                 annotated example rules
       $PCP_VAR_DIR/pmns/*
                 default PMNS specification files
       $PCP_TMP_DIR/pmie
                 pmie maintains files in this directory to identify the
                 running pmie instances and to export runtime information
                 about each instance - this data forms the basis of the
                 pmcd.pmie performance metrics
       $PCP_PMIECONTROL_PATH
                 the default set of pmie instances to start at boot time -
                 refer to pmie_check(1) for details

BUGS

       The lexical scanner and parser will attempt to recover after an error
       in the input expressions.  Parsing resumes after skipping input up to
       the next semi-colon (;).  However, during this skipping process the
       scanner is ignorant of comments and strings, so an embedded
       semi-colon may cause parsing to resume at an unexpected place.  This
       behavior is largely benign, as until the initial syntax error is
       corrected, pmie will not attempt any expression evaluation.

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize
       the file and directory names used by PCP.  On each installation, the
       file /etc/pcp.conf contains the local values for these variables.
       The $PCP_CONF variable may be used to specify an alternative
       configuration file, as described in pcp.conf(5).

       When executing shell actions, pmie overrides two variables - IFS and
       PATH - in the environment of the child process.  IFS is set to
       "\t\n".  The PATH is set to a combination of a default path for all
       platforms ("/usr/sbin:/sbin:/usr/bin:/usr/sbin") and several
       configurable components.  These are (in this order): $PCP_BIN_DIR,
       $PCP_BINADM_DIR and $PCP_PLATFORM_PATHS.

       When executing popup alarm actions, pmie will use the value of
       $PCP_XCONFIRM_PROG as the visual notification program to run.  This
       is typically set to pmconfirm(1), a cross-platform dialog box.

UNIX SEE ALSO

       logger(1).

WINDOWS SEE ALSO

       pcp-eventlog(1).

SEE ALSO

       PCPIntro(1), pmcd(1), pmconfirm(1), pmdumplog(1), pmieconf(1),
       pmie_check(1), pminfo(1), pmlogger(1), pmval(1), PMAPI(3), pcp.conf(5)
       and pcp.env(5).

USER GUIDE

       For a more complete description of the pmie language, refer to the
       Performance Co-Pilot Users and Administrators Guide.  This is
       available online from:
           https://pcp.io/doc/pcp-users-and-administrators-guide.pdf



Performance Co-Pilot                  PCP                              PMIE(1)