PMIE(1)                     General Commands Manual                    PMIE(1)

NAME

       pmie - inference engine for performance metrics

SYNOPSIS

       pmie [-bCdefHPqvVWxXz?] [-a archive] [-A align] [-c filename] [-h host]
       [-l logfile] [-j stompfile] [-n pmnsfile] [-O offset] [-S starttime]
       [-t interval] [-T endtime] [-U username] [-Z timezone] [filename ...]

DESCRIPTION

       pmie accepts a collection of arithmetic, logical, and rule expressions
       to be evaluated at specified frequencies.  The base data for the
       expressions consists of performance metric values delivered in real
       time from any host running the Performance Metrics Collection Daemon
       (PMCD), or historical data from Performance Co-Pilot (PCP) archive
       logs.

       As well as computing arithmetic and logical values, pmie can execute
       actions (popup alarms, write system log messages, and launch programs)
       in response to specified conditions.  Such actions are extremely useful
       in detecting, monitoring and correcting performance-related problems.

       The expressions to be evaluated are read from configuration files
       specified by one or more filename arguments.  In the absence of any
       filename, expressions are read from standard input.

       Output from pmie is directed to standard output and standard error as
       follows:

       stdout
            Expression values printed in the verbose -v mode and the output of
            print actions.

       stderr
            Error and warning messages for any syntactic or semantic problems
            during expression parsing, and any semantic or performance metrics
            availability problems during expression evaluation.

OPTIONS

       The available command line options are:

       -a archive, --archive=archive
            archive is a comma-separated list of names, each of which may be
            the base name of an archive or the name of a directory containing
            one or more archives written by pmlogger(1).  Multiple instances
            of the -a flag may appear on the command line to specify a list of
            sets of archives.  In this case, it is required that only one set
            of archives be present for any one host.  Also, any explicit host
            names occurring in a pmie expression must match the host name
            recorded in one of the archive labels.  In the case of multiple
            sets of archives, timestamps recorded in the archives are used to
            ensure temporal consistency.

       -A align, --align=align
            Force the initial time window to be aligned on the boundary of a
            natural time unit align.  Refer to PCPIntro(1) for a complete
            description of the syntax for align.

       -b, --buffer
            Output will be line buffered and standard output is attached to
            standard error.  This is most useful for background execution in
            conjunction with the -l option.  The -b option is always used for
            pmie instances launched from pmie_check(1).

       -c config, --config=config
            An alternative to specifying filename at the end of the command
            line.

       -C, --check
            Parse the configuration file(s) and exit before performing any
            evaluations.  Any errors in the configuration file are reported.

       -d, --interact
            Normally pmie would be launched as a non-interactive process to
            monitor and manage the performance of one or more hosts.  Given
            the -d flag however, execution is interactive and the user is
            presented with a menu of options.  Interactive mode is useful
            mainly for debugging new expressions.

       -e, --timestamp
            When used with -V, -v or -W, this option forces timestamps to be
            reported with each expression.  The timestamps are in ctime(3)
            format, enclosed in parentheses, and appear after the expression
            name and before the expression value, e.g.
                 expr_1 (Tue Feb  6 19:55:10 2001): 12
       
       -f, --foreground
            If the -l option is specified and there is no -a option (i.e.
            real-time monitoring) then pmie is run as a daemon in the
            background (in all other cases foreground is the default).  The -f
            option forces pmie to be run in the foreground, independent of any
            other options.

       -h host, --host=host
            By default performance data is fetched from the local host (in
            real-time mode) or the host for the first named set of archives on
            the command line (in archive mode).  The host argument overrides
            this default.  It does not override hosts explicitly named in the
            expressions being evaluated.  The host argument is interpreted as
            a connection specification for pmNewContext, and is later mapped
            to the remote pmcd's self-reported host name for reporting
            purposes.  See also the %h vs. %c substitutions in rule action
            strings below.

       -l logfile, --logfile=logfile
            Standard error is sent to logfile.

       -j stompfile
            An alternative STOMP protocol configuration is loaded from
            stompfile.  If this option is not used, and the stomp action is
            used in any rule, the default location
            $PCP_SYSCONF_DIR/pmie/config/stomp will be used.

       -n pmnsfile, --namespace=pmnsfile
            An alternative Performance Metrics Name Space (PMNS) is loaded
            from the file pmnsfile.

       -O origin, --origin=origin
            Specify the origin of the time window.  See PCPIntro(1) for a
            complete description of this option.
       
       -P, --primary
            Identifies this as the primary pmie instance for a host.  See the
            ``AUTOMATIC RESTART'' section below for further details.

       -q, --quiet
            Suppresses diagnostic messages that would be printed to standard
            output by default, especially the "evaluator exiting" message as
            this can confuse scripts.

       -S starttime, --start=starttime
            Specify the starttime of the time window.  See PCPIntro(1) for a
            complete description of this option.

       -t interval, --interval=interval
            The interval argument follows the syntax described in PCPIntro(1),
            and in the simplest form may be an unsigned integer (the implied
            units in this case are seconds).  The value is used to determine
            the sample interval for expressions that do not explicitly set
            their sample interval using the pmie variable delta described
            below.  The default is 10.0 seconds.

       -T endtime, --finish=endtime
            Specify the endtime of the time window.  See PCPIntro(1) for a
            complete description of this option.

       -U username, --username=username
            User account under which to run pmie.  The default is the current
            user account for interactive use.  When run as a daemon, the
            unprivileged "pcp" account is used in current versions of PCP, but
            in older versions the superuser account ("root") was used by
            default.

       -v   Unless one of the verbose options -V, -v or -W appears on the
            command line, expressions are evaluated silently; the only output
            is the result of any actions being executed.  In the verbose mode,
            specified using the -v flag, the value of each expression is
            printed as it is evaluated.  The values are in canonical units:
            bytes in the dimension of ``space'', seconds in the dimension of
            ``time'' and events in the dimension of ``count''.  See
            pmLookupDesc(3) for details of the supported dimension and scaling
            mechanisms for performance metrics.  The verbose mode is useful in
            monitoring the value of given expressions, evaluating derived
            performance metrics, passing these values on to other tools for
            further processing and in debugging new expressions.
       
       -V, --verbose
            This option has the same effect as the -v option, except that the
            name of the host and instance (if applicable) are printed as well
            as expression values.

       -W   This option has the same effect as the -V option described above,
            except that for boolean expressions, only those names and values
            that make the expression true are printed.  These are the same
            names and values accessible to rule actions as the %h, %i, %c and
            %v bindings, as described below.

       -x, --secret-agent
            Execute in domain agent mode.  This mode is used within the
            Performance Co-Pilot product to derive values for summary metrics,
            see pmdasummary(1).  Only restricted functionality is available in
            this mode (expressions with actions may not be used).

       -X, --secret-applet
            Run in secret applet mode (thin client).

       -z, --hostzone
            Change the reporting timezone to the timezone of the host that is
            the source of the performance metrics, as identified via either
            the -h option or the first named set of archives (as described
            above for the -a option).

       -Z timezone, --timezone=timezone
            Change the reporting timezone to timezone in the format of the
            environment variable TZ as described in environ(7).

       -?, --help
            Display usage message and exit.

EXAMPLES

       The following example expressions demonstrate some of the capabilities
       of the inference engine.

       The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
       examples of pmie expressions.

       The variable delta controls expression evaluation frequency.  Specify
       that subsequent expressions be evaluated once a second, until further
       notice:

            delta = 1 sec;

       If the total context switch rate exceeds 10000 per second per CPU, then
       display an alarm notifier:

            kernel.all.pswitch / hinv.ncpu > 10000 count/sec
            -> alarm "high context switch rate %v";

       If the high context switch rate is sustained for 10 consecutive
       samples, then launch top(1) in an xterm(1) window to monitor processes,
       but do this at most once every 5 minutes:

            all_sample (
                kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
            ) -> shell 5 min "xterm -e 'top'";

       The following rules are evaluated once every 20 seconds:

            delta = 20 sec;

       If any disk is performing more than 60 I/Os per second, then print a
       message identifying the busy disk to standard output and launch
       dkvis(1):

            some_inst (
                disk.dev.total > 60 count/sec
            ) -> print "busy disks:" " %i" &
                 shell 5 min "dkvis";

       Refine the preceding rule to apply only between the hours of 9am and
       5pm, and to require 3 of 4 consecutive samples to exceed the threshold
       before executing the action:

            $hour >= 9 && $hour <= 17 &&
            some_inst (
              75 %_sample (
                disk.dev.total @0..3 > 60 count/sec
              )
            ) -> print "disks busy for 20 sec:" " [%h]%i";
       
       The following two rules are evaluated once every 10 minutes:

            delta = 10 min;

       If either the / or the /usr filesystem is more than 95% full, display
       an alarm popup, but not if it has already been displayed during the
       last 4 hours:

            filesys.free #'/dev/root' /
                filesys.capacity #'/dev/root' < 0.05
            -> alarm 4 hour "root filesystem (almost) full";

            filesys.free #'/dev/usr' /
                filesys.capacity #'/dev/usr' < 0.05
            -> alarm 4 hour "/usr filesystem (almost) full";

       The following rule requires a machine that supports the lmsensors
       metrics.  If the machine environment temperature rises more than 2
       degrees over a 10 minute interval, raise an alarm and write an entry in
       the system log:

            lmsensors.coretemp_isa.temp1 @0 - lmsensors.coretemp_isa.temp1 @1 > 2
            -> alarm "temperature rising fast" &
               syslog "machine room temperature rise alarm";

       And something interesting if you have performance problems with your
       Oracle database:

            // back to 30sec evaluations
            delta = 30 sec;
            sid = "ptg1";       # $ORACLE_SID setting
            lid = "223";        # latch ID from v$latch
            lru = "#'$sid/$lid cache buffers lru chain'";
            host = ":moomba.melbourne.sgi.com";
            gets = "oracle.latch.gets $host $lru";
            total = "oracle.latch.gets $host $lru +
                     oracle.latch.misses $host $lru +
                     oracle.latch.immisses $host $lru";

            $total > 100 && $gets / $total < 0.2
            -> alarm "high lru latch contention in database $sid";

       The following ruleset will emit exactly one message depending on the
       availability and value of the 1-minute load average.

            delta = 1 minute;
            ruleset
                 kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
                     print "extreme load average %v"
            else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
                     print "moderate load average %v"
            unknown ->
                     print "load average unavailable"
            otherwise ->
                     print "load average OK"
            ;

       The following rule will emit a message when some filesystem is more
       than 75% full and is filling at a rate that, if sustained, would fill
       the filesystem to 100% in less than 30 minutes.

            some_inst (
                100 * filesys.used / filesys.capacity > 75 &&
                filesys.used + 30min * (rate filesys.used) > filesys.capacity
            ) -> print "filesystem will be full within 30 mins:" " %i";

       If the metric mypmda.errors counts errors then the following rule will
       emit a message if the rate of errors exceeds 1 per second, provided the
       error count is less than 100.

            mypmda.errors > 1 && instant mypmda.errors < 100
            -> print "high error rate: %v";

QUICK START

       The pmie specification language is powerful and large.

       To expedite rapid development of pmie rules, the pmieconf(1) tool
       provides a facility for generating a pmie configuration file from a set
       of generalized pmie rules.  The supplied set of rules covers a wide
       range of performance scenarios.

       The Performance Co-Pilot User's and Administrator's Guide provides a
       detailed tutorial-style chapter covering pmie.

EXPRESSION SYNTAX

       This description is terse and informal.  For a more comprehensive
       description see the Performance Co-Pilot User's and Administrator's
       Guide.

       A pmie specification is a sequence of semicolon terminated expressions.

       Basic operators are modeled on the arithmetic, relational and Boolean
       operators of the C programming language.  Precedence rules are as
       expected, although the use of parentheses is encouraged to enhance
       readability and remove ambiguity.

       Operands are performance metric names (see PMNS(5)) and the normal
       literal constants.

       Operands involving performance metrics may produce sets of values, as a
       result of enumeration in the dimensions of hosts, instances and time.
       Special qualifiers may appear after a performance metric name to define
       the enumeration in each dimension.  For example,

           kernel.percpu.cpu.user :foo :bar #cpu0 @0..2

       defines 6 values corresponding to the time spent executing in user mode
       on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3 consecutive
       samples.  The default interpretation in the absence of : (host), #
       (instance) and @ (time) qualifiers is all instances at the most recent
       sample time for the default source of PCP performance metrics.
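       
       As an illustration of combining qualifiers (a sketch only; the host
       names foo and bar are placeholders, as above, and the threshold is
       arbitrary), the following rule reports when the 1-minute load average
       on either host exceeds 5:

            some_host (
                kernel.all.load :foo :bar #'1 minute' > 5
            ) -> print "high load on:" " %h";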
       
       Host and instance names that do not follow the rules for variables in
       programming languages, i.e. alphabetic optionally followed by
       alphanumerics, should be enclosed in single quotes.

       Expression evaluation follows the law of ``least surprises''.  Where
       performance metrics have the semantics of a counter, pmie will
       automatically convert to a rate based upon consecutive samples and the
       time interval between these samples.  All numeric expressions are
       evaluated in double precision, and where appropriate, automatically
       scaled into canonical units of ``bytes'', ``seconds'' and ``counts''.

       A rule is a special form of expression that specifies a condition or
       logical expression, a special operator (->) and actions to be performed
       when the condition is found to be true.

       The following table summarizes the basic pmie operators:

         ┌────────────────┬────────────────────────────────────────────────┐
         │   Operators    │                  Explanation                   │
         ├────────────────┼────────────────────────────────────────────────┤
         │+ - * /         │ Arithmetic                                     │
         │< <= == >= > != │ Relational (value comparison)                  │
         │! && ||         │ Boolean                                        │
         │->              │ Rule                                           │
         │rising          │ Boolean, false to true transition              │
         │falling         │ Boolean, true to false transition              │
         │rate            │ Explicit rate conversion (rarely required)     │
         │instant         │ No automatic rate conversion (rarely required) │
         └────────────────┴────────────────────────────────────────────────┘

       All operators are supported for numeric-valued operands and
       expressions.  For string-valued operands, namely literal string
       constants enclosed in double quotes or metrics with a data type of
       string (PM_TYPE_STRING), only the operators == and != are supported.

       The rate and instant operators are the logical inverse of one another,
       so an arithmetic expression expr is equal to rate instant expr.  The
       more useful cases involve using rate with a metric that is not a
       counter to determine the rate of change over time or instant with a
       metric that is a counter to determine if the current value is above or
       below some threshold.
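       
       A minimal sketch of both cases, reusing metric names from the EXAMPLES
       section (mypmda.errors is the same hypothetical counter used there, and
       both thresholds are arbitrary):

            // non-counter metric: rate of change over time
            some_inst (
                rate filesys.used > 1 Mbyte/min
            ) -> print "filesystem growing quickly:" " %i";

            // counter metric: test the current count rather than the rate
            instant mypmda.errors > 1000
            -> print "more than 1000 errors recorded";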
       
       Aggregate operators may be used to aggregate or summarize along one
       dimension of a set-valued expression.  The following aggregate
       operators map from a logical expression to a logical expression of
       lower dimension.

         ┌─────────────────────────┬─────────────┬──────────────────────────┐
         │       Operators         │    Type     │       Explanation        │
         ├─────────────────────────┼─────────────┼──────────────────────────┤
         │some_inst                │ Existential │ True if at least one set │
         │some_host                │             │ member is true in the    │
         │some_sample              │             │ associated dimension     │
         ├─────────────────────────┼─────────────┼──────────────────────────┤
         │all_inst                 │ Universal   │ True if all set members  │
         │all_host                 │             │ are true in the          │
         │all_sample               │             │ associated dimension     │
         ├─────────────────────────┼─────────────┼──────────────────────────┤
         │N%_inst                  │ Percentile  │ True if at least N       │
         │N%_host                  │             │ percent of set members   │
         │N%_sample                │             │ are true in the          │
         │                         │             │ associated dimension     │
         └─────────────────────────┴─────────────┴──────────────────────────┘

       The following instantial operators may be used to filter or limit a
       set-valued logical expression, based on regular expression matching of
       instance names.  The logical expression must be a set involving the
       dimension of instances, and the regular expression is of the form used
       by egrep(1) or the Extended Regular Expressions of regcomp(3).

              ┌─────────────┬──────────────────────────────────────────┐
              │ Operators   │               Explanation                │
              ├─────────────┼──────────────────────────────────────────┤
              │match_inst   │ For each value of the logical expression │
              │             │ that is ``true'', the result is ``true'' │
              │             │ if the associated instance name matches  │
              │             │ the regular expression.  Otherwise the   │
              │             │ result is ``false''.                     │
              ├─────────────┼──────────────────────────────────────────┤
              │nomatch_inst │ For each value of the logical expression │
              │             │ that is ``true'', the result is ``true'' │
              │             │ if the associated instance name does not │
              │             │ match the regular expression.  Otherwise │
              │             │ the result is ``false''.                 │
              └─────────────┴──────────────────────────────────────────┘

       For example, the expression below will be ``true'' for disks attached
       to controllers 2 or 3 performing more than 20 operations per second:

            match_inst "^dks[23]d" disk.dev.total > 20;
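       
       Conversely, as a sketch using the same illustrative instance naming,
       the expression below will be ``true'' for disks performing more than
       20 operations per second whose names do not match the pattern, that is,
       disks on any controller other than 2 or 3:

            nomatch_inst "^dks[23]d" disk.dev.total > 20;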
       
       The following aggregate operators map from an arithmetic expression to
       an arithmetic expression of lower dimension.

          ┌─────────────────────────┬───────────┬──────────────────────────┐
          │       Operators         │   Type    │       Explanation        │
          ├─────────────────────────┼───────────┼──────────────────────────┤
          │min_inst                 │ Extrema   │ Minimum value across all │
          │min_host                 │           │ set members in the       │
          │min_sample               │           │ associated dimension     │
          ├─────────────────────────┼───────────┼──────────────────────────┤
          │max_inst                 │ Extrema   │ Maximum value across all │
          │max_host                 │           │ set members in the       │
          │max_sample               │           │ associated dimension     │
          ├─────────────────────────┼───────────┼──────────────────────────┤
          │sum_inst                 │ Aggregate │ Sum of values across all │
          │sum_host                 │           │ set members in the       │
          │sum_sample               │           │ associated dimension     │
          ├─────────────────────────┼───────────┼──────────────────────────┤
          │avg_inst                 │ Aggregate │ Average value across all │
          │avg_host                 │           │ set members in the       │
          │avg_sample               │           │ associated dimension     │
          └─────────────────────────┴───────────┴──────────────────────────┘

       The aggregate operators count_inst, count_host and count_sample map
       from a logical expression to an arithmetic expression of lower
       dimension by counting the number of set members for which the
       expression is true in the associated dimension.
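       
       For example (a sketch; the thresholds are arbitrary and chosen only for
       illustration), the first rule below is true when the busiest disk
       exceeds 100 I/Os per second, and the second when at least two disks
       each exceed 60 I/Os per second:

            max_inst ( disk.dev.total ) > 100 count/sec
            -> print "busiest disk is above 100 I/Os per second";

            count_inst ( disk.dev.total > 60 count/sec ) >= 2
            -> print "two or more disks are busy";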
       
       For action rules, the following actions are defined:

                ┌──────────┬────────────────────────────────────────┐
                │ Actions  │              Explanation               │
                ├──────────┼────────────────────────────────────────┤
                │alarm     │ Raise a visible alarm with xconfirm(1) │
                │print     │ Display on standard output             │
                │shell     │ Execute with sh(1)                     │
                │stomp     │ Send a STOMP message to a JMS server   │
                │syslog    │ Append a message to system log file    │
                └──────────┴────────────────────────────────────────┘

       Multiple actions may be separated by the & and | operators to specify
       respectively sequential execution (both actions are executed) and
       alternate execution (the second action will only be executed if the
       execution of the first action returns a non-zero error status).
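       
       As a sketch of alternate execution (the threshold is arbitrary), the
       rule below attempts a popup alarm and writes to the system log only if
       the alarm action returns a non-zero status.  The circumstances under
       which alarm fails are not described here; a missing display is one
       plausible cause, but that is an assumption:

            kernel.all.load #'1 minute' > 10 * hinv.ncpu
            -> alarm "extreme load average %v" |
               syslog "extreme load average %v";

       Replacing | with & would instead run both actions every time the
       condition is true.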
       
       Arguments to actions are an optional suppression time, and then one or
       more expressions (a string is an expression in this context).  Strings
       appearing as arguments to an action may include the following special
       selectors that will be replaced at the time the action is executed.

       %h  Host name(s) that make the left-most top-level expression in the
           condition true.

       %c  Connection specification string(s) or files for a PCP tool to reach
           the hosts or archives that make the left-most top-level expression
           in the condition true.

       %i  Instance(s) that make the left-most top-level expression in the
           condition true.

       %v  One value from the left-most top-level expression in the condition
           for each host and instance pair that makes the condition true.

       Note that expansion of the special selectors is done by repeating the
       whole argument once for each unique binding to any of the qualifying
       special selectors.  For example if a rule were true for the host mumble
       with instances grunt and snort, and for host fumble the instance puff
       makes the rule true, then the action

            ...
            -> shell myscript "Warning: %h:%i busy ";

       will execute myscript with the argument string "Warning: mumble:grunt
       busy Warning: mumble:snort busy Warning: fumble:puff busy".

       By comparison, if the action

            ...
            -> shell myscript "Warning! busy:" " %h:%i";

       were executed under the same circumstances, then myscript would be
       executed with the argument string "Warning! busy: mumble:grunt
       mumble:snort fumble:puff".

       The semantics of the expansion of the special selectors leads to a
       common usage pattern in an action, where one argument is a constant
       (contains no special selectors), the second argument contains the
       desired special selectors with minimal separator characters, and an
       optional third argument provides a constant postscript (e.g. to
       terminate any argument quoting from the first argument).  If necessary,
       post-processing (e.g. in myscript) can provide the necessary
       enumeration over each unique expansion of the string containing just
       the special selectors.
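       
       A sketch of this three-argument pattern using the print action (the
       wording of the constant strings is arbitrary):

            some_inst (
                disk.dev.total > 60 count/sec
            ) -> print "busy disks [" " %h:%i" " ]";

       Following the expansion rules described above, only the middle
       argument is repeated, so with the same bindings as in the earlier
       example the message should take the form "busy disks [ mumble:grunt
       mumble:snort fumble:puff ]".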
       
       For complex conditions, the bindings to these selectors are not
       obvious.  It is strongly recommended that pmie be used in the debugging
       mode (specify the -W command line option in particular) during rule
       development.

BOOLEAN EXPRESSIONS

       pmie expressions that have the semantics of a Boolean, e.g. foo.bar >
       10 or some_inst ( my.table < 0 ) are assigned the values true or false
       or unknown.  A value is unknown if one or more of the underlying metric
       values is unavailable, e.g. pmcd(1) on the host cannot be contacted,
       the metric is not in the PCP archive, no values are currently
       available, insufficient values have been fetched to allow a rate
       converted value to be computed or insufficient values have been fetched
       to instantiate the required number of samples in the temporal domain.

       Boolean operators follow the normal rules of Kleene logic (aka 3-valued
       logic) when combining values that include unknown:

                      ┌────────────┬───────────────────────────┐
                      │            │             B             │
                      │  A and B   ├─────────┬───────┬─────────┤
                      │            │  true   │ false │ unknown │
                      ├──┬─────────┼─────────┼───────┼─────────┤
                      │  │  true   │  true   │ false │ unknown │
                      │  ├─────────┼─────────┼───────┼─────────┤
                      │A │  false  │  false  │ false │  false  │
                      │  ├─────────┼─────────┼───────┼─────────┤
                      │  │ unknown │ unknown │ false │ unknown │
                      └──┴─────────┴─────────┴───────┴─────────┘

                      ┌────────────┬──────────────────────────┐
                      │            │            B             │
                      │  A or B    ├──────┬─────────┬─────────┤
                      │            │ true │  false  │ unknown │
                      ├──┬─────────┼──────┼─────────┼─────────┤
                      │  │  true   │ true │  true   │  true   │
                      │  ├─────────┼──────┼─────────┼─────────┤
                      │A │  false  │ true │  false  │ unknown │
                      │  ├─────────┼──────┼─────────┼─────────┤
                      │  │ unknown │ true │ unknown │ unknown │
                      └──┴─────────┴──────┴─────────┴─────────┘

                                 ┌────────┬─────────┐
                                 │   A    │  not A  │
                                 ├────────┼─────────┤
                                 │  true  │  false  │
                                 ├────────┼─────────┤
                                 │ false  │  true   │
                                 ├────────┼─────────┤
                                 │unknown │ unknown │
                                 └────────┴─────────┘
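       
       As a worked illustration of these rules (metric names are taken from
       the EXAMPLES section; the thresholds are arbitrary), consider:

            kernel.all.pswitch > 10000 count/sec &&
            kernel.all.load #'1 minute' > 10
            -> print "busy";

       On the first evaluation only one sample of the counter
       kernel.all.pswitch has been fetched, so no rate can be computed and
       the left operand is unknown.  If the load average comparison is false,
       the condition is false (unknown and false); if it is true, the
       condition is unknown (unknown and true).  In either case the action is
       not executed, since actions are performed only when the condition is
       true.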

RULESETS

590       The ruleset clause is used to define a set of rules  and  actions  that
591       are  evaluated  in  order until some action is executed, at which point
592       the remaining rules and actions are skipped until the ruleset is  again
593       scheduled  for evaluation.  The keyword else is used to separate rules.
594       After one or more regular rules (with a predicate  and  an  action),  a
595       ruleset may include an optional
596            unknown -> action
       clause, optionally followed by an
598            otherwise -> action
599       clause.
600
       If all of the predicates in the rules evaluate to unknown and an
       unknown clause has been specified, then the action associated with
       the unknown clause will be executed.
604
605       If no rule predicate is true and the unknown action is either not spec‐
606       ified or not executed and an otherwise clause has been specified,  then
607       the action associated with the otherwise clause will be executed.
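
       A ruleset using all of these clauses might look like the following
       sketch.  The metrics, thresholds and messages are illustrative
       assumptions, not recommended values:

            ruleset
                kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
                    print "extreme load average %v"
            else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
                    print "moderate load average %v"
            unknown ->
                    print "load average unavailable"
            otherwise ->
                    print "load average OK"
            ;

       At most one of the four print actions is executed each time the
       ruleset is evaluated.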
608

SCALE FACTORS

610       Scale  factors may be appended to arithmetic expressions and force lin‐
611       ear scaling of the value to canonical units.  Simple scale factors  are
612       constructed  from the keywords: nanosecond, nanosec, nsec, microsecond,
613       microsec, usec, millisecond, millisec, msec, second, sec, minute,  min,
614       hour,  byte,  Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount and Mcount, and
615       the operator /, for example ``Kbytes / hour''.
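
       For instance, the following sketch compares a rate against a constant
       expressed with the scale factor count/sec, built from the keywords
       count and sec.  The metric and the threshold are illustrative
       assumptions:

            // alarm if the system call rate per CPU exceeds 5000 per second
            kernel.all.syscall / hinv.ncpu > 5000 count/sec
                -> alarm "high syscall rate";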
616

MACROS

618       Macros are defined using expressions of the form:
619
620            name = constexpr;
621
       Where name follows the normal rules for variables in programming
       languages, i.e. an alphabetic character optionally followed by
       alphanumerics.  constexpr must be a constant expression, either a
       string (enclosed in double quotes) or an arithmetic expression
       optionally followed by a scale factor.
627
       Macros are expanded when their name, prefixed by a dollar sign ($),
       appears in an expression, and macros may be nested within a constexpr
       string.
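
       A short sketch of macro definition and expansion follows.  The macro
       names load and limit, and the thresholds, are hypothetical and chosen
       only for illustration:

            // string macro, expanded wherever $load appears
            load = "kernel.all.load #'1 minute'";
            // arithmetic constant followed by a scale factor
            limit = 5000 count/sec;

            $load > 4 -> print "load average is %v";
            kernel.all.syscall > $limit -> print "high syscall rate";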
630
631       The following reserved macro names are understood.
632
633       minute    Current minute of the hour.
634
635       hour      Current hour of the day, in the range 0 to 23.
636
637       day       Current day of the month, in the range 1 to 31.
638
639       month     Current  month  of  the  year, in the range 0 (January) to 11
640                 (December).
641
642       year      Current year.
643
644       day_of_week
645                 Current day of the week, in the range 0 (Sunday) to 6 (Satur‐
646                 day).
647
648       delta     Sample interval in effect for this expression.
649
650       Dates  and times are presented in the reporting time zone (see descrip‐
651       tion of -Z and -z command line options above).
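
       The reserved macros can be combined with metric predicates to restrict
       a rule to particular times.  The following is a sketch only: the
       choice of hours, days and threshold is an assumption made for
       illustration:

            delta = 1 min;     // evaluate once per minute
            // weekdays (Monday to Friday) between 09:00 and 16:59
            $hour >= 9 && $hour < 17
                && $day_of_week >= 1 && $day_of_week <= 5
                && kernel.all.load #'1 minute' > 2 * hinv.ncpu
                -> print "high load during business hours";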
652

AUTOMATIC RESTART

       It is often useful for pmie processes to be started and stopped when
       the local host is booted or shut down, or when they have been detected
       as no longer running (when they have unexpectedly exited for some
       reason).  Refer to pmie_check(1) for details on automating this process.
658
       Optionally, each system running pmcd(1) may also be configured to run
       a ``primary'' pmie instance.  This pmie instance is launched by
       $PCP_RC_DIR/pmie, and is affected by the files
       $PCP_SYSCONF_DIR/pmie/control and $PCP_SYSCONF_DIR/pmie/control.d,
       and by $PCP_VAR_DIR/config/pmie/config.default (the default initial
       configuration file for the primary pmie).  Use chkconfig(8),
       systemctl(1) or similar platform-specific commands to activate or
       disable the primary pmie instance.
667
668       The primary pmie instance is identified by the -P option.  There may be
669       at most one ``primary'' pmie instance on each system.  The primary pmie
670       instance (if any) must be running on the same host as  the  pmcd(1)  to
671       which  it  connects  (if  any),  so  the -h and -P options are mutually
672       exclusive.
673

EVENT MONITORING

       It is common for production systems to be monitored in a central
       location.  Traditionally on UNIX systems this has been performed by the
       system log facilities (see logger(1) and syslogd(1)).  On Windows,
       communication with the system event log is handled by pcp-eventlog(1).
679
680       pmie  fits into this model when rules use the syslog action.  Note that
681       if the action string begins with -p (priority)  and/or  -t  (tag)  then
682       these  are  extracted from the string and treated in the same way as in
683       logger(1) and pcp-eventlog(1).
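
       As a sketch, a rule using the syslog action with both options might be
       written as follows.  The priority daemon.warning and the tag pmie are
       illustrative assumptions, written in the facility.level form used by
       logger(1):

            kernel.all.load #'1 minute' > 10 * hinv.ncpu
                -> syslog "-p daemon.warning -t pmie load average is %v";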
684
       However, it is common to have other event monitoring frameworks as
       well, into which you may wish to incorporate performance events from
       pmie.  You can often use the shell action to send events to these
       frameworks, as they usually provide their own program for injecting
       events into the framework from external sources.
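
       A sketch of this approach follows.  The program
       /usr/local/bin/inject-event is hypothetical and stands in for
       whatever injection command the framework in use provides:

            kernel.all.load #'1 minute' > 10 * hinv.ncpu
                -> shell "/usr/local/bin/inject-event 'load average is %v'";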
690
       A final option is to use the stomp (Streaming Text Oriented Messaging
       Protocol) action, which allows pmie to connect to a central JMS (Java
       Message Service) server and send events to the PMIE topic.  Tools can
       be written to extract these text messages and present them to
       operations people (via desktop popup windows, etc).  Use of the stomp
       action requires a stomp configuration file to be set up, which
       specifies the location of the JMS server host, port number, and
       username/password.
698
699       The format of this file is as follows:
700
            host=messages.sgi.com   # this is the JMS server (required)
            port=61616              # and it's listening here (required)
            timeout=2               # seconds to wait for server (optional)
            username=joe            # (required)
            password=j03ST0MP       # (required)
            topic=PMIE              # JMS topic for pmie messages (optional)
707
       The timeout value specifies the time (in seconds) that pmie should wait
       for acknowledgements from the JMS server after sending a message (as
       required by the STOMP protocol).  Note that on startup, pmie will wait
       indefinitely for a connection, and will not begin rule evaluation until
       that initial connection has been established.  Should the connection to
       the JMS server be lost at any time while pmie is running, pmie will
       attempt to reconnect each time a rule with a stomp action subsequently
       evaluates to true, but not more than once per minute, to avoid
       contributing to network congestion.  While the STOMP connection to the
       JMS server is severed, the stomp action will return a non-zero error
       value.
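
       With a configuration file such as the one above supplied through the
       -j option, a rule that sends a message to the JMS server might be
       sketched as follows.  The metric, threshold and message text are
       illustrative assumptions:

            kernel.all.load #'1 minute' > 10 * hinv.ncpu
                -> stomp "load average is %v";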
719

BUGS

       The lexical scanner and parser will attempt to recover after an error
       in the input expressions.  Parsing resumes after skipping input up to
       the next semicolon (;).  During this skipping process the scanner is
       ignorant of comments and strings, so an embedded semicolon may cause
       parsing to resume at an unexpected place.  This behavior is largely
       benign because, until the initial syntax error is corrected, pmie will
       not attempt any expression evaluation.
728

FILES

730       $PCP_DEMOS_DIR/pmie/*
731            annotated example rules
732
733       $PCP_VAR_DIR/pmns/*
734            default PMNS specification files
735
736       $PCP_TMP_DIR/pmie
737            pmie maintains files in this directory  to  identify  the  running
738            pmie  instances  and  to  export  runtime  information  about each
739            instance - this data forms the basis of the pmcd.pmie  performance
740            metrics
741
742       $PCP_PMIECONTROL_PATH
743            the default set of pmie instances to start at boot time - refer to
744            pmie_check(1) for details
745

PCP ENVIRONMENT

747       Environment variables with the prefix PCP_ are used to parameterize the
748       file  and  directory names used by PCP.  On each installation, the file
749       /etc/pcp.conf contains the  local  values  for  these  variables.   The
750       $PCP_CONF  variable may be used to specify an alternative configuration
751       file, as described in pcp.conf(5).
752
753       When executing shell actions, pmie overrides two variables  -  IFS  and
754       PATH  - in the environment of the child process.  IFS is set to "\t\n".
755       The PATH is set to a combination of a default path  for  all  platforms
756       ("/usr/sbin:/sbin:/usr/bin:/usr/sbin")  and several configurable compo‐
757       nents.  These are (in this order):  $PCP_BIN_DIR,  $PCP_BINADM_DIR  and
758       $PCP_PLATFORM_PATHS.
759
760       When  executing  popup  alarm  actions,  pmie  will  use  the  value of
761       $PCP_XCONFIRM_PROG as the visual notification program to run.  This  is
762       typically set to pmconfirm(1), a cross-platform dialog box.
763

UNIX SEE ALSO

765       logger(1).
766

WINDOWS SEE ALSO

768       pcp-eventlog(1).
769

SEE ALSO

771       PCPIntro(1),    pmcd(1),   pmconfirm(1),   pmdumplog(1),   pmieconf(1),
772       pmie_check(1), pminfo(1), pmlogger(1), pmval(1), PMAPI(3), pcp.conf(5),
773       pcp.env(5) and PMNS(5).
774

USER GUIDE

776       For a more complete description of the pmie language, refer to the Per‐
777       formance Co-Pilot Users and Administrators Guide.   This  is  available
778       online from:
779           https://pcp.io/doc/pcp-users-and-administrators-guide.pdf
780
781
782
783Performance Co-Pilot                  PCP                              PMIE(1)