PMIE(1)                     General Commands Manual                    PMIE(1)

NAME

       pmie - inference engine for performance metrics

SYNOPSIS

       pmie [-bCdeFfPqvVWxXz?] [-a archive] [-A align] [-c filename]
       [-h host] [-l logfile] [-m note] [-j stompfile] [-n pmnsfile]
       [-o format] [-O origin] [-S starttime] [-t interval] [-T endtime]
       [-U username] [-Z timezone] [filename ...]

DESCRIPTION

       pmie accepts a collection of arithmetic, logical, and rule expressions
       to be evaluated at specified frequencies.  The base data for the
       expressions consists of performance metric values delivered in
       real-time from any host running the Performance Metrics Collection
       Daemon (PMCD), or historical data from Performance Co-Pilot (PCP)
       archives.

       As well as computing arithmetic and logical values, pmie can execute
       actions (raise popup alarms, write system log messages, and launch
       programs) in response to specified conditions.  Such actions are
       useful for detecting, monitoring, and correcting performance-related
       problems.

       The expressions to be evaluated are read from configuration files
       specified by one or more filename arguments.  In the absence of any
       filename, expressions are read from standard input.

       Output from pmie is directed to standard output and standard error as
       follows:

       stdout
            Expression values printed in the verbose -v mode and the output of
            print actions.

       stderr
            Error and warning messages for any syntactic or semantic problems
            during expression parsing, and any semantic or performance metrics
            availability problems during expression evaluation.

OPTIONS

       The available command line options are:

       -a archive, --archive=archive
            archive is a comma-separated list of names, each of which may be
            the base name of an archive or the name of a directory containing
            one or more archives written by pmlogger(1).  Multiple instances
            of the -a flag may appear on the command line to specify a list
            of sets of archives.  In this case, it is required that only one
            set of archives be present for any one host.  Also, any explicit
            host names occurring in a pmie expression must match the host
            name recorded in one of the archive labels.  In the case of
            multiple sets of archives, timestamps recorded in the archives
            are used to ensure temporal consistency.

       -A align, --align=align
            Force the initial time window to be aligned on the boundary of a
            natural time unit align.  Refer to PCPIntro(1) for a complete
            description of the syntax for align.

       -b, --buffer
            Output will be line buffered and standard output is attached to
            standard error.  This is most useful for background execution in
            conjunction with the -l option.  The -b option is always used for
            pmie instances launched from pmie_check(1).

       -c config, --config=config
            An alternative to specifying filename at the end of the command
            line.

       -C, --check
            Parse the configuration file(s) and exit before performing any
            evaluations.  Any errors in the configuration file are reported.

       -d, --interact
            Normally pmie would be launched as a non-interactive process to
            monitor and manage the performance of one or more hosts.  Given
            the -d flag however, execution is interactive and the user is
            presented with a menu of options.  Interactive mode is useful
            mainly for debugging new expressions.

       -e, --timestamp
            When used with -V, -v or -W, this option forces timestamps to be
            reported with each expression.  The timestamps are in ctime(3)
            format, enclosed in parentheses, and appear after the expression
            name and before the expression value, e.g.
                 expr_1 (Tue Feb  6 19:55:10 2001): 12

       -f, --foreground
            If the -l option is specified and there is no -a option (i.e.
            real-time monitoring) then pmie is run as a daemon in the
            background (in all other cases foreground is the default).  The
            options -f and -F (see below) force pmie to be run in the
            foreground, independent of any other options.

       -F, --systemd
            Like -f, the -F option runs pmie in the foreground, but also does
            some housekeeping (such as creating a pid file, changing user id,
            and notifying systemd(1) when pmie has started or is shutting
            down).  This is intended for use when pmie is launched from
            systemd(1) and the daemonizing has already been done.  The -f and
            -F options are mutually exclusive.

       -h host, --host=host
            By default performance data is fetched from the local host (in
            real-time mode) or the host for the first named set of archives
            on the command line (in archive mode).  The host argument
            overrides this default.  It does not override hosts explicitly
            named in the expressions being evaluated.  The host argument is
            interpreted as a connection specification for pmNewContext, and
            is later mapped to the remote pmcd's self-reported host name for
            reporting purposes.  See also the %h vs. %c substitutions in rule
            action strings below.

       -j stompfile
            An alternative STOMP protocol configuration is loaded from
            stompfile.  If this option is not used, and the stomp action is
            used in any rule, the default location
            $PCP_SYSCONF_DIR/pmie/config/stomp will be used.

       -l logfile, --logfile=logfile
            Standard error is sent to logfile.

       -m note, --note=note
            Used to indicate where pmie has been launched from, e.g.
            pmie_check(1) and pmie_daily(1) use -m pmie_check.  pmie uses
            this to determine whether it needs to be restarted should the
            PMCD hostname change, as described in the HOSTNAME CHANGES
            section below.

       -n pmnsfile, --namespace=pmnsfile
            An alternative Performance Metrics Name Space (PMNS) is loaded
            from the file pmnsfile.

       -o format, --format=format
            When processing performance data from an archive, the -o option
            may be used to specify an alternate output format when a rule
            action is executed.  See the DIFFERENCES IN HOST AND ARCHIVE
            MODES section for a description of how the output format may be
            constructed.

       -O origin, --origin=origin
            Specify the origin of the time window.  See PCPIntro(1) for a
            complete description of this option.

       -P, --primary
            Identifies this as the primary pmie instance for a host.  See the
            ``AUTOMATIC RESTART'' section below for further details.

       -q, --quiet
            Suppresses diagnostic messages that would be printed to standard
            output by default, especially the "evaluator exiting" message as
            this can confuse scripts.

       -S starttime, --start=starttime
            Specify the starttime of the time window.  See PCPIntro(1) for a
            complete description of this option.

       -t interval, --interval=interval
            The interval argument follows the syntax described in
            PCPIntro(1), and in the simplest form may be an unsigned integer
            (the implied units in this case are seconds).  The value is used
            to determine the sample interval for expressions that do not
            explicitly set their sample interval using the pmie variable
            delta described below.  The default is 10.0 seconds.

       -T endtime, --finish=endtime
            Specify the endtime of the time window.  See PCPIntro(1) for a
            complete description of this option.

       -U username, --username=username
            User account under which to run pmie.  The default is the current
            user account for interactive use.  When run as a daemon, the
            unprivileged "pcp" account is used in current versions of PCP,
            but in older versions the superuser account ("root") was used by
            default.

       -v   Unless one of the verbose options -V, -v or -W appears on the
            command line, expressions are evaluated silently; the only output
            is the result of any actions being executed.  In the verbose
            mode, specified using the -v flag, the value of each expression
            is printed as it is evaluated.  The values are in canonical
            units: bytes in the dimension of ``space'', seconds in the
            dimension of ``time'' and events in the dimension of ``count''.
            See pmLookupDesc(3) for details of the supported dimension and
            scaling mechanisms for performance metrics.  The verbose mode is
            useful in monitoring the value of given expressions, evaluating
            derived performance metrics, passing these values on to other
            tools for further processing and in debugging new expressions.

       -V, --verbose
            This option has the same effect as the -v option, except that the
            name of the host and instance (if applicable) are printed as well
            as expression values.

       -W   This option has the same effect as the -V option described above,
            except that for boolean expressions, only those names and values
            that make the expression true are printed.  These are the same
            names and values accessible to rule actions as the %h, %i, %c and
            %v bindings, as described below.

       -x, --secret-agent
            Execute in domain agent mode.  This mode is used within the
            Performance Co-Pilot product to derive values for summary
            metrics, see pmdasummary(1).  Only restricted functionality is
            available in this mode (expressions with actions may not be
            used).

       -X, --secret-applet
            Run in secret applet mode (thin client).

       -z, --hostzone
            Change the reporting timezone to the timezone of the host that is
            the source of the performance metrics, as identified via either
            the -h option or the first named set of archives (as described
            above for the -a option).

       -Z timezone, --timezone=timezone
            Change the reporting timezone to timezone in the format of the
            environment variable TZ as described in environ(7).

       -?, --help
            Display usage message and exit.

EXAMPLES

       The following example expressions demonstrate some of the capabilities
       of the inference engine.

       The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
       examples of pmie expressions.

       The variable delta controls expression evaluation frequency.  Specify
       that subsequent expressions be evaluated once a second, until further
       notice:

            delta = 1 sec;

       If the total context switch rate exceeds 10000 per second per CPU, then
       display an alarm notifier:

            kernel.all.pswitch / hinv.ncpu > 10000 count/sec
            -> alarm "high context switch rate %v";

       If the high context switch rate is sustained for 10 consecutive
       samples, then launch top(1) in an xterm(1) window to monitor processes,
       but do this at most once every 5 minutes:

            all_sample (
                kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
            ) -> shell 5 min "xterm -e 'top'";

       The following rules are evaluated once every 20 seconds:

            delta = 20 sec;

       If any disk is performing more than 60 I/Os per second, then print a
       message identifying the busy disk to standard output and launch
       dkvis(1):

            some_inst (
                disk.dev.total > 60 count/sec
            ) -> print "busy disks:" " %i" &
                 shell 5 min "dkvis";

       Refine the preceding rule to apply only between the hours of 9am and
       5pm, and to require 3 of 4 consecutive samples to exceed the threshold
       before executing the action:

            $hour >= 9 && $hour <= 17 &&
            some_inst (
              75 %_sample (
                disk.dev.total @0..3 > 60 count/sec
              )
            ) -> print "disks busy for 20 sec:" " [%h]%i";

       The following two rules are evaluated once every 10 minutes:

            delta = 10 min;

       If either the / or the /usr filesystem is more than 95% full, display
       an alarm popup, but not if it has already been displayed during the
       last 4 hours:

            filesys.free #'/dev/root' /
                filesys.capacity #'/dev/root' < 0.05
            -> alarm 4 hour "root filesystem (almost) full";

            filesys.free #'/dev/usr' /
                filesys.capacity #'/dev/usr' < 0.05
            -> alarm 4 hour "/usr filesystem (almost) full";

       The following rule requires a machine that supports the lmsensors
       metrics.  If the machine environment temperature rises more than 2
       degrees over a 10 minute interval, raise an alarm and write an entry
       in the system log:

            lmsensors.coretemp_isa.temp1 @0 - lmsensors.coretemp_isa.temp1 @1 > 2
            -> alarm "temperature rising fast" &
               syslog "machine room temperature rise alarm";

       And something interesting if you have performance problems with your
       Oracle database:

            // back to 30sec evaluations
            delta = 30 sec;
            sid = "ptg1";       # $ORACLE_SID setting
            lid = "223";        # latch ID from v$latch
            lru = "#'$sid/$lid cache buffers lru chain'";
            host = ":moomba.melbourne.sgi.com";
            gets = "oracle.latch.gets $host $lru";
            total = "oracle.latch.gets $host $lru +
                     oracle.latch.misses $host $lru +
                     oracle.latch.immisses $host $lru";

            $total > 100 && $gets / $total < 0.2
            -> alarm "high lru latch contention in database $sid";

       The following ruleset will emit exactly one message depending on the
       availability and value of the 1-minute load average.

            delta = 1 minute;
            ruleset
                 kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
                     print "extreme load average %v"
            else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
                     print "moderate load average %v"
            unknown ->
                     print "load average unavailable"
            otherwise ->
                     print "load average OK"
            ;

       The following rule will emit a message when some filesystem is more
       than 75% full and is filling at a rate that if sustained would fill the
       filesystem to 100% in less than 30 minutes.

            some_inst (
                100 * filesys.used / filesys.capacity > 75 &&
                filesys.used + 30min * (rate filesys.used) > filesys.capacity
            ) -> print "filesystem will be full within 30 mins:" " %i";
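
       The arithmetic behind the two conditions in the filesystem rule can
       be checked outside pmie.  The Python fragment below is only an
       illustrative model for a single filesystem instance, not how pmie
       evaluates the rule; the function name will_fill and the sample
       figures are assumptions made for this sketch.

```python
def will_fill(used, capacity, fill_rate, horizon_sec=30 * 60):
    """Return True when both conditions of the rule hold for one filesystem.

    used and capacity share one unit (bytes here); fill_rate is that unit
    per second, mirroring what `rate filesys.used` yields in the rule.
    """
    over_75_percent = 100 * used / capacity > 75
    projected = used + horizon_sec * fill_rate
    return over_75_percent and projected > capacity


GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical filesystem: 80 GiB used of 100 GiB.
print(will_fill(80 * GIB, 100 * GIB, 16 * MIB))  # True: projected use passes capacity
print(will_fill(80 * GIB, 100 * GIB, 1 * MIB))   # False: growth is too slow
print(will_fill(50 * GIB, 100 * GIB, 64 * MIB))  # False: not yet more than 75% full
```

       Both tests are needed: the first limits the rule to filesystems that
       are already fairly full, the second projects current growth 30
       minutes ahead.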

       If the metric mypmda.errors counts errors then the following rule will
       emit a message if the rate of errors exceeds 1 per second provided the
       error count is less than 100.

            mypmda.errors > 1 && instant mypmda.errors < 100
            -> print "high error rate: %v";

QUICK START

       The pmie specification language is powerful and large.

       To expedite development of pmie rules, the pmieconf(1) tool provides a
       facility for generating a pmie configuration file from a set of
       generalized pmie rules.  The supplied set of rules covers a wide range
       of performance scenarios.

       The Performance Co-Pilot User's and Administrator's Guide provides a
       detailed tutorial-style chapter covering pmie.

EXPRESSION SYNTAX

       This description is terse and informal.  For a more comprehensive
       description see the Performance Co-Pilot User's and Administrator's
       Guide.

       A pmie specification is a sequence of semicolon-terminated
       expressions.

       Basic operators are modeled on the arithmetic, relational and Boolean
       operators of the C programming language.  Precedence rules are as
       expected, although the use of parentheses is encouraged to enhance
       readability and remove ambiguity.

       Operands are performance metric names (see PMNS(5)) and the normal
       literal constants.

       Operands involving performance metrics may produce sets of values, as
       a result of enumeration in the dimensions of hosts, instances and
       time.  Special qualifiers may appear after a performance metric name
       to define the enumeration in each dimension.  For example,

           kernel.percpu.cpu.user :foo :bar #cpu0 @0..2

       defines 6 values corresponding to the time spent executing in user
       mode on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3
       consecutive samples.  The default interpretation in the absence of the
       : (host), # (instance) and @ (time) qualifiers is all instances at the
       most recent sample time for the default source of PCP performance
       metrics.
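
       The count of 6 values in the qualifier example is the product of the
       three dimensions (2 hosts, 1 instance, 3 samples).  The Python
       fragment below merely enumerates that product as an illustration; the
       variable names are invented for the sketch, and it assumes that @0
       denotes the most recent sample.

```python
from itertools import product

hosts = ["foo", "bar"]   # :foo :bar
instances = ["cpu0"]     # #cpu0
samples = [0, 1, 2]      # @0..2 (assumed: 0 is the most recent sample)

# One value per (host, instance, sample) combination.
values = list(product(hosts, instances, samples))
for host, instance, sample in values:
    print(f"{host} {instance} @{sample}")
print(len(values))  # 6
```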

       Host and instance names that do not follow the rules for variables in
       programming languages, i.e. an alphabetic character optionally
       followed by alphanumerics, should be enclosed in single quotes.

       Expression evaluation follows the law of ``least surprises''.  Where
       performance metrics have the semantics of a counter, pmie will
       automatically convert to a rate based upon consecutive samples and the
       time interval between these samples.  All numeric expressions are
       evaluated in double precision, and where appropriate, automatically
       scaled into canonical units of ``bytes'', ``seconds'' and ``counts''.
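
       The counter-to-rate conversion amounts to dividing the change in
       value by the elapsed time between two consecutive samples.  The
       Python fragment below is a minimal sketch of that arithmetic only;
       the function name and sample figures are hypothetical, and details
       such as counter wrap or unit scaling are deliberately left out.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Rate of change per second between two consecutive counter samples."""
    return (curr_value - prev_value) / (curr_time - prev_time)


# Hypothetical counter: 120000 at t=0 sec and 150000 at t=10 sec.
print(counter_rate(120000, 0.0, 150000, 10.0))  # 3000.0 (events per second)
```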

       A rule is a special form of expression that specifies a condition or
       logical expression, a special operator (->) and actions to be performed
       when the condition is found to be true.

       The following table summarizes the basic pmie operators:

         ┌────────────────┬────────────────────────────────────────────────┐
         │   Operators    │                  Explanation                   │
         ├────────────────┼────────────────────────────────────────────────┤
         │+ - * /         │ Arithmetic                                     │
         │< <= == >= > != │ Relational (value comparison)                  │
         │! && ||         │ Boolean                                        │
         │->              │ Rule                                           │
         │rising          │ Boolean, false to true transition              │
         │falling         │ Boolean, true to false transition              │
         │rate            │ Explicit rate conversion (rarely required)     │
         │instant         │ No automatic rate conversion (rarely required) │
         └────────────────┴────────────────────────────────────────────────┘

       All operators are supported for numeric-valued operands and
       expressions.  For string-valued operands, namely literal string
       constants enclosed in double quotes or metrics with a data type of
       string (PM_TYPE_STRING), only the operators == and != are supported.

       The rate and instant operators are the logical inverse of one another,
       so an arithmetic expression expr is equal to rate instant expr.  The
       more useful cases involve using rate with a metric that is not a
       counter to determine the rate of change over time or instant with a
       metric that is a counter to determine if the current value is above or
       below some threshold.

       Aggregate operators may be used to aggregate or summarize along one
       dimension of a set-valued expression.  The following aggregate
       operators map from a logical expression to a logical expression of
       lower dimension.

         ┌─────────────────────────┬─────────────┬──────────────────────────┐
         │       Operators         │    Type     │       Explanation        │
         ├─────────────────────────┼─────────────┼──────────────────────────┤
         │some_inst                │ Existential │ True if at least one set │
         │some_host                │             │ member is true in the    │
         │some_sample              │             │ associated dimension     │
         ├─────────────────────────┼─────────────┼──────────────────────────┤
         │all_inst                 │ Universal   │ True if all set members  │
         │all_host                 │             │ are true in the          │
         │all_sample               │             │ associated dimension     │
         ├─────────────────────────┼─────────────┼──────────────────────────┤
         │N%_inst                  │ Percentile  │ True if at least N       │
         │N%_host                  │             │ percent of set members   │
         │N%_sample                │             │ are true in the          │
         │                         │             │ associated dimension     │
         └─────────────────────────┴─────────────┴──────────────────────────┘
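
       The three families of logical aggregates can be pictured as simple
       reductions over a list of truth values.  The Python fragment below is
       a rough model under that assumption, with invented function names; it
       ignores unknown values and is not a description of pmie internals.

```python
def some(flags):
    """Existential: at least one member is true (some_inst, some_host, ...)."""
    return any(flags)


def every(flags):
    """Universal: all members are true (all_inst, all_host, ...)."""
    return all(flags)


def percent(n, flags):
    """Percentile: at least n percent of the members are true (N%_inst, ...)."""
    return 100 * sum(flags) >= n * len(flags)


# Hypothetical outcome of one condition over 4 consecutive samples.
samples = [True, True, False, True]
print(some(samples), every(samples), percent(75, samples))  # True False True
```

       With 3 of 4 samples true, the 75 percent test succeeds, which matches
       the ``3 of 4 consecutive samples'' wording used for 75 %_sample in the
       EXAMPLES section.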

       The following instantial operators may be used to filter or limit a
       set-valued logical expression, based on regular expression matching of
       instance names.  The logical expression must be a set involving the
       dimension of instances, and the regular expression is of the form used
       by egrep(1) or the Extended Regular Expressions of regcomp(3).

              ┌─────────────┬──────────────────────────────────────────┐
              │ Operators   │               Explanation                │
              ├─────────────┼──────────────────────────────────────────┤
              │match_inst   │ For each value of the logical expression │
              │             │ that is ``true'', the result is ``true'' │
              │             │ if the associated instance name matches  │
              │             │ the regular expression.  Otherwise the   │
              │             │ result is ``false''.                     │
              ├─────────────┼──────────────────────────────────────────┤
              │nomatch_inst │ For each value of the logical expression │
              │             │ that is ``true'', the result is ``true'' │
              │             │ if the associated instance name does not │
              │             │ match the regular expression.  Otherwise │
              │             │ the result is ``false''.                 │
              └─────────────┴──────────────────────────────────────────┘

       For example, the expression below will be ``true'' for disks attached
       to controllers 2 or 3 performing more than 20 operations per second:
            match_inst "^dks[23]d" disk.dev.total > 20;

       The following aggregate operators map from an arithmetic expression to
       an arithmetic expression of lower dimension.

          ┌─────────────────────────┬───────────┬──────────────────────────┐
          │       Operators         │   Type    │       Explanation        │
          ├─────────────────────────┼───────────┼──────────────────────────┤
          │min_inst                 │ Extrema   │ Minimum value across all │
          │min_host                 │           │ set members in the       │
          │min_sample               │           │ associated dimension     │
          ├─────────────────────────┼───────────┼──────────────────────────┤
          │max_inst                 │ Extrema   │ Maximum value across all │
          │max_host                 │           │ set members in the       │
          │max_sample               │           │ associated dimension     │
          ├─────────────────────────┼───────────┼──────────────────────────┤
          │sum_inst                 │ Aggregate │ Sum of values across all │
          │sum_host                 │           │ set members in the       │
          │sum_sample               │           │ associated dimension     │
          ├─────────────────────────┼───────────┼──────────────────────────┤
          │avg_inst                 │ Aggregate │ Average value across all │
          │avg_host                 │           │ set members in the       │
          │avg_sample               │           │ associated dimension     │
          └─────────────────────────┴───────────┴──────────────────────────┘

       The aggregate operators count_inst, count_host and count_sample map
       from a logical expression to an arithmetic expression of lower
       dimension by counting the number of set members for which the
       expression is true in the associated dimension.

       For action rules, the following actions are defined:

                ┌──────────┬────────────────────────────────────────┐
                │ Action   │              Explanation               │
                ├──────────┼────────────────────────────────────────┤
                │alarm     │ Raise a visible alarm with xconfirm(1) │
                │print     │ Display on standard output             │
                │shell     │ Execute with sh(1)                     │
                │stomp     │ Send a STOMP message to a JMS server   │
                │syslog    │ Append a message to system log file    │
                └──────────┴────────────────────────────────────────┘

       Multiple actions may be separated by the & and | operators to specify
       respectively sequential execution (both actions are executed) and
       alternate execution (the second action will only be executed if the
       execution of the first action returns a non-zero error status).
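
       The difference between & and | can be modelled with ordinary
       functions that return an exit status.  The Python fragment below is a
       sketch of the control flow only; the helper names (action,
       sequential, alternate) and the action labels are made up for the
       illustration and are not part of pmie.

```python
ran = []  # records the order in which the stand-in actions execute


def action(name, status):
    """Build a stand-in action that records its name and returns a status."""
    def run():
        ran.append(name)
        return status
    return run


def sequential(first, second):
    """Model of `first & second`: both actions are executed."""
    first()
    second()


def alternate(first, second):
    """Model of `first | second`: second runs only if first returns non-zero."""
    if first() != 0:
        second()


sequential(action("alarm", 0), action("syslog", 0))
alternate(action("stomp-ok", 0), action("print-a", 0))    # print-a is skipped
alternate(action("stomp-fail", 1), action("print-b", 0))  # print-b runs
print(ran)  # ['alarm', 'syslog', 'stomp-ok', 'stomp-fail', 'print-b']
```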

       Arguments to actions are an optional suppression time, and then one or
       more expressions (a string is an expression in this context).  Strings
       appearing as arguments to an action may include the following special
       selectors that will be replaced at the time the action is executed.

       %h  Host name(s) that make the left-most top-level expression in the
           condition true.

       %c  Connection specification string(s) or files for a PCP tool to reach
           the hosts or archives that make the left-most top-level expression
           in the condition true.

       %i  Instance(s) that make the left-most top-level expression in the
           condition true.

       %v  One value from the left-most top-level expression in the condition
           for each host and instance pair that makes the condition true.

       Note that expansion of the special selectors is done by repeating the
       whole argument once for each unique binding to any of the qualifying
       special selectors.  For example if a rule were true for the host mumble
       with instances grunt and snort, and for host fumble the instance puff
       makes the rule true, then the action
            ...
            -> shell myscript "Warning: %h:%i busy ";
       will execute myscript with the argument string "Warning: mumble:grunt
       busy Warning: mumble:snort busy Warning: fumble:puff busy".

       By comparison, if the action
            ...
            -> shell myscript "Warning! busy:" " %h:%i";
       were executed under the same circumstances, then myscript would be
       executed with the argument string "Warning! busy: mumble:grunt
       mumble:snort fumble:puff".
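
       The repetition rule can be reproduced with a few lines of Python.
       This is a simplified model written for illustration, not pmie's own
       code: the function name expand is invented, only %h and %i are
       handled, and the bindings are assumed to be supplied already unique
       and in order.

```python
def expand(args, bindings):
    """Expand %h and %i in each argument, then concatenate the results.

    An argument containing a selector is repeated once per (host, instance)
    binding; an argument without selectors is emitted once, unchanged.
    """
    parts = []
    for arg in args:
        if "%h" in arg or "%i" in arg:
            for host, inst in bindings:
                parts.append(arg.replace("%h", host).replace("%i", inst))
        else:
            parts.append(arg)
    return "".join(parts)


bindings = [("mumble", "grunt"), ("mumble", "snort"), ("fumble", "puff")]

one_arg = expand(["Warning: %h:%i busy "], bindings)
two_args = expand(["Warning! busy:", " %h:%i"], bindings)
print(repr(one_arg))
print(repr(two_args))
```

       The first call repeats the whole "Warning: ... busy " text for every
       binding, while the second keeps the constant prefix once and repeats
       only the short " %h:%i" argument.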

       The semantics of the expansion of the special selectors leads to a
       common usage pattern in an action, where one argument is a constant
       (contains no special selectors), the second argument contains the
       desired special selectors with minimal separator characters, and an
       optional third argument provides a constant postscript (e.g. to
       terminate any argument quoting from the first argument).  If necessary,
       post-processing (e.g. in myscript) can provide the necessary
       enumeration over each unique expansion of the string containing just
       the special selectors.

       For complex conditions, the bindings to these selectors are not
       obvious.  It is strongly recommended that pmie be used in the debugging
       mode (specify the -W command line option in particular) during rule
       development.

BOOLEAN EXPRESSIONS

567       pmie expressions that have the semantics of a Boolean, e.g.  foo.bar  >
568       10  or some_inst ( my.table < 0 ) are assigned the values true or false
569       or unknown.  A value is unknown if one or more of the underlying metric
570       values  is  unavailable, e.g.  pmcd(1) on the host cannot be contacted,
571       the metric is not in the PCP archive, no values  are  currently  avail‐
572       able,  insufficient  values have been fetched to allow a rate converted
573       value to be computed or insufficient values have been  fetched  to  in‐
574       stantiate the required number of samples in the temporal domain.
575
576       Boolean operators follow the normal rules of Kleene logic (aka 3-valued
577       logic) when combining values that include unknown:
578
579                      ┌────────────┬───────────────────────────┐
580                      │            │             B             │
581                      │  A and B   ├─────────┬───────┬─────────┤
582                      │            │  true   │ false │ unknown │
583                      ├──┬─────────┼─────────┼───────┼─────────┤
584                      │  │  true   │  true   │ false │ unknown │
585                      │  ├─────────┼─────────┼───────┼─────────┤
586                      │A │  false  │  false  │ false │  false  │
587                      │  ├─────────┼─────────┼───────┼─────────┤
588                      │  │ unknown │ unknown │ false │ unknown │
589                      └──┴─────────┴─────────┴───────┴─────────┘
590                      ┌────────────┬──────────────────────────┐
591                      │            │            B             │
592                      │  A or B    ├──────┬─────────┬─────────┤
593                      │            │ true │  false  │ unknown │
594                      ├──┬─────────┼──────┼─────────┼─────────┤
595                      │  │  true   │ true │  true   │  true   │
596                      │  ├─────────┼──────┼─────────┼─────────┤
597                      │A │  false  │ true │  false  │ unknown │
598                      │  ├─────────┼──────┼─────────┼─────────┤
599                      │  │ unknown │ true │ unknown │ unknown │
600                      └──┴─────────┴──────┴─────────┴─────────┘
601                                 ┌────────┬─────────┐
602                                 │   A    │  not A  │
603                                 ├────────┼─────────┤
604                                 │  true  │  false  │
605                                 ├────────┼─────────┤
606                                 │ false  │  true   │
607                                 ├────────┼─────────┤
608                                 │unknown │ unknown │
609                                 └────────┴─────────┘
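
          The truth tables above can be reproduced with a minimal Python
          sketch.  This is an illustration only, not pmie code: the function
          names are hypothetical and None is used here to stand in for the
          unknown value.

```python
# Kleene (three-valued) logic; None stands in for pmie's "unknown".
# The function names are illustrative and not part of pmie.

def k_and(a, b):
    if a is False or b is False:
        return False              # false dominates "and", even next to unknown
    if a is None or b is None:
        return None               # otherwise any unknown makes the result unknown
    return True

def k_or(a, b):
    if a is True or b is True:
        return True               # true dominates "or", even next to unknown
    if a is None or b is None:
        return None
    return False

def k_not(a):
    return None if a is None else (not a)

VALUES = (True, False, None)
for a in VALUES:
    print(a, [k_and(a, b) for b in VALUES], [k_or(a, b) for b in VALUES], k_not(a))
```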

RULESETS

611       The ruleset clause is used to define a set of rules  and  actions  that
612       are  evaluated  in  order until some action is executed, at which point
613       the remaining rules and actions are skipped until the ruleset is  again
614       scheduled  for evaluation.  The keyword else is used to separate rules.
615       After one or more regular rules (with a predicate  and  an  action),  a
616       ruleset may include an optional
617            unknown -> action
618       clause, optionally followed by a
619            otherwise -> action
620       clause.
621
622       If all of the predicates in the rules evaluate to unknown and an
623       unknown clause has been specified, then the action associated with the
624       unknown clause will be executed.
625
626       If no rule predicate is true and the unknown action is either not spec‐
627       ified or not executed and an otherwise clause has been specified,  then
628       the action associated with the otherwise clause will be executed.
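
          The selection order described above can be sketched in Python.
          This is a simplified model, not pmie's implementation: it assumes
          that a true predicate always results in its action being executed,
          it represents unknown as None, and the function name
          evaluate_ruleset is hypothetical.

```python
# Simplified model of ruleset evaluation; not pmie's implementation.
# Each rule is a (predicate_value, action) pair, where predicate_value is
# True, False or None (standing in for "unknown").

def evaluate_ruleset(rules, unknown_action=None, otherwise_action=None):
    """Return the action that would be executed, or None if there is none."""
    # Regular rules are tried in order; the first true predicate wins and
    # the remaining rules are skipped.
    for value, action in rules:
        if value is True:
            return action
    # The unknown clause applies only when every predicate is unknown.
    if unknown_action is not None and rules and all(v is None for v, _ in rules):
        return unknown_action
    # The otherwise clause is the final fallback when nothing was executed.
    return otherwise_action

print(evaluate_ruleset([(False, "first"), (True, "second"), (True, "third")]))
print(evaluate_ruleset([(None, "first"), (None, "second")], "unk", "other"))
print(evaluate_ruleset([(None, "first"), (False, "second")], "unk", "other"))
```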
629

SCALE FACTORS

631       Scale  factors may be appended to arithmetic expressions and force lin‐
632       ear scaling of the value to canonical units.  Simple scale factors  are
633       constructed  from the keywords: nanosecond, nanosec, nsec, microsecond,
634       microsec, usec, millisecond, millisec, msec, second, sec, minute,  min,
635       hour,  byte,  Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount and Mcount, and
636       the operator /, for example ``Kbytes / hour''.
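
          As a worked illustration of linear scaling, the Python sketch below
          converts a value expressed in ``Kbytes / hour''.  It rests on
          assumptions not stated in this manual page: that the canonical
          units are bytes and seconds, and that Kbyte, Mbyte, Gbyte and Tbyte
          are powers of 1024.  The table and function names are hypothetical,
          and only a subset of the keywords is covered.

```python
# Assumptions (not confirmed by this manual page): canonical units are
# bytes and seconds, and the byte prefixes are powers of 1024.
SPACE = {"byte": 1, "Kbyte": 1024, "Mbyte": 1024**2, "Gbyte": 1024**3, "Tbyte": 1024**4}
TIME = {"nsec": 1e-9, "usec": 1e-6, "msec": 1e-3, "sec": 1.0, "min": 60.0, "hour": 3600.0}

def to_canonical(value, space_unit, time_unit):
    """Scale a value given in space_unit per time_unit to bytes per second."""
    return value * SPACE[space_unit] / TIME[time_unit]

# 3600 Kbytes / hour expressed in bytes per second.
print(to_canonical(3600, "Kbyte", "hour"))
```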
637

MACROS

639       Macros are defined using expressions of the form:
640
641            name = constexpr;
642
643       Where name follows the normal rules for variables in  programming  lan‐
644       guages,  i.e.  alphabetic optionally followed by alphanumerics.  const‐
645       expr must be a constant expression, either a string (enclosed in double
646       quotes) or an arithmetic expression optionally followed by a scale fac‐
647       tor.
648
649       Macros are expanded when their name, prefixed by a dollar  ($)  appears
650       in an expression, and macros may be nested within a constexpr string.
651
652       The following reserved macro names are understood.
653
654       minute    Current minute of the hour.
655
656       hour      Current hour of the day, in the range 0 to 23.
657
658       day       Current day of the month, in the range 1 to 31.
659
660       month     Current  month  of  the  year, in the range 0 (January) to 11
661                 (December).
662
663       year      Current year.
664
665       day_of_week
666                 Current day of the week, in the range 0 (Sunday) to 6 (Satur‐
667                 day).
668
669       delta     Sample interval in effect for this expression.
670
671       Dates  and times are presented in the reporting time zone (see descrip‐
672       tion of -Z and -z command line options above).
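
          The ranges of the reserved time macros differ from some common
          conventions (months are zero-based and the week starts on Sunday).
          The Python sketch below derives equivalent values from a timestamp.
          It is an illustration only: the function name is hypothetical, the
          four-digit year is an assumption, and delta is omitted because it
          depends on the expression.

```python
from datetime import datetime

def reserved_macros(now):
    """Values matching the documented ranges of the reserved time macros."""
    return {
        "minute": now.minute,
        "hour": now.hour,                        # 0 to 23
        "day": now.day,                          # 1 to 31
        "month": now.month - 1,                  # 0 (January) to 11 (December)
        "year": now.year,                        # assumption: four-digit year
        "day_of_week": now.isoweekday() % 7,     # 0 (Sunday) to 6 (Saturday)
    }

# Monday 4 September 2017, 00:10:21
print(reserved_macros(datetime(2017, 9, 4, 0, 10, 21)))
```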
673

AUTOMATIC RESTART

675       It is often useful for pmie processes to be started  and  stopped  when
676       the local host is booted or shut down, or when they have been detected
677       as no longer running (when they have unexpectedly exited for some  rea‐
678       son).  Refer to pmie_check(1) for details on automating this process.
679
680       Optionally, each system running pmcd(1) may also be configured to run a
681       ``primary''  pmie  instance.   This  pmie  instance  is   launched   by
682       $PCP_RC_DIR/pmie,      and     is     affected     by     the     files
683       $PCP_SYSCONF_DIR/pmie/control,   $PCP_SYSCONF_DIR/pmie/control.d   (use
684       chkconfig(8), systemctl(1) or similar platform-specific commands to ac‐
685       tivate or disable the  primary  pmie  instance)  and  $PCP_VAR_DIR/con‐
686       fig/pmie/config.default (the default initial configuration file for the
687       primary pmie).
688
689       The primary pmie instance is identified by the -P option.  There may be
690       at most one ``primary'' pmie instance on each system.  The primary pmie
691       instance (if any) must be running on the same host as  the  pmcd(1)  to
692       which  it  connects (if any), so the -h and -P options are mutually ex‐
693       clusive.
694

EVENT MONITORING

696       It is common for production systems to be monitored in a central  loca‐
697       tion.   Traditionally  on  UNIX  systems this has been performed by the
698       system log facilities - see logger(1),  and  syslogd(1).   On  Windows,
699       communication with the system event log is handled by pcp-eventlog(1).
700
701       pmie  fits into this model when rules use the syslog action.  Note that
702       if the action string begins with -p (priority)  and/or  -t  (tag)  then
703       these  are  extracted from the string and treated in the same way as in
704       logger(1) and pcp-eventlog(1).
705
706       However, it is common to have other event monitoring frameworks as well,
707       into which you may wish to incorporate performance events from pmie.
708       You can often use the shell action to send events to these frameworks,
709       as they usually provide their own program for injecting events into the
710       framework from external sources.
711
712       A final option is the stomp (Streaming Text Oriented Messaging
713       Protocol) action, which allows pmie to connect to a central JMS (Java
714       Message Service) server and send events to the PMIE topic.  Tools can
715       be written to extract these text messages and present them to
716       operations people (via desktop popup windows, etc).  Use of the stomp
717       action requires a stomp configuration file to be set up, which specifies
718       the location of the JMS server host, port number, and username/password.
719
720       The format of this file is as follows:
721
722            host=messages.sgi.com   # this is the JMS server (required)
723            port=61616              # and it's listening here (required)
724            timeout=2               # seconds to wait for server (optional)
725            username=joe            # (required)
726            password=j03ST0MP       # (required)
727            topic=PMIE              # JMS topic for pmie messages (optional)
728
729       The timeout value specifies the time (in seconds) that pmie should wait
730       for  acknowledgements  from  the JMS server after sending a message (as
731       required by the STOMP protocol).  Note that on startup, pmie will  wait
732       indefinitely for a connection, and will not begin rule evaluation until
733       that initial connection has been established.  Should the connection to
734       the JMS server be lost at any time while pmie is running, pmie will at‐
735       tempt to reconnect on each subsequent truthful  evaluation  of  a  rule
736       with  a  stomp  action,  but not more than once per minute.  This is to
737       avoid contributing to network congestion.  In this situation, where the
738       STOMP  connection  to the JMS server has been severed, the stomp action
739       will return a non-zero error value.
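
          A stomp configuration file in the format shown above can be checked
          before pmie is started.  The Python sketch below is one possible
          validator, not part of PCP: it assumes that a # starts a comment
          anywhere on a line (so values containing # are not supported), and
          the function name parse_stomp_config is hypothetical.

```python
# Keys marked "(required)" in the file format shown above.
REQUIRED = ("host", "port", "username", "password")

def parse_stomp_config(text):
    """Parse key=value lines; assumes '#' always starts a comment."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comment and whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {line!r}")
        settings[key.strip()] = value.strip()
    missing = [key for key in REQUIRED if key not in settings]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    return settings

SAMPLE = """\
host=messages.sgi.com   # this is the JMS server (required)
port=61616              # and it's listening here (required)
timeout=2               # seconds to wait for server (optional)
username=joe            # (required)
password=j03ST0MP       # (required)
topic=PMIE              # JMS topic for pmie messages (optional)
"""

print(parse_stomp_config(SAMPLE))
```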
740

DIFFERENCES IN HOST AND ARCHIVE MODES

742       When running in host mode, the delta interval for each rule  determines
743       a real-time delay between rule evaluations, so pmie spends most of its
744       time sleeping and waiting for the next scheduled rule evaluation.
745
746       When running in archive mode, pmie uses the  delta  interval  for  each
747       rule  to  determine  how frequently the rules are evaluated against the
748       archive data, but unlike host mode there are no real-time delays as the
749       archive is ``replayed'' as fast as possible.
750
751       In archive mode when a rule predicate evaluates true then the action is
752       modified, so that rather than posting to syslog or  raising  a  visible
753       alarm  or  running  a  shell  command  or sending a stomp message, pmie
754       prints the name of the action, the timestamp from the archive when  the
755       rule  predicate triggering the action was true and all of the arguments
756       that would have been passed to the real action in host mode.
757
758       For example, given the rule:
759            delta = 10 sec;
760            kernel.all.nprocs > 10 * hinv.ncpu -> print "lotsaprocs:" " %v";
761       when run against an archive, the output appears as:
762            print Mon Sep  4 00:10:21 2017: lotsaprocs: 1292
763            print Mon Sep  4 00:10:31 2017: lotsaprocs: 1294
764            print Mon Sep  4 00:10:41 2017: lotsaprocs: 1291
765            ...
766
767       The rationale is that the context in which the action would have been
768       executed (in host mode) was at a time in the past and possibly on a
769       different host (if the archive was collected from one host, but pmie is
770       being run on a different host).  So flooding syslog with misleading
771       messages, an avalanche of visual alarms, a lot of STOMP messages, or a
772       shell command that might not even work on the host where pmie is being
773       run are all examples of ``badness'' to be avoided.  Instead, the output
774       is text in a regular format suitable for post-processing with a range
775       of filters and performance analysis tools.
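
          As an example of such post-processing, the Python sketch below
          splits archive-mode output lines into action, timestamp and
          message.  It assumes the output layout of the example above
          (action, ctime(3)-style timestamp, colon, message) and an English
          locale for day and month names; the function name parse_line is
          hypothetical.

```python
import re
from datetime import datetime

# action, ctime(3)-style timestamp, ": ", then the message text
LINE = re.compile(
    r"^(?P<action>\S+) "
    r"(?P<when>\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}): "
    r"(?P<message>.*)$"
)

def parse_line(line):
    """Return (action, datetime, message), or None if the line does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("when"), "%a %b %d %H:%M:%S %Y")
    return m.group("action"), when, m.group("message")

for text in (
    "print Mon Sep  4 00:10:21 2017: lotsaprocs: 1292",
    "print Mon Sep  4 00:10:31 2017: lotsaprocs: 1294",
):
    print(parse_line(text))
```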
776
777       The output format can be changed using the -o option which consists  of
778       literal characters with the following embedded ``meta-field'' tokens:
779
780       %a  The name of the action, e.g.  print, syslog, etc.
781
782       %d  The  date  and  time  in ctime(3) format when the action would have
783           been executed.
784
785       %f  The name of the configuration file containing the action being exe‐
786           cuted, else <stdin> if the rules were read from standard input.
787
788       %l  The (approximate) line number in the configuration file for the ac‐
789           tion being executed.
790
791       %m  The message component of the action.
792
793       %u  The date and time when the action would have been executed  in  ex‐
794           tended ctime(3) format with microsecond precision for the time.
795
796       %%  A literal percent character.
797
798       The default output format is equivalent to a format of %a %d: %m.
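
          The effect of the meta-field tokens can be illustrated with a
          small Python emulation.  This is a sketch, not pmie's formatter:
          the function names are hypothetical, the exact layout of the %u
          field (microseconds appended to the seconds) is an assumption, and
          unrecognized tokens are left unchanged here by choice.

```python
import re
from datetime import datetime

def ctime_text(when, micro=False):
    """ctime(3)-like text; the microsecond layout is an assumption."""
    clock = f"{when:%H:%M:%S}"
    if micro:
        clock += f".{when.microsecond:06d}"
    return f"{when:%a %b} {when.day:2d} {clock} {when.year}"

def expand(fmt, action, when, filename, line, message):
    """Expand the meta-field tokens in fmt; unknown tokens are kept as-is."""
    fields = {
        "a": action,
        "d": ctime_text(when),
        "f": filename,
        "l": str(line),
        "m": message,
        "u": ctime_text(when, micro=True),
        "%": "%",
    }
    return re.sub(r"%(.)", lambda m: fields.get(m.group(1), m.group(0)), fmt)

when = datetime(2017, 9, 4, 0, 10, 21)
print(expand("%a %d: %m", "print", when, "<stdin>", 2, "lotsaprocs: 1292"))
```

          With the default format, this emulation produces a line in the
          same layout as the archive-mode example output shown earlier.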
799

SIGNALS

801       If  pmie  is sent a SIGHUP signal, the logfile will be closed, unlinked
802       and re-opened.  This is used by pmie_daily(1) to  achieve  nightly  log
803       rotation.
804
805       Most  of the time pmie is sleeping, waiting until the next set of rules
806       needs to be evaluated.  Sending pmie a SIGUSR1 signal  will  cause  the
807       details  for  the  next set of rules to be dumped on logfile, including
808       how long the current sleep is and how much time remains.  The  schedul‐
809       ing of rules is not changed by this action.
810

HOSTNAME CHANGES

812       The  hostname  of the PMCD that is providing metrics to pmie is used in
813       several ways.
814
815       PMCD's hostname is used internally to provide a value for the %h
816       substitutions in rule action strings.
817
818       For  pmie instances using a local PMCD that are launched and managed by
819       pmie_check(1) and pmie_daily(1), (or the systemd(1) or cron(8) services
820       that  use  these  scripts), the local hostname may also be used to con‐
821       struct the name of a directory where the pmie logs  for  one  host  are
822       stored, e.g. $PCP_LOG_DIR/pmie/<hostname>.
823
824       The hostname of the PMCD host may change during boot time when the sys‐
825       tem transitions from a temporary hostname to a persistent hostname,  or
826       by  explicit  administrative  action  anytime after the system has been
827       booted.  When this happens, pmie may need to take special action:
828       specifically, if the pmie instance was launched from pmie_check(1) or
829       pmie_daily(1), then pmie must exit.  Under normal circumstances
830       systemd(1) or cron(8) will launch a new pmie shortly thereafter, and this
831       new pmie instance will be operating in the context of the new  hostname
832       for the host where PMCD is running.
833

BUGS

835       The  lexical  scanner and parser will attempt to recover after an error
836       in the input expressions.  Parsing resumes after skipping input  up  to
837       the next semi-colon (;), however during this skipping process the scan‐
838       ner is ignorant of comments and strings, so an embedded semi-colon  may
839       cause  parsing  to  resume  at  an  unexpected place.  This behavior is
840       largely benign, as until the initial syntax error  is  corrected,  pmie
841       will not attempt any expression evaluation.
842

FILES

844       $PCP_DEMOS_DIR/pmie/*
845            annotated example rules
846
847       $PCP_VAR_DIR/pmns/*
848            default PMNS specification files
849
850       $PCP_TMP_DIR/pmie
851            pmie  maintains  files  in  this directory to identify the running
852            pmie instances and to export runtime information  about  each  in‐
853            stance  -  this  data forms the basis of the pmcd.pmie performance
854            metrics
855
856       $PCP_PMIECONTROL_PATH
857            the default set of pmie instances to start at boot time - refer to
858            pmie_check(1) for details
859

PCP ENVIRONMENT

861       Environment variables with the prefix PCP_ are used to parameterize the
862       file and directory names used by PCP.  On each installation,  the  file
863       /etc/pcp.conf  contains  the  local  values  for  these variables.  The
864       $PCP_CONF variable may be used to specify an alternative  configuration
865       file, as described in pcp.conf(5).
866
867       When  executing  shell  actions, pmie overrides two variables - IFS and
868       PATH - in the environment of the child process.  IFS is set to  "\t\n".
869       The  PATH  is  set to a combination of a default path for all platforms
870       ("/usr/sbin:/sbin:/usr/bin:/bin") and several configurable  components.
871       These are (in this order): $PCP_BIN_DIR, $PCP_BINADM_DIR and $PCP_PLAT‐
872       FORM_PATHS.
873
874       When executing  popup  alarm  actions,  pmie  will  use  the  value  of
875       $PCP_XCONFIRM_PROG  as the visual notification program to run.  This is
876       typically set to pmconfirm(1), a cross-platform dialog box.
877

UNIX SEE ALSO

879       logger(1).
880

WINDOWS SEE ALSO

882       pcp-eventlog(1).
883

SEE ALSO

885       PCPIntro(1),  pmcd(1),   pmconfirm(1),   pmdumplog(1),   pmie_check(1),
886       pmieconf(1),  pmie_daily(1),  pminfo(1),  pmlogger(1),  pmval(1),  sys‐
887       temd(1), ctime(3), PMAPI(3), pcp.conf(5), pcp.env(5) and PMNS(5).
888

USER GUIDE

890       For a more complete description of the pmie language, refer to the Per‐
891       formance  Co-Pilot  Users  and Administrators Guide.  This is available
892       online from:
893           https://pcp.readthedocs.io/en/latest/UAG/PerformanceMetricsInferenceEngine.html
894
895
896
897Performance Co-Pilot                  PCP                              PMIE(1)