PMIE(1)                     General Commands Manual                    PMIE(1)
2
3
4
6 pmie - inference engine for performance metrics
7
9 pmie [-bCdefHVvWxz] [-A align] [-a archive] [-c filename] [-h host] [-l
10 logfile] [-j stompfile] [-n pmnsfile] [-O offset] [-S starttime] [-T
11 endtime] [-t interval] [-Z timezone] [filename ...]
12
pmie accepts a collection of arithmetic, logical, and rule expressions
to be evaluated at specified frequencies. The base data for the
expressions consists of performance metric values delivered in real
time from any host running the Performance Metrics Collection Daemon
(PMCD), or historical data read from Performance Co-Pilot (PCP)
archive logs.
20
21 As well as computing arithmetic and logical values, pmie can execute
22 actions (popup alarms, write system log messages, and launch programs)
23 in response to specified conditions. Such actions are extremely useful
24 in detecting, monitoring and correcting performance related problems.
25
26 The expressions to be evaluated are read from configuration files spec‐
27 ified by one or more filename arguments. In the absence of any file‐
28 name, expressions are read from standard input.
29
30 A description of the command line options specific to pmie follows:
31
32 -a archive is the base name of a PCP archive log written by pmlog‐
33 ger(1). Multiple instances of the -a flag may appear on the com‐
34 mand line to specify a set of archives. In this case, it is
35 required that only one archive be present for any one host. Also,
36 any explicit host names occurring in a pmie expression must match
37 the host name recorded in one of the archive labels. In the case
38 of multiple archives, timestamps recorded in the archives are used
39 to ensure temporal consistency.
40
41 -b Output will be line buffered and standard output is attached to
42 standard error. This is most useful for background execution in
43 conjunction with the -l option. The -b option is always used for
44 pmie instances launched from pmie_check(1).
45
46 -C Parse the configuration file(s) and exit before performing any
47 evaluations. Any errors in the configuration file are reported.
48
49 -c An alternative to specifying filename at the end of the command
50 line.
51
52 -d Normally pmie would be launched as a non-interactive process to
53 monitor and manage the performance of one or more hosts. Given
54 the -d flag however, execution is interactive and the user is pre‐
55 sented with a menu of options. Interactive mode is useful mainly
56 for debugging new expressions.
57
-e When used with -V, -v or -W, this option forces timestamps to be
reported with each expression. The timestamps are in ctime(3)
format, enclosed in parentheses, and appear after the expression
name and before the expression value, e.g.
    expr_1 (Tue Feb 6 19:55:10 2001): 12
63
-f If the -l option is specified and there is no -a option (i.e.
real-time monitoring), then pmie is run as a daemon in the
background (in all other cases foreground is the default). The -f
option forces pmie to be run in the foreground, independent of any
other options.
69
70 -H The default hostname written to the stats file will not be looked
71 up via gethostbyname(3), rather it will be written as-is. This
72 option can be useful when host name aliases are in use at a site,
73 and the logical name is more important than the physical host
74 name.
75
76 -h By default performance data is fetched from the local host (in
77 real-time mode) or the host for the first named archive on the
78 command line (in archive mode). The host argument overrides this
79 default. It does not override hosts explicitly named in the
80 expressions being evaluated.
81
82 -l Standard error is sent to logfile.
83
84 -j An alternative STOMP protocol configuration is loaded from stomp‐
85 file. If this option is not used, and the stomp action is used in
86 any rule, the default location $PCP_VAR_DIR/pmie/config/stomp will
87 be used.
88
89 -n An alternative Performance Metrics Name Space (PMNS) is loaded
90 from the file pmnsfile.
91
92 -t The interval argument follows the syntax described in PCPIntro(1),
93 and in the simplest form may be an unsigned integer (the implied
94 units in this case are seconds). The value is used to determine
95 the sample interval for expressions that do not explicitly set
96 their sample interval using the pmie variable delta described
97 below. The default is 10.0 seconds.
98
-v Unless one of the verbose options -V, -v or -W appears on the
command line, expressions are evaluated silently and the only
output comes from any actions that are executed. In verbose mode,
specified using the -v flag, the value of each expression is
printed as it is evaluated. The values are in canonical units:
bytes in the dimension of ``space'', seconds in the dimension of
``time'' and events in the dimension of ``count''. See
pmLookupDesc(3) for details of the supported dimension and scaling
mechanisms for performance metrics. Verbose mode is useful for
monitoring the value of given expressions, evaluating derived
performance metrics, passing these values on to other tools for
further processing, and debugging new expressions.
111
112 -V This option has the same effect as the -v option, except that the
113 name of the host and instance (if applicable) are printed as well
114 as expression values.
115
116 -W This option has the same effect as the -V option described above,
117 except that for boolean expressions, only those names and values
118 that make the expression true are printed. These are the same
119 names and values accessible to rule actions as the %h, %i and %v
120 bindings, as described below.
121
122 -x Execute in domain agent mode. This mode is used within the Per‐
123 formance Co-Pilot product to derive values for summary metrics,
124 see pmdasummary(1). Only restricted functionality is available in
125 this mode (expressions with actions may not be used).
126
127 -Z Change the reporting timezone to timezone in the format of the
128 environment variable TZ as described in environ(5).
129
130 -z Change the reporting timezone to the timezone of the host that is
131 the source of the performance metrics, as identified via either
132 the -h option or the first named archive (as described above for
133 the -a option).
134
135 The -S, -T, -O, and -A options may be used to define a time window to
136 restrict the samples retrieved, set an initial origin within the time
137 window, or specify a ``natural'' alignment of the sample times; refer
138 to PCPIntro(1) for a complete description of these options.
139
140 Output from pmie is directed to standard output and standard error as
141 follows:
142
143 stdout
144 Expression values printed in the verbose -v mode and the output of
145 print actions.
146
147 stderr
148 Error and warning messages for any syntactic or semantic problems
149 during expression parsing, and any semantic or performance metrics
150 availability problems during expression evaluation.
151
153 The following example expressions demonstrate some of the capabilities
154 of the inference engine.
155
156 The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated
157 examples of pmie expressions.
158
159 The variable delta controls expression evaluation frequency. Specify
160 that subsequent expressions be evaluated once a second, until further
161 notice:
162
    delta = 1 sec;
164
165 If total syscall rate exceeds 5000 per second per CPU, then display an
166 alarm notifier:
167
    kernel.all.syscall / hinv.ncpu > 5000 count/sec
    -> alarm "high syscall rate";
170
171 If the high syscall rate is sustained for 10 consecutive samples, then
172 launch top(1) in an xwsh(1G) window to monitor processes, but do this
173 at most once every 5 minutes:
174
    all_sample (
        kernel.all.syscall @0..9 > 5000 count/sec * hinv.ncpu
    ) -> shell 5 min "xwsh -e 'top'";
178
179 The following rules are evaluated once every 20 seconds:
180
    delta = 20 sec;
182
183 If any disk is performing more than 60 I/Os per second, then print a
184 message identifying the busy disk to standard output and launch
185 dkvis(1):
186
    some_inst (
        disk.dev.total > 60 count/sec
    ) -> print "disk %i busy " &
         shell 5 min "dkvis";
191
192 Refine the preceding rule to apply only between the hours of 9am and
193 5pm, and to require 3 of 4 consecutive samples to exceed the threshold
194 before executing the action:
195
    $hour >= 9 && $hour < 17 &&
    some_inst (
        75 %_sample (
            disk.dev.total @0..3 > 60 count/sec
        )
    ) -> print "disk %i busy ";
202
203 The following rules are evaluated once every 10 minutes:
204
    delta = 10 min;
206
207 If either the / or the /usr filesystem is more than 95% full, display
208 an alarm popup, but not if it has already been displayed during the
209 last 4 hours:
210
    filesys.free #'/dev/root' /
        filesys.capacity #'/dev/root' < 0.05
    -> alarm 4 hour "root filesystem (almost) full";

    filesys.free #'/dev/usr' /
        filesys.capacity #'/dev/usr' < 0.05
    -> alarm 4 hour "/usr filesystem (almost) full";
218
219 The following rule requires a machine that supports the PCP environment
220 metrics. If the machine environment temperature rises more than 2
221 degrees over a 10 minute interval, write an entry in the system log:
222
    environ.temp @0 - environ.temp @1 > 2
    -> alarm "temperature rising fast" &
       syslog "machine room temperature rise alarm";
226
227 And last, something interesting if you have performance problems with
228 your Oracle database:
229
    db = "oracle.ptg1";
    host = ":moomba.melbourne.sgi.com";
    lru = "#'cache buffers lru chain'";
    gets = "$db.latch.gets $host $lru";
    total = "$db.latch.gets $host $lru +
             $db.latch.misses $host $lru +
             $db.latch.immisses $host $lru";

    $total > 100 && $gets / $total < 0.2
    -> alarm "high lru latch contention";
240
242 The pmie specification language is powerful and large.
243
244 To expedite rapid development of pmie rules, the pmieconf(1) tool pro‐
245 vides a facility for generating a pmie configuration file from a set of
246 generalized pmie rules. The supplied set of rules covers a wide range
247 of performance scenarios.
248
249 The pmrules(1) tool provides a GUI-based facility for generating pmie
250 rules from parametrized templates. The supplied templates cover a wide
251 range of performance scenarios.
252
253 The development efforts of the PCP engineering team are focused on
254 pmieconf rather than pmrules, and thus pmieconf is the recommended tool
255 for quickly deploying useful pmie rules.
256
257 The Performance Co-Pilot User's and Administrator's Guide provides a
258 detailed tutorial-style chapter covering pmie.
259
261 This description is terse and informal. For a more comprehensive
262 description see the Performance Co-Pilot User's and Administrator's
263 Guide.
264
265 A pmie specification is a sequence of semicolon terminated expressions.
266
267 Basic operators are modeled on the arithmetic, relational and Boolean
268 operators of the C programming language. Precedence rules are as
269 expected, although the use of parentheses is encouraged to enhance
270 readability and remove ambiguity.
271
272 Operands are performance metric names (see pmns(4)) and the normal lit‐
273 eral constants.
274
275 Operands involving performance metrics may produce sets of values, as a
276 result of enumeration in the dimensions of hosts, instances and time.
277 Special qualifiers may appear after a performance metric name to define
278 the enumeration in each dimension. For example,
279
    kernel.percpu.cpu.user :foo :bar #cpu0 @0..2
281
282 defines 6 values corresponding to the time spent executing in user mode
283 on CPU 0 on the hosts ``foo'' and ``bar'' over the last 3 consecutive
284 samples. The default interpretation in the absence of : (host), #
285 (instance) and @ (time) qualifiers is all instances at the most recent
286 sample time for the default source of PCP performance metrics.
287
Host and instance names that do not follow the rules for variables in
programming languages, i.e. alphabetic optionally followed by
alphanumerics, should be enclosed in single quotes.
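
As a small sketch of the quoting rule, the rule below quotes an instance
name that contains slashes. The metric and the instance name '/dev/root'
are borrowed from the examples earlier in this page; the 100 Mbyte
threshold and the message text are illustrative assumptions, and the
instance names available will differ between platforms.

```pmie
// '/dev/root' is not a plain identifier, so it is enclosed in single quotes
filesys.free #'/dev/root' < 100 Mbyte
    -> print "less than 100 Mbyte free on /dev/root";
```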
291
292 Expression evaluation follows the law of ``least surprises''. Where
293 performance metrics have the semantics of a counter, pmie will automat‐
294 ically convert to a rate based upon consecutive samples and the time
295 interval between these samples. All expressions are evaluated in dou‐
296 ble precision, and where appropriate, automatically scaled into canoni‐
297 cal units of ``bytes'', ``seconds'' and ``counts''.
298
299 A rule is a special form of expression that specifies a condition or
300 logical expression, a special operator (->) and actions to be performed
301 when the condition is found to be true.
302
303 The following table summarizes the basic pmie operators:
304
305 ┌────────────────┬────────────────────────────────────────────┐
306 │ Operators │ Explanation │
307 ├────────────────┼────────────────────────────────────────────┤
308 │+ - * / │ Arithmetic │
309 │< <= == >= > != │ Relational (value comparison) │
310 │! && || │ Boolean │
311 │-> │ Rule │
312 │rising │ Boolean, false to true transition │
313 │falling │ Boolean, true to false transition │
314 │rate │ Explicit rate conversion (rarely required) │
315 └────────────────┴────────────────────────────────────────────┘
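
The transition operators in the table can be sketched as follows. This
assumes rising and falling are written as prefix operators applied to a
parenthesized Boolean expression; the threshold is taken from the
examples earlier in this page and the message text is illustrative only.

```pmie
// true only on the sample where the condition changes from false to true
rising ( kernel.all.syscall / hinv.ncpu > 5000 count/sec )
    -> syslog "syscall rate rose above 5000 per second per CPU";

// true only on the sample where the condition changes from true to false
falling ( kernel.all.syscall / hinv.ncpu > 5000 count/sec )
    -> syslog "syscall rate fell back below 5000 per second per CPU";
```

Compared with the plain relational test, a transition rule of this kind
is a candidate when one notification per excursion is wanted rather than
one per sample.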
316 Aggregate operators may be used to aggregate or summarize along one
317 dimension of a set-valued expression. The following aggregate opera‐
318 tors map from a logical expression to a logical expression of lower
319 dimension.
320
321 ┌─────────────────────────┬─────────────┬──────────────────────────┐
322 │ Operators │ Type │ Explanation │
323 ├─────────────────────────┼─────────────┼──────────────────────────┤
324 │some_inst │ Existential │ True if at least one set │
325 │some_host │ │ member is true in the │
326 │some_sample │ │ associated dimension │
327 ├─────────────────────────┼─────────────┼──────────────────────────┤
328 │all_inst │ Universal │ True if all set members │
329 │all_host │ │ are true in the associ‐ │
330 │all_sample │ │ ated dimension │
331 ├─────────────────────────┼─────────────┼──────────────────────────┤
332 │N%_inst │ Percentile │ True if at least N per‐ │
333 │N%_host │ │ cent of set members are │
334 │N%_sample │ │ true in the associated │
335 │ │ │ dimension │
336 └─────────────────────────┴─────────────┴──────────────────────────┘
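
A sketch of nesting two of these aggregate operators, one per dimension.
The host names foo and bar are the placeholders used earlier in this
page. The 0.9 threshold assumes that the counter metric is rate
converted to a utilization between 0 and 1 (seconds of user time per
second); check the values with the -v option before relying on it.

```pmie
// true if, on at least one of the hosts foo and bar, every CPU
// exceeded the assumed 0.9 user-mode utilization threshold
some_host (
    all_inst (
        kernel.percpu.cpu.user :foo :bar > 0.9
    )
) -> print "all CPUs busy in user mode on %h";
```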
337 The following instantial operators may be used to filter or limit a
338 set-valued logical expression, based on regular expression matching of
339 instance names. The logical expression must be a set involving the
340 dimension of instances, and the regular expression is of the form used
341 by egrep(1) or the Extended Regular Expressions of regcomp(3G).
342
343 ┌─────────────┬──────────────────────────────────────────┐
344 │ Operators │ Explanation │
345 ├─────────────┼──────────────────────────────────────────┤
346 │match_inst │ For each value of the logical expression │
347 │ │ that is ``true'', the result is ``true'' │
348 │ │ if the associated instance name matches │
349 │ │ the regular expression. Otherwise the │
350 │ │ result is ``false''. │
351 ├─────────────┼──────────────────────────────────────────┤
352 │nomatch_inst │ For each value of the logical expression │
353 │ │ that is ``true'', the result is ``true'' │
354 │ │ if the associated instance name does not │
355 │ │ match the regular expression. Otherwise │
356 │ │ the result is ``false''. │
357 └─────────────┴──────────────────────────────────────────┘
358 For example, the expression below will be ``true'' for disks attached
359 to controllers 2 or 3 performing more than 20 operations per second:
    match_inst "^dks[23]d" disk.dev.total > 20;
361
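The complementary operator can be sketched in the same way. Wrapping the
filtered expression in some_inst and attaching a print action is an
assumption made for illustration; the regular expression and threshold
are those of the match_inst example.

```pmie
// disks NOT attached to controllers 2 or 3 that perform
// more than 20 operations per second
some_inst (
    nomatch_inst "^dks[23]d" disk.dev.total > 20
) -> print "busy disk outside controllers 2 and 3: %i";
```
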
362 The following aggregate operators map from an arithmetic expression to
363 an arithmetic expression of lower dimension.
364
365 ┌─────────────────────────┬───────────┬──────────────────────────┐
366 │ Operators │ Type │ Explanation │
367 ├─────────────────────────┼───────────┼──────────────────────────┤
368 │min_inst │ Extrema │ Minimum value across all │
369 │min_host │ │ set members in the asso‐ │
370 │min_sample │ │ ciated dimension │
371 ├─────────────────────────┼───────────┼──────────────────────────┤
372 │max_inst │ Extrema │ Maximum value across all │
373 │max_host │ │ set members in the asso‐ │
374 │max_sample │ │ ciated dimension │
375 ├─────────────────────────┼───────────┼──────────────────────────┤
376 │sum_inst │ Aggregate │ Sum of values across all │
377 │sum_host │ │ set members in the asso‐ │
378 │sum_sample │ │ ciated dimension │
379 ├─────────────────────────┼───────────┼──────────────────────────┤
380 │avg_inst │ Aggregate │ Average value across all │
381 │avg_host │ │ set members in the asso‐ │
382 │avg_sample │ │ ciated dimension │
383 └─────────────────────────┴───────────┴──────────────────────────┘
384 The aggregate operators count_inst, count_host and count_sample map
385 from a logical expression to an arithmetic expression of lower dimen‐
386 sion by counting the number of set members for which the expression is
387 true in the associated dimension.
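
The two rules below are a minimal sketch of the arithmetic aggregates
and of count_inst. The metric is the one used in the examples earlier in
this page; the thresholds and messages are illustrative assumptions.

```pmie
// arithmetic aggregate: average I/O rate across all disk instances
avg_inst ( disk.dev.total ) > 40 count/sec
    -> print "average disk load is high";

// counting aggregate: number of disks for which the condition is true
count_inst ( disk.dev.total > 60 count/sec ) >= 2
    -> print "two or more disks are busy";
```

In both rules the aggregate removes the instance dimension, so the
comparison that follows it operates on a single value.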
388
389 For action rules, the following actions are defined:
390
391 ┌──────────┬────────────────────────────────────────┐
392 │Operators │ Explanation │
393 ├──────────┼────────────────────────────────────────┤
394 │alarm │ Raise a visible alarm with xconfirm(1) │
395 │print │ Display on standard output │
396 │shell │ Execute with sh(1) │
397 │stomp │ Send a STOMP message to a JMS server │
398 │syslog │ Append a message to system log file │
399 └──────────┴────────────────────────────────────────┘
Multiple actions may be separated by the & and | operators to specify,
respectively, sequential execution (both actions are executed) and
alternate execution (the second action will only be executed if the
execution of the first action returns a non-zero error status).
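
A sketch of alternate execution with the | operator. The script path
/usr/local/bin/notify-oncall is hypothetical, and the rule assumes that
the status of a shell action reflects the exit status of the command it
runs.

```pmie
// syslog is used only as a fallback when the shell action fails
some_inst ( disk.dev.total > 60 count/sec )
    -> shell 5 min "/usr/local/bin/notify-oncall disk-busy" |
       syslog "disk busy, on-call notification failed";
```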
404
405 Arguments to actions are an optional suppression time, and then one or
406 more expressions (a string is an expression in this context). Strings
407 appearing as arguments to an action may include the following special
408 selectors that will be replaced at the time the action is executed.
409
410 %h Host(s) that make the left-most top-level expression in the condi‐
411 tion true.
412
413 %i Instance(s) that make the left-most top-level expression in the
414 condition true.
415
%v Value(s) from the left-most top-level expression in the condition,
subject to the host and instance assignments that make the
condition true.
419
420 Note that expansion of the special selectors is done by repeating the
421 whole argument once for each unique binding to any of the qualifying
422 special selectors. For example if a rule were true for the host mumble
423 with instances grunt and snort, and for host fumble the instance puff
424 makes the rule true, then the action
    ...
    -> shell myscript "Warning: %h-%i busy ";
427 will execute myscript with the argument string "Warning: mumble-grunt
428 busy Warning: mumble-snort busy Warning: fumble-puff busy".
429
430 By comparison, if the action
    ...
    -> shell myscript "'Warning! busy:" " %i@%h" "'";
433 were executed under the same circumstances, then myscript would be exe‐
434 cuted with the argument string '"Warning! busy: grunt@mumble snort@mum‐
435 ble puff@fumble"'.
436
The semantics of the expansion of the special selectors lead to a
common usage pattern, where the first argument is a constant (contains
no special selectors), the second argument contains the desired special
selectors with minimal separator characters, and an optional third
argument provides a constant postscript (e.g. to terminate any argument
quoting from the first argument). If necessary, post-processing (e.g.
in myscript) can provide the necessary enumeration over each unique
expansion of the string containing just the special selectors.
445
For complex conditions, the bindings to these selectors are not obvious.
It is strongly recommended that pmie be used in the debugging mode
(specify the -W command line option in particular) during rule
development.
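
The usage pattern described above, a constant first argument followed by
an argument holding only the selectors, might look like this with the
print action. The one minute suppression time and the message layout are
illustrative assumptions.

```pmie
// the second argument is repeated once per (instance, value) binding
some_inst ( disk.dev.total > 60 count/sec )
    -> print 1 min "busy disks:" " %i=%v";
```

Running the same rule with the -W option shows which instances and
values are bound before the action text is relied upon.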
450
452 Scale factors may be appended to arithmetic expressions and force lin‐
453 ear scaling of the value to canonical units. Simple scale factors are
454 constructed from the keywords: nanosecond, nanosec, nsec, microsecond,
455 microsec, usec, millisecond, millisec, msec, second, sec, minute, min,
456 hour, byte, Kbyte, Mbyte, Gbyte, Tbyte, count, Kcount, Mcount, Gcount
457 and Tcount, and the operator /, for example ``Kbytes / hour''.
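
A minimal sketch of a scale factor on a constant. Because scaling to
canonical units is linear, 3600 count/hour describes the same rate as
1 count/sec; the threshold itself is an illustrative assumption.

```pmie
// 3600 events per hour is scaled to 1 event per second before comparing
some_inst ( disk.dev.total > 3600 count/hour )
    -> print "disk %i is active";
```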
458
460 Macros are defined using expressions of the form:
461
    name = constexpr;
463
Where name follows the normal rules for variables in programming
languages, i.e. alphabetic optionally followed by alphanumerics.
constexpr must be a constant expression, either a string (enclosed in
double quotes) or an arithmetic expression optionally followed by a
scale factor.
469
470 Macros are expanded when their name, prefixed by a dollar ($) appears
471 in an expression, and macros may be nested within a constexpr string.
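
A short sketch of nested string macros. The macro names threshold and
busy are arbitrary choices for this example, and both are defined as
strings, following the pattern of the Oracle example earlier in this
page.

```pmie
threshold = "60 count/sec";
busy = "disk.dev.total > $threshold";   // $threshold is nested in a string

// $busy stands for: disk.dev.total > 60 count/sec
some_inst ( $busy ) -> print "disk %i busy ";
```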
472
473 The following reserved macro names are understood.
474
475 minute Current minute of the hour.
476
477 hour Current hour of the day, in the range 0 to 23.
478
479 day Current day of the month, in the range 1 to 31.
480
481 month Current month of the year, in the range 0 (January) to 11
482 (December).
483
484 year Current year.
485
486 day_of_week
487 Current day of the week, in the range 0 (Sunday) to 6 (Satur‐
488 day).
489
490 delta Sample interval in effect for this expression.
491
492 Dates and times are presented in the reporting time zone (see descrip‐
493 tion of -Z and -z command line options above).
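
The reserved macros can be combined to restrict a rule to particular
times. The sketch below assumes a working week of Monday to Friday and
working hours of 09:00 to 17:00; the disk threshold is the one used in
the examples earlier in this page.

```pmie
// Monday (1) to Friday (5), from 09:00 up to but not including 17:00
$day_of_week >= 1 && $day_of_week <= 5 &&
$hour >= 9 && $hour < 17 &&
some_inst ( disk.dev.total > 60 count/sec )
    -> print "disk %i busy during working hours";
```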
494
It is often useful for pmie processes to be started and stopped when
the local host is booted or shut down, or when they have been detected
as no longer running (when they have unexpectedly exited for some
reason). Refer to pmie_check(1) for details on automating this process.
500
502 It is common for production systems to be monitored in a central loca‐
503 tion. Traditionally on UNIX systems this has been performed by the
504 system log facilities - see logger(1), and syslogd(1). On Windows,
505 communication with the system event log is handled by pcp-eventlog(1).
506
507 pmie fits into this model when rules use the syslog action. Note that
508 if the action string begins with -p (priority) and/or -t (tag) then
509 these are extracted from the string and treated in the same way as in
510 logger(1) and pcp-eventlog(1).
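
A sketch of a syslog action that sets the priority and tag. The
priority daemon.warning follows the facility.level form accepted by
logger(1), and the tag pmie_demo is an arbitrary name; both are
assumptions chosen for illustration.

```pmie
// -p and -t are stripped from the string and used as priority and tag
kernel.all.syscall / hinv.ncpu > 5000 count/sec
    -> syslog 10 min "-p daemon.warning -t pmie_demo high syscall rate";
```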
511
However, it is common to have other event monitoring frameworks also,
into which you may wish to incorporate performance events from pmie.
You can often use the shell action to send events to these frameworks,
as they usually provide their own program for injecting events into the
framework from external sources.
517
A final option is use of the stomp (Streaming Text Oriented Messaging
Protocol) action, which allows pmie to connect to a central JMS (Java
Message Service) server and send events to the PMIE topic. Tools can
be written to extract these text messages and present them to
operations people (via desktop popup windows, etc). Use of the stomp
action requires a stomp configuration file to be set up, which
specifies the location of the JMS server host, port number, and
username/password.
525
526 The format of this file is as follows:
527
    host=messages.sgi.com   # this is the JMS server (required)
    port=61616              # and it's listening here (required)
    timeout=2               # seconds to wait for server (optional)
    username=joe            # (required)
    password=j03ST0MP       # (required)
    topic=PMIE              # JMS topic for pmie messages (optional)
534
The timeout value specifies the time (in seconds) that pmie should wait
for acknowledgements from the JMS server after sending a message (as
required by the STOMP protocol). Note that on startup, pmie will wait
indefinitely for a connection, and will not begin rule evaluation until
that initial connection has been established. Should the connection to
the JMS server be lost at any time while pmie is running, pmie will
attempt to reconnect on each subsequent truthful evaluation of a rule
with a stomp action, but not more than once per minute. This is to
avoid contributing to network congestion. In this situation, where the
STOMP connection to the JMS server has been severed, the stomp action
will return a non-zero error value.
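
Because a stomp action returns a non-zero value while the connection is
down, it can be paired with a fallback action using the | operator. The
sketch below assumes the stomp action takes a single message string like
the other actions; the message text is illustrative.

```pmie
// syslog runs only when the STOMP message could not be delivered
kernel.all.syscall / hinv.ncpu > 5000 count/sec
    -> stomp "high syscall rate" |
       syslog "high syscall rate (STOMP delivery failed)";
```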
546
548 $PCP_DEMOS_DIR/pmie/*
549 annotated example rules
550 $PCP_VAR_DIR/pmns/*
551 default PMNS specification files
552 $PCP_TMP_DIR/pmie
553 pmie maintains files in this directory to identify the run‐
554 ning pmie instances and to export runtime information about
555 each instance - this data forms the basis of the pmcd.pmie
556 performance metrics
557 $PCP_PMIECONTROL_PATH
558 the default set of pmie instances to start at boot time -
559 refer to pmie_check(1) for details
$PCP_VAR_DIR/config/pmie/*
        the predefined alarm action scripts (email, log, popup and
        syslog), the example action script (sample) and the concurrent
        action control file (control.master, see also pmrules(1)).
564 /usr/pcp/lib/pmie-common
565 common shell procedures for the predefined alarm action
566 scripts
567
569 The lexical scanner and parser will attempt to recover after an error
570 in the input expressions. Parsing resumes after skipping input up to
571 the next semi-colon (;), however during this skipping process the scan‐
572 ner is ignorant of comments and strings, so an embedded semi-colon may
573 cause parsing to resume at an unexpected place. This behavior is
574 largely benign, as until the initial syntax error is corrected, pmie
575 will not attempt any expression evaluation.
576
578 Environment variables with the prefix PCP_ are used to parameterize the
579 file and directory names used by PCP. On each installation, the file
580 /etc/pcp.conf contains the local values for these variables. The
581 $PCP_CONF variable may be used to specify an alternative configuration
582 file, as described in pcp.conf(4).
583
585 logger(1).
586
588 pcp-eventlog(1).
589
591 PCPIntro(1), pmcd(1), pmdumplog(1), pmieconf(1), pmie_check(1),
592 pminfo(1), pmlogger(1), pmval(1), PMAPI(3), pcp.conf(4) and pcp.env(4).
593
For a more complete description of the pmie language, refer to the
Performance Co-Pilot User's and Administrator's Guide. This is
distributed in insight(1) format as part of the pcp.books subsystem, or
in HTML format from:
    http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?\
    db=bks&fname=/SGI_Admin/books/PCP_IRIX/sgi_html/ch05.html
601
602
603
Performance Co-Pilot                  SGI                              PMIE(1)