1check_postgres(3) User Contributed Perl Documentation check_postgres(3)
2
3
4
6 check_postgres.pl - a Postgres monitoring script for Nagios, MRTG,
7 Cacti, and others
8
9 This documents describes check_postgres.pl version 2.18.0
10
12 ## Create all symlinks
13 check_postgres.pl --symlinks
14
15 ## Check connection to Postgres database 'pluto':
16 check_postgres.pl --action=connection --db=pluto
17
18 ## Same things, but using the symlink
19 check_postgres_connection --db=pluto
20
21 ## Warn if > 100 locks, critical if > 200, or > 20 exclusive
22 check_postgres_locks --warning=100 --critical="total=200;exclusive=20"
23
24 ## Show the current number of idle connections on port 6543:
25 check_postgres_txn_idle --port=6543 --output=simple
26
27 ## There are many other actions and options, please keep reading.
28
29 The latest news and documentation can always be found at:
30 http://bucardo.org/check_postgres/
31
33 check_postgres.pl is a Perl script that runs many different tests
34 against one or more Postgres databases. It uses the psql program to
35 gather the information, and outputs the results in one of three
36 formats: Nagios, MRTG, or simple.
37
38 Output Modes
39 The output can be changed by use of the "--output" option. The default
40 output is nagios, although this can be changed at the top of the script
41 if you wish. The current option choices are nagios, mrtg, and simple.
42 To avoid having to enter the output argument each time, the type of
43 output is automatically set if no --output argument is given, and if
44 the current directory has one of the output options in its name. For
45 example, creating a directory named mrtg and populating it with
46 symlinks via the --symlinks argument would ensure that any actions run
47 from that directory will always default to an output of "mrtg" As a
48 shortcut for --output=simple, you can enter --simple, which also
49 overrides the directory naming trick.
50
51 Nagios output
52
53 The default output format is for Nagios, which is a single line of
54 information, along with four specific exit codes:
55
56 0 (OK)
57 1 (WARNING)
58 2 (CRITICAL)
59 3 (UNKNOWN)
60
61 The output line is one of the words above, a colon, and then a short
62 description of what was measured. Additional statistics information, as
63 well as the total time the command took, can be output as well: see the
64 documentation on the arguments --showperf, --perflimit, and --showtime.
65
66 MRTG output
67
68 The MRTG output is four lines, with the first line always giving a
69 single number of importance. When possible, this number represents an
70 actual value such as a number of bytes, but it may also be a 1 or a 0
71 for actions that only return "true" or "false", such as
72 check_postgres_version. The second line is an additional stat and is
73 only used for some actions. The third line indicates an "uptime" and is
74 not used. The fourth line is a description and usually indicates the
75 name of the database the stat from the first line was pulled from, but
76 may be different depending on the action.
77
78 Some actions accept an optional --mrtg argument to further control the
79 output.
80
81 See the documentation on each action for details on the exact MRTG
82 output for each one.
83
84 Simple output
85
86 The simple output is simply a truncated version of the MRTG one, and
87 simply returns the first number and nothing else. This is very useful
88 when you just want to check the state of something, regardless of any
89 threshold. You can transform the numeric output by appending KB, MB,
90 GB, TB, or EB to the output argument, for example:
91
92 --output=simple,MB
93
94 Cacti output
95
96 The Cacti output consists of one or more items on the same line, with a
97 simple name, a colon, and then a number. At the moment, the only action
98 with explicit Cacti output is 'dbstats', and using the --output option
99 is not needed in this case, as Cacti is the only output for this
100 action. For many other actions, using --simple is enough to make Cacti
101 happy.
102
104 All actions accept a common set of database options.
105
106 -H NAME or --host=NAME
107 Connect to the host indicated by NAME. Can be a comma-separated
108 list of names. Multiple host arguments are allowed. If no host is
109 given, defaults to the "PGHOST" environment variable or no host at
110 all (which indicates using a local Unix socket). You may also use
111 "--dbhost".
112
113 -p PORT or --port=PORT
114 Connects using the specified PORT number. Can be a comma-separated
115 list of port numbers, and multiple port arguments are allowed. If
116 no port number is given, defaults to the "PGPORT" environment
117 variable. If that is not set, it defaults to 5432. You may also use
118 "--dbport"
119
120 -db NAME or --dbname=NAME
121 Specifies which database to connect to. Can be a comma-separated
122 list of names, and multiple dbname arguments are allowed. If no
123 dbname option is provided, defaults to the "PGDATABASE" environment
124 variable. If that is not set, it defaults to 'postgres' if psql is
125 version 8 or greater, and 'template1' otherwise.
126
127 -u USERNAME or --dbuser=USERNAME
128 The name of the database user to connect as. Can be a comma-
129 separated list of usernames, and multiple dbuser arguments are
130 allowed. If this is not provided, it defaults to the "PGUSER"
131 environment variable, otherwise it defaults to 'postgres'.
132
133 --dbpass=PASSWORD
134 Provides the password to connect to the database with. Use of this
135 option is highly discouraged. Instead, one should use a .pgpass or
136 pg_service.conf file.
137
138 --dbservice=NAME
139 The name of a service inside of the pg_service.conf file. This file
140 is in your home directory by default and contains a simple list of
141 connection options. You can also pass additional information when
142 using this option such as --dbservice="maindatabase
143 sslmode=require"
144
145 The database connection options can be grouped: --host=a,b --host=c
146 --port=1234 --port=3344 would connect to a-1234, b-1234, and c-3344.
147 Note that once set, an option carries over until it is changed again.
148
149 Examples:
150
151 --host=a,b --port=5433 --db=c
152 Connects twice to port 5433, using database c, to hosts a and b: a-5433-c b-5433-c
153
154 --host=a,b --port=5433 --db=c,d
155 Connects four times: a-5433-c a-5433-d b-5433-c b-5433-d
156
157 --host=a,b --host=foo --port=1234 --port=5433 --db=e,f
158 Connects six times: a-1234-e a-1234-f b-1234-e b-1234-f foo-5433-e foo-5433-f
159
160 --host=a,b --host=x --port=5432,5433 --dbuser=alice --dbuser=bob -db=baz
161 Connects three times: a-5432-alice-baz b-5433-alice-baz x-5433-bob-baz
162
163 --dbservice="foo" --port=5433
164 Connects using the named service 'foo' in the pg_service.conf file, but overrides the port
165
167 Other options include:
168
169 --action=NAME
170 States what action we are running. Required unless using a
171 symlinked file, in which case the name of the file is used to
172 figure out the action.
173
174 --warning=VAL or -w VAL
175 Sets the threshold at which a warning alert is fired. The valid
176 options for this option depends on the action used.
177
178 --critical=VAL or -c VAL
179 Sets the threshold at which a critical alert is fired. The valid
180 options for this option depends on the action used.
181
182 -t VAL or --timeout=VAL
183 Sets the timeout in seconds after which the script will abort
184 whatever it is doing and return an UNKNOWN status. The timeout is
185 per Postgres cluster, not for the entire script. The default value
186 is 10; the units are always in seconds.
187
188 --assume-standby-mode
189 If specified, first the check if server in standby mode will be
190 performed (--datadir is required), if so, all checks that require
191 SQL queries will be ignored and "Server in standby mode" with OK
192 status will be returned instead.
193
194 Example:
195
196 postgres@db$./check_postgres.pl --action=version --warning=8.1 --datadir /var/lib/postgresql/8.3/main/ --assume-standby-mode
197 POSTGRES_VERSION OK: Server in standby mode | time=0.00
198
199 -h or --help
200 Displays a help screen with a summary of all actions and options.
201
202 --man
203 Displays the entire manual.
204
205 -V or --version
206 Shows the current version.
207
208 -v or --verbose
209 Set the verbosity level. Can call more than once to boost the
210 level. Setting it to three or higher (in other words, issuing "-v
211 -v -v") turns on debugging information for this program which is
212 sent to stderr.
213
214 --showperf=VAL
215 Determines if we output additional performance data in standard
216 Nagios format (at end of string, after a pipe symbol, using
217 name=value). VAL should be 0 or 1. The default is 1. Only takes
218 effect if using Nagios output mode.
219
220 --perflimit=i
221 Sets a limit as to how many items of interest are reported back
222 when using the showperf option. This only has an effect for actions
223 that return a large number of items, such as table_size. The
224 default is 0, or no limit. Be careful when using this with the
225 --include or --exclude options, as those restrictions are done
226 after the query has been run, and thus your limit may not include
227 the items you want. Only takes effect if using Nagios output mode.
228
229 --showtime=VAL
230 Determines if the time taken to run each query is shown in the
231 output. VAL should be 0 or 1. The default is 1. No effect unless
232 showperf is on. Only takes effect if using Nagios output mode.
233
234 --test
235 Enables test mode. See the "TEST MODE" section below.
236
237 --PSQL=PATH
238 Tells the script where to find the psql program. Useful if you have
239 more than one version of the psql executable on your system, or if
240 there is no psql program in your path. Note that this option is in
241 all uppercase. By default, this option is not allowed. To enable
242 it, you must change the $NO_PSQL_OPTION near the top of the script
243 to 0. Avoid using this option if you can, and instead hard-code
244 your psql location into the $PSQL variable, also near the top of
245 the script.
246
247 --symlinks
248 Creates symlinks to the main program for each action.
249
250 --output=VAL
251 Determines the format of the output, for use in various programs.
252 The default is 'nagios'. Available options are 'nagios', 'mrtg',
253 'simple' and 'cacti'.
254
255 --mrtg=VAL
256 Used only for the MRTG or simple output, for a few specific
257 actions.
258
259 --debugoutput=VAL
260 Outputs the exact string returned by psql, for use in debugging.
261 The value is one or more letters, which determine if the output is
262 displayed or not, where 'a' = all, 'c' = critical, 'w' = warning,
263 'o' = ok, and 'u' = unknown. Letters can be combined.
264
265 --get_method=VAL
266 Allows specification of the method used to fetch information for
267 the "new_version_cp", "new_version_pg", "new_version_bc",
268 "new_version_box", and "new_version_tnm" checks. The following
269 programs are tried, in order, to grab the information from the web:
270 GET, wget, fetch, curl, lynx, links. To force the use of just one
271 (and thus remove the overhead of trying all the others until one of
272 those works), enter one of the names as the argument to get_method.
273 For example, a BSD box might enter the following line in their
274 ".check_postgresrc" file:
275
276 get_method=fetch
277
278 --language=VAL
279 Set the language to use for all output messages. Normally, this is
280 detected by examining the environment variables LC_ALL,
281 LC_MESSAGES, and LANG, but setting this option will override any
282 such detection.
283
285 The script runs one or more actions. This can either be done with the
286 --action flag, or by using a symlink to the main file that contains the
287 name of the action inside of it. For example, to run the action
288 "timesync", you may either issue:
289
290 check_postgres.pl --action=timesync
291
292 or use a program named:
293
294 check_postgres_timesync
295
296 All the symlinks are created for you in the current directory if use
297 the option --symlinks
298
299 perl check_postgres.pl --symlinks
300
301 If the file name already exists, it will not be overwritten. If the
302 file exists and is a symlink, you can force it to overwrite by using
303 "--action=build_symlinks_force"
304
305 Most actions take a --warning and a --critical option, indicating at
306 what point we change from OK to WARNING, and what point we go to
307 CRITICAL. Note that because criticals are always checked first, setting
308 the warning equal to the critical is an effective way to turn warnings
309 off and always give a critical.
310
311 The current supported actions are:
312
313 archive_ready
314 ("symlink: check_postgres_archive_ready") Checks how many WAL files
315 with extension .ready exist in the pg_xlog/archive_status directory,
316 which is found off of your data_directory. This action must be run as a
317 superuser, in order to access the contents of the
318 pg_xlog/archive_status directory. The minimum version to use this
319 action is Postgres 8.1. The --warning and --critical options are simply
320 the number of .ready files in the pg_xlog/archive_status directory.
321 Usually, these values should be low, turning on the archive mechanism,
322 we usually want it to archive WAL files as fast as possible.
323
324 If the archive command fail, number of WAL in your pg_xlog directory
325 will grow until exhausting all the disk space and force PostgreSQL to
326 stop immediately.
327
328 Example 1: Check that the number of ready WAL files is 10 or less on
329 host "pluto"
330
331 check_postgres_archive_ready --host=pluto --critical=10
332
333 For MRTG output, reports the number of ready WAL files on line 1.
334
335 autovac_freeze
336 ("symlink: check_postgres_autovac_freeze") Checks how close each
337 database is to the Postgres autovacuum_freeze_max_age setting. This
338 action will only work for databases version 8.2 or higher. The
339 --warning and --critical options should be expressed as percentages.
340 The 'age' of the transactions in each database is compared to the
341 autovacuum_freeze_max_age setting (200 million by default) to generate
342 a rounded percentage. The default values are 90% for the warning and
343 95% for the critical. Databases can be filtered by use of the --include
344 and --exclude options. See the "BASIC FILTERING" section for more
345 details.
346
347 Example 1: Give a warning when any databases on port 5432 are above 97%
348
349 check_postgres_autovac_freeze --port=5432 --warning="97%"
350
351 For MRTG output, the highest overall percentage is reported on the
352 first line, and the highest age is reported on the second line. All
353 databases which have the percentage from the first line are reported on
354 the fourth line, separated by a pipe symbol.
355
356 backends
357 ("symlink: check_postgres_backends") Checks the current number of
358 connections for one or more databases, and optionally compares it to
359 the maximum allowed, which is determined by the Postgres configuration
360 variable max_connections. The --warning and --critical options can take
361 one of three forms. First, a simple number can be given, which
362 represents the number of connections at which the alert will be given.
363 This choice does not use the max_connections setting. Second, the
364 percentage of available connections can be given. Third, a negative
365 number can be given which represents the number of connections left
366 until max_connections is reached. The default values for --warning and
367 --critical are '90%' and '95%'. You can also filter the databases by
368 use of the --include and --exclude options. See the "BASIC FILTERING"
369 section for more details.
370
371 To view only non-idle processes, you can use the --noidle argument.
372 Note that the user you are connecting as must be a superuser for this
373 to work properly.
374
375 Example 1: Give a warning when the number of connections on host quirm
376 reaches 120, and a critical if it reaches 150.
377
378 check_postgres_backends --host=quirm --warning=120 --critical=150
379
380 Example 2: Give a critical when we reach 75% of our max_connections
381 setting on hosts lancre or lancre2.
382
383 check_postgres_backends --warning='75%' --critical='75%' --host=lancre,lancre2
384
385 Example 3: Give a warning when there are only 10 more connection slots
386 left on host plasmid, and a critical when we have only 5 left.
387
388 check_postgres_backends --warning=-10 --critical=-5 --host=plasmid
389
390 Example 4: Check all databases except those with "test" in their name,
391 but allow ones that are named "pg_greatest". Connect as port 5432 on
392 the first two hosts, and as port 5433 on the third one. We want to
393 always throw a critical when we reach 30 or more connections.
394
395 check_postgres_backends --dbhost=hong,kong --dbhost=fooey --dbport=5432 --dbport=5433 --warning=30 --critical=30 --exclude="~test" --include="pg_greatest,~prod"
396
397 For MRTG output, the number of connections is reported on the first
398 line, and the fourth line gives the name of the database, plus the
399 current maximum_connections. If more than one database has been
400 queried, the one with the highest number of connections is output.
401
402 bloat
403 ("symlink: check_postgres_bloat") Checks the amount of bloat in tables
404 and indexes. (Bloat is generally the amount of dead unused space taken
405 up in a table or index. This space is usually reclaimed by use of the
406 VACUUM command.) This action requires that stats collection be enabled
407 on the target databases, and requires that ANALYZE is run frequently.
408 The --include and --exclude options can be used to filter out which
409 tables to look at. See the "BASIC FILTERING" section for more details.
410
411 The --warning and --critical options can be specified as sizes,
412 percents, or both. Valid size units are bytes, kilobytes, megabytes,
413 gigabytes, terabytes, exabytes, petabytes, and zettabytes. You can
414 abbreviate all of those with the first letter. Items without units are
415 assumed to be 'bytes'. The default values are '1 GB' and '5 GB'. The
416 value represents the number of "wasted bytes", or the difference
417 between what is actually used by the table and index, and what we
418 compute that it should be.
419
420 Note that this action has two hard-coded values to avoid false alarms
421 on smaller relations. Tables must have at least 10 pages, and indexes
422 at least 15, before they can be considered by this test. If you really
423 want to adjust these values, you can look for the variables $MINPAGES
424 and $MINIPAGES at the top of the "check_bloat" subroutine. These values
425 are ignored if either --exclude or --include is used.
426
427 Only the top 10 most bloated relations are shown. You can change this
428 number by using the --perflimit option to set your own limit.
429
430 The schema named 'information_schema' is excluded from this test, as
431 the only tables it contains are small and do not change.
432
433 Please note that the values computed by this action are not precise,
434 and should be used as a guideline only. Great effort was made to
435 estimate the correct size of a table, but in the end it is only an
436 estimate. The correct index size is even more of a guess than the
437 correct table size, but both should give a rough idea of how bloated
438 things are.
439
440 Example 1: Warn if any table on port 5432 is over 100 MB bloated, and
441 critical if over 200 MB
442
443 check_postgres_bloat --port=5432 --warning='100 M' --critical='200 M'
444
445 Example 2: Give a critical if table 'orders' on host 'sami' has more
446 than 10 megs of bloat
447
448 check_postgres_bloat --host=sami --include=orders --critical='10 MB'
449
450 Example 3: Give a critical if table 'q4' on database 'sales' is over
451 50% bloated
452
453 check_postgres_bloat --db=sales --include=q4 --critical='50%'
454
455 Example 4: Give a critical any table is over 20% bloated and has over
456 150 MB of bloat:
457
458 check_postgres_bloat --port=5432 --critical='20% and 150 M'
459
460 Example 5: Give a critical any table is over 40% bloated or has over
461 500 MB of bloat:
462
463 check_postgres_bloat --port=5432 --warning='500 M or 40%'
464
465 For MRTG output, the first line gives the highest number of wasted
466 bytes for the tables, and the second line gives the highest number of
467 wasted bytes for the indexes. The fourth line gives the database name,
468 table name, and index name information. If you want to output the bloat
469 ratio instead (how many times larger the relation is compared to how
470 large it should be), just pass in "--mrtg=ratio".
471
472 checkpoint
473 ("symlink: check_postgres_checkpoint") Determines how long since the
474 last checkpoint has been run. This must run on the same server as the
475 database that is being checked (e.g. the -h flag will not work). This
476 check is meant to run on a "warm standby" server that is actively
477 processing shipped WAL files, and is meant to check that your warm
478 standby is truly 'warm'. The data directory must be set, either by the
479 environment variable "PGDATA", or passing the "--datadir" argument. It
480 returns the number of seconds since the last checkpoint was run, as
481 determined by parsing the call to "pg_controldata". Because of this,
482 the pg_controldata executable must be available in the current path.
483 Alternatively, you can set the environment variable "PGCONTROLDATA" to
484 the exact location of the pg_controldata executable, or you can specify
485 "PGBINDIR" as the directory that it lives in.
486
487 At least one warning or critical argument must be set.
488
489 This action requires the Date::Parse module.
490
491 For MRTG or simple output, returns the number of seconds.
492
493 commitratio
494 ("symlink: check_postgres_commitratio") Checks the commit ratio of all
495 databases and complains when they are too low. There is no need to run
496 this command more than once per database cluster. Databases can be
497 filtered with the --include and --exclude options. See the "BASIC
498 FILTERING" section for more details. They can also be filtered by the
499 owner of the database with the --includeuser and --excludeuser options.
500 See the "USER NAME FILTERING" section for more details.
501
502 The warning and critical options should be specified as percentages.
503 There are not defaults for this action: the warning and critical must
504 be specified. The warning value cannot be greater than the critical
505 value. The output returns all databases sorted by commitratio, smallest
506 first.
507
508 Example: Warn if any database on host flagg is less than 90% in
509 commitratio, and critical if less then 80%.
510
511 check_postgres_database_commitratio --host=flagg --warning='90%' --critical='80%'
512
513 For MRTG output, returns the percentage of the database with the
514 smallest commitratio on the first line, and the name of the database on
515 the fourth line.
516
517 connection
518 ("symlink: check_postgres_connection") Simply connects, issues a
519 'SELECT version()', and leaves. Takes no --warning or --critical
520 options.
521
522 For MRTG output, simply outputs a 1 (good connection) or a 0 (bad
523 connection) on the first line.
524
525 custom_query
526 ("symlink: check_postgres_custom_query") Runs a custom query of your
527 choosing, and parses the results. The query itself is passed in
528 through the "query" argument, and should be kept as simple as possible.
529 If at all possible, wrap it in a view or a function to keep things
530 easier to manage. The query should return one or two columns. It is
531 required that one of the columns be named "result" and is the item that
532 will be checked against your warning and critical values. The second
533 column is for the performance data and any name can be used: this will
534 be the 'value' inside the performance data section.
535
536 At least one warning or critical argument must be specified. What these
537 are set to depends on the type of query you are running. There are four
538 types of custom_queries that can be run, specified by the "valtype"
539 argument. If none is specified, this action defaults to 'integer'. The
540 four types are:
541
542 integer: Does a simple integer comparison. The first column should be a
543 simple integer, and the warning and critical values should be the same.
544
545 string: The warning and critical are strings, and are triggered only if
546 the value in the first column matches it exactly. This is case-
547 sensitive.
548
549 time: The warning and the critical are times, and can have units of
550 seconds, minutes, hours, or days. Each may be written singular or
551 abbreviated to just the first letter. If no units are given, seconds
552 are assumed. The first column should be an integer representing the
553 number of seconds to check.
554
555 size: The warning and the critical are sizes, and can have units of
556 bytes, kilobytes, megabytes, gigabytes, terabytes, or exabytes. Each
557 may be abbreviated to the first letter. If no units are given, bytes
558 are assumed. The first column should be an integer representing the
559 number of bytes to check.
560
561 Normally, an alert is triggered if the values returned are greater than
562 or equal to the critical or warning value. However, an option of
563 --reverse will trigger the alert if the returned value is lower than or
564 equal to the critical or warning value.
565
566 Example 1: Warn if any relation over 100 pages is named "rad", put the
567 number of pages inside the performance data section.
568
569 check_postgres_custom_query --valtype=string -w "rad" --query=
570 "SELECT relname AS result, relpages AS pages FROM pg_class WHERE relpages > 100"
571
572 Example 2: Give a critical if the "foobar" function returns a number
573 over 5MB:
574
575 check_postgres_custom_query --critical='5MB'--valtype=size --query="SELECT foobar() AS result"
576
577 Example 2: Warn if the function "snazzo" returns less than 42:
578
579 check_postgres_custom_query --critical=42 --query="SELECT snazzo() AS result" --reverse
580
581 If you come up with a useful custom_query, consider sending in a patch
582 to this program to make it into a standard action that other people can
583 use.
584
585 This action does not support MRTG or simple output yet.
586
587 database_size
588 ("symlink: check_postgres_database_size") Checks the size of all
589 databases and complains when they are too big. There is no need to run
590 this command more than once per database cluster. Databases can be
591 filtered with the --include and --exclude options. See the "BASIC
592 FILTERING" section for more details. They can also be filtered by the
593 owner of the database with the --includeuser and --excludeuser options.
594 See the "USER NAME FILTERING" section for more details.
595
596 The warning and critical options can be specified as bytes, kilobytes,
597 megabytes, gigabytes, terabytes, or exabytes. Each may be abbreviated
598 to the first letter as well. If no unit is given, the units are
599 assumed to be bytes. There are not defaults for this action: the
600 warning and critical must be specified. The warning value cannot be
601 greater than the critical value. The output returns all databases
602 sorted by size largest first, showing both raw bytes and a "pretty"
603 version of the size.
604
605 Example 1: Warn if any database on host flagg is over 1 TB in size, and
606 critical if over 1.1 TB.
607
608 check_postgres_database_size --host=flagg --warning='1 TB' --critical='1.1 t'
609
610 Example 2: Give a critical if the database template1 on port 5432 is
611 over 10 MB.
612
613 check_postgres_database_size --port=5432 --include=template1 --warning='10MB' --critical='10MB'
614
615 Example 3: Give a warning if any database on host 'tardis' owned by the
616 user 'tom' is over 5 GB
617
618 check_postgres_database_size --host=tardis --includeuser=tom --warning='5 GB' --critical='10 GB'
619
620 For MRTG output, returns the size in bytes of the largest database on
621 the first line, and the name of the database on the fourth line.
622
623 dbstats
624 ("symlink: check_postgres_dbstats") Reports information from the
625 pg_stat_database view, and outputs it in a Cacti-friendly manner. No
626 other output is supported, as the output is informational and does not
627 lend itself to alerts, such as used with Nagios. If no options are
628 given, all databases are returned, one per line. You can include a
629 specific database by use of the "--include" option, or you can use the
630 "--dbname" option.
631
632 Eleven items are returned on each line, in the format name:value,
633 separated by a single space. The items are:
634
635 backends
636 The number of currently running backends for this database.
637
638 commits
639 The total number of commits for this database since it was created
640 or reset.
641
642 rollbacks
643 The total number of rollbacks for this database since it was
644 created or reset.
645
646 read
647 The total number of disk blocks read.
648
649 hit The total number of buffer hits.
650
651 ret The total number of rows returned.
652
653 fetch
654 The total number of rows fetched.
655
656 ins The total number of rows inserted.
657
658 upd The total number of rows updated.
659
660 del The total number of rows deleted.
661
662 dbname
663 The name of the database.
664
665 Note that ret, fetch, ins, upd, and del items will always be 0 if
666 Postgres is version 8.2 or lower, as those stats were not available in
667 those versions.
668
669 If the dbname argument is given, seven additional items are returned:
670
671 idxscan
672 Total number of user index scans.
673
674 idxtupread
675 Total number of user index entries returned.
676
677 idxtupfetch
678 Total number of rows fetched by simple user index scans.
679
680 idxblksread
681 Total number of disk blocks read for all user indexes.
682
683 idxblkshit
684 Total number of buffer hits for all user indexes.
685
686 seqscan
687 Total number of sequential scans against all user tables.
688
689 seqtupread
690 Total number of tuples returned from all user tables.
691
692 Example 1: Grab the stats for a database named "products" on host
693 "willow":
694
695 check_postgres_dbstats --dbhost willow --dbname products
696
697 The output returned will be like this (all on one line, not wrapped):
698
699 backends:82 commits:58374408 rollbacks:1651 read:268435543 hit:2920381758 idxscan:310931294 idxtupread:2777040927
700 idxtupfetch:1840241349 idxblksread:62860110 idxblkshit:1107812216 seqscan:5085305 seqtupread:5370500520
701 ret:0 fetch:0 ins:0 upd:0 del:0 dbname:willow
702
703 disabled_triggers
704 ("symlink: check_postgres_disabled_triggers") Checks on the number of
705 disabled triggers inside the database. The --warning and --critical
706 options are the number of such triggers found, and both default to "1",
707 as in normal usage having disabled triggers is a dangerous event. If
708 the database being checked is 8.3 or higher, the check is for the
709 number of triggers that are in a 'disabled' status (as opposed to being
710 'always' or 'replica'). The output will show the name of the table and
711 the name of the trigger for each disabled trigger.
712
713 Example 1: Make sure that there are no disabled triggers
714
715 check_postgres_disabled_triggers
716
717 For MRTG output, returns the number of disabled triggers on the first
718 line.
719
720 disk_space
721 ("symlink: check_postgres_disk_space") Checks on the available physical
722 disk space used by Postgres. This action requires that you have the
723 executable "/bin/df" available to report on disk sizes, and it also
724 needs to be run as a superuser, so it can examine the data_directory
725 setting inside of Postgres. The --warning and --critical options are
726 given in either sizes or percentages or both. If using sizes, the
727 standard unit types are allowed: bytes, kilobytes, gigabytes,
728 megabytes, gigabytes, terabytes, or exabytes. Each may be abbreviated
729 to the first letter only; no units at all indicates 'bytes'. The
730 default values are '90%' and '95%'.
731
732 This command checks the following things to determine all of the
733 different physical disks being used by Postgres.
734
735 data_directory - The disk that the main data directory is on.
736
737 log directory - The disk that the log files are on.
738
739 WAL file directory - The disk that the write-ahead logs are on (e.g.
740 symlinked pg_xlog)
741
742 tablespaces - Each tablespace that is on a separate disk.
743
744 The output shows the total size used and available on each disk, as
745 well as the percentage, ordered by highest to lowest percentage used.
746 Each item above maps to a file system: these can be included or
747 excluded. See the "BASIC FILTERING" section for more details.
748
749 Example 1: Make sure that no file system is over 90% for the database
750 on port 5432.
751
752 check_postgres_disk_space --port=5432 --warning='90%' --critical="90%'
753
754 Example 2: Check that all file systems starting with /dev/sda are
755 smaller than 10 GB and 11 GB (warning and critical)
756
757 check_postgres_disk_space --port=5432 --warning='10 GB' --critical='11 GB' --include="~^/dev/sda"
758
759 Example 4: Make sure that no file system is both over 50% and has over
760 15 GB
761
762 check_postgres_disk_space --critical='50% and 15 GB'
763
764 Example 5: Issue a warning if any file system is either over 70% full
765 or has more than 1T
766
767 check_postgres_disk_space --warning='1T or 75'
768
769 For MRTG output, returns the size in bytes of the file system on the
770 first line, and the name of the file system on the fourth line.
771
772 fsm_pages
773 ("symlink: check_postgres_fsm_pages") Checks how close a cluster is to
774 the Postgres max_fsm_pages setting. This action will only work for
775 databases of 8.2 or higher, and it requires the contrib module
776 pg_freespacemap be installed. The --warning and --critical options
777 should be expressed as percentages. The number of used pages in the
778 free-space-map is determined by looking in the
779 pg_freespacemap_relations view, and running a formula based on the
780 formula used for outputting free-space-map pageslots in the vacuum
781 verbose command. The default values are 85% for the warning and 95% for
782 the critical.
783
784 Example 1: Give a warning when our cluster has used up 76% of the free-
785 space pageslots, with pg_freespacemap installed in database robert
786
787 check_postgres_fsm_pages --dbname=robert --warning="76%"
788
789 While you need to pass in the name of the database where
790 pg_freespacemap is installed, you only need to run this check once per
791 cluster. Also, checking this information does require obtaining special
792 locks on the free-space-map, so it is recommend you do not run this
793 check with short intervals.
794
795 For MRTG output, returns the percent of free-space-map on the first
796 line, and the number of pages currently used on the second line.
797
798 fsm_relations
799 ("symlink: check_postgres_fsm_relations") Checks how close a cluster is
800 to the Postgres max_fsm_relations setting. This action will only work
801 for databases of 8.2 or higher, and it requires the contrib module
802 pg_freespacemap be installed. The --warning and --critical options
803 should be expressed as percentages. The number of used relations in the
804 free-space-map is determined by looking in the
805 pg_freespacemap_relations view. The default values are 85% for the
806 warning and 95% for the critical.
807
808 Example 1: Give a warning when our cluster has used up 80% of the free-
809 space relations, with pg_freespacemap installed in database dylan
810
811 check_postgres_fsm_relations --dbname=dylan --warning="75%"
812
813 While you need to pass in the name of the database where
814 pg_freespacemap is installed, you only need to run this check once per
815 cluster. Also, checking this information does require obtaining special
816 locks on the free-space-map, so it is recommend you do not run this
817 check with short intervals.
818
819 For MRTG output, returns the percent of free-space-map on the first
820 line, the number of relations currently used on the second line.
821
822 hitratio
823 ("symlink: check_postgres_database_hitratio") Checks the hit ratio of
824 all databases and complains when they are too low. There is no need to
825 run this command more than once per database cluster. Databases can be
826 filtered with the --include and --exclude options. See the "BASIC
827 FILTERING" section for more details. They can also be filtered by the
828 owner of the database with the --includeuser and --excludeuser options.
829 See the "USER NAME FILTERING" section for more details.
830
831 The warning and critical options should be specified as percentages.
832 There are not defaults for this action: the warning and critical must
833 be specified. The warning value cannot be greater than the critical
834 value. The output returns all databases sorted by hitratio, smallest
835 first.
836
837 Example: Warn if any database on host flagg is less than 90% in
838 hitratio, and critical if less then 80%.
839
840 check_postgres_database_hitratio --host=flagg --warning='90%' --critical='80%'
841
842 For MRTG output, returns the percentage of the database with the
843 smallest hitratio on the first line, and the name of the database on
844 the fourth line.
845
846 hot_standby_delay
847 ("symlink: check_hot_standby_delay") Checks the streaming replication
848 lag by computing the delta between the xlog position of a master server
849 and the one of the slaves connected to it. The slave_ server must be in
850 hot_standby (e.g. read only) mode, therefore the minimum version to use
851 this_ action is Postgres 9.0. The --warning and --critical options are
852 the delta between xlog location. These values should match the volume
853 of transactions needed to have the streaming replication disconnect
854 from the master because of too much lag.
855
856 You must provide information on how to reach the second database by a
857 connection parameter ending in the number 2, such as "--dbport2=5543".
858 If if it not given, the action fails.
859
860 index_size
861 table_size
862 relation_size
863 (symlinks: "check_postgres_index_size", "check_postgres_table_size",
864 and "check_postgres_relation_size") The actions table_size and
865 index_size are simply variations of the relation_size action, which
866 checks for a relation that has grown too big. Relations (in other
867 words, tables and indexes) can be filtered with the --include and
868 --exclude options. See the "BASIC FILTERING" section for more details.
869 Relations can also be filtered by the user that owns them, by using the
870 --includeuser and --excludeuser options. See the "USER NAME FILTERING"
871 section for more details.
872
873 The values for the --warning and --critical options are file sizes, and
874 may have units of bytes, kilobytes, megabytes, gigabytes, terabytes, or
875 exabytes. Each can be abbreviated to the first letter. If no units are
876 given, bytes are assumed. There are no default values: both the warning
877 and the critical option must be given. The return text shows the size
878 of the largest relation found.
879
880 If the --showperf option is enabled, all of the relations with their
881 sizes will be given. To prevent this, it is recommended that you set
882 the --perflimit option, which will cause the query to do a "ORDER BY
883 size DESC LIMIT (perflimit)".
884
885 Example 1: Give a critical if any table is larger than 600MB on host
886 burrick.
887
888 check_postgres_table_size --critical='600 MB' --warning='600 MB' --host=burrick
889
890 Example 2: Warn if the table products is over 4 GB in size, and give a
891 critical at 4.5 GB.
892
893 check_postgres_table_size --host=burrick --warning='4 GB' --critical='4.5 GB' --include=products
894
895 Example 3: Warn if any index not owned by postgres goes over 500 MB.
896
897 check_postgres_index_size --port=5432 --excludeuser=postgres -w 500MB -c 600MB
898
899 For MRTG output, returns the size in bytes of the largest relation, and
900 the name of the database and relation as the fourth line.
901
902 last_analyze
903 last_vacuum
904 last_autoanalyze
905 last_autovacuum
906 (symlinks: "check_postgres_last_analyze", "check_postgres_last_vacuum",
907 "check_postgres_last_autoanalyze", and
908 "check_postgres_last_autovacuum") Checks how long it has been since
909 vacuum (or analyze) was last run on each table in one or more
910 databases. Use of these actions requires that the target database is
911 version 8.3 or greater, or that the version is 8.2 and the
912 configuration variable stats_row_level has been enabled. Tables can be
913 filtered with the --include and --exclude options. See the "BASIC
914 FILTERING" section for more details. Tables can also be filtered by
915 their owner by use of the --includeuser and --excludeuser options. See
916 the "USER NAME FILTERING" section for more details.
917
918 The units for --warning and --critical are specified as times. Valid
919 units are seconds, minutes, hours, and days; all can be abbreviated to
920 the first letter. If no units are given, 'seconds' are assumed. The
921 default values are '1 day' and '2 days'. Please note that there are
922 cases in which this field does not get automatically populated. If
923 certain tables are giving you problems, make sure that they have dead
924 rows to vacuum, or just exclude them from the test.
925
926 The schema named 'information_schema' is excluded from this test, as
927 the only tables it contains are small and do not change.
928
929 Note that the non-'auto' versions will also check on the auto versions
930 as well. In other words, using last_vacuum will report on the last
931 vacuum, whether it was a normal vacuum, or one run by the autovacuum
932 daemon.
933
934 Example 1: Warn if any table has not been vacuumed in 3 days, and give
935 a critical at a week, for host wormwood
936
937 check_postgres_last_vacuum --host=wormwood --warning='3d' --critical='7d'
938
939 Example 2: Same as above, but skip tables belonging to the users 'eve'
940 or 'mallory'
941
942 check_postgres_last_vacuum --host=wormwood --warning='3d' --critical='7d' --excludeusers=eve,mallory
943
944 For MRTG output, returns (on the first line) the LEAST amount of time
945 in seconds since a table was last vacuumed or analyzed. The fourth line
946 returns the name of the database and name of the table.
947
948 listener
949 ("symlink: check_postgres_listener") Confirm that someone is listening
950 for one or more specific strings (using the LISTEN/NOTIFY system), by
951 looking at the pg_listener table. Only one of warning or critical is
952 needed. The format is a simple string representing the LISTEN target,
953 or a tilde character followed by a string for a regular expression
954 check. Note that this check will not work on versions of Postgres 9.0
955 or higher.
956
957 Example 1: Give a warning if nobody is listening for the string
958 bucardo_mcp_ping on ports 5555 and 5556
959
960 check_postgres_listener --port=5555,5556 --warning=bucardo_mcp_ping
961
962 Example 2: Give a critical if there are no active LISTEN requests
963 matching 'grimm' on database oskar
964
965 check_postgres_listener --db oskar --critical=~grimm
966
967 For MRTG output, returns a 1 or a 0 on the first, indicating success or
968 failure. The name of the notice must be provided via the --mrtg option.
969
970 locks
971 ("symlink: check_postgres_locks") Check the total number of locks on
972 one or more databases. There is no need to run this more than once per
973 database cluster. Databases can be filtered with the --include and
974 --exclude options. See the "BASIC FILTERING" section for more details.
975
976 The --warning and --critical options can be specified as simple
977 numbers, which represent the total number of locks, or they can be
978 broken down by type of lock. Valid lock names are 'total', 'waiting',
979 or the name of a lock type used by Postgres. These names are case-
980 insensitive and do not need the "lock" part on the end, so exclusive
981 will match 'ExclusiveLock'. The format is name=number, with different
982 items separated by semicolons.
983
984 Example 1: Warn if the number of locks is 100 or more, and critical if
985 200 or more, on host garrett
986
987 check_postgres_locks --host=garrett --warning=100 --critical=200
988
989 Example 2: On the host artemus, warn if 200 or more locks exist, and
990 give a critical if over 250 total locks exist, or if over 20 exclusive
991 locks exist, or if over 5 connections are waiting for a lock.
992
993 check_postgres_locks --host=artemus --warning=200 --critical="total=250;waiting=5;exclusive=20"
994
995 For MRTG output, returns the number of locks on the first line, and the
996 name of the database on the fourth line.
997
998 logfile
999 ("symlink: check_postgres_logfile") Ensures that the logfile is in the
1000 expected location and is being logged to. This action issues a command
1001 that throws an error on each database it is checking, and ensures that
1002 the message shows up in the logs. It scans the various log_* settings
1003 inside of Postgres to figure out where the logs should be. If you are
1004 using syslog, it does a rough (but not foolproof) scan of
1005 /etc/syslog.conf. Alternatively, you can provide the name of the
1006 logfile with the --logfile option. This is especially useful if the
1007 logs have a custom rotation scheme driven be an external program. The
1008 --logfile option supports the following escape characters: "%Y %m %d
1009 %H", which represent the current year, month, date, and hour
1010 respectively. An error is always reported as critical unless the
1011 warning option has been passed in as a non-zero value. Other than that
1012 specific usage, the "--warning" and "--critical" options should not be
1013 used.
1014
1015 Example 1: On port 5432, ensure the logfile is being written to the
1016 file /home/greg/pg8.2.log
1017
1018 check_postgres_logfile --port=5432 --logfile=/home/greg/pg8.2.log
1019
1020 Example 2: Same as above, but raise a warning, not a critical
1021
1022 check_postgres_logfile --port=5432 --logfile=/home/greg/pg8.2.log -w 1
1023
1024 For MRTG output, returns a 1 or 0 on the first line, indicating success
1025 or failure. In case of a failure, the fourth line will provide more
1026 detail on the failure encountered.
1027
1028 new_version_bc
1029 ("symlink: check_postgres_new_version_bc") Checks if a newer version of
1030 the Bucardo program is available. The current version is obtained by
1031 running "bucardo_ctl --version". If a major upgrade is available, a
1032 warning is returned. If a revision upgrade is available, a critical is
1033 returned. (Bucardo is a master to slave, and master to master
1034 replication system for Postgres: see http://bucardo.org for more
1035 information). See also the information on the "--get_method" option.
1036
1037 new_version_box
1038 ("symlink: check_postgres_new_version_box") Checks if a newer version
1039 of the boxinfo program is available. The current version is obtained by
1040 running "boxinfo.pl --version". If a major upgrade is available, a
1041 warning is returned. If a revision upgrade is available, a critical is
1042 returned. (boxinfo is a program for grabbing important information from
1043 a server and putting it into a HTML format: see
1044 http://bucardo.org/wiki/boxinfo for more information). See also the
1045 information on the "--get_method" option.
1046
1047 new_version_cp
1048 ("symlink: check_postgres_new_version_cp") Checks if a newer version of
1049 this program (check_postgres.pl) is available, by grabbing the version
1050 from a small text file on the main page of the home page for the
1051 project. Returns a warning if the returned version does not match the
1052 one you are running. Recommended interval to check is once a day. See
1053 also the information on the "--get_method" option.
1054
1055 new_version_pg
1056 ("symlink: check_postgres_new_version_pg") Checks if a newer revision
1057 of Postgres exists for each database connected to. Note that this only
1058 checks for revision, e.g. going from 8.3.6 to 8.3.7. Revisions are
1059 always 100% binary compatible and involve no dump and restore to
1060 upgrade. Revisions are made to address bugs, so upgrading as soon as
1061 possible is always recommended. Returns a warning if you do not have
1062 the latest revision. It is recommended this check is run at least once
1063 a day. See also the information on the "--get_method" option.
1064
1065 new_version_tnm
1066 ("symlink: check_postgres_new_version_tnm") Checks if a newer version
1067 of the tail_n_mail program is available. The current version is
1068 obtained by running "tail_n_mail --version". If a major upgrade is
1069 available, a warning is returned. If a revision upgrade is available, a
1070 critical is returned. (tail_n_mail is a log monitoring tool that can
1071 send mail when interesting events appear in your Postgres logs. See:
1072 http://bucardo.org/wiki/Tail_n_mail for more information). See also
1073 the information on the "--get_method" option.
1074
1075 pgb_pool_cl_active
1076 pgb_pool_cl_waiting
1077 pgb_pool_sv_active
1078 pgb_pool_sv_idle
1079 pgb_pool_sv_used
1080 pgb_pool_sv_tested
1081 pgb_pool_sv_login
1082 pgb_pool_maxwait
1083 (symlinks: "check_postgres_pgb_pool_cl_active",
1084 "check_postgres_pgb_pool_cl_waiting",
1085 "check_postgres_pgb_pool_sv_active", "check_postgres_pgb_pool_sv_idle",
1086 "check_postgres_pgb_pool_sv_used", "check_postgres_pgb_pool_sv_tested",
1087 "check_postgres_pgb_pool_sv_login", and
1088 "check_postgres_pgb_pool_maxwait")
1089
1090 Examines pgbouncer's pool statistics. Each pool has a set of "client"
1091 connections, referring to connections from external clients, and
1092 "server" connections, referring to connections to PostgreSQL itself.
1093 The related check_postgres actions are prefixed by "cl_" and "sv_",
1094 respectively. Active client connections are those connections currently
1095 linked with an active server connection. Client connections may also be
1096 "waiting", meaning they have not yet been allocated a server
1097 connection. Server connections are "active" (linked to a client),
1098 "idle" (standing by for a client connection to link with), "used" (just
1099 unlinked from a client, and not yet returned to the idle pool),
1100 "tested" (currently being tested) and "login" (in the process of
1101 logging in). The maxwait value shows how long in seconds the oldest
1102 waiting client connection has been waiting.
1103
1104 pgbouncer_backends
1105 ("symlink: check_postgres_pgbouncer_backends") Checks the current
1106 number of connections for one or more databases through pgbouncer, and
1107 optionally compares it to the maximum allowed, which is determined by
1108 the pgbouncer configuration variable max_client_conn. The --warning and
1109 --critical options can take one of three forms. First, a simple number
1110 can be given, which represents the number of connections at which the
1111 alert will be given. This choice does not use the max_connections
1112 setting. Second, the percentage of available connections can be given.
1113 Third, a negative number can be given which represents the number of
1114 connections left until max_connections is reached. The default values
1115 for --warning and --critical are '90%' and '95%'. You can also filter
1116 the databases by use of the --include and --exclude options. See the
1117 "BASIC FILTERING" section for more details.
1118
1119 To view only non-idle processes, you can use the --noidle argument.
1120 Note that the user you are connecting as must be a superuser for this
1121 to work properly.
1122
1123 Example 1: Give a warning when the number of connections on host quirm
1124 reaches 120, and a critical if it reaches 150.
1125
1126 check_postgres_pgbouncer_backends --host=quirm --warning=120 --critical=150 -p 6432 -u pgbouncer
1127
1128 Example 2: Give a critical when we reach 75% of our max_connections
1129 setting on hosts lancre or lancre2.
1130
1131 check_postgres_pgbouncer_backends --warning='75%' --critical='75%' --host=lancre,lancre2 -p 6432 -u pgbouncer
1132
1133 Example 3: Give a warning when there are only 10 more connection slots
1134 left on host plasmid, and a critical when we have only 5 left.
1135
1136 check_postgres_pgbouncer_backends --warning=-10 --critical=-5 --host=plasmid -p 6432 -u pgbouncer
1137
1138 For MRTG output, the number of connections is reported on the first
1139 line, and the fourth line gives the name of the database, plus the
1140 current max_client_conn. If more than one database has been queried,
1141 the one with the highest number of connections is output.
1142
1143 pgbouncer_checksum
1144 ("symlink: check_postgres_pgbouncer_checksum") Checks that all the
1145 pgBouncer settings are the same as last time you checked. This is done
1146 by generating a checksum of a sorted list of setting names and their
1147 values. Note that you shouldn't specify the database name, it will
1148 automatically default to pgbouncer. Either the --warning or the
1149 --critical option should be given, but not both. The value of each one
1150 is the checksum, a 32-character hexadecimal value. You can run with the
1151 special "--critical=0" option to find out an existing checksum.
1152
1153 This action requires the Digest::MD5 module.
1154
1155 Example 1: Find the initial checksum for pgbouncer configuration on
1156 port 6432 using the default user (usually postgres)
1157
1158 check_postgres_pgbouncer_checksum --port=6432 --critical=0
1159
1160 Example 2: Make sure no settings have changed and warn if so, using the
1161 checksum from above.
1162
1163 check_postgres_pgbouncer_checksum --port=6432 --warning=cd2f3b5e129dc2b4f5c0f6d8d2e64231
1164
1165 For MRTG output, returns a 1 or 0 indicating success of failure of the
1166 checksum to match. A checksum must be provided as the "--mrtg"
1167 argument. The fourth line always gives the current checksum.
1168
1169 prepared_txns
1170 ("symlink: check_postgres_prepared_txns") Check on the age of any
1171 existing prepared transactions. Note that most people will NOT use
1172 prepared transactions, as they are part of two-part commit and
1173 complicated to maintain. They should also not be confused with prepared
1174 STATEMENTS, which is what most people think of when they hear prepare.
1175 The default value for a warning is 1 second, to detect any use of
1176 prepared transactions, which is probably a mistake on most systems.
1177 Warning and critical are the number of seconds a prepared transaction
1178 has been open before an alert is given.
1179
1180 Example 1: Give a warning on detecting any prepared transactions:
1181
1182 check_postgres_prepared_txns -w 0
1183
1184 Example 2: Give a critical if any prepared transaction has been open
1185 longer than 10 seconds, but allow up to 360 seconds for the database
1186 'shrike':
1187
1188 check_postgres_prepared_txns --critical=10 --exclude=shrike
1189 check_postgres_prepared_txns --critical=360 --include=shrike
1190
1191 For MRTG output, returns the number of seconds the oldest transaction
1192 has been open as the first line, and which database is came from as the
1193 final line.
1194
1195 query_runtime
1196 ("symlink: check_postgres_query_runtime") Checks how long a specific
1197 query takes to run, by executing a "EXPLAIN ANALYZE" against it. The
1198 --warning and --critical options are the maximum amount of time the
1199 query should take. Valid units are seconds, minutes, and hours; any can
1200 be abbreviated to the first letter. If no units are given, 'seconds'
1201 are assumed. Both the warning and the critical option must be given.
1202 The name of the view or function to be run must be passed in to the
1203 --queryname option. It must consist of a single word (or schema.word),
1204 with optional parens at the end.
1205
1206 Example 1: Give a critical if the function named "speedtest" fails to
1207 run in 10 seconds or less.
1208
1209 check_postgres_query_runtime --queryname='speedtest()' --critical=10 --warning=10
1210
1211 For MRTG output, reports the time in seconds for the query to complete
1212 on the first line. The fourth line lists the database.
1213
1214 query_time
1215 ("symlink: check_postgres_query_time") Checks the length of running
1216 queries on one or more databases. There is no need to run this more
1217 than once on the same database cluster. Note that this already excludes
1218 queries that are "idle in transaction". Databases can be filtered by
1219 using the --include and --exclude options. See the "BASIC FILTERING"
1220 section for more details. You can also filter on the user running the
1221 query with the --includeuser and --excludeuser options. See the "USER
1222 NAME FILTERING" section for more details.
1223
1224 The values for the --warning and --critical options are amounts of
1225 time, and default to '2 minutes' and '5 minutes' respectively. Valid
1226 units are 'seconds', 'minutes', 'hours', or 'days'. Each may be written
1227 singular or abbreviated to just the first letter. If no units are
1228 given, the unit is assumed to be seconds.
1229
1230 This action requires Postgres 8.3 or better.
1231
1232 Example 1: Give a warning if any query has been running longer than 3
1233 minutes, and a critical if longer than 5 minutes.
1234
1235 check_postgres_query_time --port=5432 --warning='3 minutes' --critical='5 minutes'
1236
1237 Example 2: Using default values (2 and 5 minutes), check all databases
1238 except those starting with 'template'.
1239
1240 check_postgres_query_time --port=5432 --exclude=~^template
1241
1242 Example 3: Warn if user 'don' has a query running over 20 seconds
1243
1244 check_postgres_query_time --port=5432 --includeuser=don --warning=20s
1245
1246 For MRTG output, returns the length in seconds of the longest running
1247 query on the first line. The fourth line gives the name of the
1248 database.
1249
1250 replicate_row
1251 ("symlink: check_postgres_replicate_row") Checks that master-slave
1252 replication is working to one or more slaves. The slaves are specified
1253 the same as the normal databases, except with the number 2 at the end
1254 of them, so "--port2" instead of "--port", etc. The values or the
1255 --warning and --critical options are units of time, and at least one
1256 must be provided (no defaults). Valid units are 'seconds', 'minutes',
1257 'hours', or 'days'. Each may be written singular or abbreviated to just
1258 the first letter. If no units are given, the units are assumed to be
1259 seconds.
1260
1261 This check updates a single row on the master, and then measures how
1262 long it takes to be applied to the slaves. To do this, you need to pick
1263 a table that is being replicated, then find a row that can be changed,
1264 and is not going to be changed by any other process. A specific column
1265 of this row will be changed from one value to another. All of this is
1266 fed to the "repinfo" option, and should contain the following options,
1267 separated by commas: table name, primary key, key id, column, first
1268 value, second value.
1269
1270 Example 1: Slony is replicating a table named 'orders' from host
1271 'alpha' to host 'beta', in the database 'sales'. The primary key of the
1272 table is named id, and we are going to test the row with an id of 3
1273 (which is historical and never changed). There is a column named
1274 'salesrep' that we are going to toggle from a value of 'slon' to 'nols'
1275 to check on the replication. We want to throw a warning if the
1276 replication does not happen within 10 seconds.
1277
1278 check_postgres_replicate_row --host=alpha --dbname=sales --host2=beta
1279 --dbname2=sales --warning=10 --repinfo=orders,id,3,salesrep,slon,nols
1280
1281 Example 2: Bucardo is replicating a table named 'receipt' from host
1282 'green' to hosts 'red', 'blue', and 'yellow'. The database for both
1283 sides is 'public'. The slave databases are running on port 5455. The
1284 primary key is named 'receipt_id', the row we want to use has a value
1285 of 9, and the column we want to change for the test is called 'zone'.
1286 We'll toggle between 'north' and 'south' for the value of this column,
1287 and throw a critical if the change is not on all three slaves within 5
1288 seconds.
1289
1290 check_postgres_replicate_row --host=green --port2=5455 --host2=red,blue,yellow
1291 --critical=5 --repinfo=receipt,receipt_id,9,zone,north,south
1292
1293 For MRTG output, returns on the first line the time in seconds the
1294 replication takes to finish. The maximum time is set to 4 minutes 30
1295 seconds: if no replication has taken place in that long a time, an
1296 error is thrown.
1297
1298 same_schema
1299 ("symlink: check_postgres_same_schema") Verifies that two or more
1300 databases are identical as far as their schema (but not the data
1301 within). This is particularly handy for making sure your slaves have
1302 not been modified or corrupted in any way when using master to slave
1303 replication. Unlike most other actions, this has no warning or critical
1304 criteria - the databases are either in sync, or are not. If they are
1305 different, a detailed list of the differences is presented.
1306
1307 You may want to exclude or filter out certain differences. The way to
1308 do this is to add strings to the "--filter" option. To exclude a type
1309 of object, use "noname", where 'name' is the type of object, for
1310 example, "noschema". To exclude objects of a certain type by a regular
1311 expression against their name, use "noname=regex". See the examples
1312 below for a better understanding.
1313
1314 The types of objects that can be filtered include:
1315
1316 user
1317 schema
1318 table
1319 view
1320 index
1321 sequence
1322 constraint
1323 trigger
1324 function
1325
1326 The filter option "noposition" prevents verification of the position
1327 of columns within a table.
1328
1329 The filter option "nofuncbody" prevents comparison of the bodies of all
1330 functions.
1331
1332 The filter option "noperm" prevents comparison of object permissions.
1333
1334 To provide the second database, just append the differences to the
1335 first one by a call to the appropriate connection argument. For
1336 example, to compare databases on hosts alpha and bravo, use
1337 "--dbhost=alpha,bravo". Also see the examples below.
1338
1339 If only a single host is given, it is assumed we are doing a "time-
1340 based" report. The first time this is run a snapshot of all the items
1341 in the database is saved to a local file. When you run it again, that
1342 snapshot is read in and becomes "database #2" and is compared to the
1343 current database.
1344
1345 To replace the old stored file with the new version, use the --replace
1346 argument.
1347
1348 To enable snapshots at various points in time, you can use the
1349 "--suffix" argument to make the filenames unique to each run. See the
1350 examples below.
1351
1352 Example 1: Verify that two databases on hosts star and line are the
1353 same:
1354
1355 check_postgres_same_schema --dbhost=star,line
1356
1357 Example 2: Same as before, but exclude any triggers with "slony" in
1358 their name
1359
1360 check_postgres_same_schema --dbhost=star,line --filter="notrigger=slony"
1361
1362 Example 3: Same as before, but also exclude all indexes
1363
1364 check_postgres_same_schema --dbhost=star,line --filter="notrigger=slony noindexes"
1365
1366 Example 4: Check differences for the database "battlestar" on different
1367 ports
1368
1369 check_postgres_same_schema --dbname=battlestar --dbport=5432,5544
1370
1371 Example 5: Create a daily and weekly snapshot file
1372
1373 check_postgres_same_schema --dbname=cylon --suffix=daily
1374 check_postgres_same_schema --dbname=cylon --suffix=weekly
1375
1376 Example 6: Run a historical comparison, then replace the file
1377
1378 check_postgres_same_schema --dbname=cylon --suffix=daily --replace
1379
1380 sequence
1381 ("symlink: check_postgres_sequence") Checks how much room is left on
1382 all sequences in the database. This is measured as the percent of
1383 total possible values that have been used for each sequence. The
1384 --warning and --critical options should be expressed as percentages.
1385 The default values are 85% for the warning and 95% for the critical.
1386 You may use --include and --exclude to control which sequences are to
1387 be checked. Note that this check does account for unusual minvalue and
1388 increment by values, but does not care if the sequence is set to cycle
1389 or not.
1390
1391 The output for Nagios gives the name of the sequence, the percentage
1392 used, and the number of 'calls' left, indicating how many more times
1393 nextval can be called on that sequence before running into the maximum
1394 value.
1395
1396 The output for MRTG returns the highest percentage across all sequences
1397 on the first line, and the name of each sequence with that percentage
1398 on the fourth line, separated by a "|" (pipe) if there are more than
1399 one sequence at that percentage.
1400
1401 Example 1: Give a warning if any sequences are approaching 95% full.
1402
1403 check_postgres_sequence --dbport=5432 --warning=95%
1404
1405 Example 2: Check that the sequence named "orders_id_seq" is not more
1406 than half full.
1407
1408 check_postgres_sequence --dbport=5432 --critical=50% --include=orders_id_seq
1409
1410 settings_checksum
1411 ("symlink: check_postgres_settings_checksum") Checks that all the
1412 Postgres settings are the same as last time you checked. This is done
1413 by generating a checksum of a sorted list of setting names and their
1414 values. Note that different users in the same database may have
1415 different checksums, due to ALTER USER usage, and due to the fact that
1416 superusers see more settings than ordinary users. Either the --warning
1417 or the --critical option should be given, but not both. The value of
1418 each one is the checksum, a 32-character hexadecimal value. You can run
1419 with the special "--critical=0" option to find out an existing
1420 checksum.
1421
1422 This action requires the Digest::MD5 module.
1423
1424 Example 1: Find the initial checksum for the database on port 5555
1425 using the default user (usually postgres)
1426
1427 check_postgres_settings_checksum --port=5555 --critical=0
1428
1429 Example 2: Make sure no settings have changed and warn if so, using the
1430 checksum from above.
1431
1432 check_postgres_settings_checksum --port=5555 --warning=cd2f3b5e129dc2b4f5c0f6d8d2e64231
1433
1434 For MRTG output, returns a 1 or 0 indicating success of failure of the
1435 checksum to match. A checksum must be provided as the "--mrtg"
1436 argument. The fourth line always gives the current checksum.
1437
1438 slony_status
1439 ("symlink: check_postgres_slony_status") Checks in the status of a
1440 Slony cluster by looking at the results of Slony's sl_status view. This
1441 is returned as the number of seconds of "lag time". The --warning and
1442 --critical options should be expressed as times. The default values are
1443 60 seconds for the warning and 300 seconds for the critical.
1444
1445 The optional argument --schema indicated the schema that Slony is
1446 installed under. If it is not given, the schema will be determined
1447 automatically each time this check is run.
1448
1449 Example 1: Give a warning if any Slony is lagged by more than 20
1450 seconds
1451
1452 check_postgres_slony_status --warning 20
1453
1454 Example 2: Give a critical if Slony, installed under the schema
1455 "_slony", is over 10 minutes lagged
1456
1457 check_postgres_slony_status --schema=_slony --critical=600
1458
1459 timesync
1460 ("symlink: check_postgres_timesync") Compares the local system time
1461 with the time reported by one or more databases. The --warning and
1462 --critical options represent the number of seconds between the two
1463 systems before an alert is given. If neither is specified, the default
1464 values are used, which are '2' and '5'. The warning value cannot be
1465 greater than the critical value. Due to the non-exact nature of this
1466 test, values of '0' or '1' are not recommended.
1467
1468 The string returned shows the time difference as well as the time on
1469 each side written out.
1470
1471 Example 1: Check that databases on hosts ankh, morpork, and klatch are
1472 no more than 3 seconds off from the local time:
1473
1474 check_postgres_timesync --host=ankh,morpork,klatch --critical=3
1475
1476 For MRTG output, returns one the first line the number of seconds
1477 difference between the local time and the database time. The fourth
1478 line returns the name of the database.
1479
1480 txn_idle
1481 ("symlink: check_postgres_txn_idle") Checks the number and duration of
1482 "idle in transaction" queries on one or more databases. There is no
1483 need to run this more than once on the same database cluster. Databases
1484 can be filtered by using the --include and --exclude options. See the
1485 "BASIC FILTERING" section below for more details.
1486
1487 The --warning and --critical options are given as units of time, signed
1488 integers, or integers for units of time, and both must be provided
1489 (there are no defaults). Valid units are 'seconds', 'minutes', 'hours',
1490 or 'days'. Each may be written singular or abbreviated to just the
1491 first letter. If no units are given and the numbers are unsigned, the
1492 units are assumed to be seconds.
1493
1494 This action requires Postgres 8.3 or better.
1495
1496 Example 1: Give a warning if any connection has been idle in
1497 transaction for more than 15 seconds:
1498
1499 check_postgres_txn_idle --port=5432 --warning='15 seconds'
1500
1501 Example 2: Give a warning if there are 50 or more transactions
1502
1503 check_postgres_txn_idle --port=5432 --warning='+50'
1504
1505 Example 3: Give a critical if 5 or more connections have been idle in
1506 transaction for more than 10 seconds:
1507
1508 check_postgres_txn_idle --port=5432 --critical='5 for 10 seconds'
1509
1510 For MRTG output, returns the time in seconds the longest idle
1511 transaction has been running. The fourth line returns the name of the
1512 database and other information about the longest transaction.
1513
1514 txn_time
1515 ("symlink: check_postgres_txn_time") Checks the length of open
1516 transactions on one or more databases. There is no need to run this
1517 command more than once per database cluster. Databases can be filtered
1518 by use of the --include and --exclude options. See the "BASIC
1519 FILTERING" section for more details. The owner of the transaction can
1520 also be filtered, by use of the --includeuser and --excludeuser
1521 options. See the "USER NAME FILTERING" section for more details.
1522
1523 The values or the --warning and --critical options are units of time,
1524 and must be provided (no default). Valid units are 'seconds',
1525 'minutes', 'hours', or 'days'. Each may be written singular or
1526 abbreviated to just the first letter. If no units are given, the units
1527 are assumed to be seconds.
1528
1529 This action requires Postgres 8.3 or better.
1530
1531 Example 1: Give a critical if any transaction has been open for more
1532 than 10 minutes:
1533
1534 check_postgres_txn_time --port=5432 --critical='10 minutes'
1535
1536 Example 1: Warn if user 'warehouse' has a transaction open over 30
1537 seconds
1538
1539 check_postgres_txn_time --port-5432 --warning=30s --includeuser=warehouse
1540
1541 For MRTG output, returns the maximum time in seconds a transaction has
1542 been open on the first line. The fourth line gives the name of the
1543 database.
1544
1545 txn_wraparound
1546 ("symlink: check_postgres_txn_wraparound") Checks how close to
1547 transaction wraparound one or more databases are getting. The
1548 --warning and --critical options indicate the number of transactions
1549 done, and must be a positive integer. If either option is not given,
1550 the default values of 1.3 and 1.4 billion are used. There is no need to
1551 run this command more than once per database cluster. For a more
1552 detailed discussion of what this number represents and what to do about
1553 it, please visit the page
1554 http://www.postgresql.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
1555 <http://www.postgresql.org/docs/current/static/routine-
1556 vacuuming.html#VACUUM-FOR-WRAPAROUND>
1557
1558 The warning and critical values can have underscores in the number for
1559 legibility, as Perl does.
1560
1561 Example 1: Check the default values for the localhost database
1562
1563 check_postgres_txn_wraparound --host=localhost
1564
1565 Example 2: Check port 6000 and give a critical when 1.7 billion
1566 transactions are hit:
1567
1568 check_postgres_txn_wraparound --port=6000 --critical=1_700_000_000
1569
1570 For MRTG output, returns the highest number of transactions for all
1571 databases on line one, while line 4 indicates which database it is.
1572
1573 version
1574 ("symlink: check_postgres_version") Checks that the required version of
1575 Postgres is running. The --warning and --critical options (only one is
1576 required) must be of the format X.Y or X.Y.Z where X is the major
1577 version number, Y is the minor version number, and Z is the revision.
1578
1579 Example 1: Give a warning if the database on port 5678 is not version
1580 8.4.10:
1581
1582 check_postgres_version --port=5678 -w=8.4.10
1583
1584 Example 2: Give a warning if any databases on hosts valley,grain, or
1585 sunshine is not 8.3:
1586
1587 check_postgres_version -H valley,grain,sunshine --critical=8.3
1588
1589 For MRTG output, reports a 1 or a 0 indicating success or failure on
1590 the first line. The fourth line indicates the current version. The
1591 version must be provided via the "--mrtg" option.
1592
1593 wal_files
1594 ("symlink: check_postgres_wal_files") Checks how many WAL files exist
1595 in the pg_xlog directory, which is found off of your data_directory,
1596 sometimes as a symlink to another physical disk for performance
1597 reasons. This action must be run as a superuser, in order to access the
1598 contents of the pg_xlog directory. The minimum version to use this
1599 action is Postgres 8.1. The --warning and --critical options are simply
1600 the number of files in the pg_xlog directory. What number to set this
1601 to will vary, but a general guideline is to put a number slightly
1602 higher than what is normally there, to catch problems early.
1603
1604 Normally, WAL files are closed and then re-used, but a long-running
1605 open transaction, or a faulty archive_command script, may cause
1606 Postgres to create too many files. Ultimately, this will cause the disk
1607 they are on to run out of space, at which point Postgres will shut
1608 down.
1609
1610 Example 1: Check that the number of WAL files is 20 or less on host
1611 "pluto"
1612
1613 check_postgres_wal_files --host=pluto --critical=20
1614
1615 For MRTG output, reports the number of WAL files on line 1.
1616
1617 rebuild_symlinks
1618 rebuild_symlinks_force
1619 This action requires no other arguments, and does not connect to any
1620 databases, but simply creates symlinks in the current directory for
1621 each action, in the form check_postgres_<action_name>. If the file
1622 already exists, it will not be overwritten. If the action is
1623 rebuild_symlinks_force, then symlinks will be overwritten. The option
1624 --symlinks is a shorter way of saying --action=rebuild_symlinks
1625
1627 The options --include and --exclude can be combined to limit which
1628 things are checked, depending on the action. The name of the database
1629 can be filtered when using the following actions: backends,
1630 database_size, locks, query_time, txn_idle, and txn_time. The name of
1631 a relation can be filtered when using the following actions: bloat,
1632 index_size, table_size, relation_size, last_vacuum, last_autovacuum,
1633 last_analyze, and last_autoanalyze. The name of a setting can be
1634 filtered when using the settings_checksum action. The name of a file
1635 system can be filtered when using the disk_space action.
1636
1637 If only an include option is given, then ONLY those entries that match
1638 will be checked. However, if given both exclude and include, the
1639 exclusion is done first, and the inclusion after, to reinstate things
1640 that may have been excluded. Both --include and --exclude can be given
1641 multiple times, and/or as comma-separated lists. A leading tilde will
1642 match the following word as a regular expression.
1643
1644 To match a schema, end the search term with a single period. Leading
1645 tildes can be used for schemas as well.
1646
1647 Be careful when using filtering: an inclusion rule on the backends, for
1648 example, may report no problems not only because the matching database
1649 had no backends, but because you misspelled the name of the database!
1650
1651 Examples:
1652
1653 Only checks items named pg_class:
1654
1655 --include=pg_class
1656
1657 Only checks items containing the letters 'pg_':
1658
1659 --include=~pg_
1660
1661 Only check items beginning with 'pg_':
1662
1663 --include=~^pg_
1664
1665 Exclude the item named 'test':
1666
1667 --exclude=test
1668
1669 Exclude all items containing the letters 'test:
1670
1671 --exclude=~test
1672
1673 Exclude all items in the schema 'pg_catalog':
1674
1675 --exclude='pg_catalog.'
1676
1677 Exclude all items containing the letters 'ace', but allow the item
1678 'faceoff':
1679
1680 --exclude=~ace --include=faceoff
1681
1682 Exclude all items which start with the letters 'pg_', which contain the
1683 letters 'slon', or which are named 'sql_settings' or 'green'.
1684 Specifically check items with the letters 'prod' in their names, and
1685 always check the item named 'pg_relname':
1686
1687 --exclude=~^pg_,~slon,sql_settings --exclude=green --include=~prod,pg_relname
1688
1690 The options --includeuser and --excludeuser can be used on some actions
1691 to only examine database objects owned by (or not owned by) one or more
1692 users. An --includeuser option always trumps an --excludeuser option.
1693 You can give each option more than once for multiple users, or you can
1694 give a comma-separated list. The actions that currently use these
1695 options are:
1696
1697 database_size
1698 last_analyze
1699 last_autoanalyze
1700 last_vacuum
1701 last_autovacuum
1702 query_time
1703 relation_size
1704 txn_time
1705
1706 Examples:
1707
1708 Only check items owned by the user named greg:
1709
1710 --includeuser=greg
1711
1712 Only check items owned by either watson or crick:
1713
1714 --includeuser=watson,crick
1715
1716 Only check items owned by crick,franklin, watson, or wilkins:
1717
1718 --includeuser=watson --includeuser=franklin --includeuser=crick,wilkins
1719
1720 Check all items except for those belonging to the user scott:
1721
1722 --excludeuser=scott
1723
1725 To help in setting things up, this program can be run in a "test mode"
1726 by specifying the --test option. This will perform some basic tests to
1727 make sure that the databases can be contacted, and that certain per-
1728 action prerequisites are met, such as whether the user is a superuser,
1729 if the version of Postgres is new enough, and if stats_row_level is
1730 enabled.
1731
1733 In addition to command-line configurations, you can put any options
1734 inside of a file. The file .check_postgresrc in the current directory
1735 will be used if found. If not found, then the file ~/.check_postgresrc
1736 will be used. Finally, the file /etc/check_postgresrc will be used if
1737 available. The format of the file is option = value, one per line. Any
1738 line starting with a '#' will be skipped. Any values loaded from a
1739 check_postgresrc file will be overwritten by command-line options. All
1740 check_postgresrc files can be ignored by supplying a
1741 "--no-checkpostgresrc" argument.
1742
1744 The environment variable $ENV{HOME} is used to look for a
1745 .check_postgresrc file.
1746
1748 Since this program uses the psql program, make sure it is accessible to
1749 the user running the script. If run as a cronjob, this often means
1750 modifying the PATH environment variable.
1751
1752 If you are using Nagios in embedded Perl mode, use the "--action"
1753 argument instead of symlinks, so that the plugin only gets compiled one
1754 time.
1755
1757 Access to a working version of psql, and the following very standard
1758 Perl modules:
1759
1760 Cwd
1761 Getopt::Long
1762 File::Basename
1763 File::Temp
1764 Time::HiRes (if $opt{showtime} is set to true, which is the default)
1765
1766 The "settings_checksum" action requires the Digest::MD5 module.
1767
1768 The "checkpoint" action requires the Date::Parse module.
1769
1770 Some actions require access to external programs. If psql is not
1771 explicitly specified, the command "which" is used to find it. The
1772 program "/bin/df" is needed by the "disk_space" action.
1773
1775 Development happens using the git system. You can clone the latest
1776 version by doing:
1777
1778 git clone git://bucardo.org/check_postgres.git
1779
1781 Three mailing lists are available. For discussions about the program,
1782 bug reports, feature requests, and commit notices, send email to
1783 check_postgres@bucardo.org
1784
1785 https://mail.endcrypt.com/mailman/listinfo/check_postgres
1786
1787 A low-volume list for announcement of new versions and important
1788 notices is the 'check_postgres-announce' list:
1789
1790 https://mail.endcrypt.com/mailman/listinfo/check_postgres-announce
1791
1792 Source code changes (via git-commit) are sent to the
1793 'check_postgres-commit' list:
1794
1795 https://mail.endcrypt.com/mailman/listinfo/check_postgres-commit
1796
1798 Items not specifically attributed are by GSM (Greg Sabino Mullane).
1799
1800 Version 2.18.0 October 2, 2011
1801 Redo the same_schema action. Use new --filter argument for all filtering.
1802 Allow comparisons between any number of databases.
1803 Remove the dbname2, dbport2, etc. arguments.
1804 Allow comparison of the same db over time.
1805
1806 Swap db1 and db2 if the slave is 1 for the hot standby check (David E. Wheeler)
1807
1808 Allow multiple --schema arguments for the slony_status action (GSM and Jehan-Guillaume de Rorthais)
1809
1810 Fix ORDER BY in the last vacuum/analyze action (Nicolas Thauvin)
1811
1812 Fix check_hot_standby_delay perfdata output (Nicolas Thauvin)
1813
1814 Look in the correct place for the .ready files with the archive_ready action (Nicolas Thauvin)
1815
1816 New action: commitratio (Guillaume Lelarge)
1817
1818 New action: hitratio (Guillaume Lelarge)
1819
1820 Make sure --action overrides the symlink naming trick.
1821
1822 Set defaults for archive_ready and wal_files (Thomas Guettler, GSM)
1823
1824 Better output for wal_files and archive_ready (GSM)
1825
1826 Fix warning when client_port set to empty string (bug #79)
1827
1828 Account for "empty row" in -x output (i.e. source of functions).
1829
1830 Fix some incorrectly named data fields (Andy Lester)
1831
1832 Expand the number of pgbouncer actions (Ruslan Kabalin)
1833
1834 Give detailed information and refactor txn_idle, txn_time, and query_time
1835 (Per request from bug #61)
1836
1837 Set maxalign to 8 in the bloat check if box identified as '64-bit'
1838 (Michel Sijmons, bug #66)
1839
1840 Support non-standard version strings in the bloat check.
1841 (Michel Sijmons and Gurjeet Singh, bug #66)
1842
1843 Do not show excluded databases in some output (Ruslan Kabalin)
1844
1845 Allow "and", "or" inside arguments (David E. Wheeler)
1846
1847 Add the "new_version_box" action.
1848
1849 Fix psql version regex (Peter Eisentraut, bug #69)
1850
1851 Add the --assume-standby-mode option (Ruslan Kabalin)
1852
1853 Note that txn_idle and query_time require 8.3 (Thomas Guettler)
1854
1855 Standardize and clean up all perfdata output (bug #52)
1856
1857 Exclude "idle in transaction" from the query_time check (bug #43)
1858
1859 Fix the perflimit for the bloat action (bug #50)
1860
1861 Clean up the custom_query action a bit.
1862
1863 Fix space in perfdata for hot_standby_delay action (Nicolas Thauvin)
1864
1865 Handle undef percents in check_fsm_relations (Andy Lester)
1866
1867 Fix typo in dbstats action (Stas Vitkovsky)
1868
1869 Fix MRTG for last vacuum and last_analyze actions.
1870
1871 Version 2.17.0 no public release
1872 Version 2.16.0 January 20, 2011
1873 Add new action 'hot_standby_delay' (Nicolas Thauvin)
1874 Add cache-busting for the version-grabbing utilities.
1875 Fix problem with going to next method for new_version_pg
1876 (Greg Sabino Mullane, reported by Hywel Mallett in bug #65)
1877 Allow /usr/local/etc as an alternative location for the
1878 check_postgresrc file (Hywel Mallett)
1879 Do not use tgisconstraint in same_schema if Postgres >= 9
1880 (Guillaume Lelarge)
1881
1882 Version 2.15.4 January 3, 2011
1883 Fix warning when using symlinks
1884 (Greg Sabino Mullane, reported by Peter Eisentraut in bug #63)
1885
1886 Version 2.15.3 December 30, 2010
1887 Show OK for no matching txn_idle entries.
1888
1889 Version 2.15.2 December 28, 2010
1890 Better formatting of sizes in the bloat action output.
1891
1892 Remove duplicate perfs in bloat action output.
1893
1894 Version 2.15.1 December 27, 2010
1895 Fix problem when examining items in pg_settings (Greg Sabino Mullane)
1896
1897 For connection test, return critical, not unknown, on FATAL errors
1898 (Greg Sabino Mullane, reported by Peter Eisentraut in bug #62)
1899
1900 Version 2.15.0 November 8, 2010
1901 Add --quiet argument to suppress output on OK Nagios results
1902 Add index comparison for same_schema (Norman Yamada and Greg Sabino Mullane)
1903 Use $ENV{PGSERVICE} instead of "service=" to prevent problems (Guillaume Lelarge)
1904 Add --man option to show the entire manual. (Andy Lester)
1905 Redo the internal run_command() sub to use -x and hashes instead of regexes.
1906 Fix error in custom logic (Andreas Mager)
1907 Add the "pgbouncer_checksum" action (Guillaume Lelarge)
1908 Fix regex to work on WIN32 for check_fsm_relations and check_fsm_pages (Luke Koops)
1909 Don't apply a LIMIT when using --exclude on the bloat action (Marti Raudsepp)
1910 Change the output of query_time to show pid,user,port, and address (Giles Westwood)
1911 Fix to show database properly when using slony_status (Guillaume Lelarge)
1912 Allow warning items for same_schema to be comma-separated (Guillaume Lelarge)
1913 Constraint definitions across Postgres versions match better in same_schema.
1914 Work against "EnterpriseDB" databases (Sivakumar Krishnamurthy and Greg Sabino Mullane)
1915 Separate perfdata with spaces (Jehan-Guillaume (ioguix) de Rorthais)
1916 Add new action "archive_ready" (Jehan-Guillaume (ioguix) de Rorthais)
1917
1918 Version 2.14.3 (March 1, 2010)
1919 Allow slony_status action to handle more than one slave.
1920 Use commas to separate function args in same_schema output (Robert Treat)
1921
1922 Version 2.14.2 (February 18, 2010)
1923 Change autovac_freeze default warn/critical back to 90%/95% (Robert Treat)
1924 Put all items one-per-line for relation size actions if --verbose=1
1925
1926 Version 2.14.1 (February 17, 2010)
1927 Don't use $^T in logfile check, as script may be long-running
1928 Change the error string for the logfile action for easier exclusion
1929 by programs like tail_n_mail
1930
1931 Version 2.14.0 (February 11, 2010)
1932 Added the 'slony_status' action.
1933 Changed the logfile sleep from 0.5 to 1, as 0.5 gets rounded to 0 on some boxes!
1934
1935 Version 2.13.2 (February 4, 2010)
1936 Allow timeout option to be used for logtime 'sleep' time.
1937
1938 Version 2.13.2 (February 4, 2010)
1939 Show offending database for query_time action.
1940 Apply perflimit to main output for sequence action.
1941 Add 'noowner' option to same_schema action.
1942 Raise sleep timeout for logfile check to 15 seconds.
1943
1944 Version 2.13.1 (February 2, 2010)
1945 Fix bug preventing column constraint differences from 2 > 1 for same_schema from being shown.
1946 Allow aliases 'dbname1', 'dbhost1', 'dbport1',etc.
1947 Added "nolanguage" as a filter for the same_schema option.
1948 Don't track "generic" table constraints (e.. $1, $2) using same_schema
1949
1950 Version 2.13.0 (January 29, 2010)
1951 Allow "nofunctions" as a filter for the same_schema option.
1952 Added "noperm" as a filter for the same_schema option.
1953 Ignore dropped columns when considered positions for same_schema (Guillaume Lelarge)
1954
1955 Version 2.12.1 (December 3, 2009)
1956 Change autovac_freeze default warn/critical from 90%/95% to 105%/120% (Marti Raudsepp)
1957
1958 Version 2.12.0 (December 3, 2009)
1959 Allow the temporary directory to be specified via the "tempdir" argument,
1960 for systems that need it (e.g. /tmp is not owned by root).
1961 Fix so old versions of Postgres (< 8.0) use the correct default database (Giles Westwood)
1962 For "same_schema" trigger mismatches, show the attached table.
1963 Add the new_version_bc check for Bucardo version checking.
1964 Add database name to perf output for last_vacuum|analyze (Guillaume Lelarge)
1965 Fix for bloat action against old versions of Postgres without the 'block_size' param.
1966
1967 Version 2.11.1 (August 27, 2009)
1968 Proper Nagios output for last_vacuum|analyze actions. (CA~Xdric Villemain)
1969 Proper Nagios output for locks action. (CA~Xdric Villemain)
1970 Proper Nagios output for txn_wraparound action. (CA~Xdric Villemain)
1971 Fix for constraints with embedded newlines for same_schema.
1972 Allow --exclude for all items when using same_schema.
1973
1974 Version 2.11.0 (August 23, 2009)
1975 Add Nagios perf output to the wal_files check (CA~Xdric Villemain)
1976 Add support for .check_postgresrc, per request from Albe Laurenz.
1977 Allow list of web fetch methods to be changed with the --get_method option.
1978 Add support for the --language argument, which overrides any ENV.
1979 Add the --no-check_postgresrc flag.
1980 Ensure check_postgresrc options are completely overridden by command-line options.
1981 Fix incorrect warning > critical logic in replicate_rows (Glyn Astill)
1982
1983 Version 2.10.0 (August 3, 2009)
1984 For same_schema, compare view definitions, and compare languages.
1985 Make script into a global executable via the Makefile.PL file.
1986 Better output when comparing two databases.
1987 Proper Nagios output syntax for autovac_freeze and backends checks (CA~Xdric Villemain)
1988
1989 Version 2.9.5 (July 24, 2009)
1990 Don't use a LIMIT in check_bloat if --include is used. Per complaint from Jeff Frost.
1991
1992 Version 2.9.4 (July 21, 2009)
1993 More French translations (Guillaume Lelarge)
1994
1995 Version 2.9.3 (July 14, 2009)
1996 Quote dbname in perf output for the backends check. (Davide Abrigo)
1997 Add 'fetch' as an alternative method for new_version checks, as this
1998 comes by default with FreeBSD. (Hywel Mallett)
1999
2000 Version 2.9.2 (July 12, 2009)
2001 Allow dots and dashes in database name for the backends check (Davide Abrigo)
2002 Check and display the database for each match in the bloat check (CA~Xdric Villemain)
2003 Handle 'too many connections' FATAL error in the backends check with a critical,
2004 rather than a generic error (Greg, idea by JA~Xrgen Schulz-BrA~Xssel)
2005 Do not allow perflimit to interfere with exclusion rules in the vacuum and
2006 analyze tests. (Greg, bug reported by Jeff Frost)
2007
2008 Version 2.9.1 (June 12, 2009)
2009 Fix for multiple databases with the check_bloat action (Mark Kirkwood)
2010 Fixes and improvements to the same_schema action (Jeff Boes)
2011 Write tests for same_schema, other minor test fixes (Jeff Boes)
2012
2013 Version 2.9.0 (May 28, 2009)
2014 Added the same_schema action (Greg)
2015
2016 Version 2.8.1 (May 15, 2009)
2017 Added timeout via statement_timeout in addition to perl alarm (Greg)
2018
2019 Version 2.8.0 (May 4, 2009)
2020 Added internationalization support (Greg)
2021 Added the 'disabled_triggers' check (Greg)
2022 Added the 'prepared_txns' check (Greg)
2023 Added the 'new_version_cp' and 'new_version_pg' checks (Greg)
2024 French translations (Guillaume Lelarge)
2025 Make the backends search return ok if no matches due to inclusion rules,
2026 per report by Guillaume Lelarge (Greg)
2027 Added comprehensive unit tests (Greg, Jeff Boes, Selena Deckelmann)
2028 Make fsm_pages and fsm_relations handle 8.4 servers smoothly. (Greg)
2029 Fix missing 'upd' field in show_dbstats (Andras Fabian)
2030 Allow ENV{PGCONTROLDATA} and ENV{PGBINDIR}. (Greg)
2031 Add various Perl module infrastructure (e.g. Makefile.PL) (Greg)
2032 Fix incorrect regex in txn_wraparound (Greg)
2033 For txn_wraparound: consistent ordering and fix duplicates in perf output (Andras Fabian)
2034 Add in missing exabyte regex check (Selena Deckelmann)
2035 Set stats to zero if we bail early due to USERWHERECLAUSE (Andras Fabian)
2036 Add additional items to dbstats output (Andras Fabian)
2037 Remove --schema option from the fsm_ checks. (Greg Mullane and Robert Treat)
2038 Handle case when ENV{PGUSER} is set. (Andy Lester)
2039 Many various fixes. (Jeff Boes)
2040 Fix --dbservice: check version and use ENV{PGSERVICE} for old versions (CA~Xdric Villemain)
2041
2042 Version 2.7.3 (February 10, 2009)
2043 Make the sequence action check if sequence being used for a int4 column and
2044 react appropriately. (Michael Glaesemann)
2045
2046 Version 2.7.2 (February 9, 2009)
2047 Fix to prevent multiple groupings if db arguments given.
2048
2049 Version 2.7.1 (February 6, 2009)
2050 Allow the -p argument for port to work again.
2051
2052 Version 2.7.0 (February 4, 2009)
2053 Do not require a connection argument, but use defaults and ENV variables when
2054 possible: PGHOST, PGPORT, PGUSER, PGDATABASE.
2055
2056 Version 2.6.1 (February 4, 2009)
2057 Only require Date::Parse to be loaded if using the checkpoint action.
2058
2059 Version 2.6.0 (January 26, 2009)
2060 Add the 'checkpoint' action.
2061
2062 Version 2.5.4 (January 7, 2009)
2063 Better checking of $opt{dbservice} structure (CA~Xdric Villemain)
2064 Fix time display in timesync action output (Selena Deckelmann)
2065 Fix documentation typos (Josh Tolley)
2066
2067 Version 2.5.3 (December 17, 2008)
2068 Minor fix to regex in verify_version (Lee Jensen)
2069
2070 Version 2.5.2 (December 16, 2008)
2071 Minor documentation tweak.
2072
2073 Version 2.5.1 (December 11, 2008)
2074 Add support for --noidle flag to prevent backends action from counting idle processes.
2075 Patch by Selena Deckelmann.
2076
2077 Fix small undefined warning when not using --dbservice.
2078
2079 Version 2.5.0 (December 4, 2008)
2080 Add support for the pg_Service.conf file with the --dbservice option.
2081
2082 Version 2.4.3 (November 7, 2008)
2083 Fix options for replicate_row action, per report from Jason Gordon.
2084
2085 Version 2.4.2 (November 6, 2008)
2086 Wrap File::Temp::cleanup() calls in eval, in case File::Temp is an older version.
2087 Patch by Chris Butler.
2088
2089 Version 2.4.1 (November 5, 2008)
2090 Cast numbers to numeric to support sequences ranges > bigint in check_sequence action.
2091 Thanks to Scott Marlowe for reporting this.
2092
2093 Version 2.4.0 (October 26, 2008)
2094 Add Cacti support with the dbstats action.
2095 Pretty up the time output for last vacuum and analyze actions.
2096 Show the percentage of backends on the check_backends action.
2097
2098 Version 2.3.10 (October 23, 2008)
2099 Fix minor warning in action check_bloat with multiple databases.
2100 Allow warning to be greater than critical when using the --reverse option.
2101 Support the --perflimit option for the check_sequence action.
2102
2103 Version 2.3.9 (October 23, 2008)
2104 Minor tweak to way we store the default port.
2105
2106 Version 2.3.8 (October 21, 2008)
2107 Allow the default port to be changed easily.
2108 Allow transform of simple output by MB, GB, etc.
2109
2110 Version 2.3.7 (October 14, 2008)
2111 Allow multiple databases in 'sequence' action. Reported by Christoph Zwerschke.
2112
2113 Version 2.3.6 (October 13, 2008)
2114 Add missing $schema to check_fsm_pages. (Robert Treat)
2115
2116 Version 2.3.5 (October 9, 2008)
2117 Change option 'checktype' to 'valtype' to prevent collisions with -c[ritical]
2118 Better handling of errors.
2119
2120 Version 2.3.4 (October 9, 2008)
2121 Do explicit cleanups of the temp directory, per problems reported by sb@nnx.com.
2122
2123 Version 2.3.3 (October 8, 2008)
2124 Account for cases where some rounding queries give -0 instead of 0.
2125 Thanks to Glyn Astill for helping to track this down.
2126
2127 Version 2.3.2 (October 8, 2008)
2128 Always quote identifiers in check_replicate_row action.
2129
2130 Version 2.3.1 (October 7, 2008)
2131 Give a better error if one of the databases cannot be reached.
2132
2133 Version 2.3.0 (October 4, 2008)
2134 Add the "sequence" action, thanks to Gavin M. Roy for the idea.
2135 Fix minor problem with autovac_freeze action when using MRTG output.
2136 Allow output argument to be case-insensitive.
2137 Documentation fixes.
2138
2139 Version 2.2.4 (October 3, 2008)
2140 Fix some minor typos
2141
2142 Version 2.2.3 (October 1, 2008)
2143 Expand range of allowed names for --repinfo argument (Glyn Astill)
2144 Documentation tweaks.
2145
2146 Version 2.2.2 (September 30, 2008)
2147 Fixes for minor output and scoping problems.
2148
2149 Version 2.2.1 (September 28, 2008)
2150 Add MRTG output to fsm_pages and fsm_relations.
2151 Force error messages to one-line for proper Nagios output.
2152 Check for invalid prereqs on failed command. From conversations with Euler Taveira de Oliveira.
2153 Tweak the fsm_pages formula a little.
2154
2155 Version 2.2.0 (September 25, 2008)
2156 Add fsm_pages and fsm_relations actions. (Robert Treat)
2157
2158 Version 2.1.4 (September 22, 2008)
2159 Fix for race condition in txn_time action.
2160 Add --debugoutput option.
2161
2162 Version 2.1.3 (September 22, 2008)
2163 Allow alternate arguments "dbhost" for "host" and "dbport" for "port".
2164 Output a zero as default value for second line of MRTG output.
2165
2166 Version 2.1.2 (July 28, 2008)
2167 Fix sorting error in the "disk_space" action for non-Nagios output.
2168 Allow --simple as a shortcut for --output=simple.
2169
2170 Version 2.1.1 (July 22, 2008)
2171 Don't check databases with datallowconn false for the "autovac_freeze" action.
2172
2173 Version 2.1.0 (July 18, 2008)
2174 Add the "autovac_freeze" action, thanks to Robert Treat for the idea and design.
2175 Put an ORDER BY on the "txn_wraparound" action.
2176
2177 Version 2.0.1 (July 16, 2008)
2178 Optimizations to speed up the "bloat" action quite a bit.
2179 Fix "version" action to not always output in mrtg mode.
2180
2181 Version 2.0.0 (July 15, 2008)
2182 Add support for MRTG and "simple" output options.
2183 Many small improvements to nearly all actions.
2184
2185 Version 1.9.1 (June 24, 2008)
2186 Fix an error in the bloat SQL in 1.9.0
2187 Allow percentage arguments to be over 99%
2188 Allow percentages in the bloat --warning and --critical (thanks to Robert Treat for the idea)
2189
2190 Version 1.9.0 (June 22, 2008)
2191 Don't include information_schema in certain checks. (Jeff Frost)
2192 Allow --include and --exclude to use schemas by using a trailing period.
2193
2194 Version 1.8.5 (June 22, 2008)
2195 Output schema name before table name where appropriate.
2196 Thanks to Jeff Frost.
2197
2198 Version 1.8.4 (June 19, 2008)
2199 Better detection of problems in --replicate_row.
2200
2201 Version 1.8.3 (June 18, 2008)
2202 Fix 'backends' action: there may be no rows in pg_stat_activity, so run a second
2203 query if needed to find the max_connections setting.
2204 Thanks to Jeff Frost for the bug report.
2205
2206 Version 1.8.2 (June 10, 2008)
2207 Changes to allow working under Nagios' embedded Perl mode. (Ioannis Tambouras)
2208
2209 Version 1.8.1 (June 9, 2008)
2210 Allow 'bloat' action to work on Postgres version 8.0.
2211 Allow for different commands to be run for each action depending on the server version.
2212 Give better warnings when running actions not available on older Postgres servers.
2213
2214 Version 1.8.0 (June 3, 2008)
2215 Add the --reverse option to the custom_query action.
2216
2217 Version 1.7.1 (June 2, 2008)
2218 Fix 'query_time' action: account for race condition in which zero rows appear in pg_stat_activity.
2219 Thanks to Dustin Black for the bug report.
2220
2221 Version 1.7.0 (May 11, 2008)
2222 Add --replicate_row action
2223
2224 Version 1.6.1 (May 11, 2008)
2225 Add --symlinks option as a shortcut to --action=rebuild_symlinks
2226
2227 Version 1.6.0 (May 11, 2008)
2228 Add the custom_query action.
2229
2230 Version 1.5.2 (May 2, 2008)
2231 Fix problem with too eager creation of custom pgpass file.
2232
2233 Version 1.5.1 (April 17, 2008)
2234 Add example Nagios configuration settings (Brian A. Seklecki)
2235
2236 Version 1.5.0 (April 16, 2008)
2237 Add the --includeuser and --excludeuser options. Documentation cleanup.
2238
2239 Version 1.4.3 (April 16, 2008)
2240 Add in the 'output' concept for future support of non-Nagios programs.
2241
2242 Version 1.4.2 (April 8, 2008)
2243 Fix bug preventing --dbpass argument from working (Robert Treat).
2244
2245 Version 1.4.1 (April 4, 2008)
2246 Minor documentation fixes.
2247
2248 Version 1.4.0 (April 2, 2008)
2249 Have 'wal_files' action use pg_ls_dir (idea by Robert Treat).
2250 For last_vacuum and last_analyze, respect autovacuum effects, add separate
2251 autovacuum checks (ideas by Robert Treat).
2252
2253 Version 1.3.1 (April 2, 2008)
2254 Have txn_idle use query_start, not xact_start.
2255
2256 Version 1.3.0 (March 23, 2008)
2257 Add in txn_idle and txn_time actions.
2258
2259 Version 1.2.0 (February 21, 2008)
2260 Add the 'wal_files' action, which counts the number of WAL files
2261 in your pg_xlog directory.
2262 Fix some typos in the docs.
2263 Explicitly allow -v as an argument.
2264 Allow for a null syslog_facility in the 'logfile' action.
2265
2266 Version 1.1.2 (February 5, 2008)
2267 Fix error preventing --action=rebuild_symlinks from working.
2268
2269 Version 1.1.1 (February 3, 2008)
2270 Switch vacuum and analyze date output to use 'DD', not 'D'. (Glyn Astill)
2271
2272 Version 1.1.0 (December 16, 2008)
2273 Fixes, enhancements, and performance tracking.
2274 Add performance data tracking via --showperf and --perflimit
2275 Lots of refactoring and cleanup of how actions handle arguments.
2276 Do basic checks to figure out syslog file for 'logfile' action.
2277 Allow for exact matching of beta versions with 'version' action.
2278 Redo the default arguments to only populate when neither 'warning' nor 'critical' is provided.
2279 Allow just warning OR critical to be given for the 'timesync' action.
2280 Remove 'redirect_stderr' requirement from 'logfile' due to 8.3 changes.
2281 Actions 'last_vacuum' and 'last_analyze' are 8.2 only (Robert Treat)
2282
2283 Version 1.0.16 (December 7, 2007)
2284 First public release, December 2007
2285
2287 The index bloat size optimization is rough.
2288
2289 Some actions may not work on older versions of Postgres (before 8.0).
2290
2291 Please report any problems to check_postgres@bucardo.org
2292
2294 Greg Sabino Mullane <greg@endpoint.com>
2295
2297 Some example Nagios configuration settings using this script:
2298
2299 define command {
2300 command_name check_postgres_size
2301 command_line $USER2$/check_postgres.pl -H $HOSTADDRESS$ -u pgsql -db postgres --action database_size -w $ARG1$ -c $ARG2$
2302 }
2303
2304 define command {
2305 command_name check_postgres_locks
2306 command_line $USER2$/check_postgres.pl -H $HOSTADDRESS$ -u pgsql -db postgres --action locks -w $ARG1$ -c $ARG2$
2307 }
2308
2309
2310 define service {
2311 use generic-other
2312 host_name dbhost.gtld
2313 service_description dbhost PostgreSQL Service Database Usage Size
2314 check_command check_postgres_size!256000000!512000000
2315 }
2316
2317 define service {
2318 use generic-other
2319 host_name dbhost.gtld
2320 service_description dbhost PostgreSQL Service Database Locks
2321 check_command check_postgres_locks!2!3
2322 }
2323
2325 Copyright (c) 2007-2011 Greg Sabino Mullane <greg@endpoint.com>.
2326
2327 Redistribution and use in source and binary forms, with or without
2328 modification, are permitted provided that the following conditions are
2329 met:
2330
2331 1. Redistributions of source code must retain the above copyright notice,
2332 this list of conditions and the following disclaimer.
2333 2. Redistributions in binary form must reproduce the above copyright notice,
2334 this list of conditions and the following disclaimer in the documentation
2335 and/or other materials provided with the distribution.
2336
2337 THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR
2338 IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
2339 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
2340 DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
2341 INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
2342 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
2343 SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
2344 HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
2345 STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
2346 IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
2347 POSSIBILITY OF SUCH DAMAGE.
2348
2349
2350
2351perl v5.12.4 2011-10-02 check_postgres(3)