CHECK_POSTGRES(1) User Contributed Perl Documentation CHECK_POSTGRES(1)
2
3
4
6 check_postgres.pl - a Postgres monitoring script for Nagios, MRTG,
7 Cacti, and others
8
This document describes check_postgres.pl version 2.25.0
10
12 ## Create all symlinks
13 check_postgres.pl --symlinks
14
15 ## Check connection to Postgres database 'pluto':
16 check_postgres.pl --action=connection --db=pluto
17
## Same thing, but using the symlink
19 check_postgres_connection --db=pluto
20
21 ## Warn if > 100 locks, critical if > 200, or > 20 exclusive
22 check_postgres_locks --warning=100 --critical="total=200:exclusive=20"
23
24 ## Show the current number of idle connections on port 6543:
25 check_postgres_txn_idle --port=6543 --output=simple
26
27 ## There are many other actions and options, please keep reading.
28
29 The latest news and documentation can always be found at:
30 https://bucardo.org/check_postgres/
31
33 check_postgres.pl is a Perl script that runs many different tests
34 against one or more Postgres databases. It uses the psql program to
35 gather the information, and outputs the results in one of three
36 formats: Nagios, MRTG, or simple.
37
38 Output Modes
39 The output can be changed by use of the "--output" option. The default
40 output is nagios, although this can be changed at the top of the script
41 if you wish. The current option choices are nagios, mrtg, and simple.
42 To avoid having to enter the output argument each time, the type of
43 output is automatically set if no --output argument is given, and if
44 the current directory has one of the output options in its name. For
45 example, creating a directory named mrtg and populating it with
46 symlinks via the --symlinks argument would ensure that any actions run
from that directory will always default to an output of "mrtg". As a
shortcut for --output=simple, you can enter --simple, which also
overrides the directory naming trick.
50
51 Nagios output
52
53 The default output format is for Nagios, which is a single line of
54 information, along with four specific exit codes:
55
56 0 (OK)
57 1 (WARNING)
58 2 (CRITICAL)
59 3 (UNKNOWN)
60
61 The output line is one of the words above, a colon, and then a short
62 description of what was measured. Additional statistics information, as
63 well as the total time the command took, can be output as well: see the
64 documentation on the arguments --showperf, --perflimit, and --showtime.
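
As a rough illustration of the format described above, the following
shell sketch maps an exit code to its status word and splits one output
line into the message and the performance data that follows the pipe
symbol. The function names are hypothetical helpers, not part of
check_postgres.pl, and the sample line is the checkpoint output shown
elsewhere in this manual.

```shell
# Translate a Nagios plugin exit code into its status word.
# Codes outside 0-2 are reported as UNKNOWN here (an assumption of this sketch).
nagios_status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Text before the first pipe symbol: the human-readable message.
nagios_message() {
    printf '%s\n' "${1%%|*}" | sed 's/[[:space:]]*$//'
}

# Text after the first pipe symbol: the performance data, if any.
nagios_perfdata() {
    case "$1" in
        *\|*) printf '%s\n' "${1#*|}" | sed 's/^[[:space:]]*//' ;;
        *)    echo "" ;;
    esac
}

line='POSTGRES_CHECKPOINT OK: Last checkpoint was 72 seconds ago | age=72;;300 mode=MASTER'
nagios_status_name 0
nagios_message "$line"
nagios_perfdata "$line"
```

In a wrapper script, the exit code would typically come from "$?"
right after the check has run.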
65
66 MRTG output
67
68 The MRTG output is four lines, with the first line always giving a
69 single number of importance. When possible, this number represents an
70 actual value such as a number of bytes, but it may also be a 1 or a 0
71 for actions that only return "true" or "false", such as
72 check_postgres_version. The second line is an additional stat and is
73 only used for some actions. The third line indicates an "uptime" and is
74 not used. The fourth line is a description and usually indicates the
75 name of the database the stat from the first line was pulled from, but
76 may be different depending on the action.
77
78 Some actions accept an optional --mrtg argument to further control the
79 output.
80
81 See the documentation on each action for details on the exact MRTG
82 output for each one.
83
84 Simple output
85
The simple output is a truncated version of the MRTG one: it returns
only the first number and nothing else. This is very useful
when you just want to check the state of something, regardless of any
threshold. You can transform the numeric output by appending KB, MB,
GB, TB, or EB to the output argument, for example:
91
92 --output=simple,MB
93
94 Cacti output
95
96 The Cacti output consists of one or more items on the same line, with a
97 simple name, a colon, and then a number. At the moment, the only action
98 with explicit Cacti output is 'dbstats', and using the --output option
99 is not needed in this case, as Cacti is the only output for this
100 action. For many other actions, using --simple is enough to make Cacti
101 happy.
102
104 All actions accept a common set of database options.
105
106 -H NAME or --host=NAME
107 Connect to the host indicated by NAME. Can be a comma-separated
108 list of names. Multiple host arguments are allowed. If no host is
109 given, defaults to the "PGHOST" environment variable or no host at
110 all (which indicates using a local Unix socket). You may also use
111 "--dbhost".
112
113 -p PORT or --port=PORT
114 Connects using the specified PORT number. Can be a comma-separated
115 list of port numbers, and multiple port arguments are allowed. If
116 no port number is given, defaults to the "PGPORT" environment
variable. If that is not set, it defaults to 5432. You may also use
"--dbport".
119
120 -db NAME or --dbname=NAME
121 Specifies which database to connect to. Can be a comma-separated
122 list of names, and multiple dbname arguments are allowed. If no
123 dbname option is provided, defaults to the "PGDATABASE" environment
124 variable. If that is not set, it defaults to 'postgres' if psql is
125 version 8 or greater, and 'template1' otherwise.
126
127 -u USERNAME or --dbuser=USERNAME
128 The name of the database user to connect as. Can be a comma-
129 separated list of usernames, and multiple dbuser arguments are
130 allowed. If this is not provided, it defaults to the "PGUSER"
131 environment variable, otherwise it defaults to 'postgres'.
132
133 --dbpass=PASSWORD
134 Provides the password to connect to the database with. Use of this
135 option is highly discouraged. Instead, one should use a .pgpass or
136 pg_service.conf file.
137
138 --dbservice=NAME
139 The name of a service inside of the pg_service.conf file. Before
140 version 9.0 of Postgres, this is a global file, usually found in
141 /etc/pg_service.conf. If you are using version 9.0 or higher of
142 Postgres, you can use the file ".pg_service.conf" in the home
143 directory of the user running the script, e.g. nagios.
144
145 This file contains a simple list of connection options. You can
146 also pass additional information when using this option such as
147 --dbservice="maindatabase sslmode=require"
148
149 The documentation for this file can be found at
150 <https://www.postgresql.org/docs/current/static/libpq-pgservice.html>
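
As an alternative to --dbpass, the password and service files mentioned
above could be set up roughly as follows. This is a minimal sketch: the
host, user, password, and service name are placeholder values, and the
files are written to a scratch directory and selected through the libpq
environment variables PGPASSFILE and PGSERVICEFILE rather than placed
in the home directory.

```shell
# Scratch directory so the sketch does not touch real files in $HOME.
conf_dir=$(mktemp -d)

# Password file: one hostname:port:database:username:password entry per line.
cat > "$conf_dir/pgpass" <<'EOF'
pluto:5432:postgres:nagios:example-password
EOF
# libpq ignores a password file that group or others can read.
chmod 600 "$conf_dir/pgpass"

# Service file: a [name] header followed by connection options.
cat > "$conf_dir/pg_service.conf" <<'EOF'
[maindatabase]
host=pluto
port=5432
dbname=postgres
user=nagios
EOF

# Point libpq (and therefore psql) at the two files.
export PGPASSFILE="$conf_dir/pgpass"
export PGSERVICEFILE="$conf_dir/pg_service.conf"
```

With these in place, a check could then be run with
--dbservice=maindatabase and no password on the command line.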
151
152 The database connection options can be grouped: --host=a,b --host=c
153 --port=1234 --port=3344 would connect to a-1234, b-1234, and c-3344.
154 Note that once set, an option carries over until it is changed again.
155
156 Examples:
157
158 --host=a,b --port=5433 --db=c
159 Connects twice to port 5433, using database c, to hosts a and b: a-5433-c b-5433-c
160
161 --host=a,b --port=5433 --db=c,d
162 Connects four times: a-5433-c a-5433-d b-5433-c b-5433-d
163
164 --host=a,b --host=foo --port=1234 --port=5433 --db=e,f
165 Connects six times: a-1234-e a-1234-f b-1234-e b-1234-f foo-5433-e foo-5433-f
166
167 --host=a,b --host=x --port=5432,5433 --dbuser=alice --dbuser=bob -db=baz
168 Connects three times: a-5432-alice-baz b-5433-alice-baz x-5433-bob-baz
169
170 --dbservice="foo" --port=5433
171 Connects using the named service 'foo' in the pg_service.conf file, but overrides the port
172
174 Other options include:
175
176 --action=NAME
177 States what action we are running. Required unless using a
178 symlinked file, in which case the name of the file is used to
179 figure out the action.
180
181 --warning=VAL or -w VAL
Sets the threshold at which a warning alert is fired. The valid
values for this option depend on the action used.
184
185 --critical=VAL or -c VAL
Sets the threshold at which a critical alert is fired. The valid
values for this option depend on the action used.
188
189 -t VAL or --timeout=VAL
190 Sets the timeout in seconds after which the script will abort
191 whatever it is doing and return an UNKNOWN status. The timeout is
192 per Postgres cluster, not for the entire script. The default value
193 is 10; the units are always in seconds.
194
195 --assume-standby-mode
If specified, the script first checks whether the server is in
standby mode (--datadir is required). If it is, all checks that
require SQL queries are skipped, and "Server in standby mode" is
returned with an OK status instead.
200
201 Example:
202
203 postgres@db$./check_postgres.pl --action=version --warning=8.1 --datadir /var/lib/postgresql/8.3/main/ --assume-standby-mode
204 POSTGRES_VERSION OK: Server in standby mode | time=0.00
205
206 --assume-prod
If specified, the script checks whether the server is in production
mode (--datadir is required). The option is only relevant for the
checkpoint action ("symlink: check_postgres_checkpoint").
210
211 Example:
212
213 postgres@db$./check_postgres.pl --action=checkpoint --datadir /var/lib/postgresql/8.3/main/ --assume-prod
214 POSTGRES_CHECKPOINT OK: Last checkpoint was 72 seconds ago | age=72;;300 mode=MASTER
215
216 --assume-async
217 If specified, indicates that any replication between servers is
asynchronous. The option is only relevant for the same_schema action
("symlink: check_postgres_same_schema").
220
221 Example:

postgres@db$./check_postgres.pl --action=same_schema --assume-async --dbhost=star,line
224
225 -h or --help
226 Displays a help screen with a summary of all actions and options.
227
228 --man
229 Displays the entire manual.
230
231 -V or --version
232 Shows the current version.
233
234 -v or --verbose
235 Set the verbosity level. Can call more than once to boost the
236 level. Setting it to three or higher (in other words, issuing "-v
237 -v -v") turns on debugging information for this program which is
238 sent to stderr.
239
240 --showperf=VAL
241 Determines if we output additional performance data in standard
242 Nagios format (at end of string, after a pipe symbol, using
243 name=value). VAL should be 0 or 1. The default is 1. Only takes
244 effect if using Nagios output mode.
245
246 --perflimit=i
247 Sets a limit as to how many items of interest are reported back
248 when using the showperf option. This only has an effect for actions
249 that return a large number of items, such as table_size. The
250 default is 0, or no limit. Be careful when using this with the
251 --include or --exclude options, as those restrictions are done
252 after the query has been run, and thus your limit may not include
253 the items you want. Only takes effect if using Nagios output mode.
254
255 --showtime=VAL
256 Determines if the time taken to run each query is shown in the
257 output. VAL should be 0 or 1. The default is 1. No effect unless
258 showperf is on. Only takes effect if using Nagios output mode.
259
260 --test
261 Enables test mode. See the "TEST MODE" section below.
262
263 --PGBINDIR=PATH
Tells the script where to find the psql binaries. Useful if you
have more than one version of the PostgreSQL executables on your
system, or if they are not in your path. Note that this option is
in all uppercase. By default, this option is not allowed. To enable
it, you must change the $NO_PSQL_OPTION near the top of the script
to 0. Avoid using this option if you can, and instead use the
environment variable PGBINDIR or the hard-coded $PGBINDIR variable,
also near the top of the script, to set the path to the PostgreSQL
binaries to use.
273
274 --PSQL=PATH
275 (deprecated, this option may be removed in a future release!)
276 Tells the script where to find the psql program. Useful if you have
277 more than one version of the psql executable on your system, or if
278 there is no psql program in your path. Note that this option is in
279 all uppercase. By default, this option is not allowed. To enable
280 it, you must change the $NO_PSQL_OPTION near the top of the script
281 to 0. Avoid using this option if you can, and instead hard-code
282 your psql location into the $PSQL variable, also near the top of
283 the script.
284
285 --symlinks
286 Creates symlinks to the main program for each action.
287
288 --output=VAL
289 Determines the format of the output, for use in various programs.
290 The default is 'nagios'. Available options are 'nagios', 'mrtg',
291 'simple' and 'cacti'.
292
293 --mrtg=VAL
294 Used only for the MRTG or simple output, for a few specific
295 actions.
296
297 --debugoutput=VAL
298 Outputs the exact string returned by psql, for use in debugging.
299 The value is one or more letters, which determine if the output is
300 displayed or not, where 'a' = all, 'c' = critical, 'w' = warning,
301 'o' = ok, and 'u' = unknown. Letters can be combined.
302
303 --get_method=VAL
304 Allows specification of the method used to fetch information for
305 the "new_version_cp", "new_version_pg", "new_version_bc",
306 "new_version_box", and "new_version_tnm" checks. The following
307 programs are tried, in order, to grab the information from the web:
308 GET, wget, fetch, curl, lynx, links. To force the use of just one
309 (and thus remove the overhead of trying all the others until one of
310 those works), enter one of the names as the argument to get_method.
311 For example, a BSD box might enter the following line in their
312 ".check_postgresrc" file:
313
314 get_method=fetch
315
316 --language=VAL
317 Set the language to use for all output messages. Normally, this is
318 detected by examining the environment variables LC_ALL,
319 LC_MESSAGES, and LANG, but setting this option will override any
320 such detection.
321
323 The action to be run is selected using the --action flag, or by using a
324 symlink to the main file that contains the name of the action inside of
325 it. For example, to run the action "timesync", you may either issue:
326
327 check_postgres.pl --action=timesync
328
329 or use a program named:
330
331 check_postgres_timesync
332
All the symlinks are created for you in the current directory if you use
the option --symlinks:
335
336 perl check_postgres.pl --symlinks
337
338 If the file name already exists, it will not be overwritten. If the
339 file exists and is a symlink, you can force it to overwrite by using
340 "--action=build_symlinks_force".
341
342 Most actions take a --warning and a --critical option, indicating at
343 what point we change from OK to WARNING, and what point we go to
344 CRITICAL. Note that because criticals are always checked first, setting
345 the warning equal to the critical is an effective way to turn warnings
346 off and always give a critical.
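
The effect of checking criticals first can be sketched in a few lines
of shell. The function below is a hypothetical stand-in, not the
script's own code, and it assumes a "greater than or equal" comparison,
which fits count-style actions but not every action.

```shell
# Classify a measured value against warning and critical thresholds.
# The critical threshold is tested first, as the text describes.
# $1 = value, $2 = warning, $3 = critical (all integers)
threshold_status() {
    value=$1 warning=$2 critical=$3
    if [ "$value" -ge "$critical" ]; then
        echo "CRITICAL"
    elif [ "$value" -ge "$warning" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

threshold_status 25 30 30   # below both thresholds
threshold_status 30 30 30   # warning equals critical: WARNING is never reached
```

Because the critical branch is tried first, equal thresholds leave no
value for which the warning branch can be selected.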
347
348 The current supported actions are:
349
350 archive_ready
351 ("symlink: check_postgres_archive_ready") Checks how many WAL files
352 with extension .ready exist in the pg_xlog/archive_status directory
353 (PostgreSQL 10 and later: pg_wal/archive_status), which is found off of
354 your data_directory. If the --lsfunc option is not used then this
355 action must be run as a superuser, in order to access the contents of
356 the pg_xlog/archive_status directory. The minimum version to use this
357 action is Postgres 8.1. The --warning and --critical options are simply
358 the number of .ready files in the pg_xlog/archive_status directory.
Usually, these values should be low: once the archive mechanism is
turned on, we want it to archive WAL files as fast as possible.

If the archive command fails, the number of WAL files in your pg_xlog
directory will grow until all the disk space is exhausted, forcing
PostgreSQL to stop immediately.
365
366 To avoid connecting as a database superuser, a wrapper function around
367 "pg_ls_dir()" should be defined as a superuser with SECURITY DEFINER,
368 and the --lsfunc option used. This example function, if defined by a
superuser, will allow the script to connect as a normal user nagios
with --lsfunc=ls_archive_status_dir:
371
372 BEGIN;
373 CREATE FUNCTION ls_archive_status_dir()
374 RETURNS SETOF TEXT
375 AS $$ SELECT pg_ls_dir('pg_xlog/archive_status') $$
376 LANGUAGE SQL
377 SECURITY DEFINER;
378 REVOKE ALL ON FUNCTION ls_archive_status_dir() FROM PUBLIC;
379 GRANT EXECUTE ON FUNCTION ls_archive_status_dir() to nagios;
380 COMMIT;
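
Since the directory is named pg_wal/archive_status on PostgreSQL 10 and
later, the wrapper function needs the matching path. The shell sketch
below prints the same SQL with the directory chosen from a major
version number; the helper name archive_status_sql is hypothetical, and
the nagios role is the one used in the example above.

```shell
# Print the wrapper-function SQL for a given PostgreSQL version.
# Versions 10 and later keep WAL under pg_wal instead of pg_xlog.
archive_status_sql() {
    major=${1%%.*}
    if [ "$major" -ge 10 ]; then
        wal_dir="pg_wal"
    else
        wal_dir="pg_xlog"
    fi
    cat <<EOF
BEGIN;
CREATE FUNCTION ls_archive_status_dir()
    RETURNS SETOF TEXT
    AS \$\$ SELECT pg_ls_dir('$wal_dir/archive_status') \$\$
    LANGUAGE SQL
    SECURITY DEFINER;
REVOKE ALL ON FUNCTION ls_archive_status_dir() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION ls_archive_status_dir() TO nagios;
COMMIT;
EOF
}

archive_status_sql 12
```

The printed statements would then be run by a superuser, for example by
piping them into psql.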
381
382 Example 1: Check that the number of ready WAL files is 10 or less on
383 host "pluto", using a wrapper function "ls_archive_status_dir" to avoid
384 the need for superuser permissions
385
386 check_postgres_archive_ready --host=pluto --critical=10 --lsfunc=ls_archive_status_dir
387
388 For MRTG output, reports the number of ready WAL files on line 1.
389
390 autovac_freeze
391 ("symlink: check_postgres_autovac_freeze") Checks how close each
392 database is to the Postgres autovacuum_freeze_max_age setting. This
393 action will only work for databases version 8.2 or higher. The
394 --warning and --critical options should be expressed as percentages.
395 The 'age' of the transactions in each database is compared to the
396 autovacuum_freeze_max_age setting (200 million by default) to generate
397 a rounded percentage. The default values are 90% for the warning and
398 95% for the critical. Databases can be filtered by use of the --include
399 and --exclude options. See the "BASIC FILTERING" section for more
400 details.
401
402 Example 1: Give a warning when any databases on port 5432 are above 97%
403
404 check_postgres_autovac_freeze --port=5432 --warning="97%"
405
406 For MRTG output, the highest overall percentage is reported on the
407 first line, and the highest age is reported on the second line. All
408 databases which have the percentage from the first line are reported on
409 the fourth line, separated by a pipe symbol.
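
For the autovac_freeze action, the percentage comparison can be
sketched as below. This is an illustration only: the helper name is
hypothetical, the age is assumed to be the value that
age(datfrozenxid) reports for a database, and rounding half up is an
assumption about how the "rounded percentage" is formed.

```shell
# Rounded percentage of autovacuum_freeze_max_age that a database has used.
# $1 = transaction age of the database, $2 = autovacuum_freeze_max_age
freeze_percent() {
    echo $(( ($1 * 100 + $2 / 2) / $2 ))
}

freeze_percent 180000000 200000000   # age of 180 million, default setting
freeze_percent 195000000 200000000   # age of 195 million, default setting
```

With the default thresholds, the first value sits exactly at the 90%
warning level and the second is above the 95% critical level.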
410
411 backends
412 ("symlink: check_postgres_backends") Checks the current number of
413 connections for one or more databases, and optionally compares it to
414 the maximum allowed, which is determined by the Postgres configuration
415 variable max_connections. The --warning and --critical options can take
416 one of three forms. First, a simple number can be given, which
417 represents the number of connections at which the alert will be given.
418 This choice does not use the max_connections setting. Second, the
419 percentage of available connections can be given. Third, a negative
420 number can be given which represents the number of connections left
421 until max_connections is reached. The default values for --warning and
422 --critical are '90%' and '95%'. You can also filter the databases by
423 use of the --include and --exclude options. See the "BASIC FILTERING"
424 section for more details.
425
426 To view only non-idle processes, you can use the --noidle argument.
427 Note that the user you are connecting as must be a superuser for this
428 to work properly.
429
430 Example 1: Give a warning when the number of connections on host quirm
431 reaches 120, and a critical if it reaches 150.
432
433 check_postgres_backends --host=quirm --warning=120 --critical=150
434
435 Example 2: Give a critical when we reach 75% of our max_connections
436 setting on hosts lancre or lancre2.
437
438 check_postgres_backends --warning='75%' --critical='75%' --host=lancre,lancre2
439
440 Example 3: Give a warning when there are only 10 more connection slots
441 left on host plasmid, and a critical when we have only 5 left.
442
443 check_postgres_backends --warning=-10 --critical=-5 --host=plasmid
444
445 Example 4: Check all databases except those with "test" in their name,
but allow ones that are named "pg_greatest". Connect on port 5432 on
the first two hosts, and on port 5433 on the third one. We want to
448 always throw a critical when we reach 30 or more connections.
449
450 check_postgres_backends --dbhost=hong,kong --dbhost=fooey --dbport=5432 --dbport=5433 --warning=30 --critical=30 --exclude="~test" --include="pg_greatest,~prod"
451
452 For MRTG output, the number of connections is reported on the first
line, and the fourth line gives the name of the database, plus the
current max_connections. If more than one database has been
455 queried, the one with the highest number of connections is output.
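
The three threshold forms accepted by the backends action can be
reduced to a single connection count, as in this shell sketch. The
helper name backends_limit is hypothetical, and truncating the
percentage to a whole number is an assumption of the sketch rather than
documented behavior.

```shell
# Turn a backends threshold into the connection count at which it fires.
# $1 = threshold (number, percentage, or negative number)
# $2 = max_connections
backends_limit() {
    case "$1" in
        *%) echo $(( $2 * ${1%\%} / 100 )) ;;   # percentage of max_connections
        -*) echo $(( $2 + $1 )) ;;              # slots left before the maximum
        *)  echo "$1" ;;                        # plain number, used as is
    esac
}

backends_limit 120 200
backends_limit '75%' 200
backends_limit -10 200
```

With max_connections set to 200, the three calls correspond to the
thresholds used in Examples 1 to 3 above.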
456
457 bloat
458 ("symlink: check_postgres_bloat") Checks the amount of bloat in tables
459 and indexes. (Bloat is generally the amount of dead unused space taken
460 up in a table or index. This space is usually reclaimed by use of the
461 VACUUM command.) This action requires that stats collection be enabled
462 on the target databases, and requires that ANALYZE is run frequently.
463 The --include and --exclude options can be used to filter out which
464 tables to look at. See the "BASIC FILTERING" section for more details.
465
466 The --warning and --critical options can be specified as sizes,
467 percents, or both. Valid size units are bytes, kilobytes, megabytes,
468 gigabytes, terabytes, exabytes, petabytes, and zettabytes. You can
469 abbreviate all of those with the first letter. Items without units are
470 assumed to be 'bytes'. The default values are '1 GB' and '5 GB'. The
471 value represents the number of "wasted bytes", or the difference
472 between what is actually used by the table and index, and what we
473 compute that it should be.
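
To make the size syntax concrete, the following shell sketch converts
a size such as '100 M' or '10 MB' into bytes. The helper name is
hypothetical, and the use of multiples of 1024 is an assumption of the
sketch, not something this manual states.

```shell
# Convert a size with an optional unit into bytes.
# Only the first letter of the unit matters; no unit means bytes.
size_to_bytes() {
    printf '%s\n' "$1" | awk '{
        s = tolower($0); gsub(/[ \t]/, "", s)
        num = s; sub(/[a-z]+$/, "", num)          # strip the unit
        unit = substr(s, length(num) + 1, 1)      # first letter of the unit
        if (unit == "") unit = "b"
        power = index("bkmgtpez", unit) - 1       # b=0, k=1, m=2, ...
        if (num !~ /^[0-9]+(\.[0-9]+)?$/ || power < 0) exit 1
        printf "%.0f\n", num * 1024 ^ power
    }'
}

size_to_bytes '100 M'
size_to_bytes '10 MB'
size_to_bytes '5 GB'
```

An unrecognized unit or a non-numeric value makes the function return
a non-zero status and print nothing.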
474
475 Note that this action has two hard-coded values to avoid false alarms
476 on smaller relations. Tables must have at least 10 pages, and indexes
477 at least 15, before they can be considered by this test. If you really
478 want to adjust these values, you can look for the variables $MINPAGES
479 and $MINIPAGES at the top of the "check_bloat" subroutine. These values
480 are ignored if either --exclude or --include is used.
481
482 Only the top 10 most bloated relations are shown. You can change this
483 number by using the --perflimit option to set your own limit.
484
485 The schema named 'information_schema' is excluded from this test, as
486 the only tables it contains are small and do not change.
487
488 Please note that the values computed by this action are not precise,
489 and should be used as a guideline only. Great effort was made to
490 estimate the correct size of a table, but in the end it is only an
491 estimate. The correct index size is even more of a guess than the
492 correct table size, but both should give a rough idea of how bloated
493 things are.
494
495 Example 1: Warn if any table on port 5432 is over 100 MB bloated, and
496 critical if over 200 MB
497
498 check_postgres_bloat --port=5432 --warning='100 M' --critical='200 M'
499
500 Example 2: Give a critical if table 'orders' on host 'sami' has more
501 than 10 megs of bloat
502
503 check_postgres_bloat --host=sami --include=orders --critical='10 MB'
504
505 Example 3: Give a critical if table 'q4' on database 'sales' is over
506 50% bloated
507
508 check_postgres_bloat --db=sales --include=q4 --critical='50%'
509
Example 4: Give a critical if any table is over 20% bloated and has over
150 MB of bloat:
512
513 check_postgres_bloat --port=5432 --critical='20% and 150 M'
514
Example 5: Give a warning if any table is over 40% bloated or has over
500 MB of bloat:
517
518 check_postgres_bloat --port=5432 --warning='500 M or 40%'
519
520 For MRTG output, the first line gives the highest number of wasted
521 bytes for the tables, and the second line gives the highest number of
522 wasted bytes for the indexes. The fourth line gives the database name,
523 table name, and index name information. If you want to output the bloat
524 ratio instead (how many times larger the relation is compared to how
525 large it should be), just pass in "--mrtg=ratio".
526
527 checkpoint
528 ("symlink: check_postgres_checkpoint") Determines how long since the
529 last checkpoint has been run. This must run on the same server as the
530 database that is being checked (e.g. the -h flag will not work). This
531 check is meant to run on a "warm standby" server that is actively
532 processing shipped WAL files, and is meant to check that your warm
533 standby is truly 'warm'. The data directory must be set, either by the
534 environment variable "PGDATA", or passing the "--datadir" argument. It
535 returns the number of seconds since the last checkpoint was run, as
536 determined by parsing the call to "pg_controldata". Because of this,
537 the pg_controldata executable must be available in the current path.
538 Alternatively, you can specify "PGBINDIR" as the directory that it
lives in. It is also possible to use the special options --assume-prod
or --assume-standby-mode: if the mode found is not the one expected, a
CRITICAL is emitted.
542
543 At least one warning or critical argument must be set.
544
545 This action requires the Date::Parse module.
546
547 For MRTG or simple output, returns the number of seconds.
548
549 cluster_id
("symlink: check_postgres_cluster_id") Checks that the Database System
551 Identifier provided by pg_controldata is the same as last time you
552 checked. This must run on the same server as the database that is being
553 checked (e.g. the -h flag will not work). Either the --warning or the
554 --critical option should be given, but not both. The value of each one
555 is the cluster identifier, an integer value. You can run with the
556 special "--critical=0" option to find out an existing cluster
557 identifier.
558
559 Example 1: Find the initial identifier
560
check_postgres_cluster_id --critical=0 --datadir=/var/lib/postgresql/9.0/main
562
563 Example 2: Make sure the cluster is the same and warn if not, using the
564 result from above.
565
566 check_postgres_cluster_id --critical=5633695740047915135
567
For MRTG output, returns a 1 or 0 indicating success or failure of the
identifier to match. An identifier must be provided as the "--mrtg"
570 argument. The fourth line always gives the current identifier.
571
572 commitratio
573 ("symlink: check_postgres_commitratio") Checks the commit ratio of all
574 databases and complains when they are too low. There is no need to run
575 this command more than once per database cluster. Databases can be
576 filtered with the --include and --exclude options. See the "BASIC
577 FILTERING" section for more details. They can also be filtered by the
578 owner of the database with the --includeuser and --excludeuser options.
579 See the "USER NAME FILTERING" section for more details.
580
581 The warning and critical options should be specified as percentages.
There are no defaults for this action: the warning and critical must
583 be specified. The warning value cannot be greater than the critical
584 value. The output returns all databases sorted by commitratio, smallest
585 first.
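
This manual does not spell out how the ratio is computed. The sketch
below assumes it is the share of commits among all finished
transactions, taken from the xact_commit and xact_rollback columns of
pg_stat_database. The helper names are hypothetical, and treating a
database with no transactions as 0.00 is a further assumption.

```shell
# Commit ratio as a percentage with two decimals.
# $1 = commits, $2 = rollbacks
commit_ratio() {
    awk -v c="$1" -v r="$2" 'BEGIN {
        if (c + r == 0) { print "0.00"; exit }   # no transactions yet
        printf "%.2f\n", 100 * c / (c + r)
    }'
}

# A low ratio is the bad case, so the value is compared with "less than".
# $1 = ratio, $2 = warning percentage, $3 = critical percentage
commitratio_status() {
    awk -v v="$1" -v w="${2%\%}" -v c="${3%\%}" 'BEGIN {
        if (v + 0 < c + 0)      print "CRITICAL"
        else if (v + 0 < w + 0) print "WARNING"
        else                    print "OK"
    }'
}

ratio=$(commit_ratio 850 150)
commitratio_status "$ratio" '90%' '80%'
```

Here 850 commits and 150 rollbacks give a ratio of 85.00, which falls
between the two thresholds.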
586
Example: Warn if any database on host flagg is less than 90% in
commitratio, and critical if less than 80%.

check_postgres_commitratio --host=flagg --warning='90%' --critical='80%'
591
592 For MRTG output, returns the percentage of the database with the
593 smallest commitratio on the first line, and the name of the database on
594 the fourth line.
595
596 connection
597 ("symlink: check_postgres_connection") Simply connects, issues a
598 'SELECT version()', and leaves. Takes no --warning or --critical
599 options.
600
601 For MRTG output, simply outputs a 1 (good connection) or a 0 (bad
602 connection) on the first line.
603
604 custom_query
605 ("symlink: check_postgres_custom_query") Runs a custom query of your
606 choosing, and parses the results. The query itself is passed in
607 through the "query" argument, and should be kept as simple as possible.
608 If at all possible, wrap it in a view or a function to keep things
609 easier to manage. The query should return one or two columns. It is
610 required that one of the columns be named "result" and is the item that
611 will be checked against your warning and critical values. The second
612 column is for the performance data and any name can be used: this will
613 be the 'value' inside the performance data section.
614
615 At least one warning or critical argument must be specified. What these
616 are set to depends on the type of query you are running. There are four
617 types of custom_queries that can be run, specified by the "valtype"
618 argument. If none is specified, this action defaults to 'integer'. The
619 four types are:
620
integer: Does a simple integer comparison. The first column should be a
simple integer, and the warning and critical values should be integers as well.
623
624 string: The warning and critical are strings, and are triggered only if
625 the value in the first column matches it exactly. This is case-
626 sensitive.
627
628 time: The warning and the critical are times, and can have units of
629 seconds, minutes, hours, or days. Each may be written singular or
630 abbreviated to just the first letter. If no units are given, seconds
631 are assumed. The first column should be an integer representing the
632 number of seconds to check.
633
634 size: The warning and the critical are sizes, and can have units of
635 bytes, kilobytes, megabytes, gigabytes, terabytes, or exabytes. Each
636 may be abbreviated to the first letter. If no units are given, bytes
637 are assumed. The first column should be an integer representing the
638 number of bytes to check.
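
As an illustration of the "time" type, this shell sketch converts a
threshold such as '2 hours' or '5m' into seconds, following the unit
rules given above. The helper name time_to_seconds is hypothetical and
the sketch accepts whole numbers only.

```shell
# Convert a time threshold into seconds.
# Only the first letter of the unit matters; no unit means seconds.
time_to_seconds() {
    printf '%s\n' "$1" | awk '{
        s = tolower($0); gsub(/[ \t]/, "", s)
        num = s; sub(/[a-z]+$/, "", num)          # strip the unit
        unit = substr(s, length(num) + 1, 1)      # first letter of the unit
        if (unit == "") unit = "s"
        if (num !~ /^[0-9]+$/) exit 1
        if (unit == "s")      mult = 1
        else if (unit == "m") mult = 60
        else if (unit == "h") mult = 3600
        else if (unit == "d") mult = 86400
        else exit 1
        print num * mult
    }'
}

time_to_seconds '2 hours'
time_to_seconds '5m'
time_to_seconds '30'
```

The result is directly comparable with the integer number of seconds
that the query is expected to return in its "result" column.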
639
640 Normally, an alert is triggered if the values returned are greater than
641 or equal to the critical or warning value. However, an option of
642 --reverse will trigger the alert if the returned value is lower than or
643 equal to the critical or warning value.
644
645 Example 1: Warn if any relation over 100 pages is named "rad", put the
646 number of pages inside the performance data section.
647
check_postgres_custom_query --valtype=string -w "rad" --query="SELECT relname AS result, relpages AS pages FROM pg_class WHERE relpages > 100"
650
651 Example 2: Give a critical if the "foobar" function returns a number
652 over 5MB:
653
check_postgres_custom_query --critical='5MB' --valtype=size --query="SELECT foobar() AS result"
655
Example 3: Warn if the function "snazzo" returns less than 42:
657
658 check_postgres_custom_query --critical=42 --query="SELECT snazzo() AS result" --reverse
659
660 If you come up with a useful custom_query, consider sending in a patch
661 to this program to make it into a standard action that other people can
662 use.
663
664 This action does not support MRTG or simple output yet.
665
666 database_size
667 ("symlink: check_postgres_database_size") Checks the size of all
668 databases and complains when they are too big. There is no need to run
669 this command more than once per database cluster. Databases can be
670 filtered with the --include and --exclude options. See the "BASIC
671 FILTERING" section for more details. They can also be filtered by the
672 owner of the database with the --includeuser and --excludeuser options.
673 See the "USER NAME FILTERING" section for more details.
674
675 The warning and critical options can be specified as bytes, kilobytes,
676 megabytes, gigabytes, terabytes, or exabytes. Each may be abbreviated
677 to the first letter as well. If no unit is given, the units are
assumed to be bytes. There are no defaults for this action: the
679 warning and critical must be specified. The warning value cannot be
680 greater than the critical value. The output returns all databases
681 sorted by size largest first, showing both raw bytes and a "pretty"
682 version of the size.
683
684 Example 1: Warn if any database on host flagg is over 1 TB in size, and
685 critical if over 1.1 TB.
686
687 check_postgres_database_size --host=flagg --warning='1 TB' --critical='1.1 t'
688
689 Example 2: Give a critical if the database template1 on port 5432 is
690 over 10 MB.
691
692 check_postgres_database_size --port=5432 --include=template1 --warning='10MB' --critical='10MB'
693
694 Example 3: Give a warning if any database on host 'tardis' owned by the
695 user 'tom' is over 5 GB
696
697 check_postgres_database_size --host=tardis --includeuser=tom --warning='5 GB' --critical='10 GB'
698
699 For MRTG output, returns the size in bytes of the largest database on
700 the first line, and the name of the database on the fourth line.
701
702 dbstats
703 ("symlink: check_postgres_dbstats") Reports information from the
704 pg_stat_database view, and outputs it in a Cacti-friendly manner. No
705 other output is supported, as the output is informational and does not
706 lend itself to alerts, such as used with Nagios. If no options are
707 given, all databases are returned, one per line. You can include a
708 specific database by use of the "--include" option, or you can use the
709 "--dbname" option.
710
711 Eleven items are returned on each line, in the format name:value,
712 separated by a single space. The items are:
713
714 backends
715 The number of currently running backends for this database.
716
717 commits
718 The total number of commits for this database since it was created
719 or reset.
720
721 rollbacks
722 The total number of rollbacks for this database since it was
723 created or reset.
724
725 read
726 The total number of disk blocks read.
727
728 hit The total number of buffer hits.
729
730 ret The total number of rows returned.
731
732 fetch
733 The total number of rows fetched.
734
735 ins The total number of rows inserted.
736
737 upd The total number of rows updated.
738
739 del The total number of rows deleted.
740
741 dbname
742 The name of the database.
743
744 Note that ret, fetch, ins, upd, and del items will always be 0 if
745 Postgres is version 8.2 or lower, as those stats were not available in
746 those versions.
747
748 If the dbname argument is given, seven additional items are returned:
749
750 idxscan
751 Total number of user index scans.
752
753 idxtupread
754 Total number of user index entries returned.
755
756 idxtupfetch
757 Total number of rows fetched by simple user index scans.
758
759 idxblksread
760 Total number of disk blocks read for all user indexes.
761
762 idxblkshit
763 Total number of buffer hits for all user indexes.
764
765 seqscan
766 Total number of sequential scans against all user tables.
767
768 seqtupread
769 Total number of tuples returned from all user tables.
770
771 Example 1: Grab the stats for a database named "products" on host
772 "willow":
773
774 check_postgres_dbstats --dbhost willow --dbname products
775
776 The output returned will be like this (all on one line, not wrapped):
777
778 backends:82 commits:58374408 rollbacks:1651 read:268435543 hit:2920381758 idxscan:310931294 idxtupread:2777040927
779 idxtupfetch:1840241349 idxblksread:62860110 idxblkshit:1107812216 seqscan:5085305 seqtupread:5370500520
780 ret:0 fetch:0 ins:0 upd:0 del:0 dbname:products
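
Because every item has the form name:value and items are separated by a single space, the output is easy to consume from another program. A minimal Python sketch follows; the function name parse_dbstats is hypothetical and the sample line is shortened from the example above.

```python
def parse_dbstats(line):
    """Split one line of name:value pairs into a dictionary."""
    stats = {}
    for item in line.split():
        name, _, value = item.partition(":")
        # Every item except dbname is a counter, so convert those to int.
        stats[name] = value if name == "dbname" else int(value)
    return stats

sample = "backends:82 commits:58374408 rollbacks:1651 read:268435543 hit:2920381758 dbname:products"
stats = parse_dbstats(sample)
print(stats["backends"], stats["dbname"])
```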
781
782 disabled_triggers
783 ("symlink: check_postgres_disabled_triggers") Checks on the number of
784 disabled triggers inside the database. The --warning and --critical
785 options are the number of such triggers found, and both default to "1",
786 as in normal usage having disabled triggers is a dangerous event. If
787 the database being checked is 8.3 or higher, the check is for the
788 number of triggers that are in a 'disabled' status (as opposed to being
789 'always' or 'replica'). The output will show the name of the table and
790 the name of the trigger for each disabled trigger.
791
792 Example 1: Make sure that there are no disabled triggers
793
794 check_postgres_disabled_triggers
795
796 For MRTG output, returns the number of disabled triggers on the first
797 line.
798
799 disk_space
800 ("symlink: check_postgres_disk_space") Checks on the available physical
801 disk space used by Postgres. This action requires that you have the
802 executable "/bin/df" available to report on disk sizes, and it also
803 needs to be run as a superuser, so it can examine the data_directory
804 setting inside of Postgres. The --warning and --critical options are
805 given in either sizes or percentages or both. If using sizes, the
806 standard unit types are allowed: bytes, kilobytes, megabytes,
807 gigabytes, terabytes, or exabytes. Each may be abbreviated
808 to the first letter only; no units at all indicates 'bytes'. The
809 default values are '90%' and '95%'.
810
811 This command checks the following things to determine all of the
812 different physical disks being used by Postgres.
813
814 data_directory - The disk that the main data directory is on.
815
816 log directory - The disk that the log files are on.
817
818 WAL file directory - The disk that the write-ahead logs are on (e.g.
819 symlinked pg_xlog or pg_wal)
820
821 tablespaces - Each tablespace that is on a separate disk.
822
823 The output shows the total size used and available on each disk, as
824 well as the percentage, ordered by highest to lowest percentage used.
825 Each item above maps to a file system: these can be included or
826 excluded. See the "BASIC FILTERING" section for more details.
827
828 Example 1: Make sure that no file system is over 90% for the database
829 on port 5432.
830
831 check_postgres_disk_space --port=5432 --warning='90%' --critical='90%'
832
833 Example 2: Check that all file systems starting with /dev/sda are
834 smaller than 10 GB and 11 GB (warning and critical)
835
836 check_postgres_disk_space --port=5432 --warning='10 GB' --critical='11 GB' --include="~^/dev/sda"
837
838 Example 3: Make sure that no file system is both over 50% and has over
839 15 GB
840
841 check_postgres_disk_space --critical='50% and 15 GB'
842
843 Example 4: Issue a warning if any file system is either over 75% full
844 or has more than 1T
845
846 check_postgres_disk_space --warning='1T or 75'
847
848 For MRTG output, returns the size in bytes of the file system on the
849 first line, and the name of the file system on the fourth line.
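
The combined "and" / "or" thresholds shown in the examples above can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the script's own logic: the function names are hypothetical, sizes are assumed to use 1024-based multipliers, "over" is read as strictly greater than, and every percentage is assumed to carry an explicit % sign.

```python
import re

# Assumed multipliers: powers of 1024, keyed by the first letter of the unit.
UNITS = {"b": 0, "k": 1, "m": 2, "g": 3, "t": 4, "e": 6}

def term_met(term, used_bytes, used_percent):
    """Check one term, either a percentage ('50%') or a size ('15 GB')."""
    term = term.strip()
    if term.endswith("%"):
        return used_percent > float(term[:-1])
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([a-zA-Z]*)", term)
    if not match:
        raise ValueError(f"unrecognized threshold term: {term!r}")
    number, unit = match.groups()
    power = UNITS[unit[0].lower()] if unit else 0
    return used_bytes > float(number) * 1024 ** power

def threshold_met(spec, used_bytes, used_percent):
    """Evaluate 'X and Y' (both terms) or 'X or Y' (either term)."""
    if " and " in spec:
        return all(term_met(t, used_bytes, used_percent) for t in spec.split(" and "))
    return any(term_met(t, used_bytes, used_percent) for t in spec.split(" or "))

# A file system with 20 GB used at 40% full does not trip '50% and 15 GB'.
print(threshold_met("50% and 15 GB", 20 * 1024 ** 3, 40))
```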
850
851 fsm_pages
852 ("symlink: check_postgres_fsm_pages") Checks how close a cluster is to
853 the Postgres max_fsm_pages setting. This action will only work for
854 databases of 8.2 or higher, and it requires the contrib module
855 pg_freespacemap be installed. The --warning and --critical options
856 should be expressed as percentages. The number of used pages in the
857 free-space-map is determined by looking in the
858 pg_freespacemap_relations view, and running a formula based on the
859 formula used for outputting free-space-map pageslots in the vacuum
860 verbose command. The default values are 85% for the warning and 95% for
861 the critical.
862
863 Example 1: Give a warning when our cluster has used up 76% of the free-
864 space pageslots, with pg_freespacemap installed in database robert
865
866 check_postgres_fsm_pages --dbname=robert --warning="76%"
867
868 While you need to pass in the name of the database where
869 pg_freespacemap is installed, you only need to run this check once per
870 cluster. Also, checking this information does require obtaining special
871 locks on the free-space-map, so it is recommended that you do not run this
872 check with short intervals.
873
874 For MRTG output, returns the percent of free-space-map on the first
875 line, and the number of pages currently used on the second line.
876
877 fsm_relations
878 ("symlink: check_postgres_fsm_relations") Checks how close a cluster is
879 to the Postgres max_fsm_relations setting. This action will only work
880 for databases of 8.2 or higher, and it requires the contrib module
881 pg_freespacemap be installed. The --warning and --critical options
882 should be expressed as percentages. The number of used relations in the
883 free-space-map is determined by looking in the
884 pg_freespacemap_relations view. The default values are 85% for the
885 warning and 95% for the critical.
886
887 Example 1: Give a warning when our cluster has used up 75% of the
888 free-space relations, with pg_freespacemap installed in database dylan
889
890 check_postgres_fsm_relations --dbname=dylan --warning="75%"
891
892 While you need to pass in the name of the database where
893 pg_freespacemap is installed, you only need to run this check once per
894 cluster. Also, checking this information does require obtaining special
895 locks on the free-space-map, so it is recommended that you do not run this
896 check with short intervals.
897
898 For MRTG output, returns the percent of free-space-map on the first
899 line, and the number of relations currently used on the second line.
900
901 hitratio
902 ("symlink: check_postgres_hitratio") Checks the hit ratio of all
903 databases and complains when they are too low. There is no need to run
904 this command more than once per database cluster. Databases can be
905 filtered with the --include and --exclude options. See the "BASIC
906 FILTERING" section for more details. They can also be filtered by the
907 owner of the database with the --includeuser and --excludeuser options.
908 See the "USER NAME FILTERING" section for more details.
909
910 The warning and critical options should be specified as percentages.
911 There are no defaults for this action: the warning and critical must
912 be specified. The warning value cannot be lower than the critical
913 value. The output returns all databases sorted by hitratio, smallest
914 first.
915
916 Example: Warn if any database on host flagg is less than 90% in
917 hitratio, and critical if less than 80%.
918
919 check_postgres_hitratio --host=flagg --warning='90%' --critical='80%'
920
921 For MRTG output, returns the percentage of the database with the
922 smallest hitratio on the first line, and the name of the database on
923 the fourth line.
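
As a rough illustration of how such a check behaves, the Python sketch below computes a hit ratio and compares it against the thresholds. The formula (buffer hits divided by hits plus disk block reads) is an assumption, since the text above does not spell it out, and the function names are hypothetical.

```python
def hit_ratio(hit, read):
    """Percentage of block requests served from the buffer cache."""
    total = hit + read
    # Assumption of this sketch: no recorded activity counts as 100%.
    return 100.0 if total == 0 else 100.0 * hit / total

def status(ratio, warning, critical):
    """Lower is worse, so the critical threshold is checked first."""
    if ratio < critical:
        return "CRITICAL"
    if ratio < warning:
        return "WARNING"
    return "OK"

# Counter values borrowed from the dbstats example elsewhere in this document.
ratio = hit_ratio(hit=2920381758, read=268435543)
print(f"{ratio:.2f}% {status(ratio, warning=90, critical=80)}")
```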
924
925 hot_standby_delay
926 ("symlink: check_hot_standby_delay") Checks the streaming replication
927 lag by computing the delta between the current xlog position of a
928 master server and the replay location of a slave connected to it. The
929 slave server must be in hot_standby (e.g. read only) mode, therefore
930 the minimum version to use this action is Postgres 9.0. The --warning
931 and --critical options are the delta between the xlog locations. Since
932 these values are byte offsets in the WAL they should match the expected
933 transaction volume of your application to prevent false positives or
934 negatives.
935
936 The first "--dbname", "--host", and "--port", etc. options are
937 considered the master; the second belongs to the slave.
938
939 Byte values should be based on the volume of transactions needed to
940 have the streaming replication disconnect from the master because of
941 too much lag, determined by the Postgres configuration variable
942 wal_keep_segments. For units of time, valid units are 'seconds',
943 'minutes', 'hours', or 'days'. Each may be written singular or
944 abbreviated to just the first letter. When specifying both, in the form
945 'bytes and time', both conditions must be true for the threshold to be
946 met.
947
948 You must provide information on how to reach the databases by providing
949 a comma separated list to the --dbhost and --dbport parameters, such as
950 "--dbport=5432,5543". If not given, the action fails.
951
952 Example 1: Warn if a database with a local replica on port 5433 is
953 behind on any xlog replay at all
954
955 check_hot_standby_delay --dbport=5432,5433 --warning='1'
956
957 Example 2: Give a critical if the last transaction replica1 received
958 was more than 10 minutes ago
959
960 check_hot_standby_delay --dbhost=master,replica1 --critical='10 min'
961
962 Example 3: Allow replica1 to be 1 WAL segment behind, if the master is
963 momentarily seeing more activity than the streaming replication
964 connection can handle, or 10 minutes behind, if the master is seeing
965 very little activity and not processing any transactions, but not both,
966 which would indicate a lasting problem with the replication connection.
967
968 check_hot_standby_delay --dbhost=master,replica1 --warning='1048576 and 2 min' --critical='16777216 and 10 min'
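
To make the byte arithmetic concrete, here is a hedged Python sketch of the delta computation and of the combined 'bytes and time' form. It assumes WAL locations in the hexadecimal "high/low" notation read as a single 64-bit offset, which holds for recent Postgres versions; the function names are hypothetical, and treating a threshold as met when the lag is equal to or above the limit is also an assumption.

```python
def lsn_to_bytes(lsn):
    """Convert a 'high/low' hexadecimal WAL location to a byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)

def threshold_met(lag_bytes, lag_seconds, max_bytes=None, max_seconds=None):
    """With both limits set, both must be reached (the 'bytes and time' form)."""
    checks = []
    if max_bytes is not None:
        checks.append(lag_bytes >= max_bytes)
    if max_seconds is not None:
        checks.append(lag_seconds >= max_seconds)
    return bool(checks) and all(checks)

# Master at 0/3000000, replica replayed up to 0/2000000: one 16 MB segment behind.
lag = lsn_to_bytes("0/3000000") - lsn_to_bytes("0/2000000")
print(lag)
print(threshold_met(lag, 30, max_bytes=16777216, max_seconds=600))
```

In this sketch the replica is a full segment behind but only 30 seconds stale, so the combined threshold is not met.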
969
970 relation_size
971 index_size
972 table_size
973 indexes_size
974 total_relation_size
975 (symlinks: "check_postgres_relation_size", "check_postgres_index_size",
976 "check_postgres_table_size", "check_postgres_indexes_size", and
977 "check_postgres_total_relation_size")
978
979 The actions relation_size and index_size check for a relation (table,
980 index, or materialized view) or an index, respectively, that has grown
981 too big, using the pg_relation_size() function.
982
983 The action table_size checks tables and materialized views using
984 pg_table_size(), i.e. including relation forks and TOAST table.
985
986 The action indexes_size checks tables and materialized views for the
987 size of the attached indexes using pg_indexes_size().
988
989 The action total_relation_size checks relations using
990 pg_total_relation_size(), i.e. including relation forks, indexes and
991 TOAST table.
992
993 Relations can be filtered with the --include and --exclude options. See
994 the "BASIC FILTERING" section for more details. Relations can also be
995 filtered by the user that owns them, by using the --includeuser and
996 --excludeuser options. See the "USER NAME FILTERING" section for more
997 details.
998
999 The values for the --warning and --critical options are file sizes, and
1000 may have units of bytes, kilobytes, megabytes, gigabytes, terabytes, or
1001 exabytes. Each can be abbreviated to the first letter. If no units are
1002 given, bytes are assumed. There are no default values: both the warning
1003 and the critical option must be given. The return text shows the size
1004 of the largest relation found.
1005
1006 If the --showperf option is enabled, all of the relations with their
1007 sizes will be given. To prevent this, it is recommended that you set
1008 the --perflimit option, which will cause the query to do an "ORDER BY
1009 size DESC LIMIT (perflimit)".
1010
1011 Example 1: Give a critical if any table is larger than 600MB on host
1012 burrick.
1013
1014 check_postgres_table_size --critical='600 MB' --warning='600 MB' --host=burrick
1015
1016 Example 2: Warn if the table products is over 4 GB in size, and give a
1017 critical at 4.5 GB.
1018
1019 check_postgres_table_size --host=burrick --warning='4 GB' --critical='4.5 GB' --include=products
1020
1021 Example 3: Warn if any index not owned by postgres goes over 500 MB.
1022
1023 check_postgres_index_size --port=5432 --excludeuser=postgres -w 500MB -c 600MB
1024
1025 For MRTG output, returns the size in bytes of the largest relation, and
1026 the name of the database and relation as the fourth line.
1027
1028 last_analyze
1029 last_vacuum
1030 last_autoanalyze
1031 last_autovacuum
1032 (symlinks: "check_postgres_last_analyze", "check_postgres_last_vacuum",
1033 "check_postgres_last_autoanalyze", and
1034 "check_postgres_last_autovacuum") Checks how long it has been since
1035 vacuum (or analyze) was last run on each table in one or more
1036 databases. Use of these actions requires that the target database is
1037 version 8.3 or greater, or that the version is 8.2 and the
1038 configuration variable stats_row_level has been enabled. Tables can be
1039 filtered with the --include and --exclude options. See the "BASIC
1040 FILTERING" section for more details. Tables can also be filtered by
1041 their owner by use of the --includeuser and --excludeuser options. See
1042 the "USER NAME FILTERING" section for more details.
1043
1044 The units for --warning and --critical are specified as times. Valid
1045 units are seconds, minutes, hours, and days; all can be abbreviated to
1046 the first letter. If no units are given, 'seconds' are assumed. The
1047 default values are '1 day' and '2 days'. Please note that there are
1048 cases in which this field does not get automatically populated. If
1049 certain tables are giving you problems, make sure that they have dead
1050 rows to vacuum, or just exclude them from the test.
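
The time units described above can be illustrated with a short Python sketch. The helper name time_to_seconds is hypothetical, and the sketch only shows how values such as '3d' or '2 days' map to seconds; it is not taken from the script itself.

```python
import re

# Only the first letter of the unit matters: seconds, minutes, hours, days.
SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def time_to_seconds(text):
    """Convert strings such as '3d', '1 day', or '90' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", text)
    if not match:
        raise ValueError(f"unrecognized time: {text!r}")
    number, unit = match.groups()
    if unit and unit[0].lower() not in SECONDS:
        raise ValueError(f"unrecognized unit: {unit!r}")
    # No unit means seconds.
    return int(number) * (SECONDS[unit[0].lower()] if unit else 1)

print(time_to_seconds("1 day"), time_to_seconds("2 days"))
```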
1051
1052 The schema named 'information_schema' is excluded from this test, as
1053 the only tables it contains are small and do not change.
1054
1055 Note that the non-'auto' versions also check on the auto versions.
1056 In other words, using last_vacuum will report on the last
1057 vacuum, whether it was a normal vacuum, or one run by the autovacuum
1058 daemon.
1059
1060 Example 1: Warn if any table has not been vacuumed in 3 days, and give
1061 a critical at a week, for host wormwood
1062
1063 check_postgres_last_vacuum --host=wormwood --warning='3d' --critical='7d'
1064
1065 Example 2: Same as above, but skip tables belonging to the users 'eve'
1066 or 'mallory'
1067
1068 check_postgres_last_vacuum --host=wormwood --warning='3d' --critical='7d' --excludeuser=eve,mallory
1069
1070 For MRTG output, returns (on the first line) the LEAST amount of time
1071 in seconds since a table was last vacuumed or analyzed. The fourth line
1072 returns the name of the database and name of the table.
1073
1074 listener
1075 ("symlink: check_postgres_listener") Confirm that someone is listening
1076 for one or more specific strings (using the LISTEN/NOTIFY system), by
1077 looking at the pg_listener table. Only one of warning or critical is
1078 needed. The format is a simple string representing the LISTEN target,
1079 or a tilde character followed by a string for a regular expression
1080 check. Note that this check will not work on versions of Postgres 9.0
1081 or higher.
1082
1083 Example 1: Give a warning if nobody is listening for the string
1084 bucardo_mcp_ping on ports 5555 and 5556
1085
1086 check_postgres_listener --port=5555,5556 --warning=bucardo_mcp_ping
1087
1088 Example 2: Give a critical if there are no active LISTEN requests
1089 matching 'grimm' on database oskar
1090
1091 check_postgres_listener --db oskar --critical=~grimm
1092
1093 For MRTG output, returns 1 or 0 on the first line, indicating success or
1094 failure. The name of the notice must be provided via the --mrtg option.
1095
1096 locks
1097 ("symlink: check_postgres_locks") Check the total number of locks on
1098 one or more databases. There is no need to run this more than once per
1099 database cluster. Databases can be filtered with the --include and
1100 --exclude options. See the "BASIC FILTERING" section for more details.
1101
1102 The --warning and --critical options can be specified as simple
1103 numbers, which represent the total number of locks, or they can be
1104 broken down by type of lock. Valid lock names are 'total', 'waiting',
1105 or the name of a lock type used by Postgres. These names are case-
1106 insensitive and do not need the "lock" part on the end, so exclusive
1107 will match 'ExclusiveLock'. The format is name=number, with different
1108 items separated by colons or semicolons (or any other symbol).
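
The threshold format described above can be sketched in Python as follows. The function name parse_lock_thresholds is hypothetical; the sketch simply shows the name=number pairs, the case-insensitive names, and the optional "lock" suffix.

```python
import re

def parse_lock_thresholds(spec):
    """Parse '200' or 'total=250:waiting=5:exclusive=20' into a dict."""
    if spec.strip().isdigit():
        # A simple number is the limit on the total number of locks.
        return {"total": int(spec)}
    limits = {}
    # Any separator between pairs is accepted, since only name=number is matched.
    for name, number in re.findall(r"([A-Za-z]+)\s*=\s*(\d+)", spec):
        name = name.lower()
        # 'ExclusiveLock' and 'exclusive' refer to the same lock type.
        if name.endswith("lock"):
            name = name[:-4]
        limits[name] = int(number)
    return limits

print(parse_lock_thresholds("total=250:waiting=5:exclusive=20"))
```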
1109
1110 Example 1: Warn if the number of locks is 100 or more, and critical if
1111 200 or more, on host garrett
1112
1113 check_postgres_locks --host=garrett --warning=100 --critical=200
1114
1115 Example 2: On the host artemus, warn if 200 or more locks exist, and
1116 give a critical if over 250 total locks exist, or if over 20 exclusive
1117 locks exist, or if over 5 connections are waiting for a lock.
1118
1119 check_postgres_locks --host=artemus --warning=200 --critical="total=250:waiting=5:exclusive=20"
1120
1121 For MRTG output, returns the number of locks on the first line, and the
1122 name of the database on the fourth line.
1123
1124 logfile
1125 ("symlink: check_postgres_logfile") Ensures that the logfile is in the
1126 expected location and is being logged to. This action issues a command
1127 that throws an error on each database it is checking, and ensures that
1128 the message shows up in the logs. It scans the various log_* settings
1129 inside of Postgres to figure out where the logs should be. If you are
1130 using syslog, it does a rough (but not foolproof) scan of
1131 /etc/syslog.conf. Alternatively, you can provide the name of the
1132 logfile with the --logfile option. This is especially useful if the
1133 logs have a custom rotation scheme driven by an external program. The
1134 --logfile option supports the following escape characters: "%Y %m %d
1135 %H", which represent the current year, month, date, and hour
1136 respectively. An error is always reported as critical unless the
1137 warning option has been passed in as a non-zero value. Other than that
1138 specific usage, the "--warning" and "--critical" options should not be
1139 used.
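
The effect of the escape characters can be pictured with a small Python sketch. The function name expand_logfile and the log path are hypothetical, and zero-padded values (as produced by strftime) are an assumption about how the script expands them.

```python
from datetime import datetime

def expand_logfile(template, now=None):
    """Replace %Y, %m, %d, and %H with the year, month, day, and hour."""
    now = now or datetime.now()
    for escape in ("%Y", "%m", "%d", "%H"):
        template = template.replace(escape, now.strftime(escape))
    return template

# Hypothetical rotation scheme with one file per hour.
print(expand_logfile("/var/log/pg/postgres-%Y-%m-%d_%H.log", datetime(2024, 3, 7, 9)))
```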
1140
1141 Example 1: On port 5432, ensure the logfile is being written to the
1142 file /home/greg/pg8.2.log
1143
1144 check_postgres_logfile --port=5432 --logfile=/home/greg/pg8.2.log
1145
1146 Example 2: Same as above, but raise a warning, not a critical
1147
1148 check_postgres_logfile --port=5432 --logfile=/home/greg/pg8.2.log -w 1
1149
1150 For MRTG output, returns a 1 or 0 on the first line, indicating success
1151 or failure. In case of a failure, the fourth line will provide more
1152 detail on the failure encountered.
1153
1154 new_version_bc
1155 ("symlink: check_postgres_new_version_bc") Checks if a newer version of
1156 the Bucardo program is available. The current version is obtained by
1157 running "bucardo_ctl --version". If a major upgrade is available, a
1158 warning is returned. If a revision upgrade is available, a critical is
1159 returned. (Bucardo is a master to slave, and master to master
1160 replication system for Postgres: see <https://bucardo.org/> for more
1161 information). See also the information on the "--get_method" option.
1162
1163 new_version_box
1164 ("symlink: check_postgres_new_version_box") Checks if a newer version
1165 of the boxinfo program is available. The current version is obtained by
1166 running "boxinfo.pl --version". If a major upgrade is available, a
1167 warning is returned. If a revision upgrade is available, a critical is
1168 returned. (boxinfo is a program for grabbing important information from
1169 a server and putting it into an HTML format: see
1170 <https://bucardo.org/Boxinfo/> for more information). See also the
1171 information on the "--get_method" option.
1172
1173 new_version_cp
1174 ("symlink: check_postgres_new_version_cp") Checks if a newer version of
1175 this program (check_postgres.pl) is available, by grabbing the version
1176 from a small text file on the project's home page. Returns a
1177 warning if the returned version does not match the
1178 one you are running. Recommended interval to check is once a day. See
1179 also the information on the "--get_method" option.
1180
1181 new_version_pg
1182 ("symlink: check_postgres_new_version_pg") Checks if a newer revision
1183 of Postgres exists for each database connected to. Note that this only
1184 checks for revision, e.g. going from 8.3.6 to 8.3.7. Revisions are
1185 always 100% binary compatible and involve no dump and restore to
1186 upgrade. Revisions are made to address bugs, so upgrading as soon as
1187 possible is always recommended. Returns a warning if you do not have
1188 the latest revision. It is recommended this check is run at least once
1189 a day. See also the information on the "--get_method" option.
1190
1191 new_version_tnm
1192 ("symlink: check_postgres_new_version_tnm") Checks if a newer version
1193 of the tail_n_mail program is available. The current version is
1194 obtained by running "tail_n_mail --version". If a major upgrade is
1195 available, a warning is returned. If a revision upgrade is available, a
1196 critical is returned. (tail_n_mail is a log monitoring tool that can
1197 send mail when interesting events appear in your Postgres logs. See:
1198 <https://bucardo.org/tail_n_mail/> for more information). See also the
1199 information on the "--get_method" option.
1200
1201 pgb_pool_cl_active
1202 pgb_pool_cl_waiting
1203 pgb_pool_sv_active
1204 pgb_pool_sv_idle
1205 pgb_pool_sv_used
1206 pgb_pool_sv_tested
1207 pgb_pool_sv_login
1208 pgb_pool_maxwait
1209 (symlinks: "check_postgres_pgb_pool_cl_active",
1210 "check_postgres_pgb_pool_cl_waiting",
1211 "check_postgres_pgb_pool_sv_active", "check_postgres_pgb_pool_sv_idle",
1212 "check_postgres_pgb_pool_sv_used", "check_postgres_pgb_pool_sv_tested",
1213 "check_postgres_pgb_pool_sv_login", and
1214 "check_postgres_pgb_pool_maxwait")
1215
1216 Examines pgbouncer's pool statistics. Each pool has a set of "client"
1217 connections, referring to connections from external clients, and
1218 "server" connections, referring to connections to PostgreSQL itself.
1219 The related check_postgres actions are prefixed by "cl_" and "sv_",
1220 respectively. Active client connections are those connections currently
1221 linked with an active server connection. Client connections may also be
1222 "waiting", meaning they have not yet been allocated a server
1223 connection. Server connections are "active" (linked to a client),
1224 "idle" (standing by for a client connection to link with), "used" (just
1225 unlinked from a client, and not yet returned to the idle pool),
1226 "tested" (currently being tested) and "login" (in the process of
1227 logging in). The maxwait value shows how long in seconds the oldest
1228 waiting client connection has been waiting.
1229
1230 pgbouncer_backends
1231 ("symlink: check_postgres_pgbouncer_backends") Checks the current
1232 number of connections for one or more databases through pgbouncer, and
1233 optionally compares it to the maximum allowed, which is determined by
1234 the pgbouncer configuration variable max_client_conn. The --warning and
1235 --critical options can take one of three forms. First, a simple number
1236 can be given, which represents the number of connections at which the
1237 alert will be given. This choice does not use the max_client_conn
1238 setting. Second, the percentage of available connections can be given.
1239 Third, a negative number can be given which represents the number of
1240 connections left until max_client_conn is reached. The default values
1241 for --warning and --critical are '90%' and '95%'. You can also filter
1242 the databases by use of the --include and --exclude options. See the
1243 "BASIC FILTERING" section for more details.
1244
1245 To view only non-idle processes, you can use the --noidle argument.
1246 Note that the user you are connecting as must be a superuser for this
1247 to work properly.
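
The three threshold forms described above (plain number, percentage, negative number) can be reduced to a single connection count, as in this Python sketch. The function name alert_level is hypothetical, and max_conn stands for the maximum read from the pgbouncer configuration.

```python
def alert_level(threshold, max_conn):
    """Return the connection count at which the alert fires."""
    text = str(threshold).strip()
    if text.endswith("%"):
        # Percentage of the maximum number of connections.
        return max_conn * float(text[:-1]) / 100
    value = int(text)
    # A negative number means "this many slots left before the maximum".
    return max_conn + value if value < 0 else value

# With a maximum of 200 connections:
print(alert_level(120, 200), alert_level("75%", 200), alert_level(-10, 200))
```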
1248
1249 Example 1: Give a warning when the number of connections on host quirm
1250 reaches 120, and a critical if it reaches 150.
1251
1252 check_postgres_pgbouncer_backends --host=quirm --warning=120 --critical=150 -p 6432 -u pgbouncer
1253
1254 Example 2: Give a critical when we reach 75% of our max_client_conn
1255 setting on hosts lancre or lancre2.
1256
1257 check_postgres_pgbouncer_backends --warning='75%' --critical='75%' --host=lancre,lancre2 -p 6432 -u pgbouncer
1258
1259 Example 3: Give a warning when there are only 10 more connection slots
1260 left on host plasmid, and a critical when we have only 5 left.
1261
1262 check_postgres_pgbouncer_backends --warning=-10 --critical=-5 --host=plasmid -p 6432 -u pgbouncer
1263
1264 For MRTG output, the number of connections is reported on the first
1265 line, and the fourth line gives the name of the database, plus the
1266 current max_client_conn. If more than one database has been queried,
1267 the one with the highest number of connections is output.
1268
1269 pgbouncer_checksum
1270 ("symlink: check_postgres_pgbouncer_checksum") Checks that all the
1271 pgBouncer settings are the same as last time you checked. This is done
1272 by generating a checksum of a sorted list of setting names and their
1273 values. Note that you shouldn't specify the database name; it will
1274 automatically default to pgbouncer. Either the --warning or the
1275 --critical option should be given, but not both. The value of each one
1276 is the checksum, a 32-character hexadecimal value. You can run with the
1277 special "--critical=0" option to find out an existing checksum.
1278
1279 This action requires the Digest::MD5 module.
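
The idea of a checksum over a sorted list of settings can be illustrated in Python with hashlib. This is only a sketch: the function name and the sample settings are hypothetical, and the 'name=value' serialization is an assumption, so the result will not match the checksum produced by the script.

```python
import hashlib

def settings_checksum(settings):
    """MD5 over the settings sorted by name; returns 32 hex characters."""
    # Assumed serialization: one 'name=value' line per setting.
    payload = "".join(f"{name}={value}\n" for name, value in sorted(settings.items()))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

before = settings_checksum({"max_client_conn": "100", "pool_mode": "session"})
after = settings_checksum({"max_client_conn": "200", "pool_mode": "session"})
print(len(before), before != after)
```

Sorting before hashing is what makes the checksum independent of the order in which the settings are listed.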
1280
1281 Example 1: Find the initial checksum for pgbouncer configuration on
1282 port 6432 using the default user (usually postgres)
1283
1284 check_postgres_pgbouncer_checksum --port=6432 --critical=0
1285
1286 Example 2: Make sure no settings have changed and warn if so, using the
1287 checksum from above.
1288
1289 check_postgres_pgbouncer_checksum --port=6432 --warning=cd2f3b5e129dc2b4f5c0f6d8d2e64231
1290
1291 For MRTG output, returns a 1 or 0 indicating success or failure of the
1292 checksum to match. A checksum must be provided as the "--mrtg"
1293 argument. The fourth line always gives the current checksum.
1294
1295 pgagent_jobs
1296 ("symlink: check_postgres_pgagent_jobs") Checks that all the pgAgent
1297 jobs that have executed in the preceding interval of time have
1298 succeeded. This is done by checking for any steps that have a non-zero
1299 result.
1300
1301 Either "--warning" or "--critical", or both, may be specified as times,
1302 and jobs will be checked for failures within the specified periods of
1303 time before the current time. Valid units are seconds, minutes, hours,
1304 and days; all can be abbreviated to the first letter. If no units are
1305 given, 'seconds' are assumed.
1306
1307 Example 1: Give a critical when any jobs executed in the last day have
1308 failed.
1309
1310 check_postgres_pgagent_jobs --critical=1d
1311
1312 Example 2: Give a warning when any jobs executed in the last week have
1313 failed.
1314
1315 check_postgres_pgagent_jobs --warning=7d
1316
1317 Example 3: Give a critical for jobs that have failed in the last 2
1318 hours and a warning for jobs that have failed in the last 4 hours:
1319
1320 check_postgres_pgagent_jobs --critical=2h --warning=4h
1321
1322 prepared_txns
1323 ("symlink: check_postgres_prepared_txns") Check on the age of any
1324 existing prepared transactions. Note that most people will NOT use
1325 prepared transactions, as they are part of two-phase commit and
1326 complicated to maintain. They should also not be confused with prepared
1327 STATEMENTS, which is what most people think of when they hear prepare.
1328 The default value for a warning is 1 second, to detect any use of
1329 prepared transactions, which is probably a mistake on most systems.
1330 Warning and critical are the number of seconds a prepared transaction
1331 has been open before an alert is given.
1332
1333 Example 1: Give a warning on detecting any prepared transactions:
1334
1335 check_postgres_prepared_txns -w 0
1336
1337 Example 2: Give a critical if any prepared transaction has been open
1338 longer than 10 seconds, but allow up to 360 seconds for the database
1339 'shrike':
1340
1341 check_postgres_prepared_txns --critical=10 --exclude=shrike
1342 check_postgres_prepared_txns --critical=360 --include=shrike
1343
1344 For MRTG output, returns the number of seconds the oldest transaction
1345 has been open as the first line, and which database it came from as the
1346 final line.
1347
1348 query_runtime
1349 ("symlink: check_postgres_query_runtime") Checks how long a specific
1350 query takes to run, by executing an "EXPLAIN ANALYZE" against it. The
1351 --warning and --critical options are the maximum amount of time the
1352 query should take. Valid units are seconds, minutes, and hours; any can
1353 be abbreviated to the first letter. If no units are given, 'seconds'
1354 are assumed. Both the warning and the critical option must be given.
1355 The name of the view or function to be run must be passed in to the
1356 --queryname option. It must consist of a single word (or schema.word),
1357 with optional parens at the end.
1358
1359 Example 1: Give a critical if the function named "speedtest" fails to
1360 run in 10 seconds or less.
1361
1362 check_postgres_query_runtime --queryname='speedtest()' --critical=10 --warning=10
1363
1364 For MRTG output, reports the time in seconds for the query to complete
1365 on the first line. The fourth line lists the database.
1366
1367 query_time
1368 ("symlink: check_postgres_query_time") Checks the length of running
1369 queries on one or more databases. There is no need to run this more
1370 than once on the same database cluster. Note that this already excludes
1371 queries that are "idle in transaction". Databases can be filtered by
1372 using the --include and --exclude options. See the "BASIC FILTERING"
1373 section for more details. You can also filter on the user running the
1374 query with the --includeuser and --excludeuser options. See the "USER
1375 NAME FILTERING" section for more details.
1376
1377 The values for the --warning and --critical options are amounts of
1378 time, and at least one must be provided (no defaults). Valid units are
1379 'seconds', 'minutes', 'hours', or 'days'. Each may be written singular
1380 or abbreviated to just the first letter. If no units are given, the
1381 unit is assumed to be seconds.
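
The unit rules described above can be illustrated with a short sketch. This is Python written for illustration only, not the script's own Perl code; the name parse_time is hypothetical, and whole-number values are assumed.

```python
import re

# Seconds per unit name; "seconds" is assumed when no unit is given.
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}

def parse_time(value: str) -> int:
    """Convert a threshold such as '3 minutes', '20s', or '45' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value.lower())
    if match is None:
        raise ValueError(f"invalid time value: {value!r}")
    number, unit = int(match.group(1)), match.group(2)
    if not unit:
        return number
    for name, factor in UNIT_SECONDS.items():
        # Accept the plural name, the singular, or just the first letter.
        if unit in (name, name[:-1], name[0]):
            return number * factor
    raise ValueError(f"unknown time unit: {unit!r}")
```

Under these assumptions, '3 minutes' becomes 180 seconds, while '20s' and '20' both become 20 seconds.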
1382
1383 This action requires Postgres 8.1 or better.
1384
1385 Example 1: Give a warning if any query has been running longer than 3
1386 minutes, and a critical if longer than 5 minutes.
1387
1388 check_postgres_query_time --port=5432 --warning='3 minutes' --critical='5 minutes'
1389
1390 Example 2: Using default values (2 and 5 minutes), check all databases
1391 except those starting with 'template'.
1392
1393 check_postgres_query_time --port=5432 --exclude=~^template
1394
1395 Example 3: Warn if user 'don' has a query running over 20 seconds
1396
1397 check_postgres_query_time --port=5432 --includeuser=don --warning=20s
1398
1399 For MRTG output, returns the length in seconds of the longest running
1400 query on the first line. The fourth line gives the name of the
1401 database.
1402
1403 replicate_row
1404 ("symlink: check_postgres_replicate_row") Checks that master-slave
1405 replication is working to one or more slaves.
1406
1407 The first "--dbname", "--host", and "--port", etc. options are
1408 considered the master; subsequent uses are the slaves. The values of
1409 the --warning and --critical options are units of time, and at least
1410 one must be provided (no defaults). Valid units are 'seconds',
1411 'minutes', 'hours', or 'days'. Each may be written singular or
1412 abbreviated to just the first letter. If no units are given, the units
1413 are assumed to be seconds.
1414
1415 This check updates a single row on the master, and then measures how
1416 long it takes to be applied to the slaves. To do this, you need to pick
1417 a table that is being replicated, then find a row that can be changed,
1418 and is not going to be changed by any other process. A specific column
1419 of this row will be changed from one value to another. All of this is
1420 fed to the "repinfo" option, and should contain the following options,
1421 separated by commas: table name, primary key, key id, column, first
1422 value, second value.
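
As a rough illustration of how the six comma-separated "repinfo" fields fit together, here is a Python sketch (not the script's own code). The names RepInfo, parse_repinfo, and next_value are hypothetical, and the handling of a current value that matches neither configured value is an assumption.

```python
from typing import NamedTuple

class RepInfo(NamedTuple):
    table: str    # table being replicated
    pk: str       # name of the primary key column
    key_id: str   # primary key value of the row to change
    column: str   # column that gets toggled
    value1: str   # first value
    value2: str   # second value

def parse_repinfo(repinfo: str) -> RepInfo:
    """Split a --repinfo argument into its six named fields."""
    parts = [part.strip() for part in repinfo.split(",")]
    if len(parts) != 6:
        raise ValueError("repinfo needs exactly six comma-separated fields")
    return RepInfo(*parts)

def next_value(info: RepInfo, current: str) -> str:
    """Toggle: write value2 if the row holds value1, otherwise value1."""
    return info.value2 if current == info.value1 else info.value1
```

With "orders,id,3,salesrep,slon,nols", a row currently holding 'slon' would be changed to 'nols', and the next run would change it back.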
1423
1424 Example 1: Slony is replicating a table named 'orders' from host
1425 'alpha' to host 'beta', in the database 'sales'. The primary key of the
1426 table is named id, and we are going to test the row with an id of 3
1427 (which is historical and never changed). There is a column named
1428 'salesrep' that we are going to toggle from a value of 'slon' to 'nols'
1429 to check on the replication. We want to throw a warning if the
1430 replication does not happen within 10 seconds.
1431
1432 check_postgres_replicate_row --host=alpha --dbname=sales --host=beta
1433 --dbname=sales --warning=10 --repinfo=orders,id,3,salesrep,slon,nols
1434
1435 Example 2: Bucardo is replicating a table named 'receipt' from host
1436 'green' to hosts 'red', 'blue', and 'yellow'. The database for both
1437 sides is 'public'. The slave databases are running on port 5455. The
1438 primary key is named 'receipt_id', the row we want to use has a value
1439 of 9, and the column we want to change for the test is called 'zone'.
1440 We'll toggle between 'north' and 'south' for the value of this column,
1441 and throw a critical if the change is not on all three slaves within 5
1442 seconds.
1443
1444 check_postgres_replicate_row --host=green --port=5455 --host=red,blue,yellow
1445 --critical=5 --repinfo=receipt,receipt_id,9,zone,north,south
1446
1447 For MRTG output, returns on the first line the time in seconds the
1448 replication takes to finish. The maximum time is set to 4 minutes 30
1449 seconds: if no replication has taken place in that long a time, an
1450 error is thrown.
1451
1452 replication_slots
1453 ("symlink: check_postgres_replication_slots") Check the quantity of
1454 WAL retained for any replication slots in the target database cluster.
1455 This is handy for monitoring environments where all WAL archiving and
1456 replication is taking place over replication slots.
1457
1458 Warning and critical are total bytes retained for the slot. E.g.:
1459
1460 check_postgres_replication_slots --port=5432 --host=yellow --warning=32M --critical=64M
1461
1462 Specific named slots can be monitored using --include/--exclude.
1463
1464 same_schema
1465 ("symlink: check_postgres_same_schema") Verifies that two or more
1466 databases are identical as far as their schema (but not the data
1467 within). Unlike most other actions, this has no warning or critical
1468 criteria - the databases are either in sync, or are not. If they are
1469 different, a detailed list of the differences is presented.
1470
1471 You may want to exclude or filter out certain differences. The way to
1472 do this is to add strings to the "--filter" option. To exclude a type
1473 of object, use "noname", where 'name' is the type of object, for
1474 example, "noschema". To exclude objects of a certain type by a regular
1475 expression against their name, use "noname=regex". See the examples
1476 below for a better understanding.
1477
1478 The types of objects that can be filtered include:
1479
1480 user
1481 schema
1482 table
1483 view
1484 index
1485 sequence
1486 constraint
1487 trigger
1488 function
1489
1490 The filter option "noposition" prevents verification of the position
1491 of columns within a table.
1492
1493 The filter option "nofuncbody" prevents comparison of the bodies of all
1494 functions.
1495
1496 The filter option "noperm" prevents comparison of object permissions.
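
The "noname" and "noname=regex" forms can be pictured with the following Python sketch. It is illustrative only and does not reproduce the script's internals: parse_filter and is_filtered are hypothetical names, and splitting the filter string on whitespace is an assumption based on the examples in this section.

```python
import re

def parse_filter(filter_string):
    """Split a --filter value into whole-type exclusions and per-type regexes."""
    skip_types, skip_patterns = set(), {}
    for token in filter_string.split():
        if not token.startswith("no"):
            raise ValueError(f"filter terms must start with 'no': {token!r}")
        name, sep, regex = token[2:].partition("=")
        if sep:
            # "noname=regex": skip objects of this type whose name matches.
            skip_patterns.setdefault(name, []).append(re.compile(regex))
        else:
            # "noname": skip every object of this type.
            skip_types.add(name)
    return skip_types, skip_patterns

def is_filtered(obj_type, obj_name, skip_types, skip_patterns):
    """True if the object should be left out of the comparison."""
    if obj_type in skip_types:
        return True
    return any(p.search(obj_name) for p in skip_patterns.get(obj_type, []))
```

In this sketch, a filter of "notrigger=slony noschema" skips every schema and any trigger whose name contains "slony", and compares everything else.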
1497
1498 To specify the second database, append the values that differ from the
1499 first one to the appropriate connection argument. For
1500 example, to compare databases on hosts alpha and bravo, use
1501 "--dbhost=alpha,bravo". Also see the examples below.
1502
1503 If only a single host is given, it is assumed we are doing a "time-
1504 based" report. The first time this is run a snapshot of all the items
1505 in the database is saved to a local file. When you run it again, that
1506 snapshot is read in and becomes "database #2" and is compared to the
1507 current database.
1508
1509 To replace the old stored file with the new version, use the --replace
1510 argument.
1511
1512 If you need to write the stored file to a specific directory, use the
1513 --audit-file-dir argument.
1514
1515 To avoid false positives on value based checks caused by replication
1516 lag on asynchronous replicas, use the --assume-async option.
1517
1518 To enable snapshots at various points in time, you can use the
1519 "--suffix" argument to make the filenames unique to each run. See the
1520 examples below.
1521
1522 Example 1: Verify that two databases on hosts star and line are the
1523 same:
1524
1525 check_postgres_same_schema --dbhost=star,line
1526
1527 Example 2: Same as before, but exclude any triggers with "slony" in
1528 their name
1529
1530 check_postgres_same_schema --dbhost=star,line --filter="notrigger=slony"
1531
1532 Example 3: Same as before, but also exclude all indexes
1533
1534 check_postgres_same_schema --dbhost=star,line --filter="notrigger=slony noindexes"
1535
1536 Example 4: Check differences for the database "battlestar" on different
1537 ports
1538
1539 check_postgres_same_schema --dbname=battlestar --dbport=5432,5544
1540
1541 Example 5: Create a daily and weekly snapshot file
1542
1543 check_postgres_same_schema --dbname=cylon --suffix=daily
1544 check_postgres_same_schema --dbname=cylon --suffix=weekly
1545
1546 Example 6: Run a historical comparison, then replace the file
1547
1548 check_postgres_same_schema --dbname=cylon --suffix=daily --replace
1549
1550 Example 7: Verify that two databases on hosts star and line are the
1551 same, excluding value data (i.e. sequence last_val):
1552
1553 check_postgres_same_schema --dbhost=star,line --assume-async
1554
1555 sequence
1556 ("symlink: check_postgres_sequence") Checks how much room is left on
1557 all sequences in the database. This is measured as the percent of
1558 total possible values that have been used for each sequence. The
1559 --warning and --critical options should be expressed as percentages.
1560 The default values are 85% for the warning and 95% for the critical.
1561 You may use --include and --exclude to control which sequences are to
1562 be checked. Note that this check does account for unusual minvalue and
1563 increment by values. By default it does not care if the sequence is set
1564 to cycle or not; if --skipcycled is passed, sequences set to cycle are
1565 reported with 0% usage.
1566
1567 The output for Nagios gives the name of the sequence, the percentage
1568 used, and the number of 'calls' left, indicating how many more times
1569 nextval can be called on that sequence before running into the maximum
1570 value.
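
One plausible way to arrive at the percentage used and the number of calls left is sketched below in Python. The exact formula used by the script is not given in this section, so the arithmetic here is an assumption; it only covers ascending sequences, and the function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, exact even for 64-bit sequence bounds."""
    return -(-a // b)

def sequence_usage(last_value, min_value, max_value, increment_by=1):
    """Return (percent_used, calls_left) for an ascending sequence."""
    if increment_by <= 0:
        raise ValueError("this sketch only handles ascending sequences")
    # Total number of values the sequence can ever hand out.
    slots = ceil_div(max_value - min_value + 1, increment_by)
    # Number of values handed out so far.
    used = ceil_div(last_value - min_value + 1, increment_by)
    percent = round(100 * used / slots)
    return percent, slots - used
```

For a sequence running from 1 to 100 in steps of 10 whose last value is 11, this gives 20% used with 8 calls left.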
1571
1572 The output for MRTG returns the highest percentage across all sequences
1573 on the first line, and the name of each sequence with that percentage
1574 on the fourth line, separated by a "|" (pipe) if there are more than
1575 one sequence at that percentage.
1576
1577 Example 1: Give a warning if any sequences are approaching 95% full.
1578
1579 check_postgres_sequence --dbport=5432 --warning=95%
1580
1581 Example 2: Check that the sequence named "orders_id_seq" is not more
1582 than half full.
1583
1584 check_postgres_sequence --dbport=5432 --critical=50% --include=orders_id_seq
1585
1586 settings_checksum
1587 ("symlink: check_postgres_settings_checksum") Checks that all the
1588 Postgres settings are the same as last time you checked. This is done
1589 by generating a checksum of a sorted list of setting names and their
1590 values. Note that different users in the same database may have
1591 different checksums, due to ALTER USER usage, and due to the fact that
1592 superusers see more settings than ordinary users. Either the --warning
1593 or the --critical option should be given, but not both. The value of
1594 each one is the checksum, a 32-character hexadecimal value. You can run
1595 with the special "--critical=0" option to find out an existing
1596 checksum.
1597
1598 This action requires the Digest::MD5 module.
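
The idea of hashing a sorted list of settings can be sketched as follows. This Python sketch is illustrative: the way names and values are joined before hashing is an assumption, so its output will not match the checksums reported by the script itself.

```python
import hashlib

def settings_checksum(settings: dict) -> str:
    """MD5 hex digest over setting names and values, sorted by name."""
    lines = [f"{name} {value}" for name, value in sorted(settings.items())]
    return hashlib.md5("\n".join(lines).encode("utf-8")).hexdigest()
```

Because the list is sorted first, the order in which settings are read does not affect the result, while any changed value produces a different 32-character checksum.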
1599
1600 Example 1: Find the initial checksum for the database on port 5555
1601 using the default user (usually postgres)
1602
1603 check_postgres_settings_checksum --port=5555 --critical=0
1604
1605 Example 2: Make sure no settings have changed and warn if so, using the
1606 checksum from above.
1607
1608 check_postgres_settings_checksum --port=5555 --warning=cd2f3b5e129dc2b4f5c0f6d8d2e64231
1609
1610 For MRTG output, returns a 1 or 0 indicating success or failure of the
1611 checksum to match. A checksum must be provided as the "--mrtg"
1612 argument. The fourth line always gives the current checksum.
1613
1614 slony_status
1615 ("symlink: check_postgres_slony_status") Checks the status of a
1616 Slony cluster by looking at the results of Slony's sl_status view. This
1617 is returned as the number of seconds of "lag time". The --warning and
1618 --critical options should be expressed as times. The default values are
1619 60 seconds for the warning and 300 seconds for the critical.
1620
1621 The optional argument --schema indicates the schema that Slony is
1622 installed under. If it is not given, the schema will be determined
1623 automatically each time this check is run.
1624
1625 Example 1: Give a warning if any Slony is lagged by more than 20
1626 seconds
1627
1628 check_postgres_slony_status --warning 20
1629
1630 Example 2: Give a critical if Slony, installed under the schema
1631 "_slony", is over 10 minutes lagged
1632
1633 check_postgres_slony_status --schema=_slony --critical=600
1634
1635 timesync
1636 ("symlink: check_postgres_timesync") Compares the local system time
1637 with the time reported by one or more databases. The --warning and
1638 --critical options represent the number of seconds between the two
1639 systems before an alert is given. If neither is specified, the default
1640 values are used, which are '2' and '5'. The warning value cannot be
1641 greater than the critical value. Due to the non-exact nature of this
1642 test, values of '0' or '1' are not recommended.
1643
1644 The string returned shows the time difference as well as the time on
1645 each side written out.
1646
1647 Example 1: Check that databases on hosts ankh, morpork, and klatch are
1648 no more than 3 seconds off from the local time:
1649
1650 check_postgres_timesync --host=ankh,morpork,klatch --critical=3
1651
1652 For MRTG output, returns on the first line the number of seconds
1653 difference between the local time and the database time. The fourth
1654 line returns the name of the database.
1655
1656 txn_idle
1657 ("symlink: check_postgres_txn_idle") Checks the number and duration of
1658 "idle in transaction" queries on one or more databases. There is no
1659 need to run this more than once on the same database cluster. Databases
1660 can be filtered by using the --include and --exclude options. See the
1661 "BASIC FILTERING" section below for more details.
1662
1663 The --warning and --critical options are given as units of time, signed
1664 integers, or integers for units of time, and at least one must be
1665 provided (there are no defaults). Valid units are 'seconds', 'minutes',
1666 'hours', or 'days'. Each may be written singular or abbreviated to just
1667 the first letter. If no units are given and the numbers are unsigned,
1668 the units are assumed to be seconds.
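
The three threshold shapes used in the examples for this action (a time, a signed count such as '+50', and 'N for TIME') can be told apart roughly as in the Python sketch below. It is an illustration under assumptions, not the script's parser: the function name is hypothetical and only the forms shown in this section are handled.

```python
import re

# Map every accepted spelling of a unit to its length in seconds.
UNITS = {}
for _name, _factor in (("seconds", 1), ("minutes", 60),
                       ("hours", 3600), ("days", 86400)):
    for _alias in (_name, _name[:-1], _name[0]):
        UNITS[_alias] = _factor

def parse_txn_idle_threshold(value: str):
    """Return (count, seconds); a part that was not given is None."""
    text = value.strip().lower()
    signed = re.fullmatch(r"\+(\d+)", text)
    if signed:
        # A signed integer is a number of transactions, not a time.
        return int(signed.group(1)), None
    match = re.fullmatch(r"(?:(\d+)\s+for\s+)?(\d+)\s*([a-z]*)", text)
    if match is None:
        raise ValueError(f"invalid threshold: {value!r}")
    count, number, unit = match.groups()
    if unit and unit not in UNITS:
        raise ValueError(f"unknown time unit: {unit!r}")
    seconds = int(number) * (UNITS[unit] if unit else 1)
    return (int(count) if count else None), seconds
```

So '15 seconds' is a pure time threshold, '+50' is a pure count, and '5 for 10 seconds' combines a count of 5 with a duration of 10 seconds.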
1669
1670 This action requires Postgres 8.3 or better.
1671
1672 As of PostgreSQL 10, you can just GRANT pg_read_all_stats to an
1673 unprivileged user account. In all earlier versions, superuser
1674 privileges are required to see the queries of all users in the system;
1675 UNKNOWN is returned if queries cannot be checked. To only include
1676 queries by the connecting user, use --includeuser.
1677
1678 Example 1: Give a warning if any connection has been idle in
1679 transaction for more than 15 seconds:
1680
1681 check_postgres_txn_idle --port=5432 --warning='15 seconds'
1682
1683 Example 2: Give a warning if there are 50 or more transactions
1684
1685 check_postgres_txn_idle --port=5432 --warning='+50'
1686
1687 Example 3: Give a critical if 5 or more connections have been idle in
1688 transaction for more than 10 seconds:
1689
1690 check_postgres_txn_idle --port=5432 --critical='5 for 10 seconds'
1691
1692 For MRTG output, returns the time in seconds the longest idle
1693 transaction has been running. The fourth line returns the name of the
1694 database and other information about the longest transaction.
1695
1696 txn_time
1697 ("symlink: check_postgres_txn_time") Checks the length of open
1698 transactions on one or more databases. There is no need to run this
1699 command more than once per database cluster. Databases can be filtered
1700 by use of the --include and --exclude options. See the "BASIC
1701 FILTERING" section for more details. The owner of the transaction can
1702 also be filtered, by use of the --includeuser and --excludeuser
1703 options. See the "USER NAME FILTERING" section for more details.
1704
1705 The values of the --warning and --critical options are units of time,
1706 and at least one must be provided (no default). Valid units are
1707 'seconds', 'minutes', 'hours', or 'days'. Each may be written singular
1708 or abbreviated to just the first letter. If no units are given, the
1709 units are assumed to be seconds.
1710
1711 This action requires Postgres 8.3 or better.
1712
1713 Example 1: Give a critical if any transaction has been open for more
1714 than 10 minutes:
1715
1716 check_postgres_txn_time --port=5432 --critical='10 minutes'
1717
1718 Example 2: Warn if user 'warehouse' has a transaction open over 30
1719 seconds
1720
1721 check_postgres_txn_time --port=5432 --warning=30s --includeuser=warehouse
1722
1723 For MRTG output, returns the maximum time in seconds a transaction has
1724 been open on the first line. The fourth line gives the name of the
1725 database.
1726
1727 txn_wraparound
1728 ("symlink: check_postgres_txn_wraparound") Checks how close to
1729 transaction wraparound one or more databases are getting. The
1730 --warning and --critical options indicate the number of transactions
1731 done, and must be a positive integer. If either option is not given,
1732 the default values of 1.3 and 1.4 billion are used. There is no need to
1733 run this command more than once per database cluster. For a more
1734 detailed discussion of what this number represents and what to do about
1735 it, please visit the page
1736 <https://www.postgresql.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND>
1737
1738 The warning and critical values can have underscores in the number for
1739 legibility, as Perl does.
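
A small Python sketch of the threshold handling described above: underscores are stripped before the value is read as a positive integer, and the defaults of 1.3 and 1.4 billion apply when an option is missing. The function names are hypothetical, and treating a value equal to a threshold as an alert is an assumption.

```python
import re

DEFAULT_WARNING = 1_300_000_000
DEFAULT_CRITICAL = 1_400_000_000

def parse_count(value: str) -> int:
    """Read a positive integer that may contain underscores for legibility."""
    cleaned = value.replace("_", "")
    if not re.fullmatch(r"[0-9]+", cleaned) or int(cleaned) <= 0:
        raise ValueError(f"must be a positive integer: {value!r}")
    return int(cleaned)

def classify(transactions: int, warning=None, critical=None) -> str:
    """Compare a transaction count against the two thresholds."""
    warning = DEFAULT_WARNING if warning is None else warning
    critical = DEFAULT_CRITICAL if critical is None else critical
    if transactions >= critical:
        return "CRITICAL"
    if transactions >= warning:
        return "WARNING"
    return "OK"
```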
1740
1741 Example 1: Check the default values for the localhost database
1742
1743 check_postgres_txn_wraparound --host=localhost
1744
1745 Example 2: Check port 6000 and give a critical when 1.7 billion
1746 transactions are hit:
1747
1748 check_postgres_txn_wraparound --port=6000 --critical=1_700_000_000
1749
1750 For MRTG output, returns the highest number of transactions for all
1751 databases on line one, while line 4 indicates which database it is.
1752
1753 version
1754 ("symlink: check_postgres_version") Checks that the required version of
1755 Postgres is running. The --warning and --critical options (only one is
1756 required) must be of the format X.Y or X.Y.Z where X is the major
1757 version number, Y is the minor version number, and Z is the revision.
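
The difference between an X.Y and an X.Y.Z requirement can be sketched in Python as a prefix comparison on the version parts. This is an assumed reading of the text above (an X.Y requirement accepts any revision of that release), and version_matches is a hypothetical name.

```python
def version_matches(required: str, actual: str) -> bool:
    """True if the server version starts with the required X.Y or X.Y.Z."""
    required_parts = required.split(".")
    actual_parts = actual.split(".")
    # Compare only as many parts as the requirement specifies.
    return actual_parts[:len(required_parts)] == required_parts
```

Under this reading, a requirement of 8.3 is met by 8.3.7, while a requirement of 8.4.10 is met only by exactly 8.4.10.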
1758
1759 Example 1: Give a warning if the database on port 5678 is not version
1760 8.4.10:
1761
1762 check_postgres_version --port=5678 -w=8.4.10
1763
1764 Example 2: Give a critical if any database on hosts valley, grain, or
1765 sunshine is not 8.3:
1766
1767 check_postgres_version -H valley,grain,sunshine --critical=8.3
1768
1769 For MRTG output, reports a 1 or a 0 indicating success or failure on
1770 the first line. The fourth line indicates the current version. The
1771 version must be provided via the "--mrtg" option.
1772
1773 wal_files
1774 ("symlink: check_postgres_wal_files") Checks how many WAL files exist
1775 in the pg_xlog directory (PostgreSQL 10 and later: pg_wal), which is
1776 found off of your data_directory, sometimes as a symlink to another
1777 physical disk for performance reasons. If the --lsfunc option is not
1778 used then this action must be run as a superuser, in order to access
1779 the contents of the pg_xlog directory. The minimum version to use this
1780 action is Postgres 8.1. The --warning and --critical options are simply
1781 the number of files in the pg_xlog directory. What number to set this
1782 to will vary, but a general guideline is to put a number slightly
1783 higher than what is normally there, to catch problems early.
1784
1785 Normally, WAL files are closed and then re-used, but a long-running
1786 open transaction, or a faulty archive_command script, may cause
1787 Postgres to create too many files. Ultimately, this will cause the disk
1788 they are on to run out of space, at which point Postgres will shut
1789 down.
1790
1791 To avoid connecting as a database superuser, a wrapper function around
1792 "pg_ls_dir()" should be defined as a superuser with SECURITY DEFINER,
1793 and the --lsfunc option used. This example function, if defined by a
1794 superuser, will allow the script to connect as a normal user nagios
1795 with --lsfunc=ls_xlog_dir
1796
1797 BEGIN;
1798 CREATE FUNCTION ls_xlog_dir()
1799 RETURNS SETOF TEXT
1800 AS $$ SELECT pg_ls_dir('pg_xlog') $$
1801 LANGUAGE SQL
1802 SECURITY DEFINER;
1803 REVOKE ALL ON FUNCTION ls_xlog_dir() FROM PUBLIC;
1804 GRANT EXECUTE ON FUNCTION ls_xlog_dir() to nagios;
1805 COMMIT;
1806
1807 Example 1: Check that the number of WAL files is 10 or less on
1808 host "pluto", using a wrapper function "ls_xlog_dir" to avoid the need
1809 for superuser permissions
1810
1811 check_postgres_wal_files --host=pluto --critical=10 --lsfunc=ls_xlog_dir
1812
1813 For MRTG output, reports the number of WAL files on line 1.
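
To make "number of WAL files" concrete, the Python sketch below counts directory entries that look like WAL segment names. It assumes that a segment name is exactly 24 uppercase hexadecimal characters, as in standard PostgreSQL installations, and that other entries (such as the archive_status subdirectory or .history files) are not counted; whether the script applies exactly this rule is not stated in this section.

```python
import re

# Assumed shape of a WAL segment file name: 24 uppercase hex characters.
WAL_SEGMENT = re.compile(r"[0-9A-F]{24}")

def count_wal_files(directory_entries) -> int:
    """Count entries from a pg_ls_dir()-style listing that are WAL segments."""
    return sum(1 for name in directory_entries if WAL_SEGMENT.fullmatch(name))
```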
1814
1815 rebuild_symlinks
1816 rebuild_symlinks_force
1817 This action requires no other arguments, and does not connect to any
1818 databases, but simply creates symlinks in the current directory for
1819 each action, in the form check_postgres_<action_name>. If the file
1820 already exists, it will not be overwritten. If the action is
1821 rebuild_symlinks_force, then symlinks will be overwritten. The option
1822 --symlinks is a shorter way of saying --action=rebuild_symlinks
1823
1824 BASIC FILTERING
1825 The options --include and --exclude can be combined to limit which
1826 things are checked, depending on the action. The name of the database
1827 can be filtered when using the following actions: backends,
1828 database_size, locks, query_time, txn_idle, and txn_time. The name of
1829 a relation can be filtered when using the following actions: bloat,
1830 index_size, table_size, relation_size, last_vacuum, last_autovacuum,
1831 last_analyze, and last_autoanalyze. The name of a setting can be
1832 filtered when using the settings_checksum action. The name of a file
1833 system can be filtered when using the disk_space action.
1834
1835 If only an include option is given, then ONLY those entries that match
1836 will be checked. However, if given both exclude and include, the
1837 exclusion is done first, and the inclusion after, to reinstate things
1838 that may have been excluded. Both --include and --exclude can be given
1839 multiple times, and/or as comma-separated lists. A leading tilde will
1840 match the following word as a regular expression.
1841
1842 To match a schema, end the search term with a single period. Leading
1843 tildes can be used for schemas as well.
1844
1845 Be careful when using filtering: an inclusion rule on the backends, for
1846 example, may report no problems not only because the matching database
1847 had no backends, but because you misspelled the name of the database!
1848
1849 Examples:
1850
1851 Only checks items named pg_class:
1852
1853 --include=pg_class
1854
1855 Only checks items containing the letters 'pg_':
1856
1857 --include=~pg_
1858
1859 Only checks items beginning with 'pg_':
1860
1861 --include=~^pg_
1862
1863 Exclude the item named 'test':
1864
1865 --exclude=test
1866
1867 Exclude all items containing the letters 'test':
1868
1869 --exclude=~test
1870
1871 Exclude all items in the schema 'pg_catalog':
1872
1873 --exclude='pg_catalog.'
1874
1875 Exclude all items containing the letters 'ace', but allow the item
1876 'faceoff':
1877
1878 --exclude=~ace --include=faceoff
1879
1880 Exclude all items which start with the letters 'pg_', which contain the
1881 letters 'slon', or which are named 'sql_settings' or 'green'.
1882 Specifically check items with the letters 'prod' in their names, and
1883 always check the item named 'pg_relname':
1884
1885 --exclude=~^pg_,~slon,sql_settings --exclude=green --include=~prod,pg_relname
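
The include/exclude rules and the examples above can be modelled with the following Python sketch. It is not the script's own implementation: is_checked is a hypothetical name, schema matching with a trailing period is left out, and a regular expression containing a comma would be split incorrectly.

```python
import re

def _matches(term: str, name: str) -> bool:
    # A leading tilde makes the rest of the term a regular expression.
    if term.startswith("~"):
        return re.search(term[1:], name) is not None
    return term == name

def _split(options):
    # Options may be repeated and may each hold a comma-separated list.
    return [term for option in options for term in option.split(",") if term]

def is_checked(name: str, include=(), exclude=()) -> bool:
    """Decide whether an item is checked under the given filter options."""
    inc, exc = _split(include), _split(exclude)
    included = any(_matches(term, name) for term in inc)
    excluded = any(_matches(term, name) for term in exc)
    if inc and not exc:
        # Only --include given: check ONLY the matching items.
        return included
    # Exclusion happens first; inclusion then reinstates matching items.
    return included or not excluded
```

With --exclude=~ace --include=faceoff, this sketch checks 'faceoff' and 'orders' but skips 'places'.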
1886
1887 USER NAME FILTERING
1888 The options --includeuser and --excludeuser can be used on some actions
1889 to only examine database objects owned by (or not owned by) one or more
1890 users. An --includeuser option always trumps an --excludeuser option.
1891 You can give each option more than once for multiple users, or you can
1892 give a comma-separated list. The actions that currently use these
1893 options are:
1894
1895 database_size
1896 last_analyze
1897 last_autoanalyze
1898 last_vacuum
1899 last_autovacuum
1900 query_time
1901 relation_size
1902 txn_time
1903
1904 Examples:
1905
1906 Only check items owned by the user named greg:
1907
1908 --includeuser=greg
1909
1910 Only check items owned by either watson or crick:
1911
1912 --includeuser=watson,crick
1913
1914 Only check items owned by crick, franklin, watson, or wilkins:
1915
1916 --includeuser=watson --includeuser=franklin --includeuser=crick,wilkins
1917
1918 Check all items except for those belonging to the user scott:
1919
1920 --excludeuser=scott
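
A minimal Python sketch of the user filtering rules follows. It reads "an --includeuser option always trumps an --excludeuser option" as meaning that exclusions are ignored whenever any inclusion is given; that reading, and the function name, are assumptions.

```python
def _users(options):
    # Options may be repeated and may each hold a comma-separated list.
    return {user for option in options for user in option.split(",") if user}

def user_is_checked(owner: str, includeuser=(), excludeuser=()) -> bool:
    """Decide whether objects owned by this user are examined."""
    included, excluded = _users(includeuser), _users(excludeuser)
    if included:
        # Inclusion wins: only listed users are examined.
        return owner in included
    return owner not in excluded
```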
1921
1923 To help in setting things up, this program can be run in a "test mode"
1924 by specifying the --test option. This will perform some basic tests to
1925 make sure that the databases can be contacted, and that certain per-
1926 action prerequisites are met, such as whether the user is a superuser,
1927 if the version of Postgres is new enough, and if stats_row_level is
1928 enabled.
1929
1931 In addition to command-line configurations, you can put any options
1932 inside of a file. The file .check_postgresrc in the current directory
1933 will be used if found. If not found, then the file ~/.check_postgresrc
1934 will be used. Finally, the file /etc/check_postgresrc will be used if
1935 available. The format of the file is option = value, one per line. Any
1936 line starting with a '#' will be skipped. Any values loaded from a
1937 check_postgresrc file will be overwritten by command-line options. All
1938 check_postgresrc files can be ignored by supplying a
1939 "--no-checkpostgresrc" argument.
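
The file format and precedence rules can be sketched in Python as shown below. The function names are hypothetical, the existence test is passed in so the lookup order can be shown without touching the file system, and ignoring leading whitespace before a '#' is an assumption.

```python
def pick_rc_file(exists, home: str):
    """Return the first rc file that exists, in the documented order."""
    candidates = (
        ".check_postgresrc",            # current directory
        f"{home}/.check_postgresrc",    # home directory
        "/etc/check_postgresrc",        # system-wide
    )
    for path in candidates:
        if exists(path):
            return path
    return None

def parse_rc(text: str) -> dict:
    """Read 'option = value' lines, skipping comments and blank lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name.strip()] = value.strip()
    return options

def effective_options(rc_text: str, cli_options: dict) -> dict:
    """Command-line options overwrite values loaded from the rc file."""
    merged = parse_rc(rc_text)
    merged.update(cli_options)
    return merged
```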
1940
1942 The environment variable $ENV{HOME} is used to look for a
1943 .check_postgresrc file. The environment variable $ENV{PGBINDIR} is
1944 used to look for PostgreSQL binaries.
1945
1947 Since this program uses the psql program, make sure it is accessible to
1948 the user running the script. If run as a cronjob, this often means
1949 modifying the PATH environment variable.
1950
1951 If you are using Nagios in embedded Perl mode, use the "--action"
1952 argument instead of symlinks, so that the plugin only gets compiled one
1953 time.
1954
1956 Access to a working version of psql, and the following very standard
1957 Perl modules:
1958
1959 Cwd
1960 Getopt::Long
1961 File::Basename
1962 File::Temp
1963 Time::HiRes (if $opt{showtime} is set to true, which is the default)
1964
1965 The "settings_checksum" action requires the Digest::MD5 module.
1966
1967 The "checkpoint" action requires the Date::Parse module.
1968
1969 Some actions require access to external programs. If psql is not
1970 explicitly specified, the command "which" is used to find it. The
1971 program "/bin/df" is needed by the "disk_space" action.
1972
1974 Development happens using the git system. You can clone the latest
1975 version by doing:
1976
1977 https://github.com/bucardo/check_postgres
1978 git clone https://github.com/bucardo/check_postgres.git
1979
1981 Three mailing lists are available. For discussions about the program,
1982 bug reports, feature requests, and commit notices, send email to
1983 check_postgres@bucardo.org
1984
1985 <https://mail.endcrypt.com/mailman/listinfo/check_postgres>
1986
1987 A low-volume list for announcement of new versions and important
1988 notices is the 'check_postgres-announce' list:
1989
1990 <https://mail.endcrypt.com/mailman/listinfo/check_postgres-announce>
1991
1992 Source code changes (via git-commit) are sent to the
1993 'check_postgres-commit' list:
1994
1995 <https://mail.endcrypt.com/mailman/listinfo/check_postgres-commit>
1996
1998 Items not specifically attributed are by GSM (Greg Sabino Mullane).
1999
2000 Version 2.25.1 Released ??, 2020
2001 Fix check_replication_slots on recently promoted servers (Christoph Berg)
2002
2003 Version 2.25.0 Released February 3, 2020
2004 Allow same_schema objects to be included or excluded with --object and --skipobject
2005 (Greg Sabino Mullane)
2006
2007 Fix to allow mixing service names and other connection parameters for same_schema
2008 (Greg Sabino Mullane)
2009
2010 Version 2.24.0 Released May 30, 2018
2011 Support new_version_pg for PG10
2012 (Michael Pirogov)
2013
2014 Option to skip CYCLE sequences in action sequence
2015 (Christoph Moench-Tegeder)
2016
2017 Output per-database perfdata for pgbouncer pool checks
2018 (George Hansper)
2019
2020 German message translations
2021 (Holger Jacobs)
2022
2023 Consider only client backends in query_time and friends
2024 (David Christensen)
2025
2026 Version 2.23.0 Released October 31, 2017
2027 Support PostgreSQL 10.
2028 (David Christensen, Christoph Berg)
2029
2030 Change table_size to use pg_table_size() on 9.0+, i.e. include the TOAST
2031 table size in the numbers reported. Add new actions indexes_size and
2032 total_relation_size, using the respective pg_indexes_size() and
2033 pg_total_relation_size() functions. All size checks will now also check
2034 materialized views where applicable.
2035 (Christoph Berg)
2036
2037 Connection errors are now always critical, not unknown.
2038 (Christoph Berg)
2039
2040 New action replication_slots checking if logical or physical replication
2041 slots have accumulated too much data
2042 (Glyn Astill)
2043
2044 Multiple same_schema improvements
2045 (Glyn Astill)
2046
2047 Add Spanish message translations
2048 (Luis Vazquez)
2049
2050 Allow a wrapper function to run wal_files and archive_ready actions as
2051 non-superuser
2052 (Joshua Elsasser)
2053
2054 Add some defensive casting to the bloat query
2055 (Greg Sabino Mullane)
2056
2057 Invoke psql with option -X
2058 (Peter Eisentraut)
2059
2060 Update postgresql.org URLs to use https.
2061 (Magnus Hagander)
2062
2063 check_txn_idle: Don't fail when query contains 'disabled' word
2064 (Marco Nenciarini)
2065
2066 check_txn_idle: Use state_change instead of query_start.
2067 (Sebastian Webber)
2068
2069 check_hot_standby_delay: Correct extra space in perfdata
2070 (Adrien Nayrat)
2071
2072 Remove \r from psql output as it can confuse some regexes
2073 (Greg Sabino Mullane)
2074
2075 Sort failed jobs in check_pgagent_jobs for stable output.
2076 (Christoph Berg)
2077
2078 Version 2.22.0 June 30, 2015
2079 Add xact timestamp support to hot_standby_delay.
2080 Allow the hot_standby_delay check to accept xlog byte position or
2081 timestamp lag intervals as thresholds, or even both at the same time.
2082 (Josh Williams)
2083
2084 Query all sequences per DB in parallel for action=sequence.
2085 (Christoph Berg)
2086
2087 Fix bloat check to use correct SQL depending on the server version.
2088 (Adrian Vondendriesch)
2089
2090 Show actual long-running query in query_time output
2091 (Peter Eisentraut)
2092
2093 Add explicit ORDER BY to the slony_status check to get the most lagged server.
2094 (Jeff Frost)
2095
2096 Improved multi-slave support in replicate_row.
2097 (Andrew Yochum)
2098
2099 Change the way tables are quoted in replicate_row.
2100 (Glyn Astill)
2101
2102 Don't swallow space before the -c flag when reporting errors
2103 (Jeff Janes)
2104
2105 Fix and extend hot_standby_delay documentation
2106 (Michael Renner)
2107
2108 Declare POD encoding to be utf8.
2109 (Christoph Berg)
2110
2111 Version 2.21.0 September 24, 2013
2112 Fix issue with SQL steps in check_pgagent_jobs for sql steps which perform deletes
2113 (Rob Emery via github pull)
2114
2115 Install man page in section 1.
2116 (Peter Eisentraut, bug 53, github issue 26)
2117
2118 Order lock types in check_locks output to make the ordering predictable;
2119 setting SKIP_NETWORK_TESTS will skip the new_version tests; other minor test
2120 suite fixes.
2121 (Christoph Berg)
2122
2123 Fix same_schema check on 9.3 by ignoring relminmxid differences in pg_class
2124 (Christoph Berg)
2125
2126 Version 2.20.1 June 24, 2013
2127 Make connection check failures return CRITICAL not UNKNOWN
2128 (Dominic Hargreaves)
2129
2130 Fix --reverse option when using string comparisons in custom queries
2131 (Nathaniel Waisbrot)
2132
2133 Compute correct 'totalwastedbytes' in the bloat query
2134 (Michael Renner)
2135
2136 Do not use pg_stats "inherited" column in bloat query, if the
2137 database is 8.4 or older. (Greg Sabino Mullane, per bug 121)
2138
2139 Remove host reordering in hot_standby_delay check
2140 (Josh Williams, with help from Jacobo Blasco)
2141
2142 Better output for the "simple" flag
2143 (Greg Sabino Mullane)
2144
2145 Force same_schema to ignore the 'relallvisible' column
2146 (Greg Sabino Mullane)
2147
2148 Version 2.20.0 March 13, 2013
2149 Add check for pgagent jobs (David E. Wheeler)
2150
2151 Force STDOUT to use utf8 for proper output
2152 (Greg Sabino Mullane; reported by Emmanuel Lesouef)
2153
2154 Fixes for Postgres 9.2: new pg_stat_activity view,
2155 and use pg_tablespace_location (Josh Williams)
2156
2157 Allow for spaces in item lists when doing same_schema.
2158
2159 Allow txn_idle to work again for < 8.3 servers by switching to query_time.
2160
2161 Fix the check_bloat SQL to take inherited tables into account,
2162 and assume 2k for non-analyzed columns. (Geert Pante)
2163
2164 Cache sequence information to speed up same_schema runs.
2165
2166 Fix --excludeuser in check_txn_idle (Mika Eloranta)
2167
2168 Fix user clause handling in check_txn_idle (Michael van Bracht)
2169
2170 Adjust docs to show colon as a better separator inside args for locks
2171 (Charles Sprickman)
2172
2173 Fix undefined $SQL2 error in check_txn_idle [github issue 16] (Patric Bechtel)
2174
2175 Prevent "uninitialized value" warnings when showing the port (Henrik Ahlgren)
2176
2177 Do not assume everyone has a HOME [github issue 23]
2178
2179 Version 2.19.0 January 17, 2012
2180 Add the --assume-prod option (Cédric Villemain)
2181
2182 Add the cluster_id check (Cédric Villemain)
2183
2184 Improve settings_checksum and checkpoint tests (Cédric Villemain)
2185
2186 Do not do an inner join to pg_user when checking database size
2187 (Greg Sabino Mullane; reported by Emmanuel Lesouef)
2188
2189 Use the full path when getting sequence information for same_schema.
2190 (Greg Sabino Mullane; reported by Cindy Wise)
2191
2192 Fix the formula for calculating xlog positions (Euler Taveira de Oliveira)
2193
2194 Better ordering of output for bloat check - make indexes as important
2195 as tables (Greg Sabino Mullane; reported by Jens Wilke)
2196
2197 Show the dbservice if it was used at top of same_schema output
2198 (Mike Blackwell)
2199
2200 Better installation paths (Greg Sabino Mullane, per bug 53)
2201
2202 Version 2.18.0 October 2, 2011
2203 Redo the same_schema action. Use new --filter argument for all filtering.
2204 Allow comparisons between any number of databases.
2205 Remove the dbname2, dbport2, etc. arguments.
2206 Allow comparison of the same db over time.
2207
2208 Swap db1 and db2 if the slave is 1 for the hot standby check (David E. Wheeler)
2209
2210 Allow multiple --schema arguments for the slony_status action (GSM and Jehan-Guillaume de Rorthais)
2211
2212 Fix ORDER BY in the last vacuum/analyze action (Nicolas Thauvin)
2213
2214 Fix check_hot_standby_delay perfdata output (Nicolas Thauvin)
2215
2216 Look in the correct place for the .ready files with the archive_ready action (Nicolas Thauvin)
2217
2218 New action: commitratio (Guillaume Lelarge)
2219
2220 New action: hitratio (Guillaume Lelarge)
2221
2222 Make sure --action overrides the symlink naming trick.
2223
2224 Set defaults for archive_ready and wal_files (Thomas Guettler, GSM)
2225
2226 Better output for wal_files and archive_ready (GSM)
2227
2228 Fix warning when client_port set to empty string (bug #79)
2229
2230 Account for "empty row" in -x output (i.e. source of functions).
2231
2232 Fix some incorrectly named data fields (Andy Lester)
2233
2234 Expand the number of pgbouncer actions (Ruslan Kabalin)
2235
2236 Give detailed information and refactor txn_idle, txn_time, and query_time
2237 (Per request from bug #61)
2238
2239 Set maxalign to 8 in the bloat check if box identified as '64-bit'
2240 (Michel Sijmons, bug #66)
2241
2242 Support non-standard version strings in the bloat check.
2243 (Michel Sijmons and Gurjeet Singh, bug #66)
2244
2245 Do not show excluded databases in some output (Ruslan Kabalin)
2246
2247 Allow "and", "or" inside arguments (David E. Wheeler)
2248
2249 Add the "new_version_box" action.
2250
2251 Fix psql version regex (Peter Eisentraut, bug #69)
2252
2253 Add the --assume-standby-mode option (Ruslan Kabalin)
2254
2255 Note that txn_idle and query_time require 8.3 (Thomas Guettler)
2256
2257 Standardize and clean up all perfdata output (bug #52)
2258
2259 Exclude "idle in transaction" from the query_time check (bug #43)
2260
2261 Fix the perflimit for the bloat action (bug #50)
2262
2263 Clean up the custom_query action a bit.
2264
2265 Fix space in perfdata for hot_standby_delay action (Nicolas Thauvin)
2266
2267 Handle undef percents in check_fsm_relations (Andy Lester)
2268
2269 Fix typo in dbstats action (Stas Vitkovsky)
2270
2271 Fix MRTG for last_vacuum and last_analyze actions.
2272
2273 Version 2.17.0 no public release
2274 Version 2.16.0 January 20, 2011
2275 Add new action 'hot_standby_delay' (Nicolas Thauvin)
2276 Add cache-busting for the version-grabbing utilities.
2277 Fix problem with going to next method for new_version_pg
2278 (Greg Sabino Mullane, reported by Hywel Mallett in bug #65)
2279 Allow /usr/local/etc as an alternative location for the
2280 check_postgresrc file (Hywel Mallett)
2281 Do not use tgisconstraint in same_schema if Postgres >= 9
2282 (Guillaume Lelarge)
2283
2284 Version 2.15.4 January 3, 2011
2285 Fix warning when using symlinks
2286 (Greg Sabino Mullane, reported by Peter Eisentraut in bug #63)
2287
2288 Version 2.15.3 December 30, 2010
2289 Show OK for no matching txn_idle entries.
2290
2291 Version 2.15.2 December 28, 2010
2292 Better formatting of sizes in the bloat action output.
2293
2294 Remove duplicate perfs in bloat action output.
2295
2296 Version 2.15.1 December 27, 2010
2297 Fix problem when examining items in pg_settings (Greg Sabino Mullane)
2298
2299 For connection test, return critical, not unknown, on FATAL errors
2300 (Greg Sabino Mullane, reported by Peter Eisentraut in bug #62)
2301
2302 Version 2.15.0 November 8, 2010
2303 Add --quiet argument to suppress output on OK Nagios results
2304 Add index comparison for same_schema (Norman Yamada and Greg Sabino Mullane)
2305 Use $ENV{PGSERVICE} instead of "service=" to prevent problems (Guillaume Lelarge)
2306 Add --man option to show the entire manual. (Andy Lester)
2307 Redo the internal run_command() sub to use -x and hashes instead of regexes.
2308 Fix error in custom logic (Andreas Mager)
2309 Add the "pgbouncer_checksum" action (Guillaume Lelarge)
2310 Fix regex to work on WIN32 for check_fsm_relations and check_fsm_pages (Luke Koops)
2311 Don't apply a LIMIT when using --exclude on the bloat action (Marti Raudsepp)
2312 Change the output of query_time to show pid, user, port, and address (Giles Westwood)
2313 Fix to show database properly when using slony_status (Guillaume Lelarge)
2314 Allow warning items for same_schema to be comma-separated (Guillaume Lelarge)
2315 Constraint definitions across Postgres versions match better in same_schema.
2316 Work against "EnterpriseDB" databases (Sivakumar Krishnamurthy and Greg Sabino Mullane)
2317 Separate perfdata with spaces (Jehan-Guillaume (ioguix) de Rorthais)
2318 Add new action "archive_ready" (Jehan-Guillaume (ioguix) de Rorthais)
2319
2320 Version 2.14.3 (March 1, 2010)
2321 Allow slony_status action to handle more than one slave.
2322 Use commas to separate function args in same_schema output (Robert Treat)
2323
2324 Version 2.14.2 (February 18, 2010)
2325 Change autovac_freeze default warn/critical back to 90%/95% (Robert Treat)
2326 Put all items one-per-line for relation size actions if --verbose=1
2327
2328 Version 2.14.1 (February 17, 2010)
2329 Don't use $^T in logfile check, as script may be long-running
2330 Change the error string for the logfile action for easier exclusion
2331 by programs like tail_n_mail
2332
2333 Version 2.14.0 (February 11, 2010)
2334 Added the 'slony_status' action.
2335 Changed the logfile sleep from 0.5 to 1, as 0.5 gets rounded to 0 on some boxes!
2336
2337 Version 2.13.2 (February 4, 2010)
2338 Allow timeout option to be used for logfile 'sleep' time.
2339
2340 Version 2.13.2 (February 4, 2010)
2341 Show offending database for query_time action.
2342 Apply perflimit to main output for sequence action.
2343 Add 'noowner' option to same_schema action.
2344 Raise sleep timeout for logfile check to 15 seconds.
2345
2346 Version 2.13.1 (February 2, 2010)
2347 Fix bug preventing column constraint differences from 2 > 1 for same_schema from being shown.
2348 Allow aliases 'dbname1', 'dbhost1', 'dbport1', etc.
2349 Added "nolanguage" as a filter for the same_schema option.
2350 Don't track "generic" table constraints (e.g. $1, $2) using same_schema
2351
2352 Version 2.13.0 (January 29, 2010)
2353 Allow "nofunctions" as a filter for the same_schema option.
2354 Added "noperm" as a filter for the same_schema option.
2355 Ignore dropped columns when considering positions for same_schema (Guillaume Lelarge)
2356
2357 Version 2.12.1 (December 3, 2009)
2358 Change autovac_freeze default warn/critical from 90%/95% to 105%/120% (Marti Raudsepp)
2359
2360 Version 2.12.0 (December 3, 2009)
2361 Allow the temporary directory to be specified via the "tempdir" argument,
2362 for systems that need it (e.g. /tmp is not owned by root).
2363 Fix so old versions of Postgres (< 8.0) use the correct default database (Giles Westwood)
2364 For "same_schema" trigger mismatches, show the attached table.
2365 Add the new_version_bc check for Bucardo version checking.
2366 Add database name to perf output for last_vacuum|analyze (Guillaume Lelarge)
2367 Fix for bloat action against old versions of Postgres without the 'block_size' param.
2368
2369 Version 2.11.1 (August 27, 2009)
2370 Proper Nagios output for last_vacuum|analyze actions. (Cédric Villemain)
2371 Proper Nagios output for locks action. (Cédric Villemain)
2372 Proper Nagios output for txn_wraparound action. (Cédric Villemain)
2373 Fix for constraints with embedded newlines for same_schema.
2374 Allow --exclude for all items when using same_schema.
2375
2376 Version 2.11.0 (August 23, 2009)
2377 Add Nagios perf output to the wal_files check (Cédric Villemain)
2378 Add support for .check_postgresrc, per request from Albe Laurenz.
2379 Allow list of web fetch methods to be changed with the --get_method option.
2380 Add support for the --language argument, which overrides any ENV.
2381 Add the --no-check_postgresrc flag.
2382 Ensure check_postgresrc options are completely overridden by command-line options.
2383 Fix incorrect warning > critical logic in replicate_row (Glyn Astill)
2384
2385 Version 2.10.0 (August 3, 2009)
2386 For same_schema, compare view definitions, and compare languages.
2387 Make script into a global executable via the Makefile.PL file.
2388 Better output when comparing two databases.
2389 Proper Nagios output syntax for autovac_freeze and backends checks (Cédric Villemain)
2390
2391 Version 2.9.5 (July 24, 2009)
2392 Don't use a LIMIT in check_bloat if --include is used. Per complaint from Jeff Frost.
2393
2394 Version 2.9.4 (July 21, 2009)
2395 More French translations (Guillaume Lelarge)
2396
2397 Version 2.9.3 (July 14, 2009)
2398 Quote dbname in perf output for the backends check. (Davide Abrigo)
2399 Add 'fetch' as an alternative method for new_version checks, as this
2400 comes by default with FreeBSD. (Hywel Mallett)
2401
2402 Version 2.9.2 (July 12, 2009)
2403 Allow dots and dashes in database name for the backends check (Davide Abrigo)
2404 Check and display the database for each match in the bloat check (Cédric Villemain)
2405 Handle 'too many connections' FATAL error in the backends check with a critical,
2406 rather than a generic error (Greg, idea by Jürgen Schulz-Brüssel)
2407 Do not allow perflimit to interfere with exclusion rules in the vacuum and
2408 analyze tests. (Greg, bug reported by Jeff Frost)
2409
2410 Version 2.9.1 (June 12, 2009)
2411 Fix for multiple databases with the check_bloat action (Mark Kirkwood)
2412 Fixes and improvements to the same_schema action (Jeff Boes)
2413 Write tests for same_schema, other minor test fixes (Jeff Boes)
2414
2415 Version 2.9.0 (May 28, 2009)
2416 Added the same_schema action (Greg)
2417
2418 Version 2.8.1 (May 15, 2009)
2419 Added timeout via statement_timeout in addition to perl alarm (Greg)
2420
2421 Version 2.8.0 (May 4, 2009)
2422 Added internationalization support (Greg)
2423 Added the 'disabled_triggers' check (Greg)
2424 Added the 'prepared_txns' check (Greg)
2425 Added the 'new_version_cp' and 'new_version_pg' checks (Greg)
2426 French translations (Guillaume Lelarge)
2427 Make the backends search return ok if no matches due to inclusion rules,
2428 per report by Guillaume Lelarge (Greg)
2429 Added comprehensive unit tests (Greg, Jeff Boes, Selena Deckelmann)
2430 Make fsm_pages and fsm_relations handle 8.4 servers smoothly. (Greg)
2431 Fix missing 'upd' field in show_dbstats (Andras Fabian)
2432 Allow ENV{PGCONTROLDATA} and ENV{PGBINDIR}. (Greg)
2433 Add various Perl module infrastructure (e.g. Makefile.PL) (Greg)
2434 Fix incorrect regex in txn_wraparound (Greg)
2435 For txn_wraparound: consistent ordering and fix duplicates in perf output (Andras Fabian)
2436 Add in missing exabyte regex check (Selena Deckelmann)
2437 Set stats to zero if we bail early due to USERWHERECLAUSE (Andras Fabian)
2438 Add additional items to dbstats output (Andras Fabian)
2439 Remove --schema option from the fsm_ checks. (Greg Mullane and Robert Treat)
2440 Handle case when ENV{PGUSER} is set. (Andy Lester)
2441 Many various fixes. (Jeff Boes)
2442 Fix --dbservice: check version and use ENV{PGSERVICE} for old versions (Cédric Villemain)
2443
2444 Version 2.7.3 (February 10, 2009)
2445 Make the sequence action check whether the sequence is being used for an int4 column and
2446 react appropriately. (Michael Glaesemann)
2447
2448 Version 2.7.2 (February 9, 2009)
2449 Fix to prevent multiple groupings if db arguments given.
2450
2451 Version 2.7.1 (February 6, 2009)
2452 Allow the -p argument for port to work again.
2453
2454 Version 2.7.0 (February 4, 2009)
2455 Do not require a connection argument, but use defaults and ENV variables when
2456 possible: PGHOST, PGPORT, PGUSER, PGDATABASE.
2457
2458 Version 2.6.1 (February 4, 2009)
2459 Only require Date::Parse to be loaded if using the checkpoint action.
2460
2461 Version 2.6.0 (January 26, 2009)
2462 Add the 'checkpoint' action.
2463
2464 Version 2.5.4 (January 7, 2009)
2465 Better checking of $opt{dbservice} structure (Cédric Villemain)
2466 Fix time display in timesync action output (Selena Deckelmann)
2467 Fix documentation typos (Josh Tolley)
2468
2469 Version 2.5.3 (December 17, 2008)
2470 Minor fix to regex in verify_version (Lee Jensen)
2471
2472 Version 2.5.2 (December 16, 2008)
2473 Minor documentation tweak.
2474
2475 Version 2.5.1 (December 11, 2008)
2476 Add support for --noidle flag to prevent backends action from counting idle processes.
2477 Patch by Selena Deckelmann.
2478
2479 Fix small undefined warning when not using --dbservice.
2480
2481 Version 2.5.0 (December 4, 2008)
2482 Add support for the pg_service.conf file with the --dbservice option.
2483
2484 Version 2.4.3 (November 7, 2008)
2485 Fix options for replicate_row action, per report from Jason Gordon.
2486
2487 Version 2.4.2 (November 6, 2008)
2488 Wrap File::Temp::cleanup() calls in eval, in case File::Temp is an older version.
2489 Patch by Chris Butler.
2490
2491 Version 2.4.1 (November 5, 2008)
2492 Cast numbers to numeric to support sequences ranges > bigint in check_sequence action.
2493 Thanks to Scott Marlowe for reporting this.
2494
2495 Version 2.4.0 (October 26, 2008)
2496 Add Cacti support with the dbstats action.
2497 Pretty up the time output for last vacuum and analyze actions.
2498 Show the percentage of backends on the check_backends action.
2499
2500 Version 2.3.10 (October 23, 2008)
2501 Fix minor warning in action check_bloat with multiple databases.
2502 Allow warning to be greater than critical when using the --reverse option.
2503 Support the --perflimit option for the check_sequence action.
2504
2505 Version 2.3.9 (October 23, 2008)
2506 Minor tweak to way we store the default port.
2507
2508 Version 2.3.8 (October 21, 2008)
2509 Allow the default port to be changed easily.
2510 Allow transform of simple output by MB, GB, etc.
2511
2512 Version 2.3.7 (October 14, 2008)
2513 Allow multiple databases in 'sequence' action. Reported by Christoph Zwerschke.
2514
2515 Version 2.3.6 (October 13, 2008)
2516 Add missing $schema to check_fsm_pages. (Robert Treat)
2517
2518 Version 2.3.5 (October 9, 2008)
2519 Change option 'checktype' to 'valtype' to prevent collisions with -c[ritical]
2520 Better handling of errors.
2521
2522 Version 2.3.4 (October 9, 2008)
2523 Do explicit cleanups of the temp directory, per problems reported by sb@nnx.com.
2524
2525 Version 2.3.3 (October 8, 2008)
2526 Account for cases where some rounding queries give -0 instead of 0.
2527 Thanks to Glyn Astill for helping to track this down.
2528
2529 Version 2.3.2 (October 8, 2008)
2530 Always quote identifiers in check_replicate_row action.
2531
2532 Version 2.3.1 (October 7, 2008)
2533 Give a better error if one of the databases cannot be reached.
2534
2535 Version 2.3.0 (October 4, 2008)
2536 Add the "sequence" action, thanks to Gavin M. Roy for the idea.
2537 Fix minor problem with autovac_freeze action when using MRTG output.
2538 Allow output argument to be case-insensitive.
2539 Documentation fixes.
2540
2541 Version 2.2.4 (October 3, 2008)
2542 Fix some minor typos
2543
2544 Version 2.2.3 (October 1, 2008)
2545 Expand range of allowed names for --repinfo argument (Glyn Astill)
2546 Documentation tweaks.
2547
2548 Version 2.2.2 (September 30, 2008)
2549 Fixes for minor output and scoping problems.
2550
2551 Version 2.2.1 (September 28, 2008)
2552 Add MRTG output to fsm_pages and fsm_relations.
2553 Force error messages to one-line for proper Nagios output.
2554 Check for invalid prereqs on failed command. From conversations with Euler Taveira de Oliveira.
2555 Tweak the fsm_pages formula a little.
2556
2557 Version 2.2.0 (September 25, 2008)
2558 Add fsm_pages and fsm_relations actions. (Robert Treat)
2559
2560 Version 2.1.4 (September 22, 2008)
2561 Fix for race condition in txn_time action.
2562 Add --debugoutput option.
2563
2564 Version 2.1.3 (September 22, 2008)
2565 Allow alternate arguments "dbhost" for "host" and "dbport" for "port".
2566 Output a zero as default value for second line of MRTG output.
2567
2568 Version 2.1.2 (July 28, 2008)
2569 Fix sorting error in the "disk_space" action for non-Nagios output.
2570 Allow --simple as a shortcut for --output=simple.
2571
2572 Version 2.1.1 (July 22, 2008)
2573 Don't check databases with datallowconn false for the "autovac_freeze" action.
2574
2575 Version 2.1.0 (July 18, 2008)
2576 Add the "autovac_freeze" action, thanks to Robert Treat for the idea and design.
2577 Put an ORDER BY on the "txn_wraparound" action.
2578
2579 Version 2.0.1 (July 16, 2008)
2580 Optimizations to speed up the "bloat" action quite a bit.
2581 Fix "version" action to not always output in mrtg mode.
2582
2583 Version 2.0.0 (July 15, 2008)
2584 Add support for MRTG and "simple" output options.
2585 Many small improvements to nearly all actions.
2586
2587 Version 1.9.1 (June 24, 2008)
2588 Fix an error in the bloat SQL in 1.9.0
2589 Allow percentage arguments to be over 99%
2590 Allow percentages in the bloat --warning and --critical (thanks to Robert Treat for the idea)
2591
2592 Version 1.9.0 (June 22, 2008)
2593 Don't include information_schema in certain checks. (Jeff Frost)
2594 Allow --include and --exclude to use schemas by using a trailing period.
2595
2596 Version 1.8.5 (June 22, 2008)
2597 Output schema name before table name where appropriate.
2598 Thanks to Jeff Frost.
2599
2600 Version 1.8.4 (June 19, 2008)
2601 Better detection of problems in --replicate_row.
2602
2603 Version 1.8.3 (June 18, 2008)
2604 Fix 'backends' action: there may be no rows in pg_stat_activity, so run a second
2605 query if needed to find the max_connections setting.
2606 Thanks to Jeff Frost for the bug report.
2607
2608 Version 1.8.2 (June 10, 2008)
2609 Changes to allow working under Nagios' embedded Perl mode. (Ioannis Tambouras)
2610
2611 Version 1.8.1 (June 9, 2008)
2612 Allow 'bloat' action to work on Postgres version 8.0.
2613 Allow for different commands to be run for each action depending on the server version.
2614 Give better warnings when running actions not available on older Postgres servers.
2615
2616 Version 1.8.0 (June 3, 2008)
2617 Add the --reverse option to the custom_query action.
2618
2619 Version 1.7.1 (June 2, 2008)
2620 Fix 'query_time' action: account for race condition in which zero rows appear in pg_stat_activity.
2621 Thanks to Dustin Black for the bug report.
2622
2623 Version 1.7.0 (May 11, 2008)
2624 Add --replicate_row action
2625
2626 Version 1.6.1 (May 11, 2008)
2627 Add --symlinks option as a shortcut to --action=rebuild_symlinks
2628
2629 Version 1.6.0 (May 11, 2008)
2630 Add the custom_query action.
2631
2632 Version 1.5.2 (May 2, 2008)
2633 Fix problem with too eager creation of custom pgpass file.
2634
2635 Version 1.5.1 (April 17, 2008)
2636 Add example Nagios configuration settings (Brian A. Seklecki)
2637
2638 Version 1.5.0 (April 16, 2008)
2639 Add the --includeuser and --excludeuser options. Documentation cleanup.
2640
2641 Version 1.4.3 (April 16, 2008)
2642 Add in the 'output' concept for future support of non-Nagios programs.
2643
2644 Version 1.4.2 (April 8, 2008)
2645 Fix bug preventing --dbpass argument from working (Robert Treat).
2646
2647 Version 1.4.1 (April 4, 2008)
2648 Minor documentation fixes.
2649
2650 Version 1.4.0 (April 2, 2008)
2651 Have 'wal_files' action use pg_ls_dir (idea by Robert Treat).
2652 For last_vacuum and last_analyze, respect autovacuum effects, add separate
2653 autovacuum checks (ideas by Robert Treat).
2654
2655 Version 1.3.1 (April 2, 2008)
2656 Have txn_idle use query_start, not xact_start.
2657
2658 Version 1.3.0 (March 23, 2008)
2659 Add in txn_idle and txn_time actions.
2660
2661 Version 1.2.0 (February 21, 2008)
2662 Add the 'wal_files' action, which counts the number of WAL files
2663 in your pg_xlog directory.
2664 Fix some typos in the docs.
2665 Explicitly allow -v as an argument.
2666 Allow for a null syslog_facility in the 'logfile' action.
2667
2668 Version 1.1.2 (February 5, 2008)
2669 Fix error preventing --action=rebuild_symlinks from working.
2670
2671 Version 1.1.1 (February 3, 2008)
2672 Switch vacuum and analyze date output to use 'DD', not 'D'. (Glyn Astill)
2673
2674 Version 1.1.0 (December 16, 2007)
2675 Fixes, enhancements, and performance tracking.
2676 Add performance data tracking via --showperf and --perflimit
2677 Lots of refactoring and cleanup of how actions handle arguments.
2678 Do basic checks to figure out syslog file for 'logfile' action.
2679 Allow for exact matching of beta versions with 'version' action.
2680 Redo the default arguments to only populate when neither 'warning' nor 'critical' is provided.
2681 Allow just warning OR critical to be given for the 'timesync' action.
2682 Remove 'redirect_stderr' requirement from 'logfile' due to 8.3 changes.
2683 Actions 'last_vacuum' and 'last_analyze' are 8.2 only (Robert Treat)
2684
2685 Version 1.0.16 (December 7, 2007)
2686 First public release, December 2007
2687
2689 The index bloat size optimization is rough.
2690
2691 Some actions may not work on older versions of Postgres (before 8.0).
2692
2693 Please report any problems to check_postgres@bucardo.org
2694
2696 Greg Sabino Mullane <greg@turnstep.com>
2697
2699 Some example Nagios configuration settings using this script:
2700
2701 define command {
2702 command_name check_postgres_size
2703 command_line $USER2$/check_postgres.pl -H $HOSTADDRESS$ -u pgsql -db postgres --action database_size -w $ARG1$ -c $ARG2$
2704 }
2705
2706 define command {
2707 command_name check_postgres_locks
2708 command_line $USER2$/check_postgres.pl -H $HOSTADDRESS$ -u pgsql -db postgres --action locks -w $ARG1$ -c $ARG2$
2709 }
2710
2711
2712 define service {
2713 use generic-other
2714 host_name dbhost.gtld
2715 service_description dbhost PostgreSQL Service Database Usage Size
2716 check_command check_postgres_size!256000000!512000000
2717 }
2718
2719 define service {
2720 use generic-other
2721 host_name dbhost.gtld
2722 service_description dbhost PostgreSQL Service Database Locks
2723 check_command check_postgres_locks!2!3
2724 }
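
To try one of these checks by hand before wiring it into Nagios, the macros in the command definition can be expanded manually. The following is a minimal sketch, not part of the original examples: the plugin directory used for $USER2$ and the address used for $HOSTADDRESS$ are placeholder assumptions, while the thresholds are the ones passed by the check_postgres_size service above. The snippet only prints the resulting command line; it does not contact a database.

```shell
# Stand-ins for the Nagios macros (both values are assumptions for illustration).
USER2=/usr/local/nagios/libexec   # assumed plugin directory for $USER2$
HOSTADDRESS=192.0.2.10            # placeholder address for $HOSTADDRESS$

# Thresholds taken from "check_postgres_size!256000000!512000000" above.
ARG1=256000000                    # warning threshold
ARG2=512000000                    # critical threshold

# Same option layout as the check_postgres_size command definition.
cmd="$USER2/check_postgres.pl -H $HOSTADDRESS -u pgsql -db postgres --action database_size -w $ARG1 -c $ARG2"

# Print the expanded command line so it can be inspected or run manually.
echo "$cmd"
```

Running the printed command against a reachable server should produce the same single line of output that Nagios would receive for that service.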
2725
2727 Copyright (c) 2007-2020 Greg Sabino Mullane <greg@turnstep.com>.
2728
2729 Redistribution and use in source and binary forms, with or without
2730 modification, are permitted provided that the following conditions are
2731 met:
2732
2733 1. Redistributions of source code must retain the above copyright notice,
2734 this list of conditions and the following disclaimer.
2735 2. Redistributions in binary form must reproduce the above copyright notice,
2736 this list of conditions and the following disclaimer in the documentation
2737 and/or other materials provided with the distribution.
2738
2739 THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR
2740 IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
2741 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
2742 DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
2743 INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
2744 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
2745 SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
2746 HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
2747 STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
2748 IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
2749 POSSIBILITY OF SUCH DAMAGE.
2750
2751
2752
2753perl v5.34.0 2022-01-19 CHECK_POSTGRES(1)