CHECK_POSTGRES(1) User Contributed Perl Documentation CHECK_POSTGRES(1)
2
3
4
NAME
check_postgres.pl - a Postgres monitoring script for Nagios, MRTG,
Cacti, and others

This document describes check_postgres.pl version 2.24.0
10
SYNOPSIS
## Create all symlinks
check_postgres.pl --symlinks

## Check connection to Postgres database 'pluto':
check_postgres.pl --action=connection --db=pluto

## Same thing, but using the symlink
check_postgres_connection --db=pluto

## Warn if > 100 locks, critical if > 200, or > 20 exclusive
check_postgres_locks --warning=100 --critical="total=200:exclusive=20"

## Show the current number of idle connections on port 6543:
check_postgres_txn_idle --port=6543 --output=simple

## There are many other actions and options, please keep reading.

The latest news and documentation can always be found at:
https://bucardo.org/check_postgres/
31
DESCRIPTION
33 check_postgres.pl is a Perl script that runs many different tests
34 against one or more Postgres databases. It uses the psql program to
35 gather the information, and outputs the results in one of three
36 formats: Nagios, MRTG, or simple.
37
Output Modes
The output can be changed by use of the "--output" option. The default
output is nagios, although this can be changed at the top of the script
if you wish. The current option choices are nagios, mrtg, and simple.
To avoid having to enter the output argument each time, the type of
output is automatically set if no --output argument is given, and if
the current directory has one of the output options in its name. For
example, creating a directory named mrtg and populating it with
symlinks via the --symlinks argument would ensure that any actions run
from that directory will always default to an output of "mrtg". As a
shortcut for --output=simple, you can enter --simple, which also
overrides the directory naming trick.
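
The selection rules in the paragraph above can be sketched as follows.
This is an illustrative Python sketch, not the script's own Perl code. The
function name pick_output_mode, the choice to look only at the last path
component, the precedence of --output over --simple, and the order in
which mode names are tried are all assumptions.

```python
import os

# Mode names listed in the paragraph above.
OUTPUT_MODES = ("nagios", "mrtg", "simple")


def pick_output_mode(output=None, simple=False, cwd=None, default="nagios"):
    """Guess the output mode from the arguments and the directory name."""
    if output:
        # An explicit --output argument is used as given.
        return output
    if simple:
        # --simple overrides the directory naming trick.
        return "simple"
    # Assumption: only the last component of the directory is inspected.
    dirname = os.path.basename(os.path.normpath(cwd or os.getcwd()))
    for mode in OUTPUT_MODES:
        if mode in dirname:
            return mode
    return default
```

For example, pick_output_mode(cwd="/opt/monitor/mrtg") yields "mrtg",
while adding simple=True yields "simple".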
50
51 Nagios output
52
53 The default output format is for Nagios, which is a single line of
54 information, along with four specific exit codes:
55
56 0 (OK)
57 1 (WARNING)
58 2 (CRITICAL)
59 3 (UNKNOWN)
60
61 The output line is one of the words above, a colon, and then a short
62 description of what was measured. Additional statistics information, as
63 well as the total time the command took, can be output as well: see the
64 documentation on the arguments --showperf, --perflimit, and --showtime.
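
A consumer of the Nagios output might split a line into its parts roughly
as shown below. This is a hedged Python sketch: the function name
parse_nagios_line is hypothetical, the optional leading service label is
inferred from the sample output lines elsewhere in this document, and the
performance data is split naively on whitespace.

```python
import re

# Exit codes listed above for the four Nagios states.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

# Optional service label, then the status word, a colon, and the rest.
LINE_RE = re.compile(
    r"^(?:(?P<service>\S+)\s+)?"
    r"(?P<status>OK|WARNING|CRITICAL|UNKNOWN):\s*(?P<rest>.*)$"
)


def parse_nagios_line(line):
    """Split one line of Nagios-mode output into its parts."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a Nagios status line: {line!r}")
    # Everything after the pipe symbol is performance data.
    message, _, perf = match.group("rest").partition("|")
    perfdata = {}
    for item in perf.split():  # naive: assumes no spaces inside labels
        name, sep, value = item.partition("=")
        if sep:
            perfdata[name] = value
    status = match.group("status")
    return {
        "service": match.group("service"),
        "status": status,
        "exit_code": EXIT_CODES[status],
        "message": message.strip(),
        "perfdata": perfdata,
    }
```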
65
66 MRTG output
67
68 The MRTG output is four lines, with the first line always giving a
69 single number of importance. When possible, this number represents an
70 actual value such as a number of bytes, but it may also be a 1 or a 0
71 for actions that only return "true" or "false", such as
72 check_postgres_version. The second line is an additional stat and is
73 only used for some actions. The third line indicates an "uptime" and is
74 not used. The fourth line is a description and usually indicates the
75 name of the database the stat from the first line was pulled from, but
76 may be different depending on the action.
77
78 Some actions accept an optional --mrtg argument to further control the
79 output.
80
81 See the documentation on each action for details on the exact MRTG
82 output for each one.
83
84 Simple output
85
The simple output is a truncated version of the MRTG one: it
returns the first number and nothing else. This is very useful
when you just want to check the state of something, regardless of any
threshold. You can transform the numeric output by appending KB, MB,
GB, TB, or EB to the output argument, for example:
91
92 --output=simple,MB
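
The effect of a unit suffix can be pictured with the Python sketch below.
It assumes that each unit is a power of 1024 and that no rounding is
applied; neither is stated in this document. The function name
scale_simple_value is hypothetical, and EB is left out because its exact
multiplier is not given here.

```python
# Assumed multipliers: each step is a factor of 1024.
UNIT_POWERS = {"KB": 1, "MB": 2, "GB": 3, "TB": 4}


def scale_simple_value(raw_bytes, output_arg):
    """Scale a byte count the way --output=simple,UNIT is assumed to."""
    mode, _, unit = output_arg.partition(",")
    if mode != "simple":
        raise ValueError("only the simple output mode is sketched here")
    if not unit:
        return raw_bytes  # plain --output=simple: value is unchanged
    return raw_bytes / 1024 ** UNIT_POWERS[unit.upper()]
```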
93
94 Cacti output
95
96 The Cacti output consists of one or more items on the same line, with a
97 simple name, a colon, and then a number. At the moment, the only action
98 with explicit Cacti output is 'dbstats', and using the --output option
99 is not needed in this case, as Cacti is the only output for this
100 action. For many other actions, using --simple is enough to make Cacti
101 happy.
102
DATABASE CONNECTION OPTIONS
104 All actions accept a common set of database options.
105
106 -H NAME or --host=NAME
107 Connect to the host indicated by NAME. Can be a comma-separated
108 list of names. Multiple host arguments are allowed. If no host is
109 given, defaults to the "PGHOST" environment variable or no host at
110 all (which indicates using a local Unix socket). You may also use
111 "--dbhost".
112
-p PORT or --port=PORT
Connects using the specified PORT number. Can be a comma-separated
list of port numbers, and multiple port arguments are allowed. If
no port number is given, defaults to the "PGPORT" environment
variable. If that is not set, it defaults to 5432. You may also use
"--dbport".
119
120 -db NAME or --dbname=NAME
121 Specifies which database to connect to. Can be a comma-separated
122 list of names, and multiple dbname arguments are allowed. If no
123 dbname option is provided, defaults to the "PGDATABASE" environment
124 variable. If that is not set, it defaults to 'postgres' if psql is
125 version 8 or greater, and 'template1' otherwise.
126
127 -u USERNAME or --dbuser=USERNAME
128 The name of the database user to connect as. Can be a comma-
129 separated list of usernames, and multiple dbuser arguments are
130 allowed. If this is not provided, it defaults to the "PGUSER"
131 environment variable, otherwise it defaults to 'postgres'.
132
133 --dbpass=PASSWORD
134 Provides the password to connect to the database with. Use of this
135 option is highly discouraged. Instead, one should use a .pgpass or
136 pg_service.conf file.
137
138 --dbservice=NAME
139 The name of a service inside of the pg_service.conf file. Before
140 version 9.0 of Postgres, this is a global file, usually found in
141 /etc/pg_service.conf. If you are using version 9.0 or higher of
142 Postgres, you can use the file ".pg_service.conf" in the home
143 directory of the user running the script, e.g. nagios.
144
145 This file contains a simple list of connection options. You can
146 also pass additional information when using this option such as
147 --dbservice="maindatabase sslmode=require"
148
149 The documentation for this file can be found at
150 https://www.postgresql.org/docs/current/static/libpq-pgservice.html
151
152 The database connection options can be grouped: --host=a,b --host=c
153 --port=1234 --port=3344 would connect to a-1234, b-1234, and c-3344.
154 Note that once set, an option carries over until it is changed again.
155
156 Examples:
157
158 --host=a,b --port=5433 --db=c
159 Connects twice to port 5433, using database c, to hosts a and b: a-5433-c b-5433-c
160
161 --host=a,b --port=5433 --db=c,d
162 Connects four times: a-5433-c a-5433-d b-5433-c b-5433-d
163
164 --host=a,b --host=foo --port=1234 --port=5433 --db=e,f
165 Connects six times: a-1234-e a-1234-f b-1234-e b-1234-f foo-5433-e foo-5433-f
166
167 --host=a,b --host=x --port=5432,5433 --dbuser=alice --dbuser=bob -db=baz
168 Connects three times: a-5432-alice-baz b-5433-alice-baz x-5433-bob-baz
169
170 --dbservice="foo" --port=5433
171 Connects using the named service 'foo' in the pg_service.conf file, but overrides the port
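
The grouping shown in the first three examples can be modelled with the
Python sketch below. It is a simplification and not the script's own
logic: each repeated option forms one group, a group that has no value of
its own reuses the last value given, and every comma-separated value
inside a group is combined with every other. The function name
expand_targets is hypothetical, each list must be non-empty, and the
sketch does not reproduce the fourth example, where a comma-separated
port list is matched against the hosts.

```python
from itertools import product


def expand_targets(hosts, ports, dbs):
    """Expand repeated options into 'host-port-db' connection labels.

    Each argument is the list of values given for one repeated option,
    e.g. ['a,b', 'foo'] for --host=a,b --host=foo.
    """
    targets = []
    n_groups = max(len(hosts), len(ports), len(dbs))
    for i in range(n_groups):
        # Once set, an option carries over until it is changed again.
        host = hosts[min(i, len(hosts) - 1)]
        port = ports[min(i, len(ports) - 1)]
        db = dbs[min(i, len(dbs) - 1)]
        for h, p, d in product(host.split(","), port.split(","), db.split(",")):
            targets.append(f"{h}-{p}-{d}")
    return targets
```

Calling expand_targets(["a,b", "foo"], ["1234", "5433"], ["e,f"]) gives
the six connections listed in the third example.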
172
OTHER OPTIONS
174 Other options include:
175
176 --action=NAME
177 States what action we are running. Required unless using a
178 symlinked file, in which case the name of the file is used to
179 figure out the action.
180
--warning=VAL or -w VAL
Sets the threshold at which a warning alert is fired. The valid
values for this option depend on the action used.

--critical=VAL or -c VAL
Sets the threshold at which a critical alert is fired. The valid
values for this option depend on the action used.
188
189 -t VAL or --timeout=VAL
190 Sets the timeout in seconds after which the script will abort
191 whatever it is doing and return an UNKNOWN status. The timeout is
192 per Postgres cluster, not for the entire script. The default value
193 is 10; the units are always in seconds.
194
--assume-standby-mode
If specified, the script first checks whether the server is in
standby mode (--datadir is required). If it is, all checks that
require SQL queries are skipped and "Server in standby mode" is
returned with an OK status instead.

Example:

postgres@db$./check_postgres.pl --action=version --warning=8.1 --datadir /var/lib/postgresql/8.3/main/ --assume-standby-mode
POSTGRES_VERSION OK: Server in standby mode | time=0.00
205
--assume-prod
If specified, the script checks whether the server is in production
mode (--datadir is required). The option is only relevant for the
checkpoint action ("symlink: check_postgres_checkpoint").

Example:

postgres@db$./check_postgres.pl --action=checkpoint --datadir /var/lib/postgresql/8.3/main/ --assume-prod
POSTGRES_CHECKPOINT OK: Last checkpoint was 72 seconds ago | age=72;;300 mode=MASTER
215
--assume-async
If specified, indicates that any replication between servers is
asynchronous. The option is only relevant for the same_schema action
("symlink: check_postgres_same_schema").

Example:

postgres@db$./check_postgres.pl --action=same_schema --assume-async --dbhost=star,line
224
225 -h or --help
226 Displays a help screen with a summary of all actions and options.
227
228 --man
229 Displays the entire manual.
230
231 -V or --version
232 Shows the current version.
233
234 -v or --verbose
235 Set the verbosity level. Can call more than once to boost the
236 level. Setting it to three or higher (in other words, issuing "-v
237 -v -v") turns on debugging information for this program which is
238 sent to stderr.
239
240 --showperf=VAL
241 Determines if we output additional performance data in standard
242 Nagios format (at end of string, after a pipe symbol, using
243 name=value). VAL should be 0 or 1. The default is 1. Only takes
244 effect if using Nagios output mode.
245
246 --perflimit=i
247 Sets a limit as to how many items of interest are reported back
248 when using the showperf option. This only has an effect for actions
249 that return a large number of items, such as table_size. The
250 default is 0, or no limit. Be careful when using this with the
251 --include or --exclude options, as those restrictions are done
252 after the query has been run, and thus your limit may not include
253 the items you want. Only takes effect if using Nagios output mode.
254
255 --showtime=VAL
256 Determines if the time taken to run each query is shown in the
257 output. VAL should be 0 or 1. The default is 1. No effect unless
258 showperf is on. Only takes effect if using Nagios output mode.
259
260 --test
261 Enables test mode. See the "TEST MODE" section below.
262
--PGBINDIR=PATH
Tells the script where to find the psql binaries. Useful if you
have more than one version of the PostgreSQL executables on your
system, or if they are not in your path. Note that this option is
in all uppercase. By default, this option is not allowed. To enable
it, you must change the $NO_PSQL_OPTION near the top of the script
to 0. Avoid using this option if you can, and instead use the
environment variable "PGBINDIR" or the hard-coded $PGBINDIR variable,
also near the top of the script, to set the path to the PostgreSQL
binaries to use.
273
274 --PSQL=PATH
275 (deprecated, this option may be removed in a future release!)
276 Tells the script where to find the psql program. Useful if you have
277 more than one version of the psql executable on your system, or if
278 there is no psql program in your path. Note that this option is in
279 all uppercase. By default, this option is not allowed. To enable
280 it, you must change the $NO_PSQL_OPTION near the top of the script
281 to 0. Avoid using this option if you can, and instead hard-code
282 your psql location into the $PSQL variable, also near the top of
283 the script.
284
285 --symlinks
286 Creates symlinks to the main program for each action.
287
288 --output=VAL
289 Determines the format of the output, for use in various programs.
290 The default is 'nagios'. Available options are 'nagios', 'mrtg',
291 'simple' and 'cacti'.
292
293 --mrtg=VAL
294 Used only for the MRTG or simple output, for a few specific
295 actions.
296
297 --debugoutput=VAL
298 Outputs the exact string returned by psql, for use in debugging.
299 The value is one or more letters, which determine if the output is
300 displayed or not, where 'a' = all, 'c' = critical, 'w' = warning,
301 'o' = ok, and 'u' = unknown. Letters can be combined.
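
The letter matching can be pictured with the short Python sketch below.
The function name show_debug_output is hypothetical, and the comparison
is assumed to be case-sensitive.

```python
# One letter per result state, as listed above.
LETTER_FOR_STATUS = {"CRITICAL": "c", "WARNING": "w", "OK": "o", "UNKNOWN": "u"}


def show_debug_output(letters, status):
    """Return True if the psql output should be shown for this status."""
    # 'a' means all states; otherwise the state's own letter must appear.
    return "a" in letters or LETTER_FOR_STATUS[status] in letters
```

With --debugoutput=cw, the sketch shows the psql output for CRITICAL and
WARNING results only.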
302
303 --get_method=VAL
304 Allows specification of the method used to fetch information for
305 the "new_version_cp", "new_version_pg", "new_version_bc",
306 "new_version_box", and "new_version_tnm" checks. The following
307 programs are tried, in order, to grab the information from the web:
308 GET, wget, fetch, curl, lynx, links. To force the use of just one
309 (and thus remove the overhead of trying all the others until one of
310 those works), enter one of the names as the argument to get_method.
311 For example, a BSD box might enter the following line in their
312 ".check_postgresrc" file:
313
314 get_method=fetch
315
316 --language=VAL
317 Set the language to use for all output messages. Normally, this is
318 detected by examining the environment variables LC_ALL,
319 LC_MESSAGES, and LANG, but setting this option will override any
320 such detection.
321
ACTIONS
The action to be run is selected using the --action flag, or by using a
symlink to the main file that contains the name of the action inside of
it. For example, to run the action "timesync", you may either issue:

check_postgres.pl --action=timesync

or use a program named:

check_postgres_timesync

All the symlinks are created for you in the current directory if you
use the option --symlinks:

perl check_postgres.pl --symlinks

If the file name already exists, it will not be overwritten. If the
file exists and is a symlink, you can force it to overwrite by using
"--action=build_symlinks_force".
341
342 Most actions take a --warning and a --critical option, indicating at
343 what point we change from OK to WARNING, and what point we go to
344 CRITICAL. Note that because criticals are always checked first, setting
345 the warning equal to the critical is an effective way to turn warnings
346 off and always give a critical.
347
348 The current supported actions are:
349
archive_ready
("symlink: check_postgres_archive_ready") Checks how many WAL files
with extension .ready exist in the pg_xlog/archive_status directory
(PostgreSQL 10 and later: pg_wal/archive_status), which is found off of
your data_directory. If the --lsfunc option is not used then this
action must be run as a superuser, in order to access the contents of
the pg_xlog/archive_status directory. The minimum version to use this
action is Postgres 8.1. The --warning and --critical options are simply
the number of .ready files in the pg_xlog/archive_status directory.
These values should usually be low: once the archive mechanism is
turned on, we want it to archive WAL files as fast as possible.

If the archive command fails, the number of WAL files in your pg_xlog
directory will grow until it exhausts all the disk space and forces
PostgreSQL to stop immediately.
365
To avoid connecting as a database superuser, a wrapper function around
"pg_ls_dir()" should be defined as a superuser with SECURITY DEFINER,
and the --lsfunc option used. This example function, if defined by a
superuser, will allow the script to connect as a normal user nagios
with --lsfunc=ls_archive_status_dir:

BEGIN;
-- On PostgreSQL 10 and later the directory is pg_wal/archive_status.
CREATE FUNCTION ls_archive_status_dir()
    RETURNS SETOF TEXT
    AS $$ SELECT pg_ls_dir('pg_xlog/archive_status') $$
    LANGUAGE SQL
    SECURITY DEFINER;
REVOKE ALL ON FUNCTION ls_archive_status_dir() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION ls_archive_status_dir() to nagios;
COMMIT;
381
Example 1: Check that the number of ready WAL files is 10 or less on
host "pluto", using a wrapper function "ls_archive_status_dir" to avoid
the need for superuser permissions.

check_postgres_archive_ready --host=pluto --critical=10 --lsfunc=ls_archive_status_dir
387
388 For MRTG output, reports the number of ready WAL files on line 1.
389
390 autovac_freeze
391 ("symlink: check_postgres_autovac_freeze") Checks how close each
392 database is to the Postgres autovacuum_freeze_max_age setting. This
393 action will only work for databases version 8.2 or higher. The
394 --warning and --critical options should be expressed as percentages.
395 The 'age' of the transactions in each database is compared to the
396 autovacuum_freeze_max_age setting (200 million by default) to generate
397 a rounded percentage. The default values are 90% for the warning and
398 95% for the critical. Databases can be filtered by use of the --include
399 and --exclude options. See the "BASIC FILTERING" section for more
400 details.
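
The percentage comparison can be sketched in Python as shown below. The
function names are hypothetical, the thresholds are assumed to trigger at
"greater than or equal", and Python's round() may not match the rounding
used by the script for values that fall exactly on a half.

```python
def freeze_percent(age, freeze_max_age=200_000_000):
    """Transaction age as a rounded percentage of autovacuum_freeze_max_age."""
    return round(100 * age / freeze_max_age)


def freeze_status(age, warning=90, critical=95, freeze_max_age=200_000_000):
    """Classify a database using the default 90% / 95% thresholds."""
    percent = freeze_percent(age, freeze_max_age)
    if percent >= critical:  # criticals are checked first
        return "CRITICAL"
    if percent >= warning:
        return "WARNING"
    return "OK"
```

A database whose oldest transaction age is 180 million, with the default
setting of 200 million, is at 90% and would produce a warning under these
assumptions.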
401
Example 1: Give a warning when any databases on port 5432 are above 97%.

check_postgres_autovac_freeze --port=5432 --warning="97%"
405
406 For MRTG output, the highest overall percentage is reported on the
407 first line, and the highest age is reported on the second line. All
408 databases which have the percentage from the first line are reported on
409 the fourth line, separated by a pipe symbol.
410
411 backends
412 ("symlink: check_postgres_backends") Checks the current number of
413 connections for one or more databases, and optionally compares it to
414 the maximum allowed, which is determined by the Postgres configuration
415 variable max_connections. The --warning and --critical options can take
416 one of three forms. First, a simple number can be given, which
417 represents the number of connections at which the alert will be given.
418 This choice does not use the max_connections setting. Second, the
419 percentage of available connections can be given. Third, a negative
420 number can be given which represents the number of connections left
421 until max_connections is reached. The default values for --warning and
422 --critical are '90%' and '95%'. You can also filter the databases by
423 use of the --include and --exclude options. See the "BASIC FILTERING"
424 section for more details.
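
The three threshold forms can be reduced to a single connection count, as
in the Python sketch below. The function name connection_limit is
hypothetical, and truncating the percentage result to a whole number is
an assumption.

```python
def connection_limit(threshold, max_connections):
    """Turn a --warning/--critical value into a number of connections."""
    text = str(threshold).strip()
    if text.endswith("%"):
        # Second form: a percentage of max_connections.
        return int(max_connections * float(text[:-1]) / 100)
    number = int(text)
    if number < 0:
        # Third form: connections left before max_connections is reached.
        return max_connections + number
    # First form: a plain count that ignores max_connections.
    return number
```

With max_connections set to 100, the values 120, '90%' and -10 map to
120, 90 and 90 connections respectively.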
425
426 To view only non-idle processes, you can use the --noidle argument.
427 Note that the user you are connecting as must be a superuser for this
428 to work properly.
429
Example 1: Give a warning when the number of connections on host quirm
reaches 120, and a critical if it reaches 150.

check_postgres_backends --host=quirm --warning=120 --critical=150

Example 2: Give a critical when we reach 75% of our max_connections
setting on hosts lancre or lancre2.

check_postgres_backends --warning='75%' --critical='75%' --host=lancre,lancre2

Example 3: Give a warning when there are only 10 more connection slots
left on host plasmid, and a critical when we have only 5 left.

check_postgres_backends --warning=-10 --critical=-5 --host=plasmid

Example 4: Check all databases except those with "test" in their name,
but allow ones that are named "pg_greatest". Connect on port 5432 on
the first two hosts, and on port 5433 on the third one. We want to
always throw a critical when we reach 30 or more connections.

check_postgres_backends --dbhost=hong,kong --dbhost=fooey --dbport=5432 --dbport=5433 --warning=30 --critical=30 --exclude="~test" --include="pg_greatest,~prod"
451
For MRTG output, the number of connections is reported on the first
line, and the fourth line gives the name of the database, plus the
current max_connections. If more than one database has been
queried, the one with the highest number of connections is output.
456
457 bloat
458 ("symlink: check_postgres_bloat") Checks the amount of bloat in tables
459 and indexes. (Bloat is generally the amount of dead unused space taken
460 up in a table or index. This space is usually reclaimed by use of the
461 VACUUM command.) This action requires that stats collection be enabled
462 on the target databases, and requires that ANALYZE is run frequently.
463 The --include and --exclude options can be used to filter out which
464 tables to look at. See the "BASIC FILTERING" section for more details.
465
466 The --warning and --critical options can be specified as sizes,
467 percents, or both. Valid size units are bytes, kilobytes, megabytes,
468 gigabytes, terabytes, exabytes, petabytes, and zettabytes. You can
469 abbreviate all of those with the first letter. Items without units are
470 assumed to be 'bytes'. The default values are '1 GB' and '5 GB'. The
471 value represents the number of "wasted bytes", or the difference
472 between what is actually used by the table and index, and what we
473 compute that it should be.
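
A size threshold such as '100 M' or '10 MB' can be converted to bytes
along the lines of the Python sketch below. It assumes each unit is a
power of 1024, which this document does not state. The function name
threshold_bytes is hypothetical, units above terabytes are left out, and
percentages and combined forms are not handled.

```python
import re

# Assumed powers of 1024, keyed by the first letter of the unit.
UNIT_POWERS = {"b": 0, "k": 1, "m": 2, "g": 3, "t": 4}
SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$")


def threshold_bytes(text):
    """Convert a size such as '100 M', '10 MB' or '512' to bytes."""
    match = SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    number, unit = match.groups()
    # Only the first letter matters; no unit at all means bytes.
    letter = unit[:1].lower() or "b"
    if letter not in UNIT_POWERS:
        raise ValueError(f"unit not covered by this sketch: {unit!r}")
    return int(float(number) * 1024 ** UNIT_POWERS[letter])
```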
474
475 Note that this action has two hard-coded values to avoid false alarms
476 on smaller relations. Tables must have at least 10 pages, and indexes
477 at least 15, before they can be considered by this test. If you really
478 want to adjust these values, you can look for the variables $MINPAGES
479 and $MINIPAGES at the top of the "check_bloat" subroutine. These values
480 are ignored if either --exclude or --include is used.
481
482 Only the top 10 most bloated relations are shown. You can change this
483 number by using the --perflimit option to set your own limit.
484
485 The schema named 'information_schema' is excluded from this test, as
486 the only tables it contains are small and do not change.
487
488 Please note that the values computed by this action are not precise,
489 and should be used as a guideline only. Great effort was made to
490 estimate the correct size of a table, but in the end it is only an
491 estimate. The correct index size is even more of a guess than the
492 correct table size, but both should give a rough idea of how bloated
493 things are.
494
Example 1: Warn if any table on port 5432 is over 100 MB bloated, and
critical if over 200 MB

check_postgres_bloat --port=5432 --warning='100 M' --critical='200 M'

Example 2: Give a critical if table 'orders' on host 'sami' has more
than 10 megs of bloat

check_postgres_bloat --host=sami --include=orders --critical='10 MB'

Example 3: Give a critical if table 'q4' on database 'sales' is over
50% bloated

check_postgres_bloat --db=sales --include=q4 --critical='50%'

Example 4: Give a critical if any table is over 20% bloated and has
over 150 MB of bloat:

check_postgres_bloat --port=5432 --critical='20% and 150 M'

Example 5: Give a warning if any table is over 40% bloated or has over
500 MB of bloat:

check_postgres_bloat --port=5432 --warning='500 M or 40%'
519
520 For MRTG output, the first line gives the highest number of wasted
521 bytes for the tables, and the second line gives the highest number of
522 wasted bytes for the indexes. The fourth line gives the database name,
523 table name, and index name information. If you want to output the bloat
524 ratio instead (how many times larger the relation is compared to how
525 large it should be), just pass in "--mrtg=ratio".
526
checkpoint
("symlink: check_postgres_checkpoint") Determines how long it has been
since the last checkpoint was run. This must run on the same server as
the database that is being checked (e.g. the -h flag will not work).
This check is meant to run on a "warm standby" server that is actively
processing shipped WAL files, and is meant to check that your warm
standby is truly 'warm'. The data directory must be set, either by the
environment variable "PGDATA", or by passing the "--datadir" argument.
It returns the number of seconds since the last checkpoint was run, as
determined by parsing the call to "pg_controldata". Because of this,
the pg_controldata executable must be available in the current path.
Alternatively, you can specify "PGBINDIR" as the directory that it
lives in. It is also possible to use the special options --assume-prod
or --assume-standby-mode: if the mode found is not the one expected, a
CRITICAL is emitted.
542
543 At least one warning or critical argument must be set.
544
545 This action requires the Date::Parse module.
546
547 For MRTG or simple output, returns the number of seconds.
548
cluster_id
("symlink: check_postgres_cluster_id") Checks that the Database System
Identifier provided by pg_controldata is the same as last time you
checked. This must run on the same server as the database that is being
checked (e.g. the -h flag will not work). Either the --warning or the
--critical option should be given, but not both. The value of each one
is the cluster identifier, an integer value. You can run with the
special "--critical=0" option to find out an existing cluster
identifier.

Example 1: Find the initial identifier

check_postgres_cluster_id --critical=0 --datadir=/var/lib/postgresql/9.0/main

Example 2: Make sure the cluster is the same and give a critical if
not, using the result from above.

check_postgres_cluster_id --critical=5633695740047915135

For MRTG output, returns a 1 or 0 indicating success or failure of the
identifier to match. An identifier must be provided as the "--mrtg"
argument. The fourth line always gives the current identifier.
571
572 commitratio
573 ("symlink: check_postgres_commitratio") Checks the commit ratio of all
574 databases and complains when they are too low. There is no need to run
575 this command more than once per database cluster. Databases can be
576 filtered with the --include and --exclude options. See the "BASIC
577 FILTERING" section for more details. They can also be filtered by the
578 owner of the database with the --includeuser and --excludeuser options.
579 See the "USER NAME FILTERING" section for more details.
580
The warning and critical options should be specified as percentages.
There are no defaults for this action: the warning and critical must
be specified. The warning value cannot be greater than the critical
value. The output returns all databases sorted by commitratio, smallest
first.

Example: Warn if any database on host flagg is less than 90% in
commitratio, and critical if less than 80%.

check_postgres_commitratio --host=flagg --warning='90%' --critical='80%'
591
592 For MRTG output, returns the percentage of the database with the
593 smallest commitratio on the first line, and the name of the database on
594 the fourth line.
595
596 connection
597 ("symlink: check_postgres_connection") Simply connects, issues a
598 'SELECT version()', and leaves. Takes no --warning or --critical
599 options.
600
601 For MRTG output, simply outputs a 1 (good connection) or a 0 (bad
602 connection) on the first line.
603
604 custom_query
605 ("symlink: check_postgres_custom_query") Runs a custom query of your
606 choosing, and parses the results. The query itself is passed in
607 through the "query" argument, and should be kept as simple as possible.
608 If at all possible, wrap it in a view or a function to keep things
609 easier to manage. The query should return one or two columns. It is
610 required that one of the columns be named "result" and is the item that
611 will be checked against your warning and critical values. The second
612 column is for the performance data and any name can be used: this will
613 be the 'value' inside the performance data section.
614
615 At least one warning or critical argument must be specified. What these
616 are set to depends on the type of query you are running. There are four
617 types of custom_queries that can be run, specified by the "valtype"
618 argument. If none is specified, this action defaults to 'integer'. The
619 four types are:
620
621 integer: Does a simple integer comparison. The first column should be a
622 simple integer, and the warning and critical values should be the same.
623
string: The warning and critical are strings, and each is triggered only
if the value in the first column matches it exactly. This is
case-sensitive.
627
628 time: The warning and the critical are times, and can have units of
629 seconds, minutes, hours, or days. Each may be written singular or
630 abbreviated to just the first letter. If no units are given, seconds
631 are assumed. The first column should be an integer representing the
632 number of seconds to check.
633
634 size: The warning and the critical are sizes, and can have units of
635 bytes, kilobytes, megabytes, gigabytes, terabytes, or exabytes. Each
636 may be abbreviated to the first letter. If no units are given, bytes
637 are assumed. The first column should be an integer representing the
638 number of bytes to check.
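
For the time type, a threshold can be normalized to seconds roughly as in
the Python sketch below. The function name time_in_seconds is
hypothetical, and only whole numbers are accepted here, which is an
assumption.

```python
import re

SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}
TIME_RE = re.compile(r"^\s*(\d+)\s*([A-Za-z]*)\s*$")


def time_in_seconds(text):
    """Convert a threshold such as '5 minutes', '2h' or '30' to seconds."""
    match = TIME_RE.match(text)
    if match is None:
        raise ValueError(f"not a time value: {text!r}")
    number, unit = match.groups()
    unit = unit.lower()
    if unit == "":
        return int(number)  # no unit given: seconds are assumed
    for name, seconds in SECONDS_PER_UNIT.items():
        # Accept the singular, the plural, or just the first letter.
        if unit in (name, name + "s", name[0]):
            return int(number) * seconds
    raise ValueError(f"unknown time unit: {unit!r}")
```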
639
640 Normally, an alert is triggered if the values returned are greater than
641 or equal to the critical or warning value. However, an option of
642 --reverse will trigger the alert if the returned value is lower than or
643 equal to the critical or warning value.
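
The comparison described above, with and without --reverse, can be
sketched in Python as follows. The function name custom_query_status is
hypothetical, and the sketch covers numeric results only. The critical
value is tested first, in line with the general note that criticals are
always checked first.

```python
def custom_query_status(result, warning=None, critical=None, reverse=False):
    """Classify a numeric result against the warning and critical values."""

    def reached(threshold):
        if threshold is None:
            return False
        # --reverse flips "greater than or equal" to "lower than or equal".
        return result <= threshold if reverse else result >= threshold

    if reached(critical):
        return "CRITICAL"
    if reached(warning):
        return "WARNING"
    return "OK"
```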
644
Example 1: Warn if any relation over 100 pages is named "rad", and put
the number of pages inside the performance data section.

check_postgres_custom_query --valtype=string -w "rad" --query="SELECT relname AS result, relpages AS pages FROM pg_class WHERE relpages > 100"

Example 2: Give a critical if the "foobar" function returns a number
over 5MB:

check_postgres_custom_query --critical='5MB' --valtype=size --query="SELECT foobar() AS result"

Example 3: Give a critical if the function "snazzo" returns less than 42:

check_postgres_custom_query --critical=42 --query="SELECT snazzo() AS result" --reverse
659
660 If you come up with a useful custom_query, consider sending in a patch
661 to this program to make it into a standard action that other people can
662 use.
663
664 This action does not support MRTG or simple output yet.
665
666 database_size
667 ("symlink: check_postgres_database_size") Checks the size of all
668 databases and complains when they are too big. There is no need to run
669 this command more than once per database cluster. Databases can be
670 filtered with the --include and --exclude options. See the "BASIC
671 FILTERING" section for more details. They can also be filtered by the
672 owner of the database with the --includeuser and --excludeuser options.
673 See the "USER NAME FILTERING" section for more details.
674
The warning and critical options can be specified as bytes, kilobytes,
megabytes, gigabytes, terabytes, or exabytes. Each may be abbreviated
to the first letter as well. If no unit is given, the units are
assumed to be bytes. There are no defaults for this action: the
warning and critical must be specified. The warning value cannot be
greater than the critical value. The output returns all databases
sorted by size, largest first, showing both raw bytes and a "pretty"
version of the size.
683
Example 1: Warn if any database on host flagg is over 1 TB in size, and
critical if over 1.1 TB.

check_postgres_database_size --host=flagg --warning='1 TB' --critical='1.1 t'

Example 2: Give a critical if the database template1 on port 5432 is
over 10 MB.

check_postgres_database_size --port=5432 --include=template1 --warning='10MB' --critical='10MB'

Example 3: Give a warning if any database on host 'tardis' owned by the
user 'tom' is over 5 GB.

check_postgres_database_size --host=tardis --includeuser=tom --warning='5 GB' --critical='10 GB'
698
699 For MRTG output, returns the size in bytes of the largest database on
700 the first line, and the name of the database on the fourth line.
701
702 dbstats
703 ("symlink: check_postgres_dbstats") Reports information from the
704 pg_stat_database view, and outputs it in a Cacti-friendly manner. No
705 other output is supported, as the output is informational and does not
706 lend itself to alerts, such as used with Nagios. If no options are
707 given, all databases are returned, one per line. You can include a
708 specific database by use of the "--include" option, or you can use the
709 "--dbname" option.
710
711 Eleven items are returned on each line, in the format name:value,
712 separated by a single space. The items are:
713
714 backends
715 The number of currently running backends for this database.
716
717 commits
718 The total number of commits for this database since it was created
719 or reset.
720
721 rollbacks
722 The total number of rollbacks for this database since it was
723 created or reset.
724
725 read
726 The total number of disk blocks read.
727
728 hit The total number of buffer hits.
729
730 ret The total number of rows returned.
731
732 fetch
733 The total number of rows fetched.
734
735 ins The total number of rows inserted.
736
737 upd The total number of rows updated.
738
739 del The total number of rows deleted.
740
741 dbname
742 The name of the database.
743
744 Note that ret, fetch, ins, upd, and del items will always be 0 if
745 Postgres is version 8.2 or lower, as those stats were not available in
746 those versions.
747
748 If the dbname argument is given, seven additional items are returned:
749
750 idxscan
751 Total number of user index scans.
752
753 idxtupread
754 Total number of user index entries returned.
755
756 idxtupfetch
757 Total number of rows fetched by simple user index scans.
758
759 idxblksread
760 Total number of disk blocks read for all user indexes.
761
762 idxblkshit
763 Total number of buffer hits for all user indexes.
764
765 seqscan
766 Total number of sequential scans against all user tables.
767
768 seqtupread
769 Total number of tuples returned from all user tables.
770
771 Example 1: Grab the stats for a database named "products" on host
772 "willow":
773
774 check_postgres_dbstats --dbhost willow --dbname products
775
776 The output returned will be like this (all on one line, not wrapped):
777
778 backends:82 commits:58374408 rollbacks:1651 read:268435543 hit:2920381758 idxscan:310931294 idxtupread:2777040927
779 idxtupfetch:1840241349 idxblksread:62860110 idxblkshit:1107812216 seqscan:5085305 seqtupread:5370500520
780 ret:0 fetch:0 ins:0 upd:0 del:0 dbname:products
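Because each line is a list of name:value pairs separated by single spaces, a single item can be picked out with a few lines of shell. The following is a minimal sketch; the helper name dbstat_value is hypothetical and is not part of check_postgres.pl.

```shell
# Hypothetical helper: print the value for one name from a dbstats-style
# line of space-separated name:value pairs.
dbstat_value() {
    wanted=$1
    line=$2
    # Unquoted on purpose: split the line into its name:value words.
    for pair in $line; do
        case "$pair" in
            "$wanted":*) echo "${pair#*:}"; return 0 ;;
        esac
    done
    return 1    # name not present on this line
}

# Usage: dbstat_value commits "$(check_postgres_dbstats --dbname products)"
```

The function returns a non-zero status when the requested name is absent, so callers can tell a missing item from a value of 0.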
781
782 disabled_triggers
783 ("symlink: check_postgres_disabled_triggers") Checks on the number of
784 disabled triggers inside the database. The --warning and --critical
785 options are the number of such triggers found, and both default to "1",
786 as in normal usage having disabled triggers is a dangerous event. If
787 the database being checked is 8.3 or higher, the check is for the
788 number of triggers that are in a 'disabled' status (as opposed to being
789 'always' or 'replica'). The output will show the name of the table and
790 the name of the trigger for each disabled trigger.
791
792 Example 1: Make sure that there are no disabled triggers
793
794 check_postgres_disabled_triggers
795
796 For MRTG output, returns the number of disabled triggers on the first
797 line.
798
799 disk_space
800 ("symlink: check_postgres_disk_space") Checks on the available physical
801 disk space used by Postgres. This action requires that you have the
802 executable "/bin/df" available to report on disk sizes, and it also
803 needs to be run as a superuser, so it can examine the data_directory
804 setting inside of Postgres. The --warning and --critical options are
805 given in either sizes or percentages or both. If using sizes, the
806 standard unit types are allowed: bytes, kilobytes, megabytes,
807 gigabytes, terabytes, or exabytes. Each may be abbreviated
808 to the first letter only; no units at all indicates 'bytes'. The
809 default values are '90%' and '95%'.
810
811 This command checks the following things to determine all of the
812 different physical disks being used by Postgres.
813
814 data_directory - The disk that the main data directory is on.
815
816 log directory - The disk that the log files are on.
817
818 WAL file directory - The disk that the write-ahead logs are on (e.g.
819 symlinked pg_xlog or pg_wal)
820
821 tablespaces - Each tablespace that is on a separate disk.
822
823 The output shows the total size used and available on each disk, as
824 well as the percentage, ordered by highest to lowest percentage used.
825 Each item above maps to a file system: these can be included or
826 excluded. See the "BASIC FILTERING" section for more details.
827
828 Example 1: Make sure that no file system is over 90% for the database
829 on port 5432.
830
831 check_postgres_disk_space --port=5432 --warning='90%' --critical='90%'
832
833 Example 2: Check that all file systems starting with /dev/sda are
834 smaller than 10 GB and 11 GB (warning and critical)
835
836 check_postgres_disk_space --port=5432 --warning='10 GB' --critical='11 GB' --include="~^/dev/sda"
837
838 Example 3: Make sure that no file system is both over 50% and has over
839 15 GB
840
841 check_postgres_disk_space --critical='50% and 15 GB'
842
843 Example 4: Issue a warning if any file system is either over 75% full
844 or has more than 1T
845
846 check_postgres_disk_space --warning='1T or 75'
847
848 For MRTG output, returns the size in bytes of the file system on the
849 first line, and the name of the file system on the fourth line.
850
851 fsm_pages
852 ("symlink: check_postgres_fsm_pages") Checks how close a cluster is to
853 the Postgres max_fsm_pages setting. This action will only work for
854 databases of 8.2 or higher, and it requires the contrib module
855 pg_freespacemap be installed. The --warning and --critical options
856 should be expressed as percentages. The number of used pages in the
857 free-space-map is determined by looking in the
858 pg_freespacemap_relations view, and running a formula based on the
859 formula used for outputting free-space-map pageslots in the vacuum
860 verbose command. The default values are 85% for the warning and 95% for
861 the critical.
862
863 Example 1: Give a warning when our cluster has used up 76% of the free-
864 space pageslots, with pg_freespacemap installed in database robert
865
866 check_postgres_fsm_pages --dbname=robert --warning="76%"
867
868 While you need to pass in the name of the database where
869 pg_freespacemap is installed, you only need to run this check once per
870 cluster. Also, checking this information does require obtaining special
871 locks on the free-space-map, so it is recommended you do not run this
872 check with short intervals.
873
874 For MRTG output, returns the percent of free-space-map on the first
875 line, and the number of pages currently used on the second line.
876
877 fsm_relations
878 ("symlink: check_postgres_fsm_relations") Checks how close a cluster is
879 to the Postgres max_fsm_relations setting. This action will only work
880 for databases of 8.2 or higher, and it requires the contrib module
881 pg_freespacemap be installed. The --warning and --critical options
882 should be expressed as percentages. The number of used relations in the
883 free-space-map is determined by looking in the
884 pg_freespacemap_relations view. The default values are 85% for the
885 warning and 95% for the critical.
886
887 Example 1: Give a warning when our cluster has used up 75% of the free-
888 space relations, with pg_freespacemap installed in database dylan
889
890 check_postgres_fsm_relations --dbname=dylan --warning="75%"
891
892 While you need to pass in the name of the database where
893 pg_freespacemap is installed, you only need to run this check once per
894 cluster. Also, checking this information does require obtaining special
895 locks on the free-space-map, so it is recommended you do not run this
896 check with short intervals.
897
898 For MRTG output, returns the percent of free-space-map on the first
899 line, and the number of relations currently used on the second line.
900
901 hitratio
902 ("symlink: check_postgres_hitratio") Checks the hit ratio of all
903 databases and complains when they are too low. There is no need to run
904 this command more than once per database cluster. Databases can be
905 filtered with the --include and --exclude options. See the "BASIC
906 FILTERING" section for more details. They can also be filtered by the
907 owner of the database with the --includeuser and --excludeuser options.
908 See the "USER NAME FILTERING" section for more details.
909
910 The warning and critical options should be specified as percentages.
911 There are no defaults for this action: the warning and critical must
912 be specified. The warning value cannot be less than the critical
913 value. The output returns all databases sorted by hitratio, smallest
914 first.
915
916 Example: Warn if any database on host flagg is less than 90% in
917 hitratio, and critical if less than 80%.
918
919 check_postgres_hitratio --host=flagg --warning='90%' --critical='80%'
920
921 For MRTG output, returns the percentage of the database with the
922 smallest hitratio on the first line, and the name of the database on
923 the fourth line.
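MRTG output as described throughout this document puts the measured value on the first line and a descriptive name on the fourth. A short shell sketch for condensing such output into one line follows; the helper name mrtg_summary and the sample values in the usage comment are made up for illustration and are not taken from a real run.

```shell
# Hypothetical helper: read MRTG-style output on stdin and print
# "<fourth line>=<first line>".
mrtg_summary() {
    awk 'NR == 1 { value = $0 }
         NR == 4 { label = $0 }
         END     { print label "=" value }'
}

# Usage with made-up input:
#   printf '97\n0\n\nmydb\n' | mrtg_summary
```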
924
925 hot_standby_delay
926 ("symlink: check_hot_standby_delay") Checks the streaming replication
927 lag by computing the delta between the current xlog position of a
928 master server and the replay location of a slave connected to it. The
929 slave server must be in hot_standby (i.e. read only) mode, therefore
930 the minimum version to use this action is Postgres 9.0. The --warning
931 and --critical options are the delta between the xlog locations. Since
932 these values are byte offsets in the WAL they should match the expected
933 transaction volume of your application to prevent false positives or
934 negatives.
935
936 The first "--dbname", "--host", and "--port", etc. options are
937 considered the master; the second belongs to the slave.
938
939 Byte values should be based on the volume of transactions needed to
940 have the streaming replication disconnect from the master because of
941 too much lag, determined by the Postgres configuration variable
942 wal_keep_segments. For units of time, valid units are 'seconds',
943 'minutes', 'hours', or 'days'. Each may be written singular or
944 abbreviated to just the first letter. When specifying both, in the form
945 'bytes and time', both conditions must be true for the threshold to be
946 met.
947
948 You must provide information on how to reach the databases by providing
949 a comma separated list to the --dbhost and --dbport parameters, such as
950 "--dbport=5432,5543". If not given, the action fails.
951
952 Example 1: Warn if a database with a local replica on port 5433 is
953 behind on any xlog replay at all
954
955 check_hot_standby_delay --dbport=5432,5433 --warning='1'
956
957 Example 2: Give a critical if the last transaction replica1 received
958 was more than 10 minutes ago
959
960 check_hot_standby_delay --dbhost=master,replica1 --critical='10 min'
961
962 Example 3: Allow replica1 to be 1 WAL segment behind, if the master is
963 momentarily seeing more activity than the streaming replication
964 connection can handle, or 10 minutes behind, if the master is seeing
965 very little activity and not processing any transactions, but not both,
966 which would indicate a lasting problem with the replication connection.
967
968 check_hot_standby_delay --dbhost=master,replica1 --warning='1048576 and 2 min' --critical='16777216 and 10 min'
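The byte thresholds in Example 3 are plain multiples of 1 MB: 1048576 is 1 MB, and 16777216 is 16 MB, the default size of one WAL segment. The sketch below computes the byte value for a given number of segments. The helper name wal_segments_to_bytes is hypothetical, and the 16 MB default is an assumption that does not hold for clusters built or initialized with a different segment size.

```shell
# Hypothetical helper: bytes of WAL represented by a number of segments.
# Assumes 16 MB segments unless a segment size in MB is given as $2.
wal_segments_to_bytes() {
    segments=$1
    segment_mb=${2:-16}
    echo $((segments * segment_mb * 1024 * 1024))
}

# Usage: check_hot_standby_delay --dbhost=master,replica1 \
#            --critical="$(wal_segments_to_bytes 4)"
```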
969
970 relation_size
971 index_size
972 table_size
973 indexes_size
974 total_relation_size
975 (symlinks: "check_postgres_relation_size", "check_postgres_index_size",
976 "check_postgres_table_size", "check_postgres_indexes_size", and
977 "check_postgres_total_relation_size")
978
979 The actions relation_size and index_size check for a relation (table,
980 index, or materialized view) or an index, respectively, that has grown
981 too big, using the pg_relation_size() function.
982
983 The action table_size checks tables and materialized views using
984 pg_table_size(), i.e. including relation forks and TOAST table.
985
986 The action indexes_size checks tables and materialized views for the
987 size of the attached indexes using pg_indexes_size().
988
989 The action total_relation_size checks relations using
990 pg_total_relation_size(), i.e. including relation forks, indexes and
991 TOAST table.
992
993 Relations can be filtered with the --include and --exclude options. See
994 the "BASIC FILTERING" section for more details. Relations can also be
995 filtered by the user that owns them, by using the --includeuser and
996 --excludeuser options. See the "USER NAME FILTERING" section for more
997 details.
998
999 The values for the --warning and --critical options are file sizes, and
1000 may have units of bytes, kilobytes, megabytes, gigabytes, terabytes, or
1001 exabytes. Each can be abbreviated to the first letter. If no units are
1002 given, bytes are assumed. There are no default values: both the warning
1003 and the critical option must be given. The return text shows the size
1004 of the largest relation found.
1005
1006 If the --showperf option is enabled, all of the relations with their
1007 sizes will be given. To prevent this, it is recommended that you set
1008 the --perflimit option, which will cause the query to do an "ORDER BY
1009 size DESC LIMIT (perflimit)".
1010
1011 Example 1: Give a critical if any table is larger than 600MB on host
1012 burrick.
1013
1014 check_postgres_table_size --critical='600 MB' --warning='600 MB' --host=burrick
1015
1016 Example 2: Warn if the table products is over 4 GB in size, and give a
1017 critical at 4.5 GB.
1018
1019 check_postgres_table_size --host=burrick --warning='4 GB' --critical='4.5 GB' --include=products
1020
1021 Example 3: Warn if any index not owned by postgres goes over 500 MB.
1022
1023 check_postgres_index_size --port=5432 --excludeuser=postgres -w 500MB -c 600MB
1024
1025 For MRTG output, returns the size in bytes of the largest relation, and
1026 the name of the database and relation as the fourth line.
1027
1028 last_analyze
1029 last_vacuum
1030 last_autoanalyze
1031 last_autovacuum
1032 (symlinks: "check_postgres_last_analyze", "check_postgres_last_vacuum",
1033 "check_postgres_last_autoanalyze", and
1034 "check_postgres_last_autovacuum") Checks how long it has been since
1035 vacuum (or analyze) was last run on each table in one or more
1036 databases. Use of these actions requires that the target database is
1037 version 8.3 or greater, or that the version is 8.2 and the
1038 configuration variable stats_row_level has been enabled. Tables can be
1039 filtered with the --include and --exclude options. See the "BASIC
1040 FILTERING" section for more details. Tables can also be filtered by
1041 their owner by use of the --includeuser and --excludeuser options. See
1042 the "USER NAME FILTERING" section for more details.
1043
1044 The units for --warning and --critical are specified as times. Valid
1045 units are seconds, minutes, hours, and days; all can be abbreviated to
1046 the first letter. If no units are given, 'seconds' are assumed. The
1047 default values are '1 day' and '2 days'. Please note that there are
1048 cases in which this field does not get automatically populated. If
1049 certain tables are giving you problems, make sure that they have dead
1050 rows to vacuum, or just exclude them from the test.
1051
1052 The schema named 'information_schema' is excluded from this test, as
1053 the only tables it contains are small and do not change.
1054
1055 Note that the non-'auto' versions will also check on the auto
1056 versions. In other words, using last_vacuum will report on the last
1057 vacuum, whether it was a normal vacuum, or one run by the autovacuum
1058 daemon.
1059
1060 Example 1: Warn if any table has not been vacuumed in 3 days, and give
1061 a critical at a week, for host wormwood
1062
1063 check_postgres_last_vacuum --host=wormwood --warning='3d' --critical='7d'
1064
1065 Example 2: Same as above, but skip tables belonging to the users 'eve'
1066 or 'mallory'
1067
1068 check_postgres_last_vacuum --host=wormwood --warning='3d' --critical='7d' --excludeuser=eve,mallory
1069
1070 For MRTG output, returns (on the first line) the LEAST amount of time
1071 in seconds since a table was last vacuumed or analyzed. The fourth line
1072 returns the name of the database and name of the table.
1073
1074 listener
1075 ("symlink: check_postgres_listener") Confirm that someone is listening
1076 for one or more specific strings (using the LISTEN/NOTIFY system), by
1077 looking at the pg_listener table. Only one of warning or critical is
1078 needed. The format is a simple string representing the LISTEN target,
1079 or a tilde character followed by a string for a regular expression
1080 check. Note that this check will not work on versions of Postgres 9.0
1081 or higher.
1082
1083 Example 1: Give a warning if nobody is listening for the string
1084 bucardo_mcp_ping on ports 5555 and 5556
1085
1086 check_postgres_listener --port=5555,5556 --warning=bucardo_mcp_ping
1087
1088 Example 2: Give a critical if there are no active LISTEN requests
1089 matching 'grimm' on database oskar
1090
1091 check_postgres_listener --db oskar --critical=~grimm
1092
1093 For MRTG output, returns a 1 or a 0 on the first line, indicating success or
1094 failure. The name of the notice must be provided via the --mrtg option.
1095
1096 locks
1097 ("symlink: check_postgres_locks") Check the total number of locks on
1098 one or more databases. There is no need to run this more than once per
1099 database cluster. Databases can be filtered with the --include and
1100 --exclude options. See the "BASIC FILTERING" section for more details.
1101
1102 The --warning and --critical options can be specified as simple
1103 numbers, which represent the total number of locks, or they can be
1104 broken down by type of lock. Valid lock names are 'total', 'waiting',
1105 or the name of a lock type used by Postgres. These names are case-
1106 insensitive and do not need the "lock" part on the end, so exclusive
1107 will match 'ExclusiveLock'. The format is name=number, with different
1108 items separated by colons or semicolons (or any other symbol).
1109
1110 Example 1: Warn if the number of locks is 100 or more, and critical if
1111 200 or more, on host garrett
1112
1113 check_postgres_locks --host=garrett --warning=100 --critical=200
1114
1115 Example 2: On the host artemus, warn if 200 or more locks exist, and
1116 give a critical if over 250 total locks exist, or if over 20 exclusive
1117 locks exist, or if over 5 connections are waiting for a lock.
1118
1119 check_postgres_locks --host=artemus --warning=200 --critical="total=250:waiting=5:exclusive=20"
1120
1121 For MRTG output, returns the number of locks on the first line, and the
1122 name of the database on the fourth line.
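The name=number format can be taken apart with standard tools, which helps when generating or reviewing thresholds in scripts. The sketch below is illustrative only: the helper name split_lock_thresholds is hypothetical, and it handles only colon and semicolon separators, not the other symbols the action is said to accept.

```shell
# Hypothetical helper: print one "name number" pair per line from a locks
# threshold such as "total=250:waiting=5:exclusive=20". A bare number is
# reported as "total", as described for simple numeric thresholds.
split_lock_thresholds() {
    spec=$1
    case "$spec" in
        *=*) ;;
        *)   echo "total $spec"; return 0 ;;
    esac
    printf '%s\n' "$spec" | tr ':;' '\n\n' |
    while IFS='=' read -r name number; do
        if [ -n "$name" ]; then
            # Lock names are case-insensitive, so normalize to lowercase.
            echo "$(printf '%s' "$name" | tr 'A-Z' 'a-z') $number"
        fi
    done
}

# Usage: split_lock_thresholds 'total=250:waiting=5:exclusive=20'
```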
1123
1124 logfile
1125 ("symlink: check_postgres_logfile") Ensures that the logfile is in the
1126 expected location and is being logged to. This action issues a command
1127 that throws an error on each database it is checking, and ensures that
1128 the message shows up in the logs. It scans the various log_* settings
1129 inside of Postgres to figure out where the logs should be. If you are
1130 using syslog, it does a rough (but not foolproof) scan of
1131 /etc/syslog.conf. Alternatively, you can provide the name of the
1132 logfile with the --logfile option. This is especially useful if the
1133 logs have a custom rotation scheme driven by an external program. The
1134 --logfile option supports the following escape characters: "%Y %m %d
1135 %H", which represent the current year, month, date, and hour
1136 respectively. An error is always reported as critical unless the
1137 warning option has been passed in as a non-zero value. Other than that
1138 specific usage, the "--warning" and "--critical" options should not be
1139 used.
1140
1141 Example 1: On port 5432, ensure the logfile is being written to the
1142 file /home/greg/pg8.2.log
1143
1144 check_postgres_logfile --port=5432 --logfile=/home/greg/pg8.2.log
1145
1146 Example 2: Same as above, but raise a warning, not a critical
1147
1148 check_postgres_logfile --port=5432 --logfile=/home/greg/pg8.2.log -w 1
1149
1150 For MRTG output, returns a 1 or 0 on the first line, indicating success
1151 or failure. In case of a failure, the fourth line will provide more
1152 detail on the failure encountered.
1153
1154 new_version_bc
1155 ("symlink: check_postgres_new_version_bc") Checks if a newer version of
1156 the Bucardo program is available. The current version is obtained by
1157 running "bucardo_ctl --version". If a major upgrade is available, a
1158 warning is returned. If a revision upgrade is available, a critical is
1159 returned. (Bucardo is a master to slave, and master to master
1160 replication system for Postgres: see https://bucardo.org/ for more
1161 information). See also the information on the "--get_method" option.
1162
1163 new_version_box
1164 ("symlink: check_postgres_new_version_box") Checks if a newer version
1165 of the boxinfo program is available. The current version is obtained by
1166 running "boxinfo.pl --version". If a major upgrade is available, a
1167 warning is returned. If a revision upgrade is available, a critical is
1168 returned. (boxinfo is a program for grabbing important information from
1169 a server and putting it into an HTML format: see
1170 https://bucardo.org/Boxinfo/ for more information). See also the
1171 information on the "--get_method" option.
1172
1173 new_version_cp
1174 ("symlink: check_postgres_new_version_cp") Checks if a newer version of
1175 this program (check_postgres.pl) is available, by grabbing the version
1176 from a small text file on the home page for the
1177 project. Returns a warning if the returned version does not match the
1178 one you are running. Recommended interval to check is once a day. See
1179 also the information on the "--get_method" option.
1180
1181 new_version_pg
1182 ("symlink: check_postgres_new_version_pg") Checks if a newer revision
1183 of Postgres exists for each database connected to. Note that this only
1184 checks for revision, e.g. going from 8.3.6 to 8.3.7. Revisions are
1185 always 100% binary compatible and involve no dump and restore to
1186 upgrade. Revisions are made to address bugs, so upgrading as soon as
1187 possible is always recommended. Returns a warning if you do not have
1188 the latest revision. It is recommended this check is run at least once
1189 a day. See also the information on the "--get_method" option.
1190
1191 new_version_tnm
1192 ("symlink: check_postgres_new_version_tnm") Checks if a newer version
1193 of the tail_n_mail program is available. The current version is
1194 obtained by running "tail_n_mail --version". If a major upgrade is
1195 available, a warning is returned. If a revision upgrade is available, a
1196 critical is returned. (tail_n_mail is a log monitoring tool that can
1197 send mail when interesting events appear in your Postgres logs. See:
1198 https://bucardo.org/tail_n_mail/ for more information). See also the
1199 information on the "--get_method" option.
1200
1201 pgb_pool_cl_active
1202 pgb_pool_cl_waiting
1203 pgb_pool_sv_active
1204 pgb_pool_sv_idle
1205 pgb_pool_sv_used
1206 pgb_pool_sv_tested
1207 pgb_pool_sv_login
1208 pgb_pool_maxwait
1209 (symlinks: "check_postgres_pgb_pool_cl_active",
1210 "check_postgres_pgb_pool_cl_waiting",
1211 "check_postgres_pgb_pool_sv_active", "check_postgres_pgb_pool_sv_idle",
1212 "check_postgres_pgb_pool_sv_used", "check_postgres_pgb_pool_sv_tested",
1213 "check_postgres_pgb_pool_sv_login", and
1214 "check_postgres_pgb_pool_maxwait")
1215
1216 Examines pgbouncer's pool statistics. Each pool has a set of "client"
1217 connections, referring to connections from external clients, and
1218 "server" connections, referring to connections to PostgreSQL itself.
1219 The related check_postgres actions are prefixed by "cl_" and "sv_",
1220 respectively. Active client connections are those connections currently
1221 linked with an active server connection. Client connections may also be
1222 "waiting", meaning they have not yet been allocated a server
1223 connection. Server connections are "active" (linked to a client),
1224 "idle" (standing by for a client connection to link with), "used" (just
1225 unlinked from a client, and not yet returned to the idle pool),
1226 "tested" (currently being tested) and "login" (in the process of
1227 logging in). The maxwait value shows how long in seconds the oldest
1228 waiting client connection has been waiting.
1229
1230 pgbouncer_backends
1231 ("symlink: check_postgres_pgbouncer_backends") Checks the current
1232 number of connections for one or more databases through pgbouncer, and
1233 optionally compares it to the maximum allowed, which is determined by
1234 the pgbouncer configuration variable max_client_conn. The --warning and
1235 --critical options can take one of three forms. First, a simple number
1236 can be given, which represents the number of connections at which the
1237 alert will be given. This choice does not use the max_client_conn
1238 setting. Second, the percentage of available connections can be given.
1239 Third, a negative number can be given which represents the number of
1240 connections left until max_client_conn is reached. The default values
1241 for --warning and --critical are '90%' and '95%'. You can also filter
1242 the databases by use of the --include and --exclude options. See the
1243 "BASIC FILTERING" section for more details.
1244
1245 To view only non-idle processes, you can use the --noidle argument.
1246 Note that the user you are connecting as must be a superuser for this
1247 to work properly.
1248
1249 Example 1: Give a warning when the number of connections on host quirm
1250 reaches 120, and a critical if it reaches 150.
1251
1252 check_postgres_pgbouncer_backends --host=quirm --warning=120 --critical=150 -p 6432 -u pgbouncer
1253
1254 Example 2: Give a critical when we reach 75% of our max_client_conn
1255 setting on hosts lancre or lancre2.
1256
1257 check_postgres_pgbouncer_backends --warning='75%' --critical='75%' --host=lancre,lancre2 -p 6432 -u pgbouncer
1258
1259 Example 3: Give a warning when there are only 10 more connection slots
1260 left on host plasmid, and a critical when we have only 5 left.
1261
1262 check_postgres_pgbouncer_backends --warning=-10 --critical=-5 --host=plasmid -p 6432 -u pgbouncer
1263
1264 For MRTG output, the number of connections is reported on the first
1265 line, and the fourth line gives the name of the database, plus the
1266 current max_client_conn. If more than one database has been queried,
1267 the one with the highest number of connections is output.
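The three threshold forms can each be reduced to a connection count once the pgbouncer limit is known. The sketch below shows that arithmetic; the helper name backend_limit is hypothetical, and the integer rounding for percentages is an assumption that may differ from what check_postgres.pl does internally.

```shell
# Hypothetical helper: turn a threshold (plain number, percentage, or
# negative number) into a connection count, given the connection limit.
backend_limit() {
    threshold=$1
    max_conn=$2
    case "$threshold" in
        *%) echo $((max_conn * ${threshold%\%} / 100)) ;;   # percentage of the limit
        -*) echo $((max_conn - ${threshold#-})) ;;          # slots left before the limit
        *)  echo "$threshold" ;;                            # absolute count
    esac
}

# Usage: backend_limit '75%' 200
```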
1268
1269 pgbouncer_checksum
1270 ("symlink: check_postgres_pgbouncer_checksum") Checks that all the
1271 pgBouncer settings are the same as last time you checked. This is done
1272 by generating a checksum of a sorted list of setting names and their
1273 values. Note that you shouldn't specify the database name; it will
1274 automatically default to pgbouncer. Either the --warning or the
1275 --critical option should be given, but not both. The value of each one
1276 is the checksum, a 32-character hexadecimal value. You can run with the
1277 special "--critical=0" option to find out an existing checksum.
1278
1279 This action requires the Digest::MD5 module.
1280
1281 Example 1: Find the initial checksum for pgbouncer configuration on
1282 port 6432 using the default user (usually postgres)
1283
1284 check_postgres_pgbouncer_checksum --port=6432 --critical=0
1285
1286 Example 2: Make sure no settings have changed and warn if so, using the
1287 checksum from above.
1288
1289 check_postgres_pgbouncer_checksum --port=6432 --warning=cd2f3b5e129dc2b4f5c0f6d8d2e64231
1290
1291 For MRTG output, returns a 1 or 0 indicating success or failure of the
1292 checksum to match. A checksum must be provided as the "--mrtg"
1293 argument. The fourth line always gives the current checksum.
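Since the expected value is a 32-character hexadecimal string, a wrapper script can reject a mistyped checksum before it is passed to --warning or --critical. The following is a minimal sketch; the helper name is_md5_checksum is hypothetical and only checks the shape of the string, not whether it matches any configuration.

```shell
# Hypothetical helper: succeed only if $1 is exactly 32 hexadecimal characters.
is_md5_checksum() {
    case "$1" in
        *[!0-9a-fA-F]*) return 1 ;;   # contains a non-hex character
    esac
    [ "${#1}" -eq 32 ]
}

# Usage:
#   is_md5_checksum "$sum" && check_postgres_pgbouncer_checksum --port=6432 --warning="$sum"
```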
1294
1295 pgagent_jobs
1296 ("symlink: check_postgres_pgagent_jobs") Checks that all the pgAgent
1297 jobs that have executed in the preceding interval of time have
1298 succeeded. This is done by checking for any steps that have a non-zero
1299 result.
1300
1301 Either "--warning" or "--critical", or both, may be specified as times,
1302 and jobs will be checked for failures within the specified periods of
1303 time before the current time. Valid units are seconds, minutes, hours,
1304 and days; all can be abbreviated to the first letter. If no units are
1305 given, 'seconds' are assumed.
1306
1307 Example 1: Give a critical when any jobs executed in the last day have
1308 failed.
1309
1310 check_postgres_pgagent_jobs --critical=1d
1311
1312 Example 2: Give a warning when any jobs executed in the last week have
1313 failed.
1314
1315 check_postgres_pgagent_jobs --warning=7d
1316
1317 Example 3: Give a critical for jobs that have failed in the last 2
1318 hours and a warning for jobs that have failed in the last 4 hours:
1319
1320 check_postgres_pgagent_jobs --critical=2h --warning=4h
1321
1322 prepared_txns
1323 ("symlink: check_postgres_prepared_txns") Check on the age of any
1324 existing prepared transactions. Note that most people will NOT use
1325 prepared transactions, as they are part of two-phase commit and
1326 complicated to maintain. They should also not be confused with prepared
1327 STATEMENTS, which is what most people think of when they hear prepare.
1328 The default value for a warning is 1 second, to detect any use of
1329 prepared transactions, which is probably a mistake on most systems.
1330 Warning and critical are the number of seconds a prepared transaction
1331 has been open before an alert is given.
1332
1333 Example 1: Give a warning on detecting any prepared transactions:
1334
1335 check_postgres_prepared_txns -w 0
1336
1337 Example 2: Give a critical if any prepared transaction has been open
1338 longer than 10 seconds, but allow up to 360 seconds for the database
1339 'shrike':
1340
1341 check_postgres_prepared_txns --critical=10 --exclude=shrike
1342 check_postgres_prepared_txns --critical=360 --include=shrike
1343
1344 For MRTG output, returns the number of seconds the oldest transaction
1345 has been open as the first line, and which database it came from as the
1346 final line.
1347
1348 query_runtime
1349 ("symlink: check_postgres_query_runtime") Checks how long a specific
1350 query takes to run, by executing an "EXPLAIN ANALYZE" against it. The
1351 --warning and --critical options are the maximum amount of time the
1352 query should take. Valid units are seconds, minutes, and hours; any can
1353 be abbreviated to the first letter. If no units are given, 'seconds'
1354 are assumed. Both the warning and the critical option must be given.
1355 The name of the view or function to be run must be passed in to the
1356 --queryname option. It must consist of a single word (or schema.word),
1357 with optional parens at the end.
1358
1359 Example 1: Give a critical if the function named "speedtest" fails to
1360 run in 10 seconds or less.
1361
1362 check_postgres_query_runtime --queryname='speedtest()' --critical=10 --warning=10
1363
1364 For MRTG output, reports the time in seconds for the query to complete
1365 on the first line. The fourth line lists the database.
1366
1367 query_time
1368 ("symlink: check_postgres_query_time") Checks the length of running
1369 queries on one or more databases. There is no need to run this more
1370 than once on the same database cluster. Note that this already excludes
1371 queries that are "idle in transaction". Databases can be filtered by
1372 using the --include and --exclude options. See the "BASIC FILTERING"
1373 section for more details. You can also filter on the user running the
1374 query with the --includeuser and --excludeuser options. See the "USER
1375 NAME FILTERING" section for more details.
1376
1377 The values for the --warning and --critical options are amounts of
1378 time, and at least one must be provided (no defaults). Valid units are
1379 'seconds', 'minutes', 'hours', or 'days'. Each may be written singular
1380 or abbreviated to just the first letter. If no units are given, the
1381 unit is assumed to be seconds.
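
The unit rules above can be sketched in shell. This is an illustration
only, assuming a hypothetical helper named to_seconds; it is not the
parser that check_postgres.pl itself uses, and it handles only the
plain "number plus optional unit" form described in this section.

```shell
# Hypothetical helper (not part of check_postgres.pl): convert a
# threshold such as '3 minutes', '20s', or '45' into seconds.
to_seconds() {
    local value unit
    # Leading integer, then whatever follows it (the optional unit).
    value=$(printf '%s' "$1" | sed -E 's/^[[:space:]]*([0-9]+).*$/\1/')
    unit=$(printf '%s' "$1" | sed -E 's/^[[:space:]]*[0-9]+[[:space:]]*//')
    case "$unit" in
        ''|s|second|seconds) echo "$value" ;;            # no unit means seconds
        m|minute|minutes)    echo $((value * 60)) ;;
        h|hour|hours)        echo $((value * 3600)) ;;
        d|day|days)          echo $((value * 86400)) ;;
        *) echo "unknown unit: $unit" >&2; return 1 ;;
    esac
}
```

With this sketch, "to_seconds '3 minutes'" prints 180 and "to_seconds
20s" prints 20.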
1382
1383 This action requires Postgres 8.1 or better.
1384
1385 Example 1: Give a warning if any query has been running longer than 3
1386 minutes, and a critical if longer than 5 minutes.
1387
1388 check_postgres_query_time --port=5432 --warning='3 minutes' --critical='5 minutes'
1389
1390 Example 2: Using default values (2 and 5 minutes), check all databases
1391 except those starting with 'template'.
1392
1393 check_postgres_query_time --port=5432 --exclude=~^template
1394
1395 Example 3: Warn if user 'don' has a query running over 20 seconds
1396
1397 check_postgres_query_time --port=5432 --includeuser=don --warning=20s
1398
1399 For MRTG output, returns the length in seconds of the longest running
1400 query on the first line. The fourth line gives the name of the
1401 database.
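
A wrapper script that consumes MRTG output can pick out the first and
fourth lines described above with standard tools. A minimal sketch,
assuming hypothetical helper names mrtg_value and mrtg_detail and an
MRTG-mode report arriving on standard input:

```shell
# Hypothetical helpers (not part of check_postgres.pl).
# Both read an MRTG-mode report from standard input.

# Print the first line: the measured value.
mrtg_value() {
    sed -n '1p'
}

# Print the fourth line: the database name or other detail.
mrtg_detail() {
    sed -n '4p'
}
```

Pipe the MRTG-mode output of an action into either helper to extract
the corresponding line.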
1402
1403 replicate_row
1404 ("symlink: check_postgres_replicate_row") Checks that master-slave
1405 replication is working to one or more slaves.
1406
1407 The first "--dbname", "--host", and "--port", etc. options are
considered the master; subsequent uses are the slaves. The values of
1409 the --warning and --critical options are units of time, and at least
1410 one must be provided (no defaults). Valid units are 'seconds',
1411 'minutes', 'hours', or 'days'. Each may be written singular or
1412 abbreviated to just the first letter. If no units are given, the units
1413 are assumed to be seconds.
1414
1415 This check updates a single row on the master, and then measures how
1416 long it takes to be applied to the slaves. To do this, you need to pick
1417 a table that is being replicated, then find a row that can be changed,
1418 and is not going to be changed by any other process. A specific column
1419 of this row will be changed from one value to another. All of this is
1420 fed to the "repinfo" option, and should contain the following options,
1421 separated by commas: table name, primary key, key id, column, first
1422 value, second value.
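
Because the six fields are positional, a typo in "repinfo" is easy to
miss. The following sketch, assuming a hypothetical helper named
describe_repinfo, splits a value on commas and labels each field so it
can be checked by eye before the check is deployed. It is not how the
script itself parses the option.

```shell
# Hypothetical helper (not part of check_postgres.pl): label the six
# comma-separated fields of a --repinfo value.
describe_repinfo() {
    local IFS=','
    # Split the single argument on commas into positional parameters.
    set -- $1
    if [ "$#" -ne 6 ]; then
        echo "expected 6 fields, got $#" >&2
        return 1
    fi
    echo "table=$1 pk=$2 id=$3 column=$4 first=$5 second=$6"
}
```

For the value used in Example 1 below, this prints "table=orders pk=id
id=3 column=salesrep first=slon second=nols".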
1423
1424 Example 1: Slony is replicating a table named 'orders' from host
1425 'alpha' to host 'beta', in the database 'sales'. The primary key of the
1426 table is named id, and we are going to test the row with an id of 3
1427 (which is historical and never changed). There is a column named
1428 'salesrep' that we are going to toggle from a value of 'slon' to 'nols'
1429 to check on the replication. We want to throw a warning if the
1430 replication does not happen within 10 seconds.
1431
1432 check_postgres_replicate_row --host=alpha --dbname=sales --host=beta
1433 --dbname=sales --warning=10 --repinfo=orders,id,3,salesrep,slon,nols
1434
1435 Example 2: Bucardo is replicating a table named 'receipt' from host
1436 'green' to hosts 'red', 'blue', and 'yellow'. The database for both
1437 sides is 'public'. The slave databases are running on port 5455. The
1438 primary key is named 'receipt_id', the row we want to use has a value
1439 of 9, and the column we want to change for the test is called 'zone'.
1440 We'll toggle between 'north' and 'south' for the value of this column,
1441 and throw a critical if the change is not on all three slaves within 5
1442 seconds.
1443
1444 check_postgres_replicate_row --host=green --port=5455 --host=red,blue,yellow
1445 --critical=5 --repinfo=receipt,receipt_id,9,zone,north,south
1446
1447 For MRTG output, returns on the first line the time in seconds the
1448 replication takes to finish. The maximum time is set to 4 minutes 30
1449 seconds: if no replication has taken place in that long a time, an
1450 error is thrown.
1451
1452 replication_slots
1453 ("symlink: check_postgres_replication_slots") Check the quantity of
1454 WAL retained for any replication slots in the target database cluster.
1455 This is handy for monitoring environments where all WAL archiving and
1456 replication is taking place over replication slots.
1457
Warning and critical are total bytes retained for the slot. For example:

check_postgres_replication_slots --port=5432 --host=yellow --warning=32M --critical=64M

Specific named slots can be monitored using --include/--exclude.
1463
1464 same_schema
1465 ("symlink: check_postgres_same_schema") Verifies that two or more
1466 databases are identical as far as their schema (but not the data
1467 within). This is particularly handy for making sure your slaves have
1468 not been modified or corrupted in any way when using master to slave
1469 replication. Unlike most other actions, this has no warning or critical
1470 criteria - the databases are either in sync, or are not. If they are
1471 different, a detailed list of the differences is presented.
1472
1473 You may want to exclude or filter out certain differences. The way to
1474 do this is to add strings to the "--filter" option. To exclude a type
1475 of object, use "noname", where 'name' is the type of object, for
1476 example, "noschema". To exclude objects of a certain type by a regular
1477 expression against their name, use "noname=regex". See the examples
1478 below for a better understanding.
1479
1480 The types of objects that can be filtered include:
1481
1482 user
1483 schema
1484 table
1485 view
1486 index
1487 sequence
1488 constraint
1489 trigger
1490 function
1491
1492 The filter option "noposition" prevents verification of the position
1493 of columns within a table.
1494
1495 The filter option "nofuncbody" prevents comparison of the bodies of all
1496 functions.
1497
1498 The filter option "noperm" prevents comparison of object permissions.
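
The "noname" and "noname=regex" rules can be illustrated with a short
shell sketch. It assumes a hypothetical helper named is_filtered and
treats the filter string as whitespace-separated terms, as in the
examples below; the special options such as "noposition" are not
modeled, and the script's own matching may differ in detail.

```shell
# Hypothetical helper (not part of check_postgres.pl).
# is_filtered TYPE NAME FILTER: succeed if FILTER excludes the object.
is_filtered() {
    local type="$1" name="$2" term
    for term in $3; do                    # terms are separated by spaces
        case "$term" in
            "no$type")                    # e.g. noschema: drop the whole type
                return 0 ;;
            "no$type="*)                  # e.g. notrigger=slony: match by regex
                if printf '%s\n' "$name" | grep -Eq -- "${term#*=}"; then
                    return 0
                fi ;;
        esac
    done
    return 1
}
```

With a filter of "notrigger=slony noschema", a trigger named
slony_logtrigger and every schema are excluded, while a trigger named
audit_trg is still compared.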
1499
To provide the second database, append its connection values to those
of the first, using the appropriate connection argument. For example,
to compare databases on hosts alpha and bravo, use
"--dbhost=alpha,bravo". Also see the examples below.
1504
1505 If only a single host is given, it is assumed we are doing a "time-
1506 based" report. The first time this is run a snapshot of all the items
1507 in the database is saved to a local file. When you run it again, that
1508 snapshot is read in and becomes "database #2" and is compared to the
1509 current database.
1510
1511 To replace the old stored file with the new version, use the --replace
1512 argument.
1513
1514 If you need to write the stored file to a specific directory, use the
1515 --audit-file-dir argument.
1516
1517 To avoid false positives on value based checks caused by replication
1518 lag on asynchronous replicas, use the --assume-async option.
1519
1520 To enable snapshots at various points in time, you can use the
1521 "--suffix" argument to make the filenames unique to each run. See the
1522 examples below.
1523
1524 Example 1: Verify that two databases on hosts star and line are the
1525 same:
1526
1527 check_postgres_same_schema --dbhost=star,line
1528
1529 Example 2: Same as before, but exclude any triggers with "slony" in
1530 their name
1531
1532 check_postgres_same_schema --dbhost=star,line --filter="notrigger=slony"
1533
1534 Example 3: Same as before, but also exclude all indexes
1535
check_postgres_same_schema --dbhost=star,line --filter="notrigger=slony noindex"
1537
1538 Example 4: Check differences for the database "battlestar" on different
1539 ports
1540
1541 check_postgres_same_schema --dbname=battlestar --dbport=5432,5544
1542
1543 Example 5: Create a daily and weekly snapshot file
1544
1545 check_postgres_same_schema --dbname=cylon --suffix=daily
1546 check_postgres_same_schema --dbname=cylon --suffix=weekly
1547
1548 Example 6: Run a historical comparison, then replace the file
1549
1550 check_postgres_same_schema --dbname=cylon --suffix=daily --replace
1551
1552 Example 7: Verify that two databases on hosts star and line are the
1553 same, excluding value data (i.e. sequence last_val):
1554
1555 check_postgres_same_schema --dbhost=star,line --assume-async
1556
1557 sequence
1558 ("symlink: check_postgres_sequence") Checks how much room is left on
1559 all sequences in the database. This is measured as the percent of
1560 total possible values that have been used for each sequence. The
1561 --warning and --critical options should be expressed as percentages.
1562 The default values are 85% for the warning and 95% for the critical.
1563 You may use --include and --exclude to control which sequences are to
be checked. Note that this check does account for unusual minvalue and
increment by values. By default it does not care whether the sequence is
set to cycle; if --skipcycled is passed, sequences set to cycle are
reported with 0% usage.
1568
1569 The output for Nagios gives the name of the sequence, the percentage
1570 used, and the number of 'calls' left, indicating how many more times
1571 nextval can be called on that sequence before running into the maximum
1572 value.
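
The relationship between the percentage used and the number of calls
left can be sketched as follows. This is a simplified model, assuming
a hypothetical helper named sequence_usage, an ascending sequence, and
a positive increment; the query the script actually runs may round
differently.

```shell
# Hypothetical helper (not part of check_postgres.pl).
# sequence_usage LAST_VALUE MIN_VALUE MAX_VALUE INCREMENT
sequence_usage() {
    awk -v last="$1" -v min="$2" -v max="$3" -v inc="$4" 'BEGIN {
        # Number of values the sequence can ever hand out.
        slots = int((max - min) / inc) + 1
        # Number of values handed out so far.
        used  = int((last - min) / inc) + 1
        printf "%d%% used, %d calls left\n", int(100 * used / slots), slots - used
    }'
}
```

For a sequence running from 1 to 1000 in steps of 1 whose last value is
500, this prints "50% used, 500 calls left".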
1573
1574 The output for MRTG returns the highest percentage across all sequences
1575 on the first line, and the name of each sequence with that percentage
on the fourth line, separated by a "|" (pipe) if more than one
sequence is at that percentage.
1578
1579 Example 1: Give a warning if any sequences are approaching 95% full.
1580
1581 check_postgres_sequence --dbport=5432 --warning=95%
1582
1583 Example 2: Check that the sequence named "orders_id_seq" is not more
1584 than half full.
1585
1586 check_postgres_sequence --dbport=5432 --critical=50% --include=orders_id_seq
1587
1588 settings_checksum
1589 ("symlink: check_postgres_settings_checksum") Checks that all the
1590 Postgres settings are the same as last time you checked. This is done
1591 by generating a checksum of a sorted list of setting names and their
1592 values. Note that different users in the same database may have
1593 different checksums, due to ALTER USER usage, and due to the fact that
1594 superusers see more settings than ordinary users. Either the --warning
1595 or the --critical option should be given, but not both. The value of
1596 each one is the checksum, a 32-character hexadecimal value. You can run
1597 with the special "--critical=0" option to find out an existing
1598 checksum.
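
The idea of hashing a sorted list of settings can be sketched in shell.
This assumes a hypothetical helper named settings_checksum, GNU
md5sum, and "name value" lines on standard input. The exact text the
script hashes is not described here, so the result will not match the
checksum that check_postgres.pl reports.

```shell
# Hypothetical helper (not part of check_postgres.pl).
# Reads "name value" lines on standard input and prints an MD5 checksum
# that does not depend on the order of the input lines.
settings_checksum() {
    LC_ALL=C sort | md5sum | cut -d' ' -f1
}
```

Sorting first is what makes the checksum stable: the same settings
always hash to the same 32-character value regardless of the order in
which they were listed.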
1599
1600 This action requires the Digest::MD5 module.
1601
1602 Example 1: Find the initial checksum for the database on port 5555
1603 using the default user (usually postgres)
1604
1605 check_postgres_settings_checksum --port=5555 --critical=0
1606
1607 Example 2: Make sure no settings have changed and warn if so, using the
1608 checksum from above.
1609
1610 check_postgres_settings_checksum --port=5555 --warning=cd2f3b5e129dc2b4f5c0f6d8d2e64231
1611
For MRTG output, returns a 1 or 0 indicating success or failure of the
1613 checksum to match. A checksum must be provided as the "--mrtg"
1614 argument. The fourth line always gives the current checksum.
1615
1616 slony_status
("symlink: check_postgres_slony_status") Checks the status of a
1618 Slony cluster by looking at the results of Slony's sl_status view. This
1619 is returned as the number of seconds of "lag time". The --warning and
1620 --critical options should be expressed as times. The default values are
1621 60 seconds for the warning and 300 seconds for the critical.
1622
The optional argument --schema indicates the schema that Slony is
1624 installed under. If it is not given, the schema will be determined
1625 automatically each time this check is run.
1626
1627 Example 1: Give a warning if any Slony is lagged by more than 20
1628 seconds
1629
1630 check_postgres_slony_status --warning 20
1631
1632 Example 2: Give a critical if Slony, installed under the schema
1633 "_slony", is over 10 minutes lagged
1634
1635 check_postgres_slony_status --schema=_slony --critical=600
1636
1637 timesync
1638 ("symlink: check_postgres_timesync") Compares the local system time
1639 with the time reported by one or more databases. The --warning and
1640 --critical options represent the number of seconds between the two
1641 systems before an alert is given. If neither is specified, the default
1642 values are used, which are '2' and '5'. The warning value cannot be
1643 greater than the critical value. Due to the non-exact nature of this
1644 test, values of '0' or '1' are not recommended.
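
The comparison can be pictured as follows. This is a sketch only,
assuming a hypothetical helper named classify_drift that takes two
epoch timestamps and the two thresholds, and assuming that a
difference equal to a threshold already raises the alert; the script
may treat the boundary differently.

```shell
# Hypothetical helper (not part of check_postgres.pl).
# classify_drift LOCAL_EPOCH DB_EPOCH WARNING CRITICAL
classify_drift() {
    local diff=$(( $1 - $2 ))
    # Only the size of the difference matters, not its direction.
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    if [ "$diff" -ge "$4" ]; then
        echo CRITICAL
    elif [ "$diff" -ge "$3" ]; then
        echo WARNING
    else
        echo OK
    fi
}
```

With the default thresholds of 2 and 5, a database clock that is three
seconds ahead of the local clock yields WARNING.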
1645
1646 The string returned shows the time difference as well as the time on
1647 each side written out.
1648
1649 Example 1: Check that databases on hosts ankh, morpork, and klatch are
1650 no more than 3 seconds off from the local time:
1651
1652 check_postgres_timesync --host=ankh,morpork,klatch --critical=3
1653
For MRTG output, returns on the first line the number of seconds
1655 difference between the local time and the database time. The fourth
1656 line returns the name of the database.
1657
1658 txn_idle
1659 ("symlink: check_postgres_txn_idle") Checks the number and duration of
1660 "idle in transaction" queries on one or more databases. There is no
1661 need to run this more than once on the same database cluster. Databases
1662 can be filtered by using the --include and --exclude options. See the
1663 "BASIC FILTERING" section below for more details.
1664
1665 The --warning and --critical options are given as units of time, signed
1666 integers, or integers for units of time, and at least one must be
1667 provided (there are no defaults). Valid units are 'seconds', 'minutes',
1668 'hours', or 'days'. Each may be written singular or abbreviated to just
1669 the first letter. If no units are given and the numbers are unsigned,
1670 the units are assumed to be seconds.
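
The three threshold forms can be told apart mechanically. The sketch
below assumes a hypothetical helper named describe_threshold and only
classifies the string; it does not validate units and is not the
script's own parser.

```shell
# Hypothetical helper (not part of check_postgres.pl): report whether a
# txn_idle threshold is a time, a count, or a count combined with a time.
describe_threshold() {
    case "$1" in
        *" for "*)                       # e.g. '5 for 10 seconds'
            echo "count=${1%% for *} time=${1#* for }" ;;
        [+-][0-9]*)                      # signed integer, e.g. '+50'
            echo "count=$1" ;;
        *)                               # e.g. '15 seconds' or plain '30'
            echo "time=$1" ;;
    esac
}
```

The thresholds used in the three examples below are classified as
"time=15 seconds", "count=+50", and "count=5 time=10 seconds".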
1671
1672 This action requires Postgres 8.3 or better.
1673
1674 As of PostgreSQL 10, you can just GRANT pg_read_all_stats to an
1675 unprivileged user account. In all earlier versions, superuser
1676 privileges are required to see the queries of all users in the system;
1677 UNKNOWN is returned if queries cannot be checked. To only include
1678 queries by the connecting user, use --includeuser.
1679
1680 Example 1: Give a warning if any connection has been idle in
1681 transaction for more than 15 seconds:
1682
1683 check_postgres_txn_idle --port=5432 --warning='15 seconds'
1684
1685 Example 2: Give a warning if there are 50 or more transactions
1686
1687 check_postgres_txn_idle --port=5432 --warning='+50'
1688
1689 Example 3: Give a critical if 5 or more connections have been idle in
1690 transaction for more than 10 seconds:
1691
1692 check_postgres_txn_idle --port=5432 --critical='5 for 10 seconds'
1693
1694 For MRTG output, returns the time in seconds the longest idle
1695 transaction has been running. The fourth line returns the name of the
1696 database and other information about the longest transaction.
1697
1698 txn_time
1699 ("symlink: check_postgres_txn_time") Checks the length of open
1700 transactions on one or more databases. There is no need to run this
1701 command more than once per database cluster. Databases can be filtered
1702 by use of the --include and --exclude options. See the "BASIC
1703 FILTERING" section for more details. The owner of the transaction can
1704 also be filtered, by use of the --includeuser and --excludeuser
1705 options. See the "USER NAME FILTERING" section for more details.
1706
The values of the --warning and --critical options are units of time,
1708 and at least one must be provided (no default). Valid units are
1709 'seconds', 'minutes', 'hours', or 'days'. Each may be written singular
1710 or abbreviated to just the first letter. If no units are given, the
1711 units are assumed to be seconds.
1712
1713 This action requires Postgres 8.3 or better.
1714
1715 Example 1: Give a critical if any transaction has been open for more
1716 than 10 minutes:
1717
1718 check_postgres_txn_time --port=5432 --critical='10 minutes'
1719
Example 2: Warn if user 'warehouse' has a transaction open over 30
seconds:

check_postgres_txn_time --port=5432 --warning=30s --includeuser=warehouse
1724
1725 For MRTG output, returns the maximum time in seconds a transaction has
1726 been open on the first line. The fourth line gives the name of the
1727 database.
1728
1729 txn_wraparound
1730 ("symlink: check_postgres_txn_wraparound") Checks how close to
1731 transaction wraparound one or more databases are getting. The
1732 --warning and --critical options indicate the number of transactions
1733 done, and must be a positive integer. If either option is not given,
1734 the default values of 1.3 and 1.4 billion are used. There is no need to
1735 run this command more than once per database cluster. For a more
1736 detailed discussion of what this number represents and what to do about
1737 it, please visit the page
1738 <https://www.postgresql.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND>
1739
1740 The warning and critical values can have underscores in the number for
1741 legibility, as Perl does.
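
If the same threshold is reused elsewhere in a shell script, the
underscores have to be removed before the value can be used in
arithmetic. A minimal sketch, assuming a hypothetical helper named
normalize_count:

```shell
# Hypothetical helper (not part of check_postgres.pl): strip the
# legibility underscores from a number such as 1_700_000_000.
normalize_count() {
    printf '%s\n' "$1" | tr -d '_'
}
```

For example, "normalize_count 1_700_000_000" prints 1700000000.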
1742
1743 Example 1: Check the default values for the localhost database
1744
1745 check_postgres_txn_wraparound --host=localhost
1746
1747 Example 2: Check port 6000 and give a critical when 1.7 billion
1748 transactions are hit:
1749
1750 check_postgres_txn_wraparound --port=6000 --critical=1_700_000_000
1751
1752 For MRTG output, returns the highest number of transactions for all
1753 databases on line one, while line 4 indicates which database it is.
1754
1755 version
1756 ("symlink: check_postgres_version") Checks that the required version of
1757 Postgres is running. The --warning and --critical options (only one is
1758 required) must be of the format X.Y or X.Y.Z where X is the major
1759 version number, Y is the minor version number, and Z is the revision.
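
One way to picture the two formats is as a prefix match on whole
version components. The sketch below assumes a hypothetical helper
named version_matches and assumes that an X.Y requirement accepts any
revision Z; the script's own comparison may differ.

```shell
# Hypothetical helper (not part of check_postgres.pl).
# version_matches RUNNING REQUIRED: succeed if RUNNING equals REQUIRED
# or extends it by further dot-separated components.
version_matches() {
    case "$1" in
        "$2"|"$2".*) return 0 ;;
        *)           return 1 ;;
    esac
}
```

Under these assumptions a server running 8.3.7 satisfies a requirement
of 8.3, while a server running 8.4.10 does not satisfy 8.4.1.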
1760
1761 Example 1: Give a warning if the database on port 5678 is not version
1762 8.4.10:
1763
1764 check_postgres_version --port=5678 -w=8.4.10
1765
Example 2: Give a critical if any of the databases on hosts valley,
grain, or sunshine is not 8.3:
1768
1769 check_postgres_version -H valley,grain,sunshine --critical=8.3
1770
1771 For MRTG output, reports a 1 or a 0 indicating success or failure on
1772 the first line. The fourth line indicates the current version. The
1773 version must be provided via the "--mrtg" option.
1774
1775 wal_files
1776 ("symlink: check_postgres_wal_files") Checks how many WAL files exist
in the pg_xlog directory (PostgreSQL 10 and later: pg_wal), which is
1778 found off of your data_directory, sometimes as a symlink to another
1779 physical disk for performance reasons. If the --lsfunc option is not
1780 used then this action must be run as a superuser, in order to access
1781 the contents of the pg_xlog directory. The minimum version to use this
1782 action is Postgres 8.1. The --warning and --critical options are simply
1783 the number of files in the pg_xlog directory. What number to set this
1784 to will vary, but a general guideline is to put a number slightly
1785 higher than what is normally there, to catch problems early.
1786
1787 Normally, WAL files are closed and then re-used, but a long-running
1788 open transaction, or a faulty archive_command script, may cause
1789 Postgres to create too many files. Ultimately, this will cause the disk
1790 they are on to run out of space, at which point Postgres will shut
1791 down.
1792
1793 To avoid connecting as a database superuser, a wrapper function around
1794 "pg_ls_dir()" should be defined as a superuser with SECURITY DEFINER,
1795 and the --lsfunc option used. This example function, if defined by a
1796 superuser, will allow the script to connect as a normal user nagios
with --lsfunc=ls_xlog_dir:
1798
BEGIN;
CREATE FUNCTION ls_xlog_dir()
    RETURNS SETOF TEXT
    AS $$ SELECT pg_ls_dir('pg_xlog') $$
    LANGUAGE SQL
    SECURITY DEFINER;
REVOKE ALL ON FUNCTION ls_xlog_dir() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION ls_xlog_dir() TO nagios;
COMMIT;
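
To choose a sensible threshold, it helps to see how many WAL files are
normally present. The sketch below assumes a hypothetical helper named
count_wal_files that reads a directory listing on standard input and
counts names made of exactly 24 uppercase hexadecimal characters, the
usual form of a WAL segment name. Whether the script counts exactly
the same set of files is an assumption.

```shell
# Hypothetical helper (not part of check_postgres.pl): count the lines
# on standard input that look like WAL segment file names.
count_wal_files() {
    grep -Ec '^[0-9A-F]{24}$'
}

# Possible usage, assuming the ls_xlog_dir() wrapper above exists and
# the connecting user may execute it:
#   psql -X -At -c 'SELECT * FROM ls_xlog_dir()' | count_wal_files
```

Other entries in the directory, such as archive_status, are ignored by
the pattern.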
1808
Example 1: Check that the number of WAL files is 10 or less on host
"pluto", using a wrapper function "ls_xlog_dir" to avoid the need for
superuser permissions:

check_postgres_wal_files --host=pluto --critical=10 --lsfunc=ls_xlog_dir
1814
1815 For MRTG output, reports the number of WAL files on line 1.
1816
1817 rebuild_symlinks
1818 rebuild_symlinks_force
1819 This action requires no other arguments, and does not connect to any
1820 databases, but simply creates symlinks in the current directory for
1821 each action, in the form check_postgres_<action_name>. If the file
1822 already exists, it will not be overwritten. If the action is
1823 rebuild_symlinks_force, then symlinks will be overwritten. The option
--symlinks is a shorter way of saying --action=rebuild_symlinks.
1825
BASIC FILTERING
1827 The options --include and --exclude can be combined to limit which
1828 things are checked, depending on the action. The name of the database
1829 can be filtered when using the following actions: backends,
1830 database_size, locks, query_time, txn_idle, and txn_time. The name of
1831 a relation can be filtered when using the following actions: bloat,
1832 index_size, table_size, relation_size, last_vacuum, last_autovacuum,
1833 last_analyze, and last_autoanalyze. The name of a setting can be
1834 filtered when using the settings_checksum action. The name of a file
1835 system can be filtered when using the disk_space action.
1836
1837 If only an include option is given, then ONLY those entries that match
1838 will be checked. However, if given both exclude and include, the
1839 exclusion is done first, and the inclusion after, to reinstate things
1840 that may have been excluded. Both --include and --exclude can be given
1841 multiple times, and/or as comma-separated lists. A leading tilde will
1842 match the following word as a regular expression.
1843
1844 To match a schema, end the search term with a single period. Leading
1845 tildes can be used for schemas as well.
1846
1847 Be careful when using filtering: an inclusion rule on the backends, for
1848 example, may report no problems not only because the matching database
1849 had no backends, but because you misspelled the name of the database!
1850
1851 Examples:
1852
1853 Only checks items named pg_class:
1854
1855 --include=pg_class
1856
1857 Only checks items containing the letters 'pg_':
1858
1859 --include=~pg_
1860
1861 Only check items beginning with 'pg_':
1862
1863 --include=~^pg_
1864
1865 Exclude the item named 'test':
1866
1867 --exclude=test
1868
Exclude all items containing the letters 'test':
1870
1871 --exclude=~test
1872
1873 Exclude all items in the schema 'pg_catalog':
1874
1875 --exclude='pg_catalog.'
1876
1877 Exclude all items containing the letters 'ace', but allow the item
1878 'faceoff':
1879
1880 --exclude=~ace --include=faceoff
1881
1882 Exclude all items which start with the letters 'pg_', which contain the
1883 letters 'slon', or which are named 'sql_settings' or 'green'.
1884 Specifically check items with the letters 'prod' in their names, and
1885 always check the item named 'pg_relname':
1886
1887 --exclude=~^pg_,~slon,sql_settings --exclude=green --include=~prod,pg_relname
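
The precedence rules described in this section can be sketched in
shell. The helper names matches and should_check are hypothetical,
the lists are passed as comma-separated strings, and schema terms
ending in a period are not modeled. This illustrates the documented
behavior and is not the script's own code.

```shell
# Hypothetical helpers (not part of check_postgres.pl).

# matches NAME TERM: exact match, or regex match if TERM starts with '~'.
matches() {
    case "$2" in
        "~"*) printf '%s\n' "$1" | grep -Eq -- "${2#?}" ;;
        *)    [ "$1" = "$2" ] ;;
    esac
}

# should_check NAME EXCLUDES INCLUDES: succeed if NAME would be checked.
should_check() {
    local name="$1" excludes="$2" includes="$3" term
    local IFS=','
    # An include match always wins, reinstating anything excluded.
    for term in $includes; do
        if matches "$name" "$term"; then return 0; fi
    done
    for term in $excludes; do
        if matches "$name" "$term"; then return 1; fi
    done
    # With only includes given, anything that did not match is skipped.
    if [ -n "$includes" ] && [ -z "$excludes" ]; then
        return 1
    fi
    return 0
}
```

For instance, "should_check faceoff '~ace' faceoff" succeeds, while
"should_check palace '~ace' faceoff" fails.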
1888
USER NAME FILTERING
1890 The options --includeuser and --excludeuser can be used on some actions
1891 to only examine database objects owned by (or not owned by) one or more
1892 users. An --includeuser option always trumps an --excludeuser option.
1893 You can give each option more than once for multiple users, or you can
1894 give a comma-separated list. The actions that currently use these
1895 options are:
1896
1897 database_size
1898 last_analyze
1899 last_autoanalyze
1900 last_vacuum
1901 last_autovacuum
1902 query_time
1903 relation_size
1904 txn_time
1905
1906 Examples:
1907
1908 Only check items owned by the user named greg:
1909
1910 --includeuser=greg
1911
1912 Only check items owned by either watson or crick:
1913
1914 --includeuser=watson,crick
1915
Only check items owned by crick, franklin, watson, or wilkins:
1917
1918 --includeuser=watson --includeuser=franklin --includeuser=crick,wilkins
1919
1920 Check all items except for those belonging to the user scott:
1921
1922 --excludeuser=scott
1923
1925 To help in setting things up, this program can be run in a "test mode"
1926 by specifying the --test option. This will perform some basic tests to
1927 make sure that the databases can be contacted, and that certain per-
1928 action prerequisites are met, such as whether the user is a superuser,
1929 if the version of Postgres is new enough, and if stats_row_level is
1930 enabled.
1931
1933 In addition to command-line configurations, you can put any options
1934 inside of a file. The file .check_postgresrc in the current directory
1935 will be used if found. If not found, then the file ~/.check_postgresrc
1936 will be used. Finally, the file /etc/check_postgresrc will be used if
1937 available. The format of the file is option = value, one per line. Any
1938 line starting with a '#' will be skipped. Any values loaded from a
1939 check_postgresrc file will be overwritten by command-line options. All
1940 check_postgresrc files can be ignored by supplying a
1941 "--no-checkpostgresrc" argument.
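
To see which file would be picked up from a given directory, the
lookup order above can be reproduced in shell. A minimal sketch,
assuming a hypothetical helper named find_rc_file; it only mirrors the
order described here and does not read or validate the file.

```shell
# Hypothetical helper (not part of check_postgres.pl): print the first
# check_postgresrc file found, using the lookup order described above.
find_rc_file() {
    local candidate
    for candidate in \
        ./.check_postgresrc \
        "${HOME:+$HOME/.check_postgresrc}" \
        /etc/check_postgresrc
    do
        # An unset HOME yields an empty candidate, which never exists.
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

A file found this way is expected to hold one "option = value" pair per
line, for example a line reading "port = 5432", with lines starting
with '#' treated as comments.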
1942
1944 The environment variable $ENV{HOME} is used to look for a
1945 .check_postgresrc file. The environment variable $ENV{PGBINDIR} is
1946 used to look for PostgreSQL binaries.
1947
1949 Since this program uses the psql program, make sure it is accessible to
1950 the user running the script. If run as a cronjob, this often means
1951 modifying the PATH environment variable.
1952
1953 If you are using Nagios in embedded Perl mode, use the "--action"
1954 argument instead of symlinks, so that the plugin only gets compiled one
1955 time.
1956
1958 Access to a working version of psql, and the following very standard
1959 Perl modules:
1960
1961 Cwd
1962 Getopt::Long
1963 File::Basename
1964 File::Temp
1965 Time::HiRes (if $opt{showtime} is set to true, which is the default)
1966
1967 The "settings_checksum" action requires the Digest::MD5 module.
1968
1969 The "checkpoint" action requires the Date::Parse module.
1970
1971 Some actions require access to external programs. If psql is not
1972 explicitly specified, the command "which" is used to find it. The
1973 program "/bin/df" is needed by the "disk_space" action.
1974
1976 Development happens using the git system. You can clone the latest
1977 version by doing:
1978
1979 https://github.com/bucardo/check_postgres
1980 git clone https://github.com/bucardo/check_postgres.git
1981
1983 Three mailing lists are available. For discussions about the program,
1984 bug reports, feature requests, and commit notices, send email to
1985 check_postgres@bucardo.org
1986
1987 https://mail.endcrypt.com/mailman/listinfo/check_postgres
1988
1989 A low-volume list for announcement of new versions and important
1990 notices is the 'check_postgres-announce' list:
1991
1992 https://mail.endcrypt.com/mailman/listinfo/check_postgres-announce
1993
1994 Source code changes (via git-commit) are sent to the
1995 'check_postgres-commit' list:
1996
1997 https://mail.endcrypt.com/mailman/listinfo/check_postgres-commit
1998
2000 Items not specifically attributed are by GSM (Greg Sabino Mullane).
2001
2002 Version 2.24.0 Released May 30, 2018
2003 Support new_version_pg for PG10
2004 (Michael Pirogov)
2005
2006 Option to skip CYCLE sequences in action sequence
2007 (Christoph Moench-Tegeder)
2008
2009 Output per-database perfdata for pgbouncer pool checks
2010 (George Hansper)
2011
2012 German message translations
2013 (Holger Jacobs)
2014
2015 Consider only client backends in query_time and friends
2016 (David Christensen)
2017
2018 Version 2.23.0 Released October 31, 2017
2019 Support PostgreSQL 10.
2020 (David Christensen, Christoph Berg)
2021
2022 Change table_size to use pg_table_size() on 9.0+, i.e. include the TOAST
2023 table size in the numbers reported. Add new actions indexes_size and
2024 total_relation_size, using the respective pg_indexes_size() and
2025 pg_total_relation_size() functions. All size checks will now also check
2026 materialized views where applicable.
2027 (Christoph Berg)
2028
2029 Connection errors are now always critical, not unknown.
2030 (Christoph Berg)
2031
2032 New action replication_slots checking if logical or physical replication
2033 slots have accumulated too much data
2034 (Glyn Astill)
2035
2036 Multiple same_schema improvements
2037 (Glyn Astill)
2038
2039 Add Spanish message translations
2040 (Luis Vazquez)
2041
2042 Allow a wrapper function to run wal_files and archive_ready actions as
2043 non-superuser
2044 (Joshua Elsasser)
2045
2046 Add some defensive casting to the bloat query
2047 (Greg Sabino Mullane)
2048
2049 Invoke psql with option -X
2050 (Peter Eisentraut)
2051
2052 Update postgresql.org URLs to use https.
2053 (Magnus Hagander)
2054
2055 check_txn_idle: Don't fail when query contains 'disabled' word
2056 (Marco Nenciarini)
2057
2058 check_txn_idle: Use state_change instead of query_start.
2059 (Sebastian Webber)
2060
2061 check_hot_standby_delay: Correct extra space in perfdata
2062 (Adrien Nayrat)
2063
2064 Remove \r from psql output as it can confuse some regexes
2065 (Greg Sabino Mullane)
2066
2067 Sort failed jobs in check_pgagent_jobs for stable output.
2068 (Christoph Berg)
2069
2070 Version 2.22.0 June 30, 2015
2071 Add xact timestamp support to hot_standby_delay.
2072 Allow the hot_standby_delay check to accept xlog byte position or
2073 timestamp lag intervals as thresholds, or even both at the same time.
2074 (Josh Williams)
2075
2076 Query all sequences per DB in parallel for action=sequence.
2077 (Christoph Berg)
2078
2079 Fix bloat check to use correct SQL depending on the server version.
2080 (Adrian Vondendriesch)
2081
2082 Show actual long-running query in query_time output
2083 (Peter Eisentraut)
2084
2085 Add explicit ORDER BY to the slony_status check to get the most lagged server.
2086 (Jeff Frost)
2087
2088 Improved multi-slave support in replicate_row.
2089 (Andrew Yochum)
2090
2091 Change the way tables are quoted in replicate_row.
2092 (Glyn Astill)
2093
2094 Don't swallow space before the -c flag when reporting errors
2095 (Jeff Janes)
2096
2097 Fix and extend hot_standby_delay documentation
2098 (Michael Renner)
2099
2100 Declare POD encoding to be utf8.
2101 (Christoph Berg)
2102
2103 Version 2.21.0 September 24, 2013
2104 Fix issue with SQL steps in check_pgagent_jobs for sql steps which perform deletes
2105 (Rob Emery via github pull)
2106
2107 Install man page in section 1.
2108 (Peter Eisentraut, bug 53, github issue 26)
2109
2110 Order lock types in check_locks output to make the ordering predictable;
2111 setting SKIP_NETWORK_TESTS will skip the new_version tests; other minor test
2112 suite fixes.
2113 (Christoph Berg)
2114
2115 Fix same_schema check on 9.3 by ignoring relminmxid differences in pg_class
2116 (Christoph Berg)
2117
2118 Version 2.20.1 June 24, 2013
2119 Make connection check failures return CRITICAL not UNKNOWN
2120 (Dominic Hargreaves)
2121
2122 Fix --reverse option when using string comparisons in custom queries
2123 (Nathaniel Waisbrot)
2124
2125 Compute correct 'totalwastedbytes' in the bloat query
2126 (Michael Renner)
2127
2128 Do not use pg_stats "inherited" column in bloat query, if the
2129 database is 8.4 or older. (Greg Sabino Mullane, per bug 121)
2130
2131 Remove host reordering in hot_standby_delay check
2132 (Josh Williams, with help from Jacobo Blasco)
2133
2134 Better output for the "simple" flag
2135 (Greg Sabino Mullane)
2136
2137 Force same_schema to ignore the 'relallvisible' column
2138 (Greg Sabino Mullane)
2139
2140 Version 2.20.0 March 13, 2013
2141 Add check for pgagent jobs (David E. Wheeler)
2142
2143 Force STDOUT to use utf8 for proper output
2144 (Greg Sabino Mullane; reported by Emmanuel Lesouef)
2145
2146 Fixes for Postgres 9.2: new pg_stat_activity view,
2147 and use pg_tablespace_location (Josh Williams)
2148
2149 Allow for spaces in item lists when doing same_schema.
2150
2151 Allow txn_idle to work again for < 8.3 servers by switching to query_time.
2152
2153 Fix the check_bloat SQL to take inherited tables into account,
2154 and assume 2k for non-analyzed columns. (Geert Pante)
2155
2156 Cache sequence information to speed up same_schema runs.
2157
2158 Fix --excludeuser in check_txn_idle (Mika Eloranta)
2159
2160 Fix user clause handling in check_txn_idle (Michael van Bracht)
2161
2162 Adjust docs to show colon as a better separator inside args for locks
2163 (Charles Sprickman)
2164
2165 Fix undefined $SQL2 error in check_txn_idle [github issue 16] (Patric Bechtel)
2166
2167 Prevent "uninitialized value" warnings when showing the port (Henrik Ahlgren)
2168
2169 Do not assume everyone has a HOME [github issue 23]
2170
2171 Version 2.19.0 January 17, 2012
2172 Add the --assume-prod option (Cédric Villemain)
2173
2174 Add the cluster_id check (Cédric Villemain)
2175
2176 Improve settings_checksum and checkpoint tests (Cédric Villemain)
2177
2178 Do not do an inner join to pg_user when checking database size
2179 (Greg Sabino Mullane; reported by Emmanuel Lesouef)
2180
2181 Use the full path when getting sequence information for same_schema.
2182 (Greg Sabino Mullane; reported by Cindy Wise)
2183
2184 Fix the formula for calculating xlog positions (Euler Taveira de Oliveira)
2185
2186 Better ordering of output for bloat check - make indexes as important
2187 as tables (Greg Sabino Mullane; reported by Jens Wilke)
2188
2189 Show the dbservice if it was used at top of same_schema output
2190 (Mike Blackwell)
2191
2192 Better installation paths (Greg Sabino Mullane, per bug 53)
2193
2194 Version 2.18.0 October 2, 2011
2195 Redo the same_schema action. Use new --filter argument for all filtering.
2196 Allow comparisons between any number of databases.
2197 Remove the dbname2, dbport2, etc. arguments.
2198 Allow comparison of the same db over time.
2199
2200 Swap db1 and db2 if the slave is 1 for the hot standby check (David E. Wheeler)
2201
2202 Allow multiple --schema arguments for the slony_status action (GSM and Jehan-Guillaume de Rorthais)
2203
2204 Fix ORDER BY in the last vacuum/analyze action (Nicolas Thauvin)
2205
2206 Fix check_hot_standby_delay perfdata output (Nicolas Thauvin)
2207
2208 Look in the correct place for the .ready files with the archive_ready action (Nicolas Thauvin)
2209
2210 New action: commitratio (Guillaume Lelarge)
2211
2212 New action: hitratio (Guillaume Lelarge)
2213
2214 Make sure --action overrides the symlink naming trick.
2215
2216 Set defaults for archive_ready and wal_files (Thomas Guettler, GSM)
2217
2218 Better output for wal_files and archive_ready (GSM)
2219
2220 Fix warning when client_port set to empty string (bug #79)
2221
2222 Account for "empty row" in -x output (i.e. source of functions).
2223
2224 Fix some incorrectly named data fields (Andy Lester)
2225
2226 Expand the number of pgbouncer actions (Ruslan Kabalin)
2227
2228 Give detailed information and refactor txn_idle, txn_time, and query_time
2229 (Per request from bug #61)
2230
2231 Set maxalign to 8 in the bloat check if box identified as '64-bit'
2232 (Michel Sijmons, bug #66)
2233
2234 Support non-standard version strings in the bloat check.
2235 (Michel Sijmons and Gurjeet Singh, bug #66)
2236
2237 Do not show excluded databases in some output (Ruslan Kabalin)
2238
2239 Allow "and", "or" inside arguments (David E. Wheeler)
2240
2241 Add the "new_version_box" action.
2242
2243 Fix psql version regex (Peter Eisentraut, bug #69)
2244
2245 Add the --assume-standby-mode option (Ruslan Kabalin)
2246
2247 Note that txn_idle and query_time require 8.3 (Thomas Guettler)
2248
2249 Standardize and clean up all perfdata output (bug #52)
2250
2251 Exclude "idle in transaction" from the query_time check (bug #43)
2252
2253 Fix the perflimit for the bloat action (bug #50)
2254
2255 Clean up the custom_query action a bit.
2256
2257 Fix space in perfdata for hot_standby_delay action (Nicolas Thauvin)
2258
2259 Handle undef percents in check_fsm_relations (Andy Lester)
2260
2261 Fix typo in dbstats action (Stas Vitkovsky)
2262
2263 Fix MRTG output for the last_vacuum and last_analyze actions.
2264
2265 Version 2.17.0 no public release
2266 Version 2.16.0 January 20, 2011
2267 Add new action 'hot_standby_delay' (Nicolas Thauvin)
2268 Add cache-busting for the version-grabbing utilities.
2269 Fix problem with going to next method for new_version_pg
2270 (Greg Sabino Mullane, reported by Hywel Mallett in bug #65)
2271 Allow /usr/local/etc as an alternative location for the
2272 check_postgresrc file (Hywel Mallett)
2273 Do not use tgisconstraint in same_schema if Postgres >= 9
2274 (Guillaume Lelarge)
2275
2276 Version 2.15.4 January 3, 2011
2277 Fix warning when using symlinks
2278 (Greg Sabino Mullane, reported by Peter Eisentraut in bug #63)
2279
2280 Version 2.15.3 December 30, 2010
2281 Show OK for no matching txn_idle entries.
2282
2283 Version 2.15.2 December 28, 2010
2284 Better formatting of sizes in the bloat action output.
2285
2286 Remove duplicate perfs in bloat action output.
2287
2288 Version 2.15.1 December 27, 2010
2289 Fix problem when examining items in pg_settings (Greg Sabino Mullane)
2290
2291 For connection test, return critical, not unknown, on FATAL errors
2292 (Greg Sabino Mullane, reported by Peter Eisentraut in bug #62)
2293
2294 Version 2.15.0 November 8, 2010
2295 Add --quiet argument to suppress output on OK Nagios results
2296 Add index comparison for same_schema (Norman Yamada and Greg Sabino Mullane)
2297 Use $ENV{PGSERVICE} instead of "service=" to prevent problems (Guillaume Lelarge)
2298 Add --man option to show the entire manual. (Andy Lester)
2299 Redo the internal run_command() sub to use -x and hashes instead of regexes.
2300 Fix error in custom logic (Andreas Mager)
2301 Add the "pgbouncer_checksum" action (Guillaume Lelarge)
2302 Fix regex to work on WIN32 for check_fsm_relations and check_fsm_pages (Luke Koops)
2303 Don't apply a LIMIT when using --exclude on the bloat action (Marti Raudsepp)
2304 Change the output of query_time to show pid,user,port, and address (Giles Westwood)
2305 Fix to show database properly when using slony_status (Guillaume Lelarge)
2306 Allow warning items for same_schema to be comma-separated (Guillaume Lelarge)
2307 Constraint definitions across Postgres versions match better in same_schema.
2308 Work against "EnterpriseDB" databases (Sivakumar Krishnamurthy and Greg Sabino Mullane)
2309 Separate perfdata with spaces (Jehan-Guillaume (ioguix) de Rorthais)
2310 Add new action "archive_ready" (Jehan-Guillaume (ioguix) de Rorthais)
2311
2312 Version 2.14.3 (March 1, 2010)
2313 Allow slony_status action to handle more than one slave.
2314 Use commas to separate function args in same_schema output (Robert Treat)
2315
2316 Version 2.14.2 (February 18, 2010)
2317 Change autovac_freeze default warn/critical back to 90%/95% (Robert Treat)
2318 Put all items one-per-line for relation size actions if --verbose=1
2319
2320 Version 2.14.1 (February 17, 2010)
2321 Don't use $^T in logfile check, as script may be long-running
2322 Change the error string for the logfile action for easier exclusion
2323 by programs like tail_n_mail
2324
2325 Version 2.14.0 (February 11, 2010)
2326 Added the 'slony_status' action.
2327 Changed the logfile sleep from 0.5 to 1, as 0.5 gets rounded to 0 on some boxes!
2328
2329 Version 2.13.2 (February 4, 2010)
2330 Allow timeout option to be used for logfile 'sleep' time.
2331
2332 Version 2.13.2 (February 4, 2010)
2333 Show offending database for query_time action.
2334 Apply perflimit to main output for sequence action.
2335 Add 'noowner' option to same_schema action.
2336 Raise sleep timeout for logfile check to 15 seconds.
2337
2338 Version 2.13.1 (February 2, 2010)
2339 Fix bug preventing column constraint differences from 2 > 1 for same_schema from being shown.
2340 Allow aliases 'dbname1', 'dbhost1', 'dbport1',etc.
2341 Added "nolanguage" as a filter for the same_schema option.
2342 Don't track "generic" table constraints (e.g. $1, $2) using same_schema
2343
2344 Version 2.13.0 (January 29, 2010)
2345 Allow "nofunctions" as a filter for the same_schema option.
2346 Added "noperm" as a filter for the same_schema option.
2347 Ignore dropped columns when considering positions for same_schema (Guillaume Lelarge)
2348
2349 Version 2.12.1 (December 3, 2009)
2350 Change autovac_freeze default warn/critical from 90%/95% to 105%/120% (Marti Raudsepp)
2351
2352 Version 2.12.0 (December 3, 2009)
2353 Allow the temporary directory to be specified via the "tempdir" argument,
2354 for systems that need it (e.g. /tmp is not owned by root).
2355 Fix so old versions of Postgres (< 8.0) use the correct default database (Giles Westwood)
2356 For "same_schema" trigger mismatches, show the attached table.
2357 Add the new_version_bc check for Bucardo version checking.
2358 Add database name to perf output for last_vacuum|analyze (Guillaume Lelarge)
2359 Fix for bloat action against old versions of Postgres without the 'block_size' param.
2360
2361 Version 2.11.1 (August 27, 2009)
2362 Proper Nagios output for last_vacuum|analyze actions. (Cédric Villemain)
2363 Proper Nagios output for locks action. (Cédric Villemain)
2364 Proper Nagios output for txn_wraparound action. (Cédric Villemain)
2365 Fix for constraints with embedded newlines for same_schema.
2366 Allow --exclude for all items when using same_schema.
2367
2368 Version 2.11.0 (August 23, 2009)
2369 Add Nagios perf output to the wal_files check (Cédric Villemain)
2370 Add support for .check_postgresrc, per request from Albe Laurenz.
2371 Allow list of web fetch methods to be changed with the --get_method option.
2372 Add support for the --language argument, which overrides any ENV.
2373 Add the --no-check_postgresrc flag.
2374 Ensure check_postgresrc options are completely overridden by command-line options.
2375 Fix incorrect warning > critical logic in replicate_row (Glyn Astill)
2376
2377 Version 2.10.0 (August 3, 2009)
2378 For same_schema, compare view definitions, and compare languages.
2379 Make script into a global executable via the Makefile.PL file.
2380 Better output when comparing two databases.
2381 Proper Nagios output syntax for autovac_freeze and backends checks (Cédric Villemain)
2382
2383 Version 2.9.5 (July 24, 2009)
2384 Don't use a LIMIT in check_bloat if --include is used. Per complaint from Jeff Frost.
2385
2386 Version 2.9.4 (July 21, 2009)
2387 More French translations (Guillaume Lelarge)
2388
2389 Version 2.9.3 (July 14, 2009)
2390 Quote dbname in perf output for the backends check. (Davide Abrigo)
2391 Add 'fetch' as an alternative method for new_version checks, as this
2392 comes by default with FreeBSD. (Hywel Mallett)
2393
2394 Version 2.9.2 (July 12, 2009)
2395 Allow dots and dashes in database name for the backends check (Davide Abrigo)
2396 Check and display the database for each match in the bloat check (Cédric Villemain)
2397 Handle 'too many connections' FATAL error in the backends check with a critical,
2398 rather than a generic error (Greg, idea by Jürgen Schulz-Brüssel)
2399 Do not allow perflimit to interfere with exclusion rules in the vacuum and
2400 analyze tests. (Greg, bug reported by Jeff Frost)
2401
2402 Version 2.9.1 (June 12, 2009)
2403 Fix for multiple databases with the check_bloat action (Mark Kirkwood)
2404 Fixes and improvements to the same_schema action (Jeff Boes)
2405 Write tests for same_schema, other minor test fixes (Jeff Boes)
2406
2407 Version 2.9.0 (May 28, 2009)
2408 Added the same_schema action (Greg)
2409
2410 Version 2.8.1 (May 15, 2009)
2411 Added timeout via statement_timeout in addition to perl alarm (Greg)
2412
2413 Version 2.8.0 (May 4, 2009)
2414 Added internationalization support (Greg)
2415 Added the 'disabled_triggers' check (Greg)
2416 Added the 'prepared_txns' check (Greg)
2417 Added the 'new_version_cp' and 'new_version_pg' checks (Greg)
2418 French translations (Guillaume Lelarge)
2419 Make the backends search return ok if no matches due to inclusion rules,
2420 per report by Guillaume Lelarge (Greg)
2421 Added comprehensive unit tests (Greg, Jeff Boes, Selena Deckelmann)
2422 Make fsm_pages and fsm_relations handle 8.4 servers smoothly. (Greg)
2423 Fix missing 'upd' field in show_dbstats (Andras Fabian)
2424 Allow ENV{PGCONTROLDATA} and ENV{PGBINDIR}. (Greg)
2425 Add various Perl module infrastructure (e.g. Makefile.PL) (Greg)
2426 Fix incorrect regex in txn_wraparound (Greg)
2427 For txn_wraparound: consistent ordering and fix duplicates in perf output (Andras Fabian)
2428 Add in missing exabyte regex check (Selena Deckelmann)
2429 Set stats to zero if we bail early due to USERWHERECLAUSE (Andras Fabian)
2430 Add additional items to dbstats output (Andras Fabian)
2431 Remove --schema option from the fsm_ checks. (Greg Mullane and Robert Treat)
2432 Handle case when ENV{PGUSER} is set. (Andy Lester)
2433 Many various fixes. (Jeff Boes)
2434 Fix --dbservice: check version and use ENV{PGSERVICE} for old versions (Cédric Villemain)
2435
2436 Version 2.7.3 (February 10, 2009)
2437 Make the sequence action check if the sequence is being used for an int4 column and
2438 react appropriately. (Michael Glaesemann)
2439
2440 Version 2.7.2 (February 9, 2009)
2441 Fix to prevent multiple groupings if db arguments given.
2442
2443 Version 2.7.1 (February 6, 2009)
2444 Allow the -p argument for port to work again.
2445
2446 Version 2.7.0 (February 4, 2009)
2447 Do not require a connection argument, but use defaults and ENV variables when
2448 possible: PGHOST, PGPORT, PGUSER, PGDATABASE.
2449
2450 Version 2.6.1 (February 4, 2009)
2451 Only require Date::Parse to be loaded if using the checkpoint action.
2452
2453 Version 2.6.0 (January 26, 2009)
2454 Add the 'checkpoint' action.
2455
2456 Version 2.5.4 (January 7, 2009)
2457 Better checking of $opt{dbservice} structure (Cédric Villemain)
2458 Fix time display in timesync action output (Selena Deckelmann)
2459 Fix documentation typos (Josh Tolley)
2460
2461 Version 2.5.3 (December 17, 2008)
2462 Minor fix to regex in verify_version (Lee Jensen)
2463
2464 Version 2.5.2 (December 16, 2008)
2465 Minor documentation tweak.
2466
2467 Version 2.5.1 (December 11, 2008)
2468 Add support for --noidle flag to prevent backends action from counting idle processes.
2469 Patch by Selena Deckelmann.
2470
2471 Fix small undefined warning when not using --dbservice.
2472
2473 Version 2.5.0 (December 4, 2008)
2474 Add support for the pg_service.conf file with the --dbservice option.
2475
2476 Version 2.4.3 (November 7, 2008)
2477 Fix options for replicate_row action, per report from Jason Gordon.
2478
2479 Version 2.4.2 (November 6, 2008)
2480 Wrap File::Temp::cleanup() calls in eval, in case File::Temp is an older version.
2481 Patch by Chris Butler.
2482
2483 Version 2.4.1 (November 5, 2008)
2484 Cast numbers to numeric to support sequence ranges > bigint in check_sequence action.
2485 Thanks to Scott Marlowe for reporting this.
2486
2487 Version 2.4.0 (October 26, 2008)
2488 Add Cacti support with the dbstats action.
2489 Pretty up the time output for last vacuum and analyze actions.
2490 Show the percentage of backends on the check_backends action.
2491
2492 Version 2.3.10 (October 23, 2008)
2493 Fix minor warning in action check_bloat with multiple databases.
2494 Allow warning to be greater than critical when using the --reverse option.
2495 Support the --perflimit option for the check_sequence action.
2496
2497 Version 2.3.9 (October 23, 2008)
2498 Minor tweak to the way we store the default port.
2499
2500 Version 2.3.8 (October 21, 2008)
2501 Allow the default port to be changed easily.
2502 Allow transform of simple output by MB, GB, etc.
2503
2504 Version 2.3.7 (October 14, 2008)
2505 Allow multiple databases in 'sequence' action. Reported by Christoph Zwerschke.
2506
2507 Version 2.3.6 (October 13, 2008)
2508 Add missing $schema to check_fsm_pages. (Robert Treat)
2509
2510 Version 2.3.5 (October 9, 2008)
2511 Change option 'checktype' to 'valtype' to prevent collisions with -c[ritical]
2512 Better handling of errors.
2513
2514 Version 2.3.4 (October 9, 2008)
2515 Do explicit cleanups of the temp directory, per problems reported by sb@nnx.com.
2516
2517 Version 2.3.3 (October 8, 2008)
2518 Account for cases where some rounding queries give -0 instead of 0.
2519 Thanks to Glyn Astill for helping to track this down.
2520
2521 Version 2.3.2 (October 8, 2008)
2522 Always quote identifiers in check_replicate_row action.
2523
2524 Version 2.3.1 (October 7, 2008)
2525 Give a better error if one of the databases cannot be reached.
2526
2527 Version 2.3.0 (October 4, 2008)
2528 Add the "sequence" action, thanks to Gavin M. Roy for the idea.
2529 Fix minor problem with autovac_freeze action when using MRTG output.
2530 Allow output argument to be case-insensitive.
2531 Documentation fixes.
2532
2533 Version 2.2.4 (October 3, 2008)
2534 Fix some minor typos
2535
2536 Version 2.2.3 (October 1, 2008)
2537 Expand range of allowed names for --repinfo argument (Glyn Astill)
2538 Documentation tweaks.
2539
2540 Version 2.2.2 (September 30, 2008)
2541 Fixes for minor output and scoping problems.
2542
2543 Version 2.2.1 (September 28, 2008)
2544 Add MRTG output to fsm_pages and fsm_relations.
2545 Force error messages to one-line for proper Nagios output.
2546 Check for invalid prereqs on failed command. From conversations with Euler Taveira de Oliveira.
2547 Tweak the fsm_pages formula a little.
2548
2549 Version 2.2.0 (September 25, 2008)
2550 Add fsm_pages and fsm_relations actions. (Robert Treat)
2551
2552 Version 2.1.4 (September 22, 2008)
2553 Fix for race condition in txn_time action.
2554 Add --debugoutput option.
2555
2556 Version 2.1.3 (September 22, 2008)
2557 Allow alternate arguments "dbhost" for "host" and "dbport" for "port".
2558 Output a zero as default value for second line of MRTG output.
2559
2560 Version 2.1.2 (July 28, 2008)
2561 Fix sorting error in the "disk_space" action for non-Nagios output.
2562 Allow --simple as a shortcut for --output=simple.
2563
2564 Version 2.1.1 (July 22, 2008)
2565 Don't check databases with datallowconn false for the "autovac_freeze" action.
2566
2567 Version 2.1.0 (July 18, 2008)
2568 Add the "autovac_freeze" action, thanks to Robert Treat for the idea and design.
2569 Put an ORDER BY on the "txn_wraparound" action.
2570
2571 Version 2.0.1 (July 16, 2008)
2572 Optimizations to speed up the "bloat" action quite a bit.
2573 Fix "version" action to not always output in mrtg mode.
2574
2575 Version 2.0.0 (July 15, 2008)
2576 Add support for MRTG and "simple" output options.
2577 Many small improvements to nearly all actions.
2578
2579 Version 1.9.1 (June 24, 2008)
2580 Fix an error in the bloat SQL in 1.9.0
2581 Allow percentage arguments to be over 99%
2582 Allow percentages in the bloat --warning and --critical (thanks to Robert Treat for the idea)
2583
2584 Version 1.9.0 (June 22, 2008)
2585 Don't include information_schema in certain checks. (Jeff Frost)
2586 Allow --include and --exclude to use schemas by using a trailing period.
2587
2588 Version 1.8.5 (June 22, 2008)
2589 Output schema name before table name where appropriate.
2590 Thanks to Jeff Frost.
2591
2592 Version 1.8.4 (June 19, 2008)
2593 Better detection of problems in --replicate_row.
2594
2595 Version 1.8.3 (June 18, 2008)
2596 Fix 'backends' action: there may be no rows in pg_stat_activity, so run a second
2597 query if needed to find the max_connections setting.
2598 Thanks to Jeff Frost for the bug report.
2599
2600 Version 1.8.2 (June 10, 2008)
2601 Changes to allow working under Nagios' embedded Perl mode. (Ioannis Tambouras)
2602
2603 Version 1.8.1 (June 9, 2008)
2604 Allow 'bloat' action to work on Postgres version 8.0.
2605 Allow for different commands to be run for each action depending on the server version.
2606 Give better warnings when running actions not available on older Postgres servers.
2607
2608 Version 1.8.0 (June 3, 2008)
2609 Add the --reverse option to the custom_query action.
2610
2611 Version 1.7.1 (June 2, 2008)
2612 Fix 'query_time' action: account for race condition in which zero rows appear in pg_stat_activity.
2613 Thanks to Dustin Black for the bug report.
2614
2615 Version 1.7.0 (May 11, 2008)
2616 Add --replicate_row action
2617
2618 Version 1.6.1 (May 11, 2008)
2619 Add --symlinks option as a shortcut to --action=rebuild_symlinks
2620
2621 Version 1.6.0 (May 11, 2008)
2622 Add the custom_query action.
2623
2624 Version 1.5.2 (May 2, 2008)
2625 Fix problem with too eager creation of custom pgpass file.
2626
2627 Version 1.5.1 (April 17, 2008)
2628 Add example Nagios configuration settings (Brian A. Seklecki)
2629
2630 Version 1.5.0 (April 16, 2008)
2631 Add the --includeuser and --excludeuser options. Documentation cleanup.
2632
2633 Version 1.4.3 (April 16, 2008)
2634 Add in the 'output' concept for future support of non-Nagios programs.
2635
2636 Version 1.4.2 (April 8, 2008)
2637 Fix bug preventing --dbpass argument from working (Robert Treat).
2638
2639 Version 1.4.1 (April 4, 2008)
2640 Minor documentation fixes.
2641
2642 Version 1.4.0 (April 2, 2008)
2643 Have 'wal_files' action use pg_ls_dir (idea by Robert Treat).
2644 For last_vacuum and last_analyze, respect autovacuum effects, add separate
2645 autovacuum checks (ideas by Robert Treat).
2646
2647 Version 1.3.1 (April 2, 2008)
2648 Have txn_idle use query_start, not xact_start.
2649
2650 Version 1.3.0 (March 23, 2008)
2651 Add in txn_idle and txn_time actions.
2652
2653 Version 1.2.0 (February 21, 2008)
2654 Add the 'wal_files' action, which counts the number of WAL files
2655 in your pg_xlog directory.
2656 Fix some typos in the docs.
2657 Explicitly allow -v as an argument.
2658 Allow for a null syslog_facility in the 'logfile' action.
2659
2660 Version 1.1.2 (February 5, 2008)
2661 Fix error preventing --action=rebuild_symlinks from working.
2662
2663 Version 1.1.1 (February 3, 2008)
2664 Switch vacuum and analyze date output to use 'DD', not 'D'. (Glyn Astill)
2665
2666 Version 1.1.0 (December 16, 2007)
2667 Fixes, enhancements, and performance tracking.
2668 Add performance data tracking via --showperf and --perflimit
2669 Lots of refactoring and cleanup of how actions handle arguments.
2670 Do basic checks to figure out syslog file for 'logfile' action.
2671 Allow for exact matching of beta versions with 'version' action.
2672 Redo the default arguments to only populate when neither 'warning' nor 'critical' is provided.
2673 Allow just warning OR critical to be given for the 'timesync' action.
2674 Remove 'redirect_stderr' requirement from 'logfile' due to 8.3 changes.
2675 Actions 'last_vacuum' and 'last_analyze' are 8.2 only (Robert Treat)
2676
2677 Version 1.0.16 (December 7, 2007)
2678 First public release, December 2007
2679
2681 The index bloat size optimization is rough.
2682
2683 Some actions may not work on older versions of Postgres (before 8.0).
2684
2685 Please report any problems to check_postgres@bucardo.org
2686
2688 Greg Sabino Mullane <greg@endpoint.com>
2689
2691 Some example Nagios configuration settings using this script:
2692
2693 define command {
2694 command_name check_postgres_size
2695 command_line $USER2$/check_postgres.pl -H $HOSTADDRESS$ -u pgsql -db postgres --action database_size -w $ARG1$ -c $ARG2$
2696 }
2697
2698 define command {
2699 command_name check_postgres_locks
2700 command_line $USER2$/check_postgres.pl -H $HOSTADDRESS$ -u pgsql -db postgres --action locks -w $ARG1$ -c $ARG2$
2701 }
2702
2703
2704 define service {
2705 use generic-other
2706 host_name dbhost.gtld
2707 service_description dbhost PostgreSQL Service Database Usage Size
2708 check_command check_postgres_size!256000000!512000000
2709 }
2710
2711 define service {
2712 use generic-other
2713 host_name dbhost.gtld
2714 service_description dbhost PostgreSQL Service Database Locks
2715 check_command check_postgres_locks!2!3
2716 }
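
To see what the definitions above amount to at run time, the following shell sketch imitates the macro substitution Nagios performs on command_line. It is illustrative only: expand_check is a hypothetical helper (not part of Nagios or check_postgres), and the plugin directory standing in for $USER2$ and the address standing in for $HOSTADDRESS$ are assumed values.

```shell
#!/bin/sh
# Sketch only: imitate how Nagios fills in the command_line templates above.
# expand_check is a hypothetical helper written for this illustration.
expand_check() {
    plugin_dir=$1      # stands in for $USER2$ (assumed path)
    host_address=$2    # stands in for $HOSTADDRESS$ (assumed address)
    action=$3          # the check_postgres action named in command_line
    check_command=$4   # e.g. check_postgres_size!256000000!512000000

    # Split "name!arg1!arg2" on "!"; fields 2 and 3 become $ARG1$ and $ARG2$.
    old_ifs=$IFS
    IFS='!'
    set -- $check_command
    IFS=$old_ifs
    arg1=$2
    arg2=$3

    printf '%s/check_postgres.pl -H %s -u pgsql -db postgres --action %s -w %s -c %s\n' \
        "$plugin_dir" "$host_address" "$action" "$arg1" "$arg2"
}

# Print the command lines the two service definitions would produce.
expand_check /usr/local/nagios/libexec 192.0.2.10 database_size \
    'check_postgres_size!256000000!512000000'
expand_check /usr/local/nagios/libexec 192.0.2.10 locks \
    'check_postgres_locks!2!3'
```

Running a printed command by hand as the Nagios user is a convenient way to debug a service: by the standard Nagios plugin convention, an exit status of 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN.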
2717
2719 Copyright (c) 2007-2017 Greg Sabino Mullane <greg@endpoint.com>.
2720
2721 Redistribution and use in source and binary forms, with or without
2722 modification, are permitted provided that the following conditions are
2723 met:
2724
2725 1. Redistributions of source code must retain the above copyright notice,
2726 this list of conditions and the following disclaimer.
2727 2. Redistributions in binary form must reproduce the above copyright notice,
2728 this list of conditions and the following disclaimer in the documentation
2729 and/or other materials provided with the distribution.
2730
2731 THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR
2732 IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
2733 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
2734 DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
2735 INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
2736 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
2737 SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
2738 HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
2739 STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
2740 IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
2741 POSSIBILITY OF SUCH DAMAGE.
2742
2743
2744
2745perl v5.28.1 2018-05-30 CHECK_POSTGRES(1)