PG_AUTO_FAILOVER(1)              pg_auto_failover              PG_AUTO_FAILOVER(1)
2
3
4
6 pg_auto_failover - pg_auto_failover Documentation
7
8 The pg_auto_failover project is an Open Source Software project. The
9 development happens at https://github.com/citusdata/pg_auto_failover
10 and is public: everyone is welcome to participate by opening issues or
11 pull requests, giving feedback, etc.
12
13 Remember that the first steps are to actually play with the pg_autoctl
14 command, then read the entire available documentation (after all, I
15 took the time to write it), and then to address the community in a kind
16 and polite way — the same way you would expect people to use when ad‐
17 dressing you.
18
19 NOTE:
The development of pg_auto_failover has been driven by Citus Data,
which is now a team at Microsoft. The Citus Data team at Microsoft
22 generously maintains the pg_auto_failover Open Source Software so
23 that its users may continue using it in production.
24
25 For enhancements, improvements, and new features, consider con‐
26 tributing to the project. Pull Requests are reviewed as part of the
27 offered maintenance.
28
29 NOTE:
30 Assistance is provided as usual with Open Source projects, on a vol‐
31 untary basis. If you need help to cook a patch, enhance the documen‐
32 tation, or even to use the software, you're welcome to ask questions
33 and expect some level of free guidance.
34
pg_auto_failover is an extension for PostgreSQL that monitors and manages
failover for Postgres clusters. It is optimized for simplicity and
correctness.
39
40 Single Standby Architecture
[figure] pg_auto_failover architecture with a primary and a standby node.
44
45 pg_auto_failover implements Business Continuity for your PostgreSQL
46 services. pg_auto_failover implements a single PostgreSQL service us‐
47 ing multiple nodes with automated failover, and automates PostgreSQL
48 maintenance operations in a way that guarantees availability of the
49 service to its users and applications.
50
51 To that end, pg_auto_failover uses three nodes (machines, servers)
52 per PostgreSQL service:
53
54 • a PostgreSQL primary node,
55
56 • a PostgreSQL secondary node, using Synchronous Hot Standby,
57
58 • a pg_auto_failover Monitor node that acts both as a witness and an
59 orchestrator.
60
61 The pg_auto_failover Monitor implements a state machine and relies on
in-core PostgreSQL facilities to deliver HA. For example, when the
secondary node is detected to be unavailable, or when its lag is reported
above a defined threshold (the default is one WAL file, or 16MB, see the
65 pgautofailover.promote_wal_log_threshold GUC on the pg_auto_failover
66 monitor), then the Monitor removes it from the synchro‐
67 nous_standby_names setting on the primary node. Until the secondary is
68 back to being monitored healthy, failover and switchover operations are
69 not allowed, preventing data loss.
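
As a minimal illustration (assuming you can connect with psql to the
monitor's pg_auto_failover database), the threshold is a regular Postgres
setting that can be inspected like any other GUC:

   $ psql -d pg_auto_failover -c 'SHOW pgautofailover.promote_wal_log_threshold'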
70
71 Multiple Standby Architecture
[figure] pg_auto_failover architecture with a primary and two standby nodes.
75
76 In the pictured architecture, pg_auto_failover implements Business
77 Continuity and data availability by implementing a single PostgreSQL
service using multiple nodes with automated failover and data redundancy.
79 Even after losing any Postgres node in a production system, this ar‐
80 chitecture maintains two copies of the data on two different nodes.
81
82 When using more than one standby, different architectures can be
83 achieved with pg_auto_failover, depending on the objectives and
84 trade-offs needed for your production setup.
85
86 Multiple Standbys Architecture with 3 standby nodes, one async
[figure] pg_auto_failover architecture with a primary and three standby nodes.
90
When combining the replication settings number_sync_standbys,
replication-quorum, and candidate-priority, it's possible to design very
different Postgres architectures for your production needs.
93
In this case, the system is set up with two standby nodes participat‐
95 ing in the replication quorum, allowing for number_sync_standbys = 1.
96 The system always maintains a minimum of two copies of the data set:
one on the primary, another one on either node B or node C.
98 Whenever we lose one of those nodes, we can hold to this guarantee of
99 two copies of the data set.
100
101 Adding to that, we have the standby server D which has been set up to
102 not participate in the replication quorum. Node D will not be found
103 in the synchronous_standby_names list of nodes. Also, node D is set
104 up in a way to never be a candidate for failover, with candidate-pri‐
105 ority = 0.
106
107 This architecture would fit a situation where nodes A, B, and C are
108 deployed in the same data center or availability zone, and node D in
109 another. Those three nodes are set up to support the main production
110 traffic and implement high availability of both the Postgres service
111 and the data set.
112
Node D might be set up for Business Continuity in case the first data
center is lost, or maybe for reporting needs on another application
domain.
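
As a sketch of how such a node could be registered (the data directory,
node name, and monitor URI below are examples to adapt), both replication
properties can be given at creation time:

   $ pg_autoctl create postgres \
        --pgdata /var/lib/postgres/pgaf \
        --name node-d \
        --monitor postgresql://autoctl_node@monitor/pg_auto_failover \
        --replication-quorum false \
        --candidate-priority 0 \
        --run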
116
117 Citus Architecture
[figure] pg_auto_failover architecture with a Citus formation.
120
121 pg_auto_failover implements Business Continuity for your Citus ser‐
122 vices. pg_auto_failover implements a single Citus formation service
123 using multiple Citus nodes with automated failover, and automates
124 PostgreSQL maintenance operations in a way that guarantees availabil‐
125 ity of the service to its users and applications.
126
127 In that case, pg_auto_failover knows how to orchestrate a Citus coor‐
128 dinator failover and a Citus worker failover. A Citus worker failover
129 can be achieved with a very minimal downtime to the application,
130 where during a short time window SQL writes may error out.
131
132 In this figure we see a single standby node for each Citus node, co‐
133 ordinator and workers. It is possible to implement more standby
134 nodes, and even read-only nodes for load balancing, see Citus Secon‐
135 daries and read-replica.
136
138 pg_auto_failover includes the command line tool pg_autoctl that imple‐
139 ments many commands to manage your Postgres nodes. To implement the
Postgres architectures described in this documentation, and more, it is
usually enough to use only a small subset of the many pg_autoctl commands.
142
143 This section of the documentation is a short introduction to the main
144 commands that are useful when getting started with pg_auto_failover.
145 More commands are available and help deal with a variety of situations,
146 see the Manual Pages for the whole list.
147
148 To understand which replication settings to use in your case, see
149 Architecture Basics section and then the Multi-node Architectures sec‐
150 tion.
151
152 To follow a step by step guide that you can reproduce on your own Azure
153 subscription and create a production Postgres setup from VMs, see the
154 pg_auto_failover Tutorial section.
155
156 To understand how to setup pg_auto_failover in a way that is compliant
with your internal security guidelines, read the Security settings for
158 pg_auto_failover section.
159
160 Command line environment, configuration files, etc
161 As a command line tool pg_autoctl depends on some environment vari‐
162 ables. Mostly, the tool re-uses the Postgres environment variables
163 that you might already know.
164
165 To manage a Postgres node pg_auto_failover needs to know its data di‐
166 rectory location on-disk. For that, some users will find it easier to
167 export the PGDATA variable in their environment. The alternative con‐
168 sists of always using the --pgdata option that is available to all the
169 pg_autoctl commands.
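
For instance, the two following invocations are equivalent (the data
directory path is only an example):

   # rely on the environment
   $ export PGDATA=/var/lib/postgres/pgaf
   $ pg_autoctl show state

   # or pass the option explicitly on each invocation
   $ pg_autoctl show state --pgdata /var/lib/postgres/pgaf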
170
171 Creating Postgres Nodes
172 To get started with the simplest Postgres failover setup, 3 nodes are
173 needed: the pg_auto_failover monitor, and 2 Postgres nodes that will
174 get assigned roles by the monitor. One Postgres node will be assigned
175 the primary role, the other one will get assigned the secondary role.
176
177 To create the monitor use the command:
178
179 $ pg_autoctl create monitor
180
To create the Postgres nodes, use the following command on each node
182 you want to create:
183
184 $ pg_autoctl create postgres
185
While those create commands initialize your nodes, you still have to
actually run the Postgres services that are expected to be running. For
188 that you can manually run the following command on every node:
189
190 $ pg_autoctl run
191
192 It is also possible (and recommended) to integrate the pg_auto_failover
193 service in your usual service management facility. When using systemd
194 the following commands can be used to produce the unit file configura‐
195 tion required:
196
197 $ pg_autoctl show systemd
198 INFO HINT: to complete a systemd integration, run the following commands:
199 INFO pg_autoctl -q show systemd --pgdata "/tmp/pgaf/m" | sudo tee /etc/systemd/system/pgautofailover.service
200 INFO sudo systemctl daemon-reload
201 INFO sudo systemctl enable pgautofailover
202 INFO sudo systemctl start pgautofailover
203 [Unit]
204 ...
205
206 While it is expected that for a production deployment each node actu‐
207 ally is a separate machine (virtual or physical, or even a container),
208 it is also possible to run several Postgres nodes all on the same ma‐
209 chine for testing or development purposes.
210
211 TIP:
212 When running several pg_autoctl nodes on the same machine for test‐
213 ing or contributing to pg_auto_failover, each Postgres instance
214 needs to run on its own port, and with its own data directory. It
215 can make things easier to then set the environment variables PGDATA
216 and PGPORT in each terminal, shell, or tab where each instance is
217 started.
218
219 Inspecting nodes
220 Once your Postgres nodes have been created, and once each pg_autoctl
221 service is running, it is possible to inspect the current state of the
222 formation with the following command:
223
224 $ pg_autoctl show state
225
The pg_autoctl show state command outputs the current state of the
system only once. Sometimes it is nice to have an auto-updated display,
such as the one provided by common tools like watch(1) or top(1). For
that, the following commands are available (see also
230 pg_autoctl watch):
231
232 $ pg_autoctl watch
233 $ pg_autoctl show state --watch
234
235 To analyze what's been happening to get to the current state, it is
236 possible to review the past events generated by the pg_auto_failover
237 monitor with the following command:
238
239 $ pg_autoctl show events
240
241 HINT:
242 The pg_autoctl show commands can be run from any node in your sys‐
tem. Those commands need to connect to the monitor and print the
current state or the currently known list of events as per the
monitor's view of the system.
246
247 Use pg_autoctl show state --local to have a view of the local state
248 of a given node without connecting to the monitor Postgres instance.
249
250 The option --json is available in most pg_autoctl commands and
251 switches the output format from a human readable table form to a
252 program friendly JSON pretty-printed output.
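
For example, the same state can be printed as JSON, or read from the
local node only (the --pgdata path is an example):

   $ pg_autoctl show state --json
   $ pg_autoctl show state --local --pgdata /var/lib/postgres/pgaf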
253
254 Inspecting and Editing Replication Settings
255 When creating a node it is possible to use the --candidate-priority and
256 the --replication-quorum options to set the replication properties as
257 required by your choice of Postgres architecture.
258
259 To review the current replication settings of a formation, use one of
260 the two following commands, which are convenient aliases (the same com‐
261 mand with two ways to invoke it):
262
263 $ pg_autoctl show settings
264 $ pg_autoctl get formation settings
265
266 It is also possible to edit those replication settings at any time
267 while your nodes are in production: you can change your mind or adjust
268 to new elements without having to re-deploy everything. Just use the
269 following commands to adjust the replication settings on the fly:
270
271 $ pg_autoctl set formation number-sync-standbys
272 $ pg_autoctl set node replication-quorum
273 $ pg_autoctl set node candidate-priority
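
Here is a sketch of such adjustments; the data directory and the values
are examples only, the set node commands are run on the node whose
settings should change, and the exact option spelling may vary between
pg_auto_failover versions:

   $ pg_autoctl set node candidate-priority --pgdata /var/lib/postgres/pgaf 0
   $ pg_autoctl set node replication-quorum --pgdata /var/lib/postgres/pgaf false
   $ pg_autoctl set formation number-sync-standbys --pgdata /var/lib/postgres/pgaf 1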
274
275 IMPORTANT:
276 The pg_autoctl get and pg_autoctl set commands always connect to the
277 monitor Postgres instance.
278
279 The pg_autoctl set command then changes the replication settings on
280 the node registration on the monitor. Then the monitor assigns the
281 APPLY_SETTINGS state to the current primary node in the system for
282 it to apply the new replication settings to its Postgres streaming
283 replication setup.
284
As a result, the pg_autoctl set commands require a stable state in
286 the system to be allowed to proceed. Namely, the current primary
287 node in the system must have both its Current State and its Assigned
288 State set to primary, as per the pg_autoctl show state output.
289
290 Implementing Maintenance Operations
291 When a Postgres node must be taken offline for a maintenance operation,
such as a kernel security upgrade or a minor Postgres update, it
293 is best to make it so that the pg_auto_failover monitor knows about it.
294
295 • For one thing, a node that is known to be in maintenance does not
296 participate in failovers. If you are running with two Postgres
297 nodes, then failover operations are entirely prevented while the
298 standby node is in maintenance.
299
300 • Moreover, depending on your replication settings, enabling mainte‐
301 nance on your standby ensures that the primary node switches to
302 async replication before Postgres is shut down on the secondary,
so that write queries are not blocked.
304
305 To implement maintenance operations, use the following commands:
306
307 $ pg_autoctl enable maintenance
308 $ pg_autoctl disable maintenance
309
310 The main pg_autoctl run service that is expected to be running in the
311 background should continue to run during the whole maintenance opera‐
312 tion. When a node is in the maintenance state, the pg_autoctl service
313 is not controlling the Postgres service anymore.
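
A typical maintenance window on a standby node then looks like the
following sketch (the data directory path is an example):

   # flag the node as being in maintenance; it is no longer a failover candidate
   $ pg_autoctl enable maintenance --pgdata /var/lib/postgres/pgaf

   # ... apply the kernel security upgrade or minor Postgres update, reboot, etc ...

   # hand control of the Postgres service back to pg_auto_failover
   $ pg_autoctl disable maintenance --pgdata /var/lib/postgres/pgaf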
314
315 Note that it is possible to enable maintenance on a primary Postgres
316 node, and that operation then requires a failover to happen first. It
317 is possible to have pg_auto_failover orchestrate that for you when us‐
318 ing the command:
319
320 $ pg_autoctl enable maintenance --allow-failover
321
322 IMPORTANT:
The pg_autoctl enable and pg_autoctl disable commands require a
324 stable state in the system to be allowed to proceed. Namely, the
325 current primary node in the system must have both its Current State
326 and its Assigned State set to primary, as per the pg_autoctl show
327 state output.
328
329 Manual failover, switchover, and promotions
In cases where a failover is needed without an actual node
331 failure, the pg_auto_failover monitor can be used to orchestrate the
332 operation. Use one of the following commands, which are synonyms in the
333 pg_auto_failover design:
334
335 $ pg_autoctl perform failover
336 $ pg_autoctl perform switchover
337
338 Finally, it is also possible to “elect” a new primary node in your for‐
339 mation with the command:
340
341 $ pg_autoctl perform promotion
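
For example, to promote a specific node by name (a sketch only; the node
name and --pgdata path are hypothetical, and option support may depend on
your pg_auto_failover version):

   $ pg_autoctl perform promotion --name node-b --pgdata /var/lib/postgres/pgaf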
342
343 IMPORTANT:
The pg_autoctl perform commands require a stable state in the sys‐
345 tem to be allowed to proceed. Namely, the current primary node in
346 the system must have both its Current State and its Assigned State
347 set to primary, as per the pg_autoctl show state output.
348
349 What's next?
350 This section of the documentation is meant to help users get started by
351 focusing on the main commands of the pg_autoctl tool. Each command has
many options, some with very small impact and others with a big impact in
353 terms of security or architecture. Read the rest of the manual to un‐
354 derstand how to best use the many pg_autoctl options to implement your
355 specific Postgres production architecture.
356
358 In this guide we’ll create a Postgres setup with two nodes, a primary
359 and a standby. Then we'll add a second standby node. We’ll simulate
360 failure in the Postgres nodes and see how the system continues to func‐
361 tion.
362
363 This tutorial uses docker-compose in order to separate the architecture
design from some of the implementation details. This allows reasoning
at the architecture level within this tutorial, and makes it easier to
see which software component needs to be deployed and run on which node.
367
368 The setup provided in this tutorial is good for replaying at home in
369 the lab. It is not intended to be production ready though. In particu‐
lar, no attention has been paid to volume management. After all, this
371 is a tutorial: the goal is to walk through the first steps of using
372 pg_auto_failover to implement Postgres automated failover.
373
374 Pre-requisites
375 When using docker-compose we describe a list of services, each service
376 may run on one or more nodes, and each service just runs a single iso‐
377 lated process in a container.
378
379 Within the context of a tutorial, or even a development environment,
380 this matches very well to provisioning separate physical machines
on-prem, or Virtual Machines either on-prem or in a Cloud service.
382
383 The docker image used in this tutorial is named pg_auto_failover:tuto‐
384 rial. It can be built locally when using the attached Dockerfile found
385 within the GitHub repository for pg_auto_failover.
386
387 To build the image, either use the provided Makefile and run make
388 build, or run the docker build command directly:
389
390 $ git clone https://github.com/citusdata/pg_auto_failover
391 $ cd pg_auto_failover/docs/tutorial
392
393 $ docker build -t pg_auto_failover:tutorial -f Dockerfile ../..
394 $ docker-compose build
395
396 Postgres failover with two nodes
397 Using docker-compose makes it easy enough to create an architecture
398 that looks like the following diagram:
[figure] pg_auto_failover architecture with a primary and a standby node.
402
403 Such an architecture provides failover capabilities, though it does
not provide High Availability of both the Postgres service and
405 the data. See the Multi-node Architectures chapter of our docs to
406 understand more about this.
407
408 To create a cluster we use the following docker-compose definition:
409
410 version: "3.9" # optional since v1.27.0
411
412 services:
413
414 app:
415 build:
416 context: .
417 dockerfile: Dockerfile.app
418 environment:
419 PGUSER: tutorial
420 PGDATABASE: tutorial
421 PGHOST: node1,node2
422 PGPORT: 5432
423 PGAPPNAME: tutorial
424 PGSSLMODE: require
425 PGTARGETSESSIONATTRS: read-write
426
427 monitor:
428 image: pg_auto_failover:tutorial
429 volumes:
430 - ./monitor:/var/lib/postgres
431 environment:
432 PGDATA: /var/lib/postgres/pgaf
433 PG_AUTOCTL_SSL_SELF_SIGNED: true
434 expose:
435 - 5432
436 command: |
437 pg_autoctl create monitor --auth trust --run
438
439 node1:
440 image: pg_auto_failover:tutorial
441 hostname: node1
442 volumes:
443 - ./node1:/var/lib/postgres
444 environment:
445 PGDATA: /var/lib/postgres/pgaf
446 PGUSER: tutorial
447 PGDATABASE: tutorial
448 PG_AUTOCTL_HBA_LAN: true
449 PG_AUTOCTL_AUTH_METHOD: "trust"
450 PG_AUTOCTL_SSL_SELF_SIGNED: true
451 PG_AUTOCTL_MONITOR: "postgresql://autoctl_node@monitor/pg_auto_failover"
452 expose:
453 - 5432
454 command: |
455 pg_autoctl create postgres --name node1 --pg-hba-lan --run
456
457 node2:
458 image: pg_auto_failover:tutorial
459 hostname: node2
460 volumes:
461 - ./node2:/var/lib/postgres
462 environment:
463 PGDATA: /var/lib/postgres/pgaf
464 PGUSER: tutorial
465 PGDATABASE: tutorial
466 PG_AUTOCTL_HBA_LAN: true
467 PG_AUTOCTL_AUTH_METHOD: "trust"
468 PG_AUTOCTL_SSL_SELF_SIGNED: true
469 PG_AUTOCTL_MONITOR: "postgresql://autoctl_node@monitor/pg_auto_failover"
470 expose:
471 - 5432
472 command: |
473 pg_autoctl create postgres --name node2 --pg-hba-lan --run
474
475
To run the full Postgres cluster with HA from this definition, we can use
477 the following command:
478
479 $ docker-compose up
480
The command above starts the services up. The first service is the
monitor, created with the command pg_autoctl create monitor. The Postgres
nodes are then created with the command pg_autoctl create postgres. The
options for those commands are exposed in the environment, and could
have been specified on the command line too, as in this node example:
485
486 $ pg_autoctl create postgres --ssl-self-signed --auth trust --pg-hba-lan --run
487
While the Postgres nodes are being provisioned by docker-compose, you
489 can run the following command and have a dynamic dashboard to follow
490 what's happening. The following command is like top for
491 pg_auto_failover:
492
493 $ docker-compose exec monitor pg_autoctl watch
494
495 After a little while, you can run the pg_autoctl show state command and
496 see a stable result:
497
498 $ docker-compose exec monitor pg_autoctl show state
499
500 Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
501 ------+-------+------------+----------------+--------------+---------------------+--------------------
502 node2 | 1 | node2:5432 | 1: 0/3000148 | read-write | primary | primary
503 node1 | 2 | node1:5432 | 1: 0/3000148 | read-only | secondary | secondary
504
505 We can review the available Postgres URIs with the pg_autoctl show uri
506 command:
507
508 $ docker-compose exec monitor pg_autoctl show uri
509 Type | Name | Connection String
510 -------------+---------+-------------------------------
511 monitor | monitor | postgres://autoctl_node@58053a02af03:5432/pg_auto_failover?sslmode=require
512 formation | default | postgres://node2:5432,node1:5432/tutorial?target_session_attrs=read-write&sslmode=require
513
514 Add application data
515 Let's create a database schema with a single table, and some data in
516 there.
517
518 $ docker-compose exec app psql
519
520 -- in psql
521
522 CREATE TABLE companies
523 (
524 id bigserial PRIMARY KEY,
525 name text NOT NULL,
526 image_url text,
527 created_at timestamp without time zone NOT NULL,
528 updated_at timestamp without time zone NOT NULL
529 );
530
531 Next download and ingest some sample data, still from within our psql
532 session:
533
534 \copy companies from program 'curl -o- https://examples.citusdata.com/mt_ref_arch/companies.csv' with csv
535 ( COPY 75 )
536
537 Our first failover
538 When using pg_auto_failover, it is possible (and easy) to trigger a
539 failover without having to orchestrate an incident, or power down the
540 current primary.
541
542 $ docker-compose exec monitor pg_autoctl perform switchover
543 14:57:16 992 INFO Waiting 60 secs for a notification with state "primary" in formation "default" and group 0
544 14:57:16 992 INFO Listening monitor notifications about state changes in formation "default" and group 0
545 14:57:16 992 INFO Following table displays times when notifications are received
546 Time | Name | Node | Host:Port | Current State | Assigned State
547 ---------+-------+-------+------------+---------------------+--------------------
548 14:57:16 | node2 | 1 | node2:5432 | primary | draining
549 14:57:16 | node1 | 2 | node1:5432 | secondary | prepare_promotion
550 14:57:16 | node1 | 2 | node1:5432 | prepare_promotion | prepare_promotion
551 14:57:16 | node1 | 2 | node1:5432 | prepare_promotion | stop_replication
552 14:57:16 | node2 | 1 | node2:5432 | primary | demote_timeout
553 14:57:17 | node2 | 1 | node2:5432 | draining | demote_timeout
554 14:57:17 | node2 | 1 | node2:5432 | demote_timeout | demote_timeout
555 14:57:19 | node1 | 2 | node1:5432 | stop_replication | stop_replication
556 14:57:19 | node1 | 2 | node1:5432 | stop_replication | wait_primary
557 14:57:19 | node2 | 1 | node2:5432 | demote_timeout | demoted
558 14:57:19 | node2 | 1 | node2:5432 | demoted | demoted
559 14:57:19 | node1 | 2 | node1:5432 | wait_primary | wait_primary
560 14:57:19 | node2 | 1 | node2:5432 | demoted | catchingup
561 14:57:26 | node2 | 1 | node2:5432 | demoted | catchingup
562 14:57:38 | node2 | 1 | node2:5432 | demoted | catchingup
563 14:57:39 | node2 | 1 | node2:5432 | catchingup | catchingup
564 14:57:39 | node2 | 1 | node2:5432 | catchingup | secondary
565 14:57:39 | node2 | 1 | node2:5432 | secondary | secondary
566 14:57:40 | node1 | 2 | node1:5432 | wait_primary | primary
567 14:57:40 | node1 | 2 | node1:5432 | primary | primary
568
569 The new state after the failover looks like the following:
570
571 $ docker-compose exec monitor pg_autoctl show state
572 Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
573 ------+-------+------------+----------------+--------------+---------------------+--------------------
574 node2 | 1 | node2:5432 | 2: 0/5002698 | read-only | secondary | secondary
575 node1 | 2 | node1:5432 | 2: 0/5002698 | read-write | primary | primary
576
577 And we can verify that we still have the data available:
578
579 docker-compose exec app psql -c "select count(*) from companies"
580 count
581 -------
582 75
583 (1 row)
584
585 Next steps
586 As mentioned in the first section of this tutorial, the way we use
docker-compose here is not meant to be production ready. It's useful for
understanding and playing with a distributed system such as a multi-node
Postgres setup and its failovers.
590
591 See the command pg_autoctl do tmux compose session for more details
about how to run a docker-compose test environment,
593 including external volumes for each node.
594
See also the complete Azure VMs Tutorial for a guide on how to provision
an Azure network and then Azure VMs with pg_auto_failover, includ‐
597 ing systemd coverage and a failover triggered by stopping a full VM.
598
600 In this guide we’ll create a primary and secondary Postgres node and
601 set up pg_auto_failover to replicate data between them. We’ll simulate
602 failure in the primary node and see how the system smoothly switches
603 (fails over) to the secondary.
604
605 For illustration, we'll run our databases on virtual machines in the
606 Azure platform, but the techniques here are relevant to any cloud
607 provider or on-premise network. We'll use four virtual machines: a pri‐
608 mary database, a secondary database, a monitor, and an "application."
609 The monitor watches the other nodes’ health, manages global state, and
610 assigns nodes their roles.
611
612 Create virtual network
613 Our database machines need to talk to each other and to the monitor
614 node, so let's create a virtual network.
615
616 az group create \
617 --name ha-demo \
618 --location eastus
619
620 az network vnet create \
621 --resource-group ha-demo \
622 --name ha-demo-net \
623 --address-prefix 10.0.0.0/16
624
625 We need to open ports 5432 (Postgres) and 22 (SSH) between the ma‐
626 chines, and also give ourselves access from our remote IP. We'll do
627 this with a network security group and a subnet.
628
629 az network nsg create \
630 --resource-group ha-demo \
631 --name ha-demo-nsg
632
633 az network nsg rule create \
634 --resource-group ha-demo \
635 --nsg-name ha-demo-nsg \
636 --name ha-demo-ssh-and-pg \
637 --access allow \
638 --protocol Tcp \
639 --direction Inbound \
640 --priority 100 \
641 --source-address-prefixes `curl ifconfig.me` 10.0.1.0/24 \
642 --source-port-range "*" \
643 --destination-address-prefix "*" \
644 --destination-port-ranges 22 5432
645
646 az network vnet subnet create \
647 --resource-group ha-demo \
648 --vnet-name ha-demo-net \
649 --name ha-demo-subnet \
650 --address-prefixes 10.0.1.0/24 \
651 --network-security-group ha-demo-nsg
652
653 Finally add four virtual machines (ha-demo-a, ha-demo-b, ha-demo-moni‐
654 tor, and ha-demo-app). For speed we background the az vm create pro‐
655 cesses and run them in parallel:
656
657 # create VMs in parallel
658 for node in monitor a b app
659 do
660 az vm create \
661 --resource-group ha-demo \
662 --name ha-demo-${node} \
663 --vnet-name ha-demo-net \
664 --subnet ha-demo-subnet \
665 --nsg ha-demo-nsg \
666 --public-ip-address ha-demo-${node}-ip \
667 --image debian \
668 --admin-username ha-admin \
669 --generate-ssh-keys &
670 done
671 wait
672
673 To make it easier to SSH into these VMs in future steps, let's make a
674 shell function to retrieve their IP addresses:
675
676 # run this in your local shell as well
677
678 vm_ip () {
679 az vm list-ip-addresses -g ha-demo -n ha-demo-$1 -o tsv \
680 --query '[] [] .virtualMachine.network.publicIpAddresses[0].ipAddress'
681 }
682
683 # for convenience with ssh
684
685 for node in monitor a b app
686 do
687 ssh-keyscan -H `vm_ip $node` >> ~/.ssh/known_hosts
688 done
689
690 Let's review what we created so far.
691
692 az resource list --output table --query \
693 "[?resourceGroup=='ha-demo'].{ name: name, flavor: kind, resourceType: type, region: location }"
694
695 This shows the following resources:
696
697 Name ResourceType Region
698 ------------------------------- ----------------------------------------------------- --------
699 ha-demo-a Microsoft.Compute/virtualMachines eastus
700 ha-demo-app Microsoft.Compute/virtualMachines eastus
701 ha-demo-b Microsoft.Compute/virtualMachines eastus
702 ha-demo-monitor Microsoft.Compute/virtualMachines eastus
703 ha-demo-appVMNic Microsoft.Network/networkInterfaces eastus
704 ha-demo-aVMNic Microsoft.Network/networkInterfaces eastus
705 ha-demo-bVMNic Microsoft.Network/networkInterfaces eastus
706 ha-demo-monitorVMNic Microsoft.Network/networkInterfaces eastus
707 ha-demo-nsg Microsoft.Network/networkSecurityGroups eastus
708 ha-demo-a-ip Microsoft.Network/publicIPAddresses eastus
709 ha-demo-app-ip Microsoft.Network/publicIPAddresses eastus
710 ha-demo-b-ip Microsoft.Network/publicIPAddresses eastus
711 ha-demo-monitor-ip Microsoft.Network/publicIPAddresses eastus
712 ha-demo-net Microsoft.Network/virtualNetworks eastus
713
714 Install the "pg_autoctl" executable
715 This guide uses Debian Linux, but similar steps will work on other dis‐
716 tributions. All that differs are the packages and paths. See Installing
717 pg_auto_failover.
718
719 The pg_auto_failover system is distributed as a single pg_autoctl bi‐
720 nary with subcommands to initialize and manage a replicated PostgreSQL
721 service. We’ll install the binary with the operating system package
722 manager on all nodes. It will help us run and observe PostgreSQL.
723
724 for node in monitor a b app
725 do
726 az vm run-command invoke \
727 --resource-group ha-demo \
728 --name ha-demo-${node} \
729 --command-id RunShellScript \
730 --scripts \
731 "sudo touch /home/ha-admin/.hushlogin" \
732 "curl https://install.citusdata.com/community/deb.sh | sudo bash" \
733 "sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y postgresql-common" \
734 "echo 'create_main_cluster = false' | sudo tee -a /etc/postgresql-common/createcluster.conf" \
735 "sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y postgresql-11-auto-failover-1.4" \
736 "sudo usermod -a -G postgres ha-admin" &
737 done
738 wait
739
740 Run a monitor
741 The pg_auto_failover monitor is the first component to run. It periodi‐
742 cally attempts to contact the other nodes and watches their health. It
743 also maintains global state that “keepers” on each node consult to de‐
744 termine their own roles in the system.
745
746 # on the monitor virtual machine
747
748 ssh -l ha-admin `vm_ip monitor` -- \
749 pg_autoctl create monitor \
750 --auth trust \
751 --ssl-self-signed \
752 --pgdata monitor \
753 --pgctl /usr/lib/postgresql/11/bin/pg_ctl
754
This command initializes a PostgreSQL cluster at the location pointed
to by the --pgdata option. When --pgdata is omitted, pg_autoctl attempts
to use the PGDATA environment variable. If a PostgreSQL instance already
existed in the destination directory, this command would have
configured it to serve as a monitor.
760
The command also creates a database named pg_auto_failover, installs the
pgautofailover Postgres extension, and grants access to a new
autoctl_node user.
763
764 In the Quick Start we use --auth trust to avoid complex security set‐
765 tings. The Postgres trust authentication method is not considered a
766 reasonable choice for production environments. Consider either using
767 the --skip-pg-hba option or --auth scram-sha-256 and then setting up
768 passwords yourself.
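
For instance, a monitor that relies on password authentication instead
could be created along the following lines; this is only a sketch, and
you then have to set up the passwords yourself:

   ssh -l ha-admin `vm_ip monitor` -- \
     pg_autoctl create monitor \
       --auth scram-sha-256 \
       --ssl-self-signed \
       --pgdata monitor \
       --pgctl /usr/lib/postgresql/11/bin/pg_ctl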
769
770 At this point the monitor is created. Now we'll install it as a service
771 with systemd so that it will resume if the VM restarts.
772
773 ssh -T -l ha-admin `vm_ip monitor` << CMD
774 pg_autoctl -q show systemd --pgdata ~ha-admin/monitor > pgautofailover.service
775 sudo mv pgautofailover.service /etc/systemd/system
776 sudo systemctl daemon-reload
777 sudo systemctl enable pgautofailover
778 sudo systemctl start pgautofailover
779 CMD
780
781 Bring up the nodes
782 We’ll create the primary database using the pg_autoctl create subcom‐
783 mand.
784
785 ssh -l ha-admin `vm_ip a` -- \
786 pg_autoctl create postgres \
787 --pgdata ha \
788 --auth trust \
789 --ssl-self-signed \
790 --username ha-admin \
791 --dbname appdb \
792 --hostname ha-demo-a.internal.cloudapp.net \
793 --pgctl /usr/lib/postgresql/11/bin/pg_ctl \
794 --monitor 'postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net/pg_auto_failover?sslmode=require'
795
796 Notice the user and database name in the monitor connection string --
797 these are what monitor init created. We also give it the path to pg_ctl
798 so that the keeper will use the correct version of pg_ctl in future
799 even if other versions of postgres are installed on the system.
800
801 In the example above, the keeper creates a primary database. It chooses
802 to set up node A as primary because the monitor reports there are no
803 other nodes in the system yet. This is one example of how the keeper is
804 state-based: it makes observations and then adjusts its state, in this
805 case from "init" to "single."
806
807 Also add a setting to trust connections from our "application" VM:
808
809 ssh -T -l ha-admin `vm_ip a` << CMD
810 echo 'hostssl "appdb" "ha-admin" ha-demo-app.internal.cloudapp.net trust' \
811 >> ~ha-admin/ha/pg_hba.conf
812 CMD
813
814 At this point the monitor and primary node are created and running.
815 Next we need to run the keeper. It’s an independent process so that it
can continue operating even if the PostgreSQL process terminates
817 on the node. We'll install it as a service with systemd so that it will
818 resume if the VM restarts.
819
820 ssh -T -l ha-admin `vm_ip a` << CMD
821 pg_autoctl -q show systemd --pgdata ~ha-admin/ha > pgautofailover.service
822 sudo mv pgautofailover.service /etc/systemd/system
823 sudo systemctl daemon-reload
824 sudo systemctl enable pgautofailover
825 sudo systemctl start pgautofailover
826 CMD
827
828 Next connect to node B and do the same process. We'll do both steps at
829 once:
830
831 ssh -l ha-admin `vm_ip b` -- \
832 pg_autoctl create postgres \
833 --pgdata ha \
834 --auth trust \
835 --ssl-self-signed \
836 --username ha-admin \
837 --dbname appdb \
838 --hostname ha-demo-b.internal.cloudapp.net \
839 --pgctl /usr/lib/postgresql/11/bin/pg_ctl \
840 --monitor 'postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net/pg_auto_failover?sslmode=require'
841
842 ssh -T -l ha-admin `vm_ip b` << CMD
843 pg_autoctl -q show systemd --pgdata ~ha-admin/ha > pgautofailover.service
844 sudo mv pgautofailover.service /etc/systemd/system
845 sudo systemctl daemon-reload
846 sudo systemctl enable pgautofailover
847 sudo systemctl start pgautofailover
848 CMD
849
Node B discovers from the monitor that a primary exists, and then switches
851 its own state to be a hot standby and begins streaming WAL contents
852 from the primary.
853
854 Node communication
855 For convenience, pg_autoctl modifies each node's pg_hba.conf file to
856 allow the nodes to connect to one another. For instance, pg_autoctl
857 added the following lines to node A:
858
859 # automatically added to node A
860
861 hostssl "appdb" "ha-admin" ha-demo-a.internal.cloudapp.net trust
862 hostssl replication "pgautofailover_replicator" ha-demo-b.internal.cloudapp.net trust
863 hostssl "appdb" "pgautofailover_replicator" ha-demo-b.internal.cloudapp.net trust
864
865 For pg_hba.conf on the monitor node pg_autoctl inspects the local net‐
866 work and makes its best guess about the subnet to allow. In our case it
867 guessed correctly:
868
869 # automatically added to the monitor
870
871 hostssl "pg_auto_failover" "autoctl_node" 10.0.1.0/24 trust
872
873 If worker nodes have more ad-hoc addresses and are not in the same sub‐
874 net, it's better to disable pg_autoctl's automatic modification of
875 pg_hba using the --skip-pg-hba command line option during creation. You
876 will then need to edit the hba file by hand. Another reason for manual
877 edits would be to use special authentication methods.
878
879 Watch the replication
880 First let’s verify that the monitor knows about our nodes, and see what
881 states it has assigned them:
882
883 ssh -l ha-admin `vm_ip monitor` pg_autoctl show state --pgdata monitor
884
885 Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
886 -------+-------+--------------------------------------+-----------+-----------+---------------------+--------------------
887 node_1 | 1 | ha-demo-a.internal.cloudapp.net:5432 | 0/3000060 | yes | primary | primary
888 node_2 | 2 | ha-demo-b.internal.cloudapp.net:5432 | 0/3000060 | yes | secondary | secondary
889
890 This looks good. We can add data to the primary, and later see it ap‐
891 pear in the secondary. We'll connect to the database from inside our
892 "app" virtual machine, using a connection string obtained from the mon‐
893 itor.
894
895 ssh -l ha-admin `vm_ip monitor` pg_autoctl show uri --pgdata monitor
896
897 Type | Name | Connection String
898 -----------+---------+-------------------------------
899 monitor | monitor | postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net:5432/pg_auto_failover?sslmode=require
900 formation | default | postgres://ha-demo-b.internal.cloudapp.net:5432,ha-demo-a.internal.cloudapp.net:5432/appdb?target_session_attrs=read-write&sslmode=require
901
902 Now we'll get the connection string and store it in a local environment
903 variable:
904
905 APP_DB_URI=$( \
906 ssh -l ha-admin `vm_ip monitor` \
907 pg_autoctl show uri --formation default --pgdata monitor \
908 )
909
910 The connection string contains both our nodes, comma separated, and in‐
911 cludes the url parameter ?target_session_attrs=read-write telling psql
912 that we want to connect to whichever of these servers supports reads
913 and writes. That will be the primary server.
914
915 # connect to database via psql on the app vm and
916 # create a table with a million rows
917 ssh -l ha-admin -t `vm_ip app` -- \
918 psql "'$APP_DB_URI'" \
919 -c "'CREATE TABLE foo AS SELECT generate_series(1,1000000) bar;'"
920
921 Cause a failover
922 Now that we've added data to node A, let's switch which is considered
923 the primary and which the secondary. After the switch we'll connect
924 again and query the data, this time from node B.
925
926 # initiate failover to node B
927 ssh -l ha-admin -t `vm_ip monitor` \
928 pg_autoctl perform switchover --pgdata monitor
929
930 Once node B is marked "primary" (or "wait_primary") we can connect and
931 verify that the data is still present:
932
933 # connect to database via psql on the app vm
934 ssh -l ha-admin -t `vm_ip app` -- \
935 psql "'$APP_DB_URI'" \
936 -c "'SELECT count(*) FROM foo;'"
937
938 It shows
939
940 count
941 ---------
942 1000000
943
944 Cause a node failure
This plot is too boring; time to introduce a problem. We’ll turn off the VM
946 for node B (currently the primary after our previous failover) and
947 watch node A get promoted.
948
949 In one terminal let’s keep an eye on events:
950
951 ssh -t -l ha-admin `vm_ip monitor` -- \
952 watch -n 1 -d pg_autoctl show state --pgdata monitor
953
954 In another terminal we’ll turn off the virtual server.
955
956 az vm stop \
957 --resource-group ha-demo \
958 --name ha-demo-b
959
960 After a number of failed attempts to talk to node B, the monitor deter‐
961 mines the node is unhealthy and puts it into the "demoted" state. The
962 monitor promotes node A to be the new primary.
963
964 Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
965 -------+-------+--------------------------------------+-----------+-----------+---------------------+--------------------
966 node_1 | 1 | ha-demo-a.internal.cloudapp.net:5432 | 0/6D4E068 | yes | wait_primary | wait_primary
967 node_2 | 2 | ha-demo-b.internal.cloudapp.net:5432 | 0/6D4E000 | yes | demoted | catchingup
968
969 Node A cannot be considered in full "primary" state since there is no
970 secondary present, but it can still serve client requests. It is marked
971 as "wait_primary" until a secondary appears, to indicate that it's run‐
972 ning without a backup.
973
974 Let's add some data while B is offline.
975
976 # notice how $APP_DB_URI continues to work no matter which node
977 # is serving as primary
978 ssh -l ha-admin -t `vm_ip app` -- \
979 psql "'$APP_DB_URI'" \
980 -c "'INSERT INTO foo SELECT generate_series(1000001, 2000000);'"
981
982 Resurrect node B
983 Run this command to bring node B back online:
984
985 az vm start \
986 --resource-group ha-demo \
987 --name ha-demo-b
988
989 Now the next time the keeper retries its health check, it brings the
990 node back. Node B goes through the state "catchingup" while it updates
991 its data to match A. Once that's done, B becomes a secondary, and A is
992 now a full primary again.
993
994 Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
995 -------+-------+--------------------------------------+------------+-----------+---------------------+--------------------
996 node_1 | 1 | ha-demo-a.internal.cloudapp.net:5432 | 0/12000738 | yes | primary | primary
997 node_2 | 2 | ha-demo-b.internal.cloudapp.net:5432 | 0/12000738 | yes | secondary | secondary
998
999 What's more, if we connect directly to the database again, all two mil‐
1000 lion rows are still present.
1001
1002 ssh -l ha-admin -t `vm_ip app` -- \
1003 psql "'$APP_DB_URI'" \
1004 -c "'SELECT count(*) FROM foo;'"
1005
1006 It shows
1007
1008 count
1009 ---------
1010 2000000
1011
1013 pg_auto_failover is designed as a simple and robust way to manage auto‐
mated Postgres failover in production. On top of robust operations, the
pg_auto_failover setup is flexible and allows either Business Continuity
or High Availability configurations. The pg_auto_failover design also
allows configuration changes in a live system without downtime.
1018
1019 pg_auto_failover is designed to be able to handle a single PostgreSQL
1020 service using three nodes. In this setting, the system is resilient to
1021 losing any one of three nodes.
[figure] pg_auto_failover Architecture for a standalone PostgreSQL service.
1025
1026 It is important to understand that when using only two Postgres nodes
1027 then pg_auto_failover is optimized for Business Continuity. In the
1028 event of losing a single node, pg_auto_failover is capable of contin‐
1029 uing the PostgreSQL service, and prevents any data loss when doing
1030 so, thanks to PostgreSQL Synchronous Replication.
1031
1032 That said, there is a trade-off involved in this architecture. The
1033 business continuity bias relaxes replication guarantees for asynchro‐
1034 nous replication in the event of a standby node failure. This allows
1035 the PostgreSQL service to accept writes when there's a single server
1036 available, and opens the service for potential data loss if the pri‐
1037 mary server were also to fail.
1038
1039 The pg_auto_failover Monitor
1040 Each PostgreSQL node in pg_auto_failover runs a Keeper process which
1041 informs a central Monitor node about notable local changes. Some
1042 changes require the Monitor to orchestrate a correction across the
1043 cluster:
1044
1045 • New nodes
1046
1047 At initialization time, it's necessary to prepare the configura‐
1048 tion of each node for PostgreSQL streaming replication, and get
1049 the cluster to converge to the nominal state with both a primary
1050 and a secondary node in each group. The monitor determines each
new node's role.
1052
1053 • Node failure
1054
1055 The monitor orchestrates a failover when it detects an unhealthy
1056 node. The design of pg_auto_failover allows the monitor to shut
1057 down service to a previously designated primary node without caus‐
1058 ing a "split-brain" situation.
1059
1060 The monitor is the authoritative node that manages global state and
1061 makes changes in the cluster by issuing commands to the nodes' keeper
1062 processes. A pg_auto_failover monitor node failure has limited impact
1063 on the system. While it prevents reacting to other nodes' failures, it
1064 does not affect replication. The PostgreSQL streaming replication
1065 setup installed by pg_auto_failover does not depend on having the moni‐
1066 tor up and running.
1067
1068 pg_auto_failover Glossary
1069 pg_auto_failover handles a single PostgreSQL service with the following
1070 concepts:
1071
1072 Monitor
1073 The pg_auto_failover monitor is a service that keeps track of one or
1074 several formations containing groups of nodes.
1075
1076 The monitor is implemented as a PostgreSQL extension, so when you run
1077 the command pg_autoctl create monitor a PostgreSQL instance is initial‐
1078 ized, configured with the extension, and started. The monitor service
1079 embeds a PostgreSQL instance.
1080
1081 Formation
1082 A formation is a logical set of PostgreSQL services that are managed
1083 together.
1084
1085 It is possible to operate many formations with a single monitor in‐
1086 stance. Each formation has a group of Postgres nodes and the FSM or‐
1087 chestration implemented by the monitor applies separately to each
1088 group.
1089
1090 Group
1091 A group of two PostgreSQL nodes work together to provide a single Post‐
1092 greSQL service in a Highly Available fashion. A group consists of a
1093 PostgreSQL primary server and a secondary server setup with Hot Standby
1094 synchronous replication. Note that pg_auto_failover can orchestrate the
1095 whole setting-up of the replication for you.
1096
1097 In pg_auto_failover versions up to 1.3, a single Postgres group can
1098 contain only two Postgres nodes. Starting with pg_auto_failover 1.4,
1099 there's no limit to the number of Postgres nodes in a single group.
1100 Note that each Postgres instance that belongs to the same group serves
1101 the same dataset in its data directory (PGDATA).
1102
1103 NOTE:
1104 The notion of a formation that contains multiple groups in
1105 pg_auto_failover is useful when setting up and managing a whole Ci‐
1106 tus formation, where the coordinator nodes belong to group zero of
1107 the formation, and each Citus worker node becomes its own group and
1108 may have Postgres standby nodes.
1109
1110 Keeper
1111 The pg_auto_failover keeper is an agent that must be running on the
1112 same server where your PostgreSQL nodes are running. The keeper con‐
1113 trols the local PostgreSQL instance (using both the pg_ctl command-line
1114 tool and SQL queries), and communicates with the monitor:
1115
1116 • it sends updated data about the local node, such as the WAL delta
1117 in between servers, measured via PostgreSQL statistics views.
1118
1119 • it receives state assignments from the monitor.
1120
1121 Also the keeper maintains local state that includes the most recent
1122 communication established with the monitor and the other PostgreSQL
1123 node of its group, enabling it to detect Network Partitions.
1124
1125 NOTE:
1126 In pg_auto_failover versions up to and including 1.3, the keeper
1127 process started with pg_autoctl run manages a separate Postgres in‐
1128 stance, running as its own process tree.
1129
1130 Starting in pg_auto_failover version 1.4, the keeper process
1131 (started with pg_autoctl run) runs the Postgres instance as a
1132 sub-process of the main pg_autoctl process, allowing tighter control
1133 over the Postgres execution. Running the sub-process also makes the
1134 solution work better both in container environments (because it's
1135 now a single process tree) and with systemd, because it uses a spe‐
1136 cific cgroup per service unit.
1137
1138 Node
1139 A node is a server (virtual or physical) that runs PostgreSQL instances
1140 and a keeper service. At any given time, any node might be a primary or
1141 a secondary Postgres instance. The whole point of pg_auto_failover is
1142 to decide this state.
1143
1144 As a result, refrain from naming your nodes with the role you intend
1145 for them. Their roles can change. If they didn't, your system wouldn't
1146 need pg_auto_failover!
1147
1148 State
1149 A state is the representation of the per-instance and per-group situa‐
1150 tion. The monitor and the keeper implement a Finite State Machine to
1151 drive operations in the PostgreSQL groups; allowing pg_auto_failover to
1152 implement High Availability with the goal of zero data loss.
1153
1154 The keeper main loop enforces the current expected state of the local
1155 PostgreSQL instance, and reports the current state and some more infor‐
1156 mation to the monitor. The monitor uses this set of information and its
1157 own health-check information to drive the State Machine and assign a
1158 goal state to the keeper.
1159
1160 The keeper implements the transitions between a current state and a
1161 monitor-assigned goal state.
1162
1163 Client-side HA
Support for client-side High Availability is included in PostgreSQL's
1165 driver libpq from version 10 onward. Using this driver, it is possible
1166 to specify multiple host names or IP addresses in the same connection
1167 string:
1168
1169 $ psql -d "postgresql://host1,host2/dbname?target_session_attrs=read-write"
$ psql -d "postgresql://host1:port1,host2:port2/dbname?target_session_attrs=read-write"
1171 $ psql -d "host=host1,host2 port=port1,port2 target_session_attrs=read-write"
1172
1173 When using either of the syntax above, the psql application attempts to
1174 connect to host1, and when successfully connected, checks the tar‐
get_session_attrs setting, as per the PostgreSQL documentation:
1176 If this parameter is set to read-write, only a connection in which
1177 read-write transactions are accepted by default is considered ac‐
1178 ceptable. The query SHOW transaction_read_only will be sent upon
1179 any successful connection; if it returns on, the connection will be
1180 closed. If multiple hosts were specified in the connection string,
1181 any remaining servers will be tried just as if the connection at‐
1182 tempt had failed. The default value of this parameter, any, regards
1183 all connections as acceptable.
1184
1185 When the connection attempt to host1 fails, or when the target_ses‐
1186 sion_attrs can not be verified, then the psql application attempts to
1187 connect to host2.
1188
1189 The behavior is implemented in the connection library libpq, so any ap‐
1190 plication using it can benefit from this implementation, not just psql.
1191
1192 When using pg_auto_failover, configure your application connection
1193 string to use the primary and the secondary server host names, and set
1194 target_session_attrs=read-write too, so that your application automati‐
1195 cally connects to the current primary, even after a failover occurred.
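
The monitor can compute and print that multi-host connection string for
you, as seen in the tutorials (the --pgdata path below is an example):

   $ pg_autoctl show uri --formation default --pgdata /var/lib/postgres/monitor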
1196
1197 Monitoring protocol
The monitor interacts with the data nodes in two ways:
1199
1200 • Data nodes periodically connect and run SELECT pgauto‐
1201 failover.node_active(...) to communicate their current state and
1202 obtain their goal state.
1203
1204 • The monitor periodically connects to all the data nodes to see if
1205 they are healthy, doing the equivalent of pg_isready.
1206
1207 When a data node calls node_active, the state of the node is stored in
1208 the pgautofailover.node table and the state machines of both nodes are
progressed. The state machines are described later in this documentation. The
1210 monitor typically only moves one state forward and waits for the
1211 node(s) to converge except in failure states.
1212
If a node is not communicating with the monitor, it will either cause a
failover (if the node is a primary), disable synchronous replication (if
the node is a secondary), or cause the state machine to pause until the
1216 node comes back (other cases). In most cases, the latter is harmless,
1217 though in some cases it may cause downtime to last longer, e.g. if a
1218 standby goes down during a failover.
1219
1220 To simplify operations, a node is only considered unhealthy if the mon‐
1221 itor cannot connect and it hasn't reported its state through node_ac‐
1222 tive for a while. This allows, for example, PostgreSQL to be restarted
1223 without causing a health check failure.
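
Both sides of this protocol can be observed from the monitor itself. As a
sketch, assuming a psql connection to the monitor's pg_auto_failover
database, the registered nodes and their reported states are visible in
the pgautofailover.node table:

   $ psql -d pg_auto_failover -c 'SELECT * FROM pgautofailover.node'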
1224
1225 Synchronous vs. asynchronous replication
1226 By default, pg_auto_failover uses synchronous replication, which means
1227 all writes block until at least one standby node has reported receiving
1228 them. To handle cases in which the standby fails, the primary switches
1229 between two states called wait_primary and primary based on the health
1230 of standby nodes, and based on the replication setting num‐
ber_sync_standbys.
1232
1233 When in the wait_primary state, synchronous replication is disabled by
1234 automatically setting synchronous_standby_names = '' to allow writes to
1235 proceed. However doing so also disables failover, since the standby
1236 might get arbitrarily far behind. If the standby is responding to
1237 health checks and within 1 WAL segment of the primary (by default),
1238 synchronous replication is enabled again on the primary by setting syn‐
1239 chronous_standby_names = '*' which may cause a short latency spike
1240 since writes will then block until the standby has caught up.
1241
1242 When using several standby nodes with replication quorum enabled, the
1243 actual setting for synchronous_standby_names is set to a list of those
standby nodes that are set to participate in the replication quorum.
1245
1246 If you wish to disable synchronous replication, you need to add the
1247 following to postgresql.conf:
1248
1249 synchronous_commit = 'local'
1250
1251 This ensures that writes return as soon as they are committed on the
1252 primary -- under all circumstances. In that case, failover might lead
1253 to some data loss, but failover is not initiated if the secondary is
more than 10 WAL segments (by default) behind the primary. During a
1255 manual failover, the standby will continue accepting writes from the
1256 old primary. The standby will stop accepting writes only if it's fully
1257 caught up (most common), the primary fails, or it does not receive
1258 writes for 2 minutes.
1259
1260 A note about performance
1261 In some cases the performance impact on write latency when setting syn‐
1262 chronous replication makes the application fail to deliver expected
1263 performance. If testing or production feedback shows this to be the
1264 case, it is beneficial to switch to using asynchronous replication.
1265
1266 The way to use asynchronous replication in pg_auto_failover is to
1267 change the synchronous_commit setting. This setting can be set per
1268 transaction, per session, or per user. It does not have to be set glob‐
1269 ally on your Postgres instance.
1270
1271 One way to benefit from that would be:
1272
1273 alter role fast_and_loose set synchronous_commit to local;
1274
1275 That way performance-critical parts of the application don't have to
1276 wait for the standby nodes. Only use this when you can also lower your
1277 data durability guarantees.
1278
1279 Node recovery
1280 When bringing a node back after a failover, the keeper (pg_autoctl run)
1281 can simply be restarted. It will also restart postgres if needed and
1282 obtain its goal state from the monitor. If the failed node was a pri‐
1283 mary and was demoted, it will learn this from the monitor. Once the
1284 node reports, it is allowed to come back as a standby by running
1285 pg_rewind. If it is too far behind, the node performs a new pg_base‐
1286 backup.
1287
1289 Pg_auto_failover allows you to have more than one standby node, and of‐
1290 fers advanced control over your production architecture characteris‐
1291 tics.
1292
1293 Architectures with two standby nodes
1294 When adding your second standby node with default settings, you get the
1295 following architecture:
1296 [image: pg_auto_failover architecture with two standby nodes] [image]
1297 pg_auto_failover architecture with two standby nodes.UNINDENT
1298
1299 In this case, three nodes get set up with the same characteristics,
1300 achieving HA for both the Postgres service and the production
1301 dataset. An important setting for this architecture is num‐
1302 ber_sync_standbys.
1303
1304 The replication setting number_sync_standbys sets how many standby
1305 nodes the primary should wait for when committing a transaction. In
       order to have good availability in your system, pg_auto_failover
1307 requires number_sync_standbys + 1 standby nodes participating in the
1308 replication quorum: this allows any standby node to fail without im‐
1309 pact on the system's ability to respect the replication quorum.
1310
1311 When only two nodes are registered in a group on the monitor we have
1312 a primary and a single secondary node. Then number_sync_standbys can
1313 only be set to zero. When adding a second standby node to a
1314 pg_auto_failover group, then the monitor automatically increments
1315 number_sync_standbys to one, as we see in the diagram above.
1316
1317 When number_sync_standbys is set to zero then pg_auto_failover imple‐
1318 ments the Business Continuity setup as seen in Architecture Basics:
1319 synchronous replication is then used as a way to guarantee that
1320 failover can be implemented without data loss.
1321
       In more detail:
1323
1324 1. With number_sync_standbys set to one, this architecture always
1325 maintains two copies of the dataset: one on the current primary
1326 node (node A in the previous diagram), and one on the standby
1327 that acknowledges the transaction first (either node B or node C
1328 in the diagram).
1329
1330 When one of the standby nodes is unavailable, the second copy of
1331 the dataset can still be maintained thanks to the remaining
1332 standby.
1333
1334 When both the standby nodes are unavailable, then it's no longer
1335 possible to guarantee the replication quorum, and thus writes on
1336 the primary are blocked. The Postgres primary node waits until at
1337 least one standby node acknowledges the transactions locally com‐
1338 mitted, thus degrading your Postgres service to read-only.
1339
       2. It is possible to manually set number_sync_standbys to zero when
          two standby nodes are registered with the monitor, overriding the
          default behavior.
1343
1344 In that case, when the second standby node becomes unhealthy at
1345 the same time as the first standby node, the primary node is as‐
1346 signed the state Wait_primary. In that state, synchronous repli‐
1347 cation is disabled on the primary by setting synchro‐
1348 nous_standby_names to an empty string. Writes are allowed on the
1349 primary, even though there's no extra copy of the production
1350 dataset available at this time.
1351
1352 Setting number_sync_standbys to zero allows data to be written
1353 even when both standby nodes are down. In this case, a single
          copy of the production data set is kept and, if the primary were
          then to fail, some data will be lost. How much depends on your
1356 backup and recovery mechanisms.
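
          As a sketch, that override uses the formation-level command
          detailed further down this page:

             $ pg_autoctl set formation number-sync-standbys 0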
1357
1358 Replication Settings and Postgres Architectures
1359 The entire flexibility of pg_auto_failover can be leveraged with the
1360 following three replication settings:
1361
       • Number of sync standbys
1363
1364 • Replication quorum
1365
1366 • Candidate priority
1367
1368 Number Sync Standbys
       This parameter is used by Postgres in the synchronous_standby_names
       parameter: number_sync_standbys is the number of synchronous standbys
       whose replies transactions must wait for.
1372
1373 This parameter can be set at the formation level in pg_auto_failover,
1374 meaning that it applies to the current primary, and "follows" a
1375 failover to apply to any new primary that might replace the current
1376 one.
1377
1378 To set this parameter to the value <n>, use the following command:
1379
1380 pg_autoctl set formation number-sync-standbys <n>
1381
1382 The default value in pg_auto_failover is zero. When set to zero, the
1383 Postgres parameter synchronous_standby_names can be set to either '*'
1384 or to '':
1385
1386 • synchronous_standby_names = '*' means that any standby may partici‐
1387 pate in the replication quorum for transactions with synchronous_com‐
1388 mit set to on or higher values.
1389
         pg_auto_failover uses synchronous_standby_names = '*' when there's at
1391 least one standby that is known to be healthy.
1392
       • synchronous_standby_names = '' (empty string) disables synchronous
         commit and makes all your commits asynchronous, meaning that transaction
1395 commits will not wait for replication. In other words, a single copy
1396 of your production data is maintained when synchronous_standby_names
1397 is set that way.
1398
         pg_auto_failover uses synchronous_standby_names = '' only when
         number_sync_standbys is set to zero and there's no standby node known
1401 healthy by the monitor.
1402
1403 In order to set number_sync_standbys to a non-zero value,
1404 pg_auto_failover requires that at least number_sync_standbys + 1
1405 standby nodes be registered in the system.
1406
1407 When the first standby node is added to the pg_auto_failover monitor,
1408 the only acceptable value for number_sync_standbys is zero. When a sec‐
1409 ond standby is added that participates in the replication quorum, then
1410 number_sync_standbys is automatically set to one.
1411
1412 The command pg_autoctl set formation number-sync-standbys can be used
1413 to change the value of this parameter in a formation, even when all the
1414 nodes are already running in production. The pg_auto_failover monitor
1415 then sets a transition for the primary to update its local value of
1416 synchronous_standby_names.
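
       As an illustration, and assuming a formation that already has at
       least three standby nodes registered and participating in the
       replication quorum, bumping the setting and then auditing the
       result could look like this:

          $ pg_autoctl set formation number-sync-standbys 2
          $ pg_autoctl show settings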
1417
1418 Replication Quorum
1419 The replication quorum setting is a boolean and defaults to true, and
1420 can be set per-node. Pg_auto_failover includes a given node in synchro‐
1421 nous_standby_names only when the replication quorum parameter has been
1422 set to true. This means that asynchronous replication will be used for
1423 nodes where replication-quorum is set to false.
1424
1425 It is possible to force asynchronous replication globally by setting
1426 replication quorum to false on all the nodes in a formation. Remember
       that failovers will happen: when needed, set your replication settings
       on the current primary node too, since it is going to become a standby
       later.
1430
1431 To set this parameter to either true or false, use one of the following
1432 commands:
1433
1434 pg_autoctl set node replication-quorum true
1435 pg_autoctl set node replication-quorum false
1436
1437 Candidate Priority
1438 The candidate priority setting is an integer that can be set to any
1439 value between 0 (zero) and 100 (one hundred). The default value is 50.
1440 When the pg_auto_failover monitor decides to orchestrate a failover, it
1441 uses each node's candidate priority to pick the new primary node.
1442
       When the candidate priority of a node is set to zero, the node will
       never be selected for promotion as the new primary when a failover is
       orchestrated by the monitor. The monitor will instead wait until
       another registered node is healthy and in a position to be promoted.
1448
1449 To set this parameter to the value <n>, use the following command:
1450
1451 pg_autoctl set node candidate-priority <n>
1452
1453 When nodes have the same candidate priority, the monitor then picks the
1454 standby with the most advanced LSN position published to the monitor.
1455 When more than one node has published the same LSN position, a random
1456 one is chosen.
1457
1458 When the candidate for failover has not published the most advanced LSN
1459 position in the WAL, pg_auto_failover orchestrates an intermediate step
1460 in the failover mechanism. The candidate fetches the missing WAL bytes
       from one of the standby nodes with the most advanced LSN position prior to
1462 being promoted. Postgres allows this operation thanks to cascading
1463 replication: any standby can be the upstream node for another standby.
1464
1465 It is required at all times that at least two nodes have a non-zero
1466 candidate priority in any pg_auto_failover formation. Otherwise no
1467 failover is possible.
1468
1469 Auditing replication settings
1470 The command pg_autoctl get formation settings (also known as pg_autoctl
1471 show settings) can be used to obtain a summary of all the replication
1472 settings currently in effect in a formation. Still using the first dia‐
1473 gram on this page, we get the following summary:
1474
1475 $ pg_autoctl get formation settings
1476 Context | Name | Setting | Value
1477 ----------+---------+---------------------------+-------------------------------------------------------------
1478 formation | default | number_sync_standbys | 1
1479 primary | node_A | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_3, pgautofailover_standby_2)'
1480 node | node_A | replication quorum | true
1481 node | node_B | replication quorum | true
1482 node | node_C | replication quorum | true
1483 node | node_A | candidate priority | 50
1484 node | node_B | candidate priority | 50
1485 node | node_C | candidate priority | 50
1486
1487 We can see that the number_sync_standbys has been used to compute the
1488 current value of the synchronous_standby_names setting on the primary.
1489
       Because all the nodes in that example have the same default candidate
       priority (50), pg_auto_failover uses the form ANY 1 with the list of
       standby nodes that are currently participating in the replication
       quorum.
1494
1495 The entries in the synchronous_standby_names list are meant to match
1496 the application_name connection setting used in the primary_conninfo,
1497 and the format used by pg_auto_failover there is the format string
1498 "pgautofailover_standby_%d" where %d is replaced by the node id. This
1499 allows keeping the same connection string to the primary when the node
       name is changed (using the command pg_autoctl set node metadata --name).
1501
1502 Here we can see the node id of each registered Postgres node with the
1503 following command:
1504
1505 $ pg_autoctl show state
1506 Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
1507 -------+-------+----------------+-----------+-----------+---------------------+--------------------
1508 node_A | 1 | localhost:5001 | 0/7002310 | yes | primary | primary
1509 node_B | 2 | localhost:5002 | 0/7002310 | yes | secondary | secondary
1510 node_C | 3 | localhost:5003 | 0/7002310 | yes | secondary | secondary
1511
       Once pg_auto_failover has been configured with the per-formation
       number_sync_standbys setting and the per-node replication quorum and
       candidate priority settings, those properties are used to compute the
       synchronous_standby_names value on the primary node. This value is
       automatically maintained on the primary by pg_auto_failover, and is
       updated either when replication settings are changed or when a
       failover happens.
1518
1519 The other situation when the pg_auto_failover replication settings are
       used is a candidate election when a failover happens and there are more
1521 than two nodes registered in a group. Then the node with the highest
1522 candidate priority is selected, as detailed above in the Candidate Pri‐
1523 ority section.
1524
1525 Sample architectures with three standby nodes
1526 When setting the three parameters above, it's possible to design very
1527 different Postgres architectures for your production needs.
1528 [image: pg_auto_failover architecture with three standby nodes] [im‐
1529 age] pg_auto_failover architecture with three standby nodes.UNINDENT
1530
1531 In this case, the system is set up with three standby nodes all set
1532 the same way, with default parameters. The default parameters support
1533 setting number_sync_standbys = 2. This means that Postgres will main‐
1534 tain three copies of the production data set at all times.
1535
1536 On the other hand, if two standby nodes were to fail at the same
1537 time, despite the fact that two copies of the data are still main‐
1538 tained, the Postgres service would be degraded to read-only.
1539
1540 With this architecture diagram, here's the summary that we obtain:
1541
1542 $ pg_autoctl show settings
1543 Context | Name | Setting | Value
1544 ----------+---------+---------------------------+---------------------------------------------------------------------------------------
1545 formation | default | number_sync_standbys | 2
1546 primary | node_A | synchronous_standby_names | 'ANY 2 (pgautofailover_standby_2, pgautofailover_standby_4, pgautofailover_standby_3)'
1547 node | node_A | replication quorum | true
1548 node | node_B | replication quorum | true
1549 node | node_C | replication quorum | true
1550 node | node_D | replication quorum | true
1551 node | node_A | candidate priority | 50
1552 node | node_B | candidate priority | 50
1553 node | node_C | candidate priority | 50
1554 node | node_D | candidate priority | 50
1555
1556 Sample architecture with three standby nodes, one async
1557 [image: pg_auto_failover architecture with three standby nodes, one
1558 async] [image] pg_auto_failover architecture with three standby
1559 nodes, one async.UNINDENT
1560
1561 In this case, the system is set up with two standby nodes participat‐
1562 ing in the replication quorum, allowing for number_sync_standbys = 1.
1563 The system always maintains at least two copies of the data set, one
1564 on the primary, another on either node B or node C. Whenever we lose
         one of those nodes, we still hold the guarantee of having two copies
1566 of the data set.
1567
1568 Additionally, we have the standby server D which has been set up to
1569 not participate in the replication quorum. Node D will not be found
1570 in the synchronous_standby_names list of nodes. Also, node D is set
1571 up to never be a candidate for failover, with candidate-priority = 0.
1572
         This architecture fits a situation where nodes A, B, and C are
1574 deployed in the same data center or availability zone and node D in
1575 another one. Those three nodes are set up to support the main pro‐
1576 duction traffic and implement high availability of both the Postgres
1577 service and the data set.
1578
1579 Node D might be set up for Business Continuity in case the first data
1580 center is lost, or maybe for reporting needs on another application
1581 domain.
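
         As a sketch, and assuming the commands are run locally on node D
         (pg_autoctl targets the local node by default), this node D setup
         can be obtained with the per-node settings described earlier:

            $ pg_autoctl set node replication-quorum false
            $ pg_autoctl set node candidate-priority 0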
1582
1583 With this architecture diagram, here's the summary that we obtain:
1584
          $ pg_autoctl show settings
1586 Context | Name | Setting | Value
1587 ----------+---------+---------------------------+-------------------------------------------------------------
1588 formation | default | number_sync_standbys | 1
1589 primary | node_A | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
1590 node | node_A | replication quorum | true
1591 node | node_B | replication quorum | true
1592 node | node_C | replication quorum | true
1593 node | node_D | replication quorum | false
1594 node | node_A | candidate priority | 50
1595 node | node_B | candidate priority | 50
1596 node | node_C | candidate priority | 50
1597 node | node_D | candidate priority | 0
1598
1600 The usual pg_autoctl commands work both with Postgres standalone nodes
1601 and with Citus nodes.
1602 [image: pg_auto_failover architecture with a Citus formation] [image]
1603 pg_auto_failover architecture with a Citus formation.UNINDENT
1604
1605 When using pg_auto_failover with Citus, a pg_auto_failover formation
1606 is composed of a coordinator and a set of worker nodes.
1607
1608 When High-Availability is enabled at the formation level, which is
1609 the default, then a minimum of two coordinator nodes are required: a
1610 primary and a secondary coordinator to be able to orchestrate a
1611 failover when needed.
1612
1613 The same applies to the worker nodes: when using pg_auto_failover for
1614 Citus HA, then each worker node is a pg_auto_failover group in the
         formation, and each worker group is set up with at least two nodes
1616 (primary, secondary).
1617
1618 Setting-up your first Citus formation
1619 Have a look at our documentation of Citus Cluster Quick Start for more
1620 details with a full tutorial setup on a single VM, for testing and QA.
1621
1622 Citus specific commands and operations
1623 When setting up Citus with pg_auto_failover, the following Citus spe‐
1624 cific commands are provided. Other pg_autoctl commands work as usual
1625 when deploying a Citus formation, so that you can use the rest of this
1626 documentation to operate your Citus deployments.
1627
1628 pg_autoctl create coordinator
1629 This creates a Citus coordinator, that is to say a Postgres node with
       the Citus extension loaded and ready to act as a coordinator. The
       coordinator is always placed in the pg_auto_failover group zero of a
       given formation.
1633
1634 See pg_autoctl create coordinator for details.
1635
1636 IMPORTANT:
          The default --dbname is the same as the current system user name,
          which in many cases is going to be postgres. Please make sure to
          use the --dbname option with the actual database that you're going
          to use with your application.

          Citus does not support multiple databases: you have to use the
          database where Citus is created. When using Citus, this is
          essential for worker failover to behave correctly.
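
       As a sketch, and using app as a placeholder for your application
       database name, the coordinator creation could then look like this:

          $ pg_autoctl create coordinator --dbname app \
               --ssl-self-signed --auth trust --pg-hba-lan --run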
1645
1646 pg_autoctl create worker
1647 This command creates a new Citus worker node, that is to say a Postgres
1648 node with the Citus extensions loaded, and registered to the Citus co‐
1649 ordinator created with the previous command. Because the Citus coordi‐
1650 nator is always given group zero, the pg_auto_failover monitor knows
       how to reach the Citus coordinator and automate worker registration.
1652
1653 The default worker creation policy is to assign the primary role to the
1654 first worker registered, then secondary in the same group, then primary
1655 in a new group, etc.
1656
1657 If you want to extend an existing group to have a third worker node in
1658 the same group, enabling multiple-standby capabilities in your setup,
1659 then make sure to use the --group option to the pg_autoctl create
1660 worker command.
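
       As a sketch, with worker2c as a placeholder name, adding a third
       node to the existing worker group 2 could then look like this:

          $ pg_autoctl create worker --group 2 --name worker2c \
               --ssl-self-signed --auth trust --pg-hba-lan --run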
1661
1662 See pg_autoctl create worker for details.
1663
1664 pg_autoctl activate
1665 This command calls the Citus “activation” API so that a node can be
1666 used to host shards for your reference and distributed tables.
1667
1668 When creating a Citus worker, pg_autoctl create worker automatically
1669 activates the worker node to the coordinator. You only need this com‐
       mand when something unexpected has happened and you want to manually
1671 make sure the worker node has been activated at the Citus coordinator
1672 level.
1673
1674 Starting with Citus 10 it is also possible to activate the coordinator
1675 itself as a node with shard placement. Use pg_autoctl activate on your
1676 Citus coordinator node manually to use that feature.
1677
1678 When the Citus coordinator is activated, an extra step is then needed
1679 for it to host shards of distributed tables. If you want your coordina‐
1680 tor to have shards, then have a look at the Citus API
1681 citus_set_node_property to set the shouldhaveshards property to true.
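
       As a sketch, and assuming the coordinator is registered as coord0a
       on port 5432 (both placeholders), that call could look like this:

          $ psql -d citus -c "SELECT citus_set_node_property('coord0a', 5432, 'shouldhaveshards', true);"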
1682
1683 See pg_autoctl activate for details.
1684
1685 Citus worker failover
1686 When a failover is orchestrated by pg_auto_failover for a Citus worker
1687 node, Citus offers the opportunity to make the failover close to trans‐
1688 parent to the application. Because the application connects to the co‐
       ordinator, which in turn connects to the worker nodes, it is possible
       with Citus to _pause_ the SQL write traffic on the coordinator
1691 for the shards hosted on a failed worker node. The Postgres failover
1692 then happens while the traffic is kept on the coordinator, and resumes
1693 as soon as a secondary worker node is ready to accept read-write
1694 queries.
1695
       This is implemented thanks to the Citus smart locking strategy in its
       citus_update_node API, and pg_auto_failover takes full advantage of it
       with a purpose-built set of FSM transitions for Citus workers.
1699
1700 Citus Secondaries and read-replica
       It is possible to set up Citus read-only replicas. This Citus feature
1702 allows using a set of dedicated nodes (both coordinator and workers) to
1703 serve read-only traffic, such as reporting, analytics, or other parts
1704 of your workload that are read-only.
1705
1706 Citus read-replica nodes are a solution for load-balancing. Those nodes
1707 can't be used as HA failover targets, and thus have their candi‐
       date-priority set to zero. This setting of a read-replica cannot be
1709 changed later.
1710
1711 This setup is done by setting the Citus property citus.use_sec‐
1712 ondary_nodes to always (it defaults to never), and the Citus property
1713 citus.cluster_name to your read-only cluster name.
1714
1715 Both of those settings are entirely supported and managed by pg_autoctl
1716 when using the --citus-secondary --cluster-name cluster_d options to
1717 the pg_autoctl create coordinator|worker commands.
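
       As a sketch, and following the node names used in the example below,
       the dedicated read-replica coordinator and worker nodes could be
       created with:

          $ pg_autoctl create coordinator --name coord0d \
               --citus-secondary --cluster-name cluster_d \
               --ssl-self-signed --auth trust --pg-hba-lan --run

          $ pg_autoctl create worker --group 1 --name reader1d \
               --citus-secondary --cluster-name cluster_d \
               --ssl-self-signed --auth trust --pg-hba-lan --run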
1718
1719 Here is an example where we have created a formation with three nodes
1720 for HA for the coordinator (one primary and two secondary nodes), then
1721 a single worker node with the same three nodes setup for HA, and then
       one read-replica environment on top of that, for a total of 8 nodes:
1723
1724 $ pg_autoctl show state
1725 Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
1726 ---------+-------+----------------+-----------+-----------+---------------------+--------------------
1727 coord0a | 0/1 | localhost:5501 | 0/5003298 | yes | primary | primary
1728 coord0b | 0/3 | localhost:5502 | 0/5003298 | yes | secondary | secondary
1729 coord0c | 0/6 | localhost:5503 | 0/5003298 | yes | secondary | secondary
1730 coord0d | 0/7 | localhost:5504 | 0/5003298 | yes | secondary | secondary
1731 worker1a | 1/2 | localhost:5505 | 0/4000170 | yes | primary | primary
1732 worker1b | 1/4 | localhost:5506 | 0/4000170 | yes | secondary | secondary
1733 worker1c | 1/5 | localhost:5507 | 0/4000170 | yes | secondary | secondary
1734 reader1d | 1/8 | localhost:5508 | 0/4000170 | yes | secondary | secondary
1735
1736 Nodes named coord0d and reader1d are members of the read-replica clus‐
1737 ter cluster_d. We can see that a read-replica cluster needs a dedicated
1738 coordinator and then one dedicated worker node per group.
1739
1740 TIP:
1741 It is possible to name the nodes in a pg_auto_failover formation ei‐
1742 ther at creation time or later, using one of those commands:
1743
1744 $ pg_autoctl create worker --name ...
1745 $ pg_autoctl set node metadata --name ...
1746
1747 Here coord0d is the node name for the dedicated coordinator for
1748 cluster_d, and reader1d is the node name for the dedicated worker
1749 for cluster_d in the worker group 1 (the only worker group in that
1750 setup).
1751
       The usual command to show the connection strings for your application
       now lists the read-replica setup as follows:
1754
1755 $ pg_autoctl show uri
1756 Type | Name | Connection String
1757 -------------+-----------+-------------------------------
1758 monitor | monitor | postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer
1759 formation | default | postgres://localhost:5503,localhost:5501,localhost:5502/postgres?target_session_attrs=read-write&sslmode=prefer
1760 read-replica | cluster_d | postgres://localhost:5504/postgres?sslmode=prefer
1761
1762 Given that setup, your application can now use the formation default
1763 Postgres URI to connect to the highly-available read-write service, or
1764 to the read-replica cluster_d service to connect to the read-only
1765 replica where you can offload some of your SQL workload.
1766
1767 When connecting to the cluster_d connection string, the Citus proper‐
1768 ties citus.use_secondary_nodes and citus.cluster_name are automatically
       set up to their expected values.
1770
1772 In this guide we’ll create a Citus cluster with a coordinator node and
1773 three workers. Every node will have a secondary for failover. We’ll
1774 simulate failure in the coordinator and worker nodes and see how the
1775 system continues to function.
1776
1777 This tutorial uses docker-compose in order to separate the architecture
       design from some of the implementation details. This allows reasoning
       at the architecture level within this tutorial, and makes it easier to
       see which software component needs to be deployed and run on which node.
1781
1782 The setup provided in this tutorial is good for replaying at home in
1783 the lab. It is not intended to be production ready though. In particu‐
       lar, no attention has been paid to volume management. After all, this
1785 is a tutorial: the goal is to walk through the first steps of using
1786 pg_auto_failover to provide HA to a Citus formation.
1787
1788 Pre-requisites
1789 When using docker-compose we describe a list of services, each service
1790 may run on one or more nodes, and each service just runs a single iso‐
1791 lated process in a container.
1792
1793 Within the context of a tutorial, or even a development environment,
1794 this matches very well to provisioning separate physical machines
       on-prem, or Virtual Machines either on-prem or in a Cloud service.
1796
1797 The docker image used in this tutorial is named pg_auto_failover:citus.
1798 It can be built locally when using the attached Dockerfile found within
1799 the GitHub repository for pg_auto_failover.
1800
1801 To build the image, either use the provided Makefile and run make
1802 build, or run the docker build command directly:
1803
1804 $ git clone https://github.com/citusdata/pg_auto_failover
1805 $ cd pg_auto_failover/docs/cluster
1806
1807 $ docker build -t pg_auto_failover:citus -f Dockerfile ../..
1808 $ docker-compose build
1809
1810 Our first Citus Cluster
1811 To create a cluster we use the following docker-compose definition:
1812
1813 version: "3.9" # optional since v1.27.0
1814
1815 services:
1816
1817 monitor:
1818 image: pg_auto_failover:citus
1819 environment:
1820 PGDATA: /tmp/pgaf
1821 command: |
1822 pg_autoctl create monitor --ssl-self-signed --auth trust --run
1823 expose:
1824 - 5432
1825
1826 coord:
1827 image: pg_auto_failover:citus
1828 environment:
1829 PGDATA: /tmp/pgaf
1830 PGUSER: citus
1831 PGDATABASE: citus
1832 PG_AUTOCTL_MONITOR: "postgresql://autoctl_node@monitor/pg_auto_failover"
1833 expose:
1834 - 5432
1835 command: |
1836 pg_autoctl create coordinator --ssl-self-signed --auth trust --pg-hba-lan --run
1837
1838 worker:
1839 image: pg_auto_failover:citus
1840 environment:
1841 PGDATA: /tmp/pgaf
1842 PGUSER: citus
1843 PGDATABASE: citus
1844 PG_AUTOCTL_MONITOR: "postgresql://autoctl_node@monitor/pg_auto_failover"
1845 expose:
1846 - 5432
1847 command: |
1848 pg_autoctl create worker --ssl-self-signed --auth trust --pg-hba-lan --run
1849
1850
1851 To run the full Citus cluster with HA from this definition, we can use
1852 the following command:
1853
1854 $ docker-compose up --scale coord=2 --scale worker=6
1855
1856 The command above starts the services up. The command also specifies a
1857 --scale option that is different for each service. We need:
1858
1859 • one monitor node, and the default scale for a service is 1,
1860
          • one primary Citus coordinator node and one secondary Citus
            coordinator node, which is to say two coordinator nodes,
1863
1864 • and three Citus worker nodes, each worker with both a primary
1865 Postgres node and a secondary Postgres node, so that's a scale of
1866 6 here.
1867
1868 The default policy for the pg_auto_failover monitor is to assign a pri‐
1869 mary and a secondary per auto failover Group. In our case, every node
1870 being provisioned with the same command, we benefit from that default
1871 policy:
1872
1873 $ pg_autoctl create worker --ssl-self-signed --auth trust --pg-hba-lan --run
1874
       When provisioning a production cluster, it is often necessary to have
       better control over which node participates in which group, by using
       the --group N option in the pg_autoctl create worker command line.
1878
1879 Within a given group, the first node that registers is a primary, and
       the other nodes are secondary nodes. The monitor takes care of that so
       we don't have to. In a High Availability setup, every node
1882 should be ready to be promoted primary at any time, so knowing which
1883 node in a group is assigned primary first is not very interesting.
1884
       While the cluster is being provisioned by docker-compose, you can run
1886 the following command and have a dynamic dashboard to follow what's
1887 happening. The following command is like top for pg_auto_failover:
1888
1889 $ docker-compose exec monitor pg_autoctl watch
1890
       The pg_basebackup operation used to create the secondary nodes takes
       some time when using Citus, because of the first CHECKPOINT, which is
       quite slow. So when first inquiring about the cluster state you might
       see the following output:
1895
1896 $ docker-compose exec monitor pg_autoctl show state
1897 Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
1898 ---------+-------+-------------------+----------------+--------------+---------------------+--------------------
1899 coord0a | 0/1 | cd52db444544:5432 | 1: 0/200C4A0 | read-write | wait_primary | wait_primary
1900 coord0b | 0/2 | 66a31034f2e4:5432 | 1: 0/0 | none ! | wait_standby | catchingup
1901 worker1a | 1/3 | dae7c062e2c1:5432 | 1: 0/2003B18 | read-write | wait_primary | wait_primary
1902 worker1b | 1/4 | 397e6069b09b:5432 | 1: 0/0 | none ! | wait_standby | catchingup
1903 worker2a | 2/5 | 5bf86f9ef784:5432 | 1: 0/2006AB0 | read-write | wait_primary | wait_primary
1904 worker2b | 2/6 | 23498b801a61:5432 | 1: 0/0 | none ! | wait_standby | catchingup
1905 worker3a | 3/7 | c23610380024:5432 | 1: 0/2003B18 | read-write | wait_primary | wait_primary
1906 worker3b | 3/8 | 2ea8aac8a04a:5432 | 1: 0/0 | none ! | wait_standby | catchingup
1907
1908 After a while though (typically around a minute or less), you can run
       that same command again for a stable result:
1910
1911 $ docker-compose exec monitor pg_autoctl show state
1912
1913 Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
1914 ---------+-------+-------------------+----------------+--------------+---------------------+--------------------
1915 coord0a | 0/1 | cd52db444544:5432 | 1: 0/3127AD0 | read-write | primary | primary
1916 coord0b | 0/2 | 66a31034f2e4:5432 | 1: 0/3127AD0 | read-only | secondary | secondary
1917 worker1a | 1/3 | dae7c062e2c1:5432 | 1: 0/311B610 | read-write | primary | primary
1918 worker1b | 1/4 | 397e6069b09b:5432 | 1: 0/311B610 | read-only | secondary | secondary
1919 worker2a | 2/5 | 5bf86f9ef784:5432 | 1: 0/311B610 | read-write | primary | primary
1920 worker2b | 2/6 | 23498b801a61:5432 | 1: 0/311B610 | read-only | secondary | secondary
1921 worker3a | 3/7 | c23610380024:5432 | 1: 0/311B648 | read-write | primary | primary
1922 worker3b | 3/8 | 2ea8aac8a04a:5432 | 1: 0/311B648 | read-only | secondary | secondary
1923
1924 You can see from the above that the coordinator node has a primary and
1925 secondary instance for high availability. When connecting to the coor‐
1926 dinator, clients should try connecting to whichever instance is running
1927 and supports reads and writes.
1928
1929 We can review the available Postgres URIs with the pg_autoctl show uri
1930 command:
1931
1932 $ docker-compose exec monitor pg_autoctl show uri
1933 Type | Name | Connection String
1934 -------------+---------+-------------------------------
1935 monitor | monitor | postgres://autoctl_node@552dd89d5d63:5432/pg_auto_failover?sslmode=require
1936 formation | default | postgres://66a31034f2e4:5432,cd52db444544:5432/citus?target_session_attrs=read-write&sslmode=require
1937
1938 To check that Citus worker nodes have been registered to the coordina‐
1939 tor, we can run a psql session right from the coordinator container:
1940
1941 $ docker-compose exec coord psql -d citus -c 'select * from citus_get_active_worker_nodes();'
1942 node_name | node_port
1943 --------------+-----------
1944 dae7c062e2c1 | 5432
1945 5bf86f9ef784 | 5432
1946 c23610380024 | 5432
1947 (3 rows)
1948
1949 We are now reaching the limits of using a simplified docker-compose
1950 setup. When using the --scale option, it is not possible to give a
       specific hostname to each running node, so we get a randomly generated
       string instead of useful node names such as worker1a or worker3b.
1954
1955 Create a Citus Cluster, take two
1956 In order to implement the following architecture, we need to introduce
1957 a more complex docker-compose file than in the previous section.
1958 [image: pg_auto_failover architecture with a Citus formation] [image]
1959 pg_auto_failover architecture with a Citus formation.UNINDENT
1960
1961 This time we create a cluster using the following docker-compose def‐
1962 inition:
1963
1964 version: "3.9" # optional since v1.27.0
1965
1966 x-coord: &coordinator
1967 image: pg_auto_failover:citus
1968 environment:
1969 PGDATA: /tmp/pgaf
1970 PGUSER: citus
1971 PGDATABASE: citus
1972 PG_AUTOCTL_HBA_LAN: true
1973 PG_AUTOCTL_AUTH_METHOD: "trust"
1974 PG_AUTOCTL_SSL_SELF_SIGNED: true
1975 PG_AUTOCTL_MONITOR: "postgresql://autoctl_node@monitor/pg_auto_failover"
1976 expose:
1977 - 5432
1978
1979 x-worker: &worker
1980 image: pg_auto_failover:citus
1981 environment:
1982 PGDATA: /tmp/pgaf
1983 PGUSER: citus
1984 PGDATABASE: citus
1985 PG_AUTOCTL_HBA_LAN: true
1986 PG_AUTOCTL_AUTH_METHOD: "trust"
1987 PG_AUTOCTL_SSL_SELF_SIGNED: true
1988 PG_AUTOCTL_MONITOR: "postgresql://autoctl_node@monitor/pg_auto_failover"
1989 expose:
1990 - 5432
1991
1992 services:
1993 app:
1994 build:
1995 context: .
1996 dockerfile: Dockerfile.app
1997 environment:
1998 PGUSER: citus
1999 PGDATABASE: citus
2000 PGHOST: coord0a,coord0b
2001 PGPORT: 5432
2002 PGAPPNAME: demo
2003 PGSSLMODE: require
2004 PGTARGETSESSIONATTRS: read-write
2005
2006 monitor:
2007 image: pg_auto_failover:citus
2008 environment:
2009 PGDATA: /tmp/pgaf
2010 PG_AUTOCTL_SSL_SELF_SIGNED: true
2011 expose:
2012 - 5432
2013 command: |
2014 pg_autoctl create monitor --auth trust --run
2015
2016 coord0a:
2017 <<: *coordinator
2018 hostname: coord0a
2019 command: |
2020 pg_autoctl create coordinator --name coord0a --run
2021
2022 coord0b:
2023 <<: *coordinator
2024 hostname: coord0b
2025 command: |
2026 pg_autoctl create coordinator --name coord0b --run
2027
2028 worker1a:
2029 <<: *worker
2030 hostname: worker1a
2031 command: |
2032 pg_autoctl create worker --group 1 --name worker1a --run
2033
2034 worker1b:
2035 <<: *worker
2036 hostname: worker1b
2037 command: |
2038 pg_autoctl create worker --group 1 --name worker1b --run
2039
2040 worker2a:
2041 <<: *worker
2042 hostname: worker2a
2043 command: |
2044 pg_autoctl create worker --group 2 --name worker2a --run
2045
2046 worker2b:
2047 <<: *worker
2048 hostname: worker2b
2049 command: |
2050 pg_autoctl create worker --group 2 --name worker2b --run
2051
2052 worker3a:
2053 <<: *worker
2054 hostname: worker3a
2055 command: |
2056 pg_autoctl create worker --group 3 --name worker3a --run
2057
2058 worker3b:
2059 <<: *worker
2060 hostname: worker3b
2061 command: |
2062 pg_autoctl create worker --group 3 --name worker3b --run
2063
2064
2065 This definition is a little more involved than the previous one. We
       take advantage of YAML anchors and aliases to define a template for our
2067 coordinator nodes and worker nodes, and then apply that template to the
2068 actual nodes.
2069
2070 Also this time we provision an application service (named "app") that
       sits in the background and allows us to later connect to our current
       primary coordinator. See Dockerfile.app for the complete definition of
2073 this service.
2074
2075 We start this cluster with a simplified command line this time:
2076
2077 $ docker-compose up
2078
2079 And this time we get the following cluster as a result:
2080
2081 $ docker-compose exec monitor pg_autoctl show state
2082 Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
2083 ---------+-------+---------------+----------------+--------------+---------------------+--------------------
2084 coord0a | 0/3 | coord0a:5432 | 1: 0/312B040 | read-write | primary | primary
2085 coord0b | 0/4 | coord0b:5432 | 1: 0/312B040 | read-only | secondary | secondary
2086 worker1a | 1/1 | worker1a:5432 | 1: 0/311C550 | read-write | primary | primary
2087 worker1b | 1/2 | worker1b:5432 | 1: 0/311C550 | read-only | secondary | secondary
2088 worker2b | 2/7 | worker2b:5432 | 2: 0/5032698 | read-write | primary | primary
2089 worker2a | 2/8 | worker2a:5432 | 2: 0/5032698 | read-only | secondary | secondary
2090 worker3a | 3/5 | worker3a:5432 | 1: 0/311C870 | read-write | primary | primary
2091 worker3b | 3/6 | worker3b:5432 | 1: 0/311C870 | read-only | secondary | secondary
2092
2093 And then we have the following application connection string to use:
2094
2095 $ docker-compose exec monitor pg_autoctl show uri
2096 Type | Name | Connection String
2097 -------------+---------+-------------------------------
2098 monitor | monitor | postgres://autoctl_node@f0135b83edcd:5432/pg_auto_failover?sslmode=require
2099 formation | default | postgres://coord0b:5432,coord0a:5432/citus?target_session_attrs=read-write&sslmode=require
2100
2101 And finally, the nodes being registered as Citus worker nodes also make
2102 more sense:
2103
2104 $ docker-compose exec coord0a psql -d citus -c 'select * from citus_get_active_worker_nodes()'
2105 node_name | node_port
2106 -----------+-----------
2107 worker1a | 5432
2108 worker3a | 5432
2109 worker2b | 5432
2110 (3 rows)
2111
2112 IMPORTANT:
2113 At this point, it is important to note that the Citus coordinator
2114 only knows about the primary nodes in each group. The High Avail‐
2115 ability mechanisms are all implemented in pg_auto_failover, which
2116 mostly uses the Citus API citus_update_node during worker node
2117 failovers.
2118
2119 Our first Citus worker failover
2120 We see that in the citus_get_active_worker_nodes() output we have
       worker1a, worker2b, and worker3a. As mentioned before, that should
       have no impact on the operations of the Citus cluster when nodes are
       all sized the same.
2124
2125 That said, some readers among you will prefer to have the A nodes as
2126 primaries to get started with. So let's implement our first worker
2127 failover then. With pg_auto_failover, this is as easy as doing:
2128
2129 $ docker-compose exec monitor pg_autoctl perform failover --group 2
2130 15:40:03 9246 INFO Waiting 60 secs for a notification with state "primary" in formation "default" and group 2
2131 15:40:03 9246 INFO Listening monitor notifications about state changes in formation "default" and group 2
2132 15:40:03 9246 INFO Following table displays times when notifications are received
2133 Time | Name | Node | Host:Port | Current State | Assigned State
2134 ---------+----------+-------+---------------+---------------------+--------------------
2135 22:58:42 | worker2b | 2/7 | worker2b:5432 | primary | draining
2136 22:58:42 | worker2a | 2/8 | worker2a:5432 | secondary | prepare_promotion
2137 22:58:42 | worker2a | 2/8 | worker2a:5432 | prepare_promotion | prepare_promotion
2138 22:58:42 | worker2a | 2/8 | worker2a:5432 | prepare_promotion | wait_primary
2139 22:58:42 | worker2b | 2/7 | worker2b:5432 | primary | demoted
2140 22:58:42 | worker2b | 2/7 | worker2b:5432 | draining | demoted
2141 22:58:42 | worker2b | 2/7 | worker2b:5432 | demoted | demoted
2142 22:58:43 | worker2a | 2/8 | worker2a:5432 | wait_primary | wait_primary
2143 22:58:44 | worker2b | 2/7 | worker2b:5432 | demoted | catchingup
2144 22:58:46 | worker2b | 2/7 | worker2b:5432 | catchingup | catchingup
2145 22:58:46 | worker2b | 2/7 | worker2b:5432 | catchingup | secondary
2146 22:58:46 | worker2b | 2/7 | worker2b:5432 | secondary | secondary
2147 22:58:46 | worker2a | 2/8 | worker2a:5432 | wait_primary | primary
2148 22:58:46 | worker2a | 2/8 | worker2a:5432 | primary | primary
2149
       So it took around 5 seconds to do a full worker failover in worker
       group 2. The same command can be repeated in any other group where the
       'b' node is currently the primary. Let's now review the resulting
       cluster state.
2153
2154 $ docker-compose exec monitor pg_autoctl show state
2155 Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
2156 ---------+-------+---------------+----------------+--------------+---------------------+--------------------
2157 coord0a | 0/3 | coord0a:5432 | 1: 0/312ADA8 | read-write | primary | primary
2158 coord0b | 0/4 | coord0b:5432 | 1: 0/312ADA8 | read-only | secondary | secondary
2159 worker1a | 1/1 | worker1a:5432 | 1: 0/311B610 | read-write | primary | primary
2160 worker1b | 1/2 | worker1b:5432 | 1: 0/311B610 | read-only | secondary | secondary
2161 worker2b | 2/7 | worker2b:5432 | 2: 0/50000D8 | read-only | secondary | secondary
2162 worker2a | 2/8 | worker2a:5432 | 2: 0/50000D8 | read-write | primary | primary
2163 worker3a | 3/5 | worker3a:5432 | 1: 0/311B648 | read-write | primary | primary
2164 worker3b | 3/6 | worker3b:5432 | 1: 0/311B648 | read-only | secondary | secondary
2165
       Which, seen from the Citus coordinator, looks like the following:
2167
2168 $ docker-compose exec coord0a psql -d citus -c 'select * from citus_get_active_worker_nodes()'
2169 node_name | node_port
2170 -----------+-----------
2171 worker1a | 5432
2172 worker3a | 5432
2173 worker2a | 5432
2174 (3 rows)
2175
2176 Distribute Data to Workers
2177 Let's create a database schema with a single distributed table.
2178
2179 $ docker-compose exec app psql
2180
2181 -- in psql
2182
2183 CREATE TABLE companies
2184 (
2185 id bigserial PRIMARY KEY,
2186 name text NOT NULL,
2187 image_url text,
2188 created_at timestamp without time zone NOT NULL,
2189 updated_at timestamp without time zone NOT NULL
2190 );
2191
2192 SELECT create_distributed_table('companies', 'id');
2193
2194 Next download and ingest some sample data, still from within our psql
2195 session:
2196
2197 \copy companies from program 'curl -o- https://examples.citusdata.com/mt_ref_arch/companies.csv' with csv
2198 # ( COPY 75 )
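
       To see how those rows have been spread across the worker nodes, one
       possibility (a sketch, assuming Citus 10 or later where the
       citus_shards view is available) is to count shards per node:

          $ docker-compose exec app psql -c 'select nodename, count(*) as shards from citus_shards group by nodename;'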
2199
2200 Handle Worker Failure
2201 Now we'll intentionally crash a worker's primary node and observe how
2202 the pg_auto_failover monitor unregisters that node in the coordinator
2203 and registers the secondary instead.
2204
2205 # the pg_auto_failover keeper process will be unable to resurrect
2206 # the worker node if pg_control has been removed
2207 $ docker-compose exec worker1a rm /tmp/pgaf/global/pg_control
2208
2209 # shut it down
2210 $ docker-compose exec worker1a /usr/lib/postgresql/14/bin/pg_ctl stop -D /tmp/pgaf
2211
2212 The keeper will attempt to start worker 1a three times and then report
       the failure to the monitor, which promotes worker1b to replace worker1a.
2214 Citus worker worker1a is unregistered with the coordinator node, and
2215 worker1b is registered in its stead.
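
       To follow that failover while it happens, the same dashboard command
       that we used during provisioning comes in handy:

          $ docker-compose exec monitor pg_autoctl watch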
2216
2217 Asking the coordinator for active worker nodes now shows worker1b,
2218 worker2a, and worker3a:
2219
2220 $ docker-compose exec app psql -c 'select * from master_get_active_worker_nodes();'
2221
2222 node_name | node_port
2223 -----------+-----------
2224 worker3a | 5432
2225 worker2a | 5432
2226 worker1b | 5432
2227 (3 rows)
2228
2229 Finally, verify that all rows of data are still present:
2230
2231 $ docker-compose exec app psql -c 'select count(*) from companies;'
2232 count
2233 -------
2234 75
2235
       Meanwhile, the keeper on worker 1a heals the node. It runs
       pg_basebackup to fetch the current PGDATA from worker1b, the new
       primary. This restores, among other things, a new copy of the file we
       removed. After streaming replication completes, worker1b remains a
       full-fledged primary and worker1a becomes its secondary.
2241
2242 $ docker-compose exec monitor pg_autoctl show state
2243 Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
2244 ---------+-------+---------------+----------------+--------------+---------------------+--------------------
2245 coord0a | 0/3 | coord0a:5432 | 1: 0/3178B20 | read-write | primary | primary
2246 coord0b | 0/4 | coord0b:5432 | 1: 0/3178B20 | read-only | secondary | secondary
2247 worker1a | 1/1 | worker1a:5432 | 2: 0/504C400 | read-only | secondary | secondary
2248 worker1b | 1/2 | worker1b:5432 | 2: 0/504C400 | read-write | primary | primary
2249 worker2b | 2/7 | worker2b:5432 | 2: 0/50FF048 | read-only | secondary | secondary
2250 worker2a | 2/8 | worker2a:5432 | 2: 0/50FF048 | read-write | primary | primary
2251 worker3a | 3/5 | worker3a:5432 | 1: 0/31CD8C0 | read-write | primary | primary
2252 worker3b | 3/6 | worker3b:5432 | 1: 0/31CD8C0 | read-only | secondary | secondary
2253
2254 Handle Coordinator Failure
2255 Because our application connection string includes both coordinator
2256 hosts with the option target_session_attrs=read-write, the database
2257 client will connect to whichever of these servers supports both reads
2258 and writes.
2259
       However, if we use the same trick with the pg_control file to crash our
2261 primary coordinator, we can watch how the monitor promotes the sec‐
2262 ondary.
2263
2264 $ docker-compose exec coord0a rm /tmp/pgaf/global/pg_control
2265 $ docker-compose exec coord0a /usr/lib/postgresql/14/bin/pg_ctl stop -D /tmp/pgaf
2266
2267 After some time, coordinator A's keeper heals it, and the cluster con‐
       verges to this state:
2269
2270 $ docker-compose exec monitor pg_autoctl show state
2271 Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
2272 ---------+-------+---------------+----------------+--------------+---------------------+--------------------
2273 coord0a | 0/3 | coord0a:5432 | 2: 0/50000D8 | read-only | secondary | secondary
2274 coord0b | 0/4 | coord0b:5432 | 2: 0/50000D8 | read-write | primary | primary
2275 worker1a | 1/1 | worker1a:5432 | 2: 0/504C520 | read-only | secondary | secondary
2276 worker1b | 1/2 | worker1b:5432 | 2: 0/504C520 | read-write | primary | primary
2277 worker2b | 2/7 | worker2b:5432 | 2: 0/50FF130 | read-only | secondary | secondary
2278 worker2a | 2/8 | worker2a:5432 | 2: 0/50FF130 | read-write | primary | primary
2279 worker3a | 3/5 | worker3a:5432 | 1: 0/31CD8C0 | read-write | primary | primary
2280 worker3b | 3/6 | worker3b:5432 | 1: 0/31CD8C0 | read-only | secondary | secondary
2281
2282 We can check that the data is still available through the new coordina‐
2283 tor node too:
2284
2285 $ docker-compose exec app psql -c 'select count(*) from companies;'
2286 count
2287 -------
2288 75
2289
2290 Next steps
2291 As mentioned in the first section of this tutorial, the way we use
2292 docker-compose here is not meant to be production ready. It's useful to
2293 understand and play with a distributed system such as Citus though, and
2294 makes it simple to introduce faults and see how the pg_auto_failover
2295 High Availability reacts to those faults.
2296
2297 One obvious missing element to better test the system is the lack of
2298 persistent volumes in our docker-compose based test rig. It is possible
2299 to create external volumes and use them for each node in the
2300 docker-compose definition. This allows restarting nodes over the same
2301 data set.
2302
       See the command pg_autoctl do tmux compose session for more details
       about how to run such a docker-compose test environment, including
       external volumes for each node.
2306
2307 Now is a good time to go read Citus Documentation too, so that you know
2308 how to use this cluster you just created!
2309
2311 Introduction
2312 pg_auto_failover uses a state machine for highly controlled execution.
2313 As keepers inform the monitor about new events (or fail to contact it
2314 at all), the monitor assigns each node both a current state and a goal
2315 state. A node's current state is a strong guarantee of its capabili‐
2316 ties. States themselves do not cause any actions; actions happen during
2317 state transitions. The assigned goal states inform keepers of what
2318 transitions to attempt.
2319
2320 Example of state transitions in a new cluster
2321 A good way to get acquainted with the states is by examining the tran‐
2322 sitions of a cluster from birth to high availability.
2323
2324 After starting a monitor and running keeper init for the first data
2325 node ("node A"), the monitor registers the state of that node as "init"
2326 with a goal state of "single." The init state means the monitor knows
2327 nothing about the node other than its existence because the keeper is
2328 not yet continuously running there to report node health.
2329
2330 Once the keeper runs and reports its health to the monitor, the monitor
2331 assigns it the state "single," meaning it is just an ordinary Postgres
2332 server with no failover. Because there are not yet other nodes in the
2333 cluster, the monitor also assigns node A the goal state of single --
2334 there's nothing that node A's keeper needs to change.
2335
2336 As soon as a new node ("node B") is initialized, the monitor assigns
2337 node A the goal state of "wait_primary." This means the node still has
2338 no failover, but there's hope for a secondary to synchronize with it
2339 soon. To accomplish the transition from single to wait_primary, node
2340 A's keeper adds node B's hostname to pg_hba.conf to allow a hot standby
2341 replication connection.
2342
2343 At the same time, node B transitions into wait_standby with the goal
2344 initially of staying in wait_standby. It can do nothing but wait until
2345 node A gives it access to connect. Once node A has transitioned to
2346 wait_primary, the monitor assigns B the goal of "catchingup," which
2347 gives B's keeper the green light to make the transition from
2348 wait_standby to catchingup. This transition involves running pg_base‐
2349 backup, editing recovery.conf and restarting PostgreSQL in Hot Standby
       mode.
2351
2352 Node B reports to the monitor when it's in hot standby mode and able to
2353 connect to node A. The monitor then assigns node B the goal state of
2354 "secondary" and A the goal of "primary." Postgres ships WAL logs from
2355 node A and replays them on B. Finally B is caught up and tells the mon‐
2356 itor (specifically B reports its pg_stat_replication.sync_state and WAL
2357 replay lag). At this glorious moment the monitor assigns A the state
2358 primary (goal: primary) and B secondary (goal: secondary).
2359
2360 State reference
2361 The following diagram shows the pg_auto_failover State Machine. It's
2362 missing links to the single state, which can always been reached when
2363 removing all the other nodes.
2364 [image: pg_auto_failover Finite State Machine diagram] [image]
2365 pg_auto_failover Finite State Machine diagram.UNINDENT
2366
         In the previous diagram we can see the list of states where the
         application can connect to a read-write Postgres service: single,
         wait_primary, primary, prepare_maintenance, and apply_settings.
2371
2372 Init
2373 A node is assigned the "init" state when it is first registered with
2374 the monitor. Nothing is known about the node at this point beyond its
2375 existence. If no other node has been registered with the monitor for
2376 the same formation and group ID then this node is assigned a goal state
2377 of "single." Otherwise the node has the goal state of "wait_standby."
2378
2379 Single
2380 There is only one node in the group. It behaves as a regular PostgreSQL
2381 instance, with no high availability and no failover. If the administra‐
       tor removes a node, the other node will revert to the single state.
2383
2384 Wait_primary
2385 Applied to a node intended to be the primary but not yet in that posi‐
2386 tion. The primary-to-be at this point knows the secondary's node name
2387 or IP address, and has granted the node hot standby access in the
2388 pg_hba.conf file.
2389
2390 The wait_primary state may be caused either by a new potential sec‐
2391 ondary being registered with the monitor (good), or an existing sec‐
2392 ondary becoming unhealthy (bad). In the latter case, during the transi‐
2393 tion from primary to wait_primary, the primary node's keeper disables
2394 synchronous replication on the node. It also cancels currently blocked
2395 queries.
2396
2397 Join_primary
2398 Applied to a primary node when another standby is joining the group.
2399 This allows the primary node to apply necessary changes to its HBA
2400 setup before allowing the new node joining the system to run the
2401 pg_basebackup command.
2402
2403 IMPORTANT:
2404 This state has been deprecated, and is no longer assigned to nodes.
2405 Any time we would have used join_primary before, we now use primary
2406 instead.
2407
2408 Primary
2409 A healthy secondary node exists and has caught up with WAL replication.
2410 Specifically, the keeper reports the primary state only when it has
2411 verified that the secondary is reported "sync" in pg_stat_replica‐
2412 tion.sync_state, and with a WAL lag of 0.
2413
2414 The primary state is a strong assurance. It's the only state where we
2415 know we can fail over when required.
2416
2417 During the transition from wait_primary to primary, the keeper also en‐
2418 ables synchronous replication. This means that after a failover the
2419 secondary will be fully up to date.
2420
2421 Wait_standby
       The monitor decides this node is a standby. The node must wait until
       the primary has authorized it to connect and set up hot standby
       replication.
2424
2425 Catchingup
2426 The monitor assigns catchingup to the standby node when the primary is
2427 ready for a replication connection (pg_hba.conf has been properly
2428 edited, connection role added, etc).
2429
2430 The standby node keeper runs pg_basebackup, connecting to the primary's
2431 hostname and port. The keeper then edits recovery.conf and starts Post‐
       greSQL in hot standby mode.
2433
2434 Secondary
2435 A node with this state is acting as a hot standby for the primary, and
2436 is up to date with the WAL log there. In particular, it is within 16MB
2437 or 1 WAL segment of the primary.
2438
2439 Maintenance
2440 The cluster administrator can manually move a secondary into the main‐
2441 tenance state to gracefully take it offline. The primary will then
2442 transition from state primary to wait_primary, during which time the
2443 secondary will be online to accept writes. When the old primary reaches
2444 the wait_primary state then the secondary is safe to take offline with
2445 minimal consequences.
2446
2447 Prepare_maintenance
2448 The cluster administrator can manually move a primary node into the
2449 maintenance state to gracefully take it offline. The primary then tran‐
2450 sitions to the prepare_maintenance state to make sure the secondary is
2451 not missing any writes. In the prepare_maintenance state, the primary
2452 shuts down.
2453
2454 Wait_maintenance
       The cluster administrator can manually move a secondary into the mainte‐
2456 nance state to gracefully take it offline. Before reaching the mainte‐
2457 nance state though, we want to switch the primary node to asynchronous
2458 replication, in order to avoid writes being blocked. In the state
2459 wait_maintenance the standby waits until the primary has reached
2460 wait_primary.
2461
2462 Draining
2463 A state between primary and demoted where replication buffers finish
2464 flushing. A draining node will not accept new client writes, but will
2465 continue to send existing data to the secondary.
2466
2467 To implement that with Postgres we actually stop the service. When
2468 stopping, Postgres ensures that the current replication buffers are
2469 flushed correctly to synchronous standbys.
2470
2471 Demoted
2472 The primary keeper or its database were unresponsive past a certain
2473 threshold. The monitor assigns demoted state to the primary to avoid a
2474 split-brain scenario where there might be two nodes that don't communi‐
2475 cate with each other and both accept client writes.
2476
2477 In that state the keeper stops PostgreSQL and prevents it from running.
2478
2479 Demote_timeout
2480 If the monitor assigns the primary a demoted goal state but the primary
2481 keeper doesn't acknowledge transitioning to that state within a timeout
2482 window, then the monitor assigns demote_timeout to the primary.
2483
2484 This most commonly happens when the primary machine goes silent and the
2485 keeper stops reporting to the monitor.
2486
2487 Stop_replication
2488 The stop_replication state is meant to ensure that the primary goes to
2489 the demoted state before the standby goes to single and accepts writes
2490 (in case the primary can’t contact the monitor anymore). Before promot‐
2491 ing the secondary node, the keeper stops PostgreSQL on the primary to
2492 avoid split-brain situations.
2493
2494 For safety, when the primary fails to contact the monitor and fails to
2495 see the pg_auto_failover connection in pg_stat_replication, then it
2496 goes to the demoted state of its own accord.
2497
2498 Prepare_promotion
2499 The prepare_promotion state is meant to prepare the standby server for
2500 promotion. This state allows synchronisation on the monitor, making
2501 sure that the primary has stopped Postgres before promoting the
2502 secondary, hence preventing split-brain situations.
2503
2504 Report_LSN
2505 The report_lsn state is assigned to standby nodes when a failover is
2506 orchestrated and there are several standby nodes. In order to pick the
2507 standby that is furthest along in replication, pg_auto_failover first
2508 needs a fresh report of the current LSN position reached on each standby node.
2509
2510 When a node reaches the report_lsn state, the replication stream is
2511 stopped, by restarting Postgres without a primary_conninfo. This allows
2512 the primary node to detect Network Partitions, i.e. when the primary
2513 can't connect to the monitor and there's no standby listed in
2514 pg_stat_replication.
2515
2516 Fast_forward
2517 The fast_forward state is assigned to the selected promotion candidate
2518 during a failover when it won the election thanks to the candidate pri‐
2519 ority settings, but the selected node is not the most advanced standby
2520 node as reported in the report_lsn state.
2521
2522 Missing WAL bytes are fetched from one of the most advanced standby
2523 nodes by using Postgres cascading replication features: it is possible
2524 to use any standby node in the primary_conninfo.
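
Candidate priorities can be inspected and changed at run-time; here is a
minimal sketch, where the --pgdata path and the priority value are only
examples (check pg_autoctl set node candidate-priority --help for the
exact syntax of your version):

   $ pg_autoctl get node candidate-priority --pgdata /var/lib/postgresql/node_2
   $ pg_autoctl set node candidate-priority 50 --pgdata /var/lib/postgresql/node_2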
2525
2526 Dropped
2527 The dropped state is assigned to a node when the pg_autoctl drop node
2528 command is used. This allows the node to implement specific local ac‐
2529 tions before being entirely removed from the monitor database.
2530
2531 When a node reports reaching the dropped state, the monitor removes its
2532 entry. If a node is not reporting anymore, maybe because it's com‐
2533 pletely unavailable, then it's possible to run the pg_autoctl drop node
2534 --force command, and then the node entry is removed from the monitor.
2535
2536 pg_auto_failover keeper's State Machine
2537 When built in TEST mode, it is then possible to use the following com‐
2538 mand to get a visual representation of the Keeper's Finite State Ma‐
2539 chine:
2540
2541 $ PG_AUTOCTL_DEBUG=1 pg_autoctl do fsm gv | dot -Tsvg > fsm.svg
2542
2543 The dot program is part of the Graphviz suite and produces the follow‐
2544 ing output:
2545 [image: Keeper state machine] Keeper State Machine.
2546
2548 At the heart of the pg_auto_failover implementation is a State Machine.
2549 The state machine is driven by the monitor, and its transitions are im‐
2550 plemented in the keeper service, which then reports success to the mon‐
2551 itor.
2552
2553 The keeper is allowed to retry transitions as many times as needed
2554 until they succeed, and it also reports failures to reach the assigned
2555 state to the monitor node. The monitor also implements frequent
2556 health-checks targeting the registered PostgreSQL nodes.
2557
2558 When the monitor detects something is not as expected, it takes action
2559 by assigning a new goal state to the keeper, which is responsible for
2560 implementing the transition to this new state and then reporting back.
2561
2562 Unhealthy Nodes
2563 The pg_auto_failover monitor is responsible for running regular
2564 health-checks with every PostgreSQL node it manages. A health-check is
2565 successful when it is able to connect to the PostgreSQL node using the
2566 PostgreSQL protocol (libpq), imitating the pg_isready command.
2567
2568 How frequent those health checks are (5s by default), the PostgreSQL
2569 connection timeout in use (5s by default), and how many times to retry
2570 in case of a failure before marking the node unhealthy (2 by default)
2571 are GUC variables that you can set on the Monitor node itself. Remem‐
2572 ber, the monitor is implemented as a PostgreSQL extension, so the setup
2573 is a set of PostgreSQL configuration settings:
2574
2575 SELECT name, setting
2576 FROM pg_settings
2577 WHERE name ~ 'pgautofailover\.health';
2578 name | setting
2579 -----------------------------------------+---------
2580 pgautofailover.health_check_max_retries | 2
2581 pgautofailover.health_check_period | 5000
2582 pgautofailover.health_check_retry_delay | 2000
2583 pgautofailover.health_check_timeout | 5000
2584 (4 rows)
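
Those settings can be adjusted on the monitor like any other Postgres
GUC; a minimal sketch with an illustrative value, the unit being
milliseconds:

   -- on the monitor: relax the health check period to 10 seconds
   ALTER SYSTEM SET pgautofailover.health_check_period TO 10000;
   SELECT pg_reload_conf();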
2585
2586 The pg_auto_failover keeper also reports if PostgreSQL is running as
2587 expected. This is useful for situations where the PostgreSQL server /
2588 OS is running fine and the keeper (pg_autoctl run) is still active, but
2589 PostgreSQL has failed. Situations might include a full file system on
2590 the WAL disk, file system level corruption, missing files, etc.
2591
2592 Here's what happens to your PostgreSQL service when any single-node
2593 failure is observed (a monitoring sketch follows this list):
2594
2595 • Primary node is monitored unhealthy
2596
2597 When the primary node is unhealthy, and only when the secondary
2598 node is itself in good health, then the primary node is asked to
2599 transition to the DRAINING state, and the attached secondary is
2600 asked to transition to the state PREPARE_PROMOTION. In this state,
2601 the secondary is asked to catch-up with the WAL traffic from the
2602 primary, and then report success.
2603
2604 The monitor then continues orchestrating the promotion of the
2605 standby: it stops the primary (implementing STONITH in order to
2606 prevent any data loss), and promotes the secondary into being a
2607 primary now.
2608
2609 Depending on the exact situation that caused the primary to be
2610 marked unhealthy, it's possible that the secondary fails to catch
2611 up with WAL from it. In that case, after the PREPARE_PROMO‐
2612 TION_CATCHUP_TIMEOUT the standby reports success anyway, and the
2613 failover sequence continues from the monitor.
2614
2615 • Secondary node is monitored unhealthy
2616
2617 When the secondary node is unhealthy, the monitor assigns to it
2618 the state CATCHINGUP, and assigns the state WAIT_PRIMARY to the
2619 primary node. When implementing the transition from PRIMARY to
2620 WAIT_PRIMARY, the keeper disables synchronous replication.
2621
2622 When the keeper reports an acceptable WAL difference between the two
2623 nodes again, then the replication is upgraded back to being syn‐
2624 chronous. While a secondary node is not in the SECONDARY state,
2625 secondary promotion is disabled.
2626
2627 • Monitor node has failed
2628
2629 Then the primary and secondary nodes just work as if you hadn't
2630 set up pg_auto_failover in the first place, as the keeper fails to
2631 report local state from the nodes. Also, health checks are not
2632 performed. It means that no automated failover may happen,
2633 even if needed.
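
To follow those transitions as they happen, the monitor can be queried
with the pg_autoctl show commands; a quick sketch, where the --pgdata
path is only an example:

   $ pg_autoctl show state --pgdata /var/lib/postgresql/monitor
   $ pg_autoctl show events --pgdata /var/lib/postgresql/monitor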
2634
2635 Network Partitions
2636 In addition to those simple situations, pg_auto_failover is also
2637 resilient to Network Partitions. Here's the list of situations that
2638 have an impact on pg_auto_failover behavior, and the actions taken to
2639 ensure High Availability of your PostgreSQL service:
2640
2641 • Primary can't connect to Monitor
2642
2643 Then it could be that either the primary is alone on its side of a
2644 network split, or that the monitor has failed. The keeper decides
2645 depending on whether the secondary node is still connected to the
2646 replication slot, and if we have a secondary, continues to serve
2647 PostgreSQL queries.
2648
2649 Otherwise, when the secondary isn't connected, and after the NET‐
2650 WORK_PARTITION_TIMEOUT has elapsed, the primary considers it might
2651 be alone in a network partition: that's a potential split-brain
2652 situation, and there is only one way to prevent it. The primary
2653 stops, and reports a new state of DEMOTE_TIMEOUT.
2654
2655 The network_partition_timeout can be set in the keeper's
2656 configuration and defaults to 20s; see the sketch after this list.
2657
2658 • Monitor can't connect to Primary
2659
2660 Once all the retries have been done and the timeouts have elapsed,
2661 the primary node is considered unhealthy, and the monitor begins
2662 the failover routine. This routine has several steps, each of which
2663 allows checking expectations and stepping back if needed.
2664
2665 For the failover to happen, the secondary node needs to be healthy
2666 and caught-up with the primary. Only if we time out while waiting
2667 for the WAL delta to be resorbed (30s by default) is the secondary
2668 promoted, with uncertainty about the data durability in the
2669 group.
2670
2671 • Monitor can't connect to Secondary
2672
2673 As soon as the secondary is considered unhealthy then the monitor
2674 changes the replication setting to asynchronous on the primary, by
2675 assigning it the WAIT_PRIMARY state. Also the secondary is as‐
2676 signed the state CATCHINGUP, which means it can't be promoted in
2677 case of primary failure.
2678
2679 As the monitor tracks the WAL delta between the two servers, and
2680 they both report it independently, the standby is eligible for
2681 promotion again as soon as it has caught up with the primary; at
2682 that point it is assigned the SECONDARY state, and the replication
2683 is switched back to synchronous.
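
The network partition timeout mentioned in the first item above lives in
the keeper's configuration. A minimal sketch of reading and changing it,
where the --pgdata path is an example and the exact key name should be
double-checked with pg_autoctl config get on your version:

   $ pg_autoctl config get timeout.network_partition_timeout --pgdata /var/lib/postgresql/node_1
   $ pg_autoctl config set timeout.network_partition_timeout 20 --pgdata /var/lib/postgresql/node_1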
2684
2685 Failure handling and network partition detection
2686 If a node cannot communicate with the monitor, either because the
2687 monitor is down or because there is a problem with the network, it
2688 will simply remain in the same state until the monitor comes back.
2689
2690 If there is a network partition, it might be that the monitor and sec‐
2691 ondary can still communicate and the monitor decides to promote the
2692 secondary since the primary is no longer responsive. Meanwhile, the
2693 primary is still up-and-running on the other side of the network parti‐
2694 tion. If a primary cannot communicate with the monitor, it starts
2695 checking whether the secondary is still connected. In PostgreSQL, the secondary
2696 connection automatically times out after 30 seconds. If last contact
2697 with the monitor and the last time a connection from the secondary was
2698 observed are both more than 30 seconds in the past, the primary con‐
2699 cludes it is on the losing side of a network partition and shuts itself
2700 down. It may be that the secondary and the monitor were actually down
2701 and the primary was the only node that was alive, but we currently do
2702 not have a way to distinguish such a situation. As with consensus al‐
2703 gorithms, availability can only be correctly preserved if at least 2
2704 out of 3 nodes are up.
2705
2706 In asymmetric network partitions, the primary might still be able to
2707 talk to the secondary, while unable to talk to the monitor. During
2708 failover, the monitor therefore assigns the secondary the stop_replica‐
2709 tion state, which will cause it to disconnect from the primary. After
2710 that, the primary is expected to shut down after at least 30 and at
2711 most 60 seconds. To factor in worst-case scenarios, the monitor waits
2712 for 90 seconds before promoting the secondary to become the new pri‐
2713 mary.
2714
2716 We provide native system packages for pg_auto_failover on most popular
2717 Linux distributions.
2718
2719 Use the steps below to install pg_auto_failover together with the
2720 PostgreSQL version of your choice. pg_auto_failover supports multiple
2721 PostgreSQL major versions; the package examples below target Postgres 12 and 14.
2722
2723 Ubuntu or Debian
2724 Postgres apt repository
2725 Binary packages for debian and derivatives (ubuntu) are available from
2726 the apt.postgresql.org repository; install them by following the linked
2727 documentation and then:
2728
2729 $ sudo apt-get install pg-auto-failover-cli
2730 $ sudo apt-get install postgresql-14-auto-failover
2731
2732 The Postgres extension named "pgautofailover" is only necessary on the
2733 monitor node. To install that extension, you can install the post‐
2734 gresql-14-auto-failover package when using Postgres 14. It's available
2735 for other Postgres versions too.
2736
2737 Avoiding the default Postgres service
2738 When installing the debian Postgres package, the installation script
2739 will initialize a Postgres data directory automatically, and register
2740 it to the systemd services. When using pg_auto_failover, it is best to
2741 avoid that step.
2742
2743 To avoid automated creation of a Postgres data directory when in‐
2744 stalling the debian package, follow those steps:
2745
2746 $ curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
2747 $ echo "deb http://apt.postgresql.org/pub/repos/apt buster-pgdg main" > /etc/apt/sources.list.d/pgdg.list
2748
2749 # bypass initdb of a "main" cluster
2750 $ echo 'create_main_cluster = false' | sudo tee -a /etc/postgresql-common/createcluster.conf
2751 $ apt-get update
2752 $ apt-get install -y --no-install-recommends postgresql-14
2753
2754 That way when it's time to pg_autoctl create monitor or pg_autoctl cre‐
2755 ate postgres there is no confusion about how to handle the default
2756 Postgres service created by debian: it has not been created at all.
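
To double-check that no default cluster was created, the postgresql-common
tooling can list existing clusters; an empty listing is what you should
expect at this point:

   $ pg_lsclusters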
2757
2758 Fedora, CentOS, or Red Hat
2759 Quick install
2760 The following installation method downloads a bash script that auto‐
2761 mates several steps. The full script is available for review at our
2762 package cloud installation instructions page.
2763
2764 # add the required packages to your system
2765 curl https://install.citusdata.com/community/rpm.sh | sudo bash
2766
2767 # install pg_auto_failover
2768 sudo yum install -y pg-auto-failover14_12
2769
2770 # confirm installation
2771 /usr/pgsql-12/bin/pg_autoctl --version
2772
2773 Manual installation
2774 If you'd prefer to set up the repository on your system manually,
2775 follow the instructions from the package cloud manual installation
2776 page. That page guides you through the specific details of these 3 steps:
2777
2778 1. install the pygpgme yum-utils packages for your distribution
2779
2780 2. install a new RPM repository for CitusData packages
2781
2782 3. update your local yum cache
2783
2784 Then when that's done, you can proceed with installing pg_auto_failover
2785 itself as in the previous case:
2786
2787 # install pg_auto_failover
2788 sudo yum install -y pg-auto-failover14_12
2789
2790 # confirm installation
2791 /usr/pgsql-12/bin/pg_autoctl --version
2792
2793 Installing a pgautofailover Systemd unit
2794 The command pg_autoctl show systemd outputs a systemd unit file that
2795 you can use to set up a boot-time registered service for
2796 pg_auto_failover on your machine.
2797
2798 Here's a sample output from the command:
2799
2800 $ export PGDATA=/var/lib/postgresql/monitor
2801 $ pg_autoctl show systemd
2802 13:44:34 INFO HINT: to complete a systemd integration, run the following commands:
2803 13:44:34 INFO pg_autoctl -q show systemd --pgdata "/var/lib/postgresql/monitor" | sudo tee /etc/systemd/system/pgautofailover.service
2804 13:44:34 INFO sudo systemctl daemon-reload
2805 13:44:34 INFO sudo systemctl start pgautofailover
2806 [Unit]
2807 Description = pg_auto_failover
2808
2809 [Service]
2810 WorkingDirectory = /var/lib/postgresql
2811 Environment = 'PGDATA=/var/lib/postgresql/monitor'
2812 User = postgres
2813 ExecStart = /usr/lib/postgresql/10/bin/pg_autoctl run
2814 Restart = always
2815 StartLimitBurst = 0
2816
2817 [Install]
2818 WantedBy = multi-user.target
2819
2820 Copy/pasting the commands given in the hint output from the command
2821 will enable the pgautofailover service on your system, when using
2822 systemd.
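
If you also want the service to start at boot time, you may additionally
enable it; a minimal sketch using standard systemd commands:

   $ sudo systemctl enable pgautofailover
   $ sudo systemctl status pgautofailover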
2823
2824 It is important that PostgreSQL is started by pg_autoctl rather than
2825 by systemd itself: a failover might have happened during a reboot,
2826 for instance, and once the reboot completes we want the local
2827 Postgres to re-join as a secondary node even if it used to be a
2828 primary node.
2829
2831 In order to be able to orchestrate fully automated failovers,
2832 pg_auto_failover needs to be able to establish the following Postgres
2833 connections:
2834
2835 • from the monitor node to each Postgres node to check the node's
2836 “health”
2837
2838 • from each Postgres node to the monitor to implement our node_ac‐
2839 tive protocol and fetch the current assigned state for this node
2840
2841 • from the secondary node to the primary node for Postgres streaming
2842 replication.
2843
2844 Postgres Client authentication is controlled by a configuration file:
2845 pg_hba.conf. This file contains a list of rules where each rule may al‐
2846 low or reject a connection attempt.
2847
2848 For pg_auto_failover to work as intended, some HBA rules need to be
2849 added to each node configuration. You can choose to provision the
2850 pg_hba.conf file yourself thanks to the pg_autoctl option --skip-pg-hba,
2851 or you can use the following options to control which kind of rules are
2852 going to be added for you.
2853
2854 Postgres HBA rules
2855 For your application to be able to connect to the current Postgres pri‐
2856 mary servers, some application specific HBA rules have to be added to
2857 pg_hba.conf. There is no provision for doing that in pg_auto_failover.
2858
2859 In other words, it is expected that you have to edit pg_hba.conf to
2860 open connections for your application needs.
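
For instance, a rule allowing an application role to reach its database
from a private network might look like the following; the database, role,
network, and authentication method are placeholders to adapt to your
setup:

   # application access to the Postgres nodes (example values)
   host  myappdb  myappuser  10.0.0.0/8  scram-sha-256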
2861
2862 The trust security model
2863 As its name suggests, the trust security model does not enable any kind
2864 of security validation. This setting is popular for testing deployments
2865 though, as it makes it very easy to verify that everything works as in‐
2866 tended before putting security restrictions in place.
2867
2868 To enable a “trust” security model with pg_auto_failover, use the
2869 pg_autoctl option --auth trust when creating nodes:
2870
2871 $ pg_autoctl create monitor --auth trust ...
2872 $ pg_autoctl create postgres --auth trust ...
2873 $ pg_autoctl create postgres --auth trust ...
2874
2875 When using --auth trust pg_autoctl adds new HBA rules in the monitor
2876 and the Postgres nodes to enable connections as seen above.
2877
2878 Authentication with passwords
2879 To set up pg_auto_failover with passwords for connections, you can use
2880 one of the password based authentication methods supported by Postgres,
2881 such as password or scram-sha-256. We recommend the latter, as in the
2882 following example:
2883
2884 $ pg_autoctl create monitor --auth scram-sha-256 ...
2885
2886 The pg_autoctl command does not set the password for you. The first
2887 step is to set the database user password in the monitor database
2888 with the following command:
2889
2890 $ psql postgres://monitor.host/pg_auto_failover
2891 > alter user autoctl_node password 'h4ckm3';
2892
2893 Now that the monitor is ready with our password set for the au‐
2894 toctl_node user, we can use the password in the monitor connection
2895 string used when creating Postgres nodes.
2896
2897 On the primary node, we can create the Postgres setup as usual, and
2898 then set our replication password, which we will use if we are demoted
2899 and then re-join as a standby:
2900
2901 $ pg_autoctl create postgres \
2902 --auth scram-sha-256 \
2903 ... \
2904 --monitor postgres://autoctl_node:h4ckm3@monitor.host/pg_auto_failover
2905
2906 $ pg_autoctl config set replication.password h4ckm3m0r3
2907
2908 The second Postgres node is going to be initialized as a secondary and
2909 pg_autoctl then calls pg_basebackup at create time. We need to have the
2910 replication password already set at this time, and we can achieve that
2911 the following way:
2912
2913 $ export PGPASSWORD=h4ckm3m0r3
2914 $ pg_autoctl create postgres \
2915 --auth scram-sha-256 \
2916 ... \
2917 --monitor postgres://autoctl_node:h4ckm3@monitor.host/pg_auto_failover
2918
2919 $ pg_autoctl config set replication.password h4ckm3m0r3
2920
2921 Note that you can use The Password File mechanism as discussed in the
2922 Postgres documentation in order to maintain your passwords in a sepa‐
2923 rate file, not in your main pg_auto_failover configuration file. This
2924 also avoids using passwords in the environment and in command lines.
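
A minimal ~/.pgpass sketch matching the examples above, assuming the
default replication user pgautofailover_replicator; the file must belong
to the user running pg_autoctl and have 0600 permissions:

   # hostname:port:database:username:password
   monitor.host:5432:pg_auto_failover:autoctl_node:h4ckm3
   *:5432:replication:pgautofailover_replicator:h4ckm3m0r3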
2925
2926 Encryption of network communications
2927 Postgres knows how to use SSL to enable network encryption of all com‐
2928 munications, including authentication with passwords and the whole data
2929 set when streaming replication is used.
2930
2931 To enable SSL on the server an SSL certificate is needed. It could be
2932 as simple as a self-signed certificate, and pg_autoctl creates such a
2933 certificate for you when using the --ssl-self-signed command line option:
2934
2935 $ pg_autoctl create monitor --ssl-self-signed ... \
2936 --auth scram-sha-256 ... \
2937 --ssl-mode require \
2938 ...
2939
2940 $ pg_autoctl create postgres --ssl-self-signed ... \
2941 --auth scram-sha-256 ... \
2942 ...
2943
2944 $ pg_autoctl create postgres --ssl-self-signed ... \
2945 --auth scram-sha-256 ... \
2946 ...
2947
2948 In that example we set up SSL connections to encrypt the network traf‐
2949 fic, and we still have to set up an authentication mechanism exactly as
2950 in the previous sections of this document. Here scram-sha-256 has been
2951 selected, and the password will be sent over an encrypted channel.
2952
2953 When using the --ssl-self-signed option, pg_autoctl creates a
2954 self-signed certificate, as per the Postgres documentation at the
2955 Creating Certificates page.
2956
2957 The certificate subject CN defaults to the --hostname parameter, which
2958 can be given explicitly or computed by pg_autoctl as either your host‐
2959 name when you have proper DNS resolution, or your current IP address.
2960
2961 Self-signed certificates provide protection against eavesdropping; this
2962 setup does NOT protect against Man-In-The-Middle attacks nor Imperson‐
2963 ation attacks. See PostgreSQL documentation page SSL Support for de‐
2964 tails.
2965
2966 Using your own SSL certificates
2967 In many cases you will want to install certificates provided by your
2968 local security department and signed by a trusted Certificate Author‐
2969 ity. In that case one solution is to use --skip-pg-hba and do the whole
2970 setup yourself.
2971
2972 It is still possible to give the certificates to pg_auto_failover and
2973 have it handle the Postgres setup for you:
2974
2975 $ pg_autoctl create monitor --ssl-ca-file root.crt \
2976 --ssl-crl-file root.crl \
2977 --server-cert server.crt \
2978 --server-key server.key \
2979 --ssl-mode verify-full \
2980 ...
2981
2982 $ pg_autoctl create postgres --ssl-ca-file root.crt \
2983 --server-cert server.crt \
2984 --server-key server.key \
2985 --ssl-mode verify-full \
2986 ...
2987
2988 $ pg_autoctl create postgres --ssl-ca-file root.crt \
2989 --server-cert server.crt \
2990 --server-key server.key \
2991 --ssl-mode verify-full \
2992 ...
2993
2994 The option --ssl-mode can be used to force connection strings used by
2995 pg_autoctl to contain your preferred ssl mode. It defaults to require
2996 when using --ssl-self-signed and to allow when --no-ssl is used. Here,
2997 we set --ssl-mode to verify-full which requires SSL Certificates Au‐
2998 thentication, covered next.
2999
3000 The default --ssl-mode when providing your own certificates (signed by
3001 your trusted CA) is then verify-full. This setup applies to the client
3002 connection where the server identity is going to be checked against the
3003 root certificate provided with --ssl-ca-file and the revocation list
3004 optionally provided with the --ssl-crl-file. Both those files are used
3005 as the respective parameters sslrootcert and sslcrl in pg_autoctl con‐
3006 nection strings to both the monitor and the streaming replication pri‐
3007 mary server.
3008
3009 SSL Certificates Authentication
3010 Given those files, it is then possible to use certificate based authen‐
3011 tication of client connections. For that, it is necessary to prepare
3012 client certificates signed by your root certificate private key and us‐
3013 ing the target user name as its CN, as per Postgres documentation for
3014 Certificate Authentication:
3015 The cn (Common Name) attribute of the certificate will be compared
3016 to the requested database user name, and if they match the login
3017 will be allowed
3018
3019 To enable the cert authentication method with pg_auto_failover, you
3020 need to prepare a Client Certificate for the user postgres, used by
3021 pg_autoctl when connecting to the monitor, and place it in ~/.post‐
3022 gresql/postgresql.crt along with its key ~/.postgresql/postgresql.key,
3023 in the home directory of the user that runs the pg_autoctl service
3024 (which defaults to postgres).
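
As a sketch, such a client certificate could be produced with openssl,
assuming you have access to the root certificate root.crt and its private
key root.key; file names and validity period are examples only:

   $ openssl req -new -newkey rsa:2048 -nodes \
         -keyout ~/.postgresql/postgresql.key \
         -subj "/CN=postgres" -out postgresql.csr
   $ chmod 0600 ~/.postgresql/postgresql.key
   $ openssl x509 -req -in postgresql.csr -days 365 \
         -CA root.crt -CAkey root.key -CAcreateserial \
         -out ~/.postgresql/postgresql.crt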
3025
3026 Then you need to create a user name map as documented in Postgres page
3027 User Name Maps so that your certificate can be used to authenticate
3028 pg_autoctl users.
3029
3030 The ident map in pg_ident.conf on the pg_auto_failover monitor should
3031 then have the following entry, to allow postgres to connect as the au‐
3032 toctl_node user for pg_autoctl operations:
3033
3034 # MAPNAME SYSTEM-USERNAME PG-USERNAME
3035
3036 # pg_autoctl runs as postgres and connects to the monitor autoctl_node user
3037 pgautofailover postgres autoctl_node
3038
3039 To enable streaming replication, the pg_ident.conf file on each Post‐
3040 gres node should now allow the postgres user in the client certificate
3041 to connect as the pgautofailover_replicator database user:
3042
3043 # MAPNAME SYSTEM-USERNAME PG-USERNAME
3044
3045 # pg_autoctl runs as postgres and connects as the pgautofailover_replicator user
3046 pgautofailover postgres pgautofailover_replicator
3047
3048 Given that user name map, you can then use the cert authentication
3049 method. As with the pg_ident.conf provisioning, it is best to now pro‐
3050 vision the HBA rules yourself, using the --skip-pg-hba option:
3051
3052 $ pg_autoctl create postgres --skip-pg-hba --ssl-ca-file ...
3053
3054 The HBA rule will use the authentication method cert with a map option,
3055 and might then look like the following on the monitor:
3056
3057 # allow certificate based authentication to the monitor
3058 hostssl pg_auto_failover autoctl_node 10.0.0.0/8 cert map=pgautofailover
3059
3060 Then your pg_auto_failover nodes on the 10.0.0.0 network are allowed to
3061 connect to the monitor with the user autoctl_node used by pg_autoctl,
3062 assuming they have a valid and trusted client certificate.
3063
3064 The HBA rule to use on the Postgres nodes to allow for Postgres stream‐
3065 ing replication connections looks like the following:
3066
3067 # allow streaming replication for pg_auto_failover nodes
3068 hostssl replication pgautofailover_replicator 10.0.0.0/8 cert map=pgautofailover
3069
3070 Because the Postgres server runs as the postgres system user, the con‐
3071 nection to the primary node can be made with SSL enabled and will then
3072 use the client certificates installed in the postgres home directory in
3073 ~/.postgresql/postgresql.{key,crt} locations.
3074
3075 Postgres HBA provisioning
3076 While pg_auto_failover knows how to manage the Postgres HBA rules that
3077 are necessary for your stream replication needs and for its monitor
3078 protocol, it will not manage the Postgres HBA rules that are needed for
3079 your applications.
3080
3081 If you have your own HBA provisioning solution, you can include the
3082 rules needed for pg_auto_failover and then use the --skip-pg-hba option
3083 to the pg_autoctl create commands.
3084
3085 Enable SSL connections on an existing setup
3086 Whether you upgrade pg_auto_failover from a previous version that did
3087 not have support for the SSL features, or you started with --no-ssl
3088 and later change your mind, it is possible with pg_auto_failover to
3089 add SSL settings on a system that has already been set up without
3090 explicit SSL support.
3091
3092 In this section we detail how to upgrade to SSL settings.
3093
3094 Installing SSL certificates on top of an already existing
3095 pg_auto_failover setup is done with one of the following pg_autoctl
3096 command variants, depending on whether you want self-signed
3097 certificates or fully verified ssl certificates:
3098
3099 $ pg_autoctl enable ssl --ssl-self-signed --ssl-mode require
3100
3101 $ pg_autoctl enable ssl --ssl-ca-file root.crt \
3102 --ssl-crl-file root.crl \
3103 --server-cert server.crt \
3104 --server-key server.key \
3105 --ssl-mode verify-full
3106
3107 The pg_autoctl enable ssl command edits the post‐
3108 gresql-auto-failover.conf Postgres configuration file to match the com‐
3109 mand line arguments given and enable SSL as instructed, and then up‐
3110 dates the pg_autoctl configuration.
3111
3112 The connection string to connect to the monitor is also automatically
3113 updated by the pg_autoctl enable ssl command. You can verify your new
3114 configuration with:
3115
3116 $ pg_autoctl config get pg_autoctl.monitor
3117
3118 Note that an already running pg_autoctl daemon will try to reload its
3119 configuration after pg_autoctl enable ssl has finished. In some cases
3120 this is not possible to do without a restart. So be sure to check the
3121 logs from a running daemon to confirm that the reload succeeded. If it
3122 did not you may need to restart the daemon to ensure the new connection
3123 string is used.
3124
3125 The HBA settings are not edited, irrespective of whether --skip-pg-hba
3126 was used at creation time. That's because host records match both SSL
3127 and non-SSL connection attempts in the Postgres HBA file, so the
3128 pre-existing setup will continue to work. To enhance the SSL setup, you
3129 can manually edit the HBA files and change the existing lines from host
3130 to hostssl to disallow unencrypted connections at the server side.
3131
3132 In summary, to upgrade an existing pg_auto_failover setup to enable
3133 SSL:
3134
3135 1. run the pg_autoctl enable ssl command on your monitor and then
3136 all the Postgres nodes,
3137
3138 2. on the Postgres nodes, review your pg_autoctl logs to make sure
3139 that the reload operation has been effective, and review your
3140 Postgres settings to verify that you have the expected result,
3141
3142 3. review your HBA rules setup to change the pg_auto_failover rules
3143 from host to hostssl to disallow insecure connections.
3144
3146 The pg_autoctl tool hosts many commands and sub-commands. Each of them
3147 have their own manual page.
3148
3149 pg_autoctl
3150 pg_autoctl - control a pg_auto_failover node
3151
3152 Synopsis
3153 pg_autoctl provides the following commands:
3154
3155 pg_autoctl
3156 + create Create a pg_auto_failover node, or formation
3157 + drop Drop a pg_auto_failover node, or formation
3158 + config Manages the pg_autoctl configuration
3159 + show Show pg_auto_failover information
3160 + enable Enable a feature on a formation
3161 + disable Disable a feature on a formation
3162 + get Get a pg_auto_failover node, or formation setting
3163 + set Set a pg_auto_failover node, or formation setting
3164 + perform Perform an action orchestrated by the monitor
3165 activate Activate a Citus worker from the Citus coordinator
3166 run Run the pg_autoctl service (monitor or keeper)
3167 stop signal the pg_autoctl service for it to stop
3168 reload signal the pg_autoctl for it to reload its configuration
3169 status Display the current status of the pg_autoctl service
3170 help print help message
3171 version print pg_autoctl version
3172
3173 pg_autoctl create
3174 monitor Initialize a pg_auto_failover monitor node
3175 postgres Initialize a pg_auto_failover standalone postgres node
3176 coordinator Initialize a pg_auto_failover citus coordinator node
3177 worker Initialize a pg_auto_failover citus worker node
3178 formation Create a new formation on the pg_auto_failover monitor
3179
3180 pg_autoctl drop
3181 monitor Drop the pg_auto_failover monitor
3182 node Drop a node from the pg_auto_failover monitor
3183 formation Drop a formation on the pg_auto_failover monitor
3184
3185 pg_autoctl config
3186 check Check pg_autoctl configuration
3187 get Get the value of a given pg_autoctl configuration variable
3188 set Set the value of a given pg_autoctl configuration variable
3189
3190 pg_autoctl show
3191 uri Show the postgres uri to use to connect to pg_auto_failover nodes
3192 events Prints monitor's state of nodes in a given formation and group
3193 state Prints monitor's state of nodes in a given formation and group
3194 settings Print replication settings for a formation from the monitor
3195 standby-names Prints synchronous_standby_names for a given group
3196 file List pg_autoctl internal files (config, state, pid)
3197 systemd Print systemd service file for this node
3198
3199 pg_autoctl enable
3200 secondary Enable secondary nodes on a formation
3201 maintenance Enable Postgres maintenance mode on this node
3202 ssl Enable SSL configuration on this node
3203 monitor Enable a monitor for this node to be orchestrated from
3204
3205 pg_autoctl disable
3206 secondary Disable secondary nodes on a formation
3207 maintenance Disable Postgres maintenance mode on this node
3208 ssl Disable SSL configuration on this node
3209 monitor Disable the monitor for this node
3210
3211 pg_autoctl get
3212 + node get a node property from the pg_auto_failover monitor
3213 + formation get a formation property from the pg_auto_failover monitor
3214
3215 pg_autoctl get node
3216 replication-quorum get replication-quorum property from the monitor
3217 candidate-priority get candidate property from the monitor
3218
3219 pg_autoctl get formation
3220 settings get replication settings for a formation from the monitor
3221 number-sync-standbys get number_sync_standbys for a formation from the monitor
3222
3223 pg_autoctl set
3224 + node set a node property on the monitor
3225 + formation set a formation property on the monitor
3226
3227 pg_autoctl set node
3228 metadata set metadata on the monitor
3229 replication-quorum set replication-quorum property on the monitor
3230 candidate-priority set candidate property on the monitor
3231
3232 pg_autoctl set formation
3233 number-sync-standbys set number-sync-standbys for a formation on the monitor
3234
3235 pg_autoctl perform
3236 failover Perform a failover for given formation and group
3237 switchover Perform a switchover for given formation and group
3238 promotion Perform a failover that promotes a target node
3239
3240 Description
3241 The pg_autoctl tool is the client tool provided by pg_auto_failover to
3242 create and manage Postgres nodes and the pg_auto_failover monitor node.
3243 The command is built with many sub-commands that each have their own
3244 manual page.
3245
3246 Help
3247 To get the full recursive list of supported commands, use:
3248
3249 pg_autoctl help
3250
3251 Version
3252 To grab the version of pg_autoctl that you're using, use:
3253
3254 pg_autoctl --version
3255 pg_autoctl version
3256
3257 A typical output would be:
3258
3259 pg_autoctl version 1.4.2
3260 pg_autoctl extension version 1.4
3261 compiled with PostgreSQL 12.3 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
3262 compatible with Postgres 10, 11, 12, and 13
3263
3264 The version is also available as a JSON document when using the --json
3265 option:
3266
3267 pg_autoctl --version --json
3268 pg_autoctl version --json
3269
3270 A typical JSON output would be:
3271
3272 {
3273 "pg_autoctl": "1.4.2",
3274 "pgautofailover": "1.4",
3275 "pg_major": "12",
3276 "pg_version": "12.3",
3277 "pg_version_str": "PostgreSQL 12.3 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit",
3278 "pg_version_num": 120003
3279 }
3280
3281 This is for version 1.4.2 of pg_auto_failover. This particular version
3282 of the pg_autoctl client tool has been compiled using libpq for Post‐
3283 greSQL 12.3 and is compatible with Postgres 10, 11, 12, and 13.
3284
3285 Environment
3286 PG_AUTOCTL_DEBUG
3287 When this environment variable is set (to anything) then pg_autoctl
3288 allows more commands. Use with care: this unlocks commands that can
3289 destroy your production clusters.
3290
3291 pg_autoctl create
3292 pg_autoctl create - Create a pg_auto_failover node, or formation
3293
3294 pg_autoctl create monitor
3295 pg_autoctl create monitor - Initialize a pg_auto_failover monitor node
3296
3297 Synopsis
3298 This command initializes a PostgreSQL cluster and installs the pgauto‐
3299 failover extension so that it's possible to use the new instance to
3300 monitor PostgreSQL services:
3301
3302 usage: pg_autoctl create monitor [ --pgdata --pgport --pgctl --hostname ]
3303
3304 --pgctl path to pg_ctl
3305 --pgdata path to data directory
3306 --pgport PostgreSQL's port number
3307 --hostname hostname by which postgres is reachable
3308 --auth authentication method for connections from data nodes
3309 --skip-pg-hba skip editing pg_hba.conf rules
3310 --run create node then run pg_autoctl service
3311 --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
3312 --ssl-mode use that sslmode in connection strings
3313 --ssl-ca-file set the Postgres ssl_ca_file to that file path
3314 --ssl-crl-file set the Postgres ssl_crl_file to that file path
3315 --no-ssl don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
3316 --server-key set the Postgres ssl_key_file to that file path
3317 --server-cert set the Postgres ssl_cert_file to that file path
3318
3319 Description
3320 The pg_autoctl tool is the client tool provided by pg_auto_failover to
3321 create and manage Postgres nodes and the pg_auto_failover monitor node.
3322 The command is built with many sub-commands that each have their own
3323 manual page.
3324
3325 Options
3326 The following options are available to pg_autoctl create monitor:
3327
3328 --pgctl
3329 Path to the pg_ctl tool to use for the version of PostgreSQL you
3330 want to use.
3331
3332 Defaults to the pg_ctl found in the PATH when there is a single
3333 entry for pg_ctl in the PATH. Check your setup using which -a
3334 pg_ctl.
3335
3336 When using an RPM based distribution such as RHEL or CentOS, the
3337 path would usually be /usr/pgsql-13/bin/pg_ctl for Postgres 13.
3338
3339 When using a debian based distribution such as debian or ubuntu,
3340 the path would usually be /usr/lib/postgresql/13/bin/pg_ctl for
3341 Postgres 13. Those distributions also use the package post‐
3342 gresql-common which provides /usr/bin/pg_config. This tool can
3343 be automatically used by pg_autoctl to discover the default ver‐
3344 sion of Postgres to use on your setup.
3345
3346 --pgdata
3347 Location where to initialize a Postgres database cluster, using
3348 either pg_ctl initdb or pg_basebackup. Defaults to the environ‐
3349 ment variable PGDATA.
3350
3351 --pgport
3352 Postgres port to use, defaults to 5432.
3353
3354 --hostname
3355 Hostname or IP address (both v4 and v6 are supported) to use
3356 from any other node to connect to this node.
3357
3358 When not provided, a default value is computed by running the
3359 following algorithm.
3360
3361 1. We get this machine's "public IP" by opening a connection
3362 to the 8.8.8.8:53 public service. Then we get TCP/IP
3363 client address that has been used to make that connection.
3364
3365 2. We then do a reverse DNS lookup on the IP address found in
3366 the previous step to fetch a hostname for our local ma‐
3367 chine.
3368
3369 3. If the reverse DNS lookup is successful, then pg_autoctl
3370 does a forward DNS lookup of that hostname.
3371
3372 When the forward DNS lookup response in step 3. is an IP address
3373 found in one of our local network interfaces, then pg_autoctl
3374 uses the hostname found in step 2. as the default --hostname.
3375 Otherwise it uses the IP address found in step 1.
3376
3377 You may use the --hostname command line option to bypass the
3378 whole DNS lookup based process and force the local node name to
3379 a fixed value.
3380
3381 --auth Authentication method used by pg_autoctl when editing the Post‐
3382 gres HBA file to open connections to other nodes. No default
3383 value, must be provided by the user. The value trust is only a
3384 good choice for testing and evaluation of pg_auto_failover, see
3385 Security settings for pg_auto_failover for more information.
3386
3387 --skip-pg-hba
3388 When this option is used then pg_autoctl refrains from any edit‐
3389 ing of the Postgres HBA file. Please note that editing the HBA
3390 file is still needed so that other nodes can connect using ei‐
3391 ther read privileges or replication streaming privileges.
3392
3393 When --skip-pg-hba is used, pg_autoctl still outputs the HBA en‐
3394 tries it needs in the logs, it only skips editing the HBA file.
3395
3396 --run Immediately run the pg_autoctl service after having created this
3397 node.
3398
3399 --ssl-self-signed
3400 Generate SSL self-signed certificates to provide network encryp‐
3401 tion. This does not protect against man-in-the-middle kinds of
3402 attacks. See Security settings for pg_auto_failover for more
3403 about our SSL settings.
3404
3405 --ssl-mode
3406 SSL Mode used by pg_autoctl when connecting to other nodes, in‐
3407 cluding when connecting for streaming replication.
3408
3409 --ssl-ca-file
3410 Set the Postgres ssl_ca_file to that file path.
3411
3412 --ssl-crl-file
3413 Set the Postgres ssl_crl_file to that file path.
3414
3415 --no-ssl
3416 Don't enable network encryption. This is not recommended, prefer
3417 --ssl-self-signed.
3418
3419 --server-key
3420 Set the Postgres ssl_key_file to that file path.
3421
3422 --server-cert
3423 Set the Postgres ssl_cert_file to that file path.
3424
3425 Environment
3426 PGDATA
3427 Postgres directory location. Can be used instead of the --pgdata op‐
3428 tion.
3429
3430 PG_CONFIG
3431 Can be set to the absolute path to the pg_config Postgres tool. This
3432 is mostly used in the context of building extensions, though it can
3433 be a useful way to select a Postgres version when several are in‐
3434 stalled on the same system.
3435
3436 PATH
3437 Used the usual way mostly. Some entries that are searched in the
3438 PATH by the pg_autoctl command are expected to be found only once,
3439 to avoid mistakes with Postgres major versions.
3440
3441 PGHOST, PGPORT, PGDATABASE, PGUSER, PGCONNECT_TIMEOUT, ...
3442 See the Postgres docs about Environment Variables for details.
3443
3444 TMPDIR
3445 The pgcopydb command creates all its work files and directories in
3446 ${TMPDIR}/pgcopydb, and defaults to /tmp/pgcopydb.
3447
3448 XDG_CONFIG_HOME
3449 The pg_autoctl command stores its configuration files in the stan‐
3450 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
3451 tion.
3452
3453 XDG_DATA_HOME
3454 The pg_autoctl command stores its internal states files in the stan‐
3455 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
3456 XDG Base Directory Specification.
3457
3458 pg_autoctl create postgres
3459 pg_autoctl create postgres - Initialize a pg_auto_failover postgres
3460 node
3461
3462 Synopsis
3463 The command pg_autoctl create postgres initializes a standalone
3464 Postgres node and registers it to a pg_auto_failover monitor. The
3465 monitor then handles auto-failover for this Postgres node (as soon as
3466 a secondary has been registered too, and is known to be healthy).
3467
3468 usage: pg_autoctl create postgres
3469
3470 --pgctl path to pg_ctl
3471 --pgdata path to data directory
3472 --pghost PostgreSQL's hostname
3473 --pgport PostgreSQL's port number
3474 --listen PostgreSQL's listen_addresses
3475 --username PostgreSQL's username
3476 --dbname PostgreSQL's database name
3477 --name pg_auto_failover node name
3478 --hostname hostname used to connect from the other nodes
3479 --formation pg_auto_failover formation
3480 --monitor pg_auto_failover Monitor Postgres URL
3481 --auth authentication method for connections from monitor
3482 --skip-pg-hba skip editing pg_hba.conf rules
3483 --pg-hba-lan edit pg_hba.conf rules for --dbname in detected LAN
3484 --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
3485 --ssl-mode use that sslmode in connection strings
3486 --ssl-ca-file set the Postgres ssl_ca_file to that file path
3487 --ssl-crl-file set the Postgres ssl_crl_file to that file path
3488 --no-ssl don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
3489 --server-key set the Postgres ssl_key_file to that file path
3490 --server-cert set the Postgres ssl_cert_file to that file path
3491 --candidate-priority priority of the node to be promoted to become primary
3492 --replication-quorum true if node participates in write quorum
3493 --maximum-backup-rate maximum transfer rate of data transferred from the server during initial sync
3494
3495 Description
3496 Four different modes of initialization are supported by this command,
3497 corresponding to as many implementation strategies.
3498
3499 1. Initialize a primary node from scratch
3500
3501 This happens when --pgdata (or the environment variable PGDATA)
3502 points to an non-existing or empty directory. Then the given
3503 --hostname is registered to the pg_auto_failover --monitor as a
3504 member of the --formation.
3505
3506 The monitor answers to the registration call with a state to as‐
3507 sign to the new member of the group, either SINGLE or
3508 WAIT_STANDBY. When the assigned state is SINGLE, then pg_autoctl
3509 create postgres proceeds to initialize a new PostgreSQL instance.
3510
3511 2. Initialize an already existing primary server
3512
3513 This happens when --pgdata (or the environment variable PGDATA)
3514 points to an already existing directory that belongs to a Post‐
3515 greSQL instance. The standard PostgreSQL tool pg_controldata is
3516 used to recognize whether the directory belongs to a PostgreSQL
3517 instance.
3518
3519 In that case, the given --hostname is registered to the monitor
3520 in the tentative SINGLE state. When the given --formation and
3521 --group is currently empty, then the monitor accepts the regis‐
3522 tration and the pg_autoctl create prepares the already existing
3523 primary server for pg_auto_failover.
3524
3525 3. Initialize a secondary node from scratch
3526
3527 This happens when --pgdata (or the environment variable PGDATA)
3528 points to a non-existing or empty directory, and when the monitor
3529 registration call assigns the state WAIT_STANDBY in step 1.
3530
3531 In that case, the pg_autoctl create command steps through the
3532 initial states of registering a secondary server, which includes
3533 preparing the primary server PostgreSQL HBA rules and creating a
3534 replication slot.
3535
3536 When the command ends successfully, a PostgreSQL secondary server
3537 has been created with pg_basebackup and is now started, catch‐
3538 ing-up to the primary server.
3539
3540 4. Initialize a secondary node from an existing data directory
3541
3542 When the data directory pointed to by the option --pgdata or the
3543 environment variable PGDATA already exists, then pg_auto_failover
3544 verifies that the system identifier matches the one of the other
3545 nodes already existing in the same group.
3546
3547 The system identifier can be obtained with the command pg_con‐
3548 troldata, as shown in the sketch after this list. All nodes in a
3549 physical replication setting must have the same system identifier,
3550 and so in pg_auto_failover all the nodes in the same group have that constraint too.
3551
3552 When the system identifier matches the already registered system
3553 identifier of other nodes in the same group, then the node is
3554 set-up as a standby and Postgres is started with the primary con‐
3555 ninfo pointed at the current primary.
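
The system identifier mentioned above can be read directly with
pg_controldata; a quick sketch, where the data directory path is an
example:

   $ pg_controldata /var/lib/postgresql/node_1 | grep -i 'system identifier'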
3556
3557 The --auth option allows setting the authentication method to be used
3558 when the monitor node makes a connection to the data node with the
3559 pgautofailover_monitor user. As with the pg_autoctl create monitor
3560 command, you could use --auth trust when playing with pg_auto_failover
3561 at first and consider something production grade later. Also, consider
3562 using --skip-pg-hba if you already have your own provisioning tools
3563 with a security compliance process.
3564
3565 See Security settings for pg_auto_failover for notes on .pgpass
3566
3567 Options
3568 The following options are available to pg_autoctl create postgres:
3569
3570 --pgctl
3571 Path to the pg_ctl tool to use for the version of PostgreSQL you
3572 want to use.
3573
3574 Defaults to the pg_ctl found in the PATH when there is a single
3575 entry for pg_ctl in the PATH. Check your setup using which -a
3576 pg_ctl.
3577
3578 When using an RPM based distribution such as RHEL or CentOS, the
3579 path would usually be /usr/pgsql-13/bin/pg_ctl for Postgres 13.
3580
3581 When using a debian based distribution such as debian or ubuntu,
3582 the path would usually be /usr/lib/postgresql/13/bin/pg_ctl for
3583 Postgres 13. Those distributions also use the package post‐
3584 gresql-common which provides /usr/bin/pg_config. This tool can
3585 be automatically used by pg_autoctl to discover the default ver‐
3586 sion of Postgres to use on your setup.
3587
3588 --pgdata
3589 Location where to initialize a Postgres database cluster, using
3590 either pg_ctl initdb or pg_basebackup. Defaults to the environ‐
3591 ment variable PGDATA.
3592
3593 --pghost
3594 Hostname to use when connecting to the local Postgres instance
3595 from the pg_autoctl process. By default, this field is left
3596 blank in the connection string, allowing to use Unix Domain
3597 Sockets with the default path compiled in your libpq version,
3598 usually provided by the Operating System. That would be
3599 /var/run/postgresql when using debian or ubuntu.
3600
3601 --pgport
3602 Postgres port to use, defaults to 5432.
3603
3604 --listen
3605 PostgreSQL's listen_addresses to setup. At the moment only one
3606 address is supported in this command line option.
3607
3608 --username
3609 PostgreSQL's username to use when connecting to the local Post‐
3610 gres instance to manage it.
3611
3612 --dbname
3613 PostgreSQL's database name to use in your application. Defaults
3614 to being the same as the --username, or to postgres when none of
3615 those options are used.
3616
3617 --name Node name used on the monitor to refer to this node. The host‐
3618 name is technical information, and given Postgres requirements
3619 on the HBA setup and DNS resolution (both forward and reverse
3620 lookups), IP addresses are often used for the hostname.
3621
3622 The --name option allows using a user-friendly name for your
3623 Postgres nodes.
3624
3625 --hostname
3626 Hostname or IP address (both v4 and v6 are supported) to use
3627 from any other node to connect to this node.
3628
3629 When not provided, a default value is computed by running the
3630 following algorithm.
3631
3632 1. We get this machine's "public IP" by opening a connection
3633 to the given monitor hostname or IP address. Then we get
3634 TCP/IP client address that has been used to make that con‐
3635 nection.
3636
3637 2. We then do a reverse DNS lookup on the IP address found in
3638 the previous step to fetch a hostname for our local ma‐
3639 chine.
3640
3641 3. If the reverse DNS lookup is successful, then pg_autoctl
3642 does a forward DNS lookup of that hostname.
3643
3644 When the forward DNS lookup response in step 3. is an IP address
3645 found in one of our local network interfaces, then pg_autoctl
3646 uses the hostname found in step 2. as the default --hostname.
3647 Otherwise it uses the IP address found in step 1.
3648
3649 You may use the --hostname command line option to bypass the
3650 whole DNS lookup based process and force the local node name to
3651 a fixed value.
3652
3653 --formation
3654 Formation to register the node into on the monitor. Defaults to
3655 the default formation, that is automatically created in the mon‐
3656 itor in the pg_autoctl create monitor command.
3657
3658 --monitor
3659 Postgres URI used to connect to the monitor. Must use the au‐
3660 toctl_node username and target the pg_auto_failover database
3661 name. It is possible to show the Postgres URI from the monitor
3662 node using the command pg_autoctl show uri.
3663
3664 --auth Authentication method used by pg_autoctl when editing the Post‐
3665 gres HBA file to open connections to other nodes. No default
3666 value, must be provided by the user. The value trust is only a
3667 good choice for testing and evaluation of pg_auto_failover, see
3668 Security settings for pg_auto_failover for more information.
3669
3670 --skip-pg-hba
3671 When this option is used then pg_autoctl refrains from any edit‐
3672 ing of the Postgres HBA file. Please note that editing the HBA
3673 file is still needed so that other nodes can connect using ei‐
3674 ther read privileges or replication streaming privileges.
3675
3676 When --skip-pg-hba is used, pg_autoctl still outputs the HBA en‐
3677 tries it needs in the logs, it only skips editing the HBA file.
3678
3679 --pg-hba-lan
3680 When this option is used pg_autoctl determines the local IP ad‐
3681 dress used to connect to the monitor, and retrieves its netmask,
3682 and uses that to compute your local area network CIDR. This CIDR
3683 is then opened for connections in the Postgres HBA rules.
3684
3685 For instance, when the monitor resolves to 192.168.0.1 and your
3686 local Postgres node uses an interface with IP address
3687 192.168.0.2/255.255.255.0 to connect to the monitor, then the
3688 LAN CIDR is computed to be 192.168.0.0/24.
3689
3690 --candidate-priority
3691 Sets this node replication setting for candidate priority to the
3692 given value (between 0 and 100) at node registration on the mon‐
3693 itor. Defaults to 50.
3694
3695 --replication-quorum
3696 Sets this node replication setting for replication quorum to the
3697 given value (either true or false) at node registration on the
3698 monitor. Defaults to true, which enables synchronous replica‐
3699 tion.
3700
3701 --maximum-backup-rate
3702 Sets the maximum transfer rate of data transferred from the
3703 server during initial sync. This is used by pg_basebackup. De‐
3704 faults to 100M.
3705
3706 --run Immediately run the pg_autoctl service after having created this
3707 node.
3708
3709 --ssl-self-signed
3710 Generate SSL self-signed certificates to provide network encryp‐
3711 tion. This does not protect against man-in-the-middle kinds of
3712 attacks. See Security settings for pg_auto_failover for more
3713 about our SSL settings.
3714
3715 --ssl-mode
3716 SSL Mode used by pg_autoctl when connecting to other nodes, in‐
3717 cluding when connecting for streaming replication.
3718
3719 --ssl-ca-file
3720 Set the Postgres ssl_ca_file to that file path.
3721
3722 --ssl-crl-file
3723 Set the Postgres ssl_crl_file to that file path.
3724
3725 --no-ssl
3726 Don't enable network encryption. This is not recommended, prefer
3727 --ssl-self-signed.
3728
3729 --server-key
3730 Set the Postgres ssl_key_file to that file path.
3731
3732 --server-cert
3733 Set the Postgres ssl_cert_file to that file path.
3734
3735 Environment
3736 PGDATA
3737 Postgres directory location. Can be used instead of the --pgdata op‐
3738 tion.
3739
3740 PG_AUTOCTL_MONITOR
3741 Postgres URI to connect to the monitor node, can be used instead of
3742 the --monitor option.
3743
3744 PG_AUTOCTL_NODE_NAME
3745 Node name to register to the monitor, can be used instead of the
3746 --name option.
3747
3748 PG_AUTOCTL_REPLICATION_QUORUM
3749 Can be used instead of the --replication-quorum option.
3750
3751 PG_AUTOCTL_CANDIDATE_PRIORITY
3752 Can be used instead of the --candidate-priority option.
3753
3754 PG_CONFIG
3755 Can be set to the absolute path to the pg_config Postgres tool. This
3756 is mostly used in the context of building extensions, though it can
3757 be a useful way to select a Postgres version when several are in‐
3758 stalled on the same system.
3759
3760 PATH
3761     Used in the usual way. Some entries that pg_autoctl searches for in
3762     the PATH are expected to be found only once, to avoid mistakes with
3763     Postgres major versions.
3764
3765 PGHOST, PGPORT, PGDATABASE, PGUSER, PGCONNECT_TIMEOUT, ...
3766 See the Postgres docs about Environment Variables for details.
3767
3768 TMPDIR
3769 The pgcopydb command creates all its work files and directories in
3770 ${TMPDIR}/pgcopydb, and defaults to /tmp/pgcopydb.
3771
3772 XDG_CONFIG_HOME
3773 The pg_autoctl command stores its configuration files in the stan‐
3774 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
3775 tion.
3776
3777 XDG_DATA_HOME
3778 The pg_autoctl command stores its internal states files in the stan‐
3779 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
3780 XDG Base Directory Specification.
3781
3782 pg_autoctl create coordinator
3783 pg_autoctl create coordinator - Initialize a pg_auto_failover coordina‐
3784 tor node
3785
3786 Synopsis
3787 The command pg_autoctl create coordinator initializes a
3788 pg_auto_failover Coordinator node for a Citus formation. The coordina‐
3789   tor is special in a Citus formation: it is where the client applica‐
3790   tion connects, either to manage the formation and the sharding of
3791   the tables, or for its normal SQL traffic.
3792
3793 The coordinator also has to register every worker in the formation.
3794
3795 usage: pg_autoctl create coordinator
3796
3797 --pgctl path to pg_ctl
3798 --pgdata path to data directory
3799 --pghost PostgreSQL's hostname
3800 --pgport PostgreSQL's port number
3801 --hostname hostname by which postgres is reachable
3802 --listen PostgreSQL's listen_addresses
3803 --username PostgreSQL's username
3804 --dbname PostgreSQL's database name
3805 --name pg_auto_failover node name
3806 --formation pg_auto_failover formation
3807 --monitor pg_auto_failover Monitor Postgres URL
3808 --auth authentication method for connections from monitor
3809 --skip-pg-hba skip editing pg_hba.conf rules
3810 --citus-secondary when used, this worker node is a citus secondary
3811 --citus-cluster name of the Citus Cluster for read-replicas
3812 --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
3813 --ssl-mode use that sslmode in connection strings
3814 --ssl-ca-file set the Postgres ssl_ca_file to that file path
3815 --ssl-crl-file set the Postgres ssl_crl_file to that file path
3816 --no-ssl don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
3817 --server-key set the Postgres ssl_key_file to that file path
3818 --server-cert set the Postgres ssl_cert_file to that file path
3819
3820 Description
3821   This command works the same as the pg_autoctl create postgres command
3822 and implements the following extra steps:
3823
3824 1. adds shared_preload_libraries = citus to the local PostgreSQL in‐
3825 stance configuration.
3826
3827 2. enables the whole local area network to connect to the coordina‐
3828 tor, by adding an entry for e.g. 192.168.1.0/24 in the PostgreSQL
3829 HBA configuration.
3830
3831 3. creates the extension citus in the target database.
3832
3833 IMPORTANT:
3834 The default --dbname is the same as the current system user name,
3835      which in many cases is going to be postgres. Please make sure to use
3836 the --dbname option with the actual database that you're going to
3837 use with your application.
3838
3839      Citus does not support multiple databases: you have to use the data‐
3840      base where the Citus extension is created. When using Citus, this is
3841      essential to the correct behavior of worker failover.
3842
3843 Options
3844 See the manual page for pg_autoctl create postgres for the common op‐
3845 tions. This section now lists the options that are specific to pg_au‐
3846 toctl create coordinator:
3847
3848 --citus-secondary
3849 Use this option to create a coordinator dedicated to a Citus
3850 Secondary cluster.
3851
3852 See Citus Secondaries and read-replica for more information.
3853
3854 --citus-cluster
3855 Use this option to name the Citus Secondary cluster that this
3856 coordinator node belongs to. Use the same cluster name again for
3857 the worker nodes that are part of this cluster.
3858
3859 See Citus Secondaries and read-replica for more information.
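
       As an illustration, a hedged sketch of creating a coordinator node;
       the monitor URI, data directory, and database name are placeholders,
       and --dbname is set explicitly as recommended in the IMPORTANT note
       above:

          $ pg_autoctl create coordinator \
               --pgdata /var/lib/postgres/coord \
               --monitor "postgres://autoctl_node@monitor.local:5432/pg_auto_failover" \
               --dbname app \
               --ssl-self-signed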
3860
3861 pg_autoctl create worker
3862 pg_autoctl create worker - Initialize a pg_auto_failover worker node
3863
3864 Synopsis
3865   The command pg_autoctl create worker initializes a pg_auto_failover
3866   Worker node for a Citus formation. Worker nodes are where Citus
3867   stores and queries the shards of the distributed tables.
3868
3869   Each worker also has to be registered to the coordinator of the
3870   formation; pg_autoctl handles that registration automatically, as
3871   detailed in the description below.
3872
3873 usage: pg_autoctl create worker
3874
3875 --pgctl path to pg_ctl
3876       --pgdata          path to data directory
3877 --pghost PostgreSQL's hostname
3878 --pgport PostgreSQL's port number
3879 --hostname hostname by which postgres is reachable
3880 --listen PostgreSQL's listen_addresses
3881 --proxyport Proxy's port number
3882 --username PostgreSQL's username
3883 --dbname PostgreSQL's database name
3884 --name pg_auto_failover node name
3885 --formation pg_auto_failover formation
3886 --group pg_auto_failover group Id
3887 --monitor pg_auto_failover Monitor Postgres URL
3888 --auth authentication method for connections from monitor
3889 --skip-pg-hba skip editing pg_hba.conf rules
3890 --citus-secondary when used, this worker node is a citus secondary
3891 --citus-cluster name of the Citus Cluster for read-replicas
3892 --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
3893 --ssl-mode use that sslmode in connection strings
3894 --ssl-ca-file set the Postgres ssl_ca_file to that file path
3895 --ssl-crl-file set the Postgres ssl_crl_file to that file path
3896 --no-ssl don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
3897 --server-key set the Postgres ssl_key_file to that file path
3898 --server-cert set the Postgres ssl_cert_file to that file path
3899
3900 Description
3901   This command works the same as the pg_autoctl create postgres command
3902 and implements the following extra steps:
3903
3904 1. adds shared_preload_libraries = citus to the local PostgreSQL in‐
3905 stance configuration.
3906
3907 2. creates the extension citus in the target database.
3908
3909 3. gets the coordinator node hostname from the pg_auto_failover mon‐
3910 itor.
3911
3912 This operation is retried when it fails, as the coordinator might
3913 appear later than some of the workers when the whole formation is
3914 initialized at once, in parallel, on multiple nodes.
3915
3916      4. adds the node to the coordinator
3917
3918 This is done in two steps. First, we call the SQL function mas‐
3919 ter_add_inactive_node on the coordinator, and second, we call the
3920 SQL function master_activate_node.
3921
3922         This two-step approach allows for easier diagnostics in case things go wrong.
3923 In the first step, the network and authentication setup needs to
3924 allow for nodes to connect to each other. In the second step, the
3925 Citus reference tables are distributed to the new node, and this
3926 operation has its own set of failure cases to handle.
3927
3928 IMPORTANT:
3929 The default --dbname is the same as the current system user name,
3930      which in many cases is going to be postgres. Please make sure to use
3931 the --dbname option with the actual database that you're going to
3932 use with your application.
3933
3934      Citus does not support multiple databases: you have to use the data‐
3935      base where the Citus extension is created. When using Citus, this is
3936      essential to the correct behavior of worker failover.
3937
3938 Options
3939 See the manual page for pg_autoctl create postgres for the common op‐
3940 tions. This section now lists the options that are specific to pg_au‐
3941 toctl create worker:
3942
3943 --proxyport
3944 The --proxyport option allows pg_auto_failover to register the
3945 proxy port in the pg_dist_poolinfo entry for the worker node in
3946 its Coordinator, rather than the --pgport entry as would usually
3947 be done.
3948
3949 --citus-secondary
3950 Use this option to create a worker dedicated to a Citus Sec‐
3951 ondary cluster.
3952
3953 See Citus Secondaries and read-replica for more information.
3954
3955 --citus-cluster
3956 Use this option to name the Citus Secondary cluster that this
3957 worker node belongs to. Use the same cluster name again for the
3958 worker nodes that are part of this cluster.
3959
3960 See Citus Secondaries and read-replica for more information.
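
       A matching sketch for a worker node, with the same placeholder monitor
       URI and database name; the coordinator hostname is obtained from the
       monitor, as described above:

          $ pg_autoctl create worker \
               --pgdata /var/lib/postgres/worker1 \
               --monitor "postgres://autoctl_node@monitor.local:5432/pg_auto_failover" \
               --dbname app \
               --ssl-self-signed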
3961
3962 pg_autoctl create formation
3963 pg_autoctl create formation - Create a new formation on the
3964 pg_auto_failover monitor
3965
3966 Synopsis
3967 This command registers a new formation on the monitor, with the speci‐
3968 fied kind:
3969
3970     usage: pg_autoctl create formation  [ --pgdata --monitor --formation --kind --dbname --enable-secondary --disable-secondary --number-sync-standbys ]
3971
3972 --pgdata path to data directory
3973 --monitor pg_auto_failover Monitor Postgres URL
3974 --formation name of the formation to create
3975 --kind formation kind, either "pgsql" or "citus"
3976 --dbname name for postgres database to use in this formation
3977 --enable-secondary create a formation that has multiple nodes that can be
3978 used for fail over when others have issues
3979 --disable-secondary create a citus formation without nodes to fail over to
3980 --number-sync-standbys minimum number of standbys to confirm write
3981
3982 Description
3983 A single pg_auto_failover monitor may manage any number of formations,
3984   each composed of at least one Postgres service group. This command
3985 creates a new formation so that it is then possible to register Post‐
3986 gres nodes in the new formation.
3987
3988 Options
3989 The following options are available to pg_autoctl create formation:
3990
3991 --pgdata
3992 Location where to initialize a Postgres database cluster, using
3993 either pg_ctl initdb or pg_basebackup. Defaults to the environ‐
3994 ment variable PGDATA.
3995
3996 --monitor
3997 Postgres URI used to connect to the monitor. Must use the au‐
3998 toctl_node username and target the pg_auto_failover database
3999 name. It is possible to show the Postgres URI from the monitor
4000 node using the command pg_autoctl show uri.
4001
4002 --formation
4003 Name of the formation to create.
4004
4005   --kind A pg_auto_failover formation can be of kind pgsql or of kind
4006 citus. At the moment citus formation kinds are not managed in
4007 the Open Source version of pg_auto_failover.
4008
4009 --dbname
4010          Name of the database to use in the formation, mostly useful for
4011          formations of kind citus where the Citus extension is only in‐
4012 stalled in a single target database.
4013
4014 --enable-secondary
4015 The formation to be created allows using standby nodes. Defaults
4016 to true. Mostly useful for Citus formations.
4017
4018 --disable-secondary
4019 See --enable-secondary above.
4020
4021   --number-sync-standbys
4022          Postgres streaming replication uses synchronous_standby_names to
4023          set up how many standby nodes should have received a copy of the
4024 transaction data. When using pg_auto_failover this setup is han‐
4025 dled at the formation level.
4026
4027          Defaults to zero when creating the first two Postgres nodes in a
4028          formation in the same group. When set to zero, pg_auto_failover
4029          uses synchronous replication only when a standby node is avail‐
4030          able: the idea is to allow failover, even though this setting
4031          alone does not provide proper HA for Postgres.
4032
4033 When adding a third node that participates in the quorum (one
4034 primary, two secondaries), the setting is automatically changed
4035 from zero to one.
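
       For example, a hedged sketch of creating a second formation of kind
       pgsql; the formation and database names are placeholders:

          $ pg_autoctl create formation \
               --monitor "postgres://autoctl_node@monitor.local:5432/pg_auto_failover" \
               --formation sandbox \
               --kind pgsql \
               --dbname app \
               --enable-secondary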
4036
4037 Environment
4038 PGDATA
4039 Postgres directory location. Can be used instead of the --pgdata op‐
4040 tion.
4041
4042 PG_AUTOCTL_MONITOR
4043 Postgres URI to connect to the monitor node, can be used instead of
4044 the --monitor option.
4045
4046 XDG_CONFIG_HOME
4047 The pg_autoctl command stores its configuration files in the stan‐
4048 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
4049 tion.
4050
4051 XDG_DATA_HOME
4052 The pg_autoctl command stores its internal states files in the stan‐
4053 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
4054 XDG Base Directory Specification.
4055
4056 pg_autoctl drop
4057 pg_autoctl drop - Drop a pg_auto_failover node, or formation
4058
4059 pg_autoctl drop monitor
4060 pg_autoctl drop monitor - Drop the pg_auto_failover monitor
4061
4062 Synopsis
4063   This command drops the pg_auto_failover monitor node from the local
4064   setup:
4065
4066 usage: pg_autoctl drop monitor [ --pgdata --destroy ]
4067
4068 --pgdata path to data directory
4069 --destroy also destroy Postgres database
4070
4071 Options
4072 --pgdata
4073 Location of the Postgres node being managed locally. Defaults to
4074 the environment variable PGDATA. Use --monitor to connect to a
4075 monitor from anywhere, rather than the monitor URI used by a lo‐
4076 cal Postgres node managed with pg_autoctl.
4077
4078 --destroy
4079          By default the pg_autoctl drop monitor command does not remove
4080 the Postgres database for the monitor. When using --destroy, the
4081 Postgres installation is also deleted.
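
       For example, assuming the monitor's data directory is ./monitor, the
       following sketch drops the monitor and, with --destroy, removes its
       Postgres instance as well:

          $ pg_autoctl drop monitor --pgdata ./monitor --destroy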
4082
4083 Environment
4084 PGDATA
4085 Postgres directory location. Can be used instead of the --pgdata op‐
4086 tion.
4087
4088 XDG_CONFIG_HOME
4089 The pg_autoctl command stores its configuration files in the stan‐
4090 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
4091 tion.
4092
4093 XDG_DATA_HOME
4094 The pg_autoctl command stores its internal states files in the stan‐
4095 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
4096 XDG Base Directory Specification.
4097
4098 pg_autoctl drop node
4099 pg_autoctl drop node - Drop a node from the pg_auto_failover monitor
4100
4101 Synopsis
4102 This command drops a Postgres node from the pg_auto_failover monitor:
4103
4104 usage: pg_autoctl drop node [ [ [ --pgdata ] [ --destroy ] ] | [ --monitor [ [ --hostname --pgport ] | [ --formation --name ] ] ] ]
4105
4106 --pgdata path to data directory
4107 --monitor pg_auto_failover Monitor Postgres URL
4108 --formation pg_auto_failover formation
4109 --name drop the node with the given node name
4110 --hostname drop the node with given hostname and pgport
4111 --pgport drop the node with given hostname and pgport
4112 --destroy also destroy Postgres database
4113 --force force dropping the node from the monitor
4114 --wait how many seconds to wait, default to 60
4115
4116 Description
4117   Two modes of operation are implemented in the pg_autoctl drop node
4118 command.
4119
4120 When removing a node that still exists, it is possible to use pg_au‐
4121 toctl drop node --destroy to remove the node both from the monitor and
4122 also delete the local Postgres instance entirely.
4123
4124 When removing a node that doesn't exist physically anymore, or when the
4125 VM that used to host the node has been lost entirely, use either the
4126 pair of options --hostname and --pgport or the pair of options --forma‐
4127 tion and --name to match the node registration record on the monitor
4128 database, and get it removed from the known list of nodes on the moni‐
4129 tor.
4130
4131   The option --force can be used when the target node to remove does not
4132   exist anymore. When a node has been lost entirely, it is not going to
4133   be able to finish the procedure itself, and it is then possible to in‐
4134   form the monitor of the situation.
4135
4136 Options
4137 --pgdata
4138 Location of the Postgres node being managed locally. Defaults to
4139 the environment variable PGDATA. Use --monitor to connect to a
4140 monitor from anywhere, rather than the monitor URI used by a lo‐
4141 cal Postgres node managed with pg_autoctl.
4142
4143 --monitor
4144 Postgres URI used to connect to the monitor. Must use the au‐
4145 toctl_node username and target the pg_auto_failover database
4146 name. It is possible to show the Postgres URI from the monitor
4147 node using the command pg_autoctl show uri.
4148
4149 --hostname
4150 Hostname of the Postgres node to remove from the monitor. Use
4151 either --name or --hostname --pgport, but not both.
4152
4153 --pgport
4154 Port of the Postgres node to remove from the monitor. Use either
4155 --name or --hostname --pgport, but not both.
4156
4157 --name Name of the node to remove from the monitor. Use either --name
4158 or --hostname --pgport, but not both.
4159
4160 --destroy
4161          By default the pg_autoctl drop node command does not remove the
4162          local Postgres database. When using --destroy, the Postgres in‐
4163          stallation is also deleted.
4164
4165 --force
4166 By default a node is expected to reach the assigned state
4167 DROPPED when it is removed from the monitor, and has the oppor‐
4168 tunity to implement clean-up actions. When the target node to
4169 remove is not available anymore, it is possible to use the op‐
4170 tion --force to immediately remove the node from the monitor.
4171
4172 --wait How many seconds to wait for the node to be dropped entirely.
4173 The command stops when the target node is not to be found on the
4174 monitor anymore, or when the timeout has elapsed, whichever
4175 comes first. The value 0 (zero) disables the timeout and dis‐
4176 ables waiting entirely, making the command async.
4177
4178 Environment
4179 PGDATA
4180 Postgres directory location. Can be used instead of the --pgdata op‐
4181 tion.
4182
4183 PG_AUTOCTL_MONITOR
4184 Postgres URI to connect to the monitor node, can be used instead of
4185 the --monitor option.
4186
4187 PG_AUTOCTL_NODE_NAME
4188 Node name to register to the monitor, can be used instead of the
4189 --name option.
4190
4191 PG_AUTOCTL_REPLICATION_QUORUM
4192 Can be used instead of the --replication-quorum option.
4193
4194 PG_AUTOCTL_CANDIDATE_PRIORITY
4195 Can be used instead of the --candidate-priority option.
4196
4197 PG_CONFIG
4198 Can be set to the absolute path to the pg_config Postgres tool. This
4199 is mostly used in the context of building extensions, though it can
4200 be a useful way to select a Postgres version when several are in‐
4201 stalled on the same system.
4202
4203 PATH
4204 Used the usual way mostly. Some entries that are searched in the
4205 PATH by the pg_autoctl command are expected to be found only once,
4206 to avoid mistakes with Postgres major versions.
4207
4208 PGHOST, PGPORT, PGDATABASE, PGUSER, PGCONNECT_TIMEOUT, ...
4209 See the Postgres docs about Environment Variables for details.
4210
4211 TMPDIR
4212 The pgcopydb command creates all its work files and directories in
4213 ${TMPDIR}/pgcopydb, and defaults to /tmp/pgcopydb.
4214
4215 XDG_CONFIG_HOME
4216 The pg_autoctl command stores its configuration files in the stan‐
4217 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
4218 tion.
4219
4220 XDG_DATA_HOME
4221 The pg_autoctl command stores its internal states files in the stan‐
4222 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
4223 XDG Base Directory Specification.
4224
4225 Examples
4226 $ pg_autoctl drop node --destroy --pgdata ./node3
4227 17:52:21 54201 INFO Reaching assigned state "secondary"
4228 17:52:21 54201 INFO Removing node with name "node3" in formation "default" from the monitor
4229 17:52:21 54201 WARN Postgres is not running and we are in state secondary
4230 17:52:21 54201 WARN Failed to update the keeper's state from the local PostgreSQL instance, see above for details.
4231 17:52:21 54201 INFO Calling node_active for node default/4/0 with current state: PostgreSQL is running is false, sync_state is "", latest WAL LSN is 0/0.
4232 17:52:21 54201 INFO FSM transition to "dropped": This node is being dropped from the monitor
4233 17:52:21 54201 INFO Transition complete: current state is now "dropped"
4234 17:52:21 54201 INFO This node with id 4 in formation "default" and group 0 has been dropped from the monitor
4235 17:52:21 54201 INFO Stopping PostgreSQL at "/Users/dim/dev/MS/pg_auto_failover/tmux/node3"
4236 17:52:21 54201 INFO /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl --pgdata /Users/dim/dev/MS/pg_auto_failover/tmux/node3 --wait stop --mode fast
4237 17:52:21 54201 INFO /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl status -D /Users/dim/dev/MS/pg_auto_failover/tmux/node3 [3]
4238 17:52:21 54201 INFO pg_ctl: no server running
4239 17:52:21 54201 INFO pg_ctl stop failed, but PostgreSQL is not running anyway
4240 17:52:21 54201 INFO Removing "/Users/dim/dev/MS/pg_auto_failover/tmux/node3"
4241 17:52:21 54201 INFO Removing "/Users/dim/dev/MS/pg_auto_failover/tmux/config/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node3/pg_autoctl.cfg"
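
       When the node to remove is not reachable anymore, a hedged sketch of
       removing its registration directly on the monitor, using placeholder
       formation and node names:

          $ pg_autoctl drop node \
               --monitor "postgres://autoctl_node@monitor.local:5432/pg_auto_failover" \
               --formation default --name node3 \
               --force --wait 0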
4242
4243 pg_autoctl drop formation
4244 pg_autoctl drop formation - Drop a formation on the pg_auto_failover
4245 monitor
4246
4247 Synopsis
4248 This command drops an existing formation on the monitor:
4249
4250     usage: pg_autoctl drop formation  [ --pgdata --monitor --formation ]
4251
4252 --pgdata path to data directory
4253 --monitor pg_auto_failover Monitor Postgres URL
4254 --formation name of the formation to drop
4255
4256 Options
4257 --pgdata
4258 Location of the Postgres node being managed locally. Defaults to
4259 the environment variable PGDATA. Use --monitor to connect to a
4260 monitor from anywhere, rather than the monitor URI used by a lo‐
4261 cal Postgres node managed with pg_autoctl.
4262
4263 --monitor
4264 Postgres URI used to connect to the monitor. Must use the au‐
4265 toctl_node username and target the pg_auto_failover database
4266 name. It is possible to show the Postgres URI from the monitor
4267 node using the command pg_autoctl show uri.
4268
4269 --formation
4270 Name of the formation to drop from the monitor.
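
       For instance, a hedged sketch of dropping a formation named sandbox (a
       placeholder name) from the monitor:

          $ pg_autoctl drop formation \
               --monitor "postgres://autoctl_node@monitor.local:5432/pg_auto_failover" \
               --formation sandbox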
4271
4272 Environment
4273 PGDATA
4274 Postgres directory location. Can be used instead of the --pgdata op‐
4275 tion.
4276
4277 PG_AUTOCTL_MONITOR
4278 Postgres URI to connect to the monitor node, can be used instead of
4279 the --monitor option.
4280
4281 XDG_CONFIG_HOME
4282 The pg_autoctl command stores its configuration files in the stan‐
4283 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
4284 tion.
4285
4286 XDG_DATA_HOME
4287 The pg_autoctl command stores its internal states files in the stan‐
4288 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
4289 XDG Base Directory Specification.
4290
4291 pg_autoctl config
4292 pg_autoctl config - Manages the pg_autoctl configuration
4293
4294 pg_autoctl config get
4295 pg_autoctl config get - Get the value of a given pg_autoctl configura‐
4296 tion variable
4297
4298 Synopsis
4299 This command prints a pg_autoctl configuration setting:
4300
4301 usage: pg_autoctl config get [ --pgdata ] [ --json ] [ section.option ]
4302
4303 --pgdata path to data directory
4304
4305 Options
4306 --pgdata
4307 Location of the Postgres node being managed locally. Defaults to
4308 the environment variable PGDATA. Use --monitor to connect to a
4309 monitor from anywhere, rather than the monitor URI used by a lo‐
4310 cal Postgres node managed with pg_autoctl.
4311
4312 --json Output JSON formatted data.
4313
4314 Environment
4315 PGDATA
4316 Postgres directory location. Can be used instead of the --pgdata op‐
4317 tion.
4318
4319 PG_AUTOCTL_MONITOR
4320 Postgres URI to connect to the monitor node, can be used instead of
4321 the --monitor option.
4322
4323 XDG_CONFIG_HOME
4324 The pg_autoctl command stores its configuration files in the stan‐
4325 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
4326 tion.
4327
4328 XDG_DATA_HOME
4329 The pg_autoctl command stores its internal states files in the stan‐
4330 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
4331 XDG Base Directory Specification.
4332
4333 Description
4334   When the argument section.option is used, it is the name of a configu‐
4335   ration option. The configuration file for pg_autoctl is stored using
4336   the INI format.
4337
4338   When no argument is given to pg_autoctl config get, the entire configu‐
4339   ration file is printed. To figure out where the configura‐
4340 tion file is stored, see pg_autoctl show file and use pg_autoctl show
4341 file --config.
4342
4343 Examples
4344 Without arguments, we get the entire file:
4345
4346 $ pg_autoctl config get --pgdata node1
4347 [pg_autoctl]
4348 role = keeper
4349 monitor = postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer
4350 formation = default
4351 group = 0
4352 name = node1
4353 hostname = localhost
4354 nodekind = standalone
4355
4356 [postgresql]
4357 pgdata = /Users/dim/dev/MS/pg_auto_failover/tmux/node1
4358 pg_ctl = /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl
4359 dbname = demo
4360 host = /tmp
4361 port = 5501
4362 proxyport = 0
4363 listen_addresses = *
4364 auth_method = trust
4365 hba_level = app
4366
4367 [ssl]
4368 active = 1
4369 sslmode = require
4370 cert_file = /Users/dim/dev/MS/pg_auto_failover/tmux/node1/server.crt
4371 key_file = /Users/dim/dev/MS/pg_auto_failover/tmux/node1/server.key
4372
4373 [replication]
4374 maximum_backup_rate = 100M
4375 backup_directory = /Users/dim/dev/MS/pg_auto_failover/tmux/backup/node_1
4376
4377 [timeout]
4378 network_partition_timeout = 20
4379 prepare_promotion_catchup = 30
4380 prepare_promotion_walreceiver = 5
4381 postgresql_restart_failure_timeout = 20
4382 postgresql_restart_failure_max_retries = 3
4383
4384 It is possible to pipe JSON formatted output to the jq command line and
4385 filter the result down to a specific section of the file:
4386
4387 $ pg_autoctl config get --pgdata node1 --json | jq .pg_autoctl
4388 {
4389 "role": "keeper",
4390 "monitor": "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer",
4391 "formation": "default",
4392 "group": 0,
4393 "name": "node1",
4394 "hostname": "localhost",
4395 "nodekind": "standalone"
4396 }
4397
4398 Finally, a single configuration element can be listed:
4399
4400 $ pg_autoctl config get --pgdata node1 ssl.sslmode --json
4401 require
4402
4403 pg_autoctl config set
4404 pg_autoctl config set - Set the value of a given pg_autoctl configura‐
4405 tion variable
4406
4407 Synopsis
4408   This command sets a pg_autoctl configuration setting to a new value:
4409
4410 usage: pg_autoctl config set [ --pgdata ] [ --json ] section.option [ value ]
4411
4412 --pgdata path to data directory
4413
4414 Options
4415 --pgdata
4416 Location of the Postgres node being managed locally. Defaults to
4417 the environment variable PGDATA. Use --monitor to connect to a
4418 monitor from anywhere, rather than the monitor URI used by a lo‐
4419 cal Postgres node managed with pg_autoctl.
4420
4421 --json Output JSON formatted data.
4422
4423 Environment
4424 PGDATA
4425 Postgres directory location. Can be used instead of the --pgdata op‐
4426 tion.
4427
4428 PG_AUTOCTL_MONITOR
4429 Postgres URI to connect to the monitor node, can be used instead of
4430 the --monitor option.
4431
4432 XDG_CONFIG_HOME
4433 The pg_autoctl command stores its configuration files in the stan‐
4434 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
4435 tion.
4436
4437 XDG_DATA_HOME
4438 The pg_autoctl command stores its internal states files in the stan‐
4439 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
4440 XDG Base Directory Specification.
4441
4442 Description
4443   This command allows setting a pg_autoctl configuration setting to a new
4444   value. Most settings can be changed and reloaded online.
4445
4446   Some of those changes can then be applied with a pg_autoctl reload
4447   command to an already running process.
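
       For instance, a hedged sketch of lowering the pg_basebackup bandwidth
       limit (see replication.maximum_backup_rate below) and then applying
       the change to the running process with a reload:

          $ pg_autoctl config set --pgdata node1 replication.maximum_backup_rate 50M
          $ pg_autoctl reload --pgdata node1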
4448
4449 Settings
4450 pg_autoctl.role
4451 This setting can not be changed. It can be either monitor or keeper
4452 and the rest of the configuration file is read depending on this
4453 value.
4454
4455 pg_autoctl.monitor
4456 URI of the pg_autoctl monitor Postgres service. Can be changed with
4457 a reload.
4458
4459 To register an existing node to a new monitor, use pg_autoctl dis‐
4460 able monitor and then pg_autoctl enable monitor.
4461
4462 pg_autoctl.formation
4463 Formation to which this node has been registered. Changing this set‐
4464 ting is not supported.
4465
4466 pg_autoctl.group
4467 Group in which this node has been registered. Changing this setting
4468 is not supported.
4469
4470 pg_autoctl.name
4471 Name of the node as known to the monitor and listed in pg_autoctl
4472 show state. Can be changed with a reload.
4473
4474 pg_autoctl.hostname
4475 Hostname or IP address of the node, as known to the monitor. Can be
4476 changed with a reload.
4477
4478 pg_autoctl.nodekind
4479 This setting can not be changed and depends on the command that has
4480 been used to create this pg_autoctl node.
4481
4482 postgresql.pgdata
4483 Directory where the managed Postgres instance is to be created (or
4484 found) and managed. Can't be changed.
4485
4486 postgresql.pg_ctl
4487     Path to the pg_ctl tool used to manage this Postgres instance. The
4488     absolute path depends on the major version of Postgres and looks like
4489     /usr/lib/postgresql/13/bin/pg_ctl when using a Debian or Ubuntu OS.
4490
4491 Can be changed after a major upgrade of Postgres.
4492
4493 postgresql.dbname
4494 Name of the database that is used to connect to Postgres. Can be
4495 changed, but then must be changed manually on the monitor's pgauto‐
4496 failover.formation table with a SQL command.
4497
4498 WARNING:
4499 When using pg_auto_failover enterprise edition with Citus sup‐
4500 port, this is the database where pg_autoctl maintains the list
4501 of Citus nodes on the coordinator. Using the same database name
4502 as your application that uses Citus is then crucial.
4503
4504 postgresql.host
4505 Hostname to use in connection strings when connecting from the local
4506 pg_autoctl process to the local Postgres database. Defaults to using
4507 the Operating System default value for the Unix Domain Socket direc‐
4508     tory, either /tmp or, on Debian or Ubuntu, /var/run/post‐
4509     gresql.
4510
4511 Can be changed with a reload.
4512
4513 postgresql.port
4514 Port on which Postgres should be managed. Can be changed offline,
4515 between a pg_autoctl stop and a subsequent pg_autoctl start.
4516
4517 postgresql.listen_addresses
4518     Value to set for the Postgres parameter of the same name. At the moment
4519 pg_autoctl only supports a single address for this parameter.
4520
4521 postgresql.auth_method
4522 Authentication method to use when editing HBA rules to allow the
4523 Postgres nodes of a formation to connect to each other, and to the
4524 monitor, and to allow the monitor to connect to the nodes.
4525
4526 Can be changed online with a reload, but actually adding new HBA
4527 rules requires a restart of the "node-active" service.
4528
4529 postgresql.hba_level
4530 This setting reflects the choice of --skip-pg-hba or --pg-hba-lan
4531 that has been used when creating this pg_autoctl node. Can be
4532 changed with a reload, though the HBA rules that have been previ‐
4533 ously added will not get removed.
4534
4535 ssl.active, ssl.sslmode, ssl.cert_file, ssl.key_file, etc
4536 Please use the command pg_autoctl enable ssl or pg_autoctl disable
4537 ssl to manage the SSL settings in the ssl section of the configura‐
4538 tion. Using those commands, the settings can be changed online.
4539
4540 replication.maximum_backup_rate
4541 Used as a parameter to pg_basebackup, defaults to 100M. Can be
4542 changed with a reload. Changing this value does not affect an al‐
4543 ready running pg_basebackup command.
4544
4545 Limiting the bandwidth used by pg_basebackup makes the operation
4546 slower, and still has the advantage of limiting the impact on the
4547 disks of the primary server.
4548
4549 replication.backup_directory
4550 Target location of the pg_basebackup command used by pg_autoctl when
4551 creating a secondary node. When done with fetching the data over the
4552 network, then pg_autoctl uses the rename(2) system-call to rename
4553 the temporary download location to the target PGDATA location.
4554
4555 The rename(2) system-call is known to be atomic when both the source
4556 and the target of the operation are using the same file system /
4557 mount point.
4558
4559 Can be changed online with a reload, will not affect already running
4560 pg_basebackup sub-processes.
4561
4562 replication.password
4563 Used as a parameter in the connection string to the upstream Post‐
4564     gres node. The "replication" connection uses the password set up in
4565 the pg_autoctl configuration file.
4566
4567 Changing the replication.password of a pg_autoctl configuration has
4568 no effect on the Postgres database itself. The password must match
4569 what the Postgres upstream node expects, which can be set with the
4570 following SQL command run on the upstream server (primary or other
4571 standby node):
4572
4573 alter user pgautofailover_replicator password 'h4ckm3m0r3';
4574
4575 The replication.password can be changed online with a reload, but
4576 requires restarting the Postgres service to be activated. Postgres
4577 only reads the primary_conninfo connection string at start-up, up to
4578 and including Postgres 12. With Postgres 13 and following, it is
4579 possible to reload this Postgres parameter.
4580
4581 timeout.network_partition_timeout
4582 Timeout (in seconds) that pg_autoctl waits before deciding that it
4583 is on the losing side of a network partition. When pg_autoctl fails
4584 to connect to the monitor and when the local Postgres instance
4585 pg_stat_replication system view is empty, and after this many sec‐
4586 onds have passed, then pg_autoctl demotes itself.
4587
4588 Can be changed with a reload.
4589
4590 timeout.prepare_promotion_catchup
4591 Currently not used in the source code. Can be changed with a reload.
4592
4593 timeout.prepare_promotion_walreceiver
4594 Currently not used in the source code. Can be changed with a reload.
4595
4596 timeout.postgresql_restart_failure_timeout
4597 When pg_autoctl fails to start Postgres for at least this duration
4598 from the first attempt, then it starts reporting that Postgres is
4599 not running to the monitor, which might then decide to implement a
4600 failover.
4601
4602 Can be changed with a reload.
4603
4604 timeout.postgresql_restart_failure_max_retries
4605     When pg_autoctl fails to start Postgres at least this many times,
4606     then it starts reporting that Postgres is not running to the moni‐
4607     tor, which might then decide to implement a failover.
4608
4609 Can be changed with a reload.
4610
4611 pg_autoctl config check
4612 pg_autoctl config check - Check pg_autoctl configuration
4613
4614 Synopsis
4615 This command implements a very basic list of sanity checks for a pg_au‐
4616 toctl node setup:
4617
4618 usage: pg_autoctl config check [ --pgdata ] [ --json ]
4619
4620 --pgdata path to data directory
4621 --json output data in the JSON format
4622
4623 Options
4624 --pgdata
4625 Location of the Postgres node being managed locally. Defaults to
4626 the environment variable PGDATA. Use --monitor to connect to a
4627 monitor from anywhere, rather than the monitor URI used by a lo‐
4628 cal Postgres node managed with pg_autoctl.
4629
4630 --json Output JSON formatted data.
4631
4632 Environment
4633 PGDATA
4634 Postgres directory location. Can be used instead of the --pgdata op‐
4635 tion.
4636
4637 PG_AUTOCTL_MONITOR
4638 Postgres URI to connect to the monitor node, can be used instead of
4639 the --monitor option.
4640
4641 XDG_CONFIG_HOME
4642 The pg_autoctl command stores its configuration files in the stan‐
4643 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
4644 tion.
4645
4646 XDG_DATA_HOME
4647 The pg_autoctl command stores its internal states files in the stan‐
4648 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
4649 XDG Base Directory Specification.
4650
4651 Examples
4652 $ pg_autoctl config check --pgdata node1
4653 18:37:27 63749 INFO Postgres setup for PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" is ok, running with PID 5501 and port 99698
4654 18:37:27 63749 INFO Connection to local Postgres ok, using "port=5501 dbname=demo host=/tmp"
4655 18:37:27 63749 INFO Postgres configuration settings required for pg_auto_failover are ok
4656 18:37:27 63749 WARN Postgres 12.1 does not support replication slots on a standby node
4657 18:37:27 63749 INFO Connection to monitor ok, using "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer"
4658 18:37:27 63749 INFO Monitor is running version "1.5.0.1", as expected
4659 pgdata: /Users/dim/dev/MS/pg_auto_failover/tmux/node1
4660 pg_ctl: /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl
4661 pg_version: 12.3
4662 pghost: /tmp
4663 pgport: 5501
4664 proxyport: 0
4665 pid: 99698
4666 is in recovery: no
4667 Control Version: 1201
4668 Catalog Version: 201909212
4669 System Identifier: 6941034382470571312
4670 Latest checkpoint LSN: 0/6000098
4671 Postmaster status: ready
4672
4673 pg_autoctl show
4674 pg_autoctl show - Show pg_auto_failover information
4675
4676 pg_autoctl show uri
4677 pg_autoctl show uri - Show the postgres uri to use to connect to
4678 pg_auto_failover nodes
4679
4680 Synopsis
4681 This command outputs the monitor or the coordinator Postgres URI to use
4682 from an application to connect to Postgres:
4683
4684 usage: pg_autoctl show uri [ --pgdata --monitor --formation --json ]
4685
4686 --pgdata path to data directory
4687 --monitor monitor uri
4688 --formation show the coordinator uri of given formation
4689 --json output data in the JSON format
4690
4691 Options
4692 --pgdata
4693 Location of the Postgres node being managed locally. Defaults to
4694 the environment variable PGDATA. Use --monitor to connect to a
4695 monitor from anywhere, rather than the monitor URI used by a lo‐
4696 cal Postgres node managed with pg_autoctl.
4697
4698 --monitor
4699 Postgres URI used to connect to the monitor. Must use the au‐
4700 toctl_node username and target the pg_auto_failover database
4701 name. It is possible to show the Postgres URI from the monitor
4702 node using the command pg_autoctl show uri.
4703
4704 Defaults to the value of the environment variable PG_AU‐
4705 TOCTL_MONITOR.
4706
4707 --formation
4708          When --formation is used, shows the Postgres URI of the given
4709          formation, as registered on the monitor.
4710
4711   --json Output JSON formatted data instead of a table formatted list.
4712
4713 Environment
4714 PGDATA
4715 Postgres directory location. Can be used instead of the --pgdata op‐
4716 tion.
4717
4718 PG_AUTOCTL_MONITOR
4719 Postgres URI to connect to the monitor node, can be used instead of
4720 the --monitor option.
4721
4722 XDG_CONFIG_HOME
4723 The pg_autoctl command stores its configuration files in the stan‐
4724 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
4725 tion.
4726
4727 XDG_DATA_HOME
4728 The pg_autoctl command stores its internal states files in the stan‐
4729 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
4730 XDG Base Directory Specification.
4731
4732 Examples
4733 $ pg_autoctl show uri
4734 Type | Name | Connection String
4735 -------------+---------+-------------------------------
4736 monitor | monitor | postgres://autoctl_node@localhost:5500/pg_auto_failover
4737 formation | default | postgres://localhost:5502,localhost:5503,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer
4738
4739 $ pg_autoctl show uri --formation monitor
4740 postgres://autoctl_node@localhost:5500/pg_auto_failover
4741
4742 $ pg_autoctl show uri --formation default
4743 postgres://localhost:5503,localhost:5502,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer
4744
4745 $ pg_autoctl show uri --json
4746 [
4747 {
4748 "uri": "postgres://autoctl_node@localhost:5500/pg_auto_failover",
4749 "name": "monitor",
4750 "type": "monitor"
4751 },
4752 {
4753 "uri": "postgres://localhost:5503,localhost:5502,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer",
4754 "name": "default",
4755 "type": "formation"
4756 }
4757 ]
4758
4759 Multi-hosts Postgres connection strings
4760 PostgreSQL since version 10 includes support for multiple hosts in its
4761 connection driver libpq, with the special target_session_attrs connec‐
4762 tion property.
4763
4764 This multi-hosts connection string facility allows applications to keep
4765 using the same stable connection string over server-side failovers.
4766 That's why pg_autoctl show uri uses that format.
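
       As an illustration, the formation URI shown by pg_autoctl show uri can
       be passed as-is to any libpq client; with target_session_attrs set to
       read-write, libpq tries the listed hosts and settles on the current
       primary. A hedged psql example using the local ports from the examples
       above:

          $ psql "postgres://localhost:5502,localhost:5503,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer" \
                 -c 'SELECT pg_is_in_recovery()'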
4767
4768 pg_autoctl show events
4769   pg_autoctl show events - Prints the last events registered on the
4770   monitor for a given formation and group
4771
4772 Synopsis
4773   This command outputs the events that the pg_auto_failover monitor
4774   records about state changes of the pg_auto_failover nodes that it
4775   manages:
4776
4777 usage: pg_autoctl show events [ --pgdata --formation --group --count ]
4778
4779 --pgdata path to data directory
4780 --monitor pg_auto_failover Monitor Postgres URL
4781 --formation formation to query, defaults to 'default'
4782 --group group to query formation, defaults to all
4783 --count how many events to fetch, defaults to 10
4784 --watch display an auto-updating dashboard
4785 --json output data in the JSON format
4786
4787 Options
4788 --pgdata
4789 Location of the Postgres node being managed locally. Defaults to
4790 the environment variable PGDATA. Use --monitor to connect to a
4791 monitor from anywhere, rather than the monitor URI used by a lo‐
4792 cal Postgres node managed with pg_autoctl.
4793
4794 --monitor
4795 Postgres URI used to connect to the monitor. Must use the au‐
4796 toctl_node username and target the pg_auto_failover database
4797 name. It is possible to show the Postgres URI from the monitor
4798 node using the command pg_autoctl show uri.
4799
4800 --formation
4801 List the events recorded for nodes in the given formation. De‐
4802 faults to default.
4803
4804 --count
4805 By default only the last 10 events are printed.
4806
4807 --watch
4808 Take control of the terminal and display the current state of
4809 the system and the last events from the monitor. The display is
4810 updated automatically every 500 milliseconds (half a second) and
4811 reacts properly to window size change.
4812
4813 Depending on the terminal window size, a different set of col‐
4814 umns is visible in the state part of the output. See pg_autoctl
4815 watch.
4816
4817   --json Output JSON formatted data instead of a table formatted list.
4818
4819 Environment
4820 PGDATA
4821 Postgres directory location. Can be used instead of the --pgdata op‐
4822 tion.
4823
4824 PG_AUTOCTL_MONITOR
4825 Postgres URI to connect to the monitor node, can be used instead of
4826 the --monitor option.
4827
4828 XDG_CONFIG_HOME
4829 The pg_autoctl command stores its configuration files in the stan‐
4830 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
4831 tion.
4832
4833 XDG_DATA_HOME
4834 The pg_autoctl command stores its internal states files in the stan‐
4835 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
4836 XDG Base Directory Specification.
4837
4838 Examples
4839 $ pg_autoctl show events --count 2 --json
4840 [
4841 {
4842 "nodeid": 1,
4843 "eventid": 15,
4844 "groupid": 0,
4845 "nodehost": "localhost",
4846 "nodename": "node1",
4847 "nodeport": 5501,
4848 "eventtime": "2021-03-18T12:32:36.103467+01:00",
4849 "goalstate": "primary",
4850 "description": "Setting goal state of node 1 \"node1\" (localhost:5501) to primary now that at least one secondary candidate node is healthy.",
4851 "formationid": "default",
4852 "reportedlsn": "0/4000060",
4853 "reportedstate": "wait_primary",
4854 "reportedrepstate": "async",
4855 "candidatepriority": 50,
4856 "replicationquorum": true
4857 },
4858 {
4859 "nodeid": 1,
4860 "eventid": 16,
4861 "groupid": 0,
4862 "nodehost": "localhost",
4863 "nodename": "node1",
4864 "nodeport": 5501,
4865 "eventtime": "2021-03-18T12:32:36.215494+01:00",
4866 "goalstate": "primary",
4867 "description": "New state is reported by node 1 \"node1\" (localhost:5501): \"primary\"",
4868 "formationid": "default",
4869 "reportedlsn": "0/4000110",
4870 "reportedstate": "primary",
4871 "reportedrepstate": "quorum",
4872 "candidatepriority": 50,
4873 "replicationquorum": true
4874 }
4875 ]
4876
4877 pg_autoctl show state
4878 pg_autoctl show state - Prints monitor's state of nodes in a given for‐
4879 mation and group
4880
4881 Synopsis
4882 This command outputs the current state of the formation and groups reg‐
4883 istered to the pg_auto_failover monitor:
4884
4885 usage: pg_autoctl show state [ --pgdata --formation --group ]
4886
4887 --pgdata path to data directory
4888 --monitor pg_auto_failover Monitor Postgres URL
4889 --formation formation to query, defaults to 'default'
4890 --group group to query formation, defaults to all
4891 --local show local data, do not connect to the monitor
4892 --watch display an auto-updating dashboard
4893 --json output data in the JSON format
4894
4895 Options
4896 --pgdata
4897 Location of the Postgres node being managed locally. Defaults to
4898 the environment variable PGDATA. Use --monitor to connect to a
4899 monitor from anywhere, rather than the monitor URI used by a lo‐
4900 cal Postgres node managed with pg_autoctl.
4901
4902 --monitor
4903 Postgres URI used to connect to the monitor. Must use the au‐
4904 toctl_node username and target the pg_auto_failover database
4905 name. It is possible to show the Postgres URI from the monitor
4906 node using the command pg_autoctl show uri.
4907
4908 --formation
4909          List the state of the nodes in the given formation. Defaults to
4910          default.
4911
4912 --group
4913 Limit output to a single group in the formation. Default to in‐
4914 cluding all groups registered in the target formation.
4915
4916 --local
4917 Print the local state information without connecting to the mon‐
4918 itor.
4919
4920 --watch
4921 Take control of the terminal and display the current state of
4922 the system and the last events from the monitor. The display is
4923 updated automatically every 500 milliseconds (half a second) and
4924 reacts properly to window size change.
4925
4926 Depending on the terminal window size, a different set of col‐
4927 umns is visible in the state part of the output. See pg_autoctl
4928 watch.
4929
4930   --json Output JSON formatted data instead of a table formatted list.
4931
4932 Environment
4933 PGDATA
4934 Postgres directory location. Can be used instead of the --pgdata op‐
4935 tion.
4936
4937 PG_AUTOCTL_MONITOR
4938 Postgres URI to connect to the monitor node, can be used instead of
4939 the --monitor option.
4940
4941 XDG_CONFIG_HOME
4942 The pg_autoctl command stores its configuration files in the stan‐
4943 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
4944 tion.
4945
4946 XDG_DATA_HOME
4947 The pg_autoctl command stores its internal states files in the stan‐
4948 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
4949 XDG Base Directory Specification.
4950
4951 Description
4952 The pg_autoctl show state output includes the following columns:
4953
4954 • Name
4955 Name of the node.
4956
4957 • Node
4958 Node information. When the formation has a single group (group
4959 zero), then this column only contains the nodeId.
4960
4961 Only Citus formations allow several groups. When using a Citus
4962 formation the Node column contains the groupId and the nodeId,
4963 separated by a colon, such as 0:1 for the first coordinator
4964 node.
4965
4966 • Host:Port
4967 Hostname and port number used to connect to the node.
4968
4969 • TLI: LSN
4970 Timeline identifier (TLI) and Postgres Log Sequence Number
4971 (LSN).
4972
4973 The LSN is the current position in the Postgres WAL stream.
4974 This is a hexadecimal number. See pg_lsn for more information.
4975
4976 The current timeline is incremented each time a failover hap‐
4977 pens, or when doing Point In Time Recovery. A node can only
4978 reach the secondary state when it is on the same timeline as
4979 its primary node.
4980
4981 • Connection
4982 This output field contains two bits of information. First, the
4983 Postgres connection type that the node provides, either
4984 read-write or read-only. Then the mark ! is added when the
4985 monitor has failed to connect to this node, and ? when the
4986 monitor didn't connect to the node yet.
4987
4988 • Reported State
4989 The latest reported FSM state, as reported to the monitor by
4990 the pg_autoctl process running on the Postgres node.
4991
4992 • Assigned State
4993 The assigned FSM state on the monitor. When the assigned state
4994     is not the same as the reported state, then the pg_autoctl
4995 process running on the Postgres node might have not retrieved
4996 the assigned state yet, or might still be implementing the FSM
4997 transition from the current state to the assigned state.
4998
4999 Examples
5000 $ pg_autoctl show state
5001 Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
5002 ------+-------+----------------+----------------+--------------+---------------------+--------------------
5003 node1 | 1 | localhost:5501 | 1: 0/4000678 | read-write | primary | primary
5004 node2 | 2 | localhost:5502 | 1: 0/4000678 | read-only | secondary | secondary
5005 node3 | 3 | localhost:5503 | 1: 0/4000678 | read-only | secondary | secondary
5006
5007 $ pg_autoctl show state --local
5008 Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
5009 ------+-------+----------------+----------------+--------------+---------------------+--------------------
5010 node1 | 1 | localhost:5501 | 1: 0/4000678 | read-write ? | primary | primary
5011
5012 $ pg_autoctl show state --json
5013 [
5014 {
5015 "health": 1,
5016 "node_id": 1,
5017 "group_id": 0,
5018 "nodehost": "localhost",
5019 "nodename": "node1",
5020 "nodeport": 5501,
5021 "reported_lsn": "0/4000678",
5022 "reported_tli": 1,
5023 "formation_kind": "pgsql",
5024 "candidate_priority": 50,
5025 "replication_quorum": true,
5026 "current_group_state": "primary",
5027 "assigned_group_state": "primary"
5028 },
5029 {
5030 "health": 1,
5031 "node_id": 2,
5032 "group_id": 0,
5033 "nodehost": "localhost",
5034 "nodename": "node2",
5035 "nodeport": 5502,
5036 "reported_lsn": "0/4000678",
5037 "reported_tli": 1,
5038 "formation_kind": "pgsql",
5039 "candidate_priority": 50,
5040 "replication_quorum": true,
5041 "current_group_state": "secondary",
5042 "assigned_group_state": "secondary"
5043 },
5044 {
5045 "health": 1,
5046 "node_id": 3,
5047 "group_id": 0,
5048 "nodehost": "localhost",
5049 "nodename": "node3",
5050 "nodeport": 5503,
5051 "reported_lsn": "0/4000678",
5052 "reported_tli": 1,
5053 "formation_kind": "pgsql",
5054 "candidate_priority": 50,
5055 "replication_quorum": true,
5056 "current_group_state": "secondary",
5057 "assigned_group_state": "secondary"
5058 }
5059 ]
5060
5061 pg_autoctl show settings
5062 pg_autoctl show settings - Print replication settings for a formation
5063 from the monitor
5064
5065 Synopsis
5066   This command allows reviewing all the replication settings of a given
5067 formation (defaults to 'default' as usual):
5068
5069 usage: pg_autoctl show settings [ --pgdata ] [ --json ] [ --formation ]
5070
5071 --pgdata path to data directory
5072 --monitor pg_auto_failover Monitor Postgres URL
5073 --json output data in the JSON format
5074 --formation pg_auto_failover formation
5075
5076 Description
5077 See also pg_autoctl get formation settings which is a synonym.
5078
5079   The output contains settings and values that apply in different con‐
5080 texts, as shown here with a formation of four nodes, where node_4 is
5081 not participating in the replication quorum and also not a candidate
5082 for failover:
5083
5084 $ pg_autoctl show settings
5085 Context | Name | Setting | Value
5086 ----------+---------+---------------------------+-------------------------------------------------------------
5087 formation | default | number_sync_standbys | 1
5088 primary | node_1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_3, pgautofailover_standby_2)'
5089 node | node_1 | replication quorum | true
5090 node | node_2 | replication quorum | true
5091 node | node_3 | replication quorum | true
5092 node | node_4 | replication quorum | false
5093 node | node_1 | candidate priority | 50
5094 node | node_2 | candidate priority | 50
5095 node | node_3 | candidate priority | 50
5096 node | node_4 | candidate priority | 0
5097
5098   Three replication settings contexts are listed:
5099
5100 1. The "formation" context contains a single entry, the value of
5101 number_sync_standbys for the target formation.
5102
5103 2. The "primary" context contains one entry per group of Postgres
5104 nodes in the formation, and shows the current value of the syn‐
5105 chronous_standby_names Postgres setting as computed by the moni‐
5106 tor. It should match what's currently set on the primary node un‐
5107     tor. It should match what's currently set on the primary node, ex‐
5108     cept while a change is being applied, as shown by the primary being
5109     in the APPLY_SETTING state.
5110  3. The "node" context contains two entries per node: one line shows
5111     the replication quorum setting of the node, and another line shows
5112     its candidate priority.
5113
5114 This command gives an overview of all the settings that apply to the
5115 current formation.
5116
5117 Options
5118 --pgdata
5119 Location of the Postgres node being managed locally. Defaults to
5120 the environment variable PGDATA. Use --monitor to connect to a
5121 monitor from anywhere, rather than the monitor URI used by a lo‐
5122 cal Postgres node managed with pg_autoctl.
5123
5124 --monitor
5125 Postgres URI used to connect to the monitor. Must use the au‐
5126 toctl_node username and target the pg_auto_failover database
5127 name. It is possible to show the Postgres URI from the monitor
5128 node using the command pg_autoctl show uri.
5129
5130 Defaults to the value of the environment variable PG_AU‐
5131 TOCTL_MONITOR.
5132
5133 --formation
5134 Show the current replication settings for the given formation.
5135 Defaults to the default formation.
5136
5137   --json Output JSON formatted data instead of a table formatted list.
5138
5139 Environment
5140 PGDATA
5141 Postgres directory location. Can be used instead of the --pgdata op‐
5142 tion.
5143
5144 PG_AUTOCTL_MONITOR
5145 Postgres URI to connect to the monitor node, can be used instead of
5146 the --monitor option.
5147
5148 XDG_CONFIG_HOME
5149 The pg_autoctl command stores its configuration files in the stan‐
5150 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
5151 tion.
5152
5153 XDG_DATA_HOME
5154 The pg_autoctl command stores its internal states files in the stan‐
5155 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
5156 XDG Base Directory Specification.
5157
5158 Examples
5159 $ pg_autoctl show settings
5160 Context | Name | Setting | Value
5161 ----------+---------+---------------------------+-------------------------------------------------------------
5162 formation | default | number_sync_standbys | 1
5163 primary | node1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
5164 node | node1 | candidate priority | 50
5165 node | node2 | candidate priority | 50
5166 node | node3 | candidate priority | 50
5167 node | node1 | replication quorum | true
5168 node | node2 | replication quorum | true
5169 node | node3 | replication quorum | true
5170
5171 pg_autoctl show standby-names
5172 pg_autoctl show standby-names - Prints synchronous_standby_names for a
5173 given group
5174
5175 Synopsis
5176 This command prints the current value for synchronous_standby_names for
5177 the primary Postgres server of the target group (default 0) in the tar‐
5178 get formation (default default), as computed by the monitor:
5179
5180 usage: pg_autoctl show standby-names [ --pgdata ] --formation --group
5181
5182 --pgdata path to data directory
5183 --monitor pg_auto_failover Monitor Postgres URL
5184 --formation formation to query, defaults to 'default'
5185 --group group to query formation, defaults to all
5186 --json output data in the JSON format
5187
5188 Options
5189 --pgdata
5190 Location of the Postgres node being managed locally. Defaults to
5191 the environment variable PGDATA. Use --monitor to connect to a
5192 monitor from anywhere, rather than the monitor URI used by a lo‐
5193 cal Postgres node managed with pg_autoctl.
5194
5195 --monitor
5196 Postgres URI used to connect to the monitor. Must use the au‐
5197 toctl_node username and target the pg_auto_failover database
5198 name. It is possible to show the Postgres URI from the monitor
5199 node using the command pg_autoctl show uri.
5200
5201 Defaults to the value of the environment variable PG_AU‐
5202 TOCTL_MONITOR.
5203
5204 --formation
5205 Show the current synchronous_standby_names value for the given
5206 formation. Defaults to the default formation.
5207
5208 --group
5209 Show the current synchronous_standby_names value for the given
5210 group in the given formation. Defaults to group 0.
5211
5212 --json Output JSON formatted data instead of a table formatted list.
5213
5214 Environment
5215 PGDATA
5216 Postgres directory location. Can be used instead of the --pgdata op‐
5217 tion.
5218
5219 PG_AUTOCTL_MONITOR
5220 Postgres URI to connect to the monitor node, can be used instead of
5221 the --monitor option.
5222
5223 XDG_CONFIG_HOME
5224 The pg_autoctl command stores its configuration files in the stan‐
5225 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
5226 tion.
5227
5228 XDG_DATA_HOME
5229 The pg_autoctl command stores its internal states files in the stan‐
5230 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
5231 XDG Base Directory Specification.
5232
5233 Examples
5234 $ pg_autoctl show standby-names
5235 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
5236
5237 $ pg_autoctl show standby-names --json
5238 {
5239 "formation": "default",
5240 "group": 0,
5241 "synchronous_standby_names": "ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)"
5242 }
5243
5244 pg_autoctl show file
5245 pg_autoctl show file - List pg_autoctl internal files (config, state,
5246 pid)
5247
5248 Synopsis
5249 This command lists the files that pg_autoctl uses internally for its
5250 own configuration, state, and pid:
5251
5252 usage: pg_autoctl show file [ --pgdata --all --config | --state | --init | --pid --contents ]
5253
5254 --pgdata path to data directory
5255 --all show all pg_autoctl files
5256 --config show pg_autoctl configuration file
5257 --state show pg_autoctl state file
5258 --init show pg_autoctl initialisation state file
5259 --pid show pg_autoctl PID file
5260 --contents show selected file contents
5261 --json output data in the JSON format
5262
5263 Description
5264 The pg_autoctl command follows the XDG Base Directory Specification and
5265 places its internal and configuration files by default in places such
5266 as ~/.config/pg_autoctl and ~/.local/share/pg_autoctl.
5267
5268 It is possible to change the default XDG locations by using the envi‐
5269 ronment variables XDG_CONFIG_HOME, XDG_DATA_HOME, and XDG_RUNTIME_DIR.
5270
5271 Also, pg_autoctl uses sub-directories that are specific to a given
5272 PGDATA, making it possible to run several Postgres nodes on the same
5273 machine, which is very practical for testing and development purposes,
5274 though not advised for production setups.
5275
5276 Configuration File
5277 The pg_autoctl configuration file for an instance serving the data di‐
5278 rectory at /data/pgsql is found at ~/.config/pg_au‐
5279 toctl/data/pgsql/pg_autoctl.cfg, written in the INI format.
5280
5281 It is possible to get the location of the configuration file by using
5282 the command pg_autoctl show file --config --pgdata /data/pgsql and to
5283 output its content by using the command pg_autoctl show file --config
5284 --contents --pgdata /data/pgsql.
5285
5286 See also pg_autoctl config get and pg_autoctl config set.
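
 For illustration, here is a minimal sketch of reading the
 configuration with the companion command mentioned above; the
 /data/pgsql path is a placeholder, and the pg_autoctl.monitor key
 follows the section.option naming visible in the JSON configuration
 sample shown in the Examples below:

     # print the whole configuration file
     $ pg_autoctl config get --pgdata /data/pgsql

     # print a single option, addressed as section.option
     $ pg_autoctl config get --pgdata /data/pgsql pg_autoctl.monitor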
5287
5288 State File
5289 The pg_autoctl state file for an instance serving the data directory at
5290 /data/pgsql is found at ~/.local/share/pg_autoctl/data/pgsql/pg_au‐
5291 toctl.state, written in a specific binary format.
5292
5293 This file is not intended to be written by anything other than
5294 pg_autoctl itself. In case of state corruption, see the troubleshooting
5295 section of the documentation.
5296
5297 It is possible to get the location of the state file by using the com‐
5298 mand pg_autoctl show file --state --pgdata /data/pgsql and to output
5299 its content by using the command pg_autoctl show file --state --con‐
5300 tents --pgdata /data/pgsql.
5301
5302 Init State File
5303 The pg_autoctl init state file for an instance serving the data direc‐
5304 tory at /data/pgsql is found at ~/.local/share/pg_au‐
5305 toctl/data/pgsql/pg_autoctl.init, written in a specific binary format.
5306
5307 This file is not intended to be written by anything other than
5308 pg_autoctl itself. In case of state corruption, see the troubleshooting
5309 section of the documentation.
5310
5311 This initialization state file only exists during the initialization of
5312 a pg_auto_failover node. In normal operations, this file does not ex‐
5313 ist.
5314
5315 It is possible to get the location of the init state file by using the
5316 command pg_autoctl show file --init --pgdata /data/pgsql and to output its
5317 content by using the command pg_autoctl show file --init --contents
5318 --pgdata /data/pgsql.
5319
5320 PID File
5321 The pg_autoctl PID file for an instance serving the data directory at
5322 /data/pgsql is found at /tmp/pg_autoctl/data/pgsql/pg_autoctl.pid,
5323 written in a specific text format.
5324
5325 The PID file is located in a temporary directory by default, or in the
5326 XDG_RUNTIME_DIR directory when this environment variable is set.
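
 As a sketch, the PID file location and contents can be inspected with
 the same options as the other files; the /data/pgsql path is only a
 placeholder:

     # print the PID file path for this node
     $ pg_autoctl show file --pid --pgdata /data/pgsql

     # print the PID file contents instead of its path
     $ pg_autoctl show file --pid --contents --pgdata /data/pgsql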
5327
5328 Options
5329 --pgdata
5330 Location of the Postgres node being managed locally. Defaults to
5331 the environment variable PGDATA. Use --monitor to connect to a
5332 monitor from anywhere, rather than the monitor URI used by a lo‐
5333 cal Postgres node managed with pg_autoctl.
5334
5335 --all List all the files that belong to this pg_autoctl node.
5336
5337 --config
5338 Show only the configuration file.
5339
5340 --state
5341 Show only the state file.
5342
5343 --init Show only the init state file, which only exists while the
5344 command pg_autoctl create postgres or the command pg_autoctl
5345 create monitor is running, or when that command has failed (and
5346 can then be retried).
5347
5348 --pid Show only the pid file.
5349
5350 --contents
5351 When one of the options to show a specific file is in use, then
5352 --contents shows the contents of the selected file instead of
5353 showing its absolute file path.
5354
5355 --json Output JSON formatted data.
5356
5357 Environment
5358 PGDATA
5359 Postgres directory location. Can be used instead of the --pgdata op‐
5360 tion.
5361
5362 PG_AUTOCTL_MONITOR
5363 Postgres URI to connect to the monitor node, can be used instead of
5364 the --monitor option.
5365
5366 XDG_CONFIG_HOME
5367 The pg_autoctl command stores its configuration files in the stan‐
5368 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
5369 tion.
5370
5371 XDG_DATA_HOME
5372 The pg_autoctl command stores its internal states files in the stan‐
5373 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
5374 XDG Base Directory Specification.
5375
5376 Examples
5377 The following examples are taken from a QA environment that has been
5378 prepared thanks to the make cluster command made available to the
5379 pg_auto_failover contributors. As a result, the XDG environment vari‐
5380 ables have been tweaked to obtain a self-contained test:
5381
5382 $ tmux show-env | grep XDG
5383 XDG_CONFIG_HOME=/Users/dim/dev/MS/pg_auto_failover/tmux/config
5384 XDG_DATA_HOME=/Users/dim/dev/MS/pg_auto_failover/tmux/share
5385 XDG_RUNTIME_DIR=/Users/dim/dev/MS/pg_auto_failover/tmux/run
5386
5387 Within that self-contained test location, we can see the following ex‐
5388 amples.
5389
5390 $ pg_autoctl show file --pgdata ./node1
5391 File | Path
5392 --------+----------------
5393 Config | /Users/dim/dev/MS/pg_auto_failover/tmux/config/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.cfg
5394 State | /Users/dim/dev/MS/pg_auto_failover/tmux/share/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.state
5395 Init | /Users/dim/dev/MS/pg_auto_failover/tmux/share/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.init
5396 Pid | /Users/dim/dev/MS/pg_auto_failover/tmux/run/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.pid
5398
5399 $ pg_autoctl show file --pgdata node1 --state
5400 /Users/dim/dev/MS/pg_auto_failover/tmux/share/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.state
5401
5402 $ pg_autoctl show file --pgdata node1 --state --contents
5403 Current Role: primary
5404 Assigned Role: primary
5405 Last Monitor Contact: Thu Mar 18 17:32:25 2021
5406 Last Secondary Contact: 0
5407 pg_autoctl state version: 1
5408 group: 0
5409 node id: 1
5410 nodes version: 0
5411 PostgreSQL Version: 1201
5412 PostgreSQL CatVersion: 201909212
5413 PostgreSQL System Id: 6940955496243696337
5414
5415 $ pg_autoctl show file --pgdata node1 --config --contents --json | jq .pg_autoctl
5416 {
5417 "role": "keeper",
5418 "monitor": "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer",
5419 "formation": "default",
5420 "group": 0,
5421 "name": "node1",
5422 "hostname": "localhost",
5423 "nodekind": "standalone"
5424 }
5425
5426 pg_autoctl show systemd
5427 pg_autoctl show systemd - Print systemd service file for this node
5428
5429 Synopsis
5430 This command outputs a configuration unit that is suitable for regis‐
5431 tering pg_autoctl as a systemd service.
5432
5433 Examples
5434 $ pg_autoctl show systemd --pgdata node1
5435 17:38:29 99778 INFO HINT: to complete a systemd integration, run the following commands:
5436 17:38:29 99778 INFO pg_autoctl -q show systemd --pgdata "node1" | sudo tee /etc/systemd/system/pgautofailover.service
5437 17:38:29 99778 INFO sudo systemctl daemon-reload
5438 17:38:29 99778 INFO sudo systemctl enable pgautofailover
5439 17:38:29 99778 INFO sudo systemctl start pgautofailover
5440 [Unit]
5441 Description = pg_auto_failover
5442
5443 [Service]
5444 WorkingDirectory = /Users/dim
5445 Environment = 'PGDATA=node1'
5446 User = dim
5447 ExecStart = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl run
5448 Restart = always
5449 StartLimitBurst = 0
5450 ExecReload = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl reload
5451
5452 [Install]
5453 WantedBy = multi-user.target
5454
5455 To avoid the log output, use the -q option:
5456
5457 $ pg_autoctl show systemd --pgdata node1 -q
5458 [Unit]
5459 Description = pg_auto_failover
5460
5461 [Service]
5462 WorkingDirectory = /Users/dim
5463 Environment = 'PGDATA=node1'
5464 User = dim
5465 ExecStart = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl run
5466 Restart = always
5467 StartLimitBurst = 0
5468 ExecReload = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl reload
5469
5470 [Install]
5471 WantedBy = multi-user.target
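
 Once the unit file has been installed with the commands from the hint
 above, the usual systemd tooling applies. This is a generic sketch
 that assumes the service was registered under the pgautofailover name
 used in the hint:

     $ sudo systemctl status pgautofailover
     $ sudo journalctl -u pgautofailover -f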
5472
5473 pg_autoctl enable
5474 pg_autoctl enable - Enable a feature on a formation
5475
5476 pg_autoctl enable secondary
5477 pg_autoctl enable secondary - Enable secondary nodes on a formation
5478
5479 Synopsis
5480 This feature makes the most sense when using the Enterprise Edition of
5481 pg_auto_failover, which is fully compatible with Citus formations. When
5482 secondary is enabled, the Citus worker creation policy is to assign a
5483 primary node and then a standby node for each group. When secondary is
5484 disabled, the Citus worker creation policy is to assign only the
5485 primary nodes.
5486
5487 usage: pg_autoctl enable secondary [ --pgdata --formation ]
5488
5489 --pgdata path to data directory
5490 --formation Formation to enable secondary on
5491
5492 Options
5493 --pgdata
5494 Location of the Postgres node being managed locally. Defaults to
5495 the environment variable PGDATA. Use --monitor to connect to a
5496 monitor from anywhere, rather than the monitor URI used by a lo‐
5497 cal Postgres node managed with pg_autoctl.
5498
5499 --formation
5500 Target formation in which to enable the secondary feature.
5501
5502 Environment
5503 PGDATA
5504 Postgres directory location. Can be used instead of the --pgdata op‐
5505 tion.
5506
5507 PG_AUTOCTL_MONITOR
5508 Postgres URI to connect to the monitor node, can be used instead of
5509 the --monitor option.
5510
5511 XDG_CONFIG_HOME
5512 The pg_autoctl command stores its configuration files in the stan‐
5513 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
5514 tion.
5515
5516 XDG_DATA_HOME
5517 The pg_autoctl command stores its internal states files in the stan‐
5518 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
5519 XDG Base Directory Specification.
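
 As a minimal sketch, enabling the feature on the default formation
 could look like the following; the /data/pgsql path is only a
 placeholder for a local pg_autoctl node's data directory:

     $ pg_autoctl enable secondary --formation default --pgdata /data/pgsql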
5520
5521 pg_autoctl enable maintenance
5522 pg_autoctl enable maintenance - Enable Postgres maintenance mode on
5523 this node
5524
5525 Synopsis
5526 A pg_auto_failover node can be put into a maintenance state. The
5527 Postgres node is then still registered to the monitor, and is known to
5528 be unreliable until maintenance is disabled. A node in the maintenance
5529 state is not a candidate for promotion.
5530
5531 Typical uses of the maintenance state include Operating System or
5532 Postgres reboots, e.g. when applying security upgrades.
5533
5534 usage: pg_autoctl enable maintenance [ --pgdata --allow-failover ]
5535
5536 --pgdata path to data directory
5537
5538 Options
5539 --pgdata
5540 Location of the Postgres node being managed locally. Defaults to
5541 the environment variable PGDATA. Use --monitor to connect to a
5542 monitor from anywhere, rather than the monitor URI used by a lo‐
5543 cal Postgres node managed with pg_autoctl.
5544
5545 --allow-failover
5546 Allow the operation even when it requires a failover, e.g. when the target node is the current primary.
5547
5548 Environment
5549 PGDATA
5550 Postgres directory location. Can be used instead of the --pgdata op‐
5551 tion.
5552
5553 PG_AUTOCTL_MONITOR
5554 Postgres URI to connect to the monitor node, can be used instead of
5555 the --monitor option.
5556
5557 XDG_CONFIG_HOME
5558 The pg_autoctl command stores its configuration files in the stan‐
5559 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
5560 tion.
5561
5562 XDG_DATA_HOME
5563 The pg_autoctl command stores its internal states files in the stan‐
5564 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
5565 XDG Base Directory Specification.
5566
5567 Examples
5568 pg_autoctl show state
5569 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
5570 ------+-------+----------------+-----------+--------------+---------------------+--------------------
5571 node1 | 1 | localhost:5501 | 0/4000760 | read-write | primary | primary
5572 node2 | 2 | localhost:5502 | 0/4000760 | read-only | secondary | secondary
5573 node3 | 3 | localhost:5503 | 0/4000760 | read-only | secondary | secondary
5574
5575 $ pg_autoctl enable maintenance --pgdata node3
5576 12:06:12 47086 INFO Listening monitor notifications about state changes in formation "default" and group 0
5577 12:06:12 47086 INFO Following table displays times when notifications are received
5578 Time | Name | Node | Host:Port | Current State | Assigned State
5579 ---------+-------+-------+----------------+---------------------+--------------------
5580 12:06:12 | node1 | 1 | localhost:5501 | primary | join_primary
5581 12:06:12 | node3 | 3 | localhost:5503 | secondary | wait_maintenance
5582 12:06:12 | node3 | 3 | localhost:5503 | wait_maintenance | wait_maintenance
5583 12:06:12 | node1 | 1 | localhost:5501 | join_primary | join_primary
5584 12:06:12 | node3 | 3 | localhost:5503 | wait_maintenance | maintenance
5585 12:06:12 | node1 | 1 | localhost:5501 | join_primary | primary
5586 12:06:13 | node3 | 3 | localhost:5503 | maintenance | maintenance
5587
5588 $ pg_autoctl show state
5589 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
5590 ------+-------+----------------+-----------+--------------+---------------------+--------------------
5591 node1 | 1 | localhost:5501 | 0/4000810 | read-write | primary | primary
5592 node2 | 2 | localhost:5502 | 0/4000810 | read-only | secondary | secondary
5593 node3 | 3 | localhost:5503 | 0/4000810 | none | maintenance | maintenance
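
 Putting the current primary itself in maintenance first requires a
 failover to one of the secondary nodes, which is what the
 --allow-failover option is meant to acknowledge. As an illustrative
 sketch, reusing the node1 primary from the example above:

     $ pg_autoctl enable maintenance --pgdata node1 --allow-failover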
5594
5595 pg_autoctl enable ssl
5596 pg_autoctl enable ssl - Enable SSL configuration on this node
5597
5598 Synopsis
5599 It is possible to manage Postgres SSL settings with the pg_autoctl com‐
5600 mand, both at pg_autoctl create postgres time and then again to change
5601 your mind and update the SSL settings at run-time.
5602
5603 usage: pg_autoctl enable ssl [ --pgdata ] [ --json ]
5604
5605 --pgdata path to data directory
5606 --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
5607 --ssl-mode use that sslmode in connection strings
5608 --ssl-ca-file set the Postgres ssl_ca_file to that file path
5609 --ssl-crl-file set the Postgres ssl_crl_file to that file path
5610 --no-ssl don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
5611 --server-key set the Postgres ssl_key_file to that file path
5612 --server-cert set the Postgres ssl_cert_file to that file path
5613
5614 Options
5615 --pgdata
5616 Location of the Postgres node being managed locally. Defaults to
5617 the environment variable PGDATA. Use --monitor to connect to a
5618 monitor from anywhere, rather than the monitor URI used by a lo‐
5619 cal Postgres node managed with pg_autoctl.
5620
5621 --ssl-self-signed
5622 Generate SSL self-signed certificates to provide network encryp‐
5623 tion. This does not protect against man-in-the-middle kinds of
5624 attacks. See Security settings for pg_auto_failover for more
5625 about our SSL settings.
5626
5627 --ssl-mode
5628 SSL Mode used by pg_autoctl when connecting to other nodes, in‐
5629 cluding when connecting for streaming replication.
5630
5631 --ssl-ca-file
5632 Set the Postgres ssl_ca_file to that file path.
5633
5634 --ssl-crl-file
5635 Set the Postgres ssl_crl_file to that file path.
5636
5637 --no-ssl
5638 Don't enable network encryption. This is not recommended, prefer
5639 --ssl-self-signed.
5640
5641 --server-key
5642 Set the Postgres ssl_key_file to that file path.
5643
5644 --server-cert
5645 Set the Postgres ssl_cert_file to that file path.
5646
5647 Environment
5648 PGDATA
5649 Postgres directory location. Can be used instead of the --pgdata op‐
5650 tion.
5651
5652 PG_AUTOCTL_MONITOR
5653 Postgres URI to connect to the monitor node, can be used instead of
5654 the --monitor option.
5655
5656 XDG_CONFIG_HOME
5657 The pg_autoctl command stores its configuration files in the stan‐
5658 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
5659 tion.
5660
5661 XDG_DATA_HOME
5662 The pg_autoctl command stores its internal states files in the stan‐
5663 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
5664 XDG Base Directory Specification.
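
 As an illustrative sketch, switching a running node to self-signed
 certificates could be done as follows; the /data/pgsql path is a
 placeholder, and require is one of the standard Postgres sslmode
 values:

     $ pg_autoctl enable ssl --ssl-self-signed --ssl-mode require --pgdata /data/pgsql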
5665
5666 pg_autoctl enable monitor
5667 pg_autoctl enable monitor - Enable a monitor for this node to be or‐
5668 chestrated from
5669
5670 Synopsis
5671 It is possible to disable the pg_auto_failover monitor and enable it
5672 again online in a running pg_autoctl Postgres node. The main use case
5673 where this operation is useful is when the monitor node has to be
5674 replaced, either after a full crash of the previous monitor node, or
5675 for migrating to a new monitor node (hardware replacement, region or
5676 zone migration, etc).
5677
5678 usage: pg_autoctl enable monitor [ --pgdata --allow-failover ] postgres://autoctl_node@new.monitor.add.ress/pg_auto_failover
5679
5680 --pgdata path to data directory
5681
5682 Options
5683 --pgdata
5684 Location of the Postgres node being managed locally. Defaults to
5685 the environment variable PGDATA. Use --monitor to connect to a
5686 monitor from anywhere, rather than the monitor URI used by a lo‐
5687 cal Postgres node managed with pg_autoctl.
5688
5689 Environment
5690 PGDATA
5691 Postgres directory location. Can be used instead of the --pgdata op‐
5692 tion.
5693
5694 PG_AUTOCTL_MONITOR
5695 Postgres URI to connect to the monitor node, can be used instead of
5696 the --monitor option.
5697
5698 XDG_CONFIG_HOME
5699 The pg_autoctl command stores its configuration files in the stan‐
5700 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
5701 tion.
5702
5703 XDG_DATA_HOME
5704 The pg_autoctl command stores its internal states files in the stan‐
5705 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
5706 XDG Base Directory Specification.
5707
5708 Examples
5709 $ pg_autoctl show state
5710 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
5711 ------+-------+----------------+-----------+--------------+---------------------+--------------------
5712 node1 | 1 | localhost:5501 | 0/4000760 | read-write | primary | primary
5713 node2 | 2 | localhost:5502 | 0/4000760 | read-only | secondary | secondary
5714
5715
5716 $ pg_autoctl enable monitor --pgdata node3 'postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=require'
5717 12:42:07 43834 INFO Registered node 3 (localhost:5503) with name "node3" in formation "default", group 0, state "wait_standby"
5718 12:42:07 43834 INFO Successfully registered to the monitor with nodeId 3
5719 12:42:08 43834 INFO Still waiting for the monitor to drive us to state "catchingup"
5720 12:42:08 43834 WARN Please make sure that the primary node is currently running `pg_autoctl run` and contacting the monitor.
5721
5722 $ pg_autoctl show state
5723 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
5724 ------+-------+----------------+-----------+--------------+---------------------+--------------------
5725 node1 | 1 | localhost:5501 | 0/4000810 | read-write | primary | primary
5726 node2 | 2 | localhost:5502 | 0/4000810 | read-only | secondary | secondary
5727 node3 | 3 | localhost:5503 | 0/4000810 | read-only | secondary | secondary
5728
5729 pg_autoctl disable
5730 pg_autoctl disable - Disable a feature on a formation
5731
5732 pg_autoctl disable secondary
5733 pg_autoctl disable secondary - Disable secondary nodes on a formation
5734
5735 Synopsis
5736 This feature makes the most sense when using the Enterprise Edition of
5737 pg_auto_failover, which is fully compatible with Citus formations. When
5738 secondary is enabled, the Citus worker creation policy is to assign a
5739 primary node and then a standby node for each group. When secondary is
5740 disabled, the Citus worker creation policy is to assign only the
5741 primary nodes.
5742
5743 usage: pg_autoctl disable secondary [ --pgdata --formation ]
5744
5745 --pgdata path to data directory
5746 --formation Formation to disable secondary on
5747
5748 Options
5749 --pgdata
5750 Location of the Postgres node being managed locally. Defaults to
5751 the environment variable PGDATA. Use --monitor to connect to a
5752 monitor from anywhere, rather than the monitor URI used by a lo‐
5753 cal Postgres node managed with pg_autoctl.
5754
5755 --formation
5756 Target formation in which to disable the secondary feature.
5757
5758 Environment
5759 PGDATA
5760 Postgres directory location. Can be used instead of the --pgdata op‐
5761 tion.
5762
5763 PG_AUTOCTL_MONITOR
5764 Postgres URI to connect to the monitor node, can be used instead of
5765 the --monitor option.
5766
5767 XDG_CONFIG_HOME
5768 The pg_autoctl command stores its configuration files in the stan‐
5769 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
5770 tion.
5771
5772 XDG_DATA_HOME
5773 The pg_autoctl command stores its internal states files in the stan‐
5774 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
5775 XDG Base Directory Specification.
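
 As a minimal sketch, mirroring the enable command documented earlier;
 the /data/pgsql path is only a placeholder:

     $ pg_autoctl disable secondary --formation default --pgdata /data/pgsql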
5776
5777 pg_autoctl disable maintenance
5778 pg_autoctl disable maintenance - Disable Postgres maintenance mode on
5779 this node
5780
5781 Synopsis
5782 A pg_auto_failover node can be put into a maintenance state. The
5783 Postgres node is then still registered to the monitor, and is known to
5784 be unreliable until maintenance is disabled. A node in the maintenance
5785 state is not a candidate for promotion.
5786
5787 Typical uses of the maintenance state include Operating System or
5788 Postgres reboots, e.g. when applying security upgrades.
5789
5790 usage: pg_autoctl disable maintenance [ --pgdata --allow-failover ]
5791
5792 --pgdata path to data directory
5793
5794 Options
5795 --pgdata
5796 Location of the Postgres node being managed locally. Defaults to
5797 the environment variable PGDATA. Use --monitor to connect to a
5798 monitor from anywhere, rather than the monitor URI used by a lo‐
5799 cal Postgres node managed with pg_autoctl.
5800
5801 --allow-failover
5802 Allow the operation even when it requires a failover to complete.
5803
5804 Environment
5805 PGDATA
5806 Postgres directory location. Can be used instead of the --pgdata op‐
5807 tion.
5808
5809 PG_AUTOCTL_MONITOR
5810 Postgres URI to connect to the monitor node, can be used instead of
5811 the --monitor option.
5812
5813 XDG_CONFIG_HOME
5814 The pg_autoctl command stores its configuration files in the stan‐
5815 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
5816 tion.
5817
5818 XDG_DATA_HOME
5819 The pg_autoctl command stores its internal states files in the stan‐
5820 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
5821 XDG Base Directory Specification.
5822
5823 Examples
5824 $ pg_autoctl show state
5825 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
5826 ------+-------+----------------+-----------+--------------+---------------------+--------------------
5827 node1 | 1 | localhost:5501 | 0/4000810 | read-write | primary | primary
5828 node2 | 2 | localhost:5502 | 0/4000810 | read-only | secondary | secondary
5829 node3 | 3 | localhost:5503 | 0/4000810 | none | maintenance | maintenance
5830
5831 $ pg_autoctl disable maintenance --pgdata node3
5832 12:06:37 47542 INFO Listening monitor notifications about state changes in formation "default" and group 0
5833 12:06:37 47542 INFO Following table displays times when notifications are received
5834 Time | Name | Node | Host:Port | Current State | Assigned State
5835 ---------+-------+-------+----------------+---------------------+--------------------
5836 12:06:37 | node3 | 3 | localhost:5503 | maintenance | catchingup
5837 12:06:37 | node3 | 3 | localhost:5503 | catchingup | catchingup
5838 12:06:37 | node3 | 3 | localhost:5503 | catchingup | secondary
5839 12:06:37 | node3 | 3 | localhost:5503 | secondary | secondary
5840
5841 $ pg_autoctl show state
5842 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
5843 ------+-------+----------------+-----------+--------------+---------------------+--------------------
5844 node1 | 1 | localhost:5501 | 0/4000848 | read-write | primary | primary
5845 node2 | 2 | localhost:5502 | 0/4000848 | read-only | secondary | secondary
5846 node3 | 3 | localhost:5503 | 0/4000000 | read-only | secondary | secondary
5847
5848 pg_autoctl disable ssl
5849 pg_autoctl disable ssl - Disable SSL configuration on this node
5850
5851 Synopsis
5852 It is possible to manage Postgres SSL settings with the pg_autoctl com‐
5853 mand, both at pg_autoctl create postgres time and then again to change
5854 your mind and update the SSL settings at run-time.
5855
5856 usage: pg_autoctl disable ssl [ --pgdata ] [ --json ]
5857
5858 --pgdata path to data directory
5859 --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
5860 --ssl-mode use that sslmode in connection strings
5861 --ssl-ca-file set the Postgres ssl_ca_file to that file path
5862 --ssl-crl-file set the Postgres ssl_crl_file to that file path
5863 --no-ssl don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
5864 --server-key set the Postgres ssl_key_file to that file path
5865 --server-cert set the Postgres ssl_cert_file to that file path
5866
5867 Options
5868 --pgdata
5869 Location of the Postgres node being managed locally. Defaults to
5870 the environment variable PGDATA. Use --monitor to connect to a
5871 monitor from anywhere, rather than the monitor URI used by a lo‐
5872 cal Postgres node managed with pg_autoctl.
5873
5874 --ssl-self-signed
5875 Generate SSL self-signed certificates to provide network encryp‐
5876 tion. This does not protect against man-in-the-middle kinds of
5877 attacks. See Security settings for pg_auto_failover for more
5878 about our SSL settings.
5879
5880 --ssl-mode
5881 SSL Mode used by pg_autoctl when connecting to other nodes, in‐
5882 cluding when connecting for streaming replication.
5883
5884 --ssl-ca-file
5885 Set the Postgres ssl_ca_file to that file path.
5886
5887 --ssl-crl-file
5888 Set the Postgres ssl_crl_file to that file path.
5889
5890 --no-ssl
5891 Don't enable network encryption. This is not recommended, prefer
5892 --ssl-self-signed.
5893
5894 --server-key
5895 Set the Postgres ssl_key_file to that file path.
5896
5897 --server-cert
5898 Set the Postgres ssl_cert_file to that file path.
5899
5900 Environment
5901 PGDATA
5902 Postgres directory location. Can be used instead of the --pgdata op‐
5903 tion.
5904
5905 PG_AUTOCTL_MONITOR
5906 Postgres URI to connect to the monitor node, can be used instead of
5907 the --monitor option.
5908
5909 XDG_CONFIG_HOME
5910 The pg_autoctl command stores its configuration files in the stan‐
5911 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
5912 tion.
5913
5914 XDG_DATA_HOME
5915 The pg_autoctl command stores its internal states files in the stan‐
5916 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
5917 XDG Base Directory Specification.
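
 One possible invocation, based only on the usage string above and
 with the /data/pgsql path as a placeholder, is the following sketch:

     $ pg_autoctl disable ssl --no-ssl --pgdata /data/pgsql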
5918
5919 pg_autoctl disable monitor
5920 pg_autoctl disable monitor - Disable the monitor for this node
5921
5922 Synopsis
5923 It is possible to disable the pg_auto_failover monitor and enable it
5924 again online in a running pg_autoctl Postgres node. The main use case
5925 where this operation is useful is when the monitor node has to be
5926 replaced, either after a full crash of the previous monitor node, or
5927 for migrating to a new monitor node (hardware replacement, region or
5928 zone migration, etc).
5929
5930 usage: pg_autoctl disable monitor [ --pgdata --force ]
5931
5932 --pgdata path to data directory
5933 --force force unregistering from the monitor
5934
5935 Options
5936 --pgdata
5937 Location of the Postgres node being managed locally. Defaults to
5938 the environment variable PGDATA. Use --monitor to connect to a
5939 monitor from anywhere, rather than the monitor URI used by a lo‐
5940 cal Postgres node managed with pg_autoctl.
5941
5942 --force
5943 The --force option covers the following two situations:
5944
5945 1. By default, the command expects to be able to connect to
5946 the current monitor. When the current known monitor in the
5947 setup is not running anymore, use --force to skip this
5948 step.
5949
5950 2. When pg_autoctl can connect to the monitor and the node
5951 is found there, this is normally an error that prevents
5952 disabling the monitor. Using --force allows the command
5953 to drop the node from the monitor and continue with
5954 disabling the monitor.
5955
5956 Environment
5957 PGDATA
5958 Postgres directory location. Can be used instead of the --pgdata op‐
5959 tion.
5960
5961 PG_AUTOCTL_MONITOR
5962 Postgres URI to connect to the monitor node, can be used instead of
5963 the --monitor option.
5964
5965 XDG_CONFIG_HOME
5966 The pg_autoctl command stores its configuration files in the stan‐
5967 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
5968 tion.
5969
5970 XDG_DATA_HOME
5971 The pg_autoctl command stores its internal states files in the stan‐
5972 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
5973 XDG Base Directory Specification.
5974
5975 Examples
5976 $ pg_autoctl show state
5977 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
5978 ------+-------+----------------+-----------+--------------+---------------------+--------------------
5979 node1 | 1 | localhost:5501 | 0/4000148 | read-write | primary | primary
5980 node2 | 2 | localhost:5502 | 0/4000148 | read-only | secondary | secondary
5981 node3 | 3 | localhost:5503 | 0/4000148 | read-only | secondary | secondary
5982
5983
5984 $ pg_autoctl disable monitor --pgdata node3
5985 12:41:21 43039 INFO Found node 3 "node3" (localhost:5503) on the monitor
5986 12:41:21 43039 FATAL Use --force to remove the node from the monitor
5987
5988 $ pg_autoctl disable monitor --pgdata node3 --force
5989 12:41:32 43219 INFO Removing node 3 "node3" (localhost:5503) from monitor
5990
5991 $ pg_autoctl show state
5992 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
5993 ------+-------+----------------+-----------+--------------+---------------------+--------------------
5994 node1 | 1 | localhost:5501 | 0/4000760 | read-write | primary | primary
5995 node2 | 2 | localhost:5502 | 0/4000760 | read-only | secondary | secondary
5996
5997 pg_autoctl get
5998 pg_autoctl get - Get a pg_auto_failover node, or formation setting
5999
6000 pg_autoctl get formation settings
6001 pg_autoctl get formation settings - get replication settings for a for‐
6002 mation from the monitor
6003
6004 Synopsis
6005 This command prints the pg_autoctl replication settings:
6006
6007 usage: pg_autoctl get formation settings [ --pgdata ] [ --json ] [ --formation ]
6008
6009 --pgdata path to data directory
6010 --json output data in the JSON format
6011 --formation pg_auto_failover formation
6012
6013 Description
6014 See also pg_autoctl show settings which is a synonym.
6015
6016 Options
6017 --pgdata
6018 Location of the Postgres node being managed locally. Defaults to
6019 the environment variable PGDATA. Use --monitor to connect to a
6020 monitor from anywhere, rather than the monitor URI used by a lo‐
6021 cal Postgres node managed with pg_autoctl.
6022
6023 --json Output JSON formatted data.
6024
6025 --formation
6026 Show replication settings for given formation. Defaults to de‐
6027 fault.
6028
6029 Environment
6030 PGDATA
6031 Postgres directory location. Can be used instead of the --pgdata op‐
6032 tion.
6033
6034 PG_AUTOCTL_MONITOR
6035 Postgres URI to connect to the monitor node, can be used instead of
6036 the --monitor option.
6037
6038 XDG_CONFIG_HOME
6039 The pg_autoctl command stores its configuration files in the stan‐
6040 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
6041 tion.
6042
6043 XDG_DATA_HOME
6044 The pg_autoctl command stores its internal states files in the stan‐
6045 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
6046 XDG Base Directory Specification.
6047
6048 Examples
6049 $ pg_autoctl get formation settings
6050 Context | Name | Setting | Value
6051 ----------+---------+---------------------------+-------------------------------------------------------------
6052 formation | default | number_sync_standbys | 1
6053 primary | node1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
6054 node | node1 | candidate priority | 50
6055 node | node2 | candidate priority | 50
6056 node | node3 | candidate priority | 50
6057 node | node1 | replication quorum | true
6058 node | node2 | replication quorum | true
6059 node | node3 | replication quorum | true
6060
6061 $ pg_autoctl get formation settings --json
6062 {
6063 "nodes": [
6064 {
6065 "value": "true",
6066 "context": "node",
6067 "node_id": 1,
6068 "setting": "replication quorum",
6069 "group_id": 0,
6070 "nodename": "node1"
6071 },
6072 {
6073 "value": "true",
6074 "context": "node",
6075 "node_id": 2,
6076 "setting": "replication quorum",
6077 "group_id": 0,
6078 "nodename": "node2"
6079 },
6080 {
6081 "value": "true",
6082 "context": "node",
6083 "node_id": 3,
6084 "setting": "replication quorum",
6085 "group_id": 0,
6086 "nodename": "node3"
6087 },
6088 {
6089 "value": "50",
6090 "context": "node",
6091 "node_id": 1,
6092 "setting": "candidate priority",
6093 "group_id": 0,
6094 "nodename": "node1"
6095 },
6096 {
6097 "value": "50",
6098 "context": "node",
6099 "node_id": 2,
6100 "setting": "candidate priority",
6101 "group_id": 0,
6102 "nodename": "node2"
6103 },
6104 {
6105 "value": "50",
6106 "context": "node",
6107 "node_id": 3,
6108 "setting": "candidate priority",
6109 "group_id": 0,
6110 "nodename": "node3"
6111 }
6112 ],
6113 "primary": [
6114 {
6115 "value": "'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'",
6116 "context": "primary",
6117 "node_id": 1,
6118 "setting": "synchronous_standby_names",
6119 "group_id": 0,
6120 "nodename": "node1"
6121 }
6122 ],
6123 "formation": {
6124 "value": "1",
6125 "context": "formation",
6126 "node_id": null,
6127 "setting": "number_sync_standbys",
6128 "group_id": null,
6129 "nodename": "default"
6130 }
6131 }
6132
6133 pg_autoctl get formation number-sync-standbys
6134 pg_autoctl get formation number-sync-standbys - get number_sync_stand‐
6135 bys for a formation from the monitor
6136
6137 Synopsis
6138 This command prints the pg_autoctl replication setting for number sync
6139 standbys:
6140
6141 usage: pg_autoctl get formation number-sync-standbys [ --pgdata ] [ --json ] [ --formation ]
6142
6143 --pgdata path to data directory
6144 --json output data in the JSON format
6145 --formation pg_auto_failover formation
6146
6147 Description
6148 See also pg_autoctl show settings for the full list of replication set‐
6149 tings.
6150
6151 Options
6152 --pgdata
6153 Location of the Postgres node being managed locally. Defaults to
6154 the environment variable PGDATA. Use --monitor to connect to a
6155 monitor from anywhere, rather than the monitor URI used by a lo‐
6156 cal Postgres node managed with pg_autoctl.
6157
6158 --json Output JSON formatted data.
6159
6160 --formation
6161 Show replication settings for given formation. Defaults to de‐
6162 fault.
6163
6164 Environment
6165 PGDATA
6166 Postgres directory location. Can be used instead of the --pgdata op‐
6167 tion.
6168
6169 PG_AUTOCTL_MONITOR
6170 Postgres URI to connect to the monitor node, can be used instead of
6171 the --monitor option.
6172
6173 XDG_CONFIG_HOME
6174 The pg_autoctl command stores its configuration files in the stan‐
6175 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
6176 tion.
6177
6178 XDG_DATA_HOME
6179 The pg_autoctl command stores its internal states files in the stan‐
6180 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
6181 XDG Base Directory Specification.
6182
6183 Examples
6184 $ pg_autoctl get formation number-sync-standbys
6185 1
6186
6187 $ pg_autoctl get formation number-sync-standbys --json
6188 {
6189 "number-sync-standbys": 1
6190 }
6191
6192 pg_autoctl get node replication-quorum
6193 pg_autoctl get replication-quorum - get replication-quorum property
6194 from the monitor
6195
6196 Synopsis
6197 This command prints pg_autoctl replication quorum for a given node:
6198
6199 usage: pg_autoctl get node replication-quorum [ --pgdata ] [ --json ] [ --formation ] [ --name ]
6200
6201 --pgdata path to data directory
6202 --formation pg_auto_failover formation
6203 --name pg_auto_failover node name
6204 --json output data in the JSON format
6205
6206 Description
6207 See also pg_autoctl show settings for the full list of replication set‐
6208 tings.
6209
6210 Options
6211 --pgdata
6212 Location of the Postgres node being managed locally. Defaults to
6213 the environment variable PGDATA. Use --monitor to connect to a
6214 monitor from anywhere, rather than the monitor URI used by a lo‐
6215 cal Postgres node managed with pg_autoctl.
6216
6217 --json Output JSON formatted data.
6218
6219 --formation
6220 Show replication settings for given formation. Defaults to de‐
6221 fault.
6222
6223 --name Show replication settings for given node, selected by name.
6224
6225 Environment
6226 PGDATA
6227 Postgres directory location. Can be used instead of the --pgdata op‐
6228 tion.
6229
6230 PG_AUTOCTL_MONITOR
6231 Postgres URI to connect to the monitor node, can be used instead of
6232 the --monitor option.
6233
6234 XDG_CONFIG_HOME
6235 The pg_autoctl command stores its configuration files in the stan‐
6236 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
6237 tion.
6238
6239 XDG_DATA_HOME
6240 The pg_autoctl command stores its internal states files in the stan‐
6241 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
6242 XDG Base Directory Specification.
6243
6244 Examples
6245 $ pg_autoctl get node replication-quorum --name node1
6246 true
6247
6248 $ pg_autoctl get node replication-quorum --name node1 --json
6249 {
6250 "name": "node1",
6251 "replication-quorum": true
6252 }
6253
6254 pg_autoctl get node candidate-priority
6255 pg_autoctl get candidate-priority - get candidate-priority property
6256 from the monitor
6257
6258 Synopsis
6259 This command prints pg_autoctl candidate priority for a given node:
6260
6261 usage: pg_autoctl get node candidate-priority [ --pgdata ] [ --json ] [ --formation ] [ --name ]
6262
6263 --pgdata path to data directory
6264 --formation pg_auto_failover formation
6265 --name pg_auto_failover node name
6266 --json output data in the JSON format
6267
6268 Description
6269 See also pg_autoctl show settings for the full list of replication set‐
6270 tings.
6271
6272 Options
6273 --pgdata
6274 Location of the Postgres node being managed locally. Defaults to
6275 the environment variable PGDATA. Use --monitor to connect to a
6276 monitor from anywhere, rather than the monitor URI used by a lo‐
6277 cal Postgres node managed with pg_autoctl.
6278
6279 --json Output JSON formatted data.
6280
6281 --formation
6282 Show replication settings for given formation. Defaults to de‐
6283 fault.
6284
6285 --name Show replication settings for given node, selected by name.
6286
6287 Examples
6288 $ pg_autoctl get node candidate-priority --name node1
6289 50
6290
6291 $ pg_autoctl get node candidate-priority --name node1 --json
6292 {
6293 "name": "node1",
6294 "candidate-priority": 50
6295 }
6296
6297 pg_autoctl set
6298 pg_autoctl set - Set a pg_auto_failover node, or formation setting
6299
6300 pg_autoctl set formation number-sync-standbys
6301 pg_autoctl set formation number-sync-standbys - set number_sync_standbys
6302 for a formation on the monitor
6303
6304 Synopsis
6305 This command sets the pg_autoctl replication setting for number sync
6306 standbys:
6307
6308 usage: pg_autoctl set formation number-sync-standbys [ --pgdata ] [ --json ] [ --formation ] <number_sync_standbys>
6309
6310 --pgdata path to data directory
6311 --formation pg_auto_failover formation
6312 --json output data in the JSON format
6313
6314 Description
6315 The pg_auto_failover monitor ensures that at least N+1 candidate
6316 standby nodes are registered when number-sync-standbys is N. This means
6317 that to be able to run the following command, at least 3 standby nodes
6318 with a non-zero candidate priority must be registered to the monitor:
6319
6320 $ pg_autoctl set formation number-sync-standbys 2
6321
6322 See also pg_autoctl show settings for the full list of replication set‐
6323 tings.
6324
6325 Options
6326 --pgdata
6327 Location of the Postgres node being managed locally. Defaults to
6328 the environment variable PGDATA. Use --monitor to connect to a
6329 monitor from anywhere, rather than the monitor URI used by a lo‐
6330 cal Postgres node managed with pg_autoctl.
6331
6332 --json Output JSON formatted data.
6333
6334 --formation
6335 Show replication settings for given formation. Defaults to de‐
6336 fault.
6337
6338 Environment
6339 PGDATA
6340 Postgres directory location. Can be used instead of the --pgdata op‐
6341 tion.
6342
6343 PG_AUTOCTL_MONITOR
6344 Postgres URI to connect to the monitor node, can be used instead of
6345 the --monitor option.
6346
6347 XDG_CONFIG_HOME
6348 The pg_autoctl command stores its configuration files in the stan‐
6349 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
6350 tion.
6351
6352 XDG_DATA_HOME
6353 The pg_autoctl command stores its internal states files in the stan‐
6354 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
6355 XDG Base Directory Specification.
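
 A sketch of checking the current value and then raising it, given the
 requirement described above (three standby nodes with a non-zero
 candidate priority must already be registered before the value 2 is
 accepted):

     $ pg_autoctl get formation number-sync-standbys
     1

     $ pg_autoctl set formation number-sync-standbys 2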
6356
6357 pg_autoctl set node replication-quorum
6358 pg_autoctl set replication-quorum - set replication-quorum property
6359 on the monitor
6360
6361 Synopsis
6362 This command sets pg_autoctl replication quorum for a given node:
6363
6364 usage: pg_autoctl set node replication-quorum [ --pgdata ] [ --json ] [ --formation ] [ --name ] <true|false>
6365
6366 --pgdata path to data directory
6367 --formation pg_auto_failover formation
6368 --name pg_auto_failover node name
6369 --json output data in the JSON format
6370
6371 Description
6372 See also pg_autoctl show settings for the full list of replication set‐
6373 tings.
6374
6375 Options
6376 --pgdata
6377 Location of the Postgres node being managed locally. Defaults to
6378 the environment variable PGDATA. Use --monitor to connect to a
6379 monitor from anywhere, rather than the monitor URI used by a lo‐
6380 cal Postgres node managed with pg_autoctl.
6381
6382 --json Output JSON formatted data.
6383
6384 --formation
6385 Show replication settings for given formation. Defaults to de‐
6386 fault.
6387
6388 --name Show replication settings for given node, selected by name.
6389
6390 Environment
6391 PGDATA
6392 Postgres directory location. Can be used instead of the --pgdata op‐
6393 tion.
6394
6395 PG_AUTOCTL_MONITOR
6396 Postgres URI to connect to the monitor node, can be used instead of
6397 the --monitor option.
6398
6399 XDG_CONFIG_HOME
6400 The pg_autoctl command stores its configuration files in the stan‐
6401 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
6402 tion.
6403
6404 XDG_DATA_HOME
6405 The pg_autoctl command stores its internal states files in the stan‐
6406 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
6407 XDG Base Directory Specification.
6408
6409 Examples
6410 $ pg_autoctl set node replication-quorum --name node1 false
6411 12:49:37 94092 INFO Waiting for the settings to have been applied to the monitor and primary node
6412 12:49:37 94092 INFO New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
6413 12:49:37 94092 INFO Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
6414 12:49:37 94092 INFO New state is reported by node 1 "node1" (localhost:5501): "primary"
6415 false
6416
6417 $ pg_autoctl set node replication-quorum --name node1 true --json
6418 12:49:42 94199 INFO Waiting for the settings to have been applied to the monitor and primary node
6419 12:49:42 94199 INFO New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
6420 12:49:42 94199 INFO Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
6421 12:49:43 94199 INFO New state is reported by node 1 "node1" (localhost:5501): "primary"
6422 {
6423 "replication-quorum": true
6424 }
6425
6426 pg_autoctl set node candidate-priority
6427 pg_autoctl set candidate-priority - set candidate-priority property
6428 on the monitor
6429
6430 Synopsis
6431 This command sets the pg_autoctl candidate priority for a given node:
6432
6433 usage: pg_autoctl set node candidate-priority [ --pgdata ] [ --json ] [ --formation ] [ --name ] <priority: 0..100>
6434
6435 --pgdata path to data directory
6436 --formation pg_auto_failover formation
6437 --name pg_auto_failover node name
6438 --json output data in the JSON format
6439
6440 Description
6441 See also pg_autoctl show settings for the full list of replication set‐
6442 tings.
6443
6444 Options
6445 --pgdata
6446 Location of the Postgres node being managed locally. Defaults to
6447 the environment variable PGDATA. Use --monitor to connect to a
6448 monitor from anywhere, rather than the monitor URI used by a lo‐
6449 cal Postgres node managed with pg_autoctl.
6450
6451 --json Output JSON formatted data.
6452
6453 --formation
6454 Show replication settings for given formation. Defaults to de‐
6455 fault.
6456
6457 --name Show replication settings for given node, selected by name.
6458
6459 Environment
6460 PGDATA
6461 Postgres directory location. Can be used instead of the --pgdata op‐
6462 tion.
6463
6464 PG_AUTOCTL_MONITOR
6465 Postgres URI to connect to the monitor node, can be used instead of
6466 the --monitor option.
6467
6468 XDG_CONFIG_HOME
6469 The pg_autoctl command stores its configuration files in the stan‐
6470 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
6471 tion.
6472
6473 XDG_DATA_HOME
6474 The pg_autoctl command stores its internal states files in the stan‐
6475 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
6476 XDG Base Directory Specification.
6477
6478 Examples
6479 $ pg_autoctl set node candidate-priority --name node1 65
6480 12:47:59 92326 INFO Waiting for the settings to have been applied to the monitor and primary node
6481 12:47:59 92326 INFO New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
6482 12:47:59 92326 INFO Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
6483 12:47:59 92326 INFO New state is reported by node 1 "node1" (localhost:5501): "primary"
6484 65
6485
6486 $ pg_autoctl set node candidate-priority --name node1 50 --json
6487 12:48:05 92450 INFO Waiting for the settings to have been applied to the monitor and primary node
6488 12:48:05 92450 INFO New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
6489 12:48:05 92450 INFO Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
6490 12:48:05 92450 INFO New state is reported by node 1 "node1" (localhost:5501): "primary"
6491 {
6492 "candidate-priority": 50
6493 }
6494
6495 pg_autoctl perform
6496 pg_autoctl perform - Perform an action orchestrated by the monitor
6497
6498 pg_autoctl perform failover
6499 pg_autoctl perform failover - Perform a failover for given formation
6500 and group
6501
6502 Synopsis
6503 This command starts a Postgres failover orchestration from the
6504 pg_auto_failover monitor:
6505
6506 usage: pg_autoctl perform failover [ --pgdata --formation --group ]
6507
6508 --pgdata path to data directory
6509 --formation formation to target, defaults to 'default'
6510 --group group to target, defaults to 0
6511 --wait how many seconds to wait, default to 60
6512
6513 Description
6514 The pg_auto_failover monitor can be used to orchestrate a manual
6515 failover, sometimes also known as a switchover. When doing so,
6516 split-brain situations are prevented thanks to intermediate states
6517 being used in the Finite State Machine.
6518
6519 The pg_autoctl perform failover command waits until the failover is
6520 known to be complete on the monitor, or until the timeout has passed
6521 (60 seconds by default, see the --wait option below).
6522
6523 The failover orchestration is done in the background by the monitor, so
6524 even if the pg_autoctl perform failover stops on the timeout, the
6525 failover orchestration continues at the monitor.
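
 Given that behaviour, one sketch of a fully blocking invocation is to
 disable the client-side timeout with --wait 0 (documented below) and
 then review the resulting roles:

     $ pg_autoctl perform failover --formation default --group 0 --wait 0
     $ pg_autoctl show state --formation default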
6526
6527 Options
6528 --pgdata
6529 Location of the Postgres node being managed locally. Defaults to
6530 the environment variable PGDATA. Use --monitor to connect to a
6531 monitor from anywhere, rather than the monitor URI used by a lo‐
6532 cal Postgres node managed with pg_autoctl.
6533
6534 --formation
6535 Formation to target for the operation. Defaults to default.
6536
6537 --group
6538 Postgres group to target for the operation. Defaults to 0, only
6539 Citus formations may have more than one group.
6540
6541 --wait How many seconds to wait for notifications about the promotion.
6542 The command stops when the promotion is finished (a node is pri‐
6543 mary), or when the timeout has elapsed, whichever comes first.
6544 The value 0 (zero) disables the timeout and allows the command
6545 to wait forever.
6546
6547 Environment
6548 PGDATA
6549 Postgres directory location. Can be used instead of the --pgdata op‐
6550 tion.
6551
6552 PG_AUTOCTL_MONITOR
6553 Postgres URI to connect to the monitor node, can be used instead of
6554 the --monitor option.
6555
6556 PGHOST, PGPORT, PGDATABASE, PGUSER, PGCONNECT_TIMEOUT, ...
6557 See the Postgres docs about Environment Variables for details.
6558
6559 TMPDIR
6560 The pgcopydb command creates all its work files and directories in
6561 ${TMPDIR}/pgcopydb, and defaults to /tmp/pgcopydb.
6562
6563 XDG_CONFIG_HOME
6564 The pg_autoctl command stores its configuration files in the stan‐
6565 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
6566 tion.
6567
6568 XDG_DATA_HOME
6569 The pg_autoctl command stores its internal states files in the stan‐
6570 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
6571 XDG Base Directory Specification.
6572
6573 Examples
6574 $ pg_autoctl perform failover
6575 12:57:30 3635 INFO Listening monitor notifications about state changes in formation "default" and group 0
6576 12:57:30 3635 INFO Following table displays times when notifications are received
6577 Time | Name | Node | Host:Port | Current State | Assigned State
6578 ---------+-------+-------+----------------+---------------------+--------------------
6579 12:57:30 | node1 | 1 | localhost:5501 | primary | draining
6580 12:57:30 | node1 | 1 | localhost:5501 | draining | draining
6581 12:57:30 | node2 | 2 | localhost:5502 | secondary | report_lsn
6582 12:57:30 | node3 | 3 | localhost:5503 | secondary | report_lsn
6583 12:57:36 | node3 | 3 | localhost:5503 | report_lsn | report_lsn
6584 12:57:36 | node2 | 2 | localhost:5502 | report_lsn | report_lsn
6585 12:57:36 | node2 | 2 | localhost:5502 | report_lsn | prepare_promotion
6586 12:57:36 | node2 | 2 | localhost:5502 | prepare_promotion | prepare_promotion
6587 12:57:36 | node2 | 2 | localhost:5502 | prepare_promotion | stop_replication
6588 12:57:36 | node1 | 1 | localhost:5501 | draining | demote_timeout
6589 12:57:36 | node3 | 3 | localhost:5503 | report_lsn | join_secondary
6590 12:57:36 | node1 | 1 | localhost:5501 | demote_timeout | demote_timeout
6591 12:57:36 | node3 | 3 | localhost:5503 | join_secondary | join_secondary
6592 12:57:37 | node2 | 2 | localhost:5502 | stop_replication | stop_replication
6593 12:57:37 | node2 | 2 | localhost:5502 | stop_replication | wait_primary
6594 12:57:37 | node1 | 1 | localhost:5501 | demote_timeout | demoted
6595 12:57:37 | node1 | 1 | localhost:5501 | demoted | demoted
6596 12:57:37 | node2 | 2 | localhost:5502 | wait_primary | wait_primary
6597 12:57:37 | node3 | 3 | localhost:5503 | join_secondary | secondary
6598 12:57:37 | node1 | 1 | localhost:5501 | demoted | catchingup
6599 12:57:38 | node3 | 3 | localhost:5503 | secondary | secondary
6600 12:57:38 | node2 | 2 | localhost:5502 | wait_primary | primary
6601 12:57:38 | node1 | 1 | localhost:5501 | catchingup | catchingup
6602 12:57:38 | node2 | 2 | localhost:5502 | primary | primary
6603
6604 $ pg_autoctl show state
6605 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
6606 ------+-------+----------------+-----------+--------------+---------------------+--------------------
6607 node1 | 1 | localhost:5501 | 0/4000F50 | read-only | secondary | secondary
6608 node2 | 2 | localhost:5502 | 0/4000F50 | read-write | primary | primary
6609 node3 | 3 | localhost:5503 | 0/4000F50 | read-only | secondary | secondary
6610
6611 pg_autoctl perform switchover
6612 pg_autoctl perform switchover - Perform a switchover for given forma‐
6613 tion and group
6614
6615 Synopsis
6616 This command starts a Postgres switchover orchestration from the
6617 pg_auto_failover monitor:
6618
6619 usage: pg_autoctl perform switchover [ --pgdata --formation --group ]
6620
6621 --pgdata path to data directory
6622 --formation formation to target, defaults to 'default'
6623 --group group to target, defaults to 0
6624
6625 Description
       The pg_auto_failover monitor can be used to orchestrate a manual
       failover, sometimes also known as a switchover. When doing so,
       split-brain situations are prevented thanks to intermediary states
       being used in the Finite State Machine.
6630
6631 The pg_autoctl perform switchover command waits until the switchover is
6632 known complete on the monitor, or until the hard-coded 60s timeout has
6633 passed.
6634
6635 The switchover orchestration is done in the background by the monitor,
6636 so even if the pg_autoctl perform switchover stops on the timeout, the
6637 switchover orchestration continues at the monitor.
6638
6639 See also pg_autoctl perform failover, a synonym for this command.
6640
6641 Options
6642 --pgdata
6643 Location of the Postgres node being managed locally. Defaults to
6644 the environment variable PGDATA. Use --monitor to connect to a
6645 monitor from anywhere, rather than the monitor URI used by a lo‐
6646 cal Postgres node managed with pg_autoctl.
6647
6648 --formation
6649 Formation to target for the operation. Defaults to default.
6650
6651 --group
              Postgres group to target for the operation. Defaults to 0;
              only Citus formations may have more than one group.
6654
6655 Environment
6656 PGDATA
6657 Postgres directory location. Can be used instead of the --pgdata op‐
6658 tion.
6659
6660 PG_AUTOCTL_MONITOR
6661 Postgres URI to connect to the monitor node, can be used instead of
6662 the --monitor option.
6663
6664 PG_CONFIG
6665 Can be set to the absolute path to the pg_config Postgres tool. This
6666 is mostly used in the context of building extensions, though it can
6667 be a useful way to select a Postgres version when several are in‐
6668 stalled on the same system.
6669
6670 PATH
     Used the usual way. Some of the programs that pg_autoctl searches for
     in the PATH are expected to be found only once, to avoid any confusion
     between Postgres major versions.
6674
6675 PGHOST, PGPORT, PGDATABASE, PGUSER, PGCONNECT_TIMEOUT, ...
6676 See the Postgres docs about Environment Variables for details.
6677
6678 TMPDIR
6679 The pgcopydb command creates all its work files and directories in
6680 ${TMPDIR}/pgcopydb, and defaults to /tmp/pgcopydb.
6681
6682 XDG_CONFIG_HOME
6683 The pg_autoctl command stores its configuration files in the stan‐
6684 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
6685 tion.
6686
6687 XDG_DATA_HOME
6688 The pg_autoctl command stores its internal states files in the stan‐
6689 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
6690 XDG Base Directory Specification.
6691
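   Examples
       As a minimal sketch, and assuming a three-node setup similar to the
       one shown in the pg_autoctl perform failover examples above (a
       formation named "default" with a single group 0), a switchover can be
       triggered with:

          $ pg_autoctl perform switchover --formation default --group 0

       Because this command is a synonym for pg_autoctl perform failover, it
       then prints the same kind of state-change notifications while the
       monitor drives the orchestration in the background.
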
6692 pg_autoctl perform promotion
6693 pg_autoctl perform promotion - Perform a failover that promotes a tar‐
6694 get node
6695
6696 Synopsis
       This command starts a Postgres failover orchestration from the
       pg_auto_failover monitor and targets the given node:
6699
          usage: pg_autoctl perform promotion  [ --pgdata --formation --name --wait ]
6701
6702 --pgdata path to data directory
6703 --formation formation to target, defaults to 'default'
6704 --name node name to target, defaults to current node
6705 --wait how many seconds to wait, default to 60
6706
6707 Description
       The pg_auto_failover monitor can be used to orchestrate a manual
       promotion, sometimes also known as a switchover. When doing so,
       split-brain situations are prevented thanks to intermediary states
       being used in the Finite State Machine.
6712
6713 The pg_autoctl perform promotion command waits until the promotion is
6714 known complete on the monitor, or until the hard-coded 60s timeout has
6715 passed.
6716
6717 The promotion orchestration is done in the background by the monitor,
6718 so even if the pg_autoctl perform promotion stops on the timeout, the
6719 promotion orchestration continues at the monitor.
6720
6721 Options
6722 --pgdata
6723 Location of the Postgres node being managed locally. Defaults to
6724 the environment variable PGDATA. Use --monitor to connect to a
6725 monitor from anywhere, rather than the monitor URI used by a lo‐
6726 cal Postgres node managed with pg_autoctl.
6727
6728 --formation
6729 Formation to target for the operation. Defaults to default.
6730
6731 --name Name of the node that should be elected as the new primary node.
6732
6733 --wait How many seconds to wait for notifications about the promotion.
6734 The command stops when the promotion is finished (a node is pri‐
6735 mary), or when the timeout has elapsed, whichever comes first.
6736 The value 0 (zero) disables the timeout and allows the command
6737 to wait forever.
6738
6739 Environment
6740 PGDATA
6741 Postgres directory location. Can be used instead of the --pgdata op‐
6742 tion.
6743
6744 PG_AUTOCTL_MONITOR
6745 Postgres URI to connect to the monitor node, can be used instead of
6746 the --monitor option.
6747
6748 PG_CONFIG
6749 Can be set to the absolute path to the pg_config Postgres tool. This
6750 is mostly used in the context of building extensions, though it can
6751 be a useful way to select a Postgres version when several are in‐
6752 stalled on the same system.
6753
6754 PATH
     Used the usual way. Some of the programs that pg_autoctl searches for
     in the PATH are expected to be found only once, to avoid any confusion
     between Postgres major versions.
6758
6759 PGHOST, PGPORT, PGDATABASE, PGUSER, PGCONNECT_TIMEOUT, ...
6760 See the Postgres docs about Environment Variables for details.
6761
6762 TMPDIR
6763 The pgcopydb command creates all its work files and directories in
6764 ${TMPDIR}/pgcopydb, and defaults to /tmp/pgcopydb.
6765
6766 XDG_CONFIG_HOME
6767 The pg_autoctl command stores its configuration files in the stan‐
6768 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
6769 tion.
6770
6771 XDG_DATA_HOME
6772 The pg_autoctl command stores its internal states files in the stan‐
6773 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
6774 XDG Base Directory Specification.
6775
6776 Examples
6777 $ pg_autoctl show state
6778 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
6779 ------+-------+----------------+-----------+--------------+---------------------+--------------------
6780 node1 | 1 | localhost:5501 | 0/4000F88 | read-only | secondary | secondary
6781 node2 | 2 | localhost:5502 | 0/4000F88 | read-write | primary | primary
6782 node3 | 3 | localhost:5503 | 0/4000F88 | read-only | secondary | secondary
6783
6784
6785 $ pg_autoctl perform promotion --name node1
6786 13:08:13 15297 INFO Listening monitor notifications about state changes in formation "default" and group 0
6787 13:08:13 15297 INFO Following table displays times when notifications are received
6788 Time | Name | Node | Host:Port | Current State | Assigned State
6789 ---------+-------+-------+----------------+---------------------+--------------------
6790 13:08:13 | node1 | 0/1 | localhost:5501 | secondary | secondary
6791 13:08:13 | node2 | 0/2 | localhost:5502 | primary | draining
6792 13:08:13 | node2 | 0/2 | localhost:5502 | draining | draining
6793 13:08:13 | node1 | 0/1 | localhost:5501 | secondary | report_lsn
6794 13:08:13 | node3 | 0/3 | localhost:5503 | secondary | report_lsn
6795 13:08:19 | node3 | 0/3 | localhost:5503 | report_lsn | report_lsn
6796 13:08:19 | node1 | 0/1 | localhost:5501 | report_lsn | report_lsn
6797 13:08:19 | node1 | 0/1 | localhost:5501 | report_lsn | prepare_promotion
6798 13:08:19 | node1 | 0/1 | localhost:5501 | prepare_promotion | prepare_promotion
6799 13:08:19 | node1 | 0/1 | localhost:5501 | prepare_promotion | stop_replication
6800 13:08:19 | node2 | 0/2 | localhost:5502 | draining | demote_timeout
6801 13:08:19 | node3 | 0/3 | localhost:5503 | report_lsn | join_secondary
6802 13:08:19 | node2 | 0/2 | localhost:5502 | demote_timeout | demote_timeout
6803 13:08:19 | node3 | 0/3 | localhost:5503 | join_secondary | join_secondary
6804 13:08:20 | node1 | 0/1 | localhost:5501 | stop_replication | stop_replication
6805 13:08:20 | node1 | 0/1 | localhost:5501 | stop_replication | wait_primary
6806 13:08:20 | node2 | 0/2 | localhost:5502 | demote_timeout | demoted
6807 13:08:20 | node1 | 0/1 | localhost:5501 | wait_primary | wait_primary
6808 13:08:20 | node3 | 0/3 | localhost:5503 | join_secondary | secondary
6809 13:08:20 | node2 | 0/2 | localhost:5502 | demoted | demoted
6810 13:08:20 | node2 | 0/2 | localhost:5502 | demoted | catchingup
6811 13:08:21 | node3 | 0/3 | localhost:5503 | secondary | secondary
6812 13:08:21 | node1 | 0/1 | localhost:5501 | wait_primary | primary
6813 13:08:21 | node2 | 0/2 | localhost:5502 | catchingup | catchingup
6814 13:08:21 | node1 | 0/1 | localhost:5501 | primary | primary
6815
6816 $ pg_autoctl show state
6817 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
6818 ------+-------+----------------+-----------+--------------+---------------------+--------------------
6819 node1 | 1 | localhost:5501 | 0/40012F0 | read-write | primary | primary
6820 node2 | 2 | localhost:5502 | 0/40012F0 | read-only | secondary | secondary
6821 node3 | 3 | localhost:5503 | 0/40012F0 | read-only | secondary | secondary
6822
6823 pg_autoctl do
6824 pg_autoctl do - Internal commands and internal QA tooling
6825
6826 The debug commands for pg_autoctl are only available when the environ‐
6827 ment variable PG_AUTOCTL_DEBUG is set (to any value).
6828
6829 When testing pg_auto_failover, it is helpful to be able to play with
6830 the local nodes using the same lower-level API as used by the
6831 pg_auto_failover Finite State Machine transitions. Some commands could
6832 be useful in contexts other than pg_auto_failover development and QA
6833 work, so some documentation has been made available.
6834
6835 pg_autoctl do tmux
6836 pg_autoctl do tmux - Set of facilities to handle tmux interactive ses‐
6837 sions
6838
6839 Synopsis
6840 pg_autoctl do tmux provides the following commands:
6841
6842 pg_autoctl do tmux
6843 + compose Set of facilities to handle docker-compose sessions
6844 script Produce a tmux script for a demo or a test case (debug only)
6845 session Run a tmux session for a demo or a test case
6846 stop Stop pg_autoctl processes that belong to a tmux session
6847 wait Wait until a given node has been registered on the monitor
6848 clean Clean-up a tmux session processes and root dir
6849
6850 pg_autoctl do tmux compose
6851 config Produce a docker-compose configuration file for a demo
6852 script Produce a tmux script for a demo or a test case (debug only)
6853 session Run a tmux session for a demo or a test case
6854
6855 Description
6856 An easy way to get started with pg_auto_failover in a localhost only
6857 formation with three nodes is to run the following command:
6858
6859 $ PG_AUTOCTL_DEBUG=1 pg_autoctl do tmux session \
6860 --root /tmp/pgaf \
6861 --first-pgport 9000 \
6862 --nodes 4 \
6863 --layout tiled
6864
       This requires the command tmux to be available in your PATH. The
       pg_autoctl do tmux session command prepares a self-contained root
       directory in which to create the pg_auto_failover nodes and their
       configuration, then prepares a tmux script, and then runs the script
       with a command such as:
6870
6871 /usr/local/bin/tmux -v start-server ; source-file /tmp/pgaf/script-9000.tmux
6872
       The tmux session contains a single tmux window with multiple panes:
6874
6875 • one pane for the monitor
6876
       • one pane per Postgres node, here 4 of them
6878
6879 • one pane for running pg_autoctl watch
6880
6881 • one extra pane for an interactive shell.
6882
6883 Usually the first two commands to run in the interactive shell, once
6884 the formation is stable (one node is primary, the other ones are all
6885 secondary), are the following:
6886
6887 $ pg_autoctl get formation settings
6888 $ pg_autoctl perform failover
6889
6890 Using docker-compose to run a distributed system
       The same idea can also be implemented with docker-compose to run the
       nodes, while still using tmux, this time with three control panes:
6893
6894 • one pane for the docker-compose cumulative logs of all the nodes
6895
6896 • one pane for running pg_autoctl watch
6897
6898 • one extra pane for an interactive shell.
6899
6900 For this setup, you can use the following command:
6901
6902 PG_AUTOCTL_DEBUG=1 pg_autoctl do tmux compose session \
6903 --root ./tmux/citus \
6904 --first-pgport 5600 \
6905 --nodes 3 \
6906 --async-nodes 0 \
6907 --node-priorities 50,50,0 \
6908 --sync-standbys -1 \
6909 --citus-workers 4 \
6910 --citus-secondaries 0 \
6911 --citus \
6912 --layout even-vertical
6913
6914 The pg_autoctl do tmux compose session command also takes care of cre‐
6915 ating external docker volumes and referencing them for each node in the
6916 docker-compose file.
6917
6918 pg_autoctl do tmux session
6919 This command runs a tmux session for a demo or a test case.
6920
6921 usage: pg_autoctl do tmux session [option ...]
6922
6923 --root path where to create a cluster
6924 --first-pgport first Postgres port to use (5500)
6925 --nodes number of Postgres nodes to create (2)
6926 --async-nodes number of async nodes within nodes (0)
6927 --node-priorities list of nodes priorities (50)
6928 --sync-standbys number-sync-standbys to set (0 or 1)
6929 --skip-pg-hba use --skip-pg-hba when creating nodes
6930 --citus start a Citus formation
6931 --citus-workers number of Citus workers to create (2)
6932 --citus-secondaries number of Citus secondaries to create (0)
6933 --layout tmux layout to use (even-vertical)
6934 --binpath path to the pg_autoctl binary (current binary path)
6935
6936 pg_autoctl do tmux compose session
6937 This command runs a tmux session for a demo or a test case. It gener‐
6938 ates a docker-compose file and then uses docker-compose to drive many
6939 nodes.
6940
6941 usage: pg_autoctl do tmux compose session [option ...]
6942
6943 --root path where to create a cluster
6944 --first-pgport first Postgres port to use (5500)
6945 --nodes number of Postgres nodes to create (2)
6946 --async-nodes number of async nodes within nodes (0)
6947 --node-priorities list of nodes priorities (50)
6948 --sync-standbys number-sync-standbys to set (0 or 1)
6949 --skip-pg-hba use --skip-pg-hba when creating nodes
6950 --layout tmux layout to use (even-vertical)
6951 --binpath path to the pg_autoctl binary (current binary path)
6952
6953 pg_autoctl do demo
6954 pg_autoctl do demo - Use a demo application for pg_auto_failover
6955
6956 Synopsis
6957 pg_autoctl do demo provides the following commands:
6958
6959 pg_autoctl do demo
6960 run Run the pg_auto_failover demo application
6961 uri Grab the application connection string from the monitor
6962 ping Attempt to connect to the application URI
6963 summary Display a summary of the previous demo app run
6964
6965 To run a demo, use pg_autoctl do demo run:
6966
6967 usage: pg_autoctl do demo run [option ...]
6968
6969 --monitor Postgres URI of the pg_auto_failover monitor
6970 --formation Formation to use (default)
6971 --group Group Id to failover (0)
6972 --username PostgreSQL's username
6973 --clients How many client processes to use (1)
6974 --duration Duration of the demo app, in seconds (30)
6975 --first-failover Timing of the first failover (10)
6976 --failover-freq Seconds between subsequent failovers (45)
6977
6978 Description
6979 The pg_autoctl debug tooling includes a demo application.
6980
       The demo prepares its Postgres schema on the target database, and then
6982 starts several clients (see --clients) that concurrently connect to the
6983 target application URI and record the time it took to establish the
6984 Postgres connection to the current read-write node, with information
6985 about the retry policy metrics.
6986
6987 Example
6988 $ pg_autoctl do demo run --monitor 'postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer' --clients 10
6989 14:43:35 19660 INFO Using application connection string "postgres://localhost:5502,localhost:5503,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer"
6990 14:43:35 19660 INFO Using Postgres user PGUSER "dim"
6991 14:43:35 19660 INFO Preparing demo schema: drop schema if exists demo cascade
6992 14:43:35 19660 WARN NOTICE: schema "demo" does not exist, skipping
6993 14:43:35 19660 INFO Preparing demo schema: create schema demo
6994 14:43:35 19660 INFO Preparing demo schema: create table demo.tracking(ts timestamptz default now(), client integer, loop integer, retries integer, us bigint, recovery bool)
6995 14:43:36 19660 INFO Preparing demo schema: create table demo.client(client integer, pid integer, retry_sleep_ms integer, retry_cap_ms integer, failover_count integer)
6996 14:43:36 19660 INFO Starting 10 concurrent clients as sub-processes
6997 14:43:36 19675 INFO Failover client is started, will failover in 10s and every 45s after that
6998 ...
6999
7000 $ pg_autoctl do demo summary --monitor 'postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer' --clients 10
7001 14:44:27 22789 INFO Using application connection string "postgres://localhost:5503,localhost:5501,localhost:5502/demo?target_session_attrs=read-write&sslmode=prefer"
7002 14:44:27 22789 INFO Using Postgres user PGUSER "dim"
7003 14:44:27 22789 INFO Summary for the demo app running with 10 clients for 30s
7004 Client | Connections | Retries | Min Connect Time (ms) | max | p95 | p99
7005 ----------------------+-------------+---------+-----------------------+----------+---------+---------
7006 Client 1 | 136 | 14 | 58.318 | 2601.165 | 244.443 | 261.809
7007 Client 2 | 136 | 5 | 55.199 | 2514.968 | 242.362 | 259.282
7008 Client 3 | 134 | 6 | 55.815 | 2974.247 | 241.740 | 262.908
7009 Client 4 | 135 | 7 | 56.542 | 2970.922 | 238.995 | 251.177
7010 Client 5 | 136 | 8 | 58.339 | 2758.106 | 238.720 | 252.439
7011 Client 6 | 134 | 9 | 58.679 | 2813.653 | 244.696 | 254.674
7012 Client 7 | 134 | 11 | 58.737 | 2795.974 | 243.202 | 253.745
7013 Client 8 | 136 | 12 | 52.109 | 2354.952 | 242.664 | 254.233
7014 Client 9 | 137 | 19 | 59.735 | 2628.496 | 235.668 | 253.582
7015 Client 10 | 133 | 6 | 57.994 | 3060.489 | 242.156 | 256.085
7016 All Clients Combined | 1351 | 97 | 52.109 | 3060.489 | 241.848 | 258.450
7017 (11 rows)
7018
7019 Min Connect Time (ms) | max | freq | bar
7020 -----------------------+----------+------+-----------------------------------------------
7021 52.109 | 219.105 | 1093 | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
7022 219.515 | 267.168 | 248 | ▒▒▒▒▒▒▒▒▒▒
7023 2354.952 | 2354.952 | 1 |
7024 2514.968 | 2514.968 | 1 |
7025 2601.165 | 2628.496 | 2 |
7026 2758.106 | 2813.653 | 3 |
7027 2970.922 | 2974.247 | 2 |
7028 3060.489 | 3060.489 | 1 |
7029 (8 rows)
7030
7031 pg_autoctl do service restart
7032 pg_autoctl do service restart - Run pg_autoctl sub-processes (services)
7033
7034 Synopsis
7035 pg_autoctl do service restart provides the following commands:
7036
7037 pg_autoctl do service restart
7038 postgres Restart the pg_autoctl postgres controller service
7039 listener Restart the pg_autoctl monitor listener service
7040 node-active Restart the pg_autoctl keeper node-active service
7041
7042 Description
7043 It is possible to restart the pg_autoctl or the Postgres service with‐
7044 out affecting the other running service. Typically, to restart the
7045 pg_autoctl parts without impacting Postgres:
7046
7047 $ pg_autoctl do service restart node-active --pgdata node1
7048 14:52:06 31223 INFO Sending the TERM signal to service "node-active" with pid 26626
7049 14:52:06 31223 INFO Service "node-active" has been restarted with pid 31230
7050 31230
7051
7052 The Postgres service has not been impacted by the restart of the pg_au‐
7053 toctl process.
7054
7055 pg_autoctl do show
7056 pg_autoctl do show - Show some debug level information
7057
7058 Synopsis
7059 The commands pg_autoctl create monitor and pg_autoctl create postgres
7060 both implement some level of automated detection of the node network
7061 settings when the option --hostname is not used.
7062
       In addition to those commands, when a new node is registered to the
       monitor, other nodes also edit their Postgres HBA rules to allow the
       new node to connect, unless the option --skip-pg-hba has been used.
7066
       The debug sub-commands for pg_autoctl do show can be used to see in
       detail the network discovery performed by pg_autoctl.
7069
7070 pg_autoctl do show provides the following commands:
7071
7072 pg_autoctl do show
7073 ipaddr Print this node's IP address information
7074 cidr Print this node's CIDR information
7075 lookup Print this node's DNS lookup information
7076 hostname Print this node's default hostname
7077 reverse Lookup given hostname and check reverse DNS setup
7078
7079 pg_autoctl do show ipaddr
7080 Connects to an external IP address and uses getsockname(2) to retrieve
7081 the current address to which the socket is bound.
7082
7083 The external IP address defaults to 8.8.8.8, the IP address of a Google
7084 provided public DNS server, or to the monitor IP address or hostname in
7085 the context of pg_autoctl create postgres.
7086
7087 $ pg_autoctl do show ipaddr
7088 16:42:40 62631 INFO ipaddr.c:107: Connecting to 8.8.8.8 (port 53)
7089 192.168.1.156
7090
7091 pg_autoctl do show cidr
7092 Connects to an external IP address in the same way as the previous com‐
7093 mand pg_autoctl do show ipaddr and then matches the local socket name
       with the list of local network interfaces. When a match is found, it
       uses the netmask of the interface to compute the CIDR notation from
       the IP address.
7097
7098 The computed CIDR notation is then used in HBA rules.
7099
7100 $ pg_autoctl do show cidr
7101 16:43:19 63319 INFO Connecting to 8.8.8.8 (port 53)
7102 192.168.1.0/24
7103
7104 pg_autoctl do show hostname
       Uses either its first (and only) argument or the result of
       gethostname(2) as the candidate hostname to use in HBA rules, and
       then checks that the hostname resolves to an IP address that belongs
       to one of the machine's network interfaces.
7109
7110 When the hostname forward-dns lookup resolves to an IP address that is
7111 local to the node where the command is run, then a reverse-lookup from
7112 the IP address is made to see if it matches with the candidate host‐
7113 name.
7114
7115 $ pg_autoctl do show hostname
7116 DESKTOP-IC01GOOS.europe.corp.microsoft.com
7117
7118 $ pg_autoctl -vv do show hostname 'postgres://autoctl_node@localhost:5500/pg_auto_failover'
7119 13:45:00 93122 INFO cli_do_show.c:256: Using monitor hostname "localhost" and port 5500
7120 13:45:00 93122 INFO ipaddr.c:107: Connecting to ::1 (port 5500)
7121 13:45:00 93122 DEBUG cli_do_show.c:272: cli_show_hostname: ip ::1
7122 13:45:00 93122 DEBUG cli_do_show.c:283: cli_show_hostname: host localhost
7123 13:45:00 93122 DEBUG cli_do_show.c:294: cli_show_hostname: ip ::1
7124 localhost
7125
7126 pg_autoctl do show lookup
       Checks that the given argument is a hostname that resolves to a
       local IP address, that is, an IP address associated with a local
       network interface.
7130
7131 $ pg_autoctl do show lookup DESKTOP-IC01GOOS.europe.corp.microsoft.com
7132 DESKTOP-IC01GOOS.europe.corp.microsoft.com: 192.168.1.156
7133
7134 pg_autoctl do show reverse
7135 Implements the same DNS checks as Postgres HBA matching code: first
7136 does a forward DNS lookup of the given hostname, and then a re‐
7137 verse-lookup from all the IP addresses obtained. Success is reached
7138 when at least one of the IP addresses from the forward lookup resolves
7139 back to the given hostname (as the first answer to the reverse DNS
7140 lookup).
7141
7142 $ pg_autoctl do show reverse DESKTOP-IC01GOOS.europe.corp.microsoft.com
7143 16:44:49 64910 FATAL Failed to find an IP address for hostname "DESKTOP-IC01GOOS.europe.corp.microsoft.com" that matches hostname again in a reverse-DNS lookup.
7144 16:44:49 64910 INFO Continuing with IP address "192.168.1.156"
7145
7146 $ pg_autoctl -vv do show reverse DESKTOP-IC01GOOS.europe.corp.microsoft.com
7147 16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 192.168.1.156
7148 16:44:45 64832 DEBUG ipaddr.c:733: reverse lookup for "192.168.1.156" gives "desktop-ic01goos.europe.corp.microsoft.com" first
7149 16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 192.168.1.156
7150 16:44:45 64832 DEBUG ipaddr.c:733: reverse lookup for "192.168.1.156" gives "desktop-ic01goos.europe.corp.microsoft.com" first
7151 16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 2a01:110:10:40c::2ad
7152 16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
7153 16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 2a01:110:10:40c::2ad
7154 16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
7155 16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 100.64.34.213
7156 16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
7157 16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 100.64.34.213
7158 16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
7159 16:44:45 64832 FATAL cli_do_show.c:333: Failed to find an IP address for hostname "DESKTOP-IC01GOOS.europe.corp.microsoft.com" that matches hostname again in a reverse-DNS lookup.
7160 16:44:45 64832 INFO cli_do_show.c:334: Continuing with IP address "192.168.1.156"
7161
7162 pg_autoctl do pgsetup
7163 pg_autoctl do pgsetup - Manage a local Postgres setup
7164
7165 Synopsis
7166 The main pg_autoctl commands implement low-level management tooling for
       a local Postgres instance. Some of the low-level Postgres commands
       can also be used as standalone tools in some cases.
7169
7170 pg_autoctl do pgsetup provides the following commands:
7171
7172 pg_autoctl do pgsetup
7173 pg_ctl Find a non-ambiguous pg_ctl program and Postgres version
7174 discover Discover local PostgreSQL instance, if any
          ready      Return true if the local Postgres server is ready
7176 wait Wait until the local Postgres server is ready
7177 logs Outputs the Postgres startup logs
7178 tune Compute and log some Postgres tuning options
7179
7180 pg_autoctl do pgsetup pg_ctl
       In a similar way to which -a, this command scans your PATH for
       pg_ctl entries. Then it runs the pg_ctl --version command and parses
       the output to determine the version of Postgres that is available in
       the path.
7184
7185 $ pg_autoctl do pgsetup pg_ctl --pgdata node1
7186 16:49:18 69684 INFO Environment variable PG_CONFIG is set to "/Applications/Postgres.app//Contents/Versions/12/bin/pg_config"
7187 16:49:18 69684 INFO `pg_autoctl create postgres` would use "/Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl" for Postgres 12.3
7188 16:49:18 69684 INFO `pg_autoctl create monitor` would use "/Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl" for Postgres 12.3
7189
7190 pg_autoctl do pgsetup discover
7191 Given a PGDATA or --pgdata option, the command discovers if a running
7192 Postgres service matches the pg_autoctl setup, and prints the informa‐
7193 tion that pg_autoctl typically needs when managing a Postgres instance.
7194
7195 $ pg_autoctl do pgsetup discover --pgdata node1
7196 pgdata: /Users/dim/dev/MS/pg_auto_failover/tmux/node1
7197 pg_ctl: /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl
7198 pg_version: 12.3
7199 pghost: /tmp
7200 pgport: 5501
7201 proxyport: 0
7202 pid: 21029
7203 is in recovery: no
7204 Control Version: 1201
7205 Catalog Version: 201909212
7206 System Identifier: 6942422768095393833
7207 Latest checkpoint LSN: 0/4059C18
7208 Postmaster status: ready
7209
7210 pg_autoctl do pgsetup ready
       Similar to the pg_isready command, though it uses the Postgres
       specifications found in the pg_autoctl node setup.
7213
7214 $ pg_autoctl do pgsetup ready --pgdata node1
7215 16:50:08 70582 INFO Postgres status is: "ready"
7216
7217 pg_autoctl do pgsetup wait
       When pg_autoctl do pgsetup ready would return false because Postgres
       is not ready yet, this command continues probing every second for 30
       seconds, and exits as soon as Postgres is ready.
7221
7222 $ pg_autoctl do pgsetup wait --pgdata node1
7223 16:50:22 70829 INFO Postgres is now serving PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" on port 5501 with pid 21029
7224 16:50:22 70829 INFO Postgres status is: "ready"
7225
7226 pg_autoctl do pgsetup logs
7227 Outputs the Postgres logs from the most recent log file in the PG‐
7228 DATA/log directory.
7229
7230 $ pg_autoctl do pgsetup logs --pgdata node1
7231 16:50:39 71126 WARN Postgres logs from "/Users/dim/dev/MS/pg_auto_failover/tmux/node1/startup.log":
7232 16:50:39 71126 INFO 2021-03-22 14:43:48.911 CET [21029] LOG: starting PostgreSQL 12.3 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
7233 16:50:39 71126 INFO 2021-03-22 14:43:48.913 CET [21029] LOG: listening on IPv6 address "::", port 5501
7234 16:50:39 71126 INFO 2021-03-22 14:43:48.913 CET [21029] LOG: listening on IPv4 address "0.0.0.0", port 5501
7235 16:50:39 71126 INFO 2021-03-22 14:43:48.913 CET [21029] LOG: listening on Unix socket "/tmp/.s.PGSQL.5501"
7236 16:50:39 71126 INFO 2021-03-22 14:43:48.931 CET [21029] LOG: redirecting log output to logging collector process
7237 16:50:39 71126 INFO 2021-03-22 14:43:48.931 CET [21029] HINT: Future log output will appear in directory "log".
7238 16:50:39 71126 WARN Postgres logs from "/Users/dim/dev/MS/pg_auto_failover/tmux/node1/log/postgresql-2021-03-22_144348.log":
7239 16:50:39 71126 INFO 2021-03-22 14:43:48.937 CET [21033] LOG: database system was shut down at 2021-03-22 14:43:46 CET
7240 16:50:39 71126 INFO 2021-03-22 14:43:48.937 CET [21033] LOG: entering standby mode
7241 16:50:39 71126 INFO 2021-03-22 14:43:48.942 CET [21033] LOG: consistent recovery state reached at 0/4022E88
7242 16:50:39 71126 INFO 2021-03-22 14:43:48.942 CET [21033] LOG: invalid record length at 0/4022E88: wanted 24, got 0
7243 16:50:39 71126 INFO 2021-03-22 14:43:48.946 CET [21029] LOG: database system is ready to accept read only connections
7244 16:50:39 71126 INFO 2021-03-22 14:43:49.032 CET [21038] LOG: fetching timeline history file for timeline 4 from primary server
7245 16:50:39 71126 INFO 2021-03-22 14:43:49.037 CET [21038] LOG: started streaming WAL from primary at 0/4000000 on timeline 3
7246 16:50:39 71126 INFO 2021-03-22 14:43:49.046 CET [21038] LOG: replication terminated by primary server
7247 16:50:39 71126 INFO 2021-03-22 14:43:49.046 CET [21038] DETAIL: End of WAL reached on timeline 3 at 0/4022E88.
7248 16:50:39 71126 INFO 2021-03-22 14:43:49.047 CET [21033] LOG: new target timeline is 4
7249 16:50:39 71126 INFO 2021-03-22 14:43:49.049 CET [21038] LOG: restarted WAL streaming at 0/4000000 on timeline 4
7250 16:50:39 71126 INFO 2021-03-22 14:43:49.210 CET [21033] LOG: redo starts at 0/4022E88
7251 16:50:39 71126 INFO 2021-03-22 14:52:06.692 CET [21029] LOG: received SIGHUP, reloading configuration files
7252 16:50:39 71126 INFO 2021-03-22 14:52:06.906 CET [21029] LOG: received SIGHUP, reloading configuration files
7253 16:50:39 71126 FATAL 2021-03-22 15:34:24.920 CET [21038] FATAL: terminating walreceiver due to timeout
7254 16:50:39 71126 INFO 2021-03-22 15:34:24.973 CET [21033] LOG: invalid record length at 0/4059CC8: wanted 24, got 0
7255 16:50:39 71126 INFO 2021-03-22 15:34:25.105 CET [35801] LOG: started streaming WAL from primary at 0/4000000 on timeline 4
7256 16:50:39 71126 FATAL 2021-03-22 16:12:56.918 CET [35801] FATAL: terminating walreceiver due to timeout
7257 16:50:39 71126 INFO 2021-03-22 16:12:57.086 CET [38741] LOG: started streaming WAL from primary at 0/4000000 on timeline 4
7258 16:50:39 71126 FATAL 2021-03-22 16:23:39.349 CET [38741] FATAL: terminating walreceiver due to timeout
7259 16:50:39 71126 INFO 2021-03-22 16:23:39.497 CET [41635] LOG: started streaming WAL from primary at 0/4000000 on timeline 4
7260
7261 pg_autoctl do pgsetup tune
       Outputs the pg_autoctl automated tuning options. Depending on the
       number of CPUs and the amount of RAM detected in the environment
       where it is run, pg_autoctl can adjust some very basic Postgres
       tuning knobs to get started.
7266
7267 $ pg_autoctl do pgsetup tune --pgdata node1 -vv
7268 13:25:25 77185 DEBUG pgtuning.c:85: Detected 12 CPUs and 16 GB total RAM on this server
7269 13:25:25 77185 DEBUG pgtuning.c:225: Setting autovacuum_max_workers to 3
7270 13:25:25 77185 DEBUG pgtuning.c:228: Setting shared_buffers to 4096 MB
7271 13:25:25 77185 DEBUG pgtuning.c:231: Setting work_mem to 24 MB
7272 13:25:25 77185 DEBUG pgtuning.c:235: Setting maintenance_work_mem to 512 MB
7273 13:25:25 77185 DEBUG pgtuning.c:239: Setting effective_cache_size to 12 GB
7274 # basic tuning computed by pg_auto_failover
7275 track_functions = pl
7276 shared_buffers = '4096 MB'
7277 work_mem = '24 MB'
7278 maintenance_work_mem = '512 MB'
7279 effective_cache_size = '12 GB'
7280 autovacuum_max_workers = 3
7281 autovacuum_vacuum_scale_factor = 0.08
7282 autovacuum_analyze_scale_factor = 0.02
7283
7284 The low-level API is made available through the following pg_autoctl do
7285 commands, only available in debug environments:
7286
7287 pg_autoctl do
7288 + monitor Query a pg_auto_failover monitor
7289 + fsm Manually manage the keeper's state
7290 + primary Manage a PostgreSQL primary server
7291 + standby Manage a PostgreSQL standby server
7292 + show Show some debug level information
7293 + pgsetup Manage a local Postgres setup
7294 + pgctl Signal the pg_autoctl postgres service
7295 + service Run pg_autoctl sub-processes (services)
7296 + tmux Set of facilities to handle tmux interactive sessions
7297 + azure Manage a set of Azure resources for a pg_auto_failover demo
7298 + demo Use a demo application for pg_auto_failover
7299
7300 pg_autoctl do monitor
7301 + get Get information from the monitor
7302 register Register the current node with the monitor
7303 active Call in the pg_auto_failover Node Active protocol
7304 version Check that monitor version is 1.5.0.1; alter extension update if not
7305 parse-notification parse a raw notification message
7306
7307 pg_autoctl do monitor get
7308 primary Get the primary node from pg_auto_failover in given formation/group
7309 others Get the other nodes from the pg_auto_failover group of hostname/port
7310 coordinator Get the coordinator node from the pg_auto_failover formation
7311
7312 pg_autoctl do fsm
7313 init Initialize the keeper's state on-disk
7314 state Read the keeper's state from disk and display it
7315 list List reachable FSM states from current state
7316 gv Output the FSM as a .gv program suitable for graphviz/dot
7317 assign Assign a new goal state to the keeper
7318 step Make a state transition if instructed by the monitor
7319 + nodes Manually manage the keeper's nodes list
7320
7321 pg_autoctl do fsm nodes
7322 get Get the list of nodes from file (see --disable-monitor)
7323 set Set the list of nodes to file (see --disable-monitor)
7324
7325 pg_autoctl do primary
7326 + slot Manage replication slot on the primary server
7327 + adduser Create users on primary
7328 defaults Add default settings to postgresql.conf
7329 identify Run the IDENTIFY_SYSTEM replication command on given host
7330
7331 pg_autoctl do primary slot
7332 create Create a replication slot on the primary server
7333 drop Drop a replication slot on the primary server
7334
7335 pg_autoctl do primary adduser
7336 monitor add a local user for queries from the monitor
7337 replica add a local user with replication privileges
7338
7339 pg_autoctl do standby
7340 init Initialize the standby server using pg_basebackup
7341 rewind Rewind a demoted primary server using pg_rewind
7342 promote Promote a standby server to become writable
7343
7344 pg_autoctl do show
7345 ipaddr Print this node's IP address information
7346 cidr Print this node's CIDR information
7347 lookup Print this node's DNS lookup information
7348 hostname Print this node's default hostname
7349 reverse Lookup given hostname and check reverse DNS setup
7350
7351 pg_autoctl do pgsetup
7352 pg_ctl Find a non-ambiguous pg_ctl program and Postgres version
7353 discover Discover local PostgreSQL instance, if any
          ready      Return true if the local Postgres server is ready
7355 wait Wait until the local Postgres server is ready
7356 logs Outputs the Postgres startup logs
7357 tune Compute and log some Postgres tuning options
7358
7359 pg_autoctl do pgctl
7360 on Signal pg_autoctl postgres service to ensure Postgres is running
7361 off Signal pg_autoctl postgres service to ensure Postgres is stopped
7362
7363 pg_autoctl do service
7364 + getpid Get the pid of pg_autoctl sub-processes (services)
7365 + restart Restart pg_autoctl sub-processes (services)
7366 pgcontroller pg_autoctl supervised postgres controller
7367 postgres pg_autoctl service that start/stop postgres when asked
7368 listener pg_autoctl service that listens to the monitor notifications
7369 node-active pg_autoctl service that implements the node active protocol
7370
7371 pg_autoctl do service getpid
7372 postgres Get the pid of the pg_autoctl postgres controller service
7373 listener Get the pid of the pg_autoctl monitor listener service
7374 node-active Get the pid of the pg_autoctl keeper node-active service
7375
7376 pg_autoctl do service restart
7377 postgres Restart the pg_autoctl postgres controller service
7378 listener Restart the pg_autoctl monitor listener service
7379 node-active Restart the pg_autoctl keeper node-active service
7380
7381 pg_autoctl do tmux
7382 script Produce a tmux script for a demo or a test case (debug only)
7383 session Run a tmux session for a demo or a test case
7384 stop Stop pg_autoctl processes that belong to a tmux session
7385 wait Wait until a given node has been registered on the monitor
7386 clean Clean-up a tmux session processes and root dir
7387
7388 pg_autoctl do azure
7389 + provision provision azure resources for a pg_auto_failover demo
7390 + tmux Run a tmux session with an Azure setup for QA/testing
7391 + show show azure resources for a pg_auto_failover demo
7392 deploy Deploy a pg_autoctl VMs, given by name
7393 create Create an azure QA environment
7394 drop Drop an azure QA environment: resource group, network, VMs
7395 ls List resources in a given azure region
7396 ssh Runs ssh -l ha-admin <public ip address> for a given VM name
7397 sync Rsync pg_auto_failover sources on all the target region VMs
7398
7399 pg_autoctl do azure provision
7400 region Provision an azure region: resource group, network, VMs
7401 nodes Provision our pre-created VM with pg_autoctl Postgres nodes
7402
7403 pg_autoctl do azure tmux
7404 session Create or attach a tmux session for the created Azure VMs
7405 kill Kill an existing tmux session for Azure VMs
7406
7407 pg_autoctl do azure show
7408 ips Show public and private IP addresses for selected VMs
7409 state Connect to the monitor node to show the current state
7410
7411 pg_autoctl do demo
7412 run Run the pg_auto_failover demo application
7413 uri Grab the application connection string from the monitor
7414 ping Attempt to connect to the application URI
7415 summary Display a summary of the previous demo app run
7416
7417 pg_autoctl run
7418 pg_autoctl run - Run the pg_autoctl service (monitor or keeper)
7419
7420 Synopsis
       This command starts the processes needed to run a monitor node or a
7422 keeper node, depending on the configuration file that belongs to the
7423 --pgdata option or PGDATA environment variable.
7424
7425 usage: pg_autoctl run [ --pgdata --name --hostname --pgport ]
7426
7427 --pgdata path to data directory
7428 --name pg_auto_failover node name
7429 --hostname hostname used to connect from other nodes
7430 --pgport PostgreSQL's port number
7431
7432 Description
7433 When registering Postgres nodes to the pg_auto_failover monitor using
7434 the pg_autoctl create postgres command, the nodes are registered with
7435 metadata: the node name, hostname and Postgres port.
7436
7437 The node name is used mostly in the logs and pg_autoctl show state com‐
7438 mands and helps human administrators of the formation.
7439
7440 The node hostname and pgport are used by other nodes, including the
7441 pg_auto_failover monitor, to open a Postgres connection.
7442
       The node name, the node hostname, and the port can all be changed
       after the node registration by using either this command (pg_autoctl
       run) or the pg_autoctl config set command.
7446
7447 Options
7448 --pgdata
7449 Location of the Postgres node being managed locally. Defaults to
7450 the environment variable PGDATA. Use --monitor to connect to a
7451 monitor from anywhere, rather than the monitor URI used by a lo‐
7452 cal Postgres node managed with pg_autoctl.
7453
       --name Node name used on the monitor to refer to this node. The
              hostname is technical information, and given Postgres
              requirements on the HBA setup and DNS resolution (both forward
              and reverse lookups), IP addresses are often used for the
              hostname.
7458
7459 The --name option allows using a user-friendly name for your
7460 Postgres nodes.
7461
7462 --hostname
7463 Hostname or IP address (both v4 and v6 are supported) to use
7464 from any other node to connect to this node.
7465
7466 When not provided, a default value is computed by running the
7467 following algorithm.
7468
7469 1. We get this machine's "public IP" by opening a connection
7470 to the given monitor hostname or IP address. Then we get
7471 TCP/IP client address that has been used to make that con‐
7472 nection.
7473
7474 2. We then do a reverse DNS lookup on the IP address found in
7475 the previous step to fetch a hostname for our local ma‐
7476 chine.
7477
              3. If the reverse DNS lookup is successful, then pg_autoctl
7479 does a forward DNS lookup of that hostname.
7480
7481 When the forward DNS lookup response in step 3. is an IP address
7482 found in one of our local network interfaces, then pg_autoctl
7483 uses the hostname found in step 2. as the default --hostname.
7484 Otherwise it uses the IP address found in step 1.
7485
7486 You may use the --hostname command line option to bypass the
7487 whole DNS lookup based process and force the local node name to
7488 a fixed value.
7489
7490 --pgport
7491 Postgres port to use, defaults to 5432.
7492
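   Examples
       As a minimal sketch, assuming a node that has already been created
       with pg_autoctl create postgres and whose data directory is ./node1
       (a placeholder path), the pg_autoctl service can be run in the
       foreground with:

          $ pg_autoctl run --pgdata ./node1

       The default --hostname computation described above can be inspected
       with the pg_autoctl do show hostname command documented earlier in
       this manual.
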
7493 pg_autoctl watch
7494 pg_autoctl watch - Display an auto-updating dashboard
7495
7496 Synopsis
       This command displays an auto-updating dashboard built from the
       events that the pg_auto_failover monitor records about state changes
       of the pg_auto_failover nodes it manages:
7500
7501 usage: pg_autoctl watch [ --pgdata --formation --group ]
7502
7503 --pgdata path to data directory
7504 --monitor show the monitor uri
7505 --formation formation to query, defaults to 'default'
7506 --group group to query formation, defaults to all
7507 --json output data in the JSON format
7508
7509 Options
7510 --pgdata
7511 Location of the Postgres node being managed locally. Defaults to
7512 the environment variable PGDATA. Use --monitor to connect to a
7513 monitor from anywhere, rather than the monitor URI used by a lo‐
7514 cal Postgres node managed with pg_autoctl.
7515
7516 --monitor
7517 Postgres URI used to connect to the monitor. Must use the au‐
7518 toctl_node username and target the pg_auto_failover database
7519 name. It is possible to show the Postgres URI from the monitor
7520 node using the command pg_autoctl show uri.
7521
7522 --formation
7523 List the events recorded for nodes in the given formation. De‐
7524 faults to default.
7525
7526 --group
              Limit output to a single group in the formation. Defaults to
              including all groups registered in the target formation.
7529
7530 Environment
7531 PGDATA
7532 Postgres directory location. Can be used instead of the --pgdata op‐
7533 tion.
7534
7535 PG_AUTOCTL_MONITOR
7536 Postgres URI to connect to the monitor node, can be used instead of
7537 the --monitor option.
7538
7539 XDG_CONFIG_HOME
7540 The pg_autoctl command stores its configuration files in the stan‐
7541 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
7542 tion.
7543
7544 XDG_DATA_HOME
7545 The pg_autoctl command stores its internal states files in the stan‐
7546 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
7547 XDG Base Directory Specification.
7548
7549 Description
       The pg_autoctl watch output is divided into 3 sections.
7551
7552 The first section is a single header line which includes the name of
7553 the currently selected formation, the formation replication setting
       Number Sync Standbys, and then in the rightmost position the current
7555 time.
7556
7557 The second section displays one line per node, and each line contains a
7558 list of columns that describe the current state for the node. This list
       can include the following columns; which columns are part of the
       output depends on the terminal window size. This choice is dynamic and
7561 changes if your terminal window size changes:
7562
7563 • Name
7564 Name of the node.
7565
7566 • Node, or Id
7567 Node information. When the formation has a single group (group
7568 zero), then this column only contains the nodeId.
7569
7570 Only Citus formations allow several groups. When using a Citus
7571 formation the Node column contains the groupId and the nodeId,
7572 separated by a colon, such as 0:1 for the first coordinator
7573 node.
7574
7575 • Last Report, or Report
7576 Time interval between now and the last known time when a node
7577 has reported to the monitor, using the node_active protocol.
7578
              This value is expected to stay under about 2s, and is
              known to increment when either the pg_autoctl run service is
              not running, or when there is a network split.
7582
7583 • Last Check, or Check
7584 Time interval between now and the last known time when the
7585 monitor could connect to a node's Postgres instance, via its
7586 health check mechanism.
7587
7588 This value is known to increment when either the Postgres ser‐
7589 vice is not running on the target node, when there is a net‐
7590 work split, or when the internal machinery (the health check
7591 worker background process) implements jitter.
7592
7593 • Host:Port
7594 Hostname and port number used to connect to the node.
7595
7596 • TLI: LSN
7597 Timeline identifier (TLI) and Postgres Log Sequence Number
7598 (LSN).
7599
7600 The LSN is the current position in the Postgres WAL stream.
7601 This is a hexadecimal number. See pg_lsn for more information.
7602
7603 The current timeline is incremented each time a failover hap‐
7604 pens, or when doing Point In Time Recovery. A node can only
7605 reach the secondary state when it is on the same timeline as
7606 its primary node.
7607
7608 • Connection
7609 This output field contains two bits of information. First, the
7610 Postgres connection type that the node provides, either
7611 read-write or read-only. Then the mark ! is added when the
7612 monitor has failed to connect to this node, and ? when the
7613 monitor didn't connect to the node yet.
7614
7615 • Reported State
7616 The current FSM state as reported to the monitor by the pg_au‐
7617 toctl process running on the Postgres node.
7618
7619 • Assigned State
7620 The assigned FSM state on the monitor. When the assigned state
7621 is not the same as the reported start, then the pg_autoctl
7622 process running on the Postgres node might have not retrieved
7623 the assigned state yet, or might still be implementing the FSM
7624 transition from the current state to the assigned state.
7625
       The third and last section lists the most recent events that the
       monitor has registered; the most recent event is found at the bottom
       of the screen.
7629
7630 To quit the command hit either the F1 key or the q key.
7631
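   Example
       As a quick sketch, assuming the monitor connection string is
       available through the PG_AUTOCTL_MONITOR environment variable (or the
       --monitor option), the dashboard for the default formation can be
       started with:

          $ pg_autoctl watch --formation default

       The display then keeps refreshing until you quit with the F1 or q
       key, as described above.
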
7632 pg_autoctl stop
   pg_autoctl stop - signal the pg_autoctl service to stop
7634
7635 Synopsis
       This command stops the processes needed to run a monitor node or a
7637 keeper node, depending on the configuration file that belongs to the
7638 --pgdata option or PGDATA environment variable.
7639
7640 usage: pg_autoctl stop [ --pgdata --fast --immediate ]
7641
7642 --pgdata path to data directory
7643 --fast fast shutdown mode for the keeper
7644 --immediate immediate shutdown mode for the keeper
7645
7646 Description
       The pg_autoctl stop command finds the PID of the running service for
7648 the given --pgdata, and if the process is still running, sends a
7649 SIGTERM signal to the process.
7650
       When pg_autoctl receives a shutdown signal, a shutdown sequence is
       triggered. Depending on the signal received, an operation that has
       been started (such as a state transition) is either run to
       completion, stopped at the next opportunity, or stopped immediately,
       even when in the middle of the transition.
7656
7657 Options
7658 --pgdata
7659 Location of the Postgres node being managed locally. Defaults to
7660 the environment variable PGDATA. Use --monitor to connect to a
7661 monitor from anywhere, rather than the monitor URI used by a lo‐
7662 cal Postgres node managed with pg_autoctl.
7663
7664 --fast Fast Shutdown mode for pg_autoctl. Sends the SIGINT signal to
7665 the running service, which is the same as using C-c on an inter‐
7666 active process running as a foreground shell job.
7667
7668 --immediate
7669 Immediate Shutdown mode for pg_autoctl. Sends the SIGQUIT signal
7670 to the running service.
7671
7672 Environment
7673 PGDATA
7674 Postgres directory location. Can be used instead of the --pgdata op‐
7675 tion.
7676
7677 PG_AUTOCTL_MONITOR
7678 Postgres URI to connect to the monitor node, can be used instead of
7679 the --monitor option.
7680
7681 XDG_CONFIG_HOME
7682 The pg_autoctl command stores its configuration files in the stan‐
7683 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
7684 tion.
7685
7686 XDG_DATA_HOME
7687 The pg_autoctl command stores its internal states files in the stan‐
7688 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
7689 XDG Base Directory Specification.
7690
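   Example
       As a quick sketch, assuming a node whose data directory is ./node1 (a
       placeholder path), a fast shutdown of the pg_autoctl service can be
       requested with:

          $ pg_autoctl stop --pgdata ./node1 --fast

       As described above, this sends the SIGINT signal to the running
       pg_autoctl process.
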
7691 pg_autoctl reload
   pg_autoctl reload - signal the pg_autoctl service to reload its configuration
7694
7695 Synopsis
       This command signals a running pg_autoctl process to reload its
       configuration from disk, and also signals the managed Postgres
       service to reload its configuration.
7699
7700 usage: pg_autoctl reload [ --pgdata ] [ --json ]
7701
7702 --pgdata path to data directory
7703
7704 Description
       The pg_autoctl reload command finds the PID of the running service for
7706 the given --pgdata, and if the process is still running, sends a SIGHUP
7707 signal to the process.
7708
7709 Options
7710 --pgdata
7711 Location of the Postgres node being managed locally. Defaults to
7712 the environment variable PGDATA. Use --monitor to connect to a
7713 monitor from anywhere, rather than the monitor URI used by a lo‐
7714 cal Postgres node managed with pg_autoctl.
7715
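   Example
       As a quick sketch, assuming a node whose data directory is ./node1 (a
       placeholder path):

          $ pg_autoctl reload --pgdata ./node1

       This sends the SIGHUP signal to the running pg_autoctl process, which
       in turn signals the managed Postgres service to reload its
       configuration.
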
7716 pg_autoctl status
7717 pg_autoctl status - Display the current status of the pg_autoctl ser‐
7718 vice
7719
7720 Synopsis
       This command outputs the current process status for the pg_autoctl
7722 service running for the given --pgdata location.
7723
7724 usage: pg_autoctl status [ --pgdata ] [ --json ]
7725
7726 --pgdata path to data directory
7727 --json output data in the JSON format
7728
7729 Options
7730 --pgdata
7731 Location of the Postgres node being managed locally. Defaults to
7732 the environment variable PGDATA. Use --monitor to connect to a
7733 monitor from anywhere, rather than the monitor URI used by a lo‐
7734 cal Postgres node managed with pg_autoctl.
7735
       --json Output JSON formatted data instead of a table formatted list.
7737
7738 Environment
7739 PGDATA
7740 Postgres directory location. Can be used instead of the --pgdata op‐
7741 tion.
7742
7743 PG_AUTOCTL_MONITOR
7744 Postgres URI to connect to the monitor node, can be used instead of
7745 the --monitor option.
7746
7747 XDG_CONFIG_HOME
7748 The pg_autoctl command stores its configuration files in the stan‐
7749 dard place XDG_CONFIG_HOME. See the XDG Base Directory Specifica‐
7750 tion.
7751
7752 XDG_DATA_HOME
7753 The pg_autoctl command stores its internal states files in the stan‐
7754 dard place XDG_DATA_HOME, which defaults to ~/.local/share. See the
7755 XDG Base Directory Specification.
7756
7757 Example
7758 $ pg_autoctl status --pgdata node1
7759 11:26:30 27248 INFO pg_autoctl is running with pid 26618
7760 11:26:30 27248 INFO Postgres is serving PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" on port 5501 with pid 26725
7761
7762 $ pg_autoctl status --pgdata node1 --json
7763 11:26:37 27385 INFO pg_autoctl is running with pid 26618
7764 11:26:37 27385 INFO Postgres is serving PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" on port 5501 with pid 26725
7765 {
7766 "postgres": {
7767 "pgdata": "\/Users\/dim\/dev\/MS\/pg_auto_failover\/tmux\/node1",
7768 "pg_ctl": "\/Applications\/Postgres.app\/Contents\/Versions\/12\/bin\/pg_ctl",
7769 "version": "12.3",
7770 "host": "\/tmp",
7771 "port": 5501,
7772 "proxyport": 0,
7773 "pid": 26725,
7774 "in_recovery": false,
7775 "control": {
7776 "version": 0,
7777 "catalog_version": 0,
7778 "system_identifier": "0"
7779 },
7780 "postmaster": {
7781 "status": "ready"
7782 }
7783 },
7784 "pg_autoctl": {
7785 "pid": 26618,
7786 "status": "running",
7787 "pgdata": "\/Users\/dim\/dev\/MS\/pg_auto_failover\/tmux\/node1",
7788 "version": "1.5.0",
7789 "semId": 196609,
7790 "services": [
7791 {
7792 "name": "postgres",
7793 "pid": 26625,
7794 "status": "running",
7795 "version": "1.5.0",
7796 "pgautofailover": "1.5.0.1"
7797 },
7798 {
7799 "name": "node-active",
7800 "pid": 26626,
7801 "status": "running",
7802 "version": "1.5.0",
7803 "pgautofailover": "1.5.0.1"
7804 }
7805 ]
7806 }
7807 }
7808
7809 pg_autoctl activate
7810 pg_autoctl activate - Activate a Citus worker from the Citus coordina‐
7811 tor
7812
7813 Synopsis
7814 This command calls the Citus “activation” API so that a node can be
7815 used to host shards for your reference and distributed tables.
7816
7817 usage: pg_autoctl activate [ --pgdata ]
7818
7819 --pgdata path to data directory
7820
7821 Description
       When creating a Citus worker, pg_autoctl create worker automatically
       activates the worker node on the coordinator. You only need this
       command when something unexpected has happened and you want to
       manually make sure the worker node has been activated at the Citus
       coordinator level.
7827
7828 Starting with Citus 10 it is also possible to activate the coordinator
7829 itself as a node with shard placement. Use pg_autoctl activate on your
7830 Citus coordinator node manually to use that feature.
7831
7832 When the Citus coordinator is activated, an extra step is then needed
7833 for it to host shards of distributed tables. If you want your coordina‐
7834 tor to have shards, then have a look at the Citus API
7835 citus_set_node_property to set the shouldhaveshards property to true.
7836
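       As a sketch of that extra step, assuming your coordinator is
       registered in pg_dist_node as coordinator.example.com on port 5432
       (both values are placeholders for your own setup), the following SQL
       can be run against the coordinator database:

          SELECT citus_set_node_property('coordinator.example.com', 5432,
                                         'shouldhaveshards', true);

       Refer to the Citus documentation for citus_set_node_property and
       pg_dist_node to find the exact node name and port used in your setup.
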
7837 Options
7838 --pgdata
7839 Location of the Postgres node being managed locally. Defaults to
7840 the environment variable PGDATA. Use --monitor to connect to a
7841 monitor from anywhere, rather than the monitor URI used by a lo‐
7842 cal Postgres node managed with pg_autoctl.
7843
       Several default settings of pg_auto_failover can be reviewed and
       changed depending on the trade-offs you want to implement in your own
       production setup. The settings that you can change will have an
       impact on the following operations; a sketch of how to change such a
       setting follows the list:
7849
7850 • Deciding when to promote the secondary
7851
7852 pg_auto_failover decides to implement a failover to the secondary
7853 node when it detects that the primary node is unhealthy. Changing
7854 the following settings will have an impact on when the
7855 pg_auto_failover monitor decides to promote the secondary Post‐
7856 greSQL node:
7857
7858 pgautofailover.health_check_max_retries
7859 pgautofailover.health_check_period
7860 pgautofailover.health_check_retry_delay
7861 pgautofailover.health_check_timeout
7862 pgautofailover.node_considered_unhealthy_timeout
7863
7864 • Time taken to promote the secondary
7865
7866 At secondary promotion time, pg_auto_failover waits for the fol‐
7867 lowing timeout to make sure that all pending writes on the primary
7868 server made it to the secondary at shutdown time, thus preventing
7869 data loss:
7870
7871 pgautofailover.primary_demote_timeout
7872
7873 • Preventing promotion of the secondary
7874
7875 pg_auto_failover implements a trade-off where data availability
7876 trumps service availability. When the primary node of a PostgreSQL
7877 service is detected unhealthy, the secondary is only promoted if
7878 it was known to be eligible at the moment when the primary is
7879 lost.
7880
7881 When synchronous replication was in use at the moment the primary
7882 node was lost, we know we can switch to the secondary safely: the
7883 WAL lag is 0 in that case.
7884
7885 When the secondary server had previously been detected to be un‐
7886 healthy, the pg_auto_failover monitor switches it from the state
7887 SECONDARY to the state CATCHING-UP, and promotion is then pre‐
7888 vented.
7889
7890 The following setting makes it possible to still promote the sec‐
7891 ondary, at the cost of a window of potential data loss:
7892
7893 pgautofailover.promote_wal_log_threshold
7894
7895 pg_auto_failover Monitor
7896 The configuration for the behavior of the monitor happens in the Post‐
7897 greSQL database where the extension has been deployed:
7898
7899 pg_auto_failover=> select name, setting, unit, short_desc from pg_settings where name ~ 'pgautofailover.';
7900 -[ RECORD 1 ]----------------------------------------------------------------------------------------------------
7901 name | pgautofailover.enable_sync_wal_log_threshold
7902 setting | 16777216
7903 unit |
7904 short_desc | Don't enable synchronous replication until secondary xlog is within this many bytes of the primary's
7905 -[ RECORD 2 ]----------------------------------------------------------------------------------------------------
7906 name | pgautofailover.health_check_max_retries
7907 setting | 2
7908 unit |
7909 short_desc | Maximum number of re-tries before marking a node as failed.
7910 -[ RECORD 3 ]----------------------------------------------------------------------------------------------------
7911 name | pgautofailover.health_check_period
7912 setting | 5000
7913 unit | ms
7914 short_desc | Duration between each check (in milliseconds).
7915 -[ RECORD 4 ]----------------------------------------------------------------------------------------------------
7916 name | pgautofailover.health_check_retry_delay
7917 setting | 2000
7918 unit | ms
7919 short_desc | Delay between consecutive retries.
7920 -[ RECORD 5 ]----------------------------------------------------------------------------------------------------
7921 name | pgautofailover.health_check_timeout
7922 setting | 5000
7923 unit | ms
7924 short_desc | Connect timeout (in milliseconds).
7925 -[ RECORD 6 ]----------------------------------------------------------------------------------------------------
7926 name | pgautofailover.node_considered_unhealthy_timeout
7927 setting | 20000
7928 unit | ms
7929 short_desc | Mark node unhealthy if last ping was over this long ago
7930 -[ RECORD 7 ]----------------------------------------------------------------------------------------------------
7931 name | pgautofailover.primary_demote_timeout
7932 setting | 30000
7933 unit | ms
7934 short_desc | Give the primary this long to drain before promoting the secondary
7935 -[ RECORD 8 ]----------------------------------------------------------------------------------------------------
7936 name | pgautofailover.promote_wal_log_threshold
7937 setting | 16777216
7938 unit |
7939 short_desc | Don't promote secondary unless xlog is with this many bytes of the master
7940 -[ RECORD 9 ]----------------------------------------------------------------------------------------------------
7941 name | pgautofailover.startup_grace_period
7942 setting | 10000
7943 unit | ms
7944 short_desc | Wait for at least this much time after startup before initiating a failover.
7945
7946 You can edit the parameters as usual with PostgreSQL, either in the
7947 postgresql.conf file or using ALTER DATABASE pg_auto_failover SET pa‐
7948 rameter = value; commands, then issuing a reload.
7949
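For instance, here is a minimal sketch that raises the unhealthy-node
timeout from a psql session connected to the monitor database (the
value is only an example):

    -- the monitor database is named pg_auto_failover
    ALTER DATABASE pg_auto_failover
      SET pgautofailover.node_considered_unhealthy_timeout = 30000;

    -- reload the configuration, as mentioned above
    SELECT pg_reload_conf();
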
7950 pg_auto_failover Keeper Service
7951 For an introduction to the pg_autoctl commands relevant to the
7952 pg_auto_failover Keeper configuration, please see pg_autoctl config.
7953
7954 An example configuration file looks like the following:
7955
7956 [pg_autoctl]
7957 role = keeper
7958 monitor = postgres://autoctl_node@192.168.1.34:6000/pg_auto_failover
7959 formation = default
7960 group = 0
7961 hostname = node1.db
7962 nodekind = standalone
7963
7964 [postgresql]
7965 pgdata = /data/pgsql/
7966 pg_ctl = /usr/pgsql-10/bin/pg_ctl
7967 dbname = postgres
7968 host = /tmp
7969 port = 5000
7970
7971 [replication]
7972 slot = pgautofailover_standby
7973 maximum_backup_rate = 100M
7974 backup_directory = /data/backup/node1.db
7975
7976 [timeout]
7977 network_partition_timeout = 20
7978 postgresql_restart_failure_timeout = 20
7979 postgresql_restart_failure_max_retries = 3
7980
7981 To view, edit, and check entries of the configuration, the following
7982 commands are provided:
7983
7984 pg_autoctl config check [--pgdata <pgdata>]
7985 pg_autoctl config get [--pgdata <pgdata>] section.option
7986 pg_autoctl config set [--pgdata <pgdata>] section.option value
7987
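For example (the values are illustrative only), reviewing and lowering
the pg_basebackup throttling rate could look like this:

    $ pg_autoctl config get replication.maximum_backup_rate
    100M

    $ pg_autoctl config set replication.maximum_backup_rate 50M
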
7988 The [postgresql] section is discovered automatically by the pg_autoctl
7989 command and is not intended to be changed manually.
7990
7991 pg_autoctl.monitor
7992
7993 PostgreSQL service URL of the pg_auto_failover monitor, as given in the
7994 output of the pg_autoctl show uri command.
7995
7996 pg_autoctl.formation
7997
7998 A single pg_auto_failover monitor may handle several postgres forma‐
7999 tions. The default formation name default is usually fine.
8000
8001 pg_autoctl.group
8002
8003 This information is retrieved by the pg_auto_failover keeper when reg‐
8004 istering a node to the monitor, and should not be changed afterwards.
8005 Use at your own risk.
8006
8007 pg_autoctl.hostname
8008
8009 Node hostname used by all the other nodes in the cluster to contact
8010 this node. In particular, if this node is a primary then its standby
8011 uses that address to set up streaming replication.
8012
8013 replication.slot
8014
8015 Name of the PostgreSQL replication slot used in the streaming replica‐
8016 tion setup automatically deployed by pg_auto_failover. Replication
8017 slots can't be renamed in PostgreSQL.
8018
8019 replication.maximum_backup_rate
8020
8021 When pg_auto_failover (re-)builds a standby node using the pg_base‐
8022 backup command, this parameter is given to pg_basebackup to throttle
8023 the network bandwidth used. Defaults to 100M (100 MB/s).
8024
8025 replication.backup_directory
8026
8027 When pg_auto_failover (re-)builds a standby node using the pg_base‐
8028 backup command, this parameter is the target directory where the data
8029 from the primary server is copied. When the copy has been successful,
8030 the directory is then renamed to the value of postgresql.pgdata.
8031
8032 The default value is computed from ${PGDATA}/../backup/${hostname} and
8033 can be set to any value of your preference. Remember that the directory
8034 renaming is an atomic operation only when both the source and the tar‐
8035 get of the copy are in the same filesystem, at least on Unix systems.
8036
8037 timeout
8038
8039 This section allows you to configure the behavior of the pg_auto_failover
8040 keeper in interesting scenarios.
8041
8042 timeout.network_partition_timeout
8043
8044 Timeout in seconds before we consider that failure to communicate with
8045 other nodes indicates a network partition. This check is only done on a
8046 PRIMARY server, so "other nodes" means both the monitor and the standby.
8047
8048 When a PRIMARY node is detected to be on the losing side of a network
8049 partition, the pg_auto_failover keeper enters the DEMOTE state and
8050 stops the PostgreSQL instance in order to protect against split brain
8051 situations.
8052
8053 The default is 20s.
8054
8055 timeout.postgresql_restart_failure_timeout
8056
8057 timeout.postgresql_restart_failure_max_retries
8058
8059 When PostgreSQL is not running, the first thing the pg_auto_failover
8060 keeper does is try to restart it. In case of a transient failure (e.g.
8061 file system is full, or other dynamic OS resource constraint), the best
8062 course of action is to try again for a little while before reaching out
8063 to the monitor and ask for a failover.
8064
8065 The pg_auto_failover keeper tries to restart PostgreSQL timeout.post‐
8066 gresql_restart_failure_max_retries times in a row (default 3) or for up
8067 to timeout.postgresql_restart_failure_timeout (default 20s) after it de‐
8068 tected that PostgreSQL is not running, whichever comes first.
8069
OPERATING PG_AUTO_FAILOVER
8071 This section is not yet complete. Please contact us with any questions.
8072
8073 Deployment
8074 pg_auto_failover is a general purpose tool for setting up PostgreSQL
8075 replication in order to implement High Availability of the PostgreSQL
8076 service.
8077
8078 Provisioning
8079 It is also possible to register pre-existing PostgreSQL instances with
8080 a pg_auto_failover monitor. The pg_autoctl create command honors the
8081 PGDATA environment variable, and checks whether PostgreSQL is already
8082 running. If Postgres is detected, the new node is registered in SINGLE
8083 mode, bypassing the monitor's role assignment policy.
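
As a hedged sketch, registering such a pre-existing instance could look
like the following; the path, hostname, monitor URI, and security
options are placeholders to adapt to your setup:

    # PGDATA points at the existing, running instance
    $ export PGDATA=/var/lib/pgsql/12/data
    $ pg_autoctl create postgres \
         --hostname node1.example.com \
         --auth scram-sha-256 \
         --ssl-self-signed \
         --monitor 'postgres://autoctl_node@monitor.example.com/pg_auto_failover'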
8084
8085 Postgres configuration management
8086 The pg_autoctl create postgres command edits the default Postgres con‐
8087 figuration file (postgresql.conf) to include pg_auto_failover settings.
8088
8089 The include directive is placed at the top of the postgresql.conf file,
8090 so that you may override any included setting by editing postgresql.conf
8091 later in the file.
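
For reference, the top of postgresql.conf then contains an include
directive along these lines (the exact file name is managed by
pg_autoctl and may vary between versions):

    # added automatically by pg_autoctl; keep it at the top of the file
    include 'postgresql-auto-failover.conf'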
8092
8093 Unless you use the --skip-pg-hba option, pg_autoctl edits a minimal
8094 set of HBA rules for you, in order for the pg_auto_failover nodes to be
8095 able to connect to each other. The HBA rules that are needed for your
8096 application to connect to your Postgres nodes still need to be added.
8097 As pg_autoctl knows nothing about your applications, you are re‐
8098 sponsible for editing the HBA file.
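
For example, a rule allowing an application to connect might look like
the following (the database, user, and network are placeholders):

    # in ${PGDATA}/pg_hba.conf, added and maintained by you
    host  appdb  appuser  10.0.0.0/24  scram-sha-256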
8099
8100 Upgrading pg_auto_failover, from versions 1.4 onward
8101 When upgrading a pg_auto_failover setup, the procedure is different on
8102 the monitor and on the Postgres nodes:
8103
8104 • on the monitor, the internal pg_auto_failover database schema
8105 might have changed and needs to be upgraded to its new definition,
8106 porting the existing data over. The pg_auto_failover database con‐
8107 tains the registration of every node in the system and their cur‐
8108 rent state.
8109 It is not possible to trigger a failover during the monitor
8110 update. Postgres operations on the Postgres nodes continue
8111 normally.
8112
8113 During the restart of the monitor, the other nodes might have
8114 trouble connecting to the monitor. The pg_autoctl command is
8115 designed to retry connecting to the monitor and handle errors
8116 gracefully.
8117
8118 • on the Postgres nodes, the pg_autoctl command connects to the mon‐
8119 itor every once in a while (every second by default), and then
8120 calls the node_active protocol, a stored procedure in the monitor
8121 database.
8122 The pg_autoctl command also verifies at each connection to the monitor
8123 that it's running the expected version of the extension. When
8124 that's not the case, the "node-active" sub-process quits, to
8125 be restarted with the possibly new version of the pg_autoctl
8126 binary found on-disk.
8127
8128 As a result, here is the standard upgrade plan for pg_auto_failover:
8129
8130 1. Upgrade the pg_auto_failover package on all the nodes, moni‐
8131 tor included.
8132 When using a Debian-based OS, this looks like the following
8133 commands when upgrading from 1.4 to 1.5:
8134
8135 sudo apt-get remove pg-auto-failover-cli-1.4 postgresql-11-auto-failover-1.4
8136 sudo apt-get install -q -y pg-auto-failover-cli-1.5 postgresql-11-auto-failover-1.5
8137
8138 2. Restart the pgautofailover service on the monitor.
8139 When using the systemd integration, all we need to do is:
8140
8141 sudo systemctl restart pgautofailover
8142
8143 Then we may use the following commands to make sure that the
8144 service is running as expected:
8145
8146 sudo systemctl status pgautofailover
8147 sudo journalctl -u pgautofailover
8148
8149 At this point it is expected that the pg_autoctl logs show
8150 that an upgrade has been performed by using the ALTER EXTEN‐
8151 SION pgautofailover UPDATE TO ... command. The monitor is
8152 ready with the new version of pg_auto_failover.
8153
8154 When a Postgres node's pg_autoctl process connects to the new monitor
8155 version, the check for version compatibility fails, and the "node-ac‐
8156 tive" sub-process exits. The main pg_autoctl process supervisor then
8157 restarts the "node-active" sub-process from its on-disk binary exe‐
8158 cutable file, which has been upgraded to the new version. That's why we
8159 first install the new packages for pg_auto_failover on every node, and
8160 only then restart the monitor.
8161
8162 IMPORTANT:
8163 Before upgrading the monitor, which is a simple restart of the
8164 pg_autoctl process, it is important that the OS packages for pgauto‐
8165 failover be updated on all the Postgres nodes.
8166
8167 When that's not the case, pg_autoctl on the Postgres nodes will
8168 still detect a version mismatch with the monitor extension, and the
8169 "node-active" sub-process will exit. And when restarted automati‐
8170 cally, the same version of the local pg_autoctl binary executable is
8171 found on-disk, leading to the same version mismatch with the monitor
8172 extension.
8173
8174 After restarting the "node-active" process 5 times, pg_autoctl quits
8175 retrying and stops. This includes stopping the Postgres service too,
8176 and a service downtime might then occur.
8177
8178 When the upgrade is done we can use pg_autoctl show state on the
8179 monitor to see that everything is as expected.
8180
8181 Upgrading from previous pg_auto_failover versions
8182 The new upgrade procedure described in the previous section is part of
8183 pg_auto_failover since version 1.4. When upgrading from a previous ver‐
8184 sion of pg_auto_failover, up to and including version 1.3, all the
8185 pg_autoctl processes have to be restarted fully.
8186
8187 To prevent triggering a failover during the upgrade, it's best to put
8188 your secondary nodes in maintenance. The procedure then looks like the
8189 following:
8190
8191 1. Enable maintenance on your secondary node(s):
8192
8193 pg_autoctl enable maintenance
8194
8195 2. Upgrade the OS packages for pg_auto_failover on every node, as
8196 per previous section.
8197
8198 3. Restart the monitor to upgrade it to the new pg_auto_failover
8199 version:
8200 When using the systemd integration, all we need to do is:
8201
8202 sudo systemctl restart pgautofailover
8203
8204 Then we may use the following commands to make sure that the
8205 service is running as expected:
8206
8207 sudo systemctl status pgautofailover
8208 sudo journalctl -u pgautofailover
8209
8210 At this point it is expected that the pg_autoctl logs show
8211 that an upgrade has been performed by using the ALTER EXTEN‐
8212 SION pgautofailover UPDATE TO ... command. The monitor is
8213 ready with the new version of pg_auto_failover.
8214
8215 4. Restart pg_autoctl on all Postgres nodes on the cluster.
8216 When using the systemd integration, all we need to do is:
8217
8218 sudo systemctl restart pgautofailover
8219
8220 As in the previous point in this list, make sure the service
8221 is now running as expected.
8222
8223 5. Disable maintenance on your secondary node(s):
8224
8225 pg_autoctl disable maintenance
8226
8227 Extension dependencies when upgrading the monitor
8228 Since version 1.4.0 the pgautofailover extension requires the Postgres
8229 contrib extension btree_gist. The pg_autoctl command arranges for the
8230 creation of this dependency, and has been buggy in some releases.
8231
8232 As a result, you might have trouble upgrading the pg_auto_failover moni‐
8233 tor to a recent version. It is possible to fix the error by connecting
8234 to the monitor Postgres database and running the create extension com‐
8235 mand manually:
8236
8237 # create extension btree_gist;
8238
8239 Cluster Management and Operations
8240 It is possible to operate pg_auto_failover formations and groups di‐
8241 rectly from the monitor. All that is needed is access to the monitor
8242 Postgres database as a client, such as psql. It's also possible to add
8243 those management SQL function calls in your own ops application if you
8244 have one.
8245
8246 For security reasons, the autoctl_node user is not allowed to perform
8247 maintenance operations. This user is limited to what pg_autoctl needs.
8248 You can either create a specific user and authentication rule for man‐
8249 agement, or edit the default HBA rules for the autoctl user. In the
8250 following examples we're directly connecting as the autoctl role.
8251
8252 The main operations with pg_auto_failover are node maintenance and man‐
8253 ual failover, also known as a controlled switchover.
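
For instance, a failover of group 0 in the default formation can be
triggered directly in SQL on the monitor, which is roughly what
pg_autoctl perform failover does for you:

    -- run from a psql session connected to the monitor database,
    -- as a role that is allowed to perform maintenance operations
    SELECT pgautofailover.perform_failover('default', 0);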
8254
8255 Maintenance of a secondary node
8256 It is possible to put a secondary node of any group in the MAINTENANCE
8257 state, so that the Postgres server is not doing synchronous replication
8258 anymore and can be taken down for maintenance purposes, such as kernel
8259 security upgrades or the like.
8260
8261 The command line tool pg_autoctl exposes an API to schedule maintenance
8262 operations on the current node, which must be a secondary node at the
8263 moment when maintenance is requested.
8264
8265 Here's an example of using the maintenance commands on a secondary
8266 node, including the output. Of course, when you try that on your own
8267 nodes, dates and PID information might differ:
8268
8269 $ pg_autoctl enable maintenance
8270 17:49:19 14377 INFO Listening monitor notifications about state changes in formation "default" and group 0
8271 17:49:19 14377 INFO Following table displays times when notifications are received
8272 Time | ID | Host | Port | Current State | Assigned State
8273 ---------+-----+-----------+--------+---------------------+--------------------
8274 17:49:19 | 1 | localhost | 5001 | primary | wait_primary
8275 17:49:19 | 2 | localhost | 5002 | secondary | wait_maintenance
8276 17:49:19 | 2 | localhost | 5002 | wait_maintenance | wait_maintenance
8277 17:49:20 | 1 | localhost | 5001 | wait_primary | wait_primary
8278 17:49:20 | 2 | localhost | 5002 | wait_maintenance | maintenance
8279 17:49:20 | 2 | localhost | 5002 | maintenance | maintenance
8280
8281 The command listens to the state changes in the current node's forma‐
8282 tion and group on the monitor and displays those changes as it receives
8283 them. The operation is done when the node has reached the maintenance
8284 state.
8285
8286 It is now possible to disable maintenance to allow pg_autoctl to manage
8287 this standby node again:
8288
8289 $ pg_autoctl disable maintenance
8290 17:49:26 14437 INFO Listening monitor notifications about state changes in formation "default" and group 0
8291 17:49:26 14437 INFO Following table displays times when notifications are received
8292 Time | ID | Host | Port | Current State | Assigned State
8293 ---------+-----+-----------+--------+---------------------+--------------------
8294 17:49:27 | 2 | localhost | 5002 | maintenance | catchingup
8295 17:49:27 | 2 | localhost | 5002 | catchingup | catchingup
8296 17:49:28 | 2 | localhost | 5002 | catchingup | secondary
8297 17:49:28 | 1 | localhost | 5001 | wait_primary | primary
8298 17:49:28 | 2 | localhost | 5002 | secondary | secondary
8299 17:49:29 | 1 | localhost | 5001 | primary | primary
8300
8301 When a standby node is in maintenance, the monitor assigns the primary
8302 node the WAIT_PRIMARY state: in this state, the PostgreSQL stream‐
8303 ing replication is now asynchronous and the standby PostgreSQL server
8304 may be stopped, rebooted, etc.
8305
8306 Maintenance of a primary node
8307 A primary node must be available at all times in any formation and
8308 group in pg_auto_failover: that is the invariant provided by the whole
8309 solution. With that in mind, the only way to allow a primary node to go
8310 into maintenance mode is to first fail over and promote the secondary
8311 node.
8312
8313 The same command pg_autoctl enable maintenance implements that opera‐
8314 tion when run on a primary node with the option --allow-failover. Here
8315 is an example of such an operation:
8316
8317 $ pg_autoctl enable maintenance
8318 11:53:03 50526 WARN Enabling maintenance on a primary causes a failover
8319 11:53:03 50526 FATAL Please use --allow-failover to allow the command proceed
8320
8321 As we can see, the --allow-failover option is mandatory. In the next ex‐
8322 ample we use it:
8323
8324 $ pg_autoctl enable maintenance --allow-failover
8325 13:13:42 1614 INFO Listening monitor notifications about state changes in formation "default" and group 0
8326 13:13:42 1614 INFO Following table displays times when notifications are received
8327 Time | ID | Host | Port | Current State | Assigned State
8328 ---------+-----+-----------+--------+---------------------+--------------------
8329 13:13:43 | 2 | localhost | 5002 | primary | prepare_maintenance
8330 13:13:43 | 1 | localhost | 5001 | secondary | prepare_promotion
8331 13:13:43 | 1 | localhost | 5001 | prepare_promotion | prepare_promotion
8332 13:13:43 | 2 | localhost | 5002 | prepare_maintenance | prepare_maintenance
8333 13:13:44 | 1 | localhost | 5001 | prepare_promotion | stop_replication
8334 13:13:45 | 1 | localhost | 5001 | stop_replication | stop_replication
8335 13:13:46 | 1 | localhost | 5001 | stop_replication | wait_primary
8336 13:13:46 | 2 | localhost | 5002 | prepare_maintenance | maintenance
8337 13:13:46 | 1 | localhost | 5001 | wait_primary | wait_primary
8338 13:13:47 | 2 | localhost | 5002 | maintenance | maintenance
8339
8340 When the operation is done we can have the old primary re-join the
8341 group, this time as a secondary:
8342
8343 $ pg_autoctl disable maintenance
8344 13:14:46 1985 INFO Listening monitor notifications about state changes in formation "default" and group 0
8345 13:14:46 1985 INFO Following table displays times when notifications are received
8346 Time | ID | Host | Port | Current State | Assigned State
8347 ---------+-----+-----------+--------+---------------------+--------------------
8348 13:14:47 | 2 | localhost | 5002 | maintenance | catchingup
8349 13:14:47 | 2 | localhost | 5002 | catchingup | catchingup
8350 13:14:52 | 2 | localhost | 5002 | catchingup | secondary
8351 13:14:52 | 1 | localhost | 5001 | wait_primary | primary
8352 13:14:52 | 2 | localhost | 5002 | secondary | secondary
8353 13:14:53 | 1 | localhost | 5001 | primary | primary
8354
8355 Triggering a failover
8356 It is possible to trigger a manual failover, or a switchover, using the
8357 command pg_autoctl perform failover. Here's an example of what happens
8358 when running the command:
8359
8360 $ pg_autoctl perform failover
8361 11:58:00 53224 INFO Listening monitor notifications about state changes in formation "default" and group 0
8362 11:58:00 53224 INFO Following table displays times when notifications are received
8363 Time | ID | Host | Port | Current State | Assigned State
8364 ---------+-----+-----------+--------+--------------------+-------------------
8365 11:58:01 | 1 | localhost | 5001 | primary | draining
8366 11:58:01 | 2 | localhost | 5002 | secondary | prepare_promotion
8367 11:58:01 | 1 | localhost | 5001 | draining | draining
8368 11:58:01 | 2 | localhost | 5002 | prepare_promotion | prepare_promotion
8369 11:58:02 | 2 | localhost | 5002 | prepare_promotion | stop_replication
8370 11:58:02 | 1 | localhost | 5001 | draining | demote_timeout
8371 11:58:03 | 1 | localhost | 5001 | demote_timeout | demote_timeout
8372 11:58:04 | 2 | localhost | 5002 | stop_replication | stop_replication
8373 11:58:05 | 2 | localhost | 5002 | stop_replication | wait_primary
8374 11:58:05 | 1 | localhost | 5001 | demote_timeout | demoted
8375 11:58:05 | 2 | localhost | 5002 | wait_primary | wait_primary
8376 11:58:05 | 1 | localhost | 5001 | demoted | demoted
8377 11:58:06 | 1 | localhost | 5001 | demoted | catchingup
8378 11:58:06 | 1 | localhost | 5001 | catchingup | catchingup
8379 11:58:08 | 1 | localhost | 5001 | catchingup | secondary
8380 11:58:08 | 2 | localhost | 5002 | wait_primary | primary
8381 11:58:08 | 1 | localhost | 5001 | secondary | secondary
8382 11:58:08 | 2 | localhost | 5002 | primary | primary
8383
8384 Again, timings and PID numbers are not expected to be the same when you
8385 run the command on your own setup.
8386
8387 Also note in the output that the command shows the whole set of transi‐
8388 tions, including the old primary becoming a secondary node. The data‐
8389 base is available for read-write traffic as soon as we reach the state
8390 wait_primary.
8391
8392 Implementing a controlled switchover
8393 It is generally useful to distinguish a controlled switchover from a
8394 failover. In a controlled switchover situation it is possible to organ‐
8395 ise the sequence of events so as to avoid data loss and keep down‐
8396 time to a minimum.
8397
8398 In the case of pg_auto_failover, because we use synchronous replica‐
8399 tion, we don't face data loss risks when triggering a manual failover.
8400 Moreover, our monitor knows the current primary health at the time when
8401 the failover is triggered, and drives the failover accordingly.
8402
8403 So to trigger a controlled switchover with pg_auto_failover you can use
8404 the same API as for a manual failover:
8405
8406 $ pg_autoctl perform switchover
8407
8408 Because the subtleties of orchestrating either a controlled switchover
8409 or an unplanned failover are all handled by the monitor, rather than
8410 by the client-side command line, at the client level the two commands
8411 pg_autoctl perform failover and pg_autoctl perform switchover are syn‐
8412 onyms, or aliases.
8413
8414 Current state, last events
8415 The following commands display information from the pg_auto_failover
8416 monitor tables pgautofailover.node and pgautofailover.event:
8417
8418 $ pg_autoctl show state
8419 $ pg_autoctl show events
8420
8421 When run on the monitor, the commands output all the known states and
8422 events for the whole set of formations handled by the monitor. When run
8423 on a PostgreSQL node, the command connects to the monitor and outputs
8424 the information relevant to the service group of the local node only.
8425
8426 For interactive debugging it is helpful to run the following command
8427 from the monitor node while e.g. initializing a formation from scratch,
8428 or performing a manual failover:
8429
8430 $ watch pg_autoctl show state
8431
8432 Monitoring pg_auto_failover in Production
8433 The monitor reports every state change decision to a LISTEN/NOTIFY
8434 channel named state. The monitor's own event log is also stored in
8435 a table, pgautofailover.event, and broadcast by NOTIFY on the channel
8436 log.
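
A monitoring client can subscribe to those notifications from any
session connected to the monitor database, as in this minimal sketch:

    -- receive a NOTIFY for every state change decided by the monitor
    LISTEN state;

    -- also receive the monitor's log messages broadcast on the "log" channel
    LISTEN log;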
8437
8438 Replacing the monitor online
8439 When the monitor node is not available anymore, it is possible to cre‐
8440 ate a new monitor node and then switch existing nodes to a new monitor
8441 by using the following commands.
8442
8443 1. Apply the STONITH approach on the old monitor to make sure this
8444 node is not going to show up again during the procedure. This
8445 step is sometimes referred to as “fencing”.
8446
8447 2. On every node, ending with the (current) Postgres primary node
8448 for each group, disable the monitor while pg_autoctl is still
8449 running:
8450
8451 $ pg_autoctl disable monitor --force
8452
8453 3. Create a new monitor node:
8454
8455 $ pg_autoctl create monitor ...
8456
8457 4. On the current primary node of each group in your formation(s)
8458 first, so that it is registered first and still as a primary, en‐
8459 able the monitor online again:
8460
8461 $ pg_autoctl enable monitor postgresql://autoctl_node@.../pg_auto_failover
8462
8463 5. On every other (secondary) node, enable the monitor online again:
8464
8465 $ pg_autoctl enable monitor postgresql://autoctl_node@.../pg_auto_failover
8466
8467 See pg_autoctl disable monitor and pg_autoctl enable monitor for de‐
8468 tails about those commands.
8469
8470 This operation relies on the fact that a pg_autoctl node can be operated
8471 without a monitor, and that when reconnecting to a new monitor, the pro‐
8472 cess resets the parts of the node state that come from the monitor, such
8473 as the node identifier.
8474
8475 Trouble-Shooting Guide
8476 pg_auto_failover commands can be run repeatedly. If initialization
8477 fails the first time -- for instance because a firewall rule hasn't yet
8478 been activated -- it's possible to try pg_autoctl create again.
8479 pg_auto_failover will review its previous progress and repeat idempo‐
8480 tent operations (create database, create extension, etc.), gracefully
8481 handling errors.
8482
FREQUENTLY ASKED QUESTIONS
8484 The following questions have been asked in GitHub issues for the project
8485 by several people. If you have more questions, feel free to open a new
8486 issue, and your question and its answer might make it to this FAQ.
8487
8488 I stopped the primary and no failover is happening for 20s to 30s, why?
8489 In order to avoid spurious failovers when the network connectivity is
8490 not stable, pg_auto_failover implements a timeout of 20s before acting
8491 on a node that is known to be unavailable. On top of that timeout, the
8492 delay between health checks and the retry policy also applies.
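
As a rough back-of-the-envelope illustration using the default settings
listed in the configuration section (actual timings depend on when the
last successful health check happened):

    node_considered_unhealthy_timeout                 20s
    + up to one health_check_period                    5s
    + health_check_max_retries × retry_delay       2 × 2s
    ------------------------------------------------------
    ≈ 20s to 30s before the failover is triggered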
8493
8494 See the Configuring pg_auto_failover part for more information about
8495 how to set up the different delays and timeouts that are involved in the
8496 decision making.
8497
8498 See also pg_autoctl watch for a dashboard that helps you understand
8499 the system and what's going on at the moment.
8500
8501 The secondary is blocked in the CATCHING_UP state, what should I do?
8502 In the pg_auto_failover design, the following two things are needed for
8503 the monitor to be able to orchestrate a node's integration completely:
8504
8505 1. Health Checks must be successful
8506
8507 The monitor runs periodic health checks with all the nodes regis‐
8508 tered in the system. Those health checks are Postgres connections
8509 from the monitor to the registered Postgres nodes, and use the
8510 hostname and port as registered.
8511
8512 The pg_autoctl show state command's Reachable column contains
8513 "yes" when the monitor could connect to a specific node, "no"
8514 when this connection failed, and "unknown" when no connection has
8515 been attempted yet, since the last startup time of the monitor.
8516
8517 The Reachable column from pg_autoctl show state command output
8518 must show a "yes" entry before a new standby node can be orches‐
8519 trated up to the "secondary" goal state.
8520
8521 2. pg_autoctl service must be running
8522
8523 The pg_auto_failover monitor works by assigning goal states to
8524 individual Postgres nodes. The monitor will not assign a new goal
8525 state until the current one has been reached.
8526
8527 To implement a transition from the current state to the goal
8528 state assigned by the monitor, the pg_autoctl service must be
8529 running on every node.
8530
8531 When your new standby node stays in the "catchingup" state for a long
8532 time, please check that the node is reachable from the monitor given
8533 its hostname and port known on the monitor, and check that the pg_au‐
8534 toctl run command is running for this node.
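
In practice that means running something like the following, where the
paths are placeholders:

    # on the monitor: the Reachable column must show "yes" for the new node
    $ pg_autoctl show state --pgdata /path/to/monitor/pgdata

    # on the standby: check that the pg_autoctl service and its
    # sub-processes are running
    $ pg_autoctl status --pgdata /path/to/standby/pgdata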
8535
8536 When things are not obvious, the next step is to go read the logs. Both
8537 the output of the pg_autoctl command and the Postgres logs are rele‐
8538 vant. See the Should I read the logs? Where are the logs? question for
8539 details.
8540
8541 Should I read the logs? Where are the logs?
8542 Yes. If anything seems strange to you, please do read the logs.
8543
8544 As maintainers of the pg_autoctl tool, we can't foresee everything that
8545 may happen to your production environment. Still, a lot of efforts is
8546 spent on having a meaningful output. So when you're in a situation
8547 that's hard to understand, please make sure to read the pg_autoctl logs
8548 and the Postgres logs.
8549
8550 When using the systemd integration, the pg_autoctl logs are handled
8551 entirely by the journal facility of systemd. Please refer to jour‐
8552 nalctl for viewing the logs.
8553
8554 The Postgres logs are to be found in the $PGDATA/log directory with the
8555 default configuration deployed by pg_autoctl create .... When a custom
8556 Postgres setup is used, please refer to your actual setup to find Post‐
8557 gres logs.
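
With the systemd integration and the default Postgres setup, that
typically translates to commands such as these (the unit name is the
one used earlier in this document):

    # pg_autoctl logs, when running under systemd
    $ sudo journalctl -u pgautofailover --since "1 hour ago"

    # Postgres logs with the default pg_autoctl configuration
    $ ls -lt "$PGDATA"/log | head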
8558
8559 The state of the system is blocked, what should I do?
8560 This question covers a general situation that is similar in nature to
8561 the previous one, which is reached when adding a new standby to a group
8562 of Postgres nodes. Please check the same two elements: the monitor health
8563 checks are successful, and the pg_autoctl run command is running.
8564
8565 When things are not obvious, the next step is to go read the logs. Both
8566 the output of the pg_autoctl command and the Postgres logs are rele‐
8567 vant. See the Should I read the logs? Where are the logs? question for
8568 details.
8569
8570 Impossible / unresolvable state after crash - How to recover?
8571 The pg_auto_failover Failover State Machine is great at simplifying node
8572 management and orchestrating multi-node operations such as a switchover
8573 or a failover. That said, it might happen that the FSM is unable to
8574 proceed in some cases, usually after a hard crash of some components of
8575 the system, and mostly due to bugs.
8576
8577 Even though we have an extensive test suite to prevent such bugs from
8578 happening, you might have to deal with a situation that the monitor
8579 doesn't know how to solve.
8580
8581 The FSM has been designed with a last-resort operation mode. It is al‐
8582 ways possible to unregister a node from the monitor with the pg_autoctl
8583 drop node command. This helps the FSM get back to a simpler situa‐
8584 tion, the simplest possible one being when only one node is left regis‐
8585 tered in a given formation and group (the state is then SINGLE).
8586
8587 When the monitor is back on its feet again, you may add your nodes
8588 again with the pg_autoctl create postgres command. The command under‐
8589 stands that a Postgres service is running and will recover from where
8590 you left off.
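
Here is a hedged sketch of that last-resort procedure, with placeholder
values:

    # on the node the FSM cannot drive anymore: unregister it from the monitor
    $ pg_autoctl drop node --pgdata /path/to/pgdata

    # later, register the node again; a running Postgres instance is
    # detected and re-used
    $ pg_autoctl create postgres --pgdata /path/to/pgdata \
         --monitor 'postgres://autoctl_node@monitor.example.com/pg_auto_failover'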
8591
8592 In some cases you might also have to delete the local pg_autoctl state
8593 file; error messages will instruct you about the situation.
8594
8595 The monitor is a SPOF in pg_auto_failover design, how should we handle
8596 that?
8597 When using pg_auto_failover, the monitor is needed to make decisions
8598 and orchestrate changes in all the registered Postgres groups. Deci‐
8599 sions are transmitted to the Postgres nodes by the monitor assigning
8600 nodes a goal state which is different from their current state.
8601
8602 Consequences of the monitor being unavailable
8603 Nodes contact the monitor each second and call the node_active stored
8604 procedure, which returns a goal state that is possibly different from
8605 the current state.
8606
8607 The monitor only assigns Postgres nodes a new goal state when a
8608 cluster-wide operation is needed. In practice, only the following oper‐
8609 ations require the monitor to assign a new goal state to a Postgres
8610 node:
8611
8612 • a new node is registered
8613
8614 • a failover needs to happen, either triggered automatically or man‐
8615 ually
8616
8617 • a node is being put to maintenance
8618
8619 • a node replication setting is being changed.
8620
8621 When the monitor node is not available, the pg_autoctl processes on the
8622 Postgres nodes will fail to contact the monitor every second, and log
8623 this failure. In addition, no orchestration is possible.
8624
8625 The Postgres streaming replication does not need the monitor to be
8626 available in order to deliver its service guarantees to your applica‐
8627 tion, so your Postgres service is still available when the monitor is
8628 not available.
8629
8630 To repair your installation after having lost a monitor, the following
8631 scenarios are to be considered.
8632
8633 The monitor node can be brought up again without data having been lost
8634 This is typically the case in Cloud Native environments such as Kuber‐
8635 netes, where you could have a service migrated to another pod and
8636 re-attached to its disk volume. This scenario is well supported by
8637 pg_auto_failover, and no intervention is needed.
8638
8639 It is also possible to use synchronous archiving with the monitor so
8640 that it's possible to recover from the current archives and continue
8641 operating without intervention on the Postgres nodes, except for updat‐
8642 ing their monitor URI. This requires an archiving setup that uses syn‐
8643 chronous replication so that any transaction committed on the monitor
8644 is known to have been replicated in your WAL archive.
8645
8646 At the moment, you have to take care of that setup yourself. Here's a
8647 quick summary of what needs to be done (a command sketch follows the list):
8648
8649 1. Schedule base backups
8650
8651 Use pg_basebackup every once in a while to have a full copy of
8652 the monitor Postgres database available.
8653
8654 2. Archive WAL files in a synchronous fashion
8655
8656 Use pg_receivewal --sync ... as a service to keep a WAL archive
8657 in sync with the monitor Postgres instance at all times.
8658
8659 3. Prepare a recovery tool on top of your archiving strategy
8660
8661 Write a utility that knows how to create a new monitor node from
8662 your most recent pg_basebackup copy and the WAL files copy.
8663
8664 Bonus points if that tool/script is tested at least once a day,
8665 so that you avoid surprises on the unfortunate day that you actu‐
8666 ally need to use it in production.
8667
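The following sketch illustrates steps 1 and 2 with standard PostgreSQL
tooling; the connection strings, paths, and scheduling are placeholders
to adapt to your environment:

    # 1. take a regular base backup of the monitor (e.g. from cron)
    $ pg_basebackup -d 'host=monitor.example.com user=backup' \
         -D /backups/monitor/base -Ft -Xs -P

    # 2. keep a synchronous WAL archive of the monitor
    $ pg_receivewal -d 'host=monitor.example.com user=backup' \
         -D /backups/monitor/wal --synchronous
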
8668 A future version of pg_auto_failover will include this facility, but
8669 the current versions don't.
8670
8671 The monitor node can only be built from scratch again
8672 If you don't have synchronous archiving set up for the monitor, then
8673 you might not be able to restore a monitor database with the expected
8674 up-to-date node metadata. Specifically, the nodes' state needs to be in
8675 sync with what each pg_autoctl process received the last time it
8676 could contact the monitor, before it became unavailable.
8677
8678 It is possible to register nodes that are currently running with a new
8679 monitor without restarting Postgres on the primary. For that, the pro‐
8680 cedure mentioned in Replacing the monitor online must be followed, us‐
8681 ing the following commands:
8682
8683 $ pg_autoctl disable monitor
8684 $ pg_autoctl enable monitor
8685
8687 Microsoft
8688
8690 Copyright (c) Microsoft Corporation. All rights reserved.
8691
8692
8693
8694
86952.0 Sep 13, 2023 PG_AUTO_FAILOVER(1)