PG_AUTO_FAILOVER(1)            pg_auto_failover            PG_AUTO_FAILOVER(1)
2
3
4
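NAME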
6 pg_auto_failover - pg_auto_failover Documentation
7
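INTRODUCTION TO PG_AUTO_FAILOVER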
       pg_auto_failover is an extension for PostgreSQL that monitors and
       manages failover for Postgres clusters. It is optimized for
       simplicity and correctness.
12
13 Single Standby Architecture
       [image] pg_auto_failover architecture with a primary and a standby
       node.
17
       pg_auto_failover implements Business Continuity for your PostgreSQL
       services: it implements a single PostgreSQL service using multiple
       nodes with automated failover, and automates PostgreSQL maintenance
       operations in a way that guarantees availability of the service to
       its users and applications.
23
24 To that end, pg_auto_failover uses three nodes (machines, servers)
25 per PostgreSQL service:
26
27 • a PostgreSQL primary node,
28
29 • a PostgreSQL secondary node, using Synchronous Hot Standby,
30
31 • a pg_auto_failover Monitor node that acts both as a witness and an
32 orchestrator.
33
       The pg_auto_failover Monitor implements a state machine and relies on
       in-core PostgreSQL facilities to deliver HA. For example, when the
       secondary node is detected to be unavailable, or when its lag is
       reported above a defined threshold (the default is one WAL file, or
       16MB, see the pgautofailover.promote_wal_log_threshold GUC on the
       pg_auto_failover monitor), then the Monitor removes it from the
       synchronous_standby_names setting on the primary node. Until the
       secondary is back to being monitored healthy, failover and switchover
       operations are not allowed, preventing data loss.
43
44 Multiple Standby Architecture
       [image] pg_auto_failover architecture with a primary and two standby
       nodes.
48
       In the pictured architecture, pg_auto_failover implements Business
       Continuity and data availability by implementing a single PostgreSQL
       service using multiple nodes with automated failover and data
       redundancy. Even after losing any Postgres node in a production
       system, this architecture maintains two copies of the data on two
       different nodes.
54
55 When using more than one standby, different architectures can be
56 achieved with pg_auto_failover, depending on the objectives and
57 trade-offs needed for your production setup.
58
59 Multiple Standbys Architecture with 3 standby nodes, one async
       [image] pg_auto_failover architecture with a primary and three
       standby nodes.
63
       When tuning the three replication settings described later in this
       documentation (number of sync standbys, replication quorum, and
       candidate priority), it's possible to design very different Postgres
       architectures for your production needs.
66
       In this case, the system is set up with two standby nodes
       participating in the replication quorum, allowing for
       number_sync_standbys = 1. The system always maintains a minimum of
       two copies of the data set: one on the primary, another one on
       either node B or node D. Whenever we lose one of those nodes, we can
       still hold to this guarantee of two copies of the data set.
73
74 Adding to that, we have the standby server C which has been set up to
75 not participate in the replication quorum. Node C will not be found
76 in the synchronous_standby_names list of nodes. Also, node C is set
77 up in a way to never be a candidate for failover, with candidate-pri‐
78 ority = 0.
79
80 This architecture would fit a situation where nodes A, B, and D are
81 deployed in the same data center or availability zone, and node C in
82 another. Those three nodes are set up to support the main production
83 traffic and implement high availability of both the Postgres service
84 and the data set.
85
       Node C might be set up for Business Continuity in case the first data
       center is lost, or maybe for reporting needs on another application
       domain.
89
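MAIN PG_AUTOCTL COMMANDS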
91 pg_auto_failover includes the command line tool pg_autoctl that imple‐
92 ments many commands to manage your Postgres nodes. To implement the
93 Postgres architectures described in this documentation, and more, it is
94 generally possible to use only some of the many pg_autoctl commands.
95
96 This section of the documentation is a short introduction to the main
97 commands that are useful when getting started with pg_auto_failover.
       More commands are available and help deal with a variety of
       situations; see the Manual Pages for the whole list.
100
       To understand which replication settings to use in your case, see the
       Architecture Basics section and then the Multi-node Architectures
       section.
104
105 To follow a step by step guide that you can reproduce on your own Azure
106 subscription and create a production Postgres setup from VMs, see the
107 pg_auto_failover Tutorial section.
108
       To understand how to set up pg_auto_failover in a way that is
       compliant with your internal security guidelines, read the Security
       settings for pg_auto_failover section.
112
113 Command line environment, configuration files, etc
       As a command line tool, pg_autoctl depends on some environment
       variables. Mostly, the tool re-uses the Postgres environment
       variables that you might already know.
117
118 To manage a Postgres node pg_auto_failover needs to know its data di‐
119 rectory location on-disk. For that, some users will find it easier to
120 export the PGDATA variable in their environment. The alternative con‐
121 sists of always using the --pgdata option that is available to all the
122 pg_autoctl commands.
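
       For instance, the two following invocations are equivalent (this is
       only a sketch; the data directory path is an example):

          # export PGDATA once per shell session...
          export PGDATA=/var/lib/postgresql/pgaf
          pg_autoctl show state

          # ...or pass --pgdata explicitly to every pg_autoctl command
          pg_autoctl show state --pgdata /var/lib/postgresql/pgaf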
123
124 Creating Postgres Nodes
125 To get started with the simplest Postgres failover setup, 3 nodes are
126 needed: the pg_auto_failover monitor, and 2 Postgres nodes that will
       get assigned roles by the monitor. One Postgres node will be assigned
       the primary role, and the other one will be assigned the secondary
       role.
129
130 To create the monitor use the command:
131
132 $ pg_autoctl create monitor
133
       To create the Postgres nodes, use the following command on each node
       you want to create:
136
137 $ pg_autoctl create postgres
138
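       In practice, both create commands need at least a target data
       directory, and the Postgres nodes also need the monitor's connection
       string. A minimal sketch, assuming a monitor reachable on the host
       monitor.example.com (hostnames and paths here are only examples):

          $ pg_autoctl create postgres \
               --pgdata /var/lib/postgresql/pgaf \
               --auth scram-sha-256 \
               --ssl-self-signed \
               --monitor 'postgres://autoctl_node@monitor.example.com/pg_auto_failover?sslmode=require'
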
       While those create commands initialize your nodes, you now have to
       actually run the Postgres services that are expected to be running.
       For that you can manually run the following command on every node:
142
143 $ pg_autoctl run
144
145 It is also possible (and recommended) to integrate the pg_auto_failover
146 service in your usual service management facility. When using systemd
147 the following commands can be used to produce the unit file configura‐
148 tion required:
149
150 $ pg_autoctl show systemd
151 INFO HINT: to complete a systemd integration, run the following commands:
152 INFO pg_autoctl -q show systemd --pgdata "/tmp/pgaf/m" | sudo tee /etc/systemd/system/pgautofailover.service
153 INFO sudo systemctl daemon-reload
154 INFO sudo systemctl enable pgautofailover
155 INFO sudo systemctl start pgautofailover
156 [Unit]
157 ...
158
159 While it is expected that for a production deployment each node actu‐
160 ally is a separate machine (virtual or physical, or even a container),
161 it is also possible to run several Postgres nodes all on the same ma‐
162 chine for testing or development purposes.
163
164 TIP:
          When running several pg_autoctl nodes on the same machine for
          testing or contributing to pg_auto_failover, each Postgres
          instance needs to run on its own port, and with its own data
          directory. It can make things easier to then set the environment
          variables PGDATA and PGPORT in each terminal, shell, or tab where
          each instance is started.
171
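          For instance, after the nodes have been created, running two
          local test nodes could look like this (ports and paths are
          arbitrary examples):

             # terminal 1: first Postgres node
             export PGDATA=/tmp/pgaf/node1 PGPORT=5501
             pg_autoctl run

             # terminal 2: second Postgres node
             export PGDATA=/tmp/pgaf/node2 PGPORT=5502
             pg_autoctl run
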
172 Inspecting nodes
173 Once your Postgres nodes have been created, and once each pg_autoctl
174 service is running, it is possible to inspect the current state of the
175 formation with the following command:
176
177 $ pg_autoctl show state
178
       The pg_autoctl show state command outputs the current state of the
       system only once. Sometimes it would be nice to have an auto-updated
       display, as provided by common tools such as watch(1) or top(1). For
       that, the following commands are available (see also pg_autoctl
       watch):
184
185 $ pg_autoctl watch
186 $ pg_autoctl show state --watch
187
188 To analyze what's been happening to get to the current state, it is
189 possible to review the past events generated by the pg_auto_failover
190 monitor with the following command:
191
192 $ pg_autoctl show events
193
194 HINT:
          The pg_autoctl show commands can be run from any node in your
          system. Those commands need to connect to the monitor and print
          the current state or the currently known list of events as per
          the monitor's view of the system.
199
200 Use pg_autoctl show state --local to have a view of the local state
201 of a given node without connecting to the monitor Postgres instance.
202
          The option --json is available in most pg_autoctl commands and
          switches the output format from a human-readable table to a
          program-friendly, pretty-printed JSON output.
206
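          For example, any of the show commands accept the option (a sketch
          of typical use):

             $ pg_autoctl show state --json
             $ pg_autoctl show settings --json
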
207 Inspecting and Editing Replication Settings
208 When creating a node it is possible to use the --candidate-priority and
209 the --replication-quorum options to set the replication properties as
210 required by your choice of Postgres architecture.
211
212 To review the current replication settings of a formation, use one of
213 the two following commands, which are convenient aliases (the same com‐
214 mand with two ways to invoke it):
215
216 $ pg_autoctl show settings
217 $ pg_autoctl get formation settings
218
219 It is also possible to edit those replication settings at any time
220 while your nodes are in production: you can change your mind or adjust
221 to new elements without having to re-deploy everything. Just use the
222 following commands to adjust the replication settings on the fly:
223
224 $ pg_autoctl set formation number-sync-standbys
225 $ pg_autoctl set node replication-quorum
226 $ pg_autoctl set node candidate-priority
227
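       For instance, to keep a single synchronous standby at the formation
       level, and to remove the local node from failover candidacy, one
       could run (the values here are only an illustration):

          $ pg_autoctl set formation number-sync-standbys 1
          $ pg_autoctl set node candidate-priority 0
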
228 IMPORTANT:
229 The pg_autoctl get and pg_autoctl set commands always connect to the
230 monitor Postgres instance.
231
          The pg_autoctl set command then changes the replication settings
          in the node's registration on the monitor. The monitor then
          assigns the APPLY_SETTINGS state to the current primary node in
          the system for it to apply the new replication settings to its
          Postgres streaming replication setup.

          As a result, the pg_autoctl set commands require a stable state
          in the system to be allowed to proceed. Namely, the current
          primary node in the system must have both its Current State and
          its Assigned State set to primary, as per the pg_autoctl show
          state output.
242
243 Implementing Maintenance Operations
       When a Postgres node must be taken offline for a maintenance
       operation, such as a kernel security upgrade or a minor Postgres
       update, it is best to make it so that the pg_auto_failover monitor
       knows about it.
247
248 • For one thing, a node that is known to be in maintenance does not
249 participate in failovers. If you are running with two Postgres
250 nodes, then failover operations are entirely prevented while the
251 standby node is in maintenance.
252
       • Moreover, depending on your replication settings, enabling
         maintenance on your standby ensures that the primary node switches
         to async replication before Postgres is shut down on the
         secondary, preventing write queries from being blocked.
257
258 To implement maintenance operations, use the following commands:
259
260 $ pg_autoctl enable maintenance
261 $ pg_autoctl disable maintenance
262
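       A typical standby maintenance window could then look like this (a
       sketch only; the actual maintenance task depends on your system):

          $ pg_autoctl enable maintenance
          # ... perform the maintenance itself, e.g. a minor Postgres
          # package upgrade, or a kernel update followed by a reboot ...
          $ pg_autoctl disable maintenance
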
       The main pg_autoctl run service that is expected to be running in
       the background should continue to run during the whole maintenance
       operation. When a node is in the maintenance state, the pg_autoctl
       service no longer controls the Postgres service.
267
268 Note that it is possible to enable maintenance on a primary Postgres
269 node, and that operation then requires a failover to happen first. It
270 is possible to have pg_auto_failover orchestrate that for you when us‐
271 ing the command:
272
273 $ pg_autoctl enable maintenance --allow-failover
274
275 IMPORTANT:
          The pg_autoctl enable and pg_autoctl disable commands require a
          stable state in the system to be allowed to proceed. Namely, the
          current primary node in the system must have both its Current
          State and its Assigned State set to primary, as per the
          pg_autoctl show state output.
281
282 Manual failover, switchover, and promotions
       In cases where a failover is needed without an actual node failure,
       the pg_auto_failover monitor can be used to orchestrate the
       operation. Use one of the following commands, which are synonyms in
       the pg_auto_failover design:
287
288 $ pg_autoctl perform failover
289 $ pg_autoctl perform switchover
290
291 Finally, it is also possible to “elect” a new primary node in your for‐
292 mation with the command:
293
294 $ pg_autoctl perform promotion
295
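       The promotion variant targets a specific node; a sketch, assuming a
       node registered under the name node_B (names are reported by
       pg_autoctl show state):

          $ pg_autoctl perform promotion --formation default --name node_B
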
296 IMPORTANT:
          The pg_autoctl perform commands require a stable state in the
          system to be allowed to proceed. Namely, the current primary node
          in the system must have both its Current State and its Assigned
          State set to primary, as per the pg_autoctl show state output.
301
302 What's next?
       This section of the documentation is meant to help users get started
       by focusing on the main commands of the pg_autoctl tool. Each
       command has many options, whose impact ranges from minor to quite
       significant in terms of security or architecture. Read the rest of
       the manual to understand how to best use the many pg_autoctl options
       to implement your specific Postgres production architecture.
309
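PG_AUTO_FAILOVER TUTORIAL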
311 In this guide we’ll create a primary and secondary Postgres node and
312 set up pg_auto_failover to replicate data between them. We’ll simulate
313 failure in the primary node and see how the system smoothly switches
314 (fails over) to the secondary.
315
316 For illustration, we'll run our databases on virtual machines in the
317 Azure platform, but the techniques here are relevant to any cloud
318 provider or on-premise network. We'll use four virtual machines: a pri‐
319 mary database, a secondary database, a monitor, and an "application."
320 The monitor watches the other nodes’ health, manages global state, and
321 assigns nodes their roles.
322
323 Create virtual network
324 Our database machines need to talk to each other and to the monitor
325 node, so let's create a virtual network.
326
327 az group create \
328 --name ha-demo \
329 --location eastus
330
331 az network vnet create \
332 --resource-group ha-demo \
333 --name ha-demo-net \
334 --address-prefix 10.0.0.0/16
335
336 We need to open ports 5432 (Postgres) and 22 (SSH) between the ma‐
337 chines, and also give ourselves access from our remote IP. We'll do
338 this with a network security group and a subnet.
339
340 az network nsg create \
341 --resource-group ha-demo \
342 --name ha-demo-nsg
343
344 az network nsg rule create \
345 --resource-group ha-demo \
346 --nsg-name ha-demo-nsg \
347 --name ha-demo-ssh-and-pg \
348 --access allow \
349 --protocol Tcp \
350 --direction Inbound \
351 --priority 100 \
352 --source-address-prefixes `curl ifconfig.me` 10.0.1.0/24 \
353 --source-port-range "*" \
354 --destination-address-prefix "*" \
355 --destination-port-ranges 22 5432
356
357 az network vnet subnet create \
358 --resource-group ha-demo \
359 --vnet-name ha-demo-net \
360 --name ha-demo-subnet \
361 --address-prefixes 10.0.1.0/24 \
362 --network-security-group ha-demo-nsg
363
364 Finally add four virtual machines (ha-demo-a, ha-demo-b, ha-demo-moni‐
365 tor, and ha-demo-app). For speed we background the az vm create pro‐
366 cesses and run them in parallel:
367
368 # create VMs in parallel
369 for node in monitor a b app
370 do
371 az vm create \
372 --resource-group ha-demo \
373 --name ha-demo-${node} \
374 --vnet-name ha-demo-net \
375 --subnet ha-demo-subnet \
376 --nsg ha-demo-nsg \
377 --public-ip-address ha-demo-${node}-ip \
378 --image debian \
379 --admin-username ha-admin \
380 --generate-ssh-keys &
381 done
382 wait
383
384 To make it easier to SSH into these VMs in future steps, let's make a
385 shell function to retrieve their IP addresses:
386
387 # run this in your local shell as well
388
389 vm_ip () {
390 az vm list-ip-addresses -g ha-demo -n ha-demo-$1 -o tsv \
391 --query '[] [] .virtualMachine.network.publicIpAddresses[0].ipAddress'
392 }
393
394 # for convenience with ssh
395
396 for node in monitor a b app
397 do
398 ssh-keyscan -H `vm_ip $node` >> ~/.ssh/known_hosts
399 done
400
401 Let's review what we created so far.
402
403 az resource list --output table --query \
404 "[?resourceGroup=='ha-demo'].{ name: name, flavor: kind, resourceType: type, region: location }"
405
406 This shows the following resources:
407
408 Name ResourceType Region
409 ------------------------------- ----------------------------------------------------- --------
410 ha-demo-a Microsoft.Compute/virtualMachines eastus
411 ha-demo-app Microsoft.Compute/virtualMachines eastus
412 ha-demo-b Microsoft.Compute/virtualMachines eastus
413 ha-demo-monitor Microsoft.Compute/virtualMachines eastus
414 ha-demo-appVMNic Microsoft.Network/networkInterfaces eastus
415 ha-demo-aVMNic Microsoft.Network/networkInterfaces eastus
416 ha-demo-bVMNic Microsoft.Network/networkInterfaces eastus
417 ha-demo-monitorVMNic Microsoft.Network/networkInterfaces eastus
418 ha-demo-nsg Microsoft.Network/networkSecurityGroups eastus
419 ha-demo-a-ip Microsoft.Network/publicIPAddresses eastus
420 ha-demo-app-ip Microsoft.Network/publicIPAddresses eastus
421 ha-demo-b-ip Microsoft.Network/publicIPAddresses eastus
422 ha-demo-monitor-ip Microsoft.Network/publicIPAddresses eastus
423 ha-demo-net Microsoft.Network/virtualNetworks eastus
424
425 Install the "pg_autoctl" executable
426 This guide uses Debian Linux, but similar steps will work on other dis‐
427 tributions. All that differs are the packages and paths. See Installing
428 pg_auto_failover.
429
430 The pg_auto_failover system is distributed as a single pg_autoctl bi‐
431 nary with subcommands to initialize and manage a replicated PostgreSQL
432 service. We’ll install the binary with the operating system package
433 manager on all nodes. It will help us run and observe PostgreSQL.
434
435 for node in monitor a b app
436 do
437 az vm run-command invoke \
438 --resource-group ha-demo \
439 --name ha-demo-${node} \
440 --command-id RunShellScript \
441 --scripts \
442 "sudo touch /home/ha-admin/.hushlogin" \
443 "curl https://install.citusdata.com/community/deb.sh | sudo bash" \
444 "sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y postgresql-common" \
445 "echo 'create_main_cluster = false' | sudo tee -a /etc/postgresql-common/createcluster.conf" \
446 "sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y postgresql-11-auto-failover-1.4" \
447 "sudo usermod -a -G postgres ha-admin" &
448 done
449 wait
450
451 Run a monitor
452 The pg_auto_failover monitor is the first component to run. It periodi‐
453 cally attempts to contact the other nodes and watches their health. It
454 also maintains global state that “keepers” on each node consult to de‐
455 termine their own roles in the system.
456
457 # on the monitor virtual machine
458
459 ssh -l ha-admin `vm_ip monitor` -- \
460 pg_autoctl create monitor \
461 --auth trust \
462 --ssl-self-signed \
463 --pgdata monitor \
464 --pgctl /usr/lib/postgresql/11/bin/pg_ctl
465
       This command initializes a PostgreSQL cluster at the location
       pointed to by the --pgdata option. When --pgdata is omitted,
       pg_autoctl attempts to use the PGDATA environment variable. If a
       PostgreSQL instance already existed in the destination directory,
       this command would have configured it to serve as a monitor.

       The command also installs the pgautofailover Postgres extension, and
       grants access to a new autoctl_node user.
474
       In this tutorial we use --auth trust to avoid complex security
       settings. The Postgres trust authentication method is not considered
       a reasonable choice for production environments. Consider either
       using the --skip-pg-hba option or --auth scram-sha-256 and then
       setting up passwords yourself.
480
481 At this point the monitor is created. Now we'll install it as a service
482 with systemd so that it will resume if the VM restarts.
483
484 ssh -T -l ha-admin `vm_ip monitor` << CMD
485 pg_autoctl -q show systemd --pgdata ~ha-admin/monitor > pgautofailover.service
486 sudo mv pgautofailover.service /etc/systemd/system
487 sudo systemctl daemon-reload
488 sudo systemctl enable pgautofailover
489 sudo systemctl start pgautofailover
490 CMD
491
492 Bring up the nodes
493 We’ll create the primary database using the pg_autoctl create subcom‐
494 mand.
495
496 ssh -l ha-admin `vm_ip a` -- \
497 pg_autoctl create postgres \
498 --pgdata ha \
499 --auth trust \
500 --ssl-self-signed \
501 --username ha-admin \
502 --dbname appdb \
503 --hostname ha-demo-a.internal.cloudapp.net \
504 --pgctl /usr/lib/postgresql/11/bin/pg_ctl \
505 --monitor 'postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net/pg_auto_failover?sslmode=require'
506
       Notice the user and database name in the monitor connection string --
       these are what monitor init created. We also give it the path to
       pg_ctl so that the keeper will use the correct version of pg_ctl in
       the future even if other versions of Postgres are installed on the
       system.
511
512 In the example above, the keeper creates a primary database. It chooses
513 to set up node A as primary because the monitor reports there are no
514 other nodes in the system yet. This is one example of how the keeper is
515 state-based: it makes observations and then adjusts its state, in this
516 case from "init" to "single."
517
518 Also add a setting to trust connections from our "application" VM:
519
520 ssh -T -l ha-admin `vm_ip a` << CMD
521 echo 'hostssl "appdb" "ha-admin" ha-demo-app.internal.cloudapp.net trust' \
522 >> ~ha-admin/ha/pg_hba.conf
523 CMD
524
       At this point the monitor and primary node are created and running.
       Next we need to run the keeper. It's an independent process so that
       it can continue operating even if the PostgreSQL process terminates
       on the node. We'll install it as a service with systemd so that it
       will resume if the VM restarts.
530
531 ssh -T -l ha-admin `vm_ip a` << CMD
532 pg_autoctl -q show systemd --pgdata ~ha-admin/ha > pgautofailover.service
533 sudo mv pgautofailover.service /etc/systemd/system
534 sudo systemctl daemon-reload
535 sudo systemctl enable pgautofailover
536 sudo systemctl start pgautofailover
537 CMD
538
539 Next connect to node B and do the same process. We'll do both steps at
540 once:
541
542 ssh -l ha-admin `vm_ip b` -- \
543 pg_autoctl create postgres \
544 --pgdata ha \
545 --auth trust \
546 --ssl-self-signed \
547 --username ha-admin \
548 --dbname appdb \
549 --hostname ha-demo-b.internal.cloudapp.net \
550 --pgctl /usr/lib/postgresql/11/bin/pg_ctl \
551 --monitor 'postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net/pg_auto_failover?sslmode=require'
552
553 ssh -T -l ha-admin `vm_ip b` << CMD
554 pg_autoctl -q show systemd --pgdata ~ha-admin/ha > pgautofailover.service
555 sudo mv pgautofailover.service /etc/systemd/system
556 sudo systemctl daemon-reload
557 sudo systemctl enable pgautofailover
558 sudo systemctl start pgautofailover
559 CMD
560
       Node B discovers from the monitor that a primary exists, and then
       switches its own state to be a hot standby and begins streaming WAL
       contents from the primary.
564
565 Node communication
566 For convenience, pg_autoctl modifies each node's pg_hba.conf file to
567 allow the nodes to connect to one another. For instance, pg_autoctl
568 added the following lines to node A:
569
570 # automatically added to node A
571
572 hostssl "appdb" "ha-admin" ha-demo-a.internal.cloudapp.net trust
573 hostssl replication "pgautofailover_replicator" ha-demo-b.internal.cloudapp.net trust
574 hostssl "appdb" "pgautofailover_replicator" ha-demo-b.internal.cloudapp.net trust
575
       For pg_hba.conf on the monitor node, pg_autoctl inspects the local
       network and makes its best guess about the subnet to allow. In our
       case it guessed correctly:
579
580 # automatically added to the monitor
581
582 hostssl "pg_auto_failover" "autoctl_node" 10.0.1.0/24 trust
583
584 If worker nodes have more ad-hoc addresses and are not in the same sub‐
585 net, it's better to disable pg_autoctl's automatic modification of
586 pg_hba using the --skip-pg-hba command line option during creation. You
587 will then need to edit the hba file by hand. Another reason for manual
588 edits would be to use special authentication methods.
589
590 Watch the replication
591 First let’s verify that the monitor knows about our nodes, and see what
592 states it has assigned them:
593
594 ssh -l ha-admin `vm_ip monitor` pg_autoctl show state --pgdata monitor
595
596 Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
597 -------+-------+--------------------------------------+-----------+-----------+---------------------+--------------------
598 node_1 | 1 | ha-demo-a.internal.cloudapp.net:5432 | 0/3000060 | yes | primary | primary
599 node_2 | 2 | ha-demo-b.internal.cloudapp.net:5432 | 0/3000060 | yes | secondary | secondary
600
601 This looks good. We can add data to the primary, and later see it ap‐
602 pear in the secondary. We'll connect to the database from inside our
603 "app" virtual machine, using a connection string obtained from the mon‐
604 itor.
605
606 ssh -l ha-admin `vm_ip monitor` pg_autoctl show uri --pgdata monitor
607
608 Type | Name | Connection String
609 -----------+---------+-------------------------------
610 monitor | monitor | postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net:5432/pg_auto_failover?sslmode=require
611 formation | default | postgres://ha-demo-b.internal.cloudapp.net:5432,ha-demo-a.internal.cloudapp.net:5432/appdb?target_session_attrs=read-write&sslmode=require
612
613 Now we'll get the connection string and store it in a local environment
614 variable:
615
616 APP_DB_URI=$( \
617 ssh -l ha-admin `vm_ip monitor` \
618 pg_autoctl show uri --formation default --pgdata monitor \
619 )
620
621 The connection string contains both our nodes, comma separated, and in‐
622 cludes the url parameter ?target_session_attrs=read-write telling psql
623 that we want to connect to whichever of these servers supports reads
624 and writes. That will be the primary server.
625
626 # connect to database via psql on the app vm and
627 # create a table with a million rows
628 ssh -l ha-admin -t `vm_ip app` -- \
629 psql "'$APP_DB_URI'" \
630 -c "'CREATE TABLE foo AS SELECT generate_series(1,1000000) bar;'"
631
632 Cause a failover
633 Now that we've added data to node A, let's switch which is considered
634 the primary and which the secondary. After the switch we'll connect
635 again and query the data, this time from node B.
636
637 # initiate failover to node B
638 ssh -l ha-admin -t `vm_ip monitor` \
639 pg_autoctl perform switchover --pgdata monitor
640
641 Once node B is marked "primary" (or "wait_primary") we can connect and
642 verify that the data is still present:
643
644 # connect to database via psql on the app vm
645 ssh -l ha-admin -t `vm_ip app` -- \
646 psql "'$APP_DB_URI'" \
647 -c "'SELECT count(*) FROM foo;'"
648
649 It shows
650
651 count
652 ---------
653 1000000
654
655 Cause a node failure
       This plot is too boring; time to introduce a problem. We'll turn off
       the VM for node B (currently the primary after our previous
       failover) and watch node A get promoted.
659
660 In one terminal let’s keep an eye on events:
661
662 ssh -t -l ha-admin `vm_ip monitor` -- \
663 watch -n 1 -d pg_autoctl show state --pgdata monitor
664
665 In another terminal we’ll turn off the virtual server.
666
667 az vm stop \
668 --resource-group ha-demo \
669 --name ha-demo-b
670
671 After a number of failed attempts to talk to node B, the monitor deter‐
672 mines the node is unhealthy and puts it into the "demoted" state. The
673 monitor promotes node A to be the new primary.
674
675 Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
676 -------+-------+--------------------------------------+-----------+-----------+---------------------+--------------------
677 node_1 | 1 | ha-demo-a.internal.cloudapp.net:5432 | 0/6D4E068 | yes | wait_primary | wait_primary
678 node_2 | 2 | ha-demo-b.internal.cloudapp.net:5432 | 0/6D4E000 | yes | demoted | catchingup
679
680 Node A cannot be considered in full "primary" state since there is no
681 secondary present, but it can still serve client requests. It is marked
682 as "wait_primary" until a secondary appears, to indicate that it's run‐
683 ning without a backup.
684
685 Let's add some data while B is offline.
686
687 # notice how $APP_DB_URI continues to work no matter which node
688 # is serving as primary
689 ssh -l ha-admin -t `vm_ip app` -- \
690 psql "'$APP_DB_URI'" \
691 -c "'INSERT INTO foo SELECT generate_series(1000001, 2000000);'"
692
693 Resurrect node B
694 Run this command to bring node B back online:
695
696 az vm start \
697 --resource-group ha-demo \
698 --name ha-demo-b
699
700 Now the next time the keeper retries its health check, it brings the
701 node back. Node B goes through the state "catchingup" while it updates
702 its data to match A. Once that's done, B becomes a secondary, and A is
703 now a full primary again.
704
705 Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
706 -------+-------+--------------------------------------+------------+-----------+---------------------+--------------------
707 node_1 | 1 | ha-demo-a.internal.cloudapp.net:5432 | 0/12000738 | yes | primary | primary
708 node_2 | 2 | ha-demo-b.internal.cloudapp.net:5432 | 0/12000738 | yes | secondary | secondary
709
710 What's more, if we connect directly to the database again, all two mil‐
711 lion rows are still present.
712
713 ssh -l ha-admin -t `vm_ip app` -- \
714 psql "'$APP_DB_URI'" \
715 -c "'SELECT count(*) FROM foo;'"
716
717 It shows
718
719 count
720 ---------
721 2000000
722
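ARCHITECTURE BASICS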
       pg_auto_failover is designed as a simple and robust way to manage
       automated Postgres failover in production. On top of robust
       operations, the pg_auto_failover setup is flexible and allows either
       Business Continuity or High Availability configurations. The
       pg_auto_failover design includes configuration changes in a live
       system without downtime.
729
730 pg_auto_failover is designed to be able to handle a single PostgreSQL
731 service using three nodes. In this setting, the system is resilient to
732 losing any one of three nodes.
       [image] pg_auto_failover Architecture for a standalone PostgreSQL
       service.
736
          It is important to understand that when using only two Postgres
          nodes, pg_auto_failover is optimized for Business Continuity. In
          the event of losing a single node, pg_auto_failover is capable of
          continuing the PostgreSQL service, and prevents any data loss
          when doing so, thanks to PostgreSQL Synchronous Replication.
742
          That said, there is a trade-off involved in this architecture.
          The business continuity bias relaxes replication guarantees to
          asynchronous replication in the event of a standby node failure.
          This allows the PostgreSQL service to accept writes when there's
          a single server available, and exposes the service to potential
          data loss if the primary server were also to fail.
749
750 The pg_auto_failover Monitor
751 Each PostgreSQL node in pg_auto_failover runs a Keeper process which
752 informs a central Monitor node about notable local changes. Some
753 changes require the Monitor to orchestrate a correction across the
754 cluster:
755
756 • New nodes
757
758 At initialization time, it's necessary to prepare the configura‐
759 tion of each node for PostgreSQL streaming replication, and get
760 the cluster to converge to the nominal state with both a primary
            and a secondary node in each group. The monitor determines each
            new node's role.
763
764 • Node failure
765
766 The monitor orchestrates a failover when it detects an unhealthy
767 node. The design of pg_auto_failover allows the monitor to shut
768 down service to a previously designated primary node without caus‐
769 ing a "split-brain" situation.
770
771 The monitor is the authoritative node that manages global state and
772 makes changes in the cluster by issuing commands to the nodes' keeper
773 processes. A pg_auto_failover monitor node failure has limited impact
774 on the system. While it prevents reacting to other nodes' failures, it
775 does not affect replication. The PostgreSQL streaming replication
776 setup installed by pg_auto_failover does not depend on having the moni‐
777 tor up and running.
778
779 pg_auto_failover Glossary
780 pg_auto_failover handles a single PostgreSQL service with the following
781 concepts:
782
783 Monitor
784 The pg_auto_failover monitor is a service that keeps track of one or
785 several formations containing groups of nodes.
786
787 The monitor is implemented as a PostgreSQL extension, so when you run
788 the command pg_autoctl create monitor a PostgreSQL instance is initial‐
789 ized, configured with the extension, and started. The monitor service
790 embeds a PostgreSQL instance.
791
792 Formation
793 A formation is a logical set of PostgreSQL services that are managed
794 together.
795
796 It is possible to operate many formations with a single monitor in‐
797 stance. Each formation has a group of Postgres nodes and the FSM or‐
798 chestration implemented by the monitor applies separately to each
799 group.
800
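       For instance, an additional formation could be created on an
       existing monitor with a command along these lines (a sketch only;
       the formation name is an example, and the exact options available
       depend on your pg_auto_failover version, see pg_autoctl create
       formation --help):

          $ pg_autoctl create formation --formation sandbox --kind pgsql
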
801 Group
       A group of two PostgreSQL nodes works together to provide a single
       PostgreSQL service in a Highly Available fashion. A group consists
       of a PostgreSQL primary server and a secondary server set up with
       Hot Standby synchronous replication. Note that pg_auto_failover can
       orchestrate the whole setting-up of the replication for you.
807
808 In pg_auto_failover versions up to 1.3, a single Postgres group can
809 contain only two Postgres nodes. Starting with pg_auto_failover 1.4,
810 there's no limit to the number of Postgres nodes in a single group.
811 Note that each Postgres instance that belongs to the same group serves
812 the same dataset in its data directory (PGDATA).
813
814 NOTE:
815 The notion of a formation that contains multiple groups in
816 pg_auto_failover is useful when setting up and managing a whole Ci‐
817 tus formation, where the coordinator nodes belong to group zero of
818 the formation, and each Citus worker node becomes its own group and
819 may have Postgres standby nodes.
820
821 Keeper
822 The pg_auto_failover keeper is an agent that must be running on the
823 same server where your PostgreSQL nodes are running. The keeper con‐
824 trols the local PostgreSQL instance (using both the pg_ctl command-line
825 tool and SQL queries), and communicates with the monitor:
826
827 • it sends updated data about the local node, such as the WAL delta
828 in between servers, measured via PostgreSQL statistics views.
829
830 • it receives state assignments from the monitor.
831
       The keeper also maintains local state that includes the most recent
       communication established with the monitor and the other PostgreSQL
       node of its group, enabling it to detect Network Partitions.
835
836 NOTE:
837 In pg_auto_failover versions up to and including 1.3, the keeper
838 process started with pg_autoctl run manages a separate Postgres in‐
839 stance, running as its own process tree.
840
841 Starting in pg_auto_failover version 1.4, the keeper process
842 (started with pg_autoctl run) runs the Postgres instance as a
843 sub-process of the main pg_autoctl process, allowing tighter control
844 over the Postgres execution. Running the sub-process also makes the
845 solution work better both in container environments (because it's
846 now a single process tree) and with systemd, because it uses a spe‐
847 cific cgroup per service unit.
848
849 Node
850 A node is a server (virtual or physical) that runs PostgreSQL instances
851 and a keeper service. At any given time, any node might be a primary or
852 a secondary Postgres instance. The whole point of pg_auto_failover is
853 to decide this state.
854
855 As a result, refrain from naming your nodes with the role you intend
856 for them. Their roles can change. If they didn't, your system wouldn't
857 need pg_auto_failover!
858
859 State
860 A state is the representation of the per-instance and per-group situa‐
861 tion. The monitor and the keeper implement a Finite State Machine to
862 drive operations in the PostgreSQL groups; allowing pg_auto_failover to
863 implement High Availability with the goal of zero data loss.
864
865 The keeper main loop enforces the current expected state of the local
866 PostgreSQL instance, and reports the current state and some more infor‐
867 mation to the monitor. The monitor uses this set of information and its
868 own health-check information to drive the State Machine and assign a
869 goal state to the keeper.
870
871 The keeper implements the transitions between a current state and a
872 monitor-assigned goal state.
873
874 Client-side HA
       Client-side High Availability support is included in PostgreSQL's
       driver libpq from version 10 onward. Using this driver, it is
       possible to specify multiple host names or IP addresses in the same
       connection string:
879
880 $ psql -d "postgresql://host1,host2/dbname?target_session_attrs=read-write"
881 $ psql -d "postgresql://host1:port2,host2:port2/dbname?target_session_attrs=read-write"
882 $ psql -d "host=host1,host2 port=port1,port2 target_session_attrs=read-write"
883
       When using any of the syntaxes above, the psql application attempts
       to connect to host1, and when successfully connected, checks the
       target_session_attrs as described in the PostgreSQL documentation:
887 If this parameter is set to read-write, only a connection in which
888 read-write transactions are accepted by default is considered ac‐
889 ceptable. The query SHOW transaction_read_only will be sent upon
890 any successful connection; if it returns on, the connection will be
891 closed. If multiple hosts were specified in the connection string,
892 any remaining servers will be tried just as if the connection at‐
893 tempt had failed. The default value of this parameter, any, regards
894 all connections as acceptable.
895
       When the connection attempt to host1 fails, or when the
       target_session_attrs cannot be verified, then the psql application
       attempts to connect to host2.
899
900 The behavior is implemented in the connection library libpq, so any ap‐
901 plication using it can benefit from this implementation, not just psql.
902
903 When using pg_auto_failover, configure your application connection
904 string to use the primary and the secondary server host names, and set
905 target_session_attrs=read-write too, so that your application automati‐
906 cally connects to the current primary, even after a failover occurred.
907
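       With pg_auto_failover, the complete connection string for a
       formation (with both hosts and target_session_attrs already set) can
       be retrieved from the monitor, as seen in the tutorial above:

          $ pg_autoctl show uri --formation default --pgdata monitor
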
908 Monitoring protocol
       The monitor interacts with the data nodes in two ways:
910
911 • Data nodes periodically connect and run SELECT pgauto‐
912 failover.node_active(...) to communicate their current state and
913 obtain their goal state.
914
915 • The monitor periodically connects to all the data nodes to see if
916 they are healthy, doing the equivalent of pg_isready.
917
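       The health check amounts to something like the following (a sketch,
       not the actual implementation; the hostname is an example):

          $ pg_isready -h node-a.example.com -p 5432
          node-a.example.com:5432 - accepting connections
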
       When a data node calls node_active, the state of the node is stored
       in the pgautofailover.node table and the state machines of both
       nodes are progressed. The state machines are described later in this
       documentation. The monitor typically only moves one state forward
       and waits for the node(s) to converge except in failure states.
923
       If a node is not communicating to the monitor, it will either cause
       a failover (if the node is a primary), disable synchronous
       replication (if the node is a secondary), or cause the state machine
       to pause until the node comes back (in other cases). In most cases,
       the latter is harmless, though in some cases it may cause downtime
       to last longer, e.g. if a standby goes down during a failover.
930
931 To simplify operations, a node is only considered unhealthy if the mon‐
932 itor cannot connect and it hasn't reported its state through node_ac‐
933 tive for a while. This allows, for example, PostgreSQL to be restarted
934 without causing a health check failure.
935
936 Synchronous vs. asynchronous replication
       By default, pg_auto_failover uses synchronous replication, which
       means all writes block until at least one standby node has reported
       receiving them. To handle cases in which the standby fails, the
       primary switches between two states called wait_primary and primary
       based on the health of standby nodes, and based on the replication
       setting number_sync_standbys.
943
       When in the wait_primary state, synchronous replication is disabled
       by automatically setting synchronous_standby_names = '' to allow
       writes to proceed. However, doing so also disables failover, since
       the standby might get arbitrarily far behind. If the standby is
       responding to health checks and is within 1 WAL segment of the
       primary (by default), synchronous replication is enabled again on
       the primary by setting synchronous_standby_names = '*', which may
       cause a short latency spike since writes will then block until the
       standby has caught up.
952
       When using several standby nodes with replication quorum enabled,
       the actual setting for synchronous_standby_names is set to a list of
       those standby nodes that are set to participate in the replication
       quorum.
956
957 If you wish to disable synchronous replication, you need to add the
958 following to postgresql.conf:
959
960 synchronous_commit = 'local'
961
       This ensures that writes return as soon as they are committed on the
       primary -- under all circumstances. In that case, failover might
       lead to some data loss, but failover is not initiated if the
       secondary is more than 10 WAL segments (by default) behind the
       primary. During a manual failover, the standby will continue
       accepting writes from the old primary. The standby will stop
       accepting writes only if it's fully caught up (most common), the
       primary fails, or it does not receive writes for 2 minutes.
970
971 A note about performance
       In some cases, the write-latency overhead of synchronous replication
       makes it impossible for the application to deliver the expected
       performance. If testing or production feedback shows this to be the
       case, it can be beneficial to switch to asynchronous replication.
976
977 The way to use asynchronous replication in pg_auto_failover is to
978 change the synchronous_commit setting. This setting can be set per
979 transaction, per session, or per user. It does not have to be set glob‐
980 ally on your Postgres instance.
981
982 One way to benefit from that would be:
983
984 alter role fast_and_loose set synchronous_commit to local;
985
       That way performance-critical parts of the application don't have to
       wait for the standby nodes. Only use this when you can afford to
       lower your data durability guarantees.
989
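       The same can also be done for a single session only, shown here with
       psql and reusing the foo table and the $APP_DB_URI connection string
       from the tutorial above (a sketch only):

          $ psql "$APP_DB_URI" \
               -c "SET synchronous_commit TO 'local'" \
               -c "INSERT INTO foo SELECT generate_series(2000001, 2010000)"
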
990 Node recovery
991 When bringing a node back after a failover, the keeper (pg_autoctl run)
992 can simply be restarted. It will also restart postgres if needed and
993 obtain its goal state from the monitor. If the failed node was a pri‐
994 mary and was demoted, it will learn this from the monitor. Once the
995 node reports, it is allowed to come back as a standby by running
996 pg_rewind. If it is too far behind, the node performs a new pg_base‐
997 backup.
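
       If the keeper has been installed as a systemd service, as shown
       earlier in this documentation, bringing the node back simply means
       starting the unit again:

          $ sudo systemctl start pgautofailover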
998
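MULTI-NODE ARCHITECTURES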
1000 Pg_auto_failover allows you to have more than one standby node, and of‐
1001 fers advanced control over your production architecture characteris‐
1002 tics.
1003
1004 Architectures with two standby nodes
1005 When adding your second standby node with default settings, you get the
1006 following architecture:
       [image] pg_auto_failover architecture with two standby nodes.
1009
1010 In this case, three nodes get set up with the same characteristics,
1011 achieving HA for both the Postgres service and the production
1012 dataset. An important setting for this architecture is num‐
1013 ber_sync_standbys.
1014
          The replication setting number_sync_standbys sets how many
          standby nodes the primary should wait for when committing a
          transaction. In order to have good availability in your system,
          pg_auto_failover requires number_sync_standbys + 1 standby nodes
          participating in the replication quorum: this allows any standby
          node to fail without impact on the system's ability to respect
          the replication quorum.
1021
1022 When only two nodes are registered in a group on the monitor we have
1023 a primary and a single secondary node. Then number_sync_standbys can
1024 only be set to zero. When adding a second standby node to a
1025 pg_auto_failover group, then the monitor automatically increments
1026 number_sync_standbys to one, as we see in the diagram above.
1027
          When number_sync_standbys is set to zero, pg_auto_failover
          implements the Business Continuity setup as seen in Architecture
          Basics: synchronous replication is then used as a way to
          guarantee that failover can be implemented without data loss.
1032
          In more detail:
1034
1035 1. With number_sync_standbys set to one, this architecture always
1036 maintains two copies of the dataset: one on the current primary
1037 node (node A in the previous diagram), and one on the standby
1038 that acknowledges the transaction first (either node B or node C
1039 in the diagram).
1040
1041 When one of the standby nodes is unavailable, the second copy of
1042 the dataset can still be maintained thanks to the remaining
1043 standby.
1044
1045 When both the standby nodes are unavailable, then it's no longer
1046 possible to guarantee the replication quorum, and thus writes on
1047 the primary are blocked. The Postgres primary node waits until at
1048 least one standby node acknowledges the transactions locally com‐
1049 mitted, thus degrading your Postgres service to read-only.
1050
          2. It is possible to manually set number_sync_standbys to zero
             when having registered two standby nodes to the monitor,
             overriding the default behavior.
1054
1055 In that case, when the second standby node becomes unhealthy at
1056 the same time as the first standby node, the primary node is as‐
1057 signed the state Wait_primary. In that state, synchronous repli‐
1058 cation is disabled on the primary by setting synchro‐
1059 nous_standby_names to an empty string. Writes are allowed on the
1060 primary, even though there's no extra copy of the production
1061 dataset available at this time.
1062
1063 Setting number_sync_standbys to zero allows data to be written
1064 even when both standby nodes are down. In this case, a single
1065 copy of the production data set is kept and, if the primary was
1066 then to fail, some data will be lost. How much depends on your
1067 backup and recovery mechanisms.
1068
1069 Replication Settings and Postgres Architectures
1070 The entire flexibility of pg_auto_failover can be leveraged with the
1071 following three replication settings:
1072
       • Number of sync standbys
1074
1075 • Replication quorum
1076
1077 • Candidate priority
1078
1079 Number Sync Standbys
       This parameter is used by Postgres in the synchronous_standby_names
       parameter: number_sync_standbys is the number of synchronous
       standbys for whose replies transactions must wait.
1083
1084 This parameter can be set at the formation level in pg_auto_failover,
1085 meaning that it applies to the current primary, and "follows" a
1086 failover to apply to any new primary that might replace the current
1087 one.
1088
1089 To set this parameter to the value <n>, use the following command:
1090
1091 pg_autoctl set formation number-sync-standbys <n>
1092
1093 The default value in pg_auto_failover is zero. When set to zero, the
1094 Postgres parameter synchronous_standby_names can be set to either '*'
1095 or to '':
1096
1097 • synchronous_standby_names = '*' means that any standby may partici‐
1098 pate in the replication quorum for transactions with synchronous_com‐
1099 mit set to on or higher values.
1100
         pg_auto_failover uses synchronous_standby_names = '*' when there's
         at least one standby that is known to be healthy.
1103
       • synchronous_standby_names = '' (empty string) disables synchronous
         commit and makes all your commits asynchronous, meaning that
         transaction commits will not wait for replication. In other words,
         a single copy of your production data is maintained when
         synchronous_standby_names is set that way.

         pg_auto_failover uses synchronous_standby_names = '' only when
         number_sync_standbys is set to zero and there's no standby node
         known healthy by the monitor.
1113
1114 In order to set number_sync_standbys to a non-zero value,
1115 pg_auto_failover requires that at least number_sync_standbys + 1
1116 standby nodes be registered in the system.
1117
1118 When the first standby node is added to the pg_auto_failover monitor,
1119 the only acceptable value for number_sync_standbys is zero. When a sec‐
1120 ond standby is added that participates in the replication quorum, then
1121 number_sync_standbys is automatically set to one.
1122
1123 The command pg_autoctl set formation number-sync-standbys can be used
1124 to change the value of this parameter in a formation, even when all the
1125 nodes are already running in production. The pg_auto_failover monitor
1126 then sets a transition for the primary to update its local value of
1127 synchronous_standby_names.
1128
1129 Replication Quorum
       The replication quorum setting is a boolean, defaults to true, and
       can be set per node. pg_auto_failover includes a given node in
       synchronous_standby_names only when the replication quorum parameter
       has been set to true. This means that asynchronous replication will
       be used for nodes where replication-quorum is set to false.
1135
       It is possible to force asynchronous replication globally by setting
       replication quorum to false on all the nodes in a formation.
       Remember that failovers will happen; when needed, set your
       replication settings on the current primary node too, since it is
       going to be a standby later.
1141
1142 To set this parameter to either true or false, use one of the following
1143 commands:
1144
1145 pg_autoctl set node replication-quorum true
1146 pg_autoctl set node replication-quorum false
1147
1148 Candidate Priority
1149 The candidate priority setting is an integer that can be set to any
1150 value between 0 (zero) and 100 (one hundred). The default value is 50.
1151 When the pg_auto_failover monitor decides to orchestrate a failover, it
1152 uses each node's candidate priority to pick the new primary node.
1153
       When the candidate priority of a node is set to zero, this node will
       never be selected for promotion as the new primary when a failover
       is orchestrated by the monitor. The monitor will instead wait until
       another registered node is healthy and in a position to be promoted.
1159
1160 To set this parameter to the value <n>, use the following command:
1161
1162 pg_autoctl set node candidate-priority <n>
1163
1164 When nodes have the same candidate priority, the monitor then picks the
1165 standby with the most advanced LSN position published to the monitor.
1166 When more than one node has published the same LSN position, a random
1167 one is chosen.
1168
       When the candidate for failover has not published the most advanced
       LSN position in the WAL, pg_auto_failover orchestrates an
       intermediate step in the failover mechanism. The candidate fetches
       the missing WAL bytes from one of the standbys with the most
       advanced LSN position before being promoted. Postgres allows this
       operation thanks to cascading replication: any standby can be the
       upstream node for another standby.
1175
1176 It is required at all times that at least two nodes have a non-zero
1177 candidate priority in any pg_auto_failover formation. Otherwise no
1178 failover is possible.
1179
1180 Auditing replication settings
1181 The command pg_autoctl get formation settings (also known as pg_autoctl
1182 show settings) can be used to obtain a summary of all the replication
1183 settings currently in effect in a formation. Still using the first dia‐
1184 gram on this page, we get the following summary:
1185
1186 $ pg_autoctl get formation settings
1187 Context | Name | Setting | Value
1188 ----------+---------+---------------------------+-------------------------------------------------------------
1189 formation | default | number_sync_standbys | 1
1190 primary | node_A | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_3, pgautofailover_standby_2)'
1191 node | node_A | replication quorum | true
1192 node | node_B | replication quorum | true
1193 node | node_C | replication quorum | true
1194 node | node_A | candidate priority | 50
1195 node | node_B | candidate priority | 50
1196 node | node_C | candidate priority | 50
1197
1198 We can see that the number_sync_standbys has been used to compute the
1199 current value of the synchronous_standby_names setting on the primary.
1200
       Because all the nodes in that example have the same default
       candidate priority (50), pg_auto_failover uses the form ANY 1 with
       the list of standby nodes that are currently participating in the
       replication quorum.
1205
1206 The entries in the synchronous_standby_names list are meant to match
1207 the application_name connection setting used in the primary_conninfo,
1208 and the format used by pg_auto_failover there is the format string
1209 "pgautofailover_standby_%d" where %d is replaced by the node id. This
1210 allows keeping the same connection string to the primary when the node
1211 name is changed (using the command pg_autoctl set metadata --name).
1212
1213 Here we can see the node id of each registered Postgres node with the
1214 following command:
1215
1216 $ pg_autoctl show state
1217 Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
1218 -------+-------+----------------+-----------+-----------+---------------------+--------------------
1219 node_A | 1 | localhost:5001 | 0/7002310 | yes | primary | primary
1220 node_B | 2 | localhost:5002 | 0/7002310 | yes | secondary | secondary
1221 node_C | 3 | localhost:5003 | 0/7002310 | yes | secondary | secondary
1222
       When setting up pg_auto_failover with the per-formation
       number_sync_standbys setting and the per-node replication quorum and
       candidate priority settings, those properties are then used to
       compute the synchronous_standby_names value on the primary node.
       This value is automatically maintained on the primary by
       pg_auto_failover, and is updated either when replication settings
       are changed or when a failover happens.
1229
       The other situation in which the pg_auto_failover replication
       settings are used is a candidate election, when a failover happens
       and there are more than two nodes registered in a group. Then the
       node with the highest candidate priority is selected, as detailed
       above in the Candidate Priority section.
1235
1236 Sample architectures with three standby nodes
1237 When setting the three parameters above, it's possible to design very
1238 different Postgres architectures for your production needs.
       [image] pg_auto_failover architecture with three standby nodes.
1241
1242 In this case, the system is set up with three standby nodes all set
1243 the same way, with default parameters. The default parameters support
1244 setting number_sync_standbys = 2. This means that Postgres will main‐
1245 tain three copies of the production data set at all times.
1246
1247 On the other hand, if two standby nodes were to fail at the same
1248 time, despite the fact that two copies of the data are still main‐
1249 tained, the Postgres service would be degraded to read-only.
1250
1251 With this architecture diagram, here's the summary that we obtain:
1252
1253 $ pg_autoctl show settings
1254 Context | Name | Setting | Value
1255 ----------+---------+---------------------------+---------------------------------------------------------------------------------------
1256 formation | default | number_sync_standbys | 2
1257 primary | node_A | synchronous_standby_names | 'ANY 2 (pgautofailover_standby_2, pgautofailover_standby_4, pgautofailover_standby_3)'
1258 node | node_A | replication quorum | true
1259 node | node_B | replication quorum | true
1260 node | node_C | replication quorum | true
1261 node | node_D | replication quorum | true
1262 node | node_A | candidate priority | 50
1263 node | node_B | candidate priority | 50
1264 node | node_C | candidate priority | 50
1265 node | node_D | candidate priority | 50
1266
1267 Sample architecture with three standby nodes, one async
1268 [image: pg_auto_failover architecture with three standby nodes, one
1269 async] [image] pg_auto_failover architecture with three standby
nodes, one async.
1271
1272 In this case, the system is set up with two standby nodes participat‐
1273 ing in the replication quorum, allowing for number_sync_standbys = 1.
1274 The system always maintains at least two copies of the data set, one
1275 on the primary, another on either node B or node D. Whenever we lose
1276 one of those nodes, we can hold to the guarantee of having two copies
1277 of the data set.
1278
1279 Additionally, we have the standby server C which has been set up to
1280 not participate in the replication quorum. Node C will not be found
1281 in the synchronous_standby_names list of nodes. Also, node C is set
1282 up to never be a candidate for failover, with candidate-priority = 0.
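
A sketch of how such a node could be configured after creation (run from node C; the --pgdata path is a placeholder). The same properties can also be given at pg_autoctl create postgres time with --candidate-priority and --replication-quorum:

# take node C out of the replication quorum and out of the failover candidates
$ pg_autoctl set node replication-quorum false --pgdata /var/lib/postgresql/node_C
$ pg_autoctl set node candidate-priority 0 --pgdata /var/lib/postgresql/node_C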
1283
This architecture fits a situation where nodes A, B, and D are deployed in the same data center or availability zone, and node C in another one. Those three nodes are set up to support the main production traffic and implement high availability of both the Postgres service and the data set.
1289
1290 Node C might be set up for Business Continuity in case the first data
1291 center is lost, or maybe for reporting needs on another application
1292 domain.
1293
1294 With this architecture diagram, here's the summary that we obtain:
1295
$ pg_autoctl show settings
1297 Context | Name | Setting | Value
1298 ----------+---------+---------------------------+-------------------------------------------------------------
1299 formation | default | number_sync_standbys | 1
1300 primary | node_A | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_4, pgautofailover_standby_2)'
1301 node | node_A | replication quorum | true
1302 node | node_B | replication quorum | true
1303 node | node_C | replication quorum | false
1304 node | node_D | replication quorum | true
1305 node | node_A | candidate priority | 50
1306 node | node_B | candidate priority | 50
1307 node | node_C | candidate priority | 0
1308 node | node_D | candidate priority | 50
1309
1311 Introduction
1312 pg_auto_failover uses a state machine for highly controlled execution.
1313 As keepers inform the monitor about new events (or fail to contact it
1314 at all), the monitor assigns each node both a current state and a goal
1315 state. A node's current state is a strong guarantee of its capabili‐
1316 ties. States themselves do not cause any actions; actions happen during
1317 state transitions. The assigned goal states inform keepers of what
1318 transitions to attempt.
1319
1320 Example of state transitions in a new cluster
1321 A good way to get acquainted with the states is by examining the tran‐
1322 sitions of a cluster from birth to high availability.
1323
1324 After starting a monitor and running keeper init for the first data
1325 node ("node A"), the monitor registers the state of that node as "init"
1326 with a goal state of "single." The init state means the monitor knows
1327 nothing about the node other than its existence because the keeper is
1328 not yet continuously running there to report node health.
1329
1330 Once the keeper runs and reports its health to the monitor, the monitor
1331 assigns it the state "single," meaning it is just an ordinary Postgres
1332 server with no failover. Because there are not yet other nodes in the
1333 cluster, the monitor also assigns node A the goal state of single --
1334 there's nothing that node A's keeper needs to change.
1335
1336 As soon as a new node ("node B") is initialized, the monitor assigns
1337 node A the goal state of "wait_primary." This means the node still has
1338 no failover, but there's hope for a secondary to synchronize with it
1339 soon. To accomplish the transition from single to wait_primary, node
1340 A's keeper adds node B's hostname to pg_hba.conf to allow a hot standby
1341 replication connection.
1342
1343 At the same time, node B transitions into wait_standby with the goal
1344 initially of staying in wait_standby. It can do nothing but wait until
1345 node A gives it access to connect. Once node A has transitioned to
1346 wait_primary, the monitor assigns B the goal of "catchingup," which
1347 gives B's keeper the green light to make the transition from
wait_standby to catchingup. This transition involves running pg_basebackup, editing recovery.conf and restarting PostgreSQL in hot standby mode.
1351
1352 Node B reports to the monitor when it's in hot standby mode and able to
1353 connect to node A. The monitor then assigns node B the goal state of
1354 "secondary" and A the goal of "primary." Postgres ships WAL logs from
1355 node A and replays them on B. Finally B is caught up and tells the mon‐
1356 itor (specifically B reports its pg_stat_replication.sync_state and WAL
1357 replay lag). At this glorious moment the monitor assigns A the state
1358 primary (goal: primary) and B secondary (goal: secondary).
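
Those transitions can be followed from any node while the cluster is forming. A small sketch (the --pgdata path is a placeholder):

# current and assigned states for every node, as known by the monitor
$ pg_autoctl show state --pgdata /var/lib/postgresql/node_A

# the monitor's events, which include every state transition
$ pg_autoctl show events --pgdata /var/lib/postgresql/node_A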
1359
1360 State reference
The following diagram shows the pg_auto_failover State Machine. It's missing links to the single state, which can always be reached by removing all the other nodes.
1364 [image: pg_auto_failover Finite State Machine diagram] [image]
pg_auto_failover Finite State Machine diagram.
1366
In the previous diagram we can see that we have a list of five states where the application can connect to a read-write Postgres service: single, wait_primary, primary, prepare_maintenance, and apply_settings.
1371
1372 Init
1373 A node is assigned the "init" state when it is first registered with
1374 the monitor. Nothing is known about the node at this point beyond its
1375 existence. If no other node has been registered with the monitor for
1376 the same formation and group ID then this node is assigned a goal state
1377 of "single." Otherwise the node has the goal state of "wait_standby."
1378
1379 Single
There is only one node in the group. It behaves as a regular PostgreSQL instance, with no high availability and no failover. If the administrator removes the other node in the group, the remaining node reverts to the single state.
1383
1384 Wait_primary
1385 Applied to a node intended to be the primary but not yet in that posi‐
1386 tion. The primary-to-be at this point knows the secondary's node name
1387 or IP address, and has granted the node hot standby access in the
1388 pg_hba.conf file.
1389
1390 The wait_primary state may be caused either by a new potential sec‐
1391 ondary being registered with the monitor (good), or an existing sec‐
1392 ondary becoming unhealthy (bad). In the latter case, during the transi‐
1393 tion from primary to wait_primary, the primary node's keeper disables
1394 synchronous replication on the node. It also cancels currently blocked
1395 queries.
1396
1397 Join_primary
1398 Applied to a primary node when another standby is joining the group.
1399 This allows the primary node to apply necessary changes to its HBA
1400 setup before allowing the new node joining the system to run the
1401 pg_basebackup command.
1402
1403 IMPORTANT:
1404 This state has been deprecated, and is no longer assigned to nodes.
1405 Any time we would have used join_primary before, we now use primary
1406 instead.
1407
1408 Primary
1409 A healthy secondary node exists and has caught up with WAL replication.
1410 Specifically, the keeper reports the primary state only when it has
1411 verified that the secondary is reported "sync" in pg_stat_replica‐
1412 tion.sync_state, and with a WAL lag of 0.
1413
1414 The primary state is a strong assurance. It's the only state where we
1415 know we can fail over when required.
1416
1417 During the transition from wait_primary to primary, the keeper also en‐
1418 ables synchronous replication. This means that after a failover the
1419 secondary will be fully up to date.
1420
1421 Wait_standby
The monitor decides this node is a standby. The node must wait until the primary has authorized it to connect and set up hot standby replication.
1424
1425 Catchingup
1426 The monitor assigns catchingup to the standby node when the primary is
1427 ready for a replication connection (pg_hba.conf has been properly
1428 edited, connection role added, etc).
1429
The standby node's keeper runs pg_basebackup, connecting to the primary's hostname and port. The keeper then edits recovery.conf and starts PostgreSQL in hot standby mode.
1433
1434 Secondary
1435 A node with this state is acting as a hot standby for the primary, and
1436 is up to date with the WAL log there. In particular, it is within 16MB
1437 or 1 WAL segment of the primary.
1438
1439 Maintenance
The cluster administrator can manually move a secondary into the maintenance state to gracefully take it offline. The primary then transitions from the primary state to wait_primary, disabling synchronous replication so that it stays available for writes without the secondary. When the primary reaches the wait_primary state, the secondary is safe to take offline with minimal consequences.
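
Maintenance is driven with the pg_autoctl enable and disable commands. A minimal sketch, run from the node to take offline (the --pgdata path is a placeholder):

# put this node in maintenance, then bring it back
$ pg_autoctl enable maintenance --pgdata /var/lib/postgresql/node_B
$ pg_autoctl disable maintenance --pgdata /var/lib/postgresql/node_B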
1446
1447 Prepare_maintenance
1448 The cluster administrator can manually move a primary node into the
1449 maintenance state to gracefully take it offline. The primary then tran‐
1450 sitions to the prepare_maintenance state to make sure the secondary is
1451 not missing any writes. In the prepare_maintenance state, the primary
1452 shuts down.
1453
1454 Wait_maintenance
The cluster administrator can manually move a secondary into the maintenance state to gracefully take it offline. Before reaching the maintenance state though, we want to switch the primary node to asynchronous
1458 replication, in order to avoid writes being blocked. In the state
1459 wait_maintenance the standby waits until the primary has reached
1460 wait_primary.
1461
1462 Draining
1463 A state between primary and demoted where replication buffers finish
1464 flushing. A draining node will not accept new client writes, but will
1465 continue to send existing data to the secondary.
1466
1467 To implement that with Postgres we actually stop the service. When
1468 stopping, Postgres ensures that the current replication buffers are
1469 flushed correctly to synchronous standbys.
1470
1471 Demoted
1472 The primary keeper or its database were unresponsive past a certain
1473 threshold. The monitor assigns demoted state to the primary to avoid a
1474 split-brain scenario where there might be two nodes that don't communi‐
1475 cate with each other and both accept client writes.
1476
1477 In that state the keeper stops PostgreSQL and prevents it from running.
1478
1479 Demote_timeout
1480 If the monitor assigns the primary a demoted goal state but the primary
1481 keeper doesn't acknowledge transitioning to that state within a timeout
1482 window, then the monitor assigns demote_timeout to the primary.
1483
This most commonly happens when the primary machine goes silent and its keeper stops reporting to the monitor.
1486
1487 Stop_replication
1488 The stop_replication state is meant to ensure that the primary goes to
1489 the demoted state before the standby goes to single and accepts writes
1490 (in case the primary can’t contact the monitor anymore). Before promot‐
1491 ing the secondary node, the keeper stops PostgreSQL on the primary to
1492 avoid split-brain situations.
1493
1494 For safety, when the primary fails to contact the monitor and fails to
1495 see the pg_auto_failover connection in pg_stat_replication, then it
1496 goes to the demoted state of its own accord.
1497
1498 Prepare_promotion
1499 The prepare_promotion state is meant to prepare the standby server to
1500 being promoted. This state allows synchronisation on the monitor, mak‐
1501 ing sure that the primary has stopped Postgres before promoting the
1502 secondary, hence preventing split brain situations.
1503
1504 Report_LSN
1505 The report_lsn state is assigned to standby nodes when a failover is
orchestrated and there are several standby nodes. In order to pick the standby that is furthest along in replication, pg_auto_failover first needs a fresh report of the current LSN position reached on each standby node.
1509
1510 When a node reaches the report_lsn state, the replication stream is
1511 stopped, by restarting Postgres without a primary_conninfo. This allows
1512 the primary node to detect Network Partitions, i.e. when the primary
1513 can't connect to the monitor and there's no standby listed in
1514 pg_stat_replication.
1515
1516 Fast_forward
1517 The fast_forward state is assigned to the selected promotion candidate
1518 during a failover when it won the election thanks to the candidate pri‐
1519 ority settings, but the selected node is not the most advanced standby
1520 node as reported in the report_lsn state.
1521
1522 Missing WAL bytes are fetched from one of the most advanced standby
1523 nodes by using Postgres cascading replication features: it is possible
1524 to use any standby node in the primary_conninfo.
1525
1526 Dropped
1527 The dropped state is assigned to a node when the pg_autoctl drop node
1528 command is used. This allows the node to implement specific local ac‐
1529 tions before being entirely removed from the monitor database.
1530
1531 When a node reports reaching the dropped state, the monitor removes its
1532 entry. If a node is not reporting anymore, maybe because it's com‐
1533 pletely unavailable, then it's possible to run the pg_autoctl drop node
1534 --force command, and then the node entry is removed from the monitor.
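
A sketch of both variants (the --pgdata path is a placeholder):

# drop the local node cleanly, letting it reach the dropped state
$ pg_autoctl drop node --pgdata /var/lib/postgresql/node_C

# remove the entry of a node that is not reporting anymore
$ pg_autoctl drop node --force ...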
1535
1536 Failover logic
1537 This section needs to be expanded further, but below is the failover
1538 state machine for each node that is implemented by the monitor:
[image: Node state machine] [image] Node state machine.
1540
1541 Since the state machines of the data nodes always move in tandem, a
1542 pair (group) of data nodes also implicitly has the following state
1543 machine:
[image: Group state machine] [image] Group state machine.
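
To exercise this state machine without waiting for an actual failure, the monitor can be asked to orchestrate a failover or switchover on demand, using the perform commands listed later in this page. A minimal sketch (options and paths are placeholders):

$ pg_autoctl perform switchover --pgdata /var/lib/postgresql/node_A

$ pg_autoctl perform failover --formation default --group 0 \
      --pgdata /var/lib/postgresql/node_A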
1545
1546 pg_auto_failover keeper's State Machine
1547 When built in TEST mode, it is then possible to use the following com‐
1548 mand to get a visual representation of the Keeper's Finite State Ma‐
1549 chine:
1550
1551 $ PG_AUTOCTL_DEBUG=1 pg_autoctl do fsm gv | dot -Tsvg > fsm.svg
1552
1553 The dot program is part of the Graphviz suite and produces the follow‐
1554 ing output:
[image: Keeper state machine] [image] Keeper State Machine.
1556
1558 At the heart of the pg_auto_failover implementation is a State Machine.
1559 The state machine is driven by the monitor, and its transitions are im‐
1560 plemented in the keeper service, which then reports success to the mon‐
1561 itor.
1562
The keeper is allowed to retry transitions as many times as needed until they succeed, and also reports failures to reach the assigned state to the monitor node. The monitor also implements frequent health-checks
1566 targeting the registered PostgreSQL nodes.
1567
When the monitor detects something is not as expected, it takes action by assigning a new goal state to the keeper, which is responsible for implementing the transition to this new state and then reporting back.
1571
1572 Unhealthy Nodes
1573 The pg_auto_failover monitor is responsible for running regular
1574 health-checks with every PostgreSQL node it manages. A health-check is
1575 successful when it is able to connect to the PostgreSQL node using the
1576 PostgreSQL protocol (libpq), imitating the pg_isready command.
1577
1578 How frequent those health checks are (20s by default), the PostgreSQL
1579 connection timeout in use (5s by default), and how many times to retry
1580 in case of a failure before marking the node unhealthy (2 by default)
1581 are GUC variables that you can set on the Monitor node itself. Remem‐
1582 ber, the monitor is implemented as a PostgreSQL extension, so the setup
1583 is a set of PostgreSQL configuration settings:
1584
1585 SELECT name, setting
1586 FROM pg_settings
1587 WHERE name ~ 'pgautofailover\.health';
1588 name | setting
1589 -----------------------------------------+---------
1590 pgautofailover.health_check_max_retries | 2
1591 pgautofailover.health_check_period | 20000
1592 pgautofailover.health_check_retry_delay | 2000
1593 pgautofailover.health_check_timeout | 5000
1594 (4 rows)
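
For example, a sketch of how to tune those settings, run as a Postgres superuser on the monitor node (the values used are placeholders):

# check every 10 seconds instead of the default 20 seconds
$ psql -d pg_auto_failover -c "ALTER SYSTEM SET pgautofailover.health_check_period TO 10000"
$ psql -d pg_auto_failover -c "SELECT pg_reload_conf()"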
1595
1596 The pg_auto_failover keeper also reports if PostgreSQL is running as
1597 expected. This is useful for situations where the PostgreSQL server /
1598 OS is running fine and the keeper (pg_autoctl run) is still active, but
1599 PostgreSQL has failed. Situations might include File System is Full on
1600 the WAL disk, some file system level corruption, missing files, etc.
1601
Here's what happens to your PostgreSQL service when any single-node failure is observed:
1604
1605 • Primary node is monitored unhealthy
1606
1607 When the primary node is unhealthy, and only when the secondary
1608 node is itself in good health, then the primary node is asked to
1609 transition to the DRAINING state, and the attached secondary is
1610 asked to transition to the state PREPARE_PROMOTION. In this state,
1611 the secondary is asked to catch-up with the WAL traffic from the
1612 primary, and then report success.
1613
1614 The monitor then continues orchestrating the promotion of the
1615 standby: it stops the primary (implementing STONITH in order to
1616 prevent any data loss), and promotes the secondary into being a
1617 primary now.
1618
Depending on the exact situation that caused the primary to be seen as unhealthy, it's possible that the secondary fails to catch up with WAL from it. In that case, after the PREPARE_PROMOTION_CATCHUP_TIMEOUT the standby reports success anyway, and the failover sequence continues from the monitor.
1624
1625 • Secondary node is monitored unhealthy
1626
1627 When the secondary node is unhealthy, the monitor assigns to it
1628 the state CATCHINGUP, and assigns the state WAIT_PRIMARY to the
1629 primary node. When implementing the transition from PRIMARY to
1630 WAIT_PRIMARY, the keeper disables synchronous replication.
1631
1632 When the keeper reports an acceptable WAL difference in the two
1633 nodes again, then the replication is upgraded back to being syn‐
1634 chronous. While a secondary node is not in the SECONDARY state,
1635 secondary promotion is disabled.
1636
1637 • Monitor node has failed
1638
Then the primary and secondary nodes just work as if you hadn't set up pg_auto_failover in the first place, as the keepers fail to report local state from the nodes. Also, health checks are not performed. This means that no automated failover can happen, even when needed.
1644
1645 Network Partitions
Adding to those simple situations, pg_auto_failover is also resilient to Network Partitions. Here's the list of situations that have an impact on pg_auto_failover behavior, and the actions taken to ensure High Availability of your PostgreSQL service:
1650
1651 • Primary can't connect to Monitor
1652
1653 Then it could be that either the primary is alone on its side of a
1654 network split, or that the monitor has failed. The keeper decides
1655 depending on whether the secondary node is still connected to the
1656 replication slot, and if we have a secondary, continues to serve
1657 PostgreSQL queries.
1658
Otherwise, when the secondary isn't connected, and after the NETWORK_PARTITION_TIMEOUT has elapsed, the primary considers it might be alone in a network partition: that's a potential split-brain situation with only one way to prevent it. The primary stops, and reports a new state of DEMOTE_TIMEOUT.
1664
The network_partition_timeout can be set up in the keeper's configuration and defaults to 20s.
1667
1668 • Monitor can't connect to Primary
1669
Once all the retries have been done and the timeouts have elapsed, the primary node is considered unhealthy, and the monitor begins the failover routine. This routine has several steps, each of which allows controlling our expectations and stepping back if needed.
1674
For the failover to happen, the secondary node needs to be healthy and caught up with the primary. Only if we time out while waiting for the WAL delta to be resorbed (30s by default) is the secondary promoted with uncertainty about the data durability in the group.
1680
1681 • Monitor can't connect to Secondary
1682
1683 As soon as the secondary is considered unhealthy then the monitor
1684 changes the replication setting to asynchronous on the primary, by
1685 assigning it the WAIT_PRIMARY state. Also the secondary is as‐
1686 signed the state CATCHINGUP, which means it can't be promoted in
1687 case of primary failure.
1688
As the monitor tracks the WAL delta between the two servers, and they both report it independently, the standby is eligible for promotion again as soon as it has caught up with the primary. At that point it is assigned the SECONDARY state, and the replication is switched back to synchronous.
1694
1695 Failure handling and network partition detection
If a node cannot communicate with the monitor, either because the monitor
1697 is down or because there is a problem with the network, it will simply
1698 remain in the same state until the monitor comes back.
1699
1700 If there is a network partition, it might be that the monitor and sec‐
1701 ondary can still communicate and the monitor decides to promote the
1702 secondary since the primary is no longer responsive. Meanwhile, the
1703 primary is still up-and-running on the other side of the network parti‐
tion. If a primary cannot communicate with the monitor it starts checking
1705 whether the secondary is still connected. In PostgreSQL, the secondary
1706 connection automatically times out after 30 seconds. If last contact
1707 with the monitor and the last time a connection from the secondary was
1708 observed are both more than 30 seconds in the past, the primary con‐
1709 cludes it is on the losing side of a network partition and shuts itself
1710 down. It may be that the secondary and the monitor were actually down
1711 and the primary was the only node that was alive, but we currently do
1712 not have a way to distinguish such a situation. As with consensus al‐
1713 gorithms, availability can only be correctly preserved if at least 2
1714 out of 3 nodes are up.
1715
1716 In asymmetric network partitions, the primary might still be able to
1717 talk to the secondary, while unable to talk to the monitor. During
1718 failover, the monitor therefore assigns the secondary the stop_replica‐
1719 tion state, which will cause it to disconnect from the primary. After
1720 that, the primary is expected to shut down after at least 30 and at
1721 most 60 seconds. To factor in worst-case scenarios, the monitor waits
1722 for 90 seconds before promoting the secondary to become the new pri‐
1723 mary.
1724
1726 We provide native system packages for pg_auto_failover on most popular
1727 Linux distributions.
1728
Use the steps below to install pg_auto_failover alongside your Postgres version. The examples below install the packages for PostgreSQL 11 on Debian-based systems and for PostgreSQL 12 on RPM-based systems; pg_auto_failover is compatible with Postgres 10, 11, 12, and 13.
1732
1733 Ubuntu or Debian
1734 Quick install
1735 The following installation method downloads a bash script that auto‐
1736 mates several steps. The full script is available for review at our
1737 package cloud installation instructions page.
1738
1739 # add the required packages to your system
1740 curl https://install.citusdata.com/community/deb.sh | sudo bash
1741
1742 # install pg_auto_failover
1743 sudo apt-get install postgresql-11-auto-failover
1744
1745 # confirm installation
1746 /usr/bin/pg_autoctl --version
1747
1748 Manual Installation
If you'd prefer to add the package repository to your system manually, follow the instructions from the package cloud manual installation page. This page will guide you through the specific details of these 3 steps:
1752
1753 1. install CitusData GnuPG key for its package repository
1754
1755 2. install a new apt source for CitusData packages
1756
1757 3. update your available package list
1758
1759 Then when that's done, you can proceed with installing pg_auto_failover
1760 itself as in the previous case:
1761
1762 # install pg_auto_failover
1763 sudo apt-get install postgresql-11-auto-failover
1764
1765 # confirm installation
1766 /usr/bin/pg_autoctl --version
1767
1768 Fedora, CentOS, or Red Hat
1769 Quick install
1770 The following installation method downloads a bash script that auto‐
1771 mates several steps. The full script is available for review at our
package cloud installation instructions page.
1773
1774 # add the required packages to your system
1775 curl https://install.citusdata.com/community/rpm.sh | sudo bash
1776
1777 # install pg_auto_failover
1778 sudo yum install -y pg-auto-failover14_12
1779
1780 # confirm installation
1781 /usr/pgsql-12/bin/pg_autoctl --version
1782
1783 Manual installation
If you'd prefer to add the package repository to your system manually, follow the instructions from the package cloud manual installation page. This page will guide you through the specific details of these 3 steps:
1787
1788 1. install the pygpgme yum-utils packages for your distribution
1789
2. install a new RPM repository for CitusData packages
1791
1792 3. update your local yum cache
1793
1794 Then when that's done, you can proceed with installing pg_auto_failover
1795 itself as in the previous case:
1796
1797 # install pg_auto_failover
1798 sudo yum install -y pg-auto-failover14_12
1799
1800 # confirm installation
1801 /usr/pgsql-12/bin/pg_autoctl --version
1802
1803 Installing a pgautofailover Systemd unit
1804 The command pg_autoctl show systemd outputs a systemd unit file that
1805 you can use to setup a boot-time registered service for
1806 pg_auto_failover on your machine.
1807
1808 Here's a sample output from the command:
1809
1810 $ export PGDATA=/var/lib/postgresql/monitor
1811 $ pg_autoctl show systemd
1812 13:44:34 INFO HINT: to complete a systemd integration, run the following commands:
1813 13:44:34 INFO pg_autoctl -q show systemd --pgdata "/var/lib/postgresql/monitor" | sudo tee /etc/systemd/system/pgautofailover.service
1814 13:44:34 INFO sudo systemctl daemon-reload
1815 13:44:34 INFO sudo systemctl start pgautofailover
1816 [Unit]
1817 Description = pg_auto_failover
1818
1819 [Service]
1820 WorkingDirectory = /var/lib/postgresql
1821 Environment = 'PGDATA=/var/lib/postgresql/monitor'
1822 User = postgres
1823 ExecStart = /usr/lib/postgresql/10/bin/pg_autoctl run
1824 Restart = always
1825 StartLimitBurst = 0
1826
1827 [Install]
1828 WantedBy = multi-user.target
1829
Copy/pasting the commands given in the hint output from the command will enable the pgautofailover service on your system, when using systemd.
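
For example, reusing the hint output above, a complete sequence might look like the following (the --pgdata path is taken from the sample output and should be adjusted to your setup):

$ pg_autoctl -q show systemd --pgdata "/var/lib/postgresql/monitor" | \
      sudo tee /etc/systemd/system/pgautofailover.service
$ sudo systemctl daemon-reload
$ sudo systemctl enable pgautofailover   # also start the service at boot time
$ sudo systemctl start pgautofailover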
1833
It is important that PostgreSQL is started by pg_autoctl rather than by systemd itself: a failover might have happened during a reboot, for instance, and once the reboot completes we want the local Postgres to re-join as a secondary node where it used to be a primary node.
1839
1841 In order to be able to orchestrate fully automated failovers,
1842 pg_auto_failover needs to be able to establish the following Postgres
1843 connections:
1844
1845 • from the monitor node to each Postgres node to check the node's
1846 “health”
1847
1848 • from each Postgres node to the monitor to implement our node_ac‐
1849 tive protocol and fetch the current assigned state for this node
1850
1851 • from the secondary node to the primary node for Postgres streaming
1852 replication.
1853
1854 Postgres Client authentication is controlled by a configuration file:
1855 pg_hba.conf. This file contains a list of rules where each rule may al‐
1856 low or reject a connection attempt.
1857
1858 For pg_auto_failover to work as intended, some HBA rules need to be
added to each node configuration. You can choose to provision the pg_hba.conf file yourself, using the pg_autoctl option --skip-pg-hba, or you can use the following options to control which kind of rules are going to be added for you.
1863
1864 Postgres HBA rules
1865 For your application to be able to connect to the current Postgres pri‐
1866 mary servers, some application specific HBA rules have to be added to
1867 pg_hba.conf. There is no provision for doing that in pg_auto_failover.
1868
1869 In other words, it is expected that you have to edit pg_hba.conf to
1870 open connections for your application needs.
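
For example, a hypothetical rule to append on every Postgres node for an application database and role both named app, reachable from a 10.1.0.0/16 application network:

# allow the application to connect, using SSL and password authentication
hostssl app app 10.1.0.0/16 scram-sha-256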
1871
1872 The trust security model
1873 As its name suggests the trust security model is not enabling any kind
1874 of security validation. This setting is popular for testing deployments
1875 though, as it makes it very easy to verify that everything works as in‐
1876 tended before putting security restrictions in place.
1877
1878 To enable a “trust” security model with pg_auto_failover, use the
1879 pg_autoctl option --auth trust when creating nodes:
1880
1881 $ pg_autoctl create monitor --auth trust ...
1882 $ pg_autoctl create postgres --auth trust ...
1883 $ pg_autoctl create postgres --auth trust ...
1884
1885 When using --auth trust pg_autoctl adds new HBA rules in the monitor
1886 and the Postgres nodes to enable connections as seen above.
1887
1888 Authentication with passwords
1889 To setup pg_auto_failover with password for connections, you can use
1890 one of the password based authentication methods supported by Postgres,
1891 such as password or scram-sha-256. We recommend the latter, as in the
1892 following example:
1893
1894 $ pg_autoctl create monitor --auth scram-sha-256 ...
1895
The pg_autoctl command does not set the password for you. The first step is to set the database user password in the monitor database with the following command:
1899
1900 $ psql postgres://monitor.host/pg_auto_failover
1901 > alter user autoctl_node password 'h4ckm3';
1902
1903 Now that the monitor is ready with our password set for the au‐
1904 toctl_node user, we can use the password in the monitor connection
1905 string used when creating Postgres nodes.
1906
1907 On the primary node, we can create the Postgres setup as usual, and
1908 then set our replication password, that we will use if we are demoted
1909 and then re-join as a standby:
1910
1911 $ pg_autoctl create postgres \
1912 --auth scram-sha-256 \
1913 ... \
1914 --monitor postgres://autoctl_node:h4ckm3@monitor.host/pg_auto_failover
1915
1916 $ pg_autoctl config set replication.password h4ckm3m0r3
1917
1918 The second Postgres node is going to be initialized as a secondary and
1919 pg_autoctl then calls pg_basebackup at create time. We need to have the
1920 replication password already set at this time, and we can achieve that
1921 the following way:
1922
1923 $ export PGPASSWORD=h4ckm3m0r3
1924 $ pg_autoctl create postgres \
1925 --auth scram-sha-256 \
1926 ... \
1927 --monitor postgres://autoctl_node:h4ckm3@monitor.host/pg_auto_failover
1928
1929 $ pg_autoctl config set replication.password h4ckm3m0r3
1930
1931 Note that you can use The Password File mechanism as discussed in the
1932 Postgres documentation in order to maintain your passwords in a sepa‐
1933 rate file, not in your main pg_auto_failover configuration file. This
1934 also avoids using passwords in the environment and in command lines.
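
For example, a hypothetical ~/.pgpass file covering the two passwords used above, where each line follows the format hostname:port:database:username:password and the file must be readable only by its owner (chmod 0600):

monitor.host:5432:pg_auto_failover:autoctl_node:h4ckm3
*:5432:replication:pgautofailover_replicator:h4ckm3m0r3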
1935
1936 Encryption of network communications
1937 Postgres knows how to use SSL to enable network encryption of all com‐
1938 munications, including authentication with passwords and the whole data
1939 set when streaming replication is used.
1940
1941 To enable SSL on the server an SSL certificate is needed. It could be
1942 as simple as a self-signed certificate, and pg_autoctl creates such a
1943 certificate for you when using --ssl-self-signed command line option:
1944
1945 $ pg_autoctl create monitor --ssl-self-signed ... \
1946 --auth scram-sha-256 ... \
1947 --ssl-mode require \
1948 ...
1949
1950 $ pg_autoctl create postgres --ssl-self-signed ... \
1951 --auth scram-sha-256 ... \
1952 ...
1953
1954 $ pg_autoctl create postgres --ssl-self-signed ... \
1955 --auth scram-sha-256 ... \
1956 ...
1957
1958 In that example we setup SSL connections to encrypt the network traf‐
1959 fic, and we still have to setup an authentication mechanism exactly as
1960 in the previous sections of this document. Here scram-sha-256 has been
1961 selected, and the password will be sent over an encrypted channel.
1962
1963 When using the --ssl-self-signed option, pg_autoctl creates a
1964 self-signed certificate, as per the Postgres documentation at the
1965 Creating Certificates page.
1966
1967 The certificate subject CN defaults to the --hostname parameter, which
can be given explicitly or computed by pg_autoctl as either your host‐
1969 name when you have proper DNS resolution, or your current IP address.
1970
1971 Self-signed certificates provide protection against eavesdropping; this
1972 setup does NOT protect against Man-In-The-Middle attacks nor Imperson‐
1973 ation attacks. See PostgreSQL documentation page SSL Support for de‐
1974 tails.
1975
1976 Using your own SSL certificates
1977 In many cases you will want to install certificates provided by your
1978 local security department and signed by a trusted Certificate Author‐
1979 ity. In that case one solution is to use --skip-pg-hba and do the whole
1980 setup yourself.
1981
1982 It is still possible to give the certificates to pg_auto_failover and
1983 have it handle the Postgres setup for you:
1984
1985 $ pg_autoctl create monitor --ssl-ca-file root.crt \
1986 --ssl-crl-file root.crl \
1987 --server-cert server.crt \
1988 --server-key server.key \
1989 --ssl-mode verify-full \
1990 ...
1991
1992 $ pg_autoctl create postgres --ssl-ca-file root.crt \
1993 --server-cert server.crt \
1994 --server-key server.key \
1995 --ssl-mode verify-full \
1996 ...
1997
1998 $ pg_autoctl create postgres --ssl-ca-file root.crt \
1999 --server-cert server.crt \
2000 --server-key server.key \
2001 --ssl-mode verify-full \
2002 ...
2003
2004 The option --ssl-mode can be used to force connection strings used by
pg_autoctl to contain your preferred ssl mode. It defaults to require
2006 when using --ssl-self-signed and to allow when --no-ssl is used. Here,
2007 we set --ssl-mode to verify-full which requires SSL Certificates Au‐
2008 thentication, covered next.
2009
2010 The default --ssl-mode when providing your own certificates (signed by
2011 your trusted CA) is then verify-full. This setup applies to the client
2012 connection where the server identity is going to be checked against the
2013 root certificate provided with --ssl-ca-file and the revocation list
2014 optionally provided with the --ssl-crl-file. Both those files are used
2015 as the respective parameters sslrootcert and sslcrl in pg_autoctl con‐
2016 nection strings to both the monitor and the streaming replication pri‐
2017 mary server.
2018
2019 SSL Certificates Authentication
2020 Given those files, it is then possible to use certificate based authen‐
2021 tication of client connections. For that, it is necessary to prepare
2022 client certificates signed by your root certificate private key and us‐
2023 ing the target user name as its CN, as per Postgres documentation for
2024 Certificate Authentication:
2025 The cn (Common Name) attribute of the certificate will be compared
2026 to the requested database user name, and if they match the login
2027 will be allowed
2028
To enable the cert authentication method with pg_auto_failover, you need to prepare a Client Certificate for the user postgres, used by pg_autoctl when connecting to the monitor, and place it in ~/.postgresql/postgresql.crt along with its key ~/.postgresql/postgresql.key, in the home directory of the user that runs the pg_autoctl service (which defaults to postgres).
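
As a sketch, assuming the root certificate and key used with --ssl-ca-file are available as root.crt and root.key, such a client certificate could be produced with openssl (file names and validity are placeholders):

$ openssl req -new -nodes -subj "/CN=postgres" \
      -keyout ~postgres/.postgresql/postgresql.key \
      -out postgresql.csr
$ openssl x509 -req -in postgresql.csr -days 365 \
      -CA root.crt -CAkey root.key -CAcreateserial \
      -out ~postgres/.postgresql/postgresql.crt
$ chmod 0600 ~postgres/.postgresql/postgresql.key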
2035
2036 Then you need to create a user name map as documented in Postgres page
2037 User Name Maps so that your certificate can be used to authenticate
2038 pg_autoctl users.
2039
2040 The ident map in pg_ident.conf on the pg_auto_failover monitor should
2041 then have the following entry, to allow postgres to connect as the au‐
2042 toctl_node user for pg_autoctl operations:
2043
2044 # MAPNAME SYSTEM-USERNAME PG-USERNAME
2045
2046 # pg_autoctl runs as postgres and connects to the monitor autoctl_node user
2047 pgautofailover postgres autoctl_node
2048
2049 To enable streaming replication, the pg_ident.conf file on each Post‐
2050 gres node should now allow the postgres user in the client certificate
2051 to connect as the pgautofailover_replicator database user:
2052
2053 # MAPNAME SYSTEM-USERNAME PG-USERNAME
2054
# Postgres streaming replication runs as postgres and connects as pgautofailover_replicator
2056 pgautofailover postgres pgautofailover_replicator
2057
2058 Given that user name map, you can then use the cert authentication
2059 method. As with the pg_ident.conf provisioning, it is best to now pro‐
2060 vision the HBA rules yourself, using the --skip-pg-hba option:
2061
2062 $ pg_autoctl create postgres --skip-pg-hba --ssl-ca-file ...
2063
2064 The HBA rule will use the authentication method cert with a map option,
2065 and might then look like the following on the monitor:
2066
2067 # allow certificate based authentication to the monitor
2068 hostssl pg_auto_failover autoctl_node 10.0.0.0/8 cert map=pgautofailover
2069
2070 Then your pg_auto_failover nodes on the 10.0.0.0 network are allowed to
2071 connect to the monitor with the user autoctl_node used by pg_autoctl,
2072 assuming they have a valid and trusted client certificate.
2073
2074 The HBA rule to use on the Postgres nodes to allow for Postgres stream‐
2075 ing replication connections looks like the following:
2076
2077 # allow streaming replication for pg_auto_failover nodes
2078 hostssl replication pgautofailover_replicator 10.0.0.0/8 cert map=pgautofailover
2079
2080 Because the Postgres server runs as the postgres system user, the con‐
2081 nection to the primary node can be made with SSL enabled and will then
2082 use the client certificates installed in the postgres home directory in
~/.postgresql/postgresql.{key,crt} locations.
2084
2085 Postgres HBA provisioning
2086 While pg_auto_failover knows how to manage the Postgres HBA rules that
2087 are necessary for your stream replication needs and for its monitor
2088 protocol, it will not manage the Postgres HBA rules that are needed for
2089 your applications.
2090
2091 If you have your own HBA provisioning solution, you can include the
2092 rules needed for pg_auto_failover and then use the --skip-pg-hba option
2093 to the pg_autoctl create commands.
2094
2095 Enable SSL connections on an existing setup
2096 Whether you upgrade pg_auto_failover from a previous version that did
2097 not have support for the SSL features, or when you started with
--no-ssl and later change your mind, it is possible with pg_auto_failover to add SSL settings on a system that has already been set up without explicit SSL support.
2101
2102 In this section we detail how to upgrade to SSL settings.
2103
Installing SSL certificates on top of an already existing pg_auto_failover setup is done with one of the following pg_autoctl command variants, depending on whether you want self-signed certificates or fully verified SSL certificates:
2108
$ pg_autoctl enable ssl --ssl-self-signed --ssl-mode require
2110
2111 $ pg_autoctl enable ssl --ssl-ca-file root.crt \
2112 --ssl-crl-file root.crl \
2113 --server-cert server.crt \
2114 --server-key server.key \
2115 --ssl-mode verify-full
2116
The pg_autoctl enable ssl command edits the postgresql-auto-failover.conf Postgres configuration file to match the given command line arguments, enables SSL as instructed, and then updates the pg_autoctl configuration.
2121
2122 The connection string to connect to the monitor is also automatically
2123 updated by the pg_autoctl enable ssl command. You can verify your new
2124 configuration with:
2125
2126 $ pg_autoctl config get pg_autoctl.monitor
2127
Note that an already running pg_autoctl daemon will try to reload its
2129 configuration after pg_autoctl enable ssl has finished. In some cases
2130 this is not possible to do without a restart. So be sure to check the
2131 logs from a running daemon to confirm that the reload succeeded. If it
2132 did not you may need to restart the daemon to ensure the new connection
2133 string is used.
2134
2135 The HBA settings are not edited, irrespective of the --skip-pg-hba that
2136 has been used at creation time. That's because the host records match
2137 either SSL or non-SSL connection attempts in Postgres HBA file, so the
2138 pre-existing setup will continue to work. To enhance the SSL setup, you
2139 can manually edit the HBA files and change the existing lines from host
to hostssl to disallow unencrypted connections at the server side.
2141
2142 In summary, to upgrade an existing pg_auto_failover setup to enable
2143 SSL:
2144
2145 1. run the pg_autoctl enable ssl command on your monitor and then
2146 all the Postgres nodes,
2147
2148 2. on the Postgres nodes, review your pg_autoctl logs to make sure
2149 that the reload operation has been effective, and review your
2150 Postgres settings to verify that you have the expected result,
2151
2152 3. review your HBA rules setup to change the pg_auto_failover rules
2153 from host to hostssl to disallow insecure connections.
2154
The pg_autoctl tool hosts many commands and sub-commands. Each of them has its own manual page.
2158
2159 pg_autoctl
2160 pg_autoctl - control a pg_auto_failover node
2161
2162 Synopsis
2163 pg_autoctl provides the following commands:
2164
2165 + create Create a pg_auto_failover node, or formation
2166 + drop Drop a pg_auto_failover node, or formation
2167 + config Manages the pg_autoctl configuration
2168 + show Show pg_auto_failover information
2169 + enable Enable a feature on a formation
2170 + disable Disable a feature on a formation
2171 + get Get a pg_auto_failover node, or formation setting
2172 + set Set a pg_auto_failover node, or formation setting
2173 + perform Perform an action orchestrated by the monitor
2174 run Run the pg_autoctl service (monitor or keeper)
2175 watch Display a dashboard to watch monitor's events and state
2176 stop signal the pg_autoctl service for it to stop
2177 reload signal the pg_autoctl for it to reload its configuration
2178 status Display the current status of the pg_autoctl service
2179 help print help message
2180 version print pg_autoctl version
2181
2182 pg_autoctl create
2183 monitor Initialize a pg_auto_failover monitor node
2184 postgres Initialize a pg_auto_failover standalone postgres node
2185 formation Create a new formation on the pg_auto_failover monitor
2186
2187 pg_autoctl drop
2188 monitor Drop the pg_auto_failover monitor
2189 node Drop a node from the pg_auto_failover monitor
2190 formation Drop a formation on the pg_auto_failover monitor
2191
2192 pg_autoctl config
2193 check Check pg_autoctl configuration
2194 get Get the value of a given pg_autoctl configuration variable
2195 set Set the value of a given pg_autoctl configuration variable
2196
2197 pg_autoctl show
2198 uri Show the postgres uri to use to connect to pg_auto_failover nodes
events Prints the monitor's events for nodes in a given formation and group
2200 state Prints monitor's state of nodes in a given formation and group
2201 settings Print replication settings for a formation from the monitor
2202 standby-names Prints synchronous_standby_names for a given group
2203 file List pg_autoctl internal files (config, state, pid)
2204 systemd Print systemd service file for this node
2205
2206 pg_autoctl enable
2207 secondary Enable secondary nodes on a formation
2208 maintenance Enable Postgres maintenance mode on this node
2209 ssl Enable SSL configuration on this node
2210
2211 pg_autoctl disable
2212 secondary Disable secondary nodes on a formation
2213 maintenance Disable Postgres maintenance mode on this node
2214 ssl Disable SSL configuration on this node
2215
2216 pg_autoctl get
2217 + node get a node property from the pg_auto_failover monitor
2218 + formation get a formation property from the pg_auto_failover monitor
2219
2220 pg_autoctl get node
2221 replication-quorum get replication-quorum property from the monitor
2222 candidate-priority get candidate property from the monitor
2223
2224 pg_autoctl get formation
2225 settings get replication settings for a formation from the monitor
2226 number-sync-standbys get number_sync_standbys for a formation from the monitor
2227
2228 pg_autoctl set
2229 + node set a node property on the monitor
2230 + formation set a formation property on the monitor
2231
2232 pg_autoctl set node
2233 metadata set metadata on the monitor
2234 replication-quorum set replication-quorum property on the monitor
2235 candidate-priority set candidate property on the monitor
2236
2237 pg_autoctl set formation
2238 number-sync-standbys set number-sync-standbys for a formation on the monitor
2239
2240 pg_autoctl perform
2241 failover Perform a failover for given formation and group
2242 switchover Perform a switchover for given formation and group
2243 promotion Perform a failover that promotes a target node
2244
2245 Description
2246 The pg_autoctl tool is the client tool provided by pg_auto_failover to
2247 create and manage Postgres nodes and the pg_auto_failover monitor node.
2248 The command is built with many sub-commands that each have their own
2249 manual page.
2250
2251 Help
2252 To get the full recursive list of supported commands, use:
2253
2254 pg_autoctl help
2255
2256 Version
2257 To grab the version of pg_autoctl that you're using, use:
2258
2259 pg_autoctl --version
2260 pg_autoctl version
2261
2262 A typical output would be:
2263
2264 pg_autoctl version 1.4.2
2265 pg_autoctl extension version 1.4
2266 compiled with PostgreSQL 12.3 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
2267 compatible with Postgres 10, 11, 12, and 13
2268
2269 The version is also available as a JSON document when using the --json
2270 option:
2271
2272 pg_autoctl --version --json
2273 pg_autoctl version --json
2274
2275 A typical JSON output would be:
2276
2277 {
2278 "pg_autoctl": "1.4.2",
2279 "pgautofailover": "1.4",
2280 "pg_major": "12",
2281 "pg_version": "12.3",
2282 "pg_version_str": "PostgreSQL 12.3 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit",
2283 "pg_version_num": 120003
2284 }
2285
2286 This is for version 1.4.2 of pg_auto_failover. This particular version
2287 of the pg_autoctl client tool has been compiled using libpq for Post‐
2288 greSQL 12.3 and is compatible with Postgres 10, 11, 12, and 13.
2289
2290 pg_autoctl create
2291 pg_autoctl create - Create a pg_auto_failover node, or formation
2292
2293 pg_autoctl create monitor
2294 pg_autoctl create monitor - Initialize a pg_auto_failover monitor node
2295
2296 Synopsis
2297 This command initializes a PostgreSQL cluster and installs the pgauto‐
2298 failover extension so that it's possible to use the new instance to
2299 monitor PostgreSQL services:
2300
2301 usage: pg_autoctl create monitor [ --pgdata --pgport --pgctl --hostname ]
2302
2303 --pgctl path to pg_ctl
2304 --pgdata path to data directory
2305 --pgport PostgreSQL's port number
2306 --hostname hostname by which postgres is reachable
2307 --auth authentication method for connections from data nodes
2308 --skip-pg-hba skip editing pg_hba.conf rules
2309 --run create node then run pg_autoctl service
2310 --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
2311 --ssl-mode use that sslmode in connection strings
2312 --ssl-ca-file set the Postgres ssl_ca_file to that file path
2313 --ssl-crl-file set the Postgres ssl_crl_file to that file path
2314 --no-ssl don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
2315 --server-key set the Postgres ssl_key_file to that file path
2316 --server-cert set the Postgres ssl_cert_file to that file path
2317
2318 Description
2319 The pg_autoctl tool is the client tool provided by pg_auto_failover to
2320 create and manage Postgres nodes and the pg_auto_failover monitor node.
2321 The command is built with many sub-commands that each have their own
2322 manual page.
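
For example, a typical invocation might look like the following, where every value is a placeholder to adapt to your environment:

$ pg_autoctl create monitor \
      --pgdata /var/lib/postgresql/monitor \
      --pgport 5432 \
      --hostname monitor.example.com \
      --auth scram-sha-256 \
      --ssl-self-signed \
      --run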
2323
2324 Options
2325 The following options are available to pg_autoctl create monitor:
2326
2327 --pgctl
2328 Path to the pg_ctl tool to use for the version of PostgreSQL you
2329 want to use.
2330
2331 Defaults to the pg_ctl found in the PATH when there is a single
2332 entry for pg_ctl in the PATH. Check your setup using which -a
2333 pg_ctl.
2334
2335 When using an RPM based distribution such as RHEL or CentOS, the
2336 path would usually be /usr/pgsql-13/bin/pg_ctl for Postgres 13.
2337
2338 When using a debian based distribution such as debian or ubuntu,
2339 the path would usually be /usr/lib/postgresql/13/bin/pg_ctl for
2340 Postgres 13. Those distributions also use the package post‐
2341 gresql-common which provides /usr/bin/pg_config. This tool can
2342 be automatically used by pg_autoctl to discover the default ver‐
2343 sion of Postgres to use on your setup.
2344
2345 --pgdata
2346 Location where to initialize a Postgres database cluster, using
2347 either pg_ctl initdb or pg_basebackup. Defaults to the environ‐
2348 ment variable PGDATA.
2349
2350 --pgport
2351 Postgres port to use, defaults to 5432.
2352
2353 --hostname
2354 Hostname or IP address (both v4 and v6 are supported) to use
2355 from any other node to connect to this node.
2356
2357 When not provided, a default value is computed by running the
2358 following algorithm.
2359
2360 1. We get this machine's "public IP" by opening a connection
2361 to the 8.8.8.8:53 public service. Then we get TCP/IP
2362 client address that has been used to make that connection.
2363
2364 2. We then do a reverse DNS lookup on the IP address found in
2365 the previous step to fetch a hostname for our local ma‐
2366 chine.
2367
3. If the reverse DNS lookup is successful, then pg_autoctl
2369 does a forward DNS lookup of that hostname.
2370
2371 When the forward DNS lookup response in step 3. is an IP address
2372 found in one of our local network interfaces, then pg_autoctl
2373 uses the hostname found in step 2. as the default --hostname.
2374 Otherwise it uses the IP address found in step 1.
2375
2376 You may use the --hostname command line option to bypass the
2377 whole DNS lookup based process and force the local node name to
2378 a fixed value.
2379
2380 --auth Authentication method used by pg_autoctl when editing the Post‐
2381 gres HBA file to open connections to other nodes. No default
value, must be provided by the user. The value trust is only a
2383 good choice for testing and evaluation of pg_auto_failover, see
2384 Security settings for pg_auto_failover for more information.
2385
2386 --skip-pg-hba
2387 When this option is used then pg_autoctl refrains from any edit‐
2388 ing of the Postgres HBA file. Please note that editing the HBA
2389 file is still needed so that other nodes can connect using ei‐
2390 ther read privileges or replication streaming privileges.
2391
2392 When --skip-pg-hba is used, pg_autoctl still outputs the HBA en‐
2393 tries it needs in the logs, it only skips editing the HBA file.
2394
2395 --run Immediately run the pg_autoctl service after having created this
2396 node.
2397
2398 --ssl-self-signed
2399 Generate SSL self-signed certificates to provide network encryp‐
2400 tion. This does not protect against man-in-the-middle kinds of
2401 attacks. See Security settings for pg_auto_failover for more
2402 about our SSL settings.
2403
2404 --ssl-mode
2405 SSL Mode used by pg_autoctl when connecting to other nodes, in‐
2406 cluding when connecting for streaming replication.
2407
2408 --ssl-ca-file
2409 Set the Postgres ssl_ca_file to that file path.
2410
2411 --ssl-crl-file
2412 Set the Postgres ssl_crl_file to that file path.
2413
2414 --no-ssl
2415 Don't enable network encryption. This is not recommended, prefer
2416 --ssl-self-signed.
2417
2418 --server-key
2419 Set the Postgres ssl_key_file to that file path.
2420
2421 --server-cert
2422 Set the Postgres ssl_cert_file to that file path.
2423
2424 pg_autoctl create postgres
2425 pg_autoctl create postgres - Initialize a pg_auto_failover postgres
2426 node
2427
2428 Synopsis
2429 The command pg_autoctl create postgres initializes a standalone Post‐
2430 gres node to a pg_auto_failover monitor. The monitor is then handling
2431 auto-failover for this Postgres node (as soon as a secondary has been
2432 registered too, and is known to be healthy).
2433
2434 usage: pg_autoctl create postgres
2435
2436 --pgctl path to pg_ctl
2437 --pgdata path to data directory
2438 --pghost PostgreSQL's hostname
2439 --pgport PostgreSQL's port number
2440 --listen PostgreSQL's listen_addresses
2441 --username PostgreSQL's username
2442 --dbname PostgreSQL's database name
2443 --name pg_auto_failover node name
2444 --hostname hostname used to connect from the other nodes
2445 --formation pg_auto_failover formation
2446 --monitor pg_auto_failover Monitor Postgres URL
2447 --auth authentication method for connections from monitor
2448 --skip-pg-hba skip editing pg_hba.conf rules
2449 --pg-hba-lan edit pg_hba.conf rules for --dbname in detected LAN
2450 --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
2451 --ssl-mode use that sslmode in connection strings
2452 --ssl-ca-file set the Postgres ssl_ca_file to that file path
2453 --ssl-crl-file set the Postgres ssl_crl_file to that file path
2454 --no-ssl don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
2455 --server-key set the Postgres ssl_key_file to that file path
2456 --server-cert set the Postgres ssl_cert_file to that file path
2457 --candidate-priority priority of the node to be promoted to become primary
2458 --replication-quorum true if node participates in write quorum
2459 --maximum-backup-rate maximum transfer rate of data transferred from the server during initial sync
2460
2461 Description
Four different modes of initialization are supported by this command, corresponding to as many implementation strategies.
2464
2465 1. Initialize a primary node from scratch
2466
2467 This happens when --pgdata (or the environment variable PGDATA)
2468 points to an non-existing or empty directory. Then the given
2469 --hostname is registered to the pg_auto_failover --monitor as a
2470 member of the --formation.
2471
2472 The monitor answers to the registration call with a state to as‐
2473 sign to the new member of the group, either SINGLE or
2474 WAIT_STANDBY. When the assigned state is SINGLE, then pg_autoctl
create postgres proceeds to initialize a new PostgreSQL in‐
2476 stance.
2477
2478 2. Initialize an already existing primary server
2479
2480 This happens when --pgdata (or the environment variable PGDATA)
2481 points to an already existing directory that belongs to a Post‐
2482 greSQL instance. The standard PostgreSQL tool pg_controldata is
2483 used to recognize whether the directory belongs to a PostgreSQL
2484 instance.
2485
2486 In that case, the given --hostname is registered to the monitor
2487 in the tentative SINGLE state. When the given --formation and
2488 --group are currently empty, the monitor accepts the regis‐
2489 tration and pg_autoctl create prepares the already existing
2490 primary server for pg_auto_failover.
2491
2492 3. Initialize a secondary node from scratch
2493
2494 This happens when --pgdata (or the environment variable PGDATA)
2495 points to a non-existing or empty directory, and when the monitor
2496 registration call assigns the state WAIT_STANDBY in step 1.
2497
2498 In that case, the pg_autoctl create command steps through the
2499 initial states of registering a secondary server, which includes
2500 preparing the primary server PostgreSQL HBA rules and creating a
2501 replication slot.
2502
2503 When the command ends successfully, a PostgreSQL secondary server
2504 has been created with pg_basebackup and is now started, catch‐
2505 ing-up to the primary server.
2506
2507 4. Initialize a secondary node from an existing data directory
2508
2509 When the data directory pointed to by the option --pgdata or the
2510 environment variable PGDATA already exists, then pg_auto_failover
2511 verifies that the system identifier matches the one of the other
2512 nodes already existing in the same group.
2513
2514 The system identifier can be obtained with the command pg_con‐
2515 troldata. All nodes in a physical replication setting must have
2516 the same system identifier, and so in pg_auto_failover all the
2517 nodes in a same group have that constraint too.
2518
2519 When the system identifier matches the already registered system
2520 identifier of other nodes in the same group, then the node is
2521 set-up as a standby and Postgres is started with the primary con‐
2522 ninfo pointed at the current primary.
2523
2524 The --auth option allows setting up the authentication method used when
2525 the monitor node makes a connection to the data node with the pgauto‐
2526 failover_monitor user. As with the pg_autoctl create monitor command,
2527 you could use --auth trust when playing with pg_auto_failover at first
2528 and consider something production grade later. Also, consider using
2529 --skip-pg-hba if you already have your own provisioning tools with a
2530 security compliance process.
2531
2532 See Security settings for pg_auto_failover for notes on .pgpass.
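
As a quick example, here is a minimal sketch of creating a primary node
and registering it to an existing monitor. The data directory, node
name, hostname, and monitor URI below are hypothetical placeholders to
adapt to your own setup:

    # initialize a primary Postgres node and register it to the monitor
    # (--auth trust and --ssl-self-signed are suitable for evaluation only)
    $ pg_autoctl create postgres \
         --pgdata /var/lib/postgres/node1 \
         --name node1 \
         --hostname node1.example.com \
         --auth trust \
         --ssl-self-signed \
         --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover?sslmode=require'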
2533
2534 Options
2535 The following options are available to pg_autoctl create postgres:
2536
2537 --pgctl
2538 Path to the pg_ctl tool to use for the version of PostgreSQL you
2539 want to use.
2540
2541 Defaults to the pg_ctl found in the PATH when there is a single
2542 entry for pg_ctl in the PATH. Check your setup using which -a
2543 pg_ctl.
2544
2545 When using an RPM based distribution such as RHEL or CentOS, the
2546 path would usually be /usr/pgsql-13/bin/pg_ctl for Postgres 13.
2547
2548 When using a debian based distribution such as debian or ubuntu,
2549 the path would usually be /usr/lib/postgresql/13/bin/pg_ctl for
2550 Postgres 13. Those distributions also use the package post‐
2551 gresql-common which provides /usr/bin/pg_config. This tool can
2552 be automatically used by pg_autoctl to discover the default ver‐
2553 sion of Postgres to use on your setup.
2554
2555 --pgdata
2556 Location where to initialize a Postgres database cluster, using
2557 either pg_ctl initdb or pg_basebackup. Defaults to the environ‐
2558 ment variable PGDATA.
2559
2560 --pghost
2561 Hostname to use when connecting to the local Postgres instance
2562 from the pg_autoctl process. By default, this field is left
2563 blank in the connection string, which allows using Unix Domain
2564 Sockets with the default path compiled in your libpq version,
2565 usually provided by the Operating System. That would be
2566 /var/run/postgresql when using debian or ubuntu.
2567
2568 --pgport
2569 Postgres port to use, defaults to 5432.
2570
2571 --listen
2572 PostgreSQL's listen_addresses to set up. At the moment only one
2573 address is supported in this command line option.
2574
2575 --username
2576 PostgreSQL's username to use when connecting to the local Post‐
2577 gres instance to manage it.
2578
2579 --dbname
2580 PostgreSQL's database name to use in your application. Defaults
2581 to being the same as the --username, or to postgres when none of
2582 those options are used.
2583
2584 --name Node name used on the monitor to refer to this node. The host‐
2585 name is technical information, and given Postgres requirements
2586 on the HBA setup and DNS resolution (both forward and reverse
2587 lookups), IP addresses are often used for the hostname.
2588
2589 The --name option allows using a user-friendly name for your
2590 Postgres nodes.
2591
2592 --hostname
2593 Hostname or IP address (both v4 and v6 are supported) to use
2594 from any other node to connect to this node.
2595
2596 When not provided, a default value is computed by running the
2597 following algorithm.
2598
2599 1. We get this machine's "public IP" by opening a connection
2600 to the given monitor hostname or IP address. Then we get the
2601 TCP/IP client address that has been used to make that con‐
2602 nection.
2603
2604 2. We then do a reverse DNS lookup on the IP address found in
2605 the previous step to fetch a hostname for our local ma‐
2606 chine.
2607
2608 3. If the reverse DNS lookup is successful, then pg_autoctl
2609 does a forward DNS lookup of that hostname.
2610
2611 When the forward DNS lookup response in step 3. is an IP address
2612 found in one of our local network interfaces, then pg_autoctl
2613 uses the hostname found in step 2. as the default --hostname.
2614 Otherwise it uses the IP address found in step 1.
2615
2616 You may use the --hostname command line option to bypass the
2617 whole DNS lookup based process and force the local node name to
2618 a fixed value.
2619
2620 --formation
2621 Formation to register the node into on the monitor. Defaults to
2622 the default formation, which is automatically created in the mon‐
2623 itor by the pg_autoctl create monitor command.
2624
2625 --monitor
2626 Postgres URI used to connect to the monitor. Must use the au‐
2627 toctl_node username and target the pg_auto_failover database
2628 name. It is possible to show the Postgres URI from the monitor
2629 node using the command pg_autoctl show uri.
2630
2631 --auth Authentication method used by pg_autoctl when editing the Post‐
2632 gres HBA file to open connections to other nodes. No default
2633 value, must be provided by the user. The value trust is only a
2634 good choice for testing and evaluation of pg_auto_failover, see
2635 Security settings for pg_auto_failover for more information.
2636
2637 --skip-pg-hba
2638 When this option is used then pg_autoctl refrains from any edit‐
2639 ing of the Postgres HBA file. Please note that editing the HBA
2640 file is still needed so that other nodes can connect using ei‐
2641 ther read privileges or replication streaming privileges.
2642
2643 When --skip-pg-hba is used, pg_autoctl still outputs the HBA en‐
2644 tries it needs in the logs, it only skips editing the HBA file.
2645
2646 --pg-hba-lan
2647 When this option is used pg_autoctl determines the local IP ad‐
2648 dress used to connect to the monitor, and retrieves its netmask,
2649 and uses that to compute your local area network CIDR. This CIDR
2650 is then opened for connections in the Postgres HBA rules.
2651
2652 For instance, when the monitor resolves to 192.168.0.1 and your
2653 local Postgres node uses an interface with IP address
2654 192.168.0.2/255.255.255.0 to connect to the monitor, then the
2655 LAN CIDR is computed to be 192.168.0.0/24.
2656
2657 --candidate-priority
2658 Sets this node's replication setting for candidate priority to the
2659 given value (between 0 and 100) at node registration on the mon‐
2660 itor. Defaults to 50.
2661
2662 --replication-quorum
2663 Sets this node's replication setting for replication quorum to the
2664 given value (either true or false) at node registration on the
2665 monitor. Defaults to true, which enables synchronous replica‐
2666 tion.
2667
2668 --maximum-backup-rate
2669 Sets the maximum transfer rate of data transferred from the
2670 server during initial sync. This is used by pg_basebackup. De‐
2671 faults to 100M.
2672
2673 --run Immediately run the pg_autoctl service after having created this
2674 node.
2675
2676 --ssl-self-signed
2677 Generate SSL self-signed certificates to provide network encryp‐
2678 tion. This does not protect against man-in-the-middle kinds of
2679 attacks. See Security settings for pg_auto_failover for more
2680 about our SSL settings.
2681
2682 --ssl-mode
2683 SSL Mode used by pg_autoctl when connecting to other nodes, in‐
2684 cluding when connecting for streaming replication.
2685
2686 --ssl-ca-file
2687 Set the Postgres ssl_ca_file to that file path.
2688
2689 --ssl-crl-file
2690 Set the Postgres ssl_crl_file to that file path.
2691
2692 --no-ssl
2693 Don't enable network encryption. This is not recommended, prefer
2694 --ssl-self-signed.
2695
2696 --server-key
2697 Set the Postgres ssl_key_file to that file path.
2698
2699 --server-cert
2700 Set the Postgres ssl_cert_file to that file path.
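
As a complement to the --candidate-priority and --replication-quorum
options above, here is a sketch of adding a standby node that is never
a candidate for failover and does not take part in the replication
quorum; the data directory, names, and monitor URI are hypothetical:

    # add a standby that is never promoted and stays out of the quorum
    $ pg_autoctl create postgres \
         --pgdata /var/lib/postgres/node3 \
         --name node3 \
         --hostname node3.example.com \
         --auth trust \
         --ssl-self-signed \
         --candidate-priority 0 \
         --replication-quorum false \
         --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover?sslmode=require'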
2701
2702 pg_autoctl create formation
2703 pg_autoctl create formation - Create a new formation on the
2704 pg_auto_failover monitor
2705
2706 Synopsis
2707 This command registers a new formation on the monitor, with the speci‐
2708 fied kind:
2709
2710 usage: pg_autoctl create formation [ --pgdata --monitor --formation --kind --dbname --enable-secondary --disable-secondary ]
2711
2712 --pgdata path to data directory
2713 --monitor pg_auto_failover Monitor Postgres URL
2714 --formation name of the formation to create
2715 --kind formation kind, either "pgsql" or "citus"
2716 --dbname name for postgres database to use in this formation
2717 --enable-secondary create a formation that has multiple nodes that can be
2718 used for fail over when others have issues
2719 --disable-secondary create a citus formation without nodes to fail over to
2720 --number-sync-standbys minimum number of standbys to confirm write
2721
2722 Description
2723 A single pg_auto_failover monitor may manage any number of formations,
2724 each composed of at least one Postgres service group. This command
2725 creates a new formation so that it is then possible to register Post‐
2726 gres nodes in the new formation.
2727
2728 Options
2729 The following options are available to pg_autoctl create formation:
2730
2731 --pgdata
2732 Location where to initialize a Postgres database cluster, using
2733 either pg_ctl initdb or pg_basebackup. Defaults to the environ‐
2734 ment variable PGDATA.
2735
2736 --monitor
2737 Postgres URI used to connect to the monitor. Must use the au‐
2738 toctl_node username and target the pg_auto_failover database
2739 name. It is possible to show the Postgres URI from the monitor
2740 node using the command pg_autoctl show uri.
2741
2742 --formation
2743 Name of the formation to create.
2744
2745 --kind A pg_auto_failover formation could be of kind pgsql or of kind
2746 citus. At the moment citus formation kinds are not managed in
2747 the Open Source version of pg_auto_failover.
2748
2749 --dbname
2750 Name of the database to use in the formation, mostly useful for
2751 formation kind citus where the Citus extension is only in‐
2752 stalled in a single target database.
2753
2754 --enable-secondary
2755 The formation to be created allows using standby nodes. Defaults
2756 to true. Mostly useful for Citus formations.
2757
2758 --disable-secondary
2759 See --enable-secondary above.
2760
2761 --number-sync-standbys
2762 Postgres streaming replication uses synchronous_standby_names to
2763 setup how many standby nodes should have received a copy of the
2764 transaction data. When using pg_auto_failover this setup is han‐
2765 dled at the formation level.
2766
2767 Defaults to zero when creating the first two Postgres nodes in a
2768 formation in the same group. When set to zero pg_auto_failover
2769 uses synchronous replication only when a standby node is avail‐
2770 able: the idea is to allow failover, even though this setting does
2771 not provide proper HA for Postgres.
2772
2773 When adding a third node that participates in the quorum (one
2774 primary, two secondaries), the setting is automatically changed
2775 from zero to one.
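
For instance, a new formation of kind pgsql could be created as follows;
the formation name, database name, and monitor URI are hypothetical
placeholders:

    # register a formation named "sales" on the monitor
    $ pg_autoctl create formation \
         --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover?sslmode=require' \
         --formation sales \
         --kind pgsql \
         --dbname sales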
2776
2777 pg_autoctl drop
2778 pg_autoctl drop - Drop a pg_auto_failover node, or formation
2779
2780 pg_autoctl drop monitor
2781 pg_autoctl drop monitor - Drop the pg_auto_failover monitor
2782
2783 Synopsis
2784 This command removes the local pg_auto_failover monitor node, and with
2785 --destroy also deletes its Postgres database:
2786
2787 usage: pg_autoctl drop monitor [ --pgdata --destroy ]
2788
2789 --pgdata path to data directory
2790 --destroy also destroy Postgres database
2791
2792 Options
2793 --pgdata
2794 Location of the Postgres node being managed locally. Defaults to
2795 the environment variable PGDATA. Use --monitor to connect to a
2796 monitor from anywhere, rather than the monitor URI used by a lo‐
2797 cal Postgres node managed with pg_autoctl.
2798
2799 --destroy
2800 By default the pg_autoctl drop monitor command does not remove
2801 the Postgres database for the monitor. When using --destroy, the
2802 Postgres installation is also deleted.
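
For example, assuming the monitor's data directory lives at the
hypothetical path /var/lib/postgres/monitor, the monitor and its
Postgres database can be removed with:

    # drop the monitor and delete its Postgres database
    $ pg_autoctl drop monitor --pgdata /var/lib/postgres/monitor --destroy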
2803
2804 pg_autoctl drop node
2805 pg_autoctl drop node - Drop a node from the pg_auto_failover monitor
2806
2807 Synopsis
2808 This command drops a Postgres node from the pg_auto_failover monitor:
2809
2810 usage: pg_autoctl drop node [ [ [ --pgdata ] [ --destroy ] ] | [ --monitor [ [ --hostname --pgport ] | [ --formation --name ] ] ] ]
2811
2812 --pgdata path to data directory
2813 --monitor pg_auto_failover Monitor Postgres URL
2814 --formation pg_auto_failover formation
2815 --name drop the node with the given node name
2816 --hostname drop the node with given hostname and pgport
2817 --pgport drop the node with given hostname and pgport
2818 --destroy also destroy Postgres database
2819 --force force dropping the node from the monitor
2820 --wait how many seconds to wait, default to 60
2821
2822 Description
2823 Two modes of operation are implemented in the pg_autoctl drop node
2824 command.
2825
2826 When removing a node that still exists, it is possible to use pg_au‐
2827 toctl drop node --destroy to remove the node both from the monitor and
2828 also delete the local Postgres instance entirely.
2829
2830 When removing a node that doesn't exist physically anymore, or when the
2831 VM that used to host the node has been lost entirely, use either the
2832 pair of options --hostname and --pgport or the pair of options --forma‐
2833 tion and --name to match the node registration record on the monitor
2834 database, and get it removed from the known list of nodes on the moni‐
2835 tor.
2836
2837 The option --force can be used when the target node to remove does not
2838 exist anymore. When a node has been lost entirely, it is not going to
2839 be able to finish the procedure itself, and it is then possible to in‐
2840 form the monitor of the situation.
2841
2842 Options
2843 --pgdata
2844 Location of the Postgres node being managed locally. Defaults to
2845 the environment variable PGDATA. Use --monitor to connect to a
2846 monitor from anywhere, rather than the monitor URI used by a lo‐
2847 cal Postgres node managed with pg_autoctl.
2848
2849 --monitor
2850 Postgres URI used to connect to the monitor. Must use the au‐
2851 toctl_node username and target the pg_auto_failover database
2852 name. It is possible to show the Postgres URI from the monitor
2853 node using the command pg_autoctl show uri.
2854
2855 --hostname
2856 Hostname of the Postgres node to remove from the monitor. Use
2857 either --name or --hostname --pgport, but not both.
2858
2859 --pgport
2860 Port of the Postgres node to remove from the monitor. Use either
2861 --name or --hostname --pgport, but not both.
2862
2863 --name Name of the node to remove from the monitor. Use either --name
2864 or --hostname --pgport, but not both.
2865
2866 --destroy
2867 By default the pg_autoctl drop node command does not remove the
2868 local Postgres database. When using --destroy, the
2869 Postgres installation is also deleted.
2870
2871 --force
2872 By default a node is expected to reach the assigned state
2873 DROPPED when it is removed from the monitor, and has the oppor‐
2874 tunity to implement clean-up actions. When the target node to
2875 remove is not available anymore, it is possible to use the op‐
2876 tion --force to immediately remove the node from the monitor.
2877
2878 --wait How many seconds to wait for the node to be dropped entirely.
2879 The command stops when the target node is no longer found on the
2880 monitor, or when the timeout has elapsed, whichever
2881 comes first. The value 0 (zero) disables the timeout and dis‐
2882 ables waiting entirely, making the command async.
2883
2884 Examples
2885 $ pg_autoctl drop node --destroy --pgdata ./node3
2886 17:52:21 54201 INFO Reaching assigned state "secondary"
2887 17:52:21 54201 INFO Removing node with name "node3" in formation "default" from the monitor
2888 17:52:21 54201 WARN Postgres is not running and we are in state secondary
2889 17:52:21 54201 WARN Failed to update the keeper's state from the local PostgreSQL instance, see above for details.
2890 17:52:21 54201 INFO Calling node_active for node default/4/0 with current state: PostgreSQL is running is false, sync_state is "", latest WAL LSN is 0/0.
2891 17:52:21 54201 INFO FSM transition to "dropped": This node is being dropped from the monitor
2892 17:52:21 54201 INFO Transition complete: current state is now "dropped"
2893 17:52:21 54201 INFO This node with id 4 in formation "default" and group 0 has been dropped from the monitor
2894 17:52:21 54201 INFO Stopping PostgreSQL at "/Users/dim/dev/MS/pg_auto_failover/tmux/node3"
2895 17:52:21 54201 INFO /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl --pgdata /Users/dim/dev/MS/pg_auto_failover/tmux/node3 --wait stop --mode fast
2896 17:52:21 54201 INFO /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl status -D /Users/dim/dev/MS/pg_auto_failover/tmux/node3 [3]
2897 17:52:21 54201 INFO pg_ctl: no server running
2898 17:52:21 54201 INFO pg_ctl stop failed, but PostgreSQL is not running anyway
2899 17:52:21 54201 INFO Removing "/Users/dim/dev/MS/pg_auto_failover/tmux/node3"
2900 17:52:21 54201 INFO Removing "/Users/dim/dev/MS/pg_auto_failover/tmux/config/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node3/pg_autoctl.cfg"
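
When the node to remove is gone for good (for instance the VM hosting it
has been destroyed), here is a sketch of removing its registration
directly on the monitor; the monitor URI, formation, and node name are
hypothetical:

    # the node cannot clean up after itself anymore, force its removal
    $ pg_autoctl drop node --force \
         --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover?sslmode=require' \
         --formation default \
         --name node3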
2901
2902 pg_autoctl drop formation
2903 pg_autoctl drop formation - Drop a formation on the pg_auto_failover
2904 monitor
2905
2906 Synopsis
2907 This command drops an existing formation on the monitor:
2908
2909 usage: pg_autoctl drop formation [ --pgdata --formation ]
2910
2911 --pgdata path to data directory
2912 --monitor pg_auto_failover Monitor Postgres URL
2913 --formation name of the formation to drop
2914
2915 Options
2916 --pgdata
2917 Location of the Postgres node being managed locally. Defaults to
2918 the environment variable PGDATA. Use --monitor to connect to a
2919 monitor from anywhere, rather than the monitor URI used by a lo‐
2920 cal Postgres node managed with pg_autoctl.
2921
2922 --monitor
2923 Postgres URI used to connect to the monitor. Must use the au‐
2924 toctl_node username and target the pg_auto_failover database
2925 name. It is possible to show the Postgres URI from the monitor
2926 node using the command pg_autoctl show uri.
2927
2928 --formation
2929 Name of the formation to drop from the monitor.
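
For example, dropping a hypothetical formation named sales from the
monitor:

    $ pg_autoctl drop formation \
         --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover?sslmode=require' \
         --formation sales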
2930
2931 pg_autoctl config
2932 pg_autoctl config - Manages the pg_autoctl configuration
2933
2934 pg_autoctl config get
2935 pg_autoctl config get - Get the value of a given pg_autoctl configura‐
2936 tion variable
2937
2938 Synopsis
2939 This command prints a pg_autoctl configuration setting:
2940
2941 usage: pg_autoctl config get [ --pgdata ] [ --json ] [ section.option ]
2942
2943 --pgdata path to data directory
2944
2945 Options
2946 --pgdata
2947 Location of the Postgres node being managed locally. Defaults to
2948 the environment variable PGDATA. Use --monitor to connect to a
2949 monitor from anywhere, rather than the monitor URI used by a lo‐
2950 cal Postgres node managed with pg_autoctl.
2951
2952 --json Output JSON formatted data.
2953
2954 Description
2955 When the argument section.option is used, this is the name of a config‐
2956 uration option. The configuration file for pg_autoctl is stored using
2957 the INI format.
2958
2959 When no argument is given to pg_autoctl config get the entire configu‐
2960 ration file is printed in the output. To figure out where the configura‐
2961 tion file is stored, see pg_autoctl show file and use pg_autoctl show
2962 file --config.
2963
2964 Examples
2965 Without arguments, we get the entire file:
2966
2967 $ pg_autoctl config get --pgdata node1
2968 [pg_autoctl]
2969 role = keeper
2970 monitor = postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer
2971 formation = default
2972 group = 0
2973 name = node1
2974 hostname = localhost
2975 nodekind = standalone
2976
2977 [postgresql]
2978 pgdata = /Users/dim/dev/MS/pg_auto_failover/tmux/node1
2979 pg_ctl = /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl
2980 dbname = demo
2981 host = /tmp
2982 port = 5501
2983 proxyport = 0
2984 listen_addresses = *
2985 auth_method = trust
2986 hba_level = app
2987
2988 [ssl]
2989 active = 1
2990 sslmode = require
2991 cert_file = /Users/dim/dev/MS/pg_auto_failover/tmux/node1/server.crt
2992 key_file = /Users/dim/dev/MS/pg_auto_failover/tmux/node1/server.key
2993
2994 [replication]
2995 maximum_backup_rate = 100M
2996 backup_directory = /Users/dim/dev/MS/pg_auto_failover/tmux/backup/node_1
2997
2998 [timeout]
2999 network_partition_timeout = 20
3000 prepare_promotion_catchup = 30
3001 prepare_promotion_walreceiver = 5
3002 postgresql_restart_failure_timeout = 20
3003 postgresql_restart_failure_max_retries = 3
3004
3005 It is possible to pipe JSON formatted output to the jq command line and
3006 filter the result down to a specific section of the file:
3007
3008 $ pg_autoctl config get --pgdata node1 --json | jq .pg_autoctl
3009 {
3010 "role": "keeper",
3011 "monitor": "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer",
3012 "formation": "default",
3013 "group": 0,
3014 "name": "node1",
3015 "hostname": "localhost",
3016 "nodekind": "standalone"
3017 }
3018
3019 Finally, a single configuration element can be listed:
3020
3021 $ pg_autoctl config get --pgdata node1 ssl.sslmode --json
3022 require
3023
3024 pg_autoctl config set
3025 pg_autoctl config set - Set the value of a given pg_autoctl configura‐
3026 tion variable
3027
3028 Synopsis
3029 This command sets a pg_autoctl configuration setting to a new value:
3030
3031 usage: pg_autoctl config set [ --pgdata ] [ --json ] section.option [ value ]
3032
3033 --pgdata path to data directory
3034
3035 Options
3036 --pgdata
3037 Location of the Postgres node being managed locally. Defaults to
3038 the environment variable PGDATA. Use --monitor to connect to a
3039 monitor from anywhere, rather than the monitor URI used by a lo‐
3040 cal Postgres node managed with pg_autoctl.
3041
3042 --json Output JSON formatted data.
3043
3044 Description
3045 This command allows setting a pg_autoctl configuration setting to a new
3046 value. Most settings can be changed and reloaded online.
3047
3048 Some of those changes can then be applied with a pg_autoctl reload
3049 command to an already running process.
3050
3051 Settings
3052 pg_autoctl.role
3053 This setting can not be changed. It can be either monitor or keeper
3054 and the rest of the configuration file is read depending on this
3055 value.
3056
3057 pg_autoctl.monitor
3058 URI of the pg_autoctl monitor Postgres service. Can be changed with
3059 a reload.
3060
3061 To register an existing node to a new monitor, use pg_autoctl dis‐
3062 able monitor and then pg_autoctl enable monitor.
3063
3064 pg_autoctl.formation
3065 Formation to which this node has been registered. Changing this set‐
3066 ting is not supported.
3067
3068 pg_autoctl.group
3069 Group in which this node has been registered. Changing this setting
3070 is not supported.
3071
3072 pg_autoctl.name
3073 Name of the node as known to the monitor and listed in pg_autoctl
3074 show state. Can be changed with a reload.
3075
3076 pg_autoctl.hostname
3077 Hostname or IP address of the node, as known to the monitor. Can be
3078 changed with a reload.
3079
3080 pg_autoctl.nodekind
3081 This setting can not be changed and depends on the command that has
3082 been used to create this pg_autoctl node.
3083
3084 postgresql.pgdata
3085 Directory where the managed Postgres instance is to be created (or
3086 found) and managed. Can't be changed.
3087
3088 postgresql.pg_ctl
3089 Path to the pg_ctl tool used to manage this Postgres instance. Ab‐
3090 solute path depends on the major version of Postgres and looks like
3091 /usr/lib/postgresql/13/bin/pg_ctl when using a debian or ubuntu OS.
3092
3093 Can be changed after a major upgrade of Postgres.
3094
3095 postgresql.dbname
3096 Name of the database that is used to connect to Postgres. Can be
3097 changed, but then must be changed manually on the monitor's pgauto‐
3098 failover.formation table with a SQL command.
3099
3100 WARNING:
3101 When using pg_auto_failover enterprise edition with Citus sup‐
3102 port, this is the database where pg_autoctl maintains the list
3103 of Citus nodes on the coordinator. Using the same database name
3104 as your application that uses Citus is then crucial.
3105
3106 postgresql.host
3107 Hostname to use in connection strings when connecting from the local
3108 pg_autoctl process to the local Postgres database. Defaults to using
3109 the Operating System default value for the Unix Domain Socket direc‐
3110 tory, either /tmp or when using debian or ubuntu /var/run/post‐
3111 gresql.
3112
3113 Can be changed with a reload.
3114
3115 postgresql.port
3116 Port on which Postgres should be managed. Can be changed offline,
3117 between a pg_autoctl stop and a subsequent pg_autoctl start.
3118
3119 postgresql.listen_addresses
3120 Value to set for the Postgres parameter of the same name. At the moment
3121 pg_autoctl only supports a single address for this parameter.
3122
3123 postgresql.auth_method
3124 Authentication method to use when editing HBA rules to allow the
3125 Postgres nodes of a formation to connect to each other, and to the
3126 monitor, and to allow the monitor to connect to the nodes.
3127
3128 Can be changed online with a reload, but actually adding new HBA
3129 rules requires a restart of the "node-active" service.
3130
3131 postgresql.hba_level
3132 This setting reflects the choice of --skip-pg-hba or --pg-hba-lan
3133 that has been used when creating this pg_autoctl node. Can be
3134 changed with a reload, though the HBA rules that have been previ‐
3135 ously added will not get removed.
3136
3137 ssl.active, ssl.sslmode, ssl.cert_file, ssl.key_file, etc
3138 Please use the command pg_autoctl enable ssl or pg_autoctl disable
3139 ssl to manage the SSL settings in the ssl section of the configura‐
3140 tion. Using those commands, the settings can be changed online.
3141
3142 replication.maximum_backup_rate
3143 Used as a parameter to pg_basebackup, defaults to 100M. Can be
3144 changed with a reload. Changing this value does not affect an al‐
3145 ready running pg_basebackup command.
3146
3147 Limiting the bandwidth used by pg_basebackup makes the operation
3148 slower, and still has the advantage of limiting the impact on the
3149 disks of the primary server.
3150
3151 replication.backup_directory
3152 Target location of the pg_basebackup command used by pg_autoctl when
3153 creating a secondary node. When done with fetching the data over the
3154 network, then pg_autoctl uses the rename(2) system-call to rename
3155 the temporary download location to the target PGDATA location.
3156
3157 The rename(2) system-call is known to be atomic when both the source
3158 and the target of the operation are using the same file system /
3159 mount point.
3160
3161 Can be changed online with a reload, will not affect already running
3162 pg_basebackup sub-processes.
3163
3164 replication.password
3165 Used as a parameter in the connection string to the upstream Post‐
3166 gres node. The "replication" connection uses the password set-up in
3167 the pg_autoctl configuration file.
3168
3169 Changing the replication.password of a pg_autoctl configuration has
3170 no effect on the Postgres database itself. The password must match
3171 what the Postgres upstream node expects, which can be set with the
3172 following SQL command run on the upstream server (primary or other
3173 standby node):
3174
3175 alter user pgautofailover_replicator password 'h4ckm3m0r3';
3176
3177 The replication.password can be changed online with a reload, but
3178 requires restarting the Postgres service to be activated. Postgres
3179 only reads the primary_conninfo connection string at start-up, up to
3180 and including Postgres 12. With Postgres 13 and following, it is
3181 possible to reload this Postgres parameter.
3182
3183 timeout.network_partition_timeout
3184 Timeout (in seconds) that pg_autoctl waits before deciding that it
3185 is on the losing side of a network partition. When pg_autoctl fails
3186 to connect to the monitor and when the local Postgres instance
3187 pg_stat_replication system view is empty, and after this many sec‐
3188 onds have passed, then pg_autoctl demotes itself.
3189
3190 Can be changed with a reload.
3191
3192 timeout.prepare_promotion_catchup
3193 Currently not used in the source code. Can be changed with a reload.
3194
3195 timeout.prepare_promotion_walreceiver
3196 Currently not used in the source code. Can be changed with a reload.
3197
3198 timeout.postgresql_restart_failure_timeout
3199 When pg_autoctl fails to start Postgres for at least this duration
3200 from the first attempt, then it starts reporting that Postgres is
3201 not running to the monitor, which might then decide to implement a
3202 failover.
3203
3204 Can be changed with a reload.
3205
3206 timeout.postgresql_restart_failure_max_retries
3207 When pg_autoctl fails to start Postgres at least this many times,
3208 then it starts reporting that Postgres is not running to the moni‐
3209 tor, which might then decide to implement a failover.
3210
3211 Can be changed with a reload.
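
For example, here is a sketch of changing the replication.maximum_backup_rate
setting and then applying it to the running pg_autoctl process; the
data directory path is a hypothetical placeholder:

    # lower the pg_basebackup transfer rate, then reload the service
    $ pg_autoctl config set --pgdata /var/lib/postgres/node1 \
         replication.maximum_backup_rate 50M
    $ pg_autoctl reload --pgdata /var/lib/postgres/node1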
3212
3213 pg_autoctl config check
3214 pg_autoctl config check - Check pg_autoctl configuration
3215
3216 Synopsis
3217 This command implements a very basic list of sanity checks for a pg_au‐
3218 toctl node setup:
3219
3220 usage: pg_autoctl config check [ --pgdata ] [ --json ]
3221
3222 --pgdata path to data directory
3223 --json output data in the JSON format
3224
3225 Options
3226 --pgdata
3227 Location of the Postgres node being managed locally. Defaults to
3228 the environment variable PGDATA. Use --monitor to connect to a
3229 monitor from anywhere, rather than the monitor URI used by a lo‐
3230 cal Postgres node managed with pg_autoctl.
3231
3232 --json Output JSON formatted data.
3233
3234 Examples
3235 $ pg_autoctl config check --pgdata node1
3236 18:37:27 63749 INFO Postgres setup for PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" is ok, running with PID 5501 and port 99698
3237 18:37:27 63749 INFO Connection to local Postgres ok, using "port=5501 dbname=demo host=/tmp"
3238 18:37:27 63749 INFO Postgres configuration settings required for pg_auto_failover are ok
3239 18:37:27 63749 WARN Postgres 12.1 does not support replication slots on a standby node
3240 18:37:27 63749 INFO Connection to monitor ok, using "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer"
3241 18:37:27 63749 INFO Monitor is running version "1.5.0.1", as expected
3242 pgdata: /Users/dim/dev/MS/pg_auto_failover/tmux/node1
3243 pg_ctl: /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl
3244 pg_version: 12.3
3245 pghost: /tmp
3246 pgport: 5501
3247 proxyport: 0
3248 pid: 99698
3249 is in recovery: no
3250 Control Version: 1201
3251 Catalog Version: 201909212
3252 System Identifier: 6941034382470571312
3253 Latest checkpoint LSN: 0/6000098
3254 Postmaster status: ready
3255
3256 pg_autoctl show
3257 pg_autoctl show - Show pg_auto_failover information
3258
3259 pg_autoctl show uri
3260 pg_autoctl show uri - Show the postgres uri to use to connect to
3261 pg_auto_failover nodes
3262
3263 Synopsis
3264 This command outputs the monitor or the coordinator Postgres URI to use
3265 from an application to connect to Postgres:
3266
3267 usage: pg_autoctl show uri [ --pgdata --monitor --formation --json ]
3268
3269 --pgdata path to data directory
3270 --monitor monitor uri
3271 --formation show the coordinator uri of given formation
3272 --json output data in the JSON format
3273
3274 Options
3275 --pgdata
3276 Location of the Postgres node being managed locally. Defaults to
3277 the environment variable PGDATA. Use --monitor to connect to a
3278 monitor from anywhere, rather than the monitor URI used by a lo‐
3279 cal Postgres node managed with pg_autoctl.
3280
3281 --monitor
3282 Postgres URI used to connect to the monitor. Must use the au‐
3283 toctl_node username and target the pg_auto_failover database
3284 name. It is possible to show the Postgres URI from the monitor
3285 node using the command pg_autoctl show uri.
3286
3287 Defaults to the value of the environment variable PG_AU‐
3288 TOCTL_MONITOR.
3289
3290 --formation
3291 When --formation is used, shows the Postgres URI of the given
3292 formation on the monitor.
3293
3294 --json Output JSON formatted data instead of a table formatted list.
3295
3296 Examples
3297 $ pg_autoctl show uri
3298 Type | Name | Connection String
3299 -------------+---------+-------------------------------
3300 monitor | monitor | postgres://autoctl_node@localhost:5500/pg_auto_failover
3301 formation | default | postgres://localhost:5502,localhost:5503,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer
3302
3303 $ pg_autoctl show uri --formation monitor
3304 postgres://autoctl_node@localhost:5500/pg_auto_failover
3305
3306 $ pg_autoctl show uri --formation default
3307 postgres://localhost:5503,localhost:5502,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer
3308
3309 $ pg_autoctl show uri --json
3310 [
3311 {
3312 "uri": "postgres://autoctl_node@localhost:5500/pg_auto_failover",
3313 "name": "monitor",
3314 "type": "monitor"
3315 },
3316 {
3317 "uri": "postgres://localhost:5503,localhost:5502,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer",
3318 "name": "default",
3319 "type": "formation"
3320 }
3321 ]
3322
3323 Multi-hosts Postgres connection strings
3324 PostgreSQL since version 10 includes support for multiple hosts in its
3325 connection driver libpq, with the special target_session_attrs connec‐
3326 tion property.
3327
3328 This multi-hosts connection string facility allows applications to keep
3329 using the same stable connection string over server-side failovers.
3330 That's why pg_autoctl show uri uses that format.
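
For instance, an application (here psql, reusing the formation URI shown
in the example output above) keeps connecting to whichever node
currently accepts read-write connections:

    # pg_is_in_recovery() returns false, showing we reached the primary
    $ psql 'postgres://localhost:5503,localhost:5502,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer' \
         -c 'select pg_is_in_recovery();'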
3331
3332 pg_autoctl show events
3333 pg_autoctl show events - Prints the events registered by the monitor
3334 for nodes in a given formation and group
3335
3336 Synopsis
3337 This command outputs the events that the pg_auto_failover monitor
3338 records about state changes of the pg_auto_failover nodes managed by
3339 the monitor:
3340
3341 usage: pg_autoctl show events [ --pgdata --formation --group --count ]
3342
3343 --pgdata path to data directory
3344 --monitor pg_auto_failover Monitor Postgres URL
3345 --formation formation to query, defaults to 'default'
3346 --group group to query formation, defaults to all
3347 --count how many events to fetch, defaults to 10
3348 --watch display an auto-updating dashboard
3349 --json output data in the JSON format
3350
3351 Options
3352 --pgdata
3353 Location of the Postgres node being managed locally. Defaults to
3354 the environment variable PGDATA. Use --monitor to connect to a
3355 monitor from anywhere, rather than the monitor URI used by a lo‐
3356 cal Postgres node managed with pg_autoctl.
3357
3358 --monitor
3359 Postgres URI used to connect to the monitor. Must use the au‐
3360 toctl_node username and target the pg_auto_failover database
3361 name. It is possible to show the Postgres URI from the monitor
3362 node using the command pg_autoctl show uri.
3363
3364 --formation
3365 List the events recorded for nodes in the given formation. De‐
3366 faults to default.
3367
3368 --count
3369 By default only the last 10 events are printed.
3370
3371 --watch
3372 Take control of the terminal and display the current state of
3373 the system and the last events from the monitor. The display is
3374 updated automatically every 500 milliseconds (half a second) and
3375 reacts properly to window size change.
3376
3377 Depending on the terminal window size, a different set of col‐
3378 umns is visible in the state part of the output. See pg_autoctl
3379 watch.
3380
3381 --json Output JSON formatted data instead of a table formatted list.
3382
3383 Examples
3384 $ pg_autoctl show events --count 2 --json
3385 [
3386 {
3387 "nodeid": 1,
3388 "eventid": 15,
3389 "groupid": 0,
3390 "nodehost": "localhost",
3391 "nodename": "node1",
3392 "nodeport": 5501,
3393 "eventtime": "2021-03-18T12:32:36.103467+01:00",
3394 "goalstate": "primary",
3395 "description": "Setting goal state of node 1 \"node1\" (localhost:5501) to primary now that at least one secondary candidate node is healthy.",
3396 "formationid": "default",
3397 "reportedlsn": "0/4000060",
3398 "reportedstate": "wait_primary",
3399 "reportedrepstate": "async",
3400 "candidatepriority": 50,
3401 "replicationquorum": true
3402 },
3403 {
3404 "nodeid": 1,
3405 "eventid": 16,
3406 "groupid": 0,
3407 "nodehost": "localhost",
3408 "nodename": "node1",
3409 "nodeport": 5501,
3410 "eventtime": "2021-03-18T12:32:36.215494+01:00",
3411 "goalstate": "primary",
3412 "description": "New state is reported by node 1 \"node1\" (localhost:5501): \"primary\"",
3413 "formationid": "default",
3414 "reportedlsn": "0/4000110",
3415 "reportedstate": "primary",
3416 "reportedrepstate": "quorum",
3417 "candidatepriority": 50,
3418 "replicationquorum": true
3419 }
3420 ]
3421
3422 pg_autoctl show state
3423 pg_autoctl show state - Prints monitor's state of nodes in a given for‐
3424 mation and group
3425
3426 Synopsis
3427 This command outputs the current state of the formation and groups reg‐
3428 istered to the pg_auto_failover monitor:
3429
3430 usage: pg_autoctl show state [ --pgdata --formation --group ]
3431
3432 --pgdata path to data directory
3433 --monitor pg_auto_failover Monitor Postgres URL
3434 --formation formation to query, defaults to 'default'
3435 --group group to query formation, defaults to all
3436 --local show local data, do not connect to the monitor
3437 --watch display an auto-updating dashboard
3438 --json output data in the JSON format
3439
3440 Options
3441 --pgdata
3442 Location of the Postgres node being managed locally. Defaults to
3443 the environment variable PGDATA. Use --monitor to connect to a
3444 monitor from anywhere, rather than the monitor URI used by a lo‐
3445 cal Postgres node managed with pg_autoctl.
3446
3447 --monitor
3448 Postgres URI used to connect to the monitor. Must use the au‐
3449 toctl_node username and target the pg_auto_failover database
3450 name. It is possible to show the Postgres URI from the monitor
3451 node using the command pg_autoctl show uri.
3452
3453 --formation
3454 Show the current state of the nodes in the given formation. De‐
3455 faults to default.
3456
3457 --group
3458 Limit output to a single group in the formation. Default to in‐
3459 cluding all groups registered in the target formation.
3460
3461 --local
3462 Print the local state information without connecting to the mon‐
3463 itor.
3464
3465 --watch
3466 Take control of the terminal and display the current state of
3467 the system and the last events from the monitor. The display is
3468 updated automatically every 500 milliseconds (half a second) and
3469 reacts properly to window size change.
3470
3471 Depending on the terminal window size, a different set of col‐
3472 umns is visible in the state part of the output. See pg_autoctl
3473 watch.
3474
3475 --json Output JSON formatted data instead of a table formatted list.
3476
3477 Description
3478 The pg_autoctl show state output includes the following columns:
3479
3480 • Name
3481 Name of the node.
3482
3483 • Node
3484 Node information. When the formation has a single group (group
3485 zero), then this column only contains the nodeId.
3486
3487 Only Citus formations allow several groups. When using a Citus
3488 formation the Node column contains the groupId and the nodeId,
3489 separated by a colon, such as 0:1 for the first coordinator
3490 node.
3491
3492 • Host:Port
3493 Hostname and port number used to connect to the node.
3494
3495 • TLI: LSN
3496 Timeline identifier (TLI) and Postgres Log Sequence Number
3497 (LSN).
3498
3499 The LSN is the current position in the Postgres WAL stream.
3500 This is a hexadecimal number. See pg_lsn for more information.
3501
3502 The current timeline is incremented each time a failover hap‐
3503 pens, or when doing Point In Time Recovery. A node can only
3504 reach the secondary state when it is on the same timeline as
3505 its primary node.
3506
3507 • Connection
3508 This output field contains two bits of information. First, the
3509 Postgres connection type that the node provides, either
3510 read-write or read-only. Then the mark ! is added when the
3511 monitor has failed to connect to this node, and ? when the
3512 monitor has not connected to the node yet.
3513
3514 • Reported State
3515 The latest reported FSM state, as reported to the monitor by
3516 the pg_autoctl process running on the Postgres node.
3517
3518 • Assigned State
3519 The assigned FSM state on the monitor. When the assigned state
3520 is not the same as the reported state, then the pg_autoctl
3521 process running on the Postgres node might not have retrieved
3522 the assigned state yet, or might still be implementing the FSM
3523 transition from the current state to the assigned state.
3524
3525 Examples
3526 $ pg_autoctl show state
3527 Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
3528 ------+-------+----------------+----------------+--------------+---------------------+--------------------
3529 node1 | 1 | localhost:5501 | 1: 0/4000678 | read-write | primary | primary
3530 node2 | 2 | localhost:5502 | 1: 0/4000678 | read-only | secondary | secondary
3531 node3 | 3 | localhost:5503 | 1: 0/4000678 | read-only | secondary | secondary
3532
3533 $ pg_autoctl show state --local
3534 Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
3535 ------+-------+----------------+----------------+--------------+---------------------+--------------------
3536 node1 | 1 | localhost:5501 | 1: 0/4000678 | read-write ? | primary | primary
3537
3538 $ pg_autoctl show state --json
3539 [
3540 {
3541 "health": 1,
3542 "node_id": 1,
3543 "group_id": 0,
3544 "nodehost": "localhost",
3545 "nodename": "node1",
3546 "nodeport": 5501,
3547 "reported_lsn": "0/4000678",
3548 "reported_tli": 1,
3549 "formation_kind": "pgsql",
3550 "candidate_priority": 50,
3551 "replication_quorum": true,
3552 "current_group_state": "primary",
3553 "assigned_group_state": "primary"
3554 },
3555 {
3556 "health": 1,
3557 "node_id": 2,
3558 "group_id": 0,
3559 "nodehost": "localhost",
3560 "nodename": "node2",
3561 "nodeport": 5502,
3562 "reported_lsn": "0/4000678",
3563 "reported_tli": 1,
3564 "formation_kind": "pgsql",
3565 "candidate_priority": 50,
3566 "replication_quorum": true,
3567 "current_group_state": "secondary",
3568 "assigned_group_state": "secondary"
3569 },
3570 {
3571 "health": 1,
3572 "node_id": 3,
3573 "group_id": 0,
3574 "nodehost": "localhost",
3575 "nodename": "node3",
3576 "nodeport": 5503,
3577 "reported_lsn": "0/4000678",
3578 "reported_tli": 1,
3579 "formation_kind": "pgsql",
3580 "candidate_priority": 50,
3581 "replication_quorum": true,
3582 "current_group_state": "secondary",
3583 "assigned_group_state": "secondary"
3584 }
3585 ]
3586
3587 pg_autoctl show settings
3588 pg_autoctl show settings - Print replication settings for a formation
3589 from the monitor
3590
3591 Synopsis
3592 This command allows reviewing all the replication settings of a given
3593 formation (defaults to 'default' as usual):
3594
3595 usage: pg_autoctl show settings [ --pgdata ] [ --json ] [ --formation ]
3596
3597 --pgdata path to data directory
3598 --monitor pg_auto_failover Monitor Postgres URL
3599 --json output data in the JSON format
3600 --formation pg_auto_failover formation
3601
3602 Description
3603 See also pg_autoctl get formation settings which is a synonym.
3604
3605 The output contains settings and values that apply in different con‐
3606 texts, as shown here with a formation of four nodes, where node_4 is
3607 not participating in the replication quorum and also not a candidate
3608 for failover:
3609
3610 $ pg_autoctl show settings
3611 Context | Name | Setting | Value
3612 ----------+---------+---------------------------+-------------------------------------------------------------
3613 formation | default | number_sync_standbys | 1
3614 primary | node_1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_3, pgautofailover_standby_2)'
3615 node | node_1 | replication quorum | true
3616 node | node_2 | replication quorum | true
3617 node | node_3 | replication quorum | true
3618 node | node_4 | replication quorum | false
3619 node | node_1 | candidate priority | 50
3620 node | node_2 | candidate priority | 50
3621 node | node_3 | candidate priority | 50
3622 node | node_4 | candidate priority | 0
3623
3624 Three replication settings contexts are listed:
3625
3626 1. The "formation" context contains a single entry, the value of
3627 number_sync_standbys for the target formation.
3628
3629 2. The "primary" context contains one entry per group of Postgres
3630 nodes in the formation, and shows the current value of the syn‐
3631 chronous_standby_names Postgres setting as computed by the moni‐
3632 tor. It should match what's currently set on the primary node, un‐
3633 less a change is being applied, as shown by the primary being in
3634 the APPLY_SETTING state.
3635
3636 3. The "node" context contains two entries per node: one line shows
3637 the replication quorum setting of the node, and another line shows
3638 its candidate priority.
3639
3640 This command gives an overview of all the settings that apply to the
3641 current formation.
3642
3643 Options
3644 --pgdata
3645 Location of the Postgres node being managed locally. Defaults to
3646 the environment variable PGDATA. Use --monitor to connect to a
3647 monitor from anywhere, rather than the monitor URI used by a lo‐
3648 cal Postgres node managed with pg_autoctl.
3649
3650 --monitor
3651 Postgres URI used to connect to the monitor. Must use the au‐
3652 toctl_node username and target the pg_auto_failover database
3653 name. It is possible to show the Postgres URI from the monitor
3654 node using the command pg_autoctl show uri.
3655
3656 Defaults to the value of the environment variable PG_AU‐
3657 TOCTL_MONITOR.
3658
3659 --formation
3660 Show the current replication settings for the given formation.
3661 Defaults to the default formation.
3662
3663 --json Output JSON formatted data instead of a table formatted list.
3664
3665 Examples
3666 $ pg_autoctl show settings
3667 Context | Name | Setting | Value
3668 ----------+---------+---------------------------+-------------------------------------------------------------
3669 formation | default | number_sync_standbys | 1
3670 primary | node1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
3671 node | node1 | candidate priority | 50
3672 node | node2 | candidate priority | 50
3673 node | node3 | candidate priority | 50
3674 node | node1 | replication quorum | true
3675 node | node2 | replication quorum | true
3676 node | node3 | replication quorum | true
3677
3678 pg_autoctl show standby-names
3679 pg_autoctl show standby-names - Prints synchronous_standby_names for a
3680 given group
3681
3682 Synopsis
3683 This command prints the current value for synchronous_standby_names for
3684 the primary Postgres server of the target group (default 0) in the tar‐
3685 get formation (default default), as computed by the monitor:
3686
3687 usage: pg_autoctl show standby-names [ --pgdata ] --formation --group
3688
3689 --pgdata path to data directory
3690 --monitor pg_auto_failover Monitor Postgres URL
3691 --formation formation to query, defaults to 'default'
3692 --group group to query formation, defaults to all
3693 --json output data in the JSON format
3694
3695 Options
3696 --pgdata
3697 Location of the Postgres node being managed locally. Defaults to
3698 the environment variable PGDATA. Use --monitor to connect to a
3699 monitor from anywhere, rather than the monitor URI used by a lo‐
3700 cal Postgres node managed with pg_autoctl.
3701
3702 --monitor
3703 Postgres URI used to connect to the monitor. Must use the au‐
3704 toctl_node username and target the pg_auto_failover database
3705 name. It is possible to show the Postgres URI from the monitor
3706 node using the command pg_autoctl show uri.
3707
3708 Defaults to the value of the environment variable PG_AU‐
3709 TOCTL_MONITOR.
3710
3711 --formation
3712 Show the current synchronous_standby_names value for the given
3713 formation. Defaults to the default formation.
3714
3715 --group
3716 Show the current synchronous_standby_names value for the given
3717 group in the given formation. Defaults to group 0.
3718
3719 --json Output JSON formatted data instead of a table formatted list.
3720
3721 Examples
3722 $ pg_autoctl show standby-names
3723 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
3724
3725 $ pg_autoctl show standby-names --json
3726 {
3727 "formation": "default",
3728 "group": 0,
3729 "synchronous_standby_names": "ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)"
3730 }
3731
3732 pg_autoctl show file
3733 pg_autoctl show file - List pg_autoctl internal files (config, state,
3734 pid)
3735
3736 Synopsis
3737 This command lists the files that pg_autoctl uses internally for its
3738 own configuration, state, and pid:
3739
3740 usage: pg_autoctl show file [ --pgdata --all --config | --state | --init | --pid --contents ]
3741
3742 --pgdata path to data directory
3743 --all show all pg_autoctl files
3744 --config show pg_autoctl configuration file
3745 --state show pg_autoctl state file
3746 --init show pg_autoctl initialisation state file
3747 --pid show pg_autoctl PID file
3748 --contents show selected file contents
3749 --json output data in the JSON format
3750
3751 Description
3752 The pg_autoctl command follows the XDG Base Directory Specification and
3753 places its internal and configuration files by default in places such
3754 as ~/.config/pg_autoctl and ~/.local/share/pg_autoctl.
3755
3756 It is possible to change the default XDG locations by using the envi‐
3757 ronment variables XDG_CONFIG_HOME, XDG_DATA_HOME, and XDG_RUNTIME_DIR.
3758
3759 Also, pg_autoctl uses sub-directories that are specific to a given PG‐
3760 DATA, making it possible to run several Postgres nodes on the same ma‐
3761 chine, which is very practical for testing and development purposes,
3762 though not advised for production setups.
3763
3764 Configuration File
3765 The pg_autoctl configuration file for an instance serving the data di‐
3766 rectory at /data/pgsql is found at ~/.config/pg_au‐
3767 toctl/data/pgsql/pg_autoctl.cfg, written in the INI format.
3768
3769 It is possible to get the location of the configuration file by using
3770 the command pg_autoctl show file --config --pgdata /data/pgsql and to
3771 output its content by using the command pg_autoctl show file --config
3772 --contents --pgdata /data/pgsql.
3773
3774 See also pg_autoctl config get and pg_autoctl config set.
3775
3776 State File
3777 The pg_autoctl state file for an instance serving the data directory at
3778 /data/pgsql is found at ~/.local/share/pg_autoctl/data/pgsql/pg_au‐
3779 toctl.state, written in a specific binary format.
3780
3781 This file is not intended to be written by anything other than pg_au‐
3782 toctl itself. In case of state corruption, see the trouble shooting
3783 section of the documentation.
3784
3785 It is possible to get the location of the state file by using the com‐
3786 mand pg_autoctl show file --state --pgdata /data/pgsql and to output
3787 its content by using the command pg_autoctl show file --state --con‐
3788 tents --pgdata /data/pgsql.
3789
3790 Init State File
3791 The pg_autoctl init state file for an instance serving the data direc‐
3792 tory at /data/pgsql is found at ~/.local/share/pg_au‐
3793 toctl/data/pgsql/pg_autoctl.init, written in a specific binary format.
3794
3795 This file is not intended to be written by anything other than pg_au‐
3796 toctl itself. In case of state corruption, see the trouble shooting
3797 section of the documentation.
3798
3799 This initialization state file only exists during the initialization of
3800 a pg_auto_failover node. In normal operations, this file does not ex‐
3801 ist.
3802
3803 It is possible to get the location of the state file by using the com‐
3804 mand pg_autoctl show file --init --pgdata /data/pgsql and to output its
3805 content by using the command pg_autoctl show file --init --contents
3806 --pgdata /data/pgsql.
3807
3808 PID File
3809 The pg_autoctl PID file for an instance serving the data directory at
3810 /data/pgsql is found at /tmp/pg_autoctl/data/pgsql/pg_autoctl.pid,
3811 written in a specific text format.
3812
3813 The PID file is located in a temporary directory by default, or in the
3814 XDG_RUNTIME_DIR directory when this is setup.
3815
3816 Options
3817 --pgdata
3818 Location of the Postgres node being managed locally. Defaults to
3819 the environment variable PGDATA. Use --monitor to connect to a
3820 monitor from anywhere, rather than the monitor URI used by a lo‐
3821 cal Postgres node managed with pg_autoctl.
3822
3823 --all List all the files that belong to this pg_autoctl node.
3824
3825 --config
3826 Show only the configuration file.
3827
3828 --state
3829 Show only the state file.
3830
3831 --init Show only the init state file, which only exists while the com‐
3832 mand pg_autoctl create postgres or the command pg_autoctl create
3833 monitor is running, or when that command failed (and can then be
3834 retried).
3835
3836 --pid Show only the pid file.
3837
3838 --contents
3839 When one of the options to show a specific file is in use, then
3840 --contents shows the contents of the selected file instead of
3841 showing its absolute file path.
3842
3843 --json Output JSON formatted data.
3844
3845 Examples
3846 The following examples are taken from a QA environment that has been
3847 prepared thanks to the make cluster command made available to the
3848 pg_auto_failover contributors. As a result, the XDG environment vari‐
3849 ables have been tweaked to obtain a self-contained test:
3850
3851 $ tmux show-env | grep XDG
3852 XDG_CONFIG_HOME=/Users/dim/dev/MS/pg_auto_failover/tmux/config
3853 XDG_DATA_HOME=/Users/dim/dev/MS/pg_auto_failover/tmux/share
3854 XDG_RUNTIME_DIR=/Users/dim/dev/MS/pg_auto_failover/tmux/run
3855
3856 Within that self-contained test location, we can see the following ex‐
3857 amples.
3858
3859 $ pg_autoctl show file --pgdata ./node1
3860 File | Path
3861 --------+----------------
3862 Config | /Users/dim/dev/MS/pg_auto_failover/tmux/config/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.cfg
3863 State | /Users/dim/dev/MS/pg_auto_failover/tmux/share/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.state
3864 Init | /Users/dim/dev/MS/pg_auto_failover/tmux/share/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.init
3865 Pid | /Users/dim/dev/MS/pg_auto_failover/tmux/run/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.pid
3867
3868 $ pg_autoctl show file --pgdata node1 --state
3869 /Users/dim/dev/MS/pg_auto_failover/tmux/share/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.state
3870
3871 $ pg_autoctl show file --pgdata node1 --state --contents
3872 Current Role: primary
3873 Assigned Role: primary
3874 Last Monitor Contact: Thu Mar 18 17:32:25 2021
3875 Last Secondary Contact: 0
3876 pg_autoctl state version: 1
3877 group: 0
3878 node id: 1
3879 nodes version: 0
3880 PostgreSQL Version: 1201
3881 PostgreSQL CatVersion: 201909212
3882 PostgreSQL System Id: 6940955496243696337
3883
3884 $ pg_autoctl show file --pgdata node1 --config --contents --json | jq .pg_autoctl
3885 {
3886 "role": "keeper",
3887 "monitor": "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer",
3888 "formation": "default",
3889 "group": 0,
3890 "name": "node1",
3891 "hostname": "localhost",
3892 "nodekind": "standalone"
3893 }
3894
3895 pg_autoctl show systemd
3896 pg_autoctl show systemd - Print systemd service file for this node
3897
3898 Synopsis
3899 This command outputs a configuration unit that is suitable for regis‐
3900 tering pg_autoctl as a systemd service.
3901
3902 Examples
3903 $ pg_autoctl show systemd --pgdata node1
3904 17:38:29 99778 INFO HINT: to complete a systemd integration, run the following commands:
3905 17:38:29 99778 INFO pg_autoctl -q show systemd --pgdata "node1" | sudo tee /etc/systemd/system/pgautofailover.service
3906 17:38:29 99778 INFO sudo systemctl daemon-reload
3907 17:38:29 99778 INFO sudo systemctl enable pgautofailover
3908 17:38:29 99778 INFO sudo systemctl start pgautofailover
3909 [Unit]
3910 Description = pg_auto_failover
3911
3912 [Service]
3913 WorkingDirectory = /Users/dim
3914 Environment = 'PGDATA=node1'
3915 User = dim
3916 ExecStart = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl run
3917 Restart = always
3918 StartLimitBurst = 0
3919 ExecReload = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl reload
3920
3921 [Install]
3922 WantedBy = multi-user.target
3923
3924 To avoid the logs output, use the -q option:
3925
3926 $ pg_autoctl show systemd --pgdata node1 -q
3927 [Unit]
3928 Description = pg_auto_failover
3929
3930 [Service]
3931 WorkingDirectory = /Users/dim
3932 Environment = 'PGDATA=node1'
3933 User = dim
3934 ExecStart = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl run
3935 Restart = always
3936 StartLimitBurst = 0
3937 ExecReload = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl reload
3938
3939 [Install]
3940 WantedBy = multi-user.target
3941
3942 pg_autoctl enable
3943 pg_autoctl enable - Enable a feature on a formation
3944
3945 pg_autoctl enable secondary
3946 pg_autoctl enable secondary - Enable secondary nodes on a formation
3947
3948 Synopsis
3949 This feature makes the most sense when using the Enterprise Edition of
3950 pg_auto_failover, which is fully compatible with Citus formations. When
3951 secondary is enabled, the Citus worker creation policy is to assign a
3952 primary node and then a standby node for each group. When secondary is
3953 disabled, the Citus worker creation policy is to assign only the
3954 primary nodes.
3955
3956 usage: pg_autoctl enable secondary [ --pgdata --formation ]
3957
3958 --pgdata path to data directory
3959 --formation Formation to enable secondary on
3960
3961 Options
3962 --pgdata
3963 Location of the Postgres node being managed locally. Defaults to
3964 the environment variable PGDATA. Use --monitor to connect to a
3965 monitor from anywhere, rather than the monitor URI used by a lo‐
3966 cal Postgres node managed with pg_autoctl.
3967
3968 --formation
3969 Target formation where to enable secondary feature.
3970
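Examples
A minimal invocation sketch, assuming a formation named default; the
monitor log lines printed by the command are omitted here:

   $ pg_autoctl enable secondary --formation default --pgdata /data/pgsql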
3971 pg_autoctl enable maintenance
3972 pg_autoctl enable maintenance - Enable Postgres maintenance mode on
3973 this node
3974
3975 Synopsis
3976 A pg_auto_failover node can be put into a maintenance state. The
3977 Postgres node is then still registered to the monitor, and is known to
3978 be unreliable until maintenance is disabled. A node in the maintenance
3979 state is not a candidate for promotion.
3980
3981 Typical uses of the maintenance state include Operating System or
3982 Postgres reboots, e.g. when applying security upgrades.
3983
3984 usage: pg_autoctl enable maintenance [ --pgdata --allow-failover ]
3985
3986 --pgdata path to data directory
3987
3988 Options
3989 --pgdata
3990 Location of the Postgres node being managed locally. Defaults to
3991 the environment variable PGDATA. Use --monitor to connect to a
3992 monitor from anywhere, rather than the monitor URI used by a lo‐
3993 cal Postgres node managed with pg_autoctl.
3994
3995 --formation
3996 Target formation where to enable the maintenance feature.
3997
3998 Examples
3999 $ pg_autoctl show state
4000 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
4001 ------+-------+----------------+-----------+--------------+---------------------+--------------------
4002 node1 | 1 | localhost:5501 | 0/4000760 | read-write | primary | primary
4003 node2 | 2 | localhost:5502 | 0/4000760 | read-only | secondary | secondary
4004 node3 | 3 | localhost:5503 | 0/4000760 | read-only | secondary | secondary
4005
4006 $ pg_autoctl enable maintenance --pgdata node3
4007 12:06:12 47086 INFO Listening monitor notifications about state changes in formation "default" and group 0
4008 12:06:12 47086 INFO Following table displays times when notifications are received
4009 Time | Name | Node | Host:Port | Current State | Assigned State
4010 ---------+-------+-------+----------------+---------------------+--------------------
4011 12:06:12 | node1 | 1 | localhost:5501 | primary | join_primary
4012 12:06:12 | node3 | 3 | localhost:5503 | secondary | wait_maintenance
4013 12:06:12 | node3 | 3 | localhost:5503 | wait_maintenance | wait_maintenance
4014 12:06:12 | node1 | 1 | localhost:5501 | join_primary | join_primary
4015 12:06:12 | node3 | 3 | localhost:5503 | wait_maintenance | maintenance
4016 12:06:12 | node1 | 1 | localhost:5501 | join_primary | primary
4017 12:06:13 | node3 | 3 | localhost:5503 | maintenance | maintenance
4018
4019 $ pg_autoctl show state
4020 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
4021 ------+-------+----------------+-----------+--------------+---------------------+--------------------
4022 node1 | 1 | localhost:5501 | 0/4000810 | read-write | primary | primary
4023 node2 | 2 | localhost:5502 | 0/4000810 | read-only | secondary | secondary
4024 node3 | 3 | localhost:5503 | 0/4000810 | none | maintenance | maintenance
4025
4026 pg_autoctl enable ssl
4027 pg_autoctl enable ssl - Enable SSL configuration on this node
4028
4029 Synopsis
4030 It is possible to manage Postgres SSL settings with the pg_autoctl
4031 command, both at pg_autoctl create postgres time and then again at
4032 run-time, to update the SSL settings.
4033
4034 usage: pg_autoctl enable ssl [ --pgdata ] [ --json ]
4035
4036 --pgdata path to data directory
4037 --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
4038 --ssl-mode use that sslmode in connection strings
4039 --ssl-ca-file set the Postgres ssl_ca_file to that file path
4040 --ssl-crl-file set the Postgres ssl_crl_file to that file path
4041 --no-ssl don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
4042 --server-key set the Postgres ssl_key_file to that file path
4043 --server-cert set the Postgres ssl_cert_file to that file path
4044
4045 Options
4046 --pgdata
4047 Location of the Postgres node being managed locally. Defaults to
4048 the environment variable PGDATA. Use --monitor to connect to a
4049 monitor from anywhere, rather than the monitor URI used by a lo‐
4050 cal Postgres node managed with pg_autoctl.
4051
4052 --ssl-self-signed
4053 Generate SSL self-signed certificates to provide network encryp‐
4054 tion. This does not protect against man-in-the-middle kinds of
4055 attacks. See Security settings for pg_auto_failover for more
4056 about our SSL settings.
4057
4058 --ssl-mode
4059 SSL Mode used by pg_autoctl when connecting to other nodes, in‐
4060 cluding when connecting for streaming replication.
4061
4062 --ssl-ca-file
4063 Set the Postgres ssl_ca_file to that file path.
4064
4065 --ssl-crl-file
4066 Set the Postgres ssl_crl_file to that file path.
4067
4068 --no-ssl
4069 Don't enable network encryption. This is not recommended, prefer
4070 --ssl-self-signed.
4071
4072 --server-key
4073 Set the Postgres ssl_key_file to that file path.
4074
4075 --server-cert
4076 Set the Postgres ssl_cert_file to that file path.
4077
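Examples
A sketch of switching a running node to self-signed certificates at
run-time, using only options documented above; log output is omitted:

   $ pg_autoctl enable ssl --ssl-self-signed --ssl-mode require --pgdata /data/pgsql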
4078 pg_autoctl enable monitor
4079 pg_autoctl enable monitor - Enable a monitor for this node to be or‐
4080 chestrated from
4081
4082 Synopsis
4083 It is possible to disable the pg_auto_failover monitor and enable it
4084 again online in a running pg_autoctl Postgres node. The main use case
4085 for this operation is when the monitor node has to be replaced, either
4086 after a full crash of the previous monitor node, or for migrating to a
4087 new monitor node (hardware replacement, region or zone migration,
4088 etc).
4089
4090 usage: pg_autoctl enable monitor [ --pgdata --allow-failover ] postgres://autoctl_node@new.monitor.add.ress/pg_auto_failover
4091
4092 --pgdata path to data directory
4093
4094 Options
4095 --pgdata
4096 Location of the Postgres node being managed locally. Defaults to
4097 the environment variable PGDATA. Use --monitor to connect to a
4098 monitor from anywhere, rather than the monitor URI used by a lo‐
4099 cal Postgres node managed with pg_autoctl.
4100
4101 Examples
4102 $ pg_autoctl show state
4103 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
4104 ------+-------+----------------+-----------+--------------+---------------------+--------------------
4105 node1 | 1 | localhost:5501 | 0/4000760 | read-write | primary | primary
4106 node2 | 2 | localhost:5502 | 0/4000760 | read-only | secondary | secondary
4107
4108
4109 $ pg_autoctl enable monitor --pgdata node3 'postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=require'
4110 12:42:07 43834 INFO Registered node 3 (localhost:5503) with name "node3" in formation "default", group 0, state "wait_standby"
4111 12:42:07 43834 INFO Successfully registered to the monitor with nodeId 3
4112 12:42:08 43834 INFO Still waiting for the monitor to drive us to state "catchingup"
4113 12:42:08 43834 WARN Please make sure that the primary node is currently running `pg_autoctl run` and contacting the monitor.
4114
4115 $ pg_autoctl show state
4116 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
4117 ------+-------+----------------+-----------+--------------+---------------------+--------------------
4118 node1 | 1 | localhost:5501 | 0/4000810 | read-write | primary | primary
4119 node2 | 2 | localhost:5502 | 0/4000810 | read-only | secondary | secondary
4120 node3 | 3 | localhost:5503 | 0/4000810 | read-only | secondary | secondary
4121
4122 pg_autoctl disable
4123 pg_autoctl disable - Disable a feature on a formation
4124
4125 pg_autoctl disable secondary
4126 pg_autoctl disable secondary - Disable secondary nodes on a formation
4127
4128 Synopsis
4129 This feature makes the most sense when using the Enterprise Edition of
4130 pg_auto_failover, which is fully compatible with Citus formations. When
4131 secondary is enabled, the Citus worker creation policy is to assign a
4132 primary node and then a standby node for each group. When secondary is
4133 disabled, the Citus worker creation policy is to assign only the
4134 primary nodes.
4135
4136 usage: pg_autoctl disable secondary [ --pgdata --formation ]
4137
4138 --pgdata path to data directory
4139 --formation Formation to disable secondary on
4140
4141 Options
4142 --pgdata
4143 Location of the Postgres node being managed locally. Defaults to
4144 the environment variable PGDATA. Use --monitor to connect to a
4145 monitor from anywhere, rather than the monitor URI used by a lo‐
4146 cal Postgres node managed with pg_autoctl.
4147
4148 --formation
4149 Target formation where to disable secondary feature.
4150
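Examples
A minimal invocation sketch, mirroring pg_autoctl enable secondary and
assuming a formation named default; log output is omitted:

   $ pg_autoctl disable secondary --formation default --pgdata /data/pgsql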
4151 pg_autoctl disable maintenance
4152 pg_autoctl disable maintenance - Disable Postgres maintenance mode on
4153 this node
4154
4155 Synopsis
4156 A pg_auto_failover node can be put into a maintenance state. The
4157 Postgres node is then still registered to the monitor, and is known to
4158 be unreliable until maintenance is disabled. A node in the maintenance
4159 state is not a candidate for promotion.
4160
4161 Typical uses of the maintenance state include Operating System or
4162 Postgres reboots, e.g. when applying security upgrades.
4163
4164 usage: pg_autoctl disable maintenance [ --pgdata --allow-failover ]
4165
4166 --pgdata path to data directory
4167
4168 Options
4169 --pgdata
4170 Location of the Postgres node being managed locally. Defaults to
4171 the environment variable PGDATA. Use --monitor to connect to a
4172 monitor from anywhere, rather than the monitor URI used by a lo‐
4173 cal Postgres node managed with pg_autoctl.
4174
4175 --formation
4176 Target formation where to disable the maintenance feature.
4177
4178 Examples
4179 $ pg_autoctl show state
4180 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
4181 ------+-------+----------------+-----------+--------------+---------------------+--------------------
4182 node1 | 1 | localhost:5501 | 0/4000810 | read-write | primary | primary
4183 node2 | 2 | localhost:5502 | 0/4000810 | read-only | secondary | secondary
4184 node3 | 3 | localhost:5503 | 0/4000810 | none | maintenance | maintenance
4185
4186 $ pg_autoctl disable maintenance --pgdata node3
4187 12:06:37 47542 INFO Listening monitor notifications about state changes in formation "default" and group 0
4188 12:06:37 47542 INFO Following table displays times when notifications are received
4189 Time | Name | Node | Host:Port | Current State | Assigned State
4190 ---------+-------+-------+----------------+---------------------+--------------------
4191 12:06:37 | node3 | 3 | localhost:5503 | maintenance | catchingup
4192 12:06:37 | node3 | 3 | localhost:5503 | catchingup | catchingup
4193 12:06:37 | node3 | 3 | localhost:5503 | catchingup | secondary
4194 12:06:37 | node3 | 3 | localhost:5503 | secondary | secondary
4195
4196 $ pg_autoctl show state
4197 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
4198 ------+-------+----------------+-----------+--------------+---------------------+--------------------
4199 node1 | 1 | localhost:5501 | 0/4000848 | read-write | primary | primary
4200 node2 | 2 | localhost:5502 | 0/4000848 | read-only | secondary | secondary
4201 node3 | 3 | localhost:5503 | 0/4000000 | read-only | secondary | secondary
4202
4203 pg_autoctl disable ssl
4204 pg_autoctl disable ssl - Disable SSL configuration on this node
4205
4206 Synopsis
4207 It is possible to manage Postgres SSL settings with the pg_autoctl
4208 command, both at pg_autoctl create postgres time and then again at
4209 run-time, to update the SSL settings.
4210
4211 usage: pg_autoctl disable ssl [ --pgdata ] [ --json ]
4212
4213 --pgdata path to data directory
4214 --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
4215 --ssl-mode use that sslmode in connection strings
4216 --ssl-ca-file set the Postgres ssl_ca_file to that file path
4217 --ssl-crl-file set the Postgres ssl_crl_file to that file path
4218 --no-ssl don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
4219 --server-key set the Postgres ssl_key_file to that file path
4220 --server-cert set the Postgres ssl_cert_file to that file path
4221
4222 Options
4223 --pgdata
4224 Location of the Postgres node being managed locally. Defaults to
4225 the environment variable PGDATA. Use --monitor to connect to a
4226 monitor from anywhere, rather than the monitor URI used by a lo‐
4227 cal Postgres node managed with pg_autoctl.
4228
4229 --ssl-self-signed
4230 Generate SSL self-signed certificates to provide network encryp‐
4231 tion. This does not protect against man-in-the-middle kinds of
4232 attacks. See Security settings for pg_auto_failover for more
4233 about our SSL settings.
4234
4235 --ssl-mode
4236 SSL Mode used by pg_autoctl when connecting to other nodes, in‐
4237 cluding when connecting for streaming replication.
4238
4239 --ssl-ca-file
4240 Set the Postgres ssl_ca_file to that file path.
4241
4242 --ssl-crl-file
4243 Set the Postgres ssl_crl_file to that file path.
4244
4245 --no-ssl
4246 Don't enable network encryption. This is not recommended, pre‐
4247 fer --ssl-self-signed.
4248
4249 --server-key
4250 Set the Postgres ssl_key_file to that file path.
4251
4252 --server-cert
4253 Set the Postgres ssl_cert_file to that file path.
4254
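Examples
A sketch of turning network encryption off entirely with the --no-ssl
option listed above (not recommended); log output is omitted:

   $ pg_autoctl disable ssl --no-ssl --pgdata /data/pgsql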
4255 pg_autoctl disable monitor
4256 pg_autoctl disable monitor - Disable the monitor for this node
4257
4258 Synopsis
4259 It is possible to disable the pg_auto_failover monitor and enable it
4260 again online in a running pg_autoctl Postgres node. The main use case
4261 for this operation is when the monitor node has to be replaced, either
4262 after a full crash of the previous monitor node, or for migrating to a
4263 new monitor node (hardware replacement, region or zone migration,
4264 etc).
4265
4266 usage: pg_autoctl disable monitor [ --pgdata --force ]
4267
4268 --pgdata path to data directory
4269 --force force unregistering from the monitor
4270
4271 Options
4272 --pgdata
4273 Location of the Postgres node being managed locally. Defaults to
4274 the environment variable PGDATA. Use --monitor to connect to a
4275 monitor from anywhere, rather than the monitor URI used by a lo‐
4276 cal Postgres node managed with pg_autoctl.
4277
4278 --force
4279 The --force option covers the following two situations:
4280
4281 1. By default, the command expects to be able to connect to
4282 the current monitor. When the current known monitor in the
4283 setup is not running anymore, use --force to skip this
4284 step.
4285
4286 2. When pg_autoctl can connect to the monitor and the node
4287 is found there, this is normally an error that prevents
4288 disabling the monitor. Using --force allows the command
4289 to drop the node from the monitor and continue with
4290 disabling the monitor.
4291
4292 Examples
4293 $ pg_autoctl show state
4294 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
4295 ------+-------+----------------+-----------+--------------+---------------------+--------------------
4296 node1 | 1 | localhost:5501 | 0/4000148 | read-write | primary | primary
4297 node2 | 2 | localhost:5502 | 0/4000148 | read-only | secondary | secondary
4298 node3 | 3 | localhost:5503 | 0/4000148 | read-only | secondary | secondary
4299
4300
4301 $ pg_autoctl disable monitor --pgdata node3
4302 12:41:21 43039 INFO Found node 3 "node3" (localhost:5503) on the monitor
4303 12:41:21 43039 FATAL Use --force to remove the node from the monitor
4304
4305 $ pg_autoctl disable monitor --pgdata node3 --force
4306 12:41:32 43219 INFO Removing node 3 "node3" (localhost:5503) from monitor
4307
4308 $ pg_autoctl show state
4309 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
4310 ------+-------+----------------+-----------+--------------+---------------------+--------------------
4311 node1 | 1 | localhost:5501 | 0/4000760 | read-write | primary | primary
4312 node2 | 2 | localhost:5502 | 0/4000760 | read-only | secondary | secondary
4313
4314 pg_autoctl get
4315 pg_autoctl get - Get a pg_auto_failover node, or formation setting
4316
4317 pg_autoctl get formation settings
4318 pg_autoctl get formation settings - get replication settings for a for‐
4319 mation from the monitor
4320
4321 Synopsis
4322 This command prints the pg_autoctl replication settings:
4323
4324 usage: pg_autoctl get formation settings [ --pgdata ] [ --json ] [ --formation ]
4325
4326 --pgdata path to data directory
4327 --json output data in the JSON format
4328 --formation pg_auto_failover formation
4329
4330 Description
4331 See also pg_autoctl show settings which is a synonym.
4332
4333 Options
4334 --pgdata
4335 Location of the Postgres node being managed locally. Defaults to
4336 the environment variable PGDATA. Use --monitor to connect to a
4337 monitor from anywhere, rather than the monitor URI used by a lo‐
4338 cal Postgres node managed with pg_autoctl.
4339
4340 --json Output JSON formatted data.
4341
4342 --formation
4343 Show replication settings for given formation. Defaults to de‐
4344 fault.
4345
4346 Examples
4347 $ pg_autoctl get formation settings
4348 Context | Name | Setting | Value
4349 ----------+---------+---------------------------+-------------------------------------------------------------
4350 formation | default | number_sync_standbys | 1
4351 primary | node1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
4352 node | node1 | candidate priority | 50
4353 node | node2 | candidate priority | 50
4354 node | node3 | candidate priority | 50
4355 node | node1 | replication quorum | true
4356 node | node2 | replication quorum | true
4357 node | node3 | replication quorum | true
4358
4359 $ pg_autoctl get formation settings --json
4360 {
4361 "nodes": [
4362 {
4363 "value": "true",
4364 "context": "node",
4365 "node_id": 1,
4366 "setting": "replication quorum",
4367 "group_id": 0,
4368 "nodename": "node1"
4369 },
4370 {
4371 "value": "true",
4372 "context": "node",
4373 "node_id": 2,
4374 "setting": "replication quorum",
4375 "group_id": 0,
4376 "nodename": "node2"
4377 },
4378 {
4379 "value": "true",
4380 "context": "node",
4381 "node_id": 3,
4382 "setting": "replication quorum",
4383 "group_id": 0,
4384 "nodename": "node3"
4385 },
4386 {
4387 "value": "50",
4388 "context": "node",
4389 "node_id": 1,
4390 "setting": "candidate priority",
4391 "group_id": 0,
4392 "nodename": "node1"
4393 },
4394 {
4395 "value": "50",
4396 "context": "node",
4397 "node_id": 2,
4398 "setting": "candidate priority",
4399 "group_id": 0,
4400 "nodename": "node2"
4401 },
4402 {
4403 "value": "50",
4404 "context": "node",
4405 "node_id": 3,
4406 "setting": "candidate priority",
4407 "group_id": 0,
4408 "nodename": "node3"
4409 }
4410 ],
4411 "primary": [
4412 {
4413 "value": "'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'",
4414 "context": "primary",
4415 "node_id": 1,
4416 "setting": "synchronous_standby_names",
4417 "group_id": 0,
4418 "nodename": "node1"
4419 }
4420 ],
4421 "formation": {
4422 "value": "1",
4423 "context": "formation",
4424 "node_id": null,
4425 "setting": "number_sync_standbys",
4426 "group_id": null,
4427 "nodename": "default"
4428 }
4429 }
4430
4431 pg_autoctl get formation number-sync-standbys
4432 pg_autoctl get formation number-sync-standbys - get number_sync_stand‐
4433 bys for a formation from the monitor
4434
4435 Synopsis
4436 This command prints the pg_autoctl number_sync_standbys replication
4437 setting for a formation:
4438
4439 usage: pg_autoctl get formation number-sync-standbys [ --pgdata ] [ --json ] [ --formation ]
4440
4441 --pgdata path to data directory
4442 --json output data in the JSON format
4443 --formation pg_auto_failover formation
4444
4445 Description
4446 See also pg_autoctl show settings for the full list of replication set‐
4447 tings.
4448
4449 Options
4450 --pgdata
4451 Location of the Postgres node being managed locally. Defaults to
4452 the environment variable PGDATA. Use --monitor to connect to a
4453 monitor from anywhere, rather than the monitor URI used by a lo‐
4454 cal Postgres node managed with pg_autoctl.
4455
4456 --json Output JSON formatted data.
4457
4458 --formation
4459 Show replication settings for given formation. Defaults to de‐
4460 fault.
4461
4462 Examples
4463 $ pg_autoctl get formation number-sync-standbys
4464 1
4465
4466 $ pg_autoctl get formation number-sync-standbys --json
4467 {
4468 "number-sync-standbys": 1
4469 }
4470
4471 pg_autoctl get node replication-quorum
4472 pg_autoctl get node replication-quorum - get the replication-quorum
4473 property from the monitor
4474
4475 Synopsis
4476 This command prints the pg_autoctl replication quorum for a given node:
4477
4478 usage: pg_autoctl get node replication-quorum [ --pgdata ] [ --json ] [ --formation ] [ --name ]
4479
4480 --pgdata path to data directory
4481 --formation pg_auto_failover formation
4482 --name pg_auto_failover node name
4483 --json output data in the JSON format
4484
4485 Description
4486 See also pg_autoctl show settings for the full list of replication set‐
4487 tings.
4488
4489 Options
4490 --pgdata
4491 Location of the Postgres node being managed locally. Defaults to
4492 the environment variable PGDATA. Use --monitor to connect to a
4493 monitor from anywhere, rather than the monitor URI used by a lo‐
4494 cal Postgres node managed with pg_autoctl.
4495
4496 --json Output JSON formatted data.
4497
4498 --formation
4499 Show replication settings for given formation. Defaults to de‐
4500 fault.
4501
4502 --name Show replication settings for given node, selected by name.
4503
4504 Examples
4505 $ pg_autoctl get node replication-quorum --name node1
4506 true
4507
4508 $ pg_autoctl get node replication-quorum --name node1 --json
4509 {
4510 "name": "node1",
4511 "replication-quorum": true
4512 }
4513
4514 pg_autoctl get node candidate-priority
4515 pg_autoctl get node candidate-priority - get the candidate-priority
4516 property from the monitor
4517
4518 Synopsis
4519 This command prints the pg_autoctl candidate priority for a given node:
4520
4521 usage: pg_autoctl get node candidate-priority [ --pgdata ] [ --json ] [ --formation ] [ --name ]
4522
4523 --pgdata path to data directory
4524 --formation pg_auto_failover formation
4525 --name pg_auto_failover node name
4526 --json output data in the JSON format
4527
4528 Description
4529 See also pg_autoctl show settings for the full list of replication set‐
4530 tings.
4531
4532 Options
4533 --pgdata
4534 Location of the Postgres node being managed locally. Defaults to
4535 the environment variable PGDATA. Use --monitor to connect to a
4536 monitor from anywhere, rather than the monitor URI used by a lo‐
4537 cal Postgres node managed with pg_autoctl.
4538
4539 --json Output JSON formatted data.
4540
4541 --formation
4542 Show replication settings for given formation. Defaults to de‐
4543 fault.
4544
4545 --name Show replication settings for given node, selected by name.
4546
4547 Examples
4548 $ pg_autoctl get node candidate-priority --name node1
4549 50
4550
4551 $ pg_autoctl get node candidate-priority --name node1 --json
4552 {
4553 "name": "node1",
4554 "candidate-priority": 50
4555 }
4556
4557 pg_autoctl set
4558 pg_autoctl set - Set a pg_auto_failover node, or formation setting
4559
4560 pg_autoctl set formation number-sync-standbys
4561 pg_autoctl set formation number-sync-standbys - set number_sync_stand‐
4562 bys for a formation from the monitor
4563
4564 Synopsis
4565 This command sets the pg_autoctl number_sync_standbys replication
4566 setting for a formation:
4567
4568 usage: pg_autoctl set formation number-sync-standbys [ --pgdata ] [ --json ] [ --formation ] <number_sync_standbys>
4569
4570 --pgdata path to data directory
4571 --formation pg_auto_failover formation
4572 --json output data in the JSON format
4573
4574 Description
4575 The pg_auto_failover monitor ensures that at least N+1 candidate
4576 standby nodes are registered when number-sync-standbys is N. This means
4577 that to be able to run the following command, at least 3 standby nodes
4578 with a non-zero candidate priority must be registered to the monitor:
4579
4580 $ pg_autoctl set formation number-sync-standbys 2
4581
4582 See also pg_autoctl show settings for the full list of replication set‐
4583 tings.
4584
4585 Options
4586 --pgdata
4587 Location of the Postgres node being managed locally. Defaults to
4588 the environment variable PGDATA. Use --monitor to connect to a
4589 monitor from anywhere, rather than the monitor URI used by a lo‐
4590 cal Postgres node managed with pg_autoctl.
4591
4592 --json Output JSON formatted data.
4593
4594 --formation
4595 Target formation where to set number_sync_standbys. Defaults
4596 to default.
4597
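Examples
A minimal sketch, assuming enough candidate standby nodes are
registered; the monitor notification lines printed by the set command
are omitted, and the new value is read back with the matching get
command:

   $ pg_autoctl set formation number-sync-standbys 1
   $ pg_autoctl get formation number-sync-standbys
   1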
4598 pg_autoctl set node replication-quorum
4599 pg_autoctl set node replication-quorum - set the replication-quorum
4600 property on the monitor
4601
4602 Synopsis
4603 This command sets the pg_autoctl replication quorum for a given node:
4604
4605 usage: pg_autoctl set node replication-quorum [ --pgdata ] [ --json ] [ --formation ] [ --name ] <true|false>
4606
4607 --pgdata path to data directory
4608 --formation pg_auto_failover formation
4609 --name pg_auto_failover node name
4610 --json output data in the JSON format
4611
4612 Description
4613 See also pg_autoctl show settings for the full list of replication set‐
4614 tings.
4615
4616 Options
4617 --pgdata
4618 Location of the Postgres node being managed locally. Defaults to
4619 the environment variable PGDATA. Use --monitor to connect to a
4620 monitor from anywhere, rather than the monitor URI used by a lo‐
4621 cal Postgres node managed with pg_autoctl.
4622
4623 --json Output JSON formatted data.
4624
4625 --formation
4626 Target formation for the setting change. Defaults to
4627 default.
4628
4629 --name Apply the setting to the given node, selected by name.
4630
4631 Examples
4632 $ pg_autoctl set node replication-quorum --name node1 false
4633 12:49:37 94092 INFO Waiting for the settings to have been applied to the monitor and primary node
4634 12:49:37 94092 INFO New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
4635 12:49:37 94092 INFO Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
4636 12:49:37 94092 INFO New state is reported by node 1 "node1" (localhost:5501): "primary"
4637 false
4638
4639 $ pg_autoctl set node replication-quorum --name node1 true --json
4640 12:49:42 94199 INFO Waiting for the settings to have been applied to the monitor and primary node
4641 12:49:42 94199 INFO New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
4642 12:49:42 94199 INFO Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
4643 12:49:43 94199 INFO New state is reported by node 1 "node1" (localhost:5501): "primary"
4644 {
4645 "replication-quorum": true
4646 }
4647
4648 pg_autoctl set node candidate-priority
4649 pg_autoctl set node candidate-priority - set the candidate-priority
4650 property on the monitor
4651
4652 Synopsis
4653 This command sets the pg_autoctl candidate priority for a given node:
4654
4655 usage: pg_autoctl set node candidate-priority [ --pgdata ] [ --json ] [ --formation ] [ --name ] <priority: 0..100>
4656
4657 --pgdata path to data directory
4658 --formation pg_auto_failover formation
4659 --name pg_auto_failover node name
4660 --json output data in the JSON format
4661
4662 Description
4663 See also pg_autoctl show settings for the full list of replication set‐
4664 tings.
4665
4666 Options
4667 --pgdata
4668 Location of the Postgres node being managed locally. Defaults to
4669 the environment variable PGDATA. Use --monitor to connect to a
4670 monitor from anywhere, rather than the monitor URI used by a lo‐
4671 cal Postgres node managed with pg_autoctl.
4672
4673 --json Output JSON formatted data.
4674
4675 --formation
4676 Target formation for the setting change. Defaults to
4677 default.
4678
4679 --name Apply the setting to the given node, selected by name.
4680
4681 Examples
4682 $ pg_autoctl set node candidate-priority --name node1 65
4683 12:47:59 92326 INFO Waiting for the settings to have been applied to the monitor and primary node
4684 12:47:59 92326 INFO New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
4685 12:47:59 92326 INFO Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
4686 12:47:59 92326 INFO New state is reported by node 1 "node1" (localhost:5501): "primary"
4687 65
4688
4689 $ pg_autoctl set node candidate-priority --name node1 50 --json
4690 12:48:05 92450 INFO Waiting for the settings to have been applied to the monitor and primary node
4691 12:48:05 92450 INFO New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
4692 12:48:05 92450 INFO Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
4693 12:48:05 92450 INFO New state is reported by node 1 "node1" (localhost:5501): "primary"
4694 {
4695 "candidate-priority": 50
4696 }
4697
4698 pg_autoctl perform
4699 pg_autoctl perform - Perform an action orchestrated by the monitor
4700
4701 pg_autoctl perform failover
4702 pg_autoctl perform failover - Perform a failover for given formation
4703 and group
4704
4705 Synopsis
4706 This command starts a Postgres failover orchestration from the
4707 pg_auto_failover monitor:
4708
4709 usage: pg_autoctl perform failover [ --pgdata --formation --group ]
4710
4711 --pgdata path to data directory
4712 --formation formation to target, defaults to 'default'
4713 --group group to target, defaults to 0
4714 --wait how many seconds to wait, default to 60
4715
4716 Description
4717 The pg_auto_failover monitor can be used to orchestrate a manual
4718 failover, sometimes also known as a switchover. When doing so,
4719 split-brain situations are prevented thanks to intermediary states
4720 being used in the Finite State Machine.
4721
4722 The pg_autoctl perform failover command waits until the failover is
4723 known complete on the monitor, or until the hard-coded 60s timeout has
4724 passed.
4725
4726 The failover orchestration is done in the background by the monitor, so
4727 even if the pg_autoctl perform failover stops on the timeout, the
4728 failover orchestration continues at the monitor.
4729
4730 Options
4731 --pgdata
4732 Location of the Postgres node being managed locally. Defaults to
4733 the environment variable PGDATA. Use --monitor to connect to a
4734 monitor from anywhere, rather than the monitor URI used by a lo‐
4735 cal Postgres node managed with pg_autoctl.
4736
4737 --formation
4738 Formation to target for the operation. Defaults to default.
4739
4740 --group
4741 Postgres group to target for the operation. Defaults to 0;
4742 only Citus formations may have more than one group.
4743
4744 --wait How many seconds to wait for notifications about the promotion.
4745 The command stops when the promotion is finished (a node is pri‐
4746 mary), or when the timeout has elapsed, whichever comes first.
4747 The value 0 (zero) disables the timeout and allows the command
4748 to wait forever.
4749
4750 Examples
4751 $ pg_autoctl perform failover
4752 12:57:30 3635 INFO Listening monitor notifications about state changes in formation "default" and group 0
4753 12:57:30 3635 INFO Following table displays times when notifications are received
4754 Time | Name | Node | Host:Port | Current State | Assigned State
4755 ---------+-------+-------+----------------+---------------------+--------------------
4756 12:57:30 | node1 | 1 | localhost:5501 | primary | draining
4757 12:57:30 | node1 | 1 | localhost:5501 | draining | draining
4758 12:57:30 | node2 | 2 | localhost:5502 | secondary | report_lsn
4759 12:57:30 | node3 | 3 | localhost:5503 | secondary | report_lsn
4760 12:57:36 | node3 | 3 | localhost:5503 | report_lsn | report_lsn
4761 12:57:36 | node2 | 2 | localhost:5502 | report_lsn | report_lsn
4762 12:57:36 | node2 | 2 | localhost:5502 | report_lsn | prepare_promotion
4763 12:57:36 | node2 | 2 | localhost:5502 | prepare_promotion | prepare_promotion
4764 12:57:36 | node2 | 2 | localhost:5502 | prepare_promotion | stop_replication
4765 12:57:36 | node1 | 1 | localhost:5501 | draining | demote_timeout
4766 12:57:36 | node3 | 3 | localhost:5503 | report_lsn | join_secondary
4767 12:57:36 | node1 | 1 | localhost:5501 | demote_timeout | demote_timeout
4768 12:57:36 | node3 | 3 | localhost:5503 | join_secondary | join_secondary
4769 12:57:37 | node2 | 2 | localhost:5502 | stop_replication | stop_replication
4770 12:57:37 | node2 | 2 | localhost:5502 | stop_replication | wait_primary
4771 12:57:37 | node1 | 1 | localhost:5501 | demote_timeout | demoted
4772 12:57:37 | node1 | 1 | localhost:5501 | demoted | demoted
4773 12:57:37 | node2 | 2 | localhost:5502 | wait_primary | wait_primary
4774 12:57:37 | node3 | 3 | localhost:5503 | join_secondary | secondary
4775 12:57:37 | node1 | 1 | localhost:5501 | demoted | catchingup
4776 12:57:38 | node3 | 3 | localhost:5503 | secondary | secondary
4777 12:57:38 | node2 | 2 | localhost:5502 | wait_primary | primary
4778 12:57:38 | node1 | 1 | localhost:5501 | catchingup | catchingup
4779 12:57:38 | node2 | 2 | localhost:5502 | primary | primary
4780
4781 $ pg_autoctl show state
4782 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
4783 ------+-------+----------------+-----------+--------------+---------------------+--------------------
4784 node1 | 1 | localhost:5501 | 0/4000F50 | read-only | secondary | secondary
4785 node2 | 2 | localhost:5502 | 0/4000F50 | read-write | primary | primary
4786 node3 | 3 | localhost:5503 | 0/4000F50 | read-only | secondary | secondary
4787
4788 pg_autoctl perform switchover
4789 pg_autoctl perform switchover - Perform a switchover for given forma‐
4790 tion and group
4791
4792 Synopsis
4793 This command starts a Postgres switchover orchestration from the
4794 pg_auto_failover monitor:
4795
4796 usage: pg_autoctl perform switchover [ --pgdata --formation --group ]
4797
4798 --pgdata path to data directory
4799 --formation formation to target, defaults to 'default'
4800 --group group to target, defaults to 0
4801
4802 Description
4803 The pg_auto_failover monitor can be used to orchestrate a manual
4804 switchover, sometimes also known as a controlled failover. When doing
4805 so, split-brain situations are prevented thanks to intermediary states
4806 being used in the Finite State Machine.
4807
4808 The pg_autoctl perform switchover command waits until the switchover is
4809 known complete on the monitor, or until the hard-coded 60s timeout has
4810 passed.
4811
4812 The switchover orchestration is done in the background by the monitor,
4813 so even if the pg_autoctl perform switchover stops on the timeout, the
4814 switchover orchestration continues at the monitor.
4815
4816 See also pg_autoctl perform failover, a synonym for this command.
4817
4818 Options
4819 --pgdata
4820 Location of the Postgres node being managed locally. Defaults to
4821 the environment variable PGDATA. Use --monitor to connect to a
4822 monitor from anywhere, rather than the monitor URI used by a lo‐
4823 cal Postgres node managed with pg_autoctl.
4824
4825 --formation
4826 Formation to target for the operation. Defaults to default.
4827
4828 --group
4829 Postgres group to target for the operation. Defaults to 0;
4830 only Citus formations may have more than one group.
4831
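Examples
A sketch of a switchover on the default formation; the notification
output while the operation makes progress is similar to the pg_autoctl
perform failover example above:

   $ pg_autoctl perform switchover --formation default --group 0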
4832 pg_autoctl perform promotion
4833 pg_autoctl perform promotion - Perform a failover that promotes a tar‐
4834 get node
4835
4836 Synopsis
4837 This command starts a Postgres failover orchestration from the
4838 pg_auto_failover monitor and targets the given node:
4839
4840 usage: pg_autoctl perform promotion [ --pgdata --formation --name ]
4841
4842 --pgdata path to data directory
4843 --formation formation to target, defaults to 'default'
4844 --name node name to target, defaults to current node
4845 --wait how many seconds to wait, default to 60
4846
4847 Description
4848 The pg_auto_promotion monitor can be used to orchestrate a manual pro‐
4849 motion, sometimes also known as a switchover. When doing so,
4850 split-brain are prevented thanks to intermediary states being used in
4851 the Finite State Machine.
4852
4853 The pg_autoctl perform promotion command waits until the promotion is
4854 known complete on the monitor, or until the hard-coded 60s timeout has
4855 passed.
4856
4857 The promotion orchestration is done in the background by the monitor,
4858 so even if the pg_autoctl perform promotion stops on the timeout, the
4859 promotion orchestration continues at the monitor.
4860
4861 Options
4862 --pgdata
4863 Location of the Postgres node being managed locally. Defaults to
4864 the environment variable PGDATA. Use --monitor to connect to a
4865 monitor from anywhere, rather than the monitor URI used by a lo‐
4866 cal Postgres node managed with pg_autoctl.
4867
4868 --formation
4869 Formation to target for the operation. Defaults to default.
4870
4871 --name Name of the node that should be elected as the new primary node.
4872
4873 --wait How many seconds to wait for notifications about the promotion.
4874 The command stops when the promotion is finished (a node is pri‐
4875 mary), or when the timeout has elapsed, whichever comes first.
4876 The value 0 (zero) disables the timeout and allows the command
4877 to wait forever.
4878
4879 Examples
4880 $ pg_autoctl show state
4881 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
4882 ------+-------+----------------+-----------+--------------+---------------------+--------------------
4883 node1 | 1 | localhost:5501 | 0/4000F88 | read-only | secondary | secondary
4884 node2 | 2 | localhost:5502 | 0/4000F88 | read-write | primary | primary
4885 node3 | 3 | localhost:5503 | 0/4000F88 | read-only | secondary | secondary
4886
4887
4888 $ pg_autoctl perform promotion --name node1
4889 13:08:13 15297 INFO Listening monitor notifications about state changes in formation "default" and group 0
4890 13:08:13 15297 INFO Following table displays times when notifications are received
4891 Time | Name | Node | Host:Port | Current State | Assigned State
4892 ---------+-------+-------+----------------+---------------------+--------------------
4893 13:08:13 | node1 | 0/1 | localhost:5501 | secondary | secondary
4894 13:08:13 | node2 | 0/2 | localhost:5502 | primary | draining
4895 13:08:13 | node2 | 0/2 | localhost:5502 | draining | draining
4896 13:08:13 | node1 | 0/1 | localhost:5501 | secondary | report_lsn
4897 13:08:13 | node3 | 0/3 | localhost:5503 | secondary | report_lsn
4898 13:08:19 | node3 | 0/3 | localhost:5503 | report_lsn | report_lsn
4899 13:08:19 | node1 | 0/1 | localhost:5501 | report_lsn | report_lsn
4900 13:08:19 | node1 | 0/1 | localhost:5501 | report_lsn | prepare_promotion
4901 13:08:19 | node1 | 0/1 | localhost:5501 | prepare_promotion | prepare_promotion
4902 13:08:19 | node1 | 0/1 | localhost:5501 | prepare_promotion | stop_replication
4903 13:08:19 | node2 | 0/2 | localhost:5502 | draining | demote_timeout
4904 13:08:19 | node3 | 0/3 | localhost:5503 | report_lsn | join_secondary
4905 13:08:19 | node2 | 0/2 | localhost:5502 | demote_timeout | demote_timeout
4906 13:08:19 | node3 | 0/3 | localhost:5503 | join_secondary | join_secondary
4907 13:08:20 | node1 | 0/1 | localhost:5501 | stop_replication | stop_replication
4908 13:08:20 | node1 | 0/1 | localhost:5501 | stop_replication | wait_primary
4909 13:08:20 | node2 | 0/2 | localhost:5502 | demote_timeout | demoted
4910 13:08:20 | node1 | 0/1 | localhost:5501 | wait_primary | wait_primary
4911 13:08:20 | node3 | 0/3 | localhost:5503 | join_secondary | secondary
4912 13:08:20 | node2 | 0/2 | localhost:5502 | demoted | demoted
4913 13:08:20 | node2 | 0/2 | localhost:5502 | demoted | catchingup
4914 13:08:21 | node3 | 0/3 | localhost:5503 | secondary | secondary
4915 13:08:21 | node1 | 0/1 | localhost:5501 | wait_primary | primary
4916 13:08:21 | node2 | 0/2 | localhost:5502 | catchingup | catchingup
4917 13:08:21 | node1 | 0/1 | localhost:5501 | primary | primary
4918
4919 $ pg_autoctl show state
4920 Name | Node | Host:Port | LSN | Connection | Current State | Assigned State
4921 ------+-------+----------------+-----------+--------------+---------------------+--------------------
4922 node1 | 1 | localhost:5501 | 0/40012F0 | read-write | primary | primary
4923 node2 | 2 | localhost:5502 | 0/40012F0 | read-only | secondary | secondary
4924 node3 | 3 | localhost:5503 | 0/40012F0 | read-only | secondary | secondary
4925
4926 pg_autoctl do
4927 pg_autoctl do - Internal commands and internal QA tooling
4928
4929 The debug commands for pg_autoctl are only available when the environ‐
4930 ment variable PG_AUTOCTL_DEBUG is set (to any value).
4931
4932 When testing pg_auto_failover, it is helpful to be able to play with
4933 the local nodes using the same lower-level API as used by the
4934 pg_auto_failover Finite State Machine transitions. Some commands could
4935 be useful in contexts other than pg_auto_failover development and QA
4936 work, so some documentation has been made available.
4937
4938 pg_autoctl do tmux
4939 pg_autoctl do tmux - Set of facilities to handle tmux interactive ses‐
4940 sions
4941
4942 Synopsis
4943 pg_autoctl do tmux provides the following commands:
4944
4945 pg_autoctl do tmux
4946 script Produce a tmux script for a demo or a test case (debug only)
4947 session Run a tmux session for a demo or a test case
4948 stop Stop pg_autoctl processes that belong to a tmux session
4949 wait Wait until a given node has been registered on the monitor
4950 clean Clean-up a tmux session processes and root dir
4951
4952 Description
4953 An easy way to get started with pg_auto_failover in a localhost only
4954 formation with four nodes is to run the following command:
4955
4956 $ PG_AUTOCTL_DEBUG=1 pg_autoctl do tmux session \
4957 --root /tmp/pgaf \
4958 --first-pgport 9000 \
4959 --nodes 4 \
4960 --layout tiled
4961
4962 This requires the command tmux to be available in your PATH. The
4963 pg_autoctl do tmux session command prepares a self-contained root
4964 directory in which to create pg_auto_failover nodes and their
4965 configuration, then prepares a tmux script, and then runs the script
4966 with a command such as:
4967
4968 /usr/local/bin/tmux -v start-server ; source-file /tmp/pgaf/script-9000.tmux
4969
4970 The tmux session contains a single tmux window with multiple panes:
4971
4972 • one pane for the monitor
4973
4974 • one pane per Postgres node, here 4 of them
4975
4976 • one pane for running watch pg_autoctl show state
4977
4978 • one extra pane for an interactive shell.
4979
4980 Usually the first two commands to run in the interactive shell, once
4981 the formation is stable (one node is primary, the other ones are all
4982 secondary), are the following:
4983
4984 $ pg_autoctl get formation settings
4985 $ pg_autoctl perform failover
4986
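When done with the demo or test, the session can be stopped and its
root directory cleaned up. The following sketch assumes that the stop
and clean sub-commands accept the same --root option as the session
command:

   $ PG_AUTOCTL_DEBUG=1 pg_autoctl do tmux stop --root /tmp/pgaf
   $ PG_AUTOCTL_DEBUG=1 pg_autoctl do tmux clean --root /tmp/pgaf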
4987 pg_autoctl do demo
4988 pg_autoctl do demo - Use a demo application for pg_auto_failover
4989
4990 Synopsis
4991 pg_autoctl do demo provides the following commands:
4992
4993 pg_autoctl do demo
4994 run Run the pg_auto_failover demo application
4995 uri Grab the application connection string from the monitor
4996 ping Attempt to connect to the application URI
4997 summary Display a summary of the previous demo app run
4998
4999 To run a demo, use pg_autoctl do demo run:
5000
5001 usage: pg_autoctl do demo run [option ...]
5002
5003 --monitor Postgres URI of the pg_auto_failover monitor
5004 --formation Formation to use (default)
5005 --group Group Id to failover (0)
5006 --username PostgreSQL's username
5007 --clients How many client processes to use (1)
5008 --duration Duration of the demo app, in seconds (30)
5009 --first-failover Timing of the first failover (10)
5010 --failover-freq Seconds between subsequent failovers (45)
5011
5012 Description
5013 The pg_autoctl debug tooling includes a demo application.
5014
5015 The demo prepares its Postgres schema on the target database, and then
5016 starts several clients (see --clients) that concurrently connect to the
5017 target application URI and record the time it takes to establish the
5018 Postgres connection to the current read-write node, along with
5019 information about the retry policy metrics.
5020
5021 Example
5022 $ pg_autoctl do demo run --monitor 'postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer' --clients 10
5023 14:43:35 19660 INFO Using application connection string "postgres://localhost:5502,localhost:5503,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer"
5024 14:43:35 19660 INFO Using Postgres user PGUSER "dim"
5025 14:43:35 19660 INFO Preparing demo schema: drop schema if exists demo cascade
5026 14:43:35 19660 WARN NOTICE: schema "demo" does not exist, skipping
5027 14:43:35 19660 INFO Preparing demo schema: create schema demo
5028 14:43:35 19660 INFO Preparing demo schema: create table demo.tracking(ts timestamptz default now(), client integer, loop integer, retries integer, us bigint, recovery bool)
5029 14:43:36 19660 INFO Preparing demo schema: create table demo.client(client integer, pid integer, retry_sleep_ms integer, retry_cap_ms integer, failover_count integer)
5030 14:43:36 19660 INFO Starting 10 concurrent clients as sub-processes
5031 14:43:36 19675 INFO Failover client is started, will failover in 10s and every 45s after that
5032 ...
5033
5034 $ pg_autoctl do demo summary --monitor 'postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer' --clients 10
5035 14:44:27 22789 INFO Using application connection string "postgres://localhost:5503,localhost:5501,localhost:5502/demo?target_session_attrs=read-write&sslmode=prefer"
5036 14:44:27 22789 INFO Using Postgres user PGUSER "dim"
5037 14:44:27 22789 INFO Summary for the demo app running with 10 clients for 30s
5038 Client | Connections | Retries | Min Connect Time (ms) | max | p95 | p99
5039 ----------------------+-------------+---------+-----------------------+----------+---------+---------
5040 Client 1 | 136 | 14 | 58.318 | 2601.165 | 244.443 | 261.809
5041 Client 2 | 136 | 5 | 55.199 | 2514.968 | 242.362 | 259.282
5042 Client 3 | 134 | 6 | 55.815 | 2974.247 | 241.740 | 262.908
5043 Client 4 | 135 | 7 | 56.542 | 2970.922 | 238.995 | 251.177
5044 Client 5 | 136 | 8 | 58.339 | 2758.106 | 238.720 | 252.439
5045 Client 6 | 134 | 9 | 58.679 | 2813.653 | 244.696 | 254.674
5046 Client 7 | 134 | 11 | 58.737 | 2795.974 | 243.202 | 253.745
5047 Client 8 | 136 | 12 | 52.109 | 2354.952 | 242.664 | 254.233
5048 Client 9 | 137 | 19 | 59.735 | 2628.496 | 235.668 | 253.582
5049 Client 10 | 133 | 6 | 57.994 | 3060.489 | 242.156 | 256.085
5050 All Clients Combined | 1351 | 97 | 52.109 | 3060.489 | 241.848 | 258.450
5051 (11 rows)
5052
5053 Min Connect Time (ms) | max | freq | bar
5054 -----------------------+----------+------+-----------------------------------------------
5055 52.109 | 219.105 | 1093 | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
5056 219.515 | 267.168 | 248 | ▒▒▒▒▒▒▒▒▒▒
5057 2354.952 | 2354.952 | 1 |
5058 2514.968 | 2514.968 | 1 |
5059 2601.165 | 2628.496 | 2 |
5060 2758.106 | 2813.653 | 3 |
5061 2970.922 | 2974.247 | 2 |
5062 3060.489 | 3060.489 | 1 |
5063 (8 rows)
5064
5065 pg_autoctl do service restart
5066 pg_autoctl do service restart - Run pg_autoctl sub-processes (services)
5067
5068 Synopsis
5069 pg_autoctl do service restart provides the following commands:
5070
5071 pg_autoctl do service restart
5072 postgres Restart the pg_autoctl postgres controller service
5073 listener Restart the pg_autoctl monitor listener service
5074 node-active Restart the pg_autoctl keeper node-active service
5075
5076 Description
5077 It is possible to restart the pg_autoctl or the Postgres service with‐
5078 out affecting the other running service. Typically, to restart the
5079 pg_autoctl parts without impacting Postgres:
5080
5081 $ pg_autoctl do service restart node-active --pgdata node1
5082 14:52:06 31223 INFO Sending the TERM signal to service "node-active" with pid 26626
5083 14:52:06 31223 INFO Service "node-active" has been restarted with pid 31230
5084 31230
5085
5086 The Postgres service has not been impacted by the restart of the pg_au‐
5087 toctl process.
5088
5089 pg_autoctl do show
5090 pg_autoctl do show - Show some debug level information
5091
5092 Synopsis
5093 The commands pg_autoctl create monitor and pg_autoctl create postgres
5094 both implement some level of automated detection of the node network
5095 settings when the option --hostname is not used.
5096
5097 In addition, when a new node is registered to the monitor,
5098 other nodes also edit their Postgres HBA rules to allow the new node to
5099 connect, unless the option --skip-pg-hba has been used.
5100
5101 The debug sub-commands for pg_autoctl do show can be used to see in
5102 detail the network discovery done by pg_autoctl.
5103
5104 pg_autoctl do show provides the following commands:
5105
5106 pg_autoctl do show
5107 ipaddr Print this node's IP address information
5108 cidr Print this node's CIDR information
5109 lookup Print this node's DNS lookup information
5110 hostname Print this node's default hostname
5111 reverse Lookup given hostname and check reverse DNS setup
5112
5113 pg_autoctl do show ipaddr
5114 Connects to an external IP address and uses getsockname(2) to retrieve
5115 the current address to which the socket is bound.
5116
5117 The external IP address defaults to 8.8.8.8, the IP address of a Google
5118 provided public DNS server, or to the monitor IP address or hostname in
5119 the context of pg_autoctl create postgres.
5120
5121 $ pg_autoctl do show ipaddr
5122 16:42:40 62631 INFO ipaddr.c:107: Connecting to 8.8.8.8 (port 53)
5123 192.168.1.156
5124
5125 pg_autoctl do show cidr
5126 Connects to an external IP address in the same way as the previous com‐
5127 mand pg_autoctl do show ipaddr and then matches the local socket name
5128 with the list of local network interfaces. When a match is found, it
5129 uses the netmask of the interface to compute the CIDR notation from
5130 the IP address.
5131
5132 The computed CIDR notation is then used in HBA rules.
5133
5134 $ pg_autoctl do show cidr
5135 16:43:19 63319 INFO Connecting to 8.8.8.8 (port 53)
5136 192.168.1.0/24
5137
5138 pg_autoctl do show hostname
5139 Uses either its first (and only) argument or the result of
5140 gethostname(2) as the candidate hostname to use in HBA rules, and then
5141 checks that the hostname resolves to an IP address that belongs to one
5142 of the machine's network interfaces.
5143
5144 When the hostname forward-dns lookup resolves to an IP address that is
5145 local to the node where the command is run, then a reverse-lookup from
5146 the IP address is made to see if it matches with the candidate host‐
5147 name.
5148
5149 $ pg_autoctl do show hostname
5150 DESKTOP-IC01GOOS.europe.corp.microsoft.com
5151
5152 $ pg_autoctl -vv do show hostname 'postgres://autoctl_node@localhost:5500/pg_auto_failover'
5153 13:45:00 93122 INFO cli_do_show.c:256: Using monitor hostname "localhost" and port 5500
5154 13:45:00 93122 INFO ipaddr.c:107: Connecting to ::1 (port 5500)
5155 13:45:00 93122 DEBUG cli_do_show.c:272: cli_show_hostname: ip ::1
5156 13:45:00 93122 DEBUG cli_do_show.c:283: cli_show_hostname: host localhost
5157 13:45:00 93122 DEBUG cli_do_show.c:294: cli_show_hostname: ip ::1
5158 localhost
5159
5160 pg_autoctl do show lookup
5161 Checks that the given argument is a hostname that resolves to a local
5162 IP address, that is, an IP address associated with a local network
5163 interface.
5164
5165 $ pg_autoctl do show lookup DESKTOP-IC01GOOS.europe.corp.microsoft.com
5166 DESKTOP-IC01GOOS.europe.corp.microsoft.com: 192.168.1.156
5167
5168 pg_autoctl do show reverse
5169 Implements the same DNS checks as the Postgres HBA matching code: first
5170 does a forward DNS lookup of the given hostname, and then a re‐
5171 verse-lookup from all the IP addresses obtained. Success is reached
5172 when at least one of the IP addresses from the forward lookup resolves
5173 back to the given hostname (as the first answer to the reverse DNS
5174 lookup).
5175
5176 $ pg_autoctl do show reverse DESKTOP-IC01GOOS.europe.corp.microsoft.com
5177 16:44:49 64910 FATAL Failed to find an IP address for hostname "DESKTOP-IC01GOOS.europe.corp.microsoft.com" that matches hostname again in a reverse-DNS lookup.
5178 16:44:49 64910 INFO Continuing with IP address "192.168.1.156"
5179
5180 $ pg_autoctl -vv do show reverse DESKTOP-IC01GOOS.europe.corp.microsoft.com
5181 16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 192.168.1.156
5182 16:44:45 64832 DEBUG ipaddr.c:733: reverse lookup for "192.168.1.156" gives "desktop-ic01goos.europe.corp.microsoft.com" first
5183 16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 192.168.1.156
5184 16:44:45 64832 DEBUG ipaddr.c:733: reverse lookup for "192.168.1.156" gives "desktop-ic01goos.europe.corp.microsoft.com" first
5185 16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 2a01:110:10:40c::2ad
5186 16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
5187 16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 2a01:110:10:40c::2ad
5188 16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
5189 16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 100.64.34.213
5190 16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
5191 16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 100.64.34.213
5192 16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
5193 16:44:45 64832 FATAL cli_do_show.c:333: Failed to find an IP address for hostname "DESKTOP-IC01GOOS.europe.corp.microsoft.com" that matches hostname again in a reverse-DNS lookup.
5194 16:44:45 64832 INFO cli_do_show.c:334: Continuing with IP address "192.168.1.156"
5195
5196 pg_autoctl do pgsetup
5197 pg_autoctl do pgsetup - Manage a local Postgres setup
5198
5199 Synopsis
5200       The main pg_autoctl commands implement low-level management tooling for
5201       a local Postgres instance. Some of these low-level Postgres commands can
5202       be used as standalone tools in some cases.
5203
5204 pg_autoctl do pgsetup provides the following commands:
5205
5206 pg_autoctl do pgsetup
5207 pg_ctl Find a non-ambiguous pg_ctl program and Postgres version
5208 discover Discover local PostgreSQL instance, if any
5209        ready      Return true if the local Postgres server is ready
5210 wait Wait until the local Postgres server is ready
5211 logs Outputs the Postgres startup logs
5212 tune Compute and log some Postgres tuning options
5213
5214 pg_autoctl do pgsetup pg_ctl
5215       In a similar way to which -a, this command scans your PATH for pg_ctl
5216       commands. Then it runs the pg_ctl --version command and parses the
5217       output to determine the version of Postgres that is available in the path.
5218
5219 $ pg_autoctl do pgsetup pg_ctl --pgdata node1
5220 16:49:18 69684 INFO Environment variable PG_CONFIG is set to "/Applications/Postgres.app//Contents/Versions/12/bin/pg_config"
5221 16:49:18 69684 INFO `pg_autoctl create postgres` would use "/Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl" for Postgres 12.3
5222 16:49:18 69684 INFO `pg_autoctl create monitor` would use "/Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl" for Postgres 12.3
5223
5224 pg_autoctl do pgsetup discover
5225 Given a PGDATA or --pgdata option, the command discovers if a running
5226 Postgres service matches the pg_autoctl setup, and prints the informa‐
5227 tion that pg_autoctl typically needs when managing a Postgres instance.
5228
5229 $ pg_autoctl do pgsetup discover --pgdata node1
5230 pgdata: /Users/dim/dev/MS/pg_auto_failover/tmux/node1
5231 pg_ctl: /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl
5232 pg_version: 12.3
5233 pghost: /tmp
5234 pgport: 5501
5235 proxyport: 0
5236 pid: 21029
5237 is in recovery: no
5238 Control Version: 1201
5239 Catalog Version: 201909212
5240 System Identifier: 6942422768095393833
5241 Latest checkpoint LSN: 0/4059C18
5242 Postmaster status: ready
5243
5244 pg_autoctl do pgsetup ready
5245       Similar to the pg_isready command, though it uses the Postgres
5246       specifications found in the pg_autoctl node setup.
5247
5248 $ pg_autoctl do pgsetup ready --pgdata node1
5249 16:50:08 70582 INFO Postgres status is: "ready"
5250
5251 pg_autoctl do pgsetup wait
5252       When pg_autoctl do pgsetup ready would return false because Postgres is
5253       not ready yet, this command continues probing every second for 30
5254       seconds, and exits as soon as Postgres is ready.
5255
5256 $ pg_autoctl do pgsetup wait --pgdata node1
5257 16:50:22 70829 INFO Postgres is now serving PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" on port 5501 with pid 21029
5258 16:50:22 70829 INFO Postgres status is: "ready"
5259
5260 pg_autoctl do pgsetup logs
5261 Outputs the Postgres logs from the most recent log file in the PG‐
5262 DATA/log directory.
5263
5264 $ pg_autoctl do pgsetup logs --pgdata node1
5265 16:50:39 71126 WARN Postgres logs from "/Users/dim/dev/MS/pg_auto_failover/tmux/node1/startup.log":
5266 16:50:39 71126 INFO 2021-03-22 14:43:48.911 CET [21029] LOG: starting PostgreSQL 12.3 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
5267 16:50:39 71126 INFO 2021-03-22 14:43:48.913 CET [21029] LOG: listening on IPv6 address "::", port 5501
5268 16:50:39 71126 INFO 2021-03-22 14:43:48.913 CET [21029] LOG: listening on IPv4 address "0.0.0.0", port 5501
5269 16:50:39 71126 INFO 2021-03-22 14:43:48.913 CET [21029] LOG: listening on Unix socket "/tmp/.s.PGSQL.5501"
5270 16:50:39 71126 INFO 2021-03-22 14:43:48.931 CET [21029] LOG: redirecting log output to logging collector process
5271 16:50:39 71126 INFO 2021-03-22 14:43:48.931 CET [21029] HINT: Future log output will appear in directory "log".
5272 16:50:39 71126 WARN Postgres logs from "/Users/dim/dev/MS/pg_auto_failover/tmux/node1/log/postgresql-2021-03-22_144348.log":
5273 16:50:39 71126 INFO 2021-03-22 14:43:48.937 CET [21033] LOG: database system was shut down at 2021-03-22 14:43:46 CET
5274 16:50:39 71126 INFO 2021-03-22 14:43:48.937 CET [21033] LOG: entering standby mode
5275 16:50:39 71126 INFO 2021-03-22 14:43:48.942 CET [21033] LOG: consistent recovery state reached at 0/4022E88
5276 16:50:39 71126 INFO 2021-03-22 14:43:48.942 CET [21033] LOG: invalid record length at 0/4022E88: wanted 24, got 0
5277 16:50:39 71126 INFO 2021-03-22 14:43:48.946 CET [21029] LOG: database system is ready to accept read only connections
5278 16:50:39 71126 INFO 2021-03-22 14:43:49.032 CET [21038] LOG: fetching timeline history file for timeline 4 from primary server
5279 16:50:39 71126 INFO 2021-03-22 14:43:49.037 CET [21038] LOG: started streaming WAL from primary at 0/4000000 on timeline 3
5280 16:50:39 71126 INFO 2021-03-22 14:43:49.046 CET [21038] LOG: replication terminated by primary server
5281 16:50:39 71126 INFO 2021-03-22 14:43:49.046 CET [21038] DETAIL: End of WAL reached on timeline 3 at 0/4022E88.
5282 16:50:39 71126 INFO 2021-03-22 14:43:49.047 CET [21033] LOG: new target timeline is 4
5283 16:50:39 71126 INFO 2021-03-22 14:43:49.049 CET [21038] LOG: restarted WAL streaming at 0/4000000 on timeline 4
5284 16:50:39 71126 INFO 2021-03-22 14:43:49.210 CET [21033] LOG: redo starts at 0/4022E88
5285 16:50:39 71126 INFO 2021-03-22 14:52:06.692 CET [21029] LOG: received SIGHUP, reloading configuration files
5286 16:50:39 71126 INFO 2021-03-22 14:52:06.906 CET [21029] LOG: received SIGHUP, reloading configuration files
5287 16:50:39 71126 FATAL 2021-03-22 15:34:24.920 CET [21038] FATAL: terminating walreceiver due to timeout
5288 16:50:39 71126 INFO 2021-03-22 15:34:24.973 CET [21033] LOG: invalid record length at 0/4059CC8: wanted 24, got 0
5289 16:50:39 71126 INFO 2021-03-22 15:34:25.105 CET [35801] LOG: started streaming WAL from primary at 0/4000000 on timeline 4
5290 16:50:39 71126 FATAL 2021-03-22 16:12:56.918 CET [35801] FATAL: terminating walreceiver due to timeout
5291 16:50:39 71126 INFO 2021-03-22 16:12:57.086 CET [38741] LOG: started streaming WAL from primary at 0/4000000 on timeline 4
5292 16:50:39 71126 FATAL 2021-03-22 16:23:39.349 CET [38741] FATAL: terminating walreceiver due to timeout
5293 16:50:39 71126 INFO 2021-03-22 16:23:39.497 CET [41635] LOG: started streaming WAL from primary at 0/4000000 on timeline 4
5294
5295 pg_autoctl do pgsetup tune
5296       Outputs the pg_autoctl automated tuning options. Depending on the num‐
5297       ber of CPUs and the amount of RAM detected in the environment where it is
5298 run, pg_autoctl can adjust some very basic Postgres tuning knobs to get
5299 started.
5300
5301 $ pg_autoctl do pgsetup tune --pgdata node1 -vv
5302 13:25:25 77185 DEBUG pgtuning.c:85: Detected 12 CPUs and 16 GB total RAM on this server
5303 13:25:25 77185 DEBUG pgtuning.c:225: Setting autovacuum_max_workers to 3
5304 13:25:25 77185 DEBUG pgtuning.c:228: Setting shared_buffers to 4096 MB
5305 13:25:25 77185 DEBUG pgtuning.c:231: Setting work_mem to 24 MB
5306 13:25:25 77185 DEBUG pgtuning.c:235: Setting maintenance_work_mem to 512 MB
5307 13:25:25 77185 DEBUG pgtuning.c:239: Setting effective_cache_size to 12 GB
5308 # basic tuning computed by pg_auto_failover
5309 track_functions = pl
5310 shared_buffers = '4096 MB'
5311 work_mem = '24 MB'
5312 maintenance_work_mem = '512 MB'
5313 effective_cache_size = '12 GB'
5314 autovacuum_max_workers = 3
5315 autovacuum_vacuum_scale_factor = 0.08
5316 autovacuum_analyze_scale_factor = 0.02
5317
5318 The low-level API is made available through the following pg_autoctl do
5319 commands, only available in debug environments:
5320
5321 pg_autoctl do
5322 + monitor Query a pg_auto_failover monitor
5323 + fsm Manually manage the keeper's state
5324 + primary Manage a PostgreSQL primary server
5325 + standby Manage a PostgreSQL standby server
5326 + show Show some debug level information
5327 + pgsetup Manage a local Postgres setup
5328 + pgctl Signal the pg_autoctl postgres service
5329 + service Run pg_autoctl sub-processes (services)
5330 + tmux Set of facilities to handle tmux interactive sessions
5331 + azure Manage a set of Azure resources for a pg_auto_failover demo
5332 + demo Use a demo application for pg_auto_failover
5333
5334 pg_autoctl do monitor
5335 + get Get information from the monitor
5336 register Register the current node with the monitor
5337 active Call in the pg_auto_failover Node Active protocol
5338 version Check that monitor version is 1.5.0.1; alter extension update if not
5339 parse-notification parse a raw notification message
5340
5341 pg_autoctl do monitor get
5342 primary Get the primary node from pg_auto_failover in given formation/group
5343 others Get the other nodes from the pg_auto_failover group of hostname/port
5344 coordinator Get the coordinator node from the pg_auto_failover formation
5345
5346 pg_autoctl do fsm
5347 init Initialize the keeper's state on-disk
5348 state Read the keeper's state from disk and display it
5349 list List reachable FSM states from current state
5350 gv Output the FSM as a .gv program suitable for graphviz/dot
5351 assign Assign a new goal state to the keeper
5352 step Make a state transition if instructed by the monitor
5353 + nodes Manually manage the keeper's nodes list
5354
5355 pg_autoctl do fsm nodes
5356 get Get the list of nodes from file (see --disable-monitor)
5357 set Set the list of nodes to file (see --disable-monitor)
5358
5359 pg_autoctl do primary
5360 + slot Manage replication slot on the primary server
5361 + adduser Create users on primary
5362 defaults Add default settings to postgresql.conf
5363 identify Run the IDENTIFY_SYSTEM replication command on given host
5364
5365 pg_autoctl do primary slot
5366 create Create a replication slot on the primary server
5367 drop Drop a replication slot on the primary server
5368
5369 pg_autoctl do primary adduser
5370 monitor add a local user for queries from the monitor
5371 replica add a local user with replication privileges
5372
5373 pg_autoctl do standby
5374 init Initialize the standby server using pg_basebackup
5375 rewind Rewind a demoted primary server using pg_rewind
5376 promote Promote a standby server to become writable
5377
5378 pg_autoctl do show
5379 ipaddr Print this node's IP address information
5380 cidr Print this node's CIDR information
5381 lookup Print this node's DNS lookup information
5382 hostname Print this node's default hostname
5383 reverse Lookup given hostname and check reverse DNS setup
5384
5385 pg_autoctl do pgsetup
5386 pg_ctl Find a non-ambiguous pg_ctl program and Postgres version
5387 discover Discover local PostgreSQL instance, if any
5388        ready      Return true if the local Postgres server is ready
5389 wait Wait until the local Postgres server is ready
5390 logs Outputs the Postgres startup logs
5391 tune Compute and log some Postgres tuning options
5392
5393 pg_autoctl do pgctl
5394 on Signal pg_autoctl postgres service to ensure Postgres is running
5395 off Signal pg_autoctl postgres service to ensure Postgres is stopped
5396
5397 pg_autoctl do service
5398 + getpid Get the pid of pg_autoctl sub-processes (services)
5399 + restart Restart pg_autoctl sub-processes (services)
5400 pgcontroller pg_autoctl supervised postgres controller
5401 postgres pg_autoctl service that start/stop postgres when asked
5402 listener pg_autoctl service that listens to the monitor notifications
5403 node-active pg_autoctl service that implements the node active protocol
5404
5405 pg_autoctl do service getpid
5406 postgres Get the pid of the pg_autoctl postgres controller service
5407 listener Get the pid of the pg_autoctl monitor listener service
5408 node-active Get the pid of the pg_autoctl keeper node-active service
5409
5410 pg_autoctl do service restart
5411 postgres Restart the pg_autoctl postgres controller service
5412 listener Restart the pg_autoctl monitor listener service
5413 node-active Restart the pg_autoctl keeper node-active service
5414
5415 pg_autoctl do tmux
5416 script Produce a tmux script for a demo or a test case (debug only)
5417 session Run a tmux session for a demo or a test case
5418 stop Stop pg_autoctl processes that belong to a tmux session
5419 wait Wait until a given node has been registered on the monitor
5420 clean Clean-up a tmux session processes and root dir
5421
5422 pg_autoctl do azure
5423 + provision provision azure resources for a pg_auto_failover demo
5424 + tmux Run a tmux session with an Azure setup for QA/testing
5425 + show show azure resources for a pg_auto_failover demo
5426        deploy     Deploy pg_autoctl VMs, given by name
5427 create Create an azure QA environment
5428 drop Drop an azure QA environment: resource group, network, VMs
5429 ls List resources in a given azure region
5430 ssh Runs ssh -l ha-admin <public ip address> for a given VM name
5431 sync Rsync pg_auto_failover sources on all the target region VMs
5432
5433 pg_autoctl do azure provision
5434 region Provision an azure region: resource group, network, VMs
5435 nodes Provision our pre-created VM with pg_autoctl Postgres nodes
5436
5437 pg_autoctl do azure tmux
5438 session Create or attach a tmux session for the created Azure VMs
5439 kill Kill an existing tmux session for Azure VMs
5440
5441 pg_autoctl do azure show
5442 ips Show public and private IP addresses for selected VMs
5443 state Connect to the monitor node to show the current state
5444
5445 pg_autoctl do demo
5446 run Run the pg_auto_failover demo application
5447 uri Grab the application connection string from the monitor
5448 ping Attempt to connect to the application URI
5449 summary Display a summary of the previous demo app run
5450
5451 pg_autoctl run
5452 pg_autoctl run - Run the pg_autoctl service (monitor or keeper)
5453
5454 Synopsis
5455       This command starts the processes needed to run a monitor node or a
5456 keeper node, depending on the configuration file that belongs to the
5457 --pgdata option or PGDATA environment variable.
5458
5459 usage: pg_autoctl run [ --pgdata --name --hostname --pgport ]
5460
5461 --pgdata path to data directory
5462 --name pg_auto_failover node name
5463 --hostname hostname used to connect from other nodes
5464 --pgport PostgreSQL's port number
5465
5466 Description
5467 When registering Postgres nodes to the pg_auto_failover monitor using
5468 the pg_autoctl create postgres command, the nodes are registered with
5469 metadata: the node name, hostname and Postgres port.
5470
5471       The node name is mostly used in the logs and in the pg_autoctl show
5472       state output, and helps human administrators of the formation.
5473
5474 The node hostname and pgport are used by other nodes, including the
5475 pg_auto_failover monitor, to open a Postgres connection.
5476
5477       The node name, hostname, and port can all be changed after the node
5478       registration by using either this command (pg_autoctl run) or the
5479       pg_autoctl config set command.
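
       For instance, the following sketch shows how that node metadata could
       be updated after registration; the data directory, node name,
       hostname, and port values used here are illustrative only:

          $ pg_autoctl config set --pgdata node1 pg_autoctl.hostname node1.internal.example.com
          $ pg_autoctl run --pgdata node1 --name node-a --pgport 5433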
5480
5481 Options
5482 --pgdata
5483 Location of the Postgres node being managed locally. Defaults to
5484 the environment variable PGDATA. Use --monitor to connect to a
5485 monitor from anywhere, rather than the monitor URI used by a lo‐
5486 cal Postgres node managed with pg_autoctl.
5487
5488       --name Node name used on the monitor to refer to this node. The
5489              hostname is technical information, and given Postgres requirements
5490              on the HBA setup and DNS resolution (both forward and reverse
5491              lookups), IP addresses are often used for the hostname.
5492
5493 The --name option allows using a user-friendly name for your
5494 Postgres nodes.
5495
5496 --hostname
5497 Hostname or IP address (both v4 and v6 are supported) to use
5498 from any other node to connect to this node.
5499
5500 When not provided, a default value is computed by running the
5501 following algorithm.
5502
5503 1. We get this machine's "public IP" by opening a connection
5504 to the given monitor hostname or IP address. Then we get
5505 TCP/IP client address that has been used to make that con‐
5506 nection.
5507
5508 2. We then do a reverse DNS lookup on the IP address found in
5509 the previous step to fetch a hostname for our local ma‐
5510 chine.
5511
5512                 3. If the reverse DNS lookup is successful, then pg_autoctl
5513 does a forward DNS lookup of that hostname.
5514
5515 When the forward DNS lookup response in step 3. is an IP address
5516 found in one of our local network interfaces, then pg_autoctl
5517 uses the hostname found in step 2. as the default --hostname.
5518 Otherwise it uses the IP address found in step 1.
5519
5520 You may use the --hostname command line option to bypass the
5521 whole DNS lookup based process and force the local node name to
5522 a fixed value.
5523
5524 --pgport
5525 Postgres port to use, defaults to 5432.
5526
5527 pg_autoctl watch
5528 pg_autoctl watch - Display an auto-updating dashboard
5529
5530 Synopsis
5531       This command displays the events that the pg_auto_failover monitor
5532       records about state changes of the pg_auto_failover nodes that it
5533       manages:
5534
5535 usage: pg_autoctl watch [ --pgdata --formation --group ]
5536
5537 --pgdata path to data directory
5538 --monitor show the monitor uri
5539 --formation formation to query, defaults to 'default'
5540 --group group to query formation, defaults to all
5541 --json output data in the JSON format
5542
5543 Options
5544 --pgdata
5545 Location of the Postgres node being managed locally. Defaults to
5546 the environment variable PGDATA. Use --monitor to connect to a
5547 monitor from anywhere, rather than the monitor URI used by a lo‐
5548 cal Postgres node managed with pg_autoctl.
5549
5550 --monitor
5551 Postgres URI used to connect to the monitor. Must use the au‐
5552 toctl_node username and target the pg_auto_failover database
5553 name. It is possible to show the Postgres URI from the monitor
5554 node using the command pg_autoctl show uri.
5555
5556 --formation
5557 List the events recorded for nodes in the given formation. De‐
5558 faults to default.
5559
5560 --group
5561              Limit output to a single group in the formation. Defaults to
5562              including all groups registered in the target formation.
5563
5564 Description
5565       The pg_autoctl watch output is divided into 3 sections.
5566
5567 The first section is a single header line which includes the name of
5568 the currently selected formation, the formation replication setting
5569       Number Sync Standbys, and then, in the rightmost position, the current
5570 time.
5571
5572 The second section displays one line per node, and each line contains a
5573 list of columns that describe the current state for the node. This list
5574       can include the following columns, and which columns are part of the
5575 output depends on the terminal window size. This choice is dynamic and
5576 changes if your terminal window size changes:
5577
5578 • Name
5579 Name of the node.
5580
5581 • Node, or Id
5582 Node information. When the formation has a single group (group
5583 zero), then this column only contains the nodeId.
5584
5585 Only Citus formations allow several groups. When using a Citus
5586 formation the Node column contains the groupId and the nodeId,
5587 separated by a colon, such as 0:1 for the first coordinator
5588 node.
5589
5590 • Last Report, or Report
5591 Time interval between now and the last known time when a node
5592 has reported to the monitor, using the node_active protocol.
5593
5594              This value is expected to stay under about 2s, and is
5595 known to increment when either the pg_autoctl run service is
5596 not running, or when there is a network split.
5597
5598 • Last Check, or Check
5599              Time interval between now and the last known time when the
5600 monitor could connect to a node's Postgres instance, via its
5601 health check mechanism.
5602
5603 This value is known to increment when either the Postgres ser‐
5604 vice is not running on the target node, when there is a net‐
5605 work split, or when the internal machinery (the health check
5606 worker background process) implements jitter.
5607
5608 • Host:Port
5609 Hostname and port number used to connect to the node.
5610
5611 • TLI: LSN
5612 Timeline identifier (TLI) and Postgres Log Sequence Number
5613 (LSN).
5614
5615 The LSN is the current position in the Postgres WAL stream.
5616 This is a hexadecimal number. See pg_lsn for more information.
5617
5618 The current timeline is incremented each time a failover hap‐
5619 pens, or when doing Point In Time Recovery. A node can only
5620 reach the secondary state when it is on the same timeline as
5621 its primary node.
5622
5623 • Connection
5624 This output field contains two bits of information. First, the
5625 Postgres connection type that the node provides, either
5626 read-write or read-only. Then the mark ! is added when the
5627 monitor has failed to connect to this node, and ? when the
5628              monitor has not connected to the node yet.
5629
5630 • Reported State
5631 The current FSM state as reported to the monitor by the pg_au‐
5632 toctl process running on the Postgres node.
5633
5634 • Assigned State
5635 The assigned FSM state on the monitor. When the assigned state
5636              is not the same as the reported state, then the pg_autoctl
5637              process running on the Postgres node might not have retrieved
5638 the assigned state yet, or might still be implementing the FSM
5639 transition from the current state to the assigned state.
5640
5641       The third and last section lists the most recent events that the
5642       monitor has registered; the most recent event is found at the bottom of
5643       the screen.
5644
5645 To quit the command hit either the F1 key or the q key.
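
       For example, a typical invocation to keep an eye on the default
       formation of a local node could look like the following; the data
       directory name is illustrative:

          $ pg_autoctl watch --pgdata node1 --formation default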
5646
5647 pg_autoctl stop
5648 pg_autoctl stop - signal the pg_autoctl service for it to stop
5649
5650 Synopsis
5651       This command stops the processes needed to run a monitor node or a
5652 keeper node, depending on the configuration file that belongs to the
5653 --pgdata option or PGDATA environment variable.
5654
5655 usage: pg_autoctl stop [ --pgdata --fast --immediate ]
5656
5657 --pgdata path to data directory
5658 --fast fast shutdown mode for the keeper
5659 --immediate immediate shutdown mode for the keeper
5660
5661 Description
5662       The pg_autoctl stop command finds the PID of the running service for
5663 the given --pgdata, and if the process is still running, sends a
5664 SIGTERM signal to the process.
5665
5666       When pg_autoctl receives a shutdown signal, a shutdown sequence is
5667       triggered. Depending on the signal received, an operation that has been
5668       started (such as a state transition) is either run to completion,
5669       stopped at the next opportunity, or stopped immediately even when in
5670       the middle of the transition.
5671
5672 Options
5673 --pgdata
5674 Location of the Postgres node being managed locally. Defaults to
5675 the environment variable PGDATA. Use --monitor to connect to a
5676 monitor from anywhere, rather than the monitor URI used by a lo‐
5677 cal Postgres node managed with pg_autoctl.
5678
5679 --fast Fast Shutdown mode for pg_autoctl. Sends the SIGINT signal to
5680 the running service, which is the same as using C-c on an inter‐
5681 active process running as a foreground shell job.
5682
5683 --immediate
5684 Immediate Shutdown mode for pg_autoctl. Sends the SIGQUIT signal
5685 to the running service.
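
       As an illustration, here are the three shutdown modes for a node whose
       data directory is node1 (the path is illustrative):

          $ pg_autoctl stop --pgdata node1               # sends SIGTERM
          $ pg_autoctl stop --pgdata node1 --fast        # sends SIGINT
          $ pg_autoctl stop --pgdata node1 --immediate   # sends SIGQUIT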
5686
5687 pg_autoctl reload
5688 pg_autoctl reload - signal the pg_autoctl for it to reload its configu‐
5689 ration
5690
5691 Synopsis
5692       This command signals a running pg_autoctl process to reload its
5693       configuration from disk, and also signals the managed Postgres service
5694       to reload its configuration.
5695
5696 usage: pg_autoctl reload [ --pgdata ] [ --json ]
5697
5698 --pgdata path to data directory
5699
5700 Description
5701       The pg_autoctl reload command finds the PID of the running service for
5702 the given --pgdata, and if the process is still running, sends a SIGHUP
5703 signal to the process.
5704
5705 Options
5706 --pgdata
5707 Location of the Postgres node being managed locally. Defaults to
5708 the environment variable PGDATA. Use --monitor to connect to a
5709 monitor from anywhere, rather than the monitor URI used by a lo‐
5710 cal Postgres node managed with pg_autoctl.
5711
5712 pg_autoctl status
5713 pg_autoctl status - Display the current status of the pg_autoctl ser‐
5714 vice
5715
5716 Synopsis
5717       This command outputs the current process status for the pg_autoctl
5718 service running for the given --pgdata location.
5719
5720 usage: pg_autoctl status [ --pgdata ] [ --json ]
5721
5722 --pgdata path to data directory
5723 --json output data in the JSON format
5724
5725 Options
5726 --pgdata
5727 Location of the Postgres node being managed locally. Defaults to
5728 the environment variable PGDATA. Use --monitor to connect to a
5729 monitor from anywhere, rather than the monitor URI used by a lo‐
5730 cal Postgres node managed with pg_autoctl.
5731
5732       --json Output JSON formatted data instead of a table formatted list.
5733
5734 Example
5735 $ pg_autoctl status --pgdata node1
5736 11:26:30 27248 INFO pg_autoctl is running with pid 26618
5737 11:26:30 27248 INFO Postgres is serving PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" on port 5501 with pid 26725
5738
5739 $ pg_autoctl status --pgdata node1 --json
5740 11:26:37 27385 INFO pg_autoctl is running with pid 26618
5741 11:26:37 27385 INFO Postgres is serving PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" on port 5501 with pid 26725
5742 {
5743 "postgres": {
5744 "pgdata": "\/Users\/dim\/dev\/MS\/pg_auto_failover\/tmux\/node1",
5745 "pg_ctl": "\/Applications\/Postgres.app\/Contents\/Versions\/12\/bin\/pg_ctl",
5746 "version": "12.3",
5747 "host": "\/tmp",
5748 "port": 5501,
5749 "proxyport": 0,
5750 "pid": 26725,
5751 "in_recovery": false,
5752 "control": {
5753 "version": 0,
5754 "catalog_version": 0,
5755 "system_identifier": "0"
5756 },
5757 "postmaster": {
5758 "status": "ready"
5759 }
5760 },
5761 "pg_autoctl": {
5762 "pid": 26618,
5763 "status": "running",
5764 "pgdata": "\/Users\/dim\/dev\/MS\/pg_auto_failover\/tmux\/node1",
5765 "version": "1.5.0",
5766 "semId": 196609,
5767 "services": [
5768 {
5769 "name": "postgres",
5770 "pid": 26625,
5771 "status": "running",
5772 "version": "1.5.0",
5773 "pgautofailover": "1.5.0.1"
5774 },
5775 {
5776 "name": "node-active",
5777 "pid": 26626,
5778 "status": "running",
5779 "version": "1.5.0",
5780 "pgautofailover": "1.5.0.1"
5781 }
5782 ]
5783 }
5784 }
5785
5787       Several default settings of pg_auto_failover can be reviewed and
5788 changed depending on the trade-offs you want to implement in your own
5789 production setup. The settings that you can change will have an impact
5790       on the following operations:
5791
5792 • Deciding when to promote the secondary
5793
5794 pg_auto_failover decides to implement a failover to the secondary
5795 node when it detects that the primary node is unhealthy. Changing
5796 the following settings will have an impact on when the
5797 pg_auto_failover monitor decides to promote the secondary Post‐
5798 greSQL node:
5799
5800 pgautofailover.health_check_max_retries
5801 pgautofailover.health_check_period
5802 pgautofailover.health_check_retry_delay
5803 pgautofailover.health_check_timeout
5804 pgautofailover.node_considered_unhealthy_timeout
5805
5806 • Time taken to promote the secondary
5807
5808 At secondary promotion time, pg_auto_failover waits for the fol‐
5809 lowing timeout to make sure that all pending writes on the primary
5810 server made it to the secondary at shutdown time, thus preventing
5811              data loss:
5812
5813 pgautofailover.primary_demote_timeout
5814
5815 • Preventing promotion of the secondary
5816
5817 pg_auto_failover implements a trade-off where data availability
5818 trumps service availability. When the primary node of a PostgreSQL
5819 service is detected unhealthy, the secondary is only promoted if
5820 it was known to be eligible at the moment when the primary is
5821 lost.
5822
5823              In the case when synchronous replication was in use at the moment
5824              when the primary node was lost, then we know we can switch to the
5825              secondary safely: the WAL lag is 0 in that case.
5826
5827              In the case when the secondary server had been detected unhealthy
5828              before, the pg_auto_failover monitor switches it from the state
5829              SECONDARY to the state CATCHING-UP, and promotion is then
5830              prevented.
5831
5832              The following setting makes it possible to still promote the
5833              secondary, allowing for a window of data loss:
5834
5835 pgautofailover.promote_wal_log_threshold
5836
5837 pg_auto_failover Monitor
5838 The configuration for the behavior of the monitor happens in the Post‐
5839 greSQL database where the extension has been deployed:
5840
5841 pg_auto_failover=> select name, setting, unit, short_desc from pg_settings where name ~ 'pgautofailover.';
5842 -[ RECORD 1 ]----------------------------------------------------------------------------------------------------
5843 name | pgautofailover.enable_sync_wal_log_threshold
5844 setting | 16777216
5845 unit |
5846 short_desc | Don't enable synchronous replication until secondary xlog is within this many bytes of the primary's
5847 -[ RECORD 2 ]----------------------------------------------------------------------------------------------------
5848 name | pgautofailover.health_check_max_retries
5849 setting | 2
5850 unit |
5851 short_desc | Maximum number of re-tries before marking a node as failed.
5852 -[ RECORD 3 ]----------------------------------------------------------------------------------------------------
5853 name | pgautofailover.health_check_period
5854 setting | 5000
5855 unit | ms
5856 short_desc | Duration between each check (in milliseconds).
5857 -[ RECORD 4 ]----------------------------------------------------------------------------------------------------
5858 name | pgautofailover.health_check_retry_delay
5859 setting | 2000
5860 unit | ms
5861 short_desc | Delay between consecutive retries.
5862 -[ RECORD 5 ]----------------------------------------------------------------------------------------------------
5863 name | pgautofailover.health_check_timeout
5864 setting | 5000
5865 unit | ms
5866 short_desc | Connect timeout (in milliseconds).
5867 -[ RECORD 6 ]----------------------------------------------------------------------------------------------------
5868 name | pgautofailover.node_considered_unhealthy_timeout
5869 setting | 20000
5870 unit | ms
5871 short_desc | Mark node unhealthy if last ping was over this long ago
5872 -[ RECORD 7 ]----------------------------------------------------------------------------------------------------
5873 name | pgautofailover.primary_demote_timeout
5874 setting | 30000
5875 unit | ms
5876 short_desc | Give the primary this long to drain before promoting the secondary
5877 -[ RECORD 8 ]----------------------------------------------------------------------------------------------------
5878 name | pgautofailover.promote_wal_log_threshold
5879 setting | 16777216
5880 unit |
5881 short_desc | Don't promote secondary unless xlog is with this many bytes of the master
5882 -[ RECORD 9 ]----------------------------------------------------------------------------------------------------
5883 name | pgautofailover.startup_grace_period
5884 setting | 10000
5885 unit | ms
5886 short_desc | Wait for at least this much time after startup before initiating a failover.
5887
5888 You can edit the parameters as usual with PostgreSQL, either in the
5889 postgresql.conf file or using ALTER DATABASE pg_auto_failover SET pa‐
5890 rameter = value; commands, then issuing a reload.
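
       For example, here is a sketch of raising the unhealthy-node timeout to
       30 seconds (30000 ms); run it as a superuser connected to the monitor,
       and adapt the parameter and value to your own needs:

          -- per-database setting, applied to new connections to the monitor database
          ALTER DATABASE pg_auto_failover
            SET pgautofailover.node_considered_unhealthy_timeout TO 30000;

          -- or, after editing postgresql.conf on the monitor node, reload it:
          SELECT pg_reload_conf();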
5891
5892 pg_auto_failover Keeper Service
5893 For an introduction to the pg_autoctl commands relevant to the
5894 pg_auto_failover Keeper configuration, please see pg_autoctl config.
5895
5896 An example configuration file looks like the following:
5897
5898 [pg_autoctl]
5899 role = keeper
5900 monitor = postgres://autoctl_node@192.168.1.34:6000/pg_auto_failover
5901 formation = default
5902 group = 0
5903 hostname = node1.db
5904 nodekind = standalone
5905
5906 [postgresql]
5907 pgdata = /data/pgsql/
5908 pg_ctl = /usr/pgsql-10/bin/pg_ctl
5909 dbname = postgres
5910 host = /tmp
5911 port = 5000
5912
5913 [replication]
5914 slot = pgautofailover_standby
5915 maximum_backup_rate = 100M
5916 backup_directory = /data/backup/node1.db
5917
5918 [timeout]
5919 network_partition_timeout = 20
5920 postgresql_restart_failure_timeout = 20
5921 postgresql_restart_failure_max_retries = 3
5922
5923 To output, edit and check entries of the configuration, the following
5924 commands are provided:
5925
5926 pg_autoctl config check [--pgdata <pgdata>]
5927 pg_autoctl config get [--pgdata <pgdata>] section.option
5928 pg_autoctl config set [--pgdata <pgdata>] section.option value
5929
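       For example, to review and then lower the backup throttling rate shown
       in the sample configuration above (the data directory and values are
       illustrative):

          $ pg_autoctl config get --pgdata node1 replication.maximum_backup_rate
          $ pg_autoctl config set --pgdata node1 replication.maximum_backup_rate 50M
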
5930 The [postgresql] section is discovered automatically by the pg_autoctl
5931 command and is not intended to be changed manually.
5932
5933 pg_autoctl.monitor
5934
5935 PostgreSQL service URL of the pg_auto_failover monitor, as given in the
5936 output of the pg_autoctl show uri command.
5937
5938 pg_autoctl.formation
5939
5940 A single pg_auto_failover monitor may handle several postgres forma‐
5941 tions. The default formation name default is usually fine.
5942
5943 pg_autoctl.group
5944
5945 This information is retrieved by the pg_auto_failover keeper when reg‐
5946 istering a node to the monitor, and should not be changed afterwards.
5947 Use at your own risk.
5948
5949 pg_autoctl.hostname
5950
5951 Node hostname used by all the other nodes in the cluster to contact
5952 this node. In particular, if this node is a primary then its standby
5953       uses that address to set up streaming replication.
5954
5955 replication.slot
5956
5957 Name of the PostgreSQL replication slot used in the streaming replica‐
5958 tion setup automatically deployed by pg_auto_failover. Replication
5959 slots can't be renamed in PostgreSQL.
5960
5961 replication.maximum_backup_rate
5962
5963 When pg_auto_failover (re-)builds a standby node using the pg_base‐
5964 backup command, this parameter is given to pg_basebackup to throttle
5965 the network bandwidth used. Defaults to 100Mbps.
5966
5967 replication.backup_directory
5968
5969 When pg_auto_failover (re-)builds a standby node using the pg_base‐
5970       backup command, this parameter is the target directory where the bits
5971       from the primary server are copied. When the copy has been successful,
5972       the directory is renamed to postgresql.pgdata.
5973
5974 The default value is computed from ${PGDATA}/../backup/${hostname} and
5975 can be set to any value of your preference. Remember that the directory
5976 renaming is an atomic operation only when both the source and the tar‐
5977       get of the copy are in the same filesystem, at least on Unix systems.
5978
5979 timeout
5980
5981       This section allows setting up the behavior of the pg_auto_failover
5982       keeper in interesting scenarios.
5983
5984 timeout.network_partition_timeout
5985
5986 Timeout in seconds before we consider failure to communicate with other
5987 nodes indicates a network partition. This check is only done on a PRI‐
5988 MARY server, so other nodes mean both the monitor and the standby.
5989
5990 When a PRIMARY node is detected to be on the losing side of a network
5991 partition, the pg_auto_failover keeper enters the DEMOTE state and
5992 stops the PostgreSQL instance in order to protect against split brain
5993 situations.
5994
5995 The default is 20s.
5996
5997 timeout.postgresql_restart_failure_timeout
5998
5999 timeout.postgresql_restart_failure_max_retries
6000
6001 When PostgreSQL is not running, the first thing the pg_auto_failover
6002 keeper does is try to restart it. In case of a transient failure (e.g.
6003 file system is full, or other dynamic OS resource constraint), the best
6004 course of action is to try again for a little while before reaching out
6005       to the monitor and asking for a failover.
6006
6007 The pg_auto_failover keeper tries to restart PostgreSQL timeout.post‐
6008 gresql_restart_failure_max_retries times in a row (default 3) or up to
6009       timeout.postgresql_restart_failure_timeout (default 20s) since it
6010       detected that PostgreSQL is not running, whichever comes first.
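
       As a sketch, the following commands would relax both settings, giving
       Postgres a full minute and up to five restart attempts before the
       keeper gives up and reaches out to the monitor; the values are
       illustrative, and the commands are run on the node with PGDATA set or
       --pgdata given:

          $ pg_autoctl config set timeout.postgresql_restart_failure_timeout 60
          $ pg_autoctl config set timeout.postgresql_restart_failure_max_retries 5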
6011
6013 This section is not yet complete. Please contact us with any questions.
6014
6015 Deployment
6016 pg_auto_failover is a general purpose tool for setting up PostgreSQL
6017 replication in order to implement High Availability of the PostgreSQL
6018 service.
6019
6020 Provisioning
6021 It is also possible to register pre-existing PostgreSQL instances with
6022 a pg_auto_failover monitor. The pg_autoctl create command honors the
6023 PGDATA environment variable, and checks whether PostgreSQL is already
6024 running. If Postgres is detected, the new node is registered in SINGLE
6025 mode, bypassing the monitor's role assignment policy.
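
       As a sketch, registering an already-running local Postgres instance
       could look like the following; the data directory, hostname, and
       monitor connection string are illustrative, and additional options
       (authentication setup, for instance) may be needed in your environment:

          $ export PGDATA=/var/lib/postgresql/data
          $ pg_autoctl create postgres \
               --hostname node1.example.com \
               --monitor postgres://autoctl_node@monitor.example.com/pg_auto_failover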
6026
6027 Upgrading pg_auto_failover, from versions 1.4 onward
6028 When upgrading a pg_auto_failover setup, the procedure is different on
6029 the monitor and on the Postgres nodes:
6030
6031 • on the monitor, the internal pg_auto_failover database schema
6032 might have changed and needs to be upgraded to its new definition,
6033 porting the existing data over. The pg_auto_failover database con‐
6034 tains the registration of every node in the system and their cur‐
6035 rent state.
6036 It is not possible to trigger a failover during the monitor
6037 update. Postgres operations on the Postgres nodes continue
6038 normally.
6039
6040 During the restart of the monitor, the other nodes might have
6041 trouble connecting to the monitor. The pg_autoctl command is
6042 designed to retry connecting to the monitor and handle errors
6043 gracefully.
6044
6045 • on the Postgres nodes, the pg_autoctl command connects to the mon‐
6046 itor every once in a while (every second by default), and then
6047 calls the node_active protocol, a stored procedure in the monitor
6048 databases.
6049 The pg_autoctl also verifies at each connection to the monitor
6050 that it's running the expected version of the extension. When
6051 that's not the case, the "node-active" sub-process quits, to
6052 be restarted with the possibly new version of the pg_autoctl
6053 binary found on-disk.
6054
6055 As a result, here is the standard upgrade plan for pg_auto_failover:
6056
6057       1. Upgrade the pg_auto_failover package on all the nodes, monitor
6058          included.
6059             When using a Debian-based OS, this looks like the following
6060             commands when upgrading from 1.4 to 1.5:
6061
6062 sudo apt-get remove pg-auto-failover-cli-1.4 postgresql-11-auto-failover-1.4
6063 sudo apt-get install -q -y pg-auto-failover-cli-1.5 postgresql-11-auto-failover-1.5
6064
6065 2. Restart the pgautofailover service on the monitor.
6066 When using the systemd integration, all we need to do is:
6067
6068 sudo systemctl restart pgautofailover
6069
6070 Then we may use the following commands to make sure that the
6071 service is running as expected:
6072
6073 sudo systemctl status pgautofailover
6074 sudo journalctl -u pgautofailover
6075
6076 At this point it is expected that the pg_autoctl logs show
6077 that an upgrade has been performed by using the ALTER EXTEN‐
6078 SION pgautofailover UPDATE TO ... command. The monitor is
6079 ready with the new version of pg_auto_failover.
6080
6081 When the Postgres nodes pg_autoctl process connects to the new monitor
6082 version, the check for version compatibility fails, and the "node-ac‐
6083 tive" sub-process exits. The main pg_autoctl process supervisor then
6084       restarts the "node-active" sub-process from its on-disk binary exe‐
6085 cutable file, which has been upgraded to the new version. That's why we
6086 first install the new packages for pg_auto_failover on every node, and
6087 only then restart the monitor.
6088
6089 IMPORTANT:
6090 Before upgrading the monitor, which is a simple restart of the
6091 pg_autoctl process, it is important that the OS packages for pgauto‐
6092 failover be updated on all the Postgres nodes.
6093
6094 When that's not the case, pg_autoctl on the Postgres nodes will
6095 still detect a version mismatch with the monitor extension, and the
6096 "node-active" sub-process will exit. And when restarted automati‐
6097 cally, the same version of the local pg_autoctl binary executable is
6098 found on-disk, leading to the same version mismatch with the monitor
6099 extension.
6100
6101 After restarting the "node-active" process 5 times, pg_autoctl quits
6102 retrying and stops. This includes stopping the Postgres service too,
6103 and a service downtime might then occur.
6104
6105 And when the upgrade is done we can use pg_autoctl show state on the
6106       monitor to see that everything is as expected.
6107
6108 Upgrading from previous pg_auto_failover versions
6109 The new upgrade procedure described in the previous section is part of
6110 pg_auto_failover since version 1.4. When upgrading from a previous ver‐
6111 sion of pg_auto_failover, up to and including version 1.3, then all the
6112 pg_autoctl processes have to be restarted fully.
6113
6114 To prevent triggering a failover during the upgrade, it's best to put
6115 your secondary nodes in maintenance. The procedure then looks like the
6116 following:
6117
6118 1. Enable maintenance on your secondary node(s):
6119
6120 pg_autoctl enable maintenance
6121
6122 2. Upgrade the OS packages for pg_auto_failover on every node, as
6123 per previous section.
6124
6125 3. Restart the monitor to upgrade it to the new pg_auto_failover
6126 version:
6127 When using the systemd integration, all we need to do is:
6128
6129 sudo systemctl restart pgautofailover
6130
6131 Then we may use the following commands to make sure that the
6132 service is running as expected:
6133
6134 sudo systemctl status pgautofailover
6135 sudo journalctl -u pgautofailover
6136
6137 At this point it is expected that the pg_autoctl logs show
6138 that an upgrade has been performed by using the ALTER EXTEN‐
6139 SION pgautofailover UPDATE TO ... command. The monitor is
6140 ready with the new version of pg_auto_failover.
6141
6142 4. Restart pg_autoctl on all Postgres nodes on the cluster.
6143 When using the systemd integration, all we need to do is:
6144
6145 sudo systemctl restart pgautofailover
6146
6147 As in the previous point in this list, make sure the service
6148 is now running as expected.
6149
6150       5. Disable maintenance on your secondary node(s):
6151
6152 pg_autoctl disable maintenance
6153
6154 Extension dependencies when upgrading the monitor
6155 Since version 1.4.0 the pgautofailover extension requires the Postgres
6156 contrib extension btree_gist. The pg_autoctl command arranges for the
6157 creation of this dependency, and has been buggy in some releases.
6158
6159       As a result, you might have trouble upgrading the pg_auto_failover
6160       monitor to a recent version. It is possible to fix the error by connecting
6161 to the monitor Postgres database and running the create extension com‐
6162 mand manually:
6163
6164 # create extension btree_gist;
6165
6166 Cluster Management and Operations
6167 It is possible to operate pg_auto_failover formations and groups di‐
6168 rectly from the monitor. All that is needed is an access to the monitor
6169 Postgres database as a client, such as psql. It's also possible to add
6170 those management SQL function calls in your own ops application if you
6171 have one.
6172
6173       For security reasons, the autoctl_node user is not allowed to perform
6174       maintenance operations. This user is limited to what pg_autoctl needs. You
6175 can either create a specific user and authentication rule to expose for
6176 management, or edit the default HBA rules for the autoctl user. In the
6177 following examples we're directly connecting as the autoctl role.
6178
6179 The main operations with pg_auto_failover are node maintenance and man‐
6180 ual failover, also known as a controlled switchover.
6181
6182 Maintenance of a secondary node
6183 It is possible to put a secondary node in any group in a MAINTENANCE
6184 state, so that the Postgres server is not doing synchronous replication
6185 anymore and can be taken down for maintenance purposes, such as secu‐
6186 rity kernel upgrades or the like.
6187
6188 The command line tool pg_autoctl exposes an API to schedule maintenance
6189 operations on the current node, which must be a secondary node at the
6190 moment when maintenance is requested.
6191
6192 Here's an example of using the maintenance commands on a secondary
6193 node, including the output. Of course, when you try that on your own
6194 nodes, dates and PID information might differ:
6195
6196 $ pg_autoctl enable maintenance
6197 17:49:19 14377 INFO Listening monitor notifications about state changes in formation "default" and group 0
6198 17:49:19 14377 INFO Following table displays times when notifications are received
6199 Time | ID | Host | Port | Current State | Assigned State
6200 ---------+-----+-----------+--------+---------------------+--------------------
6201 17:49:19 | 1 | localhost | 5001 | primary | wait_primary
6202 17:49:19 | 2 | localhost | 5002 | secondary | wait_maintenance
6203 17:49:19 | 2 | localhost | 5002 | wait_maintenance | wait_maintenance
6204 17:49:20 | 1 | localhost | 5001 | wait_primary | wait_primary
6205 17:49:20 | 2 | localhost | 5002 | wait_maintenance | maintenance
6206 17:49:20 | 2 | localhost | 5002 | maintenance | maintenance
6207
6208 The command listens to the state changes in the current node's forma‐
6209 tion and group on the monitor and displays those changes as it receives
6210 them. The operation is done when the node has reached the maintenance
6211 state.
6212
6213 It is now possible to disable maintenance to allow pg_autoctl to manage
6214 this standby node again:
6215
6216 $ pg_autoctl disable maintenance
6217 17:49:26 14437 INFO Listening monitor notifications about state changes in formation "default" and group 0
6218 17:49:26 14437 INFO Following table displays times when notifications are received
6219 Time | ID | Host | Port | Current State | Assigned State
6220 ---------+-----+-----------+--------+---------------------+--------------------
6221 17:49:27 | 2 | localhost | 5002 | maintenance | catchingup
6222 17:49:27 | 2 | localhost | 5002 | catchingup | catchingup
6223 17:49:28 | 2 | localhost | 5002 | catchingup | secondary
6224 17:49:28 | 1 | localhost | 5001 | wait_primary | primary
6225 17:49:28 | 2 | localhost | 5002 | secondary | secondary
6226 17:49:29 | 1 | localhost | 5001 | primary | primary
6227
6228 When a standby node is in maintenance, the monitor sets the primary
6229 node replication to WAIT_PRIMARY: in this role, the PostgreSQL stream‐
6230 ing replication is now asynchronous and the standby PostgreSQL server
6231 may be stopped, rebooted, etc.
6232
6233 Maintenance of a primary node
6234 A primary node must be available at all times in any formation and
6235       group in pg_auto_failover; that is the invariant provided by the whole
6236 solution. With that in mind, the only way to allow a primary node to go
6237 to a maintenance mode is to first failover and promote the secondary
6238 node.
6239
6240 The same command pg_autoctl enable maintenance implements that opera‐
6241 tion when run on a primary node with the option --allow-failover. Here
6242 is an example of such an operation:
6243
6244 $ pg_autoctl enable maintenance
6245 11:53:03 50526 WARN Enabling maintenance on a primary causes a failover
6246 11:53:03 50526 FATAL Please use --allow-failover to allow the command proceed
6247
6248       As we can see, the option --allow-failover is mandatory. In the next
6249 example we use it:
6250
6251 $ pg_autoctl enable maintenance --allow-failover
6252 13:13:42 1614 INFO Listening monitor notifications about state changes in formation "default" and group 0
6253 13:13:42 1614 INFO Following table displays times when notifications are received
6254 Time | ID | Host | Port | Current State | Assigned State
6255 ---------+-----+-----------+--------+---------------------+--------------------
6256 13:13:43 | 2 | localhost | 5002 | primary | prepare_maintenance
6257 13:13:43 | 1 | localhost | 5001 | secondary | prepare_promotion
6258 13:13:43 | 1 | localhost | 5001 | prepare_promotion | prepare_promotion
6259 13:13:43 | 2 | localhost | 5002 | prepare_maintenance | prepare_maintenance
6260 13:13:44 | 1 | localhost | 5001 | prepare_promotion | stop_replication
6261 13:13:45 | 1 | localhost | 5001 | stop_replication | stop_replication
6262 13:13:46 | 1 | localhost | 5001 | stop_replication | wait_primary
6263 13:13:46 | 2 | localhost | 5002 | prepare_maintenance | maintenance
6264 13:13:46 | 1 | localhost | 5001 | wait_primary | wait_primary
6265 13:13:47 | 2 | localhost | 5002 | maintenance | maintenance
6266
6267 When the operation is done we can have the old primary re-join the
6268 group, this time as a secondary:
6269
6270 $ pg_autoctl disable maintenance
6271 13:14:46 1985 INFO Listening monitor notifications about state changes in formation "default" and group 0
6272 13:14:46 1985 INFO Following table displays times when notifications are received
6273 Time | ID | Host | Port | Current State | Assigned State
6274 ---------+-----+-----------+--------+---------------------+--------------------
6275 13:14:47 | 2 | localhost | 5002 | maintenance | catchingup
6276 13:14:47 | 2 | localhost | 5002 | catchingup | catchingup
6277 13:14:52 | 2 | localhost | 5002 | catchingup | secondary
6278 13:14:52 | 1 | localhost | 5001 | wait_primary | primary
6279 13:14:52 | 2 | localhost | 5002 | secondary | secondary
6280 13:14:53 | 1 | localhost | 5001 | primary | primary
6281
6282 Triggering a failover
6283 It is possible to trigger a manual failover, or a switchover, using the
6284 command pg_autoctl perform failover. Here's an example of what happens
6285 when running the command:
6286
6287 $ pg_autoctl perform failover
6288 11:58:00 53224 INFO Listening monitor notifications about state changes in formation "default" and group 0
6289 11:58:00 53224 INFO Following table displays times when notifications are received
6290 Time | ID | Host | Port | Current State | Assigned State
6291 ---------+-----+-----------+--------+--------------------+-------------------
6292 11:58:01 | 1 | localhost | 5001 | primary | draining
6293 11:58:01 | 2 | localhost | 5002 | secondary | prepare_promotion
6294 11:58:01 | 1 | localhost | 5001 | draining | draining
6295 11:58:01 | 2 | localhost | 5002 | prepare_promotion | prepare_promotion
6296 11:58:02 | 2 | localhost | 5002 | prepare_promotion | stop_replication
6297 11:58:02 | 1 | localhost | 5001 | draining | demote_timeout
6298 11:58:03 | 1 | localhost | 5001 | demote_timeout | demote_timeout
6299 11:58:04 | 2 | localhost | 5002 | stop_replication | stop_replication
6300 11:58:05 | 2 | localhost | 5002 | stop_replication | wait_primary
6301 11:58:05 | 1 | localhost | 5001 | demote_timeout | demoted
6302 11:58:05 | 2 | localhost | 5002 | wait_primary | wait_primary
6303 11:58:05 | 1 | localhost | 5001 | demoted | demoted
6304 11:58:06 | 1 | localhost | 5001 | demoted | catchingup
6305 11:58:06 | 1 | localhost | 5001 | catchingup | catchingup
6306 11:58:08 | 1 | localhost | 5001 | catchingup | secondary
6307 11:58:08 | 2 | localhost | 5002 | wait_primary | primary
6308 11:58:08 | 1 | localhost | 5001 | secondary | secondary
6309 11:58:08 | 2 | localhost | 5002 | primary | primary
6310
6311 Again, timings and PID numbers are not expected to be the same when you
6312 run the command on your own setup.
6313
6314 Also note in the output that the command shows the whole set of transi‐
6315 tions including when the old primary is now a secondary node. The data‐
6316 base is available for read-write traffic as soon as we reach the state
6317 wait_primary.
6318
6319 Implementing a controlled switchover
6320 It is generally useful to distinguish a controlled switchover to a
6321 failover. In a controlled switchover situation it is possible to organ‐
6322 ise the sequence of events in a way to avoid data loss and lower down‐
6323 time to a minimum.
6324
6325 In the case of pg_auto_failover, because we use synchronous replica‐
6326 tion, we don't face data loss risks when triggering a manual failover.
6327 Moreover, our monitor knows the current primary health at the time when
6328 the failover is triggered, and drives the failover accordingly.
6329
6330 So to trigger a controlled switchover with pg_auto_failover you can use
6331 the same API as for a manual failover:
6332
6333 $ pg_autoctl perform switchover
6334
6335       Because the subtleties of orchestrating either a controlled switchover
6336 or an unplanned failover are all handled by the monitor, rather than
6337       the client side command line, at the client level the two commands
6338 pg_autoctl perform failover and pg_autoctl perform switchover are syn‐
6339 onyms, or aliases.
6340
6341 Current state, last events
6342 The following commands display information from the pg_auto_failover
6343 monitor tables pgautofailover.node and pgautofailover.event:
6344
6345 $ pg_autoctl show state
6346 $ pg_autoctl show events
6347
6348       When run on the monitor, the commands output all the known states and
6349 events for the whole set of formations handled by the monitor. When run
6350 on a PostgreSQL node, the command connects to the monitor and outputs
6351 the information relevant to the service group of the local node only.
6352
6353 For interactive debugging it is helpful to run the following command
6354 from the monitor node while e.g. initializing a formation from scratch,
6355 or performing a manual failover:
6356
6357 $ watch pg_autoctl show state
6358
6359 Monitoring pg_auto_failover in Production
6360 The monitor reports every state change decision to a LISTEN/NOTIFY
6361 channel named state. PostgreSQL logs on the monitor are also stored in
6362 a table, pgautofailover.event, and broadcast by NOTIFY in the channel
6363 log.
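
       As a sketch, a monitoring session connected to the pg_auto_failover
       database on the monitor could subscribe to those notifications and
       review recent events like this; the ordering column and LIMIT are
       illustrative, check the pgautofailover.event table definition for its
       exact columns:

          -- receive state change notifications in this session
          LISTEN state;

          -- review the most recent events recorded by the monitor
          SELECT * FROM pgautofailover.event ORDER BY eventid DESC LIMIT 10;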
6364
6365 Replacing the monitor online
6366 When the monitor node is not available anymore, it is possible to cre‐
6367 ate a new monitor node and then switch existing nodes to a new monitor
6368 by using the following commands.
6369
6370 1. Apply the STONITH approach on the old monitor to make sure this
6371 node is not going to show up again during the procedure. This
6372          step is sometimes referred to as “fencing”.
6373
6374 2. On every node, ending with the (current) Postgres primary node
6375 for each group, disable the monitor while pg_autoctl is still
6376 running:
6377
6378 $ pg_autoctl disable monitor --force
6379
6380 3. Create a new monitor node:
6381
6382 $ pg_autoctl create monitor ...
6383
6384 4. On the current primary node first, so that it's registered first
6385 and as a primary still, for each group in your formation(s), en‐
6386 able the monitor online again:
6387
6388 $ pg_autoctl enable monitor postgresql://autoctl_node@.../pg_auto_failover
6389
6390 5. On every other (secondary) node, enable the monitor online again:
6391
6392 $ pg_autoctl enable monitor postgresql://autoctl_node@.../pg_auto_failover
6393
6394 See pg_autoctl disable monitor and pg_autoctl enable monitor for de‐
6395 tails about those commands.
6396
6397       This operation relies on the fact that a pg_autoctl node can be
6398       operated without a monitor, and when reconnecting to a new monitor, the
6399       process resets the parts of the node state that come from the monitor,
6400       such as the node identifier.
6401
6402 Trouble-Shooting Guide
6403 pg_auto_failover commands can be run repeatedly. If initialization
6404 fails the first time -- for instance because a firewall rule hasn't yet
6405 activated -- it's possible to try pg_autoctl create again.
6406 pg_auto_failover will review its previous progress and repeat idempo‐
6407 tent operations (create database, create extension, etc.), gracefully
6408 handling errors.
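
       As a sketch, simply re-run the same creation command with the same
       arguments once the underlying problem is fixed; the paths, hostname,
       and options below are placeholders to adapt to your environment:

          # safe to run again after a failed first attempt
          $ pg_autoctl create postgres \
               --pgdata /var/lib/postgres/node_a \
               --hostname node-a.example.com \
               --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover' \
               --auth trust --ssl-self-signed --run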
6409
6411 The following questions have been asked by several people in GitHub
6412 issues for the project. If you have more questions, feel free to open a
6413 new issue, and your question and its answer might make it into this FAQ.
6414
6415 I stopped the primary and no failover is happening for 20s to 30s, why?
6416 In order to avoid spurious failovers when network connectivity is not
6417 stable, pg_auto_failover implements a timeout of 20s before acting on a
6418 node that is known to be unavailable. On top of that timeout, add the
6419 delay between health checks and the retry policy.
6420
6421 See the Configuring pg_auto_failover part for more information about
6422 how to setup the different delays and timeouts that are involved in the
6423 decision making.
6424
6425 See also pg_autoctl watch for a dashboard that helps in understanding
6426 the system and what is going on at the moment.
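
       To review the timeouts currently in effect, one approach is to list
       the extension's GUCs on the monitor and the local pg_autoctl
       configuration on a Postgres node; exact parameter names depend on
       your version, so treat the actual output as the reference:

          # on the monitor: list the pgautofailover GUCs and their values
          $ psql -d pg_auto_failover \
                -c "SELECT name, setting, unit FROM pg_settings WHERE name LIKE 'pgautofailover.%';"

          # on a Postgres node: dump the local pg_autoctl configuration
          $ pg_autoctl config get --pgdata /var/lib/postgres/node_a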
6427
6428 The secondary is blocked in the CATCHING_UP state, what should I do?
6429 In the pg_auto_failover design, the following two things are needed for
6430 the monitor to fully orchestrate the integration of a node:
6431
6432 1. Health Checks must be successful
6433
6434 The monitor runs periodic health checks with all the nodes regis‐
6435 tered in the system. Those health checks are Postgres connections
6436 from the monitor to the registered Postgres nodes, and use the
6437 hostname and port as registered.
6438
6439 The Reachable column of the pg_autoctl show state command contains
6440 "yes" when the monitor could connect to a specific node, "no"
6441 when this connection failed, and "unknown" when no connection has
6442 been attempted yet since the last startup of the monitor.
6443
6444 The Reachable column of the pg_autoctl show state output must
6445 show a "yes" entry before a new standby node can be orchestrated
6446 up to the "secondary" goal state.
6447
6448 2. pg_autoctl service must be running
6449
6450 The pg_auto_failover monitor works by assigning goal states to
6451 individual Postgres nodes. The monitor will not assign a new goal
6452 state until the current one has been reached.
6453
6454 To implement a transition from the current state to the goal
6455 state assigned by the monitor, the pg_autoctl service must be
6456 running on every node.
6457
6458 When your new standby node stays in the "catchingup" state for a long
6459 time, please check that the node is reachable from the monitor, using
6460 the hostname and port known to the monitor, and check that the
6461 pg_autoctl run command is running for this node.
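
       A minimal checklist could look like the following; the systemd unit
       name, hostnames, and paths are assumptions, and pg_autoctl show
       systemd can be used to review the unit actually installed:

          # on the monitor: the Reachable column must say "yes" for the standby
          $ pg_autoctl show state --pgdata /var/lib/postgres/monitor

          # from the monitor host: is the standby reachable at its registered
          # hostname and port?
          $ pg_isready -h node-b.example.com -p 5432

          # on the standby: is the pg_autoctl service running?
          $ systemctl status pgautofailover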
6462
6463 When things are not obvious, the next step is to go read the logs. Both
6464 the output of the pg_autoctl command and the Postgres logs are rele‐
6465 vant. See the Should I read the logs? Where are the logs? question for
6466 details.
6467
6468 Should I read the logs? Where are the logs?
6469 Yes. If anything seems strange to you, please do read the logs.
6470
6471 As maintainers of the pg_autoctl tool, we can't foresee everything that
6472 may happen to your production environment. Still, a lot of effort is
6473 spent on producing meaningful output. So when you're in a situation
6474 that's hard to understand, please make sure to read the pg_autoctl logs
6475 and the Postgres logs.
6476
6477 When using systemd integration, the pg_autoctl logs are handled
6478 entirely by the systemd journal. Please refer to journalctl for
6479 viewing the logs.
6480
6481 The Postgres logs are to be found in the $PGDATA/log directory with the
6482 default configuration deployed by pg_autoctl create .... When a custom
6483 Postgres setup is used, please refer to your actual setup to find Post‐
6484 gres logs.
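
       As a sketch, assuming a systemd unit named pgautofailover and the
       default logging collector setup created by pg_autoctl:

          # pg_autoctl logs through the systemd journal
          $ journalctl -u pgautofailover -f

          # Postgres logs; file names depend on the log_filename setting
          $ tail -f "$PGDATA"/log/postgresql-*.log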
6485
6486 The state of the system is blocked, what should I do?
6487 This is a more general case, similar in nature to the previous
6488 situation reached when adding a new standby to a group of Postgres
6489 nodes. Please check the same two elements: that the monitor health
6490 checks are successful, and that the pg_autoctl run command is running.
6491
6492 When things are not obvious, the next step is to go read the logs. Both
6493 the output of the pg_autoctl command and the Postgres logs are rele‐
6494 vant. See the Should I read the logs? Where are the logs? question for
6495 details.
6496
6497 The monitor is a SPOF in pg_auto_failover design, how should we handle
6498 that?
6499 When using pg_auto_failover, the monitor is needed to make decisions
6500 and orchestrate changes in all the registered Postgres groups. Deci‐
6501 sions are transmitted to the Postgres nodes by the monitor assigning
6502 nodes a goal state which is different from their current state.
6503
6504 Consequences of the monitor being unavailable
6505 Nodes contact the monitor each second and call the node_active stored
6506 procedure, which returns a goal state that is possibly different from
6507 the current state.
6508
6509 The monitor only assigns a new goal state to a Postgres node when a
6510 cluster-wide operation is needed. In practice, only the following
6511 operations require the monitor to assign a new goal state to a Postgres
6512 node:
6513
6514 • a new node is registered
6515
6516 • a failover needs to happen, either triggered automatically or man‐
6517 ually
6518
6519 • a node is being put into maintenance
6520
6521 • a node replication setting is being changed.
6522
6523 When the monitor node is not available, the pg_autoctl processes on the
6524 Postgres nodes fail to contact the monitor every second and log this
6525 failure. Additionally, no orchestration is possible.
6526
6527 Postgres streaming replication does not need the monitor to be
6528 available in order to deliver its service guarantees to your
6529 application, so your Postgres service remains available even when the
6530 monitor is not.
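
       One way to see this independence is that the connection string handed
       to applications only lists the Postgres nodes, never the monitor; the
       formation name and --pgdata path below are assumptions:

          # prints a multi-host URI listing the Postgres nodes of the
          # formation, with target_session_attrs=read-write so that clients
          # always reach the current primary; the monitor is not part of it
          $ pg_autoctl show uri --formation default --pgdata /var/lib/postgres/node_a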
6531
6532 To repair your installation after having lost a monitor, the following
6533 scenarios are to be considered.
6534
6535 The monitor node can be brought up again without data having been lost
6536 This is typically the case in Cloud Native environments such as Kuber‐
6537 netes, where you could have a service migrated to another pod and
6538 re-attached to its disk volume. This scenario is well supported by
6539 pg_auto_failover, and no intervention is needed.
6540
6541 It is also possible to use synchronous archiving with the monitor, so
6542 that you can recover from the current archives and continue operating
6543 without intervention on the Postgres nodes, except for updating their
6544 monitor URI. This requires an archiving setup that uses synchronous
6545 replication, so that any transaction committed on the monitor is known
6546 to have been replicated to your WAL archive.
6547
6548 At the moment, you have to take care of that setup yourself. Here's a
6549 quick summary of what needs to be done:
6550
6551 1. Schedule base backups
6552
6553 Use pg_basebackup every once in a while to have a full copy of
6554 the monitor Postgres database available.
6555
6556 2. Archive WAL files in a synchronous fashion
6557
6558 Use pg_receivewal --synchronous ... as a service to keep a WAL
6559 archive in sync with the monitor Postgres instance at all times.
6560
6561 3. Prepare a recovery tool on top of your archiving strategy
6562
6563 Write a utility that knows how to create a new monitor node from
6564 your most recent pg_basebackup copy and the WAL files copy.
6565
6566 Bonus points if that tool/script is tested at least once a day,
6567 so that you avoid surprises on the unfortunate day that you actu‐
6568 ally need to use it in production.
6569
6570 A future version of pg_auto_failover will include this facility, but
6571 the current versions don't.
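
       A minimal sketch of such a setup follows; the hostname, role name,
       slot name, and paths are placeholders, and listing pg_receivewal in
       synchronous_standby_names on the monitor is an assumption about how
       the synchronous guarantee is obtained:

          # 1. take periodic base backups of the monitor Postgres instance
          $ pg_basebackup -h monitor.example.com -U replicator \
                -D /backups/monitor/base/$(date +%F) --format=tar --checkpoint=fast

          # 2. create a physical replication slot on the monitor, once
          $ psql -h monitor.example.com -U replicator -d pg_auto_failover \
                -c "SELECT pg_create_physical_replication_slot('monitor_archive');"

          # 3. run a synchronous WAL archiver as a long-lived service
          $ pg_receivewal -h monitor.example.com -U replicator \
                --slot monitor_archive --synchronous -D /backups/monitor/wal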
6572
6573 The monitor node can only be built from scratch again
6574 If you don't have synchronous archiving set up for the monitor, then
6575 you might not be able to restore a monitor database with the expected
6576 up-to-date node metadata. Specifically, we need the node states to be
6577 in sync with what each pg_autoctl process received the last time it
6578 could contact the monitor, before it became unavailable.
6579
6580 It is possible to register nodes that are currently running to a new
6581 monitor without restarting Postgres on the primary. For that, follow
6582 the procedure mentioned in Replacing the monitor online, using the
6583 following commands:
6584
6585 $ pg_autoctl disable monitor
6586 $ pg_autoctl enable monitor
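
       For reference, a fully spelled-out enable command could look like the
       following; the monitor hostname, the --pgdata path, and the sslmode
       parameter are placeholders to adapt, and the current primary of each
       group should be re-registered first, as described in Replacing the
       monitor online:

          $ pg_autoctl enable monitor --pgdata /var/lib/postgres/node_a \
                'postgres://autoctl_node@new-monitor.example.com:5432/pg_auto_failover?sslmode=require'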
6587
6589 Microsoft
6590
6592 Copyright (c) Microsoft Corporation. All rights reserved.
6593
6594
6595
6596
6597 1.6                          Aug 04, 2022              PG_AUTO_FAILOVER(1)