BOOTHD(8)                                                            BOOTHD(8)

NAME

       boothd - The Booth Cluster Ticket Manager.

SYNOPSIS

       boothd daemon [-SD] [-c config] [-l lockfile]

       booth list [-s site] [-c config]

       booth grant [-s site] [-c config] [-FCw] ticket

       booth revoke [-s site] [-c config] [-w] ticket

       booth peers [-s site] [-c config]

       booth status [-D] [-c config]

DESCRIPTION

       Booth manages tickets, each of which authorizes one of the
       geographically dispersed cluster sites to run certain resources. It
       is designed to extend Pacemaker to support geographically distributed
       clustering.

       It is based on the Raft protocol; see, e.g.,
       https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
       for details.

SHORT EXAMPLES

           # boothd daemon -D

           # booth list

           # booth grant ticket-nfs

           # booth revoke ticket-nfs

OPTIONS

       -c configfile
           Configuration to use.

           Can be a full path to a configuration file, or a short name; in the
           latter case, the directory /etc/booth and suffix .conf are added.
           By default, booth is used, which results in the path
           /etc/booth/booth.conf.

           The configuration name also determines the name of the PID file;
           for the defaults, this is /var/run/booth/booth.pid.

       -s site
           Site address or name.

               The special value 'other' can be used to specify the other
               site. In that case, the booth configuration must have exactly
               two sites defined.

       -F
           immediate grant: Don’t wait for unreachable sites to relinquish the
           ticket. See the BOOTH TICKET MANAGEMENT section below for more
           details.

               This option may be DANGEROUS. It makes booth grant the ticket
               even though it cannot ascertain that unreachable sites don't
               hold the same ticket. It is up to the user to make sure that
               unreachable sites don't have this ticket as granted.

       -w
           wait for the request outcome: The client waits for the final
           outcome of a grant or revoke request.

       -C
           wait for ticket commit to CIB: The client waits for the ticket
           commit to CIB (only for grant requests). If one or more sites are
           unreachable, this takes the ticket expire time (plus, if defined,
           the acquire-after time).

       -h, --help
           Give a short usage output.

       --version
           Report version information.

       -S
           systemd mode: don’t fork. This is like -D but without the debug
           output.

       -D
           Debug output/don’t daemonize. Increases the debug output level;
           the booth daemon remains in the foreground.

       -l lockfile
           Use another lock file. By default, the lock file name is inferred
           from the configuration file name. Normally not needed.

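       As an illustration of the -c and -l rules above, the following shell
       sketch maps a configuration name to its file and PID file paths. It is
       not booth's own code: the function names are hypothetical, it assumes
       that any value containing a slash is a path, and it assumes that the
       PID file is named after the configuration (generalized from the
       documented default /var/run/booth/booth.pid).

```shell
# Sketch only; booth_conf_path and booth_pid_path are hypothetical helpers.

# Resolve the argument of -c to a configuration file path.
# Assumption: a value containing "/" is a path, anything else a short name.
booth_conf_path() {
    name=${1:-booth}                                  # documented default name
    case $name in
        */*) printf '%s\n' "$name" ;;                 # already a path
        *)   printf '/etc/booth/%s.conf\n' "$name" ;; # add directory and suffix
    esac
}

# Derive the PID file from the configuration name.
# Assumption: /var/run/booth/<name>.pid for every configuration name.
booth_pid_path() {
    name=$(basename "$(booth_conf_path "${1:-booth}")" .conf)
    printf '/var/run/booth/%s.pid\n' "$name"
}
```

       Called without an argument, booth_conf_path prints
       /etc/booth/booth.conf and booth_pid_path prints
       /var/run/booth/booth.pid, matching the defaults given above.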

COMMANDS

       Whether the binary is called as boothd or booth doesn’t matter; the
       first argument determines the mode of operation.

       daemon
           Tells boothd to serve a site. The locally configured interfaces are
           searched for an IP address that is defined in the configuration.
           booth then runs in either arbitrator or site mode.

       client
           Booth clients can list the ticket information (see also crm_ticket
           -L), and revoke or grant tickets to a site.

           The grant and, under certain circumstances, revoke operations may
           take a while to return a definite outcome. The client waits up to
           the network timeout value (by default 5 seconds) for the result,
           unless the -w option is set, in which case the client waits
           indefinitely.

           In this mode the configuration file is searched for an IP address
           that is locally reachable, i.e. matches a configured subnet. This
           allows running the client commands on another node in the same
           cluster, as long as the config file and the service IP are locally
           reachable.

           For instance, if the booth service IP is 192.168.55.200, and the
           local node has 192.168.55.15 configured on one of its network
           interfaces, it knows which site it belongs to.

           Use -s to direct the client to connect to a different site.

       status
           boothd looks for the (locked) PID file and the UDP socket, prints
           some output to stdout (for use in shell scripts) and returns an
           OCF-compatible return code. With -D, a human-readable message is
           printed to stderr as well.

       peers
           List the other boothd servers we know about.

           In addition to the type, name (IP address), and the last time the
           server was heard from, network statistics are also printed. The
           statistics are split into two rows: the first consists of counters
           for the sent packets and the second of counters for the received
           packets. The first counter is the total number of packets;
           descriptions of the other counters follow:

       resends
           Packets which had to be resent because the recipient didn’t
           acknowledge a message. This usually means that either the message
           or the acknowledgement got lost. The number of resends usually
           reflects the network reliability.

       error
           Packets which either couldn’t be sent, got truncated, or were badly
           formed. Should be zero.

       invalid
           Packets which contain an invalid or non-existing ticket name, or
           refer to a non-existing ticket leader. Should be zero.

       authfail
           Packets which couldn’t be authenticated. Should be zero.

CONFIGURATION FILE

       The configuration file must be identical on all sites and arbitrators.

       A minimal file may look like this:

           site="192.168.201.100"
           site="192.168.202.100"
           arbitrator="192.168.203.100"
           ticket="ticket-db8"

       Comments start with a hash sign ('#'). Whitespace at the start and end
       of the line, and around the '=', is ignored.

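       The comment and whitespace rules can be illustrated with a small shell
       function that normalizes one configuration line into key|value form.
       This is a sketch, not booth's parser: the function name is
       hypothetical, stripping the surrounding double quotes is a
       convenience of the sketch, and it assumes that a '#' never occurs
       inside a value.

```shell
# Sketch only; parse_booth_line is a hypothetical helper.
# Prints "key|value" for a key/value line, returns 1 for anything else.
parse_booth_line() {
    # Drop a comment, then leading and trailing whitespace.
    line=$(printf '%s\n' "$1" |
        sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
    case $line in
        *=*) ;;            # looks like key = value
        *)   return 1 ;;   # blank, comment-only, or not a key/value line
    esac
    # Key: everything before the first '=', without trailing whitespace.
    key=$(printf '%s\n' "$line" | sed -e 's/[[:space:]]*=.*$//')
    # Value: everything after the first '=', without leading whitespace
    # and without one pair of surrounding double quotes.
    value=$(printf '%s\n' "$line" |
        sed -e 's/^[^=]*=[[:space:]]*//' -e 's/^"\(.*\)"$/\1/')
    printf '%s|%s\n' "$key" "$value"
}
```

       For example, parse_booth_line '  expire   = 600  ' prints expire|600,
       and a line holding only a comment produces no output.
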
       The following key/value pairs are defined:

       port
           The UDP/TCP port to use. Default is 9929.

       transport
           The transport protocol to use for Raft exchanges. Currently only
           UDP is supported.

           Clients use TCP to communicate with a daemon; Booth will always
           bind and listen to both UDP and TCP ports.

       authfile
           File containing the authentication key. The key can be either
           binary or text. If it is text, leading and trailing white space,
           including newlines, is ignored. This key is a shared secret and is
           used to authenticate both clients and servers. The key must be
           between 8 and 64 characters long and be readable only by the file
           owner.

       maxtimeskew
           As protection against replay attacks, packets contain generation
           timestamps. Such a timestamp is not allowed to be too old. Just how
           old can be specified with this parameter. The value is in seconds
           and the default is 600 (10 minutes). If clocks vary by more than
           this default between sites and nodes (which is definitely something
           you should fix), set this parameter to a higher value. The time
           skew test is performed only in concert with authentication.

       site
           Defines a site Raft member with the given IP. Sites can acquire
           tickets. Each site's IP should be managed by the cluster.

       arbitrator
           Defines an arbitrator Raft member with the given IP. Arbitrators
           help reach consensus in elections and cannot hold tickets.

       Booth needs at least three members for normal operation. An odd number
       of members provides more redundancy.

       site-user, site-group, arbitrator-user, arbitrator-group
           These define the credentials boothd will be running with.

           On a (Pacemaker) site the booth process will have to call
           crm_ticket, so the default is to use hacluster:haclient; for an
           arbitrator this user and group might not exist, so there we
           default to nobody:nobody.

       ticket
           Registers a ticket. Multiple tickets can be handled by a single
           Booth instance.

           Use the special ticket name defaults to modify the defaults. The
           defaults stanza must precede all the other ticket specifications.

       All times are in seconds.

       expire
           The lease time for a ticket. After that time the ticket can be
           acquired by another site if the ticket holder is not reachable.

           The default is 600.

       acquire-after
           Once a ticket is lost, wait this additional time before acquiring
           the ticket.

           This is to allow for the site that lost the ticket to relinquish
           the resources, by either stopping them or fencing a node.

           A typical delay might be 60 seconds, but ultimately it depends on
           the protected resources and the fencing configuration.

           The default is 0.

       renewal-freq
           Sets the ticket renewal period.

           If the network reliability is often reduced over prolonged periods,
           it is advisable to try to renew more often.

           Before every renewal, if defined, the command or commands specified
           in before-acquire-handler are run. In that case the renewal-freq
           parameter is effectively also the local cluster monitoring
           interval.

       timeout
           After this time booth will re-send packets if there was an
           insufficient number of replies. This should be long enough to allow
           packets to reach other members.

           The default is 5.

       retries
           Defines how many times to retry sending packets before giving up
           waiting for acks from other members.

           Default is 10. Values lower than 3 are illegal.

           Ticket renewals should allow for this number of retries. Hence, the
           total retry time must be shorter than the renewal time (either half
           the expire time or renewal-freq):

               timeout*(retries+1) < renewal

       weights
           A comma-separated list of integers that define the weight of
           individual Raft members, in the same order as the site and
           arbitrator lines.

           Default is 0 for all; this means that the order in the
           configuration file defines priority for conflicting requests.

       before-acquire-handler
           If set, this parameter specifies either a file containing a program
           to be run or a directory where a number of programs can reside.
           They are invoked before boothd tries to acquire or renew a ticket.
           If any of them exits with a code other than 0, boothd relinquishes
           the ticket.

           This makes it possible to check whether the services protected by
           the ticket, and their dependencies, are in good shape at this
           site. For instance, if a service in the dependency chain has a
           failcount of INFINITY on all available nodes, the service will be
           unable to run. In that case, it is of no use to claim the ticket.

           One or more arguments may follow the program or directory location.
           Typically, there is at least the name of one of the resources which
           depend on this ticket.

           See the HANDLERS section below for details about booth-specific
           environment variables. The distributed service-runnable script is
           an example which may be used to test whether a Pacemaker resource
           can be started.

       attr-prereq
           Sites can have GEO attributes managed with the geostore(8) program.
           Attributes are within a ticket’s scope and may be tested by boothd
           for additional control of ticket failover (automatic) or ticket
           acquire (manual).

           Attributes are typically used to convey extra information about
           resources, for instance database replication status. The attributes
           are commonly updated by resource agents.

           Attribute values are referenced in expressions and may be tested
           for equality with the eq binary operator or inequality with the ne
           operator. The usage is as follows:

               attr-prereq = <grant_type> <name> <op> <value>

               <grant_type>: "auto" | "manual"
               <name>:       attribute name
               <op>:         "eq" | "ne"
               <value>:      attribute value

           The two grant types are auto for ticket failover and manual for
           grants using the booth client. Only in case the expression
           evaluates to true can the ticket be granted.

           It is not clear whether the manual grant type has any practical use
           because this operation is controlled by a human anyway.

           Note that there can be no guarantee on whether an attribute value
           is up to date, i.e. if it actually reflects the current state.

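       The constraint given under retries, timeout*(retries+1) < renewal, can
       be checked with plain shell arithmetic before a configuration is
       deployed. The function below is a sketch with a hypothetical name; it
       takes renewal to be renewal-freq when that is given and half the
       expire time otherwise, as described in this manual. Whether and how
       boothd itself enforces the inequality is not covered here.

```shell
# Sketch only; retry_budget_ok is a hypothetical helper.
# Usage: retry_budget_ok <timeout> <retries> <expire> [renewal-freq]
# Succeeds when timeout*(retries+1) < renewal.
retry_budget_ok() {
    timeout=$1
    retries=$2
    expire=$3
    renewal_freq=${4:-}
    if [ -n "$renewal_freq" ]; then
        renewal=$renewal_freq          # explicitly configured
    else
        renewal=$((expire / 2))        # half the expire time
    fi
    total=$((timeout * (retries + 1)))
    echo "total retry time ${total}s, renewal ${renewal}s"
    [ "$total" -lt "$renewal" ]
}

retry_budget_ok 5 10 600       # documented defaults: 55s against 300s, passes
retry_budget_ok 10 10 120      # 110s against 60s, fails
retry_budget_ok 10 5 600 60    # 60s against 60s: equal, so the strict test fails
```

       The third call shows that the inequality is strict: a total retry
       time equal to the renewal time does not satisfy it.
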
       One example of a booth configuration file:

           transport = udp
           port = 9930

           # D-85774
           site="192.168.201.100"
           # D-90409
           site="::ffff:192.168.202.100"
           # A-1120
           arbitrator="192.168.203.100"

           ticket="ticket-db8"
               expire        = 600
               acquire-after = 60
               timeout       = 10
               retries       = 5
               renewal-freq  = 60
               before-acquire-handler = /usr/share/booth/service-runnable db8
               attr-prereq = auto repl_state eq ACTIVE

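       To make the meaning of an expression such as repl_state eq ACTIVE
       concrete, the following shell sketch evaluates the eq and ne operators
       against a value supplied by the caller. The function name is
       hypothetical, and how the current attribute value is obtained (for
       example with geostore(8)) is outside the sketch.

```shell
# Sketch only; attr_prereq_ok is a hypothetical helper.
# Usage: attr_prereq_ok <op> <wanted value> <current value>
# Succeeds when the expression holds, i.e. when a grant would be allowed.
attr_prereq_ok() {
    op=$1
    wanted=$2
    current=$3
    case $op in
        eq) [ "$current" = "$wanted" ] ;;
        ne) [ "$current" != "$wanted" ] ;;
        *)  echo "unknown operator: $op" >&2
            return 2 ;;
    esac
}
```

       With the example line above, attr_prereq_ok eq ACTIVE ACTIVE succeeds,
       while attr_prereq_ok eq ACTIVE SYNCING fails (SYNCING is an invented
       value used only for illustration).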

BOOTH TICKET MANAGEMENT

       The booth cluster guarantees that every ticket is owned by only one
       site at a time.

       Tickets must be initially granted with the booth client grant command.
       Once granted, the ticket is managed by the booth cluster. Hence, only
       granted tickets are managed by booth.

       If the ticket gets lost, i.e. the other members of the booth cluster
       have not heard from the ticket owner for a sufficiently long time, one
       of the remaining sites will acquire the ticket. This is what is called
       ticket failover.

       If the remaining members cannot form a majority, then the ticket cannot
       fail over.

       A ticket may be revoked at any time with the booth client revoke
       command. For revoke to succeed, the site holding the ticket must be
       reachable.

       Once the ticket is administratively revoked, it is not managed by the
       booth cluster anymore. For the booth cluster to start managing the
       ticket again, it must be granted to a site again.

       The grant operation, in case not all sites are reachable, may be
       delayed by the ticket expire time (and, if defined, the acquire-after
       time). The reason is that the other booth members may not know if the
       ticket is currently granted at the unreachable site.

       This delay may be disabled with the -F option. In that case, it is up
       to the administrator to make sure that the unreachable site is not
       holding the ticket.

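       The longest delay described above follows directly from the two
       ticket settings. A minimal sketch in shell arithmetic, using a
       hypothetical function name and the documented defaults (600 for
       expire, 0 for acquire-after) as fallbacks:

```shell
# Sketch only; worst_case_grant_delay is a hypothetical helper.
# Usage: worst_case_grant_delay [expire] [acquire-after]
# Prints the longest wait, in seconds, for a grant without -F
# while a site is unreachable.
worst_case_grant_delay() {
    expire=${1:-600}          # documented default
    acquire_after=${2:-0}     # documented default
    echo $((expire + acquire_after))
}

worst_case_grant_delay 600 60   # prints 660
```
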
       When the ticket is managed by booth, it is dangerous to modify it
       manually using either the crm_ticket command or crm site ticket.
       Neither of these tools is aware of booth and, consequently, booth
       itself may not be aware of any ticket status changes. A notable
       exception is setting the ticket to standby, which is typically done
       before a planned failover.

NOTES

       Tickets are not meant to be moved around quickly; the default expire
       time is 600 seconds (10 minutes).

       booth works with both IPv4 and IPv6 addresses.

       booth renews a ticket before it expires, to account for possible
       transmission delays. The renewal time, unless explicitly set, is set to
       half the expire time.

HANDLERS

       Currently, there’s only one external handler defined (see the
       before-acquire-handler configuration item above).

       The following environment variables are exported to the handler:

       BOOTH_TICKET
           The ticket name, as given in the configuration file. (See ticket
           item above.)

       BOOTH_LOCAL
           The local site name, as defined in site.

       BOOTH_CONF_PATH
           The path to the active configuration file.

       BOOTH_CONF_NAME
           The configuration name, as used by the -c command-line argument.

       BOOTH_TICKET_EXPIRES
           When the ticket expires (in seconds since 1 January 1970), or 0.

       The handler is invoked with the positional arguments specified after
       it in the configuration.

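       A handler can be as small as the following shell sketch, which reports
       the exported variables and then runs a health check on the resource
       named by its first argument. The function name and the HEALTH_CHECK
       hook are hypothetical and exist only to keep the sketch
       self-contained; a real handler would put its own test in that place.

```shell
# Sketch only; handler and HEALTH_CHECK are hypothetical names.
# Usage: handler <resource>
handler() {
    resource=${1:-}
    if [ -z "$resource" ]; then
        echo "usage: handler <resource>" >&2
        return 2
    fi
    # Report what boothd exported; unset variables show as "?".
    echo "ticket=${BOOTH_TICKET:-?} site=${BOOTH_LOCAL:-?} conf=${BOOTH_CONF_NAME:-?} expires=${BOOTH_TICKET_EXPIRES:-0} resource=$resource"
    # The status of the last command becomes the handler's status.
    # Any status other than 0 makes boothd relinquish the ticket.
    ${HEALTH_CHECK:-true} "$resource"
}

# As a standalone script, finish with:  handler "$@"
```

       If such a script is referenced as before-acquire-handler =
       /path/to/script db8 (the path is a placeholder), the string db8
       arrives as the first positional argument.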

FILES

       /etc/booth/booth.conf
           The default configuration file name. See also the -c argument.

       /etc/booth/authkey
           There is no default, but this is a typical location for the shared
           secret (authentication key).

       /var/run/booth/
           Directory that holds PID/lock files. See also the status command.

RAFT IMPLEMENTATION

       In essence, every ticket corresponds to a separate Raft cluster.

       A ticket is granted to the Raft Leader which then owns (or keeps) the
       ticket.

ARBITRATOR MANAGEMENT

       The booth daemon for an arbitrator, which typically doesn’t run the
       cluster stack, may be started through systemd or with
       /etc/init.d/booth-arbitrator, depending on which init system the
       platform supports.

       The SysV init script starts a booth arbitrator for every configuration
       file found in /etc/booth.

       Platforms running systemd can enable and start every configuration
       separately using systemctl:

           # systemctl enable booth@<configurationname>
           # systemctl start  booth@<configurationname>

       systemctl requires the configuration name, even for the default name
       booth.

EXIT STATUS

       0
           Success. For the status command: Daemon running.

       1 (PCMK_OCF_UNKNOWN_ERROR)
           General error code.

       7 (PCMK_OCF_NOT_RUNNING)
           No daemon process is active for that configuration.

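       In a shell script, the status codes listed above can be mapped to
       messages as in the sketch below. The function name and the message
       texts are hypothetical; only the numeric values come from this
       manual.

```shell
# Sketch only; describe_booth_status is a hypothetical helper.
# Usage: describe_booth_status <exit status of "booth status">
describe_booth_status() {
    case $1 in
        0) echo "daemon running" ;;
        7) echo "no daemon for this configuration" ;;
        *) echo "error (exit status $1)" ;;
    esac
}

# Where booth is installed:
#   booth status -c booth >/dev/null; describe_booth_status $?
```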

BUGS

       Booth is tested regularly. See the README-testing file for more
       information.

       Please report any bugs either at GitHub:
       https://github.com/ClusterLabs/booth/issues

       Or, if you prefer bugzilla, at openSUSE bugzilla (component "High
       Availability"):
       https://bugzilla.opensuse.org/enter_bug.cgi?product=openSUSE%20Factory

AUTHOR

       boothd was originally written (mostly) by Jiaju Zhang.

       In 2013 and 2014 Philipp Marek took over maintainership.

       Since April 2014 it has been mainly developed by Dejan Muhamedagic.

       Many people contributed (see the AUTHORS file).

RESOURCES

       GitHub: https://github.com/ClusterLabs/booth

       Documentation:
       http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.geo.html

COPYING

       Copyright © 2011 Jiaju Zhang <jjzhang@suse.de>

       Copyright © 2013-2014 Philipp Marek <philipp.marek@linbit.com>

       Copyright © 2014 Dejan Muhamedagic <dmuhamedagic@suse.com>

       Free use of this software is granted under the terms of the GNU General
       Public License (GPL) as of version 2 (see COPYING file) or later.



                                  10/30/2018                         BOOTHD(8)