BOOTHD(8)                                                            BOOTHD(8)

NAME

6       boothd - The Booth Cluster Ticket Manager.
7

SYNOPSIS

9       boothd daemon [-SD] [-c config] [-l lockfile]
10
11       booth list [-s site] [-c config]
12
13       booth grant [-s site] [-c config] [-FCw] ticket
14
15       booth revoke [-s site] [-c config] [-w] ticket
16
17       booth peers [-s site] [-c config]
18
19       booth status [-D] [-c config]
20

DESCRIPTION

       Booth manages tickets, each of which authorizes one of the
       geographically dispersed cluster sites to run certain resources. It is
       designed to extend Pacemaker to support geographically distributed
       clustering.

       It is based on the Raft protocol; see e.g.
       <https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf>
       for details.
30

SHORT EXAMPLES

32           # boothd daemon -D
33
34           # booth list
35
36           # booth grant ticket-nfs
37
38           # booth revoke ticket-nfs
39

OPTIONS

41       -c configfile
42           Configuration to use.
43
           Can be a full path to a configuration file, or a short name; in the
           latter case, the directory /etc/booth and the suffix .conf are
           added. By default, booth is used, which results in the path
           /etc/booth/booth.conf.

           The configuration name also determines the name of the PID file;
           for the defaults, this is /var/run/booth/booth.pid.
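
           The name resolution described above can be sketched in Python as
           follows. This is an illustration only, not booth's own code: the
           helper name resolve_config is hypothetical, treating any argument
           that contains a slash as a full path is an assumption, and so is
           deriving the PID file name of a full path from the file's base
           name.

```python
CONF_DIR = "/etc/booth"      # directory added to short names
RUN_DIR = "/var/run/booth"   # directory holding PID/lock files

def resolve_config(arg="booth"):
    """Return (config_path, pid_path) for a -c argument.

    Assumption: an argument containing "/" is a full path; anything
    else is a short name that gets CONF_DIR and ".conf" added.
    """
    if "/" in arg:
        conf_path = arg
        base = arg.rsplit("/", 1)[1]
        # Assumption: the configuration name is the base name without ".conf".
        name = base[:-len(".conf")] if base.endswith(".conf") else base
    else:
        name = arg
        conf_path = f"{CONF_DIR}/{name}.conf"
    return conf_path, f"{RUN_DIR}/{name}.pid"

print(resolve_config())         # the default configuration
print(resolve_config("geo1"))   # "geo1" is a hypothetical short name
```

           With no argument the sketch yields the documented defaults,
           /etc/booth/booth.conf and /var/run/booth/booth.pid.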
51
       -s site
           Site address or name.

               The special value 'other' can be used to specify the other
               site. In that case, the booth configuration must have exactly
               two sites defined.
58
       -F
           Immediate grant: don’t wait for unreachable sites to relinquish the
           ticket. See the Booth ticket management section below for more
           details. For manual tickets, this option allows granting a ticket
           that is currently granted to another site. See the Manual tickets
           section below for more details.

               This option may be DANGEROUS. It makes booth grant the ticket
               even though it cannot ascertain that unreachable sites don't
               hold the same ticket. It is up to the user to make sure that
               unreachable sites don't have this ticket as granted.
70
       -w
           Wait for the request outcome: the client waits for the final
           outcome of a grant or revoke request.

       -C
           Wait for ticket commit to CIB: the client waits for the ticket to
           be committed to the CIB (only for grant requests). If one or more
           sites are unreachable, this takes the ticket expire time (plus, if
           defined, the acquire-after time).
80
81       -h, --help
82           Give a short usage output.
83
84       --version
85           Report version information.
86
       -S
           systemd mode: don’t fork. Disables daemonizing; the process
           remains in the foreground.
90
91       -D
92           Increases the debug output level.
93
94       -l lockfile
95           Use  another  lock file. By default, the lock file name is inferred
96           from the configuration file name. Normally not needed.
97

COMMANDS

       Whether the binary is called as boothd or booth doesn’t matter; the
       first argument determines the mode of operation.

       daemon
           Tells boothd to serve a site. The locally configured interfaces are
           searched for an IP address that is defined in the configuration.
           booth then runs in either arbitrator or site mode.
106
107       client
108           Booth  clients can list the ticket information (see also crm_ticket
109           -L), and revoke or grant tickets to a site.
110
           The grant and, under certain circumstances, revoke operations may
           take a while to return a definite outcome. The client waits up to
           the network timeout value (by default 5 seconds) for the result,
           unless the -w option is set, in which case the client waits
           indefinitely.

           In this mode the configuration file is searched for an IP address
           that is locally reachable, i.e. matches a configured subnet. This
           allows running the client commands on another node in the same
           cluster, as long as the config file and the service IP are locally
           reachable.
122
123           For instance, if the booth service IP is  192.168.55.200,  and  the
124           local  node  has  192.168.55.15  configured  on  one of its network
125           interfaces, it knows which site it belongs to.
126
127           Use -s to direct client to connect to a different site.
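
           The subnet match described above can be illustrated with Python's
           ipaddress module. This is only a sketch of the idea, not booth's
           implementation: the function name find_local_site is hypothetical,
           and the /24 prefix length and the second site address are assumed
           for the sake of the example.

```python
import ipaddress

def find_local_site(site_ips, local_interfaces):
    """Return the first configured site IP that lies in a local subnet.

    site_ips: site addresses taken from the booth configuration.
    local_interfaces: local addresses with prefix length,
    e.g. "192.168.55.15/24".
    """
    networks = [ipaddress.ip_interface(i).network for i in local_interfaces]
    for site in site_ips:
        addr = ipaddress.ip_address(site)
        if any(addr in net for net in networks):
            return site
    return None  # no site is locally reachable; -s would be needed

# 192.168.56.200 is a made-up second site; /24 is an assumed prefix length.
sites = ["192.168.56.200", "192.168.55.200"]
print(find_local_site(sites, ["192.168.55.15/24"]))
```

           For the addresses used in the text, the node with 192.168.55.15
           selects the site whose service IP is 192.168.55.200.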
128
129       status
130           boothd looks for the (locked) PID file and the UDP  socket,  prints
131           some  output  to  stdout  (for use in shell scripts) and returns an
132           OCF-compatible return code. With -D, a  human-readable  message  is
133           printed to STDERR as well.
134
135       peers
136           List the other boothd servers we know about.
137
           In addition to the type, name (IP address), and the last time the
           server was heard from, network statistics are also printed. The
           statistics are split into two rows: the first consists of counters
           for sent packets and the second of counters for received packets.
           The first counter is the total number of packets; descriptions of
           the other counters follow:

       resends
           Packets which had to be resent because the recipient didn’t
           acknowledge a message. This usually means that either the message
           or the acknowledgement got lost. The number of resends usually
           reflects the network reliability.
150
151       error
152           Packets which either couldn’t be sent, got truncated, or were badly
153           formed. Should be zero.
154
       invalid
           Packets which contain an invalid or non-existent ticket name or
           refer to a non-existent ticket leader. Should be zero.
158
159       authfail
160           Packets which couldn’t be authenticated. Should be zero.
161

CONFIGURATION FILE

163       The configuration file must be identical on all sites and arbitrators.
164
165       A minimal file may look like this:
166
167           site="192.168.201.100"
168           site="192.168.202.100"
169           arbitrator="192.168.203.100"
170           ticket="ticket-db8"
171
       Comments start with a hash sign ('#'). Whitespace at the start and end
       of the line, and around the '=', is ignored.
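
       The syntax rules above are simple enough to sketch as a small Python
       parser. This is not booth's parser: the function name parse_booth_conf
       is hypothetical, only whole-line comments are handled, and stripping
       the surrounding double quotes from values is an assumption.

```python
def parse_booth_conf(text):
    """Return a list of (key, value) pairs; keys such as site may repeat."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()                 # whitespace at start/end is ignored
        if not line or line.startswith("#"):
            continue                       # blank line or comment
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key/value line: {raw!r}")
        value = value.strip()              # whitespace around '=' is ignored
        # Assumption: surrounding double quotes are not part of the value.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        pairs.append((key.strip(), value))
    return pairs

minimal = '''
site="192.168.201.100"
site="192.168.202.100"
arbitrator="192.168.203.100"
ticket="ticket-db8"
'''
print(parse_booth_conf(minimal))
```

       A list of pairs is used rather than a dictionary because keys such as
       site occur more than once and their order matters.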
174
175       The following key/value pairs are defined:
176
177       port
178           The UDP/TCP port to use. Default is 9929.
179
180       transport
181           The transport protocol to use for Raft  exchanges.  Currently  only
182           UDP is supported.
183
184           Clients  use  TCP  to  communicate with a daemon; Booth will always
185           bind and listen to both UDP and TCP ports.
186
       authfile
           File containing the authentication key. The key can be either
           binary or text. If the latter, then both leading and trailing
           whitespace, including newlines, is ignored. This key is a shared
           secret and is used to authenticate both clients and servers. The
           key must be between 8 and 64 characters long, and the file must be
           readable only by the file owner.
194
195       maxtimeskew
196           As  protection  against  replay attacks, packets contain generation
197           timestamps. Such a timestamp is not allowed to be too old. Just how
198           old  can  be specified with this parameter. The value is in seconds
199           and the default is 600 (10 minutes). If clocks vary more than  this
200           default  between sites and nodes (which is definitely something you
201           should fix) then set this parameter to a  higher  value.  The  time
202           skew test is performed only in concert with authentication.
203
204       site
205           Defines  a  site  Raft  member with the given IP. Sites can acquire
206           tickets. The sites' IP should be managed by the cluster.
207
208       arbitrator
209           Defines an arbitrator Raft member with the  given  IP.  Arbitrators
210           help reach consensus in elections and cannot hold tickets.
211
       Booth needs at least three members for normal operation. An odd number
       of members provides more redundancy.

       site-user, site-group, arbitrator-user, arbitrator-group
           These define the credentials boothd will be running with.

           On a (Pacemaker) site the booth process will have to call
           crm_ticket, so the default is to use hacluster:haclient; for an
           arbitrator this user and group might not exist, so there we
           default to nobody:nobody.
222
223       ticket
           Registers a ticket. Multiple tickets can be handled by a single
           Booth instance.
226
227           Use the special ticket name defaults to modify  the  defaults.  The
228           defaults stanza must precede all the other ticket specifications.
229
230       All times are in seconds.
231
232       expire
233           The  lease  time  for  a  ticket. After that time the ticket can be
234           acquired by another site if the ticket holder is not reachable.
235
236           The default is 600.
237
238       acquire-after
           Once a ticket is lost, wait this additional time before acquiring
           the ticket.

           This allows the site that lost the ticket to relinquish the
           resources, by either stopping them or fencing a node.
244
245           A typical delay might be 60 seconds, but ultimately it  depends  on
246           the protected resources and the fencing configuration.
247
248           The default is 0.
249
       renewal-freq
           Set the ticket renewal period.

           If network reliability is often reduced over prolonged periods,
           it is advisable to renew more often.

           Before every renewal, if defined, the command or commands specified
           in before-acquire-handler are run. In that case the renewal-freq
           parameter is effectively also the local cluster monitoring
           interval.
260
261       timeout
262           After  that  time  booth  will  re-send  packets  if  there  was an
263           insufficient number of replies. This should be long enough to allow
264           packets to reach other members.
265
266           The default is 5.
267
268       retries
269           Defines  how  many  times to retry sending packets before giving up
270           waiting for acks from other members.
271
272           Default is 10. Values lower than 3 are illegal.
273
274           Ticket renewals should allow for this number of retries. Hence, the
275           total retry time must be shorter than the renewal time (either half
276           the expire time or renewal-freq):
277
278               timeout*(retries+1) < renewal
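
           The inequality above is easy to check before deploying a
           configuration. The following Python sketch does the arithmetic;
           the function names are hypothetical, and the default arguments are
           the defaults documented in this page (timeout 5, retries 10,
           expire 600, renewal at half the expire time).

```python
def renewal_interval(expire=600, renewal_freq=None):
    """Renewal time: renewal-freq if set, otherwise half the expire time."""
    return renewal_freq if renewal_freq is not None else expire / 2

def retries_fit(timeout=5, retries=10, expire=600, renewal_freq=None):
    """True if timeout*(retries+1) < renewal holds."""
    return timeout * (retries + 1) < renewal_interval(expire, renewal_freq)

print(retries_fit())                                         # documented defaults
print(retries_fit(timeout=10, retries=5, renewal_freq=60))   # explicit values
```

           With the defaults the total retry time is 5*(10+1) = 55 seconds
           against a renewal time of 300 seconds, so the condition holds.
           With timeout = 10, retries = 5 and renewal-freq = 60 the product
           is exactly 60, which is not strictly less than the renewal time.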
279
280       weights
281           A comma-separated list  of  integers  that  define  the  weight  of
282           individual  Raft  members,  in  the  same  order  as  the  site and
283           arbitrator lines.
284
285           Default  is  0  for  all;  this  means  that  the  order   in   the
286           configuration file defines priority for conflicting requests.
287
288       before-acquire-handler
289           If set, this parameter specifies either a file containing a program
290           to be run or a directory where a number  of  programs  can  reside.
291           They  are invoked before boothd tries to acquire or renew a ticket.
292           If any of them exits with a code other than 0, boothd  relinquishes
293           the ticket.
294
           Thus it is possible to check whether the services protected by the
           ticket, and their dependencies, are in good shape at this site.
           For instance, if a service in the dependency chain has a failcount
           of INFINITY on all available nodes, the service will be unable to
           run. In that case, it is of no use to claim the ticket.
300
301           One or more arguments may follow the program or directory location.
302           Typically, there is at least the name of one of the resources which
303           depend on this ticket.
304
305           See  below  for details about booth specific environment variables.
306           The distributed service-runnable script is an example which may  be
307           used to test whether a pacemaker resource can be started.
308
309       attr-prereq
310           Sites can have GEO attributes managed with the geostore(8) program.
311           Attributes are within ticket’s scope and may be  tested  by  boothd
312           for  additional  control  of  ticket failover (automatic) or ticket
313           acquire (manual).
314
315           Attributes are typically used to  convey  extra  information  about
316           resources, for instance database replication status. The attributes
317           are commonly updated by resource agents.
318
319           Attribute values are referenced in expressions and  may  be  tested
320           for  equality with the eq binary operator or inequality with the ne
321           operator. The usage is as follows:
322
323               attr-prereq = <grant_type> <name> <op> <value>
324
325               <grant_type>: "auto" | "manual"
326               <name>:       attribute name
327               <op>:         "eq" | "ne"
328               <value>:      attribute value
329
           The two grant types are auto for ticket failover and manual for
           grants using the booth client. The ticket can be granted only if
           the expression evaluates to true.

           It is not clear whether the manual grant type has any practical
           use, because this operation is controlled by a human anyway.
336
337           Note  that  there can be no guarantee on whether an attribute value
338           is up to date, i.e. if it actually reflects the current state.
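
           To make the expression syntax concrete, here is a small Python
           sketch that parses an attr-prereq value and evaluates it against a
           dictionary of attributes. It is not how boothd is implemented. The
           function names are hypothetical, and two behaviours are
           assumptions: a prerequisite of the other grant type is treated as
           not applicable, and a missing attribute compares as None.

```python
import operator

OPS = {"eq": operator.eq, "ne": operator.ne}

def parse_attr_prereq(value):
    """Split '<grant_type> <name> <op> <value>' into its four fields."""
    grant_type, name, op, expected = value.split(None, 3)
    if grant_type not in ("auto", "manual"):
        raise ValueError(f"unknown grant type: {grant_type}")
    if op not in OPS:
        raise ValueError(f"unknown operator: {op}")
    return grant_type, name, op, expected

def prereq_allows(value, grant_type, attributes):
    """True if the ticket may be granted for this kind of grant.

    Assumptions: a prerequisite of the other grant type does not apply,
    and a missing attribute compares as None (so eq fails, ne succeeds).
    """
    kind, name, op, expected = parse_attr_prereq(value)
    if kind != grant_type:
        return True
    return OPS[op](attributes.get(name), expected)

rule = "auto repl_state eq ACTIVE"
print(prereq_allows(rule, "auto", {"repl_state": "ACTIVE"}))
print(prereq_allows(rule, "auto", {"repl_state": "SYNCING"}))  # made-up value
```

           The first call allows the failover because the attribute matches;
           the second blocks it.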
339
340       mode
341           Specifies if the ticket is manual or automatic.
342
           By default all tickets are automatic (that is, they are fully
           controlled by the Raft algorithm). Assign the string "manual" or
           "MANUAL" to define the ticket as manually controlled.
346
347       One example of a booth configuration file:
348
349           transport = udp
350           port = 9930
351
352           # D-85774
353           site="192.168.201.100"
354           # D-90409
355           site="::ffff:192.168.202.100"
356           # A-1120
357           arbitrator="192.168.203.100"
358
359           ticket="ticket-db8"
360               expire        = 600
361               acquire-after = 60
362               timeout       = 10
363               retries       = 5
364               renewal-freq  = 60
365               before-acquire-handler = /usr/share/booth/service-runnable db8
366               attr-prereq = auto repl_state eq ACTIVE
367

BOOTH TICKET MANAGEMENT

       The booth cluster guarantees that every ticket is owned by only one
       site at a time.

       Tickets must be initially granted with the booth client grant command.
       Once granted, the ticket is managed by the booth cluster. Hence, only
       granted tickets are managed by booth.

       If the ticket gets lost, i.e. the other members of the booth cluster
       do not hear from the ticket owner for a sufficiently long time, one of
       the remaining sites will acquire the ticket. This is called ticket
       failover.
380
381       If the remaining members cannot form a majority, then the ticket cannot
382       fail over.
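
       The majority rule can be expressed in one line of arithmetic. The
       Python sketch below assumes the usual Raft meaning of majority: more
       than half of all configured members, counting sites and arbitrators
       alike and including the member doing the counting. The function name
       is hypothetical.

```python
def can_fail_over(total_members, reachable_members):
    """True if the reachable members form a majority of all members."""
    return reachable_members > total_members // 2

# Two sites plus one arbitrator, one site lost: 2 of 3 remain.
print(can_fail_over(3, 2))
# Same setup, but the surviving site also lost the arbitrator: 1 of 3.
print(can_fail_over(3, 1))
```

       With three members, two reachable members are enough for failover and
       one is not. With four members, two are not enough either, since the
       two halves tie.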
383
384       A  ticket  may  be  revoked  at  any  time with the booth client revoke
385       command. For revoke to succeed, the site holding  the  ticket  must  be
386       reachable.
387
       Once the ticket is administratively revoked, it is no longer managed
       by the booth cluster. For the booth cluster to start managing the
       ticket again, it must be granted to a site again.

       The grant operation, if not all sites are reachable, may be delayed by
       the ticket expire time (and, if defined, the acquire-after time). The
       reason is that the other booth members may not know whether the ticket
       is currently granted at the unreachable site.

       This delay may be disabled with the -F option. In that case, it is up
       to the administrator to make sure that the unreachable site is not
       holding the ticket.

       When the ticket is managed by booth, it is dangerous to modify it
       manually using either the crm_ticket command or crm site ticket.
       Neither of these tools is aware of booth and, consequently, booth
       itself may not be aware of any ticket status changes. A notable
       exception is setting the ticket to standby, which is typically done
       before a planned failover.
407

NOTES

       Tickets are not meant to be moved around quickly; the default expire
       time is 600 seconds (10 minutes).
411
412       booth works with both IPv4 and IPv6 addresses.
413
414       booth renews a ticket  before  it  expires,  to  account  for  possible
415       transmission delays. The renewal time, unless explicitly set, is set to
416       half the expire time.
417

HANDLERS

419       Currently,  there’s  only  one  external  handler  defined   (see   the
420       before-acquire-handler configuration item above).
421
422       The following environment variables are exported to the handler:
423
       BOOTH_TICKET
           The ticket name, as given in the configuration file. (See ticket
           item above.)

       BOOTH_LOCAL
           The local site name, as defined in site.

       BOOTH_CONF_PATH
           The path to the active configuration file.

       BOOTH_CONF_NAME
           The configuration name, as used by the -c commandline argument.

       BOOTH_TICKET_EXPIRES
           When the ticket expires (in seconds since 1.1.1970), or 0.
439
440       The handler is invoked with positional arguments specified after it.
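
       As an illustration of the handler contract, the following Python
       sketch reads two of the exported variables and the first positional
       argument and turns a health check into an exit code. It is not the
       distributed service-runnable script. The function run_handler and the
       resource_ok callable are hypothetical, and the actual health check is
       left to the caller.

```python
import sys

def run_handler(argv, environ, resource_ok):
    """Return the exit code for a before-acquire-handler invocation.

    resource_ok is a hypothetical callable standing in for a real
    health check; it receives the resource name and returns a bool.
    """
    ticket = environ.get("BOOTH_TICKET", "")
    site = environ.get("BOOTH_LOCAL", "")
    if len(argv) < 2:
        print("usage: handler RESOURCE", file=sys.stderr)
        return 1
    resource = argv[1]   # first positional argument from the config line
    if resource_ok(resource):
        return 0         # boothd may go on to acquire or renew the ticket
    print(f"{resource} not runnable at {site}; giving up {ticket}",
          file=sys.stderr)
    return 1             # any non-zero code makes boothd relinquish the ticket
```

       A real handler script would pass sys.argv and os.environ to
       run_handler, supply its own check, and hand the result to sys.exit.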
441

FILES

443       /etc/booth/booth.conf
444           The default configuration file name. See also the -c argument.
445
446       /etc/booth/authkey
447           There is no default, but this is a typical location for the  shared
448           secret (authentication key).
449
450       /var/run/booth/
451           Directory that holds PID/lock files. See also the status command.
452

RAFT IMPLEMENTATION

454       In essence, every ticket corresponds to a separate Raft cluster.
455
456       A  ticket  is granted to the Raft Leader which then owns (or keeps) the
457       ticket.
458

ARBITRATOR MANAGEMENT

       The booth daemon for an arbitrator, which typically doesn’t run the
       cluster stack, may be started through systemd or with
       /etc/init.d/booth-arbitrator, depending on which init system the
       platform supports.
464
465       The  SysV init script starts a booth arbitrator for every configuration
466       file found in /etc/booth.
467
468       Platforms running systemd can  enable  and  start  every  configuration
469       separately using systemctl:
470
471           # systemctl enable booth@<configurationname>
472           # systemctl start  booth@<configurationname>
473
474       systemctl  requires  the  configuration name, even for the default name
475       booth.

MANUAL TICKETS

       Manual tickets allow users to create and manage tickets which are
       subsequently handled by booth without using the Raft algorithm.
       Granting and revoking manual tickets is fully controlled by the
       administrator. It is possible to define a number of manual and normal
       tickets in one GEO cluster.

       Automatic ticket management provided by the Raft algorithm isn’t
       applied to manually controlled tickets. In particular, there are no
       elections, automatic failover procedures, or term expiration.

       However, booth checks whether a ticket is currently granted to any
       site and warns the user appropriately.

       Tickets which were manually granted to a site will remain there until
       they are manually revoked. Even if a site goes offline, the ticket
       will not be moved to another site. This behavior allows administrators
       to make sure that some services will remain at a particular site and
       will not be moved to another site, possibly located in a different
       geographical location.

       Also, configuring only manual tickets in a GEO cluster allows having
       just two sites in a cluster, without the need for an arbitrator. This
       is possible because no automatic elections or voting are performed
       for manual tickets.
503
       Manual tickets are defined in a configuration file by adding a mode
       ticket parameter and setting it to manual or MANUAL:

           ticket="manual-ticket"
               [...]
               mode = manual
               [...]

       Manual tickets can be granted and revoked by using the normal grant
       and revoke commands, with the usual flags and parameters. The only
       difference is that specifying the -F flag with the grant command
       forces a site to become the leader of the specified ticket, even if
       the ticket is granted to another site.
517

EXIT STATUS

519       0
520           Success. For the status command: Daemon running.
521
522       1 (PCMK_OCF_UNKNOWN_ERROR)
523           General error code.
524
525       7 (PCMK_OCF_NOT_RUNNING)
526           No daemon process for that configuration active.
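
       A monitoring script can map these codes to messages. The Python
       sketch below assumes that a booth binary is on the PATH and uses only
       the status invocation shown in the SYNOPSIS; the function names and
       the message texts are hypothetical.

```python
import subprocess

STATUS_MEANING = {
    0: "daemon running",
    1: "general error (PCMK_OCF_UNKNOWN_ERROR)",
    7: "no daemon running (PCMK_OCF_NOT_RUNNING)",
}

def describe_status(returncode):
    """Translate an exit status of 'booth status' into a short message."""
    return STATUS_MEANING.get(returncode,
                              f"unexpected exit status {returncode}")

def booth_status(config="booth"):
    """Run 'booth status -c CONFIG' and describe its exit status."""
    result = subprocess.run(["booth", "status", "-c", config],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode, describe_status(result.returncode)
```

       Only describe_status can be exercised without a booth installation;
       booth_status needs a host where the booth binary exists.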
527

BUGS

       Booth is tested regularly. See the README-testing file for more
       information.

       Please report any bugs either at GitHub:
       <https://github.com/ClusterLabs/booth/issues>

       Or, if you prefer bugzilla, at openSUSE bugzilla (component "High
       Availability"):
       <https://bugzilla.opensuse.org/enter_bug.cgi?product=openSUSE%20Factory>
538

AUTHOR

540       boothd was originally written (mostly) by Jiaju Zhang.
541
542       In 2013 and 2014 Philipp Marek took over maintainership.
543
544       Since April 2014 it has been mainly developed by Dejan Muhamedagic.
545
546       Many people contributed (see the AUTHORS file).
547

RESOURCES

549       GitHub: <https://github.com/ClusterLabs/booth>
550
       Documentation:
       <http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.geo.html>
553

COPYING

       Copyright © 2011 Jiaju Zhang <jjzhang@suse.de>

       Copyright © 2013-2014 Philipp Marek <philipp.marek@linbit.com>

       Copyright © 2014 Dejan Muhamedagic <dmuhamedagic@suse.com>
562
563       Free use of this software is granted under the terms of the GNU General
564       Public License (GPL) as of version 2 (see COPYING file) or later.
565
                                  2018-06-27                         BOOTHD(8)