FENCE_SANLOCK(8)            System Manager's Manual           FENCE_SANLOCK(8)

NAME

       fence_sanlock - fence agent using watchdog and shared storage leases

SYNOPSIS

       fence_sanlock [OPTIONS]

DESCRIPTION

       fence_sanlock uses the watchdog device to reset nodes, in
       conjunction with three daemons: fence_sanlockd, sanlock, and wdmd.

       The watchdog device, controlled through /dev/watchdog, is available
       when a watchdog kernel module is loaded.  A module should be loaded
       for the available hardware.  If no hardware watchdog is available,
       or no module is loaded, the "softdog" module will be loaded, which
       emulates a hardware watchdog device.

       Shared storage must be configured for sanlock to use from all
       hosts.  This is generally an lvm lv (non-clustered), but could be
       another block device, or NFS file.  The storage should be 1GB of
       fully allocated space.  After being created, the storage must be
       initialized with the command:

       # fence_sanlock -o sanlock_init -p /path/to/storage
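
       For example, the lease storage could be prepared like this (a
       minimal sketch, assuming an existing shared volume group named
       "fence"; the resulting path matches the /dev/fence/leases example
       used later in this page):

       # lvcreate -L 1G -n leases fence
       # fence_sanlock -o sanlock_init -p /dev/fence/leases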

       The fence_sanlock agent uses sanlock leases on shared storage to
       verify that hosts have been reset, and to notify fenced nodes that
       are still running that they should be reset.

       The fence_sanlockd init script starts the wdmd, sanlock and
       fence_sanlockd daemons before the cluster or fencing systems are
       started (e.g. cman, corosync and fenced).  The fence_sanlockd
       daemon is started with the -w option so it waits for the path and
       host_id options to be provided when they are available.

       Unfencing must be configured for fence_sanlock in cluster.conf.
       The cman init script does unfencing by running fence_node -U, which
       in turn runs fence_sanlock with the "on" action and local path and
       host_id values taken from cluster.conf.  fence_sanlock in turn
       passes the path and host_id values to the waiting fence_sanlockd
       daemon.  With these values, fence_sanlockd joins the sanlock
       lockspace and acquires a resource lease for the local host.  It can
       take several minutes to complete these unfencing steps.
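
       The direct equivalent of this unfencing step, with example values
       that would normally come from cluster.conf (host_id 1, and the
       lease path used elsewhere in this page), is:

       # fence_sanlock -o on -p /dev/fence/leases -i 1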

       Once unfencing is complete, the node is a member of the sanlock
       lockspace named "fence" and the node's fence_sanlockd process holds
       a resource lease named "hN", where N is the node's host_id.  (To
       verify this, run the commands "sanlock client status" and "sanlock
       client host_status", which show state from the sanlock daemon, or
       "sanlock direct dump <path>" which shows state from shared
       storage.)
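
       For example, using the lease path from the examples in this page:

       # sanlock client status
       # sanlock client host_status
       # sanlock direct dump /dev/fence/leases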

       When fence_sanlock fences a node, it tries to acquire that node's
       resource lease.  sanlock will not grant the lease until the owner
       (the node being fenced) has been reset by its watchdog device.  The
       time it takes to acquire the lease is 140 seconds from the victim's
       last lockspace renewal timestamp on the shared storage.  Once
       acquired, the victim's lease is released, and fencing completes
       successfully.

       Live nodes being fenced

       When a live node is being fenced, fence_sanlock will continually
       fail to acquire the victim's lease, because the victim continues to
       renew its lockspace membership on storage, and the fencing node
       sees it is alive.  This is by design.  As long as the victim is
       alive, it must continue to renew its lockspace membership on
       storage.  The victim must not allow the remote fence_sanlock to
       acquire its lease and consider it fenced while it is still alive.

       At the same time, a victim knows that when it is being fenced, it
       should be reset to avoid blocking recovery of the rest of the
       cluster.  To communicate this, fence_sanlock makes a "request" on
       storage for the victim's resource lease.  On the victim,
       fence_sanlockd, which holds the resource lease, is configured to
       receive SIGUSR1 from sanlock if anyone requests its lease.  Upon
       receiving the signal, fence_sanlockd knows that it is a fencing
       victim.  In response, fence_sanlockd allows its wdmd connection to
       expire, which in turn causes the watchdog device to fire, resetting
       the node.

       The watchdog reset stops the victim's lockspace membership
       renewals.  Once the renewals stop, fence_sanlock will finally be
       able to acquire the victim's lease after waiting a fixed time from
       the final lockspace renewal.

       Loss of shared storage

       If access to shared storage with sanlock leases is lost for 80
       seconds, sanlock is not able to renew the lockspace membership, and
       enters recovery.  This causes sanlock clients holding leases, such
       as fence_sanlockd, to be notified that their leases are being lost.
       In response, fence_sanlockd must reset the node, much as if it were
       being fenced.

       Daemons killed/crashed/hung

       If the sanlock or fence_sanlockd daemons are killed abnormally, or
       crash or hang, their wdmd connections will expire, causing the
       watchdog device to fire, resetting the node.  fence_sanlock from
       another node will then run and acquire the victim's resource lease.
       If the wdmd daemon is killed abnormally, or crashes or hangs, it
       will not pet the watchdog device, causing it to fire and reset the
       node.

       Time Values

       The specific time periods referenced above, e.g. 140 and 80
       seconds, are based on the default sanlock i/o timeout of 10
       seconds.  If sanlock is configured to use a different i/o timeout,
       these numbers will be different.
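
       (As a rough guide, based on sanlock's documented defaults rather
       than anything stated here: with a 10 second i/o timeout, sanlock
       considers lockspace renewal failed after 8 i/o timeouts, giving the
       80 second figure, and allows a further 60 seconds for the victim's
       watchdog to fire, giving the 140 second figure: 80 + 60 = 140.)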

OPTIONS

       -o action
           The agent action:

              on
              Enable the local node to be fenced.  Used by unfencing.

              off
              Disable another node.

              status
              Test if a node is on or off.  A node is on if its lease is
              held, and off if its lease is free.

              metadata
              Print xml description of required parameters.

              sanlock_init
              Initialize sanlock leases on shared storage.

       -p path
           The path to shared storage with sanlock leases.

       -i host_id
           The host_id, from 1-128.
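
       For example, to check whether the node with host_id 2 is on or off,
       using the lease path from the examples in this page:

       # fence_sanlock -o status -p /dev/fence/leases -i 2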

STDIN PARAMETERS

       Options can be passed on stdin, with the format key=val.  Each
       key=val pair is separated by a new line.

       action=on|off|status
       See -o

       path=/path/to/shared/storage
       See -p

       host_id=num
       See -i
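
       For example, the status check above can be given its parameters on
       stdin instead of the command line:

       # printf 'action=status\npath=/dev/fence/leases\nhost_id=2\n' | \
             fence_sanlock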

FILES

       Example cluster.conf configuration for fence_sanlock.
       (For cman based clusters in which fenced runs agents.)
       Also see cluster.conf(5), fenced(8), fence_node(8).

       <clusternode name="node01" nodeid="1">
               <fence>
               <method name="1">
               <device name="wd" host_id="1"/>
               </method>
               </fence>
               <unfence>
               <device name="wd" host_id="1" action="on"/>
               </unfence>
       </clusternode>

       <clusternode name="node02" nodeid="2">
               <fence>
               <method name="1">
               <device name="wd" host_id="2"/>
               </method>
               </fence>
               <unfence>
               <device name="wd" host_id="2" action="on"/>
               </unfence>
       </clusternode>

       <fencedevice name="wd" agent="fence_sanlock" path="/dev/fence/leases"/>

       Example dlm.conf configuration for fence_sanlock.
       (For non-cman based clusters in which dlm_controld runs agents.)
       Also see dlm.conf(5), dlm_controld(8).

       device wd /usr/sbin/fence_sanlock path=/dev/fence/leases
       connect wd node=1 host_id=1
       connect wd node=2 host_id=2
       unfence wd

TEST

       To test fence_sanlock directly, without clustering:

       1. Initialize storage

       node1: create 1G lv on shared storage /dev/fence/leases
       node1: fence_sanlock -o sanlock_init -p /dev/fence/leases

       2. Start services

       node1: service fence_sanlockd start
       node2: service fence_sanlockd start

       3. Enable fencing

       node1: fence_sanlock -o on -p /dev/fence/leases -i 1
       node2: fence_sanlock -o on -p /dev/fence/leases -i 2

       This "unfence" step may take a couple of minutes.

       4. Verify hosts and leases

       node1: sanlock status
       s fence:1:/dev/fence/leases:0
       r fence:h1:/dev/fence/leases:1048576:1 p 2465

       node2: sanlock status
       s fence:2:/dev/fence/leases:0
       r fence:h2:/dev/fence/leases:2097152:1 p 2366
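
       (In this output, the "s" line shows the node's membership in the
       "fence" lockspace, with its host_id and the lease path, and the "r"
       line shows the "hN" resource lease described above, held by the
       fence_sanlockd process whose pid follows "p".)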

       node1: sanlock host_status
       lockspace fence
       1 timestamp 717
       2 timestamp 678

       node2: sanlock host_status
       lockspace fence
       1 timestamp 738
       2 timestamp 678

       5. Fence node2

       node1: fence_sanlock -o off -p /dev/fence/leases -i 2

       This may take a few minutes to return.

       If node2 is still alive when it is fenced, sanlock on node1 will
       log errors about failing to acquire the lease while node2 remains
       alive.  This is expected.

       6. Success

       fence_sanlock on node1 should exit 0 after node2 is reset by its
       watchdog.

SEE ALSO

       fence_sanlockd(8), sanlock(8), wdmd(8)

                                  2013-05-02                  FENCE_SANLOCK(8)