OCF_LINBIT_DRBD(7)            OCF resource agents            OCF_LINBIT_DRBD(7)



NAME
       ocf_linbit_drbd - Manages a DRBD device as a Master/Slave resource

SYNOPSIS
       drbd [start | stop | monitor | promote | demote | meta-data |
       validate-all]

DESCRIPTION
       This resource agent manages a DRBD resource as a master/slave
       resource. DRBD is a shared-nothing replicated storage device.

       NOTE: To avoid data-divergence, you should enable either DRBD "quorum"
       and "on-no-quorum io-error" (recommended), or configure proper fencing
       policies in both DRBD *and* Pacemaker (fencing resource-and-stonith).
       This cannot be done from this resource agent alone.

       See the DRBD User's Guide for more information.
       https://docs.linbit.com/
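
       To illustrate the recommended quorum settings, here is a minimal,
       hedged sketch of the corresponding drbd.conf options section (the
       resource name r0 is a placeholder; quorum generally requires at least
       three nodes, possibly including a diskless tiebreaker):

           resource r0 {
               options {
                   quorum majority;        # only allow I/O while a majority
                                           # of nodes is reachable
                   on-no-quorum io-error;  # complete I/O with errors instead
                                           # of blocking when quorum is lost
               }
               ...
           }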

SUPPORTED PARAMETERS
       drbd_resource
           The name of the drbd resource from the drbd.conf file.

           (unique, required, string, no default)

       drbdconf
           Full path to the drbd.conf file.

           (optional, string, default "/etc/drbd.conf")
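
           As a hedged illustration (the resource name and device paths are
           placeholders), if /etc/drbd.conf or a file it includes defines

               resource r0 {
                   device    /dev/drbd0;
                   disk      /dev/vg0/lv_r0;
                   meta-disk internal;
                   ...
               }

           then this agent would be configured with drbd_resource=r0.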

       adjust_master_score
           Space separated list of four master score adjustments for
           different scenarios:
            - only access to 'consistent' data
            - only remote access to 'uptodate' data
            - currently Secondary, local access to 'uptodate' data, but
              remote is unknown
            - local access to 'uptodate' data, and currently Primary or
              remote is known

           Numeric values are expected to be non-decreasing.

           The first value is 0 by default to prevent pacemaker from trying
           to promote while it is unclear whether the data is really the most
           recent copy. (DRBD knows it is "consistent", but is unsure about
           "uptodate"ness). Please configure proper fencing methods both in
           DRBD (fencing resource-and-stonith; appropriate (un)fence-peer
           handlers) AND in Pacemaker to make this work reliably.

           Advanced use: Adjust the other values to better fit into complex
           dependency score calculations.

           Intentionally diskless nodes ("Diskless Clients") with access to
           good data via some (or all) their peers will use the 3rd or 4th
           value (minus one) when they are (Secondary, not all peers
           up-to-date) or (ALL peers are up-to-date, or they are Primary
           themselves). This may need to change if this should become a
           frequent use case.

           Special considerations:

           If a Secondary DRBD is connected to a peer in Primary role, but
           Pacemaker does not know about any Primary (using crm_resource
           --locate), we conclude that there likely is a cluster-split-brain,
           and may try to "help" Pacemaker by removing the master-score. Also
           see "remove_master_score_if_peer_primary".

           (optional, string, default "0 10 1000 10000")
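
           A hedged sketch of how the four positions map to scores, using the
           crm(8) shell (primitive and resource names are placeholders; the
           values shown simply restate the defaults):

               # position 1: only 'consistent' data            -> 0
               # position 2: only remote 'uptodate' data       -> 10
               # position 3: Secondary, local 'uptodate' data,
               #             remote unknown                    -> 1000
               # position 4: local 'uptodate' data, Primary or
               #             remote known                      -> 10000
               primitive p_drbd ocf:linbit:drbd \
                   params drbd_resource=r0 \
                       adjust_master_score="0 10 1000 10000"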

       stop_outdates_secondary
           Recommended setting: leave at default (disabled).

           Note that this feature depends on the passed in information in
           OCF_RESKEY_CRM_meta_notify_master_uname to be correct, which
           unfortunately is not reliable for pacemaker versions up to at
           least 1.0.10 / 1.1.4.

           If a Secondary is stopped (unconfigured), it may be marked as
           outdated in the drbd meta data, if we know there is still a
           Primary running in the cluster. Note that this does not affect
           fencing policies set in drbd config, but is an additional safety
           feature of this resource agent only. You can enable this behaviour
           by setting the parameter to true.

           If this feature seems to not do what you expect, make sure you
           have defined fencing policies in the drbd configuration as well.

           (optional, boolean, default false)

       ignore_missing_notifications
           Some setups do not benefit from notifications. This parameter
           allows you to disable notifications without patching this resource
           agent.

           (optional, boolean, default false)

       wfc_timeout
           Unless set to the empty string or a value containing non-digits,
           wait (at most) this many seconds for the connection(s) to be
           established after bringing them up during "start".

           (optional, numeric, default 5)

       remove_master_score_if_peer_primary
           See also "adjust_master_score" and
           "fail_promote_early_if_peer_primary".

           To prevent a potentially failed promotion attempt in case of
           cluster split-brain (Pacemaker communication loss) while DRBD is
           still connected to a Primary, you can request to remove any master
           score while DRBD is connected to a Primary (and that Primary peer
           looks like it has all disks up-to-date).

           This may delay legitimate failovers after a Primary crash by up to
           some TCP timeout (until DRBD realizes that the Primary is gone)
           plus one monitoring interval.

           This parameter is interpreted almost as an "ocf boolean", with the
           exception of a literal "unexpected", that is:

           - (yes|true|1) [actually, according to the OCF spec, also
             (YES|TRUE|True|ja|ON), but please don't go there]: is "true":
             remove (or never assign) master scores, if DRBD appears to see a
             (healthy) Primary

           - "unexpected": assign master scores as described under
             "adjust_master_score", while removing it if DRBD appears to see
             a (healthy) Primary that Pacemaker does not know about (as
             determined by crm_resource --locate).

           - everything else is "false": ignore the peer role while assigning
             master scores.

           (optional, string, default "false")
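
           For instance, a hedged crm(8) shell fragment that opts into the
           "unexpected" mode described above (names are placeholders):

               primitive p_drbd ocf:linbit:drbd \
                   params drbd_resource=r0 \
                       remove_master_score_if_peer_primary="unexpected"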

       fail_promote_early_if_peer_primary
           See also "adjust_master_score" and
           "remove_master_score_if_peer_primary".

           To avoid a useless retry loop during promotion attempts in case of
           cluster split-brain (Pacemaker communication loss) while DRBD is
           still connected to a Primary, you can choose to give up after the
           first try if this situation is detected.

           If a Primary "vanishes", TCP may not immediately detect this, and
           an idle DRBD may take some time until it does in-DRBD-protocol
           "pings". Pacemaker may well detect Primary loss earlier than DRBD,
           and try to promote while DRBD thinks it can still see a Primary.
           This means that, in general, trying to promote at least once is
           necessary, as that implies an in-DRBD-protocol "peer alive" check.

           But if that first attempt does not succeed, retrying until we hit
           the operation timeout may not be desired, so you can disable the
           retries with this parameter.

           (optional, boolean, default false)

       unfence_if_all_uptodate
           If all volumes of this resource report to be UpToDate, call an
           unfence script hook, just in case some stale fencing constraint or
           similar is still around.

           - With DRBD utils version <= 8.9.4, this is hardcoded to
             /usr/lib/drbd/crm-unfence-peer.sh -r $DRBD_RESOURCE

           - With DRBD utils version >= 8.9.5, this is dispatched to $DRBDADM
             unfence-peer $DRBD_RESOURCE

           In any case, the hook itself is responsible to fetch
           $OCF_RESKEY_unfence_extra_args from its environment.

           (optional, boolean, default false)

       unfence_extra_args
           This may be used to pass extra hints to the unfence hook. See the
           description of unfence_if_all_uptodate.

           (optional, string, default "--quiet --flock-required
           --flock-timeout 0 --unfence-only-if-owner-match")
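
           A hedged crm(8) shell fragment enabling the unfence hook (names
           are placeholders); unfence_extra_args could be set in the same
           params list if the defaults are not suitable:

               primitive p_drbd ocf:linbit:drbd \
                   params drbd_resource=r0 \
                       unfence_if_all_uptodate=true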

       require_drbd_module_version_ge
           Use this if you want to force failure of this resource agent if
           the detected DRBD kernel (module) driver version is lower than a
           required minimum.

           Example: use require_drbd_module_version_ge=9.0.16 to fail unless
           DRBD module version >= 9.0.16 is available (effectively requires
           DRBD 9).

           The intention of this is to give a more useful failure message
           after accidentally downgrading the DRBD version by
           installing/upgrading a new kernel.

           Note: "ge", "greater-or-equal", inclusive. Required format: x.y.z

           (optional, string, no default)

       require_drbd_module_version_lt
           Use this if you want to force failure of this resource agent if
           the detected DRBD kernel (module) driver version is higher than a
           required maximum.

           Example: use require_drbd_module_version_lt=9.0.0 to fail unless
           DRBD module version < 9.0 is available (effectively requires DRBD
           8.4).

           Note: "lt", "less-than", exclusive. Required format: x.y.z

           (optional, string, no default)
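
           To check which DRBD module version is currently loaded before
           pinning it here, the first line of /proc/drbd (or the
           DRBD_KERNEL_VERSION line of "drbdadm --version") can be consulted,
           for example:

               # cat /proc/drbd
               version: 9.0.16-1 (api:2/proto:86-114)

           (The output shown is illustrative only.) The pin itself is then
           set like any other parameter, e.g.
           require_drbd_module_version_ge=9.0.16 in the params list.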

       connect_only_after_promote
           This may be useful for "stacked" setups without proper fencing on
           the lower layer (which we obviously do not recommend), to avoid
           some of the ugly side effects that may arise after resolving a
           split-brain on the lower layer.

           Keep this DRBD instance disconnected until it is promoted. After
           promotion we issue an additional "adjust", which is supposed to
           initiate the connection attempts.

           This causes a new data generation identifier ("current uuid") to
           be generated after the failover of a "healthy" DRBD.

           (optional, boolean, default false)

SUPPORTED ACTIONS
       This resource agent supports the following actions (operations):

       start
           Starts the resource. Suggested minimum timeout: 240.

       reload
           Suggested minimum timeout: 30.

       promote
           Promotes the resource to the Master role. Suggested minimum
           timeout: 90.

       demote
           Demotes the resource to the Slave role. Suggested minimum timeout:
           90.

       notify
           Suggested minimum timeout: 90.

       stop
           Stops the resource. Suggested minimum timeout: 100.

       monitor (Slave role)
           Performs a detailed status check. Suggested minimum timeout: 20.
           Suggested interval: 20.

       monitor (Master role)
           Performs a detailed status check. Suggested minimum timeout: 20.
           Suggested interval: 10.

       meta-data
           Retrieves resource agent metadata (internal use only). Suggested
           minimum timeout: 5.

       validate-all
           Performs a validation of the resource configuration.
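
       As a hedged sketch only (names are placeholders), the suggested
       timeouts above would translate into crm(8) shell operation definitions
       roughly like this:

           primitive p_drbd ocf:linbit:drbd \
               params drbd_resource=r0 \
               op start timeout="240" \
               op stop timeout="100" \
               op promote timeout="90" \
               op demote timeout="90" \
               op notify timeout="90" \
               op monitor timeout="20" interval="20" role="Slave" \
               op monitor timeout="20" interval="10" role="Master"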

EXAMPLE CRM SHELL
       The following is an example configuration for a drbd resource using
       the crm(8) shell:

           primitive p_drbd ocf:linbit:drbd \
               params \
                   drbd_resource=string \
               op monitor timeout="20" interval="20" role="Slave" \
               op monitor timeout="20" interval="10" role="Master"

           ms ms_drbd p_drbd \
               meta notify="true" interleave="true"
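
       On clusters where the "ms" (master/slave) syntax is deprecated
       (Pacemaker 2.x), an equivalent hedged sketch uses a promotable clone
       instead:

           clone ms_drbd p_drbd \
               meta promotable="true" notify="true" interleave="true"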

EXAMPLE PCS
       The following is an example configuration for a drbd resource using
       pcs(8):

           pcs resource create p_drbd ocf:linbit:drbd \
               drbd_resource=string \
               op monitor timeout="20" interval="20" role="Slave" \
               op monitor timeout="20" interval="10" role="Master" --master
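
       Note that the "--master" flag belongs to older pcs versions; on pcs
       0.10 and newer the same resource would, roughly, be created with the
       "promotable" keyword instead, for example:

           pcs resource create p_drbd ocf:linbit:drbd \
               drbd_resource=string \
               op monitor timeout="20" interval="20" role="Slave" \
               op monitor timeout="20" interval="10" role="Master" promotable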

SEE ALSO
       https://docs.linbit.com/, https://clusterlabs.org/,
       https://www.linbit.com/drbd-community/

AUTHOR
       LINBIT HA Solutions GmbH



drbd-pacemaker 9.12.2             04/29/2020             OCF_LINBIT_DRBD(7)