DRBDSETUP(8)                  System Administration                 DRBDSETUP(8)

NAME
       drbdsetup - Configure the DRBD kernel module

SYNOPSIS
       drbdsetup command {argument...} [option...]

DESCRIPTION
       The drbdsetup utility serves to configure the DRBD kernel module and to
       show its current configuration. Users usually interact with the drbdadm
       utility, which provides a more high-level interface to DRBD than
       drbdsetup. (See drbdadm's --dry-run option to see how drbdadm uses
       drbdsetup.)

       Some option arguments have a default scale which applies when a plain
       number is specified (for example Kilo, or 1024 times the numeric
       value). Such default scales can be overridden by using a suffix (for
       example, M for Mega). The common suffixes K = 2^10 = 1024, M = 1024 K,
       and G = 1024 M are supported.
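
       For example, the resync-rate option (described below) has a default
       scale of Kilo (KiB/s), so the following two invocations are
       equivalent; the resource name r0, peer node-id 1, and volume 0 are
       placeholders:

           drbdsetup peer-device-options r0 1 0 --resync-rate=10240
           drbdsetup peer-device-options r0 1 0 --resync-rate=10M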

COMMANDS
       drbdsetup attach minor lower_dev meta_data_dev meta_data_index,
       drbdsetup disk-options minor
           The attach command attaches a lower-level device to an existing
           replicated device. The disk-options command changes the disk
           options of an attached lower-level device. In either case, the
           replicated device must have been created with drbdsetup new-minor.

           Both commands refer to the replicated device by its minor number.
           lower_dev is the name of the lower-level device. meta_data_dev is
           the name of the device containing the metadata, and may be the same
           as lower_dev. meta_data_index is either a numeric metadata index,
           or the keyword internal for internal metadata, or the keyword
           flexible for variable-size external metadata. Available options
           (see also the usage example following this option list):

           --al-extents extents
               DRBD automatically maintains a "hot" or "active" disk area
               likely to be written to again soon based on the recent write
               activity. The "active" disk area can be written to immediately,
               while "inactive" disk areas must be "activated" first, which
               requires a meta-data write. We also refer to this active disk
               area as the "activity log".

               The activity log saves meta-data writes, but the whole log must
               be resynced upon recovery of a failed node. The size of the
               activity log is a major factor of how long a resync will take
               and how fast a replicated disk will become consistent after a
               crash.

               The activity log consists of a number of 4-Megabyte segments;
               the al-extents parameter determines how many of those segments
               can be active at the same time. The default value for
               al-extents is 1237, with a minimum of 7 and a maximum of 65536.

               Note that the effective maximum may be smaller, depending on
               how you created the device meta data; see also drbdmeta(8). The
               effective maximum is 919 * (available on-disk activity-log
               ring-buffer area / 4kB - 1); the default 32kB ring buffer
               yields a maximum of 6433 (which covers more than 25 GiB of
               data). We recommend keeping this well within the amount your
               backend storage and replication link are able to resync within
               about 5 minutes.

           --al-updates {yes | no}
               With this parameter, the activity log can be turned off
               entirely (see the al-extents parameter). This will speed up
               writes because fewer meta-data writes will be necessary, but
               the entire device needs to be resynchronized upon recovery of a
               failed primary node. The default value for al-updates is yes.

           --disk-barrier,
           --disk-flushes,
           --disk-drain
               DRBD has three methods of handling the ordering of dependent
               write requests:

               disk-barrier
                   Use disk barriers to make sure that requests are written to
                   disk in the right order. Barriers ensure that all requests
                   submitted before a barrier make it to the disk before any
                   requests submitted after the barrier. This is implemented
                   using 'tagged command queuing' on SCSI devices and 'native
                   command queuing' on SATA devices. Only some devices and
                   device stacks support this method. The device mapper (LVM)
                   only supports barriers in some configurations.

                   Note that on systems which do not support disk barriers,
                   enabling this option can lead to data loss or corruption.
                   Until DRBD 8.4.1, disk-barrier was turned on if the I/O
                   stack below DRBD did support barriers. Kernels since
                   linux-2.6.36 (or 2.6.32 RHEL6) no longer allow DRBD to
                   detect whether barriers are supported. Since drbd-8.4.2,
                   this option is off by default and needs to be enabled
                   explicitly.

               disk-flushes
                   Use disk flushes between dependent write requests, also
                   referred to as 'force unit access' by drive vendors. This
                   forces all data to disk. This option is enabled by default.

               disk-drain
                   Wait for the request queue to "drain" (that is, wait for
                   the requests to finish) before submitting a dependent write
                   request. This method requires that requests are stable on
                   disk when they finish. Before DRBD 8.0.9, this was the only
                   method implemented. This option is enabled by default. Do
                   not disable in production environments.

               Of these three methods, DRBD will use the first that is
               enabled and supported by the backing storage device. If all
               three of these options are turned off, DRBD will submit write
               requests without bothering about dependencies. Depending on the
               I/O stack, write requests can be reordered, and they can be
               submitted in a different order on different cluster nodes. This
               can result in data loss or corruption. Therefore, turning off
               all three methods of controlling write ordering is strongly
               discouraged.

               A general guideline for configuring write ordering is to use
               disk barriers or disk flushes when using ordinary disks (or an
               ordinary disk array) with a volatile write cache. On storage
               without cache or with a battery backed write cache, disk
               draining can be a reasonable choice; see the usage example
               following this option list.

           --disk-timeout
               If the lower-level device on which a DRBD device stores its
               data does not finish an I/O request within the defined
               disk-timeout, DRBD treats this as a failure. The lower-level
               device is detached, and the device's disk state advances to
               Diskless. If DRBD is connected to one or more peers, the failed
               request is passed on to one of them.

               This option is dangerous and may lead to kernel panic!

               "Aborting" requests, or force-detaching the disk, is intended
               for completely blocked/hung local backing devices which no
               longer complete requests at all, not even with error
               completions. In this situation, usually a hard-reset and
               failover is the only way out.

               By "aborting", basically faking a local error-completion, we
               allow for a more graceful switchover by cleanly migrating
               services. Still the affected node has to be rebooted "soon".

               By completing these requests, we allow the upper layers to
               re-use the associated data pages.

               If later the local backing device "recovers", and now DMAs some
               data from disk into the original request pages, in the best
               case it will just put random data into unused pages; but
               typically it will corrupt meanwhile completely unrelated data,
               causing all sorts of damage.

               This means that delayed successful completion, especially for
               READ requests, is a reason to panic(). We assume that a delayed
               *error* completion is OK, though we still will complain noisily
               about it.

               The default value of disk-timeout is 0, which stands for an
               infinite timeout. Timeouts are specified in units of 0.1
               seconds. This option is available since DRBD 8.3.12.

           --md-flushes
               Enable disk flushes and disk barriers on the meta-data device.
               This option is enabled by default. See the disk-flushes
               parameter.

           --on-io-error handler
               Configure how DRBD reacts to I/O errors on a lower-level
               device. The following policies are defined:

               pass_on
                   Change the disk status to Inconsistent, mark the failed
                   block as inconsistent in the bitmap, and retry the I/O
                   operation on a remote cluster node.

               call-local-io-error
                   Call the local-io-error handler (see the handlers section).

               detach
                   Detach the lower-level device and continue in diskless
                   mode.


           --read-balancing policy
               Distribute read requests among cluster nodes as defined by
               policy. The supported policies are prefer-local (the default),
               prefer-remote, round-robin, least-pending,
               when-congested-remote, 32K-striping, 64K-striping,
               128K-striping, 256K-striping, 512K-striping and 1M-striping.

               This option is available since DRBD 8.4.1.

               Note: the when-congested-remote option has no effect on Linux
               kernel 5.18 or above. It is deprecated starting from DRBD
               9.1.12.

           --resync-after minor
               Define that a device should only resynchronize after the
               specified other device. By default, no order between devices is
               defined, and all devices will resynchronize in parallel.
               Depending on the configuration of the lower-level devices, and
               the available network and disk bandwidth, this can slow down
               the overall resync process. This option can be used to form a
               chain or tree of dependencies among devices.

           --size size
               Specify the size of the lower-level device explicitly instead
               of determining it automatically. The device size must be
               determined once and is remembered for the lifetime of the
               device. In order to determine it automatically, all the
               lower-level devices on all nodes must be attached, and all
               nodes must be connected. If the size is specified explicitly,
               this is not necessary. The size value is assumed to be in units
               of sectors (512 bytes) by default.

           --discard-zeroes-if-aligned {yes | no}
               There are several aspects to discard/trim/unmap support on
               linux block devices. Even if discard is supported in general,
               it may fail silently, or may partially ignore discard requests.
               Devices also announce whether reading from unmapped blocks
               returns defined data (usually zeroes), or undefined data
               (possibly old data, possibly garbage).

               If on different nodes, DRBD is backed by devices with differing
               discard characteristics, discards may lead to data divergence
               (old data or garbage left over on one backend, zeroes due to
               unmapped areas on the other backend). Online verify would now
               potentially report tons of spurious differences. While probably
               harmless for most use cases (fstrim on a file system), DRBD
               cannot have that.

               To play safe, we have to disable discard support, if our local
               backend (on a Primary) does not support
               "discard_zeroes_data=true". We also have to translate discards
               to explicit zero-out on the receiving side, unless the
               receiving side (Secondary) supports "discard_zeroes_data=true",
               thereby allocating areas that were supposed to be unmapped.

               There are some devices (notably the LVM/DM thin provisioning)
               that are capable of discard, but announce
               discard_zeroes_data=false. In the case of DM-thin, discards
               aligned to the chunk size will be unmapped, and reading from
               unmapped sectors will return zeroes. However, unaligned partial
               head or tail areas of discard requests will be silently
               ignored.

               If we now add a helper to explicitly zero-out these unaligned
               partial areas, while passing on the discard of the aligned full
               chunks, we effectively achieve discard_zeroes_data=true on such
               devices.

               Setting discard-zeroes-if-aligned to yes will allow DRBD to use
               discards, and to announce discard_zeroes_data=true, even on
               backends that announce discard_zeroes_data=false.

               Setting discard-zeroes-if-aligned to no will cause DRBD to
               always fall back to zero-out on the receiving side, and to not
               even announce discard capabilities on the Primary, if the
               respective backend announces discard_zeroes_data=false.

               We used to ignore the discard_zeroes_data setting completely.
               To not break established and expected behaviour, and suddenly
               cause fstrim on thin-provisioned LVs to run out-of-space
               instead of freeing up space, the default value is yes.

               This option is available since 8.4.7.

           --disable-write-same {yes | no}
               Some disks announce WRITE_SAME support to the kernel but fail
               with an I/O error upon actually receiving such a request. This
               mostly happens when using virtualized disks -- notably, this
               behavior has been observed with VMware's virtual disks.

               When disable-write-same is set to yes, WRITE_SAME detection is
               manually overridden and support is disabled.

               The default value of disable-write-same is no. This option is
               available since 8.4.7.

           --rs-discard-granularity byte
               When rs-discard-granularity is set to a non-zero, positive
               value then DRBD tries to do a resync operation in requests of
               this size. In case such a block contains only zero bytes on the
               sync source node, the sync target node will issue a
               discard/trim/unmap command for the area.

               The value is constrained by the discard granularity of the
               backing block device. In case rs-discard-granularity is not a
               multiple of the discard granularity of the backing block
               device, DRBD rounds it up. The feature only gets active if the
               backing block device reads back zeroes after a discard command.

               The usage of rs-discard-granularity may cause c-max-rate to be
               exceeded. In particular, the resync rate may reach 10x the
               value of rs-discard-granularity per second.

               The default value of rs-discard-granularity is 0. This option
               is available since 8.4.7.
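
           As a sketch of how the attach and disk-options commands are used
           together (minor number 0, backing device /dev/sda7, and internal
           meta data are placeholder choices), a lower-level device could be
           attached and its write-ordering options later relaxed for storage
           with a battery-backed write cache:

               drbdsetup attach 0 /dev/sda7 /dev/sda7 internal
               drbdsetup disk-options 0 --disk-flushes=no --disk-barrier=no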

       drbdsetup peer-device-options resource peer_node_id volume
           These are options that affect the peer's device.

           --c-delay-target delay_target,
           --c-fill-target fill_target,
           --c-max-rate max_rate,
           --c-plan-ahead plan_time
               Dynamically control the resync speed. The following modes are
               available:

               •   Dynamic control with fill target (default). Enabled when
                   c-plan-ahead is non-zero and c-fill-target is non-zero. The
                   goal is to fill the buffers along the data path with a
                   defined amount of data. This mode is recommended when
                   DRBD-proxy is used. Configured with c-plan-ahead,
                   c-fill-target and c-max-rate.

               •   Dynamic control with delay target. Enabled when
                   c-plan-ahead is non-zero (default) and c-fill-target is
                   zero. The goal is to have a defined delay along the path.
                   Configured with c-plan-ahead, c-delay-target and
                   c-max-rate.

               •   Fixed resync rate. Enabled when c-plan-ahead is zero. DRBD
                   will try to perform resync I/O at a fixed rate. Configured
                   with resync-rate.

               The c-plan-ahead parameter defines how fast DRBD adapts to
               changes in the resync speed. It should be set to five times the
               network round-trip time or more. The default value of
               c-plan-ahead is 20, in units of 0.1 seconds.

               The c-fill-target parameter defines how much resync data
               DRBD should aim to have in-flight at all times. Common values
               for "normal" data paths range from 4K to 100K. The default
               value of c-fill-target is 100, in units of sectors.

               The c-delay-target parameter defines the delay in the resync
               path that DRBD should aim for. This should be set to five times
               the network round-trip time or more. The default value of
               c-delay-target is 10, in units of 0.1 seconds.

               The c-max-rate parameter limits the maximum bandwidth used by
               dynamically controlled resyncs. Setting this to zero removes
               the limitation (since DRBD 9.0.28). It should be set to either
               the bandwidth available between the DRBD hosts and the machines
               hosting DRBD-proxy, or to the available disk bandwidth. The
               default value of c-max-rate is 102400, in units of KiB/s.

               Dynamic resync speed control is available since DRBD 8.3.9;
               see the usage example following this option list.

           --c-min-rate min_rate
               A node which is primary and sync-source has to schedule
               application I/O requests and resync I/O requests. The
               c-min-rate parameter limits how much bandwidth is available for
               resync I/O; the remaining bandwidth is used for application
               I/O.

               A c-min-rate value of 0 means that there is no limit on the
               resync I/O bandwidth. This can slow down application I/O
               significantly. Use a value of 1 (1 KiB/s) for the lowest
               possible resync rate.

               The default value of c-min-rate is 250, in units of KiB/s.

           --resync-rate rate
               Define how much bandwidth DRBD may use for resynchronizing.
               DRBD allows "normal" application I/O even during a resync. If
               the resync takes up too much bandwidth, application I/O can
               become very slow. This parameter allows you to avoid that.
               Please note that this option only works when the dynamic resync
               controller is disabled.
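
           For example, to cap the dynamically controlled resync at 100 MiB/s
           and throttle it down to at most 1 MiB/s while application I/O is
           active (resource r0, peer node-id 1, and volume 0 are
           placeholders):

               drbdsetup peer-device-options r0 1 0 --c-max-rate=100M \
                   --c-min-rate=1024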

       drbdsetup check-resize minor
           Remember the current size of the lower-level device of the
           specified replicated device. Used by drbdadm. The size information
           is stored in file /var/lib/drbd/drbd-minor-minor.lkbd.

       drbdsetup new-peer resource peer_node_id,
       drbdsetup net-options resource peer_node_id
           The new-peer command creates a connection within a resource. The
           resource must have been created with drbdsetup new-resource. The
           net-options command changes the network options of an existing
           connection. Before a connection can be activated with the connect
           command, at least one path needs to be added with the new-path
           command. Available options (see also the usage example following
           this option list):

           --after-sb-0pri policy
               Define how to react if a split-brain scenario is detected and
               none of the two nodes is in primary role. (We detect
               split-brain scenarios when two nodes connect; split-brain
               decisions are always between two nodes.) The defined policies
               are:

               disconnect
                   No automatic resynchronization; simply disconnect.

               discard-younger-primary,
               discard-older-primary
                   Resynchronize from the node which became primary first
                   (discard-younger-primary) or last (discard-older-primary).
                   If both nodes became primary independently, the
                   discard-least-changes policy is used.

               discard-zero-changes
                   If only one of the nodes wrote data since the split brain
                   situation was detected, resynchronize from this node to the
                   other. If both nodes wrote data, disconnect.

               discard-least-changes
                   Resynchronize from the node with more modified blocks.

               discard-node-nodename
                   Always resynchronize to the named node.

           --after-sb-1pri policy
               Define how to react if a split-brain scenario is detected, with
               one node in primary role and one node in secondary role. (We
               detect split-brain scenarios when two nodes connect, so
               split-brain decisions are always between two nodes.) The
               defined policies are:

               disconnect
                   No automatic resynchronization, simply disconnect.

               consensus
                   Discard the data on the secondary node if the after-sb-0pri
                   algorithm would also discard the data on the secondary
                   node. Otherwise, disconnect.

               violently-as0p
                   Always take the decision of the after-sb-0pri algorithm,
                   even if it causes an erratic change of the primary's view
                   of the data. This is only useful if a single-node file
                   system (i.e., not OCFS2 or GFS) with the
                   allow-two-primaries flag is used. This option can cause the
                   primary node to crash, and should not be used.

               discard-secondary
                   Discard the data on the secondary node.

               call-pri-lost-after-sb
                   Always take the decision of the after-sb-0pri algorithm. If
                   the decision is to discard the data on the primary node,
                   call the pri-lost-after-sb handler on the primary node.

           --after-sb-2pri policy
               Define how to react if a split-brain scenario is detected and
               both nodes are in primary role. (We detect split-brain
               scenarios when two nodes connect, so split-brain decisions are
               always between two nodes.) The defined policies are:

               disconnect
                   No automatic resynchronization, simply disconnect.

               violently-as0p
                   See the violently-as0p policy for after-sb-1pri.

               call-pri-lost-after-sb
                   Call the pri-lost-after-sb helper program on one of the
                   machines unless that machine can demote to secondary. The
                   helper program is expected to reboot the machine, which
                   brings the node into a secondary role. Which machine runs
                   the helper program is determined by the after-sb-0pri
                   strategy.

           --allow-two-primaries
               The most common way to configure DRBD devices is to allow only
               one node to be primary (and thus writable) at a time.

               In some scenarios it is preferable to allow two nodes to be
               primary at once; a mechanism outside of DRBD then must make
               sure that writes to the shared, replicated device happen in a
               coordinated way. This can be done with a shared-storage cluster
               file system like OCFS2 and GFS, or with virtual machine images
               and a virtual machine manager that can migrate virtual machines
               between physical machines.

               The allow-two-primaries parameter tells DRBD to allow two nodes
               to be primary at the same time. Never enable this option when
               using a non-distributed file system; otherwise, data corruption
               and node crashes will result!

           --always-asbp
               Normally the automatic after-split-brain policies are only used
               if current states of the UUIDs do not indicate the presence of
               a third node.

               With this option you request that the automatic
               after-split-brain policies are used as long as the data sets of
               the nodes are somehow related. This might cause a full sync, if
               the UUIDs indicate the presence of a third node. (Or double
               faults led to strange UUID sets.)

           --connect-int time
               As soon as a connection between two nodes is configured with
               drbdsetup connect, DRBD immediately tries to establish the
               connection. If this fails, DRBD waits for connect-int seconds
               and then repeats. The default value of connect-int is 10
               seconds.

           --cram-hmac-alg hash-algorithm
               Configure the hash-based message authentication code (HMAC) or
               secure hash algorithm to use for peer authentication. The
               kernel supports a number of different algorithms, some of which
               may be loadable as kernel modules. See the shash algorithms
               listed in /proc/crypto. By default, cram-hmac-alg is unset.
               Peer authentication also requires a shared-secret to be
               configured.

           --csums-alg hash-algorithm
               Normally, when two nodes resynchronize, the sync target
               requests a piece of out-of-sync data from the sync source, and
               the sync source sends the data. With many usage patterns, a
               significant number of those blocks will actually be identical.

               When a csums-alg algorithm is specified, when requesting a
               piece of out-of-sync data, the sync target also sends along a
               hash of the data it currently has. The sync source compares
               this hash with its own version of the data. It sends the sync
               target the new data if the hashes differ, and tells it that the
               data are the same otherwise. This reduces the network bandwidth
               required, at the cost of higher cpu utilization and possibly
               increased I/O on the sync target.

               The csums-alg can be set to one of the secure hash algorithms
               supported by the kernel; see the shash algorithms listed in
               /proc/crypto. By default, csums-alg is unset.

           --csums-after-crash-only
               Enabling this option (and csums-alg, above) makes it possible
               to use the checksum-based resync only for the first resync
               after a primary crash, but not for later "network hiccups".

               In most cases, blocks that are marked as need-to-be-resynced
               are in fact changed, so calculating checksums, and both reading
               and writing the blocks on the resync target, is all effective
               overhead.

               The advantage of checksum-based resync is mostly after primary
               crash recovery, where the recovery marked larger areas (those
               covered by the activity log) as need-to-be-resynced, just in
               case. Introduced in 8.4.5.

           --data-integrity-alg alg
               DRBD normally relies on the data integrity checks built into
               the TCP/IP protocol, but if a data integrity algorithm is
               configured, it will additionally use this algorithm to make
               sure that the data received over the network match what the
               sender has sent. If a data integrity error is detected, DRBD
               will close the network connection and reconnect, which will
               trigger a resync.

               The data-integrity-alg can be set to one of the secure hash
               algorithms supported by the kernel; see the shash algorithms
               listed in /proc/crypto. By default, this mechanism is turned
               off.

               Because of the CPU overhead involved, we recommend not using
               this option in production environments. Also see the notes on
               data integrity below.

           --fencing fencing_policy
               Fencing is a preventive measure to avoid situations where both
               nodes are primary and disconnected. This is also known as a
               split-brain situation. DRBD supports the following fencing
               policies:

               dont-care
                   No fencing actions are taken. This is the default policy.

               resource-only
                   If a node becomes a disconnected primary, it tries to fence
                   the peer. This is done by calling the fence-peer handler.
                   The handler is supposed to reach the peer over an
                   alternative communication path and call 'drbdadm outdate
                   minor' there.

               resource-and-stonith
                   If a node becomes a disconnected primary, it freezes all
                   its IO operations and calls its fence-peer handler. The
                   fence-peer handler is supposed to reach the peer over an
                   alternative communication path and call 'drbdadm outdate
                   minor' there. In case it cannot do that, it should stonith
                   the peer. IO is resumed as soon as the situation is
                   resolved. In case the fence-peer handler fails, I/O can be
                   resumed manually with 'drbdadm resume-io'.

           --ko-count number
               If a secondary node fails to complete a write request within
               ko-count times the timeout parameter, it is excluded from the
               cluster. The primary node then sets the connection to this
               secondary node to Standalone. To disable this feature, you
               should explicitly set it to 0; defaults may change between
               versions.

           --load-balance-paths {yes | no}
               By default, the TCP transport establishes only one configured
               path at a time. It switches to another path only in case the
               established one fails. When you set load-balance-paths to yes,
               the TCP transport establishes all paths in parallel. It will
               transmit data packets over the path with the least data in its
               socket send queue.

               Please note that enabling load-balancing introduces additional
               chunking headers into the network protocol; consequently, it
               must be enabled on both sides of a connection.

               As of drbd-9.2.6 the RDMA transport does not obey this setting.
               It always uses all paths in parallel. This option became
               available with drbd-9.2.6.

           --max-buffers number
               Limits the memory usage per DRBD minor device on the receiving
               side, or for internal buffers during resync or online-verify.
               Unit is PAGE_SIZE, which is 4 KiB on most systems. The minimum
               possible setting is hard coded to 32 (=128 KiB). These buffers
               are used to hold data blocks while they are written to/read
               from disk. To avoid possible distributed deadlocks on
               congestion, this setting is used as a throttle threshold rather
               than a hard limit. Once more than max-buffers pages are in use,
               further allocation from this pool is throttled. You want to
               increase max-buffers if you cannot saturate the IO backend on
               the receiving side.

           --max-epoch-size number
               Define the maximum number of write requests DRBD may issue
               before issuing a write barrier. The default value is 2048, with
               a minimum of 1 and a maximum of 20000. Setting this parameter
               to a value below 10 is likely to decrease performance.

           --on-congestion policy,
           --congestion-fill threshold,
           --congestion-extents threshold
               By default, DRBD blocks when the TCP send queue is full. This
               prevents applications from generating further write requests
               until more buffer space becomes available again.

               When DRBD is used together with DRBD-proxy, it can be better to
               use the pull-ahead on-congestion policy, which can switch DRBD
               into ahead/behind mode before the send queue is full. DRBD then
               records the differences between itself and the peer in its
               bitmap, but it no longer replicates them to the peer. When
               enough buffer space becomes available again, the node
               resynchronizes with the peer and switches back to normal
               replication.

               This has the advantage of not blocking application I/O even
               when the queues fill up, and the disadvantage that peer nodes
               can fall behind much further. Also, while resynchronizing, peer
               nodes will become inconsistent.

               The available congestion policies are block (the default) and
               pull-ahead. The congestion-fill parameter defines how much data
               is allowed to be "in flight" in this connection. The default
               value is 0, which disables this mechanism of congestion
               control, with a maximum of 10 GiBytes. The congestion-extents
               parameter defines how many bitmap extents may be active before
               switching into ahead/behind mode, with the same default and
               limits as the al-extents parameter. The congestion-extents
               parameter is effective only when set to a value smaller than
               al-extents.

               Ahead/behind mode is available since DRBD 8.3.10.

           --ping-int interval
               When the TCP/IP connection to a peer is idle for more than
               ping-int seconds, DRBD will send a keep-alive packet to make
               sure that a failed peer or network connection is detected
               reasonably soon. The default value is 10 seconds, with a
               minimum of 1 and a maximum of 120 seconds. The unit is seconds.

           --ping-timeout timeout
               Define the timeout for replies to keep-alive packets. If the
               peer does not reply within ping-timeout, DRBD will close and
               try to reestablish the connection. The default value is 0.5
               seconds, with a minimum of 0.1 seconds and a maximum of 30
               seconds. The unit is tenths of a second.

           --socket-check-timeout timeout
               In setups involving a DRBD-proxy and connections that
               experience a lot of buffer-bloat it might be necessary to set
               ping-timeout to an unusually high value. By default DRBD uses
               the same value to wait if a newly established TCP connection is
               stable. Since the DRBD-proxy is usually located in the same
               data center such a long wait time may hinder DRBD's connect
               process.

               In such setups socket-check-timeout should be set to at least
               the round-trip time between DRBD and DRBD-proxy; that is, in
               most cases to 1.

               The default unit is tenths of a second, the default value is 0
               (which causes DRBD to use the value of ping-timeout instead).
               Introduced in 8.4.5.

           --protocol name
               Use the specified protocol on this connection. The supported
               protocols are:

               A
                   Writes to the DRBD device complete as soon as they have
                   reached the local disk and the TCP/IP send buffer.

               B
                   Writes to the DRBD device complete as soon as they have
                   reached the local disk, and all peers have acknowledged the
                   receipt of the write requests.

               C
                   Writes to the DRBD device complete as soon as they have
                   reached the local and all remote disks.


           --rcvbuf-size size
               Configure the size of the TCP/IP receive buffer. A value of 0
               (the default) causes the buffer size to adjust dynamically.
               This parameter usually does not need to be set, but it can be
               set to a value up to 10 MiB. The default unit is bytes.

           --rdma-ctrl-rcvbuf-size value
               By default, the RDMA transport divides the rcvbuf-size by 64
               and uses the result for the number of buffers on the control
               stream. This result might be too low depending on the timing
               characteristics of the backing storage devices and the network
               link.

               The option rdma-ctrl-rcvbuf-size allows you to explicitly set
               the number of buffers for the control stream, overruling the
               divide-by-64 heuristic. The default unit of this setting is
               bytes.

           --rdma-ctrl-sndbuf-size value
               By default, the RDMA transport divides the sndbuf-size by 64
               and uses the result for the number of buffers on the control
               stream. This result might be too low depending on the timing
               characteristics of the backing storage devices and the network
               link.

               The option rdma-ctrl-sndbuf-size allows you to explicitly set
               the number of buffers for the control stream, overruling the
               divide-by-64 heuristic. The default unit of this setting is
               bytes.

           --rr-conflict policy
               This option helps to solve the cases when the outcome of the
               resync decision is incompatible with the current role
               assignment in the cluster. The defined policies are:

               disconnect
                   No automatic resynchronization, simply disconnect.

               retry-connect
                   Disconnect now, and retry to connect immediately
                   afterwards.

               violently
                   Resync to the primary node is allowed, violating the
                   assumption that data on a block device are stable for one
                   of the nodes. Do not use this option, it is dangerous.

               call-pri-lost
                   Call the pri-lost handler on one of the machines. The
                   handler is expected to reboot the machine, which puts it
                   into secondary role.

               auto-discard
                   Auto-discard reverses the resync direction, so that DRBD
                   resyncs the current primary to the current secondary.
                   Auto-discard only applies when protocol A is in use and the
                   resync decision is based on the principle that a crashed
                   primary should be the source of a resync. When a primary
                   node crashes, it might have written some last updates to
                   its disk, which were not received by a protocol A
                   secondary. By promoting the secondary in the meantime, the
                   user accepted that those last updates have been lost. By
                   using auto-discard you consent that the last updates
                   (before the crash of the primary) should be rolled back
                   automatically.

           --shared-secret secret
               Configure the shared secret used for peer authentication. The
               secret is a string of up to 64 characters. Peer authentication
               also requires the cram-hmac-alg parameter to be set.

           --sndbuf-size size
               Configure the size of the TCP/IP send buffer. Since DRBD 8.0.13
               / 8.2.7, a value of 0 (the default) causes the buffer size to
               adjust dynamically. Values below 32 KiB are harmful to the
               throughput on this connection. Large buffer sizes can be useful
               especially when protocol A is used over high-latency networks;
               the maximum value supported is 10 MiB.

           --tcp-cork
               By default, DRBD uses the TCP_CORK socket option to prevent the
               kernel from sending partial messages; this results in fewer and
               bigger packets on the network. Some network stacks can perform
               worse with this optimization. On these, the tcp-cork parameter
               can be used to turn this optimization off.

           --timeout time
               Define the timeout for replies over the network: if a peer node
               does not send an expected reply within the specified timeout,
               it is considered dead and the TCP/IP connection is closed. The
               timeout value must be lower than connect-int and lower than
               ping-int. The default is 6 seconds; the value is specified in
               tenths of a second.

           --tls bool-value
               Enable TLS.

           --tls-keyring key-description
               Key description (name) of the keyring where the TLS key
               material is stored. The keyring will be shared with the
               handshake daemon.

           --tls-privkey key-description
               Key description (name) of the DER-encoded private key for TLS
               encryption.

           --tls-certificate key-description
               Key description (name) of the DER-encoded certificate for TLS
               encryption.

           --transport type
               With DRBD9 the network transport used by DRBD is loaded as a
               separate module. With this option you can specify which
               transport and module to load. At present only two options
               exist, tcp and rdma. Default is tcp.

           --use-rle
               Each replicated device on a cluster node has a separate bitmap
               for each of its peer devices. The bitmaps are used for tracking
               the differences between the local and peer device: depending on
               the cluster state, a disk range can be marked as different from
               the peer in the device's bitmap, in the peer device's bitmap,
               or in both bitmaps. When two cluster nodes connect, they
               exchange each other's bitmaps, and they each compute the union
               of the local and peer bitmap to determine the overall
               differences.

               Bitmaps of very large devices are also relatively large, but
               they usually compress very well using run-length encoding. This
               can save time and bandwidth for the bitmap transfers.

               The use-rle parameter determines if run-length encoding should
               be used. It is on by default since DRBD 8.4.0.

           --verify-alg hash-algorithm
               Online verification (drbdadm verify) computes and compares
               checksums of disk blocks (i.e., hash values) in order to detect
               if they differ. The verify-alg parameter determines which
               algorithm to use for these checksums. It must be set to one of
               the secure hash algorithms supported by the kernel before
               online verify can be used; see the shash algorithms listed in
               /proc/crypto.

               We recommend scheduling online verifications regularly during
               low-load periods, for example once a month. Also see the notes
               on data integrity below.
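
           As an example of the two commands described at the top of this
           option list (resource r0, peer node-id 1, and the shared secret
           are placeholders), a connection could be created with protocol C
           and peer authentication, and one of its options changed later:

               drbdsetup new-peer r0 1 --protocol=C \
                   --cram-hmac-alg=sha256 --shared-secret=changeme
               drbdsetup net-options r0 1 --ping-int=5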

       drbdsetup new-path resource peer_node_id local-addr remote-addr
           The new-path command creates a path within a connection. The
           connection must have been created with drbdsetup new-peer.
           local-addr and remote-addr refer to the local and remote protocol,
           network address, and port in the format
           [address-family:]address[:port]. The address families ipv4, ipv6,
           ssocks (Dolphin Interconnect Solutions' "super sockets"), sdp
           (Infiniband Sockets Direct Protocol), and sci are supported (sci is
           an alias for ssocks). If no address family is specified, ipv4 is
           assumed. For all address families except ipv6, the address uses
           IPv4 address notation (for example, 1.2.3.4). For ipv6, the address
           is enclosed in brackets and uses IPv6 address notation (for
           example, [fd01:2345:6789:abcd::1]). The port defaults to 7788.
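
           For example, assuming placeholder addresses 10.1.1.31 (local) and
           10.1.1.32 (remote) for resource r0 and peer node-id 1, relying on
           the ipv4 default on one side and spelling everything out on the
           other:

               drbdsetup new-path r0 1 10.1.1.31:7788 ipv4:10.1.1.32:7788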

       drbdsetup connect resource peer_node_id
           The connect command activates a connection. That means that the
           DRBD driver will bind and listen on all local addresses of the
           connection's paths. It will begin to try to establish one or more
           paths of the connection. Available options:

           --tentative
               Only determine if a connection to the peer can be established
               and if a resync is necessary (and in which direction) without
               actually establishing the connection or starting the resync.
               Check the system log to see what DRBD would do without the
               --tentative option.

           --discard-my-data
               Discard the local data and resynchronize with the peer that has
               the most up-to-date data. Use this option to manually recover
               from a split-brain situation.
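
           A minimal sketch (resource r0 and peer node-id 1 are placeholders):
           activate the connection normally, or, when recovering from a split
           brain, reconnect while throwing away the local modifications:

               drbdsetup connect r0 1
               drbdsetup connect r0 1 --discard-my-data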

       drbdsetup del-peer resource peer_node_id
           The del-peer command removes a connection from a resource.

       drbdsetup del-path resource peer_node_id local-addr remote-addr
           The del-path command removes a path from a connection. Please note
           that it fails if the path is necessary to keep a connected
           connection intact. In order to remove all paths, disconnect the
           connection first.

       drbdsetup cstate resource peer_node_id
           Show the current state of a connection. The connection is
           identified by the node-id of the peer; see the drbdsetup connect
           command.

       drbdsetup del-minor minor
           Remove a replicated device. No lower-level device may be attached;
           see drbdsetup detach.

       drbdsetup del-resource resource
           Remove a resource. All volumes and connections must be removed
           first (drbdsetup del-minor, drbdsetup disconnect). Alternatively,
           drbdsetup down can be used to remove a resource together with all
           its volumes and connections.

       drbdsetup detach minor
           Detach the lower-level device of a replicated device. Available
           options:

           --force,
           --diskless
               Force the detach and return immediately. This puts the
               lower-level device into failed state until all pending I/O has
               completed, and then detaches the device. Any I/O not yet
               submitted to the lower-level device (for example, because I/O
               on the device was suspended) is assumed to have failed. Mark
               the detached volume as "permanently diskless", converting it to
               an "intentionally diskless client", in contrast to temporarily
               diskless, to be re-attached later.
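
           For example, to force-detach the lower-level device of minor 0
           after its backing storage has become unresponsive:

               drbdsetup detach 0 --force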


       drbdsetup disconnect resource peer_node_id
           Remove a connection to a peer host. The connection is identified by
           the node-id of the peer; see the drbdsetup connect command.

       drbdsetup down {resource | all}
           Take a resource down by removing all volumes, connections, and the
           resource itself.

       drbdsetup dstate minor
           Show the current disk state of a lower-level device.

       drbdsetup events2 {resource | all}
           Show the current state of all configured DRBD objects, followed by
           all changes to the state.

           The output format is meant to be human as well as machine readable.
           The line starts with a word that indicates the kind of event:
           exists for an existing object; create, destroy, and change if an
           object is created, destroyed, or changed; call or response if an
           event handler is called or it returns; or rename when the name of
           an object is changed. The second word indicates the object the
           event applies to: resource, device, connection, peer-device, path,
           helper, or a dash (-) to indicate that the current state has been
           dumped completely.

           The remaining words identify the object and describe the state that
           the object is in. Some special keys are worth mentioning:

           resource may_promote:{yes|no}
               Whether promoting to primary is expected to succeed. When
               quorum is enabled, this can be used to trigger failover. When
               may_promote:yes is reported on this node, then no writes are
               possible on any other node, which generally means that the
               application can be started on this node, even when it has been
               running on another.

           resource promotion_score:score
               An integer heuristic indicating the relative preference for
               promoting this resource. A higher score is better in terms of
               having local disks and having access to up-to-date data. The
               score may be positive even when some node is primary. It will
               be zero when promotion is impossible due to quorum or lack of
               any access to up-to-date data.

           Available options:

           --now
               Terminate after reporting the current state. The default is to
               continuously listen and report state changes.

           --poll
               Read from stdin and update when n is read. Newlines are
               ignored. Every other input terminates the command.

               Without --now, changes are printed as usual. On each n the
               current state is fetched, but only changed objects are printed.
               This is useful with --statistics or --full because DRBD does
               not otherwise send updates when only the statistics change.

               In combination with --now the full state is printed on each n.
               No other changes are printed.

           --statistics
               Include statistics in the output.

           --diff
               Write information in form of a diff between old and new state.
               This helps simple tools to avoid (old) state tracking on their
               own.

           --full
               Write complete state information, especially on change events.
               This enables --statistics.

           --timestamps
               Prefix event lines with a timestamp in ISO 8601 format
               (YYYY-MM-DDThh:mm:ss.ssssss±hh:mm). The timestamp is generated
               by drbdsetup just before it prints the event line(s). If a
               single event causes multiple lines to be printed, they should
               all have the same timestamp prefix.

           --color={always | auto | never}
               Colorize the output. With --color=auto, drbdsetup emits color
               codes only when standard output is connected to a terminal.
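
           For example, to dump the current state of all resources once,
           including statistics, without waiting for further changes:

               drbdsetup events2 --now --statistics all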


       drbdsetup get-gi resource peer_node_id volume
           Show the data generation identifiers for a device on a particular
           connection. The device is identified by its volume number. The
           connection is identified by its endpoints; see the drbdsetup
           connect command.

           The output consists of the current UUID, bitmap UUID, and the first
           two history UUIDs, followed by a set of flags. The current UUID and
           history UUIDs are device specific; the bitmap UUID and flags are
           peer device specific. This command only shows the first two history
           UUIDs. Internally, DRBD maintains one history UUID for each
           possible peer device.

       drbdsetup invalidate minor
           Replace the local data of a device with that of a peer. All the
           local data will be marked out-of-sync, and a resync with the
           specified peer device will be initiated.

           Available options:

           --reset-bitmap=no
               Usually an invalidate operation sets all bits in the bitmap to
               out-of-sync before beginning the resync from the peer. By
               giving --reset-bitmap=no, DRBD will use the bitmap as it is.
               Usually this is used after an online verify operation found
               differences in the backing devices.

               The --reset-bitmap option is available since DRBD kernel driver
               9.0.29 and drbd-utils 9.17.

           --sync-from-peer-node-id
               This option allows the caller to select the node to resync
               from. If it is not given, DRBD selects a suitable source node
               itself.
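
           For example, after an online verify found differences on minor 0,
           resync only the blocks already marked out-of-sync, using the peer
           with node-id 1 as the source (placeholder values):

               drbdsetup invalidate 0 --reset-bitmap=no \
                   --sync-from-peer-node-id=1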


       drbdsetup invalidate-remote resource peer_node_id volume
           Replace a peer device's data of a resource with the local data. The
           peer device's data will be marked out-of-sync, and a resync from
           the local node to the specified peer will be initiated.

           Available options:

           --reset-bitmap=no
               Usually an invalidate remote operation sets all bits in the
               bitmap to out-of-sync before beginning the resync to the peer.
               By giving --reset-bitmap=no, DRBD will use the bitmap as it is.
               Usually this is used after an online verify operation found
               differences in the backing devices.

               The --reset-bitmap option is available since DRBD kernel driver
               9.0.29 and drbd-utils 9.17.

       drbdsetup new-current-uuid minor
           Generate a new current UUID and rotate all other UUID values. This
           has three use cases: start the initial resync; skip the initial
           resync; bootstrap a single node cluster.

           Available options:

           --force-resync
               Start an initial resync. A precondition is that the volume is
               in disk state Inconsistent on all nodes. This command updates
               the disk state on the current node to UpToDate and makes it
               source of the resync operations to the peers.

           --clear-bitmap
               Clears the sync bitmap in addition to generating a new current
               UUID. This skips the initial resync. As a consequence this
               volume's disk state changes to UpToDate on all nodes in this
               resource.

           Both operations require "Just Created" meta data. Here is the
           complete sequence step by step how to skip the initial resync:

            1. On both nodes, initialize meta data and configure the device.

                   drbdadm create-md --force res/volume-number

            2. They need to do the initial handshake, so they know their
               sizes.

                   drbdadm up res

            3. They are now Connected Secondary/Secondary
               Inconsistent/Inconsistent. Generate a new current-uuid and
               clear the dirty bitmap.

                   drbdadm --clear-bitmap new-current-uuid res

            4. They are now Connected Secondary/Secondary UpToDate/UpToDate.
               Make one side primary and create a file system.

                   drbdadm primary res

                   mkfs -t fs-type $(drbdadm sh-dev res/vol)

           One obvious side-effect is that the replica is full of old garbage
           (unless you made them identical using other means), so any
           online-verify is expected to find any number of out-of-sync blocks.

           You must not use this on pre-existing data! Even though it may
           appear to work at first glance, once you switch to the other node,
           your data is toast, as it never got replicated. So do not leave out
           the mkfs (or equivalent).

           Bootstrapping a single node cluster
               This can also be used to shorten the initial resync of a
               cluster where the second node is added after the first node has
               gone into production, by means of disk shipping. This use case
               works on disconnected devices only; the device may be in
               primary or secondary role.

               The necessary steps on the current active server are:

                1. drbdsetup new-current-uuid --clear-bitmap minor

                2. Take a copy of the current active server, e.g. by pulling
                   a disk out of the RAID1 controller, or by copying with dd.
                   You need to copy the actual data, and the meta data.

                3. drbdsetup new-current-uuid minor

               Now add the disk to the new secondary node, and join it to the
               cluster. You will get a resync of those parts that were changed
               since the first call to drbdsetup in step 1.

       drbdsetup new-minor resource minor volume
           Create a new replicated device within a resource. The command
           creates a block device inode for the replicated device (by default,
           /dev/drbdminor). The volume number identifies the device within the
           resource.

           --block-size size
               Block storage devices have a particular sector size or block
               size. This block size has many different names. Examples are
               'hw_sector_size', 'PHY-SEC', 'physical block (sector) size',
               and 'logical block (sector) size'.

               DRBD needs to combine these block sizes of the backing disks.
               In clusters with storage devices with different block sizes, it
               is necessary to configure the maximal block sizes on the DRBD
               level. Here is an example highlighting the need.

               Let's say node A is diskless. It connects to node B, which has
               a physical block size of 512 bytes. Then the user mounts the
               filesystem on node A; the filesystem recognizes that it can do
               I/O in units of 512 bytes. Later, node C joins the cluster with
               a physical block size of 4096 bytes. Now, suddenly DRBD starts
               to deliver I/O errors to the filesystem if it chooses to do I/O
               on, e.g., 512 or 1024 bytes.

               The default value of block-size is 512 bytes. This option is
               available since drbd-utils 9.24 and the drbd kernel driver
               9.1.14 and 9.2.3.
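
           For example (resource r0, minor 0, and volume 0 are placeholders),
           on a cluster where at least one backing disk uses 4096-byte
           sectors:

               drbdsetup new-minor r0 0 0 --block-size=4096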


       drbdsetup new-resource resource node_id,
       drbdsetup resource-options resource
           The new-resource command creates a new resource. The
           resource-options command changes the resource options of an
           existing resource. Available options:

           --auto-promote bool-value
               A resource must be promoted to primary role before any of its
               devices can be mounted or opened for writing.

               Before DRBD 9, this could only be done explicitly ("drbdadm
               primary"). Since DRBD 9, the auto-promote parameter allows DRBD
               to automatically promote a resource to primary role when one of
               its devices is mounted or opened for writing. As soon as all
               devices are unmounted or closed with no more remaining users,
               the role of the resource changes back to secondary.

               Automatic promotion only succeeds if the cluster state allows
               it (that is, if an explicit drbdadm primary command would
               succeed). Otherwise, mounting or opening the device fails as it
               already did before DRBD 9: the mount(2) system call fails with
               errno set to EROFS (Read-only file system); the open(2) system
               call fails with errno set to EMEDIUMTYPE (wrong medium type).

               Irrespective of the auto-promote parameter, if a device is
               promoted explicitly (drbdadm primary), it also needs to be
               demoted explicitly (drbdadm secondary).

               The auto-promote parameter is available since DRBD 9.0.0, and
               defaults to yes.

           --auto-promote-timeout 1/10-of-seconds
               When a user process promotes a drbd resource by opening one of
               its devices, DRBD waits up to auto-promote-timeout for the
               device to become promotable if it is not in the first place.

               auto-promote-timeout is specified in units of 0.1 seconds. Its
               default value is 20 (2 seconds), its minimum value is 0, and
               its maximum value is 600 (one minute).

           --cpu-mask cpu-mask
               Set the cpu affinity mask for DRBD kernel threads. The cpu mask
               is specified as a hexadecimal number. The default value is 0,
               which lets the scheduler decide which kernel threads run on
               which CPUs. CPU numbers in cpu-mask which do not exist in the
               system are ignored.

           --max-io-depth value
               This limits the number of outstanding requests on a DRBD
               device. Any process that tries to issue more I/O requests will
               sleep in "D state" (uninterruptible by signals) until some
               previously issued requests finish.

               max-io-depth has a default value of 8000, its minimum value is
               4, and its maximum value is 2^32.

           --on-no-data-accessible policy
               Determine how to deal with I/O requests when the requested data
               is not available locally or remotely (for example, when all
               disks have failed). When quorum is enabled,
               on-no-data-accessible should be set to the same value as
               on-no-quorum. The defined policies are:

               io-error
                   System calls fail with errno set to EIO.

               suspend-io
                   The resource suspends I/O. I/O can be resumed by
                   (re)attaching the lower-level device, by connecting to a
                   peer which has access to the data, or by forcing DRBD to
                   resume I/O with drbdadm resume-io res. When no data is
                   available, forcing I/O to resume will result in the same
                   behavior as the io-error policy.

               This setting is available since DRBD 8.3.9; the default policy
               is io-error.

           --on-no-quorum {io-error | suspend-io}
               By default, DRBD freezes IO on a device that has lost quorum.
               By setting on-no-quorum to io-error, it completes all IO
               operations with an error if quorum is lost.

               Usually, on-no-data-accessible should be set to the same value
               as on-no-quorum, as on-no-quorum has precedence.

               The on-no-quorum option is available starting with the DRBD
               kernel driver version 9.0.8.

           --on-suspended-primary-outdated {disconnect | force-secondary}
               This setting is only relevant when on-no-quorum is set to
               suspend-io. It is relevant in the following scenario. A primary
               node loses quorum and hence has all IO requests frozen. This
               primary node then connects to another, quorate partition. It
               detects that a node in this quorate partition was promoted to
               primary, and started a newer data-generation there. As a
               result, the first primary learns that it has to consider itself
               outdated.

               When it is set to force-secondary, the node will demote to
               secondary immediately, and fail all pending (and new) IO
               requests with IO errors. It will refuse to allow any process to
               open the DRBD devices until all openers have closed the device.
               This state is visible in status and events2 under the name
               force-io-failures.

               The disconnect setting simply causes the node to reject
               connect attempts and stay isolated.

               The on-suspended-primary-outdated option is available starting
               with the DRBD kernel driver version 9.1.7. It has a default
               value of disconnect.

           --peer-ack-delay expiry-time
               If after the last finished write request no new write request
               gets issued for expiry-time, then a peer-ack packet is sent. If
               a new write request is issued before the timer expires, the
               timer gets reset to expiry-time. (Note: peer-ack packets may be
               sent due to other reasons as well, e.g. membership changes or
               the peer-ack-window option.)

               This parameter may influence resync behavior on remote nodes.
               Peer nodes need to wait until they receive a peer-ack for
               releasing a lock on an AL-extent. Resync operations between
               peers may need to wait for these locks.

               The default value for peer-ack-delay is 100 milliseconds, the
               default unit is milliseconds. This option is available since
               9.0.0.
1307
1308 --peer-ack-window value
1309 On each node and for each device, DRBD maintains a bitmap of
1310 the differences between the local and remote data for each peer
1311 device. For example, in a three-node setup (nodes A, B, C) each
1312 with a single device, every node maintains one bitmap for each
1313 of its peers.
1314
1315 When nodes receive write requests, they know how to update the
1316 bitmaps for the writing node, but not how to update the bitmaps
1317 between themselves. In this example, when a write request
1318 propagates from node A to B and C, nodes B and C know that they
1319 have the same data as node A, but not whether or not they both
1320 have the same data.
1321
1322 As a remedy, the writing node occasionally sends peer-ack
1323 packets to its peers which tell them which state they are in
1324 relative to each other.
1325
1326 The peer-ack-window parameter specifies how much data a primary
1327 node may send before sending a peer-ack packet. A low value
1328 causes increased network traffic; a high value causes less
1329 network traffic but higher memory consumption on secondary
1330 nodes and higher resync times between the secondary nodes after
1331 primary node failures. (Note: peer-ack packets may be sent due
1332 to other reasons as well, e.g. membership changes or expiry of
1333 the peer-ack-delay timer.)
1334
1335 The default value for peer-ack-window is 2 MiB, the default
1336 unit is sectors. This option is available since 9.0.0.
1337
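           For example, a minimal sketch (assuming a resource named
           r0) that trades a little extra network traffic for faster
           resyncs between secondaries by lowering both the window and
           the delay:

               # 2048 sectors of 512 bytes = 1 MiB; the delay is in
               # milliseconds
               drbdsetup resource-options r0 \
                   --peer-ack-window 2048 --peer-ack-delay 50
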
       --quorum value
           When activated, a cluster partition requires quorum in
           order to modify the replicated data set. That means a node
           in the cluster partition can only be promoted to primary if
           the cluster partition has quorum. Every node with a disk
           directly connected to the node that should be promoted
           counts. If a primary node should execute a write request,
           but the cluster partition has lost quorum, it will freeze
           I/O or reject the write request with an error (depending on
           the on-no-quorum setting). Upon losing quorum, a primary
           always invokes the quorum-lost handler. The handler is
           intended for notification purposes; its return code is
           ignored.

           The option's value can be set to off, majority, all, or a
           numeric value. If you set it to a numeric value, make sure
           that the value is greater than half of your number of
           nodes. Quorum is a mechanism to avoid data divergence; it
           can be used instead of fencing when there are more than two
           replicas. It defaults to off.

           If all missing nodes are marked as outdated, a partition
           always has quorum, no matter how small it is. That is, if
           you disconnect all secondary nodes gracefully, a single
           primary continues to operate. The moment a single secondary
           is lost ungracefully, it has to be assumed that it forms a
           partition with all the missing outdated nodes. Since the
           local partition might then be smaller than the other one,
           quorum is lost at that moment.

           If you want to allow permanently diskless nodes to gain
           quorum, it is recommended not to use majority or all.
           Specify an absolute number instead, since DRBD's heuristic
           to determine the total number of diskful nodes in the
           cluster is unreliable.

           The quorum implementation is available starting with the
           DRBD kernel driver version 9.0.7.
1374
       --quorum-minimum-redundancy value
           This option sets the minimum number of nodes with an
           UpToDate disk that is required for a partition to gain
           quorum. This is a different requirement than the one the
           plain quorum option expresses.

           The option's value can be set to off, majority, all, or a
           numeric value. If you set it to a numeric value, make sure
           that the value is greater than half of your number of
           nodes.

           If you want to allow permanently diskless nodes to gain
           quorum, it is recommended not to use majority or all.
           Specify an absolute number instead, since DRBD's heuristic
           to determine the total number of diskful nodes in the
           cluster is unreliable.

           This option is available starting with the DRBD kernel
           driver version 9.0.10.
1392
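           For example, a minimal sketch for a five-node cluster with
           three diskful and two permanently diskless nodes (assuming
           a resource named r0); an absolute quorum value is used as
           recommended above, and at least two UpToDate disks are
           required for quorum:

               drbdsetup resource-options r0 \
                   --quorum 3 --quorum-minimum-redundancy 2 \
                   --on-no-quorum io-error
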
       --twopc-retry-timeout 1/10-of-seconds
           When two-phase commits conflict, DRBD sometimes needs to
           retry them. But if two nodes retried their intended
           two-phase commits after the same delay, they would end up
           in an endless retry loop. To avoid that, DRBD selects a
           random wait time within an upper bound that is an
           exponential-backoff function of the retry number. The
           twopc-retry-timeout is the base multiplier of that
           function.

           twopc-retry-timeout has a default value of 1 (0.1 seconds),
           its minimum value is 1 (0.1 seconds), and its maximum value
           is 50 (5 seconds).
1405
       --twopc-timeout 1/10-of-seconds
           In some situations, a DRBD cluster requires a cluster-wide
           coordinated state transition. A perfect example of this is
           the 'promote-to-primary' action. Even if two nodes that are
           not directly connected try this action concurrently, it may
           succeed for only one of the two.

           For these cluster-wide coordinated state transitions, DRBD
           implements a two-phase commit protocol. If a connection
           breaks in phase one (prepare packet sent), the coordinator
           of the two-phase commit might never get the expected reply
           packet.

           A cluster in this state cannot start any new cluster-wide
           coordinated state transition, as the already prepared one
           blocks all such attempts. After twopc-timeout, all nodes
           abort the prepared transaction and unlock the cluster
           again.

           twopc-timeout has a default value of 300 (30 seconds), its
           minimum value is 50 (5 seconds), and its maximum value is
           600 (one minute).
1426
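           For example, a minimal sketch (assuming a resource named r0
           that is tuned with the resource-options command) raising
           the abort timeout to 60 seconds for a cluster with slow
           links; the value is given in units of 0.1 seconds:

               drbdsetup resource-options r0 --twopc-timeout 600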
1427
1428 drbdsetup outdate minor
1429 Mark the data on a lower-level device as outdated. This is used for
1430 fencing, and prevents the resource the device is part of from
1431 becoming primary in the future. See the --fencing disk option.
1432
1433 drbdsetup pause-sync resource peer_node_id volume
1434 Stop resynchronizing between a local and a peer device by setting
1435 the local pause flag. The resync can only resume if the pause flags
1436 on both sides of a connection are cleared.
1437
1438 drbdsetup primary resource
1439 Change the role of a node in a resource to primary. This allows the
1440 replicated devices in this resource to be mounted or opened for
1441 writing. Available options:
1442
1443 --overwrite-data-of-peer
1444 This option is an alias for the --force option.
1445
1446 --force
1447 Force the resource to become primary even if some devices are
1448 not guaranteed to have up-to-date data. This option is used to
1449 turn one of the nodes in a newly created cluster into the
1450 primary node, or when manually recovering from a disaster.
1451
1452 Note that this can lead to split-brain scenarios. Also, when
1453 forcefully turning an inconsistent device into an up-to-date
1454 device, it is highly recommended to use any integrity checks
1455 available (such as a filesystem check) to make sure that the
1456 device can at least be used without crashing the system.
1457
1458 Note that DRBD usually only allows one node in a cluster to be in
1459 primary role at any time; this allows DRBD to coordinate access to
1460 the devices in a resource across nodes. The --allow-two-primaries
1461 network option changes this; in that case, a mechanism outside of
1462 DRBD needs to coordinate device access.
1463
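       For example, a minimal sketch for bootstrapping a newly created
       cluster (assuming a resource named r0): force one node to
       primary so that it becomes the initial sync source even though
       no device is up-to-date yet:

           drbdsetup primary r0 --force
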
1464 drbdsetup resize minor
1465 Reexamine the size of the lower-level devices of a replicated
1466 device on all nodes. This command is called after the lower-level
1467 devices on all nodes have been grown to adjust the size of the
1468 replicated device. Available options:
1469
1470 --assume-peer-has-space
1471 Resize the device even if some of the peer devices are not
1472 connected at the moment. DRBD will try to resize the peer
1473 devices when they next connect. It will refuse to connect to a
1474 peer device which is too small.
1475
1476 --assume-clean
1477 Do not resynchronize the added disk space; instead, assume that
1478 it is identical on all nodes. This option can be used when the
1479 disk space is uninitialized and differences do not matter, or
1480 when it is known to be identical on all nodes. See the
1481 drbdsetup verify command.
1482
       --size val
           This option can be used to shrink the usable size of a
           DRBD device online. It is the user's responsibility to make
           sure that a file system on the device is not truncated by
           that operation.
1487
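           For example, a minimal sketch (assuming minor number 0 and
           a file system that has already been shrunk below the new
           size) that shrinks the usable size of the device online:

               drbdsetup resize 0 --size=90G
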
       --al-stripes val, --al-stripe-size-kB val
           These options may be used to change the layout of the
           activity log online. In case of internal meta data this may
           involve shrinking the user-visible size at the same time
           (using --size) or increasing the available space on the
           backing devices.
1494
1495
1496 drbdsetup resume-io minor
1497 Resume I/O on a replicated device. See the --fencing net option.
1498
1499 drbdsetup resume-sync resource peer_node_id volume
1500 Allow resynchronization to resume by clearing the local sync pause
1501 flag.
1502
1503 drbdsetup role resource
1504 Show the current role of a resource.
1505
1506 drbdsetup secondary resource
1507 Change the role of a node in a resource to secondary. This command
1508 fails if the replicated device is in use.
1509
1510 --force
1511 A forced demotion to secondary causes all pending and new IO
1512 requests to terminate with IO errors.
1513
1514 Please note that a forced demotion returns immediately. The
1515 user should unmount any filesystem that might be mounted on the
1516 DRBD device. The device can be used again when
1517 force-io-failures has a value of no. (See drbdsetup status and
1518 drbdsetup events2).
1519
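       For example, a minimal sketch of a forced demotion (assuming a
       resource named r0); all pending I/O fails with I/O errors, and
       any file system on the device should be unmounted afterwards:

           drbdsetup secondary r0 --force
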
1520 drbdsetup show {resource | all}
1521 Show the current configuration of a resource, or of all resources.
1522 Available options:
1523
1524 --show-defaults
1525 Show all configuration parameters, even the ones with default
1526 values. Normally, parameters with default values are not shown.
1527
1528
1529 drbdsetup show-gi resource peer_node_id volume
1530 Show the data generation identifiers for a device on a particular
1531 connection. In addition, explain the output. The output otherwise
1532 is the same as in the drbdsetup get-gi command.
1533
1534 drbdsetup state
1535 This is an alias for drbdsetup role. Deprecated.
1536
   drbdsetup status {resource | all}
       Show the status of a resource, or of all resources. The output
       consists of one paragraph for each configured resource. Each
       paragraph contains one line for the resource itself, followed
       by one line for each device and one line for each connection.
       The device and connection lines are indented. The connection
       lines are followed by one line for each peer device; these
       lines are indented against the connection line.

       Long lines are wrapped at the terminal width and indented to
       indicate how the lines belong together. Available options:
1548
1549 --verbose
1550 Include more information in the output even when it is likely
1551 redundant or irrelevant.
1552
1553 --statistics
1554 Include data transfer statistics in the output.
1555
1556 --color={always | auto | never}
1557 Colorize the output. With --color=auto, drbdsetup emits color
1558 codes only when standard output is connected to a terminal.
1559
1560 For example, the non-verbose output for a resource with only one
1561 connection and only one volume could look like this:
1562
1563 drbd0 role:Primary
1564 disk:UpToDate
1565 host2.example.com role:Secondary
1566 disk:UpToDate
1567
1568
1569 With the --verbose option, the same resource could be reported as:
1570
1571 drbd0 node-id:1 role:Primary suspended:no
1572 volume:0 minor:1 disk:UpToDate blocked:no
1573 host2.example.com local:ipv4:192.168.123.4:7788
1574 peer:ipv4:192.168.123.2:7788 node-id:0 connection:WFReportParams
1575 role:Secondary congested:no
1576 volume:0 replication:Connected disk:UpToDate resync-suspended:no
1577
1578
1579
1580 drbdsetup suspend-io minor
1581 Suspend I/O on a replicated device. It is not usually necessary to
1582 use this command.
1583
1584 drbdsetup verify resource peer_node_id volume
1585 Start online verification, change which part of the device will be
1586 verified, or stop online verification. The command requires the
1587 specified peer to be connected.
1588
1589 Online verification compares each disk block on the local and peer
1590 node. Blocks which differ between the nodes are marked as
1591 out-of-sync, but they are not automatically brought back into sync.
1592 To bring them into sync, the drbdsetup invalidate or drbdsetup
1593 invalidate-remote with the --reset-bitmap=no option can be used.
1594 Progress can be monitored in the output of drbdsetup status
1595 --statistics. Available options:
1596
1597 --start position
1598 Define where online verification should start. This parameter
1599 is ignored if online verification is already in progress. If
1600 the start parameter is not specified, online verification will
1601 continue where it was interrupted (if the connection to the
1602 peer was lost while verifying), after the previous stop sector
1603 (if the previous online verification has finished), or at the
1604 beginning of the device (if the end of the device was reached,
1605 or online verify has not run before).
1606
1607 The position on disk is specified in disk sectors (512 bytes)
1608 by default.
1609
1610 --stop position
1611 Define where online verification should stop. If online
1612 verification is already in progress, the stop position of the
1613 active online verification process is changed. Use this to stop
1614 online verification.
1615
1616 The position on disk is specified in disk sectors (512 bytes)
1617 by default.
1618
1619 Also see the notes on data integrity in the drbd.conf(5) manual
1620 page.
1621
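       For example, a minimal sketch (assuming resource r0, peer node
       id 1, and volume 0) that verifies only the first gibibyte of
       the device; positions are given in 512-byte sectors:

           # 1 GiB / 512 bytes = 2097152 sectors
           drbdsetup verify r0 1 0 --start 0 --stop 2097152
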
1622 drbdsetup wait-connect-volume resource peer_node_id volume,
1623 drbdsetup wait-connect-connection resource peer_node_id,
1624 drbdsetup wait-connect-resource resource,
1625 drbdsetup wait-sync-volume resource peer_node_id volume,
1626 drbdsetup wait-sync-connection resource peer_node_id,
1627 drbdsetup wait-sync-resource resource
       The wait-connect-* commands wait until a device on a peer is
       visible. The wait-sync-* commands wait until a device on a peer
       is up to date. Available options for both commands:
1631
1632 --degr-wfc-timeout timeout
1633 Define how long to wait until all peers are connected in case
1634 the cluster consisted of a single node only when the system
1635 went down. This parameter is usually set to a value smaller
1636 than wfc-timeout. The assumption here is that peers which were
1637 unreachable before a reboot are less likely to be reachable
1638 after the reboot, so waiting is less likely to help.
1639
1640 The timeout is specified in seconds. The default value is 0,
1641 which stands for an infinite timeout. Also see the wfc-timeout
1642 parameter.
1643
1644 --outdated-wfc-timeout timeout
1645 Define how long to wait until all peers are connected if all
1646 peers were outdated when the system went down. This parameter
1647 is usually set to a value smaller than wfc-timeout. The
1648 assumption here is that an outdated peer cannot have become
1649 primary in the meantime, so we don't need to wait for it as
1650 long as for a node which was alive before.
1651
1652 The timeout is specified in seconds. The default value is 0,
1653 which stands for an infinite timeout. Also see the wfc-timeout
1654 parameter.
1655
1656 --wait-after-sb
1657 This parameter causes DRBD to continue waiting in the init
1658 script even when a split-brain situation has been detected, and
1659 the nodes therefore refuse to connect to each other.
1660
1661 --wfc-timeout timeout
1662 Define how long the init script waits until all peers are
1663 connected. This can be useful in combination with a cluster
1664 manager which cannot manage DRBD resources: when the cluster
1665 manager starts, the DRBD resources will already be up and
1666 running. With a more capable cluster manager such as Pacemaker,
1667 it makes more sense to let the cluster manager control DRBD
1668 resources. The timeout is specified in seconds. The default
1669 value is 0, which stands for an infinite timeout. Also see the
1670 degr-wfc-timeout parameter.
1671
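       For example, a minimal sketch (assuming a resource named r0)
       that waits at most 30 seconds for all peers of the resource to
       connect instead of waiting indefinitely:

           drbdsetup wait-connect-resource r0 --wfc-timeout 30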
1672
   drbdsetup forget-peer resource peer_node_id
       The forget-peer command removes all traces of a peer node from
       the meta-data. It frees a bitmap slot in the meta-data and
       makes it available for further bitmap slot allocation in case a
       so-far unseen node connects.

       The connection must be taken down before this command may be
       used. If the peer reconnects at a later point, a bitmap-based
       resync will be turned into a full sync.
1682
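       For example, a minimal sketch (assuming resource r0 and peer
       node id 2) that frees the peer's bitmap slot once the
       connection to that peer has been taken down:

           drbdsetup forget-peer r0 2
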
1683 drbdsetup rename-resource resource new_name
1684 Change the name of resource to new_name on the local node. Note
1685 that, since there is no concept of resource names in DRBD's network
1686 protocol, it is technically possible to have different names for a
1687 resource on different nodes. However, it is strongly recommended to
1688 issue the same rename-resource command on all nodes to have
1689 consistent naming across the cluster.
1690
1691 A rename event will be issued on the events2 stream to notify users
1692 of the new name.
1693
EXAMPLES
   Please see the DRBD User's Guide[1] for examples.
1696
VERSION
   This document was revised for version 9.0.0 of the DRBD
   distribution.
1699
AUTHOR
   Written by Philipp Reisner <philipp.reisner@linbit.com> and Lars
   Ellenberg <lars.ellenberg@linbit.com>.
1703
REPORTING BUGS
   Report bugs to <drbd-user@lists.linbit.com>.
1706
COPYRIGHT
   Copyright 2001-2018 LINBIT Information Technologies, Philipp
   Reisner, Lars Ellenberg. This is free software; see the source for
   copying conditions. There is NO warranty; not even for
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
1712
SEE ALSO
   drbd.conf(5), drbd(8), drbdadm(8), DRBD User's Guide[1], DRBD Web
   Site[2]
1716
NOTES
    1. DRBD User's Guide
       http://www.drbd.org/users-guide/

    2. DRBD Web Site
       http://www.drbd.org/
1723
1724
1725
DRBD 9.0.x                      17 January 2018                  DRBDSETUP(8)