DRBD.CONF(5)                 Configuration Files                 DRBD.CONF(5)

NAME
       drbd.conf - DRBD Configuration Files

INTRODUCTION
       DRBD implements block devices which replicate their data to all
       nodes of a cluster. The actual data and associated metadata are
       usually stored redundantly on "ordinary" block devices on each
       cluster node.

       Replicated block devices are called /dev/drbdminor by default. They
       are grouped into resources, with one or more devices per resource.
       Replication among the devices in a resource takes place in
       chronological order. With DRBD, we refer to the devices inside a
       resource as volumes.

       In DRBD 9, a resource can be replicated between two or more cluster
       nodes. The connections between cluster nodes are point-to-point
       links, and use TCP or a TCP-like protocol. All nodes must be
       directly connected.

       DRBD consists of low-level user-space components which interact
       with the kernel and perform basic operations (drbdsetup, drbdmeta),
       a high-level user-space component which understands and processes
       the DRBD configuration and translates it into basic operations of
       the low-level components (drbdadm), and a kernel component.

       The default DRBD configuration consists of /etc/drbd.conf and of
       additional files included from there, usually global_common.conf
       and all *.res files inside /etc/drbd.d/. It has turned out to be
       useful to define each resource in a separate *.res file.

       The configuration files are designed so that each cluster node can
       contain an identical copy of the entire cluster configuration. The
       host name of each node, as reported by uname -n, determines which
       parts of the configuration apply. It is highly recommended to keep
       the cluster configuration on all nodes in sync, either by copying
       it manually to all nodes or by automating the process with csync2
       or a similar tool.

EXAMPLE
       global {
            usage-count yes;
            udev-always-use-vnr;
       }
       resource r0 {
            net {
                 cram-hmac-alg sha1;
                 shared-secret "FooFunFactory";
            }
            volume 0 {
                 device    /dev/drbd1;
                 disk      /dev/sda7;
                 meta-disk internal;
            }
            on alice {
                 node-id   0;
                 address   10.1.1.31:7000;
            }
            on bob {
                 node-id   1;
                 address   10.1.1.32:7000;
            }
            connection {
                 host alice  port 7000;
                 host bob    port 7000;
                 net {
                      protocol C;
                 }
            }
       }

       This example defines a resource r0 which contains a single
       replicated device with volume number 0. The resource is replicated
       among hosts alice and bob, which have the IPv4 addresses 10.1.1.31
       and 10.1.1.32 and the node identifiers 0 and 1, respectively. On
       both hosts, the replicated device is called /dev/drbd1, and the
       actual data and metadata are stored on the lower-level device
       /dev/sda7. The connection between the hosts uses protocol C.

       Please refer to the DRBD User's Guide[1] for more examples.

FILE FORMAT
       DRBD configuration files consist of sections, which contain other
       sections and parameters depending on the section types. Each
       section consists of one or more keywords, sometimes a section name,
       an opening brace (“{”), the section's contents, and a closing brace
       (“}”). Parameters inside a section consist of a keyword, followed
       by one or more keywords or values, and a semicolon (“;”).

       Some parameter values have a default scale which applies when a
       plain number is specified (for example Kilo, or 1024 times the
       numeric value). Such default scales can be overridden by using a
       suffix (for example, M for Mega). The common suffixes K = 2^10 =
       1024, M = 1024 K, and G = 1024 M are supported.
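
       For example, assuming a parameter whose default scale is Kilo, the
       following two settings would be equivalent, because the M suffix
       overrides the default scale:

           resync-rate 10240;  # plain number: 10240 K
           resync-rate 10M;    # 10 M = 10240 K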

       Comments start with a hash sign (“#”) and extend to the end of the
       line. In addition, any section can be prefixed with the keyword
       skip, which causes the section and any sub-sections to be ignored.
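
       For example, a resource can be ignored without deleting its
       definition by prefixing it with skip (a sketch; the resource name
       is illustrative):

           skip resource r-old {
                # ...
           }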

       Additional files can be included with the include file-pattern
       statement (see glob(7) for the expressions supported in
       file-pattern). Include statements are only allowed outside of
       sections.
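
       For example, a typical /etc/drbd.conf consists only of include
       statements:

           include "drbd.d/global_common.conf";
           include "drbd.d/*.res";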

       The following sections are defined (indentation indicates in which
       context):

           common
              [disk]
              [handlers]
              [net]
              [options]
              [startup]
           global
              [require-drbd-module-version-{eq,ne,gt,ge,lt,le}]
           resource
              connection
                 multiple path | 2 host
                 [net]
                 [volume]
                    [peer-device-options]
                 [peer-device-options]
              connection-mesh
                 [net]
              [disk]
              floating
              handlers
              [net]
              on
                 volume
                    disk
                    [disk]
              options
              stacked-on-top-of
              startup

       Sections in brackets affect other parts of the configuration:
       inside the common section, they apply to all resources. A disk
       section inside a resource or on section applies to all volumes of
       that resource, and a net section inside a resource section applies
       to all connections of that resource. This makes it possible to
       avoid repeating identical options for each resource, connection, or
       volume. Options can be overridden in a more specific resource,
       connection, on, or volume section.

       peer-device-options are resync-rate, c-plan-ahead, c-delay-target,
       c-fill-target, c-max-rate and c-min-rate. For backward
       compatibility, they can be specified in any disk options section as
       well. They are inherited into all relevant connections. If they are
       given on the connection level, they are inherited by all volumes on
       that connection. A peer-device-options section is started with the
       disk keyword.
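
       For example, resync tuning can be applied to a single connection by
       opening a disk section inside it (a sketch; the values are
       illustrative):

           connection {
                host alice;
                host bob;
                disk {
                     c-fill-target 10M;
                     c-max-rate 100M;
                }
           }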

   Sections
       common

           This section can contain one each of a disk, handlers, net,
           options, and startup section. All resources inherit the
           parameters in these sections as their default values.

       connection [name]

           Define a connection between two hosts. This section must
           contain two host parameters or multiple path sections. The
           optional name is used to refer to the connection in the system
           log and in other messages. If no name is specified, the peer's
           host name is used instead.

       path

           Define a path between two hosts. This section must contain two
           host parameters.

       connection-mesh

           Define a connection mesh between multiple hosts. This section
           must contain a hosts parameter, which has the host names as
           arguments. This section is a shortcut to define many
           connections which share the same network options.
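
           For example, a full mesh between three nodes can be declared in
           a single section instead of three connection sections (host
           names are illustrative):

               connection-mesh {
                    hosts alice bob charlie;
                    net {
                         protocol C;
                    }
               }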

       disk

           Define parameters for a volume. All parameters in this section
           are optional.

       floating [address-family] addr:port

           Like the on section, except that instead of the host name, a
           network address is used to determine whether a host matches
           this floating section.

           The node-id parameter in this section is required. If the
           address parameter is not provided, no connections to peers will
           be created by default. The device, disk, and meta-disk
           parameters must be defined in, or inherited by, this section.
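
           For example, with IP address failover, a node is matched by the
           address it currently holds rather than by its host name (a
           sketch):

               floating 10.1.1.31:7000 {
                    node-id   0;
                    device    /dev/drbd1;
                    disk      /dev/sda7;
                    meta-disk internal;
               }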

       global

           Define some global parameters. All parameters in this section
           are optional. Only one global section is allowed in the
           configuration.

       require-drbd-module-version-{eq,ne,gt,ge,lt,le}

           This statement contains one of the valid forms and a three-part
           version number (e.g., require-drbd-module-version-eq 9.0.16;).
           If the currently loaded DRBD kernel module does not match the
           specification, parsing is aborted. The comparison operator
           names have the same semantics as in test(1).

       handlers

           Define handlers to be invoked when certain events occur. The
           kernel passes the resource name in the first command-line
           argument and sets the following environment variables depending
           on the event's context:

           •   For events related to a particular device: the device's
               minor number in DRBD_MINOR, the device's volume number in
               DRBD_VOLUME.

           •   For events related to a particular device on a particular
               peer: the connection endpoints in DRBD_MY_ADDRESS,
               DRBD_MY_AF, DRBD_PEER_ADDRESS, and DRBD_PEER_AF; the
               device's local minor number in DRBD_MINOR, and the device's
               volume number in DRBD_VOLUME.

           •   For events related to a particular connection: the
               connection endpoints in DRBD_MY_ADDRESS, DRBD_MY_AF,
               DRBD_PEER_ADDRESS, and DRBD_PEER_AF; and, for each device
               defined for that connection: the device's minor number in
               DRBD_MINOR_volume-number.

           •   For events that identify a device, if a lower-level device
               is attached, the lower-level device's device name is passed
               in DRBD_BACKING_DEV (or DRBD_BACKING_DEV_volume-number).

           All parameters in this section are optional. Only a single
           handler can be defined for each event; if no handler is
           defined, nothing will happen.
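
           For example, a handler that alerts the administrator on split
           brain (the notify-split-brain.sh script ships with drbd-utils;
           its path may vary by distribution):

               handlers {
                    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
               }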

       net

           Define parameters for a connection. All parameters in this
           section are optional.

       on host-name [...]

           Define the properties of a resource on a particular host or set
           of hosts. Specifying more than one host name can make sense in
           a setup with IP address failover, for example. The host-name
           argument must match the Linux host name (uname -n).

           Usually contains or inherits at least one volume section. The
           node-id and address parameters must be defined in this section.
           The device, disk, and meta-disk parameters must be defined in,
           or inherited by, this section.

           A normal configuration file contains two or more on sections
           for each resource. Also see the floating section.

       options

           Define parameters for a resource. All parameters in this
           section are optional.

       resource name

           Define a resource. Usually contains at least two on sections
           and at least one connection section.

       stacked-on-top-of resource

           Used instead of an on section for configuring a stacked
           resource with three to four nodes.

           Starting with DRBD 9, stacking is deprecated. It is advised to
           use resources which are replicated among more than two nodes
           instead.

       startup

           The parameters in this section determine the behavior of a
           resource at startup time.

       volume volume-number

           Define a volume within a resource. The volume numbers in the
           various volume sections of a resource define which devices on
           which hosts form a replicated device.

   Section connection Parameters
       host name [address [address-family] address] [port port-number]

           Defines an endpoint for a connection. Each host statement
           refers to an on section in a resource. If a port number is
           defined, this endpoint will use the specified port instead of
           the port defined in the on section. Each connection section
           must contain exactly two host parameters. Instead of two host
           parameters, the connection may contain multiple path sections.

   Section path Parameters
       host name [address [address-family] address] [port port-number]

           Defines an endpoint for a connection. Each host statement
           refers to an on section in a resource. If a port number is
           defined, this endpoint will use the specified port instead of
           the port defined in the on section. Each path section must
           contain exactly two host parameters.
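
           For example, a connection with two redundant network paths can
           be written as follows (host names and addresses are
           illustrative):

               connection {
                    path {
                         host alice address 10.1.1.31:7000;
                         host bob   address 10.1.1.32:7000;
                    }
                    path {
                         host alice address 10.2.2.31:7000;
                         host bob   address 10.2.2.32:7000;
                    }
               }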

   Section connection-mesh Parameters
       hosts name...

           Defines all nodes of a mesh. Each name refers to an on section
           in a resource. The port that is defined in the on section will
           be used.

   Section disk Parameters
       al-extents extents

           DRBD automatically maintains a "hot" or "active" disk area
           likely to be written to again soon based on the recent write
           activity. The "active" disk area can be written to immediately,
           while "inactive" disk areas must be "activated" first, which
           requires a meta-data write. We also refer to this active disk
           area as the "activity log".

           The activity log saves meta-data writes, but the whole log must
           be resynced upon recovery of a failed node. The size of the
           activity log is a major factor of how long a resync will take
           and how fast a replicated disk will become consistent after a
           crash.

           The activity log consists of a number of 4-Megabyte segments;
           the al-extents parameter determines how many of those segments
           can be active at the same time. The default value for
           al-extents is 1237, with a minimum of 7 and a maximum of 65536.

           Note that the effective maximum may be smaller, depending on
           how you created the device meta data; see also drbdmeta(8). The
           effective maximum is 919 * (available on-disk activity-log
           ring-buffer area / 4 kB - 1); the default 32 kB ring buffer
           yields a maximum of 919 * (32/4 - 1) = 6433 (which covers more
           than 25 GiB of data). We recommend keeping this well within the
           amount your backend storage and replication link are able to
           resync within about five minutes.

       al-updates {yes | no}

           With this parameter, the activity log can be turned off
           entirely (see the al-extents parameter). This will speed up
           writes because fewer meta-data writes will be necessary, but
           the entire device needs to be resynchronized upon recovery of a
           failed primary node. The default value for al-updates is yes.

       disk-barrier,
       disk-flushes,
       disk-drain
           DRBD has three methods of handling the ordering of dependent
           write requests:

           disk-barrier
               Use disk barriers to make sure that requests are written to
               disk in the right order. Barriers ensure that all requests
               submitted before a barrier make it to the disk before any
               requests submitted after the barrier. This is implemented
               using 'tagged command queuing' on SCSI devices and 'native
               command queuing' on SATA devices. Only some devices and
               device stacks support this method. The device mapper (LVM)
               only supports barriers in some configurations.

               Note that on systems which do not support disk barriers,
               enabling this option can lead to data loss or corruption.
               Until DRBD 8.4.1, disk-barrier was turned on if the I/O
               stack below DRBD did support barriers. Kernels since
               linux-2.6.36 (or 2.6.32 RHEL6) no longer make it possible
               to detect whether barriers are supported. Since drbd-8.4.2,
               this option is off by default and needs to be enabled
               explicitly.

           disk-flushes
               Use disk flushes between dependent write requests, also
               referred to as 'force unit access' by drive vendors. This
               forces all data to disk. This option is enabled by default.

           disk-drain
               Wait for the request queue to "drain" (that is, wait for
               the requests to finish) before submitting a dependent write
               request. This method requires that requests are stable on
               disk when they finish. Before DRBD 8.0.9, this was the only
               method implemented. This option is enabled by default. Do
               not disable it in production environments.

           From these three methods, DRBD will use the first that is
           enabled and supported by the backing storage device. If all
           three of these options are turned off, DRBD will submit write
           requests without bothering about dependencies. Depending on the
           I/O stack, write requests can be reordered, and they can be
           submitted in a different order on different cluster nodes. This
           can result in data loss or corruption. Therefore, turning off
           all three methods of controlling write ordering is strongly
           discouraged.

           A general guideline for configuring write ordering is to use
           disk barriers or disk flushes when using ordinary disks (or an
           ordinary disk array) with a volatile write cache. On storage
           without a cache or with a battery-backed write cache, disk
           draining can be a reasonable choice.
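
           For example, on a controller with a battery-backed write cache,
           flushes can be turned off so that draining is used instead (a
           sketch; only do this if the cache is genuinely non-volatile):

               disk {
                    disk-flushes no;
                    md-flushes   no;
               }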

       disk-timeout
           If the lower-level device on which a DRBD device stores its
           data does not finish an I/O request within the defined
           disk-timeout, DRBD treats this as a failure. The lower-level
           device is detached, and the device's disk state advances to
           Diskless. If DRBD is connected to one or more peers, the failed
           request is passed on to one of them.

           This option is dangerous and may lead to kernel panic!

           "Aborting" requests, or force-detaching the disk, is intended
           for completely blocked/hung local backing devices which no
           longer complete requests at all, not even with error
           completions. In this situation, usually a hard-reset and
           failover is the only way out.

           By "aborting", basically faking a local error completion, we
           allow for a more graceful switchover by cleanly migrating
           services. Still, the affected node has to be rebooted "soon".

           By completing these requests, we allow the upper layers to
           re-use the associated data pages.

           If later the local backing device "recovers", and now DMAs some
           data from disk into the original request pages, in the best
           case it will just put random data into unused pages; but
           typically it will corrupt completely unrelated data that has
           meanwhile been placed in those pages, causing all sorts of
           damage.

           This means that delayed successful completion, especially for
           READ requests, is a reason to panic(). We assume that a delayed
           *error* completion is OK, though we still will complain noisily
           about it.

           The default value of disk-timeout is 0, which stands for an
           infinite timeout. Timeouts are specified in units of 0.1
           seconds. This option is available since DRBD 8.3.12.

       md-flushes
           Enable disk flushes and disk barriers on the meta-data device.
           This option is enabled by default. See the disk-flushes
           parameter.

       on-io-error handler

           Configure how DRBD reacts to I/O errors on a lower-level
           device. The following policies are defined:

           pass_on
               Change the disk status to Inconsistent, mark the failed
               block as inconsistent in the bitmap, and retry the I/O
               operation on a remote cluster node.

           call-local-io-error
               Call the local-io-error handler (see the handlers section).

           detach
               Detach the lower-level device and continue in diskless
               mode.

       read-balancing policy
           Distribute read requests among cluster nodes as defined by
           policy. The supported policies are prefer-local (the default),
           prefer-remote, round-robin, least-pending,
           when-congested-remote, 32K-striping, 64K-striping,
           128K-striping, 256K-striping, 512K-striping and 1M-striping.

           This option is available since DRBD 8.4.1.

       resync-after res-name/volume

           Define that a device should only resynchronize after the
           specified other device. By default, no order between devices is
           defined, and all devices will resynchronize in parallel.
           Depending on the configuration of the lower-level devices, and
           the available network and disk bandwidth, this can slow down
           the overall resync process. This option can be used to form a
           chain or tree of dependencies among devices.
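
           For example, to let resource r1 resynchronize only after volume
           0 of resource r0 (resource names are illustrative):

               resource r1 {
                    disk {
                         resync-after r0/0;
                    }
                    # ...
               }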

       rs-discard-granularity byte
           When rs-discard-granularity is set to a non-zero, positive
           value, DRBD tries to do a resync operation in requests of this
           size. In case such a block contains only zero bytes on the sync
           source node, the sync target node will issue a
           discard/trim/unmap command for the area.

           The value is constrained by the discard granularity of the
           backing block device. In case rs-discard-granularity is not a
           multiple of the discard granularity of the backing block
           device, DRBD rounds it up. The feature only becomes active if
           the backing block device reads back zeroes after a discard
           command.

           The default value of rs-discard-granularity is 0. This option
           is available since 8.4.7.

       discard-zeroes-if-aligned {yes | no}

           There are several aspects to discard/trim/unmap support on
           Linux block devices. Even if discard is supported in general,
           it may fail silently, or may partially ignore discard requests.
           Devices also announce whether reading from unmapped blocks
           returns defined data (usually zeroes), or undefined data
           (possibly old data, possibly garbage).

           If, on different nodes, DRBD is backed by devices with
           differing discard characteristics, discards may lead to data
           divergence (old data or garbage left over on one backend,
           zeroes due to unmapped areas on the other backend). Online
           verify would then potentially report tons of spurious
           differences. While probably harmless for most use cases (fstrim
           on a file system), DRBD cannot have that.

           To play safe, we have to disable discard support if our local
           backend (on a Primary) does not support
           "discard_zeroes_data=true". We also have to translate discards
           to explicit zero-out on the receiving side, unless the
           receiving side (Secondary) supports
           "discard_zeroes_data=true", thereby allocating areas that were
           supposed to be unmapped.

           There are some devices (notably the LVM/DM thin provisioning)
           that are capable of discard, but announce
           discard_zeroes_data=false. In the case of DM-thin, discards
           aligned to the chunk size will be unmapped, and reading from
           unmapped sectors will return zeroes. However, unaligned partial
           head or tail areas of discard requests will be silently
           ignored.

           If we now add a helper to explicitly zero out these unaligned
           partial areas, while passing on the discard of the aligned full
           chunks, we effectively achieve discard_zeroes_data=true on such
           devices.

           Setting discard-zeroes-if-aligned to yes will allow DRBD to use
           discards, and to announce discard_zeroes_data=true, even on
           backends that announce discard_zeroes_data=false.

           Setting discard-zeroes-if-aligned to no will cause DRBD to
           always fall back to zero-out on the receiving side, and to not
           even announce discard capabilities on the Primary, if the
           respective backend announces discard_zeroes_data=false.

           We used to ignore the discard_zeroes_data setting completely.
           To not break established and expected behaviour, and suddenly
           cause fstrim on thin-provisioned LVs to run out-of-space
           instead of freeing up space, the default value is yes.

           This option is available since 8.4.7.

       disable-write-same {yes | no}

           Some disks announce WRITE_SAME support to the kernel but fail
           with an I/O error upon actually receiving such a request. This
           mostly happens when using virtualized disks -- notably, this
           behavior has been observed with VMware's virtual disks.

           When disable-write-same is set to yes, WRITE_SAME detection is
           manually overridden and support is disabled.

           The default value of disable-write-same is no. This option is
           available since 8.4.7.

   Section peer-device-options Parameters
       Please note that you open the section with the disk keyword.

       c-delay-target delay_target,
       c-fill-target fill_target,
       c-max-rate max_rate,
       c-plan-ahead plan_time
           Dynamically control the resync speed. The following modes are
           available:

           •   Dynamic control with fill target (default). Enabled when
               c-plan-ahead is non-zero and c-fill-target is non-zero. The
               goal is to fill the buffers along the data path with a
               defined amount of data. This mode is recommended when
               DRBD-proxy is used. Configured with c-plan-ahead,
               c-fill-target and c-max-rate.

           •   Dynamic control with delay target. Enabled when
               c-plan-ahead is non-zero (default) and c-fill-target is
               zero. The goal is to have a defined delay along the path.
               Configured with c-plan-ahead, c-delay-target and
               c-max-rate.

           •   Fixed resync rate. Enabled when c-plan-ahead is zero. DRBD
               will try to perform resync I/O at a fixed rate. Configured
               with resync-rate.

           The c-plan-ahead parameter defines how fast DRBD adapts to
           changes in the resync speed. It should be set to five times the
           network round-trip time or more. The default value of
           c-plan-ahead is 20, in units of 0.1 seconds.

           The c-fill-target parameter defines how much resync data DRBD
           should aim to have in-flight at all times. Common values for
           "normal" data paths range from 4K to 100K. The default value of
           c-fill-target is 100, in units of sectors.

           The c-delay-target parameter defines the delay in the resync
           path that DRBD should aim for. This should be set to five times
           the network round-trip time or more. The default value of
           c-delay-target is 10, in units of 0.1 seconds.

           The c-max-rate parameter limits the maximum bandwidth used by
           dynamically controlled resyncs. Setting this to zero removes
           the limitation (since DRBD 9.0.28). It should be set to either
           the bandwidth available between the DRBD hosts and the machines
           hosting DRBD-proxy, or to the available disk bandwidth. The
           default value of c-max-rate is 102400, in units of KiB/s.

           Dynamic resync speed control is available since DRBD 8.3.9.
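
           For example, a resync controller tuning for a fast link might
           look like this (a sketch; the values are illustrative, not a
           recommendation):

               disk {
                    c-plan-ahead  20;
                    c-fill-target 10M;
                    c-max-rate   100M;
                    c-min-rate     4M;
               }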

       c-min-rate min_rate
           A node which is primary and sync-source has to schedule
           application I/O requests and resync I/O requests. The
           c-min-rate parameter limits how much bandwidth is available for
           resync I/O; the remaining bandwidth is used for application
           I/O.

           A c-min-rate value of 0 means that there is no limit on the
           resync I/O bandwidth. This can slow down application I/O
           significantly. Use a value of 1 (1 KiB/s) for the lowest
           possible resync rate.

           The default value of c-min-rate is 250, in units of KiB/s.

       resync-rate rate

           Define how much bandwidth DRBD may use for resynchronizing.
           DRBD allows "normal" application I/O even during a resync. If
           the resync takes up too much bandwidth, application I/O can
           become very slow. This parameter can be used to avoid that.
           Please note that this option only works when the dynamic resync
           controller is disabled.

   Section global Parameters
       dialog-refresh time

           The DRBD init script can be used to configure and start DRBD
           devices, which can involve waiting for other cluster nodes.
           While waiting, the init script shows the remaining waiting
           time. The dialog-refresh parameter defines the number of
           seconds between updates of that countdown. The default value is
           1; a value of 0 turns off the countdown.

       disable-ip-verification
           Normally, DRBD verifies that the IP addresses in the
           configuration match the host names. Use the
           disable-ip-verification parameter to disable these checks.

       usage-count {yes | no | ask}
           As explained on DRBD's Online Usage Counter[2] web page, DRBD
           includes a mechanism for anonymously counting how many
           installations are using which versions of DRBD. The results are
           available on the web page for anyone to see.

           This parameter defines if a cluster node participates in the
           usage counter; the supported values are yes, no, and ask (ask
           the user, the default).

           We would like to ask users to participate in the online usage
           counter as this provides us valuable feedback for steering the
           development of DRBD.

       udev-always-use-vnr
           When udev asks drbdadm for a list of device related symlinks,
           drbdadm would suggest symlinks with differing naming
           conventions, depending on whether the resource has explicit
           volume VNR { } definitions, or only one single volume with the
           implicit volume number 0:

               # implicit single volume without "volume 0 {}" block
               DEVICE=drbd<minor>
               SYMLINK_BY_RES=drbd/by-res/<resource-name>
               SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>

               # explicit volume definition: volume VNR { }
               DEVICE=drbd<minor>
               SYMLINK_BY_RES=drbd/by-res/<resource-name>/VNR
               SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>

           If you define this parameter in the global section, drbdadm
           will always add the .../VNR part, regardless of whether the
           volume definition was implicit or explicit.

           For legacy backward compatibility, this is off by default, but
           we recommend enabling it.

   Section handlers Parameters
       after-resync-target cmd

           Called on a resync target when a node state changes from
           Inconsistent to Consistent when a resync finishes. This handler
           can be used for removing the snapshot created in the
           before-resync-target handler.

       before-resync-target cmd

           Called on a resync target before a resync begins. This handler
           can be used for creating a snapshot of the lower-level device
           for the duration of the resync: if the resync source becomes
           unavailable during a resync, reverting to the snapshot can
           restore a consistent state.

       before-resync-source cmd

           Called on a resync source before a resync begins.

       out-of-sync cmd

           Called on all nodes after a verify finishes and out-of-sync
           blocks were found. This handler is mainly used for monitoring
           purposes. An example would be to call a script that sends an
           alert SMS.

       quorum-lost cmd

           Called on a Primary that lost quorum. This handler is usually
           used to reboot the node if it is not possible to restart the
           application that uses the storage on top of DRBD.

       fence-peer cmd

           Called when a node should fence a resource on a particular
           peer. The handler should not use the same communication path
           that DRBD uses for talking to the peer.

       unfence-peer cmd

           Called when a node should remove fencing constraints from other
           nodes.

       initial-split-brain cmd

           Called when DRBD connects to a peer and detects that the peer
           is in a split-brain state with the local node. This handler is
           also called for split-brain scenarios which will be resolved
           automatically.

       local-io-error cmd

           Called when an I/O error occurs on a lower-level device.

       pri-lost cmd

           The local node is currently primary, but DRBD believes that it
           should become a sync target. The node should give up its
           primary role.

       pri-lost-after-sb cmd

           The local node is currently primary, but it has lost the
           after-split-brain auto recovery procedure. The node should be
           abandoned.

       pri-on-incon-degr cmd

           The local node is primary, and neither the local lower-level
           device nor a lower-level device on a peer is up to date. (The
           primary has no device to read from or to write to.)

       split-brain cmd

           DRBD has detected a split-brain situation which could not be
           resolved automatically. Manual recovery is necessary. This
           handler can be used to call for administrator attention.

       disconnected cmd

           A connection to a peer went down. The handler can learn about
           the reason for the disconnect from the DRBD_CSTATE environment
           variable.

   Section net Parameters
       after-sb-0pri policy
           Define how to react if a split-brain scenario is detected and
           neither of the two nodes is in primary role. (We detect
           split-brain scenarios when two nodes connect; split-brain
           decisions are always between two nodes.) The defined policies
           are:

           disconnect
               No automatic resynchronization; simply disconnect.

           discard-younger-primary,
           discard-older-primary
               Resynchronize from the node which became primary first
               (discard-younger-primary) or last (discard-older-primary).
               If both nodes became primary independently, the
               discard-least-changes policy is used.

           discard-zero-changes
               If only one of the nodes wrote data since the split brain
               situation was detected, resynchronize from this node to the
               other. If both nodes wrote data, disconnect.

           discard-least-changes
               Resynchronize from the node with more modified blocks.

           discard-node-nodename
               Always resynchronize to the named node.

       after-sb-1pri policy
           Define how to react if a split-brain scenario is detected, with
           one node in primary role and one node in secondary role. (We
           detect split-brain scenarios when two nodes connect, so
           split-brain decisions are always between two nodes.) The
           defined policies are:

           disconnect
               No automatic resynchronization, simply disconnect.

           consensus
               Discard the data on the secondary node if the after-sb-0pri
               algorithm would also discard the data on the secondary
               node. Otherwise, disconnect.

           violently-as0p
               Always take the decision of the after-sb-0pri algorithm,
               even if it causes an erratic change of the primary's view
               of the data. This is only useful if a single-node file
               system (i.e., not OCFS2 or GFS) with the
               allow-two-primaries flag is used. This option can cause the
               primary node to crash, and should not be used.

           discard-secondary
               Discard the data on the secondary node.

           call-pri-lost-after-sb
               Always take the decision of the after-sb-0pri algorithm. If
               the decision is to discard the data on the primary node,
               call the pri-lost-after-sb handler on the primary node.

       after-sb-2pri policy
           Define how to react if a split-brain scenario is detected and
           both nodes are in primary role. (We detect split-brain
           scenarios when two nodes connect, so split-brain decisions are
           always between two nodes.) The defined policies are:

           disconnect
               No automatic resynchronization, simply disconnect.

           violently-as0p
               See the violently-as0p policy for after-sb-1pri.

           call-pri-lost-after-sb
               Call the pri-lost-after-sb helper program on one of the
               machines unless that machine can demote to secondary. The
               helper program is expected to reboot the machine, which
               brings the node into a secondary role. Which machine runs
               the helper program is determined by the after-sb-0pri
               strategy.
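
           For example, a commonly used set of automatic split-brain
           recovery policies (a sketch; enable automatic data discarding
           only after careful consideration):

               net {
                    after-sb-0pri discard-zero-changes;
                    after-sb-1pri discard-secondary;
                    after-sb-2pri disconnect;
               }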

       allow-two-primaries

           The most common way to configure DRBD devices is to allow only
           one node to be primary (and thus writable) at a time.

           In some scenarios it is preferable to allow two nodes to be
           primary at once; a mechanism outside of DRBD then must make
           sure that writes to the shared, replicated device happen in a
           coordinated way. This can be done with a shared-storage cluster
           file system like OCFS2 and GFS, or with virtual machine images
           and a virtual machine manager that can migrate virtual machines
           between physical machines.

           The allow-two-primaries parameter tells DRBD to allow two nodes
           to be primary at the same time. Never enable this option when
           using a non-distributed file system; otherwise, data corruption
           and node crashes will result!
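
           For example, a dual-primary setup for a cluster file system (a
           sketch; dual-primary operation requires protocol C):

               net {
                    allow-two-primaries yes;
                    protocol C;
               }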

       always-asbp
           Normally the automatic after-split-brain policies are only used
           if current states of the UUIDs do not indicate the presence of
           a third node.

           With this option you request that the automatic
           after-split-brain policies are used as long as the data sets of
           the nodes are somehow related. This might cause a full sync, if
           the UUIDs indicate the presence of a third node. (Or double
           faults led to strange UUID sets.)

       connect-int time

           As soon as a connection between two nodes is configured with
           drbdsetup connect, DRBD immediately tries to establish the
           connection. If this fails, DRBD waits for connect-int seconds
           and then repeats. The default value of connect-int is 10
           seconds.

       cram-hmac-alg hash-algorithm

           Configure the hash-based message authentication code (HMAC) or
           secure hash algorithm to use for peer authentication. The
           kernel supports a number of different algorithms, some of which
           may be loadable as kernel modules. See the shash algorithms
           listed in /proc/crypto. By default, cram-hmac-alg is unset.
           Peer authentication also requires a shared-secret to be
           configured.

       csums-alg hash-algorithm

           Normally, when two nodes resynchronize, the sync target
           requests a piece of out-of-sync data from the sync source, and
           the sync source sends the data. With many usage patterns, a
           significant number of those blocks will actually be identical.

           When a csums-alg algorithm is specified, when requesting a
           piece of out-of-sync data, the sync target also sends along a
           hash of the data it currently has. The sync source compares
           this hash with its own version of the data. It sends the sync
           target the new data if the hashes differ, and tells it that the
           data are the same otherwise. This reduces the network bandwidth
           required, at the cost of higher cpu utilization and possibly
           increased I/O on the sync target.

           The csums-alg can be set to one of the secure hash algorithms
           supported by the kernel; see the shash algorithms listed in
           /proc/crypto. By default, csums-alg is unset.

       csums-after-crash-only

           Enabling this option (and csums-alg, above) makes it possible
           to use the checksum-based resync only for the first resync
           after a primary crash, but not for later "network hiccups".

           In most cases, blocks that are marked as need-to-be-resynced
           are in fact changed, so calculating checksums, and both reading
           and writing the blocks on the resync target, amounts to pure
           overhead.

           The advantage of checksum-based resync is mostly after primary
           crash recovery, where the recovery marked larger areas (those
           covered by the activity log) as need-to-be-resynced, just in
           case. Introduced in 8.4.5.

       data-integrity-alg alg
           DRBD normally relies on the data integrity checks built into
           the TCP/IP protocol, but if a data integrity algorithm is
           configured, it will additionally use this algorithm to make
           sure that the data received over the network match what the
           sender has sent. If a data integrity error is detected, DRBD
           will close the network connection and reconnect, which will
           trigger a resync.

           The data-integrity-alg can be set to one of the secure hash
           algorithms supported by the kernel; see the shash algorithms
           listed in /proc/crypto. By default, this mechanism is turned
           off.

           Because of the CPU overhead involved, we recommend not to use
           this option in production environments. Also see the notes on
           data integrity below.

       fencing fencing_policy

           Fencing is a preventive measure to avoid situations where both
           nodes are primary and disconnected. This is also known as a
           split-brain situation. DRBD supports the following fencing
           policies:

           dont-care
               No fencing actions are taken. This is the default policy.

           resource-only
               If a node becomes a disconnected primary, it tries to fence
               the peer. This is done by calling the fence-peer handler.
               The handler is supposed to reach the peer over an
               alternative communication path and call 'drbdadm outdate
               minor' there.

           resource-and-stonith
               If a node becomes a disconnected primary, it freezes all
               its IO operations and calls its fence-peer handler. The
               fence-peer handler is supposed to reach the peer over an
               alternative communication path and call 'drbdadm outdate
               minor' there. In case it cannot do that, it should stonith
               the peer. IO is resumed as soon as the situation is
               resolved. In case the fence-peer handler fails, I/O can be
               resumed manually with 'drbdadm resume-io'.
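
           For example, a Pacemaker-integrated setup might combine a
           fencing policy with the crm-fence-peer scripts shipped with
           drbd-utils (a sketch; script paths may vary by distribution):

               net {
                    fencing resource-and-stonith;
               }
               handlers {
                    fence-peer   "/usr/lib/drbd/crm-fence-peer.9.sh";
                    unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
               }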

       ko-count number

           If a secondary node fails to complete a write request in
           ko-count times the timeout parameter, it is excluded from the
           cluster. The primary node then sets the connection to this
           secondary node to Standalone. To disable this feature, you
           should explicitly set it to 0; defaults may change between
           versions.

       max-buffers number

           Limits the memory usage per DRBD minor device on the receiving
           side, or for internal buffers during resync or online-verify.
           The unit is PAGE_SIZE, which is 4 KiB on most systems. The
           minimum possible setting is hard coded to 32 (=128 KiB). These
           buffers are used to hold data blocks while they are written
           to/read from disk. To avoid possible distributed deadlocks on
           congestion, this setting is used as a throttle threshold rather
           than a hard limit. Once more than max-buffers pages are in use,
           further allocation from this pool is throttled. You want to
           increase max-buffers if you cannot saturate the IO backend on
           the receiving side.

       max-epoch-size number

           Define the maximum number of write requests DRBD may issue
           before issuing a write barrier. The default value is 2048, with
           a minimum of 1 and a maximum of 20000. Setting this parameter
           to a value below 10 is likely to decrease performance.

       on-congestion policy,
       congestion-fill threshold,
       congestion-extents threshold
           By default, DRBD blocks when the TCP send queue is full. This
           prevents applications from generating further write requests
           until more buffer space becomes available again.

           When DRBD is used together with DRBD-proxy, it can be better to
           use the pull-ahead on-congestion policy, which can switch DRBD
           into ahead/behind mode before the send queue is full. DRBD then
           records the differences between itself and the peer in its
           bitmap, but it no longer replicates them to the peer. When
           enough buffer space becomes available again, the node
           resynchronizes with the peer and switches back to normal
           replication.

           This has the advantage of not blocking application I/O even
           when the queues fill up, and the disadvantage that peer nodes
           can fall behind much further. Also, while resynchronizing, peer
           nodes will become inconsistent.

           The available congestion policies are block (the default) and
           pull-ahead. The congestion-fill parameter defines how much data
           is allowed to be "in flight" in this connection. The default
           value is 0, which disables this mechanism of congestion
           control, with a maximum of 10 GiBytes. The congestion-extents
           parameter defines how many bitmap extents may be active before
           switching into ahead/behind mode, with the same default and
           limits as the al-extents parameter. The congestion-extents
           parameter is effective only when set to a value smaller than
           al-extents.

           Ahead/behind mode is available since DRBD 8.3.10.
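
           For example, a DRBD-proxy setup might enable ahead/behind mode
           like this (a sketch; suitable thresholds depend on the proxy
           buffer size):

               net {
                    on-congestion      pull-ahead;
                    congestion-fill    400M;
                    congestion-extents 1000;
               }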

       ping-int interval

           When the TCP/IP connection to a peer is idle for more than
           ping-int seconds, DRBD will send a keep-alive packet to make
           sure that a failed peer or network connection is detected
           reasonably soon. The default value is 10 seconds, with a
           minimum of 1 and a maximum of 120 seconds. The unit is seconds.

       ping-timeout timeout

           Define the timeout for replies to keep-alive packets. If the
           peer does not reply within ping-timeout, DRBD will close and
           try to reestablish the connection. The default value is 0.5
           seconds, with a minimum of 0.1 seconds and a maximum of 30
           seconds. The unit is tenths of a second.

       socket-check-timeout timeout
           In setups involving a DRBD-proxy and connections that
           experience a lot of buffer-bloat, it might be necessary to set
           ping-timeout to an unusually high value. By default, DRBD uses
           the same value to wait until a newly established TCP connection
           is stable. Since the DRBD-proxy is usually located in the same
           data center, such a long wait time may hinder DRBD's connect
           process.

           In such setups, socket-check-timeout should be set to at least
           the round-trip time between DRBD and the DRBD-proxy, i.e., in
           most cases, to 1.

           The default unit is tenths of a second; the default value is 0
           (which causes DRBD to use the value of ping-timeout instead).
           Introduced in 8.4.5.

       protocol name
           Use the specified protocol on this connection. The supported
           protocols are:

           A
               Writes to the DRBD device complete as soon as they have
               reached the local disk and the TCP/IP send buffer.

           B
               Writes to the DRBD device complete as soon as they have
               reached the local disk, and all peers have acknowledged the
               receipt of the write requests.

           C
               Writes to the DRBD device complete as soon as they have
               reached the local and all remote disks.

       rcvbuf-size size

           Configure the size of the TCP/IP receive buffer. A value of 0
           (the default) causes the buffer size to adjust dynamically.
           This parameter usually does not need to be set, but it can be
           set to a value up to 10 MiB. The default unit is bytes.

       rr-conflict policy
           This option helps to solve the cases when the outcome of the
           resync decision is incompatible with the current role
           assignment in the cluster. The defined policies are:

           disconnect
               No automatic resynchronization, simply disconnect.

           retry-connect
               Disconnect now, and retry to connect immediately
               afterwards.

           violently
               Resync to the primary node is allowed, violating the
               assumption that data on a block device are stable for one
               of the nodes. Do not use this option, it is dangerous.

           call-pri-lost
               Call the pri-lost handler on one of the machines. The
               handler is expected to reboot the machine, which puts it
               into secondary role.

           auto-discard
               Auto-discard reverses the resync direction, so that DRBD
               resyncs the current primary to the current secondary.
               Auto-discard only applies when protocol A is in use and the
               resync decision is based on the principle that a crashed
               primary should be the source of a resync. When a primary
               node crashes, it might have written some last updates to
               its disk, which were not received by a protocol A
               secondary. By promoting the secondary in the meantime, the
               user accepted that those last updates have been lost. By
               using auto-discard you agree that the last updates (before
               the crash of the primary) should be rolled back
               automatically.

       shared-secret secret

           Configure the shared secret used for peer authentication. The
           secret is a string of up to 64 characters. Peer authentication
           also requires the cram-hmac-alg parameter to be set.

       sndbuf-size size

           Configure the size of the TCP/IP send buffer. Since DRBD 8.0.13
           / 8.2.7, a value of 0 (the default) causes the buffer size to
           adjust dynamically. Values below 32 KiB are harmful to the
           throughput on this connection. Large buffer sizes can be useful
           especially when protocol A is used over high-latency networks;
           the maximum value supported is 10 MiB.

       tcp-cork
           By default, DRBD uses the TCP_CORK socket option to prevent the
           kernel from sending partial messages; this results in fewer and
           bigger packets on the network. Some network stacks can perform
           worse with this optimization. On these, the tcp-cork parameter
           can be used to turn this optimization off.

       timeout time

           Define the timeout for replies over the network: if a peer node
           does not send an expected reply within the specified timeout,
           it is considered dead and the TCP/IP connection is closed. The
           timeout value must be lower than connect-int and lower than
           ping-int. The default is 6 seconds; the value is specified in
           tenths of a second.

       transport type

           With DRBD 9, the network transport used by DRBD is loaded as a
           separate module. With this option you can specify which
           transport and module to load. At present, only two options
           exist, tcp and rdma. Please note that currently the RDMA
           transport module is only available with a license purchased
           from LINBIT. The default is tcp.

       use-rle

           Each replicated device on a cluster node has a separate bitmap
           for each of its peer devices. The bitmaps are used for tracking
           the differences between the local and peer device: depending on
           the cluster state, a disk range can be marked as different from
           the peer in the device's bitmap, in the peer device's bitmap,
           or in both bitmaps. When two cluster nodes connect, they
           exchange each other's bitmaps, and they each compute the union
           of the local and peer bitmap to determine the overall
           differences.

           Bitmaps of very large devices are also relatively large, but
           they usually compress very well using run-length encoding. This
           can save time and bandwidth for the bitmap transfers.

           The use-rle parameter determines if run-length encoding should
           be used. It is on by default since DRBD 8.4.0.

       verify-alg hash-algorithm
           Online verification (drbdadm verify) computes and compares
           checksums of disk blocks (i.e., hash values) in order to detect
           if they differ. The verify-alg parameter determines which
           algorithm to use for these checksums. It must be set to one of
           the secure hash algorithms supported by the kernel before
           online verify can be used; see the shash algorithms listed in
           /proc/crypto.

           We recommend scheduling online verifications regularly during
           low-load periods, for example once a month. Also see the notes
           on data integrity below.
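
           For example, after setting

               net {
                    verify-alg sha1;
               }

           a verification run can be started manually or from cron with
           "drbdadm verify <resource>" (the resource name is a
           placeholder).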

       allow-remote-read bool-value
           Allows or disallows DRBD to read from a peer node.

           When the disk of a primary node is detached, DRBD will try to
           continue reading and writing from another node in the cluster.
           For this purpose, it searches for nodes with up-to-date data,
           and uses any found node to resume operations. In some cases it
           may not be desirable to read back data from a peer node,
           because the node should only be used as a replication target.
           In this case, the allow-remote-read parameter can be set to no,
           which would prohibit this node from reading data from the peer
           node.

           The allow-remote-read parameter is available since DRBD 9.0.19,
           and defaults to yes.

   Section on Parameters
       address [address-family] address:port

           Defines the address family, address, and port of a connection
           endpoint.

           The address families ipv4, ipv6, ssocks (Dolphin Interconnect
           Solutions' "super sockets"), sdp (Infiniband Sockets Direct
           Protocol), and sci are supported (sci is an alias for ssocks).
           If no address family is specified, ipv4 is assumed. For all
           address families except ipv6, the address is specified in IPv4
           address notation (for example, 1.2.3.4). For ipv6, the address
           is enclosed in brackets and uses IPv6 address notation (for
           example, [fd01:2345:6789:abcd::1]). The port is always
           specified as a decimal number from 1 to 65535.

           On each host, the port numbers must be unique for each address;
           ports cannot be shared.
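
           For example, an IPv6 endpoint is written with the address in
           brackets:

               on alice {
                    node-id 0;
                    address ipv6 [fd01:2345:6789:abcd::1]:7789;
               }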

       node-id value

           Defines the unique node identifier for a node in the cluster.
           Node identifiers are used to identify individual nodes in the
           network protocol, and to assign bitmap slots to nodes in the
           metadata.

           Node identifiers can only be reassigned in a cluster when the
           cluster is down. It is essential that the node identifiers in
           the configuration and in the device metadata are changed
           consistently on all hosts. To change the metadata, dump the
           current state with drbdmeta dump-md, adjust the bitmap slot
           assignment, and update the metadata with drbdmeta restore-md.

           The node-id parameter exists since DRBD 9. Its value ranges
           from 0 to 16; there is no default.

   Section options Parameters (Resource Options)
       auto-promote bool-value
           A resource must be promoted to primary role before any of its
           devices can be mounted or opened for writing.

           Before DRBD 9, this could only be done explicitly ("drbdadm
           primary"). Since DRBD 9, the auto-promote parameter allows a
           resource to be promoted to primary role automatically when one
           of its devices is mounted or opened for writing. As soon as all
           devices are unmounted or closed with no more remaining users,
           the role of the resource changes back to secondary.

           Automatic promotion only succeeds if the cluster state allows
           it (that is, if an explicit drbdadm primary command would
           succeed). Otherwise, mounting or opening the device fails as it
           already did before DRBD 9: the mount(2) system call fails with
           errno set to EROFS (Read-only file system); the open(2) system
           call fails with errno set to EMEDIUMTYPE (wrong medium type).

           Irrespective of the auto-promote parameter, if a device is
           promoted explicitly (drbdadm primary), it also needs to be
           demoted explicitly (drbdadm secondary).

           The auto-promote parameter is available since DRBD 9.0.0, and
           defaults to yes.

       cpu-mask cpu-mask

           Set the cpu affinity mask for DRBD kernel threads. The cpu mask
           is specified as a hexadecimal number. The default value is 0,
           which lets the scheduler decide which kernel threads run on
           which CPUs. CPU numbers in cpu-mask which do not exist in the
           system are ignored.

       on-no-data-accessible policy
           Determine how to deal with I/O requests when the requested data
           is not available locally or remotely (for example, when all
           disks have failed). When quorum is enabled,
           on-no-data-accessible should be set to the same value as
           on-no-quorum. The defined policies are:

           io-error
               System calls fail with errno set to EIO.

           suspend-io
               The resource suspends I/O. I/O can be resumed by
               (re)attaching the lower-level device, by connecting to a
               peer which has access to the data, or by forcing DRBD to
               resume I/O with drbdadm resume-io res. When no data is
               available, forcing I/O to resume will result in the same
               behavior as the io-error policy.

           This setting is available since DRBD 8.3.9; the default policy
           is io-error.

       peer-ack-window value

           On each node and for each device, DRBD maintains a bitmap of
           the differences between the local and remote data for each peer
           device. For example, in a three-node setup (nodes A, B, C) each
           with a single device, every node maintains one bitmap for each
           of its peers.

           When nodes receive write requests, they know how to update the
           bitmaps for the writing node, but not how to update the bitmaps
           between themselves. In this example, when a write request
           propagates from node A to B and C, nodes B and C know that they
           have the same data as node A, but not whether or not they both
           have the same data.

           As a remedy, the writing node occasionally sends peer-ack
           packets to its peers which tell them which state they are in
           relative to each other.

           The peer-ack-window parameter specifies how much data a primary
           node may send before sending a peer-ack packet. A low value
           causes increased network traffic; a high value causes less
           network traffic but higher memory consumption on secondary
           nodes and higher resync times between the secondary nodes after
           primary node failures. (Note: peer-ack packets may be sent due
           to other reasons as well, e.g. membership changes or expiry of
           the peer-ack-delay timer.)

           The default value for peer-ack-window is 2 MiB, the default
           unit is sectors. This option is available since 9.0.0.

       peer-ack-delay expiry-time

           If after the last finished write request no new write request
           gets issued for expiry-time, then a peer-ack packet is sent. If
           a new write request is issued before the timer expires, the
           timer gets reset to expiry-time. (Note: peer-ack packets may be
           sent due to other reasons as well, e.g. membership changes or
           the peer-ack-window option.)

           This parameter may influence resync behavior on remote nodes.
           Peer nodes need to wait until they receive a peer-ack before
           releasing a lock on an AL-extent. Resync operations between
           peers may need to wait for these locks.

           The default value for peer-ack-delay is 100 milliseconds, the
           default unit is milliseconds. This option is available since
           9.0.0.
1333
1334 quorum value
1335
1336      When activated, a cluster partition requires quorum in order to
1337      modify the replicated data set. That means a node in the cluster
1338      partition can only be promoted to primary if the cluster partition
1339      has quorum. Every node with a disk that is directly connected to
1340      the node to be promoted counts. If a primary node needs to execute
1341      a write request, but the cluster partition has lost quorum, it
1342      will freeze I/O or reject the write request with an error
1343      (depending on the on-no-quorum setting). Upon losing quorum, a
1344      primary always invokes the quorum-lost handler. The handler is
1345      intended for notification purposes; its return code is ignored.
1346
1347      The option's value may be set to off, majority, all or a numeric
1348      value. If you set it to a numeric value, make sure that the value
1349      is greater than half of your number of nodes. Quorum is a mechanism
1350      to avoid data divergence; it can be used instead of fencing when
1351      there are more than two replicas. It defaults to off.
1352
1353      If all missing nodes are marked as outdated, a partition always has
1354      quorum, no matter how small it is. That is, if you disconnect all
1355      secondary nodes gracefully, a single primary continues to operate.
1356      The moment a single secondary is lost, it has to be assumed that it
1357      forms a partition with all the missing outdated nodes. If the
1358      remaining partition could be smaller than that combined partition,
1359      quorum is lost at that moment.
1360
1361      If you want to allow permanently diskless nodes to gain quorum,
1362      it is recommended not to use majority or all. It is recommended to
1363      specify an absolute number, since DRBD's heuristic to determine the
1364      total number of diskful nodes in the cluster is unreliable.
1365
1366 The quorum implementation is available starting with the DRBD
1367 kernel driver version 9.0.7.
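
       For example, a sketch for a three-node cluster (the resource name
       is an assumption):

           resource r0 {
               options {
                   quorum majority;   # with three nodes: at least two required
               }
           }

       With quorum majority and three nodes, a partition keeps quorum as
       long as it contains at least two nodes, or a single node plus all
       missing nodes marked as outdated.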
1368
1369 quorum-minimum-redundancy value
1370
1371      This option sets the minimum required number of nodes with an
1372      UpToDate disk for the partition to gain quorum. This is a
1373      different requirement from the one the plain quorum option
1374      expresses.
1375
1376      The option's value may be set to off, majority, all or a numeric
1377      value. If you set it to a numeric value, make sure that the value
1378      is greater than half of your number of nodes.
1379
1380      If you want to allow permanently diskless nodes to gain quorum,
1381      it is recommended not to use majority or all. It is recommended to
1381      specify an absolute number, since DRBD's heuristic to determine the
1382      total number of diskful nodes in the cluster is unreliable.
1383
1384 This option is available starting with the DRBD kernel driver
1385 version 9.0.10.
1386
1387 on-no-quorum {io-error | suspend-io}
1388
1389      By default, DRBD freezes I/O on a device that has lost quorum.
1390      When on-no-quorum is set to io-error, it instead completes all
1391      I/O operations with an error while quorum is lost.
1392
1393      Usually, on-no-data-accessible should be set to the same value
1394      as on-no-quorum, since on-no-quorum takes precedence.
1395
1396      The on-no-quorum option is available starting with the DRBD
1397      kernel driver version 9.0.8.
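
       Putting the quorum-related options together, a sketch with
       illustrative values:

           resource r0 {
               options {
                   quorum majority;
                   quorum-minimum-redundancy 2;   # require two UpToDate disks
                   on-no-quorum suspend-io;       # freeze I/O instead of failing it
               }
           }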
1398
1399 Section startup Parameters
1400 The parameters in this section define the behavior of DRBD at system
1401 startup time, in the DRBD init script. They have no effect once the
1402 system is up and running.
1403
1404 degr-wfc-timeout timeout
1405
1406 Define how long to wait until all peers are connected in case the
1407 cluster consisted of a single node only when the system went down.
1408 This parameter is usually set to a value smaller than wfc-timeout.
1409 The assumption here is that peers which were unreachable before a
1410 reboot are less likely to be reachable after the reboot, so waiting
1411 is less likely to help.
1412
1413 The timeout is specified in seconds. The default value is 0, which
1414 stands for an infinite timeout. Also see the wfc-timeout parameter.
1415
1416 outdated-wfc-timeout timeout
1417
1418 Define how long to wait until all peers are connected if all peers
1419 were outdated when the system went down. This parameter is usually
1420 set to a value smaller than wfc-timeout. The assumption here is
1421 that an outdated peer cannot have become primary in the meantime,
1422 so we don't need to wait for it as long as for a node which was
1423 alive before.
1424
1425 The timeout is specified in seconds. The default value is 0, which
1426 stands for an infinite timeout. Also see the wfc-timeout parameter.
1427
1428 stacked-timeouts
1429 On stacked devices, the wfc-timeout and degr-wfc-timeout parameters
1430 in the configuration are usually ignored, and both timeouts are set
1431 to twice the connect-int timeout. The stacked-timeouts parameter
1432 tells DRBD to use the wfc-timeout and degr-wfc-timeout parameters
1433 as defined in the configuration, even on stacked devices. Only use
1434 this parameter if the peer of the stacked resource is usually not
1435 available, or will not become primary. Incorrect use of this
1436 parameter can lead to unexpected split-brain scenarios.
1437
1438 wait-after-sb
1439 This parameter causes DRBD to continue waiting in the init script
1440 even when a split-brain situation has been detected, and the nodes
1441 therefore refuse to connect to each other.
1442
1443 wfc-timeout timeout
1444
1445 Define how long the init script waits until all peers are
1446 connected. This can be useful in combination with a cluster manager
1447 which cannot manage DRBD resources: when the cluster manager
1448 starts, the DRBD resources will already be up and running. With a
1449 more capable cluster manager such as Pacemaker, it makes more sense
1450 to let the cluster manager control DRBD resources. The timeout is
1451 specified in seconds. The default value is 0, which stands for an
1452 infinite timeout. Also see the degr-wfc-timeout parameter.
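
       For illustration (the timeout values are arbitrary), these
       parameters are set in the resource's startup section:

           resource r0 {
               startup {
                   wfc-timeout 120;       # wait up to 120 seconds for all peers
                   degr-wfc-timeout 60;   # shorter wait after a degraded shutdown
               }
           }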
1453
1454 Section volume Parameters
1455 device /dev/drbdminor-number
1456
1457 Define the device name and minor number of a replicated block
1458 device. This is the device that applications are supposed to
1459 access; in most cases, the device is not used directly, but as a
1460 file system. This parameter is required and the standard device
1461 naming convention is assumed.
1462
1463 In addition to this device, udev will create
1464 /dev/drbd/by-res/resource/volume and
1465 /dev/drbd/by-disk/lower-level-device symlinks to the device.
1466
1467 disk {[disk] | none}
1468
1469 Define the lower-level block device that DRBD will use for storing
1470 the actual data. While the replicated drbd device is configured,
1471 the lower-level device must not be used directly. Even read-only
1472 access with tools like dumpe2fs(8) and similar is not allowed. The
1473 keyword none specifies that no lower-level block device is
1474 configured; this also overrides inheritance of the lower-level
1475 device.
1476
1477 meta-disk internal,
1478 meta-disk device,
1479 meta-disk device [index]
1480
1481 Define where the metadata of a replicated block device resides: it
1482 can be internal, meaning that the lower-level device contains both
1483 the data and the metadata, or on a separate device.
1484
1485 When the index form of this parameter is used, multiple replicated
1486 devices can share the same metadata device, each using a separate
1487 index. Each index occupies 128 MiB of data, which corresponds to a
1488      replicated device size of at most 4 TiB with two cluster nodes. We
1489      recommend not sharing metadata devices anymore, and instead using
1490      the LVM volume manager to create metadata devices as needed.
1491
1492 When the index form of this parameter is not used, the size of the
1493 lower-level device determines the size of the metadata. The size
1494 needed is 36 KiB + (size of lower-level device) / 32K * (number of
1495 nodes - 1). If the metadata device is bigger than that, the extra
1496 space is not used.
1497
1498 This parameter is required if a disk other than none is specified,
1499 and ignored if disk is set to none. A meta-disk parameter without a
1500 disk parameter is not allowed.
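
       For illustration (the device names are hypothetical), a volume
       with a dedicated external metadata device and no index:

           volume 0 {
               device /dev/drbd2;
               disk /dev/sdb1;
               meta-disk /dev/sdc1;   # sized according to the formula above
           }

       Applying the formula to a 1 TiB lower-level device in a three-node
       cluster: 36 KiB + 1 TiB / 32K * (3 - 1) = 36 KiB + 32 MiB * 2,
       i.e. roughly 64 MiB of metadata.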
1501
1502 NOTES ON DATA INTEGRITY
1503   DRBD supports two different mechanisms for data integrity checking:
1504   first, the data-integrity-alg network parameter makes it possible to
1505   add a checksum to the data sent over the network. Second, the online
1506   verification mechanism (drbdadm verify and the verify-alg parameter)
1507   allows checking for differences in the on-disk data.
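
       As a sketch (the algorithm names are illustrative; any digest
       algorithm known to the kernel's crypto API can be used), both
       mechanisms are enabled in the net section:

           resource r0 {
               net {
                   data-integrity-alg crc32c;   # checksum data in transit
                   verify-alg sha1;             # used by "drbdadm verify r0"
               }
           }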
1508
1509 Both mechanisms can produce false positives if the data is modified
1510 during I/O (i.e., while it is being sent over the network or written to
1511 disk). This does not always indicate a problem: for example, some file
1512 systems and applications do modify data under I/O for certain
1513 operations. Swap space can also undergo changes while under I/O.
1514
1515 Network data integrity checking tries to identify data modification
1516 during I/O by verifying the checksums on the sender side after sending
1517 the data. If it detects a mismatch, it logs an error. The receiver also
1518 logs an error when it detects a mismatch. Thus, an error logged only on
1519 the receiver side indicates an error on the network, and an error
1520 logged on both sides indicates data modification under I/O.
1521
1522 The most recent example of systematic data corruption was identified as
1523 a bug in the TCP offloading engine and driver of a certain type of GBit
1524 NIC in 2007: the data corruption happened on the DMA transfer from core
1525   memory to the card. Because the TCP checksums were calculated on
1526   the card, the TCP/IP protocol checksums did not reveal this problem.
1527
1528 VERSION
1529   This document was revised for version 9.0.0 of the DRBD distribution.
1530
1531 AUTHOR
1532   Written by Philipp Reisner <philipp.reisner@linbit.com> and Lars
1533 Ellenberg <lars.ellenberg@linbit.com>.
1534
1535 REPORTING BUGS
1536   Report bugs to <drbd-user@lists.linbit.com>.
1537
1538 COPYRIGHT
1539   Copyright 2001-2018 LINBIT Information Technologies, Philipp Reisner,
1540 Lars Ellenberg. This is free software; see the source for copying
1541 conditions. There is NO warranty; not even for MERCHANTABILITY or
1542 FITNESS FOR A PARTICULAR PURPOSE.
1543
1544 SEE ALSO
1545   drbd(8), drbdsetup(8), drbdadm(8), DRBD User's Guide[1], DRBD Web
1546 Site[3]
1547
1548 NOTES
1549    1. DRBD User's Guide
1550 http://www.drbd.org/users-guide/
1551
1552    2. Online Usage Counter
1553       http://usage.drbd.org
1556
1557 3. DRBD Web Site
1558 http://www.drbd.org/
1559
1560
1561
1562DRBD 9.0.x 17 January 2018 DRBD.CONF(5)