1 DRBD.CONF(5) Configuration Files DRBD.CONF(5)
2
3
4
6 drbd.conf - DRBD Configuration Files
7
9 DRBD implements block devices which replicate their data to all nodes
10 of a cluster. The actual data and associated metadata are usually
11 stored redundantly on "ordinary" block devices on each cluster node.
12
13 Replicated block devices are called /dev/drbdminor by default. They are
14 grouped into resources, with one or more devices per resource.
15 Replication among the devices in a resource takes place in
16 chronological order. With DRBD, we refer to the devices inside a
17 resource as volumes.
18
19 In DRBD 9, a resource can be replicated between two or more cluster
20 nodes. The connections between cluster nodes are point-to-point links,
21 and use TCP or a TCP-like protocol. All nodes must be directly
22 connected.
23
24 DRBD consists of low-level user-space components which interact with
25 the kernel and perform basic operations (drbdsetup, drbdmeta), a
26 high-level user-space component which understands and processes the
27 DRBD configuration and translates it into basic operations of the
28 low-level components (drbdadm), and a kernel component.
29
30 The default DRBD configuration consists of /etc/drbd.conf and of
31 additional files included from there, usually global_common.conf and
32 all *.res files inside /etc/drbd.d/. It has turned out to be useful to
33 define each resource in a separate *.res file.
34
35 The configuration files are designed so that each cluster node can
36 contain an identical copy of the entire cluster configuration. The host
37 name of each node determines which parts of the configuration apply
38 (uname -n). It is highly recommended to keep the cluster configuration
39 on all nodes in sync by manually copying it to all nodes, or by
40 automating the process with csync2 or a similar tool.
41
43 global {
44 usage-count yes;
45 udev-always-use-vnr;
46 }
47 resource r0 {
48 net {
49 cram-hmac-alg sha1;
50 shared-secret "FooFunFactory";
51 }
52 volume 0 {
53 device "/dev/drbd1";
54 disk "/dev/sda7";
55 meta-disk internal;
56 }
57 on "alice" {
58 node-id 0;
59 address 10.1.1.31:7000;
60 }
61 on "bob" {
62 node-id 1;
63 address 10.1.1.32:7000;
64 }
65 connection {
66 host "alice" port 7000;
67 host "bob" port 7000;
68 net {
69 protocol C;
70 }
71 }
72 }
73
74 This example defines a resource r0 which contains a single replicated
75 device with volume number 0. The resource is replicated among hosts
76 alice and bob, which have the IPv4 addresses 10.1.1.31 and 10.1.1.32
77 and the node identifiers 0 and 1, respectively. On both hosts, the
78 replicated device is called /dev/drbd1, and the actual data and
79 metadata are stored on the lower-level device /dev/sda7. The connection
80 between the hosts uses protocol C.
81
82 Enclose strings within double-quotation marks (") to differentiate them
83 from resource keywords. Please refer to the DRBD User's Guide[1] for
84 more examples.
85
87 DRBD configuration files consist of sections, which contain other
88 sections and parameters depending on the section types. Each section
89 consists of one or more keywords, sometimes a section name, an opening
90 brace (“{”), the section's contents, and a closing brace (“}”).
91 Parameters inside a section consist of a keyword, followed by one or
92 more keywords or values, and a semicolon (“;”).
93
94 Some parameter values have a default scale which applies when a plain
95 number is specified (for example Kilo, or 1024 times the numeric
96 value). Such default scales can be overridden by using a suffix (for
97 example, M for Mega). The common suffixes K = 2^10 = 1024, M = 1024 K,
98 and G = 1024 M are supported.
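
For illustration only, and assuming the KiB/s default scale of the
resync-rate parameter described later in this page, the following two
hypothetical settings are equivalent:

           disk {
              resync-rate 10240;    # plain number, default scale (KiB/s)
              # resync-rate 10M;    # same rate (10 MiB/s), written with the M suffix
           }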
99
100 Comments start with a hash sign (“#”) and extend to the end of the
101 line. In addition, any section can be prefixed with the keyword skip,
102 which causes the section and any sub-sections to be ignored.
103
104 Additional files can be included with the include file-pattern
105 statement (see glob(7) for the expressions supported in file-pattern).
106 Include statements are only allowed outside of sections.
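
As a hedged sketch (the resource name and its contents are placeholders),
a minimal /etc/drbd.conf often contains nothing but include statements,
and a whole section can be disabled by prefixing it with skip:

           # /etc/drbd.conf -- example layout
           include "drbd.d/global_common.conf";
           include "drbd.d/*.res";

           skip resource r-disabled {    # this section and its sub-sections are ignored
              volume 0 {
                 device    "/dev/drbd9";
                 disk      "/dev/sdz1";
                 meta-disk internal;
              }
           }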
107
108 The following sections are defined (indentation indicates in which
109 context):
110
111 common
112 [disk]
113 [handlers]
114 [net]
115 [options]
116 [startup]
117 global
118 [require-drbd-module-version-{eq,ne,gt,ge,lt,le}]
119 resource
120 connection
121 multiple path | 2 host
122 [net]
123 [volume]
124 [peer-device-options]
125 [peer-device-options]
126 connection-mesh
127 [net]
128 [disk]
129 floating
130 handlers
131 [net]
132 on
133 volume
134 disk
135 [disk]
136 options
137 stacked-on-top-of
138 startup
139
140 Sections in brackets affect other parts of the configuration: inside
141 the common section, they apply to all resources. A disk section inside
142 a resource or on section applies to all volumes of that resource, and a
143 net section inside a resource section applies to all connections of
144 that resource. This makes it possible to avoid repeating identical
145 options for each resource, connection, or volume. Options can be
146 overridden in a more specific resource, connection, on, or volume section.
147
148 The peer-device-options are resync-rate, c-plan-ahead, c-delay-target,
149 c-fill-target, c-max-rate and c-min-rate. For backward compatibility,
150 they can also be specified in any disk options section. They are
151 inherited into all relevant connections. If they are given at the
152 connection level, they are inherited by all volumes on that
153 connection. A peer-device-options section is started with the disk
154 keyword.
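
As a sketch of this inheritance (the values are arbitrary examples),
peer-device-options can be given for a whole connection by opening a
disk section inside it:

           connection {
              host "alice" port 7000;
              host "bob"   port 7000;
              disk {                 # peer-device-options section, opened with "disk"
                 resync-rate 10M;    # inherited by all volumes replicated over this connection
                 c-max-rate  100M;
              }
           }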
155
156 Sections
157 common
158
159 This section can contain one each of the disk, handlers, net, options,
160 and startup sections. All resources inherit the parameters in these
161 sections as their default values.
162
163 connection
164
165 Define a connection between two hosts. This section must contain
166 two host parameters or multiple path sections.
167
168 path
169
170 Define a path between two hosts. This section must contain two host
171 parameters.
172
173 connection-mesh
174
175 Define a connection mesh between multiple hosts. This section must
176 contain a hosts parameter, which has the host names as arguments.
177 This section is a shortcut to define many connections which share
178 the same network options.
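
A hedged example of a connection-mesh inside a resource, with the host
names as placeholders; it is equivalent to defining the three pairwise
connections with the same net options:

           resource r0 {
              connection-mesh {
                 hosts "alice" "bob" "charlie";
                 net {
                    protocol C;
                 }
              }
              # on sections, volumes, etc. omitted
           }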
179
180 disk
181
182 Define parameters for a volume. All parameters in this section are
183 optional.
184
185 floating [address-family] addr:port
186
187 Like the on section, except that instead of the host name a network
188 address is used to determine if it matches a floating section.
189
190 The node-id parameter in this section is required. If the address
191 parameter is not provided, no connections to peers will be created
192 by default. The device, disk, and meta-disk parameters must be
193 defined in, or inherited by, this section.
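
For illustration, a floating section that takes the place of an on
section; the address and node-id are examples, and the section matches
whichever host currently holds that address:

           floating 10.1.1.31:7000 {
              node-id   0;
              device    "/dev/drbd1";
              disk      "/dev/sda7";
              meta-disk internal;
           }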
194
195 global
196
197 Define some global parameters. All parameters in this section are
198 optional. Only one global section is allowed in the configuration.
199
200 require-drbd-module-version-{eq,ne,gt,ge,lt,le}
201
202 This statement contains one of the valid forms and a three-digit
203 version number (e.g., require-drbd-module-version-eq 9.0.16;). If
204 the currently loaded DRBD kernel module does not match the
205 specification, parsing is aborted. The comparison operator names have
206 the same semantics as in test(1).
207
208 handlers
209
210 Define handlers to be invoked when certain events occur. The kernel
211 passes the resource name in the first command-line argument and
212 sets the following environment variables depending on the event's
213 context:
214
215 • For events related to a particular device: the device's minor
216 number in DRBD_MINOR, the device's volume number in
217 DRBD_VOLUME.
218
219 • For events related to a particular device on a particular peer:
220 the connection endpoints in DRBD_MY_ADDRESS, DRBD_MY_AF,
221 DRBD_PEER_ADDRESS, and DRBD_PEER_AF; the device's local minor
222 number in DRBD_MINOR, and the device's volume number in
223 DRBD_VOLUME.
224
225 • For events related to a particular connection: the connection
226 endpoints in DRBD_MY_ADDRESS, DRBD_MY_AF, DRBD_PEER_ADDRESS,
227 and DRBD_PEER_AF; and, for each device defined for that
228 connection: the device's minor number in
229 DRBD_MINOR_volume-number.
230
231 • For events that identify a device, if a lower-level device is
232 attached, the lower-level device's device name is passed in
233 DRBD_BACKING_DEV (or DRBD_BACKING_DEV_volume-number).
234
235 All parameters in this section are optional. Only a single handler
236 can be defined for each event; if no handler is defined, nothing
237 will happen.
238
239 net
240
241 Define parameters for a connection. All parameters in this section
242 are optional.
243
244 on host-name [...]
245
246 Define the properties of a resource on a particular host or set of
247 hosts. Specifying more than one host name can make sense in a setup
248 with IP address failover, for example. The host-name argument must
249 match the Linux host name (uname -n).
250
251 Usually contains or inherits at least one volume section. The
252 node-id and address parameters must be defined in this section. The
253 device, disk, and meta-disk parameters must be defined in, or
254 inherited by, this section.
255
256 A normal configuration file contains two or more on sections for
257 each resource. Also see the floating section.
258
259 options
260
261 Define parameters for a resource. All parameters in this section
262 are optional.
263
264 resource name
265
266 Define a resource. Usually contains at least two on sections and at
267 least one connection section.
268
269 stacked-on-top-of resource
270
271 Used instead of an on section for configuring a stacked resource
272 with three to four nodes.
273
274 Starting with DRBD 9, stacking is deprecated. It is advised to use
275 resources which are replicated among more than two nodes instead.
276
277 startup
278
279 The parameters in this section determine the behavior of a resource
280 at startup time.
281
282 volume volume-number
283
284 Define a volume within a resource. The volume numbers in the
285 various volume sections of a resource define which devices on which
286 hosts form a replicated device.
287
288 Section connection Parameters
289 host name [address [address-family] address] [port port-number]
290
291 Defines an endpoint for a connection. Each host statement refers to
292 an on section in a resource. If a port number is defined, this
293 endpoint will use the specified port instead of the port defined in
294 the on section. Each connection section must contain exactly two
295 host parameters. Instead of two host parameters the connection may
296 contain multiple path sections.
297
298 Section path Parameters
299 host name [address [address-family] address] [port port-number]
300
301 Defines an endpoint for a connection. Each host statement refers to
302 an on section in a resource. If a port number is defined, this
303 endpoint will use the specified port instead of the port defined in
304 the on section. Each path section must contain exactly two host
305 parameters.
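
A hedged sketch of a connection using two redundant paths between the
same pair of hosts; the addresses and ports are examples only:

           connection {
              path {
                 host "alice" address 10.1.1.31:7010;
                 host "bob"   address 10.1.1.32:7010;
              }
              path {
                 host "alice" address 192.168.2.31:7010;
                 host "bob"   address 192.168.2.32:7010;
              }
           }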
306
307 Section connection-mesh Parameters
308 hosts name...
309
310 Defines all nodes of a mesh. Each name refers to an on section in a
311 resource. The port that is defined in the on section will be used.
312
313 Section disk Parameters
314 al-extents extents
315
316 DRBD automatically maintains a "hot" or "active" disk area likely
317 to be written to again soon based on the recent write activity. The
318 "active" disk area can be written to immediately, while "inactive"
319 disk areas must be "activated" first, which requires a meta-data
320 write. We also refer to this active disk area as the "activity
321 log".
322
323 The activity log saves meta-data writes, but the whole log must be
324 resynced upon recovery of a failed node. The size of the activity
325 log is a major factor of how long a resync will take and how fast a
326 replicated disk will become consistent after a crash.
327
328 The activity log consists of a number of 4-Megabyte segments; the
329 al-extents parameter determines how many of those segments can be
330 active at the same time. The default value for al-extents is 1237,
331 with a minimum of 7 and a maximum of 65536.
332
333 Note that the effective maximum may be smaller, depending on how
334 you created the device metadata; see also drbdmeta(8). The
335 effective maximum is 919 * (available on-disk activity-log
336 ring-buffer area / 4 kB - 1); the default 32 kB ring buffer yields a
337 maximum of 6433 (covering more than 25 GiB of data). We recommend
338 keeping this well within the amount your backend storage and
339 replication link are able to resync within about 5 minutes.
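
As a sketch (the value is only an example), the activity log size is
set in a disk section; here roughly 23 GiB of "hot" area, which stays
below the 6433 limit mentioned above:

           resource r0 {
              disk {
                 al-extents 6007;   # example: 6007 * 4 MiB of active area
              }
           }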
340
341 al-updates {yes | no}
342
343 With this parameter, the activity log can be turned off entirely
344 (see the al-extents parameter). This will speed up writes because
345 fewer meta-data writes will be necessary, but the entire device
346 needs to be resynchronized upon recovery of a failed primary node.
347 The default value for al-updates is yes.
348
349 disk-barrier,
350 disk-flushes,
351 disk-drain
352 DRBD has three methods of handling the ordering of dependent write
353 requests:
354
355 disk-barrier
356 Use disk barriers to make sure that requests are written to
357 disk in the right order. Barriers ensure that all requests
358 submitted before a barrier make it to the disk before any
359 requests submitted after the barrier. This is implemented using
360 'tagged command queuing' on SCSI devices and 'native command
361 queuing' on SATA devices. Only some devices and device stacks
362 support this method. The device mapper (LVM) only supports
363 barriers in some configurations.
364
365 Note that on systems which do not support disk barriers,
366 enabling this option can lead to data loss or corruption. Until
367 DRBD 8.4.1, disk-barrier was turned on if the I/O stack below
368 DRBD supported barriers. Kernels since linux-2.6.36 (or
369 2.6.32 RHEL6) no longer make it possible to detect whether
370 barriers are supported. Since drbd-8.4.2, this option is off by
371 default and needs to be enabled explicitly.
372
373 disk-flushes
374 Use disk flushes between dependent write requests, also
375 referred to as 'force unit access' by drive vendors. This
376 forces all data to disk. This option is enabled by default.
377
378 disk-drain
379 Wait for the request queue to "drain" (that is, wait for the
380 requests to finish) before submitting a dependent write
381 request. This method requires that requests are stable on disk
382 when they finish. Before DRBD 8.0.9, this was the only method
383 implemented. This option is enabled by default. Do not disable
384 in production environments.
385
386 Of these three methods, DRBD will use the first that is enabled
387 and supported by the backing storage device. If all three of these
388 options are turned off, DRBD will submit write requests without
389 bothering about dependencies. Depending on the I/O stack, write
390 requests can be reordered, and they can be submitted in a different
391 order on different cluster nodes. This can result in data loss or
392 corruption. Therefore, turning off all three methods of controlling
393 write ordering is strongly discouraged.
394
395 A general guideline for configuring write ordering is to use disk
396 barriers or disk flushes when using ordinary disks (or an ordinary
397 disk array) with a volatile write cache. On storage without cache
398 or with a battery backed write cache, disk draining can be a
399 reasonable choice.
400
401 disk-timeout
402 If the lower-level device on which a DRBD device stores its data
403 does not finish an I/O request within the defined disk-timeout,
404 DRBD treats this as a failure. The lower-level device is detached,
405 and the device's disk state advances to Diskless. If DRBD is
406 connected to one or more peers, the failed request is passed on to
407 one of them.
408
409 This option is dangerous and may lead to kernel panic!
410
411 "Aborting" requests, or force-detaching the disk, is intended for
412 completely blocked/hung local backing devices which do no longer
413 complete requests at all, not even do error completions. In this
414 situation, usually a hard-reset and failover is the only way out.
415
416 By "aborting", basically faking a local error-completion, we allow
417 for a more graceful swichover by cleanly migrating services. Still
418 the affected node has to be rebooted "soon".
419
420 By completing these requests, we allow the upper layers to re-use
421 the associated data pages.
422
423 If the local backing device later "recovers" and then DMAs some
424 data from disk into the original request pages, in the best case it
425 will just put random data into unused pages; but typically it will
426 corrupt data that is by then completely unrelated, causing all sorts
427 of damage.
428
429 This means that a delayed successful completion, especially of READ
430 requests, is a reason to panic(). We assume that a delayed *error*
431 completion is OK, though we will still complain noisily about it.
432
433 The default value of disk-timeout is 0, which stands for an
434 infinite timeout. Timeouts are specified in units of 0.1 seconds.
435 This option is available since DRBD 8.3.12.
436
437 md-flushes
438 Enable disk flushes and disk barriers on the meta-data device. This
439 option is enabled by default. See the disk-flushes parameter.
440
441 on-io-error handler
442
443 Configure how DRBD reacts to I/O errors on a lower-level device.
444 The following policies are defined:
445
446 pass_on
447 Change the disk status to Inconsistent, mark the failed block
448 as inconsistent in the bitmap, and retry the I/O operation on a
449 remote cluster node.
450
451 call-local-io-error
452 Call the local-io-error handler (see the handlers section).
453
454 detach
455 Detach the lower-level device and continue in diskless mode.
456
457
458 read-balancing policy
459 Distribute read requests among cluster nodes as defined by policy.
460 The supported policies are prefer-local (the default),
461 prefer-remote, round-robin, least-pending, when-congested-remote,
462 32K-striping, 64K-striping, 128K-striping, 256K-striping,
463 512K-striping and 1M-striping.
464
465 This option is available since DRBD 8.4.1.
466
467 Note: the when-congested-remote option has no effect on Linux
468 kernel 5.18 or above. It is deprecated starting from DRBD 9.1.12.
469
470 resync-after res-name/volume
471
472 Define that a device should only resynchronize after the specified
473 other device. By default, no order between devices is defined, and
474 all devices will resynchronize in parallel. Depending on the
475 configuration of the lower-level devices, and the available network
476 and disk bandwidth, this can slow down the overall resync process.
477 This option can be used to form a chain or tree of dependencies
478 among devices.
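
A minimal sketch, assuming a second resource named r1 that should wait
for volume 0 of r0 before resynchronizing:

           resource r1 {
              disk {
                 resync-after r0/0;   # resync r1 only after r0, volume 0, has finished
              }
           }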
479
480 rs-discard-granularity byte
481 When rs-discard-granularity is set to a non-zero, positive value,
482 DRBD tries to perform resync operations in requests of this size.
483 If such a block contains only zero bytes on the sync source
484 node, the sync target node will issue a discard/trim/unmap command
485 for the area.
486
487 The value is constrained by the discard granularity of the backing
488 block device. If rs-discard-granularity is not a multiple of
489 the discard granularity of the backing block device, DRBD rounds it
490 up. The feature only becomes active if the backing block device reads
491 back zeroes after a discard command.
492
493 The usage of rs-discard-granularity may cause c-max-rate to be
494 exceeded. In particular, the resync rate may reach 10x the value of
495 rs-discard-granularity per second.
496
497 The default value of rs-discard-granularity is 0. This option is
498 available since 8.4.7.
499
500 discard-zeroes-if-aligned {yes | no}
501
502 There are several aspects to discard/trim/unmap support on linux
503 block devices. Even if discard is supported in general, it may fail
504 silently, or may partially ignore discard requests. Devices also
505 announce whether reading from unmapped blocks returns defined data
506 (usually zeroes), or undefined data (possibly old data, possibly
507 garbage).
508
509 If on different nodes, DRBD is backed by devices with differing
510 discard characteristics, discards may lead to data divergence (old
511 data or garbage left over on one backend, zeroes due to unmapped
512 areas on the other backend). Online verify would now potentially
513 report tons of spurious differences. While probably harmless for
514 most use cases (fstrim on a file system), DRBD cannot have that.
515
516 To play safe, we have to disable discard support, if our local
517 backend (on a Primary) does not support "discard_zeroes_data=true".
518 We also have to translate discards to explicit zero-out on the
519 receiving side, unless the receiving side (Secondary) supports
520 "discard_zeroes_data=true", thereby allocating areas what were
521 supposed to be unmapped.
522
523 There are some devices (notably the LVM/DM thin provisioning) that
524 are capable of discard, but announce discard_zeroes_data=false. In
525 the case of DM-thin, discards aligned to the chunk size will be
526 unmapped, and reading from unmapped sectors will return zeroes.
527 However, unaligned partial head or tail areas of discard requests
528 will be silently ignored.
529
530 If we now add a helper to explicitly zero-out these unaligned
531 partial areas, while passing on the discard of the aligned full
532 chunks, we effectively achieve discard_zeroes_data=true on such
533 devices.
534
535 Setting discard-zeroes-if-aligned to yes will allow DRBD to use
536 discards, and to announce discard_zeroes_data=true, even on
537 backends that announce discard_zeroes_data=false.
538
539 Setting discard-zeroes-if-aligned to no will cause DRBD to always
540 fall-back to zero-out on the receiving side, and to not even
541 announce discard capabilities on the Primary, if the respective
542 backend announces discard_zeroes_data=false.
543
544 We used to ignore the discard_zeroes_data setting completely. To
545 not break established and expected behaviour, and suddenly cause
546 fstrim on thin-provisioned LVs to run out-of-space instead of
547 freeing up space, the default value is yes.
548
549 This option is available since 8.4.7.
550
551 disable-write-same {yes | no}
552
553 Some disks announce WRITE_SAME support to the kernel but fail with
554 an I/O error upon actually receiving such a request. This mostly
555 happens when using virtualized disks -- notably, this behavior has
556 been observed with VMware's virtual disks.
557
558 When disable-write-same is set to yes, WRITE_SAME detection is
559 manually overridden and support is disabled.
560
561 The default value of disable-write-same is no. This option is
562 available since 8.4.7.
563
564 block-size size
565
566 Block storage devices have a particular sector size or block size.
567 This block size has many different names. Examples are
568 'hw_sector_size', 'PHY-SEC', 'physical block (sector) size', and
569 'logical block (sector) size'.
570
571 DRBD needs to combine these block sizes of the backing disks. In
572 clusters with storage devices with different block sizes, it is
573 necessary to configure the maximal block sizes on the DRBD level.
574 Here is an example highlighting the need.
575
576 Let's say node A is diskless. It connects to node B, which has a
577 physical block size of 512 bytes. Then the user mounts the
578 filesystem on node A; the filesystem recognizes that it can do I/O
579 in units of 512 bytes. Later, node C joins the cluster with a
580 physical block size of 4096 bytes. Now, suddenly DRBD starts to
581 deliver I/O errors to the filesystem if it chooses to do I/O on,
582 e.g., 512 or 1024 bytes.
583
584 The default value of block-size is 512 bytes. This option is available
585 since drbd-utils 9.24 and the drbd kernel driver 9.1.14 and 9.2.3.
586
587 Section peer-device-options Parameters
588 Please note that you open the section with the disk keyword.
589
590 c-delay-target delay_target,
591 c-fill-target fill_target,
592 c-max-rate max_rate,
593 c-plan-ahead plan_time
594 Dynamically control the resync speed. The following modes are
595 available:
596
597 • Dynamic control with fill target (default). Enabled when
598 c-plan-ahead is non-zero and c-fill-target is non-zero. The
599 goal is to fill the buffers along the data path with a defined
600 amount of data. This mode is recommended when DRBD-proxy is
601 used. Configured with c-plan-ahead, c-fill-target and
602 c-max-rate.
603
604 • Dynamic control with delay target. Enabled when c-plan-ahead is
605 non-zero (default) and c-fill-target is zero. The goal is to
606 have a defined delay along the path. Configured with
607 c-plan-ahead, c-delay-target and c-max-rate.
608
609 • Fixed resync rate. Enabled when c-plan-ahead is zero. DRBD will
610 try to perform resync I/O at a fixed rate. Configured with
611 resync-rate.
612
613 The c-plan-ahead parameter defines how fast DRBD adapts to changes
614 in the resync speed. It should be set to five times the network
615 round-trip time or more. The default value of c-plan-ahead is 20,
616 in units of 0.1 seconds.
617
618 The c-fill-target parameter defines how much resync data DRBD
619 should aim to have in flight at all times. Common values for
620 "normal" data paths range from 4K to 100K. The default value of
621 c-fill-target is 100, in units of sectors.
622
623 The c-delay-target parameter defines the delay in the resync path
624 that DRBD should aim for. This should be set to five times the
625 network round-trip time or more. The default value of
626 c-delay-target is 10, in units of 0.1 seconds.
627
628 The c-max-rate parameter limits the maximum bandwidth used by
629 dynamically controlled resyncs. Setting this to zero removes the
630 limitation (since DRBD 9.0.28). It should be set to either the
631 bandwidth available between the DRBD hosts and the machines hosting
632 DRBD-proxy, or to the available disk bandwidth. The default value
633 of c-max-rate is 102400, in units of KiB/s.
634
635 Dynamic resync speed control is available since DRBD 8.3.9.
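
For illustration, a disk (peer-device-options) section combining the
controller parameters described above; all values are examples, not
recommendations:

           disk {
              c-plan-ahead   20;     # 2 seconds; setting 0 switches to a fixed resync-rate
              c-fill-target  100;    # sectors to keep in flight (fill-target mode)
              c-delay-target 10;     # 1 second (only used when c-fill-target is 0)
              c-max-rate     100M;   # cap dynamically controlled resync at 100 MiB/s
           }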
636
637 c-min-rate min_rate
638 A node which is primary and sync-source has to schedule application
639 I/O requests and resync I/O requests. The c-min-rate parameter
640 limits how much bandwidth is available for resync I/O; the
641 remaining bandwidth is used for application I/O.
642
643 A c-min-rate value of 0 means that there is no limit on the resync
644 I/O bandwidth. This can slow down application I/O significantly.
645 Use a value of 1 (1 KiB/s) for the lowest possible resync rate.
646
647 The default value of c-min-rate is 250, in units of KiB/s.
648
649 resync-rate rate
650
651 Define how much bandwidth DRBD may use for resynchronizing. DRBD
652 allows "normal" application I/O even during a resync. If the resync
653 takes up too much bandwidth, application I/O can become very slow.
654 This parameter allows you to avoid that. Please note that this option
655 only works when the dynamic resync controller is disabled.
656
657 Section global Parameters
658 dialog-refresh time
659
660 The DRBD init script can be used to configure and start DRBD
661 devices, which can involve waiting for other cluster nodes. While
662 waiting, the init script shows the remaining waiting time. The
663 dialog-refresh defines the number of seconds between updates of
664 that countdown. The default value is 1; a value of 0 turns off the
665 countdown.
666
667 disable-ip-verification
668 Normally, DRBD verifies that the IP addresses in the configuration
669 match the host names. Use the disable-ip-verification parameter to
670 disable these checks.
671
672 usage-count {yes | no | ask}
673 As explained on DRBD's Online Usage Counter[2] web page, DRBD
674 includes a mechanism for anonymously counting how many
675 installations are using which versions of DRBD. The results are
676 available on the web page for anyone to see.
677
678 This parameter defines if a cluster node participates in the usage
679 counter; the supported values are yes, no, and ask (ask the user,
680 the default).
681
682 We would like to ask users to participate in the online usage
683 counter as this provides us valuable feedback for steering the
684 development of DRBD.
685
686 udev-always-use-vnr
687 When udev asks drbdadm for a list of device related symlinks,
688 drbdadm would suggest symlinks with differing naming conventions,
689 depending on whether the resource has explicit volume VNR { }
690 definitions, or only one single volume with the implicit volume
691 number 0:
692
693 # implicit single volume without "volume 0 {}" block
694 DEVICE=drbd<minor>
695 SYMLINK_BY_RES=drbd/by-res/<resource-name>
696 SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>
697
698 # explicit volume definition: volume VNR { }
699 DEVICE=drbd<minor>
700 SYMLINK_BY_RES=drbd/by-res/<resource-name>/VNR
701 SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>
702
703 If you define this parameter in the global section, drbdadm will
704 always add the .../VNR part, regardless of whether the
705 volume definition was implicit or explicit.
706
707 For legacy backward compatibility, this is off by default, but we
708 recommend enabling it.
709
710 Section handlers Parameters
711 after-resync-target cmd
712
713 Called on a resync target when a node state changes from
714 Inconsistent to Consistent when a resync finishes. This handler can
715 be used for removing the snapshot created in the
716 before-resync-target handler.
717
718 before-resync-target cmd
719
720 Called on a resync target before a resync begins. This handler can
721 be used for creating a snapshot of the lower-level device for the
722 duration of the resync: if the resync source becomes unavailable
723 during a resync, reverting to the snapshot can restore a consistent
724 state.
725
726 before-resync-source cmd
727
728 Called on a resync source before a resync begins.
729
730 out-of-sync cmd
731
732 Called on all nodes after a verify finishes and out-of-sync blocks
733 were found. This handler is mainly used for monitoring purposes. An
734 example would be to call a script that sends an alert SMS.
735
736 quorum-lost cmd
737
738 Called on a Primary that lost quorum. This handler is usually used
739 to reboot the node if it is not possible to restart the application
740 that uses the storage on top of DRBD.
741
742 fence-peer cmd
743
744 Called when a node should fence a resource on a particular peer.
745 The handler should not use the same communication path that DRBD
746 uses for talking to the peer.
747
748 unfence-peer cmd
749
750 Called when a node should remove fencing constraints from other
751 nodes.
752
753 initial-split-brain cmd
754
755 Called when DRBD connects to a peer and detects that the peer is in
756 a split-brain state with the local node. This handler is also
757 called for split-brain scenarios which will be resolved
758 automatically.
759
760 local-io-error cmd
761
762 Called when an I/O error occurs on a lower-level device.
763
764 pri-lost cmd
765
766 The local node is currently primary, but DRBD believes that it
767 should become a sync target. The node should give up its primary
768 role.
769
770 pri-lost-after-sb cmd
771
772 The local node is currently primary, but it has lost the
773 after-split-brain auto recovery procedure. The node should be
774 abandoned.
775
776 pri-on-incon-degr cmd
777
778 The local node is primary, and neither the local lower-level device
779 nor a lower-level device on a peer is up to date. (The primary has
780 no device to read from or to write to.)
781
782 split-brain cmd
783
784 DRBD has detected a split-brain situation which could not be
785 resolved automatically. Manual recovery is necessary. This handler
786 can be used to call for administrator attention.
787
788 disconnected cmd
789
790 A connection to a peer went down. The handler can learn about the
791 reason for the disconnect from the DRBD_CSTATE environment
792 variable.
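
A hedged sketch of a handlers section; the script paths are placeholders
for site-specific scripts, which receive the resource name as their first
argument plus the environment variables described earlier for the
handlers section:

           resource r0 {
              handlers {
                 split-brain    "/usr/local/sbin/notify-admin.sh split-brain";
                 out-of-sync    "/usr/local/sbin/notify-admin.sh out-of-sync";
                 local-io-error "/usr/local/sbin/notify-admin.sh local-io-error";
              }
              # other sections omitted
           }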
793
794 Section net Parameters
795 after-sb-0pri policy
796 Define how to react if a split-brain scenario is detected and neither
797 of the two nodes is in the primary role. (We detect split-brain
798 scenarios when two nodes connect; split-brain decisions are always
799 between two nodes.) The defined policies are:
800
801 disconnect
802 No automatic resynchronization; simply disconnect.
803
804 discard-younger-primary,
805 discard-older-primary
806 Resynchronize from the node which became primary first
807 (discard-younger-primary) or last (discard-older-primary). If
808 both nodes became primary independently, the
809 discard-least-changes policy is used.
810
811 discard-zero-changes
812 If only one of the nodes wrote data since the split brain
813 situation was detected, resynchronize from this node to the
814 other. If both nodes wrote data, disconnect.
815
816 discard-least-changes
817 Resynchronize from the node with more modified blocks.
818
819 discard-node-nodename
820 Always resynchronize to the named node.
821
822 after-sb-1pri policy
823 Define how to react if a split-brain scenario is detected, with one
824 node in the primary role and one node in the secondary role. (We detect
825 split-brain scenarios when two nodes connect, so split-brain
826 decisions are always between two nodes.) The defined policies are:
827
828 disconnect
829 No automatic resynchronization, simply disconnect.
830
831 consensus
832 Discard the data on the secondary node if the after-sb-0pri
833 algorithm would also discard the data on the secondary node.
834 Otherwise, disconnect.
835
836 violently-as0p
837 Always take the decision of the after-sb-0pri algorithm, even
838 if it causes an erratic change of the primary's view of the
839 data. This is only useful if a single-node file system (i.e.,
840 not OCFS2 or GFS) with the allow-two-primaries flag is used.
841 This option can cause the primary node to crash, and should not
842 be used.
843
844 discard-secondary
845 Discard the data on the secondary node.
846
847 call-pri-lost-after-sb
848 Always take the decision of the after-sb-0pri algorithm. If the
849 decision is to discard the data on the primary node, call the
850 pri-lost-after-sb handler on the primary node.
851
852 after-sb-2pri policy
853 Define how to react if a split-brain scenario is detected and both
854 nodes are in the primary role. (We detect split-brain scenarios when
855 two nodes connect, so split-brain decisions are always between two
856 nodes.) The defined policies are:
857
858 disconnect
859 No automatic resynchronization, simply disconnect.
860
861 violently-as0p
862 See the violently-as0p policy for after-sb-1pri.
863
864 call-pri-lost-after-sb
865 Call the pri-lost-after-sb helper program on one of the
866 machines unless that machine can demote to secondary. The
867 helper program is expected to reboot the machine, which brings
868 the node into a secondary role. Which machine runs the helper
869 program is determined by the after-sb-0pri strategy.
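
For illustration, a net section combining the three after-sb policies;
this is a commonly seen conservative combination, shown only as an
example:

           net {
              after-sb-0pri discard-zero-changes;
              after-sb-1pri consensus;
              after-sb-2pri disconnect;
           }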
870
871 allow-remote-read bool-value
872 Allows or disallows DRBD to read from a peer node.
873
874 When the disk of a primary node is detached, DRBD will try to
875 continue reading and writing from another node in the cluster. For
876 this purpose, it searches for nodes with up-to-date data, and uses
877 any found node to resume operations. In some cases it may not be
878 desirable to read back data from a peer node, because the node
879 should only be used as a replication target. In this case, the
880 allow-remote-read parameter can be set to no, which would prohibit
881 this node from reading data from the peer node.
882
883 The allow-remote-read parameter is available since DRBD 9.0.19, and
884 defaults to yes.
885
886 allow-two-primaries
887
888 The most common way to configure DRBD devices is to allow only one
889 node to be primary (and thus writable) at a time.
890
891 In some scenarios it is preferable to allow two nodes to be primary
892 at once; a mechanism outside of DRBD then must make sure that
893 writes to the shared, replicated device happen in a coordinated
894 way. This can be done with a shared-storage cluster file system
895 like OCFS2 and GFS, or with virtual machine images and a virtual
896 machine manager that can migrate virtual machines between physical
897 machines.
898
899 The allow-two-primaries parameter tells DRBD to allow two nodes to
900 be primary at the same time. Never enable this option when using a
901 non-distributed file system; otherwise, data corruption and node
902 crashes will result!
903
904 always-asbp
905 Normally the automatic after-split-brain policies are only used if
906 current states of the UUIDs do not indicate the presence of a third
907 node.
908
909 With this option you request that the automatic after-split-brain
910 policies are used as long as the data sets of the nodes are somehow
911 related. This might cause a full sync, if the UUIDs indicate the
912 presence of a third node. (Or double faults led to strange UUID
913 sets.)
914
915 connect-int time
916
917 As soon as a connection between two nodes is configured with
918 drbdsetup connect, DRBD immediately tries to establish the
919 connection. If this fails, DRBD waits for connect-int seconds and
920 then repeats. The default value of connect-int is 10 seconds.
921
922 cram-hmac-alg hash-algorithm
923
924 Configure the hash-based message authentication code (HMAC) or
925 secure hash algorithm to use for peer authentication. The kernel
926 supports a number of different algorithms, some of which may be
927 loadable as kernel modules. See the shash algorithms listed in
928 /proc/crypto. By default, cram-hmac-alg is unset. Peer
929 authentication also requires a shared-secret to be configured.
930
931 csums-alg hash-algorithm
932
933 Normally, when two nodes resynchronize, the sync target requests a
934 piece of out-of-sync data from the sync source, and the sync source
935 sends the data. With many usage patterns, a significant number of
936 those blocks will actually be identical.
937
938 When a csums-alg algorithm is specified, when requesting a piece of
939 out-of-sync data, the sync target also sends along a hash of the
940 data it currently has. The sync source compares this hash with its
941 own version of the data. It sends the sync target the new data if
942 the hashes differ, and tells it that the data are the same
943 otherwise. This reduces the network bandwidth required, at the cost
944 of higher cpu utilization and possibly increased I/O on the sync
945 target.
946
947 The csums-alg can be set to one of the secure hash algorithms
948 supported by the kernel; see the shash algorithms listed in
949 /proc/crypto. By default, csums-alg is unset.
950
951 csums-after-crash-only
952
953 Enabling this option (and csums-alg, above) makes it possible to
954 use the checksum-based resync only for the first resync after a
955 primary crash, but not for later "network hiccups".
956
957 In most cases, blocks that are marked as need-to-be-resynced have in
958 fact changed, so calculating checksums, and both reading and
959 writing the blocks on the resync target, is effectively all overhead.
960
961 The advantage of checksum-based resync shows mostly after primary
962 crash recovery, where the recovery marked larger areas (those
963 covered by the activity log) as need-to-be-resynced, just in case.
964 Introduced in 8.4.5.
965
966 data-integrity-alg alg
967 DRBD normally relies on the data integrity checks built into the
968 TCP/IP protocol, but if a data integrity algorithm is configured,
969 it will additionally use this algorithm to make sure that the data
970 received over the network match what the sender has sent. If a data
971 integrity error is detected, DRBD will close the network connection
972 and reconnect, which will trigger a resync.
973
974 The data-integrity-alg can be set to one of the secure hash
975 algorithms supported by the kernel; see the shash algorithms listed
976 in /proc/crypto. By default, this mechanism is turned off.
977
978 Because of the CPU overhead involved, we recommend not to use this
979 option in production environments. Also see the notes on data
980 integrity below.
981
982 fencing fencing_policy
983
984 Fencing is a preventive measure to avoid situations where both
985 nodes are primary and disconnected. This is also known as a
986 split-brain situation. DRBD supports the following fencing
987 policies:
988
989 dont-care
990 No fencing actions are taken. This is the default policy.
991
992 resource-only
993 If a node becomes a disconnected primary, it tries to fence the
994 peer. This is done by calling the fence-peer handler. The
995 handler is supposed to reach the peer over an alternative
996 communication path and call 'drbdadm outdate minor' there.
997
998 resource-and-stonith
999 If a node becomes a disconnected primary, it freezes all its IO
1000 operations and calls its fence-peer handler. The fence-peer
1001 handler is supposed to reach the peer over an alternative
1002 communication path and call 'drbdadm outdate minor' there. In
1003 case it cannot do that, it should stonith the peer. IO is
1004 resumed as soon as the situation is resolved. In case the
1005 fence-peer handler fails, I/O can be resumed manually with
1006 'drbdadm resume-io'.
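
A sketch of a fencing configuration; the handler paths are placeholders
for whatever scripts integrate with your cluster manager over an
alternative communication path:

           resource r0 {
              net {
                 fencing resource-and-stonith;
              }
              handlers {
                 fence-peer   "/usr/local/sbin/my-fence-peer.sh";     # placeholder script
                 unfence-peer "/usr/local/sbin/my-unfence-peer.sh";   # placeholder script
              }
           }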
1007
1008 ko-count number
1009
1010 If a secondary node fails to complete a write request in ko-count
1011 times the timeout parameter, it is excluded from the cluster. The
1012 primary node then sets the connection to this secondary node to
1013 Standalone. To disable this feature, you should explicitly set it
1014 to 0; defaults may change between versions.
1015
1016 load-balance-paths {yes | no}
1017 By default, the TCP transport establishes only one configured path
1018 at a time. It switches to another path only in case the established
1019 one fails. When you set load-balance-paths to yes the TCP transport
1020 establishes all paths in parallel. It will transmit data packets
1021 over the paths with the least data in its socket send queue.
1022
1023 Please note that enabling load balancing introduces additional chunking
1024 headers into the network protocol; consequently, you must enable
1025 it on both sides of a connection.
1026
1027 As of drbd-9.2.6 the RDMA transport does not obey this setting. It
1028 always uses all paths in parallel. This option became available
1029 with drbd-9.2.6.
1030
1031 max-buffers number
1032
1033 Limits the memory usage per DRBD minor device on the receiving
1034 side, or for internal buffers during resync or online-verify. Unit
1035 is PAGE_SIZE, which is 4 KiB on most systems. The minimum possible
1036 setting is hard coded to 32 (=128 KiB). These buffers are used to
1037 hold data blocks while they are written to/read from disk. To avoid
1038 possible distributed deadlocks on congestion, this setting is used
1039 as a throttle threshold rather than a hard limit. Once more than
1040 max-buffers pages are in use, further allocation from this pool is
1041 throttled. You want to increase max-buffers if you cannot saturate
1042 the IO backend on the receiving side.
1043
1044 max-epoch-size number
1045
1046 Define the maximum number of write requests DRBD may issue before
1047 issuing a write barrier. The default value is 2048, with a minimum
1048 of 1 and a maximum of 20000. Setting this parameter to a value
1049 below 10 is likely to decrease performance.
1050
1051 on-congestion policy,
1052 congestion-fill threshold,
1053 congestion-extents threshold
1054 By default, DRBD blocks when the TCP send queue is full. This
1055 prevents applications from generating further write requests until
1056 more buffer space becomes available again.
1057
1058 When DRBD is used together with DRBD-proxy, it can be better to use
1059 the pull-ahead on-congestion policy, which can switch DRBD into
1060 ahead/behind mode before the send queue is full. DRBD then records
1061 the differences between itself and the peer in its bitmap, but it
1062 no longer replicates them to the peer. When enough buffer space
1063 becomes available again, the node resynchronizes with the peer and
1064 switches back to normal replication.
1065
1066 This has the advantage of not blocking application I/O even when
1067 the queues fill up, and the disadvantage that peer nodes can fall
1068 behind much further. Also, while resynchronizing, peer nodes will
1069 become inconsistent.
1070
1071 The available congestion policies are block (the default) and
1072 pull-ahead. The congestion-fill parameter defines how much data is
1073 allowed to be "in flight" in this connection. The default value is
1074 0, which disables this mechanism of congestion control, with a
1075 maximum of 10 GiBytes. The congestion-extents parameter defines how
1076 many bitmap extents may be active before switching into
1077 ahead/behind mode, with the same default and limits as the
1078 al-extents parameter. The congestion-extents parameter is effective
1079 only when set to a value smaller than al-extents.
1080
1081 Ahead/behind mode is available since DRBD 8.3.10.
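
A hedged sketch for a long-distance link (typically with DRBD-proxy)
using the pull-ahead policy; the thresholds are examples and depend on
the available buffer space:

           net {
              protocol           A;
              on-congestion      pull-ahead;
              congestion-fill    400M;    # example: switch to ahead/behind at ~400 MiB in flight
              congestion-extents 1000;    # example: must be smaller than al-extents to take effect
           }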
1082
1083 ping-int interval
1084
1085 When the TCP/IP connection to a peer is idle for more than ping-int
1086 seconds, DRBD will send a keep-alive packet to make sure that a
1087 failed peer or network connection is detected reasonably soon. The
1088 default value is 10 seconds, with a minimum of 1 and a maximum of
1089 120 seconds. The unit is seconds.
1090
1091 ping-timeout timeout
1092
1093 Define the timeout for replies to keep-alive packets. If the peer
1094 does not reply within ping-timeout, DRBD will close and try to
1095 reestablish the connection. The default value is 0.5 seconds, with
1096 a minimum of 0.1 seconds and a maximum of 30 seconds. The unit is
1097 tenths of a second.
1098
1099 socket-check-timeout timeout
1100 In setups involving a DRBD-proxy and connections that experience a
1101 lot of buffer bloat, it might be necessary to set ping-timeout to an
1102 unusually high value. By default, DRBD uses the same value to wait
1103 for a newly established TCP connection to become stable. Since the
1104 DRBD-proxy is usually located in the same data center, such a long
1105 wait time may hinder DRBD's connect process.
1106
1107 In such setups, socket-check-timeout should be set to at least
1108 the round-trip time between DRBD and the DRBD-proxy; in most cases
1109 that is 1.
1110
1111 The default unit is tenths of a second, the default value is 0
1112 (which causes DRBD to use the value of ping-timeout instead).
1113 Introduced in 8.4.5.
1114
1115 protocol name
1116 Use the specified protocol on this connection. The supported
1117 protocols are:
1118
1119 A
1120 Writes to the DRBD device complete as soon as they have reached
1121 the local disk and the TCP/IP send buffer.
1122
1123 B
1124 Writes to the DRBD device complete as soon as they have reached
1125 the local disk, and all peers have acknowledged the receipt of
1126 the write requests.
1127
1128 C
1129 Writes to the DRBD device complete as soon as they have reached
1130 the local and all remote disks.
1131
1132
1133 rcvbuf-size size
1134
1135 Configure the size of the TCP/IP receive buffer. A value of 0 (the
1136 default) causes the buffer size to adjust dynamically. This
1137 parameter usually does not need to be set, but it can be set to a
1138 value up to 10 MiB. The default unit is bytes.
1139
1140 rdma-ctrl-rcvbuf-size value
1141
1142 By default, the RDMA transport divides the rcvbuf-size by 64 and
1143 uses the result for the number of buffers on the control stream.
1144 This result might be too low depending on the timing
1145 characteristics of the backing storage devices and the network
1146 link.
1147
1148 The option rdma-ctrl-rcvbuf-size allows you to explicitly set the
1149 number of buffers for the control stream, overriding the divide-by-64
1150 heuristic. The default unit of this setting is bytes.
1151
1152 rdma-ctrl-sndbuf-size value
1153
1154 By default, the RDMA transport divides the sndbuf-size by 64 and
1155 uses the result for the number of buffers on the control stream.
1156 This result might be too low depending on the timing
1157 characteristics of the backing storage devices and the network
1158 link.
1159
1160 The option rdma-ctrl-sndbuf-size allows you to explicitly set the
1161 number of buffers for the control stream, overriding the divide-by-64
1162 heuristic. The default unit of this setting is bytes.
1163
1164 rr-conflict policy
1165 This option helps to solve the cases when the outcome of the resync
1166 decision is incompatible with the current role assignment in the
1167 cluster. The defined policies are:
1168
1169 disconnect
1170 No automatic resynchronization, simply disconnect.
1171
1172 retry-connect
1173 Disconnect now, and retry the connection immediately afterwards.
1174
1175 violently
1176 Resync to the primary node is allowed, violating the assumption
1177 that data on a block device are stable for one of the nodes.
1178 Do not use this option, it is dangerous.
1179
1180 call-pri-lost
1181 Call the pri-lost handler on one of the machines. The handler
1182 is expected to reboot the machine, which puts it into secondary
1183 role.
1184
1185 auto-discard
1186 Auto-discard reverses the resync direction, so that DRBD
1187 resyncs the current primary to the current secondary.
1188 Auto-discard only applies when protocol A is in use and the
1189 resync decision is based on the principle that a crashed
1190 primary should be the source of a resync. When a primary node
1191 crashes, it might have written some last updates to its disk,
1192 which were not received by a protocol A secondary. By promoting
1193 the secondary in the meantime the user accepted that those last
1194 updates have been lost. By using auto-discard you consent that
1195 the last updates (before the crash of the primary) should be
1196 rolled back automatically.
1197
1198 shared-secret secret
1199
1200 Configure the shared secret used for peer authentication. The
1201 secret is a string of up to 64 characters. Peer authentication also
1202 requires the cram-hmac-alg parameter to be set.
1203
1204 sndbuf-size size
1205
1206 Configure the size of the TCP/IP send buffer. Since DRBD 8.0.13 /
1207 8.2.7, a value of 0 (the default) causes the buffer size to adjust
1208 dynamically. Values below 32 KiB are harmful to the throughput on
1209 this connection. Large buffer sizes can be useful especially when
1210 protocol A is used over high-latency networks; the maximum value
1211 supported is 10 MiB.
1212
1213 tcp-cork
1214 By default, DRBD uses the TCP_CORK socket option to prevent the
1215 kernel from sending partial messages; this results in fewer and
1216 bigger packets on the network. Some network stacks can perform
1217 worse with this optimization. On these, the tcp-cork parameter can
1218 be used to turn this optimization off.
1219
1220 timeout time
1221
1222 Define the timeout for replies over the network: if a peer node
1223 does not send an expected reply within the specified timeout, it is
1224 considered dead and the TCP/IP connection is closed. The timeout
1225 value must be lower than connect-int and lower than ping-int. The
1226 default is 6 seconds; the value is specified in tenths of a second.
1227
1228 tls bool-value
1229 Enable TLS.
1230
1231 tls-keyring key-description
1232 Key description (name) of the keyring where the TLS key material is
1233 stored. The keyring will be shared with the handshake daemon.
1234
1235 tls-privkey key-description
1236 Key description (name) of the DER encoded private key for TLS
1237 encryption.
1238
1239 tls-certificate key-description
1240 Key description (name) of the DER encoded certificate for TLS
1241 encryption.
1242
1243 transport type
1244
1245 With DRBD 9, the network transport used by DRBD is loaded as a
1246 separate module. With this option you can specify which transport
1247 and module to load. At present only two options exist, tcp and
1248 rdma. The default is tcp.
1249
1250 use-rle
1251
1252 Each replicated device on a cluster node has a separate bitmap for
1253 each of its peer devices. The bitmaps are used for tracking the
1254 differences between the local and peer device: depending on the
1255 cluster state, a disk range can be marked as different from the
1256 peer in the device's bitmap, in the peer device's bitmap, or in
1257 both bitmaps. When two cluster nodes connect, they exchange each
1258 other's bitmaps, and they each compute the union of the local and
1259 peer bitmap to determine the overall differences.
1260
1261 Bitmaps of very large devices are also relatively large, but they
1262 usually compress very well using run-length encoding. This can save
1263 time and bandwidth for the bitmap transfers.
1264
1265 The use-rle parameter determines if run-length encoding should be
1266 used. It is on by default since DRBD 8.4.0.
1267
1268 verify-alg hash-algorithm
1269 Online verification (drbdadm verify) computes and compares
1270 checksums of disk blocks (i.e., hash values) in order to detect if
1271 they differ. The verify-alg parameter determines which algorithm to
1272 use for these checksums. It must be set to one of the secure hash
1273 algorithms supported by the kernel before online verify can be
1274 used; see the shash algorithms listed in /proc/crypto.
1275
1276 We recommend scheduling online verifications regularly during
1277 low-load periods, for example once a month. Also see the notes on
1278 data integrity below.
1279
1280 Section on Parameters
1281 address [address-family] address:port
1282
1283 Defines the address family, address, and port of a connection
1284 endpoint.
1285
1286 The address families ipv4, ipv6, ssocks (Dolphin Interconnect
1287 Solutions' "super sockets"), sdp (Infiniband Sockets Direct
1288 Protocol), and sci are supported (sci is an alias for ssocks). If
1289 no address family is specified, ipv4 is assumed. For all address
1290 families except ipv6, the address is specified in IPv4 address
1291 notation (for example, 1.2.3.4). For ipv6, the address is enclosed
1292 in brackets and uses IPv6 address notation (for example,
1293 [fd01:2345:6789:abcd::1]). The port is always specified as a
1294 decimal number from 1 to 65535.
1295
1296 On each host, the port numbers must be unique for each address;
1297 ports cannot be shared.
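
For illustration, the same kind of endpoint written for IPv6, reusing
the example address given above:

           on "alice" {
              node-id 0;
              address ipv6 [fd01:2345:6789:abcd::1]:7000;
           }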
1298
1299 node-id value
1300
1301 Defines the unique node identifier for a node in the cluster. Node
1302 identifiers are used to identify individual nodes in the network
1303 protocol, and to assign bitmap slots to nodes in the metadata.
1304
1305 Node identifiers can only be reassigned in a cluster when the
1306 cluster is down. It is essential that the node identifiers in the
1307 configuration and in the device metadata are changed consistently
1308 on all hosts. To change the metadata, dump the current state with
1309 drbdmeta dump-md, adjust the bitmap slot assignment, and update the
1310 metadata with drbdmeta restore-md.
1311
1312 The node-id parameter exists since DRBD 9. Its value ranges from 0
1313 to 16; there is no default.
1314
1315 Section options Parameters (Resource Options)
1316 auto-promote bool-value
1317 A resource must be promoted to primary role before any of its
1318 devices can be mounted or opened for writing.
1319
1320 Before DRBD 9, this could only be done explicitly ("drbdadm
1321 primary"). Since DRBD 9, the auto-promote parameter allows a
1322 resource to be automatically promoted to the primary role when one of
1323 its devices is mounted or opened for writing. As soon as all devices
1324 are unmounted or closed with no more remaining users, the role of
1325 the resource changes back to secondary.
1326
1327 Automatic promotion only succeeds if the cluster state allows it
1328 (that is, if an explicit drbdadm primary command would succeed).
1329 Otherwise, mounting or opening the device fails as it already did
1330 before DRBD 9: the mount(2) system call fails with errno set to
1331 EROFS (Read-only file system); the open(2) system call fails with
1332 errno set to EMEDIUMTYPE (wrong medium type).
1333
1334 Irrespective of the auto-promote parameter, if a device is promoted
1335 explicitly (drbdadm primary), it also needs to be demoted
1336 explicitly (drbdadm secondary).
1337
1338 The auto-promote parameter is available since DRBD 9.0.0, and
1339 defaults to yes.
1340
1341 auto-promote-timeout 1/10-of-seconds
1342
1343 When a user process promotes a drbd resource by opening one of its
1344 devices, DRBD waits up to auto-promote-timeout for the device to
1345 become promotable if it is not promotable right away.
1346
1347 auto-promote-timeout is specified in units of 0.1 seconds. Its
1348 default value is 20 (2 seconds), its minimum value is 0, and its
1349 maximum value is 600 (one minute).
1350
1351 cpu-mask cpu-mask
1352
1353 Set the cpu affinity mask for DRBD kernel threads. The cpu mask is
1354 specified as a hexadecimal number. The default value is 0, which
1355 lets the scheduler decide which kernel threads run on which CPUs.
1356 CPU numbers in cpu-mask which do not exist in the system are
1357 ignored.
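
       A hedged example (the mask value is illustrative): the hexadecimal
       mask 3 restricts the resource's DRBD kernel threads to CPUs 0
       and 1:

           options {
               cpu-mask 3;
           }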
1358
1359 max-io-depth value
1360
1361 This limits the number of outstanding requests on a DRBD device.
1362 Any process that tries to issue more I/O requests will sleep in "D
1363 state" (uninterruptible by signals) until some previously issued
1364 requests finish.
1365
1366 max-io-depth has a default value of 8000, its minimum value is 4,
1367 and its maximum value is 2^32.
1368
1369 on-no-data-accessible policy
1370 Determine how to deal with I/O requests when the requested data is
1371 not available locally or remotely (for example, when all disks have
1372 failed). When quorum is enabled, on-no-data-accessible should be
1373 set to the same value as on-no-quorum. The defined policies are:
1374
1375 io-error
1376 System calls fail with errno set to EIO.
1377
1378 suspend-io
1379 The resource suspends I/O. I/O can be resumed by (re)attaching
1380 the lower-level device, by connecting to a peer which has
1381 access to the data, or by forcing DRBD to resume I/O with
1382 drbdadm resume-io res. When no data is available, forcing I/O
1383 to resume will result in the same behavior as the io-error
1384 policy.
1385
1386 This setting is available since DRBD 8.3.9; the default policy is
1387 io-error.
1388
1389 on-no-quorum {io-error | suspend-io}
1390
1391 By default, DRBD freezes I/O on a device that has lost quorum.
1392 Setting on-no-quorum to io-error causes all I/O operations to be
1393 completed with an error while quorum is lost.
1394
1395 Usually, on-no-data-accessible should be set to the same value as
1396 on-no-quorum, because on-no-quorum takes precedence.
1397
1398 The on-no-quorum option is available starting with the DRBD kernel
1399 driver version 9.0.8.
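
       A hedged sketch of the recommendation above, with both policies
       set consistently in the resource's options section (quorum itself
       is described below):

           options {
               quorum                 majority;
               on-no-quorum           io-error;
               on-no-data-accessible  io-error;
           }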
1400
1401 on-suspended-primary-outdated {disconnect | force-secondary}
1402
1403 This setting is only relevant when on-no-quorum is set to
1404 suspend-io. It is relevant in the following scenario: a primary
1405 node loses quorum and therefore has all of its I/O requests frozen.
1406 This primary node then connects to another, quorate partition. It
1407 detects that a node in that quorate partition was promoted to
1408 primary and started a newer data generation there. As a result,
1409 the first primary learns that it has to consider itself outdated.
1410
1411 When set to force-secondary, the node demotes itself to secondary
1412 immediately and fails all pending (and new) I/O requests with I/O
1413 errors. It refuses to let any process open the DRBD devices until
1414 all openers have closed the device. This state is visible in
1415 status and events2 under the name force-io-failures.
1416
1417 The disconnect setting simply causes the node to reject connection
1418 attempts and stay isolated.
1419
1420 The on-suspended-primary-outdated option is available starting with
1421 the DRBD kernel driver version 9.1.7. It has a default value of
1422 disconnect.
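
       A hedged sketch combining this option with the required
       on-no-quorum setting:

           options {
               on-no-quorum                   suspend-io;
               on-suspended-primary-outdated  force-secondary;
           }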
1423
1424 peer-ack-delay expiry-time
1425
1426 If no new write request is issued for expiry-time after the last
1427 finished write request, a peer-ack packet is sent. If a new
1428 write request is issued before the timer expires, the timer gets
1429 reset to expiry-time. (Note: peer-ack packets may be sent due to
1430 other reasons as well, e.g. membership changes or the
1431 peer-ack-window option.)
1432
1433 This parameter may influence resync behavior on remote nodes. Peer
1434 nodes need to wait until they receive a peer-ack before releasing a
1435 lock on an AL-extent. Resync operations between peers may need to
1436 wait for these locks.
1437
1438 The default value for peer-ack-delay is 100 milliseconds, the
1439 default unit is milliseconds. This option is available since 9.0.0.
1440
1441 peer-ack-window value
1442
1443 On each node and for each device, DRBD maintains a bitmap of the
1444 differences between the local and remote data for each peer device.
1445 For example, in a three-node setup (nodes A, B, C) each with a
1446 single device, every node maintains one bitmap for each of its
1447 peers.
1448
1449 When nodes receive write requests, they know how to update the
1450 bitmaps for the writing node, but not how to update the bitmaps
1451 between themselves. In this example, when a write request
1452 propagates from node A to B and C, nodes B and C know that they
1453 have the same data as node A, but not whether or not they both have
1454 the same data.
1455
1456 As a remedy, the writing node occasionally sends peer-ack packets
1457 to its peers which tell them which state they are in relative to
1458 each other.
1459
1460 The peer-ack-window parameter specifies how much data a primary
1461 node may send before sending a peer-ack packet. A low value causes
1462 increased network traffic; a high value causes less network traffic
1463 but higher memory consumption on secondary nodes and higher resync
1464 times between the secondary nodes after primary node failures.
1465 (Note: peer-ack packets may be sent due to other reasons as well,
1466 e.g. membership changes or expiry of the peer-ack-delay timer.)
1467
1468 The default value for peer-ack-window is 2 MiB, the default unit is
1469 sectors. This option is available since 9.0.0.
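
       A hedged sketch that simply spells out the defaults in their
       default units (peer-ack-delay in milliseconds; peer-ack-window in
       sectors, 4096 sectors being 2 MiB):

           options {
               peer-ack-delay   100;
               peer-ack-window  4096;
           }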
1470
1471 quorum value
1472
1473 When activated, a cluster partition requires quorum in order to
1474 modify the replicated data set. That means a node in the cluster
1475 partition can only be promoted to primary if the cluster partition
1476 has quorum. Every node with a disk that is directly connected to
1477 the node to be promoted counts towards quorum. If a primary node
1478 needs to execute a write request, but the cluster partition has
1479 lost quorum, it will freeze I/O or reject the write request with an
1480 error (depending on the on-no-quorum setting). Upon losing quorum,
1481 a primary always invokes the quorum-lost handler. The handler is
1482 intended for notification purposes; its return code is ignored.
1483
1484 The option's value might be set to off, majority, all or a numeric
1485 value. If you set it to a numeric value, make sure that the value
1486 is greater than half of your number of nodes. Quorum is a
1487 mechanism to avoid data divergence; it can be used instead of
1488 fencing when there are more than two replicas. It defaults to off.
1489
1490 If all missing nodes are marked as outdated, a partition always
1491 has quorum, no matter how small it is. That is, if you disconnect
1492 all secondary nodes gracefully, a single primary continues to
1493 operate. The moment a single secondary is lost, however, it has to
1494 be assumed that it forms a partition with all the missing outdated
1495 nodes. If the local partition might then be smaller than the
1496 other, quorum is lost at that moment.
1497
1498 If you want to allow permanently diskless nodes to gain quorum, it
1499 is recommended not to use majority or all but to specify an
1500 absolute number, since DRBD's heuristic for determining the total
1501 number of diskful nodes in the cluster is unreliable.
1502
1503 The quorum implementation is available starting with the DRBD
1504 kernel driver version 9.0.7.
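
       A hedged example: in a cluster with three diskful nodes plus
       permanently diskless clients, an absolute value avoids relying on
       DRBD's node-counting heuristic:

           options {
               quorum 2;
           }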
1505
1506 quorum-minimum-redundancy value
1507
1508 This option sets the minimum required number of nodes with an
1509 UpToDate disk to allow the partition to gain quorum. This is a
1510 different requirement than the plain quorum option expresses.
1511
1512 The option's value might be set to off, majority, all or a numeric
1513 value. If you set it to a numeric value, make sure that the value
1514 is greater than half of your number of nodes.
1515
1516 If you want to allow permanently diskless nodes to gain quorum, it
1517 is recommended not to use majority or all but to specify an
1518 absolute number, since DRBD's heuristic for determining the total
1519 number of diskful nodes in the cluster is unreliable.
1520
1521 This option is available starting with the DRBD kernel driver
1522 version 9.0.10.
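
       A hedged sketch requiring at least two UpToDate replicas before a
       partition may gain quorum (the values are illustrative):

           options {
               quorum                     majority;
               quorum-minimum-redundancy  2;
           }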
1523
1524 twopc-retry-timeout 1/10-of-seconds
1525
1526 Due to conflicting two-phase commits, DRBD sometimes needs to
1527 retry them. But if two nodes retried their intended two-phase
1528 commits after the same delay, they would end up in an endless
1529 retry loop. To avoid that, DRBD selects a random wait time within
1530 an upper bound that grows exponentially with the retry number. The
1531 twopc-retry-timeout is the base multiplier for that backoff.
1532
1533 twopc-retry-timeout has a default value of 1 (0.1 seconds), its
1534 minimum value is 1 (0.1 seconds), and its maximum value is 50 (5
1535 seconds).
1536
1537 twopc-timeout 1/10-of-seconds
1538
1539 In some situations, a DRBD cluster requires a cluster-wide
1540 coordinated state transition. A perfect example of this is the
1541 'promote-to-primary' action. Even if two nodes that are not
1542 directly connected try this action concurrently, it may only
1543 succeed for one of the two.
1544
1545 For these cluster-wide coordinated state transitions, DRBD
1546 implements a two-phase commit protocol. If a connection breaks in
1547 phase one (prepare packet sent), the coordinator of the two-phase
1548 commit might never get the expected reply packet.
1549
1550 A cluster in this state cannot start any new cluster-wide
1551 coordinated state transition, as the already prepared one blocks
1552 all such attempts. After twopc-timeout all nodes abort the prepared
1553 transaction and unlock the cluster again.
1554
1555 twopc-timeout has a default value of 300 (30 seconds), its minimum
1556 value is 50 (5 seconds), and its maximum value is 600 (one minute).
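
       A hedged sketch that simply spells out the documented defaults
       (both values are in units of 0.1 seconds):

           options {
               twopc-timeout        300;
               twopc-retry-timeout  1;
           }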
1557
1558 Section startup Parameters
1559 The parameters in this section define the behavior of DRBD at system
1560 startup time, in the DRBD init script. They have no effect once the
1561 system is up and running.
1562
1563 degr-wfc-timeout timeout
1564
1565 Define how long to wait until all peers are connected in case the
1566 cluster consisted of a single node only when the system went down.
1567 This parameter is usually set to a value smaller than wfc-timeout.
1568 The assumption here is that peers which were unreachable before a
1569 reboot are less likely to be reachable after the reboot, so waiting
1570 is less likely to help.
1571
1572 The timeout is specified in seconds. The default value is 0, which
1573 stands for an infinite timeout. Also see the wfc-timeout parameter.
1574
1575 outdated-wfc-timeout timeout
1576
1577 Define how long to wait until all peers are connected if all peers
1578 were outdated when the system went down. This parameter is usually
1579 set to a value smaller than wfc-timeout. The assumption here is
1580 that an outdated peer cannot have become primary in the meantime,
1581 so we don't need to wait for it as long as for a node which was
1582 alive before.
1583
1584 The timeout is specified in seconds. The default value is 0, which
1585 stands for an infinite timeout. Also see the wfc-timeout parameter.
1586
1587 stacked-timeouts
1588 On stacked devices, the wfc-timeout and degr-wfc-timeout parameters
1589 in the configuration are usually ignored, and both timeouts are set
1590 to twice the connect-int timeout. The stacked-timeouts parameter
1591 tells DRBD to use the wfc-timeout and degr-wfc-timeout parameters
1592 as defined in the configuration, even on stacked devices. Only use
1593 this parameter if the peer of the stacked resource is usually not
1594 available, or will not become primary. Incorrect use of this
1595 parameter can lead to unexpected split-brain scenarios.
1596
1597 wait-after-sb
1598 This parameter causes DRBD to continue waiting in the init script
1599 even when a split-brain situation has been detected, and the nodes
1600 therefore refuse to connect to each other.
1601
1602 wfc-timeout timeout
1603
1604 Define how long the init script waits until all peers are
1605 connected. This can be useful in combination with a cluster manager
1606 which cannot manage DRBD resources: when the cluster manager
1607 starts, the DRBD resources will already be up and running. With a
1608 more capable cluster manager such as Pacemaker, it makes more sense
1609 to let the cluster manager control DRBD resources. The timeout is
1610 specified in seconds. The default value is 0, which stands for an
1611 infinite timeout. Also see the degr-wfc-timeout parameter.
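
       A hedged example with illustrative timeout values (both are given
       in seconds):

           resource r0 {
               startup {
                   wfc-timeout      120;
                   degr-wfc-timeout  60;
               }
           }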
1612
1613 Section volume Parameters
1614 device /dev/drbdminor-number
1615
1616 Define the device name and minor number of a replicated block
1617 device. This is the device that applications are supposed to
1618 access; in most cases, the device is not used directly, but as a
1619 file system. This parameter is required and the standard device
1620 naming convention is assumed.
1621
1622 In addition to this device, udev will create
1623 /dev/drbd/by-res/resource/volume and
1624 /dev/drbd/by-disk/lower-level-device symlinks to the device.
1625
1626 disk {[disk] | none}
1627
1628 Define the lower-level block device that DRBD will use for storing
1629 the actual data. While the replicated drbd device is configured,
1630 the lower-level device must not be used directly. Even read-only
1631 access with tools like dumpe2fs(8) and similar is not allowed. The
1632 keyword none specifies that no lower-level block device is
1633 configured; this also overrides inheritance of the lower-level
1634 device.
1635
1636 meta-disk internal,
1637 meta-disk device,
1638 meta-disk device [index]
1639
1640 Define where the metadata of a replicated block device resides: it
1641 can be internal, meaning that the lower-level device contains both
1642 the data and the metadata, or on a separate device.
1643
1644 When the index form of this parameter is used, multiple replicated
1645 devices can share the same metadata device, each using a separate
1646 index. Each index occupies 128 MiB of data, which corresponds to a
1647 replicated device size of at most 4 TiB with two cluster nodes. We
1648 recommend not sharing metadata devices anymore, and instead using
1649 the LVM volume manager to create metadata devices as needed.
1650
1651 When the index form of this parameter is not used, the size of the
1652 lower-level device determines the size of the metadata. The size
1653 needed is 36 KiB + (size of lower-level device) / 32K * (number of
1654 nodes - 1). If the metadata device is bigger than that, the extra
1655 space is not used.
1656
1657 This parameter is required if a disk other than none is specified,
1658 and ignored if disk is set to none. A meta-disk parameter without a
1659 disk parameter is not allowed.
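
       A hedged example of a volume with a dedicated external metadata
       device (the device names are hypothetical; with a 1 TiB
       lower-level device and three nodes, the formula above yields about
       36 KiB + 2 x 32 MiB, i.e. roughly 64 MiB of required metadata
       space):

           volume 1 {
               device    "/dev/drbd2";
               disk      "/dev/sdb1";
               meta-disk "/dev/sdc1";
           }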
1660
1661 NOTES ON DATA INTEGRITY
1662 DRBD supports two different mechanisms for data integrity checking:
1663 first, the data-integrity-alg network parameter allows a checksum
1664 to be added to the data sent over the network. Second, the online
1665 verification mechanism (drbdadm verify and the verify-alg
1666 parameter) allows checking for differences in the on-disk data.
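
       A hedged sketch enabling both mechanisms in a resource's net
       section (crc32c and sha256 are assumptions; use any suitable
       algorithm listed in /proc/crypto):

           net {
               data-integrity-alg crc32c;
               verify-alg         sha256;
           }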
1667
1668 Both mechanisms can produce false positives if the data is modified
1669 during I/O (i.e., while it is being sent over the network or written to
1670 disk). This does not always indicate a problem: for example, some file
1671 systems and applications do modify data under I/O for certain
1672 operations. Swap space can also undergo changes while under I/O.
1673
1674 Network data integrity checking tries to identify data modification
1675 during I/O by verifying the checksums on the sender side after sending
1676 the data. If it detects a mismatch, it logs an error. The receiver also
1677 logs an error when it detects a mismatch. Thus, an error logged only on
1678 the receiver side indicates an error on the network, and an error
1679 logged on both sides indicates data modification under I/O.
1680
1681 The most recent example of systematic data corruption was identified as
1682 a bug in the TCP offloading engine and driver of a certain type of GBit
1683 NIC in 2007: the data corruption happened on the DMA transfer from core
1684 memory to the card. Because the TCP checksums were calculated on the
1685 card, the TCP/IP protocol checksums did not reveal this problem.
1686
1688 This document was revised for version 9.0.0 of the DRBD distribution.
1689
1691 Written by Philipp Reisner <philipp.reisner@linbit.com> and Lars
1692 Ellenberg <lars.ellenberg@linbit.com>.
1693
1695 Report bugs to <drbd-user@lists.linbit.com>.
1696
1698 Copyright 2001-2018 LINBIT Information Technologies, Philipp Reisner,
1699 Lars Ellenberg. This is free software; see the source for copying
1700 conditions. There is NO warranty; not even for MERCHANTABILITY or
1701 FITNESS FOR A PARTICULAR PURPOSE.
1702
1704 drbd(8), drbdsetup(8), drbdadm(8), DRBD User's Guide[1], DRBD Web
1705 Site[3]
1706
1708 1. DRBD User's Guide
1709 http://www.drbd.org/users-guide/
1710
1711 2. Online Usage Counter
1712 http://usage.drbd.org
1715
1716 3. DRBD Web Site
1717 http://www.drbd.org/
1718
1719
1720
1721DRBD 9.0.x 17 January 2018 DRBD.CONF(5)