1DRBD.CONF(5) Configuration Files DRBD.CONF(5)
2
3
4
6 drbd.conf - DRBD Configuration Files
7
9 DRBD implements block devices which replicate their data to all nodes
10 of a cluster. The actual data and associated metadata are usually
11 stored redundantly on "ordinary" block devices on each cluster node.
12
13 Replicated block devices are called /dev/drbdminor by default. They are
14 grouped into resources, with one or more devices per resource.
15 Replication among the devices in a resource takes place in
16 chronological order. With DRBD, we refer to the devices inside a
17 resource as volumes.
18
19 In DRBD 9, a resource can be replicated between two or more cluster
20 nodes. The connections between cluster nodes are point-to-point links,
21 and use TCP or a TCP-like protocol. All nodes must be directly
22 connected.
23
24 DRBD consists of low-level user-space components which interact with
25 the kernel and perform basic operations (drbdsetup, drbdmeta), a
26 high-level user-space component which understands and processes the
27 DRBD configuration and translates it into basic operations of the
28 low-level components (drbdadm), and a kernel component.
29
30 The default DRBD configuration consists of /etc/drbd.conf and of
31 additional files included from there, usually global_common.conf and
32 all *.res files inside /etc/drbd.d/. It has turned out to be useful to
33 define each resource in a separate *.res file.
34
35 The configuration files are designed so that each cluster node can
36 contain an identical copy of the entire cluster configuration. The host
37 name of each node determines which parts of the configuration apply
38 (uname -n). It is highly recommended to keep the cluster configuration
39 on all nodes in sync by manually copying it to all nodes, or by
40 automating the process with csync2 or a similar tool.
41
43 global {
44 usage-count yes;
45 udev-always-use-vnr;
46 }
47 resource r0 {
48 net {
49 cram-hmac-alg sha1;
50 shared-secret "FooFunFactory";
51 }
52 volume 0 {
53 device "/dev/drbd1";
54 disk "/dev/sda7";
55 meta-disk internal;
56 }
57 on "alice" {
58 node-id 0;
59 address 10.1.1.31:7000;
60 }
61 on "bob" {
62 node-id 1;
63 address 10.1.1.32:7000;
64 }
65 connection {
66 host "alice" port 7000;
67 host "bob" port 7000;
68 net {
69 protocol C;
70 }
71 }
72 }
73
74 This example defines a resource r0 which contains a single replicated
75 device with volume number 0. The resource is replicated among hosts
76 alice and bob, which have the IPv4 addresses 10.1.1.31 and 10.1.1.32
77 and the node identifiers 0 and 1, respectively. On both hosts, the
78 replicated device is called /dev/drbd1, and the actual data and
79 metadata are stored on the lower-level device /dev/sda7. The connection
80 between the hosts uses protocol C.
81
82 Enclose strings within double-quotation marks (") to differentiate them
83 from resource keywords. Please refer to the DRBD User's Guide[1] for
84 more examples.
85
87 DRBD configuration files consist of sections, which contain other
88 sections and parameters depending on the section types. Each section
89 consists of one or more keywords, sometimes a section name, an opening
90 brace (“{”), the section's contents, and a closing brace (“}”).
91 Parameters inside a section consist of a keyword, followed by one or
92 more keywords or values, and a semicolon (“;”).
93
94 Some parameter values have a default scale which applies when a plain
95 number is specified (for example Kilo, or 1024 times the numeric
96 value). Such default scales can be overridden by using a suffix (for
97 example, M for Mega). The common suffixes K = 2^10 = 1024, M = 1024 K,
98 and G = 1024 M are supported.
99
100 Comments start with a hash sign (“#”) and extend to the end of the
101 line. In addition, any section can be prefixed with the keyword skip,
102 which causes the section and any sub-sections to be ignored.
103
104 Additional files can be included with the include file-pattern
105 statement (see glob(7) for the expressions supported in file-pattern).
106 Include statements are only allowed outside of sections.
107
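For illustration, a minimal sketch of an include statement and a skipped section, following the /etc/drbd.d/ layout described above (the resource name scratch is purely hypothetical):

        include "drbd.d/global_common.conf";
        include "drbd.d/*.res";

        skip resource scratch {
            # this section and all of its sub-sections are ignored
        }
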
108 The following sections are defined (indentation indicates in which
109 context):
110
111 common
112 [disk]
113 [handlers]
114 [net]
115 [options]
116 [startup]
117 global
118 [require-drbd-module-version-{eq,ne,gt,ge,lt,le}]
119 resource
120 connection
121 multiple path | 2 host
122 [net]
123 [volume]
124 [peer-device-options]
125 [peer-device-options]
126 connection-mesh
127 [net]
128 [disk]
129 floating
130 handlers
131 [net]
132 on
133 volume
134 disk
135 [disk]
136 options
137 stacked-on-top-of
138 startup
139
140 Sections in brackets affect other parts of the configuration: inside
141 the common section, they apply to all resources. A disk section inside
142 a resource or on section applies to all volumes of that resource, and a
143 net section inside a resource section applies to all connections of
that resource. This makes it possible to avoid repeating identical options for each resource, connection, or volume. Options can be overridden in a more specific resource, connection, on, or volume section.
147
The peer-device-options are resync-rate, c-plan-ahead, c-delay-target, c-fill-target, c-max-rate and c-min-rate. For backward compatibility, they can also be specified in any disk options section. They are inherited into all relevant connections. If they are given at the connection level, they are inherited by all volumes on that connection. A peer-device-options section is opened with the disk keyword.
155
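As an illustrative sketch of this inheritance (resource, host names, and values are assumptions), peer-device options in a resource-level disk section apply to all connections, while a disk section inside a connection section applies only to the volumes replicated over that connection:

        resource r0 {
            disk {
                c-max-rate 100M;       # inherited by every connection of r0
            }
            connection {
                host "alice";
                host "bob";
                disk {
                    resync-rate 30M;   # peer-device option for this connection only
                }
            }
        }
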
156 Sections
157 common
158
This section can contain a disk, a handlers, a net, an options, and a startup section (one of each). All resources inherit the parameters in these sections as their default values.
162
163 connection
164
165 Define a connection between two hosts. This section must contain
166 two host parameters or multiple path sections.
167
168 path
169
170 Define a path between two hosts. This section must contain two host
171 parameters.
172
173 connection-mesh
174
175 Define a connection mesh between multiple hosts. This section must
176 contain a hosts parameter, which has the host names as arguments.
177 This section is a shortcut to define many connections which share
178 the same network options.
179
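A sketch of a three-node mesh (host names are hypothetical); this is equivalent to defining the three pairwise connections individually with the same network options:

        connection-mesh {
            hosts "alice" "bob" "charlie";
            net {
                protocol C;
            }
        }
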
180 disk
181
182 Define parameters for a volume. All parameters in this section are
183 optional.
184
185 floating [address-family] addr:port
186
187 Like the on section, except that instead of the host name a network
188 address is used to determine if it matches a floating section.
189
190 The node-id parameter in this section is required. If the address
191 parameter is not provided, no connections to peers will be created
192 by default. The device, disk, and meta-disk parameters must be
193 defined in, or inherited by, this section.
194
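A sketch of a floating setup, reusing the addresses and backing devices from the example configuration above; the matching section is selected by address rather than by host name:

        resource r0 {
            floating 10.1.1.31:7000 {
                node-id 0;
                device "/dev/drbd1";
                disk "/dev/sda7";
                meta-disk internal;
            }
            floating 10.1.1.32:7000 {
                node-id 1;
                device "/dev/drbd1";
                disk "/dev/sda7";
                meta-disk internal;
            }
        }
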
195 global
196
197 Define some global parameters. All parameters in this section are
198 optional. Only one global section is allowed in the configuration.
199
200 require-drbd-module-version-{eq,ne,gt,ge,lt,le}
201
This statement contains one of the valid forms and a three-component version number (e.g., require-drbd-module-version-eq 9.0.16;). If the currently loaded DRBD kernel module does not match the specification, parsing is aborted. The comparison operator names have the same semantics as in test(1).
207
208 handlers
209
210 Define handlers to be invoked when certain events occur. The kernel
211 passes the resource name in the first command-line argument and
212 sets the following environment variables depending on the event's
213 context:
214
215 • For events related to a particular device: the device's minor
216 number in DRBD_MINOR, the device's volume number in
217 DRBD_VOLUME.
218
219 • For events related to a particular device on a particular peer:
220 the connection endpoints in DRBD_MY_ADDRESS, DRBD_MY_AF,
221 DRBD_PEER_ADDRESS, and DRBD_PEER_AF; the device's local minor
222 number in DRBD_MINOR, and the device's volume number in
223 DRBD_VOLUME.
224
225 • For events related to a particular connection: the connection
226 endpoints in DRBD_MY_ADDRESS, DRBD_MY_AF, DRBD_PEER_ADDRESS,
227 and DRBD_PEER_AF; and, for each device defined for that
228 connection: the device's minor number in
229 DRBD_MINOR_volume-number.
230
231 • For events that identify a device, if a lower-level device is
232 attached, the lower-level device's device name is passed in
233 DRBD_BACKING_DEV (or DRBD_BACKING_DEV_volume-number).
234
235 All parameters in this section are optional. Only a single handler
236 can be defined for each event; if no handler is defined, nothing
237 will happen.
238
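A sketch of a handlers section; the notification scripts shipped with drbd-utils are used here as an assumed example and may be installed under a different path on your system:

        handlers {
            split-brain "/usr/lib/drbd/notify-split-brain.sh root";
            out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        }
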
239 net
240
241 Define parameters for a connection. All parameters in this section
242 are optional.
243
244 on host-name [...]
245
246 Define the properties of a resource on a particular host or set of
247 hosts. Specifying more than one host name can make sense in a setup
248 with IP address failover, for example. The host-name argument must
249 match the Linux host name (uname -n).
250
251 Usually contains or inherits at least one volume section. The
252 node-id and address parameters must be defined in this section. The
253 device, disk, and meta-disk parameters must be defined in, or
254 inherited by, this section.
255
256 A normal configuration file contains two or more on sections for
257 each resource. Also see the floating section.
258
259 options
260
261 Define parameters for a resource. All parameters in this section
262 are optional.
263
264 resource name
265
266 Define a resource. Usually contains at least two on sections and at
267 least one connection section.
268
269 stacked-on-top-of resource
270
271 Used instead of an on section for configuring a stacked resource
272 with three to four nodes.
273
274 Starting with DRBD 9, stacking is deprecated. It is advised to use
275 resources which are replicated among more than two nodes instead.
276
277 startup
278
279 The parameters in this section determine the behavior of a resource
280 at startup time.
281
282 volume volume-number
283
284 Define a volume within a resource. The volume numbers in the
285 various volume sections of a resource define which devices on which
286 hosts form a replicated device.
287
288 Section connection Parameters
289 host name [address [address-family] address] [port port-number]
290
291 Defines an endpoint for a connection. Each host statement refers to
292 an on section in a resource. If a port number is defined, this
293 endpoint will use the specified port instead of the port defined in
294 the on section. Each connection section must contain exactly two
295 host parameters. Instead of two host parameters the connection may
296 contain multiple path sections.
297
298 Section path Parameters
299 host name [address [address-family] address] [port port-number]
300
301 Defines an endpoint for a connection. Each host statement refers to
302 an on section in a resource. If a port number is defined, this
303 endpoint will use the specified port instead of the port defined in
304 the on section. Each path section must contain exactly two host
305 parameters.
306
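A sketch of a connection with two paths over separate networks (host names, addresses, and the port are assumptions); each path section names the same two hosts:

        connection {
            path {
                host "alice" address 10.1.1.31 port 7010;
                host "bob"   address 10.1.1.32 port 7010;
            }
            path {
                host "alice" address 10.2.1.31 port 7010;
                host "bob"   address 10.2.1.32 port 7010;
            }
        }
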
307 Section connection-mesh Parameters
308 hosts name...
309
310 Defines all nodes of a mesh. Each name refers to an on section in a
311 resource. The port that is defined in the on section will be used.
312
313 Section disk Parameters
314 al-extents extents
315
316 DRBD automatically maintains a "hot" or "active" disk area likely
317 to be written to again soon based on the recent write activity. The
318 "active" disk area can be written to immediately, while "inactive"
319 disk areas must be "activated" first, which requires a meta-data
320 write. We also refer to this active disk area as the "activity
321 log".
322
323 The activity log saves meta-data writes, but the whole log must be
324 resynced upon recovery of a failed node. The size of the activity
325 log is a major factor of how long a resync will take and how fast a
326 replicated disk will become consistent after a crash.
327
328 The activity log consists of a number of 4-Megabyte segments; the
329 al-extents parameter determines how many of those segments can be
330 active at the same time. The default value for al-extents is 1237,
331 with a minimum of 7 and a maximum of 65536.
332
Note that the effective maximum may be smaller, depending on how you created the device metadata; see also drbdmeta(8). The effective maximum is 919 * (available on-disk activity-log ring-buffer area / 4 KiB - 1); with the default 32 KiB ring buffer, this yields a maximum of 6433 (covering more than 25 GiB of data). We recommend keeping this well within the amount your backend storage and replication link can resync within about 5 minutes.
340
341 al-updates {yes | no}
342
343 With this parameter, the activity log can be turned off entirely
344 (see the al-extents parameter). This will speed up writes because
fewer meta-data writes will be necessary, but the entire device needs to be resynchronized upon recovery of a failed primary node.
347 The default value for al-updates is yes.
348
349 disk-barrier,
350 disk-flushes,
351 disk-drain
352 DRBD has three methods of handling the ordering of dependent write
353 requests:
354
355 disk-barrier
356 Use disk barriers to make sure that requests are written to
357 disk in the right order. Barriers ensure that all requests
358 submitted before a barrier make it to the disk before any
359 requests submitted after the barrier. This is implemented using
360 'tagged command queuing' on SCSI devices and 'native command
361 queuing' on SATA devices. Only some devices and device stacks
362 support this method. The device mapper (LVM) only supports
363 barriers in some configurations.
364
365 Note that on systems which do not support disk barriers,
366 enabling this option can lead to data loss or corruption. Until
DRBD 8.4.1, disk-barrier was turned on if the I/O stack below DRBD supported barriers. Kernels since linux-2.6.36 (or 2.6.32 RHEL6) no longer make it possible to detect whether barriers are supported. Since drbd-8.4.2, this option is off by default and needs to be enabled explicitly.
372
373 disk-flushes
374 Use disk flushes between dependent write requests, also
375 referred to as 'force unit access' by drive vendors. This
376 forces all data to disk. This option is enabled by default.
377
378 disk-drain
379 Wait for the request queue to "drain" (that is, wait for the
380 requests to finish) before submitting a dependent write
381 request. This method requires that requests are stable on disk
382 when they finish. Before DRBD 8.0.9, this was the only method
383 implemented. This option is enabled by default. Do not disable
384 in production environments.
385
Of these three methods, DRBD will use the first that is enabled and supported by the backing storage device. If all three of these
388 options are turned off, DRBD will submit write requests without
389 bothering about dependencies. Depending on the I/O stack, write
390 requests can be reordered, and they can be submitted in a different
391 order on different cluster nodes. This can result in data loss or
392 corruption. Therefore, turning off all three methods of controlling
393 write ordering is strongly discouraged.
394
395 A general guideline for configuring write ordering is to use disk
396 barriers or disk flushes when using ordinary disks (or an ordinary
397 disk array) with a volatile write cache. On storage without cache
398 or with a battery backed write cache, disk draining can be a
399 reasonable choice.
400
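As a sketch of that guideline (the choices shown are assumptions, not recommendations for any particular hardware), a backend with a battery-backed write cache might disable barriers and flushes and rely on draining:

        disk {
            disk-barrier no;
            disk-flushes no;   # only safe with a battery-backed/non-volatile write cache
            disk-drain   yes;
        }
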
401 disk-timeout
402 If the lower-level device on which a DRBD device stores its data
403 does not finish an I/O request within the defined disk-timeout,
404 DRBD treats this as a failure. The lower-level device is detached,
405 and the device's disk state advances to Diskless. If DRBD is
406 connected to one or more peers, the failed request is passed on to
407 one of them.
408
409 This option is dangerous and may lead to kernel panic!
410
411 "Aborting" requests, or force-detaching the disk, is intended for
412 completely blocked/hung local backing devices which do no longer
413 complete requests at all, not even do error completions. In this
414 situation, usually a hard-reset and failover is the only way out.
415
416 By "aborting", basically faking a local error-completion, we allow
for a more graceful switchover by cleanly migrating services. Still, the affected node has to be rebooted "soon".
419
420 By completing these requests, we allow the upper layers to re-use
421 the associated data pages.
422
423 If later the local backing device "recovers", and now DMAs some
424 data from disk into the original request pages, in the best case it
425 will just put random data into unused pages; but typically it will
426 corrupt meanwhile completely unrelated data, causing all sorts of
427 damage.
428
This means that a delayed successful completion, especially of READ requests, is a reason to panic(). We assume that a delayed *error* completion is OK, though we will still complain noisily about it.
432
433 The default value of disk-timeout is 0, which stands for an
434 infinite timeout. Timeouts are specified in units of 0.1 seconds.
435 This option is available since DRBD 8.3.12.
436
437 md-flushes
438 Enable disk flushes and disk barriers on the meta-data device. This
439 option is enabled by default. See the disk-flushes parameter.
440
441 on-io-error handler
442
443 Configure how DRBD reacts to I/O errors on a lower-level device.
444 The following policies are defined:
445
446 pass_on
447 Change the disk status to Inconsistent, mark the failed block
448 as inconsistent in the bitmap, and retry the I/O operation on a
449 remote cluster node.
450
451 call-local-io-error
452 Call the local-io-error handler (see the handlers section).
453
454 detach
455 Detach the lower-level device and continue in diskless mode.
456
457
458 read-balancing policy
459 Distribute read requests among cluster nodes as defined by policy.
460 The supported policies are prefer-local (the default),
461 prefer-remote, round-robin, least-pending, when-congested-remote,
462 32K-striping, 64K-striping, 128K-striping, 256K-striping,
463 512K-striping and 1M-striping.
464
465 This option is available since DRBD 8.4.1.
466
467 resync-after res-name/volume
468
469 Define that a device should only resynchronize after the specified
470 other device. By default, no order between devices is defined, and
471 all devices will resynchronize in parallel. Depending on the
472 configuration of the lower-level devices, and the available network
473 and disk bandwidth, this can slow down the overall resync process.
474 This option can be used to form a chain or tree of dependencies
475 among devices.
476
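A sketch of such a dependency (the resource and volume names are hypothetical): the device below starts resynchronizing only after volume 0 of resource r0 has finished.

        disk {
            resync-after r0/0;
        }
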
477 rs-discard-granularity byte
When rs-discard-granularity is set to a non-zero, positive value, DRBD tries to perform resync operations in requests of this size. If such a block contains only zero bytes on the sync source node, the sync target node will issue a discard/trim/unmap command for the area.
483
The value is constrained by the discard granularity of the backing block device. If rs-discard-granularity is not a multiple of the discard granularity of the backing block device, DRBD rounds it up. The feature only becomes active if the backing block device reads back zeroes after a discard command.
489
490 The usage of rs-discard-granularity may cause c-max-rate to be
491 exceeded. In particular, the resync rate may reach 10x the value of
492 rs-discard-granularity per second.
493
494 The default value of rs-discard-granularity is 0. This option is
495 available since 8.4.7.
496
497 discard-zeroes-if-aligned {yes | no}
498
499 There are several aspects to discard/trim/unmap support on linux
500 block devices. Even if discard is supported in general, it may fail
501 silently, or may partially ignore discard requests. Devices also
502 announce whether reading from unmapped blocks returns defined data
503 (usually zeroes), or undefined data (possibly old data, possibly
504 garbage).
505
506 If on different nodes, DRBD is backed by devices with differing
507 discard characteristics, discards may lead to data divergence (old
508 data or garbage left over on one backend, zeroes due to unmapped
509 areas on the other backend). Online verify would now potentially
510 report tons of spurious differences. While probably harmless for
511 most use cases (fstrim on a file system), DRBD cannot have that.
512
To play it safe, we have to disable discard support if our local backend (on a Primary) does not support "discard_zeroes_data=true". We also have to translate discards to explicit zero-out on the receiving side, unless the receiving side (Secondary) supports "discard_zeroes_data=true", thereby allocating areas that were supposed to be unmapped.
519
520 There are some devices (notably the LVM/DM thin provisioning) that
521 are capable of discard, but announce discard_zeroes_data=false. In
522 the case of DM-thin, discards aligned to the chunk size will be
523 unmapped, and reading from unmapped sectors will return zeroes.
524 However, unaligned partial head or tail areas of discard requests
525 will be silently ignored.
526
527 If we now add a helper to explicitly zero-out these unaligned
528 partial areas, while passing on the discard of the aligned full
529 chunks, we effectively achieve discard_zeroes_data=true on such
530 devices.
531
532 Setting discard-zeroes-if-aligned to yes will allow DRBD to use
533 discards, and to announce discard_zeroes_data=true, even on
534 backends that announce discard_zeroes_data=false.
535
536 Setting discard-zeroes-if-aligned to no will cause DRBD to always
537 fall-back to zero-out on the receiving side, and to not even
538 announce discard capabilities on the Primary, if the respective
539 backend announces discard_zeroes_data=false.
540
541 We used to ignore the discard_zeroes_data setting completely. To
542 not break established and expected behaviour, and suddenly cause
543 fstrim on thin-provisioned LVs to run out-of-space instead of
544 freeing up space, the default value is yes.
545
546 This option is available since 8.4.7.
547
548 disable-write-same {yes | no}
549
550 Some disks announce WRITE_SAME support to the kernel but fail with
551 an I/O error upon actually receiving such a request. This mostly
552 happens when using virtualized disks -- notably, this behavior has
553 been observed with VMware's virtual disks.
554
When disable-write-same is set to yes, WRITE_SAME detection is manually overridden and support is disabled.
557
558 The default value of disable-write-same is no. This option is
559 available since 8.4.7.
560
561 Section peer-device-options Parameters
562 Please note that you open the section with the disk keyword.
563
564 c-delay-target delay_target,
565 c-fill-target fill_target,
566 c-max-rate max_rate,
567 c-plan-ahead plan_time
568 Dynamically control the resync speed. The following modes are
569 available:
570
571 • Dynamic control with fill target (default). Enabled when
572 c-plan-ahead is non-zero and c-fill-target is non-zero. The
573 goal is to fill the buffers along the data path with a defined
574 amount of data. This mode is recommended when DRBD-proxy is
575 used. Configured with c-plan-ahead, c-fill-target and
576 c-max-rate.
577
578 • Dynamic control with delay target. Enabled when c-plan-ahead is
579 non-zero (default) and c-fill-target is zero. The goal is to
580 have a defined delay along the path. Configured with
581 c-plan-ahead, c-delay-target and c-max-rate.
582
583 • Fixed resync rate. Enabled when c-plan-ahead is zero. DRBD will
584 try to perform resync I/O at a fixed rate. Configured with
585 resync-rate.
586
587 The c-plan-ahead parameter defines how fast DRBD adapts to changes
588 in the resync speed. It should be set to five times the network
589 round-trip time or more. The default value of c-plan-ahead is 20,
590 in units of 0.1 seconds.
591
The c-fill-target parameter defines how much resync data DRBD should aim to have in flight at all times. Common values for "normal" data paths range from 4K to 100K. The default value of c-fill-target is 100, in units of sectors.
596
597 The c-delay-target parameter defines the delay in the resync path
598 that DRBD should aim for. This should be set to five times the
599 network round-trip time or more. The default value of
600 c-delay-target is 10, in units of 0.1 seconds.
601
602 The c-max-rate parameter limits the maximum bandwidth used by
603 dynamically controlled resyncs. Setting this to zero removes the
604 limitation (since DRBD 9.0.28). It should be set to either the
605 bandwidth available between the DRBD hosts and the machines hosting
606 DRBD-proxy, or to the available disk bandwidth. The default value
607 of c-max-rate is 102400, in units of KiB/s.
608
609 Dynamic resync speed control is available since DRBD 8.3.9.
610
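A sketch of a dynamically controlled resync (all values are assumptions); since these are peer-device options, the section is opened with the disk keyword:

        disk {
            c-plan-ahead  20;     # 2 seconds
            c-fill-target 100K;
            c-max-rate    100M;
            c-min-rate    4M;
        }
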
611 c-min-rate min_rate
612 A node which is primary and sync-source has to schedule application
613 I/O requests and resync I/O requests. The c-min-rate parameter
614 limits how much bandwidth is available for resync I/O; the
615 remaining bandwidth is used for application I/O.
616
617 A c-min-rate value of 0 means that there is no limit on the resync
618 I/O bandwidth. This can slow down application I/O significantly.
619 Use a value of 1 (1 KiB/s) for the lowest possible resync rate.
620
621 The default value of c-min-rate is 250, in units of KiB/s.
622
623 resync-rate rate
624
625 Define how much bandwidth DRBD may use for resynchronizing. DRBD
626 allows "normal" application I/O even during a resync. If the resync
627 takes up too much bandwidth, application I/O can become very slow.
This parameter allows you to avoid that. Please note that this option only works when the dynamic resync controller is disabled.
630
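A sketch of a fixed-rate configuration (the rate is an arbitrary example): setting c-plan-ahead to zero disables the dynamic controller so that resync-rate takes effect.

        disk {
            c-plan-ahead 0;
            resync-rate  10M;
        }
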
631 Section global Parameters
632 dialog-refresh time
633
634 The DRBD init script can be used to configure and start DRBD
635 devices, which can involve waiting for other cluster nodes. While
waiting, the init script shows the remaining waiting time. The dialog-refresh parameter defines the number of seconds between updates of
638 that countdown. The default value is 1; a value of 0 turns off the
639 countdown.
640
641 disable-ip-verification
642 Normally, DRBD verifies that the IP addresses in the configuration
643 match the host names. Use the disable-ip-verification parameter to
644 disable these checks.
645
646 usage-count {yes | no | ask}
As explained on DRBD's Online Usage Counter[2] web page, DRBD
648 includes a mechanism for anonymously counting how many
649 installations are using which versions of DRBD. The results are
650 available on the web page for anyone to see.
651
652 This parameter defines if a cluster node participates in the usage
653 counter; the supported values are yes, no, and ask (ask the user,
654 the default).
655
656 We would like to ask users to participate in the online usage
counter, as this provides us with valuable feedback for steering the
658 development of DRBD.
659
660 udev-always-use-vnr
661 When udev asks drbdadm for a list of device related symlinks,
662 drbdadm would suggest symlinks with differing naming conventions,
663 depending on whether the resource has explicit volume VNR { }
664 definitions, or only one single volume with the implicit volume
665 number 0:
666
667 # implicit single volume without "volume 0 {}" block
668 DEVICE=drbd<minor>
669 SYMLINK_BY_RES=drbd/by-res/<resource-name>
670 SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>
671
672 # explicit volume definition: volume VNR { }
673 DEVICE=drbd<minor>
674 SYMLINK_BY_RES=drbd/by-res/<resource-name>/VNR
675 SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>
676
If you define this parameter in the global section, drbdadm will always add the .../VNR part, regardless of whether the volume definition was implicit or explicit.

For legacy backward compatibility, this is off by default, but we recommend enabling it.
683
684 Section handlers Parameters
685 after-resync-target cmd
686
Called on a resync target when its node state changes from Inconsistent to Consistent, that is, when a resync finishes. This handler can be used for removing the snapshot created in the before-resync-target handler.
691
692 before-resync-target cmd
693
694 Called on a resync target before a resync begins. This handler can
695 be used for creating a snapshot of the lower-level device for the
696 duration of the resync: if the resync source becomes unavailable
697 during a resync, reverting to the snapshot can restore a consistent
698 state.
699
700 before-resync-source cmd
701
702 Called on a resync source before a resync begins.
703
704 out-of-sync cmd
705
706 Called on all nodes after a verify finishes and out-of-sync blocks
707 were found. This handler is mainly used for monitoring purposes. An
708 example would be to call a script that sends an alert SMS.
709
710 quorum-lost cmd
711
712 Called on a Primary that lost quorum. This handler is usually used
713 to reboot the node if it is not possible to restart the application
714 that uses the storage on top of DRBD.
715
716 fence-peer cmd
717
718 Called when a node should fence a resource on a particular peer.
719 The handler should not use the same communication path that DRBD
720 uses for talking to the peer.
721
722 unfence-peer cmd
723
724 Called when a node should remove fencing constraints from other
725 nodes.
726
727 initial-split-brain cmd
728
729 Called when DRBD connects to a peer and detects that the peer is in
730 a split-brain state with the local node. This handler is also
731 called for split-brain scenarios which will be resolved
732 automatically.
733
734 local-io-error cmd
735
736 Called when an I/O error occurs on a lower-level device.
737
738 pri-lost cmd
739
740 The local node is currently primary, but DRBD believes that it
741 should become a sync target. The node should give up its primary
742 role.
743
744 pri-lost-after-sb cmd
745
746 The local node is currently primary, but it has lost the
747 after-split-brain auto recovery procedure. The node should be
748 abandoned.
749
750 pri-on-incon-degr cmd
751
752 The local node is primary, and neither the local lower-level device
753 nor a lower-level device on a peer is up to date. (The primary has
754 no device to read from or to write to.)
755
756 split-brain cmd
757
758 DRBD has detected a split-brain situation which could not be
759 resolved automatically. Manual recovery is necessary. This handler
760 can be used to call for administrator attention.
761
762 disconnected cmd
763
764 A connection to a peer went down. The handler can learn about the
765 reason for the disconnect from the DRBD_CSTATE environment
766 variable.
767
768 Section net Parameters
769 after-sb-0pri policy
770 Define how to react if a split-brain scenario is detected and none
771 of the two nodes is in primary role. (We detect split-brain
772 scenarios when two nodes connect; split-brain decisions are always
773 between two nodes.) The defined policies are:
774
775 disconnect
776 No automatic resynchronization; simply disconnect.
777
778 discard-younger-primary,
779 discard-older-primary
780 Resynchronize from the node which became primary first
781 (discard-younger-primary) or last (discard-older-primary). If
782 both nodes became primary independently, the
783 discard-least-changes policy is used.
784
785 discard-zero-changes
786 If only one of the nodes wrote data since the split brain
787 situation was detected, resynchronize from this node to the
788 other. If both nodes wrote data, disconnect.
789
790 discard-least-changes
791 Resynchronize from the node with more modified blocks.
792
793 discard-node-nodename
794 Always resynchronize to the named node.
795
796 after-sb-1pri policy
797 Define how to react if a split-brain scenario is detected, with one
798 node in primary role and one node in secondary role. (We detect
799 split-brain scenarios when two nodes connect, so split-brain
800 decisions are always among two nodes.) The defined policies are:
801
802 disconnect
803 No automatic resynchronization, simply disconnect.
804
805 consensus
806 Discard the data on the secondary node if the after-sb-0pri
807 algorithm would also discard the data on the secondary node.
808 Otherwise, disconnect.
809
810 violently-as0p
811 Always take the decision of the after-sb-0pri algorithm, even
812 if it causes an erratic change of the primary's view of the
813 data. This is only useful if a single-node file system (i.e.,
814 not OCFS2 or GFS) with the allow-two-primaries flag is used.
815 This option can cause the primary node to crash, and should not
816 be used.
817
818 discard-secondary
819 Discard the data on the secondary node.
820
821 call-pri-lost-after-sb
822 Always take the decision of the after-sb-0pri algorithm. If the
823 decision is to discard the data on the primary node, call the
824 pri-lost-after-sb handler on the primary node.
825
826 after-sb-2pri policy
827 Define how to react if a split-brain scenario is detected and both
828 nodes are in primary role. (We detect split-brain scenarios when
829 two nodes connect, so split-brain decisions are always among two
830 nodes.) The defined policies are:
831
832 disconnect
833 No automatic resynchronization, simply disconnect.
834
835 violently-as0p
836 See the violently-as0p policy for after-sb-1pri.
837
838 call-pri-lost-after-sb
839 Call the pri-lost-after-sb helper program on one of the
840 machines unless that machine can demote to secondary. The
841 helper program is expected to reboot the machine, which brings
842 the node into a secondary role. Which machine runs the helper
843 program is determined by the after-sb-0pri strategy.
844
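A sketch of a frequently cited set of automatic split-brain recovery policies (whether these are acceptable depends entirely on how much data loss you can tolerate):

        net {
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
        }
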
845 allow-two-primaries
846
847 The most common way to configure DRBD devices is to allow only one
848 node to be primary (and thus writable) at a time.
849
850 In some scenarios it is preferable to allow two nodes to be primary
851 at once; a mechanism outside of DRBD then must make sure that
852 writes to the shared, replicated device happen in a coordinated
853 way. This can be done with a shared-storage cluster file system
854 like OCFS2 and GFS, or with virtual machine images and a virtual
855 machine manager that can migrate virtual machines between physical
856 machines.
857
858 The allow-two-primaries parameter tells DRBD to allow two nodes to
859 be primary at the same time. Never enable this option when using a
860 non-distributed file system; otherwise, data corruption and node
861 crashes will result!
862
863 always-asbp
Normally, the automatic after-split-brain policies are only used if the current states of the UUIDs do not indicate the presence of a third node.

With this option, you request that the automatic after-split-brain policies are used as long as the data sets of the nodes are somehow related. This might cause a full sync if the UUIDs indicate the presence of a third node (or if double faults led to strange UUID sets).
873
874 connect-int time
875
876 As soon as a connection between two nodes is configured with
877 drbdsetup connect, DRBD immediately tries to establish the
878 connection. If this fails, DRBD waits for connect-int seconds and
879 then repeats. The default value of connect-int is 10 seconds.
880
881 cram-hmac-alg hash-algorithm
882
883 Configure the hash-based message authentication code (HMAC) or
884 secure hash algorithm to use for peer authentication. The kernel
885 supports a number of different algorithms, some of which may be
886 loadable as kernel modules. See the shash algorithms listed in
887 /proc/crypto. By default, cram-hmac-alg is unset. Peer
888 authentication also requires a shared-secret to be configured.
889
890 csums-alg hash-algorithm
891
892 Normally, when two nodes resynchronize, the sync target requests a
893 piece of out-of-sync data from the sync source, and the sync source
894 sends the data. With many usage patterns, a significant number of
895 those blocks will actually be identical.
896
897 When a csums-alg algorithm is specified, when requesting a piece of
898 out-of-sync data, the sync target also sends along a hash of the
899 data it currently has. The sync source compares this hash with its
900 own version of the data. It sends the sync target the new data if
901 the hashes differ, and tells it that the data are the same
902 otherwise. This reduces the network bandwidth required, at the cost
903 of higher cpu utilization and possibly increased I/O on the sync
904 target.
905
906 The csums-alg can be set to one of the secure hash algorithms
907 supported by the kernel; see the shash algorithms listed in
908 /proc/crypto. By default, csums-alg is unset.
909
910 csums-after-crash-only
911
Enabling this option (and csums-alg, above) makes it possible to use the checksum-based resync only for the first resync after a primary crash, but not for later "network hiccups".

In most cases, blocks that are marked as need-to-be-resynced have in fact changed, so calculating checksums, and both reading and writing the blocks on the resync target, is all effective overhead.
919
920 The advantage of checksum based resync is mostly after primary
921 crash recovery, where the recovery marked larger areas (those
922 covered by the activity log) as need-to-be-resynced, just in case.
923 Introduced in 8.4.5.
924
925 data-integrity-alg alg
926 DRBD normally relies on the data integrity checks built into the
927 TCP/IP protocol, but if a data integrity algorithm is configured,
928 it will additionally use this algorithm to make sure that the data
929 received over the network match what the sender has sent. If a data
930 integrity error is detected, DRBD will close the network connection
931 and reconnect, which will trigger a resync.
932
933 The data-integrity-alg can be set to one of the secure hash
934 algorithms supported by the kernel; see the shash algorithms listed
935 in /proc/crypto. By default, this mechanism is turned off.
936
937 Because of the CPU overhead involved, we recommend not to use this
938 option in production environments. Also see the notes on data
939 integrity below.
940
941 fencing fencing_policy
942
943 Fencing is a preventive measure to avoid situations where both
944 nodes are primary and disconnected. This is also known as a
945 split-brain situation. DRBD supports the following fencing
946 policies:
947
948 dont-care
949 No fencing actions are taken. This is the default policy.
950
951 resource-only
952 If a node becomes a disconnected primary, it tries to fence the
953 peer. This is done by calling the fence-peer handler. The
954 handler is supposed to reach the peer over an alternative
955 communication path and call 'drbdadm outdate minor' there.
956
957 resource-and-stonith
958 If a node becomes a disconnected primary, it freezes all its IO
959 operations and calls its fence-peer handler. The fence-peer
960 handler is supposed to reach the peer over an alternative
961 communication path and call 'drbdadm outdate minor' there. In
962 case it cannot do that, it should stonith the peer. IO is
963 resumed as soon as the situation is resolved. In case the
964 fence-peer handler fails, I/O can be resumed manually with
965 'drbdadm resume-io'.
966
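A sketch combining a fencing policy with matching handlers; the crm-fence-peer scripts shipped with drbd-utils for Pacemaker clusters are used as an assumed example:

        resource r0 {
            net {
                fencing resource-and-stonith;
            }
            handlers {
                fence-peer   "/usr/lib/drbd/crm-fence-peer.9.sh";
                unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
            }
        }
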
967 ko-count number
968
969 If a secondary node fails to complete a write request in ko-count
970 times the timeout parameter, it is excluded from the cluster. The
971 primary node then sets the connection to this secondary node to
972 Standalone. To disable this feature, you should explicitly set it
973 to 0; defaults may change between versions.
974
975 max-buffers number
976
977 Limits the memory usage per DRBD minor device on the receiving
978 side, or for internal buffers during resync or online-verify. Unit
979 is PAGE_SIZE, which is 4 KiB on most systems. The minimum possible
980 setting is hard coded to 32 (=128 KiB). These buffers are used to
981 hold data blocks while they are written to/read from disk. To avoid
982 possible distributed deadlocks on congestion, this setting is used
983 as a throttle threshold rather than a hard limit. Once more than
984 max-buffers pages are in use, further allocation from this pool is
985 throttled. You want to increase max-buffers if you cannot saturate
986 the IO backend on the receiving side.
987
988 max-epoch-size number
989
990 Define the maximum number of write requests DRBD may issue before
991 issuing a write barrier. The default value is 2048, with a minimum
992 of 1 and a maximum of 20000. Setting this parameter to a value
993 below 10 is likely to decrease performance.
994
995 on-congestion policy,
996 congestion-fill threshold,
997 congestion-extents threshold
998 By default, DRBD blocks when the TCP send queue is full. This
999 prevents applications from generating further write requests until
1000 more buffer space becomes available again.
1001
1002 When DRBD is used together with DRBD-proxy, it can be better to use
1003 the pull-ahead on-congestion policy, which can switch DRBD into
1004 ahead/behind mode before the send queue is full. DRBD then records
1005 the differences between itself and the peer in its bitmap, but it
1006 no longer replicates them to the peer. When enough buffer space
1007 becomes available again, the node resynchronizes with the peer and
1008 switches back to normal replication.
1009
1010 This has the advantage of not blocking application I/O even when
1011 the queues fill up, and the disadvantage that peer nodes can fall
1012 behind much further. Also, while resynchronizing, peer nodes will
1013 become inconsistent.
1014
1015 The available congestion policies are block (the default) and
1016 pull-ahead. The congestion-fill parameter defines how much data is
1017 allowed to be "in flight" in this connection. The default value is
1018 0, which disables this mechanism of congestion control, with a
1019 maximum of 10 GiBytes. The congestion-extents parameter defines how
1020 many bitmap extents may be active before switching into
1021 ahead/behind mode, with the same default and limits as the
1022 al-extents parameter. The congestion-extents parameter is effective
1023 only when set to a value smaller than al-extents.
1024
1025 Ahead/behind mode is available since DRBD 8.3.10.
1026
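A sketch of ahead/behind mode as it might be configured for a DRBD-proxy link (the thresholds are assumptions):

        net {
            on-congestion      pull-ahead;
            congestion-fill    400M;
            congestion-extents 1000;
        }
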
1027 ping-int interval
1028
1029 When the TCP/IP connection to a peer is idle for more than ping-int
1030 seconds, DRBD will send a keep-alive packet to make sure that a
1031 failed peer or network connection is detected reasonably soon. The
1032 default value is 10 seconds, with a minimum of 1 and a maximum of
1033 120 seconds. The unit is seconds.
1034
1035 ping-timeout timeout
1036
1037 Define the timeout for replies to keep-alive packets. If the peer
1038 does not reply within ping-timeout, DRBD will close and try to
1039 reestablish the connection. The default value is 0.5 seconds, with
1040 a minimum of 0.1 seconds and a maximum of 30 seconds. The unit is
1041 tenths of a second.
1042
1043 socket-check-timeout timeout
In setups involving a DRBD-proxy and connections that experience a lot of buffer-bloat, it might be necessary to set ping-timeout to an unusually high value. By default, DRBD uses the same value to wait until a newly established TCP connection is stable. Since the DRBD-proxy is usually located in the same data center, such a long wait time may hinder DRBD's connect process.

In such setups, socket-check-timeout should be set to at least the round-trip time between DRBD and DRBD-proxy; in most cases that means 1.
1054
1055 The default unit is tenths of a second, the default value is 0
1056 (which causes DRBD to use the value of ping-timeout instead).
1057 Introduced in 8.4.5.
1058
1059 protocol name
1060 Use the specified protocol on this connection. The supported
1061 protocols are:
1062
1063 A
1064 Writes to the DRBD device complete as soon as they have reached
1065 the local disk and the TCP/IP send buffer.
1066
1067 B
1068 Writes to the DRBD device complete as soon as they have reached
1069 the local disk, and all peers have acknowledged the receipt of
1070 the write requests.
1071
1072 C
1073 Writes to the DRBD device complete as soon as they have reached
1074 the local and all remote disks.
1075
1076
1077 rcvbuf-size size
1078
1079 Configure the size of the TCP/IP receive buffer. A value of 0 (the
1080 default) causes the buffer size to adjust dynamically. This
1081 parameter usually does not need to be set, but it can be set to a
1082 value up to 10 MiB. The default unit is bytes.
1083
1084 rr-conflict policy
1085 This option helps to solve the cases when the outcome of the resync
1086 decision is incompatible with the current role assignment in the
1087 cluster. The defined policies are:
1088
1089 disconnect
1090 No automatic resynchronization, simply disconnect.
1091
1092 retry-connect
Disconnect now, and retry to connect immediately afterwards.
1094
1095 violently
1096 Resync to the primary node is allowed, violating the assumption
1097 that data on a block device are stable for one of the nodes.
1098 Do not use this option, it is dangerous.
1099
1100 call-pri-lost
1101 Call the pri-lost handler on one of the machines. The handler
1102 is expected to reboot the machine, which puts it into secondary
1103 role.
1104
1105 auto-discard
1106 Auto-discard reverses the resync direction, so that DRBD
1107 resyncs the current primary to the current secondary.
1108 Auto-discard only applies when protocol A is in use and the
1109 resync decision is based on the principle that a crashed
1110 primary should be the source of a resync. When a primary node
1111 crashes, it might have written some last updates to its disk,
which were not received by a protocol A secondary. By promoting the secondary in the meantime, the user accepted that those last updates have been lost. By using auto-discard, you consent to having the last updates (from before the crash of the primary) rolled back automatically.
1117
1118 shared-secret secret
1119
1120 Configure the shared secret used for peer authentication. The
1121 secret is a string of up to 64 characters. Peer authentication also
1122 requires the cram-hmac-alg parameter to be set.
1123
1124 sndbuf-size size
1125
1126 Configure the size of the TCP/IP send buffer. Since DRBD 8.0.13 /
1127 8.2.7, a value of 0 (the default) causes the buffer size to adjust
1128 dynamically. Values below 32 KiB are harmful to the throughput on
1129 this connection. Large buffer sizes can be useful especially when
1130 protocol A is used over high-latency networks; the maximum value
1131 supported is 10 MiB.
1132
1133 tcp-cork
1134 By default, DRBD uses the TCP_CORK socket option to prevent the
1135 kernel from sending partial messages; this results in fewer and
1136 bigger packets on the network. Some network stacks can perform
1137 worse with this optimization. On these, the tcp-cork parameter can
1138 be used to turn this optimization off.
1139
1140 timeout time
1141
1142 Define the timeout for replies over the network: if a peer node
1143 does not send an expected reply within the specified timeout, it is
1144 considered dead and the TCP/IP connection is closed. The timeout
1145 value must be lower than connect-int and lower than ping-int. The
1146 default is 6 seconds; the value is specified in tenths of a second.
1147
1148 transport type
1149
With DRBD 9, the network transport used by DRBD is loaded as a separate module. With this option you can specify which transport and module to load. At present only two options exist, tcp and rdma. Please note that currently the RDMA transport module is only available with a license purchased from LINBIT. The default is tcp.
1155
1156 use-rle
1157
1158 Each replicated device on a cluster node has a separate bitmap for
1159 each of its peer devices. The bitmaps are used for tracking the
1160 differences between the local and peer device: depending on the
1161 cluster state, a disk range can be marked as different from the
1162 peer in the device's bitmap, in the peer device's bitmap, or in
1163 both bitmaps. When two cluster nodes connect, they exchange each
1164 other's bitmaps, and they each compute the union of the local and
1165 peer bitmap to determine the overall differences.
1166
1167 Bitmaps of very large devices are also relatively large, but they
1168 usually compress very well using run-length encoding. This can save
1169 time and bandwidth for the bitmap transfers.
1170
1171 The use-rle parameter determines if run-length encoding should be
1172 used. It is on by default since DRBD 8.4.0.
1173
1174 verify-alg hash-algorithm
1175 Online verification (drbdadm verify) computes and compares
1176 checksums of disk blocks (i.e., hash values) in order to detect if
1177 they differ. The verify-alg parameter determines which algorithm to
1178 use for these checksums. It must be set to one of the secure hash
1179 algorithms supported by the kernel before online verify can be
1180 used; see the shash algorithms listed in /proc/crypto.
1181
1182 We recommend to schedule online verifications regularly during
1183 low-load periods, for example once a month. Also see the notes on
1184 data integrity below.
1185
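A sketch of enabling online verification (the algorithm must be available in your kernel; sha256 is an assumption):

        net {
            verify-alg sha256;
        }

A verify run can then be started manually or from a monthly cron job with, for example, drbdadm verify r0.
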
1186 allow-remote-read bool-value
1187 Allows or disallows DRBD to read from a peer node.
1188
1189 When the disk of a primary node is detached, DRBD will try to
1190 continue reading and writing from another node in the cluster. For
1191 this purpose, it searches for nodes with up-to-date data, and uses
1192 any found node to resume operations. In some cases it may not be
1193 desirable to read back data from a peer node, because the node
1194 should only be used as a replication target. In this case, the
1195 allow-remote-read parameter can be set to no, which would prohibit
1196 this node from reading data from the peer node.
1197
1198 The allow-remote-read parameter is available since DRBD 9.0.19, and
1199 defaults to yes.
1200
1201 Section on Parameters
1202 address [address-family] address:port
1203
1204 Defines the address family, address, and port of a connection
1205 endpoint.
1206
1207 The address families ipv4, ipv6, ssocks (Dolphin Interconnect
1208 Solutions' "super sockets"), sdp (Infiniband Sockets Direct
1209 Protocol), and sci are supported (sci is an alias for ssocks). If
1210 no address family is specified, ipv4 is assumed. For all address
families except ipv6, the address is specified in IPv4 address
1212 notation (for example, 1.2.3.4). For ipv6, the address is enclosed
1213 in brackets and uses IPv6 address notation (for example,
1214 [fd01:2345:6789:abcd::1]). The port is always specified as a
1215 decimal number from 1 to 65535.
1216
1217 On each host, the port numbers must be unique for each address;
1218 ports cannot be shared.
1219
1220 node-id value
1221
1222 Defines the unique node identifier for a node in the cluster. Node
1223 identifiers are used to identify individual nodes in the network
1224 protocol, and to assign bitmap slots to nodes in the metadata.
1225
Node identifiers can only be reassigned in a cluster when the
1227 cluster is down. It is essential that the node identifiers in the
1228 configuration and in the device metadata are changed consistently
1229 on all hosts. To change the metadata, dump the current state with
1230 drbdmeta dump-md, adjust the bitmap slot assignment, and update the
1231 metadata with drbdmeta restore-md.
1232
1233 The node-id parameter exists since DRBD 9. Its value ranges from 0
1234 to 16; there is no default.
1235
1236 Section options Parameters (Resource Options)
1237 auto-promote bool-value
1238 A resource must be promoted to primary role before any of its
1239 devices can be mounted or opened for writing.
1240
1241 Before DRBD 9, this could only be done explicitly ("drbdadm
primary"). Since DRBD 9, the auto-promote parameter allows a resource to be promoted to primary role automatically when one of its devices is mounted or opened for writing. As soon as all devices
1245 are unmounted or closed with no more remaining users, the role of
1246 the resource changes back to secondary.
1247
1248 Automatic promotion only succeeds if the cluster state allows it
1249 (that is, if an explicit drbdadm primary command would succeed).
1250 Otherwise, mounting or opening the device fails as it already did
1251 before DRBD 9: the mount(2) system call fails with errno set to
1252 EROFS (Read-only file system); the open(2) system call fails with
1253 errno set to EMEDIUMTYPE (wrong medium type).
1254
1255 Irrespective of the auto-promote parameter, if a device is promoted
1256 explicitly (drbdadm primary), it also needs to be demoted
1257 explicitly (drbdadm secondary).
1258
1259 The auto-promote parameter is available since DRBD 9.0.0, and
1260 defaults to yes.
1261
1262 cpu-mask cpu-mask
1263
1264 Set the cpu affinity mask for DRBD kernel threads. The cpu mask is
1265 specified as a hexadecimal number. The default value is 0, which
1266 lets the scheduler decide which kernel threads run on which CPUs.
1267 CPU numbers in cpu-mask which do not exist in the system are
1268 ignored.
1269
1270 on-no-data-accessible policy
1271 Determine how to deal with I/O requests when the requested data is
1272 not available locally or remotely (for example, when all disks have
1273 failed). When quorum is enabled, on-no-data-accessible should be
1274 set to the same value as on-no-quorum. The defined policies are:
1275
1276 io-error
1277 System calls fail with errno set to EIO.
1278
1279 suspend-io
1280 The resource suspends I/O. I/O can be resumed by (re)attaching
1281 the lower-level device, by connecting to a peer which has
1282 access to the data, or by forcing DRBD to resume I/O with
1283 drbdadm resume-io res. When no data is available, forcing I/O
1284 to resume will result in the same behavior as the io-error
1285 policy.
1286
1287 This setting is available since DRBD 8.3.9; the default policy is
1288 io-error.
1289
1290 peer-ack-window value
1291
1292 On each node and for each device, DRBD maintains a bitmap of the
1293 differences between the local and remote data for each peer device.
1294 For example, in a three-node setup (nodes A, B, C) each with a
1295 single device, every node maintains one bitmap for each of its
1296 peers.
1297
1298 When nodes receive write requests, they know how to update the
1299 bitmaps for the writing node, but not how to update the bitmaps
1300 between themselves. In this example, when a write request
1301 propagates from node A to B and C, nodes B and C know that they
1302 have the same data as node A, but not whether or not they both have
1303 the same data.
1304
1305 As a remedy, the writing node occasionally sends peer-ack packets
1306 to its peers which tell them which state they are in relative to
1307 each other.
1308
1309 The peer-ack-window parameter specifies how much data a primary
1310 node may send before sending a peer-ack packet. A low value causes
1311 increased network traffic; a high value causes less network traffic
1312 but higher memory consumption on secondary nodes and higher resync
1313 times between the secondary nodes after primary node failures.
1314 (Note: peer-ack packets may be sent due to other reasons as well,
1315 e.g. membership changes or expiry of the peer-ack-delay timer.)
1316
1317 The default value for peer-ack-window is 2 MiB, the default unit is
1318 sectors. This option is available since 9.0.0.
1319
1320 peer-ack-delay expiry-time
1321
1322 If after the last finished write request no new write request gets
1323 issued for expiry-time, then a peer-ack packet is sent. If a new
1324 write request is issued before the timer expires, the timer gets
1325 reset to expiry-time. (Note: peer-ack packets may be sent due to
1326 other reasons as well, e.g. membership changes or the
1327 peer-ack-window option.)
1328
1329 This parameter may influence resync behavior on remote nodes. Peer
nodes need to wait until they receive a peer-ack before releasing a lock on an AL-extent. Resync operations between peers may need to wait for these locks.
1333
1334 The default value for peer-ack-delay is 100 milliseconds, the
1335 default unit is milliseconds. This option is available since 9.0.0.
1336
1337 quorum value
1338
1339 When activated, a cluster partition requires quorum in order to
1340 modify the replicated data set. That means a node in the cluster
1341 partition can only be promoted to primary if the cluster partition
1342 has quorum. Every node that has a disk and is directly connected to
1343 the node to be promoted counts toward quorum. If a primary node
1344 needs to execute a write request but the cluster partition has lost
1345 quorum, it will freeze I/O or reject the write request with an
1346 error (depending on the on-no-quorum setting). Upon losing quorum,
1347 a primary always invokes the quorum-lost handler. The handler is
1348 intended for notification purposes; its return code is ignored.
1349
1350 The option's value can be set to off, majority, all or a numeric
1351 value. If you set it to a numeric value, make sure that the value
1352 is greater than half of your number of nodes. Quorum is a mechanism
1353 to avoid data divergence; it can be used instead of fencing when
1354 there are more than two replicas. It defaults to off.
1355
1356 If all missing nodes are marked as outdated, a partition always has
1357 quorum, no matter how small it is. That is, if you gracefully
1358 disconnect all secondary nodes, a single primary continues to
1359 operate. The moment a single secondary is lost, however, it has to
1360 be assumed that it forms a partition with all the missing outdated
1361 nodes. If the remaining partition might be smaller than the other
1362 one, quorum is lost at that moment.
1363
1364 In case you want to allow permanently diskless nodes to gain
1365 quorum, it is recommended not to use majority or all, but to
1366 specify an absolute number, since DRBD's heuristic to determine the
1367 total number of diskful nodes in the cluster is unreliable.
1368
1369 The quorum implementation is available starting with the DRBD
1370 kernel driver version 9.0.7.
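
     As a sketch, quorum with the majority policy plus a notification
     handler might be configured as follows; the handler path is a
     hypothetical example, and the handlers section is only needed if a
     quorum-lost notification is wanted:

         options {
             quorum majority;
         }
         handlers {
             # called for notification only; its exit code is ignored
             quorum-lost "/usr/local/sbin/drbd-quorum-lost.sh";
         }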
1371
1372 quorum-minimum-redundancy value
1373
1374 This option sets the minimum number of nodes with an UpToDate disk
1375 that is required for a partition to gain quorum. This is a
1376 different requirement than the plain quorum option expresses.
1377
1378 The option's value might be set to off, majority, all or a numeric
1379 value. If you set it to a numeric value, make sure that the value
1380 is greater than half of your number of nodes.
1381
1382 In case you want to allow permanently diskless nodes to gain
1383 quorum, it is recommended not to use majority or all, but to
1384 specify an absolute number, since DRBD's heuristic to determine the
1385 total number of diskful nodes in the cluster is unreliable.
1386
1387 This option is available starting with the DRBD kernel driver
1388 version 9.0.10.
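
     For example, in a five-node cluster where two nodes are permanently
     diskless, absolute numbers could be used as recommended above (the
     values are illustrative):

         options {
             quorum 3;                      # more than half of 5 nodes
             quorum-minimum-redundancy 2;   # at least 2 UpToDate disks
         }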
1389
1390 on-no-quorum {io-error | suspend-io}
1391
1392 By default, DRBD freezes I/O on a device that has lost quorum.
1393 Setting on-no-quorum to io-error causes all I/O operations to be
1394 completed with an error if quorum is lost.
1395
1396 Usually, on-no-data-accessible should be set to the same value as
1397 on-no-quorum, because on-no-quorum takes precedence.
1398
1399 The on-no-quorum option is available starting with the DRBD kernel
1400 driver version 9.0.8.
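
     A minimal sketch that completes I/O with errors instead of freezing
     it when quorum is lost, for example for applications that handle
     I/O errors themselves:

         options {
             quorum majority;
             on-no-quorum io-error;
         }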
1401
1402 on-suspended-primary-outdated {disconnect | force-secondary}
1403
1404 This setting is only relevant when on-no-quorum is set to
1405 suspend-io. It applies in the following scenario: a primary node
1406 loses quorum and therefore has all its I/O requests frozen. This
1407 primary node then connects to another, quorate partition. It
1408 detects that a node in this quorate partition was promoted to
1409 primary and started a newer data generation there. As a result,
1410 the first primary learns that it has to consider itself outdated.
1411
1412 When this option is set to force-secondary, the node demotes to
1413 secondary immediately and fails all pending (and new) I/O requests
1414 with I/O errors. It refuses to allow any process to open the DRBD
1415 devices until all openers have closed the device. This state is
1416 visible in status and events2 output under the name force-io-failures.
1417
1418 The disconnect setting simply causes that node to reject
1419 connection attempts and to stay isolated.
1420
1421 The on-suspended-primary-outdated option is available starting with
1422 the DRBD kernel driver version 9.1.7. It has a default value of
1423 disconnect.
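
     A sketch that combines suspended I/O with forced demotion, so that
     an isolated primary which later learns that it is outdated demotes
     itself and fails its pending I/O:

         options {
             on-no-quorum suspend-io;
             on-suspended-primary-outdated force-secondary;
         }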
1424
1425 Section startup Parameters
1426 The parameters in this section define the behavior of DRBD at system
1427 startup time, in the DRBD init script. They have no effect once the
1428 system is up and running.
1429
1430 degr-wfc-timeout timeout
1431
1432 Define how long to wait until all peers are connected in case the
1433 cluster consisted of only a single node when the system went down.
1434 This parameter is usually set to a value smaller than wfc-timeout.
1435 The assumption here is that peers which were unreachable before a
1436 reboot are less likely to be reachable after the reboot, so waiting
1437 is less likely to help.
1438
1439 The timeout is specified in seconds. The default value is 0, which
1440 stands for an infinite timeout. Also see the wfc-timeout parameter.
1441
1442 outdated-wfc-timeout timeout
1443
1444 Define how long to wait until all peers are connected if all peers
1445 were outdated when the system went down. This parameter is usually
1446 set to a value smaller than wfc-timeout. The assumption here is
1447 that an outdated peer cannot have become primary in the meantime,
1448 so we don't need to wait for it as long as for a node which was
1449 alive before.
1450
1451 The timeout is specified in seconds. The default value is 0, which
1452 stands for an infinite timeout. Also see the wfc-timeout parameter.
1453
1454 stacked-timeouts
1455 On stacked devices, the wfc-timeout and degr-wfc-timeout parameters
1456 in the configuration are usually ignored, and both timeouts are set
1457 to twice the connect-int timeout. The stacked-timeouts parameter
1458 tells DRBD to use the wfc-timeout and degr-wfc-timeout parameters
1459 as defined in the configuration, even on stacked devices. Only use
1460 this parameter if the peer of the stacked resource is usually not
1461 available, or will not become primary. Incorrect use of this
1462 parameter can lead to unexpected split-brain scenarios.
1463
1464 wait-after-sb
1465 This parameter causes DRBD to continue waiting in the init script
1466 even when a split-brain situation has been detected, and the nodes
1467 therefore refuse to connect to each other.
1468
1469 wfc-timeout timeout
1470
1471 Define how long the init script waits until all peers are
1472 connected. This can be useful in combination with a cluster manager
1473 which cannot manage DRBD resources: when the cluster manager
1474 starts, the DRBD resources will already be up and running. With a
1475 more capable cluster manager such as Pacemaker, it makes more sense
1476 to let the cluster manager control DRBD resources. The timeout is
1477 specified in seconds. The default value is 0, which stands for an
1478 infinite timeout. Also see the degr-wfc-timeout parameter.
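
     For example, a startup section that waits up to two minutes for
     peers in the normal case, but only 30 seconds if the cluster was
     degraded or all peers were outdated before the reboot, might look
     like this (all timeouts are illustrative and given in seconds):

         startup {
             wfc-timeout 120;
             degr-wfc-timeout 30;
             outdated-wfc-timeout 30;
         }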
1479
1480 Section volume Parameters
1481 device /dev/drbdminor-number
1482
1483 Define the device name and minor number of a replicated block
1484 device. This is the device that applications are supposed to
1485 access; in most cases, the device is not used directly, but as a
1486 file system. This parameter is required and the standard device
1487 naming convention is assumed.
1488
1489 In addition to this device, udev will create
1490 /dev/drbd/by-res/resource/volume and
1491 /dev/drbd/by-disk/lower-level-device symlinks to the device.
1492
1493 disk {[disk] | none}
1494
1495 Define the lower-level block device that DRBD will use for storing
1496 the actual data. While the replicated drbd device is configured,
1497 the lower-level device must not be used directly. Even read-only
1498 access with tools like dumpe2fs(8) and similar is not allowed. The
1499 keyword none specifies that no lower-level block device is
1500 configured; this also overrides inheritance of the lower-level
1501 device.
1502
1503 meta-disk internal,
1504 meta-disk device,
1505 meta-disk device [index]
1506
1507 Define where the metadata of a replicated block device resides: it
1508 can be internal, meaning that the lower-level device contains both
1509 the data and the metadata, or on a separate device.
1510
1511 When the index form of this parameter is used, multiple replicated
1512 devices can share the same metadata device, each using a separate
1513 index. Each index occupies 128 MiB of data, which corresponds to a
1514 replicated device size of at most 4 TiB with two cluster nodes. We
1515 recommend not sharing metadata devices anymore, and instead using
1516 the LVM volume manager to create metadata devices as needed.
1517
1518 When the index form of this parameter is not used, the size of the
1519 lower-level device determines the size of the metadata. The size
1520 needed is 36 KiB + (size of lower-level device) / 32K * (number of
1521 nodes - 1). If the metadata device is bigger than that, the extra
1522 space is not used.
1523
1524 This parameter is required if a disk other than none is specified,
1525 and ignored if disk is set to none. A meta-disk parameter without a
1526 disk parameter is not allowed.
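
     As a worked example of the size formula above: for a 1 TiB
     lower-level device in a three-node cluster, external metadata
     needs about 36 KiB + 1 TiB / 32K * 2, which is roughly 64 MiB. A
     volume with such an external, non-indexed metadata device might be
     defined as in the following sketch; the device and LVM volume
     names are hypothetical:

         volume 0 {
             device "/dev/drbd2";
             disk "/dev/vg0/data";
             meta-disk "/dev/vg0/data-md";   # external metadata, no index
         }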
1527
1528NOTES ON DATA INTEGRITY
1529 DRBD supports two different mechanisms for data integrity checking:
1530 first, the data-integrity-alg network parameter allows adding a
1531 checksum to the data sent over the network. Second, the online
1532 verification mechanism (drbdadm verify and the verify-alg
1533 parameter) allows checking for differences in the on-disk data.
1534
1535 Both mechanisms can produce false positives if the data is modified
1536 during I/O (i.e., while it is being sent over the network or written to
1537 disk). This does not always indicate a problem: for example, some file
1538 systems and applications do modify data under I/O for certain
1539 operations. Swap space can also undergo changes while under I/O.
1540
1541 Network data integrity checking tries to identify data modification
1542 during I/O by verifying the checksums on the sender side after sending
1543 the data. If it detects a mismatch, it logs an error. The receiver also
1544 logs an error when it detects a mismatch. Thus, an error logged only on
1545 the receiver side indicates an error on the network, and an error
1546 logged on both sides indicates data modification under I/O.
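
     Both checks are typically configured in the net section. The sketch
     below enables an online-verification algorithm and a network
     data-integrity checksum; sha1 and crc32c are only examples, and any
     digest algorithm supported by the kernel's crypto API can be used:

         net {
             verify-alg sha1;             # used by "drbdadm verify"
             data-integrity-alg crc32c;   # checksums replicated data
         }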
1547
1548 The most recent example of systematic data corruption was identified as
1549 a bug in the TCP offloading engine and driver of a certain type of GBit
1550 NIC in 2007: the data corruption happened on the DMA transfer from core
1551 memory to the card. Because the TCP checksums were calculated on
1552 the card, the TCP/IP protocol checksums did not reveal this problem.
1553
1554VERSION
1555 This document was revised for version 9.0.0 of the DRBD distribution.
1556
1557AUTHOR
1558 Written by Philipp Reisner <philipp.reisner@linbit.com> and Lars
1559 Ellenberg <lars.ellenberg@linbit.com>.
1560
1561REPORTING BUGS
1562 Report bugs to <drbd-user@lists.linbit.com>.
1563
1564COPYRIGHT
1565 Copyright 2001-2018 LINBIT Information Technologies, Philipp Reisner,
1566 Lars Ellenberg. This is free software; see the source for copying
1567 conditions. There is NO warranty; not even for MERCHANTABILITY or
1568 FITNESS FOR A PARTICULAR PURPOSE.
1569
1570SEE ALSO
1571 drbd(8), drbdsetup(8), drbdadm(8), DRBD User's Guide[1], DRBD Web
1572 Site[3]
1573
1574NOTES
1575 1. DRBD User's Guide
1576 http://www.drbd.org/users-guide/
1577
1578 2. Online Usage Counter
1579 http://usage.drbd.org
1582
1583 3. DRBD Web Site
1584 http://www.drbd.org/
1585
1586
1587
1588DRBD 9.0.x 17 January 2018 DRBD.CONF(5)