DRBD.CONF(5)                  Configuration Files                 DRBD.CONF(5)


NAME
       drbd.conf - DRBD Configuration Files

INTRODUCTION TO DRBD
9 DRBD implements block devices which replicate their data to all nodes
10 of a cluster. The actual data and associated metadata are usually
11 stored redundantly on "ordinary" block devices on each cluster node.
12
13 Replicated block devices are called /dev/drbdminor by default. They are
14 grouped into resources, with one or more devices per resource.
15 Replication among the devices in a resource takes place in
16 chronological order. With DRBD, we refer to the devices inside a
17 resource as volumes.
18
19 In DRBD 9, a resource can be replicated between two or more cluster
20 nodes. The connections between cluster nodes are point-to-point links,
21 and use TCP or a TCP-like protocol. All nodes must be directly
22 connected.
23
24 DRBD consists of low-level user-space components which interact with
25 the kernel and perform basic operations (drbdsetup, drbdmeta), a
26 high-level user-space component which understands and processes the
27 DRBD configuration and translates it into basic operations of the
28 low-level components (drbdadm), and a kernel component.
29
30 The default DRBD configuration consists of /etc/drbd.conf and of
31 additional files included from there, usually global_common.conf and
32 all *.res files inside /etc/drbd.d/. It has turned out to be useful to
33 define each resource in a separate *.res file.
34
       The configuration files are designed so that each cluster node can
       contain an identical copy of the entire cluster configuration. The
       host name of each node, as reported by uname -n, determines which
       parts of the configuration apply. It is highly recommended to keep
       the cluster configuration on all nodes in sync, either by copying it
       to all nodes manually or by automating the process with csync2 or a
       similar tool.

EXAMPLE CONFIGURATION FILE
       global {
           usage-count yes;
           udev-always-use-vnr;
       }
       resource r0 {
           net {
               cram-hmac-alg sha1;
               shared-secret "FooFunFactory";
           }
           volume 0 {
               device    /dev/drbd1;
               disk      /dev/sda7;
               meta-disk internal;
           }
           on alice {
               node-id 0;
               address 10.1.1.31:7000;
           }
           on bob {
               node-id 1;
               address 10.1.1.32:7000;
           }
           connection {
               host alice port 7000;
               host bob   port 7000;
               net {
                   protocol C;
               }
           }
       }
73
74 This example defines a resource r0 which contains a single replicated
75 device with volume number 0. The resource is replicated among hosts
76 alice and bob, which have the IPv4 addresses 10.1.1.31 and 10.1.1.32
77 and the node identifiers 0 and 1, respectively. On both hosts, the
78 replicated device is called /dev/drbd1, and the actual data and
79 metadata are stored on the lower-level device /dev/sda7. The connection
80 between the hosts uses protocol C.
81
82 Please refer to the DRBD User's Guide[1] for more examples.

FILE FORMAT
85 DRBD configuration files consist of sections, which contain other
86 sections and parameters depending on the section types. Each section
87 consists of one or more keywords, sometimes a section name, an opening
88 brace (“{”), the section's contents, and a closing brace (“}”).
89 Parameters inside a section consist of a keyword, followed by one or
90 more keywords or values, and a semicolon (“;”).
91
92 Some parameter values have a default scale which applies when a plain
93 number is specified (for example Kilo, or 1024 times the numeric
94 value). Such default scales can be overridden by using a suffix (for
95 example, M for Mega). The common suffixes K = 2^10 = 1024, M = 1024 K,
96 and G = 1024 M are supported.
97
98 Comments start with a hash sign (“#”) and extend to the end of the
99 line. In addition, any section can be prefixed with the keyword skip,
100 which causes the section and any sub-sections to be ignored.
101
102 Additional files can be included with the include file-pattern
103 statement (see glob(7) for the expressions supported in file-pattern).
104 Include statements are only allowed outside of sections.
105
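       As an illustration only (the file names and the r-test resource are
       invented for this example), a minimal top-level /etc/drbd.conf might
       combine these mechanisms:

           # comments run to the end of the line
           include "drbd.d/global_common.conf";
           include "drbd.d/*.res";

           skip resource r-test {
               # everything up to the matching closing brace is ignored
           }
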
106 The following sections are defined (indentation indicates in which
107 context):
108
109 common
110 [disk]
111 [handlers]
112 [net]
113 [options]
114 [startup]
115 global
116 resource
117 connection
118 path
119 net
120 volume
121 peer-device-options
122 [peer-device-options]
123 connection-mesh
124 net
125 [disk]
126 floating
127 handlers
128 [net]
129 on
130 volume
131 disk
132 [disk]
133 options
134 stacked-on-top-of
135 startup
136
       Sections in brackets affect other parts of the configuration: inside
       the common section, they apply to all resources. A disk section
       inside a resource or on section applies to all volumes of that
       resource, and a net section inside a resource section applies to all
       connections of that resource. This makes it possible to avoid
       repeating identical options for each resource, connection, or
       volume. Options can be overridden in a more specific resource,
       connection, on, or volume section.
144
       peer-device-options are resync-rate, c-plan-ahead, c-delay-target,
       c-fill-target, c-max-rate and c-min-rate. For backward
       compatibility, they can also be specified in any disk options
       section. They are inherited into all relevant connections. If they
       are given at the connection level, they are inherited by all volumes
       on that connection. A peer-device-options section is introduced with
       the disk keyword.
152
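       To illustrate this inheritance (the resource and host names are
       placeholders), defaults can be set once in the common section and
       overridden where needed; the resync-rate in the connection-level
       disk section is such a peer-device option:

           common {
               net {
                   protocol C;
               }
               disk {
                   on-io-error detach;
               }
           }
           resource r0 {
               disk {
                   on-io-error pass_on;   # overrides the value from common
               }
               connection {
                   host alice;
                   host bob;
                   disk {
                       resync-rate 33M;   # peer-device option, applies to
                                          # all volumes of this connection
                   }
               }
               # on sections and volumes as in the example above
           }
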
153 Sections
154 common
155
156 This section can contain each a disk, handlers, net, options, and
157 startup section. All resources inherit the parameters in these
158 sections as their default values.
159
160 connection [name]
161
162 Define a connection between two hosts. This section must contain
163 two host parameters or multiple path sections. The optional name is
164 used to refer to the connection in the system log and in other
165 messages. If no name is specified, the peer's host name is used
166 instead.
167
168 path
169
170 Define a path between two hosts. This section must contain two host
171 parameters.
172
173 connection-mesh
174
175 Define a connection mesh between multiple hosts. This section must
176 contain a hosts parameter, which has the host names as arguments.
177 This section is a shortcut to define many connections which share
178 the same network options.
179
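       For example, a fully meshed three-node setup (host names are
       placeholders) can be written as the following sketch instead of
       three individual connection sections:

           connection-mesh {
               hosts alice bob charlie;
               net {
                   protocol C;
               }
           }
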
180 disk
181
182 Define parameters for a volume. All parameters in this section are
183 optional.
184
185 floating [address-family] addr:port
186
187 Like the on section, except that instead of the host name a network
188 address is used to determine if it matches a floating section.
189
190 The node-id parameter in this section is required. If the address
191 parameter is not provided, no connections to peers will be created
192 by default. The device, disk, and meta-disk parameters must be
193 defined in, or inherited by, this section.
194
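       A sketch of floating sections (addresses are placeholders); they
       take the place of the on sections of the earlier example, and the
       volume defined at the resource level is inherited by both:

           resource r0 {
               volume 0 {
                   device    /dev/drbd1;
                   disk      /dev/sda7;
                   meta-disk internal;
               }
               floating 10.1.1.31:7000 {
                   node-id 0;
               }
               floating 10.1.1.32:7000 {
                   node-id 1;
               }
           }
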
195 global
196
197 Define some global parameters. All parameters in this section are
198 optional. Only one global section is allowed in the configuration.
199
200 handlers
201
202 Define handlers to be invoked when certain events occur. The kernel
203 passes the resource name in the first command-line argument and
204 sets the following environment variables depending on the event's
205 context:
206
207 · For events related to a particular device: the device's minor
208 number in DRBD_MINOR, the device's volume number in
209 DRBD_VOLUME.
210
211 · For events related to a particular device on a particular peer:
212 the connection endpoints in DRBD_MY_ADDRESS, DRBD_MY_AF,
213 DRBD_PEER_ADDRESS, and DRBD_PEER_AF; the device's local minor
214 number in DRBD_MINOR, and the device's volume number in
215 DRBD_VOLUME.
216
217 · For events related to a particular connection: the connection
218 endpoints in DRBD_MY_ADDRESS, DRBD_MY_AF, DRBD_PEER_ADDRESS,
219 and DRBD_PEER_AF; and, for each device defined for that
220 connection: the device's minor number in
221 DRBD_MINOR_volume-number.
222
223 · For events that identify a device, if a lower-level device is
224 attached, the lower-level device's device name is passed in
225 DRBD_BACKING_DEV (or DRBD_BACKING_DEV_volume-number).
226
227 All parameters in this section are optional. Only a single handler
228 can be defined for each event; if no handler is defined, nothing
229 will happen.
230
231 net
232
233 Define parameters for a connection. All parameters in this section
234 are optional.
235
236 on host-name [...]
237
238 Define the properties of a resource on a particular host or set of
239 hosts. Specifying more than one host name can make sense in a setup
240 with IP address failover, for example. The host-name argument must
241 match the Linux host name (uname -n).
242
243 Usually contains or inherits at least one volume section. The
244 node-id and address parameters must be defined in this section. The
245 device, disk, and meta-disk parameters must be defined in, or
246 inherited by, this section.
247
248 A normal configuration file contains two or more on sections for
249 each resource. Also see the floating section.
250
251 options
252
253 Define parameters for a resource. All parameters in this section
254 are optional.
255
256 resource name
257
258 Define a resource. Usually contains at least two on sections and at
259 least one connection section.
260
261 stacked-on-top-of resource
262
263 Used instead of an on section for configuring a stacked resource
264 with three to four nodes.
265
266 Starting with DRBD 9, stacking is deprecated. It is advised to use
267 resources which are replicated among more than two nodes instead.
268
269 startup
270
271 The parameters in this section determine the behavior of a resource
272 at startup time.
273
274 volume volume-number
275
276 Define a volume within a resource. The volume numbers in the
277 various volume sections of a resource define which devices on which
278 hosts form a replicated device.
279
280 Section connection Parameters
281 host name [address [address-family] address] [port port-number]
282
283 Defines an endpoint for a connection. Each host statement refers to
284 an on section in a resource. If a port number is defined, this
285 endpoint will use the specified port instead of the port defined in
286 the on section. Each connection section must contain exactly two
287 host parameters. Instead of two host parameters the connection may
288 contain multiple path sections.
289
290 Section path Parameters
291 host name [address [address-family] address] [port port-number]
292
293 Defines an endpoint for a connection. Each host statement refers to
294 an on section in a resource. If a port number is defined, this
295 endpoint will use the specified port instead of the port defined in
296 the on section. Each path section must contain exactly two host
297 parameters.
298
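       As a sketch, a connection that can use two independent networks (all
       addresses and ports are placeholders) contains one path section per
       network; with the default TCP transport, only one path carries
       traffic at any given time:

           connection {
               path {
                   host alice address 10.1.1.31    port 7000;
                   host bob   address 10.1.1.32    port 7000;
               }
               path {
                   host alice address 192.168.2.31 port 7000;
                   host bob   address 192.168.2.32 port 7000;
               }
           }
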
299 Section connection-mesh Parameters
300 hosts name...
301
302 Defines all nodes of a mesh. Each name refers to an on section in a
303 resource. The port that is defined in the on section will be used.
304
305 Section disk Parameters
306 al-extents extents
307
308 DRBD automatically maintains a "hot" or "active" disk area likely
309 to be written to again soon based on the recent write activity. The
310 "active" disk area can be written to immediately, while "inactive"
311 disk areas must be "activated" first, which requires a meta-data
312 write. We also refer to this active disk area as the "activity
313 log".
314
315 The activity log saves meta-data writes, but the whole log must be
316 resynced upon recovery of a failed node. The size of the activity
317 log is a major factor of how long a resync will take and how fast a
318 replicated disk will become consistent after a crash.
319
320 The activity log consists of a number of 4-Megabyte segments; the
321 al-extents parameter determines how many of those segments can be
322 active at the same time. The default value for al-extents is 1237,
323 with a minimum of 7 and a maximum of 65536.
324
       Note that the effective maximum may be smaller, depending on how you
       created the device metadata; see also drbdmeta(8). The effective
       maximum is 919 * (available on-disk activity-log ring-buffer area /
       4 KiB - 1); with the default 32 KiB ring buffer this yields a
       maximum of 6433 (covering more than 25 GiB of data). We recommend
       keeping this well within the amount your backend storage and
       replication link are able to resync within about 5 minutes.
332
333 al-updates {yes | no}
334
       With this parameter, the activity log can be turned off entirely
       (see the al-extents parameter). This will speed up writes because
       fewer meta-data writes will be necessary, but the entire device
       needs to be resynchronized upon recovery of a failed primary node.
       The default value for al-updates is yes.
340
341 disk-barrier,
342 disk-flushes,
343 disk-drain
344 DRBD has three methods of handling the ordering of dependent write
345 requests:
346
347 disk-barrier
348 Use disk barriers to make sure that requests are written to
349 disk in the right order. Barriers ensure that all requests
350 submitted before a barrier make it to the disk before any
351 requests submitted after the barrier. This is implemented using
352 'tagged command queuing' on SCSI devices and 'native command
353 queuing' on SATA devices. Only some devices and device stacks
354 support this method. The device mapper (LVM) only supports
355 barriers in some configurations.
356
       Note that on systems which do not support disk barriers, enabling
       this option can lead to data loss or corruption. Until DRBD 8.4.1,
       disk-barrier was turned on if the I/O stack below DRBD supported
       barriers. Kernels since linux-2.6.36 (or RHEL6's 2.6.32) no longer
       make it possible to detect whether barriers are supported. Since
       drbd-8.4.2, this option is off by default and needs to be enabled
       explicitly.
364
365 disk-flushes
366 Use disk flushes between dependent write requests, also
367 referred to as 'force unit access' by drive vendors. This
368 forces all data to disk. This option is enabled by default.
369
370 disk-drain
371 Wait for the request queue to "drain" (that is, wait for the
372 requests to finish) before submitting a dependent write
373 request. This method requires that requests are stable on disk
374 when they finish. Before DRBD 8.0.9, this was the only method
375 implemented. This option is enabled by default. Do not disable
376 in production environments.
377
       Of these three methods, DRBD will use the first that is enabled and
       supported by the backing storage device. If all three of these
       options are turned off, DRBD will submit write requests without
       bothering about dependencies. Depending on the I/O stack, write
       requests can be reordered, and they can be submitted in a different
       order on different cluster nodes. This can result in data loss or
       corruption. Therefore, turning off all three methods of controlling
       write ordering is strongly discouraged.
386
387 A general guideline for configuring write ordering is to use disk
388 barriers or disk flushes when using ordinary disks (or an ordinary
389 disk array) with a volatile write cache. On storage without cache
390 or with a battery backed write cache, disk draining can be a
391 reasonable choice.
392
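       As an illustration only: on storage with a battery-backed
       (non-volatile) write cache, a disk section along the following lines
       is sometimes used; verify your hardware's guarantees before adopting
       it:

           disk {
               disk-barrier no;
               disk-flushes no;
               md-flushes   no;
               # disk-drain stays at its default (enabled), so dependent
               # writes are still submitted in order
           }
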
393 disk-timeout
394 If the lower-level device on which a DRBD device stores its data
395 does not finish an I/O request within the defined disk-timeout,
396 DRBD treats this as a failure. The lower-level device is detached,
397 and the device's disk state advances to Diskless. If DRBD is
398 connected to one or more peers, the failed request is passed on to
399 one of them.
400
401 This option is dangerous and may lead to kernel panic!
402
403 "Aborting" requests, or force-detaching the disk, is intended for
404 completely blocked/hung local backing devices which do no longer
405 complete requests at all, not even do error completions. In this
406 situation, usually a hard-reset and failover is the only way out.
407
408 By "aborting", basically faking a local error-completion, we allow
409 for a more graceful swichover by cleanly migrating services. Still
410 the affected node has to be rebooted "soon".
411
412 By completing these requests, we allow the upper layers to re-use
413 the associated data pages.
414
415 If later the local backing device "recovers", and now DMAs some
416 data from disk into the original request pages, in the best case it
417 will just put random data into unused pages; but typically it will
418 corrupt meanwhile completely unrelated data, causing all sorts of
419 damage.
420
421 Which means delayed successful completion, especially for READ
422 requests, is a reason to panic(). We assume that a delayed *error*
423 completion is OK, though we still will complain noisily about it.
424
425 The default value of disk-timeout is 0, which stands for an
426 infinite timeout. Timeouts are specified in units of 0.1 seconds.
427 This option is available since DRBD 8.3.12.
428
429 md-flushes
430 Enable disk flushes and disk barriers on the meta-data device. This
431 option is enabled by default. See the disk-flushes parameter.
432
433 on-io-error handler
434
435 Configure how DRBD reacts to I/O errors on a lower-level device.
436 The following policies are defined:
437
438 pass_on
439 Change the disk status to Inconsistent, mark the failed block
440 as inconsistent in the bitmap, and retry the I/O operation on a
441 remote cluster node.
442
443 call-local-io-error
444 Call the local-io-error handler (see the handlers section).
445
446 detach
447 Detach the lower-level device and continue in diskless mode.
448
449
450 read-balancing policy
451 Distribute read requests among cluster nodes as defined by policy.
452 The supported policies are prefer-local (the default),
453 prefer-remote, round-robin, least-pending, when-congested-remote,
454 32K-striping, 64K-striping, 128K-striping, 256K-striping,
455 512K-striping and 1M-striping.
456
457 This option is available since DRBD 8.4.1.
458
459 resync-after res-name/volume
460
461 Define that a device should only resynchronize after the specified
462 other device. By default, no order between devices is defined, and
463 all devices will resynchronize in parallel. Depending on the
464 configuration of the lower-level devices, and the available network
465 and disk bandwidth, this can slow down the overall resync process.
466 This option can be used to form a chain or tree of dependencies
467 among devices.
468
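       For example, the following sketch (device and resource names are
       placeholders) makes the volume of resource r1 resynchronize only
       after volume 0 of resource r0 has finished:

           resource r1 {
               volume 0 {
                   device    /dev/drbd2;
                   disk      /dev/sdb7;
                   meta-disk internal;
                   disk {
                       resync-after r0/0;
                   }
               }
               # on sections as usual
           }
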
       rs-discard-granularity byte
           When rs-discard-granularity is set to a positive, non-zero
           value, DRBD tries to perform resync operations in requests of
           this size. If such a block contains only zero bytes on the sync
           source node, the sync target node will issue a
           discard/trim/unmap command for the area.

           The value is constrained by the discard granularity of the
           backing block device. If rs-discard-granularity is not a
           multiple of the discard granularity of the backing block device,
           DRBD rounds it up. The feature only becomes active if the
           backing block device reads back zeroes after a discard command.

           The default value is 0. This option is available since 8.4.7.
483
484 discard-zeroes-if-aligned {yes | no}
485
486 There are several aspects to discard/trim/unmap support on linux
487 block devices. Even if discard is supported in general, it may fail
488 silently, or may partially ignore discard requests. Devices also
489 announce whether reading from unmapped blocks returns defined data
490 (usually zeroes), or undefined data (possibly old data, possibly
491 garbage).
492
       If DRBD is backed by devices with differing discard characteristics
       on different nodes, discards may lead to data divergence (old data
       or garbage left over on one backend, zeroes due to unmapped areas on
       the other backend). Online verify would then potentially report tons
       of spurious differences. While probably harmless for most use cases
       (fstrim on a file system), DRBD cannot have that.

       To play it safe, we have to disable discard support if our local
       backend (on a Primary) does not support "discard_zeroes_data=true".
       We also have to translate discards to explicit zero-out on the
       receiving side, unless the receiving side (Secondary) supports
       "discard_zeroes_data=true", thereby allocating areas that were
       supposed to be unmapped.
506
507 There are some devices (notably the LVM/DM thin provisioning) that
508 are capable of discard, but announce discard_zeroes_data=false. In
509 the case of DM-thin, discards aligned to the chunk size will be
510 unmapped, and reading from unmapped sectors will return zeroes.
511 However, unaligned partial head or tail areas of discard requests
512 will be silently ignored.
513
514 If we now add a helper to explicitly zero-out these unaligned
515 partial areas, while passing on the discard of the aligned full
516 chunks, we effectively achieve discard_zeroes_data=true on such
517 devices.
518
519 Setting discard-zeroes-if-aligned to yes will allow DRBD to use
520 discards, and to announce discard_zeroes_data=true, even on
521 backends that announce discard_zeroes_data=false.
522
523 Setting discard-zeroes-if-aligned to no will cause DRBD to always
524 fall-back to zero-out on the receiving side, and to not even
525 announce discard capabilities on the Primary, if the respective
526 backend announces discard_zeroes_data=false.
527
528 We used to ignore the discard_zeroes_data setting completely. To
529 not break established and expected behaviour, and suddenly cause
530 fstrim on thin-provisioned LVs to run out-of-space instead of
531 freeing up space, the default value is yes.
532
533 This option is available since 8.4.7.
534
535 Section peer-device-options Parameters
536 Please note that you open the section with the disk keyword.
537
538 c-delay-target delay_target,
539 c-fill-target fill_target,
540 c-max-rate max_rate,
541 c-plan-ahead plan_time
542 Dynamically control the resync speed. This mechanism is enabled by
543 setting the c-plan-ahead parameter to a positive value. The goal is
544 to either fill the buffers along the data path with a defined
545 amount of data if c-fill-target is defined, or to have a defined
546 delay along the path if c-delay-target is defined. The maximum
547 bandwidth is limited by the c-max-rate parameter.
548
549 The c-plan-ahead parameter defines how fast drbd adapts to changes
550 in the resync speed. It should be set to five times the network
551 round-trip time or more. Common values for c-fill-target for
552 "normal" data paths range from 4K to 100K. If drbd-proxy is used,
553 it is advised to use c-delay-target instead of c-fill-target. The
554 c-delay-target parameter is used if the c-fill-target parameter is
555 undefined or set to 0. The c-delay-target parameter should be set
556 to five times the network round-trip time or more. The c-max-rate
557 option should be set to either the bandwidth available between the
558 DRBD-hosts and the machines hosting DRBD-proxy, or to the available
559 disk bandwidth.
560
561 The default values of these parameters are: c-plan-ahead = 20 (in
562 units of 0.1 seconds), c-fill-target = 0 (in units of sectors),
563 c-delay-target = 1 (in units of 0.1 seconds), and c-max-rate =
564 102400 (in units of KiB/s).
565
566 Dynamic resync speed control is available since DRBD 8.3.9.
567
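       A sketch of such a peer-device-options section (the numbers are
       illustrative, not tuning advice); note that it is opened with the
       disk keyword:

           disk {
               c-plan-ahead  20;     # 2 seconds, in units of 0.1 seconds
               c-fill-target 2M;     # aim for about 2 MiB in flight
               c-max-rate    100M;   # cap the resync rate at 100 MiB/s
           }
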
568 c-min-rate min_rate
569 A node which is primary and sync-source has to schedule application
570 I/O requests and resync I/O requests. The c-min-rate parameter
571 limits how much bandwidth is available for resync I/O; the
572 remaining bandwidth is used for application I/O.
573
574 A c-min-rate value of 0 means that there is no limit on the resync
575 I/O bandwidth. This can slow down application I/O significantly.
576 Use a value of 1 (1 KiB/s) for the lowest possible resync rate.
577
578 The default value of c-min-rate is 4096, in units of KiB/s.
579
580 resync-rate rate
581
582 Define how much bandwidth DRBD may use for resynchronizing. DRBD
583 allows "normal" application I/O even during a resync. If the resync
584 takes up too much bandwidth, application I/O can become very slow.
       This parameter allows you to avoid that. Please note that this
       option only works when the dynamic resync controller is disabled.
587
588 Section global Parameters
589 dialog-refresh time
590
591 The DRBD init script can be used to configure and start DRBD
592 devices, which can involve waiting for other cluster nodes. While
593 waiting, the init script shows the remaining waiting time. The
594 dialog-refresh defines the number of seconds between updates of
595 that countdown. The default value is 1; a value of 0 turns off the
596 countdown.
597
598 disable-ip-verification
599 Normally, DRBD verifies that the IP addresses in the configuration
600 match the host names. Use the disable-ip-verification parameter to
601 disable these checks.
602
603 usage-count {yes | no | ask}
       As explained on DRBD's Online Usage Counter[2] web page, DRBD
605 includes a mechanism for anonymously counting how many
606 installations are using which versions of DRBD. The results are
607 available on the web page for anyone to see.
608
609 This parameter defines if a cluster node participates in the usage
610 counter; the supported values are yes, no, and ask (ask the user,
611 the default).
612
       We would like to ask users to participate in the online usage
       counter, as this provides us with valuable feedback for steering the
       development of DRBD.
616
617 udev-always-use-vnr
618 When udev asks drbdadm for a list of device related symlinks,
619 drbdadm would suggest symlinks with differing naming conventions,
620 depending on whether the resource has explicit volume VNR { }
621 definitions, or only one single volume with the implicit volume
622 number 0:
623
624 # implicit single volume without "volume 0 {}" block
625 DEVICE=drbd<minor>
626 SYMLINK_BY_RES=drbd/by-res/<resource-name>
627 SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>
628
629 # explicit volume definition: volume VNR { }
630 DEVICE=drbd<minor>
631 SYMLINK_BY_RES=drbd/by-res/<resource-name>/VNR
632 SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>
633
       If you define this parameter in the global section, drbdadm will
       always add the .../VNR part, regardless of whether the volume
       definition was implicit or explicit.

       For backward compatibility this is off by default, but we recommend
       enabling it.
640
641 Section handlers Parameters
642 after-resync-target cmd
643
       Called on a resync target when its state changes from Inconsistent
       to Consistent, that is, when a resync finishes. This handler can be
       used for removing the snapshot created in the before-resync-target
       handler.
648
649 before-resync-target cmd
650
651 Called on a resync target before a resync begins. This handler can
652 be used for creating a snapshot of the lower-level device for the
653 duration of the resync: if the resync source becomes unavailable
654 during a resync, reverting to the snapshot can restore a consistent
655 state.
656
657 fence-peer cmd
658
659 Called when a node should fence a resource on a particular peer.
660 The handler should not use the same communication path that DRBD
661 uses for talking to the peer.
662
663 unfence-peer cmd
664
665 Called when a node should remove fencing constraints from other
666 nodes.
667
668 initial-split-brain cmd
669
670 Called when DRBD connects to a peer and detects that the peer is in
671 a split-brain state with the local node. This handler is also
672 called for split-brain scenarios which will be resolved
673 automatically.
674
675 local-io-error cmd
676
677 Called when an I/O error occurs on a lower-level device.
678
679 pri-lost cmd
680
681 The local node is currently primary, but DRBD believes that it
682 should become a sync target. The node should give up its primary
683 role.
684
685 pri-lost-after-sb cmd
686
687 The local node is currently primary, but it has lost the
688 after-split-brain auto recovery procedure. The node should be
689 abandoned.
690
691 pri-on-incon-degr cmd
692
693 The local node is primary, and neither the local lower-level device
694 nor a lower-level device on a peer is up to date. (The primary has
695 no device to read from or to write to.)
696
697 split-brain cmd
698
699 DRBD has detected a split-brain situation which could not be
700 resolved automatically. Manual recovery is necessary. This handler
701 can be used to call for administrator attention.
702
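       A sketch of a handlers section; the helper scripts named here are
       shipped with drbd-utils on many installations, but treat the paths
       as examples and make sure the scripts actually exist on every node:

           handlers {
               fence-peer     "/usr/lib/drbd/crm-fence-peer.9.sh";
               unfence-peer   "/usr/lib/drbd/crm-unfence-peer.9.sh";
               split-brain    "/usr/lib/drbd/notify-split-brain.sh root";
               local-io-error "/usr/lib/drbd/notify-io-error.sh root";
           }
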
703 Section net Parameters
704 after-sb-0pri policy
705 Define how to react if a split-brain scenario is detected and none
706 of the two nodes is in primary role. (We detect split-brain
707 scenarios when two nodes connect; split-brain decisions are always
708 between two nodes.) The defined policies are:
709
710 disconnect
711 No automatic resynchronization; simply disconnect.
712
713 discard-younger-primary,
714 discard-older-primary
715 Resynchronize from the node which became primary first
716 (discard-younger-primary) or last (discard-older-primary). If
717 both nodes became primary independently, the
718 discard-least-changes policy is used.
719
720 discard-zero-changes
721 If only one of the nodes wrote data since the split brain
722 situation was detected, resynchronize from this node to the
723 other. If both nodes wrote data, disconnect.
724
725 discard-least-changes
726 Resynchronize from the node with more modified blocks.
727
728 discard-node-nodename
729 Always resynchronize to the named node.
730
731 after-sb-1pri policy
732 Define how to react if a split-brain scenario is detected, with one
733 node in primary role and one node in secondary role. (We detect
734 split-brain scenarios when two nodes connect, so split-brain
       decisions are always between two nodes.) The defined policies are:
736
737 disconnect
738 No automatic resynchronization, simply disconnect.
739
740 consensus
741 Discard the data on the secondary node if the after-sb-0pri
742 algorithm would also discard the data on the secondary node.
743 Otherwise, disconnect.
744
745 violently-as0p
746 Always take the decision of the after-sb-0pri algorithm, even
747 if it causes an erratic change of the primary's view of the
748 data. This is only useful if a single-node file system (i.e.,
749 not OCFS2 or GFS) with the allow-two-primaries flag is used.
750 This option can cause the primary node to crash, and should not
751 be used.
752
753 discard-secondary
754 Discard the data on the secondary node.
755
756 call-pri-lost-after-sb
757 Always take the decision of the after-sb-0pri algorithm. If the
758 decision is to discard the data on the primary node, call the
759 pri-lost-after-sb handler on the primary node.
760
761 after-sb-2pri policy
762 Define how to react if a split-brain scenario is detected and both
763 nodes are in primary role. (We detect split-brain scenarios when
       two nodes connect, so split-brain decisions are always between two
       nodes.) The defined policies are:
766
767 disconnect
768 No automatic resynchronization, simply disconnect.
769
770 violently-as0p
771 See the violently-as0p policy for after-sb-1pri.
772
773 call-pri-lost-after-sb
774 Call the pri-lost-after-sb helper program on one of the
775 machines unless that machine can demote to secondary. The
776 helper program is expected to reboot the machine, which brings
777 the node into a secondary role. Which machine runs the helper
778 program is determined by the after-sb-0pri strategy.
779
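       For example, a frequently used (but not universally appropriate)
       combination resolves the harmless cases automatically and otherwise
       simply disconnects:

           net {
               after-sb-0pri discard-zero-changes;
               after-sb-1pri discard-secondary;
               after-sb-2pri disconnect;
           }
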
780 allow-two-primaries
781
782 The most common way to configure DRBD devices is to allow only one
783 node to be primary (and thus writable) at a time.
784
785 In some scenarios it is preferable to allow two nodes to be primary
786 at once; a mechanism outside of DRBD then must make sure that
787 writes to the shared, replicated device happen in a coordinated
788 way. This can be done with a shared-storage cluster file system
789 like OCFS2 and GFS, or with virtual machine images and a virtual
790 machine manager that can migrate virtual machines between physical
791 machines.
792
793 The allow-two-primaries parameter tells DRBD to allow two nodes to
794 be primary at the same time. Never enable this option when using a
795 non-distributed file system; otherwise, data corruption and node
796 crashes will result!
797
798 always-asbp
799 Normally the automatic after-split-brain policies are only used if
800 current states of the UUIDs do not indicate the presence of a third
801 node.
802
803 With this option you request that the automatic after-split-brain
804 policies are used as long as the data sets of the nodes are somehow
805 related. This might cause a full sync, if the UUIDs indicate the
806 presence of a third node. (Or double faults led to strange UUID
807 sets.)
808
809 connect-int time
810
811 As soon as a connection between two nodes is configured with
812 drbdsetup connect, DRBD immediately tries to establish the
813 connection. If this fails, DRBD waits for connect-int seconds and
814 then repeats. The default value of connect-int is 10 seconds.
815
816 cram-hmac-alg hash-algorithm
817
818 Configure the hash-based message authentication code (HMAC) or
819 secure hash algorithm to use for peer authentication. The kernel
820 supports a number of different algorithms, some of which may be
821 loadable as kernel modules. See the shash algorithms listed in
822 /proc/crypto. By default, cram-hmac-alg is unset. Peer
823 authentication also requires a shared-secret to be configured.
824
825 csums-alg hash-algorithm
826
827 Normally, when two nodes resynchronize, the sync target requests a
828 piece of out-of-sync data from the sync source, and the sync source
829 sends the data. With many usage patterns, a significant number of
830 those blocks will actually be identical.
831
832 When a csums-alg algorithm is specified, when requesting a piece of
833 out-of-sync data, the sync target also sends along a hash of the
834 data it currently has. The sync source compares this hash with its
835 own version of the data. It sends the sync target the new data if
836 the hashes differ, and tells it that the data are the same
837 otherwise. This reduces the network bandwidth required, at the cost
838 of higher cpu utilization and possibly increased I/O on the sync
839 target.
840
841 The csums-alg can be set to one of the secure hash algorithms
842 supported by the kernel; see the shash algorithms listed in
843 /proc/crypto. By default, csums-alg is unset.
844
845 csums-after-crash-only
846
       Enabling this option (and csums-alg, above) makes it possible to use
       the checksum-based resync only for the first resync after a primary
       crash, but not for later "network hiccups".

       In most cases, blocks that are marked as need-to-be-resynced are in
       fact changed, so calculating checksums, and both reading and writing
       the blocks on the resync target, is all effective overhead.

       The advantage of checksum-based resync is mostly after primary crash
       recovery, where the recovery marked larger areas (those covered by
       the activity log) as need-to-be-resynced, just in case. Introduced
       in 8.4.5.
859
860 data-integrity-alg alg
861 DRBD normally relies on the data integrity checks built into the
862 TCP/IP protocol, but if a data integrity algorithm is configured,
863 it will additionally use this algorithm to make sure that the data
864 received over the network match what the sender has sent. If a data
865 integrity error is detected, DRBD will close the network connection
866 and reconnect, which will trigger a resync.
867
868 The data-integrity-alg can be set to one of the secure hash
869 algorithms supported by the kernel; see the shash algorithms listed
870 in /proc/crypto. By default, this mechanism is turned off.
871
872 Because of the CPU overhead involved, we recommend not to use this
873 option in production environments. Also see the notes on data
874 integrity below.
875
876 fencing fencing_policy
877
878 Fencing is a preventive measure to avoid situations where both
879 nodes are primary and disconnected. This is also known as a
880 split-brain situation. DRBD supports the following fencing
881 policies:
882
883 dont-care
884 No fencing actions are taken. This is the default policy.
885
886 resource-only
887 If a node becomes a disconnected primary, it tries to fence the
888 peer. This is done by calling the fence-peer handler. The
889 handler is supposed to reach the peer over an alternative
890 communication path and call 'drbdadm outdate minor' there.
891
892 resource-and-stonith
893 If a node becomes a disconnected primary, it freezes all its IO
894 operations and calls its fence-peer handler. The fence-peer
895 handler is supposed to reach the peer over an alternative
896 communication path and call 'drbdadm outdate minor' there. In
897 case it cannot do that, it should stonith the peer. IO is
898 resumed as soon as the situation is resolved. In case the
899 fence-peer handler fails, I/O can be resumed manually with
900 'drbdadm resume-io'.
901
902 ko-count number
903
904 If a secondary node fails to complete a write request in ko-count
905 times the timeout parameter, it is excluded from the cluster. The
906 primary node then sets the connection to this secondary node to
907 Standalone. To disable this feature, you should explicitly set it
908 to 0; defaults may change between versions.
909
910 max-buffers number
911
912 Limits the memory usage per DRBD minor device on the receiving
913 side, or for internal buffers during resync or online-verify. Unit
914 is PAGE_SIZE, which is 4 KiB on most systems. The minimum possible
915 setting is hard coded to 32 (=128 KiB). These buffers are used to
916 hold data blocks while they are written to/read from disk. To avoid
917 possible distributed deadlocks on congestion, this setting is used
918 as a throttle threshold rather than a hard limit. Once more than
919 max-buffers pages are in use, further allocation from this pool is
920 throttled. You want to increase max-buffers if you cannot saturate
921 the IO backend on the receiving side.
922
923 max-epoch-size number
924
925 Define the maximum number of write requests DRBD may issue before
926 issuing a write barrier. The default value is 2048, with a minimum
927 of 1 and a maximum of 20000. Setting this parameter to a value
928 below 10 is likely to decrease performance.
929
930 on-congestion policy,
931 congestion-fill threshold,
932 congestion-extents threshold
933 By default, DRBD blocks when the TCP send queue is full. This
934 prevents applications from generating further write requests until
935 more buffer space becomes available again.
936
937 When DRBD is used together with DRBD-proxy, it can be better to use
938 the pull-ahead on-congestion policy, which can switch DRBD into
939 ahead/behind mode before the send queue is full. DRBD then records
940 the differences between itself and the peer in its bitmap, but it
941 no longer replicates them to the peer. When enough buffer space
942 becomes available again, the node resynchronizes with the peer and
943 switches back to normal replication.
944
945 This has the advantage of not blocking application I/O even when
946 the queues fill up, and the disadvantage that peer nodes can fall
947 behind much further. Also, while resynchronizing, peer nodes will
948 become inconsistent.
949
950 The available congestion policies are block (the default) and
951 pull-ahead. The congestion-fill parameter defines how much data is
952 allowed to be "in flight" in this connection. The default value is
953 0, which disables this mechanism of congestion control, with a
954 maximum of 10 GiBytes. The congestion-extents parameter defines how
955 many bitmap extents may be active before switching into
956 ahead/behind mode, with the same default and limits as the
957 al-extents parameter. The congestion-extents parameter is effective
958 only when set to a value smaller than al-extents.
959
960 Ahead/behind mode is available since DRBD 8.3.10.
961
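       When used together with DRBD Proxy and protocol A, a sketch like the
       following (the thresholds are placeholders that depend on the
       proxy's buffer size) enables ahead/behind mode:

           net {
               protocol           A;
               on-congestion      pull-ahead;
               congestion-fill    2G;     # switch when ~2 GiB are in flight
               congestion-extents 1000;   # must stay below al-extents
           }
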
962 ping-int interval
963
964 When the TCP/IP connection to a peer is idle for more than ping-int
965 seconds, DRBD will send a keep-alive packet to make sure that a
966 failed peer or network connection is detected reasonably soon. The
967 default value is 10 seconds, with a minimum of 1 and a maximum of
968 120 seconds. The unit is seconds.
969
970 ping-timeout timeout
971
972 Define the timeout for replies to keep-alive packets. If the peer
973 does not reply within ping-timeout, DRBD will close and try to
974 reestablish the connection. The default value is 0.5 seconds, with
975 a minimum of 0.1 seconds and a maximum of 3 seconds. The unit is
976 tenths of a second.
977
978 socket-check-timeout timeout
       In setups involving a DRBD-proxy and connections that experience a
       lot of buffer-bloat, it might be necessary to set ping-timeout to an
       unusually high value. By default, DRBD uses the same value to wait
       for a newly established TCP connection to become stable. Since the
       DRBD-proxy is usually located in the same data center, such a long
       wait time may hinder DRBD's connect process.

       In such setups, socket-check-timeout should be set to at least the
       round-trip time between DRBD and DRBD-proxy; in most cases that is
       1.

       The default unit is tenths of a second; the default value is 0
       (which causes DRBD to use the value of ping-timeout instead).
       Introduced in 8.4.5.
993
994 protocol name
995 Use the specified protocol on this connection. The supported
996 protocols are:
997
998 A
999 Writes to the DRBD device complete as soon as they have reached
1000 the local disk and the TCP/IP send buffer.
1001
1002 B
1003 Writes to the DRBD device complete as soon as they have reached
1004 the local disk, and all peers have acknowledged the receipt of
1005 the write requests.
1006
1007 C
1008 Writes to the DRBD device complete as soon as they have reached
1009 the local and all remote disks.
1010
1011
1012 rcvbuf-size size
1013
1014 Configure the size of the TCP/IP receive buffer. A value of 0 (the
1015 default) causes the buffer size to adjust dynamically. This
1016 parameter usually does not need to be set, but it can be set to a
1017 value up to 10 MiB. The default unit is bytes.
1018
1019 rr-conflict policy
1020 This option helps to solve the cases when the outcome of the resync
1021 decision is incompatible with the current role assignment in the
1022 cluster. The defined policies are:
1023
1024 disconnect
1025 No automatic resynchronization, simply disconnect.
1026
1027 violently
1028 Resync to the primary node is allowed, violating the assumption
1029 that data on a block device are stable for one of the nodes.
1030 Do not use this option, it is dangerous.
1031
1032 call-pri-lost
1033 Call the pri-lost handler on one of the machines. The handler
1034 is expected to reboot the machine, which puts it into secondary
1035 role.
1036
1037 shared-secret secret
1038
1039 Configure the shared secret used for peer authentication. The
1040 secret is a string of up to 64 characters. Peer authentication also
1041 requires the cram-hmac-alg parameter to be set.
1042
1043 sndbuf-size size
1044
1045 Configure the size of the TCP/IP send buffer. Since DRBD 8.0.13 /
1046 8.2.7, a value of 0 (the default) causes the buffer size to adjust
1047 dynamically. Values below 32 KiB are harmful to the throughput on
1048 this connection. Large buffer sizes can be useful especially when
1049 protocol A is used over high-latency networks; the maximum value
1050 supported is 10 MiB.
1051
1052 tcp-cork
1053 By default, DRBD uses the TCP_CORK socket option to prevent the
1054 kernel from sending partial messages; this results in fewer and
1055 bigger packets on the network. Some network stacks can perform
1056 worse with this optimization. On these, the tcp-cork parameter can
1057 be used to turn this optimization off.
1058
1059 timeout time
1060
1061 Define the timeout for replies over the network: if a peer node
1062 does not send an expected reply within the specified timeout, it is
1063 considered dead and the TCP/IP connection is closed. The timeout
1064 value must be lower than connect-int and lower than ping-int. The
1065 default is 6 seconds; the value is specified in tenths of a second.
1066
1067 use-rle
1068
1069 Each replicated device on a cluster node has a separate bitmap for
1070 each of its peer devices. The bitmaps are used for tracking the
1071 differences between the local and peer device: depending on the
1072 cluster state, a disk range can be marked as different from the
1073 peer in the device's bitmap, in the peer device's bitmap, or in
1074 both bitmaps. When two cluster nodes connect, they exchange each
1075 other's bitmaps, and they each compute the union of the local and
1076 peer bitmap to determine the overall differences.
1077
1078 Bitmaps of very large devices are also relatively large, but they
1079 usually compress very well using run-length encoding. This can save
1080 time and bandwidth for the bitmap transfers.
1081
1082 The use-rle parameter determines if run-length encoding should be
1083 used. It is on by default since DRBD 8.4.0.
1084
1085 verify-alg hash-algorithm
1086 Online verification (drbdadm verify) computes and compares
1087 checksums of disk blocks (i.e., hash values) in order to detect if
1088 they differ. The verify-alg parameter determines which algorithm to
1089 use for these checksums. It must be set to one of the secure hash
1090 algorithms supported by the kernel before online verify can be
1091 used; see the shash algorithms listed in /proc/crypto.
1092
1093 We recommend to schedule online verifications regularly during
1094 low-load periods, for example once a month. Also see the notes on
1095 data integrity below.
1096
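       For example (the algorithm is your choice; any shash algorithm
       listed in /proc/crypto will do):

           net {
               verify-alg sha256;
           }

       A verification run can then be started with drbdadm verify res, for
       example from a monthly cron job.
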
1097 Section on Parameters
1098 address [address-family] address:port
1099
1100 Defines the address family, address, and port of a connection
1101 endpoint.
1102
1103 The address families ipv4, ipv6, ssocks (Dolphin Interconnect
1104 Solutions' "super sockets"), sdp (Infiniband Sockets Direct
1105 Protocol), and sci are supported (sci is an alias for ssocks). If
1106 no address family is specified, ipv4 is assumed. For all address
       families except ipv6, the address is specified in IPv4 address
1108 notation (for example, 1.2.3.4). For ipv6, the address is enclosed
1109 in brackets and uses IPv6 address notation (for example,
1110 [fd01:2345:6789:abcd::1]). The port is always specified as a
1111 decimal number from 1 to 65535.
1112
1113 On each host, the port numbers must be unique for each address;
1114 ports cannot be shared.
1115
1116 node-id value
1117
1118 Defines the unique node identifier for a node in the cluster. Node
1119 identifiers are used to identify individual nodes in the network
1120 protocol, and to assign bitmap slots to nodes in the metadata.
1121
       Node identifiers can only be reassigned in a cluster when the
1123 cluster is down. It is essential that the node identifiers in the
1124 configuration and in the device metadata are changed consistently
1125 on all hosts. To change the metadata, dump the current state with
1126 drbdmeta dump-md, adjust the bitmap slot assignment, and update the
1127 metadata with drbdmeta restore-md.
1128
1129 The node-id parameter exists since DRBD 9. Its value ranges from 0
1130 to 16; there is no default.
1131
1132 Section options Parameters (Resource Options)
1133 auto-promote bool-value
1134 A resource must be promoted to primary role before any of its
1135 devices can be mounted or opened for writing.
1136
       Before DRBD 9, this could only be done explicitly ("drbdadm
       primary"). Since DRBD 9, the auto-promote parameter allows a
       resource to be promoted to primary role automatically when one of
       its devices is mounted or opened for writing. As soon as all devices
       are unmounted or closed with no more remaining users, the role of
       the resource changes back to secondary.
1143
1144 Automatic promotion only succeeds if the cluster state allows it
1145 (that is, if an explicit drbdadm primary command would succeed).
1146 Otherwise, mounting or opening the device fails as it already did
1147 before DRBD 9: the mount(2) system call fails with errno set to
1148 EROFS (Read-only file system); the open(2) system call fails with
1149 errno set to EMEDIUMTYPE (wrong medium type).
1150
1151 Irrespective of the auto-promote parameter, if a device is promoted
1152 explicitly (drbdadm primary), it also needs to be demoted
1153 explicitly (drbdadm secondary).
1154
1155 The auto-promote parameter is available since DRBD 9.0.0, and
1156 defaults to yes.
1157
1158 cpu-mask cpu-mask
1159
1160 Set the cpu affinity mask for DRBD kernel threads. The cpu mask is
1161 specified as a hexadecimal number. The default value is 0, which
1162 lets the scheduler decide which kernel threads run on which CPUs.
1163 CPU numbers in cpu-mask which do not exist in the system are
1164 ignored.
1165
1166 on-no-data-accessible policy
1167 Determine how to deal with I/O requests when the requested data is
1168 not available locally or remotely (for example, when all disks have
1169 failed). The defined policies are:
1170
1171 io-error
1172 System calls fail with errno set to EIO.
1173
1174 suspend-io
1175 The resource suspends I/O. I/O can be resumed by (re)attaching
1176 the lower-level device, by connecting to a peer which has
1177 access to the data, or by forcing DRBD to resume I/O with
1178 drbdadm resume-io res. When no data is available, forcing I/O
1179 to resume will result in the same behavior as the io-error
1180 policy.
1181
1182 This setting is available since DRBD 8.3.9; the default policy is
1183 io-error.
1184
1185 peer-ack-window value
1186
1187 On each node and for each device, DRBD maintains a bitmap of the
1188 differences between the local and remote data for each peer device.
1189 For example, in a three-node setup (nodes A, B, C) each with a
1190 single device, every node maintains one bitmap for each of its
1191 peers.
1192
1193 When nodes receive write requests, they know how to update the
1194 bitmaps for the writing node, but not how to update the bitmaps
1195 between themselves. In this example, when a write request
1196 propagates from node A to B and C, nodes B and C know that they
1197 have the same data as node A, but not whether or not they both have
1198 the same data.
1199
1200 As a remedy, the writing node occasionally sends peer-ack packets
1201 to its peers which tell them which state they are in relative to
1202 each other.
1203
1204 The peer-ack-window parameter specifies how much data a primary
1205 node may send before sending a peer-ack packet. A low value causes
1206 increased network traffic; a high value causes less network traffic
1207 but higher memory consumption on secondary nodes and higher resync
1208 times between the secondary nodes after primary node failures.
1209 (Note: peer-ack packets may be sent due to other reasons as well,
1210 e.g. membership changes or expiry of the peer-ack-delay timer.)
1211
1212 The default value for peer-ack-window is 2 MiB, the default unit is
1213 sectors. This option is available since 9.0.0.
1214
1215 peer-ack-delay expiry-time
1216
1217 If after the last finished write request no new write request gets
1218 issued for expiry-time, then a peer-ack packet is sent. If a new
1219 write request is issued before the timer expires, the timer gets
1220 reset to expiry-time. (Note: peer-ack packets may be sent due to
1221 other reasons as well, e.g. membership changes or the
1222 peer-ack-window option.)
1223
       This parameter may influence resync behavior on remote nodes. Peer
       nodes need to wait until they receive a peer-ack before releasing a
       lock on an AL-extent. Resync operations between peers may need to
       wait for these locks.
1228
1229 The default value for peer-ack-delay is 100 milliseconds, the
1230 default unit is milliseconds. This option is available since 9.0.0.
1231
1232 quorum value
1233
       When activated, a cluster partition requires quorum in order to
       modify the replicated data set. That means a node in the cluster
       partition can only be promoted to primary if the cluster partition
       has quorum. Every node with a disk directly connected to the node
       that should be promoted counts. If a primary node should execute a
       write request, but the cluster partition has lost quorum, it will
       freeze I/O or reject the write request with an error (depending on
       the on-no-quorum setting). Upon losing quorum, a primary always
       invokes the quorum-lost handler. The handler is intended for
       notification purposes; its return code is ignored.

       The option's value might be set to off, majority, all or a numeric
       value. If you set it to a numeric value, make sure that the value is
       greater than half of your number of nodes. Quorum is a mechanism to
       avoid data divergence; it might be used instead of fencing when
       there are more than two replicas. It defaults to off.

       If all missing nodes are marked as outdated, a partition always has
       quorum, no matter how small it is. That is, if you disconnect all
       secondary nodes gracefully, a single primary continues to operate.
       The moment a single secondary is lost, however, it has to be assumed
       that it forms a partition with all the missing outdated nodes. If
       that partition could be larger than the local one, quorum is lost at
       that moment.

       If you want to allow permanently diskless nodes to gain quorum, it
       is recommended not to use majority or all. It is recommended to
       specify an absolute number, since DRBD's heuristic to determine the
       complete number of diskful nodes in the cluster is unreliable.
1263
1264 The quorum implementation is available starting with the DRBD
1265 kernel driver version 9.0.7.
1266
1267 quorum-minimum-redundancy value
1268
1269 This option sets the minimal required number of nodes with an
1270 UpToDate disk to allow the partition to gain quorum. This is a
1271 different requirement than the plain quorum option expresses.
1272
1273 The option's value might be set to off, majority, all or a numeric
1274 value. If you set it to a numeric value, make sure that the value
1275 is greater than half of your number of nodes.
1276
       If you want to allow permanently diskless nodes to gain quorum, it
       is recommended not to use majority or all. It is recommended to
       specify an absolute number, since DRBD's heuristic to determine the
       complete number of diskful nodes in the cluster is unreliable.
1281
1282 This option is available starting with the DRBD kernel driver
1283 version 9.0.10.
1284
1285 on-no-quorum {io-error | suspend-io}
1286
       By default, DRBD freezes I/O on a device that has lost quorum. By
       setting on-no-quorum to io-error, it completes all I/O operations
       with an error if quorum is lost.

       The on-no-quorum option is available starting with the DRBD kernel
       driver version 9.0.8.
1293
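       For a resource replicated across three nodes, a sketch like the
       following is common; with it, a partition containing fewer than two
       nodes rejects writes instead of freezing them:

           options {
               quorum       majority;
               on-no-quorum io-error;
           }
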
1294 Section startup Parameters
1295 The parameters in this section define the behavior of DRBD at system
1296 startup time, in the DRBD init script. They have no effect once the
1297 system is up and running.
1298
1299 degr-wfc-timeout timeout
1300
1301 Define how long to wait until all peers are connected in case the
1302 cluster consisted of a single node only when the system went down.
1303 This parameter is usually set to a value smaller than wfc-timeout.
1304 The assumption here is that peers which were unreachable before a
1305 reboot are less likely to be reachable after the reboot, so waiting
1306 is less likely to help.
1307
1308 The timeout is specified in seconds. The default value is 0, which
1309 stands for an infinite timeout. Also see the wfc-timeout parameter.
1310
1311 outdated-wfc-timeout timeout
1312
1313 Define how long to wait until all peers are connected if all peers
1314 were outdated when the system went down. This parameter is usually
1315 set to a value smaller than wfc-timeout. The assumption here is
1316 that an outdated peer cannot have become primary in the meantime,
1317 so we don't need to wait for it as long as for a node which was
1318 alive before.
1319
1320 The timeout is specified in seconds. The default value is 0, which
1321 stands for an infinite timeout. Also see the wfc-timeout parameter.
1322
1323 stacked-timeouts
1324 On stacked devices, the wfc-timeout and degr-wfc-timeout parameters
1325 in the configuration are usually ignored, and both timeouts are set
1326 to twice the connect-int timeout. The stacked-timeouts parameter
1327 tells DRBD to use the wfc-timeout and degr-wfc-timeout parameters
1328 as defined in the configuration, even on stacked devices. Only use
1329 this parameter if the peer of the stacked resource is usually not
1330 available, or will not become primary. Incorrect use of this
1331 parameter can lead to unexpected split-brain scenarios.
1332
1333 wait-after-sb
1334 This parameter causes DRBD to continue waiting in the init script
1335 even when a split-brain situation has been detected, and the nodes
1336 therefore refuse to connect to each other.
1337
1338 wfc-timeout timeout
1339
1340 Define how long the init script waits until all peers are
1341 connected. This can be useful in combination with a cluster manager
1342 which cannot manage DRBD resources: when the cluster manager
1343 starts, the DRBD resources will already be up and running. With a
1344 more capable cluster manager such as Pacemaker, it makes more sense
1345 to let the cluster manager control DRBD resources. The timeout is
1346 specified in seconds. The default value is 0, which stands for an
1347 infinite timeout. Also see the degr-wfc-timeout parameter.
1348
1349 Section volume Parameters
1350 device /dev/drbdminor-number
1351
1352 Define the device name and minor number of a replicated block
1353 device. This is the device that applications are supposed to
1354 access; in most cases, the device is not used directly, but as a
1355 file system. This parameter is required and the standard device
1356 naming convention is assumed.
1357
1358 In addition to this device, udev will create
1359 /dev/drbd/by-res/resource/volume and
1360 /dev/drbd/by-disk/lower-level-device symlinks to the device.
1361
1362 disk {[disk] | none}
1363
1364 Define the lower-level block device that DRBD will use for storing
1365 the actual data. While the replicated drbd device is configured,
1366 the lower-level device must not be used directly. Even read-only
1367 access with tools like dumpe2fs(8) and similar is not allowed. The
1368 keyword none specifies that no lower-level block device is
1369 configured; this also overrides inheritance of the lower-level
1370 device.
1371
1372 meta-disk internal,
1373 meta-disk device,
1374 meta-disk device [index]
1375
1376 Define where the metadata of a replicated block device resides: it
1377 can be internal, meaning that the lower-level device contains both
1378 the data and the metadata, or on a separate device.
1379
1380 When the index form of this parameter is used, multiple replicated
1381 devices can share the same metadata device, each using a separate
1382 index. Each index occupies 128 MiB of data, which corresponds to a
1383 replicated device size of at most 4 TiB with two cluster nodes. We
1384 recommend not to share metadata devices anymore, and to instead use
1385 the lvm volume manager for creating metadata devices as needed.
1386
1387 When the index form of this parameter is not used, the size of the
1388 lower-level device determines the size of the metadata. The size
1389 needed is 36 KiB + (size of lower-level device) / 32K * (number of
1390 nodes - 1). If the metadata device is bigger than that, the extra
1391 space is not used.
1392
1393 This parameter is required if a disk other than none is specified,
1394 and ignored if disk is set to none. A meta-disk parameter without a
1395 disk parameter is not allowed.
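
       Two volume sketches, one with internal metadata and one with
       external metadata on a dedicated (hypothetical) LVM volume:

           volume 0 {
               device    /dev/drbd1;
               disk      /dev/sda7;
               meta-disk internal;
           }
           volume 1 {
               device    /dev/drbd2;
               disk      /dev/vg0/data;
               meta-disk /dev/vg0/data-md;  # no index: metadata size is
                                            # derived from the data device
           }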

NOTES ON DATA INTEGRITY
       DRBD supports two different mechanisms for data integrity checking:
       first, the data-integrity-alg network parameter makes it possible to
       add a checksum to the data sent over the network. Second, the online
       verification mechanism (drbdadm verify and the verify-alg parameter)
       makes it possible to check for differences in the on-disk data.
1403
1404 Both mechanisms can produce false positives if the data is modified
1405 during I/O (i.e., while it is being sent over the network or written to
1406 disk). This does not always indicate a problem: for example, some file
1407 systems and applications do modify data under I/O for certain
1408 operations. Swap space can also undergo changes while under I/O.
1409
1410 Network data integrity checking tries to identify data modification
1411 during I/O by verifying the checksums on the sender side after sending
1412 the data. If it detects a mismatch, it logs an error. The receiver also
1413 logs an error when it detects a mismatch. Thus, an error logged only on
1414 the receiver side indicates an error on the network, and an error
1415 logged on both sides indicates data modification under I/O.
1416
1417 The most recent example of systematic data corruption was identified as
1418 a bug in the TCP offloading engine and driver of a certain type of GBit
1419 NIC in 2007: the data corruption happened on the DMA transfer from core
       memory to the card. Because the TCP checksums were calculated on the
1421 card, the TCP/IP protocol checksums did not reveal this problem.

VERSION
1424 This document was revised for version 9.0.0 of the DRBD distribution.

AUTHOR
1427 Written by Philipp Reisner <philipp.reisner@linbit.com> and Lars
1428 Ellenberg <lars.ellenberg@linbit.com>.

REPORTING BUGS
1431 Report bugs to <drbd-user@lists.linbit.com>.

COPYRIGHT
1434 Copyright 2001-2018 LINBIT Information Technologies, Philipp Reisner,
1435 Lars Ellenberg. This is free software; see the source for copying
1436 conditions. There is NO warranty; not even for MERCHANTABILITY or
1437 FITNESS FOR A PARTICULAR PURPOSE.

SEE ALSO
1440 drbd(8), drbdsetup(8), drbdadm(8), DRBD User's Guide[1], DRBD Web
1441 Site[3]

NOTES
        1. DRBD User's Guide
           http://www.drbd.org/users-guide/

        2. Online Usage Counter
           http://usage.drbd.org

        3. DRBD Web Site
           http://www.drbd.org/
1454
1455
1456
DRBD 9.0.x                     17 January 2018                    DRBD.CONF(5)