mdmon(8) - f37

MDMON(8)                    System Manager's Manual                   MDMON(8)

NAME

       mdmon - monitor MD external metadata arrays

SYNOPSIS

       mdmon [--all] [--takeover] [--foreground] CONTAINER

OVERVIEW

       Linux kernel 2.6.27 introduced support for external metadata arrays.
       External metadata implies that user space handles all updates to the
       metadata.  The kernel's responsibility is to notify user space when a
       "metadata event" occurs, such as a disk failure or a clean-to-dirty
       transition.  In important cases the kernel waits for user space to
       act on these notifications.

DESCRIPTION

   Metadata updates:
       To service metadata update requests a daemon, mdmon, is introduced.
       mdmon is tasked with polling the sysfs namespace for changes in
       array_state, sync_action, and the per-disk state attributes.  When a
       change is detected it calls a per-metadata-type handler to modify the
       metadata.  The following actions are taken:

              array_state - inactive
                     Clear the dirty bit for the volume and let the array be
                     stopped.

              array_state - write pending
                     Set the dirty bit for the array and then set array_state
                     to active.  Writes are blocked until user space writes
                     active.

              array_state - active-idle
                     The safe mode timer has expired, so set array_state to
                     clean to block writes to the array.

              array_state - clean
                     Clear the dirty bit for the volume.

              array_state - read-only
                     This is the initial state that all arrays start in.
                     mdmon takes one of three actions:

                     1/     Transition the array to read-auto, keeping the
                            dirty bit clear, if the metadata handler
                            determines that the array does not need
                            resyncing or other modification.

                     2/     Transition the array to active if the metadata
                            handler determines that a resync or some other
                            manipulation is necessary.

                     3/     Leave the array read-only if the volume is marked
                            to not be monitored; for example, the metadata
                            version has been set to "external:-dev/md127"
                            instead of "external:/dev/md127".

              sync_action - resync-to-idle
                     Notify the metadata handler that a resync may have
                     completed.  If a resync process is idled before it
                     completes, this event allows the metadata handler to
                     checkpoint the resync.

              sync_action - recover-to-idle
                     A spare may have completed rebuilding, so tell the
                     metadata handler about the state of each disk.  This is
                     the metadata handler's opportunity to clear any
                     "out-of-sync" bits and clear the volume's degraded
                     status.  If a recovery process is idled before it
                     completes, this event allows the metadata handler to
                     checkpoint the recovery.

              <disk>/state - faulty
                     A disk failure kicks off a series of events.  First,
                     notify the metadata handler that a disk has failed, and
                     then notify the kernel that it can unblock writes that
                     were dependent on this disk.  After unblocking the
                     kernel, this disk is set to be removed+ from the member
                     array.  Finally, the disk is marked failed in all other
                     member arrays in the container.

                     + Note: This behavior differs slightly from native MD
                     arrays, where removal is reserved for a mdadm --remove
                     event.  In the external metadata case the container
                     holds the final reference on a block device and a mdadm
                     --remove <container> <victim> call is still required.
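
       The event list above can be summarised as a small lookup table plus a
       transition check.  The following Python sketch is illustrative only
       and is not mdmon's implementation: the names ARRAY_STATE_ACTIONS,
       sync_event and read_attr are hypothetical, the action strings
       paraphrase this page, and the keys use this page's spelling of the
       states, which may differ from the exact strings the kernel exposes in
       sysfs.

```python
from pathlib import Path
from typing import Optional

# Paraphrase of the array_state actions listed in this page.
# Keys follow the page's spelling, not necessarily the sysfs spelling.
ARRAY_STATE_ACTIONS = {
    "inactive": "clear dirty bit and let the array stop",
    "write pending": "set dirty bit, then write 'active' to array_state",
    "active-idle": "write 'clean' to array_state",
    "clean": "clear dirty bit",
    "read-only": "choose read-auto, active, or leave read-only",
}


def sync_event(previous: str, current: str) -> Optional[str]:
    """Name a sync_action transition, or return None if it is not one
    of the two transitions described in this page."""
    if current == "idle" and previous in ("resync", "recover"):
        return f"{previous}-to-idle"
    return None


def read_attr(md_dir: Path, name: str) -> str:
    """Read one attribute file below md_dir and strip the trailing newline."""
    return (md_dir / name).read_text().strip()
```

       A poller built on this would remember the previous sync_action value
       for each array, call read_attr() on a directory such as
       /sys/block/md126/md (the device name is an example), and pass both
       values to sync_event().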

   Containers:
       External metadata formats, like DDF, differ from the native MD
       metadata formats in that they define a set of disks and a series of
       sub-arrays within those disks.  MD metadata, in comparison, defines a
       1:1 relationship between a set of block devices and a RAID array.
       For example, to create 2 arrays at different RAID levels on a single
       set of disks, MD metadata requires the disks to be partitioned and
       then each array can be created with a subset of those partitions.
       The supported external formats perform this disk carving internally.

       Container devices simply hold references to all member disks and
       allow tools like mdmon to determine which active arrays belong to
       which container.  Some array management commands, like disk removal
       and disk add, are now only valid at the container level.  Attempts
       to perform these actions on member arrays are blocked with error
       messages like:

              "mdadm: Cannot remove disks from a 'member' array, perform this
              operation on the parent container"

       Containers are identified in /proc/mdstat with a metadata version
       string "external:<metadata name>".  Member devices are identified by
       "external:/<container device>/<member index>", or
       "external:-<container device>/<member index>" if the array is to
       remain read-only.
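
       The version-string conventions above are regular enough to parse
       mechanically.  The following Python sketch is illustrative only: the
       MetadataVersion type and parse_metadata_version function are
       hypothetical names, and treating any string without the "external:"
       prefix as native MD metadata is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataVersion:
    kind: str                           # "container", "member", or "native"
    name: Optional[str] = None          # metadata name or native version
    container: Optional[str] = None     # container device (members only)
    member_index: Optional[str] = None  # member index (members only)
    monitored: bool = True              # False for the "external:-" form


def parse_metadata_version(text: str) -> MetadataVersion:
    """Classify a metadata version string using the forms in this page."""
    prefix = "external:"
    if not text.startswith(prefix):
        # Assumption: anything else is a native MD metadata version.
        return MetadataVersion(kind="native", name=text)
    rest = text[len(prefix):]
    if rest[:1] in ("/", "-"):
        # "/<container>/<index>" is monitored, "-<container>/<index>" is not.
        container, _, index = rest[1:].rpartition("/")
        return MetadataVersion(
            kind="member",
            container=container,
            member_index=index,
            monitored=(rest[0] == "/"),
        )
    return MetadataVersion(kind="container", name=rest)
```

       For example, parse_metadata_version("external:-md127/0") yields a
       member of container md127 with monitored set to False, which
       corresponds to an array that is to remain read-only.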

OPTIONS

       CONTAINER
              The container device to monitor.  It can be a full path like
              /dev/md/container, or a simple md device name like md127.

       --foreground
              Normally, mdmon will fork and continue in the background.
              Adding this option will skip that step and run mdmon in the
              foreground.

       --takeover
              This instructs mdmon to replace any active mdmon which is
              currently monitoring the array.  This is primarily used late in
              the boot process to replace any mdmon which was started from an
              initramfs before the root filesystem was mounted.  This avoids
              holding a reference on that initramfs indefinitely and ensures
              that the pid and sock files used to communicate with mdmon are
              in a standard place.

       --all  This tells mdmon to find any active containers and start
              monitoring each of them if appropriate.  This is normally used
              with --takeover late in the boot sequence.  A separate mdmon
              process is started for each container, as the --all argument is
              overwritten with the name of the container.  To allow for
              containers with names longer than 5 characters, this argument
              can be arbitrarily extended, e.g. to --all-active-arrays.

              Note that mdmon is automatically started by mdadm when needed
              and so does not need to be considered when working with RAID
              arrays.  The only time it is run other than by mdadm is when
              the boot scripts need to restart it after mounting the new
              root filesystem.

START UP AND SHUTDOWN

       As mdmon needs to be running whenever any filesystem on the monitored
       device is mounted, there are special considerations when the root
       filesystem is mounted from an mdmon-monitored device.  Note that in
       general mdmon is needed even if the filesystem is mounted read-only,
       as some filesystems can still write to the device in those
       circumstances, for example to replay a journal after an unclean
       shutdown.

       When the array is assembled by the initramfs code, mdadm will
       automatically start mdmon as required.  This means that mdmon must be
       installed on the initramfs and there must be a writable filesystem
       (typically tmpfs) in which mdmon can create a .pid and .sock file.
       The particular filesystem to use is given to mdmon at compile time
       and defaults to /run/mdadm.

       This filesystem must persist through to shutdown time.

       After the final root filesystem has been instantiated (usually with
       pivot_root), mdmon should be run with --all --takeover so that the
       mdmon running from the initramfs can be replaced with one running in
       the main root, and so the memory used by the initramfs can be
       released.

       At shutdown time, mdmon should not be killed along with other
       processes.  Also, as it holds a file (socket actually) open in /dev
       (by default), it will not be possible to unmount /dev if it is a
       separate filesystem.

EXAMPLES

         mdmon --all-active-arrays --takeover
       Any mdmon which is currently running is killed and a new instance is
       started.  This should be run during the boot sequence if an initramfs
       was used, so that any mdmon running from the initramfs will not hold
       the initramfs active.