NDCTL-CREATE-NAMESPACE(1) ndctl Manual NDCTL-CREATE-NAMESPACE(1)

NAME
ndctl-create-namespace - provision or reconfigure a namespace

SYNOPSIS
    ndctl create-namespace [<options>]

DESCRIPTION
The capacity of an NVDIMM REGION (contiguous span of persistent memory)
is accessed via one or more NAMESPACE devices. REGION is the Linux term
for what ACPI and UEFI call a DIMM-interleave-set, or a
system-physical-address-range that is striped (by the memory
controller) across one or more memory modules.

The UEFI specification defines the NVDIMM Label Protocol as the
combination of label area access methods and a data format for
provisioning one or more NAMESPACE objects from a REGION. Note that
label support is optional, and if Linux does not detect the label
capability it will automatically instantiate a "label-less" namespace
per region. Examples of label-less namespaces are the ones created by
the kernel’s memmap=ss!nn command line option (see the nvdimm wiki on
kernel.org), or NVDIMMs without a valid namespace index in their label
area.

Note
    Label-less namespaces lack many of the features of their label-rich
    cousins. For example, their size cannot be modified, and they cannot
    be fully destroyed (i.e. the space reclaimed). A destroy operation
    will zero any mode-specific metadata. Finally, for create-namespace
    operations on label-less namespaces, ndctl bypasses the region
    capacity availability checks, and always satisfies the request
    using the full region capacity. The only reconfiguration operation
    supported on a label-less namespace is changing its mode.

A namespace can be provisioned to operate in one of four modes: fsdax,
devdax, sector, and raw. Here are the expected usage models for these
modes:

· fsdax: Filesystem-DAX mode is the default mode of a namespace when
  ndctl create-namespace is invoked with no options. It creates a
  block device (/dev/pmemX[.Y]) that supports the DAX capabilities of
  Linux filesystems (xfs and ext4 to date). DAX removes the page
  cache from the I/O path and allows mmap(2) to establish direct
  mappings to persistent memory media. The DAX capability enables
  workloads / working-sets that would exceed the capacity of the page
  cache to scale up to the capacity of persistent memory. Workloads
  that fit in page cache or perform bulk data transfers may not
  benefit from DAX. When in doubt, pick this mode.

· devdax: Device-DAX mode enables similar mmap(2) DAX mapping
  capabilities as Filesystem-DAX. However, instead of a block device
  that can support a DAX-enabled filesystem, this mode emits a single
  character device file (/dev/daxX.Y). Use this mode to assign
  persistent memory to a virtual machine, register persistent memory
  for RDMA, or when gigantic mappings are needed.

· sector: Use this mode to host legacy filesystems that do not
  checksum metadata or applications that are not prepared for torn
  sectors after a crash. Expected usage for this mode is for small
  boot volumes. This mode is compatible with other operating systems.

· raw: Raw mode is effectively just a memory disk that does not
  support DAX. Typically this indicates a namespace that was created
  by tooling or another operating system that did not know how to
  create a Linux fsdax or devdax mode namespace. This mode is
  compatible with other operating systems, but again, does not
  support DAX operation.
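
The mode choice above maps directly onto the --mode= option. As a minimal sketch, the helper below (create_ns_cmd is a hypothetical name, not part of ndctl) only prints the command it would run and rejects anything other than the four modes listed:

```shell
#!/bin/sh
# Print (rather than run) a create-namespace command for a given mode.
# Only the four modes named in this manual are accepted.
create_ns_cmd() {
    case "$1" in
        fsdax|devdax|sector|raw)
            echo "ndctl create-namespace --mode=$1"
            ;;
        *)
            echo "unknown mode: $1" >&2
            return 1
            ;;
    esac
}

create_ns_cmd fsdax     # block device for a DAX-capable filesystem
create_ns_cmd devdax    # character device for direct mmap(2)
create_ns_cmd sector    # torn-sector protection for legacy filesystems
```

Printing instead of executing keeps the sketch safe to try on a machine without NVDIMMs; the printed line can be reviewed and then run by hand.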

EXAMPLES
Create a maximally sized pmem namespace in fsdax mode (the default)

    ndctl create-namespace

Convert namespace0.0 to sector mode

    ndctl create-namespace -f -e namespace0.0 --mode=sector

OPTIONS
-t, --type=
    Create a pmem or blk namespace (subject to available capacity). A
    pmem namespace supports the dax (direct access) capability to
    mmap(2) persistent memory directly into a process address space. A
    blk namespace accesses persistent memory through a
    block-window-aperture. Compared to pmem it supports a traditional
    storage error model (EIO on error rather than a cpu exception on a
    bad memory access), but it does not support dax.

-m, --mode=

    · "raw": expose the namespace capacity directly with limitations.
      Neither a raw pmem namespace nor a raw blk namespace supports
      sector atomicity by default (see "sector" mode below). A raw
      pmem namespace may have limited to no dax support depending on
      the kernel. In other words, operations like direct-I/O targeting
      a dax buffer may fail for a pmem namespace in raw mode, or be
      routed indirectly through a page-cache buffer. See "fsdax" and
      "devdax" mode for dax operation.

    · "sector": persistent memory, given that it is byte addressable,
      does not support sector atomicity. The problematic aspect of
      sector tearing is that most applications do not know they have
      an atomic sector update dependency. At least a disk rarely ever
      tears sectors, and if it does it almost certainly returns a
      checksum error on access. Persistent memory devices will always
      tear and always silently. Until an application is audited to be
      robust in the presence of sector-tearing, "sector" mode (also
      known as "safe" or "btt" mode) is recommended. This imposes some
      performance overhead and disables the dax capability.

    · "fsdax": A pmem namespace in this mode supports dax operation
      with a block-device based filesystem (in previous ndctl
      releases this mode was named "memory" mode). This mode comes at
      the cost of allocating per-page metadata. The capacity for that
      metadata can be allocated from "System RAM", or from a reserved
      portion of "Persistent Memory" (see the --map= option). NOTE: A
      filesystem that supports DAX is required for dax operation. If
      the raw block device (/dev/pmemX) is used directly without a
      filesystem, it will use the page cache. See "devdax" mode for
      raw device access that supports dax.

    · "devdax": The device-dax character device interface is a
      statically allocated / raw access analogue of filesystem-dax
      (in previous ndctl releases this mode was named "dax" mode). It
      allows memory ranges to be mapped without need of an
      intervening filesystem. The device-dax interface is strict,
      precise and predictable. Specifically the interface:

        · Guarantees fault granularity with respect to a given page
          size (4K, 2M, or 1G on x86) set at configuration time.

        · Enforces deterministic behavior by being strict about what
          fault scenarios are supported. I.e. if a device is
          configured with a 2M alignment, an attempt to fault a 4K
          aligned offset will result in SIGBUS.

    Note: both fsdax and devdax mode require 16MiB physical alignment
    to be cross-arch compatible. By default ndctl will block attempts
    to create namespaces in these modes when the physical starting
    address of the namespace is not 16MiB aligned. The --force option
    tries to override this constraint if the platform supports a
    smaller alignment, but this is not recommended.
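
The 16MiB rule above is a plain modulo test on the physical start address. The sketch below illustrates it; is_16mib_aligned is a hypothetical helper, the two addresses are made-up sample values, and how the real start address is obtained is outside the scope of this example:

```shell
#!/bin/sh
# Succeed if a physical start address is 16MiB aligned.
# The address may be decimal or 0x-prefixed hex.
is_16mib_aligned() {
    [ $(( $1 % (16 * 1024 * 1024) )) -eq 0 ]
}

# Sample addresses chosen for illustration only.
for addr in 0x100000000 0x100200000; do
    if is_16mib_aligned "$addr"; then
        echo "$addr: 16MiB aligned"
    else
        echo "$addr: not 16MiB aligned (fsdax/devdax creation is blocked by default)"
    fi
done
```

The second address is offset by 2MiB from a 16MiB boundary, so it falls in the case that would need --force.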

-s, --size=
    For NVDIMM devices that support namespace labels, set the namespace
    size in bytes. Otherwise it defaults to the maximum size specified
    by platform firmware. This option supports the suffixes "k" or "K"
    for KiB, "m" or "M" for MiB, "g" or "G" for GiB and "t" or "T" for
    TiB.

    For pmem namespaces the size must be a multiple of the
    interleave-width and the namespace alignment (see --align
    below).
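
To make the suffix rules and the "multiple of" constraint concrete, here is a small sketch. Both function names are hypothetical, the input is assumed to be a well-formed number with an optional suffix, and this is not how ndctl itself parses sizes:

```shell
#!/bin/sh
# Convert a size such as "512M" or "1g" to bytes, using the binary
# suffixes described for --size=. Input is not validated.
size_to_bytes() {
    s2b_num=${1%[kKmMgGtT]}
    case "$1" in
        *[kK]) echo $(( s2b_num * 1024 )) ;;
        *[mM]) echo $(( s2b_num * 1024 * 1024 )) ;;
        *[gG]) echo $(( s2b_num * 1024 * 1024 * 1024 )) ;;
        *[tT]) echo $(( s2b_num * 1024 * 1024 * 1024 * 1024 )) ;;
        *)     echo "$s2b_num" ;;
    esac
}

# Succeed if size ($1) is a multiple of unit ($2); both accept suffixes.
is_multiple_of() {
    [ $(( $(size_to_bytes "$1") % $(size_to_bytes "$2") )) -eq 0 ]
}

size_to_bytes 16G                                   # prints 17179869184
is_multiple_of 16G 2M && echo "16G is a multiple of 2M"
is_multiple_of 1000M 1G || echo "1000M is not a multiple of 1G"
```

Checking against the interleave width would be the same test with a different second argument; how that width is expressed on a given platform is not covered here.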

-a, --align
    Applications that want to establish dax memory mappings with page
    table entries greater than system base page size (4K on x86) need a
    persistent memory namespace that is sufficiently aligned. For
    "fsdax" and "devdax" mode this defaults to 2M. Note that "devdax"
    mode enforces all mappings to be aligned to this value, i.e. it
    fails unaligned mapping attempts. The "fsdax" alignment setting
    determines the starting alignment of filesystem extents and may
    limit the possible granularities; if a large mapping is not
    possible, it will silently fall back to a smaller page size.

-e, --reconfig=
    Reconfigure an existing namespace. This option is a shortcut for
    the following sequence:

    · Read all parameters from @victim_namespace

    · Destroy @victim_namespace

    · Create @new_namespace merging old parameters with new ones

    Note that the major implication of a destroy-create cycle is
    that data from @victim_namespace is not preserved in
    @new_namespace. The attributes transferred from
    @victim_namespace are the geometry, mode, and name (but not the
    uuid, unless --uuid= is given). No attempt is made to preserve the
    data, and any old data that is visible in @new_namespace is there
    by coincidence, not convention. "Backup and restore" is the only
    reliable method to populate @new_namespace with data from
    @victim_namespace.
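
Because reconfiguration does not preserve data, a backup and restore has to wrap it. The sketch below prints one possible plan without executing anything. The function name, mount point, and archive path are placeholders, and recreating the filesystem is left as a comment because the resulting device depends on the new mode:

```shell
#!/bin/sh
# Print a backup -> reconfigure -> restore plan without executing it.
# $1 namespace, $2 new mode, $3 mount point, $4 backup archive path
plan_reconfig() {
    echo "tar -cf $4 -C $3 ."
    echo "umount $3"
    echo "ndctl create-namespace -f -e $1 --mode=$2"
    echo "# recreate and mount a filesystem on the new device at $3, then:"
    echo "tar -xf $4 -C $3"
}

# Placeholder values for illustration only.
plan_reconfig namespace0.0 fsdax /mnt/pmem /var/tmp/pmem-backup.tar
```

The plan unmounts before reconfiguring and still passes -f, since an unmounted namespace remains active until it is disabled.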

-u, --uuid=
    This option is not recommended, as a new uuid should be generated
    every time a namespace is (re-)created. For recovery scenarios,
    however, the uuid may be specified.

-n, --name=
    For NVDIMM devices that support namespace labels, specify a human
    friendly name for a namespace. This name is available as a device
    attribute for use in udev rules.

-l, --sector-size
    Specify the logical sector size (LBA size) of the Linux block
    device associated with a namespace.

-M, --map=
    A pmem namespace in "fsdax" or "devdax" mode requires allocation of
    per-page metadata. The allocation can be drawn from either:

    · "mem": typical system memory

    · "dev": persistent memory reserved from the namespace

    Given the relative capacities of "Persistent Memory" to "System
    RAM", the allocation defaults to reserving space out of the
    namespace directly ("--map=dev"). The overhead is 64 bytes per 4K
    page (16GB per 1TB) on x86.
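
The overhead figure can be checked with integer arithmetic. This is a minimal sketch using the x86 numbers quoted above (64 bytes of metadata per 4K page); map_overhead_bytes is a hypothetical helper name:

```shell
#!/bin/sh
# Estimate the per-page metadata overhead for a namespace capacity
# given in bytes: 64 bytes for every 4096-byte page.
map_overhead_bytes() {
    echo $(( $1 / 4096 * 64 ))
}

tib=$(( 1024 * 1024 * 1024 * 1024 ))
overhead=$(map_overhead_bytes "$tib")
echo "1 TiB namespace: $(( overhead / 1024 / 1024 / 1024 )) GiB of per-page metadata"
```

The ratio is 64/4096, or 1/64 of the capacity, which is why the default draws the metadata from persistent memory rather than from scarcer system RAM.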

-c, --continue
    Do not stop after creating one namespace. Instead, greedily create
    as many namespaces as possible within the given --bus and --region
    filter restrictions. This will abort if any creation attempt
    results in an error unless --force is also supplied.

-f, --force
    Unless this option is specified, the reconfigure namespace
    operation will fail if the namespace is presently active.
    Specifying --force causes the namespace to be disabled before the
    operation is attempted. However, if the namespace is mounted then
    the disable namespace and reconfigure namespace operations will be
    aborted. The namespace must be unmounted before being reconfigured.
    When used in conjunction with --continue, continue the namespace
    creation loop even if an error is encountered for intermediate
    namespaces.

-L, --autolabel, --no-autolabel
    Legacy NVDIMM devices do not support namespace labels. In that case
    the kernel creates region-sized namespaces that cannot be deleted.
    Their mode can be changed, but they cannot be resized smaller than
    their parent region. This is termed a "label-less namespace". In
    contrast, NVDIMMs and hypervisors that support the ACPI 6.2 label
    area definition (ACPI 6.2 Section 6.5.10 NVDIMM Label Methods)
    support "labelled namespace" operation.

    · There are two cases where the kernel will default to label-less
      operation:

        · The NVDIMM does not support labels

        · The NVDIMM supports labels, but the Label Index Block (see
          UEFI 2.7) is not present and there is no capacity aliasing
          between blk and pmem regions.

    · In the latter case the configuration can be upgraded to
      labelled operation by writing an index block on all DIMMs in a
      region and re-enabling that region. The autolabel capability of
      ndctl create-namespace --reconfig tries to do this by default
      if it can determine that all DIMM capacity is referenced by the
      namespace being reconfigured. It will otherwise fail to
      autolabel and remain in label-less mode if it finds a DIMM
      contributes capacity to more than one region. This check
      prevents inadvertent data loss if that other region is in
      active use. The --autolabel option is implied by default; the
      --no-autolabel option can be used to disable this behavior.
      When automatic labeling fails and labelled operation is still
      desired, the safety policy can be bypassed with the following
      commands. Note that all data on all regions is forfeited by
      running these commands:

          ndctl disable-region all
          ndctl init-labels all
          ndctl enable-region all

-R, --autorecover, --no-autorecover
    By default, if a namespace creation attempt fails, ndctl will
    clean up the partially initialized namespace. Use --no-autorecover
    to disable this behavior for debug and development scenarios where
    it is useful to have the label and info-block state preserved after
    a failure.

-v, --verbose
    Emit debug messages for the namespace creation process.

-r, --region=
    A regionX device name, or a region id number. Restrict the
    operation to the specified region(s). The keyword all can be
    specified to indicate the lack of any restriction; however, this is
    the same as not supplying a --region option at all.

-b, --bus=
    A bus id number, or a provider string (e.g. "ACPI.NFIT"). Restrict
    the operation to the specified bus(es). The keyword all can be
    specified to indicate the lack of any restriction; however, this is
    the same as not supplying a --bus option at all.

COPYRIGHT
Copyright (c) 2016 - 2019, Intel Corporation. License GPLv2: GNU GPL
version 2 <http://gnu.org/licenses/gpl.html>. This is free software: you
are free to change and redistribute it. There is NO WARRANTY, to the
extent permitted by law.

SEE ALSO
ndctl-zero-labels(1), ndctl-init-labels(1), ndctl-disable-namespace(1),
ndctl-enable-namespace(1), UEFI NVDIMM Label Protocol
<http://www.uefi.org/sites/default/files/resources/UEFI_Spec_2_7.pdf>,
Linux Persistent Memory Wiki <https://nvdimm.wiki.kernel.org>

ndctl 2020-03-24 NDCTL-CREATE-NAMESPACE(1)