NDCTL-CREATE-NAMES(1)                              NDCTL-CREATE-NAMES(1)

NAME
ndctl-create-namespace - provision or reconfigure a namespace

SYNOPSIS
ndctl create-namespace [<options>]

DESCRIPTION
The capacity of an NVDIMM REGION (contiguous span of persistent memory)
is accessed via one or more NAMESPACE devices. REGION is the Linux term
for what ACPI and UEFI call a DIMM-interleave-set, or a
system-physical-address-range that is striped (by the memory
controller) across one or more memory modules.

The UEFI specification defines the NVDIMM Label Protocol as the
combination of label area access methods and a data format for
provisioning one or more NAMESPACE objects from a REGION. Note that
label support is optional, and if Linux does not detect the label
capability it will automatically instantiate a "label-less" namespace
per region. Examples of label-less namespaces are the ones created by
the kernel’s memmap=ss!nn command line option (see the nvdimm wiki on
kernel.org), or NVDIMMs without a valid namespace index in their label
area.

Note
    Label-less namespaces lack many of the features of their label-rich
    cousins. For example, their size cannot be modified, and they cannot
    be fully destroyed (i.e. the space reclaimed). Instead, a destroy
    operation zeroes any mode-specific metadata. Finally, for
    create-namespace operations on label-less namespaces, ndctl bypasses
    the region capacity availability checks, and always satisfies the
    request using the full region capacity. The only reconfiguration
    operation supported on a label-less namespace is changing its mode.

A namespace can be provisioned to operate in one of 4 modes: fsdax,
devdax, sector, and raw. Here are the expected usage models for these
modes:

• fsdax: Filesystem-DAX mode is the default mode of a namespace when
  specifying ndctl create-namespace with no options. It creates a
  block device (/dev/pmemX[.Y]) that supports the DAX capabilities of
  Linux filesystems (xfs and ext4 to date). DAX removes the page
  cache from the I/O path and allows mmap(2) to establish direct
  mappings to persistent memory media. The DAX capability enables
  workloads / working-sets that would exceed the capacity of the page
  cache to scale up to the capacity of persistent memory. Workloads
  that fit in page cache or perform bulk data transfers may not see
  benefit from DAX. When in doubt, pick this mode.

• devdax: Device-DAX mode enables similar mmap(2) DAX mapping
  capabilities as Filesystem-DAX. However, instead of a block-device
  that can support a DAX-enabled filesystem, this mode emits a single
  character device file (/dev/daxX.Y). Use this mode to assign
  persistent memory to a virtual-machine, register persistent memory
  for RDMA, or when gigantic mappings are needed.

• sector: Use this mode to host legacy filesystems that do not
  checksum metadata or applications that are not prepared for torn
  sectors after a crash. Expected usage for this mode is for small
  boot volumes. This mode is compatible with other operating systems.

• raw: Raw mode is effectively just a memory disk that does not
  support DAX. Typically this indicates a namespace that was created
  by tooling or another operating system that did not know how to
  create a Linux fsdax or devdax mode namespace. This mode is
  compatible with other operating systems, but again, does not
  support DAX operation.
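
The usage models above can be condensed into a small selection helper. This is only a sketch: pick_mode and its use-case keywords (vm, rdma, huge-mappings, legacy-fs, boot) are hypothetical names invented for illustration and are not part of ndctl. The command is printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: map a use case to a namespace mode, following the
# usage models described in this manual. The keywords are made up here.
pick_mode() {
    case "$1" in
        vm|rdma|huge-mappings) echo devdax ;;  # VM assignment, RDMA, gigantic mappings
        legacy-fs|boot)        echo sector ;;  # protection against torn sectors
        *)                     echo fsdax ;;   # "When in doubt, pick this mode."
    esac
}

mode=$(pick_mode rdma)
# Print the resulting command line for review instead of running it.
echo "ndctl create-namespace --mode=$mode"
```

Raw mode is deliberately absent from the helper, since the text describes it as something inherited from other tooling rather than a mode to choose.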

EXAMPLES
Create a maximally sized pmem namespace in fsdax mode (the default)

    ndctl create-namespace

Convert namespace0.0 to sector mode

    ndctl create-namespace -f -e namespace0.0 --mode=sector
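
As a further illustration, the following sketch assembles a command line from the --region, --mode, --size and --align options and prints it for review rather than executing it. The helper name build_create_cmd and the values region0, region1, 4G and 2M are examples only, and passing a suffixed value such as 2M to --align is assumed to work the same way as it does for --size.

```shell
#!/bin/sh
# Sketch: build a create-namespace command line from documented options
# and print it (dry run). All values below are illustrative.
build_create_cmd() {
    region=$1
    mode=$2
    size=${3:-}    # optional
    align=${4:-}   # optional
    cmd="ndctl create-namespace --region=$region --mode=$mode"
    if [ -n "$size" ]; then
        cmd="$cmd --size=$size"
    fi
    if [ -n "$align" ]; then
        cmd="$cmd --align=$align"
    fi
    echo "$cmd"
}

build_create_cmd region0 devdax 4G 2M
build_create_cmd region1 fsdax
```

Printing the command first makes it easy to check the arguments before running anything that changes namespace configuration.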

OPTIONS
-m, --mode=

    • "raw": expose the namespace capacity directly with limitations.
      A raw pmem namespace does not support sector atomicity (see
      "sector" mode below). A raw pmem namespace may have limited to
      no dax support depending on the kernel. In other words,
      operations like direct-I/O targeting a dax buffer may fail for
      a pmem namespace in raw mode, or may be indirected through a
      page-cache buffer. See "fsdax" and "devdax" mode for dax
      operation.

    • "sector": persistent memory, given that it is byte addressable,
      does not support sector atomicity. The problematic aspect of
      sector tearing is that most applications do not know they have
      an atomic sector update dependency. At least a disk rarely ever
      tears sectors, and if it does it almost certainly returns a
      checksum error on access. Persistent memory devices will always
      tear and always silently. Until an application is audited to be
      robust in the presence of sector-tearing, this mode (also known
      as "safe" or "btt" mode) is recommended. It imposes some
      performance overhead and disables the dax capability.

    • "fsdax": A pmem namespace in this mode supports dax operation
      with a block-device based filesystem (in previous ndctl
      releases this mode was named "memory" mode). This mode comes at
      the cost of allocating per-page metadata. The capacity can be
      allocated from "System RAM", or from a reserved portion of
      "Persistent Memory" (see the --map= option). NOTE: A filesystem
      that supports DAX is required for dax operation. If the raw
      block device (/dev/pmemX) is used directly without a
      filesystem, it will use the page cache. See "devdax" mode for
      raw device access that supports dax.

    • "devdax": The device-dax character device interface is a
      statically allocated / raw access analogue of filesystem-dax
      (in previous ndctl releases this mode was named "dax" mode). It
      allows memory ranges to be mapped without need of an
      intervening filesystem. The device-dax interface is strict,
      precise and predictable. Specifically the interface:

        • Guarantees fault granularity with respect to a given page
          size (4K, 2M, or 1G on x86) set at configuration time.

        • Enforces deterministic behavior by being strict about what
          fault scenarios are supported. I.e. if a device is
          configured with a 2M alignment, an attempt to fault a 4K
          aligned offset will result in SIGBUS.

    Note that both fsdax and devdax mode require 16MiB physical
    alignment to be cross-arch compatible. By default ndctl will block
    attempts to create namespaces in these modes when the physical
    starting address of the namespace is not 16MiB aligned. The
    --force option tries to override this constraint if the platform
    supports a smaller alignment, but this is not recommended.
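
Both alignment rules above reduce to a modulo test. The sketch below shows the arithmetic only; the helper name is_aligned and the start address 0x240000000 are made up for illustration, and nothing here queries ndctl or the kernel.

```shell
#!/bin/sh
# is_aligned VALUE ALIGN: succeed when VALUE is a multiple of ALIGN.
is_aligned() {
    [ $(( $1 % $2 )) -eq 0 ]
}

SZ_4K=$(( 4 * 1024 ))
SZ_2M=$(( 2 * 1024 * 1024 ))
SZ_16M=$(( 16 * 1024 * 1024 ))

# Example physical start address (invented for this sketch).
start=$(( 0x240000000 ))
if is_aligned "$start" "$SZ_16M"; then
    echo "start address is 16MiB aligned"
else
    echo "start address is not 16MiB aligned"
fi

# A 4K-aligned offset checked against a 2M device alignment.
offset=$SZ_4K
if ! is_aligned "$offset" "$SZ_2M"; then
    echo "offset $offset does not satisfy 2M alignment"
fi
```

With these example values the first check passes and the second reports the 4K offset as unaligned, which is the case the text says results in SIGBUS on a devdax device configured for 2M.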

-s, --size=
    For NVDIMM devices that support namespace labels, set the namespace
    size in bytes. Otherwise it defaults to the maximum size specified
    by platform firmware. This option supports the suffixes "k" or "K"
    for KiB, "m" or "M" for MiB, "g" or "G" for GiB and "t" or "T" for
    TiB.

    For pmem namespaces the size must be a multiple of the
    interleave-width and the namespace alignment (see below).
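
To make the suffix rules concrete, here is a minimal sketch that converts a suffixed size into bytes and checks it against an alignment. The helper name to_bytes is hypothetical, the values 6G and 2M are examples, and the interleave-width check is left out because that value is platform specific.

```shell
#!/bin/sh
# to_bytes SIZE: convert a size with an optional k/K, m/M, g/G or t/T
# suffix (binary units, as documented for --size) into bytes.
to_bytes() {
    num=${1%[kKmMgGtT]}
    case "$1" in
        *[kK]) echo $(( num * 1024 )) ;;
        *[mM]) echo $(( num * 1024 * 1024 )) ;;
        *[gG]) echo $(( num * 1024 * 1024 * 1024 )) ;;
        *[tT]) echo $(( num * 1024 * 1024 * 1024 * 1024 )) ;;
        *)     echo "$num" ;;
    esac
}

size=$(to_bytes 6G)
align=$(to_bytes 2M)
if [ $(( size % align )) -eq 0 ]; then
    echo "$size bytes is a multiple of the $align byte alignment"
else
    echo "$size bytes is not a multiple of the $align byte alignment"
fi
```

A size that fails this check would need to be rounded to a multiple of the alignment before being passed to --size.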

-a, --align
    Applications that want to establish dax memory mappings with page
    table entries greater than system base page size (4K on x86) need a
    persistent memory namespace that is sufficiently aligned. For
    "fsdax" and "devdax" mode this defaults to 2M. Note that "devdax"
    mode enforces all mappings to be aligned to this value, i.e. it
    fails unaligned mapping attempts. The "fsdax" alignment setting
    determines the starting alignment of filesystem extents and may
    limit the possible granularities; if a large mapping is not
    possible it will silently fall back to a smaller page size.

-e, --reconfig=
    Reconfigure an existing namespace. This option is a shortcut for
    the following sequence:

    • Read all parameters from @victim_namespace

    • Destroy @victim_namespace

    • Create @new_namespace merging old parameters with new ones

    Note that the major implication of a destroy-create cycle is that
    data from @victim_namespace is not preserved in @new_namespace. The
    attributes transferred from @victim_namespace are the geometry,
    mode, and name (not uuid without --uuid=). No attempt is made to
    preserve the data, and any old data that is visible in
    @new_namespace is by coincidence, not convention. "Backup and
    restore" is the only reliable method to populate @new_namespace
    with data from @victim_namespace.
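
Because a reconfiguration does not preserve data, it helps to write the whole procedure down before touching the namespace. The sketch below only prints such a plan. The helper name reconfig_plan and the placeholders <backup_cmd> and <restore_cmd> are hypothetical and stand for whatever backup tooling is in use; only the ndctl line is taken from this manual.

```shell
#!/bin/sh
# Print (do not run) a backup / reconfigure / restore plan.
# <backup_cmd> and <restore_cmd> are placeholders, not real commands.
reconfig_plan() {
    ns=$1
    mode=$2
    echo "1. <backup_cmd>    # save the data held in $ns"
    echo "2. ndctl create-namespace -f -e $ns --mode=$mode"
    echo "3. <restore_cmd>   # repopulate the new namespace"
}

reconfig_plan namespace0.0 devdax
```

The -f in step 2 matters only when the namespace is active, and any filesystem on it has to be unmounted beforehand (see --force).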

-u, --uuid=
    This option is not recommended as a new uuid should be generated
    every time a namespace is (re-)created. For recovery scenarios
    however the uuid may be specified.

-n, --name=
    For NVDIMM devices that support namespace labels, specify a human
    friendly name for a namespace. This name is available as a device
    attribute for use in udev rules.

-l, --sector-size
    Specify the logical sector size (LBA size) of the Linux block
    device associated with a namespace.

-M, --map=
    A pmem namespace in "fsdax" or "devdax" mode requires allocation of
    per-page metadata. The allocation can be drawn from either:

    • "mem": typical system memory

    • "dev": persistent memory reserved from the namespace

    Given relative capacities of "Persistent Memory" to "System RAM",
    the allocation defaults to reserving space out of the namespace
    directly ("--map=dev"). The overhead is 64 bytes per 4K (16GB per
    1TB) on x86.
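
The overhead figure can be reproduced with a short calculation. This sketch applies the x86 numbers quoted above (64 bytes of metadata per 4K page); the helper name map_overhead_bytes is made up for illustration.

```shell
#!/bin/sh
# map_overhead_bytes SIZE_BYTES: per-page metadata needed for a namespace
# of SIZE_BYTES, at 64 bytes per 4K page (the x86 figure quoted above).
map_overhead_bytes() {
    echo $(( $1 / 4096 * 64 ))
}

one_tib=$(( 1024 * 1024 * 1024 * 1024 ))
overhead=$(map_overhead_bytes "$one_tib")
echo "$(( overhead / 1024 / 1024 / 1024 )) GiB of metadata per TiB of namespace"
```

The ratio works out to 1/64 of the namespace capacity, which is why the default draws the allocation from the namespace itself rather than from system memory.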

-c, --continue
    Do not stop after creating one namespace. Instead, greedily create
    as many namespaces as possible within the given --bus and --region
    filter restrictions. This will abort if any creation attempt
    results in an error unless --force is also supplied.

-f, --force
    Unless this option is specified the reconfigure namespace operation
    will fail if the namespace is presently active. Specifying --force
    causes the namespace to be disabled before the operation is
    attempted. However, if the namespace is mounted then the disable
    namespace and reconfigure namespace operations will be aborted. The
    namespace must be unmounted before being reconfigured. When used in
    conjunction with --continue, continue the namespace creation loop
    even if an error is encountered for intermediate namespaces.

-L, --autolabel, --no-autolabel
    Legacy NVDIMM devices do not support namespace labels. In that case
    the kernel creates region-sized namespaces that cannot be deleted.
    Their mode can be changed, but they cannot be resized smaller than
    their parent region. This is termed a "label-less namespace". In
    contrast, NVDIMMs and hypervisors that support the ACPI 6.2 label
    area definition (ACPI 6.2 Section 6.5.10 NVDIMM Label Methods)
    support "labelled namespace" operation.

    • There are two cases where the kernel will default to label-less
      operation:

        • The NVDIMM does not support labels

        • The NVDIMM supports labels, but the Label Index Block (see
          UEFI 2.7) is not present.

    • In the latter case the configuration can be upgraded to
      labelled operation by writing an index block on all DIMMs in a
      region and re-enabling that region. The autolabel capability of
      ndctl create-namespace --reconfig tries to do this by default
      if it can determine that all DIMM capacity is referenced by the
      namespace being reconfigured. It will otherwise fail to
      autolabel and remain in label-less mode if it finds a DIMM
      contributes capacity to more than one region. This check
      prevents inadvertent data loss if that other region is in
      active use. The --autolabel option is implied by default; the
      --no-autolabel option can be used to disable this behavior.
      When automatic labeling fails and labelled operation is still
      desired, the safety policy can be bypassed by the following
      commands. Note that all data on all regions is forfeited by
      running these commands:

          ndctl disable-region all
          ndctl init-labels all
          ndctl enable-region all

-R, --autorecover, --no-autorecover
    By default, if a namespace creation attempt fails, ndctl will
    clean up the partially initialized namespace. Use --no-autorecover
    to disable this behavior for debug and development scenarios where
    it is useful to have the label and info-block state preserved after
    a failure.

-v, --verbose
    Emit debug messages for the namespace creation process

-r, --region=
    A regionX device name, or a region id number. Restrict the
    operation to the specified region(s). The keyword all can be
    specified to indicate the lack of any restriction, however this is
    the same as not supplying a --region option at all.

-b, --bus=
    A bus id number, or a provider string (e.g. "ACPI.NFIT"). Restrict
    the operation to the specified bus(es). The keyword all can be
    specified to indicate the lack of any restriction, however this is
    the same as not supplying a --bus option at all.

COPYRIGHT
Copyright © 2016 - 2022, Intel Corporation. License GPLv2: GNU GPL
version 2 http://gnu.org/licenses/gpl.html. This is free software: you
are free to change and redistribute it. There is NO WARRANTY, to the
extent permitted by law.

SEE ALSO
ndctl-zero-labels(1), ndctl-init-labels(1),
ndctl-disable-namespace(1), ndctl-enable-namespace(1),
UEFI NVDIMM Label Protocol[1], Linux Persistent Memory Wiki[2]

NOTES
1. UEFI NVDIMM Label Protocol
   http://www.uefi.org/sites/default/files/resources/UEFI_Spec_2_7.pdf

2. Linux Persistent Memory Wiki
   https://nvdimm.wiki.kernel.org

03/08/2022                                         NDCTL-CREATE-NAMES(1)