ndctl-create-namespace(1)

NDCTL-CREATE-NAMES(1)            ndctl Manual            NDCTL-CREATE-NAMES(1)

NAME

       ndctl-create-namespace - provision or reconfigure a namespace

SYNOPSIS

       ndctl create-namespace [<options>]

THEORY OF OPERATION

       The capacity of an NVDIMM REGION (contiguous span of persistent memory)
       is accessed via one or more NAMESPACE devices. REGION is the Linux term
       for what ACPI and UEFI call a DIMM-interleave-set, or a
       system-physical-address-range that is striped (by the memory
       controller) across one or more memory modules.

       The UEFI specification defines the NVDIMM Label Protocol as the
       combination of label area access methods and a data format for
       provisioning one or more NAMESPACE objects from a REGION. Note that
       label support is optional, and if Linux does not detect the label
       capability it will automatically instantiate a "label-less" namespace
       per region. Examples of label-less namespaces are the ones created by
       the kernel’s memmap=ss!nn command line option (see the nvdimm wiki on
       kernel.org), or NVDIMMs without a valid namespace index in their label
       area.

           Note
           Label-less namespaces lack many of the features of their label-rich
           cousins. For example, their size cannot be modified, and they cannot
           be fully destroyed (i.e. the space reclaimed). A destroy operation
           will zero any mode-specific metadata. Finally, for create-namespace
           operations on label-less namespaces, ndctl bypasses the region
           capacity availability checks, and always satisfies the request
           using the full region capacity. The only reconfiguration operation
           supported on a label-less namespace is changing its mode.

       A namespace can be provisioned to operate in one of four modes: fsdax,
       devdax, sector, and raw. Here are the expected usage models for these
       modes:

       •   fsdax: Filesystem-DAX mode is the default mode of a namespace when
           specifying ndctl create-namespace with no options. It creates a
           block device (/dev/pmemX[.Y]) that supports the DAX capabilities of
           Linux filesystems (xfs and ext4 to date). DAX removes the page
           cache from the I/O path and allows mmap(2) to establish direct
           mappings to persistent memory media. The DAX capability enables
           workloads / working-sets that would exceed the capacity of the page
           cache to scale up to the capacity of persistent memory. Workloads
           that fit in page cache or perform bulk data transfers may not see
           benefit from DAX. When in doubt, pick this mode.

       •   devdax: Device-DAX mode enables mmap(2) DAX mapping capabilities
           similar to Filesystem-DAX. However, instead of a block-device
           that can support a DAX-enabled filesystem, this mode emits a single
           character device file (/dev/daxX.Y). Use this mode to assign
           persistent memory to a virtual-machine, register persistent memory
           for RDMA, or when gigantic mappings are needed.

       •   sector: Use this mode to host legacy filesystems that do not
           checksum metadata or applications that are not prepared for torn
           sectors after a crash. Expected usage for this mode is for small
           boot volumes. This mode is compatible with other operating systems.

       •   raw: Raw mode is effectively just a memory disk that does not
           support DAX. Typically this indicates a namespace that was created
           by tooling or another operating system that did not know how to
           create a Linux fsdax or devdax mode namespace. This mode is
           compatible with other operating systems, but again, does not
           support DAX operation.

EXAMPLES

       Create a maximally sized pmem namespace in fsdax mode (the default)

           ndctl create-namespace

       Convert namespace0.0 to sector mode

           ndctl create-namespace -f -e namespace0.0 --mode=sector

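       A few further invocations, combining options described under OPTIONS,
       are sketched below. This is an illustration rather than a recommended
       procedure: the run helper and the NDCTL_RUN variable are hypothetical
       names introduced here so that the commands are only printed by
       default, and region0 and namespace0.0 are placeholder device names.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of ndctl): print the command unless
# NDCTL_RUN=1 is set. Actually running the commands needs root privileges
# and real or emulated persistent memory.
run() {
    if [ "${NDCTL_RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Create a devdax namespace whose mappings are 2M aligned
run ndctl create-namespace --mode=devdax --align=2M

# Create a 16GiB fsdax namespace on region0, per-page metadata in system RAM
run ndctl create-namespace --mode=fsdax --region=region0 --size=16G --map=mem

# Reconfigure namespace0.0 to devdax mode (existing data is not preserved)
run ndctl create-namespace -f -e namespace0.0 --mode=devdax
```

       Without NDCTL_RUN=1 the script only echoes the three command lines,
       which makes it safe to try on a machine without NVDIMMs.
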

OPTIONS

       -t, --type=
           Create a pmem or blk namespace (subject to available capacity). A
           pmem namespace supports the dax (direct access) capability to
           mmap(2) persistent memory directly into a process address space. A
           blk namespace accesses persistent memory through a
           block-window-aperture. Compared to pmem it supports a traditional
           storage error model (EIO on error rather than a cpu exception on a
           bad memory access), but it does not support dax.

       -m, --mode=

           •   "raw": expose the namespace capacity directly with limitations.
               Neither a raw pmem namespace nor a raw blk namespace supports
               sector atomicity by default (see "sector" mode below). A raw
               pmem namespace may have limited to no dax support depending on
               the kernel. In other words, operations like direct-I/O
               targeting a dax buffer may fail for a pmem namespace in raw
               mode, or may be routed indirectly through a page-cache buffer.
               See "fsdax" and "devdax" mode for dax operation.

           •   "sector": persistent memory, given that it is byte addressable,
               does not support sector atomicity. The problematic aspect of
               sector tearing is that most applications do not know they have
               an atomic sector update dependency. At least a disk rarely ever
               tears sectors, and if it does it almost certainly returns a
               checksum error on access. Persistent memory devices will always
               tear and always silently. Until an application is audited to be
               robust in the presence of sector tearing, "sector" mode (also
               known as "safe" or "btt" mode) is recommended. This imposes
               some performance overhead and disables the dax capability.

           •   "fsdax": A pmem namespace in this mode supports dax operation
               with a block-device based filesystem (in previous ndctl
               releases this mode was named "memory" mode). This mode comes at
               the cost of allocating per-page metadata. The capacity for that
               metadata can be allocated from "System RAM", or from a reserved
               portion of "Persistent Memory" (see the --map= option). NOTE: A
               filesystem that supports DAX is required for dax operation. If
               the raw block device (/dev/pmemX) is used directly without a
               filesystem, it will use the page cache. See "devdax" mode for
               raw device access that supports dax.

           •   "devdax": The device-dax character device interface is a
               statically allocated / raw access analogue of filesystem-dax
               (in previous ndctl releases this mode was named "dax" mode). It
               allows memory ranges to be mapped without need of an
               intervening filesystem. The device-dax interface is strict,
               precise and predictable. Specifically the interface:

               •   Guarantees fault granularity with respect to a given page
                   size (4K, 2M, or 1G on x86) set at configuration time.

               •   Enforces deterministic behavior by being strict about what
                   fault scenarios are supported. I.e. if a device is
                   configured with a 2M alignment, an attempt to fault a 4K
                   aligned offset will result in SIGBUS.

           Note both fsdax and devdax mode require 16MiB physical alignment to
           be cross-arch compatible. By default ndctl will block attempts to
           create namespaces in these modes when the physical starting address
           of the namespace is not 16MiB aligned. The --force option tries to
           override this constraint if the platform supports a smaller
           alignment, but this is not recommended.

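           Both alignment rules above (fault offsets on a device-dax device,
           and the 16MiB physical start address) come down to a modulo check.
           The sketch below illustrates that arithmetic only; is_aligned is a
           hypothetical helper, and the address and offset are made-up sample
           values, not something read from a real namespace.

```shell
#!/bin/sh
# Hypothetical helper: succeed when value ($1) is a whole multiple of
# alignment ($2). Both are byte counts; hex constants such as 0x1000 work.
is_aligned() {
    [ $(( $1 % $2 )) -eq 0 ]
}

MIB=$((1024 * 1024))

# Sample physical start address (4GiB) checked against the 16MiB rule
if is_aligned 0x100000000 $((16 * MIB)); then
    echo "start address is 16MiB aligned"
fi

# A 4K-aligned offset on a device configured with 2M alignment: this is the
# case that the devdax interface rejects with SIGBUS
if ! is_aligned 4096 $((2 * MIB)); then
    echo "offset 4096 is not 2M aligned"
fi
```
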
       -s, --size=
           For NVDIMM devices that support namespace labels, set the namespace
           size in bytes. Otherwise it defaults to the maximum size specified
           by platform firmware. This option supports the suffixes "k" or "K"
           for KiB, "m" or "M" for MiB, "g" or "G" for GiB and "t" or "T" for
           TiB.

           For pmem namespaces the size must be a multiple of the
           interleave-width and the namespace alignment (see below).

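           The suffix and multiple rules above can be checked before invoking
           ndctl. The following is a minimal sketch in POSIX sh;
           size_to_bytes and is_multiple_of are hypothetical helper names,
           not part of ndctl, and the interleave-width requirement is not
           modeled here.

```shell
#!/bin/sh
# Hypothetical helper: convert a size such as "16G" or "512m" to bytes using
# the binary suffixes listed above. A bare number is taken as bytes.
size_to_bytes() {
    num=${1%[kKmMgGtT]}          # strip one trailing suffix letter, if any
    case "$1" in
        *[kK]) echo $((num * 1024)) ;;
        *[mM]) echo $((num * 1024 * 1024)) ;;
        *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
        *[tT]) echo $((num * 1024 * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

# Hypothetical helper: succeed when size ($1) is a whole multiple of the
# namespace alignment ($2); both accept the same suffixes.
is_multiple_of() {
    [ $(( $(size_to_bytes "$1") % $(size_to_bytes "$2") )) -eq 0 ]
}

size_to_bytes 16G                # prints 17179869184
if is_multiple_of 16G 2M; then
    echo "16G is a multiple of 2M"
fi
```
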
       -a, --align
           Applications that want to establish dax memory mappings with page
           table entries greater than system base page size (4K on x86) need a
           persistent memory namespace that is sufficiently aligned. For
           "fsdax" and "devdax" mode this defaults to 2M. Note that "devdax"
           mode enforces all mappings to be aligned to this value, i.e. it
           fails unaligned mapping attempts. The "fsdax" alignment setting
           determines the starting alignment of filesystem extents and may
           limit the possible granularities. If a large mapping is not
           possible it will silently fall back to a smaller page size.

       -e, --reconfig=
           Reconfigure an existing namespace. This option is a shortcut for
           the following sequence:

           •   Read all parameters from @victim_namespace

           •   Destroy @victim_namespace

           •   Create @new_namespace merging old parameters with new ones

           Note that the major implication of a destroy-create cycle is that
           data from @victim_namespace is not preserved in @new_namespace. The
           attributes transferred from @victim_namespace are the geometry,
           mode, and name (not uuid without --uuid=). No attempt is made to
           preserve the data, and any old data that is visible in
           @new_namespace is by coincidence, not convention. "Backup and
           restore" is the only reliable method to populate @new_namespace
           with data from @victim_namespace.

       -u, --uuid=
           This option is not recommended as a new uuid should be generated
           every time a namespace is (re-)created. For recovery scenarios,
           however, the uuid may be specified.

       -n, --name=
           For NVDIMM devices that support namespace labels, specify a human
           friendly name for a namespace. This name is available as a device
           attribute for use in udev rules.

       -l, --sector-size
           Specify the logical sector size (LBA size) of the Linux block
           device associated with a namespace.

       -M, --map=
           A pmem namespace in "fsdax" or "devdax" mode requires allocation of
           per-page metadata. The allocation can be drawn from either:

           •   "mem": typical system memory

           •   "dev": persistent memory reserved from the namespace

           Given the relative capacities of "Persistent Memory" to "System
           RAM", the allocation defaults to reserving space out of the
           namespace directly ("--map=dev"). The overhead is 64-bytes per 4K
           (16GB per 1TB) on x86.

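           The overhead figure quoted above follows directly from the
           per-page cost. As a rough worked example, assuming 64 bytes of
           metadata per 4K page and binary units (map_overhead_bytes is a
           hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: per-page metadata needed for a namespace of $1 bytes,
# assuming 64 bytes of metadata for every 4096-byte page (the x86 figure).
map_overhead_bytes() {
    echo $(( $1 / 4096 * 64 ))
}

GIB=$((1024 * 1024 * 1024))
TIB=$((1024 * GIB))

# 2^40 / 2^12 pages * 2^6 bytes = 2^34 bytes = 16 GiB
echo "$(( $(map_overhead_bytes "$TIB") / GIB )) GiB of metadata per TiB"
```

           This prints "16 GiB of metadata per TiB", matching the 16GB per 1TB
           stated above, which is the capacity that --map=dev reserves out of
           the namespace and --map=mem takes from system memory.
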
       -c, --continue
           Do not stop after creating one namespace. Instead, greedily create
           as many namespaces as possible within the given --bus and --region
           filter restrictions. This will abort if any creation attempt
           results in an error unless --force is also supplied.

       -f, --force
           Unless this option is specified the reconfigure namespace operation
           will fail if the namespace is presently active. Specifying --force
           causes the namespace to be disabled before the operation is
           attempted. However, if the namespace is mounted then the disable
           namespace and reconfigure namespace operations will be aborted. The
           namespace must be unmounted before being reconfigured. When used in
           conjunction with --continue, continue the namespace creation loop
           even if an error is encountered for intermediate namespaces.

       -L, --autolabel, --no-autolabel
           Legacy NVDIMM devices do not support namespace labels. In that case
           the kernel creates region-sized namespaces that cannot be deleted.
           Their mode can be changed, but they cannot be resized smaller than
           their parent region. This is termed a "label-less namespace". In
           contrast, NVDIMMs and hypervisors that support the ACPI 6.2 label
           area definition (ACPI 6.2 Section 6.5.10 NVDIMM Label Methods)
           support "labelled namespace" operation.

           •   There are two cases where the kernel will default to label-less
               operation:

               •   The NVDIMM does not support labels

               •   The NVDIMM supports labels, but the Label Index Block (see
                   UEFI 2.7) is not present and there is no capacity aliasing
                   between blk and pmem regions.

           •   In the latter case the configuration can be upgraded to
               labelled operation by writing an index block on all DIMMs in a
               region and re-enabling that region. The autolabel capability of
               ndctl create-namespace --reconfig tries to do this by default
               if it can determine that all DIMM capacity is referenced by the
               namespace being reconfigured. It will otherwise fail to
               autolabel and remain in label-less mode if it finds a DIMM
               contributes capacity to more than one region. This check
               prevents inadvertent data loss if that other region is in
               active use. The --autolabel option is implied by default; the
               --no-autolabel option can be used to disable this behavior.
               When automatic labeling fails and labelled operation is still
               desired, the safety policy can be bypassed by the following
               commands. Note that all data on all regions is forfeited by
               running these commands:

                   ndctl disable-region all
                   ndctl init-labels all
                   ndctl enable-region all

       -R, --autorecover, --no-autorecover
           By default, if a namespace creation attempt fails, ndctl will
           clean up the partially initialized namespace. Use --no-autorecover
           to disable this behavior for debug and development scenarios where
           it is useful to have the label and info-block state preserved
           after a failure.

       -v, --verbose
           Emit debug messages for the namespace creation process.

       -r, --region=
           A regionX device name, or a region id number. Restrict the
           operation to the specified region(s). The keyword all can be
           specified to indicate the lack of any restriction, however this is
           the same as not supplying a --region option at all.

       -b, --bus=
           A bus id number, or a provider string (e.g. "ACPI.NFIT"). Restrict
           the operation to the specified bus(es). The keyword all can be
           specified to indicate the lack of any restriction, however this is
           the same as not supplying a --bus option at all.

COPYRIGHT

       Copyright © 2016 - 2020, Intel Corporation. License GPLv2: GNU GPL
       version 2 http://gnu.org/licenses/gpl.html. This is free software: you
       are free to change and redistribute it. There is NO WARRANTY, to the
       extent permitted by law.

NOTES

        1. UEFI NVDIMM Label Protocol
           http://www.uefi.org/sites/default/files/resources/UEFI_Spec_2_7.pdf

        2. Linux Persistent Memory Wiki
           https://nvdimm.wiki.kernel.org

ndctl 71.1                        07/22/2021             NDCTL-CREATE-NAMES(1)