DAXCTL-RECONFIGURE()                                      DAXCTL-RECONFIGURE()

NAME

       daxctl-reconfigure-device - Reconfigure a dax device into a different
       mode

SYNOPSIS

       daxctl reconfigure-device <dax0.0> [<dax1.0>...<daxY.Z>] [<options>]

DESCRIPTION

       Reconfigure the operational mode of a dax device. This can be used to
       convert a regular devdax mode device to the system-ram mode, which
       arranges for the dax range to be hot-plugged into the system as regular
       memory.

           Note
           This is a destructive operation. Any data on the dax device will be
           lost.

           Note
           Device reconfiguration depends on the dax-bus device model. See
           daxctl-migrate-device-model(1) for more information. If dax-class
           is in use (via the dax_pmem_compat driver), the reconfiguration
           will fail with an error such as the following:

           # daxctl reconfigure-device --mode=system-ram --region=0 all
           libdaxctl: daxctl_dev_disable: dax3.0: error: device model is dax-class
           dax3.0: disable failed: Operation not supported
           error reconfiguring devices: Operation not supported
           reconfigured 0 devices

       daxctl-reconfigure-device nominally expects that it will online new
       memory blocks as movable, so that kernel data doesn’t make it into this
       memory. However, other agents may be configured to automatically
       online new hot-plugged memory as it appears. The most notable are the
       /sys/devices/system/memory/auto_online_blocks configuration and system
       udev rules. If such an agent races to online memory sections, daxctl
       checks whether the blocks were onlined as movable memory. If they were
       not, and the memory blocks are found to be in a different zone, a
       warning is displayed. If a different agent should control the onlining
       of memory blocks, and the associated memory zone, use the --no-online
       option described below. This limits the device reconfiguration
       operation to hot-plugging the memory, without then onlining it.
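
       As an illustration of what another agent might do after --no-online,
       the sketch below onlines a node's memory blocks as movable and then
       repeats the movable-zone check described above. It is a minimal
       sketch, not daxctl's implementation, and assumes the kernel
       memory-hotplug sysfs interface: per-node memoryN links, a writable
       state attribute per block, and a valid_zones attribute that names the
       owning zone (e.g. Movable) for an online block. The function names and
       the optional sysfs-root argument are hypothetical.

```shell
# Hypothetical helpers (not part of daxctl). They assume the kernel's
# memory-hotplug sysfs layout: /sys/devices/system/node/nodeN/memoryM links,
# each block exposing "state" and "valid_zones". The optional last argument
# overrides the sysfs root so the logic can be tried on a mock tree.

# Bring every memory block of node $1 into state $2 ("online_movable",
# "online_kernel", "online" or "offline"), skipping blocks already there.
set_node_block_state() {
    node=$1 want=$2 sysroot=${3:-/sys/devices/system}
    for blk in "$sysroot/node/node$node"/memory[0-9]*; do
        [ -w "$blk/state" ] || continue     # also skips an unexpanded glob
        cur=$(cat "$blk/state")
        # Reading "state" back reports "online", not the online_* variant.
        case $want in
            offline) [ "$cur" = offline ] && continue ;;
            *)       [ "$cur" = online ] && continue ;;
        esac
        echo "$want" > "$blk/state" || return 1
    done
    return 0
}

# Warn about online blocks of node $1 that are not in ZONE_MOVABLE and
# return 1 if any are found. For an online block, valid_zones names the
# zone that manages it.
check_node_movable() {
    node=$1 sysroot=${2:-/sys/devices/system}
    status=0
    for blk in "$sysroot/node/node$node"/memory[0-9]*; do
        [ -r "$blk/state" ] && [ -r "$blk/valid_zones" ] || continue
        [ "$(cat "$blk/state")" = online ] || continue
        zone=$(cat "$blk/valid_zones")
        if [ "$zone" != Movable ]; then
            echo "warning: ${blk##*/} on node$node is in zone $zone" >&2
            status=1
        fi
    done
    return $status
}
```

       The node number is the target_node reported by daxctl. With offline as
       the state, the same helper is a manual analogue of what --force
       attempts before converting back to devdax. Both need root privileges:

           # set_node_block_state 2 online_movable && check_node_movable 2
           # set_node_block_state 2 offline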

       If daxctl detects a kernel policy to auto-online blocks (via
       /sys/devices/system/memory/auto_online_blocks), reconfiguring to
       system-ram fails. This can be overridden with --force.
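
       A script can test for such a policy before reconfiguring. The
       following minimal sketch assumes auto_online_blocks holds one of the
       kernel's block states (offline, online, online_kernel or
       online_movable); the function name auto_online_disabled is
       hypothetical, and daxctl's own check may differ in detail.

```shell
# Hypothetical pre-flight check (not part of daxctl): succeed only when the
# kernel leaves newly hot-plugged memory blocks offline, so nothing races
# daxctl to online them. $1 overrides the sysfs path for testing.
auto_online_disabled() {
    policy_file=${1:-/sys/devices/system/memory/auto_online_blocks}
    # Without this file there is no kernel auto-online policy to race with.
    [ -r "$policy_file" ] || return 0
    policy=$(cat "$policy_file")
    case $policy in
        offline)
            return 0 ;;
        *)  # online, online_kernel or online_movable
            echo "kernel auto-online policy is '$policy'" >&2
            return 1 ;;
    esac
}
```

           # auto_online_disabled && daxctl reconfigure-device --mode=system-ram dax0.0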

THEORY OF OPERATION

       The kernel device-dax subsystem surfaces character devices that provide
       DAX-access (direct mappings sans page-cache buffering) to a given
       memory region. The devices are named /dev/daxX.Y where X is a region-id
       and Y is an instance-id within that region. There are two mechanisms
       that trigger device-dax instances to appear:

        1. Persistent Memory (PMEM) namespace configured in "devdax" mode. See
           "ndctl create-namespace --help" and CONFIG_DEV_DAX_PMEM[1]. In this
           case the device-dax instance is statically sized to its host memory
           region, which is bounded to the physical address range of the host
           namespace.

        2. Soft Reserved memory enumerated by platform firmware. On EFI
           systems this is communicated via the so-called EFI_MEMORY_SP
           "Special Purpose" attribute. See CONFIG_DEV_DAX_HMEM[1]. In this
           case the device-dax instance(s) associated with the given memory
           region can be resized and divided into multiple devices.
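
       The daxX.Y naming can be decoded with plain shell parameter expansion.
       The minimal sketch below lists device-dax instances with their region
       and instance ids; it assumes the dax-bus device model, with devices
       under /sys/bus/dax/devices/ and a size attribute per device, and the
       function name list_dax_devices is hypothetical.

```shell
# Hypothetical helper: list device-dax instances with the region id (X) and
# instance id (Y) parsed from their daxX.Y names, plus the size attribute.
# Assumes the dax-bus sysfs layout; $1 overrides the bus directory for testing.
list_dax_devices() {
    busdir=${1:-/sys/bus/dax/devices}
    for dev in "$busdir"/dax*.*; do
        [ -e "$dev" ] || continue          # no devices: glob did not expand
        name=${dev##*/}                    # e.g. dax0.1
        ids=${name#dax}                    # e.g. 0.1
        region=${ids%%.*}
        instance=${ids#*.}
        size=$(cat "$dev/size" 2>/dev/null || echo unknown)
        printf '%s region=%s instance=%s size=%s\n' \
            "$name" "$region" "$instance" "$size"
    done
}
```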

       In the Soft Reservation case, the expectation for EFI + ACPI based
       platforms is that, in addition to the EFI_MEMORY_SP attribute, the
       firmware also creates distinct ACPI proximity domains for any address
       range that has different performance characteristics than default
       "System RAM". The SRAT defines the proximity domain, the SLIT
       communicates relative distance to other proximity domains, and the HMAT
       is populated with nominal read/write latency and read/write bandwidth
       data. That HMAT data is emitted to the kernel log on bootup, and also
       exported to sysfs. See NUMAPERF[2] for the runtime representation of
       CPU to Memory node performance details.
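
       The sysfs copy of this data can be read with a short script. This
       minimal sketch assumes the nodeY/access0/initiators/ layout described
       in NUMAPERF[2], with read_bandwidth, read_latency, write_bandwidth and
       write_latency attributes; which of them are present depends on the
       kernel and firmware, and the function name show_node_perf is
       hypothetical.

```shell
# Hypothetical helper: print the HMAT-derived access attributes that the
# kernel exports for memory target node $1. Assumes the
# nodeY/access0/initiators/ layout; $2 overrides the sysfs root for testing.
show_node_perf() {
    dir=${2:-/sys/devices/system}/node/node$1/access0/initiators
    if [ ! -d "$dir" ]; then
        echo "node$1: no access0 performance data exported" >&2
        return 1
    fi
    for attr in read_bandwidth read_latency write_bandwidth write_latency; do
        # Attributes are optional; print only the ones present.
        [ -r "$dir/$attr" ] && printf '%s=%s\n' "$attr" "$(cat "$dir/$attr")"
    done
    return 0
}
```

       For example, "show_node_perf 2" prints the values for the target_node
       reported by daxctl in the EXAMPLES section.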

       Outside of the NUMA performance details linked above, the other method
       to detect the presence of "Soft Reserved" memory is to dump /proc/iomem
       and look for "Soft Reserved" ranges. If the kernel was not built with
       CONFIG_EFI_SOFTRESERVE, predates the introduction of
       CONFIG_EFI_SOFTRESERVE (v5.5), or was booted with the efi=nosoftreserve
       command-line option, then device-dax will not attach and the
       expectation is that the memory shows up as a memory-only NUMA node.
       Otherwise, the memory shows up as a device-dax instance and DAXCTL(1)
       can be used to optionally partition it and assign the memory back to
       the kernel as "System RAM", or the device can be mapped directly as the
       back end of a userspace memory allocator like LIBVMEM[3].
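
       The /proc/iomem check can be scripted with awk. This minimal sketch
       assumes the usual "start-end : name" line format and prints the
       address range of each "Soft Reserved" entry; the function name
       soft_reserved_ranges is hypothetical. Run it as root, since
       unprivileged readers of /proc/iomem see every address as zero.

```shell
# Hypothetical helper: list "Soft Reserved" ranges from /proc/iomem (or from
# the file given as $1, e.g. a saved copy).
soft_reserved_ranges() {
    awk -F' : ' '
        $2 == "Soft Reserved" {
            range = $1
            gsub(/^[ \t]+/, "", range)   # drop the nesting indentation
            print range
        }' "${1:-/proc/iomem}"
}
```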

EXAMPLES

       •   Reconfigure dax0.0 to system-ram mode, don’t online the memory

           # daxctl reconfigure-device --mode=system-ram --no-online dax0.0
           [
             {
               "chardev":"dax0.0",
               "size":16777216000,
               "target_node":2,
               "mode":"system-ram"
             }
           ]

       •   Reconfigure dax0.0 to devdax mode, attempt to offline the memory

           # daxctl reconfigure-device --human --mode=devdax --force dax0.0
           {
             "chardev":"dax0.0",
             "size":"15.63 GiB (16.78 GB)",
             "target_node":2,
             "mode":"devdax"
           }

       •   Reconfigure all dax devices on region0 to system-ram mode

           # daxctl reconfigure-device --mode=system-ram --region=0 all
           [
             {
               "chardev":"dax0.0",
               "size":16777216000,
               "target_node":2,
               "mode":"system-ram"
             },
             {
               "chardev":"dax0.1",
               "size":16777216000,
               "target_node":3,
               "mode":"system-ram"
             }
           ]

       •   Run a process called some-service using numactl to restrict its CPU
           nodes to 0 and 1, and memory allocations to node 2 (determined
           using daxctl_dev_get_target_node() or daxctl list)

           # daxctl reconfigure-device --mode=system-ram dax0.0
           [
             {
               "chardev":"dax0.0",
               "size":16777216000,
               "target_node":2,
               "mode":"system-ram"
             }
           ]

           # numactl --cpunodebind=0-1 --membind=2 -- some-service --opt1 --opt2

       •   Change the size of a dax device

           # daxctl reconfigure-device dax0.1 -s 16G
           reconfigured 1 device
           # daxctl reconfigure-device dax0.1 -s 0
           reconfigured 1 device
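
       In the numactl example, the node number can be taken from the
       reconfiguration output instead of being copied by hand. The minimal
       sketch below assumes jq is installed and that the JSON on stdin is
       either one device object or an array of devices, as in the examples
       above; the function name dax_target_node is hypothetical.

```shell
# Hypothetical helper: read daxctl JSON on stdin (one device object or an
# array of devices) and print the target_node of the first device.
# Requires jq.
dax_target_node() {
    jq -r 'if type == "array" then .[0] else . end | .target_node'
}
```

           # node=$(daxctl reconfigure-device --mode=system-ram dax0.0 | dax_target_node)
           # numactl --cpunodebind=0-1 --membind="$node" -- some-service --opt1 --opt2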

OPTIONS

       -r, --region=
           Restrict the operation to devices belonging to the specified
           region(s). A device-dax region is a contiguous range of memory that
           hosts one or more /dev/daxX.Y devices, where X is the region id and
           Y is the device instance id.

       -s, --size=
           For regions that support dax device creation, change the device
           size in bytes. This option supports the suffixes "k" or "K" for
           KiB, "m" or "M" for MiB, "g" or "G" for GiB and "t" or "T" for TiB.

               The size must be a multiple of the region alignment.

               This option is mutually exclusive with -m or --mode.

       -a, --align
           Applications that want to establish dax memory mappings with page
           table entries greater than the system base page size (4K on x86)
           need a device that is sufficiently aligned. This defaults to 2M.
           Note that "devdax" mode enforces all mappings to be aligned to this
           value, i.e. it fails unaligned mapping attempts.

               This option is mutually exclusive with -m or --mode.

       -m, --mode=
           Specify the mode to which the dax device(s) should be reconfigured.

           •   "system-ram": hotplug the device into system memory.

           •   "devdax": switch to the normal "device dax" mode. This requires
               the kernel to support hot-unplugging kmem based memory. If this
               is not available, a reboot is the only way to switch back to
               devdax mode.

       -N, --no-online
           By default, memory sections provided by system-ram devices will be
           brought online automatically and immediately with the
           online_movable policy. Use this option to disable the automatic
           onlining behavior.

       -C, --check-config
           Get reconfiguration parameters from the global daxctl config file.
           This is typically used when daxctl-reconfigure-device is called
           from a systemd-udevd device unit file. The reconfiguration proceeds
           only if the match parameters in a reconfigure-device section of the
           config match the dax device specified on the command line. See the
           PERSISTENT RECONFIGURATION section for more details.

       --no-movable
           --movable is the default. This can be overridden to online new
           memory such that it is not movable. This allows any allocation to
           potentially be served from this memory, which may preclude
           subsequent removal. With the --movable behavior (the default),
           kernel allocations will not consider this memory, and it will be
           reserved for application use.

       -f, --force

           •   When converting from "system-ram" mode to "devdax", it is
               expected that all the memory sections are first made offline.
               By default, daxctl won’t touch online memory. However, with
               this option, daxctl attempts to offline the memory on the NUMA
               node associated with the dax device before converting it back
               to "devdax" mode.

           •   Additionally, if a kernel policy to auto-online blocks is
               detected, reconfiguration to system-ram fails. With this
               option, the failure can be overridden to allow reconfiguration
               regardless of kernel policy. Doing this may result in a
               successful reconfiguration, but it may not be possible to
               subsequently offline the memory without a reboot.

       -u, --human
           By default the command will output machine-friendly raw-integer
           data. Instead, with this flag, numbers representing storage size
           will be formatted as human readable strings with units; other
           fields are converted to hexadecimal strings.

       -v, --verbose
           Emit more debug messages.
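
       A --size argument can be sanity-checked before daxctl is invoked. The
       minimal sketch below mirrors the suffix rules given for --size and
       tests whether a size is a multiple of an alignment. The helper names
       are hypothetical, and the region alignment is not looked up: the
       caller must supply it, and the 2M default is only a placeholder
       borrowed from the --align default.

```shell
# Hypothetical helpers (not part of daxctl) mirroring the --size suffix
# rules: k/K = KiB, m/M = MiB, g/G = GiB, t/T = TiB, no suffix = bytes.
size_to_bytes() {
    case $1 in
        *[kK]) mult=1024 ;;
        *[mM]) mult=1048576 ;;
        *[gG]) mult=1073741824 ;;
        *[tT]) mult=1099511627776 ;;
        *)     mult=1 ;;
    esac
    num=${1%[kKmMgGtT]}
    case $num in
        ''|*[!0-9]*) echo "invalid size: $1" >&2; return 1 ;;
    esac
    # Strip leading zeros so $(( )) does not treat the value as octal.
    while [ "${num#0}" != "$num" ] && [ ${#num} -gt 1 ]; do
        num=${num#0}
    done
    echo $((num * mult))
}

# Succeed if size $1 is a multiple of alignment $2. The caller must pass
# the real region alignment; 2M is only a placeholder default.
is_aligned() {
    bytes=$(size_to_bytes "$1") || return 1
    align=$(size_to_bytes "${2:-2M}") || return 1
    [ "$align" -gt 0 ] && [ $((bytes % align)) -eq 0 ]
}
```

           # is_aligned 16G 2M && daxctl reconfigure-device dax0.1 -s 16G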

PERSISTENT RECONFIGURATION

       The mode of a dax device is not persistent across reboots by default.
       This is because the device itself does not hold any metadata that
       hints at what mode it was set to, or how it is intended to be used.
       The default mode for such a device on boot is devdax.

       The administrator may set policy such that certain dax devices are
       always reconfigured into a target configuration every boot. This is
       accomplished via a daxctl config file.

       The config file may have multiple sections influencing different
       aspects of daxctl operation. The section of interest for persistent
       reconfiguration is reconfigure-device. The format of this is as
       follows:

           [reconfigure-device <unique_subsection_name>]
           nvdimm.uuid = <NVDIMM namespace uuid>
           mode = <desired reconfiguration mode> (default: system-ram)
           online = <true|false> (default: true)
           movable = <true|false> (default: true)

       Here is an example of a config snippet for managing three devdax
       namespaces: the first is left in devdax mode, the second is changed to
       system-ram mode with default options (online, movable), and the third
       is set to system-ram mode with the memory onlined but not movable.

       Note that the subsection name can be arbitrary, and is only used to
       identify a specific config section. It does not have to match the
       device name (e.g. dax0.0 etc).

           [reconfigure-device dax0]
           nvdimm.uuid = ed93e918-e165-49d8-921d-383d7b9660c5
           mode = devdax

           [reconfigure-device dax1]
           nvdimm.uuid = f36d02ff-1d9f-4fb9-a5b9-8ceb10a00fe3
           mode = system-ram

           [reconfigure-device dax2]
           nvdimm.uuid = f36d02ff-1d9f-4fb9-a5b9-8ceb10a00fe3
           mode = system-ram
           online = true
           movable = false

       The following example can be used to create a devdax mode namespace,
       and simultaneously add the newly created namespace to the config file
       for system-ram conversion.

           ndctl create-namespace --mode=devdax | \
                   jq -r "\"[reconfigure-device $(uuidgen)]\", \"nvdimm.uuid = \(.uuid)\", \"mode = system-ram\"" >> "$config_path"
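
       When the namespace already exists, an equivalent section can be
       appended without parsing ndctl output. The minimal sketch below is a
       hypothetical helper, not part of daxctl; it reuses the namespace UUID
       as the subsection name, which assumes each namespace is configured
       only once, and writes the documented default mode unless another is
       given.

```shell
# Hypothetical helper (not part of daxctl): append a reconfigure-device
# section for an existing devdax namespace. The namespace UUID doubles as
# the subsection name, assuming each namespace is listed only once.
# Usage: add_reconfigure_section <uuid> <config-file> [mode]
add_reconfigure_section() {
    uuid=$1 conf=$2 mode=${3:-system-ram}
    if [ -z "$uuid" ] || [ -z "$conf" ]; then
        echo "usage: add_reconfigure_section <uuid> <config-file> [mode]" >&2
        return 2
    fi
    header="[reconfigure-device $uuid]"
    # Keep the {section, subsection} tuple unique within this file.
    if [ -f "$conf" ] && grep -qxF "$header" "$conf"; then
        echo "$conf already has $header" >&2
        return 1
    fi
    printf '%s\nnvdimm.uuid = %s\nmode = %s\n\n' "$header" "$uuid" "$mode" >> "$conf"
}
```

       Here local.conf is an arbitrary file name:

           # add_reconfigure_section ed93e918-e165-49d8-921d-383d7b9660c5 \
                   /etc/daxctl.conf.d/local.conf devdax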

       The default location for daxctl config files is under
       /etc/daxctl.conf.d/, and any file with a .conf suffix at this location
       is considered. It is acceptable to have multiple files containing
       ini-style config sections, but the {section, subsection} tuple must be
       unique across all config files under /etc/daxctl.conf.d/.
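
       The uniqueness rule can be checked with standard text tools. The
       minimal sketch below, with the hypothetical name
       find_duplicate_sections, reports bracketed header lines that repeat
       across the *.conf files. It compares headers as plain text, so two
       headers that differ only in spacing are not flagged.

```shell
# Hypothetical helper: report any "[section subsection]" header that appears
# more than once across the *.conf files of a daxctl config directory.
# Returns 1 if duplicates exist. $1 overrides the directory for testing.
find_duplicate_sections() {
    confdir=${1:-/etc/daxctl.conf.d}
    # Extract every "[...]" header line, then keep only repeated ones.
    dups=$(sed -n 's/^[[:space:]]*\(\[[^]]*]\).*/\1/p' "$confdir"/*.conf 2>/dev/null |
        sort | uniq -d)
    if [ -n "$dups" ]; then
        printf '%s\n' "$dups" | sed 's/^/duplicate section: /' >&2
        return 1
    fi
    return 0
}
```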

COPYRIGHT

       Copyright © 2016 - 2022, Intel Corporation. License GPLv2: GNU GPL
       version 2 http://gnu.org/licenses/gpl.html. This is free software: you
       are free to change and redistribute it. There is NO WARRANTY, to the
       extent permitted by law.

SEE ALSO

       daxctl-list(1), daxctl-migrate-device-model(1)

NOTES

        1. CONFIG_DEV_DAX_PMEM
           https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dax/Kconfig

        2. NUMAPERF
           https://www.kernel.org/doc/html/latest/admin-guide/mm/numaperf.html

        3. LIBVMEM
           https://pmem.io/vmem/libvmem/