// SPDX-License-Identifier: GPL-2.0

daxctl-reconfigure-device(1)
============================

NAME
----
daxctl-reconfigure-device - Reconfigure a dax device into a different mode

SYNOPSIS
--------
[verse]
'daxctl reconfigure-device' <dax0.0> [<dax1.0>...<daxY.Z>] [<options>]

DESCRIPTION
-----------

Reconfigure the operational mode of a dax device. This can be used to convert
a regular 'devdax' mode device to the 'system-ram' mode which arranges for the
dax range to be hot-plugged into the system as regular memory.

NOTE: This is a destructive operation. Any data on the dax device *will* be
lost.

NOTE: Device reconfiguration depends on the dax-bus device model. See
linkdaxctl:daxctl-migrate-device-model[1] for more information. If dax-class is
in use (via the dax_pmem_compat driver), the reconfiguration will fail with an
error such as the following:
----
# daxctl reconfigure-device --mode=system-ram --region=0 all
libdaxctl: daxctl_dev_disable: dax3.0: error: device model is dax-class
dax3.0: disable failed: Operation not supported
error reconfiguring devices: Operation not supported
reconfigured 0 devices
----

'daxctl-reconfigure-device' nominally expects that it will online new memory
blocks as 'movable', so that kernel data doesn't make it into this memory.
However, there are other potential agents that may be configured to
automatically online new hot-plugged memory as it appears. Most notably,
these are the '/sys/devices/system/memory/auto_online_blocks' configuration,
or system udev rules. If such an agent races to online memory sections, daxctl
checks if the blocks were onlined as 'movable' memory. If this was not the
case, and the memory blocks are found to be in a different zone, then a
warning is displayed. If it is desired that a different agent control the
onlining of memory blocks, and the associated memory zone, then it is
recommended to use the --no-online option described below. This will abridge
the device reconfiguration operation to just hotplugging the memory, and
refrain from then onlining it.

In case daxctl detects that there is a kernel policy to auto-online blocks
(via /sys/devices/system/memory/auto_online_blocks), then reconfiguring to
system-ram will result in a failure. This can be overridden with '--force'.


THEORY OF OPERATION
-------------------
The kernel device-dax subsystem surfaces character devices
that provide DAX-access (direct mappings sans page-cache buffering) to a
given memory region. The devices are named /dev/daxX.Y where X is a
region-id and Y is an instance-id within that region. There are 2
mechanisms that trigger device-dax instances to appear:

1. Persistent Memory (PMEM) namespace configured in "devdax" mode. See
"ndctl create-namspace --help" and
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dax/Kconfig[CONFIG_DEV_DAX_PMEM].
In this case the device-dax instance is statically sized to its host
memory region which is bounded to the physical address range of the host
namespace.

2. Soft Reserved memory enumerated by platform firmware. On EFI systems
this is communicated via the so called EFI_MEMORY_SP "Special Purpose"
attribute. See
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dax/Kconfig[CONFIG_DEV_DAX_HMEM].
In this case the device-dax instance(s) associated with the given memory
region can be resized and divided into multiple devices.

In the Soft Reservation case the expectation for EFI + ACPI based
platforms is that in addition to the EFI_MEMORY_SP attribute the
firmware also creates distinct ACPI proximity domains for any address
range that has different performance characteristics than default
"System RAM". So, the SRAT will define the proximity domain, the SLIT
communicates relative distance to other proximity domains, and the HMAT
is populated with nominal read/write latency and read/write bandwidth
data. That HMAT data is emitted to the kernel log on bootup, and also
exported to sysfs. See
https://www.kernel.org/doc/html/latest/admin-guide/mm/numaperf.html[NUMAPERF],
for the runtime representation of CPU to Memory node performance
details.

Outside of the NUMA performance details linked above the other method to
detect the presence of "Soft Reserved" memory is to dump /proc/iomem and
look for "Soft Reserved" ranges. If the kernel was not built with
CONFIG_EFI_SOFTRESERVE, predates the introduction of
CONFIG_EFI_SOFTRESERVE (v5.5), or was booted with the efi=nosoftreserve
command line then device-dax will not attach and the expectation is that
the memory shows up as a memory-only NUMA node. Otherwise the memory
shows up as a device-dax instance and DAXCTL(1) can be used to
optionally partition it and assign the memory back to the kernel as
"System RAM", or the device can be mapped directly as the back end of a
userspace memory allocator like https://pmem.io/vmem/libvmem/[LIBVMEM].

EXAMPLES
--------

* Reconfigure dax0.0 to system-ram mode, don't online the memory
----
# daxctl reconfigure-device --mode=system-ram --no-online dax0.0
[
  {
    "chardev":"dax0.0",
    "size":16777216000,
    "target_node":2,
    "mode":"system-ram"
  }
]
----

* Reconfigure dax0.0 to devdax mode, attempt to offline the memory
----
# daxctl reconfigure-device --human --mode=devdax --force dax0.0
{
  "chardev":"dax0.0",
  "size":"15.63 GiB (16.78 GB)",
  "target_node":2,
  "mode":"devdax"
}
----

* Reconfigure all dax devices on region0 to system-ram mode
----
# daxctl reconfigure-device --mode=system-ram --region=0 all
[
  {
    "chardev":"dax0.0",
    "size":16777216000,
    "target_node":2,
    "mode":"system-ram"
  },
  {
    "chardev":"dax0.1",
    "size":16777216000,
    "target_node":3,
    "mode":"system-ram"
  }
]
----

* Run a process called 'some-service' using numactl to restrict its cpu
nodes to '0' and '1', and  memory allocations to node 2 (determined using
daxctl_dev_get_target_node() or 'daxctl list')
----
# daxctl reconfigure-device --mode=system-ram dax0.0
[
  {
    "chardev":"dax0.0",
    "size":16777216000,
    "target_node":2,
    "mode":"system-ram"
  }
]

# numactl --cpunodebind=0-1 --membind=2 -- some-service --opt1 --opt2
----

* Change the size of a dax device
----
# daxctl reconfigure-device dax0.1 -s 16G
reconfigured 1 device
# daxctl reconfigure-device dax0.1 -s 0
reconfigured 1 device
----


OPTIONS
-------
include::region-option.txt[]

-s::
--size=::
	For regions that support dax device creation, change the device size
	in bytes. This option supports the suffixes "k" or "K" for KiB, "m" or
	"M" for MiB, "g" or "G" for GiB and "t" or "T" for TiB.

	The size must be a multiple of the region alignment.

	This option is mutually exclusive with -m or --mode.

-a::
--align::
	Applications that want to establish dax memory mappings with
	page table entries greater than system base page size (4K on
	x86) need a device that is sufficiently aligned. This defaults
	to 2M. Note that "devdax" mode enforces all mappings to be
	aligned to this value, i.e. it fails unaligned mapping attempts.

	This option is mutually exclusive with -m or --mode.

-m::
--mode=::
	Specify the mode to which the dax device(s) should be reconfigured.
	- "system-ram": hotplug the device into system memory.

	- "devdax": switch to the normal "device dax" mode. This requires the
	  kernel to support hot-unplugging 'kmem' based memory. If this is not
	  available, a reboot is the only way to switch back to 'devdax' mode.

-N::
--no-online::
	By default, memory sections provided by system-ram devices will be
	brought online automatically and immediately with the 'online_movable'
	policy. Use this option to disable the automatic onlining behavior.

-C::
--check-config::
	Get reconfiguration parameters from the global daxctl config file.
	This is typically used when daxctl-reconfigure-device is called from
	a systemd-udevd device unit file. The reconfiguration proceeds only
	if the match parameters in a 'reconfigure-device' section of the
	config match the dax device specified on the command line. See the
	'PERSISTENT RECONFIGURATION' section for more details.

include::movable-options.txt[]

-f::
--force::
	- When converting from "system-ram" mode to "devdax", it is expected
	that all the memory sections are first made offline. By default,
	daxctl won't touch online memory. However with this option, attempt
	to offline the memory on the NUMA node associated with the dax device
	before converting it back to "devdax" mode.

	- Additionally, if a kernel policy to auto-online blocks is detected,
	reconfiguration to system-ram fails. With this option, the failure can
	be overridden to allow reconfiguration regardless of kernel policy.
	Doing this may result in a successful reconfiguration, but it may
	not be possible to subsequently offline the memory without a reboot.


include::human-option.txt[]

include::verbose-option.txt[]

PERSISTENT RECONFIGURATION
--------------------------

The 'mode' of a daxctl device is not persistent across reboots by default. This
is because the device itself does not hold any metadata that hints at what mode
it was set to, or is intended to be used. The default mode for such a device
on boot is 'devdax'.

The administrator may set policy such that certain dax devices are always
reconfigured into a target configuration every boot. This is accomplished via a
daxctl config file.

The config file may have multiple sections influencing different aspects of
daxctl operation. The section of interest for persistent reconfiguration is
'reconfigure-device'. The format of this is as follows:

----
[reconfigure-device <unique_subsection_name>]
nvdimm.uuid = <NVDIMM namespace uuid>
mode = <desired reconfiguration mode> (default: system-ram)
online = <true|false> (default: true)
movable = <true|false> (default: true)
----

Here is an example of a config snippet for managing three devdax namespaces,
one is left in devdax mode, the second is changed to system-ram mode with
default options (online, movable), and the third is set to system-ram mode,
the memory is onlined, but not movable.

Note that the 'subsection name' can be arbitrary, and is only used to
identify a specific config section. It does not have to match the 'device
name' (e.g. 'dax0.0' etc).

----
[reconfigure-device dax0]
nvdimm.uuid = ed93e918-e165-49d8-921d-383d7b9660c5
mode = devdax

[reconfigure-device dax1]
nvdimm.uuid = f36d02ff-1d9f-4fb9-a5b9-8ceb10a00fe3
mode = system-ram

[reconfigure-device dax2]
nvdimm.uuid = f36d02ff-1d9f-4fb9-a5b9-8ceb10a00fe3
mode = system-ram
online = true
movable = false
----

The following example can be used to create a devdax mode namespace, and
simultaneously add the newly created namespace to the config file for
system-ram conversion.

----
ndctl create-namespace --mode=devdax | \
	jq -r "\"[reconfigure-device $(uuidgen)]\", \"nvdimm.uuid = \(.uuid)\", \"mode = system-ram\"" >> $config_path
----

The default location for daxctl config files is under {daxctl_confdir}/,
and any file with a '.conf' suffix at this location is considered. It is
acceptable to have multiple files containing ini-style config sections,
but the {section, subsection} tuple must be unique across all config files
under {daxctl_confdir}/.

include::../copyright.txt[]

SEE ALSO
--------
linkdaxctl:daxctl-list[1],daxctl-migrate-device-model[1]
