mbind(2)                      System Calls Manual                     mbind(2)

NAME

       mbind - set memory policy for a memory range

LIBRARY

       NUMA (Non-Uniform Memory Access) policy library (libnuma, -lnuma)

SYNOPSIS

       #include <numaif.h>

       long mbind(void addr[.len], unsigned long len, int mode,
                  const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
                                               / ULONG_WIDTH],
                  unsigned long maxnode, unsigned int flags);

DESCRIPTION

       mbind() sets the NUMA memory policy, which consists of a policy mode
       and zero or more nodes, for the memory range starting with addr and
       continuing for len bytes.  The memory policy defines from which node
       memory is allocated.

       If the memory range specified by the addr and len arguments includes an
       "anonymous" region of memory (that is, a region of memory created using
       the mmap(2) system call with the MAP_ANONYMOUS flag) or a memory-mapped
       file mapped using the mmap(2) system call with the MAP_PRIVATE flag,
       pages will be allocated according to the specified policy only when the
       application writes (stores) to the page.  For anonymous regions, an
       initial read access will use a shared page in the kernel containing all
       zeros.  For a file mapped with MAP_PRIVATE, an initial read access will
       allocate pages according to the memory policy of the thread that causes
       the page to be allocated.  This may not be the thread that called
       mbind().

       The specified policy will be ignored for any MAP_SHARED mappings in the
       specified memory range.  Rather, the pages will be allocated according
       to the memory policy of the thread that caused the page to be
       allocated.  Again, this may not be the thread that called mbind().

       If the specified memory range includes a shared memory region created
       using the shmget(2) system call and attached using the shmat(2) system
       call, pages allocated for the anonymous or shared memory region will be
       allocated according to the policy specified, regardless of which
       process attached to the shared memory segment causes the allocation.
       If, however, the shared memory region was created with the SHM_HUGETLB
       flag, the huge pages will be allocated according to the policy
       specified only if the page allocation is caused by the process that
       calls mbind() for that region.

       By default, mbind() has an effect only for new allocations; if the
       pages inside the range have already been touched before setting the
       policy, then the policy has no effect.  This default behavior may be
       overridden by the MPOL_MF_MOVE and MPOL_MF_MOVE_ALL flags described
       below.

       The mode argument must specify one of MPOL_DEFAULT, MPOL_BIND,
       MPOL_INTERLEAVE, MPOL_PREFERRED, or MPOL_LOCAL (which are described in
       detail below).  MPOL_BIND and MPOL_INTERLEAVE require the caller to
       specify the node or nodes to which the mode applies, via the nodemask
       argument.  MPOL_PREFERRED accepts either a nodemask or the empty set
       of nodes, and MPOL_DEFAULT and MPOL_LOCAL require the empty set.

       The mode argument may also include an optional mode flag.  The
       supported mode flags are:

       MPOL_F_STATIC_NODES (since Linux 2.6.26)
              A nonempty nodemask specifies physical node IDs.  Linux does not
              remap the nodemask when the thread moves to a different cpuset
              context, nor when the set of nodes allowed by the thread's
              current cpuset context changes.

       MPOL_F_RELATIVE_NODES (since Linux 2.6.26)
              A nonempty nodemask specifies node IDs that are relative to the
              set of node IDs allowed by the thread's current cpuset.

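       As a rough illustration of how a mode flag is combined with a policy
       mode, the sketch below ORs at most one flag into the mode value.  The
       helper name mode_with_flag() is hypothetical.  The constants are
       assumed to come from the kernel header <linux/mempolicy.h> rather than
       <numaif.h>, so that the fragment compiles without libnuma installed.

```c
#include <assert.h>
#include <errno.h>
#include <linux/mempolicy.h>  /* MPOL_* and MPOL_F_* constants (assumed available) */

/*
 * Hypothetical helper: combine a policy mode with an optional mode flag.
 * Returns the value to pass as the mode argument of mbind(), or -1 with
 * errno set to EINVAL when both flags are requested, mirroring the
 * EINVAL case documented under ERRORS.
 */
static int mode_with_flag(int mode, int static_nodes, int relative_nodes)
{
    assert(mode >= 0);
    if (static_nodes && relative_nodes) {
        errno = EINVAL;
        return -1;
    }
    if (static_nodes)
        mode |= MPOL_F_STATIC_NODES;
    if (relative_nodes)
        mode |= MPOL_F_RELATIVE_NODES;
    return mode;
}
```

       For example, mode_with_flag(MPOL_BIND, 1, 0) yields a mode that asks
       for MPOL_BIND with physical node IDs.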
       nodemask points to a bit mask of nodes containing up to maxnode bits.
       The bit mask size is rounded to the next multiple of sizeof(unsigned
       long), but the kernel will use bits only up to maxnode.  A NULL value
       of nodemask or a maxnode value of zero specifies the empty set of
       nodes.  If the value of maxnode is zero, the nodemask argument is
       ignored.  Where a nodemask is required, it must contain at least one
       node that is on-line, allowed by the thread's current cpuset context
       (unless the MPOL_F_STATIC_NODES mode flag is specified), and contains
       memory.

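       The bit mask layout described above can be sketched with a few small
       helpers.  The names nodemask_words(), nodemask_alloc(), and
       nodemask_set() are hypothetical and not part of libnuma.  The word
       width is computed from CHAR_BIT on the assumption that ULONG_WIDTH
       may not be available on older compilers.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned long words needed to hold maxnode bits. */
static size_t nodemask_words(unsigned long maxnode)
{
    return (maxnode + BITS_PER_ULONG - 1) / BITS_PER_ULONG;
}

/* Allocate a zeroed mask able to hold maxnode bits; the caller frees it. */
static unsigned long *nodemask_alloc(unsigned long maxnode)
{
    return calloc(nodemask_words(maxnode), sizeof(unsigned long));
}

/* Set the bit that represents node; node must be below maxnode. */
static void nodemask_set(unsigned long *mask, unsigned long maxnode,
                         unsigned long node)
{
    assert(node < maxnode);
    mask[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
}
```

       A mask built this way would be passed as nodemask, together with the
       same maxnode value that was used to size it.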
       The mode argument must include one of the following values:

       MPOL_DEFAULT
              This mode requests that any nondefault policy be removed,
              restoring default behavior.  When applied to a range of memory
              via mbind(), this means to use the thread memory policy, which
              may have been set with set_mempolicy(2).  If the mode of the
              thread memory policy is also MPOL_DEFAULT, the system-wide
              default policy will be used.  The system-wide default policy
              allocates pages on the node of the CPU that triggers the
              allocation.  For MPOL_DEFAULT, the nodemask and maxnode
              arguments must specify the empty set of nodes.

       MPOL_BIND
              This mode specifies a strict policy that restricts memory
              allocation to the nodes specified in nodemask.  If nodemask
              specifies more than one node, page allocations will come from
              the node with sufficient free memory that is closest to the
              node where the allocation takes place.  Pages will not be
              allocated from any node not specified in nodemask.  (Before
              Linux 2.6.26, page allocations came from the node with the
              lowest numeric node ID first, until that node contained no free
              memory.  Allocations then came from the node with the next
              highest node ID specified in nodemask and so forth, until none
              of the specified nodes contained free memory.)

       MPOL_INTERLEAVE
              This mode specifies that page allocations be interleaved across
              the set of nodes specified in nodemask.  This optimizes for
              bandwidth instead of latency by spreading out pages and memory
              accesses to those pages across multiple nodes.  To be
              effective, the memory area should be fairly large, at least
              1 MB, with a fairly uniform access pattern.  Accesses to a
              single page of the area will still be limited to the memory
              bandwidth of a single node.

       MPOL_PREFERRED
              This mode sets the preferred node for allocation.  The kernel
              will try to allocate pages from this node first and fall back to
              other nodes if the preferred node is low on free memory.  If
              nodemask specifies more than one node ID, the first node in the
              mask will be selected as the preferred node.  If the nodemask
              and maxnode arguments specify the empty set, then the memory is
              allocated on the node of the CPU that triggered the allocation.

       MPOL_LOCAL (since Linux 3.8)
              This mode specifies "local allocation"; the memory is allocated
              on the node of the CPU that triggered the allocation (the "local
              node").  The nodemask and maxnode arguments must specify the
              empty set.  If the "local node" is low on free memory, the
              kernel will try to allocate memory from other nodes.  The kernel
              will allocate memory from the "local node" whenever memory for
              this node is available.  If the "local node" is not allowed by
              the thread's current cpuset context, the kernel will try to
              allocate memory from other nodes.  The kernel will allocate
              memory from the "local node" whenever it becomes allowed by the
              thread's current cpuset context.  By contrast, MPOL_DEFAULT
              reverts to the memory policy of the thread (which may be set via
              set_mempolicy(2)); that policy may be something other than
              "local allocation".

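       A minimal sketch of applying one of these modes: the fragment below
       maps anonymous memory and requests MPOL_BIND to node 0 before any page
       is touched, so that later writes allocate pages under the policy.  The
       names mbind_raw() and map_bound_to_node0() are hypothetical.  As an
       assumption made to avoid linking with -lnuma, the system call is
       invoked through syscall(2) and the constants are taken from
       <linux/mempolicy.h>.  Node 0 is assumed to exist and to be allowed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/mempolicy.h>  /* MPOL_* constants (assumed available) */
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: raw system call instead of the libnuma wrapper. */
static long mbind_raw(void *addr, unsigned long len, int mode,
                      const unsigned long *nodemask, unsigned long maxnode,
                      unsigned int flags)
{
    return syscall(SYS_mbind, addr, len, (long)mode, nodemask, maxnode,
                   (long)flags);
}

/*
 * Map len bytes of anonymous memory and request MPOL_BIND to node 0.
 * Returns the mapping, or NULL if mmap() failed.  *policy_rc receives
 * the result of the policy call: 0 on success, -1 with errno set if the
 * kernel rejected it (for example, no NUMA support or no permission).
 */
static void *map_bound_to_node0(size_t len, long *policy_rc)
{
    unsigned long mask = 1UL;  /* bit 0 stands for node 0 */
    void *p;

    assert(policy_rc != NULL);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *policy_rc = mbind_raw(p, len, MPOL_BIND, &mask,
                           sizeof(mask) * CHAR_BIT, 0);
    return p;
}
```

       A caller would check *policy_rc before relying on the placement; the
       mapping itself remains usable even when the policy call fails.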
       If MPOL_MF_STRICT is passed in flags and mode is not MPOL_DEFAULT, then
       the call fails with the error EIO if the existing pages in the memory
       range don't follow the policy.

       If MPOL_MF_MOVE is specified in flags, then the kernel will attempt to
       move all the existing pages in the memory range so that they follow the
       policy.  Pages that are shared with other processes will not be moved.
       If MPOL_MF_STRICT is also specified, then the call fails with the error
       EIO if some pages could not be moved.

       If MPOL_MF_MOVE_ALL is passed in flags, then the kernel will attempt to
       move all existing pages in the memory range regardless of whether other
       processes use the pages.  The calling thread must be privileged
       (CAP_SYS_NICE) to use this flag.  If MPOL_MF_STRICT is also specified,
       then the call fails with the error EIO if some pages could not be
       moved.

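       The interaction of MPOL_MF_MOVE and MPOL_MF_STRICT might be used as
       sketched below for a range whose pages have already been touched.  The
       names mbind_raw() and move_range_to_node0() are hypothetical.  The raw
       syscall(2) invocation and the use of <linux/mempolicy.h> are
       assumptions made so that the fragment builds without -lnuma.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/mempolicy.h>  /* MPOL_* and MPOL_MF_* (assumed available) */
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: raw system call instead of the libnuma wrapper. */
static long mbind_raw(void *addr, unsigned long len, int mode,
                      const unsigned long *nodemask, unsigned long maxnode,
                      unsigned int flags)
{
    return syscall(SYS_mbind, addr, len, (long)mode, nodemask, maxnode,
                   (long)flags);
}

/*
 * Ask the kernel to move the existing pages of [addr, addr + len) to
 * node 0.  Returns 0 if the call succeeded, 1 if it failed with EIO
 * (some pages could not be moved), and -1 on any other error.
 */
static int move_range_to_node0(void *addr, unsigned long len)
{
    unsigned long mask = 1UL;  /* bit 0 stands for node 0 */

    assert(addr != NULL);
    if (mbind_raw(addr, len, MPOL_BIND, &mask, sizeof(mask) * CHAR_BIT,
                  MPOL_MF_MOVE | MPOL_MF_STRICT) == 0)
        return 0;
    return errno == EIO ? 1 : -1;
}
```

       A return value of 1 tells the caller that the policy could not be
       enforced for every existing page, which may be acceptable for some
       workloads and fatal for others.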

RETURN VALUE

       On success, mbind() returns 0; on error, -1 is returned and errno is
       set to indicate the error.

ERRORS

       EFAULT Part or all of the memory range specified by nodemask and
              maxnode points outside your accessible address space.  Or, there
              was an unmapped hole in the memory range specified by addr and
              len.

       EINVAL An invalid value was specified for flags or mode; or addr + len
              was less than addr; or addr is not a multiple of the system page
              size.  Or, mode is MPOL_DEFAULT and nodemask specified a
              nonempty set; or mode is MPOL_BIND or MPOL_INTERLEAVE and
              nodemask is empty.  Or, maxnode exceeds a kernel-imposed limit.
              Or, nodemask specifies one or more node IDs that are greater
              than the maximum supported node ID.  Or, none of the node IDs
              specified by nodemask are on-line and allowed by the thread's
              current cpuset context, or none of the specified nodes contain
              memory.  Or, the mode argument specified both
              MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES.

       EIO    MPOL_MF_STRICT was specified and an existing page was already on
              a node that does not follow the policy; or MPOL_MF_MOVE or
              MPOL_MF_MOVE_ALL was specified and the kernel was unable to move
              all existing pages in the range.

       ENOMEM Insufficient kernel memory was available.

       EPERM  The flags argument included the MPOL_MF_MOVE_ALL flag and the
              caller does not have the CAP_SYS_NICE privilege.

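       One possible way to turn these error codes into diagnostics is
       sketched below.  The helper mbind_error_hint() is hypothetical, and
       its messages are abbreviated paraphrases of the entries above rather
       than text produced by the kernel or by libnuma.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: short hint for an errno value left by a failed
 * mbind() call.  err must be a positive errno value.
 */
static const char *mbind_error_hint(int err)
{
    assert(err > 0);
    switch (err) {
    case EFAULT:
        return "nodemask not accessible, or unmapped hole in the range";
    case EINVAL:
        return "invalid mode, flags, address alignment, or nodemask";
    case EIO:
        return "pages do not follow the policy or could not be moved";
    case ENOMEM:
        return "insufficient kernel memory";
    case EPERM:
        return "MPOL_MF_MOVE_ALL requires CAP_SYS_NICE";
    default:
        return "error not documented for mbind()";
    }
}
```

       Such a helper would typically be called with the value of errno saved
       immediately after mbind() returns -1.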

STANDARDS

       Linux.

HISTORY

       Linux 2.6.7.

       Support for huge page policy was added with Linux 2.6.16.  For
       interleave policy to be effective on huge page mappings, the policied
       memory needs to be tens of megabytes or larger.

       Before Linux 5.7, MPOL_MF_STRICT was ignored on huge page mappings.

       MPOL_MF_MOVE and MPOL_MF_MOVE_ALL are available only on Linux 2.6.16
       and later.

NOTES

       For information on library support, see numa(7).

       NUMA policy is not supported on a memory-mapped file range that was
       mapped with the MAP_SHARED flag.

       The MPOL_DEFAULT mode can have different effects for mbind() and
       set_mempolicy(2).  When MPOL_DEFAULT is specified for set_mempolicy(2),
       the thread's memory policy reverts to the system default policy or
       local allocation.  When MPOL_DEFAULT is specified for a range of memory
       using mbind(), any pages subsequently allocated for that range will use
       the thread's memory policy, as set by set_mempolicy(2).  This
       effectively removes the explicit policy from the specified range,
       "falling back" to a possibly nondefault policy.  To select explicit
       "local allocation" for a memory range, specify a mode of MPOL_LOCAL or
       MPOL_PREFERRED with an empty set of nodes.  This method will work for
       set_mempolicy(2), as well.

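       The difference between the two choices can be sketched as a pair of
       calls that both pass the empty set of nodes (a NULL nodemask and a
       maxnode of zero).  The names mbind_raw(), reset_range_policy(), and
       force_local_allocation() are hypothetical.  The raw syscall(2)
       invocation and <linux/mempolicy.h> are assumptions made to avoid a
       dependency on libnuma.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/mempolicy.h>  /* MPOL_DEFAULT, MPOL_LOCAL (assumed available) */
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: raw system call instead of the libnuma wrapper. */
static long mbind_raw(void *addr, unsigned long len, int mode,
                      const unsigned long *nodemask, unsigned long maxnode,
                      unsigned int flags)
{
    return syscall(SYS_mbind, addr, len, (long)mode, nodemask, maxnode,
                   (long)flags);
}

/* Remove the range policy; later allocations use the thread's policy. */
static long reset_range_policy(void *addr, unsigned long len)
{
    assert(addr != NULL);
    return mbind_raw(addr, len, MPOL_DEFAULT, NULL, 0, 0);
}

/* Request explicit local allocation for the range (Linux 3.8 or later). */
static long force_local_allocation(void *addr, unsigned long len)
{
    assert(addr != NULL);
    return mbind_raw(addr, len, MPOL_LOCAL, NULL, 0, 0);
}
```

       Both helpers return 0 on success or -1 with errno set, following the
       convention described under RETURN VALUE.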

SEE ALSO

       get_mempolicy(2), getcpu(2), mmap(2), set_mempolicy(2), shmat(2),
       shmget(2), numa(3), cpuset(7), numa(7), numactl(8)

Linux man-pages 6.04              2023-03-30                          mbind(2)