sched_setaffinity(2)        System Calls Manual        sched_setaffinity(2)

NAME
       sched_setaffinity, sched_getaffinity - set and get a thread's CPU
       affinity mask

LIBRARY
       Standard C library (libc, -lc)

SYNOPSIS
       #define _GNU_SOURCE             /* See feature_test_macros(7) */
       #include <sched.h>

       int sched_setaffinity(pid_t pid, size_t cpusetsize,
                             const cpu_set_t *mask);
       int sched_getaffinity(pid_t pid, size_t cpusetsize,
                             cpu_set_t *mask);
 
DESCRIPTION
       A thread's CPU affinity mask determines the set of CPUs on which it is
       eligible to run. On a multiprocessor system, setting the CPU affinity
       mask can be used to obtain performance benefits. For example, by
       dedicating one CPU to a particular thread (i.e., setting the affinity
       mask of that thread to specify a single CPU, and setting the affinity
       mask of all other threads to exclude that CPU), it is possible to
       ensure maximum execution speed for that thread. Restricting a thread
       to run on a single CPU also avoids the performance cost caused by the
       cache invalidation that occurs when a thread ceases to execute on one
       CPU and then recommences execution on a different CPU.

       A CPU affinity mask is represented by the cpu_set_t structure, a "CPU
       set", pointed to by mask. A set of macros for manipulating CPU sets
       is described in CPU_SET(3).

       sched_setaffinity() sets the CPU affinity mask of the thread whose ID
       is pid to the value specified by mask. If pid is zero, then the
       calling thread is used. The argument cpusetsize is the length (in
       bytes) of the data pointed to by mask. Normally this argument would
       be specified as sizeof(cpu_set_t).
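 
       As an illustration of the call described above, the following
       minimal sketch restricts the calling thread to the lowest-numbered
       CPU in its current mask. The helper name pin_to_first_allowed_cpu()
       is hypothetical and is not part of any library.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to the
   lowest-numbered CPU in its current affinity mask.  Returns that
   CPU number, or -1 on error (errno is set by the failing call). */
int
pin_to_first_allowed_cpu(void)
{
    cpu_set_t  current;

    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(current), &current) == -1)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &current)) {
            cpu_set_t  one;

            CPU_ZERO(&one);
            CPU_SET(cpu, &one);

            if (sched_setaffinity(0, sizeof(one), &one) == -1)
                return -1;
            return cpu;
        }
    }

    return -1;          /* No CPU found in the mask */
}
```

       Choosing a CPU from the current mask, rather than hard-coding CPU 0,
       avoids an EINVAL failure on systems where CPU 0 is not permitted to
       the thread.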
 
       If the thread specified by pid is not currently running on one of the
       CPUs specified in mask, then that thread is migrated to one of the
       CPUs specified in mask.

       sched_getaffinity() writes the affinity mask of the thread whose ID is
       pid into the cpu_set_t structure pointed to by mask. The cpusetsize
       argument specifies the size (in bytes) of mask. If pid is zero, then
       the mask of the calling thread is returned.
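 
       A short sketch of reading a mask with sched_getaffinity() and
       examining it with the CPU_SET(3) macros follows. The helper name
       print_affinity() is hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: print the CPUs in the affinity mask of the
   thread 'pid' (0 means the calling thread).  Returns the number of
   CPUs in the mask, or -1 on error. */
int
print_affinity(pid_t pid)
{
    cpu_set_t  set;

    if (sched_getaffinity(pid, sizeof(set), &set) == -1)
        return -1;

    printf("CPUs:");
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            printf(" %d", cpu);
    printf("\n");

    return CPU_COUNT(&set);
}
```

       Calling print_affinity(0) reports the mask of the calling thread.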
 
RETURN VALUE
       On success, sched_setaffinity() and sched_getaffinity() return 0 (but
       see "C library/kernel differences" below, which notes that the
       underlying sched_getaffinity() differs in its return value). On
       failure, -1 is returned, and errno is set to indicate the error.

ERRORS
       EFAULT A supplied memory address was invalid.

       EINVAL The affinity bit mask mask contains no processors that are
              currently physically on the system and permitted to the
              thread according to any restrictions that may be imposed by
              cpuset cgroups or the "cpuset" mechanism described in
              cpuset(7).

       EINVAL (sched_getaffinity() and, before Linux 2.6.9,
              sched_setaffinity()) cpusetsize is smaller than the size of
              the affinity mask used by the kernel.

       EPERM  (sched_setaffinity()) The calling thread does not have
              appropriate privileges. The caller needs an effective user
              ID equal to the real user ID or effective user ID of the
              thread identified by pid, or it must possess the
              CAP_SYS_NICE capability in the user namespace of the thread
              pid.

       ESRCH  The thread whose ID is pid could not be found.
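 
       The first EINVAL case can be provoked deliberately by passing an
       empty mask, which by definition contains no usable processors. The
       sketch below does so and returns the resulting errno value; the
       helper name errno_for_empty_mask() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: try to give the calling thread an empty
   affinity mask.  Returns the errno value from the failed call, or
   0 if the call unexpectedly succeeded. */
int
errno_for_empty_mask(void)
{
    cpu_set_t  set;

    CPU_ZERO(&set);             /* A mask containing no CPUs */

    if (sched_setaffinity(0, sizeof(set), &set) == -1)
        return errno;           /* Expected to be EINVAL */

    return 0;
}
```

       Because the call fails, the affinity mask of the calling thread is
       left unchanged.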
 
STANDARDS
       Linux.

HISTORY
       Linux 2.5.8, glibc 2.3.

       Initially, the glibc interfaces included a cpusetsize argument, typed
       as unsigned int. In glibc 2.3.3, the cpusetsize argument was removed,
       but was then restored in glibc 2.3.4, with type size_t.

NOTES
       After a call to sched_setaffinity(), the set of CPUs on which the
       thread will actually run is the intersection of the set specified in
       the mask argument and the set of CPUs actually present on the system.
       The system may further restrict the set of CPUs on which the thread
       runs if the "cpuset" mechanism described in cpuset(7) is being used.
       These restrictions on the actual set of CPUs on which the thread will
       run are silently imposed by the kernel.

       There are various ways of determining the number of CPUs available on
       the system, including: inspecting the contents of /proc/cpuinfo; using
       sysconf(3) to obtain the values of the _SC_NPROCESSORS_CONF and
       _SC_NPROCESSORS_ONLN parameters; and inspecting the list of CPU
       directories under /sys/devices/system/cpu/.
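 
       Of these methods, the sysconf(3) approach is the simplest to use from
       C. A minimal sketch follows; the helper name report_cpu_counts() is
       hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: print the number of configured and online
   CPUs as reported by sysconf(3).  Returns the online count, or -1
   if either value could not be obtained. */
long
report_cpu_counts(void)
{
    long  configured = sysconf(_SC_NPROCESSORS_CONF);
    long  online = sysconf(_SC_NPROCESSORS_ONLN);

    if (configured == -1 || online == -1)
        return -1;

    printf("configured: %ld, online: %ld\n", configured, online);
    return online;
}
```

       Neither value takes the affinity mask of the calling thread into
       account, so the number of CPUs on which the thread may actually run
       can be smaller.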
 
       sched(7) has a description of the Linux scheduling scheme.

       The affinity mask is a per-thread attribute that can be adjusted
       independently for each of the threads in a thread group. The value
       returned from a call to gettid(2) can be passed in the argument pid.
       Specifying pid as 0 will set the attribute for the calling thread, and
       passing the value returned from a call to getpid(2) will set the
       attribute for the main thread of the thread group. (If you are using
       the POSIX threads API, then use pthread_setaffinity_np(3) instead of
       sched_setaffinity().)
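 
       The sketch below shows a thread ID being passed as pid. It invokes
       gettid(2) through syscall(2), on the assumption that the C library
       in use may not provide a gettid() wrapper. The helper name
       mask_by_tid_matches_self() is hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fetch the calling thread's mask twice, once
   by thread ID and once by passing 0, and compare the results.
   Returns 1 if they match, 0 if they differ, or -1 on error. */
int
mask_by_tid_matches_self(void)
{
    pid_t      tid = (pid_t) syscall(SYS_gettid);
    cpu_set_t  by_tid, by_zero;

    if (sched_getaffinity(tid, sizeof(by_tid), &by_tid) == -1)
        return -1;
    if (sched_getaffinity(0, sizeof(by_zero), &by_zero) == -1)
        return -1;

    return CPU_EQUAL(&by_tid, &by_zero) ? 1 : 0;
}
```

       The same thread ID could be handed to another thread, which could
       then adjust this thread's mask with sched_setaffinity().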
 
       The isolcpus boot option can be used to isolate one or more CPUs at
       boot time, so that no processes are scheduled onto those CPUs.
       Following the use of this boot option, the only way to schedule
       processes onto the isolated CPUs is via sched_setaffinity() or the
       cpuset(7) mechanism. For further information, see the kernel source
       file Documentation/admin-guide/kernel-parameters.txt. As noted in
       that file, isolcpus is the preferred mechanism of isolating CPUs
       (versus the alternative of manually setting the CPU affinity of all
       processes on the system).

       A child created via fork(2) inherits its parent's CPU affinity mask.
       The affinity mask is preserved across an execve(2).
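 
       Inheritance across fork(2) can be observed with a sketch such as the
       following, in which the child compares its own mask with the copy of
       the parent's mask that it received at fork time. The helper name
       child_inherits_mask() is hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child sees the same
   affinity mask as its parent, 0 if the masks differ, or -1 on
   error. */
int
child_inherits_mask(void)
{
    cpu_set_t  parent_set;
    pid_t      pid;
    int        status;

    if (sched_getaffinity(0, sizeof(parent_set), &parent_set) == -1)
        return -1;

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {             /* Child */
        cpu_set_t  child_set;

        if (sched_getaffinity(0, sizeof(child_set), &child_set) == -1)
            _exit(2);
        _exit(CPU_EQUAL(&parent_set, &child_set) ? 0 : 1);
    }

    /* Parent: translate the child's exit status into a result */
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;

    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```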
 
   C library/kernel differences
       This manual page describes the glibc interface for the CPU affinity
       calls. The actual system call interface is slightly different, with
       the mask being typed as unsigned long *, reflecting the fact that the
       underlying implementation of CPU sets is a simple bit mask.

       On success, the raw sched_getaffinity() system call returns the number
       of bytes copied into the mask buffer; this will be the minimum of
       cpusetsize and the size (in bytes) of the cpumask_t data type that is
       used internally by the kernel to represent the CPU set bit mask.
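 
       The difference in return value can be seen by invoking the system
       call directly through syscall(2). The sketch below assumes that the
       kernel mask fits in a 128-byte buffer (the call fails with EINVAL
       otherwise); the helper name raw_getaffinity_bytes() is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: call the raw sched_getaffinity() system call
   for the calling thread.  Returns the number of bytes that the
   kernel copied into the buffer, or -1 on error. */
long
raw_getaffinity_bytes(void)
{
    /* 1024 bits (128 bytes): an assumed upper bound for this sketch */
    unsigned long  mask[1024 / (8 * sizeof(unsigned long))];

    return syscall(SYS_sched_getaffinity, 0, sizeof(mask), mask);
}
```

       The glibc wrapper discards this byte count and returns 0 instead.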
 
   Handling systems with large CPU affinity masks
       The underlying system calls (which represent CPU masks as bit masks of
       type unsigned long *) impose no restriction on the size of the CPU
       mask. However, the cpu_set_t data type used by glibc has a fixed size
       of 128 bytes, meaning that the maximum CPU number that can be
       represented is 1023. If the kernel CPU affinity mask is larger than
       1024 bits, then calls of the form:

           sched_getaffinity(pid, sizeof(cpu_set_t), &mask);

       fail with the error EINVAL, the error produced by the underlying
       system call for the case where the mask size specified in cpusetsize
       is smaller than the size of the affinity mask used by the kernel.
       (Depending on the system CPU topology, the kernel affinity mask can be
       substantially larger than the number of active CPUs in the system.)

       When working on systems with large kernel CPU affinity masks, one must
       dynamically allocate the mask argument (see CPU_ALLOC(3)). Currently,
       the only way to do this is by probing for the size of the required
       mask using sched_getaffinity() calls with increasing mask sizes (until
       the call does not fail with the error EINVAL).
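 
       The probing technique might be sketched as follows. The helper name
       get_affinity_dynamic(), the doubling strategy, and the upper limit of
       2^20 CPUs are all choices made for this illustration only.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: obtain the calling thread's affinity mask in
   a dynamically allocated CPU set, doubling the set size until the
   kernel accepts it.  On success, returns the set (to be released
   with CPU_FREE(3)) and stores its size in bytes in *sizep.
   Returns NULL on error. */
cpu_set_t *
get_affinity_dynamic(size_t *sizep)
{
    /* The 2^20 limit is arbitrary; it merely bounds the probing */
    for (int ncpus = CPU_SETSIZE; ncpus <= (1 << 20); ncpus *= 2) {
        cpu_set_t  *set = CPU_ALLOC(ncpus);
        size_t     size = CPU_ALLOC_SIZE(ncpus);
        int        saved_errno;

        if (set == NULL)
            return NULL;

        CPU_ZERO_S(size, set);
        if (sched_getaffinity(0, size, set) == 0) {
            *sizep = size;
            return set;
        }

        saved_errno = errno;    /* CPU_FREE() might modify errno */
        CPU_FREE(set);
        if (saved_errno != EINVAL)
            return NULL;        /* Not a size problem: give up */
    }

    return NULL;
}
```

       The returned size is the value to pass to the CPU_*_S() macros when
       examining the set.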
 
       Be aware that CPU_ALLOC(3) may allocate a slightly larger CPU set than
       requested (because CPU sets are implemented as bit masks allocated in
       units of sizeof(long)). Consequently, sched_getaffinity() can set
       bits beyond the requested allocation size, because the kernel sees a
       few additional bits. Therefore, the caller should iterate over the
       bits in the returned set, counting those which are set, and stop upon
       reaching the value returned by CPU_COUNT(3) (rather than iterating
       over the number of bits requested to be allocated).
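 
       The iteration strategy described above might look like the following
       sketch, which uses the _S variants of the macros because the set is
       dynamically sized. The helper name print_cpus_in_set() is
       hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: print each CPU in a dynamically sized set,
   stopping once as many CPUs have been seen as CPU_COUNT_S()
   reports.  'size' is the size of the set in bytes, as returned by
   CPU_ALLOC_SIZE().  Returns the number of CPUs printed. */
int
print_cpus_in_set(size_t size, const cpu_set_t *set)
{
    int  total = CPU_COUNT_S(size, set);
    int  seen = 0;

    /* The loop is bounded by the count of set bits, not by the
       number of CPUs originally requested from CPU_ALLOC() */
    for (int cpu = 0; seen < total; cpu++) {
        if (CPU_ISSET_S(cpu, size, set)) {
            printf("CPU %d\n", cpu);
            seen++;
        }
    }

    return seen;
}
```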
 
EXAMPLES
       The program below creates a child process. The parent and child then
       each assign themselves to a specified CPU and execute identical loops
       that consume some CPU time. Before terminating, the parent waits for
       the child to complete. The program takes three command-line
       arguments: the CPU number for the parent, the CPU number for the
       child, and the number of loop iterations that both processes should
       perform.

       As the sample runs below demonstrate, the amount of real and CPU time
       consumed when running the program will depend on intra-core caching
       effects and whether the processes are using the same CPU.

       We first employ lscpu(1) to determine that this (x86) system has two
       cores, each with two CPUs:

           $ lscpu | egrep -i 'core.*:|socket'
           Thread(s) per core:    2
           Core(s) per socket:    2
           Socket(s):             1

       We then time the operation of the example program for three cases:
       both processes running on the same CPU; both processes running on
       different CPUs on the same core; and both processes running on
       different CPUs on different cores.

           $ time -p ./a.out 0 0 100000000
           real 14.75
           user 3.02
           sys 11.73
           $ time -p ./a.out 0 1 100000000
           real 11.52
           user 3.98
           sys 19.06
           $ time -p ./a.out 0 3 100000000
           real 7.89
           user 3.29
           sys 12.07
 
   Program source

       #define _GNU_SOURCE
       #include <err.h>
       #include <sched.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <sys/wait.h>
       #include <unistd.h>

       int
       main(int argc, char *argv[])
       {
           int           parentCPU, childCPU;
           cpu_set_t     set;
           unsigned int  nloops;

           if (argc != 4) {
               fprintf(stderr, "Usage: %s parent-cpu child-cpu num-loops\n",
                       argv[0]);
               exit(EXIT_FAILURE);
           }

           parentCPU = atoi(argv[1]);
           childCPU = atoi(argv[2]);
           nloops = atoi(argv[3]);

           CPU_ZERO(&set);

           switch (fork()) {
           case -1:            /* Error */
               err(EXIT_FAILURE, "fork");

           case 0:             /* Child */
               CPU_SET(childCPU, &set);

               if (sched_setaffinity(getpid(), sizeof(set), &set) == -1)
                   err(EXIT_FAILURE, "sched_setaffinity");

               for (unsigned int j = 0; j < nloops; j++)
                   getppid();

               exit(EXIT_SUCCESS);

           default:            /* Parent */
               CPU_SET(parentCPU, &set);

               if (sched_setaffinity(getpid(), sizeof(set), &set) == -1)
                   err(EXIT_FAILURE, "sched_setaffinity");

               for (unsigned int j = 0; j < nloops; j++)
                   getppid();

               wait(NULL);     /* Wait for child to terminate */
               exit(EXIT_SUCCESS);
           }
       }
 
SEE ALSO
       lscpu(1), nproc(1), taskset(1), clone(2), getcpu(2), getpriority(2),
       gettid(2), nice(2), sched_get_priority_max(2),
       sched_get_priority_min(2), sched_getscheduler(2),
       sched_setscheduler(2), setpriority(2), CPU_SET(3), get_nprocs(3),
       pthread_setaffinity_np(3), sched_getcpu(3), capabilities(7),
       cpuset(7), sched(7), numactl(8)

Linux man-pages 6.05            2023-05-03            sched_setaffinity(2)