PRCTL(2) Linux Programmer's Manual PRCTL(2)

NAME
prctl - operations on a process

SYNOPSIS
#include <sys/prctl.h>

int prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5);

DESCRIPTION
prctl() is called with a first argument describing what to do (with
values defined in <linux/prctl.h>), and further arguments with a
significance depending on the first one. The first argument can be:
18
19 PR_CAP_AMBIENT (since Linux 4.3)
20 Reads or changes the ambient capability set of the calling
21 thread, according to the value of arg2, which must be one of the
22 following:
23
24 PR_CAP_AMBIENT_RAISE
25 The capability specified in arg3 is added to the ambient
26 set. The specified capability must already be present in
27 both the permitted and the inheritable sets of the
28 process. This operation is not permitted if the
29 SECBIT_NO_CAP_AMBIENT_RAISE securebit is set.
30
31 PR_CAP_AMBIENT_LOWER
32 The capability specified in arg3 is removed from the
33 ambient set.
34
35 PR_CAP_AMBIENT_IS_SET
36 The prctl() call returns 1 if the capability in arg3 is
37 in the ambient set and 0 if it is not.
38
39 PR_CAP_AMBIENT_CLEAR_ALL
40 All capabilities will be removed from the ambient set.
41 This operation requires setting arg3 to zero.
42
43 In all of the above operations, arg4 and arg5 must be specified
44 as 0.
45
46 Higher-level interfaces layered on top of the above operations
47 are provided in the libcap(3) library in the form of
48 cap_get_ambient(3), cap_set_ambient(3), and cap_reset_ambi‐
49 ent(3).
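
The PR_CAP_AMBIENT operations above can be wrapped as in the following minimal sketch. The function names are hypothetical and chosen only for illustration; the unused arguments are passed as 0, as required above.

```c
#include <linux/capability.h>
#include <sys/prctl.h>

/* Hypothetical helper: returns 1 if cap is in the ambient set,
 * 0 if it is not, and -1 on error (errno is set). */
int ambient_is_set(unsigned long cap)
{
    return prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_IS_SET, cap, 0UL, 0UL);
}

/* Hypothetical helper: adds cap to the ambient set. This fails unless
 * cap is already in both the permitted and the inheritable sets. */
int ambient_raise(unsigned long cap)
{
    return prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0UL, 0UL);
}

/* Hypothetical helper: empties the ambient set; arg3 must be 0 here. */
int ambient_clear_all(void)
{
    return prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_CLEAR_ALL, 0UL, 0UL, 0UL);
}
```

A caller would typically test the return value of ambient_raise(CAP_NET_BIND_SERVICE) and inspect errno on failure, since an unprivileged process usually does not hold the capability in its permitted set.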
50
51 PR_CAPBSET_READ (since Linux 2.6.25)
52 Return (as the function result) 1 if the capability specified in
53 arg2 is in the calling thread's capability bounding set, or 0 if
54 it is not. (The capability constants are defined in
55 <linux/capability.h>.) The capability bounding set dictates
56 whether the process can receive the capability through a file's
57 permitted capability set on a subsequent call to execve(2).
58
59 If the capability specified in arg2 is not valid, then the call
60 fails with the error EINVAL.
61
62 A higher-level interface layered on top of this operation is
63 provided in the libcap(3) library in the form of
64 cap_get_bound(3).
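
As an illustration of PR_CAPBSET_READ, the sketch below counts the capabilities in the bounding set by probing capability numbers upward until the kernel rejects one with EINVAL. The helper names and the upper limit of 64 are assumptions made for this example, not part of the interface.

```c
#include <linux/capability.h>
#include <sys/prctl.h>

/* Hypothetical helper: 1 if cap is in the bounding set, 0 if not,
 * -1 (errno EINVAL) if cap is not a valid capability. */
int bounding_set_has(unsigned long cap)
{
    return prctl(PR_CAPBSET_READ, cap, 0UL, 0UL, 0UL);
}

/* Hypothetical helper: number of capabilities in the bounding set. */
int bounding_set_count(void)
{
    int count = 0;

    /* 64 is an assumed safety limit for this sketch; the loop normally
     * stops earlier, at the first number the kernel does not know. */
    for (unsigned long cap = 0; cap < 64; cap++) {
        int r = bounding_set_has(cap);
        if (r < 0)
            break;
        count += r;
    }
    return count;
}
```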
65
66 PR_CAPBSET_DROP (since Linux 2.6.25)
67 If the calling thread has the CAP_SETPCAP capability within its
68 user namespace, then drop the capability specified by arg2 from
69 the calling thread's capability bounding set. Any children of
70 the calling thread will inherit the newly reduced bounding set.
71
The call fails with the error: EPERM if the calling thread does
not have the CAP_SETPCAP capability; EINVAL if arg2 does not
represent a valid capability; or EINVAL if file capabilities are
not enabled in the kernel, in which case bounding sets are not
supported.
76
77 A higher-level interface layered on top of this operation is
78 provided in the libcap(3) library in the form of
79 cap_drop_bound(3).
80
81 PR_SET_CHILD_SUBREAPER (since Linux 3.4)
82 If arg2 is nonzero, set the "child subreaper" attribute of the
83 calling process; if arg2 is zero, unset the attribute.
84
85 A subreaper fulfills the role of init(1) for its descendant pro‐
86 cesses. When a process becomes orphaned (i.e., its immediate
87 parent terminates), then that process will be reparented to the
nearest still living ancestor subreaper. Subsequently, calls to
getppid(2) in the orphaned process will now return the PID of the
90 subreaper process, and when the orphan terminates, it is the
91 subreaper process that will receive a SIGCHLD signal and will be
92 able to wait(2) on the process to discover its termination sta‐
93 tus.
94
95 The setting of the "child subreaper" attribute is not inherited
96 by children created by fork(2) and clone(2). The setting is
97 preserved across execve(2).
98
99 Establishing a subreaper process is useful in session management
100 frameworks where a hierarchical group of processes is managed by
101 a subreaper process that needs to be informed when one of the
102 processes—for example, a double-forked daemon—terminates (per‐
103 haps so that it can restart that process). Some init(1) frame‐
104 works (e.g., systemd(1)) employ a subreaper process for similar
105 reasons.
106
107 PR_GET_CHILD_SUBREAPER (since Linux 3.4)
108 Return the "child subreaper" setting of the caller, in the loca‐
109 tion pointed to by (int *) arg2.
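
The two subreaper operations differ in how they pass data: PR_SET_CHILD_SUBREAPER takes the new setting in arg2, while PR_GET_CHILD_SUBREAPER stores the result through a pointer. A minimal sketch, with hypothetical helper names:

```c
#include <sys/prctl.h>

/* Hypothetical helper: turn the "child subreaper" attribute on
 * (nonzero) or off (zero). Returns 0 on success, -1 on error. */
int set_subreaper(int enable)
{
    return prctl(PR_SET_CHILD_SUBREAPER, (unsigned long)(enable != 0),
                 0UL, 0UL, 0UL);
}

/* Hypothetical helper: returns the current setting, or -1 on error. */
int get_subreaper(void)
{
    int value = 0;

    /* The kernel writes the setting to the int that arg2 points to. */
    if (prctl(PR_GET_CHILD_SUBREAPER, &value, 0UL, 0UL, 0UL) == -1)
        return -1;
    return value;
}
```

A supervisor would call set_subreaper(1) before starting its workers and then collect orphaned descendants with wait(2).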
110
111 PR_SET_DUMPABLE (since Linux 2.3.20)
112 Set the state of the "dumpable" flag, which determines whether
113 core dumps are produced for the calling process upon delivery of
114 a signal whose default behavior is to produce a core dump.
115
116 In kernels up to and including 2.6.12, arg2 must be either 0
117 (SUID_DUMP_DISABLE, process is not dumpable) or 1
118 (SUID_DUMP_USER, process is dumpable). Between kernels 2.6.13
119 and 2.6.17, the value 2 was also permitted, which caused any
120 binary which normally would not be dumped to be dumped readable
121 by root only; for security reasons, this feature has been
122 removed. (See also the description of /proc/sys/fs/
123 suid_dumpable in proc(5).)
124
125 Normally, this flag is set to 1. However, it is reset to the
126 current value contained in the file /proc/sys/fs/suid_dumpable
127 (which by default has the value 0), in the following circum‐
128 stances:
129
130 * The process's effective user or group ID is changed.
131
132 * The process's filesystem user or group ID is changed (see
133 credentials(7)).
134
135 * The process executes (execve(2)) a set-user-ID or set-group-
136 ID program, resulting in a change of either the effective
137 user ID or the effective group ID.
138
139 * The process executes (execve(2)) a program that has file
140 capabilities (see capabilities(7)), but only if the permitted
141 capabilities gained exceed those already permitted for the
142 process.
143
Processes that are not dumpable cannot be attached via
ptrace(2) PTRACE_ATTACH; see ptrace(2) for further details.
146
147 If a process is not dumpable, the ownership of files in the
148 process's /proc/[pid] directory is affected as described in
149 proc(5).
150
151 PR_GET_DUMPABLE (since Linux 2.3.20)
152 Return (as the function result) the current state of the calling
153 process's dumpable flag.
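
A process that handles secrets might clear its dumpable flag early in main(). The following sketch assumes a hypothetical helper name and returns the previous state so that the caller can restore it later.

```c
#include <sys/prctl.h>

/* Hypothetical helper: make the calling process non-dumpable.
 * Returns the previous value of the flag, or -1 on error. */
int disable_core_dumps(void)
{
    int previous = prctl(PR_GET_DUMPABLE, 0UL, 0UL, 0UL, 0UL);

    if (previous == -1)
        return -1;
    if (prctl(PR_SET_DUMPABLE, 0UL, 0UL, 0UL, 0UL) == -1)
        return -1;
    return previous;
}
```

Because the flag is reset in the circumstances listed under PR_SET_DUMPABLE, a program that changes its credentials afterwards may want to check the flag again.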
154
155 PR_SET_ENDIAN (since Linux 2.6.18, PowerPC only)
156 Set the endian-ness of the calling process to the value given in
157 arg2, which should be one of the following: PR_ENDIAN_BIG,
158 PR_ENDIAN_LITTLE, or PR_ENDIAN_PPC_LITTLE (PowerPC pseudo little
159 endian).
160
161 PR_GET_ENDIAN (since Linux 2.6.18, PowerPC only)
162 Return the endian-ness of the calling process, in the location
163 pointed to by (int *) arg2.
164
165 PR_SET_FP_MODE (since Linux 4.0, only on MIPS)
166 On the MIPS architecture, user-space code can be built using an
167 ABI which permits linking with code that has more restrictive
168 floating-point (FP) requirements. For example, user-space code
169 may be built to target the O32 FPXX ABI and linked with code
170 built for either one of the more restrictive FP32 or FP64 ABIs.
171 When more restrictive code is linked in, the overall requirement
172 for the process is to use the more restrictive floating-point
173 mode.
174
175 Because the kernel has no means of knowing in advance which mode
176 the process should be executed in, and because these restric‐
177 tions can change over the lifetime of the process, the
178 PR_SET_FP_MODE operation is provided to allow control of the
179 floating-point mode from user space.
180
181 The (unsigned int) arg2 argument is a bit mask describing the
182 floating-point mode used:
183
184 PR_FP_MODE_FR
When this bit is unset (so-called FR=0 or FR0 mode), the
32 floating-point registers are 32 bits wide, and 64-bit
registers are represented as a pair of registers (even-
and odd-numbered, with the even-numbered register
containing the lower 32 bits, and the odd-numbered register
containing the higher 32 bits).

When this bit is set (on supported hardware), the 32
floating-point registers are 64 bits wide (so-called FR=1
or FR1 mode). Note that modern MIPS implementations
(MIPS R6 and newer) support FR=1 mode only.
196
197 Applications that use the O32 FP32 ABI can operate only
198 when this bit is unset (FR=0; or they can be used with
199 FRE enabled, see below). Applications that use the O32
200 FP64 ABI (and the O32 FP64A ABI, which exists to provide
201 the ability to operate with existing FP32 code; see
202 below) can operate only when this bit is set (FR=1).
203 Applications that use the O32 FPXX ABI can operate with
204 either FR=0 or FR=1.
205
206 PR_FP_MODE_FRE
207 Enable emulation of 32-bit floating-point mode. When
208 this mode is enabled, it emulates 32-bit floating-point
209 operations by raising a reserved-instruction exception on
210 every instruction that uses 32-bit formats and the kernel
211 then handles the instruction in software. (The problem
212 lies in the discrepancy of handling odd-numbered regis‐
213 ters which are the high 32 bits of 64-bit registers with
214 even numbers in FR=0 mode and the lower 32-bit parts of
odd-numbered 64-bit registers in FR=1 mode.) Enabling
this bit is necessary when code with the O32 FP32 ABI
should operate with code using the compatible O32 FPXX or
O32 FP64A ABIs (which require FR=1 FPU mode), or when a
binary with the FP32 ABI is executed on newer hardware
(MIPS R6 onwards), which lacks FR=0 mode support.
222
223 Note that this mode makes sense only when the FPU is in
224 64-bit mode (FR=1).
225
226 Note that the use of emulation inherently has a signifi‐
227 cant performance hit and should be avoided if possible.
228
229 In the N32/N64 ABI, 64-bit floating-point mode is always used,
230 so FPU emulation is not required and the FPU always operates in
231 FR=1 mode.
232
233 This option is mainly intended for use by the dynamic linker
234 (ld.so(8)).
235
236 The arguments arg3, arg4, and arg5 are ignored.
237
238 PR_GET_FP_MODE (since Linux 4.0, only on MIPS)
239 Return (as the function result) the current floating-point mode
240 (see the description of PR_SET_FP_MODE for details).
241
242 On success, the call returns a bit mask which represents the
243 current floating-point mode.
244
245 The arguments arg2, arg3, arg4, and arg5 are ignored.
246
247 PR_SET_FPEMU (since Linux 2.4.18, 2.5.9, only on ia64)
248 Set floating-point emulation control bits to arg2. Pass
249 PR_FPEMU_NOPRINT to silently emulate floating-point operation
250 accesses, or PR_FPEMU_SIGFPE to not emulate floating-point oper‐
251 ations and send SIGFPE instead.
252
253 PR_GET_FPEMU (since Linux 2.4.18, 2.5.9, only on ia64)
254 Return floating-point emulation control bits, in the location
255 pointed to by (int *) arg2.
256
257 PR_SET_FPEXC (since Linux 2.4.21, 2.5.32, only on PowerPC)
258 Set floating-point exception mode to arg2. Pass
259 PR_FP_EXC_SW_ENABLE to use FPEXC for FP exception enables,
260 PR_FP_EXC_DIV for floating-point divide by zero, PR_FP_EXC_OVF
261 for floating-point overflow, PR_FP_EXC_UND for floating-point
262 underflow, PR_FP_EXC_RES for floating-point inexact result,
263 PR_FP_EXC_INV for floating-point invalid operation,
264 PR_FP_EXC_DISABLED for FP exceptions disabled, PR_FP_EXC_NONRE‐
265 COV for async nonrecoverable exception mode, PR_FP_EXC_ASYNC for
266 async recoverable exception mode, PR_FP_EXC_PRECISE for precise
267 exception mode.
268
269 PR_GET_FPEXC (since Linux 2.4.21, 2.5.32, only on PowerPC)
270 Return floating-point exception mode, in the location pointed to
271 by (int *) arg2.
272
273 PR_SET_KEEPCAPS (since Linux 2.2.18)
274 Set the state of the calling thread's "keep capabilities" flag.
275 The effect of this flag is described in capabilities(7). arg2
276 must be either 0 (clear the flag) or 1 (set the flag). The
277 "keep capabilities" value will be reset to 0 on subsequent calls
278 to execve(2).
279
280 PR_GET_KEEPCAPS (since Linux 2.2.18)
281 Return (as the function result) the current state of the calling
282 thread's "keep capabilities" flag. See capabilities(7) for a
283 description of this flag.
284
285 PR_MCE_KILL (since Linux 2.6.32)
286 Set the machine check memory corruption kill policy for the
287 calling thread. If arg2 is PR_MCE_KILL_CLEAR, clear the thread
288 memory corruption kill policy and use the system-wide default.
289 (The system-wide default is defined by /proc/sys/vm/memory_fail‐
290 ure_early_kill; see proc(5).) If arg2 is PR_MCE_KILL_SET, use a
291 thread-specific memory corruption kill policy. In this case,
292 arg3 defines whether the policy is early kill
293 (PR_MCE_KILL_EARLY), late kill (PR_MCE_KILL_LATE), or the sys‐
294 tem-wide default (PR_MCE_KILL_DEFAULT). Early kill means that
295 the thread receives a SIGBUS signal as soon as hardware memory
296 corruption is detected inside its address space. In late kill
297 mode, the process is killed only when it accesses a corrupted
298 page. See sigaction(2) for more information on the SIGBUS sig‐
299 nal. The policy is inherited by children. The remaining unused
300 prctl() arguments must be zero for future compatibility.
301
302 PR_MCE_KILL_GET (since Linux 2.6.32)
303 Return (as the function result) the current per-process machine
304 check kill policy. All unused prctl() arguments must be zero.
305
306 PR_SET_MM (since Linux 3.3)
307 Modify certain kernel memory map descriptor fields of the call‐
308 ing process. Usually these fields are set by the kernel and
309 dynamic loader (see ld.so(8) for more information) and a regular
310 application should not use this feature. However, there are
311 cases, such as self-modifying programs, where a program might
312 find it useful to change its own memory map.
313
314 The calling process must have the CAP_SYS_RESOURCE capability.
315 The value in arg2 is one of the options below, while arg3 pro‐
316 vides a new value for the option. The arg4 and arg5 arguments
317 must be zero if unused.
318
319 Before Linux 3.10, this feature is available only if the kernel
320 is built with the CONFIG_CHECKPOINT_RESTORE option enabled.
321
322 PR_SET_MM_START_CODE
323 Set the address above which the program text can run.
324 The corresponding memory area must be readable and exe‐
325 cutable, but not writable or shareable (see mprotect(2)
326 and mmap(2) for more information).
327
328 PR_SET_MM_END_CODE
329 Set the address below which the program text can run.
330 The corresponding memory area must be readable and exe‐
331 cutable, but not writable or shareable.
332
333 PR_SET_MM_START_DATA
334 Set the address above which initialized and uninitialized
335 (bss) data are placed. The corresponding memory area
336 must be readable and writable, but not executable or
337 shareable.
338
339 PR_SET_MM_END_DATA
340 Set the address below which initialized and uninitialized
341 (bss) data are placed. The corresponding memory area
342 must be readable and writable, but not executable or
343 shareable.
344
345 PR_SET_MM_START_STACK
346 Set the start address of the stack. The corresponding
347 memory area must be readable and writable.
348
349 PR_SET_MM_START_BRK
Set the address above which the program heap can be
expanded with a brk(2) call. The address must be greater
than the ending address of the current program data
segment. In addition, the combined size of the resulting
heap and the size of the data segment can't exceed the
RLIMIT_DATA resource limit (see setrlimit(2)).
356
357 PR_SET_MM_BRK
358 Set the current brk(2) value. The requirements for the
359 address are the same as for the PR_SET_MM_START_BRK
360 option.
361
362 The following options are available since Linux 3.5.
363
364 PR_SET_MM_ARG_START
365 Set the address above which the program command line is
366 placed.
367
368 PR_SET_MM_ARG_END
369 Set the address below which the program command line is
370 placed.
371
372 PR_SET_MM_ENV_START
373 Set the address above which the program environment is
374 placed.
375
376 PR_SET_MM_ENV_END
377 Set the address below which the program environment is
378 placed.
379
380 The address passed with PR_SET_MM_ARG_START,
381 PR_SET_MM_ARG_END, PR_SET_MM_ENV_START, and
382 PR_SET_MM_ENV_END should belong to a process stack area.
383 Thus, the corresponding memory area must be readable,
384 writable, and (depending on the kernel configuration)
385 have the MAP_GROWSDOWN attribute set (see mmap(2)).
386
387 PR_SET_MM_AUXV
Set a new auxiliary vector. The arg3 argument should
provide the address of the vector. The arg4 argument provides
the size of the vector.
391
392 PR_SET_MM_EXE_FILE
Supersede the /proc/pid/exe symbolic link with a new one
pointing to a new executable file identified by the file
descriptor provided in the arg3 argument. The file descriptor
should be obtained with a regular open(2) call.
397
398 To change the symbolic link, one needs to unmap all
399 existing executable memory areas, including those created
400 by the kernel itself (for example the kernel usually cre‐
401 ates at least one executable memory area for the ELF
402 .text section).
403
404 In Linux 4.9 and earlier, the PR_SET_MM_EXE_FILE opera‐
405 tion can be performed only once in a process's lifetime;
406 attempting to perform the operation a second time results
407 in the error EPERM. This restriction was enforced for
408 security reasons that were subsequently deemed specious,
409 and the restriction was removed in Linux 4.10 because
410 some user-space applications needed to perform this oper‐
411 ation more than once.
412
413 The following options are available since Linux 3.18.
414
415 PR_SET_MM_MAP
416 Provides one-shot access to all the addresses by passing
417 in a struct prctl_mm_map (as defined in <linux/prctl.h>).
418 The arg4 argument should provide the size of the struct.
419
420 This feature is available only if the kernel is built
421 with the CONFIG_CHECKPOINT_RESTORE option enabled.
422
423 PR_SET_MM_MAP_SIZE
424 Returns the size of the struct prctl_mm_map the kernel
425 expects. This allows user space to find a compatible
426 struct. The arg4 argument should be a pointer to an
427 unsigned int.
428
429 This feature is available only if the kernel is built
430 with the CONFIG_CHECKPOINT_RESTORE option enabled.
431
432 PR_MPX_ENABLE_MANAGEMENT, PR_MPX_DISABLE_MANAGEMENT (since Linux 3.19)
433 Enable or disable kernel management of Memory Protection eXten‐
434 sions (MPX) bounds tables. The arg2, arg3, arg4, and arg5 argu‐
435 ments must be zero.
436
437 MPX is a hardware-assisted mechanism for performing bounds
438 checking on pointers. It consists of a set of registers storing
439 bounds information and a set of special instruction prefixes
440 that tell the CPU on which instructions it should do bounds
441 enforcement. There is a limited number of these registers and
442 when there are more pointers than registers, their contents must
443 be "spilled" into a set of tables. These tables are called
444 "bounds tables" and the MPX prctl() operations control whether
445 the kernel manages their allocation and freeing.
446
447 When management is enabled, the kernel will take over allocation
448 and freeing of the bounds tables. It does this by trapping the
449 #BR exceptions that result at first use of missing bounds tables
450 and instead of delivering the exception to user space, it allo‐
451 cates the table and populates the bounds directory with the
452 location of the new table. For freeing, the kernel checks to
453 see if bounds tables are present for memory which is not allo‐
454 cated, and frees them if so.
455
456 Before enabling MPX management using PR_MPX_ENABLE_MANAGEMENT,
457 the application must first have allocated a user-space buffer
458 for the bounds directory and placed the location of that direc‐
459 tory in the bndcfgu register.
460
461 These calls fail if the CPU or kernel does not support MPX.
462 Kernel support for MPX is enabled via the CONFIG_X86_INTEL_MPX
configuration option. You can check whether the CPU supports
MPX by looking for the 'mpx' CPUID bit, for example with the
following command:

grep ' mpx ' /proc/cpuinfo
468
469 A thread may not switch in or out of long (64-bit) mode while
470 MPX is enabled.
471
472 All threads in a process are affected by these calls.
473
474 The child of a fork(2) inherits the state of MPX management.
475 During execve(2), MPX management is reset to a state as if
476 PR_MPX_DISABLE_MANAGEMENT had been called.
477
478 For further information on Intel MPX, see the kernel source file
479 Documentation/x86/intel_mpx.txt.
480
481 PR_SET_NAME (since Linux 2.6.9)
482 Set the name of the calling thread, using the value in the loca‐
483 tion pointed to by (char *) arg2. The name can be up to 16
484 bytes long, including the terminating null byte. (If the length
485 of the string, including the terminating null byte, exceeds 16
486 bytes, the string is silently truncated.) This is the same
487 attribute that can be set via pthread_setname_np(3) and
retrieved using pthread_getname_np(3). The attribute is
likewise accessible via /proc/self/task/[tid]/comm, where tid is
the thread ID of the calling thread.
491
492 PR_GET_NAME (since Linux 2.6.11)
493 Return the name of the calling thread, in the buffer pointed to
494 by (char *) arg2. The buffer should allow space for up to 16
495 bytes; the returned string will be null-terminated.
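
The 16-byte limit is easiest to see by setting a name and reading it back. The sketch below uses hypothetical helper names and a hypothetical THREAD_NAME_LEN macro; the buffer is sized for the 16 bytes mentioned above.

```c
#include <string.h>
#include <sys/prctl.h>

/* Assumed name for the buffer size: 15 characters plus the null byte. */
#define THREAD_NAME_LEN 16

/* Hypothetical helper: set the calling thread's name. Longer strings
 * are silently truncated by the kernel. */
int set_thread_name(const char *name)
{
    return prctl(PR_SET_NAME, name, 0UL, 0UL, 0UL);
}

/* Hypothetical helper: copy the calling thread's name into buf. */
int get_thread_name(char buf[THREAD_NAME_LEN])
{
    memset(buf, 0, THREAD_NAME_LEN);
    return prctl(PR_GET_NAME, buf, 0UL, 0UL, 0UL);
}
```

After set_thread_name("a-name-longer-than-16-bytes"), a call to get_thread_name() yields only the first 15 characters of that string, followed by the terminating null byte.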
496
497 PR_SET_NO_NEW_PRIVS (since Linux 3.5)
498 Set the calling thread's no_new_privs attribute to the value in
499 arg2. With no_new_privs set to 1, execve(2) promises not to
500 grant privileges to do anything that could not have been done
501 without the execve(2) call (for example, rendering the set-user-
ID and set-group-ID mode bits, and file capabilities
non-functional). Once set, the no_new_privs attribute cannot be
unset. The setting of this attribute is inherited by children
505 created by fork(2) and clone(2), and preserved across execve(2).
506
507 Since Linux 4.10, the value of a thread's no_new_privs attribute
508 can be viewed via the NoNewPrivs field in the /proc/[pid]/status
509 file.
510
511 For more information, see the kernel source file Documenta‐
512 tion/userspace-api/no_new_privs.rst (or Documenta‐
513 tion/prctl/no_new_privs.txt before Linux 4.13). See also sec‐
514 comp(2).
515
516 PR_GET_NO_NEW_PRIVS (since Linux 3.5)
517 Return (as the function result) the value of the no_new_privs
518 attribute for the calling thread. A value of 0 indicates the
519 regular execve(2) behavior. A value of 1 indicates execve(2)
520 will operate in the privilege-restricting mode described above.
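
A minimal sketch of setting and querying no_new_privs follows. The helper names are hypothetical. The unused arguments are passed as 0, which is the conservative choice because current kernels reject other values with EINVAL.

```c
#include <sys/prctl.h>

/* Hypothetical helper: irreversibly set no_new_privs for the calling
 * thread. Returns 0 on success, -1 on error. */
int lock_privileges(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1UL, 0UL, 0UL, 0UL);
}

/* Hypothetical helper: 1 if no_new_privs is set, 0 if not, -1 on error. */
int privileges_locked(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0UL, 0UL, 0UL, 0UL);
}
```

Since the attribute cannot be unset and is inherited across fork(2), clone(2), and execve(2), it is usually set just before handing control to less trusted code.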
521
522 PR_SET_PDEATHSIG (since Linux 2.1.57)
523 Set the parent-death signal of the calling process to arg2
524 (either a signal value in the range 1..maxsig, or 0 to clear).
525 This is the signal that the calling process will get when its
526 parent dies.
527
528 Warning: the "parent" in this case is considered to be the
529 thread that created this process. In other words, the signal
530 will be sent when that thread terminates (via, for example,
531 pthread_exit(3)), rather than after all of the threads in the
532 parent process terminate.
533
534 The parent-death signal is sent upon subsequent termination of
535 the parent thread and also upon termination of each subreaper
536 process (see the description of PR_SET_CHILD_SUBREAPER above) to
537 which the caller is subsequently reparented. If the parent
538 thread and all ancestor subreapers have already terminated by
539 the time of the PR_SET_PDEATHSIG operation, then no parent-death
540 signal is sent to the caller.
541
542 The parent-death signal is process-directed (see signal(7)) and,
543 if the child installs a handler using the sigaction(2) SA_SIG‐
544 INFO flag, the si_pid field of the siginfo_t argument of the
545 handler contains the PID of the terminating parent process.
546
547 The parent-death signal setting is cleared for the child of a
548 fork(2). It is also (since Linux 2.4.36 / 2.6.23) cleared when
549 executing a set-user-ID or set-group-ID binary, or a binary that
550 has associated capabilities (see capabilities(7)); otherwise,
551 this value is preserved across execve(2).
552
553 PR_GET_PDEATHSIG (since Linux 2.3.15)
554 Return the current value of the parent process death signal, in
555 the location pointed to by (int *) arg2.
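
Because no signal is sent if the parent has already terminated when PR_SET_PDEATHSIG is called, a child commonly re-checks its parent PID right after the call. The sketch below shows that pattern under stated assumptions: the helper names are hypothetical, and expected_parent is a PID the caller recorded with getpid(2) before fork(2).

```c
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: request sig when the parent dies.
 * Returns 0 when armed, 1 if the parent had already gone (so no
 * signal will arrive), or -1 on error. */
int arm_parent_death_signal(int sig, pid_t expected_parent)
{
    if (prctl(PR_SET_PDEATHSIG, (unsigned long)sig, 0UL, 0UL, 0UL) == -1)
        return -1;

    /* If we were reparented before the prctl() call took effect,
     * getppid() no longer matches the recorded PID. */
    return getppid() == expected_parent ? 0 : 1;
}

/* Hypothetical helper: current parent-death signal, 0 if none is set,
 * or -1 on error. */
int current_parent_death_signal(void)
{
    int sig = 0;

    if (prctl(PR_GET_PDEATHSIG, &sig, 0UL, 0UL, 0UL) == -1)
        return -1;
    return sig;
}
```

Note the warning above: the "parent" is the thread that created the process, so this check does not cover a parent thread that exits while its process lives on.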
556
557 PR_SET_PTRACER (since Linux 3.4)
558 This is meaningful only when the Yama LSM is enabled and in mode
559 1 ("restricted ptrace", visible via /proc/sys/ker‐
560 nel/yama/ptrace_scope). When a "ptracer process ID" is passed
561 in arg2, the caller is declaring that the ptracer process can
562 ptrace(2) the calling process as if it were a direct process
563 ancestor. Each PR_SET_PTRACER operation replaces the previous
564 "ptracer process ID". Employing PR_SET_PTRACER with arg2 set to
565 0 clears the caller's "ptracer process ID". If arg2 is
566 PR_SET_PTRACER_ANY, the ptrace restrictions introduced by Yama
567 are effectively disabled for the calling process.
568
569 For further information, see the kernel source file Documenta‐
570 tion/admin-guide/LSM/Yama.rst (or Documentation/secu‐
571 rity/Yama.txt before Linux 4.13).
572
573 PR_SET_SECCOMP (since Linux 2.6.23)
574 Set the secure computing (seccomp) mode for the calling thread,
575 to limit the available system calls. The more recent seccomp(2)
576 system call provides a superset of the functionality of
577 PR_SET_SECCOMP.
578
579 The seccomp mode is selected via arg2. (The seccomp constants
580 are defined in <linux/seccomp.h>.)
581
582 With arg2 set to SECCOMP_MODE_STRICT, the only system calls that
583 the thread is permitted to make are read(2), write(2), _exit(2)
584 (but not exit_group(2)), and sigreturn(2). Other system calls
585 result in the delivery of a SIGKILL signal. Strict secure com‐
586 puting mode is useful for number-crunching applications that may
587 need to execute untrusted byte code, perhaps obtained by reading
588 from a pipe or socket. This operation is available only if the
589 kernel is configured with CONFIG_SECCOMP enabled.
590
591 With arg2 set to SECCOMP_MODE_FILTER (since Linux 3.5), the sys‐
592 tem calls allowed are defined by a pointer to a Berkeley Packet
593 Filter passed in arg3. This argument is a pointer to struct
594 sock_fprog; it can be designed to filter arbitrary system calls
595 and system call arguments. This mode is available only if the
596 kernel is configured with CONFIG_SECCOMP_FILTER enabled.
597
598 If SECCOMP_MODE_FILTER filters permit fork(2), then the seccomp
599 mode is inherited by children created by fork(2); if execve(2)
600 is permitted, then the seccomp mode is preserved across
601 execve(2). If the filters permit prctl() calls, then additional
602 filters can be added; they are run in order until the first non-
603 allow result is seen.
604
605 For further information, see the kernel source file Documenta‐
606 tion/userspace-api/seccomp_filter.rst (or Documenta‐
607 tion/prctl/seccomp_filter.txt before Linux 4.13).
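
The following sketch illustrates SECCOMP_MODE_FILTER with a deliberately tiny filter that makes one system call fail with EPERM and allows all others. It is an illustration only, not a recommended policy: the function names are hypothetical, and a production filter should also check the arch field of struct seccomp_data before trusting the system call number. The filter is installed in a child process so that the caller stays unrestricted.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fail system call number nr with EPERM. */
int deny_syscall(int nr)
{
    struct sock_filter filter[] = {
        /* Load the system call number. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* Equal to nr: run the next instruction; otherwise skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Unprivileged callers must set no_new_privs before adding a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1UL, 0UL, 0UL, 0UL) == -1)
        return -1;
    return prctl(PR_SET_SECCOMP, (unsigned long)SECCOMP_MODE_FILTER,
                 &prog, 0UL, 0UL);
}

/* Hypothetical demo: returns 0 if getppid(2) failed with EPERM in the
 * filtered child, 1 if it did not, 2 if the filter could not be
 * installed, or -1 on fork/wait errors. */
int demo_filter_in_child(void)
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (deny_syscall(SYS_getppid) == -1)
            _exit(2);
        errno = 0;
        long r = syscall(SYS_getppid);
        _exit(r == -1 && errno == EPERM ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```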
608
609 PR_GET_SECCOMP (since Linux 2.6.23)
610 Return (as the function result) the secure computing mode of the
611 calling thread. If the caller is not in secure computing mode,
612 this operation returns 0; if the caller is in strict secure com‐
613 puting mode, then the prctl() call will cause a SIGKILL signal
614 to be sent to the process. If the caller is in filter mode, and
615 this system call is allowed by the seccomp filters, it returns
616 2; otherwise, the process is killed with a SIGKILL signal. This
617 operation is available only if the kernel is configured with
618 CONFIG_SECCOMP enabled.
619
620 Since Linux 3.8, the Seccomp field of the /proc/[pid]/status
621 file provides a method of obtaining the same information, with‐
622 out the risk that the process is killed; see proc(5).
623
624 PR_SET_SECUREBITS (since Linux 2.6.26)
625 Set the "securebits" flags of the calling thread to the value
626 supplied in arg2. See capabilities(7).
627
628 PR_GET_SECUREBITS (since Linux 2.6.26)
629 Return (as the function result) the "securebits" flags of the
630 calling thread. See capabilities(7).
631
632 PR_GET_SPECULATION_CTRL (since Linux 4.17)
633 Return (as the function result) the state of the speculation
634 misfeature specified in arg2. Currently, the only permitted
635 value for this argument is PR_SPEC_STORE_BYPASS (otherwise the
636 call fails with the error ENODEV).
637
638 The return value uses bits 0-3 with the following meaning:
639
PR_SPEC_PRCTL
Mitigation can be controlled per thread by
PR_SET_SPECULATION_CTRL.

PR_SPEC_ENABLE
The speculation feature is enabled, mitigation is
disabled.

PR_SPEC_DISABLE
The speculation feature is disabled, mitigation is
enabled.

PR_SPEC_FORCE_DISABLE
Same as PR_SPEC_DISABLE but cannot be undone.
654
655 If all bits are 0, then the CPU is not affected by the specula‐
656 tion misfeature.
657
658 If PR_SPEC_PRCTL is set, then per-thread control of the mitiga‐
659 tion is available. If not set, prctl() for the speculation mis‐
660 feature will fail.
661
662 The arg3, arg4, and arg5 arguments must be specified as 0; oth‐
663 erwise the call fails with the error EINVAL.
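
The bit mask returned by PR_GET_SPECULATION_CTRL can be decoded as in the sketch below. The function name and the returned strings are hypothetical, and the code assumes kernel headers recent enough (Linux 4.17 or later) to define the PR_SPEC_* constants.

```c
#include <sys/prctl.h>

/* Hypothetical helper: describe the Speculative Store Bypass state of
 * the calling thread as a short string. */
const char *ssb_state(void)
{
    int r = prctl(PR_GET_SPECULATION_CTRL,
                  (unsigned long)PR_SPEC_STORE_BYPASS, 0UL, 0UL, 0UL);

    if (r == -1)
        return "unavailable";       /* for example ENODEV, or an old kernel */
    if (r == 0)
        return "not affected";      /* all bits are 0 */
    if (r & PR_SPEC_FORCE_DISABLE)
        return "mitigated (cannot be undone)";
    if (r & PR_SPEC_DISABLE)
        return "mitigated";
    if (r & PR_SPEC_ENABLE)
        return "speculation enabled";
    return "unknown";               /* bits this sketch does not decode */
}
```

A caller that wants to change the setting should additionally test the PR_SPEC_PRCTL bit, because PR_SET_SPECULATION_CTRL fails when per-thread control is not available.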
664
665 PR_SET_SPECULATION_CTRL (since Linux 4.17)
666 Sets the state of the speculation misfeature specified in arg2.
667 Currently, the only permitted value for this argument is
668 PR_SPEC_STORE_BYPASS (otherwise the call fails with the error
669 ENODEV). This setting is a per-thread attribute. The arg3
670 argument is used to hand in the control value, which is one of
671 the following:
672
PR_SPEC_ENABLE
The speculation feature is enabled, mitigation is
disabled.

PR_SPEC_DISABLE
The speculation feature is disabled, mitigation is
enabled.
680
681 PR_SPEC_FORCE_DISABLE
682 Same as PR_SPEC_DISABLE but cannot be undone. A subse‐
683 quent prctl(..., PR_SPEC_ENABLE) will fail with the error
684 EPERM.
685
686 Any other value in arg3 will result in the call failing with the
687 error ERANGE.
688
689 The arg4 and arg5 arguments must be specified as 0; otherwise
690 the call fails with the error EINVAL.
691
692 The speculation feature can also be controlled by the
693 spec_store_bypass_disable boot parameter. This parameter may
694 enforce a read-only policy which will result in the prctl() call
695 failing with the error ENXIO. For further details, see the ker‐
696 nel source file Documentation/admin-guide/kernel-parameters.txt.
697
698 PR_SET_THP_DISABLE (since Linux 3.15)
699 Set the state of the "THP disable" flag for the calling thread.
700 If arg2 has a nonzero value, the flag is set, otherwise it is
cleared. Setting this flag provides a method for disabling
transparent huge pages for jobs where the code cannot be
modified, and using a malloc hook with madvise(2) is not an option
(e.g., for statically allocated data). The setting of the "THP
disable" flag is inherited by a child created via fork(2) and is
preserved across execve(2).
707
708 PR_TASK_PERF_EVENTS_DISABLE (since Linux 2.6.31)
709 Disable all performance counters attached to the calling
710 process, regardless of whether the counters were created by this
711 process or another process. Performance counters created by the
712 calling process for other processes are unaffected. For more
713 information on performance counters, see the Linux kernel source
714 file tools/perf/design.txt.
715
716 Originally called PR_TASK_PERF_COUNTERS_DISABLE; renamed
717 (retaining the same numerical value) in Linux 2.6.32.
718
719 PR_TASK_PERF_EVENTS_ENABLE (since Linux 2.6.31)
720 The converse of PR_TASK_PERF_EVENTS_DISABLE; enable performance
721 counters attached to the calling process.
722
723 Originally called PR_TASK_PERF_COUNTERS_ENABLE; renamed in Linux
724 2.6.32.
725
726 PR_GET_THP_DISABLE (since Linux 3.15)
727 Return (as the function result) the current setting of the "THP
728 disable" flag for the calling thread: either 1, if the flag is
729 set, or 0, if it is not.
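
As an illustration of PR_SET_THP_DISABLE and PR_GET_THP_DISABLE, the following sketch wraps both operations. The wrapper names are hypothetical. All unused arguments are passed as 0, since nonzero values cause EINVAL.

```c
#include <sys/prctl.h>

/*
 * Set (disable != 0) or clear (disable == 0) the "THP disable" flag.
 * Returns 0 on success, or -1 with errno set. Hypothetical helper name.
 */
int set_thp_disable(int disable)
{
    /* arg3, arg4, and arg5 must be 0. */
    return prctl(PR_SET_THP_DISABLE, disable ? 1UL : 0UL, 0UL, 0UL, 0UL);
}

/*
 * Returns 1 if the "THP disable" flag is set, 0 if it is clear,
 * or -1 with errno set. Hypothetical helper name.
 */
int get_thp_disable(void)
{
    /* arg2 through arg5 must all be 0. */
    return prctl(PR_GET_THP_DISABLE, 0UL, 0UL, 0UL, 0UL);
}
```

Because the flag is inherited across fork(2) and preserved across execve(2), a launcher could call set_thp_disable(1) before executing an unmodified program.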
730
731 PR_GET_TID_ADDRESS (since Linux 3.5)
732 Return the clear_child_tid address set by set_tid_address(2) and
733 the clone(2) CLONE_CHILD_CLEARTID flag, in the location pointed
734 to by (int **) arg2. This feature is available only if the ker‐
735 nel is built with the CONFIG_CHECKPOINT_RESTORE option enabled.
736 Note that since the prctl() system call does not have a compat
737 implementation for the AMD64 x32 and MIPS n32 ABIs, and the ker‐
738 nel writes out a pointer using the kernel's pointer size, this
739 operation expects a user-space buffer of 8 (not 4) bytes on
740 these ABIs.
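
The buffer-size caveat can be handled by always supplying an 8-byte buffer, as in the sketch below. The helper name is hypothetical. The sketch assumes a 64-bit or little-endian target, so that the address written by the kernel can be read back from a uint64_t. The call fails if the kernel was built without CONFIG_CHECKPOINT_RESTORE.

```c
#include <stdint.h>
#include <sys/prctl.h>

/*
 * Fetch the clear_child_tid address of the calling thread.
 * Returns 0 and stores the address in *out, or returns -1 with errno
 * set. Hypothetical helper name.
 */
int get_clear_child_tid(uint64_t *out)
{
    /* 8 bytes, which is also large enough on the x32 and MIPS n32 ABIs. */
    uint64_t buf = 0;

    if (prctl(PR_GET_TID_ADDRESS, (unsigned long) &buf, 0UL, 0UL, 0UL) == -1)
        return -1;

    *out = buf;
    return 0;
}
```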
741
742 PR_SET_TIMERSLACK (since Linux 2.6.28)
743 Each thread has two associated timer slack values: a "default"
744 value, and a "current" value. This operation sets the "current"
745 timer slack value for the calling thread. arg2 is an unsigned
746 long value, so the maximum "current" value is ULONG_MAX and the
747 minimum "current" value is 1. If the nanosecond value supplied
748 in arg2 is greater than zero, then the "current" value is set to
749 this value. If arg2 is equal to zero, the "current" timer slack
750 is reset to the thread's "default" timer slack value.
751
752 The "current" timer slack is used by the kernel to group timer
753 expirations for the calling thread that are close to one
754 another; as a consequence, timer expirations for the thread may
755 be up to the specified number of nanoseconds late (but will
756 never expire early). Grouping timer expirations can help reduce
757 system power consumption by minimizing CPU wake-ups.
758
759 The timer expirations affected by timer slack are those set by
760 select(2), pselect(2), poll(2), ppoll(2), epoll_wait(2),
761 epoll_pwait(2), clock_nanosleep(2), nanosleep(2), and futex(2)
762 (and thus the library functions implemented via futexes, includ‐
763 ing pthread_cond_timedwait(3), pthread_mutex_timedlock(3),
764 pthread_rwlock_timedrdlock(3), pthread_rwlock_timedwrlock(3),
765 and sem_timedwait(3)).
766
767 Timer slack is not applied to threads that are scheduled under a
768 real-time scheduling policy (see sched_setscheduler(2)).
769
770 When a new thread is created, the two timer slack values are
771 made the same as the "current" value of the creating thread.
772 Thereafter, a thread can adjust its "current" timer slack value
773 via PR_SET_TIMERSLACK. The "default" value can't be changed.
774 The timer slack values of init (PID 1), the ancestor of all pro‐
775 cesses, are 50,000 nanoseconds (50 microseconds). The timer
776 slack value is inherited by a child created via fork(2), and is
777 preserved across execve(2).
778
779 Since Linux 4.6, the "current" timer slack value of any process
780 can be examined and changed via the file /proc/[pid]/timer‐
781 slack_ns. See proc(5).
782
783 PR_GET_TIMERSLACK (since Linux 2.6.28)
784 Return (as the function result) the "current" timer slack value
785 of the calling thread.
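
The two timer slack operations can be combined as in the following sketch. The helper names are hypothetical. Note that prctl() returns an int, so this sketch is only suitable for slack values that fit in an int.

```c
#include <sys/prctl.h>

/*
 * Set the "current" timer slack of the calling thread, in nanoseconds.
 * Passing 0 resets it to the thread's "default" value.
 * Returns 0 on success, or -1 with errno set. Hypothetical helper name.
 */
int set_timer_slack_ns(unsigned long ns)
{
    return prctl(PR_SET_TIMERSLACK, ns, 0UL, 0UL, 0UL);
}

/*
 * Return the "current" timer slack of the calling thread in
 * nanoseconds, or -1 with errno set. Hypothetical helper name.
 */
int get_timer_slack_ns(void)
{
    return prctl(PR_GET_TIMERSLACK, 0UL, 0UL, 0UL, 0UL);
}
```

For example, a thread that can tolerate late wake-ups might call set_timer_slack_ns(200000) to allow expirations to be grouped within 200 microseconds, and later call set_timer_slack_ns(0) to return to its default.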
786
787 PR_SET_TIMING (since Linux 2.6.0)
788 Set whether to use (normal, traditional) statistical process
789 timing or accurate timestamp-based process timing, by passing
790 PR_TIMING_STATISTICAL or PR_TIMING_TIMESTAMP to arg2. PR_TIM‐
791 ING_TIMESTAMP is not currently implemented (attempting to set
792 this mode will yield the error EINVAL).
793
794 PR_GET_TIMING (since Linux 2.6.0)
795 Return (as the function result) which process timing method is
796 currently in use.
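
The behavior of the two timing modes can be probed with a sketch such as the following. The helper name is hypothetical. Since PR_TIMING_TIMESTAMP is not implemented, requesting it is expected to yield EINVAL.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Request a process timing mode (PR_TIMING_STATISTICAL or
 * PR_TIMING_TIMESTAMP). Returns 0 if the mode was accepted, or the
 * errno value otherwise. Hypothetical helper name.
 */
int set_timing_mode(unsigned long mode)
{
    if (prctl(PR_SET_TIMING, mode, 0UL, 0UL, 0UL) == 0)
        return 0;
    return errno;
}
```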
797
798 PR_SET_TSC (since Linux 2.6.26, x86 only)
799 Set the state of the flag determining whether the timestamp
800 counter can be read by the process. Pass PR_TSC_ENABLE to arg2
801 to allow it to be read, or PR_TSC_SIGSEGV to generate a SIGSEGV
802 when the process tries to read the timestamp counter.
803
804 PR_GET_TSC (since Linux 2.6.26, x86 only)
805 Return the state of the flag determining whether the timestamp
806 counter can be read, in the location pointed to by (int *) arg2.
807
808 PR_SET_UNALIGN
809 (Only on: ia64, since Linux 2.3.48; parisc, since Linux 2.6.15;
810 PowerPC, since Linux 2.6.18; Alpha, since Linux 2.6.22; sh,
811 since Linux 2.6.34; tile, since Linux 3.12) Set unaligned access
812 control bits to arg2. Pass PR_UNALIGN_NOPRINT to silently fix
813 up unaligned user accesses, or PR_UNALIGN_SIGBUS to generate
814 SIGBUS on unaligned user access. Alpha also supports an
815 additional flag with the value 4 and no corresponding named
816 constant, which instructs the kernel not to fix up unaligned accesses
817 (this is analogous to providing the UAC_NOFIX flag in the SSI_NVPAIRS
818 operation of the setsysinfo() system call on Tru64).
819
820 PR_GET_UNALIGN
821 (See PR_SET_UNALIGN for information on versions and architec‐
822 tures.) Return unaligned access control bits, in the location
823 pointed to by (unsigned int *) arg2.
824
826 On success, PR_GET_DUMPABLE, PR_GET_FP_MODE, PR_GET_KEEPCAPS,
827 PR_GET_NO_NEW_PRIVS, PR_GET_THP_DISABLE, PR_CAPBSET_READ, PR_GET_TIM‐
828 ING, PR_GET_TIMERSLACK, PR_GET_SECUREBITS, PR_GET_SPECULATION_CTRL,
829 PR_MCE_KILL_GET, PR_CAP_AMBIENT+PR_CAP_AMBIENT_IS_SET, and (if it
830 returns) PR_GET_SECCOMP return the nonnegative values described above.
831 All other option values return 0 on success. On error, -1 is returned,
832 and errno is set appropriately.
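
The return-value convention can be illustrated with a value-returning option. In the sketch below, whose helper name is hypothetical, a result of -1 indicates an error, and any nonnegative result is the value itself.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/*
 * Query the "dumpable" attribute of the calling process.
 * Returns the nonnegative attribute value on success, or -1 on error
 * after printing a diagnostic. Hypothetical helper name.
 */
int query_dumpable(void)
{
    int value = prctl(PR_GET_DUMPABLE, 0UL, 0UL, 0UL, 0UL);

    if (value == -1)
        fprintf(stderr, "PR_GET_DUMPABLE: %s\n", strerror(errno));

    return value;
}
```

For options that are not listed as returning a value, the same check applies, but a successful call always returns 0.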
833
835 EACCES option is PR_SET_SECCOMP and arg2 is SECCOMP_MODE_FILTER, but
836 the process does not have the CAP_SYS_ADMIN capability or has
837 not set the no_new_privs attribute (see the discussion of
838 PR_SET_NO_NEW_PRIVS above).
839
840 EACCES option is PR_SET_MM, arg2 is PR_SET_MM_EXE_FILE, and the file is
841 not executable.
842
843 EBADF option is PR_SET_MM, arg2 is PR_SET_MM_EXE_FILE, and the file
844 descriptor passed in arg3 is not valid.
845
846 EBUSY option is PR_SET_MM, arg2 is PR_SET_MM_EXE_FILE, and this is the
847 second attempt to change the /proc/pid/exe symbolic link, which
848 is prohibited.
849
850 EFAULT arg2 is an invalid address.
851
852 EFAULT option is PR_SET_SECCOMP, arg2 is SECCOMP_MODE_FILTER, the sys‐
853 tem was built with CONFIG_SECCOMP_FILTER, and arg3 is an invalid
854 address.
855
856 EINVAL The value of option is not recognized.
857
858 EINVAL option is PR_MCE_KILL or PR_MCE_KILL_GET or PR_SET_MM, and
859 unused prctl() arguments were not specified as zero.
860
861 EINVAL arg2 is not a valid value for this option.
862
863 EINVAL option is PR_SET_SECCOMP or PR_GET_SECCOMP, and the kernel was
864 not configured with CONFIG_SECCOMP.
865
866 EINVAL option is PR_SET_SECCOMP, arg2 is SECCOMP_MODE_FILTER, and the
867 kernel was not configured with CONFIG_SECCOMP_FILTER.
868
869 EINVAL option is PR_SET_MM, and one of the following is true:
870
871 * arg4 or arg5 is nonzero;
872
873 * arg3 is greater than TASK_SIZE (the limit on the size of the
874 user address space for this architecture);
875
876 * arg2 is PR_SET_MM_START_CODE, PR_SET_MM_END_CODE,
877 PR_SET_MM_START_DATA, PR_SET_MM_END_DATA, or
878 PR_SET_MM_START_STACK, and the permissions of the correspond‐
879 ing memory area are not as required;
880
881 * arg2 is PR_SET_MM_START_BRK or PR_SET_MM_BRK, and arg3 is
882 less than or equal to the end of the data segment or speci‐
883 fies a value that would cause the RLIMIT_DATA resource limit
884 to be exceeded.
885
886 EINVAL option is PR_SET_PTRACER and arg2 is not 0, PR_SET_PTRACER_ANY,
887 or the PID of an existing process.
888
889 EINVAL option is PR_SET_PDEATHSIG and arg2 is not a valid signal num‐
890 ber.
891
892 EINVAL option is PR_SET_DUMPABLE and arg2 is neither SUID_DUMP_DISABLE
893 nor SUID_DUMP_USER.
894
895 EINVAL option is PR_SET_TIMING and arg2 is not PR_TIMING_STATISTICAL.
896
897 EINVAL option is PR_SET_NO_NEW_PRIVS and arg2 is not equal to 1 or
898 arg3, arg4, or arg5 is nonzero.
899
900 EINVAL option is PR_GET_NO_NEW_PRIVS and arg2, arg3, arg4, or arg5 is
901 nonzero.
902
903 EINVAL option is PR_SET_THP_DISABLE and arg3, arg4, or arg5 is nonzero.
904
905 EINVAL option is PR_GET_THP_DISABLE and arg2, arg3, arg4, or arg5 is
906 nonzero.
907
908 EINVAL option is PR_CAP_AMBIENT and an unused argument (arg4, arg5, or,
909 in the case of PR_CAP_AMBIENT_CLEAR_ALL, arg3) is nonzero; or
910 arg2 has an invalid value; or arg2 is PR_CAP_AMBIENT_LOWER,
911 PR_CAP_AMBIENT_RAISE, or PR_CAP_AMBIENT_IS_SET and arg3 does not
912 specify a valid capability.
913
914 EINVAL option was PR_GET_SPECULATION_CTRL or PR_SET_SPECULATION_CTRL
915 and unused arguments to prctl() are not 0.
916
917 ENODEV option was PR_SET_SPECULATION_CTRL and the kernel or CPU does not
918 support the requested speculation misfeature.
919
920 ENXIO option was PR_MPX_ENABLE_MANAGEMENT or PR_MPX_DISABLE_MANAGEMENT
921 and the kernel or the CPU does not support MPX management.
922 Check that the kernel and processor have MPX support.
923
924 ENXIO option was PR_SET_SPECULATION_CTRL and control of the
925 selected speculation misfeature is not possible. See
926 PR_GET_SPECULATION_CTRL for the bit fields to determine which
927 option is available.
928
929 EOPNOTSUPP
930 option is PR_SET_FP_MODE and arg2 has an invalid or unsupported
931 value.
932
933 EPERM option is PR_SET_SECUREBITS, and the caller does not have the
934 CAP_SETPCAP capability, or tried to unset a "locked" flag, or
935 tried to set a flag whose corresponding locked flag was set (see
936 capabilities(7)).
937
938 EPERM option is PR_SET_SPECULATION_CTRL, the speculation misfeature was
939 disabled with PR_SPEC_FORCE_DISABLE, and the caller tried to enable
940 it again.
941
942 EPERM option is PR_SET_KEEPCAPS, and the caller's
943 SECBIT_KEEP_CAPS_LOCKED flag is set (see capabilities(7)).
944
945 EPERM option is PR_CAPBSET_DROP, and the caller does not have the
946 CAP_SETPCAP capability.
947
948 EPERM option is PR_SET_MM, and the caller does not have the
949 CAP_SYS_RESOURCE capability.
950
951 EPERM option is PR_CAP_AMBIENT and arg2 is PR_CAP_AMBIENT_RAISE, but
952 either the capability specified in arg3 is not present in the
953 process's permitted and inheritable capability sets, or the
954 SECBIT_NO_CAP_AMBIENT_RAISE securebit has been set.
955
956 ERANGE option was PR_SET_SPECULATION_CTRL and arg3 is neither
957 PR_SPEC_ENABLE, PR_SPEC_DISABLE, nor PR_SPEC_FORCE_DISABLE.
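
Two of the EINVAL cases listed above can be reproduced without changing any process state, as in the following sketch. The helper name prctl_errno() is hypothetical.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Call prctl() with the given option and arg2, passing 0 for the
 * remaining arguments. Returns 0 if the call did not fail, or the
 * errno value if it returned -1. Hypothetical helper name.
 */
int prctl_errno(int option, unsigned long arg2)
{
    errno = 0;
    if (prctl(option, arg2, 0UL, 0UL, 0UL) != -1)
        return 0;
    return errno;
}
```

With this helper, prctl_errno(PR_SET_NO_NEW_PRIVS, 0) is expected to return EINVAL because arg2 is not equal to 1, and prctl_errno(PR_SET_PDEATHSIG, 100000) is expected to return EINVAL because 100000 is not a valid signal number.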
958
960 The prctl() system call was introduced in Linux 2.1.57.
961
963 This call is Linux-specific. IRIX has a prctl() system call (also
964 introduced in Linux 2.1.44 as irix_prctl on the MIPS architecture),
965 with prototype
966
967 ptrdiff_t prctl(int option, int arg2, int arg3);
968
969 and options to get the maximum number of processes per user, get the
970 maximum number of processors the calling process can use, find out
971 whether a specified process is currently blocked, get or set the maxi‐
972 mum stack size, and so on.
973
975 signal(2), core(5)
976
978 This page is part of release 5.04 of the Linux man-pages project. A
979 description of the project, information about reporting bugs, and the
980 latest version of this page, can be found at
981 https://www.kernel.org/doc/man-pages/.
982
983
984
985Linux 2019-08-02 PRCTL(2)