CAPABILITIES(7)          Linux Programmer's Manual          CAPABILITIES(7)

NAME
capabilities - overview of Linux capabilities

DESCRIPTION
9 For the purpose of performing permission checks, traditional UNIX
10 implementations distinguish two categories of processes: privileged
11 processes (whose effective user ID is 0, referred to as superuser or
12 root), and unprivileged processes (whose effective UID is nonzero).
13 Privileged processes bypass all kernel permission checks, while unpriv‐
14 ileged processes are subject to full permission checking based on the
15 process's credentials (usually: effective UID, effective GID, and sup‐
16 plementary group list).
17
18 Starting with kernel 2.2, Linux divides the privileges traditionally
19 associated with superuser into distinct units, known as capabilities,
20 which can be independently enabled and disabled. Capabilities are a
21 per-thread attribute.
22
23 Capabilities list
24 The following list shows the capabilities implemented on Linux, and the
25 operations or behaviors that each capability permits:
26
27 CAP_AUDIT_CONTROL (since Linux 2.6.11)
28 Enable and disable kernel auditing; change auditing filter
29 rules; retrieve auditing status and filtering rules.
30
31 CAP_AUDIT_READ (since Linux 3.16)
32 Allow reading the audit log via a multicast netlink socket.
33
34 CAP_AUDIT_WRITE (since Linux 2.6.11)
35 Write records to kernel auditing log.
36
37 CAP_BLOCK_SUSPEND (since Linux 3.5)
38 Employ features that can block system suspend (epoll(7) EPOLL‐
39 WAKEUP, /proc/sys/wake_lock).
40
41 CAP_CHOWN
42 Make arbitrary changes to file UIDs and GIDs (see chown(2)).
43
44 CAP_DAC_OVERRIDE
45 Bypass file read, write, and execute permission checks. (DAC is
46 an abbreviation of "discretionary access control".)
47
48 CAP_DAC_READ_SEARCH
49 * Bypass file read permission checks and directory read and exe‐
50 cute permission checks;
51 * invoke open_by_handle_at(2);
52 * use the linkat(2) AT_EMPTY_PATH flag to create a link to a
53 file referred to by a file descriptor.
54
55 CAP_FOWNER
56 * Bypass permission checks on operations that normally require
57 the filesystem UID of the process to match the UID of the file
58 (e.g., chmod(2), utime(2)), excluding those operations covered
59 by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH;
60 * set inode flags (see ioctl_iflags(2)) on arbitrary files;
61 * set Access Control Lists (ACLs) on arbitrary files;
62 * ignore directory sticky bit on file deletion;
* modify user extended attributes on a sticky directory owned by
any user;
65 * specify O_NOATIME for arbitrary files in open(2) and fcntl(2).
66
67 CAP_FSETID
68 * Don't clear set-user-ID and set-group-ID mode bits when a file
69 is modified;
70 * set the set-group-ID bit for a file whose GID does not match
71 the filesystem or any of the supplementary GIDs of the calling
72 process.
73
74 CAP_IPC_LOCK
75 Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2)).
76
77 CAP_IPC_OWNER
78 Bypass permission checks for operations on System V IPC objects.
79
80 CAP_KILL
81 Bypass permission checks for sending signals (see kill(2)).
82 This includes use of the ioctl(2) KDSIGACCEPT operation.
83
84 CAP_LEASE (since Linux 2.4)
85 Establish leases on arbitrary files (see fcntl(2)).
86
87 CAP_LINUX_IMMUTABLE
88 Set the FS_APPEND_FL and FS_IMMUTABLE_FL inode flags (see
89 ioctl_iflags(2)).
90
91 CAP_MAC_ADMIN (since Linux 2.6.25)
92 Allow MAC configuration or state changes. Implemented for the
93 Smack Linux Security Module (LSM).
94
95 CAP_MAC_OVERRIDE (since Linux 2.6.25)
96 Override Mandatory Access Control (MAC). Implemented for the
97 Smack LSM.
98
99 CAP_MKNOD (since Linux 2.4)
100 Create special files using mknod(2).
101
102 CAP_NET_ADMIN
103 Perform various network-related operations:
104 * interface configuration;
105 * administration of IP firewall, masquerading, and accounting;
106 * modify routing tables;
107 * bind to any address for transparent proxying;
* set type-of-service (TOS);
* clear driver statistics;
* set promiscuous mode;
* enable multicasting;
112 * use setsockopt(2) to set the following socket options:
113 SO_DEBUG, SO_MARK, SO_PRIORITY (for a priority outside the
114 range 0 to 6), SO_RCVBUFFORCE, and SO_SNDBUFFORCE.
115
116 CAP_NET_BIND_SERVICE
117 Bind a socket to Internet domain privileged ports (port numbers
118 less than 1024).
119
120 CAP_NET_BROADCAST
121 (Unused) Make socket broadcasts, and listen to multicasts.
122
123 CAP_NET_RAW
124 * Use RAW and PACKET sockets;
125 * bind to any address for transparent proxying.
126
127 CAP_SETGID
128 * Make arbitrary manipulations of process GIDs and supplementary
129 GID list;
130 * forge GID when passing socket credentials via UNIX domain
131 sockets;
132 * write a group ID mapping in a user namespace (see user_names‐
133 paces(7)).
134
135 CAP_SETFCAP (since Linux 2.6.24)
136 Set arbitrary capabilities on a file.
137
138 CAP_SETPCAP
139 If file capabilities are supported (i.e., since Linux 2.6.24):
140 add any capability from the calling thread's bounding set to its
141 inheritable set; drop capabilities from the bounding set (via
142 prctl(2) PR_CAPBSET_DROP); make changes to the securebits flags.
143
144 If file capabilities are not supported (i.e., kernels before
145 Linux 2.6.24): grant or remove any capability in the caller's
146 permitted capability set to or from any other process. (This
147 property of CAP_SETPCAP is not available when the kernel is con‐
148 figured to support file capabilities, since CAP_SETPCAP has
149 entirely different semantics for such kernels.)
150
151 CAP_SETUID
152 * Make arbitrary manipulations of process UIDs (setuid(2),
153 setreuid(2), setresuid(2), setfsuid(2));
154 * forge UID when passing socket credentials via UNIX domain
155 sockets;
156 * write a user ID mapping in a user namespace (see user_names‐
157 paces(7)).
158
159 CAP_SYS_ADMIN
160 Note: this capability is overloaded; see Notes to kernel devel‐
161 opers, below.
162
163 * Perform a range of system administration operations including:
164 quotactl(2), mount(2), umount(2), pivot_root(2), swapon(2),
165 swapoff(2), sethostname(2), and setdomainname(2);
166 * perform privileged syslog(2) operations (since Linux 2.6.37,
167 CAP_SYSLOG should be used to permit such operations);
168 * perform VM86_REQUEST_IRQ vm86(2) command;
169 * perform IPC_SET and IPC_RMID operations on arbitrary System V
170 IPC objects;
171 * override RLIMIT_NPROC resource limit;
172 * perform operations on trusted and security Extended Attributes
173 (see xattr(7));
174 * use lookup_dcookie(2);
175 * use ioprio_set(2) to assign IOPRIO_CLASS_RT and (before Linux
176 2.6.25) IOPRIO_CLASS_IDLE I/O scheduling classes;
177 * forge PID when passing socket credentials via UNIX domain
178 sockets;
179 * exceed /proc/sys/fs/file-max, the system-wide limit on the
180 number of open files, in system calls that open files (e.g.,
181 accept(2), execve(2), open(2), pipe(2));
182 * employ CLONE_* flags that create new namespaces with clone(2)
183 and unshare(2) (but, since Linux 3.8, creating user namespaces
184 does not require any capability);
185 * call perf_event_open(2);
186 * access privileged perf event information;
187 * call setns(2) (requires CAP_SYS_ADMIN in the target names‐
188 pace);
189 * call fanotify_init(2);
190 * call bpf(2);
191 * perform privileged KEYCTL_CHOWN and KEYCTL_SETPERM keyctl(2)
192 operations;
193 * perform madvise(2) MADV_HWPOISON operation;
194 * employ the TIOCSTI ioctl(2) to insert characters into the
195 input queue of a terminal other than the caller's controlling
196 terminal;
197 * employ the obsolete nfsservctl(2) system call;
198 * employ the obsolete bdflush(2) system call;
199 * perform various privileged block-device ioctl(2) operations;
200 * perform various privileged filesystem ioctl(2) operations;
201 * perform privileged ioctl(2) operations on the /dev/random
202 device (see random(4));
203 * install a seccomp(2) filter without first having to set the
204 no_new_privs thread attribute;
205 * modify allow/deny rules for device control groups;
206 * employ the ptrace(2) PTRACE_SECCOMP_GET_FILTER operation to
207 dump tracee's seccomp filters;
208 * employ the ptrace(2) PTRACE_SETOPTIONS operation to suspend
209 the tracee's seccomp protections (i.e., the PTRACE_O_SUS‐
210 PEND_SECCOMP flag);
211 * perform administrative operations on many device drivers.
212
213 CAP_SYS_BOOT
214 Use reboot(2) and kexec_load(2).
215
216 CAP_SYS_CHROOT
217 * Use chroot(2);
218 * change mount namespaces using setns(2).
219
220 CAP_SYS_MODULE
221 * Load and unload kernel modules (see init_module(2) and
222 delete_module(2));
223 * in kernels before 2.6.25: drop capabilities from the system-
224 wide capability bounding set.
225
226 CAP_SYS_NICE
227 * Raise process nice value (nice(2), setpriority(2)) and change
228 the nice value for arbitrary processes;
* set real-time scheduling policies for the calling process, and set
scheduling policies and priorities for arbitrary processes
(sched_setscheduler(2), sched_setparam(2), sched_setattr(2));
232 * set CPU affinity for arbitrary processes (sched_setaffin‐
233 ity(2));
234 * set I/O scheduling class and priority for arbitrary processes
235 (ioprio_set(2));
236 * apply migrate_pages(2) to arbitrary processes and allow pro‐
237 cesses to be migrated to arbitrary nodes;
238 * apply move_pages(2) to arbitrary processes;
239 * use the MPOL_MF_MOVE_ALL flag with mbind(2) and move_pages(2).
240
241 CAP_SYS_PACCT
242 Use acct(2).
243
244 CAP_SYS_PTRACE
245 * Trace arbitrary processes using ptrace(2);
246 * apply get_robust_list(2) to arbitrary processes;
247 * transfer data to or from the memory of arbitrary processes
248 using process_vm_readv(2) and process_vm_writev(2);
249 * inspect processes using kcmp(2).
250
251 CAP_SYS_RAWIO
252 * Perform I/O port operations (iopl(2) and ioperm(2));
253 * access /proc/kcore;
254 * employ the FIBMAP ioctl(2) operation;
255 * open devices for accessing x86 model-specific registers (MSRs,
256 see msr(4));
257 * update /proc/sys/vm/mmap_min_addr;
258 * create memory mappings at addresses below the value specified
259 by /proc/sys/vm/mmap_min_addr;
260 * map files in /proc/bus/pci;
261 * open /dev/mem and /dev/kmem;
262 * perform various SCSI device commands;
263 * perform certain operations on hpsa(4) and cciss(4) devices;
264 * perform a range of device-specific operations on other
265 devices.
266
267 CAP_SYS_RESOURCE
268 * Use reserved space on ext2 filesystems;
269 * make ioctl(2) calls controlling ext3 journaling;
270 * override disk quota limits;
271 * increase resource limits (see setrlimit(2));
272 * override RLIMIT_NPROC resource limit;
273 * override maximum number of consoles on console allocation;
274 * override maximum number of keymaps;
* allow more than 64 Hz interrupts from the real-time clock;
276 * raise msg_qbytes limit for a System V message queue above the
277 limit in /proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2));
278 * allow the RLIMIT_NOFILE resource limit on the number of "in-
279 flight" file descriptors to be bypassed when passing file
280 descriptors to another process via a UNIX domain socket (see
281 unix(7));
* use the F_SETPIPE_SZ fcntl(2) command to increase the capacity of
a pipe above the limit specified by /proc/sys/fs/pipe-max-size;
286 * override /proc/sys/fs/mqueue/queues_max limit when creating
287 POSIX message queues (see mq_overview(7));
288 * employ the prctl(2) PR_SET_MM operation;
289 * set /proc/[pid]/oom_score_adj to a value lower than the value
290 last set by a process with CAP_SYS_RESOURCE.
291
292 CAP_SYS_TIME
293 Set system clock (settimeofday(2), stime(2), adjtimex(2)); set
294 real-time (hardware) clock.
295
296 CAP_SYS_TTY_CONFIG
297 Use vhangup(2); employ various privileged ioctl(2) operations on
298 virtual terminals.
299
300 CAP_SYSLOG (since Linux 2.6.37)
301 * Perform privileged syslog(2) operations. See syslog(2) for
302 information on which operations require privilege.
* View kernel addresses exposed via /proc and other interfaces
when /proc/sys/kernel/kptr_restrict has the value 1. (See the
discussion of kptr_restrict in proc(5).)
306
307 CAP_WAKE_ALARM (since Linux 3.0)
308 Trigger something that will wake up the system (set CLOCK_REAL‐
309 TIME_ALARM and CLOCK_BOOTTIME_ALARM timers).
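
Each capability named above corresponds to one bit position in a
capability set. The following Python sketch converts between names and
bit masks. The numeric positions are taken from <linux/capability.h>
and are an assumption of this example (the list above does not state
them); cap_mask() and cap_names() are hypothetical helper names.

```python
# Bit positions as defined in <linux/capability.h> (assumption of this
# sketch); the index in this list is the capability number.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]


def cap_mask(*names):
    """Return the bit mask that has exactly the named capabilities set."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_NAMES.index(name)
    return mask


def cap_names(mask):
    """Return the names of the capabilities set in mask, lowest bit first.

    Bits beyond the table are reported by number rather than dropped.
    """
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES)
                         else "unknown(%d)" % bit)
        bit += 1
    return names
```

For example, cap_names(cap_mask("CAP_NET_RAW", "CAP_CHOWN")) returns
the two names ordered by bit position.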
310
311 Past and current implementation
312 A full implementation of capabilities requires that:
313
314 1. For all privileged operations, the kernel must check whether the
315 thread has the required capability in its effective set.
316
317 2. The kernel must provide system calls allowing a thread's capability
318 sets to be changed and retrieved.
319
320 3. The filesystem must support attaching capabilities to an executable
321 file, so that a process gains those capabilities when the file is
322 executed.
323
324 Before kernel 2.6.24, only the first two of these requirements are met;
325 since kernel 2.6.24, all three requirements are met.
326
327 Notes to kernel developers
328 When adding a new kernel feature that should be governed by a capabil‐
329 ity, consider the following points.
330
* The goal of capabilities is to divide the power of superuser into
pieces, such that if a program that has one or more capabilities is
compromised, its power to do damage to the system would be less than
the same program running with root privilege.
335
336 * You have the choice of either creating a new capability for your new
337 feature, or associating the feature with one of the existing capa‐
338 bilities. In order to keep the set of capabilities to a manageable
339 size, the latter option is preferable, unless there are compelling
340 reasons to take the former option. (There is also a technical
341 limit: the size of capability sets is currently limited to 64 bits.)
342
343 * To determine which existing capability might best be associated with
344 your new feature, review the list of capabilities above in order to
345 find a "silo" into which your new feature best fits. One approach
346 to take is to determine if there are other features requiring capa‐
347 bilities that will always be used along with the new feature. If
348 the new feature is useless without these other features, you should
349 use the same capability as the other features.
350
351 * Don't choose CAP_SYS_ADMIN if you can possibly avoid it! A vast
352 proportion of existing capability checks are associated with this
353 capability (see the partial list above). It can plausibly be called
354 "the new root", since on the one hand, it confers a wide range of
355 powers, and on the other hand, its broad scope means that this is
356 the capability that is required by many privileged programs. Don't
357 make the problem worse. The only new features that should be asso‐
358 ciated with CAP_SYS_ADMIN are ones that closely match existing uses
359 in that silo.
360
361 * If you have determined that it really is necessary to create a new
362 capability for your feature, don't make or name it as a "single-use"
363 capability. Thus, for example, the addition of the highly specific
364 CAP_SYS_PACCT was probably a mistake. Instead, try to identify and
365 name your new capability as a broader silo into which other related
366 future use cases might fit.
367
368 Thread capability sets
369 Each thread has the following capability sets containing zero or more
370 of the above capabilities:
371
372 Permitted
373 This is a limiting superset for the effective capabilities that
374 the thread may assume. It is also a limiting superset for the
375 capabilities that may be added to the inheritable set by a
376 thread that does not have the CAP_SETPCAP capability in its
377 effective set.
378
379 If a thread drops a capability from its permitted set, it can
380 never reacquire that capability (unless it execve(2)s either a
381 set-user-ID-root program, or a program whose associated file
382 capabilities grant that capability).
383
384 Inheritable
385 This is a set of capabilities preserved across an execve(2).
386 Inheritable capabilities remain inheritable when executing any
387 program, and inheritable capabilities are added to the permitted
388 set when executing a program that has the corresponding bits set
389 in the file inheritable set.
390
391 Because inheritable capabilities are not generally preserved
392 across execve(2) when running as a non-root user, applications
393 that wish to run helper programs with elevated capabilities
394 should consider using ambient capabilities, described below.
395
396 Effective
397 This is the set of capabilities used by the kernel to perform
398 permission checks for the thread.
399
400 Bounding (per-thread since Linux 2.6.25)
401 The capability bounding set is a mechanism that can be used to
402 limit the capabilities that are gained during execve(2).
403
Since Linux 2.6.25, this is a per-thread capability set. In
older kernels, the capability bounding set was a system-wide
attribute shared by all threads on the system.
407
408 For more details on the capability bounding set, see below.
409
410 Ambient (since Linux 4.3)
411 This is a set of capabilities that are preserved across an
412 execve(2) of a program that is not privileged. The ambient
413 capability set obeys the invariant that no capability can ever
414 be ambient if it is not both permitted and inheritable.
415
416 The ambient capability set can be directly modified using
417 prctl(2). Ambient capabilities are automatically lowered if
418 either of the corresponding permitted or inheritable capabili‐
419 ties is lowered.
420
421 Executing a program that changes UID or GID due to the set-user-
422 ID or set-group-ID bits or executing a program that has any file
423 capabilities set will clear the ambient set. Ambient capabili‐
424 ties are added to the permitted set and assigned to the effec‐
425 tive set when execve(2) is called. If ambient capabilities
426 cause a process's permitted and effective capabilities to
427 increase during an execve(2), this does not trigger the secure-
428 execution mode described in ld.so(8).
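
The ambient-set invariant described above can be modeled with plain bit
masks. The following Python sketch is a model only, not a call into the
kernel: raise_ambient() and restrict_sets() are hypothetical helpers,
and the real interface is the PR_CAP_AMBIENT operation of prctl(2).

```python
def raise_ambient(permitted, inheritable, ambient, cap_bit):
    """Model raising one ambient capability (cap_bit is a one-bit mask).

    The capability must already be both permitted and inheritable;
    otherwise the request is refused.
    """
    if not (permitted & inheritable & cap_bit):
        raise PermissionError(
            "capability must be both permitted and inheritable")
    return ambient | cap_bit


def restrict_sets(permitted, inheritable, ambient,
                  drop_permitted=0, drop_inheritable=0):
    """Model lowering permitted/inheritable bits.

    Ambient bits whose permitted or inheritable counterpart was lowered
    are lowered automatically, which preserves the invariant.
    """
    permitted &= ~drop_permitted
    inheritable &= ~drop_inheritable
    ambient &= permitted & inheritable
    return permitted, inheritable, ambient
```

In this model, an ambient bit can never survive once either of the two
sets it depends on loses the same bit.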
429
430 A child created via fork(2) inherits copies of its parent's capability
431 sets. See below for a discussion of the treatment of capabilities dur‐
432 ing execve(2).
433
434 Using capset(2), a thread may manipulate its own capability sets (see
435 below).
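
A thread's capability sets can be inspected without special libraries.
The following Python sketch parses the CapInh, CapPrm, CapEff, CapBnd,
and CapAmb fields of /proc/[pid]/status; those field names come from
proc(5) and are not described in this page, and parse_cap_sets() and
read_cap_sets() are hypothetical helper names. CapBnd and CapAmb are
absent on kernels that predate the corresponding sets.

```python
# Mapping from /proc/[pid]/status field names (see proc(5)) to the set
# names used in this page.
_STATUS_FIELDS = {
    "CapInh": "inheritable",
    "CapPrm": "permitted",
    "CapEff": "effective",
    "CapBnd": "bounding",
    "CapAmb": "ambient",
}


def parse_cap_sets(status_text):
    """Extract capability sets (as integers) from status file contents."""
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in _STATUS_FIELDS:
            # Each value is a hexadecimal bit mask.
            sets[_STATUS_FIELDS[key]] = int(value.strip(), 16)
    return sets


def read_cap_sets(pid="self"):
    """Read the capability sets of a process from /proc (Linux only)."""
    with open("/proc/%s/status" % pid) as f:
        return parse_cap_sets(f.read())
```

Calling read_cap_sets() with no argument reports the sets of the
calling process.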
436
437 Since Linux 3.2, the file /proc/sys/kernel/cap_last_cap exposes the
438 numerical value of the highest capability supported by the running ker‐
439 nel; this can be used to determine the highest bit that may be set in a
440 capability set.
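
As a minimal Python sketch of that use, the value read from
/proc/sys/kernel/cap_last_cap can be turned into a mask of all bits the
running kernel supports. The helper names read_cap_last_cap() and
valid_capability_mask() are hypothetical.

```python
def read_cap_last_cap(path="/proc/sys/kernel/cap_last_cap"):
    """Return the number of the highest capability the kernel supports."""
    with open(path) as f:
        return int(f.read().strip())


def valid_capability_mask(last_cap):
    """Return a mask with bits 0..last_cap (inclusive) set."""
    return (1 << (last_cap + 1)) - 1
```

A capability set read from elsewhere can then be checked with
(mask & ~valid_capability_mask(read_cap_last_cap())) == 0.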
441
442 File capabilities
443 Since kernel 2.6.24, the kernel supports associating capability sets
444 with an executable file using setcap(8). The file capability sets are
445 stored in an extended attribute (see setxattr(2) and xattr(7)) named
446 security.capability. Writing to this extended attribute requires the
447 CAP_SETFCAP capability. The file capability sets, in conjunction with
448 the capability sets of the thread, determine the capabilities of a
449 thread after an execve(2).
450
451 The three file capability sets are:
452
453 Permitted (formerly known as forced):
454 These capabilities are automatically permitted to the thread,
455 regardless of the thread's inheritable capabilities.
456
457 Inheritable (formerly known as allowed):
458 This set is ANDed with the thread's inheritable set to determine
459 which inheritable capabilities are enabled in the permitted set
460 of the thread after the execve(2).
461
462 Effective:
463 This is not a set, but rather just a single bit. If this bit is
464 set, then during an execve(2) all of the new permitted capabili‐
465 ties for the thread are also raised in the effective set. If
466 this bit is not set, then after an execve(2), none of the new
467 permitted capabilities is in the new effective set.
468
Enabling the file effective capability bit means that a thread
that acquires a permitted capability during an execve(2) through
a file permitted or inheritable capability (see the
transformation rules described below) also acquires that
capability in its effective set. Therefore, when assigning
capabilities to a file (setcap(8), cap_set_file(3),
cap_set_fd(3)), if we specify the effective flag as being
enabled for any capability, then the effective flag must also be
specified as enabled for all other capabilities for which the
corresponding permitted or inheritable flag is enabled.
480
481 File capability extended attribute versioning
482 To allow extensibility, the kernel supports a scheme to encode a ver‐
483 sion number inside the security.capability extended attribute that is
484 used to implement file capabilities. These version numbers are inter‐
485 nal to the implementation, and not directly visible to user-space
486 applications. To date, the following versions are supported:
487
488 VFS_CAP_REVISION_1
489 This was the original file capability implementation, which sup‐
490 ported 32-bit masks for file capabilities.
491
492 VFS_CAP_REVISION_2 (since Linux 2.6.25)
493 This version allows for file capability masks that are 64 bits
494 in size, and was necessary as the number of supported capabili‐
495 ties grew beyond 32. The kernel transparently continues to sup‐
496 port the execution of files that have 32-bit version 1 capabil‐
497 ity masks, but when adding capabilities to files that did not
498 previously have capabilities, or modifying the capabilities of
499 existing files, it automatically uses the version 2 scheme (or
500 possibly the version 3 scheme, as described below).
501
502 VFS_CAP_REVISION_3 (since Linux 4.14)
503 Version 3 file capabilities are provided to support namespaced
504 file capabilities (described below).
505
As with version 2 file capabilities, version 3 capability masks
are 64 bits in size. But in addition, the root user ID of the
namespace is encoded in the security.capability extended
attribute. (A namespace's root user ID is the value that user
ID 0 inside that namespace maps to in the initial user
namespace.)
512
513 Version 3 file capabilities are designed to coexist with version
514 2 capabilities; that is, on a modern Linux system, there may be
515 some files with version 2 capabilities while others have version
516 3 capabilities.
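
To make the three versions concrete, the following Python sketch
decodes the raw bytes of a security.capability attribute. The binary
layout (a little-endian 32-bit word holding the revision and the
effective flag, followed by permitted/inheritable pairs of 32-bit words
and, for version 3, a 32-bit root user ID) and the constant values are
taken from <linux/capability.h>; they are assumptions of this example
and are not described in this page. parse_security_capability() is a
hypothetical helper name.

```python
import struct

# Constants from <linux/capability.h> (assumption of this sketch).
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
# revision value -> (version number, 32-bit words per capability set)
_REVISIONS = {
    0x01000000: (1, 1),
    0x02000000: (2, 2),
    0x03000000: (3, 2),
}


def parse_security_capability(raw):
    """Decode the raw bytes of a security.capability attribute."""
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    version, words = _REVISIONS[magic_etc & VFS_CAP_REVISION_MASK]
    permitted = inheritable = 0
    for i in range(words):
        # Word pair i carries bits 32*i .. 32*i+31 of each set.
        perm, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= perm << (32 * i)
        inheritable |= inh << (32 * i)
    rootid = None
    if version == 3:
        (rootid,) = struct.unpack_from("<I", raw, 4 + 8 * words)
    return {
        "version": version,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
        "rootid": rootid,
    }
```

On Linux, the raw bytes can be obtained with
os.getxattr(path, "security.capability"). Because of the automatic
translation performed by the kernel, a reader may be handed a version 2
layout even for a file that stores a version 3 attribute.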
517
518 Before Linux 4.14, the only kind of file capability extended attribute
519 that could be attached to a file was a VFS_CAP_REVISION_2 attribute.
520 Since Linux 4.14, the version of the security.capability extended
521 attribute that is attached to a file depends on the circumstances in
522 which the attribute was created.
523
524 Starting with Linux 4.14, a security.capability extended attribute is
525 automatically created as (or converted to) a version 3 (VFS_CAP_REVI‐
526 SION_3) attribute if both of the following are true:
527
528 (1) The thread writing the attribute resides in a noninitial user
529 namespace. (More precisely: the thread resides in a user namespace
530 other than the one from which the underlying filesystem was
531 mounted.)
532
533 (2) The thread has the CAP_SETFCAP capability over the file inode,
534 meaning that (a) the thread has the CAP_SETFCAP capability in its
535 own user namespace; and (b) the UID and GID of the file inode have
536 mappings in the writer's user namespace.
537
538 When a VFS_CAP_REVISION_3 security.capability extended attribute is
539 created, the root user ID of the creating thread's user namespace is
540 saved in the extended attribute.
541
542 By contrast, creating or modifying a security.capability extended
543 attribute from a privileged (CAP_SETFCAP) thread that resides in the
544 namespace where the underlying filesystem was mounted (this normally
545 means the initial user namespace) automatically results in the creation
546 of a version 2 (VFS_CAP_REVISION_2) attribute.
547
548 Note that the creation of a version 3 security.capability extended
549 attribute is automatic. That is to say, when a user-space application
550 writes (setxattr(2)) a security.capability attribute in the version 2
551 format, the kernel will automatically create a version 3 attribute if
552 the attribute is created in the circumstances described above. Corre‐
553 spondingly, when a version 3 security.capability attribute is retrieved
554 (getxattr(2)) by a process that resides inside a user namespace that
555 was created by the root user ID (or a descendant of that user names‐
556 pace), the returned attribute is (automatically) simplified to appear
557 as a version 2 attribute (i.e., the returned value is the size of a
version 2 attribute and does not include the root user ID). These
automatic translations mean that no changes are required to user-space
tools (e.g., setcap(8) and getcap(8)) in order for those tools to be
used to create and retrieve version 3 security.capability attributes.
562
563 Note that a file can have either a version 2 or a version 3 secu‐
564 rity.capability extended attribute associated with it, but not both:
565 creation or modification of the security.capability extended attribute
566 will automatically modify the version according to the circumstances in
567 which the extended attribute is created or modified.
568
569 Transformation of capabilities during execve()
570 During an execve(2), the kernel calculates the new capabilities of the
571 process using the following algorithm:
572
573 P'(ambient) = (file is privileged) ? 0 : P(ambient)
574
575 P'(permitted) = (P(inheritable) & F(inheritable)) |
576 (F(permitted) & P(bounding)) | P'(ambient)
577
578 P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
579
580 P'(inheritable) = P(inheritable) [i.e., unchanged]
581
582 P'(bounding) = P(bounding) [i.e., unchanged]
583
584 where:
585
586 P() denotes the value of a thread capability set before the
587 execve(2)
588
589 P'() denotes the value of a thread capability set after the
590 execve(2)
591
592 F() denotes a file capability set
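
The algorithm above can be written out as a small Python function over
integer bit masks. This is a sketch of the formulas only (it does not
model the special treatment of root or cases where file capabilities
are ignored); ThreadCaps, FileCaps, and execve_transform() are
hypothetical names.

```python
from collections import namedtuple

ThreadCaps = namedtuple(
    "ThreadCaps", "permitted inheritable effective bounding ambient")
# FileCaps.effective is the single file effective bit (True/False).
FileCaps = namedtuple("FileCaps", "permitted inheritable effective")


def execve_transform(p, f, file_is_privileged):
    """Return the thread capability sets P'() after an execve(2).

    p is P(), f is F(); file_is_privileged is True if the file has
    capabilities or has the set-user-ID or set-group-ID bit set.
    """
    ambient = 0 if file_is_privileged else p.ambient
    permitted = ((p.inheritable & f.inheritable) |
                 (f.permitted & p.bounding) |
                 ambient)
    effective = permitted if f.effective else ambient
    # The inheritable and bounding sets are unchanged.
    return ThreadCaps(permitted, p.inheritable, effective,
                      p.bounding, ambient)
```

For instance, an unprivileged thread with empty sets that executes a
file whose permitted set and effective bit grant one capability ends up
with that capability in both its permitted and effective sets, provided
the capability is in the bounding set.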
593
594 Note the following details relating to the above capability transforma‐
595 tion rules:
596
597 * The ambient capability set is present only since Linux 4.3. When
598 determining the transformation of the ambient set during execve(2),
599 a privileged file is one that has capabilities or has the set-user-
600 ID or set-group-ID bit set.
601
602 * Prior to Linux 2.6.25, the bounding set was a system-wide attribute
603 shared by all threads. That system-wide value was employed to cal‐
604 culate the new permitted set during execve(2) in the same manner as
605 shown above for P(bounding).
606
607 Note: during the capability transitions described above, file capabili‐
608 ties may be ignored (treated as empty) for the same reasons that the
609 set-user-ID and set-group-ID bits are ignored; see execve(2). File
610 capabilities are similarly ignored if the kernel was booted with the
611 no_file_caps option.
612
613 Note: according to the rules above, if a process with nonzero user IDs
614 performs an execve(2) then any capabilities that are present in its
615 permitted and effective sets will be cleared. For the treatment of
616 capabilities when a process with a user ID of zero performs an
617 execve(2), see below under Capabilities and execution of programs by
618 root.
619
620 Safety checking for capability-dumb binaries
621 A capability-dumb binary is an application that has been marked to have
622 file capabilities, but has not been converted to use the libcap(3) API
623 to manipulate its capabilities. (In other words, this is a traditional
624 set-user-ID-root program that has been switched to use file capabili‐
625 ties, but whose code has not been modified to understand capabilities.)
626 For such applications, the effective capability bit is set on the file,
627 so that the file permitted capabilities are automatically enabled in
628 the process effective set when executing the file. The kernel recog‐
629 nizes a file which has the effective capability bit set as capability-
630 dumb for the purpose of the check described here.
631
632 When executing a capability-dumb binary, the kernel checks if the
633 process obtained all permitted capabilities that were specified in the
634 file permitted set, after the capability transformations described
635 above have been performed. (The typical reason why this might not
636 occur is that the capability bounding set masked out some of the capa‐
637 bilities in the file permitted set.) If the process did not obtain the
638 full set of file permitted capabilities, then execve(2) fails with the
error EPERM. This prevents possible security risks that could arise
when a capability-dumb application is executed with less privilege than
it needs. Note that, by definition, the application could not itself
recognize this problem, since it does not employ the libcap(3) API.
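
The check described above amounts to a single mask comparison. The
following Python sketch models it; check_capability_dumb() is a
hypothetical name, and the function returns an errno value instead of
performing a real execve(2).

```python
import errno


def check_capability_dumb(f_permitted, f_effective_bit, new_permitted):
    """Return 0 if the exec may proceed, or errno.EPERM if it must fail.

    f_permitted is the file permitted set, f_effective_bit the file
    effective bit, and new_permitted the thread's permitted set after
    the capability transformations have been applied.
    """
    if not f_effective_bit:
        # Only files with the effective bit set count as capability-dumb.
        return 0
    missing = f_permitted & ~new_permitted
    return errno.EPERM if missing else 0
```

With a file permitted set containing two capabilities and a bounding
set that masks out one of them, the function reports EPERM when the
effective bit is set, and 0 when it is not.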
643
644 Capabilities and execution of programs by root
645 In order to mirror traditional UNIX semantics, the kernel performs spe‐
646 cial treatment of file capabilities when a process with UID 0 (root)
647 executes a program and when a set-user-ID-root program is executed.
648
649 After having performed any changes to the process effective ID that
650 were triggered by the set-user-ID mode bit of the binary—e.g., switch‐
651 ing the effective user ID to 0 (root) because a set-user-ID-root pro‐
652 gram was executed—the kernel calculates the file capability sets as
653 follows:
654
655 1. If the real or effective user ID of the process is 0 (root), then
656 the file inheritable and permitted sets are ignored; instead they
657 are notionally considered to be all ones (i.e., all capabilities
658 enabled). (There is one exception to this behavior, described below
659 in Set-user-ID-root programs that have file capabilities.)
660
661 2. If the effective user ID of the process is 0 (root) or the file
662 effective bit is in fact enabled, then the file effective bit is
663 notionally defined to be one (enabled).
664
665 These notional values for the file's capability sets are then used as
666 described above to calculate the transformation of the process's capa‐
667 bilities during execve(2).
668
669 Thus, when a process with nonzero UIDs execve(2)s a set-user-ID-root
670 program that does not have capabilities attached, or when a process
671 whose real and effective UIDs are zero execve(2)s a program, the calcu‐
672 lation of the process's new permitted capabilities simplifies to:
673
674 P'(permitted) = P(inheritable) | P(bounding)
675
676 P'(effective) = P'(permitted)
677
Consequently, the process gains all capabilities in its permitted and
effective capability sets, except those masked out by the capability
bounding set. (In the calculation of P'(permitted), the P'(ambient)
term can be simplified away because it is by definition a subset of
P(inheritable).)
683
684 The special treatments of user ID 0 (root) described in this subsection
685 can be disabled using the securebits mechanism described below.
686
687 Set-user-ID-root programs that have file capabilities
There is one exception to the behavior described under Capabilities and
execution of programs by root. If (a) the binary that is being
executed has capabilities attached and (b) the real user ID of the
process is not 0 (root) and (c) the effective user ID of the process is
0 (root), then the file capability bits are honored (i.e., they are not
notionally considered to be all ones). The usual way in which this
situation can arise is when executing a set-user-ID-root program that
also has file capabilities. When such a program is executed, the
process gains just the capabilities granted by the program (i.e., not
all capabilities, as would occur when executing a set-user-ID-root
program that does not have any associated file capabilities).
699
700 Note that one can assign empty capability sets to a program file, and
701 thus it is possible to create a set-user-ID-root program that changes
702 the effective and saved set-user-ID of the process that executes the
703 program to 0, but confers no capabilities to that process.
704
705 Capability bounding set
706 The capability bounding set is a security mechanism that can be used to
707 limit the capabilities that can be gained during an execve(2). The
708 bounding set is used in the following ways:
709
710 * During an execve(2), the capability bounding set is ANDed with the
711 file permitted capability set, and the result of this operation is
712 assigned to the thread's permitted capability set. The capability
713 bounding set thus places a limit on the permitted capabilities that
714 may be granted by an executable file.
715
716 * (Since Linux 2.6.25) The capability bounding set acts as a limiting
717 superset for the capabilities that a thread can add to its inherita‐
718 ble set using capset(2). This means that if a capability is not in
719 the bounding set, then a thread can't add this capability to its
720 inheritable set, even if it was in its permitted capabilities, and
721 thereby cannot have this capability preserved in its permitted set
722 when it execve(2)s a file that has the capability in its inheritable
723 set.
724
725 Note that the bounding set masks the file permitted capabilities, but
726 not the inheritable capabilities. If a thread maintains a capability
727 in its inheritable set that is not in its bounding set, then it can
728 still gain that capability in its permitted set by executing a file
729 that has the capability in its inheritable set.
730
Depending on the kernel version, the capability bounding set is either
a system-wide attribute or a per-thread attribute.
733
734 Capability bounding set from Linux 2.6.25 onward
735
736 From Linux 2.6.25, the capability bounding set is a per-thread
737 attribute. (The system-wide capability bounding set described below no
738 longer exists.)
739
740 The bounding set is inherited at fork(2) from the thread's parent, and
741 is preserved across an execve(2).
742
743 A thread may remove capabilities from its capability bounding set using
744 the prctl(2) PR_CAPBSET_DROP operation, provided it has the CAP_SETPCAP
745 capability. Once a capability has been dropped from the bounding set,
746 it cannot be restored to that set. A thread can determine if a capa‐
747 bility is in its bounding set using the prctl(2) PR_CAPBSET_READ opera‐
748 tion.
749
750 Removing capabilities from the bounding set is supported only if file
751 capabilities are compiled into the kernel. In kernels before Linux
752 2.6.33, file capabilities were an optional feature configurable via the
753 CONFIG_SECURITY_FILE_CAPABILITIES option. Since Linux 2.6.33, the con‐
754 figuration option has been removed and file capabilities are always
755 part of the kernel. When file capabilities are compiled into the ker‐
756 nel, the init process (the ancestor of all processes) begins with a
757 full bounding set. If file capabilities are not compiled into the ker‐
758 nel, then init begins with a full bounding set minus CAP_SETPCAP,
759 because this capability has a different meaning when there are no file
760 capabilities.
761
Removing a capability from the bounding set does not remove it from the
thread's inheritable set. However, it does prevent the capability from
being added back into the thread's inheritable set in the future.
765
766 Capability bounding set prior to Linux 2.6.25
767
768 In kernels before 2.6.25, the capability bounding set is a system-wide
769 attribute that affects all threads on the system. The bounding set is
770 accessible via the file /proc/sys/kernel/cap-bound. (Confusingly, this
771 bit mask parameter is expressed as a signed decimal number in
772 /proc/sys/kernel/cap-bound.)
773
774 Only the init process may set capabilities in the capability bounding
775 set; other than that, the superuser (more precisely: a process with the
776 CAP_SYS_MODULE capability) may only clear capabilities from this set.
777
778 On a standard system the capability bounding set always masks out the
779 CAP_SETPCAP capability. To remove this restriction (dangerous!), mod‐
780 ify the definition of CAP_INIT_EFF_SET in include/linux/capability.h
781 and rebuild the kernel.
782
783 The system-wide capability bounding set feature was added to Linux
784 starting with kernel version 2.2.11.
785
786 Effect of user ID changes on capabilities
787 To preserve the traditional semantics for transitions between 0 and
788 nonzero user IDs, the kernel makes the following changes to a thread's
789 capability sets on changes to the thread's real, effective, saved set,
790 and filesystem user IDs (using setuid(2), setresuid(2), or similar):
791
792 1. If one or more of the real, effective or saved set user IDs was pre‐
793 viously 0, and as a result of the UID changes all of these IDs have
794 a nonzero value, then all capabilities are cleared from the permit‐
795 ted, effective, and ambient capability sets.
796
797 2. If the effective user ID is changed from 0 to nonzero, then all
798 capabilities are cleared from the effective set.
799
800 3. If the effective user ID is changed from nonzero to 0, then the per‐
801 mitted set is copied to the effective set.
802
803 4. If the filesystem user ID is changed from 0 to nonzero (see setf‐
804 suid(2)), then the following capabilities are cleared from the
805 effective set: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH,
806 CAP_FOWNER, CAP_FSETID, CAP_LINUX_IMMUTABLE (since Linux 2.6.30),
807 CAP_MAC_OVERRIDE, and CAP_MKNOD (since Linux 2.6.30). If the
808 filesystem UID is changed from nonzero to 0, then any of these capa‐
809 bilities that are enabled in the permitted set are enabled in the
810 effective set.
811
If a thread that has a 0 value for one or more of its user IDs wants to
prevent its permitted capability set from being cleared when it resets
all of its user IDs to nonzero values, it can do so using the
SECBIT_KEEP_CAPS securebits flag described below.
816
817 Programmatically adjusting capability sets
818 A thread can retrieve and change its permitted, effective, and inheri‐
819 table capability sets using the capget(2) and capset(2) system calls.
820 However, the use of cap_get_proc(3) and cap_set_proc(3), both provided
821 in the libcap package, is preferred for this purpose. The following
822 rules govern changes to the thread capability sets:
823
824 1. If the caller does not have the CAP_SETPCAP capability, the new
825 inheritable set must be a subset of the combination of the existing
826 inheritable and permitted sets.
827
828 2. (Since Linux 2.6.25) The new inheritable set must be a subset of the
829 combination of the existing inheritable set and the capability
830 bounding set.
831
832 3. The new permitted set must be a subset of the existing permitted set
833 (i.e., it is not possible to acquire permitted capabilities that the
834 thread does not currently have).
835
836 4. The new effective set must be a subset of the new permitted set.
837
838 The securebits flags: establishing a capabilities-only environment
839 Starting with kernel 2.6.26, and with a kernel in which file capabili‐
840 ties are enabled, Linux implements a set of per-thread securebits flags
841 that can be used to disable special handling of capabilities for UID 0
842 (root). These flags are as follows:
843
844 SECBIT_KEEP_CAPS
845 Setting this flag allows a thread that has one or more 0 UIDs to
846 retain capabilities in its permitted set when it switches all of
847 its UIDs to nonzero values. If this flag is not set, then such
848 a UID switch causes the thread to lose all permitted capabili‐
849 ties. This flag is always cleared on an execve(2).
850
851 Note that even with the SECBIT_KEEP_CAPS flag set, the effective
852 capabilities of a thread are cleared when it switches its effec‐
853 tive UID to a nonzero value. However, if the thread has set
854 this flag and its effective UID is already nonzero, and the
855 thread subsequently switches all other UIDs to nonzero values,
856 then the effective capabilities will not be cleared.
857
858 The setting of the SECBIT_KEEP_CAPS flag is ignored if the
859 SECBIT_NO_SETUID_FIXUP flag is set. (The latter flag provides a
860 superset of the effect of the former flag.)
861
862 This flag provides the same functionality as the older prctl(2)
863 PR_SET_KEEPCAPS operation.
864
865 SECBIT_NO_SETUID_FIXUP
Setting this flag stops the kernel from adjusting the thread's
permitted, effective, and ambient capability sets when the
thread's effective and filesystem UIDs are switched between zero
and nonzero values. (See the subsection Effect of user ID
changes on capabilities.)
871
872 SECBIT_NOROOT
873 If this bit is set, then the kernel does not grant capabilities
874 when a set-user-ID-root program is executed, or when a process
875 with an effective or real UID of 0 calls execve(2). (See the
876 subsection Capabilities and execution of programs by root.)
877
878 SECBIT_NO_CAP_AMBIENT_RAISE
879 Setting this flag disallows raising ambient capabilities via the
880 prctl(2) PR_CAP_AMBIENT_RAISE operation.
881
882 Each of the above "base" flags has a companion "locked" flag. Setting
883 any of the "locked" flags is irreversible, and has the effect of pre‐
884 venting further changes to the corresponding "base" flag. The locked
885 flags are: SECBIT_KEEP_CAPS_LOCKED, SECBIT_NO_SETUID_FIXUP_LOCKED,
886 SECBIT_NOROOT_LOCKED, and SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED.
887
888 The securebits flags can be modified and retrieved using the prctl(2)
889 PR_SET_SECUREBITS and PR_GET_SECUREBITS operations. The CAP_SETPCAP
890 capability is required to modify the flags. Note that the SECBIT_*
891 constants are available only after including the <linux/securebits.h>
892 header file.
893
The securebits flags are inherited by child processes. During an
execve(2), all of the flags are preserved, except SECBIT_KEEP_CAPS,
which is always cleared.
897
898 An application can use the following call to lock itself, and all of
899 its descendants, into an environment where the only way of gaining
900 capabilities is by executing a program with associated file capabili‐
901 ties:
902
    prctl(PR_SET_SECUREBITS,
            /* SECBIT_KEEP_CAPS off */
            SECBIT_KEEP_CAPS_LOCKED |
            SECBIT_NO_SETUID_FIXUP |
            SECBIT_NO_SETUID_FIXUP_LOCKED |
            SECBIT_NOROOT |
            SECBIT_NOROOT_LOCKED);
            /* Setting/locking SECBIT_NO_CAP_AMBIENT_RAISE
               is not required */
912
913 Per-user-namespace "set-user-ID-root" programs
914 A set-user-ID program whose UID matches the UID that created a user
915 namespace will confer capabilities in the process's permitted and
916 effective sets when executed by any process inside that namespace or
917 any descendant user namespace.
918
919 The rules about the transformation of the process's capabilities during
920 the execve(2) are exactly as described in the subsections Transforma‐
921 tion of capabilities during execve() and Capabilities and execution of
922 programs by root, with the difference that, in the latter subsection,
923 "root" is the UID of the creator of the user namespace.
924
925 Namespaced file capabilities
926 Traditional (i.e., version 2) file capabilities associate only a set of
927 capability masks with a binary executable file. When a process exe‐
928 cutes a binary with such capabilities, it gains the associated capabil‐
929 ities (within its user namespace) as per the rules described above in
930 "Transformation of capabilities during execve()".
931
932 Because version 2 file capabilities confer capabilities to the execut‐
933 ing process regardless of which user namespace it resides in, only
934 privileged processes are permitted to associate capabilities with a
935 file. Here, "privileged" means a process that has the CAP_SETFCAP
936 capability in the user namespace where the filesystem was mounted (nor‐
937 mally the initial user namespace). This limitation renders file capa‐
938 bilities useless for certain use cases. For example, in user-names‐
939 paced containers, it can be desirable to be able to create a binary
940 that confers capabilities only to processes executed inside that con‐
941 tainer, but not to processes that are executed outside the container.
942
943 Linux 4.14 added so-called namespaced file capabilities to support such
944 use cases. Namespaced file capabilities are recorded as version 3
945 (i.e., VFS_CAP_REVISION_3) security.capability extended attributes.
946 Such an attribute is automatically created in the circumstances
947 described above under "File capability extended attribute versioning".
948 When a version 3 security.capability extended attribute is created, the
949 kernel records not just the capability masks in the extended attribute,
950 but also the namespace root user ID.
951
As with a binary that has VFS_CAP_REVISION_2 file capabilities, a
binary with VFS_CAP_REVISION_3 file capabilities confers capabilities
to a process during execve(2). However, capabilities are conferred only
if the binary is executed by a process that resides in a user namespace
whose UID 0 maps to the root user ID that is saved in the extended
attribute, or when executed by a process that resides in a descendant
of such a namespace.
959
960 Interaction with user namespaces
961 For further information on the interaction of capabilities and user
962 namespaces, see user_namespaces(7).
963
965 No standards govern capabilities, but the Linux capability implementa‐
966 tion is based on the withdrawn POSIX.1e draft standard; see
967 ⟨https://archive.org/details/posix_1003.1e-990310⟩.
968
When attempting to strace(1) binaries that have capabilities (or
set-user-ID-root binaries), you may find the -u <username> option
useful; for example:

    $ sudo strace -o trace.log -u ceci ./myprivprog
975
976 From kernel 2.5.27 to kernel 2.6.26, capabilities were an optional ker‐
977 nel component, and could be enabled/disabled via the CONFIG_SECU‐
978 RITY_CAPABILITIES kernel configuration option.
979
980 The /proc/[pid]/task/TID/status file can be used to view the capability
981 sets of a thread. The /proc/[pid]/status file shows the capability
982 sets of a process's main thread. Before Linux 3.8, nonexistent capa‐
983 bilities were shown as being enabled (1) in these sets. Since Linux
984 3.8, all nonexistent capabilities (above CAP_LAST_CAP) are shown as
985 disabled (0).
986
987 The libcap package provides a suite of routines for setting and getting
988 capabilities that is more comfortable and less likely to change than
989 the interface provided by capset(2) and capget(2). This package also
990 provides the setcap(8) and getcap(8) programs. It can be found at
991 ⟨https://git.kernel.org/pub/scm/libs/libcap/libcap.git/refs/⟩.
992
993 Before kernel 2.6.24, and from kernel 2.6.24 to kernel 2.6.32 if file
994 capabilities are not enabled, a thread with the CAP_SETPCAP capability
995 can manipulate the capabilities of threads other than itself. However,
996 this is only theoretically possible, since no thread ever has CAP_SETP‐
997 CAP in either of these cases:
998
* In the pre-2.6.25 implementation the system-wide capability bounding
  set, /proc/sys/kernel/cap-bound, always masks out this capability,
  and this cannot be changed without modifying the kernel source and
  rebuilding.
1003
1004 * If file capabilities are disabled in the current implementation, then
1005 init starts out with this capability removed from its per-process
1006 bounding set, and that bounding set is inherited by all other pro‐
1007 cesses created on the system.
1008
1010 capsh(1), setpriv(1), prctl(2), setfsuid(2), cap_clear(3),
1011 cap_copy_ext(3), cap_from_text(3), cap_get_file(3), cap_get_proc(3),
1012 cap_init(3), capgetp(3), capsetp(3), libcap(3), proc(5), creden‐
1013 tials(7), pthreads(7), user_namespaces(7), captest(8), filecap(8), get‐
1014 cap(8), netcap(8), pscap(8), setcap(8)
1015
1016 include/linux/capability.h in the Linux kernel source tree
1017
1019 This page is part of release 5.02 of the Linux man-pages project. A
1020 description of the project, information about reporting bugs, and the
1021 latest version of this page, can be found at
1022 https://www.kernel.org/doc/man-pages/.
1023
1024
1025
1026Linux 2019-08-02 CAPABILITIES(7)