1CAPABILITIES(7) Linux Programmer's Manual CAPABILITIES(7)
2
3
4
6 capabilities - overview of Linux capabilities
7
9 For the purpose of performing permission checks, traditional UNIX
10 implementations distinguish two categories of processes: privileged
11 processes (whose effective user ID is 0, referred to as superuser or
12 root), and unprivileged processes (whose effective UID is nonzero).
13 Privileged processes bypass all kernel permission checks, while unpriv‐
14 ileged processes are subject to full permission checking based on the
15 process's credentials (usually: effective UID, effective GID, and sup‐
16 plementary group list).
17
18 Starting with kernel 2.2, Linux divides the privileges traditionally
19 associated with superuser into distinct units, known as capabilities,
20 which can be independently enabled and disabled. Capabilities are a
21 per-thread attribute.
22
23 Capabilities list
24 The following list shows the capabilities implemented on Linux, and the
25 operations or behaviors that each capability permits:
26
27 CAP_AUDIT_CONTROL (since Linux 2.6.11)
28 Enable and disable kernel auditing; change auditing filter
29 rules; retrieve auditing status and filtering rules.
30
31 CAP_AUDIT_READ (since Linux 3.16)
32 Allow reading the audit log via a multicast netlink socket.
33
34 CAP_AUDIT_WRITE (since Linux 2.6.11)
35 Write records to kernel auditing log.
36
37 CAP_BLOCK_SUSPEND (since Linux 3.5)
38 Employ features that can block system suspend (epoll(7) EPOLL‐
39 WAKEUP, /proc/sys/wake_lock).
40
41 CAP_CHOWN
42 Make arbitrary changes to file UIDs and GIDs (see chown(2)).
43
44 CAP_DAC_OVERRIDE
45 Bypass file read, write, and execute permission checks. (DAC is
46 an abbreviation of "discretionary access control".)
47
48 CAP_DAC_READ_SEARCH
49 * Bypass file read permission checks and directory read and exe‐
50 cute permission checks;
51 * invoke open_by_handle_at(2);
52 * use the linkat(2) AT_EMPTY_PATH flag to create a link to a
53 file referred to by a file descriptor.
54
55 CAP_FOWNER
56 * Bypass permission checks on operations that normally require
57 the filesystem UID of the process to match the UID of the file
58 (e.g., chmod(2), utime(2)), excluding those operations covered
59 by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH;
60 * set inode flags (see ioctl_iflags(2)) on arbitrary files;
61 * set Access Control Lists (ACLs) on arbitrary files;
62 * ignore directory sticky bit on file deletion;
63 * modify user extended attributes on sticky directory owned by
64 any user;
65 * specify O_NOATIME for arbitrary files in open(2) and fcntl(2).
66
67 CAP_FSETID
68 * Don't clear set-user-ID and set-group-ID mode bits when a file
69 is modified;
70 * set the set-group-ID bit for a file whose GID does not match
71 the filesystem or any of the supplementary GIDs of the calling
72 process.
73
74 CAP_IPC_LOCK
75 Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2)).
76
77 CAP_IPC_OWNER
78 Bypass permission checks for operations on System V IPC objects.
79
80 CAP_KILL
81 Bypass permission checks for sending signals (see kill(2)).
82 This includes use of the ioctl(2) KDSIGACCEPT operation.
83
84 CAP_LEASE (since Linux 2.4)
85 Establish leases on arbitrary files (see fcntl(2)).
86
87 CAP_LINUX_IMMUTABLE
88 Set the FS_APPEND_FL and FS_IMMUTABLE_FL inode flags (see
89 ioctl_iflags(2)).
90
91 CAP_MAC_ADMIN (since Linux 2.6.25)
92 Allow MAC configuration or state changes. Implemented for the
93 Smack Linux Security Module (LSM).
94
95 CAP_MAC_OVERRIDE (since Linux 2.6.25)
96 Override Mandatory Access Control (MAC). Implemented for the
97 Smack LSM.
98
99 CAP_MKNOD (since Linux 2.4)
100 Create special files using mknod(2).
101
102 CAP_NET_ADMIN
103 Perform various network-related operations:
104 * interface configuration;
105 * administration of IP firewall, masquerading, and accounting;
106 * modify routing tables;
107 * bind to any address for transparent proxying;
108 * set type-of-service (TOS);
109 * clear driver statistics;
110 * set promiscuous mode;
111 * enable multicasting;
112 * use setsockopt(2) to set the following socket options:
113 SO_DEBUG, SO_MARK, SO_PRIORITY (for a priority outside the
114 range 0 to 6), SO_RCVBUFFORCE, and SO_SNDBUFFORCE.
115
116 CAP_NET_BIND_SERVICE
117 Bind a socket to Internet domain privileged ports (port numbers
118 less than 1024).
119
120 CAP_NET_BROADCAST
121 (Unused) Make socket broadcasts, and listen to multicasts.
122
123 CAP_NET_RAW
124 * Use RAW and PACKET sockets;
125 * bind to any address for transparent proxying.
126
127 CAP_SETGID
128 * Make arbitrary manipulations of process GIDs and supplementary
129 GID list;
130 * forge GID when passing socket credentials via UNIX domain
131 sockets;
132 * write a group ID mapping in a user namespace (see user_names‐
133 paces(7)).
134
135 CAP_SETFCAP (since Linux 2.6.24)
136 Set arbitrary capabilities on a file.
137
138 CAP_SETPCAP
139 If file capabilities are supported (i.e., since Linux 2.6.24):
140 add any capability from the calling thread's bounding set to its
141 inheritable set; drop capabilities from the bounding set (via
142 prctl(2) PR_CAPBSET_DROP); make changes to the securebits flags.
143
144 If file capabilities are not supported (i.e., kernels before
145 Linux 2.6.24): grant or remove any capability in the caller's
146 permitted capability set to or from any other process. (This
147 property of CAP_SETPCAP is not available when the kernel is con‐
148 figured to support file capabilities, since CAP_SETPCAP has
149 entirely different semantics for such kernels.)
150
151 CAP_SETUID
152 * Make arbitrary manipulations of process UIDs (setuid(2),
153 setreuid(2), setresuid(2), setfsuid(2));
154 * forge UID when passing socket credentials via UNIX domain
155 sockets;
156 * write a user ID mapping in a user namespace (see user_names‐
157 paces(7)).
158
159 CAP_SYS_ADMIN
160 Note: this capability is overloaded; see Notes to kernel devel‐
161 opers, below.
162
163 * Perform a range of system administration operations including:
164 quotactl(2), mount(2), umount(2), pivot_root(2), swapon(2),
165 swapoff(2), sethostname(2), and setdomainname(2);
166 * perform privileged syslog(2) operations (since Linux 2.6.37,
167 CAP_SYSLOG should be used to permit such operations);
168 * perform VM86_REQUEST_IRQ vm86(2) command;
169 * perform IPC_SET and IPC_RMID operations on arbitrary System V
170 IPC objects;
171 * override RLIMIT_NPROC resource limit;
172 * perform operations on trusted and security extended attributes
173 (see xattr(7));
174 * use lookup_dcookie(2);
175 * use ioprio_set(2) to assign IOPRIO_CLASS_RT and (before Linux
176 2.6.25) IOPRIO_CLASS_IDLE I/O scheduling classes;
177 * forge PID when passing socket credentials via UNIX domain
178 sockets;
179 * exceed /proc/sys/fs/file-max, the system-wide limit on the
180 number of open files, in system calls that open files (e.g.,
181 accept(2), execve(2), open(2), pipe(2));
182 * employ CLONE_* flags that create new namespaces with clone(2)
183 and unshare(2) (but, since Linux 3.8, creating user namespaces
184 does not require any capability);
185 * call perf_event_open(2);
186 * access privileged perf event information;
187 * call setns(2) (requires CAP_SYS_ADMIN in the target names‐
188 pace);
189 * call fanotify_init(2);
190 * call bpf(2);
191 * perform privileged KEYCTL_CHOWN and KEYCTL_SETPERM keyctl(2)
192 operations;
193 * perform madvise(2) MADV_HWPOISON operation;
194 * employ the TIOCSTI ioctl(2) to insert characters into the
195 input queue of a terminal other than the caller's controlling
196 terminal;
197 * employ the obsolete nfsservctl(2) system call;
198 * employ the obsolete bdflush(2) system call;
199 * perform various privileged block-device ioctl(2) operations;
200 * perform various privileged filesystem ioctl(2) operations;
201 * perform privileged ioctl(2) operations on the /dev/random
202 device (see random(4));
203 * install a seccomp(2) filter without first having to set the
204 no_new_privs thread attribute;
205 * modify allow/deny rules for device control groups;
206 * employ the ptrace(2) PTRACE_SECCOMP_GET_FILTER operation to
207 dump tracee's seccomp filters;
208 * employ the ptrace(2) PTRACE_SETOPTIONS operation to suspend
209 the tracee's seccomp protections (i.e., the PTRACE_O_SUS‐
210 PEND_SECCOMP flag);
211 * perform administrative operations on many device drivers;
212 * modify autogroup nice values by writing to /proc/[pid]/auto‐
213 group (see sched(7)).
214
215 CAP_SYS_BOOT
216 Use reboot(2) and kexec_load(2).
217
218 CAP_SYS_CHROOT
219 * Use chroot(2);
220 * change mount namespaces using setns(2).
221
222 CAP_SYS_MODULE
223 * Load and unload kernel modules (see init_module(2) and
224 delete_module(2));
225 * in kernels before 2.6.25: drop capabilities from the system-
226 wide capability bounding set.
227
228 CAP_SYS_NICE
229 * Raise process nice value (nice(2), setpriority(2)) and change
230 the nice value for arbitrary processes;
231 * set real-time scheduling policies for calling process, and set
232 scheduling policies and priorities for arbitrary processes
233 (sched_setscheduler(2), sched_setparam(2), sched_setattr(2));
234 * set CPU affinity for arbitrary processes (sched_setaffin‐
235 ity(2));
236 * set I/O scheduling class and priority for arbitrary processes
237 (ioprio_set(2));
238 * apply migrate_pages(2) to arbitrary processes and allow pro‐
239 cesses to be migrated to arbitrary nodes;
240 * apply move_pages(2) to arbitrary processes;
241 * use the MPOL_MF_MOVE_ALL flag with mbind(2) and move_pages(2).
242
243 CAP_SYS_PACCT
244 Use acct(2).
245
246 CAP_SYS_PTRACE
247 * Trace arbitrary processes using ptrace(2);
248 * apply get_robust_list(2) to arbitrary processes;
249 * transfer data to or from the memory of arbitrary processes
250 using process_vm_readv(2) and process_vm_writev(2);
251 * inspect processes using kcmp(2).
252
253 CAP_SYS_RAWIO
254 * Perform I/O port operations (iopl(2) and ioperm(2));
255 * access /proc/kcore;
256 * employ the FIBMAP ioctl(2) operation;
257 * open devices for accessing x86 model-specific registers (MSRs,
258 see msr(4));
259 * update /proc/sys/vm/mmap_min_addr;
260 * create memory mappings at addresses below the value specified
261 by /proc/sys/vm/mmap_min_addr;
262 * map files in /proc/bus/pci;
263 * open /dev/mem and /dev/kmem;
264 * perform various SCSI device commands;
265 * perform certain operations on hpsa(4) and cciss(4) devices;
266 * perform a range of device-specific operations on other
267 devices.
268
269 CAP_SYS_RESOURCE
270 * Use reserved space on ext2 filesystems;
271 * make ioctl(2) calls controlling ext3 journaling;
272 * override disk quota limits;
273 * increase resource limits (see setrlimit(2));
274 * override RLIMIT_NPROC resource limit;
275 * override maximum number of consoles on console allocation;
276 * override maximum number of keymaps;
277 * allow more than 64 Hz interrupts from the real-time clock;
278 * raise msg_qbytes limit for a System V message queue above the
279 limit in /proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2));
280 * allow the RLIMIT_NOFILE resource limit on the number of "in-
281 flight" file descriptors to be bypassed when passing file
282 descriptors to another process via a UNIX domain socket (see
283 unix(7));
284 * use the F_SETPIPE_SZ fcntl(2) command to increase the capacity of
285 a pipe above the limit specified by /proc/sys/fs/pipe-max-size;
288 * override /proc/sys/fs/mqueue/queues_max limit when creating
289 POSIX message queues (see mq_overview(7));
290 * employ the prctl(2) PR_SET_MM operation;
291 * set /proc/[pid]/oom_score_adj to a value lower than the value
292 last set by a process with CAP_SYS_RESOURCE.
293
294 CAP_SYS_TIME
295 Set system clock (settimeofday(2), stime(2), adjtimex(2)); set
296 real-time (hardware) clock.
297
298 CAP_SYS_TTY_CONFIG
299 Use vhangup(2); employ various privileged ioctl(2) operations on
300 virtual terminals.
301
302 CAP_SYSLOG (since Linux 2.6.37)
303 * Perform privileged syslog(2) operations. See syslog(2) for
304 information on which operations require privilege.
305 * View kernel addresses exposed via /proc and other interfaces
306 when /proc/sys/kernel/kptr_restrict has the value 1. (See the
307 discussion of kptr_restrict in proc(5).)
308
309 CAP_WAKE_ALARM (since Linux 3.0)
310 Trigger something that will wake up the system (set CLOCK_REAL‐
311 TIME_ALARM and CLOCK_BOOTTIME_ALARM timers).
312
313 Past and current implementation
314 A full implementation of capabilities requires that:
315
316 1. For all privileged operations, the kernel must check whether the
317 thread has the required capability in its effective set.
318
319 2. The kernel must provide system calls allowing a thread's capability
320 sets to be changed and retrieved.
321
322 3. The filesystem must support attaching capabilities to an executable
323 file, so that a process gains those capabilities when the file is
324 executed.
325
326 Before kernel 2.6.24, only the first two of these requirements are met;
327 since kernel 2.6.24, all three requirements are met.
328
329 Notes to kernel developers
330 When adding a new kernel feature that should be governed by a capabil‐
331 ity, consider the following points.
332
333 * The goal of capabilities is to divide the power of superuser into
334 pieces, such that if a program that has one or more capabilities is
335 compromised, its power to do damage to the system would be less than
336 the same program running with root privilege.
337
338 * You have the choice of either creating a new capability for your new
339 feature, or associating the feature with one of the existing capa‐
340 bilities. In order to keep the set of capabilities to a manageable
341 size, the latter option is preferable, unless there are compelling
342 reasons to take the former option. (There is also a technical
343 limit: the size of capability sets is currently limited to 64 bits.)
344
345 * To determine which existing capability might best be associated with
346 your new feature, review the list of capabilities above in order to
347 find a "silo" into which your new feature best fits. One approach
348 to take is to determine if there are other features requiring capa‐
349 bilities that will always be used along with the new feature. If
350 the new feature is useless without these other features, you should
351 use the same capability as the other features.
352
353 * Don't choose CAP_SYS_ADMIN if you can possibly avoid it! A vast
354 proportion of existing capability checks are associated with this
355 capability (see the partial list above). It can plausibly be called
356 "the new root", since on the one hand, it confers a wide range of
357 powers, and on the other hand, its broad scope means that this is
358 the capability that is required by many privileged programs. Don't
359 make the problem worse. The only new features that should be asso‐
360 ciated with CAP_SYS_ADMIN are ones that closely match existing uses
361 in that silo.
362
363 * If you have determined that it really is necessary to create a new
364 capability for your feature, don't make or name it as a "single-use"
365 capability. Thus, for example, the addition of the highly specific
366 CAP_SYS_PACCT was probably a mistake. Instead, try to identify and
367 name your new capability as a broader silo into which other related
368 future use cases might fit.
369
370 Thread capability sets
371 Each thread has the following capability sets containing zero or more
372 of the above capabilities:
373
374 Permitted
375 This is a limiting superset for the effective capabilities that
376 the thread may assume. It is also a limiting superset for the
377 capabilities that may be added to the inheritable set by a
378 thread that does not have the CAP_SETPCAP capability in its
379 effective set.
380
381 If a thread drops a capability from its permitted set, it can
382 never reacquire that capability (unless it execve(2)s either a
383 set-user-ID-root program, or a program whose associated file
384 capabilities grant that capability).
385
386 Inheritable
387 This is a set of capabilities preserved across an execve(2).
388 Inheritable capabilities remain inheritable when executing any
389 program, and inheritable capabilities are added to the permitted
390 set when executing a program that has the corresponding bits set
391 in the file inheritable set.
392
393 Because inheritable capabilities are not generally preserved
394 across execve(2) when running as a non-root user, applications
395 that wish to run helper programs with elevated capabilities
396 should consider using ambient capabilities, described below.
397
398 Effective
399 This is the set of capabilities used by the kernel to perform
400 permission checks for the thread.
401
402 Bounding (per-thread since Linux 2.6.25)
403 The capability bounding set is a mechanism that can be used to
404 limit the capabilities that are gained during execve(2).
405
406 Since Linux 2.6.25, this is a per-thread capability set. In
407 older kernels, the capability bounding set was a system wide
408 attribute shared by all threads on the system.
409
410 For more details on the capability bounding set, see below.
411
412 Ambient (since Linux 4.3)
413 This is a set of capabilities that are preserved across an
414 execve(2) of a program that is not privileged. The ambient
415 capability set obeys the invariant that no capability can ever
416 be ambient if it is not both permitted and inheritable.
417
418 The ambient capability set can be directly modified using
419 prctl(2). Ambient capabilities are automatically lowered if
420 either of the corresponding permitted or inheritable capabili‐
421 ties is lowered.
422
423 Executing a program that changes UID or GID due to the set-user-
424 ID or set-group-ID bits or executing a program that has any file
425 capabilities set will clear the ambient set. Ambient capabili‐
426 ties are added to the permitted set and assigned to the effec‐
427 tive set when execve(2) is called. If ambient capabilities
428 cause a process's permitted and effective capabilities to
429 increase during an execve(2), this does not trigger the secure-
430 execution mode described in ld.so(8).
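
The invariant and the automatic lowering described above can be modeled with plain integer bit masks. The following is a minimal sketch of the bookkeeping only, not the kernel's implementation: the class ThreadCaps and its methods are hypothetical names, and the capability number 10 for CAP_NET_BIND_SERVICE is taken from <linux/capability.h> rather than from this page.

```python
CAP_NET_BIND_SERVICE = 10  # bit number from <linux/capability.h>


class ThreadCaps:
    """Toy model of a thread's permitted, inheritable, and ambient sets."""

    def __init__(self, permitted=0, inheritable=0):
        self.permitted = permitted
        self.inheritable = inheritable
        self.ambient = 0

    def raise_ambient(self, cap):
        bit = 1 << cap
        # Invariant: a capability may be ambient only if it is both
        # permitted and inheritable.
        if not (self.permitted & bit and self.inheritable & bit):
            raise PermissionError("capability is not permitted and inheritable")
        self.ambient |= bit

    def drop_inheritable(self, cap):
        bit = 1 << cap
        self.inheritable &= ~bit
        # Lowering an inheritable capability lowers the ambient one too.
        self.ambient &= ~bit

    def drop_permitted(self, cap):
        bit = 1 << cap
        self.permitted &= ~bit
        # Likewise for the permitted set.
        self.ambient &= ~bit
```

A real program would change the ambient set through prctl(2); the model above only illustrates which requests would be refused and when an ambient capability disappears.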
431
432 A child created via fork(2) inherits copies of its parent's capability
433 sets. See below for a discussion of the treatment of capabilities dur‐
434 ing execve(2).
435
436 Using capset(2), a thread may manipulate its own capability sets (see
437 below).
438
439 Since Linux 3.2, the file /proc/sys/kernel/cap_last_cap exposes the
440 numerical value of the highest capability supported by the running ker‐
441 nel; this can be used to determine the highest bit that may be set in a
442 capability set.
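
As an illustration of how the cap_last_cap value can bound the bits examined in a capability set, the sketch below decodes a hexadecimal capability mask into names. The helper names decode_mask and read_last_cap are hypothetical, and the table lists only a few capability numbers, taken from <linux/capability.h> rather than from this page.

```python
# Capability numbers as defined in <linux/capability.h> (subset shown).
CAP_NAMES = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    7: "CAP_SETUID",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    21: "CAP_SYS_ADMIN",
}


def decode_mask(hex_mask, last_cap):
    """Return the names of the capabilities set in hex_mask.

    Bits above last_cap are ignored; bits without an entry in
    CAP_NAMES are reported as "cap_<number>".
    """
    mask = int(hex_mask, 16)
    names = []
    for bit in range(last_cap + 1):
        if mask & (1 << bit):
            names.append(CAP_NAMES.get(bit, "cap_%d" % bit))
    return names


def read_last_cap(path="/proc/sys/kernel/cap_last_cap"):
    """Read the highest capability number supported by the running kernel."""
    with open(path) as f:
        return int(f.read().strip())
```

On a Linux 3.2 or later system, decode_mask(mask, read_last_cap()) could be applied to a mask such as the one reported in the CapPrm field of /proc/[pid]/status (see proc(5)).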
443
444 File capabilities
445 Since kernel 2.6.24, the kernel supports associating capability sets
446 with an executable file using setcap(8). The file capability sets are
447 stored in an extended attribute (see setxattr(2) and xattr(7)) named
448 security.capability. Writing to this extended attribute requires the
449 CAP_SETFCAP capability. The file capability sets, in conjunction with
450 the capability sets of the thread, determine the capabilities of a
451 thread after an execve(2).
452
453 The three file capability sets are:
454
455 Permitted (formerly known as forced):
456 These capabilities are automatically permitted to the thread,
457 regardless of the thread's inheritable capabilities.
458
459 Inheritable (formerly known as allowed):
460 This set is ANDed with the thread's inheritable set to determine
461 which inheritable capabilities are enabled in the permitted set
462 of the thread after the execve(2).
463
464 Effective:
465 This is not a set, but rather just a single bit. If this bit is
466 set, then during an execve(2) all of the new permitted capabili‐
467 ties for the thread are also raised in the effective set. If
468 this bit is not set, then after an execve(2), none of the new
469 permitted capabilities is in the new effective set.
470
471 Enabling the file effective capability bit implies that any file
472 permitted or inheritable capability that causes a thread to
473 acquire the corresponding permitted capability during an
474 execve(2) (see the transformation rules described below) will
475 also acquire that capability in its effective set. Therefore,
476 when assigning capabilities to a file (setcap(8),
477 cap_set_file(3), cap_set_fd(3)), if we specify the effective
478 flag as being enabled for any capability, then the effective
479 flag must also be specified as enabled for all other capabili‐
480 ties for which the corresponding permitted or inheritable flag
481 is enabled.
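
Because the file effective flag is a single bit, a per-capability assignment is representable only if the effective flag is used uniformly. The check below is a minimal sketch of that rule; the function name and the dictionary representation (capability name mapped to a set of the letters "e", "i", "p") are assumptions made for illustration and are not part of any library.

```python
def effective_flags_consistent(caps):
    """Return True if caps can be stored with a single file effective bit.

    caps maps a capability name to a set of flags drawn from
    {"e", "i", "p"} (effective, inheritable, permitted).
    """
    if not any("e" in flags for flags in caps.values()):
        # Effective bit off for the whole file: always representable.
        return True
    # Effective bit on: every capability that has a permitted or
    # inheritable flag must also carry the effective flag.
    return all("e" in flags
               for flags in caps.values()
               if flags & {"i", "p"})
```

For example, {"cap_net_raw": {"e", "p"}, "cap_net_admin": {"p"}} is rejected, since enabling the effective bit for CAP_NET_RAW would necessarily enable it for CAP_NET_ADMIN as well.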
482
483 File capability extended attribute versioning
484 To allow extensibility, the kernel supports a scheme to encode a ver‐
485 sion number inside the security.capability extended attribute that is
486 used to implement file capabilities. These version numbers are inter‐
487 nal to the implementation, and not directly visible to user-space
488 applications. To date, the following versions are supported:
489
490 VFS_CAP_REVISION_1
491 This was the original file capability implementation, which sup‐
492 ported 32-bit masks for file capabilities.
493
494 VFS_CAP_REVISION_2 (since Linux 2.6.25)
495 This version allows for file capability masks that are 64 bits
496 in size, and was necessary as the number of supported capabili‐
497 ties grew beyond 32. The kernel transparently continues to sup‐
498 port the execution of files that have 32-bit version 1 capabil‐
499 ity masks, but when adding capabilities to files that did not
500 previously have capabilities, or modifying the capabilities of
501 existing files, it automatically uses the version 2 scheme (or
502 possibly the version 3 scheme, as described below).
503
504 VFS_CAP_REVISION_3 (since Linux 4.14)
505 Version 3 file capabilities are provided to support namespaced
506 file capabilities (described below).
507
508 As with version 2 file capabilities, version 3 capability masks
509 are 64 bits in size. But in addition, the root user ID of the
510 namespace is encoded in the security.capability extended
511 attribute. (A namespace's root user ID is the value that user
512 ID 0 inside that namespace maps to in the initial user names‐
513 pace.)
514
515 Version 3 file capabilities are designed to coexist with version
516 2 capabilities; that is, on a modern Linux system, there may be
517 some files with version 2 capabilities while others have version
518 3 capabilities.
519
520 Before Linux 4.14, the only kind of file capability extended attribute
521 that could be attached to a file was a VFS_CAP_REVISION_2 attribute.
522 Since Linux 4.14, the version of the security.capability extended
523 attribute that is attached to a file depends on the circumstances in
524 which the attribute was created.
525
526 Starting with Linux 4.14, a security.capability extended attribute is
527 automatically created as (or converted to) a version 3 (VFS_CAP_REVI‐
528 SION_3) attribute if both of the following are true:
529
530 (1) The thread writing the attribute resides in a noninitial user
531 namespace. (More precisely: the thread resides in a user namespace
532 other than the one from which the underlying filesystem was
533 mounted.)
534
535 (2) The thread has the CAP_SETFCAP capability over the file inode,
536 meaning that (a) the thread has the CAP_SETFCAP capability in its
537 own user namespace; and (b) the UID and GID of the file inode have
538 mappings in the writer's user namespace.
539
540 When a VFS_CAP_REVISION_3 security.capability extended attribute is
541 created, the root user ID of the creating thread's user namespace is
542 saved in the extended attribute.
543
544 By contrast, creating or modifying a security.capability extended
545 attribute from a privileged (CAP_SETFCAP) thread that resides in the
546 namespace where the underlying filesystem was mounted (this normally
547 means the initial user namespace) automatically results in the creation
548 of a version 2 (VFS_CAP_REVISION_2) attribute.
549
550 Note that the creation of a version 3 security.capability extended
551 attribute is automatic. That is to say, when a user-space application
552 writes (setxattr(2)) a security.capability attribute in the version 2
553 format, the kernel will automatically create a version 3 attribute if
554 the attribute is created in the circumstances described above. Corre‐
555 spondingly, when a version 3 security.capability attribute is retrieved
556 (getxattr(2)) by a process that resides inside a user namespace that
557 was created by the root user ID (or a descendant of that user names‐
558 pace), the returned attribute is (automatically) simplified to appear
559 as a version 2 attribute (i.e., the returned value is the size of a
560 version 2 attribute and does not include the root user ID). These
561 automatic translations mean that no changes are required to user-space
562 tools (e.g., setcap(8) and getcap(8)) in order for those tools to be
563 used to create and retrieve version 3 security.capability attributes.
564
565 Note that a file can have either a version 2 or a version 3 secu‐
566 rity.capability extended attribute associated with it, but not both:
567 creation or modification of the security.capability extended attribute
568 will automatically modify the version according to the circumstances in
569 which the extended attribute is created or modified.
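
To make the difference between the three versions concrete, the following sketch decodes the raw bytes of a security.capability attribute. It assumes the little-endian layout and the constants of struct vfs_cap_data and struct vfs_ns_cap_data in <linux/capability.h>; those details come from the kernel header, not from this page, and the function name is hypothetical.

```python
import struct

VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
REVISIONS = {0x01000000: 1, 0x02000000: 2, 0x03000000: 3}


def parse_security_capability(raw):
    """Decode the raw bytes of a security.capability extended attribute."""
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    version = REVISIONS[magic_etc & VFS_CAP_REVISION_MASK]
    # Version 1 stores one 32-bit (permitted, inheritable) pair;
    # versions 2 and 3 store two pairs, low word first.
    pairs = 1 if version == 1 else 2
    permitted = inheritable = 0
    for i in range(pairs):
        perm, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= perm << (32 * i)
        inheritable |= inh << (32 * i)
    rootid = None
    if version == 3:
        # Version 3 appends the namespace's root user ID.
        (rootid,) = struct.unpack_from("<I", raw, 4 + 8 * pairs)
    return {
        "version": version,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
        "rootid": rootid,
    }
```

The input could be obtained with os.getxattr(path, "security.capability"). Note that, because of the automatic translation described above, a reader inside the relevant user namespace may be handed a version 2 value even when a version 3 attribute is stored.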
570
571 Transformation of capabilities during execve()
572 During an execve(2), the kernel calculates the new capabilities of the
573 process using the following algorithm:
574
575 P'(ambient) = (file is privileged) ? 0 : P(ambient)
576
577 P'(permitted) = (P(inheritable) & F(inheritable)) |
578 (F(permitted) & P(bounding)) | P'(ambient)
579
580 P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
581
582 P'(inheritable) = P(inheritable) [i.e., unchanged]
583
584 P'(bounding) = P(bounding) [i.e., unchanged]
585
586 where:
587
588 P() denotes the value of a thread capability set before the
589 execve(2)
590
591 P'() denotes the value of a thread capability set after the
592 execve(2)
593
594 F() denotes a file capability set
595
596 Note the following details relating to the above capability transforma‐
597 tion rules:
598
599 * The ambient capability set is present only since Linux 4.3. When
600 determining the transformation of the ambient set during execve(2),
601 a privileged file is one that has capabilities or has the set-user-
602 ID or set-group-ID bit set.
603
604 * Prior to Linux 2.6.25, the bounding set was a system-wide attribute
605 shared by all threads. That system-wide value was employed to cal‐
606 culate the new permitted set during execve(2) in the same manner as
607 shown above for P(bounding).
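
The rules above translate directly into bit-mask arithmetic. The sketch below is one way to write them down, with each set represented as an integer; the function name and the dictionary it returns are illustrative choices. The old permitted and effective sets are not parameters because they do not appear on the right-hand side of any rule.

```python
def execve_transform(p_inh, p_bnd, p_amb,
                     f_inh, f_perm, f_eff, file_is_privileged):
    """Return the thread's capability sets after execve(2).

    Sets are integers used as bit masks; f_eff is the single file
    effective bit. file_is_privileged is True for a file that has
    capabilities or has the set-user-ID or set-group-ID bit set.
    """
    new_amb = 0 if file_is_privileged else p_amb
    new_perm = (p_inh & f_inh) | (f_perm & p_bnd) | new_amb
    new_eff = new_perm if f_eff else new_amb
    return {
        "ambient": new_amb,
        "permitted": new_perm,
        "effective": new_eff,
        "inheritable": p_inh,   # unchanged
        "bounding": p_bnd,      # unchanged
    }
```

For instance, a thread with a full bounding set that executes a file whose permitted set contains one capability and whose effective bit is set ends up with exactly that capability in its permitted and effective sets.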
608
609 Note: during the capability transitions described above, file capabili‐
610 ties may be ignored (treated as empty) for the same reasons that the
611 set-user-ID and set-group-ID bits are ignored; see execve(2). File
612 capabilities are similarly ignored if the kernel was booted with the
613 no_file_caps option.
614
615 Note: according to the rules above, if a process with nonzero user IDs
616 performs an execve(2) then any capabilities that are present in its
617 permitted and effective sets will be cleared. For the treatment of
618 capabilities when a process with a user ID of zero performs an
619 execve(2), see below under Capabilities and execution of programs by
620 root.
621
622 Safety checking for capability-dumb binaries
623 A capability-dumb binary is an application that has been marked to have
624 file capabilities, but has not been converted to use the libcap(3) API
625 to manipulate its capabilities. (In other words, this is a traditional
626 set-user-ID-root program that has been switched to use file capabili‐
627 ties, but whose code has not been modified to understand capabilities.)
628 For such applications, the effective capability bit is set on the file,
629 so that the file permitted capabilities are automatically enabled in
630 the process effective set when executing the file. The kernel recog‐
631 nizes a file which has the effective capability bit set as capability-
632 dumb for the purpose of the check described here.
633
634 When executing a capability-dumb binary, the kernel checks if the
635 process obtained all permitted capabilities that were specified in the
636 file permitted set, after the capability transformations described
637 above have been performed. (The typical reason why this might not
638 occur is that the capability bounding set masked out some of the capa‐
639 bilities in the file permitted set.) If the process did not obtain the
640 full set of file permitted capabilities, then execve(2) fails with the
641 error EPERM. This prevents possible security risks that could arise
642 when a capability-dumb application is executed with less privilege than
643 it needs. Note that, by definition, the application could not itself
644 recognize this problem, since it does not employ the libcap(3) API.
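
The check described above amounts to a single mask comparison after the transformation has been computed. The following sketch models it; the function name is hypothetical, and raising PermissionError with EPERM merely stands in for execve(2) failing with that error.

```python
import errno


def check_capability_dumb(f_perm, f_eff, new_permitted):
    """Model the safety check for a capability-dumb binary.

    f_perm is the file permitted set, f_eff the file effective bit,
    and new_permitted the thread's permitted set after the execve(2)
    transformation. All sets are integers used as bit masks.
    """
    missing = f_perm & ~new_permitted
    if f_eff and missing:
        # Some file permitted capability was not obtained, for example
        # because the bounding set masked it out.
        raise PermissionError(errno.EPERM, "missing file permitted capabilities")
```

A file without the effective bit is not treated as capability-dumb, so the function returns normally for it even when some file permitted capabilities were masked out.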
645
646 Capabilities and execution of programs by root
647 In order to mirror traditional UNIX semantics, the kernel performs spe‐
648 cial treatment of file capabilities when a process with UID 0 (root)
649 executes a program and when a set-user-ID-root program is executed.
650
651 After having performed any changes to the process's effective user ID that
652 were triggered by the set-user-ID mode bit of the binary—e.g., switch‐
653 ing the effective user ID to 0 (root) because a set-user-ID-root pro‐
654 gram was executed—the kernel calculates the file capability sets as
655 follows:
656
657 1. If the real or effective user ID of the process is 0 (root), then
658 the file inheritable and permitted sets are ignored; instead they
659 are notionally considered to be all ones (i.e., all capabilities
660 enabled). (There is one exception to this behavior, described below
661 in Set-user-ID-root programs that have file capabilities.)
662
663 2. If the effective user ID of the process is 0 (root) or the file
664 effective bit is in fact enabled, then the file effective bit is
665 notionally defined to be one (enabled).
666
667 These notional values for the file's capability sets are then used as
668 described above to calculate the transformation of the process's capa‐
669 bilities during execve(2).
670
671 Thus, when a process with nonzero UIDs execve(2)s a set-user-ID-root
672 program that does not have capabilities attached, or when a process
673 whose real and effective UIDs are zero execve(2)s a program, the calcu‐
674 lation of the process's new permitted capabilities simplifies to:
675
676 P'(permitted) = P(inheritable) | P(bounding)
677
678 P'(effective) = P'(permitted)
679
680 Consequently, the process gains all capabilities in its permitted and
681 effective capability sets, except those masked out by the capability
682 bounding set. (In the calculation of P'(permitted), the P'(ambient)
683 term can be simplified away because it is by definition a subset
684 of P(inheritable).)
685
686 The special treatments of user ID 0 (root) described in this subsection
687 can be disabled using the securebits mechanism described below.
688
689 Set-user-ID-root programs that have file capabilities
690 There is one exception to the behavior described under Capabilities and
691 execution of programs by root. If (a) the binary that is being exe‐
692 cuted has capabilities attached and (b) the real user ID of the process
693 is not 0 (root) and (c) the effective user ID of the process is 0
694 (root), then the file capability bits are honored (i.e., they are not
695 notionally considered to be all ones). The usual way in which this
696 situation can arise is when executing a set-user-ID-root program that also
697 has file capabilities. When such a program is executed, the process
698 gains just the capabilities granted by the program (i.e., not all capa‐
699 bilities, as would occur when executing a set-user-ID-root program that
700 does not have any associated file capabilities).
701
702 Note that one can assign empty capability sets to a program file, and
703 thus it is possible to create a set-user-ID-root program that changes
704 the effective and saved set-user-ID of the process that executes the
705 program to 0, but confers no capabilities to that process.
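
The rules of the two preceding subsections can be sketched as a single
decision function. This is an illustrative model only: struct file_caps,
cap_mask_t, and notional_file_caps() are hypothetical names, and the
sketch assumes that when the exception applies, the file effective bit
is also left as stored in the file.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

typedef uint64_t cap_mask_t;    /* hypothetical: one bit per capability */

struct file_caps {              /* hypothetical model of a file's sets */
    bool       attached;        /* file has capabilities attached */
    cap_mask_t inheritable;
    cap_mask_t permitted;
    bool       effective;       /* the file effective bit */
};

/* Return the file capability sets as notionally seen at execve(2) by a
   process with the given real and effective UIDs (euid is the value
   after any set-user-ID switch has been applied). */
struct file_caps notional_file_caps(struct file_caps f, uid_t ruid,
                                    uid_t euid)
{
    /* Exception: capabilities attached, real UID nonzero, effective
       UID 0.  The file capability bits are honored as they are. */
    bool honored = f.attached && ruid != 0 && euid == 0;

    if (!f.attached) {          /* no file capabilities at all */
        f.inheritable = 0;
        f.permitted = 0;
        f.effective = false;
    }
    if (!honored) {
        if (ruid == 0 || euid == 0) {       /* rule 1 */
            f.inheritable = ~(cap_mask_t)0;
            f.permitted = ~(cap_mask_t)0;
        }
        if (euid == 0)                      /* rule 2 */
            f.effective = true;
    }
    return f;
}
```

With attached set and both masks empty, a call such as
notional_file_caps(f, 1000, 0) returns empty sets, matching the case of
a set-user-ID-root program that confers no capabilities.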
706
707 Capability bounding set
708 The capability bounding set is a security mechanism that can be used to
709 limit the capabilities that can be gained during an execve(2). The
710 bounding set is used in the following ways:
711
712 * During an execve(2), the capability bounding set is ANDed with the
713 file permitted capability set, and the result of this operation is
714 assigned to the thread's permitted capability set. The capability
715 bounding set thus places a limit on the permitted capabilities that
716 may be granted by an executable file.
717
718 * (Since Linux 2.6.25) The capability bounding set acts as a limiting
719 superset for the capabilities that a thread can add to its inherita‐
720 ble set using capset(2). This means that if a capability is not in
721 the bounding set, then a thread can't add this capability to its
722 inheritable set, even if it was in its permitted capabilities, and
723 thereby cannot have this capability preserved in its permitted set
724 when it execve(2)s a file that has the capability in its inheritable
725 set.
726
727 Note that the bounding set masks the file permitted capabilities, but
728 not the inheritable capabilities. If a thread maintains a capability
729 in its inheritable set that is not in its bounding set, then it can
730 still gain that capability in its permitted set by executing a file
731 that has the capability in its inheritable set.
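
The asymmetry can be seen in a small model of the permitted-set
calculation. The sketch below ignores ambient capabilities and uses the
hypothetical names cap_mask_t and exec_permitted(); it is not kernel
code.

```c
#include <stdint.h>

typedef uint64_t cap_mask_t;    /* hypothetical: one bit per capability */

/* Permitted set after execve(2), ignoring ambient capabilities.
   Only the file permitted term is ANDed with the bounding set; the
   inheritable term is not. */
cap_mask_t exec_permitted(cap_mask_t p_inh, cap_mask_t p_bnd,
                          cap_mask_t f_inh, cap_mask_t f_perm)
{
    return (p_inh & f_inh) | (f_perm & p_bnd);
}
```

For a capability bit X that is absent from the bounding set,
exec_permitted(X, 0, X, 0) still yields X, whereas
exec_permitted(0, 0, 0, X) yields 0.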
732
Depending on the kernel version, the capability bounding set is either
a system-wide attribute or a per-thread attribute.
735
736 Capability bounding set from Linux 2.6.25 onward
737
738 From Linux 2.6.25, the capability bounding set is a per-thread
739 attribute. (The system-wide capability bounding set described below no
740 longer exists.)
741
742 The bounding set is inherited at fork(2) from the thread's parent, and
743 is preserved across an execve(2).
744
745 A thread may remove capabilities from its capability bounding set using
746 the prctl(2) PR_CAPBSET_DROP operation, provided it has the CAP_SETPCAP
747 capability. Once a capability has been dropped from the bounding set,
748 it cannot be restored to that set. A thread can determine if a capa‐
749 bility is in its bounding set using the prctl(2) PR_CAPBSET_READ opera‐
750 tion.
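
A minimal sketch of querying the bounding set follows. Reading the set
requires no privilege. The helper names cap_in_bounding_set() and
count_bounding_caps() are hypothetical, and the upper probe limit of 64
is an assumption of this sketch rather than a kernel constant.

```c
#include <sys/prctl.h>
#include <linux/capability.h>

/* Return 1 if cap is in the calling thread's bounding set, 0 if it is
   not, or -1 (with errno set) if the kernel rejects the value. */
int cap_in_bounding_set(int cap)
{
    return prctl(PR_CAPBSET_READ, (unsigned long)cap, 0, 0, 0);
}

/* Count the capabilities in the bounding set by probing upward from 0
   until the kernel rejects the capability number. */
int count_bounding_caps(void)
{
    int count = 0;

    for (int cap = 0; cap < 64; cap++) {
        int r = cap_in_bounding_set(cap);
        if (r < 0)
            break;              /* past the last capability known */
        count += r;
    }
    return count;
}
```

For example, cap_in_bounding_set(CAP_NET_ADMIN) reports whether
CAP_NET_ADMIN can still be gained from a file's permitted set.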
751
752 Removing capabilities from the bounding set is supported only if file
753 capabilities are compiled into the kernel. In kernels before Linux
754 2.6.33, file capabilities were an optional feature configurable via the
755 CONFIG_SECURITY_FILE_CAPABILITIES option. Since Linux 2.6.33, the con‐
756 figuration option has been removed and file capabilities are always
757 part of the kernel. When file capabilities are compiled into the ker‐
758 nel, the init process (the ancestor of all processes) begins with a
759 full bounding set. If file capabilities are not compiled into the ker‐
760 nel, then init begins with a full bounding set minus CAP_SETPCAP,
761 because this capability has a different meaning when there are no file
762 capabilities.
763
Removing a capability from the bounding set does not remove it from the
thread's inheritable set. However, it does prevent the capability from
being added back into the thread's inheritable set in the future.
767
768 Capability bounding set prior to Linux 2.6.25
769
770 In kernels before 2.6.25, the capability bounding set is a system-wide
771 attribute that affects all threads on the system. The bounding set is
772 accessible via the file /proc/sys/kernel/cap-bound. (Confusingly, this
773 bit mask parameter is expressed as a signed decimal number in
774 /proc/sys/kernel/cap-bound.)
775
776 Only the init process may set capabilities in the capability bounding
777 set; other than that, the superuser (more precisely: a process with the
778 CAP_SYS_MODULE capability) may only clear capabilities from this set.
779
780 On a standard system the capability bounding set always masks out the
781 CAP_SETPCAP capability. To remove this restriction (dangerous!), mod‐
782 ify the definition of CAP_INIT_EFF_SET in include/linux/capability.h
783 and rebuild the kernel.
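
Because the file holds a signed decimal number, interpreting it as a bit
mask requires a conversion. The following sketch shows one way to do
this; the function name cap_bound_to_mask() is hypothetical, and the
input is assumed to be text already read from the file.

```c
#include <stdint.h>
#include <stdlib.h>

/* Convert the signed decimal text of the (pre-2.6.25)
   /proc/sys/kernel/cap-bound file into a 32-bit capability mask. */
uint32_t cap_bound_to_mask(const char *text)
{
    return (uint32_t)strtol(text, NULL, 10);
}
```

For instance, the text -257 corresponds to the mask 0xFFFFFEFF, that is,
every bit set except bit 8, which is the value of CAP_SETPCAP.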
784
785 The system-wide capability bounding set feature was added to Linux
786 starting with kernel version 2.2.11.
787
788 Effect of user ID changes on capabilities
789 To preserve the traditional semantics for transitions between 0 and
790 nonzero user IDs, the kernel makes the following changes to a thread's
791 capability sets on changes to the thread's real, effective, saved set,
792 and filesystem user IDs (using setuid(2), setresuid(2), or similar):
793
794 1. If one or more of the real, effective or saved set user IDs was pre‐
795 viously 0, and as a result of the UID changes all of these IDs have
796 a nonzero value, then all capabilities are cleared from the permit‐
797 ted, effective, and ambient capability sets.
798
799 2. If the effective user ID is changed from 0 to nonzero, then all
800 capabilities are cleared from the effective set.
801
802 3. If the effective user ID is changed from nonzero to 0, then the per‐
803 mitted set is copied to the effective set.
804
805 4. If the filesystem user ID is changed from 0 to nonzero (see setf‐
806 suid(2)), then the following capabilities are cleared from the
807 effective set: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH,
808 CAP_FOWNER, CAP_FSETID, CAP_LINUX_IMMUTABLE (since Linux 2.6.30),
809 CAP_MAC_OVERRIDE, and CAP_MKNOD (since Linux 2.6.30). If the
810 filesystem UID is changed from nonzero to 0, then any of these capa‐
811 bilities that are enabled in the permitted set are enabled in the
812 effective set.
813
814 If a thread that has a 0 value for one or more of its user IDs wants to
815 prevent its permitted capability set being cleared when it resets all
816 of its user IDs to nonzero values, it can do so using the
817 SECBIT_KEEP_CAPS securebits flag described below.
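
Rules 1 to 3 above, together with the effect of SECBIT_KEEP_CAPS, can be
modeled as follows. All type and function names are hypothetical, rule 4
(filesystem UID) is omitted, and the sketch assumes that the ambient set
is cleared under rule 1 even when the keep-capabilities flag is set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

typedef uint64_t cap_mask_t;    /* hypothetical: one bit per capability */

struct uids { uid_t real, effective, saved; };
struct caps { cap_mask_t permitted, effective, ambient; };

static bool any_root(struct uids u)
{
    return u.real == 0 || u.effective == 0 || u.saved == 0;
}

/* Return the capability sets that result from changing the UIDs from
   'before' to 'after'. */
struct caps fixup_caps(struct caps c, struct uids before,
                       struct uids after, bool keep_caps)
{
    /* Rule 1: at least one UID was 0 and now none is. */
    if (any_root(before) && !any_root(after)) {
        if (!keep_caps) {
            c.permitted = 0;
            c.effective = 0;
        }
        c.ambient = 0;          /* assumption: cleared regardless */
    }
    /* Rule 2: effective UID goes from 0 to nonzero. */
    if (before.effective == 0 && after.effective != 0)
        c.effective = 0;
    /* Rule 3: effective UID goes from nonzero to 0. */
    if (before.effective != 0 && after.effective == 0)
        c.effective = c.permitted;
    return c;
}
```

In this model, keep_caps preserves the permitted set across a switch
from all-zero to all-nonzero UIDs, but the effective set is still
cleared by rule 2.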
818
819 Programmatically adjusting capability sets
820 A thread can retrieve and change its permitted, effective, and inheri‐
821 table capability sets using the capget(2) and capset(2) system calls.
822 However, the use of cap_get_proc(3) and cap_set_proc(3), both provided
823 in the libcap package, is preferred for this purpose. The following
824 rules govern changes to the thread capability sets:
825
826 1. If the caller does not have the CAP_SETPCAP capability, the new
827 inheritable set must be a subset of the combination of the existing
828 inheritable and permitted sets.
829
830 2. (Since Linux 2.6.25) The new inheritable set must be a subset of the
831 combination of the existing inheritable set and the capability
832 bounding set.
833
834 3. The new permitted set must be a subset of the existing permitted set
835 (i.e., it is not possible to acquire permitted capabilities that the
836 thread does not currently have).
837
838 4. The new effective set must be a subset of the new permitted set.
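
The four rules can be expressed as a check on bit masks. This is only a
sketch of the rules as listed: struct cap_sets, is_subset(), and
capset_allowed() are hypothetical names, and the function does not call
capset(2).

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t cap_mask_t;    /* hypothetical: one bit per capability */

struct cap_sets { cap_mask_t permitted, effective, inheritable; };

static bool is_subset(cap_mask_t a, cap_mask_t b)
{
    return (a & ~b) == 0;
}

/* Return true if changing the thread's sets from cur to want would
   satisfy rules 1 to 4. */
bool capset_allowed(struct cap_sets cur, struct cap_sets want,
                    cap_mask_t bounding, bool has_setpcap)
{
    if (!has_setpcap &&
        !is_subset(want.inheritable, cur.inheritable | cur.permitted))
        return false;                                   /* rule 1 */
    if (!is_subset(want.inheritable, cur.inheritable | bounding))
        return false;                                   /* rule 2 */
    if (!is_subset(want.permitted, cur.permitted))
        return false;                                   /* rule 3 */
    return is_subset(want.effective, want.permitted);   /* rule 4 */
}
```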
839
840 The securebits flags: establishing a capabilities-only environment
841 Starting with kernel 2.6.26, and with a kernel in which file capabili‐
842 ties are enabled, Linux implements a set of per-thread securebits flags
843 that can be used to disable special handling of capabilities for UID 0
844 (root). These flags are as follows:
845
846 SECBIT_KEEP_CAPS
847 Setting this flag allows a thread that has one or more 0 UIDs to
848 retain capabilities in its permitted set when it switches all of
849 its UIDs to nonzero values. If this flag is not set, then such
850 a UID switch causes the thread to lose all permitted capabili‐
851 ties. This flag is always cleared on an execve(2).
852
853 Note that even with the SECBIT_KEEP_CAPS flag set, the effective
854 capabilities of a thread are cleared when it switches its effec‐
855 tive UID to a nonzero value. However, if the thread has set
856 this flag and its effective UID is already nonzero, and the
857 thread subsequently switches all other UIDs to nonzero values,
858 then the effective capabilities will not be cleared.
859
860 The setting of the SECBIT_KEEP_CAPS flag is ignored if the
861 SECBIT_NO_SETUID_FIXUP flag is set. (The latter flag provides a
862 superset of the effect of the former flag.)
863
864 This flag provides the same functionality as the older prctl(2)
865 PR_SET_KEEPCAPS operation.
866
867 SECBIT_NO_SETUID_FIXUP
Setting this flag stops the kernel from adjusting the thread's
permitted, effective, and ambient capability sets when the
thread's effective and filesystem UIDs are switched between zero
and nonzero values. (See the subsection Effect of user ID
changes on capabilities.)
873
874 SECBIT_NOROOT
875 If this bit is set, then the kernel does not grant capabilities
876 when a set-user-ID-root program is executed, or when a process
877 with an effective or real UID of 0 calls execve(2). (See the
878 subsection Capabilities and execution of programs by root.)
879
880 SECBIT_NO_CAP_AMBIENT_RAISE
881 Setting this flag disallows raising ambient capabilities via the
882 prctl(2) PR_CAP_AMBIENT_RAISE operation.
883
884 Each of the above "base" flags has a companion "locked" flag. Setting
885 any of the "locked" flags is irreversible, and has the effect of pre‐
886 venting further changes to the corresponding "base" flag. The locked
887 flags are: SECBIT_KEEP_CAPS_LOCKED, SECBIT_NO_SETUID_FIXUP_LOCKED,
888 SECBIT_NOROOT_LOCKED, and SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED.
889
890 The securebits flags can be modified and retrieved using the prctl(2)
891 PR_SET_SECUREBITS and PR_GET_SECUREBITS operations. The CAP_SETPCAP
892 capability is required to modify the flags. Note that the SECBIT_*
893 constants are available only after including the <linux/securebits.h>
894 header file.
895
896 The securebits flags are inherited by child processes. During an
897 execve(2), all of the flags are preserved, except SECBIT_KEEP_CAPS
898 which is always cleared.
899
An application can use the following call to lock itself, and all of
its descendants, into an environment where the only way of gaining
capabilities is by executing a program with associated file
capabilities:

    prctl(PR_SET_SECUREBITS,
            /* SECBIT_KEEP_CAPS off */
            SECBIT_KEEP_CAPS_LOCKED |
            SECBIT_NO_SETUID_FIXUP |
            SECBIT_NO_SETUID_FIXUP_LOCKED |
            SECBIT_NOROOT |
            SECBIT_NOROOT_LOCKED);
            /* Setting/locking SECBIT_NO_CAP_AMBIENT_RAISE
               is not required */
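
A program may also want to verify that it is running in such an
environment. The sketch below tests a securebits value against the flags
used in the call above; the macro CAPS_ONLY_REQUIRED and both function
names are hypothetical.

```c
#include <stdbool.h>
#include <sys/prctl.h>
#include <linux/securebits.h>

/* Flags that must be set; SECBIT_KEEP_CAPS itself must be clear. */
#define CAPS_ONLY_REQUIRED (SECBIT_KEEP_CAPS_LOCKED | \
                            SECBIT_NO_SETUID_FIXUP | \
                            SECBIT_NO_SETUID_FIXUP_LOCKED | \
                            SECBIT_NOROOT | \
                            SECBIT_NOROOT_LOCKED)

/* Return true if 'bits' (as returned by PR_GET_SECUREBITS) describes
   a locked capabilities-only environment.  A negative value, which
   indicates a prctl(2) failure, yields false. */
bool is_caps_only(int bits)
{
    return bits >= 0 &&
           (bits & CAPS_ONLY_REQUIRED) == CAPS_ONLY_REQUIRED &&
           (bits & SECBIT_KEEP_CAPS) == 0;
}

/* Apply the check to the calling thread. */
bool thread_is_caps_only(void)
{
    return is_caps_only(prctl(PR_GET_SECUREBITS, 0, 0, 0, 0));
}
```

Reading the flags with PR_GET_SECUREBITS does not require CAP_SETPCAP;
only modifying them does.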
914
915 Per-user-namespace "set-user-ID-root" programs
916 A set-user-ID program whose UID matches the UID that created a user
917 namespace will confer capabilities in the process's permitted and
918 effective sets when executed by any process inside that namespace or
919 any descendant user namespace.
920
921 The rules about the transformation of the process's capabilities during
922 the execve(2) are exactly as described in the subsections Transforma‐
923 tion of capabilities during execve() and Capabilities and execution of
924 programs by root, with the difference that, in the latter subsection,
925 "root" is the UID of the creator of the user namespace.
926
927 Namespaced file capabilities
928 Traditional (i.e., version 2) file capabilities associate only a set of
929 capability masks with a binary executable file. When a process exe‐
930 cutes a binary with such capabilities, it gains the associated capabil‐
931 ities (within its user namespace) as per the rules described above in
932 "Transformation of capabilities during execve()".
933
934 Because version 2 file capabilities confer capabilities to the execut‐
935 ing process regardless of which user namespace it resides in, only
936 privileged processes are permitted to associate capabilities with a
937 file. Here, "privileged" means a process that has the CAP_SETFCAP
938 capability in the user namespace where the filesystem was mounted (nor‐
939 mally the initial user namespace). This limitation renders file capa‐
940 bilities useless for certain use cases. For example, in user-names‐
941 paced containers, it can be desirable to be able to create a binary
942 that confers capabilities only to processes executed inside that con‐
943 tainer, but not to processes that are executed outside the container.
944
945 Linux 4.14 added so-called namespaced file capabilities to support such
946 use cases. Namespaced file capabilities are recorded as version 3
947 (i.e., VFS_CAP_REVISION_3) security.capability extended attributes.
948 Such an attribute is automatically created in the circumstances
949 described above under "File capability extended attribute versioning".
950 When a version 3 security.capability extended attribute is created, the
951 kernel records not just the capability masks in the extended attribute,
952 but also the namespace root user ID.
953
As with a binary that has VFS_CAP_REVISION_2 file capabilities, a
binary with VFS_CAP_REVISION_3 file capabilities confers capabilities
to a process during execve(2). However, capabilities are conferred only
if the binary is executed by a process that resides in a user namespace
whose UID 0 maps to the root user ID that is saved in the extended
attribute, or when executed by a process that resides in a descendant
of such a namespace.
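
To illustrate the difference between the two revisions, the following
sketch decodes a raw security.capability value that has already been
read into a buffer (for example, with getxattr(2)). It relies on the
constants in <linux/capability.h> and on the little-endian layout of
the attribute (a magic word, two pairs of permitted and inheritable
words, then the root user ID in revision 3). The struct parsed_fcaps
and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/capability.h>

struct parsed_fcaps {           /* hypothetical result type */
    int      revision;          /* 2 or 3 */
    bool     effective;         /* file effective bit */
    uint64_t permitted;
    uint64_t inheritable;
    uint32_t rootid;            /* meaningful only for revision 3 */
};

/* Read a 32-bit little-endian value. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode a raw attribute value; return false if it is not a
   well-formed revision 2 or revision 3 attribute. */
bool parse_fcaps(const unsigned char *buf, size_t len,
                 struct parsed_fcaps *out)
{
    if (len < 4)
        return false;

    uint32_t magic = le32(buf);
    uint32_t rev = magic & VFS_CAP_REVISION_MASK;

    if (rev == VFS_CAP_REVISION_2 && len == XATTR_CAPS_SZ_2)
        out->revision = 2;
    else if (rev == VFS_CAP_REVISION_3 && len == XATTR_CAPS_SZ_3)
        out->revision = 3;
    else
        return false;

    out->effective = (magic & VFS_CAP_FLAGS_EFFECTIVE) != 0;
    out->permitted = le32(buf + 4) | (uint64_t)le32(buf + 12) << 32;
    out->inheritable = le32(buf + 8) | (uint64_t)le32(buf + 16) << 32;
    out->rootid = out->revision == 3 ? le32(buf + 20) : 0;
    return true;
}
```

The only structural difference in a revision 3 attribute is the
trailing root user ID, which the kernel compares against the user
namespace of the executing process.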
961
962 Interaction with user namespaces
963 For further information on the interaction of capabilities and user
964 namespaces, see user_namespaces(7).
965
967 No standards govern capabilities, but the Linux capability implementa‐
968 tion is based on the withdrawn POSIX.1e draft standard; see
969 ⟨https://archive.org/details/posix_1003.1e-990310⟩.
970
When attempting to strace(1) binaries that have capabilities (or
set-user-ID-root binaries), you may find the -u <username> option
useful. For example:

    $ sudo strace -o trace.log -u ceci ./myprivprog
977
978 From kernel 2.5.27 to kernel 2.6.26, capabilities were an optional ker‐
979 nel component, and could be enabled/disabled via the CONFIG_SECU‐
980 RITY_CAPABILITIES kernel configuration option.
981
982 The /proc/[pid]/task/TID/status file can be used to view the capability
983 sets of a thread. The /proc/[pid]/status file shows the capability
984 sets of a process's main thread. Before Linux 3.8, nonexistent capa‐
985 bilities were shown as being enabled (1) in these sets. Since Linux
986 3.8, all nonexistent capabilities (above CAP_LAST_CAP) are shown as
987 disabled (0).
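
As a rough sketch, a capability set can be extracted from the text of a
status file as shown below. The function name parse_cap_field() is
hypothetical; the field names (CapInh, CapPrm, CapEff, CapBnd, CapAmb)
and their hexadecimal format are those documented in proc(5).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Search status-file text for a field such as "CapEff" and store its
   hexadecimal value in *mask.  Return false if the field is absent
   or malformed. */
bool parse_cap_field(const char *status, const char *field,
                     uint64_t *mask)
{
    size_t n = strlen(field);
    const char *line = status;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            unsigned long long v;

            if (sscanf(line + n + 1, "%llx", &v) != 1)
                return false;
            *mask = v;
            return true;
        }
        line = strchr(line, '\n');      /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return false;
}
```

The text would typically be obtained by reading /proc/self/status, or
the corresponding file under /proc/[pid]/task/ for another thread, into
a NUL-terminated buffer.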
988
989 The libcap package provides a suite of routines for setting and getting
990 capabilities that is more comfortable and less likely to change than
991 the interface provided by capset(2) and capget(2). This package also
992 provides the setcap(8) and getcap(8) programs. It can be found at
993 ⟨https://git.kernel.org/pub/scm/libs/libcap/libcap.git/refs/⟩.
994
995 Before kernel 2.6.24, and from kernel 2.6.24 to kernel 2.6.32 if file
996 capabilities are not enabled, a thread with the CAP_SETPCAP capability
997 can manipulate the capabilities of threads other than itself. However,
998 this is only theoretically possible, since no thread ever has CAP_SETP‐
999 CAP in either of these cases:
1000
* In the pre-2.6.25 implementation the system-wide capability bounding
  set, /proc/sys/kernel/cap-bound, always masks out the CAP_SETPCAP
  capability, and this cannot be changed without modifying the kernel
  source and rebuilding the kernel.
1005
1006 * If file capabilities are disabled (i.e., the kernel CONFIG_SECU‐
1007 RITY_FILE_CAPABILITIES option is disabled), then init starts out with
1008 the CAP_SETPCAP capability removed from its per-process bounding set,
1009 and that bounding set is inherited by all other processes created on
1010 the system.
1011
1013 capsh(1), setpriv(1), prctl(2), setfsuid(2), cap_clear(3),
1014 cap_copy_ext(3), cap_from_text(3), cap_get_file(3), cap_get_proc(3),
1015 cap_init(3), capgetp(3), capsetp(3), libcap(3), proc(5), creden‐
1016 tials(7), pthreads(7), user_namespaces(7), captest(8), filecap(8), get‐
1017 cap(8), netcap(8), pscap(8), setcap(8)
1018
1019 include/linux/capability.h in the Linux kernel source tree
1020
1022 This page is part of release 5.07 of the Linux man-pages project. A
1023 description of the project, information about reporting bugs, and the
1024 latest version of this page, can be found at
1025 https://www.kernel.org/doc/man-pages/.
1026
1027
1028
1029Linux 2019-08-02 CAPABILITIES(7)