1 CAPABILITIES(7) Linux Programmer's Manual CAPABILITIES(7)
2
3
4
5 NAME
6 capabilities - overview of Linux capabilities
7
8 DESCRIPTION
9 For the purpose of performing permission checks, traditional UNIX im‐
10 plementations distinguish two categories of processes: privileged pro‐
11 cesses (whose effective user ID is 0, referred to as superuser or
12 root), and unprivileged processes (whose effective UID is nonzero).
13 Privileged processes bypass all kernel permission checks, while unpriv‐
14 ileged processes are subject to full permission checking based on the
15 process's credentials (usually: effective UID, effective GID, and sup‐
16 plementary group list).
17
18 Starting with kernel 2.2, Linux divides the privileges traditionally
19 associated with superuser into distinct units, known as capabilities,
20 which can be independently enabled and disabled. Capabilities are a
21 per-thread attribute.
22
23 Capabilities list
24 The following list shows the capabilities implemented on Linux, and the
25 operations or behaviors that each capability permits:
26
27 CAP_AUDIT_CONTROL (since Linux 2.6.11)
28 Enable and disable kernel auditing; change auditing filter
29 rules; retrieve auditing status and filtering rules.
30
31 CAP_AUDIT_READ (since Linux 3.16)
32 Allow reading the audit log via a multicast netlink socket.
33
34 CAP_AUDIT_WRITE (since Linux 2.6.11)
35 Write records to kernel auditing log.
36
37 CAP_BLOCK_SUSPEND (since Linux 3.5)
38 Employ features that can block system suspend (epoll(7)
39 EPOLLWAKEUP, /sys/power/wake_lock).
40
41 CAP_BPF (since Linux 5.8)
42 Employ privileged BPF operations; see bpf(2) and bpf-helpers(7).
43
44 This capability was added in Linux 5.8 to separate out BPF func‐
45 tionality from the overloaded CAP_SYS_ADMIN capability.
46
47 CAP_CHECKPOINT_RESTORE (since Linux 5.9)
48 * Update /proc/sys/kernel/ns_last_pid (see pid_namespaces(7));
49 * employ the set_tid feature of clone3(2);
50 * read the contents of the symbolic links in
51 /proc/[pid]/map_files for other processes.
52
53 This capability was added in Linux 5.9 to separate out check‐
54 point/restore functionality from the overloaded CAP_SYS_ADMIN
55 capability.
56
57 CAP_CHOWN
58 Make arbitrary changes to file UIDs and GIDs (see chown(2)).
59
60 CAP_DAC_OVERRIDE
61 Bypass file read, write, and execute permission checks. (DAC is
62 an abbreviation of "discretionary access control".)
63
64 CAP_DAC_READ_SEARCH
65 * Bypass file read permission checks and directory read and exe‐
66 cute permission checks;
67 * invoke open_by_handle_at(2);
68 * use the linkat(2) AT_EMPTY_PATH flag to create a link to a
69 file referred to by a file descriptor.
70
71 CAP_FOWNER
72 * Bypass permission checks on operations that normally require
73 the filesystem UID of the process to match the UID of the file
74 (e.g., chmod(2), utime(2)), excluding those operations covered
75 by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH;
76 * set inode flags (see ioctl_iflags(2)) on arbitrary files;
77 * set Access Control Lists (ACLs) on arbitrary files;
78 * ignore directory sticky bit on file deletion;
79 * modify user extended attributes on sticky directory owned by
80 any user;
81 * specify O_NOATIME for arbitrary files in open(2) and fcntl(2).
82
83 CAP_FSETID
84 * Don't clear set-user-ID and set-group-ID mode bits when a file
85 is modified;
86 * set the set-group-ID bit for a file whose GID does not match
87 the filesystem or any of the supplementary GIDs of the calling
88 process.
89
90 CAP_IPC_LOCK
91 Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2)).
92
93 CAP_IPC_OWNER
94 Bypass permission checks for operations on System V IPC objects.
95
96 CAP_KILL
97 Bypass permission checks for sending signals (see kill(2)).
98 This includes use of the ioctl(2) KDSIGACCEPT operation.
99
100 CAP_LEASE (since Linux 2.4)
101 Establish leases on arbitrary files (see fcntl(2)).
102
103 CAP_LINUX_IMMUTABLE
104 Set the FS_APPEND_FL and FS_IMMUTABLE_FL inode flags (see
105 ioctl_iflags(2)).
106
107 CAP_MAC_ADMIN (since Linux 2.6.25)
108 Allow MAC configuration or state changes. Implemented for the
109 Smack Linux Security Module (LSM).
110
111 CAP_MAC_OVERRIDE (since Linux 2.6.25)
112 Override Mandatory Access Control (MAC). Implemented for the
113 Smack LSM.
114
115 CAP_MKNOD (since Linux 2.4)
116 Create special files using mknod(2).
117
118 CAP_NET_ADMIN
119 Perform various network-related operations:
120 * interface configuration;
121 * administration of IP firewall, masquerading, and accounting;
122 * modify routing tables;
123 * bind to any address for transparent proxying;
124 * set type-of-service (TOS);
125 * clear driver statistics;
126 * set promiscuous mode;
127 * enable multicasting;
128 * use setsockopt(2) to set the following socket options: SO_DE‐
129 BUG, SO_MARK, SO_PRIORITY (for a priority outside the range 0
130 to 6), SO_RCVBUFFORCE, and SO_SNDBUFFORCE.
131
132 CAP_NET_BIND_SERVICE
133 Bind a socket to Internet domain privileged ports (port numbers
134 less than 1024).
135
136 CAP_NET_BROADCAST
137 (Unused) Make socket broadcasts, and listen to multicasts.
138
139 CAP_NET_RAW
140 * Use RAW and PACKET sockets;
141 * bind to any address for transparent proxying.
142
143 CAP_PERFMON (since Linux 5.8)
144 Employ various performance-monitoring mechanisms, including:
145
146 * call perf_event_open(2);
147 * employ various BPF operations that have performance implica‐
148 tions.
149
150 This capability was added in Linux 5.8 to separate out perfor‐
151 mance monitoring functionality from the overloaded CAP_SYS_ADMIN
152 capability. See also the kernel source file Documentation/ad‐
153 min-guide/perf-security.rst.
154
155 CAP_SETGID
156 * Make arbitrary manipulations of process GIDs and supplementary
157 GID list;
158 * forge GID when passing socket credentials via UNIX domain
159 sockets;
160 * write a group ID mapping in a user namespace (see user_name‐
161 spaces(7)).
162
163 CAP_SETFCAP (since Linux 2.6.24)
164 Set arbitrary capabilities on a file.
165
166 CAP_SETPCAP
167 If file capabilities are supported (i.e., since Linux 2.6.24):
168 add any capability from the calling thread's bounding set to its
169 inheritable set; drop capabilities from the bounding set (via
170 prctl(2) PR_CAPBSET_DROP); make changes to the securebits flags.
171
172 If file capabilities are not supported (i.e., kernels before
173 Linux 2.6.24): grant or remove any capability in the caller's
174 permitted capability set to or from any other process. (This
175 property of CAP_SETPCAP is not available when the kernel is con‐
176 figured to support file capabilities, since CAP_SETPCAP has en‐
177 tirely different semantics for such kernels.)
178
179 CAP_SETUID
180 * Make arbitrary manipulations of process UIDs (setuid(2), se‐
181 treuid(2), setresuid(2), setfsuid(2));
182 * forge UID when passing socket credentials via UNIX domain
183 sockets;
184 * write a user ID mapping in a user namespace (see user_name‐
185 spaces(7)).
186
187 CAP_SYS_ADMIN
188 Note: this capability is overloaded; see Notes to kernel devel‐
189 opers, below.
190
191 * Perform a range of system administration operations including:
192 quotactl(2), mount(2), umount(2), pivot_root(2), swapon(2),
193 swapoff(2), sethostname(2), and setdomainname(2);
194 * perform privileged syslog(2) operations (since Linux 2.6.37,
195 CAP_SYSLOG should be used to permit such operations);
196 * perform VM86_REQUEST_IRQ vm86(2) command;
197 * access the same checkpoint/restore functionality that is gov‐
198 erned by CAP_CHECKPOINT_RESTORE (but the latter, weaker capa‐
199 bility is preferred for accessing that functionality).
200 * perform the same BPF operations as are governed by CAP_BPF
201 (but the latter, weaker capability is preferred for accessing
202 that functionality).
203 * employ the same performance monitoring mechanisms as are gov‐
204 erned by CAP_PERFMON (but the latter, weaker capability is
205 preferred for accessing that functionality).
206 * perform IPC_SET and IPC_RMID operations on arbitrary System V
207 IPC objects;
208 * override RLIMIT_NPROC resource limit;
209 * perform operations on trusted and security extended attributes
210 (see xattr(7));
211 * use lookup_dcookie(2);
212 * use ioprio_set(2) to assign IOPRIO_CLASS_RT and (before Linux
213 2.6.25) IOPRIO_CLASS_IDLE I/O scheduling classes;
214 * forge PID when passing socket credentials via UNIX domain
215 sockets;
216 * exceed /proc/sys/fs/file-max, the system-wide limit on the
217 number of open files, in system calls that open files (e.g.,
218 accept(2), execve(2), open(2), pipe(2));
219 * employ CLONE_* flags that create new namespaces with clone(2)
220 and unshare(2) (but, since Linux 3.8, creating user namespaces
221 does not require any capability);
222 * access privileged perf event information;
223 * call setns(2) (requires CAP_SYS_ADMIN in the target name‐
224 space);
225 * call fanotify_init(2);
226 * perform privileged KEYCTL_CHOWN and KEYCTL_SETPERM keyctl(2)
227 operations;
228 * perform madvise(2) MADV_HWPOISON operation;
229 * employ the TIOCSTI ioctl(2) to insert characters into the in‐
230 put queue of a terminal other than the caller's controlling
231 terminal;
232 * employ the obsolete nfsservctl(2) system call;
233 * employ the obsolete bdflush(2) system call;
234 * perform various privileged block-device ioctl(2) operations;
235 * perform various privileged filesystem ioctl(2) operations;
236 * perform privileged ioctl(2) operations on the /dev/random de‐
237 vice (see random(4));
238 * install a seccomp(2) filter without first having to set the
239 no_new_privs thread attribute;
240 * modify allow/deny rules for device control groups;
241 * employ the ptrace(2) PTRACE_SECCOMP_GET_FILTER operation to
242 dump tracee's seccomp filters;
243 * employ the ptrace(2) PTRACE_SETOPTIONS operation to suspend
244 the tracee's seccomp protections (i.e., the PTRACE_O_SUS‐
245 PEND_SECCOMP flag);
246 * perform administrative operations on many device drivers;
247 * modify autogroup nice values by writing to /proc/[pid]/auto‐
248 group (see sched(7)).
249
250 CAP_SYS_BOOT
251 Use reboot(2) and kexec_load(2).
252
253 CAP_SYS_CHROOT
254 * Use chroot(2);
255 * change mount namespaces using setns(2).
256
257 CAP_SYS_MODULE
258 * Load and unload kernel modules (see init_module(2) and
259 delete_module(2));
260 * in kernels before 2.6.25: drop capabilities from the system-
261 wide capability bounding set.
262
263 CAP_SYS_NICE
264 * Lower the process nice value (nice(2), setpriority(2)) and
265 change the nice value for arbitrary processes;
266 * set real-time scheduling policies for calling process, and set
267 scheduling policies and priorities for arbitrary processes
268 (sched_setscheduler(2), sched_setparam(2), sched_setattr(2));
269 * set CPU affinity for arbitrary processes (sched_setaffin‐
270 ity(2));
271 * set I/O scheduling class and priority for arbitrary processes
272 (ioprio_set(2));
273 * apply migrate_pages(2) to arbitrary processes and allow pro‐
274 cesses to be migrated to arbitrary nodes;
275 * apply move_pages(2) to arbitrary processes;
276 * use the MPOL_MF_MOVE_ALL flag with mbind(2) and move_pages(2).
277
278 CAP_SYS_PACCT
279 Use acct(2).
280
281 CAP_SYS_PTRACE
282 * Trace arbitrary processes using ptrace(2);
283 * apply get_robust_list(2) to arbitrary processes;
284 * transfer data to or from the memory of arbitrary processes us‐
285 ing process_vm_readv(2) and process_vm_writev(2);
286 * inspect processes using kcmp(2).
287
288 CAP_SYS_RAWIO
289 * Perform I/O port operations (iopl(2) and ioperm(2));
290 * access /proc/kcore;
291 * employ the FIBMAP ioctl(2) operation;
292 * open devices for accessing x86 model-specific registers (MSRs,
293 see msr(4));
294 * update /proc/sys/vm/mmap_min_addr;
295 * create memory mappings at addresses below the value specified
296 by /proc/sys/vm/mmap_min_addr;
297 * map files in /proc/bus/pci;
298 * open /dev/mem and /dev/kmem;
299 * perform various SCSI device commands;
300 * perform certain operations on hpsa(4) and cciss(4) devices;
301 * perform a range of device-specific operations on other de‐
302 vices.
303
304 CAP_SYS_RESOURCE
305 * Use reserved space on ext2 filesystems;
306 * make ioctl(2) calls controlling ext3 journaling;
307 * override disk quota limits;
308 * increase resource limits (see setrlimit(2));
309 * override RLIMIT_NPROC resource limit;
310 * override maximum number of consoles on console allocation;
311 * override maximum number of keymaps;
312 * allow more than 64 Hz interrupts from the real-time clock;
313 * raise msg_qbytes limit for a System V message queue above the
314 limit in /proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2));
315 * allow the RLIMIT_NOFILE resource limit on the number of "in-
316 flight" file descriptors to be bypassed when passing file de‐
317 scriptors to another process via a UNIX domain socket (see
318 unix(7));
321 * use F_SETPIPE_SZ to increase the capacity of a pipe above the
322 limit specified by /proc/sys/fs/pipe-max-size;
323 * override /proc/sys/fs/mqueue/queues_max,
324 /proc/sys/fs/mqueue/msg_max, and /proc/sys/fs/mqueue/msg‐
325 size_max limits when creating POSIX message queues (see
326 mq_overview(7));
327 * employ the prctl(2) PR_SET_MM operation;
328 * set /proc/[pid]/oom_score_adj to a value lower than the value
329 last set by a process with CAP_SYS_RESOURCE.
330
331 CAP_SYS_TIME
332 Set system clock (settimeofday(2), stime(2), adjtimex(2)); set
333 real-time (hardware) clock.
334
335 CAP_SYS_TTY_CONFIG
336 Use vhangup(2); employ various privileged ioctl(2) operations on
337 virtual terminals.
338
339 CAP_SYSLOG (since Linux 2.6.37)
340 * Perform privileged syslog(2) operations. See syslog(2) for
341 information on which operations require privilege.
342 * View kernel addresses exposed via /proc and other interfaces
343 when /proc/sys/kernel/kptr_restrict has the value 1. (See the
344 discussion of kptr_restrict in proc(5).)
345
346 CAP_WAKE_ALARM (since Linux 3.0)
347 Trigger something that will wake up the system (set CLOCK_REAL‐
348 TIME_ALARM and CLOCK_BOOTTIME_ALARM timers).
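
Capability sets are commonly displayed as hexadecimal bit masks, so it helps to map bits back to the names in the list above. The following is a minimal sketch in Python; the bit numbers are an assumption taken from the kernel header <linux/capability.h> (they are not stated in this page), and the helper names decode_caps and encode_caps are hypothetical.

```python
# Capability bit numbers, in the order assumed from <linux/capability.h>.
# The list index is the bit position of the capability in a capability set.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE",
    "CAP_NET_BROADCAST", "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK",
    "CAP_IPC_OWNER", "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT",
    "CAP_SYS_PTRACE", "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT",
    "CAP_SYS_NICE", "CAP_SYS_RESOURCE", "CAP_SYS_TIME",
    "CAP_SYS_TTY_CONFIG", "CAP_MKNOD", "CAP_LEASE", "CAP_AUDIT_WRITE",
    "CAP_AUDIT_CONTROL", "CAP_SETFCAP", "CAP_MAC_OVERRIDE",
    "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM", "CAP_BLOCK_SUSPEND",
    "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF", "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    names = []
    for bit in range(mask.bit_length()):
        if mask & (1 << bit):
            if bit < len(CAP_NAMES):
                names.append(CAP_NAMES[bit])
            else:
                # A newer kernel may define capabilities this table lacks.
                names.append("unknown(%d)" % bit)
    return names


def encode_caps(names):
    """Return the bit mask that has exactly the named capabilities set."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_NAMES.index(name)
    return mask
```

For example, decode_caps(0x3000) yields CAP_NET_ADMIN and CAP_NET_RAW, since those occupy bits 12 and 13 in the assumed numbering.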
349
350 Past and current implementation
351 A full implementation of capabilities requires that:
352
353 1. For all privileged operations, the kernel must check whether the
354 thread has the required capability in its effective set.
355
356 2. The kernel must provide system calls allowing a thread's capability
357 sets to be changed and retrieved.
358
359 3. The filesystem must support attaching capabilities to an executable
360 file, so that a process gains those capabilities when the file is
361 executed.
362
363 Before kernel 2.6.24, only the first two of these requirements were
364 met; since kernel 2.6.24, all three requirements are met.
365
366 Notes to kernel developers
367 When adding a new kernel feature that should be governed by a capabil‐
368 ity, consider the following points.
369
370 * The goal of capabilities is to divide the power of superuser into
371 pieces, such that if a program that has one or more capabilities is
372 compromised, its power to do damage to the system would be less than
373 the same program running with root privilege.
374
375 * You have the choice of either creating a new capability for your new
376 feature, or associating the feature with one of the existing capa‐
377 bilities. In order to keep the set of capabilities to a manageable
378 size, the latter option is preferable, unless there are compelling
379 reasons to take the former option. (There is also a technical
380 limit: the size of capability sets is currently limited to 64 bits.)
381
382 * To determine which existing capability might best be associated with
383 your new feature, review the list of capabilities above in order to
384 find a "silo" into which your new feature best fits. One approach
385 to take is to determine if there are other features requiring capa‐
386 bilities that will always be used along with the new feature. If
387 the new feature is useless without these other features, you should
388 use the same capability as the other features.
389
390 * Don't choose CAP_SYS_ADMIN if you can possibly avoid it! A vast
391 proportion of existing capability checks are associated with this
392 capability (see the partial list above). It can plausibly be called
393 "the new root", since on the one hand, it confers a wide range of
394 powers, and on the other hand, its broad scope means that this is
395 the capability that is required by many privileged programs. Don't
396 make the problem worse. The only new features that should be asso‐
397 ciated with CAP_SYS_ADMIN are ones that closely match existing uses
398 in that silo.
399
400 * If you have determined that it really is necessary to create a new
401 capability for your feature, don't make or name it as a "single-use"
402 capability. Thus, for example, the addition of the highly specific
403 CAP_SYS_PACCT was probably a mistake. Instead, try to identify and
404 name your new capability as a broader silo into which other related
405 future use cases might fit.
406
407 Thread capability sets
408 Each thread has the following capability sets containing zero or more
409 of the above capabilities:
410
411 Permitted
412 This is a limiting superset for the effective capabilities that
413 the thread may assume. It is also a limiting superset for the
414 capabilities that may be added to the inheritable set by a
415 thread that does not have the CAP_SETPCAP capability in its ef‐
416 fective set.
417
418 If a thread drops a capability from its permitted set, it can
419 never reacquire that capability (unless it execve(2)s either a
420 set-user-ID-root program, or a program whose associated file ca‐
421 pabilities grant that capability).
422
423 Inheritable
424 This is a set of capabilities preserved across an execve(2).
425 Inheritable capabilities remain inheritable when executing any
426 program, and inheritable capabilities are added to the permitted
427 set when executing a program that has the corresponding bits set
428 in the file inheritable set.
429
430 Because inheritable capabilities are not generally preserved
431 across execve(2) when running as a non-root user, applications
432 that wish to run helper programs with elevated capabilities
433 should consider using ambient capabilities, described below.
434
435 Effective
436 This is the set of capabilities used by the kernel to perform
437 permission checks for the thread.
438
439 Bounding (per-thread since Linux 2.6.25)
440 The capability bounding set is a mechanism that can be used to
441 limit the capabilities that are gained during execve(2).
442
443 Since Linux 2.6.25, this is a per-thread capability set. In
444 older kernels, the capability bounding set was a system wide at‐
445 tribute shared by all threads on the system.
446
447 For more details on the capability bounding set, see below.
448
449 Ambient (since Linux 4.3)
450 This is a set of capabilities that are preserved across an ex‐
451 ecve(2) of a program that is not privileged. The ambient capa‐
452 bility set obeys the invariant that no capability can ever be
453 ambient if it is not both permitted and inheritable.
454
455 The ambient capability set can be directly modified using
456 prctl(2). Ambient capabilities are automatically lowered if ei‐
457 ther of the corresponding permitted or inheritable capabilities
458 is lowered.
459
460 Executing a program that changes UID or GID due to the set-user-
461 ID or set-group-ID bits or executing a program that has any file
462 capabilities set will clear the ambient set. Ambient capabili‐
463 ties are added to the permitted set and assigned to the effec‐
464 tive set when execve(2) is called. If ambient capabilities
465 cause a process's permitted and effective capabilities to in‐
466 crease during an execve(2), this does not trigger the secure-ex‐
467 ecution mode described in ld.so(8).
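
The ambient-set invariant described above can be illustrated with a small user-space model. This is only a sketch of the bookkeeping, not kernel code: the class CapState and its methods are hypothetical, sets are plain integer bit masks, and the refusal to raise an ambient capability is modelled as a PermissionError.

```python
class CapState:
    """Toy model of a thread's permitted, inheritable, and ambient sets."""

    def __init__(self, permitted=0, inheritable=0):
        self.permitted = permitted
        self.inheritable = inheritable
        self.ambient = 0

    def raise_ambient(self, bit):
        """Add a capability to the ambient set if the invariant allows it."""
        mask = 1 << bit
        # No capability can be ambient unless it is both permitted and
        # inheritable.
        if not ((self.permitted & mask) and (self.inheritable & mask)):
            raise PermissionError("capability is not permitted and inheritable")
        self.ambient |= mask

    def _lower_ambient(self):
        # Ambient capabilities are automatically lowered when either the
        # corresponding permitted or inheritable capability is lowered.
        self.ambient &= self.permitted & self.inheritable

    def drop_permitted(self, bit):
        self.permitted &= ~(1 << bit)
        self._lower_ambient()

    def drop_inheritable(self, bit):
        self.inheritable &= ~(1 << bit)
        self._lower_ambient()
```

In this model, dropping a capability from either the permitted or the inheritable set removes it from the ambient set as a side effect, so the invariant holds after every operation.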
468
469 A child created via fork(2) inherits copies of its parent's capability
470 sets. See below for a discussion of the treatment of capabilities dur‐
471 ing execve(2).
472
473 Using capset(2), a thread may manipulate its own capability sets (see
474 below).
475
476 Since Linux 3.2, the file /proc/sys/kernel/cap_last_cap exposes the nu‐
477 merical value of the highest capability supported by the running ker‐
478 nel; this can be used to determine the highest bit that may be set in a
479 capability set.
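
As a rough illustration, the capability sets of a thread and the highest supported capability can be inspected from user space without any special library. The sketch below assumes the CapInh, CapPrm, CapEff, CapBnd, and CapAmb fields of /proc/[pid]/status, which are documented in proc(5) rather than in this page; the function names are hypothetical.

```python
CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_cap_sets(status_text):
    """Extract the capability masks from the text of /proc/[pid]/status."""
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in CAP_FIELDS:
            # Each field is a hexadecimal bit mask.
            sets[key] = int(value.strip(), 16)
    return sets


def full_mask(cap_last_cap):
    """Mask with every bit from 0 up to and including cap_last_cap set."""
    return (1 << (cap_last_cap + 1)) - 1


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            sets = parse_cap_sets(f.read())
        with open("/proc/sys/kernel/cap_last_cap") as f:
            last = int(f.read().strip())
        for name, mask in sets.items():
            print("%s = %#x" % (name, mask))
        print("all supported capabilities = %#x" % full_mask(last))
    except (OSError, ValueError):
        print("capability information is not available on this system")
```

The value returned by full_mask() is the largest mask that makes sense on the running kernel; bits above cap_last_cap do not correspond to any capability.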
480
481 File capabilities
482 Since kernel 2.6.24, the kernel supports associating capability sets
483 with an executable file using setcap(8). The file capability sets are
484 stored in an extended attribute (see setxattr(2) and xattr(7)) named
485 security.capability. Writing to this extended attribute requires the
486 CAP_SETFCAP capability. The file capability sets, in conjunction with
487 the capability sets of the thread, determine the capabilities of a
488 thread after an execve(2).
489
490 The three file capability sets are:
491
492 Permitted (formerly known as forced):
493 These capabilities are automatically permitted to the thread,
494 regardless of the thread's inheritable capabilities.
495
496 Inheritable (formerly known as allowed):
497 This set is ANDed with the thread's inheritable set to determine
498 which inheritable capabilities are enabled in the permitted set
499 of the thread after the execve(2).
500
501 Effective:
502 This is not a set, but rather just a single bit. If this bit is
503 set, then during an execve(2) all of the new permitted capabili‐
504 ties for the thread are also raised in the effective set. If
505 this bit is not set, then after an execve(2), none of the new
506 permitted capabilities is in the new effective set.
507
508 Enabling the file effective capability bit implies that any file
509 permitted or inheritable capability that causes a thread to ac‐
510 quire the corresponding permitted capability during an execve(2)
511 (see the transformation rules described below) will also acquire
512 that capability in its effective set. Therefore, when assigning
513 capabilities to a file (setcap(8), cap_set_file(3),
514 cap_set_fd(3)), if we specify the effective flag as being en‐
515 abled for any capability, then the effective flag must also be
516 specified as enabled for all other capabilities for which the
517 corresponding permitted or inheritable flag is enabled.
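
Because the file effective flag is stored as a single bit, a per-capability description of a file's flags is only representable if the effective flags are "all or nothing". A minimal sketch of that consistency check follows; the function name is hypothetical, and each argument is a bit mask of the capabilities for which the given flag is requested.

```python
def effective_flags_consistent(permitted, inheritable, effective):
    """Check whether per-capability effective flags fit the single file bit.

    Either no capability has the effective flag, or every capability that
    has the permitted or inheritable flag also has the effective flag.
    """
    if effective == 0:
        return True
    return effective == (permitted | inheritable)
```

For instance, requesting the effective flag for only one of two permitted capabilities is inconsistent and returns False.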
518
519 File capability extended attribute versioning
520 To allow extensibility, the kernel supports a scheme to encode a ver‐
521 sion number inside the security.capability extended attribute that is
522 used to implement file capabilities. These version numbers are inter‐
523 nal to the implementation, and not directly visible to user-space ap‐
524 plications. To date, the following versions are supported:
525
526 VFS_CAP_REVISION_1
527 This was the original file capability implementation, which sup‐
528 ported 32-bit masks for file capabilities.
529
530 VFS_CAP_REVISION_2 (since Linux 2.6.25)
531 This version allows for file capability masks that are 64 bits
532 in size, and was necessary as the number of supported capabili‐
533 ties grew beyond 32. The kernel transparently continues to sup‐
534 port the execution of files that have 32-bit version 1 capabil‐
535 ity masks, but when adding capabilities to files that did not
536 previously have capabilities, or modifying the capabilities of
537 existing files, it automatically uses the version 2 scheme (or
538 possibly the version 3 scheme, as described below).
539
540 VFS_CAP_REVISION_3 (since Linux 4.14)
541 Version 3 file capabilities are provided to support namespaced
542 file capabilities (described below).
543
544 As with version 2 file capabilities, version 3 capability masks
545 are 64 bits in size. But in addition, the root user ID of the
546 namespace is encoded in the security.capability extended attribute.
547 (A namespace's root user ID is the value that user ID 0 inside
548 that namespace maps to in the initial user namespace.)
549
550 Version 3 file capabilities are designed to coexist with version
551 2 capabilities; that is, on a modern Linux system, there may be
552 some files with version 2 capabilities while others have version
553 3 capabilities.
554
555 Before Linux 4.14, the only kind of file capability extended attribute
556 that could be attached to a file was a VFS_CAP_REVISION_2 attribute.
557 Since Linux 4.14, the version of the security.capability extended at‐
558 tribute that is attached to a file depends on the circumstances in
559 which the attribute was created.
560
561 Starting with Linux 4.14, a security.capability extended attribute is
562 automatically created as (or converted to) a version 3 (VFS_CAP_REVI‐
563 SION_3) attribute if both of the following are true:
564
565 (1) The thread writing the attribute resides in a noninitial user name‐
566 space. (More precisely: the thread resides in a user namespace
567 other than the one from which the underlying filesystem was
568 mounted.)
569
570 (2) The thread has the CAP_SETFCAP capability over the file inode,
571 meaning that (a) the thread has the CAP_SETFCAP capability in its
572 own user namespace; and (b) the UID and GID of the file inode have
573 mappings in the writer's user namespace.
574
575 When a VFS_CAP_REVISION_3 security.capability extended attribute is
576 created, the root user ID of the creating thread's user namespace is
577 saved in the extended attribute.
578
579 By contrast, creating or modifying a security.capability extended at‐
580 tribute from a privileged (CAP_SETFCAP) thread that resides in the
581 namespace where the underlying filesystem was mounted (this normally
582 means the initial user namespace) automatically results in the creation
583 of a version 2 (VFS_CAP_REVISION_2) attribute.
584
585 Note that the creation of a version 3 security.capability extended at‐
586 tribute is automatic. That is to say, when a user-space application
587 writes (setxattr(2)) a security.capability attribute in the version 2
588 format, the kernel will automatically create a version 3 attribute if
589 the attribute is created in the circumstances described above. Corre‐
590 spondingly, when a version 3 security.capability attribute is retrieved
591 (getxattr(2)) by a process that resides inside a user namespace that
592 was created by the root user ID (or a descendant of that user name‐
593 space), the returned attribute is (automatically) simplified to appear
594 as a version 2 attribute (i.e., the returned value is the size of a
595 version 2 attribute and does not include the root user ID). These au‐
596 tomatic translations mean that no changes are required to user-space
597 tools (e.g., setcap(8) and getcap(8)) in order for those tools to be
598 used to create and retrieve version 3 security.capability attributes.
599
600 Note that a file can have either a version 2 or a version 3 secu‐
601 rity.capability extended attribute associated with it, but not both:
602 creation or modification of the security.capability extended attribute
603 will automatically modify the version according to the circumstances in
604 which the extended attribute is created or modified.
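
To make the three attribute versions more concrete, the following sketch decodes a raw security.capability value. The binary layout used here (a little-endian 32-bit magic word whose top byte is the revision and whose lowest bit is the effective flag, followed by pairs of 32-bit permitted and inheritable words, and a trailing 32-bit root user ID for version 3) is an assumption based on struct vfs_cap_data and struct vfs_ns_cap_data in <linux/capability.h>; it is not described in this page. The function name is hypothetical.

```python
import struct

VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
REVISIONS = {0x01000000: 1, 0x02000000: 2, 0x03000000: 3}


def parse_security_capability(raw):
    """Decode the bytes of a security.capability extended attribute."""
    if len(raw) < 4:
        raise ValueError("attribute too short")
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    version = REVISIONS.get(magic_etc & VFS_CAP_REVISION_MASK)
    if version is None:
        raise ValueError("unknown revision")
    # Version 1 carries 32-bit masks; versions 2 and 3 carry 64-bit masks
    # stored as two 32-bit words per set.
    words = 1 if version == 1 else 2
    expected = 4 + 8 * words + (4 if version == 3 else 0)
    if len(raw) != expected:
        raise ValueError("unexpected attribute size")
    permitted = 0
    inheritable = 0
    for i in range(words):
        p, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= p << (32 * i)
        inheritable |= inh << (32 * i)
    rootid = None
    if version == 3:
        # Root user ID of the namespace, as seen from the initial namespace.
        (rootid,) = struct.unpack_from("<I", raw, 4 + 8 * words)
    return {
        "version": version,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
        "rootid": rootid,
    }
```

On Linux, the raw bytes could be obtained with os.getxattr(path, "security.capability"). As noted above, a reader inside the owning user namespace sees a version 3 attribute presented in version 2 form, so the rootid field would then be None.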
605
606 Transformation of capabilities during execve()
607 During an execve(2), the kernel calculates the new capabilities of the
608 process using the following algorithm:
609
610 P'(ambient) = (file is privileged) ? 0 : P(ambient)
611
612 P'(permitted) = (P(inheritable) & F(inheritable)) |
613 (F(permitted) & P(bounding)) | P'(ambient)
614
615 P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
616
617 P'(inheritable) = P(inheritable) [i.e., unchanged]
618
619 P'(bounding) = P(bounding) [i.e., unchanged]
620
621 where:
622
623 P() denotes the value of a thread capability set before the ex‐
624 ecve(2)
625
626 P'() denotes the value of a thread capability set after the ex‐
627 ecve(2)
628
629 F() denotes a file capability set
630
631 Note the following details relating to the above capability transforma‐
632 tion rules:
633
634 * The ambient capability set is present only since Linux 4.3. When
635 determining the transformation of the ambient set during execve(2),
636 a privileged file is one that has capabilities or has the set-user-
637 ID or set-group-ID bit set.
638
639 * Prior to Linux 2.6.25, the bounding set was a system-wide attribute
640 shared by all threads. That system-wide value was employed to cal‐
641 culate the new permitted set during execve(2) in the same manner as
642 shown above for P(bounding).
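
The transformation rules above translate directly into bitwise arithmetic. The sketch below is a user-space model written for illustration only; the function name and the dictionary keys are hypothetical, all sets are integer bit masks, and file_is_privileged stands for "the file has capabilities or has the set-user-ID or set-group-ID bit set".

```python
def execve_transform(p_inheritable, p_bounding, p_ambient,
                     f_permitted=0, f_inheritable=0, f_effective=False,
                     file_is_privileged=False):
    """Model the capability sets of a thread after an execve(2)."""
    # P'(ambient) = (file is privileged) ? 0 : P(ambient)
    new_ambient = 0 if file_is_privileged else p_ambient
    # P'(permitted) = (P(inheritable) & F(inheritable)) |
    #                 (F(permitted) & P(bounding)) | P'(ambient)
    new_permitted = ((p_inheritable & f_inheritable)
                     | (f_permitted & p_bounding)
                     | new_ambient)
    # P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
    new_effective = new_permitted if f_effective else new_ambient
    return {
        "ambient": new_ambient,
        "permitted": new_permitted,
        "effective": new_effective,
        "inheritable": p_inheritable,  # unchanged
        "bounding": p_bounding,        # unchanged
    }
```

For example, a file whose permitted set names a capability that is absent from the thread's bounding set contributes nothing to P'(permitted), whereas an ambient capability survives the execve(2) of an unprivileged file and appears in both the new permitted and the new effective set.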
643
644 Note: during the capability transitions described above, file capabili‐
645 ties may be ignored (treated as empty) for the same reasons that the
646 set-user-ID and set-group-ID bits are ignored; see execve(2). File ca‐
647 pabilities are similarly ignored if the kernel was booted with the
648 no_file_caps option.
649
650 Note: according to the rules above, if a process with nonzero user IDs
651 performs an execve(2) then any capabilities that are present in its
652 permitted and effective sets will be cleared. For the treatment of ca‐
653 pabilities when a process with a user ID of zero performs an execve(2),
654 see below under Capabilities and execution of programs by root.
655
656 Safety checking for capability-dumb binaries
657 A capability-dumb binary is an application that has been marked to have
658 file capabilities, but has not been converted to use the libcap(3) API
659 to manipulate its capabilities. (In other words, this is a traditional
660 set-user-ID-root program that has been switched to use file capabili‐
661 ties, but whose code has not been modified to understand capabilities.)
662 For such applications, the effective capability bit is set on the file,
663 so that the file permitted capabilities are automatically enabled in
664 the process effective set when executing the file. The kernel recog‐
665 nizes a file which has the effective capability bit set as capability-
666 dumb for the purpose of the check described here.
667
668 When executing a capability-dumb binary, the kernel checks if the
669 process obtained all permitted capabilities that were specified in the
670 file permitted set, after the capability transformations described
671 above have been performed. (The typical reason why this might not oc‐
672 cur is that the capability bounding set masked out some of the capabil‐
673 ities in the file permitted set.) If the process did not obtain the
674 full set of file permitted capabilities, then execve(2) fails with the
675 error EPERM. This prevents possible security risks that could arise
676 when a capability-dumb application is executed with less privilege than
677 it needs. Note that, by definition, the application could not itself
678 recognize this problem, since it does not employ the libcap(3) API.
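
The safety check can be expressed as a comparison between the file permitted set and the permitted set that results from the transformation rules. The following sketch models only that comparison; the function name is hypothetical, and the return convention (0 on success, an errno value on failure) is chosen purely for illustration.

```python
import errno


def capability_dumb_check(f_permitted, f_effective, new_permitted):
    """Return 0 if execve(2) may proceed, or errno.EPERM if it must fail.

    f_permitted:   the file permitted set (bit mask)
    f_effective:   the file effective bit; a file with this bit set is
                   treated as capability-dumb for the purpose of the check
    new_permitted: P'(permitted), computed by the transformation rules
    """
    if f_effective and (f_permitted & ~new_permitted):
        # Some file permitted capability was not obtained, for example
        # because the bounding set masked it out.
        return errno.EPERM
    return 0
```

A file without the effective bit is not subject to the check in this model, so a reduced permitted set is accepted for it.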
679
680 Capabilities and execution of programs by root
681 In order to mirror traditional UNIX semantics, the kernel performs spe‐
682 cial treatment of file capabilities when a process with UID 0 (root)
683 executes a program and when a set-user-ID-root program is executed.
684
685 After having performed any changes to the process effective ID that
686 were triggered by the set-user-ID mode bit of the binary—e.g., switch‐
687 ing the effective user ID to 0 (root) because a set-user-ID-root pro‐
688 gram was executed—the kernel calculates the file capability sets as
689 follows:
690
691 1. If the real or effective user ID of the process is 0 (root), then
692 the file inheritable and permitted sets are ignored; instead they
693 are notionally considered to be all ones (i.e., all capabilities en‐
694 abled). (There is one exception to this behavior, described below
695 in Set-user-ID-root programs that have file capabilities.)
696
697 2. If the effective user ID of the process is 0 (root) or the file ef‐
698 fective bit is in fact enabled, then the file effective bit is no‐
699 tionally defined to be one (enabled).
700
701 These notional values for the file's capability sets are then used as
702 described above to calculate the transformation of the process's capa‐
703 bilities during execve(2).
704
705 Thus, when a process with nonzero UIDs execve(2)s a set-user-ID-root
706 program that does not have capabilities attached, or when a process
707 whose real and effective UIDs are zero execve(2)s a program, the calcu‐
708 lation of the process's new permitted capabilities simplifies to:
709
710 P'(permitted) = P(inheritable) | P(bounding)
711
712 P'(effective) = P'(permitted)
713
714 Consequently, the process gains all capabilities in its permitted and
715 effective capability sets, except those masked out by the capability
716 bounding set. (In the calculation of P'(permitted), the P'(ambient)
717 term can be simplified away because it is by definition a subset
718 of P(inheritable).)
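
The simplification above can be reproduced with a small bit-mask model.
This is a sketch only: the struct and function names are hypothetical,
each set is assumed to fit in a 64-bit mask, the general formulas are
taken from the subsection Transformation of capabilities during
execve(), and the exception for set-user-ID-root programs that have
file capabilities is deliberately not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; these are not kernel structures. */
struct cap_sets {
    uint64_t inheritable, permitted, effective, bounding, ambient;
};

struct file_caps {
    uint64_t inheritable, permitted;
    bool effective;                 /* the file effective bit */
};

/*
 * Apply the notional root values (rules 1 and 2 above), then the
 * general execve(2) transformation. file_privileged is true for a
 * set-user-ID/set-group-ID file or a file with capabilities attached.
 */
struct cap_sets exec_transform(struct cap_sets p, struct file_caps f,
                               bool file_privileged,
                               bool ruid_is_root, bool euid_is_root)
{
    struct cap_sets n = p;          /* inheritable, bounding unchanged */

    if (ruid_is_root || euid_is_root) {         /* rule 1 */
        f.inheritable = ~UINT64_C(0);
        f.permitted = ~UINT64_C(0);
    }
    if (euid_is_root)                           /* rule 2 */
        f.effective = true;

    n.ambient = file_privileged ? 0 : p.ambient;
    n.permitted = (p.inheritable & f.inheritable) |
                  (f.permitted & p.bounding) | n.ambient;
    n.effective = f.effective ? n.permitted : n.ambient;
    return n;
}
```

For a root process with P(inheritable) = 0x4 and P(bounding) = 0xF3,
the model yields a new permitted and effective set of 0xF7, that is,
P(inheritable) | P(bounding).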
719
720 The special treatments of user ID 0 (root) described in this subsection
721 can be disabled using the securebits mechanism described below.
722
723 Set-user-ID-root programs that have file capabilities
724 There is one exception to the behavior described under Capabilities and
725 execution of programs by root. If (a) the binary that is being exe‐
726 cuted has capabilities attached and (b) the real user ID of the process
727 is not 0 (root) and (c) the effective user ID of the process is 0
728 (root), then the file capability bits are honored (i.e., they are not
729 notionally considered to be all ones). The usual way in which this
730 situation can arise is when executing a set-user-ID-root program that also
731 has file capabilities. When such a program is executed, the process
732 gains just the capabilities granted by the program (i.e., not all capa‐
733 bilities, as would occur when executing a set-user-ID-root program that
734 does not have any associated file capabilities).
735
736 Note that one can assign empty capability sets to a program file, and
737 thus it is possible to create a set-user-ID-root program that changes
738 the effective and saved set-user-ID of the process that executes the
739 program to 0, but confers no capabilities to that process.
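
The decision between honoring the file capability sets and replacing
them with the notional "all ones" values can be summarized as a
predicate. The helper below is a hypothetical illustration of
conditions (a) to (c), not an existing interface.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if the file inheritable and
 * permitted sets are ignored (treated as all ones) during execve(2),
 * false if the sets stored with the file are honored.
 */
bool file_caps_ignored_for_root(bool has_file_caps,
                                bool ruid_is_root, bool euid_is_root)
{
    /* Neither UID is 0: no special treatment applies. */
    if (!ruid_is_root && !euid_is_root)
        return false;

    /* The exception: (a) file capabilities are attached,
       (b) real UID is not 0, and (c) effective UID is 0. */
    if (has_file_caps && !ruid_is_root && euid_is_root)
        return false;

    return true;
}
```

A set-user-ID-root binary with an empty file capability set, executed
by an unprivileged user, falls under the exception: the predicate
returns false and the process receives only the (empty) file
capabilities.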
740
741 Capability bounding set
742 The capability bounding set is a security mechanism that can be used to
743 limit the capabilities that can be gained during an execve(2). The
744 bounding set is used in the following ways:
745
746 * During an execve(2), the capability bounding set is ANDed with the
747 file permitted capability set, and the result of this operation is
748 assigned to the thread's permitted capability set. The capability
749 bounding set thus places a limit on the permitted capabilities that
750 may be granted by an executable file.
751
752 * (Since Linux 2.6.25) The capability bounding set acts as a limiting
753 superset for the capabilities that a thread can add to its inherita‐
754 ble set using capset(2). This means that if a capability is not in
755 the bounding set, then a thread can't add this capability to its in‐
756 heritable set, even if it was in its permitted capabilities, and
757 thereby cannot have this capability preserved in its permitted set
758 when it execve(2)s a file that has the capability in its inheritable
759 set.
760
761 Note that the bounding set masks the file permitted capabilities, but
762 not the inheritable capabilities. If a thread maintains a capability
763 in its inheritable set that is not in its bounding set, then it can
764 still gain that capability in its permitted set by executing a file
765 that has the capability in its inheritable set.
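
As an illustration of this asymmetry, the sketch below computes the new
permitted set from bit masks. The function name is hypothetical,
ambient capabilities are left out for brevity, and bit 12 merely stands
in for some capability.

```c
#include <stdint.h>

/*
 * Hypothetical helper: P'(permitted) after execve(2), ignoring the
 * ambient set. Only the file permitted set is masked by the bounding
 * set; the inheritable path is not.
 */
uint64_t permitted_after_exec(uint64_t p_inheritable, uint64_t p_bounding,
                              uint64_t f_inheritable, uint64_t f_permitted)
{
    return (p_inheritable & f_inheritable) | (f_permitted & p_bounding);
}
```

With an empty bounding set, a capability present in both the thread and
file inheritable sets is still gained, whereas the same capability
offered only through the file permitted set is not.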
766
767 Depending on the kernel version, the capability bounding set is either
768 a system-wide attribute, or a per-thread attribute.
769
770 Capability bounding set from Linux 2.6.25 onward
771
772 From Linux 2.6.25, the capability bounding set is a per-thread attri‐
773 bute. (The system-wide capability bounding set described below no
774 longer exists.)
775
776 The bounding set is inherited at fork(2) from the thread's parent, and
777 is preserved across an execve(2).
778
779 A thread may remove capabilities from its capability bounding set using
780 the prctl(2) PR_CAPBSET_DROP operation, provided it has the CAP_SETPCAP
781 capability. Once a capability has been dropped from the bounding set,
782 it cannot be restored to that set. A thread can determine if a capa‐
783 bility is in its bounding set using the prctl(2) PR_CAPBSET_READ opera‐
784 tion.
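
A minimal sketch of the two prctl(2) operations follows. The wrapper
names are hypothetical; the operations and their return conventions are
those documented in prctl(2).

```c
#include <sys/prctl.h>
#include <linux/capability.h>

/*
 * Returns 1 if cap is in the calling thread's bounding set, 0 if it
 * is not, or -1 on error (with errno set).
 */
int bounding_set_has(int cap)
{
    return prctl(PR_CAPBSET_READ, (unsigned long) cap, 0UL, 0UL, 0UL);
}

/*
 * Drops cap from the calling thread's bounding set. This cannot be
 * undone, and it fails with EPERM unless the caller has CAP_SETPCAP.
 * Returns 0 on success, or -1 on error (with errno set).
 */
int bounding_set_drop(int cap)
{
    return prctl(PR_CAPBSET_DROP, (unsigned long) cap, 0UL, 0UL, 0UL);
}
```

A typical use would be to call bounding_set_drop(CAP_SYS_MODULE) early
in a privileged daemon and to confirm the result with
bounding_set_has(CAP_SYS_MODULE).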
785
786 Removing capabilities from the bounding set is supported only if file
787 capabilities are compiled into the kernel. In kernels before Linux
788 2.6.33, file capabilities were an optional feature configurable via the
789 CONFIG_SECURITY_FILE_CAPABILITIES option. Since Linux 2.6.33, the con‐
790 figuration option has been removed and file capabilities are always
791 part of the kernel. When file capabilities are compiled into the ker‐
792 nel, the init process (the ancestor of all processes) begins with a
793 full bounding set. If file capabilities are not compiled into the ker‐
794 nel, then init begins with a full bounding set minus CAP_SETPCAP, be‐
795 cause this capability has a different meaning when there are no file
796 capabilities.
797
798 Removing a capability from the bounding set does not remove it from the
799 thread's inheritable set. However, it does prevent the capability from
800 being added back into the thread's inheritable set in the future.
801
802 Capability bounding set prior to Linux 2.6.25
803
804 In kernels before 2.6.25, the capability bounding set is a system-wide
805 attribute that affects all threads on the system. The bounding set is
806 accessible via the file /proc/sys/kernel/cap-bound. (Confusingly, this
807 bit mask parameter is expressed as a signed decimal number in
808 /proc/sys/kernel/cap-bound.)
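
To make the signed decimal representation concrete, the sketch below
converts such a value into the corresponding 32-bit mask. The function
name is hypothetical, and the text is assumed to hold a value that fits
in a signed 32-bit integer.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert the signed decimal text, as it would
 * appear in /proc/sys/kernel/cap-bound, into a 32-bit capability mask.
 */
uint32_t cap_bound_to_mask(const char *text)
{
    long value = strtol(text, NULL, 10);

    /* Reinterpret the signed 32-bit value as an unsigned bit mask. */
    return (uint32_t) (int32_t) value;
}
```

For example, the text "-257" corresponds to the mask 0xFFFFFEFF, in
which every bit is set except bit 8 (CAP_SETPCAP).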
809
810 Only the init process may set capabilities in the capability bounding
811 set; other than that, the superuser (more precisely: a process with the
812 CAP_SYS_MODULE capability) may only clear capabilities from this set.
813
814 On a standard system the capability bounding set always masks out the
815 CAP_SETPCAP capability. To remove this restriction (dangerous!), mod‐
816 ify the definition of CAP_INIT_EFF_SET in include/linux/capability.h
817 and rebuild the kernel.
818
819 The system-wide capability bounding set feature was added to Linux
820 starting with kernel version 2.2.11.
821
822 Effect of user ID changes on capabilities
823 To preserve the traditional semantics for transitions between 0 and
824 nonzero user IDs, the kernel makes the following changes to a thread's
825 capability sets on changes to the thread's real, effective, saved set,
826 and filesystem user IDs (using setuid(2), setresuid(2), or similar):
827
828 1. If one or more of the real, effective or saved set user IDs was pre‐
829 viously 0, and as a result of the UID changes all of these IDs have
830 a nonzero value, then all capabilities are cleared from the permit‐
831 ted, effective, and ambient capability sets.
832
833 2. If the effective user ID is changed from 0 to nonzero, then all ca‐
834 pabilities are cleared from the effective set.
835
836 3. If the effective user ID is changed from nonzero to 0, then the per‐
837 mitted set is copied to the effective set.
838
839 4. If the filesystem user ID is changed from 0 to nonzero (see setf‐
840 suid(2)), then the following capabilities are cleared from the ef‐
841 fective set: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH,
842 CAP_FOWNER, CAP_FSETID, CAP_LINUX_IMMUTABLE (since Linux 2.6.30),
843 CAP_MAC_OVERRIDE, and CAP_MKNOD (since Linux 2.6.30). If the
844 filesystem UID is changed from nonzero to 0, then any of these capa‐
845 bilities that are enabled in the permitted set are enabled in the
846 effective set.
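
Rules 1 to 3 can be expressed as a small model. This is a sketch under
stated assumptions: the types and the function name are hypothetical,
each set is a 64-bit mask, the filesystem UID (rule 4) is omitted, and
no securebits flags are taken into account.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical model types; these are not kernel structures. */
struct uids { uid_t real, effective, saved; };
struct caps { uint64_t permitted, effective, ambient; };

/* Returns the capability sets after the UIDs change from old to now. */
struct caps after_uid_change(struct uids old, struct uids now, struct caps c)
{
    bool had_root = old.real == 0 || old.effective == 0 || old.saved == 0;
    bool has_root = now.real == 0 || now.effective == 0 || now.saved == 0;

    if (had_root && !has_root)                      /* rule 1 */
        c.permitted = c.effective = c.ambient = 0;

    if (old.effective == 0 && now.effective != 0)   /* rule 2 */
        c.effective = 0;

    if (old.effective != 0 && now.effective == 0)   /* rule 3 */
        c.effective = c.permitted;

    return c;
}
```

In this model, a thread that changes only its effective UID from 0 to
1000 while keeping a saved set-user-ID of 0 loses its effective
capabilities but keeps its permitted set, so switching the effective
UID back to 0 restores the effective set.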
847
848 If a thread that has a 0 value for one or more of its user IDs wants to
849 prevent its permitted capability set being cleared when it resets all
850 of its user IDs to nonzero values, it can do so using the
851 SECBIT_KEEP_CAPS securebits flag described below.
852
853 Programmatically adjusting capability sets
854 A thread can retrieve and change its permitted, effective, and inheri‐
855 table capability sets using the capget(2) and capset(2) system calls.
856 However, the use of cap_get_proc(3) and cap_set_proc(3), both provided
857 in the libcap package, is preferred for this purpose. The following
858 rules govern changes to the thread capability sets:
859
860 1. If the caller does not have the CAP_SETPCAP capability, the new in‐
861 heritable set must be a subset of the combination of the existing
862 inheritable and permitted sets.
863
864 2. (Since Linux 2.6.25) The new inheritable set must be a subset of the
865 combination of the existing inheritable set and the capability
866 bounding set.
867
868 3. The new permitted set must be a subset of the existing permitted set
869 (i.e., it is not possible to acquire permitted capabilities that the
870 thread does not currently have).
871
872 4. The new effective set must be a subset of the new permitted set.
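
The four rules can be checked mechanically. The following sketch
validates a requested change against them; the type and function names
are hypothetical, each set is assumed to fit in a 64-bit mask, and the
code models the rules as stated here rather than the kernel's
implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model type; not a kernel or libcap structure. */
struct thread_caps { uint64_t inheritable, permitted, effective; };

/* True if every bit set in a is also set in b. */
static bool is_subset(uint64_t a, uint64_t b)
{
    return (a & ~b) == 0;
}

/* True if changing from old to want satisfies rules 1 to 4. */
bool capset_would_succeed(struct thread_caps old, struct thread_caps want,
                          uint64_t bounding, bool has_setpcap)
{
    if (!has_setpcap &&
        !is_subset(want.inheritable, old.inheritable | old.permitted))
        return false;                                       /* rule 1 */
    if (!is_subset(want.inheritable, old.inheritable | bounding))
        return false;                                       /* rule 2 */
    if (!is_subset(want.permitted, old.permitted))
        return false;                                       /* rule 3 */
    if (!is_subset(want.effective, want.permitted))
        return false;                                       /* rule 4 */
    return true;
}
```

Dropping capabilities always passes these checks, whereas a request to
add a capability to the permitted set is rejected by rule 3 regardless
of CAP_SETPCAP.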
873
874 The securebits flags: establishing a capabilities-only environment
875 Starting with kernel 2.6.26, and with a kernel in which file capabili‐
876 ties are enabled, Linux implements a set of per-thread securebits flags
877 that can be used to disable special handling of capabilities for UID 0
878 (root). These flags are as follows:
879
880 SECBIT_KEEP_CAPS
881 Setting this flag allows a thread that has one or more 0 UIDs to
882 retain capabilities in its permitted set when it switches all of
883 its UIDs to nonzero values. If this flag is not set, then such
884 a UID switch causes the thread to lose all permitted capabili‐
885 ties. This flag is always cleared on an execve(2).
886
887 Note that even with the SECBIT_KEEP_CAPS flag set, the effective
888 capabilities of a thread are cleared when it switches its effec‐
889 tive UID to a nonzero value. However, if the thread has set
890 this flag and its effective UID is already nonzero, and the
891 thread subsequently switches all other UIDs to nonzero values,
892 then the effective capabilities will not be cleared.
893
894 The setting of the SECBIT_KEEP_CAPS flag is ignored if the
895 SECBIT_NO_SETUID_FIXUP flag is set. (The latter flag provides a
896 superset of the effect of the former flag.)
897
898 This flag provides the same functionality as the older prctl(2)
899 PR_SET_KEEPCAPS operation.
900
901 SECBIT_NO_SETUID_FIXUP
902 Setting this flag stops the kernel from adjusting the process's
903 permitted, effective, and ambient capability sets when the
904 thread's effective and filesystem UIDs are switched between zero
905 and nonzero values. (See the subsection Effect of user ID
906 changes on capabilities.)
907
908 SECBIT_NOROOT
909 If this bit is set, then the kernel does not grant capabilities
910 when a set-user-ID-root program is executed, or when a process
911 with an effective or real UID of 0 calls execve(2). (See the
912 subsection Capabilities and execution of programs by root.)
913
914 SECBIT_NO_CAP_AMBIENT_RAISE
915 Setting this flag disallows raising ambient capabilities via the
916 prctl(2) PR_CAP_AMBIENT_RAISE operation.
917
918 Each of the above "base" flags has a companion "locked" flag. Setting
919 any of the "locked" flags is irreversible, and has the effect of pre‐
920 venting further changes to the corresponding "base" flag. The locked
921 flags are: SECBIT_KEEP_CAPS_LOCKED, SECBIT_NO_SETUID_FIXUP_LOCKED,
922 SECBIT_NOROOT_LOCKED, and SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED.
923
924 The securebits flags can be modified and retrieved using the prctl(2)
925 PR_SET_SECUREBITS and PR_GET_SECUREBITS operations. The CAP_SETPCAP
926 capability is required to modify the flags. Note that the SECBIT_*
927 constants are available only after including the <linux/securebits.h>
928 header file.
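
Reading the flags requires no privilege. The sketch below retrieves the
current securebits and tests one base flag together with its locked
companion; the two function names are hypothetical.

```c
#include <stdbool.h>
#include <sys/prctl.h>
#include <linux/securebits.h>

/*
 * Returns the securebits flags of the calling thread, or -1 on error
 * (with errno set).
 */
int get_securebits(void)
{
    return prctl(PR_GET_SECUREBITS, 0UL, 0UL, 0UL, 0UL);
}

/*
 * True if bits has SECBIT_NOROOT set and locked, that is, the special
 * treatment of UID 0 at execve(2) is disabled and cannot be restored.
 */
bool noroot_is_locked(int bits)
{
    return (bits & SECBIT_NOROOT) && (bits & SECBIT_NOROOT_LOCKED);
}
```

A program could call noroot_is_locked(get_securebits()) after checking
that get_securebits() did not return -1.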
929
930 The securebits flags are inherited by child processes. During an ex‐
931 ecve(2), all of the flags are preserved, except SECBIT_KEEP_CAPS which
932 is always cleared.
933
934 An application can use the following call to lock itself, and all of
935 its descendants, into an environment where the only way of gaining ca‐
936 pabilities is by executing a program with associated file capabilities:
937
    prctl(PR_SET_SECUREBITS,
            /* SECBIT_KEEP_CAPS off */
            SECBIT_KEEP_CAPS_LOCKED |
            SECBIT_NO_SETUID_FIXUP |
            SECBIT_NO_SETUID_FIXUP_LOCKED |
            SECBIT_NOROOT |
            SECBIT_NOROOT_LOCKED);
            /* Setting/locking SECBIT_NO_CAP_AMBIENT_RAISE
               is not required */
947
948 Per-user-namespace "set-user-ID-root" programs
949 A set-user-ID program whose UID matches the UID that created a user
950 namespace will confer capabilities in the process's permitted and ef‐
951 fective sets when executed by any process inside that namespace or any
952 descendant user namespace.
953
954 The rules about the transformation of the process's capabilities during
955 the execve(2) are exactly as described in the subsections Transforma‐
956 tion of capabilities during execve() and Capabilities and execution of
957 programs by root, with the difference that, in the latter subsection,
958 "root" is the UID of the creator of the user namespace.
959
960 Namespaced file capabilities
961 Traditional (i.e., version 2) file capabilities associate only a set of
962 capability masks with a binary executable file. When a process exe‐
963 cutes a binary with such capabilities, it gains the associated capabil‐
964 ities (within its user namespace) as per the rules described above in
965 "Transformation of capabilities during execve()".
966
967 Because version 2 file capabilities confer capabilities to the execut‐
968 ing process regardless of which user namespace it resides in, only
969 privileged processes are permitted to associate capabilities with a
970 file. Here, "privileged" means a process that has the CAP_SETFCAP ca‐
971 pability in the user namespace where the filesystem was mounted (nor‐
972 mally the initial user namespace). This limitation renders file capa‐
973 bilities useless for certain use cases. For example, in user-names‐
974 paced containers, it can be desirable to be able to create a binary
975 that confers capabilities only to processes executed inside that con‐
976 tainer, but not to processes that are executed outside the container.
977
978 Linux 4.14 added so-called namespaced file capabilities to support such
979 use cases. Namespaced file capabilities are recorded as version 3
980 (i.e., VFS_CAP_REVISION_3) security.capability extended attributes.
981 Such an attribute is automatically created in the circumstances de‐
982 scribed above under "File capability extended attribute versioning".
983 When a version 3 security.capability extended attribute is created, the
984 kernel records not just the capability masks in the extended attribute,
985 but also the namespace root user ID.
986
987 As with a binary that has VFS_CAP_REVISION_2 file capabilities, a bi‐
988 nary with VFS_CAP_REVISION_3 file capabilities confers capabilities to
989 a process during execve(). However, capabilities are conferred only if
990 the binary is executed by a process that resides in a user namespace
991 whose UID 0 maps to the root user ID that is saved in the extended at‐
992 tribute, or when executed by a process that resides in a descendant of
993 such a namespace.
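
For illustration, the sketch below extracts the namespace root user ID
from the raw bytes of a version 3 attribute. It assumes the 24-byte
little-endian layout of struct vfs_ns_cap_data from
include/uapi/linux/capability.h (magic_etc, two pairs of permitted and
inheritable masks, then rootid); the macro and function names are local
to this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Local names for values mirrored from <linux/capability.h>. */
#define EX_REVISION_MASK 0xFF000000u
#define EX_REVISION_3    0x03000000u
#define EX_V3_SIZE       24     /* 4 + 2 * (4 + 4) + 4 bytes */

/* Decode a little-endian 32-bit value. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t) p[0] | (uint32_t) p[1] << 8 |
           (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/*
 * If buf holds a raw version 3 security.capability value, stores the
 * namespace root user ID in *rootid and returns 0; otherwise
 * returns -1 and leaves *rootid untouched.
 */
int v3_rootid(const unsigned char *buf, size_t len, uint32_t *rootid)
{
    if (len != EX_V3_SIZE)
        return -1;
    if ((le32(buf) & EX_REVISION_MASK) != EX_REVISION_3)
        return -1;
    *rootid = le32(buf + 20);   /* rootid follows the four masks */
    return 0;
}
```

The buffer would typically come from getxattr(2) on the
security.capability attribute; note that the value returned to a caller
may differ from the stored form depending on the caller's user
namespace.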
994
995 Interaction with user namespaces
996 For further information on the interaction of capabilities and user
997 namespaces, see user_namespaces(7).
998
1000 No standards govern capabilities, but the Linux capability implementa‐
1001 tion is based on the withdrawn POSIX.1e draft standard; see
1002 ⟨https://archive.org/details/posix_1003.1e-990310⟩.
1003
1005 When attempting to strace(1) binaries that have capabilities (or set-
1006 user-ID-root binaries), you may find the -u <username> option useful.
1007 For example:
1008
1009 $ sudo strace -o trace.log -u ceci ./myprivprog
1010
1011 From kernel 2.5.27 to kernel 2.6.26, capabilities were an optional ker‐
1012 nel component, and could be enabled/disabled via the CONFIG_SECU‐
1013 RITY_CAPABILITIES kernel configuration option.
1014
1015 The /proc/[pid]/task/TID/status file can be used to view the capability
1016 sets of a thread. The /proc/[pid]/status file shows the capability
1017 sets of a process's main thread. Before Linux 3.8, nonexistent capa‐
1018 bilities were shown as being enabled (1) in these sets. Since Linux
1019 3.8, all nonexistent capabilities (above CAP_LAST_CAP) are shown as
1020 disabled (0).
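
The capability sets appear in the status file as hexadecimal masks on
lines beginning with CapInh, CapPrm, CapEff, CapBnd, and CapAmb. The
following sketch reads one of them for the calling process; the
function names are hypothetical and error handling is kept minimal.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one status line such as "CapEff:\t0000003fffffffff".
 * Returns 0 and stores the mask if the line is for the given field,
 * or -1 otherwise.
 */
int parse_cap_line(const char *line, const char *field, uint64_t *mask)
{
    size_t n = strlen(field);

    if (strncmp(line, field, n) != 0 || line[n] != ':')
        return -1;
    return sscanf(line + n + 1, "%" SCNx64, mask) == 1 ? 0 : -1;
}

/*
 * Look up one capability set (for example "CapEff") of the calling
 * process's main thread. Returns 0 on success, or -1 on failure.
 */
int read_own_cap_set(const char *field, uint64_t *mask)
{
    char line[256];
    int rc = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    while (rc != 0 && fgets(line, sizeof line, f) != NULL)
        rc = parse_cap_line(line, field, mask);
    fclose(f);
    return rc;
}
```

To inspect a specific thread instead, the same parsing could be applied
to /proc/[pid]/task/TID/status.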
1021
1022 The libcap package provides a suite of routines for setting and getting
1023 capabilities that is more comfortable and less likely to change than
1024 the interface provided by capset(2) and capget(2). This package also
1025 provides the setcap(8) and getcap(8) programs. It can be found at
1026 ⟨https://git.kernel.org/pub/scm/libs/libcap/libcap.git/refs/⟩.
1027
1028 Before kernel 2.6.24, and from kernel 2.6.24 to kernel 2.6.32 if file
1029 capabilities are not enabled, a thread with the CAP_SETPCAP capability
1030 can manipulate the capabilities of threads other than itself. However,
1031 this is only theoretically possible, since no thread ever has CAP_SETP‐
1032 CAP in either of these cases:
1033
1034 * In the pre-2.6.25 implementation the system-wide capability bounding
1035 set, /proc/sys/kernel/cap-bound, always masks out the CAP_SETPCAP
1036 capability, and this cannot be changed without modifying the kernel
1037 source and rebuilding the kernel.
1038
1039 * If file capabilities are disabled (i.e., the kernel CONFIG_SECU‐
1040 RITY_FILE_CAPABILITIES option is disabled), then init starts out with
1041 the CAP_SETPCAP capability removed from its per-process bounding set,
1042 and that bounding set is inherited by all other processes created on
1043 the system.
1044
1046 capsh(1), setpriv(1), prctl(2), setfsuid(2), cap_clear(3),
1047 cap_copy_ext(3), cap_from_text(3), cap_get_file(3), cap_get_proc(3),
1048 cap_init(3), capgetp(3), capsetp(3), libcap(3), proc(5), creden‐
1049 tials(7), pthreads(7), user_namespaces(7), captest(8), filecap(8), get‐
1050 cap(8), getpcaps(8), netcap(8), pscap(8), setcap(8)
1051
1052 include/linux/capability.h in the Linux kernel source tree
1053
1055 This page is part of release 5.10 of the Linux man-pages project. A
1056 description of the project, information about reporting bugs, and the
1057 latest version of this page, can be found at
1058 https://www.kernel.org/doc/man-pages/.
1059
1060
1061
1062Linux 2020-08-13 CAPABILITIES(7)