CAPABILITIES(7) Linux Programmer's Manual CAPABILITIES(7)

6 capabilities - overview of Linux capabilities
7
9 For the purpose of performing permission checks, traditional UNIX
10 implementations distinguish two categories of processes: privileged
11 processes (whose effective user ID is 0, referred to as superuser or
12 root), and unprivileged processes (whose effective UID is nonzero).
13 Privileged processes bypass all kernel permission checks, while unpriv‐
14 ileged processes are subject to full permission checking based on the
15 process's credentials (usually: effective UID, effective GID, and sup‐
16 plementary group list).
17
18 Starting with kernel 2.2, Linux divides the privileges traditionally
19 associated with superuser into distinct units, known as capabilities,
20 which can be independently enabled and disabled. Capabilities are a
21 per-thread attribute.
22
23 Capabilities list
24 The following list shows the capabilities implemented on Linux, and the
25 operations or behaviors that each capability permits:
26
27 CAP_AUDIT_CONTROL (since Linux 2.6.11)
28 Enable and disable kernel auditing; change auditing filter
29 rules; retrieve auditing status and filtering rules.
30
31 CAP_AUDIT_READ (since Linux 3.16)
32 Allow reading the audit log via a multicast netlink socket.
33
34 CAP_AUDIT_WRITE (since Linux 2.6.11)
35 Write records to kernel auditing log.
36
37 CAP_BLOCK_SUSPEND (since Linux 3.5)
38 Employ features that can block system suspend (epoll(7) EPOLL‐
39 WAKEUP, /proc/sys/wake_lock).
40
41 CAP_CHOWN
42 Make arbitrary changes to file UIDs and GIDs (see chown(2)).
43
44 CAP_DAC_OVERRIDE
45 Bypass file read, write, and execute permission checks. (DAC is
46 an abbreviation of "discretionary access control".)
47
48 CAP_DAC_READ_SEARCH
49 * Bypass file read permission checks and directory read and exe‐
50 cute permission checks;
51 * invoke open_by_handle_at(2);
52 * use the linkat(2) AT_EMPTY_PATH flag to create a link to a
53 file referred to by a file descriptor.
54
55 CAP_FOWNER
56 * Bypass permission checks on operations that normally require
57 the filesystem UID of the process to match the UID of the file
58 (e.g., chmod(2), utime(2)), excluding those operations covered
59 by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH;
60 * set inode flags (see ioctl_iflags(2)) on arbitrary files;
61 * set Access Control Lists (ACLs) on arbitrary files;
62 * ignore directory sticky bit on file deletion;
63 * specify O_NOATIME for arbitrary files in open(2) and fcntl(2).
64
65 CAP_FSETID
66 * Don't clear set-user-ID and set-group-ID mode bits when a file
67 is modified;
68 * set the set-group-ID bit for a file whose GID does not match
69 the filesystem or any of the supplementary GIDs of the calling
70 process.
71
72 CAP_IPC_LOCK
73 Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2)).
74
75 CAP_IPC_OWNER
76 Bypass permission checks for operations on System V IPC objects.
77
78 CAP_KILL
79 Bypass permission checks for sending signals (see kill(2)).
80 This includes use of the ioctl(2) KDSIGACCEPT operation.
81
82 CAP_LEASE (since Linux 2.4)
83 Establish leases on arbitrary files (see fcntl(2)).
84
85 CAP_LINUX_IMMUTABLE
86 Set the FS_APPEND_FL and FS_IMMUTABLE_FL inode flags (see
87 ioctl_iflags(2)).
88
89 CAP_MAC_ADMIN (since Linux 2.6.25)
90 Allow MAC configuration or state changes. Implemented for the
91 Smack Linux Security Module (LSM).
92
93 CAP_MAC_OVERRIDE (since Linux 2.6.25)
94 Override Mandatory Access Control (MAC). Implemented for the
95 Smack LSM.
96
97 CAP_MKNOD (since Linux 2.4)
98 Create special files using mknod(2).
99
100 CAP_NET_ADMIN
101 Perform various network-related operations:
102 * interface configuration;
103 * administration of IP firewall, masquerading, and accounting;
104 * modify routing tables;
105 * bind to any address for transparent proxying;
* set type-of-service (TOS);
* clear driver statistics;
* set promiscuous mode;
* enable multicasting;
110 * use setsockopt(2) to set the following socket options:
111 SO_DEBUG, SO_MARK, SO_PRIORITY (for a priority outside the
112 range 0 to 6), SO_RCVBUFFORCE, and SO_SNDBUFFORCE.
113
114 CAP_NET_BIND_SERVICE
115 Bind a socket to Internet domain privileged ports (port numbers
116 less than 1024).
117
118 CAP_NET_BROADCAST
119 (Unused) Make socket broadcasts, and listen to multicasts.
120
121 CAP_NET_RAW
122 * Use RAW and PACKET sockets;
123 * bind to any address for transparent proxying.
124
125 CAP_SETGID
126 * Make arbitrary manipulations of process GIDs and supplementary
127 GID list;
128 * forge GID when passing socket credentials via UNIX domain
129 sockets;
130 * write a group ID mapping in a user namespace (see user_names‐
131 paces(7)).
132
133 CAP_SETFCAP (since Linux 2.6.24)
134 Set arbitrary capabilities on a file.
135
136 CAP_SETPCAP
137 If file capabilities are supported (i.e., since Linux 2.6.24):
138 add any capability from the calling thread's bounding set to its
139 inheritable set; drop capabilities from the bounding set (via
140 prctl(2) PR_CAPBSET_DROP); make changes to the securebits flags.
141
142 If file capabilities are not supported (i.e., kernels before
143 Linux 2.6.24): grant or remove any capability in the caller's
144 permitted capability set to or from any other process. (This
145 property of CAP_SETPCAP is not available when the kernel is con‐
146 figured to support file capabilities, since CAP_SETPCAP has
147 entirely different semantics for such kernels.)
148
149 CAP_SETUID
150 * Make arbitrary manipulations of process UIDs (setuid(2),
151 setreuid(2), setresuid(2), setfsuid(2));
152 * forge UID when passing socket credentials via UNIX domain
153 sockets;
154 * write a user ID mapping in a user namespace (see user_names‐
155 paces(7)).
156
157 CAP_SYS_ADMIN
158 Note: this capability is overloaded; see Notes to kernel devel‐
159 opers, below.
160
161 * Perform a range of system administration operations including:
162 quotactl(2), mount(2), umount(2), swapon(2), swapoff(2),
163 sethostname(2), and setdomainname(2);
164 * perform privileged syslog(2) operations (since Linux 2.6.37,
165 CAP_SYSLOG should be used to permit such operations);
166 * perform VM86_REQUEST_IRQ vm86(2) command;
167 * perform IPC_SET and IPC_RMID operations on arbitrary System V
168 IPC objects;
169 * override RLIMIT_NPROC resource limit;
170 * perform operations on trusted and security Extended Attributes
171 (see xattr(7));
172 * use lookup_dcookie(2);
173 * use ioprio_set(2) to assign IOPRIO_CLASS_RT and (before Linux
174 2.6.25) IOPRIO_CLASS_IDLE I/O scheduling classes;
175 * forge PID when passing socket credentials via UNIX domain
176 sockets;
177 * exceed /proc/sys/fs/file-max, the system-wide limit on the
178 number of open files, in system calls that open files (e.g.,
179 accept(2), execve(2), open(2), pipe(2));
180 * employ CLONE_* flags that create new namespaces with clone(2)
181 and unshare(2) (but, since Linux 3.8, creating user namespaces
182 does not require any capability);
183 * call perf_event_open(2);
184 * access privileged perf event information;
185 * call setns(2) (requires CAP_SYS_ADMIN in the target names‐
186 pace);
187 * call fanotify_init(2);
188 * call bpf(2);
189 * perform privileged KEYCTL_CHOWN and KEYCTL_SETPERM keyctl(2)
190 operations;
191 * perform madvise(2) MADV_HWPOISON operation;
192 * employ the TIOCSTI ioctl(2) to insert characters into the
193 input queue of a terminal other than the caller's controlling
194 terminal;
195 * employ the obsolete nfsservctl(2) system call;
196 * employ the obsolete bdflush(2) system call;
197 * perform various privileged block-device ioctl(2) operations;
198 * perform various privileged filesystem ioctl(2) operations;
199 * perform privileged ioctl(2) operations on the /dev/random
200 device (see random(4));
201 * install a seccomp(2) filter without first having to set the
202 no_new_privs thread attribute;
203 * modify allow/deny rules for device control groups;
204 * employ the ptrace(2) PTRACE_SECCOMP_GET_FILTER operation to
205 dump tracee's seccomp filters;
206 * employ the ptrace(2) PTRACE_SETOPTIONS operation to suspend
207 the tracee's seccomp protections (i.e., the PTRACE_O_SUS‐
208 PEND_SECCOMP flag);
209 * perform administrative operations on many device drivers.
210
211 CAP_SYS_BOOT
212 Use reboot(2) and kexec_load(2).
213
214 CAP_SYS_CHROOT
215 Use chroot(2).
216
217 CAP_SYS_MODULE
218 * Load and unload kernel modules (see init_module(2) and
219 delete_module(2));
220 * in kernels before 2.6.25: drop capabilities from the system-
221 wide capability bounding set.
222
223 CAP_SYS_NICE
224 * Raise process nice value (nice(2), setpriority(2)) and change
225 the nice value for arbitrary processes;
226 * set real-time scheduling policies for calling process, and set
227 scheduling policies and priorities for arbitrary processes
(sched_setscheduler(2), sched_setparam(2), sched_setattr(2));
229 * set CPU affinity for arbitrary processes (sched_setaffin‐
230 ity(2));
231 * set I/O scheduling class and priority for arbitrary processes
232 (ioprio_set(2));
233 * apply migrate_pages(2) to arbitrary processes and allow pro‐
234 cesses to be migrated to arbitrary nodes;
235 * apply move_pages(2) to arbitrary processes;
236 * use the MPOL_MF_MOVE_ALL flag with mbind(2) and move_pages(2).
237
238 CAP_SYS_PACCT
239 Use acct(2).
240
241 CAP_SYS_PTRACE
242 * Trace arbitrary processes using ptrace(2);
243 * apply get_robust_list(2) to arbitrary processes;
244 * transfer data to or from the memory of arbitrary processes
245 using process_vm_readv(2) and process_vm_writev(2);
246 * inspect processes using kcmp(2).
247
248 CAP_SYS_RAWIO
249 * Perform I/O port operations (iopl(2) and ioperm(2));
250 * access /proc/kcore;
251 * employ the FIBMAP ioctl(2) operation;
252 * open devices for accessing x86 model-specific registers (MSRs,
253 see msr(4));
254 * update /proc/sys/vm/mmap_min_addr;
255 * create memory mappings at addresses below the value specified
256 by /proc/sys/vm/mmap_min_addr;
257 * map files in /proc/bus/pci;
258 * open /dev/mem and /dev/kmem;
259 * perform various SCSI device commands;
260 * perform certain operations on hpsa(4) and cciss(4) devices;
261 * perform a range of device-specific operations on other
262 devices.
263
264 CAP_SYS_RESOURCE
265 * Use reserved space on ext2 filesystems;
266 * make ioctl(2) calls controlling ext3 journaling;
267 * override disk quota limits;
268 * increase resource limits (see setrlimit(2));
269 * override RLIMIT_NPROC resource limit;
270 * override maximum number of consoles on console allocation;
271 * override maximum number of keymaps;
* allow more than 64 Hz interrupts from the real-time clock;
273 * raise msg_qbytes limit for a System V message queue above the
274 limit in /proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2));
275 * allow the RLIMIT_NOFILE resource limit on the number of "in-
276 flight" file descriptors to be bypassed when passing file
277 descriptors to another process via a UNIX domain socket (see
278 unix(7));
* use the F_SETPIPE_SZ fcntl(2) command to increase the capacity of
a pipe above the limit specified by /proc/sys/fs/pipe-max-size;
283 * override /proc/sys/fs/mqueue/queues_max limit when creating
284 POSIX message queues (see mq_overview(7));
285 * employ the prctl(2) PR_SET_MM operation;
286 * set /proc/[pid]/oom_score_adj to a value lower than the value
287 last set by a process with CAP_SYS_RESOURCE.
288
289 CAP_SYS_TIME
290 Set system clock (settimeofday(2), stime(2), adjtimex(2)); set
291 real-time (hardware) clock.
292
293 CAP_SYS_TTY_CONFIG
294 Use vhangup(2); employ various privileged ioctl(2) operations on
295 virtual terminals.
296
297 CAP_SYSLOG (since Linux 2.6.37)
298 * Perform privileged syslog(2) operations. See syslog(2) for
299 information on which operations require privilege.
* View kernel addresses exposed via /proc and other interfaces
when /proc/sys/kernel/kptr_restrict has the value 1. (See the
discussion of kptr_restrict in proc(5).)
303
304 CAP_WAKE_ALARM (since Linux 3.0)
305 Trigger something that will wake up the system (set CLOCK_REAL‐
306 TIME_ALARM and CLOCK_BOOTTIME_ALARM timers).
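
As a rough illustration of how the capabilities listed above map onto bits in a capability set, the following Python sketch decodes a bit mask into capability names. The bit numbers are an assumption taken from the kernel header include/uapi/linux/capability.h (they are not stated in this page), and the helper names decode_caps and cap_bit are hypothetical.

```python
# Capability names indexed by bit number. The ordering is assumed to match
# include/uapi/linux/capability.h (CAP_CHOWN is bit 0, CAP_AUDIT_READ bit 37).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]


def cap_bit(name):
    """Return the single-bit mask for a capability name."""
    return 1 << CAP_NAMES.index(name)


def decode_caps(mask):
    """Return the names of all capabilities whose bits are set in mask."""
    names = []
    bit = 0
    while mask:
        if mask & 1:
            # Bits beyond the table are reported numerically, not guessed.
            known = bit < len(CAP_NAMES)
            names.append(CAP_NAMES[bit] if known else "CAP_%d" % bit)
        mask >>= 1
        bit += 1
    return names
```

For example, decode_caps(cap_bit("CAP_NET_RAW")) yields a one-element list naming CAP_NET_RAW.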
307
308 Past and current implementation
309 A full implementation of capabilities requires that:
310
311 1. For all privileged operations, the kernel must check whether the
312 thread has the required capability in its effective set.
313
314 2. The kernel must provide system calls allowing a thread's capability
315 sets to be changed and retrieved.
316
317 3. The filesystem must support attaching capabilities to an executable
318 file, so that a process gains those capabilities when the file is
319 executed.
320
321 Before kernel 2.6.24, only the first two of these requirements are met;
322 since kernel 2.6.24, all three requirements are met.
323
324 Notes to kernel developers
325 When adding a new kernel feature that should be governed by a capabil‐
326 ity, consider the following points.
327
* The goal of capabilities is to divide the power of superuser into
pieces, such that if a program that has one or more capabilities is
compromised, its power to do damage to the system would be less than
that of the same program running with root privilege.
332
333 * You have the choice of either creating a new capability for your new
334 feature, or associating the feature with one of the existing capa‐
335 bilities. In order to keep the set of capabilities to a manageable
336 size, the latter option is preferable, unless there are compelling
337 reasons to take the former option. (There is also a technical
338 limit: the size of capability sets is currently limited to 64 bits.)
339
340 * To determine which existing capability might best be associated with
341 your new feature, review the list of capabilities above in order to
342 find a "silo" into which your new feature best fits. One approach
to take is to determine if there are other features requiring
capabilities that will always be used along with the new feature. If the
345 new feature is useless without these other features, you should use
346 the same capability as the other features.
347
348 * Don't choose CAP_SYS_ADMIN if you can possibly avoid it! A vast
349 proportion of existing capability checks are associated with this
350 capability (see the partial list above). It can plausibly be called
351 "the new root", since on the one hand, it confers a wide range of
352 powers, and on the other hand, its broad scope means that this is
353 the capability that is required by many privileged programs. Don't
354 make the problem worse. The only new features that should be asso‐
355 ciated with CAP_SYS_ADMIN are ones that closely match existing uses
356 in that silo.
357
358 * If you have determined that it really is necessary to create a new
359 capability for your feature, don't make or name it as a "single-use"
360 capability. Thus, for example, the addition of the highly specific
361 CAP_SYS_PACCT was probably a mistake. Instead, try to identify and
362 name your new capability as a broader silo into which other related
363 future use cases might fit.
364
365 Thread capability sets
Each thread has the following capability sets containing zero or more
of the above capabilities:
368
369 Permitted:
370 This is a limiting superset for the effective capabilities that
371 the thread may assume. It is also a limiting superset for the
372 capabilities that may be added to the inheritable set by a
373 thread that does not have the CAP_SETPCAP capability in its
374 effective set.
375
376 If a thread drops a capability from its permitted set, it can
377 never reacquire that capability (unless it execve(2)s either a
378 set-user-ID-root program, or a program whose associated file
379 capabilities grant that capability).
380
381 Inheritable:
382 This is a set of capabilities preserved across an execve(2).
383 Inheritable capabilities remain inheritable when executing any
384 program, and inheritable capabilities are added to the permitted
385 set when executing a program that has the corresponding bits set
386 in the file inheritable set.
387
388 Because inheritable capabilities are not generally preserved
389 across execve(2) when running as a non-root user, applications
390 that wish to run helper programs with elevated capabilities
391 should consider using ambient capabilities, described below.
392
393 Effective:
394 This is the set of capabilities used by the kernel to perform
395 permission checks for the thread.
396
397 Ambient (since Linux 4.3):
398 This is a set of capabilities that are preserved across an
399 execve(2) of a program that is not privileged. The ambient
400 capability set obeys the invariant that no capability can ever
401 be ambient if it is not both permitted and inheritable.
402
403 The ambient capability set can be directly modified using
404 prctl(2). Ambient capabilities are automatically lowered if
405 either of the corresponding permitted or inheritable capabili‐
406 ties is lowered.
407
408 Executing a program that changes UID or GID due to the set-user-
409 ID or set-group-ID bits or executing a program that has any file
410 capabilities set will clear the ambient set. Ambient capabili‐
411 ties are added to the permitted set and assigned to the effec‐
412 tive set when execve(2) is called.
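
The relationships between the thread capability sets described above can be modelled with integer bit masks. The following Python sketch is a toy model only, not the kernel implementation: the class ThreadCaps and its method names are hypothetical, and for simplicity the model clears a dropped permitted capability from the effective set itself.

```python
class ThreadCaps:
    """Toy model of a thread's capability sets as integer bit masks."""

    def __init__(self, permitted=0, inheritable=0, effective=0, ambient=0):
        self.permitted = permitted
        self.inheritable = inheritable
        self.effective = effective
        self.ambient = ambient

    def raise_ambient(self, caps):
        # Invariant: a capability can be ambient only if it is both
        # permitted and inheritable.
        if caps & ~(self.permitted & self.inheritable):
            raise PermissionError("capability not permitted and inheritable")
        self.ambient |= caps

    def drop_permitted(self, caps):
        # Permitted is a limiting superset of effective, so the model
        # removes the capability from the effective set as well.
        self.permitted &= ~caps
        self.effective &= ~caps
        # Ambient capabilities are lowered along with permitted ones.
        self.ambient &= ~caps

    def drop_inheritable(self, caps):
        self.inheritable &= ~caps
        # Ambient capabilities are lowered along with inheritable ones.
        self.ambient &= ~caps
```

In this model, calling raise_ambient() for a capability that is missing from either the permitted or the inheritable mask raises PermissionError, which mirrors the invariant stated for the ambient set.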
413
414 A child created via fork(2) inherits copies of its parent's capability
415 sets. See below for a discussion of the treatment of capabilities dur‐
416 ing execve(2).
417
418 Using capset(2), a thread may manipulate its own capability sets (see
419 below).
420
421 Since Linux 3.2, the file /proc/sys/kernel/cap_last_cap exposes the
422 numerical value of the highest capability supported by the running ker‐
423 nel; this can be used to determine the highest bit that may be set in a
424 capability set.
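
A minimal Python sketch of how the value in /proc/sys/kernel/cap_last_cap might be used to build a mask with every supported capability bit set. The function names are hypothetical; read_last_cap() returns None when the file is missing or unreadable (for example, on kernels before 3.2).

```python
def read_last_cap(path="/proc/sys/kernel/cap_last_cap"):
    """Return the highest supported capability number, or None."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def full_cap_mask(last_cap):
    """Return a mask with bits 0..last_cap (inclusive) set."""
    return (1 << (last_cap + 1)) - 1
```

A caller would typically check the result of read_last_cap() for None before passing it to full_cap_mask().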
425
426 File capabilities
427 Since kernel 2.6.24, the kernel supports associating capability sets
428 with an executable file using setcap(8). The file capability sets are
429 stored in an extended attribute (see setxattr(2) and xattr(7)) named
430 security.capability. Writing to this extended attribute requires the
431 CAP_SETFCAP capability. The file capability sets, in conjunction with
432 the capability sets of the thread, determine the capabilities of a
433 thread after an execve(2).
434
435 The three file capability sets are:
436
437 Permitted (formerly known as forced):
438 These capabilities are automatically permitted to the thread,
439 regardless of the thread's inheritable capabilities.
440
441 Inheritable (formerly known as allowed):
442 This set is ANDed with the thread's inheritable set to determine
443 which inheritable capabilities are enabled in the permitted set
444 of the thread after the execve(2).
445
446 Effective:
447 This is not a set, but rather just a single bit. If this bit is
448 set, then during an execve(2) all of the new permitted capabili‐
449 ties for the thread are also raised in the effective set. If
450 this bit is not set, then after an execve(2), none of the new
451 permitted capabilities is in the new effective set.
452
453 Enabling the file effective capability bit implies that any file
454 permitted or inheritable capability that causes a thread to
455 acquire the corresponding permitted capability during an
456 execve(2) (see the transformation rules described below) will
457 also acquire that capability in its effective set. Therefore,
458 when assigning capabilities to a file (setcap(8),
459 cap_set_file(3), cap_set_fd(3)), if we specify the effective
460 flag as being enabled for any capability, then the effective
flag must also be specified as enabled for all other capabilities
for which the corresponding permitted or inheritable flag is
enabled.
464
465 File capability mask versioning
466 To allow extensibility, the kernel supports a scheme to encode a ver‐
467 sion number inside the security.capability extended attribute that is
468 used to implement file capabilities. These version numbers are inter‐
469 nal to the implementation, and not directly visible to user-space
470 applications. To date, the following versions are supported:
471
472 VFS_CAP_REVISION_1
473 This was the original file capability implementation, which sup‐
474 ported 32-bit masks for file capabilities.
475
476 VFS_CAP_REVISION_2 (since Linux 2.6.25)
477 This version allows for file capability masks that are 64 bits
478 in size, and was necessary as the number of supported capabili‐
479 ties grew beyond 32. The kernel transparently continues to sup‐
480 port the execution of files that have 32-bit version 1 capabil‐
481 ity masks, but when adding capabilities to files that did not
482 previously have capabilities, or modifying the capabilities of
483 existing files, it automatically uses the version 2 scheme (or
484 possibly the version 3 scheme, as described below).
485
486 VFS_CAP_REVISION_3 (since Linux 4.14)
487 Version 3 file capabilities are provided to support namespaced
488 file capabilities (described below).
489
490 As with version 2 file capabilities, version 3 capability masks
are 64 bits in size. But in addition, the root user ID of the
namespace is encoded in the security.capability extended
493 attribute. (A namespace's root user ID is the value that user
494 ID 0 inside that namespace maps to in the initial user names‐
495 pace.)
496
497 Version 3 file capabilities are designed to coexist with version
498 2 capabilities; that is, on a modern Linux system, there may be
499 some files with version 2 capabilities while others have version
500 3 capabilities.
501
502 Before Linux 4.14, the only kind of capability mask that could be
503 attached to a file was a VFS_CAP_REVISION_2 mask. Since Linux 4.14,
504 the version of the capability mask that is attached to a file depends
505 on the circumstances in which the security.capability extended
506 attribute was created.
507
508 Starting with Linux 4.14, a security.capability extended attribute is
509 automatically created as (or converted to) a version 3 (VFS_CAP_REVI‐
510 SION_3) attribute if both of the following are true:
511
512 (1) The thread writing the attribute resides in a noninitial namespace.
513 (More precisely: the thread resides in a user namespace other than
514 the one from which the underlying filesystem was mounted.)
515
516 (2) The thread has the CAP_SETFCAP capability over the file inode,
517 meaning that (a) the thread has the CAP_SETFCAP capability in its
518 own user namespace; and (b) the UID and GID of the file inode have
519 mappings in the writer's user namespace.
520
521 When a VFS_CAP_REVISION_3 security.capability extended attribute is
522 created, the root user ID of the creating thread's user namespace is
523 saved in the extended attribute.
524
525 By contrast, creating a security.capability extended attribute from a
526 privileged (CAP_SETFCAP) thread that resides in the namespace where the
527 underlying filesystem was mounted (this normally means the initial user
528 namespace) automatically results in a version 2 (VFS_CAP_REVISION_2)
529 attribute.
530
531 Note that a file can have either a version 2 or a version 3 secu‐
532 rity.capability extended attribute associated with it, but not both:
533 creation or modification of the security.capability extended attribute
534 will automatically modify the version according to the circumstances in
535 which the extended attribute is created or modified.
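
Although the version numbers are internal to the implementation, the raw attribute value can be inspected for diagnostic purposes. The following Python sketch decodes such a value. The byte layout used here (a little-endian 32-bit header carrying the revision and the effective flag, followed by pairs of 32-bit permitted and inheritable words and, for version 3, a 32-bit root user ID) is an assumption based on struct vfs_cap_data and struct vfs_ns_cap_data in include/uapi/linux/capability.h, and the function name is hypothetical.

```python
import struct

VFS_CAP_REVISION_MASK = 0xFF000000    # assumed from linux/capability.h
VFS_CAP_FLAGS_EFFECTIVE = 0x000001    # assumed from linux/capability.h
# Revision number -> number of 32-bit words per capability set.
_U32_WORDS = {1: 1, 2: 2, 3: 2}


def parse_security_capability(raw):
    """Decode a raw security.capability value into a dictionary."""
    if len(raw) < 4:
        raise ValueError("attribute too short")
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    version = (magic_etc & VFS_CAP_REVISION_MASK) >> 24
    if version not in _U32_WORDS:
        raise ValueError("unknown revision %d" % version)
    words = _U32_WORDS[version]
    expected = 4 + 8 * words + (4 if version == 3 else 0)
    if len(raw) != expected:
        raise ValueError("unexpected attribute size")
    permitted = inheritable = 0
    for i in range(words):
        # Each pair holds 32 bits of the permitted and inheritable masks.
        p, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= p << (32 * i)
        inheritable |= inh << (32 * i)
    rootid = None
    if version == 3:
        (rootid,) = struct.unpack_from("<I", raw, 4 + 8 * words)
    return {
        "version": version,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
        "rootid": rootid,
    }
```

On Linux, the raw bytes could be obtained with os.getxattr(path, "security.capability"), which raises OSError if the file has no such attribute.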
536
537 Transformation of capabilities during execve()
538 During an execve(2), the kernel calculates the new capabilities of the
539 process using the following algorithm:
540
541 P'(ambient) = (file is privileged) ? 0 : P(ambient)
542
543 P'(permitted) = (P(inheritable) & F(inheritable)) |
544 (F(permitted) & cap_bset) | P'(ambient)
545
546 P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
547
548 P'(inheritable) = P(inheritable) [i.e., unchanged]
549
550 where:
551
552 P denotes the value of a thread capability set before the
553 execve(2)
554
555 P' denotes the value of a thread capability set after the
556 execve(2)
557
558 F denotes a file capability set
559
560 cap_bset is the value of the capability bounding set (described
561 below).
562
563 A privileged file is one that has capabilities or has the set-user-ID
564 or set-group-ID bit set.
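
The algorithm above can be modelled with plain integer bit masks. The following Python sketch is illustrative only: the function name execve_transform and its parameter names are hypothetical, and the model ignores the root-specific rules and the capability-dumb check described elsewhere in this page.

```python
def execve_transform(p_inheritable, p_ambient, f_permitted, f_inheritable,
                     f_effective, cap_bset, file_is_privileged):
    """Return the thread capability sets after execve(), as a dictionary.

    All sets are integer bit masks; f_effective is the single file
    effective bit; file_is_privileged is true if the file has
    capabilities or has the set-user-ID or set-group-ID bit set.
    """
    # P'(ambient) = (file is privileged) ? 0 : P(ambient)
    new_ambient = 0 if file_is_privileged else p_ambient
    # P'(permitted) = (P(inheritable) & F(inheritable)) |
    #                 (F(permitted) & cap_bset) | P'(ambient)
    new_permitted = ((p_inheritable & f_inheritable)
                     | (f_permitted & cap_bset)
                     | new_ambient)
    # P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
    new_effective = new_permitted if f_effective else new_ambient
    return {
        "permitted": new_permitted,
        "inheritable": p_inheritable,   # unchanged
        "effective": new_effective,
        "ambient": new_ambient,
    }
```

For instance, executing an unprivileged file with one ambient capability leaves that capability in both the permitted and effective sets, while executing a file that has file capabilities clears the ambient set.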
565
566 Note: the capability transitions described above may not be performed
567 (i.e., file capabilities may be ignored) for the same reasons that the
568 set-user-ID and set-group-ID bits are ignored; see execve(2).
569
570 Note: according to the rules above, if a process with nonzero user IDs
571 performs an execve(2) then any capabilities that are present in its
572 permitted and effective sets will be cleared. For the treatment of
573 capabilities when a process with a user ID of zero performs an
574 execve(2), see below under Capabilities and execution of programs by
575 root.
576
577 Safety checking for capability-dumb binaries
578 A capability-dumb binary is an application that has been marked to have
579 file capabilities, but has not been converted to use the libcap(3) API
580 to manipulate its capabilities. (In other words, this is a traditional
581 set-user-ID-root program that has been switched to use file capabili‐
582 ties, but whose code has not been modified to understand capabilities.)
583 For such applications, the effective capability bit is set on the file,
584 so that the file permitted capabilities are automatically enabled in
585 the process effective set when executing the file. The kernel recog‐
586 nizes a file which has the effective capability bit set as capability-
587 dumb for the purpose of the check described here.
588
589 When executing a capability-dumb binary, the kernel checks if the
590 process obtained all permitted capabilities that were specified in the
591 file permitted set, after the capability transformations described
592 above have been performed. (The typical reason why this might not
593 occur is that the capability bounding set masked out some of the capa‐
594 bilities in the file permitted set.) If the process did not obtain the
595 full set of file permitted capabilities, then execve(2) fails with the
596 error EPERM. This prevents possible security risks that could arise
when a capability-dumb application is executed with less privilege than
it needs. Note that, by definition, the application could not itself
599 recognize this problem, since it does not employ the libcap(3) API.
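
The safety check described above can be sketched in Python as follows. This is a model of the rule as stated in this section, not the kernel code; the function name check_capability_dumb is hypothetical, and capability sets are represented as integer bit masks.

```python
import errno


def check_capability_dumb(f_permitted, f_effective, new_permitted):
    """Raise PermissionError (EPERM) if a capability-dumb binary would
    run with fewer permitted capabilities than its file permitted set.

    f_effective is the file effective bit; new_permitted is the thread
    permitted set computed by the execve() transformation.
    """
    if not f_effective:
        return  # not treated as capability-dumb; no check applies
    missing = f_permitted & ~new_permitted
    if missing:
        raise PermissionError(errno.EPERM,
                              "file permitted capabilities not obtained")
```

A typical failing case is a bounding set that masks out a capability in the file permitted set, so that new_permitted is f_permitted & cap_bset with at least one bit missing.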
600
601 Capabilities and execution of programs by root
602 In order to provide an all-powerful root using capability sets, during
603 an execve(2):
604
605 1. If a set-user-ID-root program is being executed, or the real or
606 effective user ID of the process is 0 (root) then the file inherita‐
607 ble and permitted sets are defined to be all ones (i.e., all capa‐
608 bilities enabled).
609
610 2. If a set-user-ID-root program is being executed, or the effective
611 user ID of the process is 0 (root) then the file effective bit is
612 defined to be one (enabled).
613
614 The upshot of the above rules, combined with the capabilities transfor‐
615 mations described above, is as follows:
616
617 * When a process execve(2)s a set-user-ID-root program, or when a
618 process with an effective UID of 0 execve(2)s a program, it gains
619 all capabilities in its permitted and effective capability sets,
620 except those masked out by the capability bounding set.
621
622 * When a process with a real UID of 0 execve(2)s a program, it gains
623 all capabilities in its permitted capability set, except those
624 masked out by the capability bounding set.
625
626 The above steps yield semantics that are the same as those provided by
627 traditional UNIX systems.
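
A small Python model of the two rules above, combined with the execve(2) transformation. It is a sketch under simplifying assumptions that are not part of the rules themselves: the program file carries no file capabilities of its own, the ambient set is empty, and the function and parameter names are hypothetical.

```python
def execve_root_model(p_inheritable, cap_bset, all_caps,
                      setuid_root, real_uid, effective_uid):
    """Return (permitted, effective) masks after execve().

    all_caps is the mask with every supported capability enabled.
    Assumes the file has no file capabilities and that the ambient
    set is empty.
    """
    f_inheritable = f_permitted = 0
    f_effective = False
    # Rule 1: file inheritable and permitted sets become all ones.
    if setuid_root or real_uid == 0 or effective_uid == 0:
        f_inheritable = f_permitted = all_caps
    # Rule 2: the file effective bit becomes one.
    if setuid_root or effective_uid == 0:
        f_effective = True
    permitted = (p_inheritable & f_inheritable) | (f_permitted & cap_bset)
    effective = permitted if f_effective else 0
    return permitted, effective
```

With an empty inheritable set, a process whose real UID is 0 but whose effective UID is nonzero ends up with a full permitted set (limited by the bounding set) and an empty effective set, matching the second bullet point above.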
628
629 Set-user-ID-root programs that have file capabilities
630 Executing a program that is both set-user-ID root and has file capabil‐
631 ities will cause the process to gain just the capabilities granted by
632 the program (i.e., not all capabilities, as would occur when executing
633 a set-user-ID-root program that does not have any associated file capa‐
634 bilities). Note that one can assign empty capability sets to a program
635 file, and thus it is possible to create a set-user-ID-root program that
636 changes the effective and saved set-user-ID of the process that exe‐
637 cutes the program to 0, but confers no capabilities to that process.
638
639 Capability bounding set
640 The capability bounding set is a security mechanism that can be used to
641 limit the capabilities that can be gained during an execve(2). The
642 bounding set is used in the following ways:
643
644 * During an execve(2), the capability bounding set is ANDed with the
645 file permitted capability set, and the result of this operation is
646 assigned to the thread's permitted capability set. The capability
647 bounding set thus places a limit on the permitted capabilities that
648 may be granted by an executable file.
649
650 * (Since Linux 2.6.25) The capability bounding set acts as a limiting
651 superset for the capabilities that a thread can add to its inherita‐
652 ble set using capset(2). This means that if a capability is not in
653 the bounding set, then a thread can't add this capability to its
654 inheritable set, even if it was in its permitted capabilities, and
655 thereby cannot have this capability preserved in its permitted set
656 when it execve(2)s a file that has the capability in its inheritable
657 set.
658
659 Note that the bounding set masks the file permitted capabilities, but
660 not the inheritable capabilities. If a thread maintains a capability
661 in its inheritable set that is not in its bounding set, then it can
662 still gain that capability in its permitted set by executing a file
663 that has the capability in its inheritable set.
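
The two roles of the bounding set can be illustrated with a short Python model. The function names are hypothetical, capability sets are integer bit masks, and the rule for adding to the inheritable set reflects the descriptions of the permitted set, CAP_SETPCAP, and the bounding set given in this page (Linux 2.6.25 and later); it is a sketch, not the kernel logic.

```python
def may_add_to_inheritable(cap, permitted, cap_bset, has_setpcap):
    """Return True if the single-bit mask cap could be added to the
    thread's inheritable set using capset(2), according to the model."""
    if not (cap & cap_bset):
        return False  # the bounding set is a limiting superset
    # Without CAP_SETPCAP, the permitted set is also a limiting superset.
    return has_setpcap or bool(cap & permitted)


def permitted_after_execve(p_inheritable, f_inheritable, f_permitted,
                           cap_bset):
    """Permitted set after execve(2), ignoring ambient capabilities.

    Only the file permitted set is masked by the bounding set.
    """
    return (p_inheritable & f_inheritable) | (f_permitted & cap_bset)
```

Note how permitted_after_execve() can return a capability that is absent from cap_bset, provided the thread already holds it in its inheritable set and the file has it in its inheritable set.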
664
665 Depending on the kernel version, the capability bounding set is either
666 a system-wide attribute, or a per-process attribute.
667
668 Capability bounding set prior to Linux 2.6.25
669
670 In kernels before 2.6.25, the capability bounding set is a system-wide
671 attribute that affects all threads on the system. The bounding set is
672 accessible via the file /proc/sys/kernel/cap-bound. (Confusingly, this
673 bit mask parameter is expressed as a signed decimal number in
674 /proc/sys/kernel/cap-bound.)
675
676 Only the init process may set capabilities in the capability bounding
677 set; other than that, the superuser (more precisely: a process with the
678 CAP_SYS_MODULE capability) may only clear capabilities from this set.
679
680 On a standard system the capability bounding set always masks out the
681 CAP_SETPCAP capability. To remove this restriction (dangerous!), mod‐
682 ify the definition of CAP_INIT_EFF_SET in include/linux/capability.h
683 and rebuild the kernel.
684
685 The system-wide capability bounding set feature was added to Linux
686 starting with kernel version 2.2.11.
687
688 Capability bounding set from Linux 2.6.25 onward
689
690 From Linux 2.6.25, the capability bounding set is a per-thread
691 attribute. (There is no longer a system-wide capability bounding set.)
692
693 The bounding set is inherited at fork(2) from the thread's parent, and
694 is preserved across an execve(2).
695
696 A thread may remove capabilities from its capability bounding set using
697 the prctl(2) PR_CAPBSET_DROP operation, provided it has the CAP_SETPCAP
698 capability. Once a capability has been dropped from the bounding set,
699 it cannot be restored to that set. A thread can determine if a capa‐
700 bility is in its bounding set using the prctl(2) PR_CAPBSET_READ opera‐
701 tion.
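
As an illustration, the calling thread's bounding set could be read from Python by invoking prctl(2) through ctypes. This is a hedged sketch: the numeric value 23 for PR_CAPBSET_READ is an assumption taken from <linux/prctl.h>, the function name is hypothetical, and the function returns None if prctl() cannot be located in the C library.

```python
import ctypes

PR_CAPBSET_READ = 23  # assumed value, from <linux/prctl.h>


def read_bounding_set(max_cap=63):
    """Return the bounding set of the calling thread as a bit mask,
    or None if prctl() is unavailable."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        prctl = libc.prctl
    except (OSError, AttributeError, TypeError):
        return None
    prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    prctl.restype = ctypes.c_int
    mask = 0
    for cap in range(max_cap + 1):
        rc = prctl(PR_CAPBSET_READ, cap, 0, 0, 0)
        if rc < 0:
            break  # error, e.g. capability number not known to the kernel
        if rc == 1:
            mask |= 1 << cap
    return mask
```

The sketch only reads the bounding set; it deliberately does not use PR_CAPBSET_DROP, because a dropped capability cannot be restored.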
702
703 Removing capabilities from the bounding set is supported only if file
704 capabilities are compiled into the kernel. In kernels before Linux
705 2.6.33, file capabilities were an optional feature configurable via the
706 CONFIG_SECURITY_FILE_CAPABILITIES option. Since Linux 2.6.33, the con‐
707 figuration option has been removed and file capabilities are always
708 part of the kernel. When file capabilities are compiled into the ker‐
709 nel, the init process (the ancestor of all processes) begins with a
710 full bounding set. If file capabilities are not compiled into the ker‐
711 nel, then init begins with a full bounding set minus CAP_SETPCAP,
712 because this capability has a different meaning when there are no file
713 capabilities.
714
715 Removing a capability from the bounding set does not remove it from the
716 thread's inheritable set. However, it does prevent the capability from
717 being added back into the thread's inheritable set in the future.
718
719 Effect of user ID changes on capabilities
720 To preserve the traditional semantics for transitions between 0 and
721 nonzero user IDs, the kernel makes the following changes to a thread's
722 capability sets on changes to the thread's real, effective, saved set,
723 and filesystem user IDs (using setuid(2), setresuid(2), or similar):
724
725 1. If one or more of the real, effective or saved set user IDs was pre‐
726 viously 0, and as a result of the UID changes all of these IDs have
727 a nonzero value, then all capabilities are cleared from the permit‐
728 ted, effective, and ambient capability sets.
729
730 2. If the effective user ID is changed from 0 to nonzero, then all
731 capabilities are cleared from the effective set.
732
733 3. If the effective user ID is changed from nonzero to 0, then the per‐
734 mitted set is copied to the effective set.
735
736 4. If the filesystem user ID is changed from 0 to nonzero (see setf‐
737 suid(2)), then the following capabilities are cleared from the
738 effective set: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH,
739 CAP_FOWNER, CAP_FSETID, CAP_LINUX_IMMUTABLE (since Linux 2.6.30),
740 CAP_MAC_OVERRIDE, and CAP_MKNOD (since Linux 2.6.30). If the
741 filesystem UID is changed from nonzero to 0, then any of these capa‐
742 bilities that are enabled in the permitted set are enabled in the
743 effective set.
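
The four rules above can be modeled in user space as in the following sketch. It is not the kernel implementation: the structure and function names are hypothetical, capability sets are simplified to 64-bit masks, and the securebits flags that alter this behavior are not modeled.

```c
#include <stdint.h>
#include <sys/types.h>
#include <linux/capability.h>

struct uid_state { uid_t ruid, euid, suid, fsuid; };
struct cap_state { uint64_t permitted, effective, ambient; };

#define CAP_BIT(c) (UINT64_C(1) << (c))

/* Capabilities tied to the filesystem UID (rule 4). */
#define FS_CAP_MASK (CAP_BIT(CAP_CHOWN) | CAP_BIT(CAP_DAC_OVERRIDE) | \
                     CAP_BIT(CAP_DAC_READ_SEARCH) | CAP_BIT(CAP_FOWNER) | \
                     CAP_BIT(CAP_FSETID) | CAP_BIT(CAP_LINUX_IMMUTABLE) | \
                     CAP_BIT(CAP_MAC_OVERRIDE) | CAP_BIT(CAP_MKNOD))

/* Apply rules 1 to 4 to caps for a change from before to after. */
void model_uid_change(struct cap_state *caps,
                      const struct uid_state *before,
                      const struct uid_state *after)
{
    int had_root = before->ruid == 0 || before->euid == 0 ||
                   before->suid == 0;
    int has_root = after->ruid == 0 || after->euid == 0 ||
                   after->suid == 0;

    if (had_root && !has_root)                      /* rule 1 */
        caps->permitted = caps->effective = caps->ambient = 0;

    if (before->euid == 0 && after->euid != 0)      /* rule 2 */
        caps->effective = 0;
    else if (before->euid != 0 && after->euid == 0) /* rule 3 */
        caps->effective = caps->permitted;

    if (before->fsuid == 0 && after->fsuid != 0)    /* rule 4 */
        caps->effective &= ~FS_CAP_MASK;
    else if (before->fsuid != 0 && after->fsuid == 0)
        caps->effective |= caps->permitted & FS_CAP_MASK;
}
```

In this model, a thread that switches all four IDs from 0 to a nonzero value ends with empty permitted, effective, and ambient sets.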
744
745 If a thread that has a 0 value for one or more of its user IDs wants to
746 prevent its permitted capability set being cleared when it resets all
747 of its user IDs to nonzero values, it can do so using the
748 SECBIT_KEEP_CAPS securebits flag described below.
749
750 Programmatically adjusting capability sets
751 A thread can retrieve and change its capability sets using the
752 capget(2) and capset(2) system calls. However, the use of
753 cap_get_proc(3) and cap_set_proc(3), both provided in the libcap pack‐
754 age, is preferred for this purpose. The following rules govern changes
755 to the thread capability sets:
756
757 1. If the caller does not have the CAP_SETPCAP capability, the new
758 inheritable set must be a subset of the combination of the existing
759 inheritable and permitted sets.
760
761 2. (Since Linux 2.6.25) The new inheritable set must be a subset of the
762 combination of the existing inheritable set and the capability
763 bounding set.
764
765 3. The new permitted set must be a subset of the existing permitted set
766 (i.e., it is not possible to acquire permitted capabilities that the
767 thread does not currently have).
768
769 4. The new effective set must be a subset of the new permitted set.
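
The four rules above amount to a series of subset tests, sketched below. The names are hypothetical, capability sets are simplified to 64-bit masks, and the code only illustrates the rules as stated here; it is not the kernel's validation code.

```c
#include <stdint.h>

struct cap_sets { uint64_t inheritable, permitted, effective; };

/* Return nonzero if every bit set in a is also set in b. */
static int is_subset(uint64_t a, uint64_t b)
{
    return (a & ~b) == 0;
}

/* Return 1 if changing from cur to want satisfies rules 1 to 4,
   given the thread's bounding set and whether it has CAP_SETPCAP. */
int capset_change_allowed(const struct cap_sets *cur,
                          const struct cap_sets *want,
                          uint64_t bounding, int has_setpcap)
{
    /* Rule 1: without CAP_SETPCAP, new I is limited to old I | old P. */
    if (!has_setpcap &&
        !is_subset(want->inheritable, cur->inheritable | cur->permitted))
        return 0;

    /* Rule 2: new I is limited to old I | bounding set. */
    if (!is_subset(want->inheritable, cur->inheritable | bounding))
        return 0;

    /* Rule 3: the permitted set can only shrink. */
    if (!is_subset(want->permitted, cur->permitted))
        return 0;

    /* Rule 4: new E must lie within new P. */
    return is_subset(want->effective, want->permitted);
}
```

Rule 2 is what makes a drop from the bounding set permanent for the inheritable set: once a capability is absent from both, no later change can add it back.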
770
771 The securebits flags: establishing a capabilities-only environment
772 Starting with kernel 2.6.26, and with a kernel in which file capabili‐
773 ties are enabled, Linux implements a set of per-thread securebits flags
774 that can be used to disable special handling of capabilities for UID 0
775 (root). These flags are as follows:
776
777 SECBIT_KEEP_CAPS
778 Setting this flag allows a thread that has one or more 0 UIDs to
779 retain capabilities in its permitted and effective sets when it
780 switches all of its UIDs to nonzero values. If this flag is not
781 set, then such a UID switch causes the thread to lose all capa‐
782 bilities in those sets. This flag is always cleared on an
783 execve(2).
784
785 The setting of the SECBIT_KEEP_CAPS flag is ignored if the
786 SECBIT_NO_SETUID_FIXUP flag is set. (The latter flag provides a
787 superset of the effect of the former flag.)
788
789 This flag provides the same functionality as the older prctl(2)
790 PR_SET_KEEPCAPS operation.
791
792 SECBIT_NO_SETUID_FIXUP
793 Setting this flag stops the kernel from adjusting the process's
794 permitted, effective, and ambient capability sets when the
795 thread's effective and filesystem UIDs are switched between zero
796 and nonzero values. (See the subsection Effect of user ID
797 changes on capabilities.)
798
799 SECBIT_NOROOT
800 If this bit is set, then the kernel does not grant capabilities
801 when a set-user-ID-root program is executed, or when a process
802 with an effective or real UID of 0 calls execve(2). (See the
803 subsection Capabilities and execution of programs by root.)
804
805 SECBIT_NO_CAP_AMBIENT_RAISE
806 Setting this flag disallows raising ambient capabilities via the
807 prctl(2) PR_CAP_AMBIENT_RAISE operation.
808
809 Each of the above "base" flags has a companion "locked" flag. Setting
810 any of the "locked" flags is irreversible, and has the effect of pre‐
811 venting further changes to the corresponding "base" flag. The locked
812 flags are: SECBIT_KEEP_CAPS_LOCKED, SECBIT_NO_SETUID_FIXUP_LOCKED,
813 SECBIT_NOROOT_LOCKED, and SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED.
814
815 The securebits flags can be modified and retrieved using the prctl(2)
816 PR_SET_SECUREBITS and PR_GET_SECUREBITS operations. The CAP_SETPCAP
817 capability is required to modify the flags.
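
A thread might inspect its flags as in the following sketch, which assumes the SECBIT_* constants from <linux/securebits.h>. The table and function names are hypothetical. Reading the flags requires no capability.

```c
#include <stddef.h>
#include <sys/prctl.h>
#include <linux/securebits.h>

/* Each "base" flag paired with its "locked" companion. */
struct secbit_pair { int base; int locked; };

static const struct secbit_pair secbit_pairs[] = {
    { SECBIT_KEEP_CAPS,            SECBIT_KEEP_CAPS_LOCKED },
    { SECBIT_NO_SETUID_FIXUP,      SECBIT_NO_SETUID_FIXUP_LOCKED },
    { SECBIT_NOROOT,               SECBIT_NOROOT_LOCKED },
    { SECBIT_NO_CAP_AMBIENT_RAISE, SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED },
};

/* Return the calling thread's securebits, or -1 on error. */
int read_securebits(void)
{
    return prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);
}

/* Return 1 if the given base flag is locked in bits, 0 if it is not,
   or -1 if base is not one of the four base flags. */
int secbit_is_locked(int bits, int base)
{
    size_t i;

    for (i = 0; i < sizeof(secbit_pairs) / sizeof(secbit_pairs[0]); i++)
        if (secbit_pairs[i].base == base)
            return (bits & secbit_pairs[i].locked) != 0;
    return -1;
}
```

For example, secbit_is_locked(read_securebits(), SECBIT_NOROOT) reports whether the SECBIT_NOROOT setting can still be changed.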
818
819 The securebits flags are inherited by child processes. During an
820 execve(2), all of the flags are preserved, except SECBIT_KEEP_CAPS
821 which is always cleared.
822
823 An application can use the following call to lock itself, and all of
824 its descendants, into an environment where the only way of gaining
825 capabilities is by executing a program with associated file capabili‐
826 ties:
827
    prctl(PR_SET_SECUREBITS,
            /* SECBIT_KEEP_CAPS off */
            SECBIT_KEEP_CAPS_LOCKED |
            SECBIT_NO_SETUID_FIXUP |
            SECBIT_NO_SETUID_FIXUP_LOCKED |
            SECBIT_NOROOT |
            SECBIT_NOROOT_LOCKED);
            /* Setting/locking SECBIT_NO_CAP_AMBIENT_RAISE
               is not required */
837
838 Interaction with user namespaces
839 For a discussion of the interaction of capabilities and user names‐
840 paces, see user_namespaces(7).
841
842 Namespaced file capabilities
843 Traditional (i.e., version 2) file capabilities associate only a set of
844 capability masks with a binary executable file. When a process exe‐
845 cutes a binary with such capabilities, it gains the associated capabil‐
846 ities (within its user namespace) as per the rules described above in
847 "Transformation of capabilities during execve()".
848
849 Because version 2 file capabilities confer capabilities to the execut‐
850 ing process regardless of which user namespace it resides in, only
851 privileged processes are permitted to associate capabilities with a
852 file. Here, "privileged" means a process that has the CAP_SETFCAP
853 capability in the user namespace where the filesystem was mounted (nor‐
854 mally the initial user namespace). This limitation renders file capa‐
855 bilities useless for certain use cases. For example, in user-names‐
856 paced containers, it can be desirable to be able to create a binary
857 that confers capabilities only to processes executed inside that con‐
858 tainer, but not to processes that are executed outside the container.
859
860 Linux 4.14 added so-called namespaced file capabilities to support such
861 use cases. Namespaced file capabilities are recorded as version 3
862 (i.e., VFS_CAP_REVISION_3) security.capability extended attributes.
863 Such an attribute is automatically created when a process that resides
864 in a noninitial user namespace associates (setxattr(2)) file capabili‐
865 ties with a file whose user ID matches the user ID of the creator of
866 the namespace. In this case, the kernel records not just the capabil‐
867 ity masks in the extended attribute, but also the namespace root user
868 ID. For further details, see File capability mask versioning, above.
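
The revision of a security.capability attribute, and the root user ID stored in a version 3 attribute, can be decoded as in the following sketch. It assumes the VFS_CAP_* and XATTR_CAPS_SZ_* constants from a <linux/capability.h> recent enough to define revision 3, and operates on a byte buffer such as one filled by getxattr(2). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/capability.h>

/* Read a little-endian 32-bit value from p. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Return the revision (1, 2, or 3) of the attribute in buf, or 0 if
   the revision or length is not recognized. For revision 3, store
   the namespace root user ID in *rootid when rootid is not NULL. */
int file_cap_revision(const unsigned char *buf, size_t len,
                      uint32_t *rootid)
{
    if (len < sizeof(uint32_t))
        return 0;

    switch (read_le32(buf) & VFS_CAP_REVISION_MASK) {
    case VFS_CAP_REVISION_1:
        return len == XATTR_CAPS_SZ_1 ? 1 : 0;
    case VFS_CAP_REVISION_2:
        return len == XATTR_CAPS_SZ_2 ? 2 : 0;
    case VFS_CAP_REVISION_3:
        if (len != XATTR_CAPS_SZ_3)
            return 0;
        /* The root user ID is the last 32-bit field. */
        if (rootid != NULL)
            *rootid = read_le32(buf + XATTR_CAPS_SZ_3 - sizeof(uint32_t));
        return 3;
    default:
        return 0;
    }
}
```

The sketch checks only the revision and length; it does not interpret the permitted and inheritable masks or the effective flag.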
869
870 As with a binary that has VFS_CAP_REVISION_2 file capabilities, a
871 binary with VFS_CAP_REVISION_3 file capabilities confers capabilities
872 to a process during execve(2). However, capabilities are conferred only
873 if the binary is executed by a process that resides in a user namespace
874 whose UID 0 maps to the root user ID that is saved in the extended
875 attribute, or when executed by a process that resides in a descendant of
876 such a namespace.
877
878 CONFORMING TO
879 No standards govern capabilities, but the Linux capability implementa‐
880 tion is based on the withdrawn POSIX.1e draft standard; see
881 ⟨http://wt.tuxomania.net/publications/posix.1e/⟩.
882
883 NOTES
884 From kernel 2.5.27 to kernel 2.6.26, capabilities were an optional ker‐
885 nel component, and could be enabled/disabled via the CONFIG_SECU‐
886 RITY_CAPABILITIES kernel configuration option.
887
888 The /proc/[pid]/task/TID/status file can be used to view the capability
889 sets of a thread. The /proc/[pid]/status file shows the capability
890 sets of a process's main thread. Before Linux 3.8, nonexistent capa‐
891 bilities were shown as being enabled (1) in these sets. Since Linux
892 3.8, all nonexistent capabilities (above CAP_LAST_CAP) are shown as
893 disabled (0).
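
The capability sets in these files appear as hexadecimal masks on lines such as CapInh, CapPrm, CapEff, and CapBnd (see proc(5)). The following sketch extracts one of them. The function names are hypothetical, and the reader uses /proc/self/status, which reports the calling process's main thread.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse one status line of the form "CapEff:\t0000003fffffffff".
   Return 1 and store the mask if the line is for the named field. */
int parse_cap_line(const char *line, const char *field, uint64_t *mask)
{
    size_t n = strlen(field);

    if (strncmp(line, field, n) != 0 || line[n] != ':')
        return 0;
    return sscanf(line + n + 1, "%" SCNx64, mask) == 1;
}

/* Look up the named field in /proc/self/status. Return 1 if found. */
int read_own_cap_set(const char *field, uint64_t *mask)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    int found = 0;

    if (f == NULL)
        return 0;
    while (!found && fgets(line, sizeof(line), f) != NULL)
        found = parse_cap_line(line, field, mask);
    fclose(f);
    return found;
}
```

A per-thread variant would open /proc/[pid]/task/TID/status instead.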
894
895 The libcap package provides a suite of routines for setting and getting
896 capabilities that is more comfortable and less likely to change than
897 the interface provided by capset(2) and capget(2). This package also
898 provides the setcap(8) and getcap(8) programs. It can be found at
899 ⟨http://www.kernel.org/pub/linux/libs/security/linux-privs⟩.
900
901 Before kernel 2.6.24, and from kernel 2.6.24 to kernel 2.6.32 if file
902 capabilities are not enabled, a thread with the CAP_SETPCAP capability
903 can manipulate the capabilities of threads other than itself. However,
904 this is only theoretically possible, since no thread ever has CAP_SETP‐
905 CAP in either of these cases:
906
907 * In the pre-2.6.25 implementation the system-wide capability bounding
908 set, /proc/sys/kernel/cap-bound, always masks out this capability,
909 and this cannot be changed without modifying the kernel source and
910 rebuilding.
911
912 * If file capabilities are disabled in the current implementation, then
913 init starts out with this capability removed from its per-process
914 bounding set, and that bounding set is inherited by all other pro‐
915 cesses created on the system.
916
917 SEE ALSO
918 capsh(1), setpriv(1), prctl(2), setfsuid(2), cap_clear(3),
919 cap_copy_ext(3), cap_from_text(3), cap_get_file(3), cap_get_proc(3),
920 cap_init(3), capgetp(3), capsetp(3), libcap(3), proc(5), creden‐
921 tials(7), pthreads(7), user_namespaces(7), captest(8), filecap(8), get‐
922 cap(8), netcap(8), pscap(8), setcap(8)
923
924 include/linux/capability.h in the Linux kernel source tree
925
926 COLOPHON
927 This page is part of release 4.16 of the Linux man-pages project. A
928 description of the project, information about reporting bugs, and the
929 latest version of this page, can be found at
930 https://www.kernel.org/doc/man-pages/.
931
932
933
934Linux 2018-02-02 CAPABILITIES(7)