capabilities(7)

CAPABILITIES(7)            Linux Programmer's Manual           CAPABILITIES(7)

NAME

6       capabilities - overview of Linux capabilities
7

DESCRIPTION

9       For  the  purpose  of  performing  permission  checks, traditional UNIX
10       implementations distinguish two  categories  of  processes:  privileged
11       processes  (whose  effective  user ID is 0, referred to as superuser or
12       root), and unprivileged processes (whose  effective  UID  is  nonzero).
13       Privileged processes bypass all kernel permission checks, while unpriv‐
14       ileged processes are subject to full permission checking based  on  the
15       process's  credentials (usually: effective UID, effective GID, and sup‐
16       plementary group list).
17
18       Starting with kernel 2.2, Linux divides  the  privileges  traditionally
19       associated  with  superuser into distinct units, known as capabilities,
20       which can be independently enabled and disabled.   Capabilities  are  a
21       per-thread attribute.
22
23   Capabilities list
24       The following list shows the capabilities implemented on Linux, and the
25       operations or behaviors that each capability permits:
26
27       CAP_AUDIT_CONTROL (since Linux 2.6.11)
28              Enable and  disable  kernel  auditing;  change  auditing  filter
29              rules; retrieve auditing status and filtering rules.
30
31       CAP_AUDIT_READ (since Linux 3.16)
32              Allow reading the audit log via a multicast netlink socket.
33
34       CAP_AUDIT_WRITE (since Linux 2.6.11)
35              Write records to kernel auditing log.
36
37       CAP_BLOCK_SUSPEND (since Linux 3.5)
38              Employ  features  that can block system suspend (epoll(7) EPOLL‐
39              WAKEUP, /proc/sys/wake_lock).
40
41       CAP_CHOWN
42              Make arbitrary changes to file UIDs and GIDs (see chown(2)).
43
44       CAP_DAC_OVERRIDE
45              Bypass file read, write, and execute permission checks.  (DAC is
46              an abbreviation of "discretionary access control".)
47
48       CAP_DAC_READ_SEARCH
49              * Bypass file read permission checks and directory read and exe‐
50                cute permission checks;
51              * invoke open_by_handle_at(2);
52              * use the linkat(2) AT_EMPTY_PATH flag to create  a  link  to  a
53                file referred to by a file descriptor.
54
55       CAP_FOWNER
56              * Bypass  permission  checks on operations that normally require
57                the filesystem UID of the process to match the UID of the file
58                (e.g., chmod(2), utime(2)), excluding those operations covered
59                by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH;
60              * set inode flags (see ioctl_iflags(2)) on arbitrary files;
61              * set Access Control Lists (ACLs) on arbitrary files;
62              * ignore directory sticky bit on file deletion;
63              * modify user extended attributes on sticky directory  owned  by
64                any user;
65              * specify O_NOATIME for arbitrary files in open(2) and fcntl(2).
66
67       CAP_FSETID
68              * Don't clear set-user-ID and set-group-ID mode bits when a file
69                is modified;
              * set the set-group-ID bit for a file whose GID does not match
                the filesystem GID or any of the supplementary GIDs of the
                calling process.
73
74       CAP_IPC_LOCK
75              Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2)).
76
77       CAP_IPC_OWNER
78              Bypass permission checks for operations on System V IPC objects.
79
80       CAP_KILL
81              Bypass permission checks  for  sending  signals  (see  kill(2)).
82              This includes use of the ioctl(2) KDSIGACCEPT operation.
83
84       CAP_LEASE (since Linux 2.4)
85              Establish leases on arbitrary files (see fcntl(2)).
86
87       CAP_LINUX_IMMUTABLE
88              Set  the  FS_APPEND_FL  and  FS_IMMUTABLE_FL  inode  flags  (see
89              ioctl_iflags(2)).
90
91       CAP_MAC_ADMIN (since Linux 2.6.25)
92              Allow MAC configuration or state changes.  Implemented  for  the
93              Smack Linux Security Module (LSM).
94
95       CAP_MAC_OVERRIDE (since Linux 2.6.25)
96              Override  Mandatory  Access  Control (MAC).  Implemented for the
97              Smack LSM.
98
99       CAP_MKNOD (since Linux 2.4)
100              Create special files using mknod(2).
101
102       CAP_NET_ADMIN
103              Perform various network-related operations:
104              * interface configuration;
105              * administration of IP firewall, masquerading, and accounting;
106              * modify routing tables;
107              * bind to any address for transparent proxying;
108              * set type-of-service (TOS);
109              * clear driver statistics;
110              * set promiscuous mode;
              * enable multicasting;
112              * use  setsockopt(2)  to  set  the  following  socket   options:
113                SO_DEBUG,  SO_MARK,  SO_PRIORITY  (for  a priority outside the
114                range 0 to 6), SO_RCVBUFFORCE, and SO_SNDBUFFORCE.
115
116       CAP_NET_BIND_SERVICE
117              Bind a socket to Internet domain privileged ports (port  numbers
118              less than 1024).
119
120       CAP_NET_BROADCAST
121              (Unused)  Make socket broadcasts, and listen to multicasts.
122
123       CAP_NET_RAW
124              * Use RAW and PACKET sockets;
125              * bind to any address for transparent proxying.
126
127       CAP_SETGID
128              * Make arbitrary manipulations of process GIDs and supplementary
129                GID list;
130              * forge GID when passing  socket  credentials  via  UNIX  domain
131                sockets;
132              * write  a group ID mapping in a user namespace (see user_names‐
133                paces(7)).
134
135       CAP_SETFCAP (since Linux 2.6.24)
136              Set arbitrary capabilities on a file.
137
138       CAP_SETPCAP
139              If file capabilities are supported (i.e., since  Linux  2.6.24):
140              add any capability from the calling thread's bounding set to its
141              inheritable set; drop capabilities from the  bounding  set  (via
142              prctl(2) PR_CAPBSET_DROP); make changes to the securebits flags.
143
144              If  file  capabilities  are  not supported (i.e., kernels before
145              Linux 2.6.24): grant or remove any capability  in  the  caller's
146              permitted  capability  set  to or from any other process.  (This
147              property of CAP_SETPCAP is not available when the kernel is con‐
148              figured  to  support  file  capabilities,  since CAP_SETPCAP has
149              entirely different semantics for such kernels.)
150
151       CAP_SETUID
152              * Make  arbitrary  manipulations  of  process  UIDs  (setuid(2),
153                setreuid(2), setresuid(2), setfsuid(2));
154              * forge  UID  when  passing  socket  credentials via UNIX domain
155                sockets;
156              * write a user ID mapping in a user namespace  (see  user_names‐
157                paces(7)).
158
159       CAP_SYS_ADMIN
160              Note:  this capability is overloaded; see Notes to kernel devel‐
161              opers, below.
162
163              * Perform a range of system administration operations including:
164                quotactl(2),  mount(2),  umount(2),  pivot_root(2), swapon(2),
165                swapoff(2), sethostname(2), and setdomainname(2);
166              * perform privileged syslog(2) operations (since  Linux  2.6.37,
167                CAP_SYSLOG should be used to permit such operations);
168              * perform VM86_REQUEST_IRQ vm86(2) command;
169              * perform  IPC_SET and IPC_RMID operations on arbitrary System V
170                IPC objects;
171              * override RLIMIT_NPROC resource limit;
172              * perform operations on trusted and security extended attributes
173                (see xattr(7));
174              * use lookup_dcookie(2);
175              * use  ioprio_set(2) to assign IOPRIO_CLASS_RT and (before Linux
176                2.6.25) IOPRIO_CLASS_IDLE I/O scheduling classes;
177              * forge PID when passing  socket  credentials  via  UNIX  domain
178                sockets;
179              * exceed  /proc/sys/fs/file-max,  the  system-wide  limit on the
180                number of open files, in system calls that open  files  (e.g.,
181                accept(2), execve(2), open(2), pipe(2));
182              * employ  CLONE_* flags that create new namespaces with clone(2)
183                and unshare(2) (but, since Linux 3.8, creating user namespaces
184                does not require any capability);
185              * call perf_event_open(2);
186              * access privileged perf event information;
187              * call  setns(2)  (requires  CAP_SYS_ADMIN  in the target names‐
188                pace);
189              * call fanotify_init(2);
190              * call bpf(2);
191              * perform privileged KEYCTL_CHOWN and  KEYCTL_SETPERM  keyctl(2)
192                operations;
193              * perform madvise(2) MADV_HWPOISON operation;
194              * employ  the  TIOCSTI  ioctl(2)  to  insert characters into the
195                input queue of a terminal other than the caller's  controlling
196                terminal;
197              * employ the obsolete nfsservctl(2) system call;
198              * employ the obsolete bdflush(2) system call;
199              * perform various privileged block-device ioctl(2) operations;
200              * perform various privileged filesystem ioctl(2) operations;
201              * perform  privileged  ioctl(2)  operations  on  the /dev/random
202                device (see random(4));
203              * install a seccomp(2) filter without first having  to  set  the
204                no_new_privs thread attribute;
205              * modify allow/deny rules for device control groups;
206              * employ  the  ptrace(2)  PTRACE_SECCOMP_GET_FILTER operation to
207                dump tracee's seccomp filters;
208              * employ the ptrace(2) PTRACE_SETOPTIONS  operation  to  suspend
209                the  tracee's  seccomp  protections  (i.e.,  the PTRACE_O_SUS‐
210                PEND_SECCOMP flag);
              * perform administrative operations on many device drivers;
              * modify autogroup nice values by writing to
                /proc/[pid]/autogroup (see sched(7)).
214
215       CAP_SYS_BOOT
216              Use reboot(2) and kexec_load(2).
217
218       CAP_SYS_CHROOT
219              * Use chroot(2);
220              * change mount namespaces using setns(2).
221
222       CAP_SYS_MODULE
223              * Load   and  unload  kernel  modules  (see  init_module(2)  and
224                delete_module(2));
225              * in kernels before 2.6.25: drop capabilities from  the  system-
226                wide capability bounding set.
227
228       CAP_SYS_NICE
229              * Raise  process nice value (nice(2), setpriority(2)) and change
230                the nice value for arbitrary processes;
231              * set real-time scheduling policies for calling process, and set
232                scheduling  policies  and  priorities  for arbitrary processes
233                (sched_setscheduler(2), sched_setparam(2), sched_setattr(2));
234              * set CPU  affinity  for  arbitrary  processes  (sched_setaffin‐
235                ity(2));
236              * set  I/O scheduling class and priority for arbitrary processes
237                (ioprio_set(2));
238              * apply migrate_pages(2) to arbitrary processes and  allow  pro‐
239                cesses to be migrated to arbitrary nodes;
240              * apply move_pages(2) to arbitrary processes;
241              * use the MPOL_MF_MOVE_ALL flag with mbind(2) and move_pages(2).
242
243       CAP_SYS_PACCT
244              Use acct(2).
245
246       CAP_SYS_PTRACE
247              * Trace arbitrary processes using ptrace(2);
248              * apply get_robust_list(2) to arbitrary processes;
249              * transfer  data  to  or  from the memory of arbitrary processes
250                using process_vm_readv(2) and process_vm_writev(2);
251              * inspect processes using kcmp(2).
252
253       CAP_SYS_RAWIO
254              * Perform I/O port operations (iopl(2) and ioperm(2));
255              * access /proc/kcore;
256              * employ the FIBMAP ioctl(2) operation;
257              * open devices for accessing x86 model-specific registers (MSRs,
258                see msr(4));
259              * update /proc/sys/vm/mmap_min_addr;
260              * create  memory mappings at addresses below the value specified
261                by /proc/sys/vm/mmap_min_addr;
262              * map files in /proc/bus/pci;
263              * open /dev/mem and /dev/kmem;
264              * perform various SCSI device commands;
265              * perform certain operations on hpsa(4) and cciss(4) devices;
266              * perform  a  range  of  device-specific  operations  on   other
267                devices.
268
269       CAP_SYS_RESOURCE
270              * Use reserved space on ext2 filesystems;
271              * make ioctl(2) calls controlling ext3 journaling;
272              * override disk quota limits;
273              * increase resource limits (see setrlimit(2));
274              * override RLIMIT_NPROC resource limit;
275              * override maximum number of consoles on console allocation;
276              * override maximum number of keymaps;
              * allow more than 64 Hz interrupts from the real-time clock;
278              * raise  msg_qbytes limit for a System V message queue above the
279                limit in /proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2));
280              * allow the RLIMIT_NOFILE resource limit on the number  of  "in-
281                flight"  file  descriptors  to  be  bypassed when passing file
282                descriptors to another process via a UNIX domain  socket  (see
283                unix(7));
              * use F_SETPIPE_SZ (see fcntl(2)) to increase the capacity of a
                pipe above the limit specified by /proc/sys/fs/pipe-max-size;
288              * override  /proc/sys/fs/mqueue/queues_max  limit  when creating
289                POSIX message queues (see mq_overview(7));
290              * employ the prctl(2) PR_SET_MM operation;
291              * set /proc/[pid]/oom_score_adj to a value lower than the  value
292                last set by a process with CAP_SYS_RESOURCE.
293
294       CAP_SYS_TIME
295              Set  system  clock (settimeofday(2), stime(2), adjtimex(2)); set
296              real-time (hardware) clock.
297
298       CAP_SYS_TTY_CONFIG
299              Use vhangup(2); employ various privileged ioctl(2) operations on
300              virtual terminals.
301
302       CAP_SYSLOG (since Linux 2.6.37)
303              * Perform  privileged  syslog(2)  operations.  See syslog(2) for
304                information on which operations require privilege.
              * View kernel addresses exposed via /proc and other interfaces
                when /proc/sys/kernel/kptr_restrict has the value 1.  (See the
                discussion of kptr_restrict in proc(5).)
308
309       CAP_WAKE_ALARM (since Linux 3.0)
310              Trigger something that will wake up the system (set  CLOCK_REAL‐
311              TIME_ALARM and CLOCK_BOOTTIME_ALARM timers).
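
       Capability sets are usually displayed as hexadecimal bit masks.  The
       following Python sketch decodes such a mask into the names listed
       above.  It assumes the bit numbers defined in the kernel header
       <linux/capability.h>, which are not given on this page; CAP_NAMES and
       decode_caps() are illustrative names, not part of any library.

```python
# Capability names indexed by bit number.  The ordering is taken from the
# kernel's <linux/capability.h> header (an assumption external to this page).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE",
    "CAP_NET_BROADCAST", "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK",
    "CAP_IPC_OWNER", "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT",
    "CAP_SYS_PTRACE", "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT",
    "CAP_SYS_NICE", "CAP_SYS_RESOURCE", "CAP_SYS_TIME",
    "CAP_SYS_TTY_CONFIG", "CAP_MKNOD", "CAP_LEASE", "CAP_AUDIT_WRITE",
    "CAP_AUDIT_CONTROL", "CAP_SETFCAP", "CAP_MAC_OVERRIDE",
    "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM", "CAP_BLOCK_SUSPEND",
    "CAP_AUDIT_READ",
]


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    names = []
    for bit in range(mask.bit_length()):
        if mask & (1 << bit):
            # Bits beyond the table are reported by number, not guessed.
            if bit < len(CAP_NAMES):
                names.append(CAP_NAMES[bit])
            else:
                names.append(f"bit {bit}")
    return names


if __name__ == "__main__":
    print(decode_caps(0x400))  # ['CAP_NET_BIND_SERVICE']
```

       Reporting unknown bits by number keeps the sketch usable on kernels
       that define capabilities newer than those in the table.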
312
313   Past and current implementation
314       A full implementation of capabilities requires that:
315
316       1. For  all  privileged  operations,  the kernel must check whether the
317          thread has the required capability in its effective set.
318
319       2. The kernel must provide system calls allowing a thread's  capability
320          sets to be changed and retrieved.
321
322       3. The  filesystem must support attaching capabilities to an executable
323          file, so that a process gains those capabilities when  the  file  is
324          executed.
325
326       Before kernel 2.6.24, only the first two of these requirements are met;
327       since kernel 2.6.24, all three requirements are met.
328
329   Notes to kernel developers
330       When adding a new kernel feature that should be governed by a  capabil‐
331       ity, consider the following points.
332
       *  The goal of capabilities is to divide the power of superuser into
          pieces, such that if a program that has one or more capabilities is
          compromised, its power to do damage to the system would be less than
          that of the same program running with root privilege.
337
338       *  You have the choice of either creating a new capability for your new
339          feature,  or  associating the feature with one of the existing capa‐
340          bilities.  In order to keep the set of capabilities to a  manageable
341          size,  the  latter option is preferable, unless there are compelling
342          reasons to take the former  option.   (There  is  also  a  technical
343          limit: the size of capability sets is currently limited to 64 bits.)
344
345       *  To determine which existing capability might best be associated with
346          your new feature, review the list of capabilities above in order  to
347          find  a  "silo" into which your new feature best fits.  One approach
348          to take is to determine if there are other features requiring  capa‐
349          bilities  that  will  always be used along with the new feature.  If
350          the new feature is useless without these other features, you  should
351          use the same capability as the other features.
352
353       *  Don't  choose  CAP_SYS_ADMIN  if  you can possibly avoid it!  A vast
354          proportion of existing capability checks are  associated  with  this
355          capability (see the partial list above).  It can plausibly be called
356          "the new root", since on the one hand, it confers a  wide  range  of
357          powers,  and  on  the other hand, its broad scope means that this is
358          the capability that is required by many privileged programs.   Don't
359          make  the problem worse.  The only new features that should be asso‐
360          ciated with CAP_SYS_ADMIN are ones that closely match existing  uses
361          in that silo.
362
363       *  If  you  have determined that it really is necessary to create a new
364          capability for your feature, don't make or name it as a "single-use"
365          capability.   Thus, for example, the addition of the highly specific
366          CAP_SYS_PACCT was probably a mistake.  Instead, try to identify  and
367          name  your new capability as a broader silo into which other related
368          future use cases might fit.
369
370   Thread capability sets
371       Each thread has the following capability sets containing zero  or  more
372       of the above capabilities:
373
374       Permitted
375              This  is a limiting superset for the effective capabilities that
376              the thread may assume.  It is also a limiting superset  for  the
377              capabilities  that  may  be  added  to  the inheritable set by a
378              thread that does not have  the  CAP_SETPCAP  capability  in  its
379              effective set.
380
381              If  a  thread  drops a capability from its permitted set, it can
382              never reacquire that capability (unless it execve(2)s  either  a
383              set-user-ID-root  program,  or  a  program whose associated file
384              capabilities grant that capability).
385
386       Inheritable
387              This is a set of capabilities  preserved  across  an  execve(2).
388              Inheritable  capabilities  remain inheritable when executing any
389              program, and inheritable capabilities are added to the permitted
390              set when executing a program that has the corresponding bits set
391              in the file inheritable set.
392
393              Because inheritable capabilities  are  not  generally  preserved
394              across  execve(2)  when running as a non-root user, applications
395              that wish to run  helper  programs  with  elevated  capabilities
396              should consider using ambient capabilities, described below.
397
398       Effective
399              This  is  the  set of capabilities used by the kernel to perform
400              permission checks for the thread.
401
402       Bounding (per-thread since Linux 2.6.25)
403              The capability bounding set is a mechanism that can be  used  to
404              limit the capabilities that are gained during execve(2).
405
406              Since  Linux  2.6.25,  this  is a per-thread capability set.  In
407              older kernels, the capability bounding set  was  a  system  wide
408              attribute shared by all threads on the system.
409
410              For more details on the capability bounding set, see below.
411
412       Ambient (since Linux 4.3)
413              This  is  a  set  of  capabilities  that are preserved across an
414              execve(2) of a program that  is  not  privileged.   The  ambient
415              capability  set  obeys the invariant that no capability can ever
416              be ambient if it is not both permitted and inheritable.
417
418              The ambient  capability  set  can  be  directly  modified  using
419              prctl(2).   Ambient  capabilities  are  automatically lowered if
420              either of the corresponding permitted or  inheritable  capabili‐
421              ties is lowered.
422
423              Executing a program that changes UID or GID due to the set-user-
424              ID or set-group-ID bits or executing a program that has any file
425              capabilities  set will clear the ambient set.  Ambient capabili‐
426              ties are added to the permitted set and assigned to  the  effec‐
427              tive  set  when  execve(2)  is  called.  If ambient capabilities
428              cause  a  process's  permitted  and  effective  capabilities  to
429              increase  during an execve(2), this does not trigger the secure-
430              execution mode described in ld.so(8).
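
              The ambient-set invariant can be pictured with a small
              user-space model.  This Python sketch is not a kernel
              interface: the ThreadCaps class and its method names are
              hypothetical, and capabilities are represented as plain
              strings.  It only mirrors the two rules stated above: a
              capability can be raised in the ambient set only if it is both
              permitted and inheritable, and lowering either of those lowers
              the ambient capability as well.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadCaps:
    """Toy model of three of a thread's capability sets (hypothetical)."""
    permitted: set = field(default_factory=set)
    inheritable: set = field(default_factory=set)
    ambient: set = field(default_factory=set)

    def raise_ambient(self, cap):
        # Invariant: ambient must stay within permitted AND inheritable.
        if cap not in self.permitted or cap not in self.inheritable:
            raise PermissionError(
                f"{cap} is not both permitted and inheritable")
        self.ambient.add(cap)

    def lower_permitted(self, cap):
        self.permitted.discard(cap)
        self.ambient.discard(cap)   # ambient is lowered automatically

    def lower_inheritable(self, cap):
        self.inheritable.discard(cap)
        self.ambient.discard(cap)   # ambient is lowered automatically


if __name__ == "__main__":
    t = ThreadCaps(permitted={"CAP_NET_BIND_SERVICE"},
                   inheritable={"CAP_NET_BIND_SERVICE"})
    t.raise_ambient("CAP_NET_BIND_SERVICE")
    t.lower_inheritable("CAP_NET_BIND_SERVICE")
    print(t.ambient)
```

              In a real program the ambient set is changed with prctl(2), as
              described above; the model only illustrates which transitions
              are allowed.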
431
432       A child created via fork(2) inherits copies of its parent's  capability
433       sets.  See below for a discussion of the treatment of capabilities dur‐
434       ing execve(2).
435
436       Using capset(2), a thread may manipulate its own capability  sets  (see
437       below).
438
439       Since  Linux  3.2,  the  file /proc/sys/kernel/cap_last_cap exposes the
440       numerical value of the highest capability supported by the running ker‐
441       nel; this can be used to determine the highest bit that may be set in a
442       capability set.
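
       As a rough illustration, the following Python sketch builds a mask of
       the valid capability bits from the value in cap_last_cap and reads a
       thread's capability sets.  It assumes the CapInh, CapPrm, CapEff,
       CapBnd, and CapAmb fields of /proc/[pid]/status, which are documented
       in proc(5) rather than on this page; the function names are
       illustrative.

```python
CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def valid_caps_mask(cap_last_cap):
    """Mask with one bit set for every capability number 0..cap_last_cap."""
    return (1 << (cap_last_cap + 1)) - 1


def parse_status_caps(status_text):
    """Extract the Cap* fields of a /proc/[pid]/status dump as integers."""
    caps = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in CAP_FIELDS:
            caps[key] = int(value.strip(), 16)   # fields are hexadecimal
    return caps


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/cap_last_cap") as f:
            last_cap = int(f.read())
        with open("/proc/self/status") as f:
            caps = parse_status_caps(f.read())
    except OSError:
        print("capability information is not available on this system")
    else:
        mask = valid_caps_mask(last_cap)
        for name, value in caps.items():
            # Bits above cap_last_cap are masked off before display.
            print(f"{name}: {value & mask:#018x}")
```

       With cap_last_cap equal to 37, for example, valid_caps_mask() yields
       0x3fffffffff, that is, bits 0 through 37.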
443
444   File capabilities
445       Since kernel 2.6.24, the kernel supports  associating  capability  sets
446       with  an executable file using setcap(8).  The file capability sets are
447       stored in an extended attribute (see setxattr(2)  and  xattr(7))  named
448       security.capability.   Writing  to this extended attribute requires the
449       CAP_SETFCAP capability.  The file capability sets, in conjunction  with
450       the  capability  sets  of  the  thread, determine the capabilities of a
451       thread after an execve(2).
452
453       The three file capability sets are:
454
455       Permitted (formerly known as forced):
456              These capabilities are automatically permitted  to  the  thread,
457              regardless of the thread's inheritable capabilities.
458
459       Inheritable (formerly known as allowed):
460              This set is ANDed with the thread's inheritable set to determine
461              which inheritable capabilities are enabled in the permitted  set
462              of the thread after the execve(2).
463
464       Effective:
465              This is not a set, but rather just a single bit.  If this bit is
466              set, then during an execve(2) all of the new permitted capabili‐
467              ties  for  the  thread are also raised in the effective set.  If
468              this bit is not set, then after an execve(2), none  of  the  new
469              permitted capabilities is in the new effective set.
470
              Enabling the file effective capability bit implies that any file
              permitted or inheritable capability that causes a thread to
              acquire the corresponding permitted capability during an
              execve(2) (see the transformation rules described below) also
              causes the thread to acquire that capability in its effective
              set.  Therefore, when assigning capabilities to a file
              (setcap(8), cap_set_file(3), cap_set_fd(3)), if the effective
              flag is specified as enabled for any capability, then it must
              also be specified as enabled for all other capabilities for
              which the corresponding permitted or inheritable flag is
              enabled.
482
483   File capability extended attribute versioning
484       To  allow  extensibility, the kernel supports a scheme to encode a ver‐
485       sion number inside the security.capability extended attribute  that  is
486       used  to implement file capabilities.  These version numbers are inter‐
487       nal to the implementation,  and  not  directly  visible  to  user-space
488       applications.  To date, the following versions are supported:
489
490       VFS_CAP_REVISION_1
491              This was the original file capability implementation, which sup‐
492              ported 32-bit masks for file capabilities.
493
494       VFS_CAP_REVISION_2 (since Linux 2.6.25)
495              This version allows for file capability masks that are  64  bits
496              in  size, and was necessary as the number of supported capabili‐
497              ties grew beyond 32.  The kernel transparently continues to sup‐
498              port  the execution of files that have 32-bit version 1 capabil‐
499              ity masks, but when adding capabilities to files  that  did  not
500              previously  have  capabilities, or modifying the capabilities of
501              existing files, it automatically uses the version 2  scheme  (or
502              possibly the version 3 scheme, as described below).
503
504       VFS_CAP_REVISION_3 (since Linux 4.14)
505              Version  3  file capabilities are provided to support namespaced
506              file capabilities (described below).
507
              As with version 2 file capabilities, version 3 capability masks
              are 64 bits in size.  In addition, the root user ID of the
              namespace is encoded in the security.capability extended
              attribute.  (A namespace's root user ID is the value that user
              ID 0 inside that namespace maps to in the initial user
              namespace.)
514
515              Version 3 file capabilities are designed to coexist with version
516              2 capabilities; that is, on a modern Linux system, there may  be
517              some files with version 2 capabilities while others have version
518              3 capabilities.
519
520       Before Linux 4.14, the only kind of file capability extended  attribute
521       that  could  be  attached to a file was a VFS_CAP_REVISION_2 attribute.
522       Since Linux 4.14,  the  version  of  the  security.capability  extended
523       attribute  that  is  attached to a file depends on the circumstances in
524       which the attribute was created.
525
526       Starting with Linux 4.14, a security.capability extended  attribute  is
527       automatically  created  as (or converted to) a version 3 (VFS_CAP_REVI‐
528       SION_3) attribute if both of the following are true:
529
530       (1) The thread writing the  attribute  resides  in  a  noninitial  user
531           namespace.  (More precisely: the thread resides in a user namespace
532           other than  the  one  from  which  the  underlying  filesystem  was
533           mounted.)
534
535       (2) The  thread  has  the  CAP_SETFCAP  capability over the file inode,
536           meaning that (a) the thread has the CAP_SETFCAP capability  in  its
537           own  user namespace; and (b) the UID and GID of the file inode have
538           mappings in the writer's user namespace.
539
540       When a VFS_CAP_REVISION_3  security.capability  extended  attribute  is
541       created,  the  root  user ID of the creating thread's user namespace is
542       saved in the extended attribute.
543
544       By contrast,  creating  or  modifying  a  security.capability  extended
545       attribute  from  a  privileged (CAP_SETFCAP) thread that resides in the
546       namespace where the underlying filesystem was  mounted  (this  normally
547       means the initial user namespace) automatically results in the creation
548       of a version 2 (VFS_CAP_REVISION_2) attribute.
549
550       Note that the creation of  a  version  3  security.capability  extended
551       attribute  is automatic.  That is to say, when a user-space application
552       writes (setxattr(2)) a security.capability attribute in the  version  2
553       format,  the  kernel will automatically create a version 3 attribute if
554       the attribute is created in the circumstances described above.   Corre‐
555       spondingly, when a version 3 security.capability attribute is retrieved
556       (getxattr(2)) by a process that resides inside a  user  namespace  that
557       was  created  by  the root user ID (or a descendant of that user names‐
558       pace), the returned attribute is (automatically) simplified  to  appear
559       as  a  version  2  attribute (i.e., the returned value is the size of a
       version 2 attribute and does not include the root user ID).  These
       automatic translations mean that no changes are required to user-space
       tools (e.g., setcap(8) and getcap(8)) in order for those tools to be
       used to create and retrieve version 3 security.capability attributes.
564
565       Note  that  a  file  can  have  either a version 2 or a version 3 secu‐
566       rity.capability extended attribute associated with it,  but  not  both:
567       creation  or modification of the security.capability extended attribute
568       will automatically modify the version according to the circumstances in
569       which the extended attribute is created or modified.
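
       For readers who want to inspect the raw attribute, the following
       Python sketch decodes a security.capability value.  The binary layout
       is an assumption taken from the kernel header <linux/capability.h>
       and is not specified on this page: a little-endian 32-bit word that
       holds the revision and the effective flag, followed by one (version
       1) or two (versions 2 and 3) pairs of 32-bit permitted and
       inheritable masks, followed in version 3 by a 32-bit root user ID.
       The function name is illustrative.

```python
import struct

# Constants as defined in <linux/capability.h> (assumed, see above).
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
REVISIONS = {0x01000000: 1, 0x02000000: 2, 0x03000000: 3}


def parse_security_capability(raw):
    """Decode the bytes of a security.capability extended attribute."""
    if len(raw) < 4:
        raise ValueError("attribute too short")
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    version = REVISIONS.get(magic_etc & VFS_CAP_REVISION_MASK)
    if version is None:
        raise ValueError("unknown revision")
    words = 1 if version == 1 else 2          # 32-bit words per mask
    expected = 4 + 8 * words + (4 if version == 3 else 0)
    if len(raw) != expected:
        raise ValueError("unexpected attribute size")
    permitted = inheritable = 0
    for i in range(words):
        low_p, low_i = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= low_p << (32 * i)
        inheritable |= low_i << (32 * i)
    rootid = None
    if version == 3:
        (rootid,) = struct.unpack_from("<I", raw, 4 + 8 * words)
    return {
        "version": version,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
        "rootid": rootid,
    }


if __name__ == "__main__":
    # A version 2 attribute granting bit 10 with the effective bit set.
    blob = struct.pack("<IIIII", 0x02000001, 0x400, 0, 0, 0)
    print(parse_security_capability(blob))
```

       On Linux, os.getxattr(path, "security.capability") returns such a
       value for a file that has one.  Because of the automatic translation
       described above, a reader may be shown a version 2 value even when a
       version 3 attribute is stored on disk.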
570
571   Transformation of capabilities during execve()
572       During  an execve(2), the kernel calculates the new capabilities of the
573       process using the following algorithm:
574
575           P'(ambient)     = (file is privileged) ? 0 : P(ambient)
576
577           P'(permitted)   = (P(inheritable) & F(inheritable)) |
578                             (F(permitted) & P(bounding)) | P'(ambient)
579
580           P'(effective)   = F(effective) ? P'(permitted) : P'(ambient)
581
582           P'(inheritable) = P(inheritable)    [i.e., unchanged]
583
584           P'(bounding)    = P(bounding)       [i.e., unchanged]
585
586       where:
587
588           P()   denotes the value of  a  thread  capability  set  before  the
589                 execve(2)
590
591           P'()  denotes  the  value  of  a  thread  capability  set after the
592                 execve(2)
593
594           F()   denotes a file capability set
595
596       Note the following details relating to the above capability transforma‐
597       tion rules:
598
599       *  The  ambient  capability  set is present only since Linux 4.3.  When
600          determining the transformation of the ambient set during  execve(2),
601          a  privileged file is one that has capabilities or has the set-user-
602          ID or set-group-ID bit set.
603
604       *  Prior to Linux 2.6.25, the bounding set was a system-wide  attribute
605          shared  by all threads.  That system-wide value was employed to cal‐
606          culate the new permitted set during execve(2) in the same manner  as
607          shown above for P(bounding).
608
609       Note: during the capability transitions described above, file capabili‐
610       ties may be ignored (treated as empty) for the same  reasons  that  the
611       set-user-ID  and  set-group-ID  bits  are ignored; see execve(2).  File
612       capabilities are similarly ignored if the kernel was  booted  with  the
613       no_file_caps option.
614
615       Note:  according to the rules above, if a process with nonzero user IDs
616       performs an execve(2) then any capabilities that  are  present  in  its
617       permitted  and  effective  sets  will be cleared.  For the treatment of
618       capabilities when a  process  with  a  user  ID  of  zero  performs  an
619       execve(2),  see  below  under Capabilities and execution of programs by
620       root.
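
       The algorithm above can be written out as a small model that treats
       each capability set as a 64-bit mask.  This is only a sketch of the
       formulas, not kernel code; the type and function names are
       hypothetical, and the special handling of root described later is
       not included.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 64-bit mask per capability set; bit N stands for capability N. */
struct thread_caps {
    uint64_t permitted, effective, inheritable, bounding, ambient;
};

struct file_caps {
    uint64_t permitted, inheritable;
    bool effective;    /* the single file effective bit */
    bool privileged;   /* has capabilities, or set-user-ID/set-group-ID */
};

/* Apply the execve(2) rules to P and F, returning P'. */
struct thread_caps execve_transform(struct thread_caps p, struct file_caps f)
{
    struct thread_caps n;

    n.ambient     = f.privileged ? 0 : p.ambient;
    n.permitted   = (p.inheritable & f.inheritable) |
                    (f.permitted & p.bounding) | n.ambient;
    n.effective   = f.effective ? n.permitted : n.ambient;
    n.inheritable = p.inheritable;   /* unchanged */
    n.bounding    = p.bounding;      /* unchanged */
    return n;
}
```

       With an all-zero F and an empty ambient set, the model yields empty
       permitted and effective sets, which is the case described in the
       last note above.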
621
622   Safety checking for capability-dumb binaries
623       A capability-dumb binary is an application that has been marked to have
624       file  capabilities, but has not been converted to use the libcap(3) API
625       to manipulate its capabilities.  (In other words, this is a traditional
626       set-user-ID-root  program  that has been switched to use file capabili‐
627       ties, but whose code has not been modified to understand capabilities.)
628       For such applications, the effective capability bit is set on the file,
629       so that the file permitted capabilities are  automatically  enabled  in
630       the  process  effective set when executing the file.  The kernel recog‐
631       nizes a file which has the effective capability bit set as  capability-
632       dumb for the purpose of the check described here.
633
634       When  executing  a  capability-dumb  binary,  the  kernel checks if the
635       process obtained all permitted capabilities that were specified in  the
636       file  permitted  set,  after  the  capability transformations described
637       above have been performed.  (The typical  reason  why  this  might  not
638       occur  is that the capability bounding set masked out some of the capa‐
639       bilities in the file permitted set.)  If the process did not obtain the
640       full  set of file permitted capabilities, then execve(2) fails with the
641       error EPERM.  This prevents possible security risks  that  could  arise
642       when a capability-dumb application is executed with less privilege than
643       it needs.  Note that, by definition, the application could  not  itself
644       recognize this problem, since it does not employ the libcap(3) API.
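
       The check described above can be modeled as follows.  This is a
       sketch over 64-bit masks, not the kernel's implementation; the
       function name is hypothetical, and new_permitted stands for
       P'(permitted) as computed by the transformation rules.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Return 0 if the exec may proceed, or -EPERM if a capability-dumb
 * binary would run without some of its file permitted capabilities. */
int dumb_binary_check(uint64_t file_permitted, bool file_effective,
                      uint64_t new_permitted)
{
    if (!file_effective)
        return 0;              /* not capability-dumb: no check */
    if ((file_permitted & ~new_permitted) != 0)
        return -EPERM;         /* some file permitted bit was lost */
    return 0;
}
```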
645
646   Capabilities and execution of programs by root
647       In order to mirror traditional UNIX semantics, the kernel performs spe‐
648       cial treatment of file capabilities when a process with  UID  0  (root)
649       executes a program and when a set-user-ID-root program is executed.
650
651       After  having  performed  any  changes to the process effective ID that
652       were triggered by the set-user-ID mode bit of the binary—e.g.,  switch‐
653       ing  the  effective user ID to 0 (root) because a set-user-ID-root pro‐
654       gram was executed—the kernel calculates the  file  capability  sets  as
655       follows:
656
657       1. If  the  real  or effective user ID of the process is 0 (root), then
658          the file inheritable and permitted sets are  ignored;  instead  they
659          are  notionally  considered  to  be all ones (i.e., all capabilities
660          enabled).  (There is one exception to this behavior, described below
661          in Set-user-ID-root programs that have file capabilities.)
662
663       2. If  the  effective  user  ID  of the process is 0 (root) or the file
664          effective bit is in fact enabled, then the  file  effective  bit  is
665          notionally defined to be one (enabled).
666
667       These  notional  values for the file's capability sets are then used as
668       described above to calculate the transformation of the process's  capa‐
669       bilities during execve(2).
670
671       Thus,  when  a  process with nonzero UIDs execve(2)s a set-user-ID-root
672       program that does not have capabilities attached,  or  when  a  process
673       whose real and effective UIDs are zero execve(2)s a program, the calcu‐
674       lation of the process's new permitted capabilities simplifies to:
675
676           P'(permitted)   = P(inheritable) | P(bounding)
677
678           P'(effective)   = P'(permitted)
679
680       Consequently, the process gains all capabilities in its  permitted  and
681       effective  capability  sets,  except those masked out by the capability
682       bounding set.  (In the calculation of  P'(permitted),  the  P'(ambient)
683       term can be simplified away because it is by definition a subset
684       of P(inheritable).)
685
686       The special treatments of user ID 0 (root) described in this subsection
687       can be disabled using the securebits mechanism described below.
688
689   Set-user-ID-root programs that have file capabilities
690       There is one exception to the behavior described under Capabilities and
691       execution of programs by root.  If (a) the binary that  is  being  exe‐
692       cuted has capabilities attached and (b) the real user ID of the process
693       is not 0 (root) and (c) the effective user  ID  of  the  process  is  0
694       (root),  then  the file capability bits are honored (i.e., they are not
695       notionally considered to be all ones).  The usual  way  in  which  this
696       situation  can arise is when executing a set-UID-root program that also
697       has file capabilities.  When such a program is  executed,  the  process
698       gains just the capabilities granted by the program (i.e., not all capa‐
699       bilities, as would occur when executing a set-user-ID-root program that
700       does not have any associated file capabilities).
701
702       Note  that  one can assign empty capability sets to a program file, and
703       thus it is possible to create a set-user-ID-root program  that  changes
704       the  effective  and  saved set-user-ID of the process that executes the
705       program to 0, but confers no capabilities to that process.
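
       The notional file capability sets used for root, together with the
       exception described in this subsection, can be sketched as below.
       The names are hypothetical and the securebits flags are assumed to
       be clear.  The sketch also assumes that, in the exception case, the
       file effective bit is used as stored; the text above does not spell
       this out.

```c
#include <stdbool.h>
#include <stdint.h>

#define ALL_CAPS UINT64_MAX   /* "all ones" */

struct file_caps {
    uint64_t permitted, inheritable;
    bool effective;
    bool has_caps;   /* a security.capability attribute is attached */
};

/* ruid and euid are the process IDs after any set-user-ID switch. */
struct file_caps notional_file_caps(struct file_caps f,
                                    unsigned ruid, unsigned euid)
{
    /* Exception: set-user-ID-root program that has file capabilities.
     * Assumption: the effective bit is also left as stored. */
    bool honor_file_caps = f.has_caps && ruid != 0 && euid == 0;

    if (!honor_file_caps && (ruid == 0 || euid == 0)) {   /* rule 1 */
        f.permitted   = ALL_CAPS;
        f.inheritable = ALL_CAPS;
    }
    if (!honor_file_caps && euid == 0)                    /* rule 2 */
        f.effective = true;
    return f;
}
```

       The returned sets would then be fed into the execve(2)
       transformation rules in place of the values stored on the file.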
706
707   Capability bounding set
708       The capability bounding set is a security mechanism that can be used to
709       limit  the  capabilities  that  can be gained during an execve(2).  The
710       bounding set is used in the following ways:
711
712       * During an execve(2), the capability bounding set is  ANDed  with  the
713         file  permitted  capability  set, and the result of this operation is
714         assigned to the thread's permitted capability  set.   The  capability
715         bounding  set  thus places a limit on the permitted capabilities that
716         may be granted by an executable file.
717
718       * (Since Linux 2.6.25) The capability bounding set acts as  a  limiting
719         superset  for the capabilities that a thread can add to its inherita‐
720         ble set using capset(2).  This means that if a capability is  not  in
721         the  bounding  set,  then  a  thread can't add this capability to its
722         inheritable set, even if it was in its  permitted  capabilities,  and
723         thereby  cannot  have  this capability preserved in its permitted set
724         when it execve(2)s a file that has the capability in its  inheritable
725         set.
726
727       Note  that  the bounding set masks the file permitted capabilities, but
728       not the inheritable capabilities.  If a thread maintains  a  capability
729       in  its  inheritable  set  that is not in its bounding set, then it can
730       still gain that capability in its permitted set  by  executing  a  file
731       that has the capability in its inheritable set.
732
733       Depending  on the kernel version, the capability bounding set is either
734       a system-wide attribute or a per-thread attribute.
735
736       Capability bounding set from Linux 2.6.25 onward
737
738       From  Linux  2.6.25,  the  capability  bounding  set  is  a  per-thread
739       attribute.  (The system-wide capability bounding set described below no
740       longer exists.)
741
742       The bounding set is inherited at fork(2) from the thread's parent,  and
743       is preserved across an execve(2).
744
745       A thread may remove capabilities from its capability bounding set using
746       the prctl(2) PR_CAPBSET_DROP operation, provided it has the CAP_SETPCAP
747       capability.   Once a capability has been dropped from the bounding set,
748       it cannot be restored to that set.  A thread can determine if  a  capa‐
749       bility is in its bounding set using the prctl(2) PR_CAPBSET_READ opera‐
750       tion.
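
       A minimal sketch of the two prctl(2) operations just mentioned
       follows.  The wrapper names are hypothetical; the casts keep the
       variadic arguments at the width the system call expects.

```c
#include <linux/capability.h>   /* CAP_* numbers */
#include <sys/prctl.h>          /* prctl(), PR_CAPBSET_* */

/* Return 1 if cap is in the bounding set, 0 if not, -1 on error. */
int bounding_set_has(int cap)
{
    return prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL);
}

/* Permanently remove cap from the calling thread's bounding set.
 * Returns 0 on success, or -1 with errno set (EPERM if the caller
 * lacks CAP_SETPCAP). */
int bounding_set_drop(int cap)
{
    return prctl(PR_CAPBSET_DROP, (unsigned long)cap, 0UL, 0UL, 0UL);
}
```

       For example, bounding_set_drop(CAP_SYS_MODULE) followed by
       bounding_set_has(CAP_SYS_MODULE) would be expected to report 0 in a
       thread that holds CAP_SETPCAP.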
751
752       Removing capabilities from the bounding set is supported only  if  file
753       capabilities  are  compiled  into  the kernel.  In kernels before Linux
754       2.6.33, file capabilities were an optional feature configurable via the
755       CONFIG_SECURITY_FILE_CAPABILITIES option.  Since Linux 2.6.33, the con‐
756       figuration option has been removed and  file  capabilities  are  always
757       part  of the kernel.  When file capabilities are compiled into the ker‐
758       nel, the init process (the ancestor of all  processes)  begins  with  a
759       full bounding set.  If file capabilities are not compiled into the ker‐
760       nel, then init begins with  a  full  bounding  set  minus  CAP_SETPCAP,
761       because  this capability has a different meaning when there are no file
762       capabilities.
763
764       Removing a capability from the bounding set does not remove it from the
765       thread's  inheritable set.  However it does prevent the capability from
766       being added back into the thread's inheritable set in the future.
767
768       Capability bounding set prior to Linux 2.6.25
769
770       In kernels before 2.6.25, the capability bounding set is a  system-wide
771       attribute  that affects all threads on the system.  The bounding set is
772       accessible via the file /proc/sys/kernel/cap-bound.  (Confusingly, this
773       bit  mask  parameter  is  expressed  as  a  signed  decimal  number  in
774       /proc/sys/kernel/cap-bound.)
775
776       Only the init process may set capabilities in the  capability  bounding
777       set; other than that, the superuser (more precisely: a process with the
778       CAP_SYS_MODULE capability) may only clear capabilities from this set.
779
780       On a standard system the capability bounding set always masks  out  the
781       CAP_SETPCAP  capability.  To remove this restriction (dangerous!), mod‐
782       ify the definition of  CAP_INIT_EFF_SET  in  include/linux/capability.h
783       and rebuild the kernel.
784
785       The  system-wide  capability  bounding  set  feature was added to Linux
786       starting with kernel version 2.2.11.
787
788   Effect of user ID changes on capabilities
789       To preserve the traditional semantics for  transitions  between  0  and
790       nonzero  user IDs, the kernel makes the following changes to a thread's
791       capability sets on changes to the thread's real, effective, saved  set,
792       and filesystem user IDs (using setuid(2), setresuid(2), or similar):
793
794       1. If one or more of the real, effective or saved set user IDs was pre‐
795          viously 0, and as a result of the UID changes all of these IDs  have
796          a  nonzero value, then all capabilities are cleared from the permit‐
797          ted, effective, and ambient capability sets.
798
799       2. If the effective user ID is changed from  0  to  nonzero,  then  all
800          capabilities are cleared from the effective set.
801
802       3. If the effective user ID is changed from nonzero to 0, then the per‐
803          mitted set is copied to the effective set.
804
805       4. If the filesystem user ID is changed from 0 to  nonzero  (see  setf‐
806          suid(2)),  then  the  following  capabilities  are  cleared from the
807          effective  set:  CAP_CHOWN,  CAP_DAC_OVERRIDE,  CAP_DAC_READ_SEARCH,
808          CAP_FOWNER,  CAP_FSETID,  CAP_LINUX_IMMUTABLE  (since Linux 2.6.30),
809          CAP_MAC_OVERRIDE,  and  CAP_MKNOD  (since  Linux  2.6.30).   If  the
810          filesystem UID is changed from nonzero to 0, then any of these capa‐
811          bilities that are enabled in the permitted set are  enabled  in  the
812          effective set.
813
814       If a thread that has a 0 value for one or more of its user IDs wants to
815       prevent its permitted capability set being cleared when it  resets  all
816       of   its   user  IDs  to  nonzero  values,  it  can  do  so  using  the
817       SECBIT_KEEP_CAPS securebits flag described below.
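
       Rules 1 to 3 above can be modeled as shown below.  This is a sketch
       with hypothetical names; it assumes that neither SECBIT_KEEP_CAPS
       nor SECBIT_NO_SETUID_FIXUP is set, and it leaves out the filesystem
       user ID handling of rule 4.

```c
#include <stdint.h>

struct uids { unsigned real, effective, saved; };
struct caps { uint64_t permitted, effective, ambient; };

static int any_zero(struct uids u)
{
    return u.real == 0 || u.effective == 0 || u.saved == 0;
}

/* Apply rules 1 to 3 for a change from user IDs "old" to "now". */
struct caps uid_change_fixup(struct caps c, struct uids old, struct uids now)
{
    if (any_zero(old) && !any_zero(now)) {         /* rule 1 */
        c.permitted = 0;
        c.effective = 0;
        c.ambient   = 0;
    }
    if (old.effective == 0 && now.effective != 0)  /* rule 2 */
        c.effective = 0;
    if (old.effective != 0 && now.effective == 0)  /* rule 3 */
        c.effective = c.permitted;
    return c;
}
```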
818
819   Programmatically adjusting capability sets
820       A thread can retrieve and change its permitted, effective, and  inheri‐
821       table  capability  sets using the capget(2) and capset(2) system calls.
822       However, the use of cap_get_proc(3) and cap_set_proc(3), both  provided
823       in  the  libcap  package, is preferred for this purpose.  The following
824       rules govern changes to the thread capability sets:
825
826       1. If the caller does not have  the  CAP_SETPCAP  capability,  the  new
827          inheritable  set must be a subset of the combination of the existing
828          inheritable and permitted sets.
829
830       2. (Since Linux 2.6.25) The new inheritable set must be a subset of the
831          combination  of  the  existing  inheritable  set  and the capability
832          bounding set.
833
834       3. The new permitted set must be a subset of the existing permitted set
835          (i.e., it is not possible to acquire permitted capabilities that the
836          thread does not currently have).
837
838       4. The new effective set must be a subset of the new permitted set.
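
       The four rules can be expressed as subset tests on bit masks.  The
       following sketch uses hypothetical names and is a model of the
       rules, not a call to capset(2).

```c
#include <stdbool.h>
#include <stdint.h>

struct cap_sets { uint64_t permitted, effective, inheritable; };

static bool is_subset(uint64_t a, uint64_t b)
{
    return (a & ~b) == 0;
}

/* Return true if a change from "old" to "want" satisfies rules 1-4. */
bool capset_allowed(struct cap_sets old, struct cap_sets want,
                    uint64_t bounding, bool has_setpcap)
{
    if (!has_setpcap &&
        !is_subset(want.inheritable, old.inheritable | old.permitted))
        return false;                                     /* rule 1 */
    if (!is_subset(want.inheritable, old.inheritable | bounding))
        return false;                                     /* rule 2 */
    if (!is_subset(want.permitted, old.permitted))
        return false;                                     /* rule 3 */
    return is_subset(want.effective, want.permitted);     /* rule 4 */
}
```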
839
840   The securebits flags: establishing a capabilities-only environment
841       Starting with kernel 2.6.26, and with a kernel in which file  capabili‐
842       ties are enabled, Linux implements a set of per-thread securebits flags
843       that can be used to disable special handling of capabilities for UID  0
844       (root).  These flags are as follows:
845
846       SECBIT_KEEP_CAPS
847              Setting this flag allows a thread that has one or more 0 UIDs to
848              retain capabilities in its permitted set when it switches all of
849              its  UIDs to nonzero values.  If this flag is not set, then such
850              a UID switch causes the thread to lose all  permitted  capabili‐
851              ties.  This flag is always cleared on an execve(2).
852
853              Note that even with the SECBIT_KEEP_CAPS flag set, the effective
854              capabilities of a thread are cleared when it switches its effec‐
855              tive  UID  to  a  nonzero value.  However, if the thread has set
856              this flag and its effective UID  is  already  nonzero,  and  the
857              thread  subsequently  switches all other UIDs to nonzero values,
858              then the effective capabilities will not be cleared.
859
860              The setting of the  SECBIT_KEEP_CAPS  flag  is  ignored  if  the
861              SECBIT_NO_SETUID_FIXUP flag is set.  (The latter flag provides a
862              superset of the effect of the former flag.)
863
864              This flag provides the same functionality as the older  prctl(2)
865              PR_SET_KEEPCAPS operation.
866
867       SECBIT_NO_SETUID_FIXUP
868              Setting  this flag stops the kernel from adjusting the process's
869              permitted, effective,  and  ambient  capability  sets  when  the
870              thread's effective and filesystem UIDs are switched between zero
871              and nonzero values.  (See  the  subsection  Effect  of  user  ID
872              changes on capabilities.)
873
874       SECBIT_NOROOT
875              If  this bit is set, then the kernel does not grant capabilities
876              when a set-user-ID-root program is executed, or when  a  process
877              with  an  effective  or real UID of 0 calls execve(2).  (See the
878              subsection Capabilities and execution of programs by root.)
879
880       SECBIT_NO_CAP_AMBIENT_RAISE
881              Setting this flag disallows raising ambient capabilities via the
882              prctl(2) PR_CAP_AMBIENT_RAISE operation.
883
884       Each  of the above "base" flags has a companion "locked" flag.  Setting
885       any of the "locked" flags is irreversible, and has the effect  of  pre‐
886       venting  further  changes to the corresponding "base" flag.  The locked
887       flags  are:   SECBIT_KEEP_CAPS_LOCKED,   SECBIT_NO_SETUID_FIXUP_LOCKED,
888       SECBIT_NOROOT_LOCKED, and SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED.
889
890       The  securebits  flags can be modified and retrieved using the prctl(2)
891       PR_SET_SECUREBITS and PR_GET_SECUREBITS  operations.   The  CAP_SETPCAP
892       capability  is  required  to  modify the flags.  Note that the SECBIT_*
893       constants are available only after including  the  <linux/securebits.h>
894       header file.
895
896       The  securebits  flags  are  inherited  by  child processes.  During an
897       execve(2), all of the  flags  are  preserved,  except  SECBIT_KEEP_CAPS
898       which is always cleared.
899
900       An  application  can  use the following call to lock itself, and all of
901       its descendants, into an environment where  the  only  way  of  gaining
902       capabilities  is  by executing a program with associated file capabili‐
903       ties:
904
905           prctl(PR_SET_SECUREBITS,
906                   /* SECBIT_KEEP_CAPS off */
907                   SECBIT_KEEP_CAPS_LOCKED |
908                   SECBIT_NO_SETUID_FIXUP |
909                   SECBIT_NO_SETUID_FIXUP_LOCKED |
910                   SECBIT_NOROOT |
911                   SECBIT_NOROOT_LOCKED);
912                   /* Setting/locking SECBIT_NO_CAP_AMBIENT_RAISE
913                      is not required */
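
       To inspect the result of such a call, the flags can be read back
       with PR_GET_SECUREBITS.  The sketch below uses hypothetical helper
       names; securebits_locked_down() simply tests for the exact
       combination used in the call above.

```c
#include <linux/securebits.h>   /* SECBIT_* constants */
#include <stdbool.h>
#include <sys/prctl.h>          /* prctl(), PR_GET_SECUREBITS */

/* Return the calling thread's securebits flags, or -1 on error. */
int securebits_get(void)
{
    return prctl(PR_GET_SECUREBITS, 0UL, 0UL, 0UL, 0UL);
}

/* Report whether "bits" matches the lock-down call shown above:
 * root handling disabled and locked, SECBIT_KEEP_CAPS off and locked. */
bool securebits_locked_down(int bits)
{
    const int required = SECBIT_KEEP_CAPS_LOCKED |
                         SECBIT_NO_SETUID_FIXUP |
                         SECBIT_NO_SETUID_FIXUP_LOCKED |
                         SECBIT_NOROOT |
                         SECBIT_NOROOT_LOCKED;

    return (bits & required) == required &&
           (bits & SECBIT_KEEP_CAPS) == 0;
}
```

       A caller would typically pass the result of securebits_get() to
       securebits_locked_down() after checking that it is not -1.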
914
915   Per-user-namespace "set-user-ID-root" programs
916       A set-user-ID program whose UID matches the UID  that  created  a  user
917       namespace  will  confer  capabilities  in  the  process's permitted and
918       effective sets when executed by any process inside  that  namespace  or
919       any descendant user namespace.
920
921       The rules about the transformation of the process's capabilities during
922       the execve(2) are exactly as described in the  subsections  Transforma‐
923       tion  of capabilities during execve() and Capabilities and execution of
924       programs by root, with the difference that, in the  latter  subsection,
925       "root" is the UID of the creator of the user namespace.
926
927   Namespaced file capabilities
928       Traditional (i.e., version 2) file capabilities associate only a set of
929       capability masks with a binary executable file.  When  a  process  exe‐
930       cutes a binary with such capabilities, it gains the associated capabil‐
931       ities (within its user namespace) as per the rules described  above  in
932       "Transformation of capabilities during execve()".
933
934       Because  version 2 file capabilities confer capabilities to the execut‐
935       ing process regardless of which user  namespace  it  resides  in,  only
936       privileged  processes  are  permitted  to associate capabilities with a
937       file.  Here, "privileged" means a  process  that  has  the  CAP_SETFCAP
938       capability in the user namespace where the filesystem was mounted (nor‐
939       mally the initial user namespace).  This limitation renders file  capa‐
940       bilities  useless  for  certain use cases.  For example, in user-names‐
941       paced containers, it can be desirable to be able  to  create  a  binary
942       that  confers  capabilities only to processes executed inside that con‐
943       tainer, but not to processes that are executed outside the container.
944
945       Linux 4.14 added so-called namespaced file capabilities to support such
946       use  cases.   Namespaced  file  capabilities  are recorded as version 3
947       (i.e.,  VFS_CAP_REVISION_3)  security.capability  extended  attributes.
948       Such  an  attribute  is  automatically  created  in  the  circumstances
949       described above under "File capability extended attribute  versioning".
950       When a version 3 security.capability extended attribute is created, the
951       kernel records not just the capability masks in the extended attribute,
952       but also the namespace root user ID.
953
954       As  with  a  binary  that  has  VFS_CAP_REVISION_2 file capabilities, a
955       binary with VFS_CAP_REVISION_3 file capabilities  confers  capabilities
956       to a process during execve().  However, capabilities are conferred only
957       if the binary is executed by a process that resides in a user namespace
958       whose  UID  0  maps  to  the root user ID that is saved in the extended
959       attribute, or when executed by a process that resides in  a  descendant
960       of such a namespace.
961
962   Interaction with user namespaces
963       For  further  information  on  the interaction of capabilities and user
964       namespaces, see user_namespaces(7).
965

CONFORMING TO

967       No standards govern capabilities, but the Linux capability  implementa‐
968       tion   is   based   on  the  withdrawn  POSIX.1e  draft  standard;  see
969       ⟨https://archive.org/details/posix_1003.1e-990310⟩.
970

NOTES

972       When attempting to strace(1) binaries that have capabilities  (or  set-
973       user-ID-root  binaries),  you may find the -u <username> option useful.
974       For example:
975
976           $ sudo strace -o trace.log -u ceci ./myprivprog
977
978       From kernel 2.5.27 to kernel 2.6.26, capabilities were an optional ker‐
979       nel  component,  and  could  be  enabled/disabled  via the CONFIG_SECU‐
980       RITY_CAPABILITIES kernel configuration option.
981
982       The /proc/[pid]/task/TID/status file can be used to view the capability
983       sets  of  a  thread.   The /proc/[pid]/status file shows the capability
984       sets of a process's main thread.  Before Linux 3.8,  nonexistent  capa‐
985       bilities  were  shown  as being enabled (1) in these sets.  Since Linux
986       3.8, all nonexistent capabilities (above  CAP_LAST_CAP)  are  shown  as
987       disabled (0).
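
       A program can read these sets by scanning the status file for the
       CapInh, CapPrm, CapEff, and CapBnd fields, each of which holds a
       hexadecimal mask.  The following is a minimal sketch; the function
       name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Look up a field such as "CapEff" in a /proc/[pid]/status-style file.
 * Returns 0 and stores the mask in *mask, or -1 if it is not found. */
int status_cap_field(const char *path, const char *field,
                     unsigned long long *mask)
{
    char line[256];
    size_t n = strlen(field);
    int found = -1;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (found != 0 && fgets(line, sizeof(line), fp) != NULL) {
        /* A matching line looks like "CapEff:<tab><hex digits>". */
        if (strncmp(line, field, n) == 0 && line[n] == ':' &&
            sscanf(line + n + 1, "%llx", mask) == 1)
            found = 0;
    }
    fclose(fp);
    return found;
}
```

       For instance, status_cap_field("/proc/self/status", "CapEff", &m)
       would store the effective set of the calling process's main thread
       in m.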
988
989       The libcap package provides a suite of routines for setting and getting
990       capabilities that is more comfortable and less likely  to  change  than
991       the  interface  provided by capset(2) and capget(2).  This package also
992       provides the setcap(8) and getcap(8) programs.  It can be found at
993       ⟨https://git.kernel.org/pub/scm/libs/libcap/libcap.git/refs/⟩.
994
995       Before kernel 2.6.24, and from kernel 2.6.24 to kernel 2.6.32  if  file
996       capabilities  are not enabled, a thread with the CAP_SETPCAP capability
997       can manipulate the capabilities of threads other than itself.  However,
998       this is only theoretically possible, since no thread ever has CAP_SETP‐
999       CAP in either of these cases:
1000
1001       * In the pre-2.6.25 implementation the system-wide capability  bounding
1002         set,  /proc/sys/kernel/cap-bound,  always  masks  out the CAP_SETPCAP
1003         capability, and this cannot be changed without modifying  the  kernel
1004         source and rebuilding the kernel.
1005
1006       * If  file  capabilities  are  disabled  (i.e., the kernel CONFIG_SECU‐
1007         RITY_FILE_CAPABILITIES option is disabled), then init starts out with
1008         the CAP_SETPCAP capability removed from its per-process bounding set,
1009         and that bounding set is inherited by all other processes created  on
1010         the system.
1011

COLOPHON

1022       This  page  is  part of release 5.07 of the Linux man-pages project.  A
1023       description of the project, information about reporting bugs,  and  the
1024       latest     version     of     this    page,    can    be    found    at
1025       https://www.kernel.org/doc/man-pages/.
1026
1027
1028
1029Linux                             2019-08-02                   CAPABILITIES(7)
