Capabilities(7)        Miscellaneous Information Manual        Capabilities(7)

NAME
       capabilities - overview of Linux capabilities

DESCRIPTION
       For  the  purpose of performing permission checks, traditional UNIX im‐
       plementations distinguish two categories of processes: privileged  pro‐
       cesses  (whose  effective  user  ID  is  0, referred to as superuser or
       root), and unprivileged processes (whose  effective  UID  is  nonzero).
       Privileged processes bypass all kernel permission checks, while unpriv‐
       ileged processes are subject to full permission checking based  on  the
       process's  credentials (usually: effective UID, effective GID, and sup‐
       plementary group list).

       Starting with Linux 2.2, Linux divides the privileges traditionally as‐
       sociated  with  superuser  into  distinct units, known as capabilities,
       which can be independently enabled and disabled.   Capabilities  are  a
       per-thread attribute.

   Capabilities list
       The following list shows the capabilities implemented on Linux, and the
       operations or behaviors that each capability permits:

       CAP_AUDIT_CONTROL (since Linux 2.6.11)
              Enable and  disable  kernel  auditing;  change  auditing  filter
              rules; retrieve auditing status and filtering rules.

       CAP_AUDIT_READ (since Linux 3.16)
              Allow reading the audit log via a multicast netlink socket.

       CAP_AUDIT_WRITE (since Linux 2.6.11)
              Write records to kernel auditing log.

       CAP_BLOCK_SUSPEND (since Linux 3.5)
              Employ  features  that can block system suspend (epoll(7) EPOLL‐
              WAKEUP, /proc/sys/wake_lock).

       CAP_BPF (since Linux 5.8)
              Employ privileged BPF operations; see bpf(2) and bpf-helpers(7).

              This capability was added in Linux 5.8 to separate out BPF func‐
              tionality from the overloaded CAP_SYS_ADMIN capability.

       CAP_CHECKPOINT_RESTORE (since Linux 5.9)
              •  Update /proc/sys/kernel/ns_last_pid (see pid_namespaces(7));
              •  employ the set_tid feature of clone3(2);
              •  read    the    contents    of    the    symbolic   links   in
                 /proc/pid/map_files for other processes.

              This capability was added in Linux 5.9 to  separate  out  check‐
              point/restore  functionality  from  the overloaded CAP_SYS_ADMIN
              capability.

       CAP_CHOWN
              Make arbitrary changes to file UIDs and GIDs (see chown(2)).

       CAP_DAC_OVERRIDE
              Bypass file read, write, and execute permission checks.  (DAC is
              an abbreviation of "discretionary access control".)

       CAP_DAC_READ_SEARCH
              •  Bypass file read permission checks and directory read and ex‐
                 ecute permission checks;
              •  invoke open_by_handle_at(2);
              •  use the linkat(2) AT_EMPTY_PATH flag to create a  link  to  a
                 file referred to by a file descriptor.

       CAP_FOWNER
              •  Bypass  permission checks on operations that normally require
                 the filesystem UID of the process to match  the  UID  of  the
                 file  (e.g.,  chmod(2), utime(2)), excluding those operations
                 covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH;
              •  set inode flags (see ioctl_iflags(2)) on arbitrary files;
              •  set Access Control Lists (ACLs) on arbitrary files;
              •  ignore directory sticky bit on file deletion;
              •  modify user extended attributes on sticky directory owned  by
                 any user;
              •  specify  O_NOATIME  for  arbitrary  files  in open(2) and fc‐
                 ntl(2).

       CAP_FSETID
              •  Don't clear set-user-ID and set-group-ID  mode  bits  when  a
                 file is modified;
              •  set  the set-group-ID bit for a file whose GID does not match
                 the filesystem or any of the supplementary GIDs of the  call‐
                 ing process.

       CAP_IPC_LOCK
              •  Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2));
              •  allocate  memory  using huge pages (memfd_create(2), mmap(2),
                 shmctl(2)).

       CAP_IPC_OWNER
              Bypass permission checks for operations on System V IPC objects.

       CAP_KILL
              Bypass permission checks  for  sending  signals  (see  kill(2)).
              This includes use of the ioctl(2) KDSIGACCEPT operation.

       CAP_LEASE (since Linux 2.4)
              Establish leases on arbitrary files (see fcntl(2)).

       CAP_LINUX_IMMUTABLE
              Set  the  FS_APPEND_FL  and  FS_IMMUTABLE_FL  inode  flags  (see
              ioctl_iflags(2)).

       CAP_MAC_ADMIN (since Linux 2.6.25)
              Allow MAC configuration or state changes.  Implemented  for  the
              Smack Linux Security Module (LSM).

       CAP_MAC_OVERRIDE (since Linux 2.6.25)
              Override  Mandatory  Access  Control (MAC).  Implemented for the
              Smack LSM.

       CAP_MKNOD (since Linux 2.4)
              Create special files using mknod(2).

       CAP_NET_ADMIN
              Perform various network-related operations:
              •  interface configuration;
              •  administration of IP firewall, masquerading, and accounting;
              •  modify routing tables;
              •  bind to any address for transparent proxying;
              •  set type-of-service (TOS);
              •  clear driver statistics;
              •  set promiscuous mode;
              •  enable multicasting;
              •  use setsockopt(2) to set the following socket options: SO_DE‐
                 BUG, SO_MARK, SO_PRIORITY (for a priority outside the range 0
                 to 6), SO_RCVBUFFORCE, and SO_SNDBUFFORCE.

       CAP_NET_BIND_SERVICE
              Bind a socket to Internet domain privileged ports (port  numbers
              less than 1024).

       CAP_NET_BROADCAST
              (Unused)  Make socket broadcasts, and listen to multicasts.

       CAP_NET_RAW
              •  Use RAW and PACKET sockets;
              •  bind to any address for transparent proxying.

       CAP_PERFMON (since Linux 5.8)
              Employ various performance-monitoring mechanisms, including:

              •  call perf_event_open(2);
              •  employ  various BPF operations that have performance implica‐
                 tions.

              This capability was added in Linux 5.8 to separate  out  perfor‐
              mance monitoring functionality from the overloaded CAP_SYS_ADMIN
              capability.  See also the kernel source  file  Documentation/ad‐
              min-guide/perf-security.rst.

       CAP_SETGID
              •  Make  arbitrary  manipulations of process GIDs and supplemen‐
                 tary GID list;
              •  forge GID when passing socket  credentials  via  UNIX  domain
                 sockets;
              •  write  a group ID mapping in a user namespace (see user_name‐
                 spaces(7)).

       CAP_SETFCAP (since Linux 2.6.24)
              Set arbitrary capabilities on a file.

              Since Linux 5.12, this capability is also needed to map user  ID
              0 in a new user namespace; see user_namespaces(7) for details.

       CAP_SETPCAP
              If  file  capabilities are supported (i.e., since Linux 2.6.24):
              add any capability from the calling thread's bounding set to its
              inheritable  set;  drop  capabilities from the bounding set (via
              prctl(2) PR_CAPBSET_DROP); make changes to the securebits flags.

              If file capabilities  are  not  supported  (i.e.,  before  Linux
              2.6.24):  grant or remove any capability in the caller's permit‐
              ted capability set to or from any other process.  (This property
              of CAP_SETPCAP is not available when the kernel is configured to
              support file capabilities, since CAP_SETPCAP has  entirely  dif‐
              ferent semantics for such kernels.)

       CAP_SETUID
              •  Make  arbitrary manipulations of process UIDs (setuid(2), se‐
                 treuid(2), setresuid(2), setfsuid(2));
              •  forge UID when passing socket  credentials  via  UNIX  domain
                 sockets;
              •  write  a  user ID mapping in a user namespace (see user_name‐
                 spaces(7)).

       CAP_SYS_ADMIN
              Note: this capability is overloaded; see Notes to kernel  devel‐
              opers below.

              •  Perform  a  range of system administration operations includ‐
                 ing:   quotactl(2),   mount(2),   umount(2),   pivot_root(2),
                 swapon(2), swapoff(2), sethostname(2), and setdomainname(2);
              •  perform  privileged syslog(2) operations (since Linux 2.6.37,
                 CAP_SYSLOG should be used to permit such operations);
              •  perform VM86_REQUEST_IRQ vm86(2) command;
              •  access the same checkpoint/restore functionality that is gov‐
                 erned by CAP_CHECKPOINT_RESTORE (but the latter, weaker capa‐
                 bility is preferred for accessing that functionality);
              •  perform the same BPF operations as are  governed  by  CAP_BPF
                 (but the latter, weaker capability is preferred for accessing
                 that functionality);
              •  employ the same performance monitoring mechanisms as are gov‐
                 erned  by  CAP_PERFMON  (but the latter, weaker capability is
                 preferred for accessing that functionality);
              •  perform IPC_SET and IPC_RMID operations on arbitrary System V
                 IPC objects;
              •  override RLIMIT_NPROC resource limit;
              •  perform  operations  on  trusted  and  security  extended at‐
                 tributes (see xattr(7));
              •  use lookup_dcookie(2);
              •  use ioprio_set(2) to assign IOPRIO_CLASS_RT and (before Linux
                 2.6.25) IOPRIO_CLASS_IDLE I/O scheduling classes;
              •  forge  PID  when  passing  socket credentials via UNIX domain
                 sockets;
              •  exceed /proc/sys/fs/file-max, the system-wide  limit  on  the
                 number  of open files, in system calls that open files (e.g.,
                 accept(2), execve(2), open(2), pipe(2));
              •  employ CLONE_* flags that create new namespaces with clone(2)
                 and  unshare(2)  (but,  since  Linux 3.8, creating user name‐
                 spaces does not require any capability);
              •  access privileged perf event information;
              •  call setns(2) (requires CAP_SYS_ADMIN  in  the  target  name‐
                 space);
              •  call fanotify_init(2);
              •  perform  privileged KEYCTL_CHOWN and KEYCTL_SETPERM keyctl(2)
                 operations;
              •  perform madvise(2) MADV_HWPOISON operation;
              •  employ the TIOCSTI ioctl(2) to insert characters into the in‐
                 put  queue  of a terminal other than the caller's controlling
                 terminal;
              •  employ the obsolete nfsservctl(2) system call;
              •  employ the obsolete bdflush(2) system call;
              •  perform various privileged block-device ioctl(2) operations;
              •  perform various privileged filesystem ioctl(2) operations;
              •  perform privileged ioctl(2) operations on the /dev/random de‐
                 vice (see random(4));
              •  install  a  seccomp(2) filter without first having to set the
                 no_new_privs thread attribute;
              •  modify allow/deny rules for device control groups;
              •  employ the ptrace(2) PTRACE_SECCOMP_GET_FILTER  operation  to
                 dump tracee's seccomp filters;
              •  employ  the  ptrace(2) PTRACE_SETOPTIONS operation to suspend
                 the tracee's seccomp  protections  (i.e.,  the  PTRACE_O_SUS‐
                 PEND_SECCOMP flag);
              •  perform administrative operations on many device drivers;
              •  modify  autogroup  nice  values by writing to /proc/pid/auto‐
                 group (see sched(7)).

       CAP_SYS_BOOT
              Use reboot(2) and kexec_load(2).

       CAP_SYS_CHROOT
              •  Use chroot(2);
              •  change mount namespaces using setns(2).

       CAP_SYS_MODULE
              •  Load  and  unload  kernel  modules  (see  init_module(2)  and
                 delete_module(2));
              •  before  Linux  2.6.25: drop capabilities from the system-wide
                 capability bounding set.

       CAP_SYS_NICE
              •  Lower the process nice value  (nice(2),  setpriority(2))  and
                 change the nice value for arbitrary processes;
              •  set  real-time  scheduling  policies for calling process, and
                 set scheduling policies and  priorities  for  arbitrary  pro‐
                 cesses  (sched_setscheduler(2),  sched_setparam(2), sched_se‐
                 tattr(2));
              •  set CPU affinity  for  arbitrary  processes  (sched_setaffin‐
                 ity(2));
              •  set I/O scheduling class and priority for arbitrary processes
                 (ioprio_set(2));
              •  apply migrate_pages(2) to arbitrary processes and allow  pro‐
                 cesses to be migrated to arbitrary nodes;
              •  apply move_pages(2) to arbitrary processes;
              •  use    the    MPOL_MF_MOVE_ALL   flag   with   mbind(2)   and
                 move_pages(2).

       CAP_SYS_PACCT
              Use acct(2).

       CAP_SYS_PTRACE
              •  Trace arbitrary processes using ptrace(2);
              •  apply get_robust_list(2) to arbitrary processes;
              •  transfer data to or from the memory  of  arbitrary  processes
                 using process_vm_readv(2) and process_vm_writev(2);
              •  inspect processes using kcmp(2).

       CAP_SYS_RAWIO
              •  Perform I/O port operations (iopl(2) and ioperm(2));
              •  access /proc/kcore;
              •  employ the FIBMAP ioctl(2) operation;
              •  open  devices  for  accessing  x86  model-specific  registers
                 (MSRs, see msr(4));
              •  update /proc/sys/vm/mmap_min_addr;
              •  create memory mappings at addresses below the value specified
                 by /proc/sys/vm/mmap_min_addr;
              •  map files in /proc/bus/pci;
              •  open /dev/mem and /dev/kmem;
              •  perform various SCSI device commands;
              •  perform certain operations on hpsa(4) and cciss(4) devices;
              •  perform  a  range  of device-specific operations on other de‐
                 vices.

       CAP_SYS_RESOURCE
              •  Use reserved space on ext2 filesystems;
              •  make ioctl(2) calls controlling ext3 journaling;
              •  override disk quota limits;
              •  increase resource limits (see setrlimit(2));
              •  override RLIMIT_NPROC resource limit;
              •  override maximum number of consoles on console allocation;
              •  override maximum number of keymaps;
              •  allow more than 64 Hz interrupts from the real-time clock;
              •  raise msg_qbytes limit for a System V message queue above the
                 limit   in  /proc/sys/kernel/msgmnb  (see  msgop(2)  and  ms‐
                 gctl(2));
              •  allow the RLIMIT_NOFILE resource limit on the number of  "in-
                 flight" file descriptors to be bypassed when passing file de‐
                 scriptors to another process via a UNIX  domain  socket  (see
                 unix(7));
              •  override  the  /proc/sys/fs/pipe-max-size  limit when setting
                 the capacity of a pipe using the F_SETPIPE_SZ  fcntl(2)  com‐
                 mand;
              •  use F_SETPIPE_SZ to increase the capacity of a pipe above the
                 limit specified by /proc/sys/fs/pipe-max-size;
              •  override                      /proc/sys/fs/mqueue/queues_max,
                 /proc/sys/fs/mqueue/msg_max,   and   /proc/sys/fs/mqueue/msg‐
                 size_max limits  when  creating  POSIX  message  queues  (see
                 mq_overview(7));
              •  employ the prctl(2) PR_SET_MM operation;
              •  set  /proc/pid/oom_score_adj  to a value lower than the value
                 last set by a process with CAP_SYS_RESOURCE.

       CAP_SYS_TIME
              Set system clock (settimeofday(2), stime(2),  adjtimex(2));  set
              real-time (hardware) clock.

       CAP_SYS_TTY_CONFIG
              Use vhangup(2); employ various privileged ioctl(2) operations on
              virtual terminals.

       CAP_SYSLOG (since Linux 2.6.37)
              •  Perform privileged syslog(2) operations.  See  syslog(2)  for
                 information on which operations require privilege.
              •  View  kernel addresses exposed via /proc and other interfaces
                 when /proc/sys/kernel/kptr_restrict has the  value  1.   (See
                 the discussion of kptr_restrict in proc(5).)

       CAP_WAKE_ALARM (since Linux 3.0)
              Trigger  something that will wake up the system (set CLOCK_REAL‐
              TIME_ALARM and CLOCK_BOOTTIME_ALARM timers).
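
       Each capability in the list above occupies one bit position  in  a
       capability set.  The following is a minimal sketch of decoding such
       a set into names.  The bit numbers in CAP_NAMES are not given in
       this page; they are assumed here to follow the kernel header
       linux/capability.h.  The Cap* field names are those shown in
       /proc/pid/status (see proc(5)).  The helper names decode_caps and
       caps_from_status are hypothetical.

```python
# Assumption: index == capability number as defined in linux/capability.h.
CAP_NAMES = (
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
)

# Capability-set fields as they appear in /proc/pid/status.
CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def decode_caps(mask):
    """Return the names of all capabilities whose bit is set in mask."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            if bit < len(CAP_NAMES):
                names.append(CAP_NAMES[bit])
            else:
                names.append(f"unknown({bit})")  # newer than this table
        bit += 1
    return names


def caps_from_status(status_text):
    """Extract the hexadecimal Cap* fields from /proc/pid/status text."""
    sets = {}
    for line in status_text.splitlines():
        field, _, value = line.partition(":")
        if field in CAP_FIELDS:
            sets[field] = int(value.strip(), 16)
    return sets
```

       On a Linux system, passing the contents of /proc/self/status to
       caps_from_status and then each value to decode_caps lists the
       capabilities currently held by the calling thread.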

   Past and current implementation
       A full implementation of capabilities requires that:

       •  For all privileged operations, the kernel  must  check  whether  the
          thread has the required capability in its effective set.

       •  The  kernel must provide system calls allowing a thread's capability
          sets to be changed and retrieved.

       •  The filesystem must support attaching capabilities to an  executable
          file,  so  that  a process gains those capabilities when the file is
          executed.

       Before Linux 2.6.24, only the first two of these requirements were met;
       since Linux 2.6.24, all three requirements are met.

   Notes to kernel developers
       When  adding a new kernel feature that should be governed by a capabil‐
       ity, consider the following points.

       •  The goal of capabilities is to divide the power  of  superuser  into
          pieces,  such that if a program that has one or more capabilities is
          compromised, its power to do damage to the system would be less than
          the same program running with root privilege.

       •  You have the choice of either creating a new capability for your new
          feature, or associating the feature with one of the  existing  capa‐
          bilities.   In order to keep the set of capabilities to a manageable
          size, the latter option is preferable, unless there  are  compelling
          reasons  to  take  the  former  option.   (There is also a technical
          limit: the size of capability sets is currently limited to 64 bits.)

       •  To determine which existing capability might best be associated with
          your  new feature, review the list of capabilities above in order to
          find a "silo" into which your new feature best fits.   One  approach
          to  take is to determine if there are other features requiring capa‐
          bilities that will always be used along with the  new  feature.   If
          the  new feature is useless without these other features, you should
          use the same capability as the other features.

       •  Don't choose CAP_SYS_ADMIN if you can possibly  avoid  it!   A  vast
          proportion  of  existing  capability checks are associated with this
          capability (see the partial list above).  It can plausibly be called
          "the  new  root",  since on the one hand, it confers a wide range of
          powers, and on the other hand, its broad scope means  that  this  is
          the  capability that is required by many privileged programs.  Don't
          make the problem worse.  The only new features that should be  asso‐
          ciated  with CAP_SYS_ADMIN are ones that closely match existing uses
          in that silo.

       •  If you have determined that it really is necessary to create  a  new
          capability for your feature, don't make or name it as a "single-use"
          capability.  Thus, for example, the addition of the highly  specific
          CAP_SYS_PACCT  was probably a mistake.  Instead, try to identify and
          name your new capability as a broader silo into which other  related
          future use cases might fit.

   Thread capability sets
       Each  thread  has the following capability sets containing zero or more
       of the above capabilities:

       Permitted
              This is a limiting superset for the effective capabilities  that
              the  thread  may assume.  It is also a limiting superset for the
              capabilities that may be added  to  the  inheritable  set  by  a
              thread  that does not have the CAP_SETPCAP capability in its ef‐
              fective set.

              If a thread drops a capability from its permitted  set,  it  can
              never  reacquire  that capability (unless it execve(2)s either a
              set-user-ID-root program, or a program whose associated file ca‐
              pabilities grant that capability).

       Inheritable
              This  is  a  set  of capabilities preserved across an execve(2).
              Inheritable capabilities remain inheritable when  executing  any
              program, and inheritable capabilities are added to the permitted
              set when executing a program that has the corresponding bits set
              in the file inheritable set.

              Because  inheritable  capabilities  are  not generally preserved
              across execve(2) when running as a non-root  user,  applications
              that  wish  to  run  helper  programs with elevated capabilities
              should consider using ambient capabilities, described below.

       Effective
              This is the set of capabilities used by the  kernel  to  perform
              permission checks for the thread.

       Bounding (per-thread since Linux 2.6.25)
              The  capability  bounding set is a mechanism that can be used to
              limit the capabilities that are gained during execve(2).

              Since Linux 2.6.25, this is a  per-thread  capability  set.   In
              older kernels, the capability bounding set was a system wide at‐
              tribute shared by all threads on the system.

              For more details, see Capability bounding set below.

       Ambient (since Linux 4.3)
              This is a set of capabilities that are preserved across  an  ex‐
              ecve(2)  of a program that is not privileged.  The ambient capa‐
              bility set obeys the invariant that no capability  can  ever  be
              ambient if it is not both permitted and inheritable.

              The  ambient  capability  set  can  be  directly  modified using
              prctl(2).  Ambient capabilities are automatically lowered if ei‐
              ther  of the corresponding permitted or inheritable capabilities
              is lowered.

              Executing a program that changes UID or GID due to the set-user-
              ID or set-group-ID bits or executing a program that has any file
              capabilities set will clear the ambient set.  Ambient  capabili‐
              ties  are  added to the permitted set and assigned to the effec‐
              tive set when execve(2)  is  called.   If  ambient  capabilities
              cause  a  process's  permitted and effective capabilities to in‐
              crease during an execve(2), this does not trigger the secure-ex‐
              ecution mode described in ld.so(8).
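
       The ambient-set invariant described above can be illustrated with a
       small bitmask model.  This is only a sketch of the rule, assuming
       capability sets are represented as integers; it does not call
       prctl(2), and the function names raise_ambient and clamp_ambient
       are hypothetical.

```python
def raise_ambient(ambient, permitted, inheritable, cap_bit):
    """Model adding cap_bit to the ambient set.

    A capability may become ambient only if it is already both
    permitted and inheritable; otherwise the request is refused.
    """
    if (permitted & inheritable & cap_bit) != cap_bit:
        raise PermissionError("capability is not both permitted and inheritable")
    return ambient | cap_bit


def clamp_ambient(ambient, permitted, inheritable):
    """Model the automatic lowering of ambient capabilities.

    After the permitted or inheritable set shrinks, any ambient bit
    that is no longer in both sets is dropped.
    """
    return ambient & permitted & inheritable
```

       For example, with only bit 10 permitted and inheritable, raising
       bit 10 succeeds, while raising any other bit is refused.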

       A  child created via fork(2) inherits copies of its parent's capability
       sets.  For details on how execve(2) affects capabilities, see Transfor‐
       mation of capabilities during execve() below.

       Using  capset(2),  a thread may manipulate its own capability sets; see
       Programmatically adjusting capability sets below.

       Since Linux 3.2, the file /proc/sys/kernel/cap_last_cap exposes the nu‐
       merical  value  of the highest capability supported by the running ker‐
       nel; this can be used to determine the highest bit that may be set in a
       capability set.
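
       As a sketch of how that value might be used, the following reads
       /proc/sys/kernel/cap_last_cap and builds a mask with every supported
       capability bit set.  The function names are hypothetical, and the
       reader returns None when the file is unavailable (for example, on a
       kernel older than Linux 3.2 or on a non-Linux system).

```python
from pathlib import Path

CAP_LAST_CAP_PATH = Path("/proc/sys/kernel/cap_last_cap")


def read_cap_last_cap(path=CAP_LAST_CAP_PATH):
    """Return the highest supported capability number, or None."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def full_capability_mask(last_cap):
    """Return a mask with bits 0..last_cap (inclusive) set."""
    return (1 << (last_cap + 1)) - 1


if __name__ == "__main__":
    last_cap = read_cap_last_cap()
    if last_cap is None:
        print("cap_last_cap is not available on this system")
    else:
        print(f"highest capability: {last_cap}")
        print(f"full mask: {full_capability_mask(last_cap):#x}")
```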

   File capabilities
       Since  Linux  2.6.24,  the  kernel supports associating capability sets
       with an executable file using setcap(8).  The file capability sets  are
       stored  in  an  extended attribute (see setxattr(2) and xattr(7)) named
       security.capability.  Writing to this extended attribute  requires  the
       CAP_SETFCAP  capability.  The file capability sets, in conjunction with
       the capability sets of the thread,  determine  the  capabilities  of  a
       thread after an execve(2).

       The three file capability sets are:

       Permitted (formerly known as forced):
              These  capabilities  are  automatically permitted to the thread,
              regardless of the thread's inheritable capabilities.

       Inheritable (formerly known as allowed):
              This set is ANDed with the thread's inheritable set to determine
              which  inheritable capabilities are enabled in the permitted set
              of the thread after the execve(2).

       Effective:
              This is not a set, but rather just a single bit.  If this bit is
              set, then during an execve(2) all of the new permitted capabili‐
              ties for the thread are also raised in the  effective  set.   If
              this  bit  is  not set, then after an execve(2), none of the new
              permitted capabilities is in the new effective set.

              Enabling the file effective capability bit implies that any file
              permitted  or inheritable capability that causes a thread to ac‐
              quire the corresponding permitted capability during an execve(2)
              (see  Transformation of capabilities during execve() below) will
              also acquire that capability in its effective  set.   Therefore,
              when    assigning    capabilities    to   a   file   (setcap(8),
              cap_set_file(3), cap_set_fd(3)), if  we  specify  the  effective
              flag  as  being  enabled  for any capability, then the effective
              flag must also be specified as enabled for all  other  capabili‐
              ties  for  which the corresponding permitted or inheritable flag
              is enabled.
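
       The consistency rule for the effective flag can be  sketched  as  a
       check over per-capability flag masks.  This is an illustrative model
       only, assuming each argument is an integer bitmask of the
       capabilities for which the corresponding flag was requested; the
       function name file_effective_bit is hypothetical and is not part of
       libcap.

```python
def file_effective_bit(permitted, inheritable, effective):
    """Collapse per-capability effective flags into the single file bit.

    Returns False when no effective flag is requested.  Otherwise every
    capability with a permitted or inheritable flag must also carry the
    effective flag, and the function returns True; a partial request
    raises ValueError.
    """
    if effective == 0:
        return False
    missing = (permitted | inheritable) & ~effective
    if missing:
        raise ValueError(
            f"effective flag missing for capability mask {missing:#x}"
        )
    return True
```

       Under this model, requesting the effective flag for one capability
       while leaving it off for another permitted capability is rejected,
       because the stored attribute has only one effective bit for the
       whole file.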
530   File capability extended attribute versioning
531       To allow extensibility, the kernel supports a scheme to encode  a  ver‐
532       sion  number  inside the security.capability extended attribute that is
533       used to implement file capabilities.  These version numbers are  inter‐
534       nal  to  the implementation, and not directly visible to user-space ap‐
535       plications.  To date, the following versions are supported:
537       VFS_CAP_REVISION_1
538              This was the original file capability implementation, which sup‐
539              ported 32-bit masks for file capabilities.
541       VFS_CAP_REVISION_2 (since Linux 2.6.25)
542              This  version  allows for file capability masks that are 64 bits
543              in size, and was necessary as the number of supported  capabili‐
544              ties grew beyond 32.  The kernel transparently continues to sup‐
545              port the execution of files that have 32-bit version 1  capabil‐
546              ity  masks,  but  when adding capabilities to files that did not
547              previously have capabilities, or modifying the  capabilities  of
548              existing  files,  it automatically uses the version 2 scheme (or
549              possibly the version 3 scheme, as described below).
551       VFS_CAP_REVISION_3 (since Linux 4.14)
552              Version 3 file capabilities are provided to  support  namespaced
553              file capabilities (described below).
555              As  with version 2 file capabilities, version 3 capability masks
556              are 64 bits in size.  But in addition, the root user ID of name‐
557              space  is encoded in the security.capability extended attribute.
558              (A namespace's root user ID is the value that user ID  0  inside
559              that namespace maps to in the initial user namespace.)
561              Version 3 file capabilities are designed to coexist with version
562              2 capabilities; that is, on a modern Linux system, there may  be
563              some files with version 2 capabilities while others have version
564              3 capabilities.
566       Before Linux 4.14, the only kind of file capability extended  attribute
567       that  could  be  attached to a file was a VFS_CAP_REVISION_2 attribute.
568       Since Linux 4.14, the version of the security.capability  extended  at‐
569       tribute  that  is  attached  to  a file depends on the circumstances in
570       which the attribute was created.
572       Starting with Linux 4.14, a security.capability extended  attribute  is
573       automatically  created  as (or converted to) a version 3 (VFS_CAP_REVI‐
574       SION_3) attribute if both of the following are true:
576       •  The thread writing the attribute resides in a noninitial user  name‐
577          space.   (More  precisely:  the  thread  resides in a user namespace
578          other  than  the  one  from  which  the  underlying  filesystem  was
579          mounted.)
581       •  The thread has the CAP_SETFCAP capability over the file inode, mean‐
582          ing that (a) the thread has the CAP_SETFCAP capability  in  its  own
583          user  namespace; and (b) the UID and GID of the file inode have map‐
584          pings in the writer's user namespace.
586       When a VFS_CAP_REVISION_3  security.capability  extended  attribute  is
587       created,  the  root  user ID of the creating thread's user namespace is
588       saved in the extended attribute.
590       By contrast, creating or modifying a security.capability  extended  at‐
591       tribute  from  a  privileged  (CAP_SETFCAP)  thread that resides in the
592       namespace where the underlying filesystem was  mounted  (this  normally
593       means the initial user namespace) automatically results in the creation
594       of a version 2 (VFS_CAP_REVISION_2) attribute.
596       Note that the creation of a version 3 security.capability extended  at‐
597       tribute  is  automatic.   That is to say, when a user-space application
598       writes (setxattr(2)) a security.capability attribute in the  version  2
599       format,  the  kernel will automatically create a version 3 attribute if
600       the attribute is created in the circumstances described above.   Corre‐
601       spondingly, when a version 3 security.capability attribute is retrieved
602       (getxattr(2)) by a process that resides inside a  user  namespace  that
603       was  created  by  the  root user ID (or a descendant of that user name‐
604       space), the returned attribute is (automatically) simplified to  appear
605       as  a  version  2  attribute (i.e., the returned value is the size of a
606       version 2 attribute and does not include the root user ID).  These  au‐
607       tomatic  translations  mean  that no changes are required to user-space
608       tools (e.g., setcap(1) and getcap(1)) in order for those  tools  to  be
609       used to create and retrieve version 3 security.capability attributes.
611       Note  that  a  file  can  have  either a version 2 or a version 3 secu‐
612       rity.capability extended attribute associated with it,  but  not  both:
613       creation  or modification of the security.capability extended attribute
614       will automatically modify the version according to the circumstances in
615       which the extended attribute is created or modified.
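       As a rough illustration of the two on-disk formats, the following
       sketch classifies a raw security.capability value by its revision
       field and size.  The EX_-prefixed names are hypothetical; their values
       are assumed to mirror the VFS_CAP_* and XATTR_CAPS_SZ_* definitions in
       <linux/capability.h>.  Remember that getxattr(2) may already have
       simplified a version 3 attribute to the version 2 form, as described
       above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror <linux/capability.h>; redefined with an EX_ prefix
   so that the sketch compiles without kernel headers. */
#define EX_VFS_CAP_REVISION_MASK 0xFF000000u
#define EX_VFS_CAP_REVISION_2    0x02000000u
#define EX_VFS_CAP_REVISION_3    0x03000000u
#define EX_XATTR_CAPS_SZ_2       20u /* magic + 2 x (permitted, inheritable) */
#define EX_XATTR_CAPS_SZ_3       24u /* version 2 layout + 32-bit root user ID */

/* Decode a little-endian 32-bit value, as stored in the attribute. */
static uint32_t ex_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 2 or 3 for a well-formed attribute of that version, else -1.
   For version 3, *rootid (if non-NULL) receives the saved root user ID. */
int ex_cap_xattr_version(const unsigned char *buf, size_t len,
                         uint32_t *rootid)
{
    if (len < 4)
        return -1;

    uint32_t rev = ex_le32(buf) & EX_VFS_CAP_REVISION_MASK;

    if (rev == EX_VFS_CAP_REVISION_2 && len == EX_XATTR_CAPS_SZ_2)
        return 2;
    if (rev == EX_VFS_CAP_REVISION_3 && len == EX_XATTR_CAPS_SZ_3) {
        if (rootid != NULL)
            *rootid = ex_le32(buf + 20);   /* root ID follows the masks */
        return 3;
    }
    return -1;
}
```

       A caller would pass the buffer and length obtained from getxattr(2)
       for the "security.capability" attribute.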
617   Transformation of capabilities during execve()
618       During  an execve(2), the kernel calculates the new capabilities of the
619       process using the following algorithm:
           P'(ambient)     = (file is privileged) ? 0 : P(ambient)

           P'(permitted)   = (P(inheritable) & F(inheritable)) |
                             (F(permitted) & P(bounding)) | P'(ambient)

           P'(effective)   = F(effective) ? P'(permitted) : P'(ambient)

           P'(inheritable) = P(inheritable)    [i.e., unchanged]

           P'(bounding)    = P(bounding)       [i.e., unchanged]
632       where:
634           P()    denotes the value of a thread capability set before the  ex‐
635                  ecve(2)
637           P'()   denotes  the  value of a thread capability set after the ex‐
638                  ecve(2)
640           F()    denotes a file capability set
642       Note the following details relating to the above capability transforma‐
643       tion rules:
645       •  The  ambient  capability  set is present only since Linux 4.3.  When
646          determining the transformation of the ambient set during  execve(2),
647          a  privileged file is one that has capabilities or has the set-user-
648          ID or set-group-ID bit set.
650       •  Prior to Linux 2.6.25, the bounding set was a system-wide  attribute
651          shared  by all threads.  That system-wide value was employed to cal‐
652          culate the new permitted set during execve(2) in the same manner  as
653          shown above for P(bounding).
655       Note: during the capability transitions described above, file capabili‐
656       ties may be ignored (treated as empty) for the same  reasons  that  the
657       set-user-ID and set-group-ID bits are ignored; see execve(2).  File ca‐
658       pabilities are similarly ignored if the  kernel  was  booted  with  the
659       no_file_caps option.
661       Note:  according to the rules above, if a process with nonzero user IDs
662       performs an execve(2) then any capabilities that  are  present  in  its
663       permitted and effective sets will be cleared.  For the treatment of ca‐
664       pabilities when a process with a user ID of zero performs an execve(2),
665       see Capabilities and execution of programs by root below.
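       The transformation rules above can be sketched as plain bit-mask
       arithmetic.  This is a user-space model for reasoning about the
       formulas, not kernel code: the ex_ names are hypothetical, each set is
       assumed to fit in a 64-bit mask (bit N standing for capability number
       N), and the special handling of root and of securebits is deliberately
       left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Thread capability sets: P() before the execve(2), P'() after it. */
struct ex_thread_caps {
    uint64_t permitted, effective, inheritable, bounding, ambient;
};

/* File capability sets, F(). */
struct ex_file_caps {
    uint64_t permitted, inheritable;
    bool effective;    /* the single file effective bit */
    bool privileged;   /* file has capabilities, or set-user-ID or
                          set-group-ID bit set */
};

/* Compute P'() from P() and F() using the rules listed above. */
struct ex_thread_caps ex_execve_transform(struct ex_thread_caps p,
                                          struct ex_file_caps f)
{
    struct ex_thread_caps n;

    n.ambient     = f.privileged ? 0 : p.ambient;
    n.permitted   = (p.inheritable & f.inheritable) |
                    (f.permitted & p.bounding) | n.ambient;
    n.effective   = f.effective ? n.permitted : n.ambient;
    n.inheritable = p.inheritable;   /* unchanged */
    n.bounding    = p.bounding;      /* unchanged */
    return n;
}
```

       For example, a file that grants capabilities 12 and 13 to a thread
       whose bounding set lacks capability 13 yields only capability 12 (plus
       whatever is gained through the inheritable sets).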
667   Safety checking for capability-dumb binaries
668       A capability-dumb binary is an application that has been marked to have
669       file capabilities, but has not been converted to use the libcap(3)  API
670       to manipulate its capabilities.  (In other words, this is a traditional
671       set-user-ID-root program that has been switched to use  file  capabili‐
672       ties, but whose code has not been modified to understand capabilities.)
673       For such applications, the effective capability bit is set on the file,
674       so  that  the  file permitted capabilities are automatically enabled in
675       the process effective set when executing the file.  The  kernel  recog‐
676       nizes  a file which has the effective capability bit set as capability-
677       dumb for the purpose of the check described here.
679       When executing a capability-dumb  binary,  the  kernel  checks  if  the
680       process  obtained all permitted capabilities that were specified in the
681       file permitted set,  after  the  capability  transformations  described
682       above  have been performed.  (The typical reason why this might not oc‐
683       cur is that the capability bounding set masked out some of the capabil‐
684       ities  in  the  file permitted set.)  If the process did not obtain the
685       full set of file permitted capabilities, then execve(2) fails with  the
686       error  EPERM.   This  prevents possible security risks that could arise
687       when a capability-dumb application is executed with less privilege than
688       it  needs.   Note that, by definition, the application could not itself
689       recognize this problem, since it does not employ the libcap(3) API.
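       The check described above amounts to a single mask comparison.  The
       sketch below is a simplified model with a hypothetical function name;
       it assumes that capability sets are 64-bit masks and that
       new_permitted is the P'(permitted) value already computed by the
       execve(2) transformation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Return 0 if the execve(2) may proceed, or EPERM if a capability-dumb
   binary did not obtain its full file permitted set. */
int ex_capability_dumb_check(uint64_t new_permitted,
                             uint64_t file_permitted,
                             bool file_effective)
{
    if (!file_effective)
        return 0;   /* not treated as capability-dumb: no check */

    /* Capabilities that the file asked for but the process did not get */
    uint64_t missing = file_permitted & ~new_permitted;

    return missing != 0 ? EPERM : 0;
}
```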
691   Capabilities and execution of programs by root
692       In order to mirror traditional UNIX semantics, the kernel performs spe‐
693       cial  treatment  of  file capabilities when a process with UID 0 (root)
694       executes a program and when a set-user-ID-root program is executed.
696       After having performed any changes to the  process  effective  ID  that
697       were  triggered by the set-user-ID mode bit of the binary—e.g., switch‐
698       ing the effective user ID to 0 (root) because a  set-user-ID-root  pro‐
699       gram  was  executed—the  kernel  calculates the file capability sets as
700       follows:
702       (1)  If the real or effective user ID of the process is 0 (root),  then
703            the  file inheritable and permitted sets are ignored; instead they
704            are notionally considered to be all ones (i.e.,  all  capabilities
705            enabled).   (There is one exception to this behavior, described in
706            Set-user-ID-root programs that have file capabilities below.)
708       (2)  If the effective user ID of the process is 0 (root)  or  the  file
709            effective  bit  is in fact enabled, then the file effective bit is
710            notionally defined to be one (enabled).
712       These notional values for the file's capability sets are then  used  as
713       described  above to calculate the transformation of the process's capa‐
714       bilities during execve(2).
716       Thus, when a process with nonzero UIDs  execve(2)s  a  set-user-ID-root
717       program  that  does  not  have capabilities attached, or when a process
718       whose real and effective UIDs are zero execve(2)s a program, the calcu‐
719       lation of the process's new permitted capabilities simplifies to:
           P'(permitted)   = P(inheritable) | P(bounding)

           P'(effective)   = P'(permitted)
       Consequently, the process gains all capabilities in its permitted and
       effective capability sets, except those masked out by the capability
       bounding set.  (In the calculation of P'(permitted), the P'(ambient)
       term can be simplified away because it is by definition a subset of
       P(inheritable).)
731       The special treatments of user ID 0 (root) described in this subsection
732       can be disabled using the securebits mechanism described below.
734   Set-user-ID-root programs that have file capabilities
735       There is one exception to the behavior described  in  Capabilities  and
736       execution  of  programs by root above.  If (a) the binary that is being
737       executed has capabilities attached and (b) the  real  user  ID  of  the
738       process is not 0 (root) and (c) the effective user ID of the process is
739       0 (root), then the file capability bits are honored (i.e., they are not
740       notionally  considered  to  be  all ones).  The usual way in which this
741       situation can arise is when executing a set-UID-root program that  also
742       has  file  capabilities.   When such a program is executed, the process
743       gains just the capabilities granted by the program (i.e., not all capa‐
744       bilities, as would occur when executing a set-user-ID-root program that
745       does not have any associated file capabilities).
747       Note that one can assign empty capability sets to a program  file,  and
748       thus  it  is possible to create a set-user-ID-root program that changes
749       the effective and saved set-user-ID of the process  that  executes  the
750       program to 0, but confers no capabilities to that process.
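       The rules in the two preceding subsections can be modeled as a
       function that produces the notional file capability sets.  This is a
       sketch under stated assumptions: the ex_ names are hypothetical, sets
       are 64-bit masks, the UIDs passed in are those in force after any
       set-user-ID switch, and securebits such as SECBIT_NOROOT are not
       modeled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

#define EX_ALL_CAPS (~(uint64_t)0)   /* "all ones" */

struct ex_file_caps {
    uint64_t permitted, inheritable;
    bool effective;   /* file effective bit */
    bool has_caps;    /* binary has file capabilities attached */
};

/* Return the file sets that the execve(2) calculation would use. */
struct ex_file_caps ex_notional_file_caps(uid_t real_uid, uid_t eff_uid,
                                          struct ex_file_caps f)
{
    struct ex_file_caps n = f;

    /* Exception: file capabilities attached, real UID nonzero,
       effective UID 0.  The file capability bits are honored as is. */
    if (f.has_caps && real_uid != 0 && eff_uid == 0)
        return n;

    /* Rule (1): root ignores the file inheritable and permitted sets. */
    if (real_uid == 0 || eff_uid == 0) {
        n.permitted = EX_ALL_CAPS;
        n.inheritable = EX_ALL_CAPS;
    }

    /* Rule (2): effective UID 0 turns the file effective bit on. */
    if (eff_uid == 0)
        n.effective = true;

    return n;
}
```

       Feeding the all-ones result into the execve(2) formulas gives the
       simplified P'(permitted) = P(inheritable) | P(bounding) shown above.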
752   Capability bounding set
753       The capability bounding set is a security mechanism that can be used to
754       limit the capabilities that can be gained  during  an  execve(2).   The
755       bounding set is used in the following ways:
757       •  During  an  execve(2), the capability bounding set is ANDed with the
758          file permitted capability set, and the result of this  operation  is
759          assigned  to  the thread's permitted capability set.  The capability
760          bounding set thus places a limit on the permitted capabilities  that
761          may be granted by an executable file.
763       •  (Since  Linux 2.6.25) The capability bounding set acts as a limiting
764          superset for the capabilities that a thread can add to its inherita‐
765          ble  set using capset(2).  This means that if a capability is not in
766          the bounding set, then a thread can't add this capability to its in‐
767          heritable  set,  even  if  it was in its permitted capabilities, and
768          thereby cannot have this capability preserved in its  permitted  set
769          when it execve(2)s a file that has the capability in its inheritable
770          set.
772       Note that the bounding set masks the file permitted  capabilities,  but
773       not  the  inheritable capabilities.  If a thread maintains a capability
774       in its inheritable set that is not in its bounding  set,  then  it  can
775       still  gain  that  capability  in its permitted set by executing a file
776       that has the capability in its inheritable set.
       Depending on the kernel version, the capability bounding set is either
       a system-wide attribute or a per-thread attribute.
781       Capability bounding set from Linux 2.6.25 onward
783       From  Linux  2.6.25, the capability bounding set is a per-thread attri‐
784       bute.  (The system-wide capability  bounding  set  described  below  no
785       longer exists.)
787       The  bounding set is inherited at fork(2) from the thread's parent, and
788       is preserved across an execve(2).
790       A thread may remove capabilities from its capability bounding set using
791       the prctl(2) PR_CAPBSET_DROP operation, provided it has the CAP_SETPCAP
792       capability.  Once a capability has been dropped from the bounding  set,
793       it  cannot  be restored to that set.  A thread can determine if a capa‐
794       bility is in its bounding set using the prctl(2) PR_CAPBSET_READ opera‐
795       tion.
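       A minimal sketch of the two prctl(2) operations mentioned above
       follows.  The wrapper names are hypothetical; the sketch assumes a
       Linux system where <sys/prctl.h> and <linux/capability.h> are
       available.  Note that the drop operation cannot be undone.

```c
#include <sys/prctl.h>
#include <linux/capability.h>

/* Return 1 if cap is in the calling thread's bounding set, 0 if it is
   not, or -1 on error (for example, an unknown capability number). */
int ex_bounding_has(int cap)
{
    return prctl(PR_CAPBSET_READ, cap, 0, 0, 0);
}

/* Remove cap from the calling thread's bounding set.  Return 0 on
   success, or -1 on error (for example, the caller lacks CAP_SETPCAP).
   A dropped capability cannot be restored to the set. */
int ex_bounding_drop(int cap)
{
    return prctl(PR_CAPBSET_DROP, cap, 0, 0, 0);
}
```

       A service could, for instance, call ex_bounding_drop(CAP_SYS_MODULE)
       early in startup so that no program it later executes can gain that
       capability from a file permitted set.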
797       Removing  capabilities  from the bounding set is supported only if file
798       capabilities are compiled into the kernel.  Before Linux  2.6.33,  file
799       capabilities were an optional feature configurable via the CONFIG_SECU‐
800       RITY_FILE_CAPABILITIES option.  Since Linux 2.6.33,  the  configuration
801       option  has  been  removed and file capabilities are always part of the
802       kernel.  When file capabilities are compiled into the kernel, the  init
803       process  (the  ancestor  of  all processes) begins with a full bounding
804       set.  If file capabilities are not compiled into the kernel, then  init
805       begins  with  a full bounding set minus CAP_SETPCAP, because this capa‐
806       bility has a different meaning when there are no file capabilities.
       Removing a capability from the bounding set does not remove it from the
       thread's inheritable set.  However, it does prevent the capability from
       being added back into the thread's inheritable set in the future.
812       Capability bounding set prior to Linux 2.6.25
814       Before Linux 2.6.25, the capability bounding set is a  system-wide  at‐
815       tribute  that  affects  all threads on the system.  The bounding set is
816       accessible via the file /proc/sys/kernel/cap-bound.  (Confusingly, this
817       bit  mask  parameter  is  expressed  as  a  signed  decimal  number  in
818       /proc/sys/kernel/cap-bound.)
820       Only the init process may set capabilities in the  capability  bounding
821       set; other than that, the superuser (more precisely: a process with the
822       CAP_SYS_MODULE capability) may only clear capabilities from this set.
824       On a standard system the capability bounding set always masks  out  the
825       CAP_SETPCAP  capability.  To remove this restriction (dangerous!), mod‐
826       ify the definition of  CAP_INIT_EFF_SET  in  include/linux/capability.h
827       and rebuild the kernel.
829       The  system-wide  capability  bounding  set  feature was added to Linux
830       2.2.11.
832   Effect of user ID changes on capabilities
833       To preserve the traditional semantics for  transitions  between  0  and
834       nonzero  user IDs, the kernel makes the following changes to a thread's
835       capability sets on changes to the thread's real, effective, saved  set,
836       and filesystem user IDs (using setuid(2), setresuid(2), or similar):
838       •  If  one  or  more  of the real, effective, or saved set user IDs was
839          previously 0, and as a result of the UID changes all  of  these  IDs
840          have  a  nonzero  value,  then all capabilities are cleared from the
841          permitted, effective, and ambient capability sets.
843       •  If the effective user ID is changed from 0 to nonzero, then all  ca‐
844          pabilities are cleared from the effective set.
846       •  If the effective user ID is changed from nonzero to 0, then the per‐
847          mitted set is copied to the effective set.
849       •  If the filesystem user ID is changed from 0 to  nonzero  (see  setf‐
850          suid(2)),  then  the following capabilities are cleared from the ef‐
851          fective  set:  CAP_CHOWN,   CAP_DAC_OVERRIDE,   CAP_DAC_READ_SEARCH,
852          CAP_FOWNER,  CAP_FSETID,  CAP_LINUX_IMMUTABLE  (since Linux 2.6.30),
853          CAP_MAC_OVERRIDE,  and  CAP_MKNOD  (since  Linux  2.6.30).   If  the
854          filesystem UID is changed from nonzero to 0, then any of these capa‐
855          bilities that are enabled in the permitted set are  enabled  in  the
856          effective set.
858       If a thread that has a 0 value for one or more of its user IDs wants to
859       prevent its permitted capability set being cleared when it  resets  all
860       of   its   user  IDs  to  nonzero  values,  it  can  do  so  using  the
861       SECBIT_KEEP_CAPS securebits flag described below.
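       The first three rules above can be sketched as follows.  This is a
       simplified model, not kernel code: the ex_ names are hypothetical,
       sets are 64-bit masks, the filesystem UID rule is omitted, and the
       effect of the SECBIT_KEEP_CAPS and SECBIT_NO_SETUID_FIXUP flags is not
       modeled.

```c
#include <stdint.h>
#include <sys/types.h>

struct ex_ids  { uid_t real, effective, saved; };
struct ex_caps { uint64_t permitted, effective, ambient; };

/* Apply the capability adjustments triggered by changing a thread's
   user IDs from 'old' to 'now'. */
struct ex_caps ex_uid_change(struct ex_ids old, struct ex_ids now,
                             struct ex_caps c)
{
    int had_zero = old.real == 0 || old.effective == 0 || old.saved == 0;
    int all_nonzero = now.real != 0 && now.effective != 0 && now.saved != 0;

    /* At least one ID was 0 and none is 0 any longer: drop everything. */
    if (had_zero && all_nonzero) {
        c.permitted = 0;
        c.effective = 0;
        c.ambient = 0;
    }

    /* Effective UID 0 -> nonzero: clear the effective set. */
    if (old.effective == 0 && now.effective != 0)
        c.effective = 0;

    /* Effective UID nonzero -> 0: copy permitted to effective. */
    if (old.effective != 0 && now.effective == 0)
        c.effective = c.permitted;

    return c;
}
```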
863   Programmatically adjusting capability sets
864       A thread can retrieve and change its permitted, effective, and  inheri‐
865       table  capability  sets using the capget(2) and capset(2) system calls.
866       However, the use of cap_get_proc(3) and cap_set_proc(3), both  provided
867       in  the  libcap  package, is preferred for this purpose.  The following
868       rules govern changes to the thread capability sets:
870       •  If the caller does not have the CAP_SETPCAP capability, the new  in‐
871          heritable  set  must  be a subset of the combination of the existing
872          inheritable and permitted sets.
874       •  (Since Linux 2.6.25) The new inheritable set must be a subset of the
875          combination  of  the  existing  inheritable  set  and the capability
876          bounding set.
878       •  The new permitted set must be a subset of the existing permitted set
879          (i.e., it is not possible to acquire permitted capabilities that the
880          thread does not currently have).
882       •  The new effective set must be a subset of the new permitted set.
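       The four rules above can be expressed as subset tests on bit masks.
       The following sketch uses hypothetical ex_ names and 64-bit masks, and
       assumes that "has the CAP_SETPCAP capability" means that the
       capability is present in the caller's current effective set.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_CAP_SETPCAP 8   /* assumed to match CAP_SETPCAP in
                              <linux/capability.h> */

struct ex_sets { uint64_t permitted, effective, inheritable; };

/* True if every bit of a is also set in b. */
static bool ex_subset(uint64_t a, uint64_t b)
{
    return (a & ~b) == 0;
}

/* Would a change from 'old' to 'want' satisfy the rules listed above? */
bool ex_capset_allowed(struct ex_sets old, struct ex_sets want,
                       uint64_t bounding)
{
    bool has_setpcap = (old.effective >> EX_CAP_SETPCAP) & 1;

    if (!has_setpcap &&
        !ex_subset(want.inheritable, old.inheritable | old.permitted))
        return false;
    if (!ex_subset(want.inheritable, old.inheritable | bounding))
        return false;
    if (!ex_subset(want.permitted, old.permitted))
        return false;
    if (!ex_subset(want.effective, want.permitted))
        return false;
    return true;
}
```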
884   The securebits flags: establishing a capabilities-only environment
885       Starting with Linux 2.6.26, and with a kernel in which  file  capabili‐
886       ties are enabled, Linux implements a set of per-thread securebits flags
887       that can be used to disable special handling of capabilities for UID  0
888       (root).  These flags are as follows:
       SECBIT_KEEP_CAPS
              Setting this flag allows a thread that has one or more 0 UIDs to
              retain capabilities in its permitted set when it switches all of
              its UIDs to nonzero values.  If this flag is not set, then such
              a UID switch causes the thread to lose all permitted
              capabilities.  This flag is always cleared on an execve(2).
897              Note that even with the SECBIT_KEEP_CAPS flag set, the effective
898              capabilities of a thread are cleared when it switches its effec‐
899              tive  UID  to  a  nonzero value.  However, if the thread has set
900              this flag and its effective UID  is  already  nonzero,  and  the
901              thread  subsequently  switches all other UIDs to nonzero values,
902              then the effective capabilities will not be cleared.
904              The setting of the  SECBIT_KEEP_CAPS  flag  is  ignored  if  the
905              SECBIT_NO_SETUID_FIXUP flag is set.  (The latter flag provides a
906              superset of the effect of the former flag.)
908              This flag provides the same functionality as the older  prctl(2)
909              PR_SET_KEEPCAPS operation.
       SECBIT_NO_SETUID_FIXUP
              Setting this flag stops the kernel from adjusting the thread's
              permitted, effective, and ambient capability sets when the
              thread's effective and filesystem UIDs are switched between zero
              and nonzero values.  See Effect of user ID changes on
              capabilities above.
       SECBIT_NOROOT
              If this bit is set, then the kernel does not grant capabilities
              when a set-user-ID-root program is executed, or when a process
              with an effective or real UID of 0 calls execve(2).  (See
              Capabilities and execution of programs by root above.)
       SECBIT_NO_CAP_AMBIENT_RAISE
              Setting this flag disallows raising ambient capabilities via the
              prctl(2) PR_CAP_AMBIENT_RAISE operation.
       Each of the above "base" flags has a companion "locked" flag.  Setting
       any of the "locked" flags is irreversible, and has the effect of
       preventing further changes to the corresponding "base" flag.  The
       locked flags are: SECBIT_KEEP_CAPS_LOCKED,
       SECBIT_NO_SETUID_FIXUP_LOCKED, SECBIT_NOROOT_LOCKED, and
       SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED.

       The securebits flags can be modified and retrieved using the prctl(2)
       PR_SET_SECUREBITS and PR_GET_SECUREBITS operations.  The CAP_SETPCAP
       capability is required to modify the flags.  Note that the SECBIT_*
       constants are available only after including the <linux/securebits.h>
       header file.
940       The  securebits  flags are inherited by child processes.  During an ex‐
941       ecve(2), all of the flags are preserved, except SECBIT_KEEP_CAPS  which
942       is always cleared.
944       An  application  can  use the following call to lock itself, and all of
945       its descendants, into an environment where the only way of gaining  ca‐
946       pabilities is by executing a program with associated file capabilities:
           prctl(PR_SET_SECUREBITS,
                   /* SECBIT_KEEP_CAPS off */
                   SECBIT_KEEP_CAPS_LOCKED |
                   SECBIT_NO_SETUID_FIXUP |
                   SECBIT_NO_SETUID_FIXUP_LOCKED |
                   SECBIT_NOROOT |
                   SECBIT_NOROOT_LOCKED);
                   /* Setting/locking SECBIT_NO_CAP_AMBIENT_RAISE
                      is not required */
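       To verify that a thread is already locked into such an environment,
       the value returned by the prctl(2) PR_GET_SECUREBITS operation can be
       tested against the same set of flags.  The helper names below are
       hypothetical, and the sketch assumes a Linux system with
       <linux/securebits.h> installed.

```c
#include <stdbool.h>
#include <sys/prctl.h>
#include <linux/securebits.h>

/* True if 'bits' describes the capabilities-only environment set up by
   the call shown above: SECBIT_KEEP_CAPS off and locked, and both
   SECBIT_NO_SETUID_FIXUP and SECBIT_NOROOT on and locked. */
bool ex_is_caps_only(int bits)
{
    const int required = SECBIT_KEEP_CAPS_LOCKED |
                         SECBIT_NO_SETUID_FIXUP |
                         SECBIT_NO_SETUID_FIXUP_LOCKED |
                         SECBIT_NOROOT |
                         SECBIT_NOROOT_LOCKED;

    return (bits & required) == required && !(bits & SECBIT_KEEP_CAPS);
}

/* Return the calling thread's securebits flags, or -1 on error. */
int ex_current_securebits(void)
{
    return prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);
}
```

       A typical use would be ex_is_caps_only(ex_current_securebits()) after
       checking that the second call did not return -1.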
958   Per-user-namespace "set-user-ID-root" programs
959       A  set-user-ID  program  whose  UID matches the UID that created a user
960       namespace will confer capabilities in the process's permitted  and  ef‐
961       fective  sets when executed by any process inside that namespace or any
962       descendant user namespace.
964       The rules about the transformation of the process's capabilities during
965       the  execve(2)  are exactly as described in Transformation of capabili‐
966       ties during execve() and Capabilities and execution of programs by root
967       above,  with  the  difference that, in the latter subsection, "root" is
968       the UID of the creator of the user namespace.
970   Namespaced file capabilities
971       Traditional (i.e., version 2) file capabilities associate only a set of
972       capability  masks  with  a binary executable file.  When a process exe‐
973       cutes a binary with such capabilities, it gains the associated capabil‐
974       ities  (within its user namespace) as per the rules described in Trans‐
975       formation of capabilities during execve() above.
977       Because version 2 file capabilities confer capabilities to the  execut‐
978       ing  process  regardless  of  which  user namespace it resides in, only
979       privileged processes are permitted to  associate  capabilities  with  a
980       file.   Here, "privileged" means a process that has the CAP_SETFCAP ca‐
981       pability in the user namespace where the filesystem was  mounted  (nor‐
982       mally  the initial user namespace).  This limitation renders file capa‐
983       bilities useless for certain use cases.  For  example,  in  user-names‐
984       paced  containers,  it  can  be desirable to be able to create a binary
985       that confers capabilities only to processes executed inside  that  con‐
986       tainer, but not to processes that are executed outside the container.
988       Linux 4.14 added so-called namespaced file capabilities to support such
989       use cases.  Namespaced file capabilities  are  recorded  as  version  3
990       (i.e.,  VFS_CAP_REVISION_3)  security.capability  extended  attributes.
991       Such an attribute is automatically created  in  the  circumstances  de‐
992       scribed in File capability extended attribute versioning above.  When a
993       version 3 security.capability extended attribute is created, the kernel
994       records  not  just  the capability masks in the extended attribute, but
995       also the namespace root user ID.
997       As with a binary that has VFS_CAP_REVISION_2 file capabilities,  a  bi‐
998       nary  with VFS_CAP_REVISION_3 file capabilities confers capabilities to
999       a process during execve().  However, capabilities are conferred only if
1000       the  binary  is  executed by a process that resides in a user namespace
1001       whose UID 0 maps to the root user ID that is saved in the extended  at‐
1002       tribute,  or when executed by a process that resides in a descendant of
1003       such a namespace.
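       The rule above can be pictured as a walk up the chain of user
       namespaces.  The following is a hypothetical user-space model, not the
       kernel's data structures: each namespace is assumed to record its
       parent and the user ID (expressed in initial-namespace terms) to which
       its UID 0 maps, and the saved root user ID is assumed to be expressed
       in the same terms.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a user namespace. */
struct ex_userns {
    const struct ex_userns *parent;  /* NULL for the initial namespace */
    uint32_t root_uid;               /* what UID 0 in this namespace maps
                                        to, in initial-namespace terms */
};

/* True if a version 3 attribute that saved 'saved_rootid' confers
   capabilities on a process residing in namespace 'ns': either UID 0 of
   'ns' maps to the saved ID, or 'ns' descends from such a namespace. */
bool ex_v3_caps_apply(uint32_t saved_rootid, const struct ex_userns *ns)
{
    for (; ns != NULL; ns = ns->parent) {
        if (ns->root_uid == saved_rootid)
            return true;
    }
    return false;
}
```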
1005   Interaction with user namespaces
1006       For further information on the interaction  of  capabilities  and  user
1007       namespaces, see user_namespaces(7).


       No standards govern capabilities, but the Linux capability
       implementation is based on the withdrawn POSIX.1e draft standard.


       When attempting to strace(1) binaries that have capabilities (or
       set-user-ID-root binaries), you may find the -u <username> option
       useful.  For example:

           $ sudo strace -o trace.log -u ceci ./myprivprog
1021       From Linux 2.5.27 to Linux 2.6.26, capabilities were an optional kernel
1022       component, and could be enabled/disabled via the  CONFIG_SECURITY_CAPA‐
1023       BILITIES kernel configuration option.
1025       The  /proc/pid/task/TID/status  file can be used to view the capability
1026       sets of a thread.  The /proc/pid/status file shows the capability  sets
1027       of a process's main thread.  Before Linux 3.8, nonexistent capabilities
1028       were shown as being enabled (1) in these sets.  Since  Linux  3.8,  all
1029       nonexistent  capabilities  (above  CAP_LAST_CAP)  are shown as disabled
1030       (0).
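       The capability sets appear in these status files as hexadecimal bit
       masks on lines such as CapInh, CapPrm, CapEff, CapBnd, and CapAmb.  A
       minimal sketch of decoding one such line follows; the function names
       are hypothetical, and reading the file itself is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse one status line such as "CapEff:\t0000000000003000".
   'field' is the label without the colon, for example "CapEff".
   On success, store the mask in *mask and return true. */
bool ex_parse_cap_line(const char *line, const char *field, uint64_t *mask)
{
    size_t n = strlen(field);
    unsigned long long value;

    if (strncmp(line, field, n) != 0 || line[n] != ':')
        return false;
    if (sscanf(line + n + 1, "%llx", &value) != 1)
        return false;

    *mask = value;
    return true;
}

/* True if capability number 'cap' is set in 'mask'. */
bool ex_mask_has(uint64_t mask, int cap)
{
    return cap >= 0 && cap < 64 && ((mask >> cap) & 1);
}
```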
1032       The libcap package provides a suite of routines for setting and getting
1033       capabilities  that  is  more comfortable and less likely to change than
1034       the interface provided by capset(2) and capget(2).  This  package  also
1035       provides the setcap(8) and getcap(8) programs.  It can be found at
1038       Before  Linux 2.6.24, and from Linux 2.6.24 to Linux 2.6.32 if file ca‐
1039       pabilities are not enabled, a thread with  the  CAP_SETPCAP  capability
1040       can manipulate the capabilities of threads other than itself.  However,
1041       this is only theoretically possible, since no thread ever has CAP_SETP‐
1042       CAP in either of these cases:
       •  In the pre-2.6.25 implementation the system-wide capability bounding
          set, /proc/sys/kernel/cap-bound, always masks out the CAP_SETPCAP
          capability, and this cannot be changed without modifying the kernel
          source and rebuilding the kernel.
1049       •  If file capabilities are disabled  (i.e.,  the  kernel  CONFIG_SECU‐
1050          RITY_FILE_CAPABILITIES  option  is  disabled),  then init starts out
1051          with the CAP_SETPCAP capability removed from its per-process  bound‐
1052          ing  set,  and that bounding set is inherited by all other processes
1053          created on the system.


1056       capsh(1),    setpriv(1),    prctl(2),    setfsuid(2),     cap_clear(3),
1057       cap_copy_ext(3),  cap_from_text(3),  cap_get_file(3),  cap_get_proc(3),
1058       cap_init(3),  capgetp(3),  capsetp(3),  libcap(3),   proc(5),   creden‐
1059       tials(7), pthreads(7), user_namespaces(7), captest(8), filecap(8), get‐
1060       cap(8), getpcaps(8), netcap(8), pscap(8), setcap(8)
1062       include/linux/capability.h in the Linux kernel source tree
1066Linux man-pages 6.05              2023-05-03                   Capabilities(7)