capabilities(7)

CAPABILITIES(7)            Linux Programmer's Manual           CAPABILITIES(7)

NAME

       capabilities - overview of Linux capabilities

DESCRIPTION

9       For  the  purpose  of  performing  permission  checks, traditional UNIX
10       implementations distinguish two  categories  of  processes:  privileged
11       processes  (whose  effective  user ID is 0, referred to as superuser or
12       root), and unprivileged processes (whose  effective  UID  is  nonzero).
13       Privileged processes bypass all kernel permission checks, while unpriv‐
14       ileged processes are subject to full permission checking based  on  the
15       process's  credentials (usually: effective UID, effective GID, and sup‐
16       plementary group list).
17
18       Starting with kernel 2.2, Linux divides  the  privileges  traditionally
19       associated  with  superuser into distinct units, known as capabilities,
20       which can be independently enabled and disabled.   Capabilities  are  a
21       per-thread attribute.
22
23   Capabilities list
24       The following list shows the capabilities implemented on Linux, and the
25       operations or behaviors that each capability permits:
26
27       CAP_AUDIT_CONTROL (since Linux 2.6.11)
28              Enable and  disable  kernel  auditing;  change  auditing  filter
29              rules; retrieve auditing status and filtering rules.
30
31       CAP_AUDIT_READ (since Linux 3.16)
32              Allow reading the audit log via a multicast netlink socket.
33
34       CAP_AUDIT_WRITE (since Linux 2.6.11)
35              Write records to kernel auditing log.
36
37       CAP_BLOCK_SUSPEND (since Linux 3.5)
38              Employ  features  that can block system suspend (epoll(7) EPOLL‐
39              WAKEUP, /proc/sys/wake_lock).
40
41       CAP_CHOWN
42              Make arbitrary changes to file UIDs and GIDs (see chown(2)).
43
44       CAP_DAC_OVERRIDE
45              Bypass file read, write, and execute permission checks.  (DAC is
46              an abbreviation of "discretionary access control".)
47
48       CAP_DAC_READ_SEARCH
49              * Bypass file read permission checks and directory read and exe‐
50                cute permission checks;
51              * invoke open_by_handle_at(2);
52              * use the linkat(2) AT_EMPTY_PATH flag to create  a  link  to  a
53                file referred to by a file descriptor.
54
55       CAP_FOWNER
56              * Bypass  permission  checks on operations that normally require
57                the filesystem UID of the process to match the UID of the file
58                (e.g., chmod(2), utime(2)), excluding those operations covered
59                by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH;
60              * set inode flags (see ioctl_iflags(2)) on arbitrary files;
61              * set Access Control Lists (ACLs) on arbitrary files;
62              * ignore directory sticky bit on file deletion;
63              * specify O_NOATIME for arbitrary files in open(2) and fcntl(2).
64
65       CAP_FSETID
66              * Don't clear set-user-ID and set-group-ID mode bits when a file
67                is modified;
              * set the set-group-ID bit for a file whose GID does not match
                the filesystem GID or any of the supplementary GIDs of the
                calling process.
71
72       CAP_IPC_LOCK
73              Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2)).
74
75       CAP_IPC_OWNER
76              Bypass permission checks for operations on System V IPC objects.
77
78       CAP_KILL
79              Bypass  permission  checks  for  sending  signals (see kill(2)).
80              This includes use of the ioctl(2) KDSIGACCEPT operation.
81
82       CAP_LEASE (since Linux 2.4)
83              Establish leases on arbitrary files (see fcntl(2)).
84
85       CAP_LINUX_IMMUTABLE
86              Set  the  FS_APPEND_FL  and  FS_IMMUTABLE_FL  inode  flags  (see
87              ioctl_iflags(2)).
88
89       CAP_MAC_ADMIN (since Linux 2.6.25)
90              Allow  MAC  configuration or state changes.  Implemented for the
91              Smack Linux Security Module (LSM).
92
93       CAP_MAC_OVERRIDE (since Linux 2.6.25)
94              Override Mandatory Access Control (MAC).   Implemented  for  the
95              Smack LSM.
96
97       CAP_MKNOD (since Linux 2.4)
98              Create special files using mknod(2).
99
100       CAP_NET_ADMIN
101              Perform various network-related operations:
102              * interface configuration;
103              * administration of IP firewall, masquerading, and accounting;
104              * modify routing tables;
105              * bind to any address for transparent proxying;
              * set type-of-service (TOS);
              * clear driver statistics;
              * set promiscuous mode;
              * enable multicasting;
110              * use   setsockopt(2)  to  set  the  following  socket  options:
111                SO_DEBUG, SO_MARK, SO_PRIORITY (for  a  priority  outside  the
112                range 0 to 6), SO_RCVBUFFORCE, and SO_SNDBUFFORCE.
113
114       CAP_NET_BIND_SERVICE
115              Bind  a socket to Internet domain privileged ports (port numbers
116              less than 1024).
117
118       CAP_NET_BROADCAST
119              (Unused)  Make socket broadcasts, and listen to multicasts.
120
121       CAP_NET_RAW
122              * Use RAW and PACKET sockets;
123              * bind to any address for transparent proxying.
124
125       CAP_SETGID
126              * Make arbitrary manipulations of process GIDs and supplementary
127                GID list;
128              * forge  GID  when  passing  socket  credentials via UNIX domain
129                sockets;
130              * write a group ID mapping in a user namespace (see  user_names‐
131                paces(7)).
132
133       CAP_SETFCAP (since Linux 2.6.24)
134              Set arbitrary capabilities on a file.
135
136       CAP_SETPCAP
137              If  file  capabilities are supported (i.e., since Linux 2.6.24):
138              add any capability from the calling thread's bounding set to its
139              inheritable  set;  drop  capabilities from the bounding set (via
140              prctl(2) PR_CAPBSET_DROP); make changes to the securebits flags.
141
142              If file capabilities are not  supported  (i.e.,  kernels  before
143              Linux  2.6.24):  grant  or remove any capability in the caller's
144              permitted capability set to or from any  other  process.   (This
145              property of CAP_SETPCAP is not available when the kernel is con‐
146              figured to support  file  capabilities,  since  CAP_SETPCAP  has
147              entirely different semantics for such kernels.)
148
149       CAP_SETUID
150              * Make  arbitrary  manipulations  of  process  UIDs  (setuid(2),
151                setreuid(2), setresuid(2), setfsuid(2));
152              * forge UID when passing  socket  credentials  via  UNIX  domain
153                sockets;
154              * write  a  user ID mapping in a user namespace (see user_names‐
155                paces(7)).
156
157       CAP_SYS_ADMIN
158              Note: this capability is overloaded; see Notes to kernel  devel‐
159              opers, below.
160
161              * Perform a range of system administration operations including:
162                quotactl(2),  mount(2),  umount(2),   swapon(2),   swapoff(2),
163                sethostname(2), and setdomainname(2);
164              * perform  privileged  syslog(2) operations (since Linux 2.6.37,
165                CAP_SYSLOG should be used to permit such operations);
166              * perform VM86_REQUEST_IRQ vm86(2) command;
167              * perform IPC_SET and IPC_RMID operations on arbitrary System  V
168                IPC objects;
169              * override RLIMIT_NPROC resource limit;
170              * perform operations on trusted and security Extended Attributes
171                (see xattr(7));
172              * use lookup_dcookie(2);
173              * use ioprio_set(2) to assign IOPRIO_CLASS_RT and (before  Linux
174                2.6.25) IOPRIO_CLASS_IDLE I/O scheduling classes;
175              * forge  PID  when  passing  socket  credentials via UNIX domain
176                sockets;
177              * exceed /proc/sys/fs/file-max, the  system-wide  limit  on  the
178                number  of  open files, in system calls that open files (e.g.,
179                accept(2), execve(2), open(2), pipe(2));
180              * employ CLONE_* flags that create new namespaces with  clone(2)
181                and unshare(2) (but, since Linux 3.8, creating user namespaces
182                does not require any capability);
183              * call perf_event_open(2);
184              * access privileged perf event information;
185              * call setns(2) (requires CAP_SYS_ADMIN  in  the  target  names‐
186                pace);
187              * call fanotify_init(2);
188              * call bpf(2);
189              * perform  privileged  KEYCTL_CHOWN and KEYCTL_SETPERM keyctl(2)
190                operations;
191              * perform madvise(2) MADV_HWPOISON operation;
192              * employ the TIOCSTI ioctl(2)  to  insert  characters  into  the
193                input  queue of a terminal other than the caller's controlling
194                terminal;
195              * employ the obsolete nfsservctl(2) system call;
196              * employ the obsolete bdflush(2) system call;
197              * perform various privileged block-device ioctl(2) operations;
198              * perform various privileged filesystem ioctl(2) operations;
199              * perform privileged  ioctl(2)  operations  on  the  /dev/random
200                device (see random(4));
201              * install  a  seccomp(2)  filter without first having to set the
202                no_new_privs thread attribute;
203              * modify allow/deny rules for device control groups;
204              * employ the ptrace(2)  PTRACE_SECCOMP_GET_FILTER  operation  to
205                dump tracee's seccomp filters;
206              * employ  the  ptrace(2)  PTRACE_SETOPTIONS operation to suspend
207                the tracee's  seccomp  protections  (i.e.,  the  PTRACE_O_SUS‐
208                PEND_SECCOMP flag);
209              * perform administrative operations on many device drivers.
210
211       CAP_SYS_BOOT
212              Use reboot(2) and kexec_load(2).
213
214       CAP_SYS_CHROOT
215              Use chroot(2).
216
217       CAP_SYS_MODULE
218              * Load   and  unload  kernel  modules  (see  init_module(2)  and
219                delete_module(2));
220              * in kernels before 2.6.25: drop capabilities from  the  system-
221                wide capability bounding set.
222
223       CAP_SYS_NICE
224              * Raise  process nice value (nice(2), setpriority(2)) and change
225                the nice value for arbitrary processes;
226              * set real-time scheduling policies for calling process, and set
227                scheduling  policies  and  priorities  for arbitrary processes
                (sched_setscheduler(2), sched_setparam(2), sched_setattr(2));
229              * set CPU  affinity  for  arbitrary  processes  (sched_setaffin‐
230                ity(2));
231              * set  I/O scheduling class and priority for arbitrary processes
232                (ioprio_set(2));
233              * apply migrate_pages(2) to arbitrary processes and  allow  pro‐
234                cesses to be migrated to arbitrary nodes;
235              * apply move_pages(2) to arbitrary processes;
236              * use the MPOL_MF_MOVE_ALL flag with mbind(2) and move_pages(2).
237
238       CAP_SYS_PACCT
239              Use acct(2).
240
241       CAP_SYS_PTRACE
242              * Trace arbitrary processes using ptrace(2);
243              * apply get_robust_list(2) to arbitrary processes;
244              * transfer  data  to  or  from the memory of arbitrary processes
245                using process_vm_readv(2) and process_vm_writev(2);
246              * inspect processes using kcmp(2).
247
248       CAP_SYS_RAWIO
249              * Perform I/O port operations (iopl(2) and ioperm(2));
250              * access /proc/kcore;
251              * employ the FIBMAP ioctl(2) operation;
252              * open devices for accessing x86 model-specific registers (MSRs,
253                see msr(4));
254              * update /proc/sys/vm/mmap_min_addr;
255              * create  memory mappings at addresses below the value specified
256                by /proc/sys/vm/mmap_min_addr;
257              * map files in /proc/bus/pci;
258              * open /dev/mem and /dev/kmem;
259              * perform various SCSI device commands;
260              * perform certain operations on hpsa(4) and cciss(4) devices;
261              * perform  a  range  of  device-specific  operations  on   other
262                devices.
263
264       CAP_SYS_RESOURCE
265              * Use reserved space on ext2 filesystems;
266              * make ioctl(2) calls controlling ext3 journaling;
267              * override disk quota limits;
268              * increase resource limits (see setrlimit(2));
269              * override RLIMIT_NPROC resource limit;
270              * override maximum number of consoles on console allocation;
271              * override maximum number of keymaps;
              * allow more than 64 Hz interrupts from the real-time clock;
273              * raise  msg_qbytes limit for a System V message queue above the
274                limit in /proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2));
275              * allow the RLIMIT_NOFILE resource limit on the number  of  "in-
276                flight"  file  descriptors  to  be  bypassed when passing file
277                descriptors to another process via a UNIX domain  socket  (see
278                unix(7));
              * use F_SETPIPE_SZ to increase the capacity of a pipe above the
                limit specified by /proc/sys/fs/pipe-max-size;
283              * override  /proc/sys/fs/mqueue/queues_max  limit  when creating
284                POSIX message queues (see mq_overview(7));
285              * employ the prctl(2) PR_SET_MM operation;
286              * set /proc/[pid]/oom_score_adj to a value lower than the  value
287                last set by a process with CAP_SYS_RESOURCE.
288
289       CAP_SYS_TIME
290              Set  system  clock (settimeofday(2), stime(2), adjtimex(2)); set
291              real-time (hardware) clock.
292
293       CAP_SYS_TTY_CONFIG
294              Use vhangup(2); employ various privileged ioctl(2) operations on
295              virtual terminals.
296
297       CAP_SYSLOG (since Linux 2.6.37)
298              * Perform  privileged  syslog(2)  operations.  See syslog(2) for
299                information on which operations require privilege.
              * View kernel addresses exposed via /proc and other interfaces
                when /proc/sys/kernel/kptr_restrict has the value 1.  (See the
                discussion of kptr_restrict in proc(5).)
303
304       CAP_WAKE_ALARM (since Linux 3.0)
305              Trigger something that will wake up the system (set  CLOCK_REAL‐
306              TIME_ALARM and CLOCK_BOOTTIME_ALARM timers).
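
       As a rough illustration of how the list above relates to a capability
       set, the following Python sketch decodes a bit mask into capability
       names.  The bit numbers are an assumption taken from
       <linux/capability.h> (this page does not state them), and
       decode_cap_mask() is a hypothetical helper, not part of any library.

```python
# Assumption: list index == bit number, as defined in <linux/capability.h>.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]


def decode_cap_mask(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    names = [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]
    # Bits beyond the table are reported by number rather than guessed.
    bit = len(CAP_NAMES)
    extra = mask >> bit
    while extra:
        if extra & 1:
            names.append("bit %d" % bit)
        extra >>= 1
        bit += 1
    return names
```

       A hexadecimal mask, such as the CapEff field of /proc/[pid]/status,
       could be decoded with decode_cap_mask(int(value, 16)).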
307
308   Past and current implementation
309       A full implementation of capabilities requires that:
310
311       1. For  all  privileged  operations,  the kernel must check whether the
312          thread has the required capability in its effective set.
313
314       2. The kernel must provide system calls allowing a thread's  capability
315          sets to be changed and retrieved.
316
317       3. The  filesystem must support attaching capabilities to an executable
318          file, so that a process gains those capabilities when  the  file  is
319          executed.
320
321       Before kernel 2.6.24, only the first two of these requirements are met;
322       since kernel 2.6.24, all three requirements are met.
323
324   Notes to kernel developers
325       When adding a new kernel feature that should be governed by a  capabil‐
326       ity, consider the following points.
327
       *  The goal of capabilities is to divide the power of superuser into
329          pieces, such that if a program that has one or more capabilities  is
330          compromised, its power to do damage to the system would be less than
331          the same program running with root privilege.
332
333       *  You have the choice of either creating a new capability for your new
334          feature,  or  associating the feature with one of the existing capa‐
335          bilities.  In order to keep the set of capabilities to a  manageable
336          size,  the  latter option is preferable, unless there are compelling
337          reasons to take the former  option.   (There  is  also  a  technical
338          limit: the size of capability sets is currently limited to 64 bits.)
339
340       *  To determine which existing capability might best be associated with
341          your new feature, review the list of capabilities above in order  to
342          find  a  "silo" into which your new feature best fits.  One approach
          to take is to determine if there are other features requiring
          capabilities that will always be used along with the new feature.  If the
345          new feature is useless without these other features, you should  use
346          the same capability as the other features.
347
348       *  Don't  choose  CAP_SYS_ADMIN  if  you can possibly avoid it!  A vast
349          proportion of existing capability checks are  associated  with  this
350          capability (see the partial list above).  It can plausibly be called
351          "the new root", since on the one hand, it confers a  wide  range  of
352          powers,  and  on  the other hand, its broad scope means that this is
353          the capability that is required by many privileged programs.   Don't
354          make  the problem worse.  The only new features that should be asso‐
355          ciated with CAP_SYS_ADMIN are ones that closely match existing  uses
356          in that silo.
357
358       *  If  you  have determined that it really is necessary to create a new
359          capability for your feature, don't make or name it as a "single-use"
360          capability.   Thus, for example, the addition of the highly specific
361          CAP_SYS_PACCT was probably a mistake.  Instead, try to identify  and
362          name  your new capability as a broader silo into which other related
363          future use cases might fit.
364
365   Thread capability sets
       Each thread has the following capability sets containing zero or more
       of the above capabilities:
368
369       Permitted:
370              This  is a limiting superset for the effective capabilities that
371              the thread may assume.  It is also a limiting superset  for  the
372              capabilities  that  may  be  added  to  the inheritable set by a
373              thread that does not have  the  CAP_SETPCAP  capability  in  its
374              effective set.
375
376              If  a  thread  drops a capability from its permitted set, it can
377              never reacquire that capability (unless it execve(2)s  either  a
378              set-user-ID-root  program,  or  a  program whose associated file
379              capabilities grant that capability).
380
381       Inheritable:
382              This is a set of capabilities  preserved  across  an  execve(2).
383              Inheritable  capabilities  remain inheritable when executing any
384              program, and inheritable capabilities are added to the permitted
385              set when executing a program that has the corresponding bits set
386              in the file inheritable set.
387
388              Because inheritable capabilities  are  not  generally  preserved
389              across  execve(2)  when running as a non-root user, applications
390              that wish to run  helper  programs  with  elevated  capabilities
391              should consider using ambient capabilities, described below.
392
393       Effective:
394              This  is  the  set of capabilities used by the kernel to perform
395              permission checks for the thread.
396
397       Ambient (since Linux 4.3):
398              This is a set of  capabilities  that  are  preserved  across  an
399              execve(2)  of  a  program  that  is not privileged.  The ambient
400              capability set obeys the invariant that no capability  can  ever
401              be ambient if it is not both permitted and inheritable.
402
403              The  ambient  capability  set  can  be  directly  modified using
404              prctl(2).  Ambient capabilities  are  automatically  lowered  if
405              either  of  the corresponding permitted or inheritable capabili‐
406              ties is lowered.
407
408              Executing a program that changes UID or GID due to the set-user-
409              ID or set-group-ID bits or executing a program that has any file
410              capabilities set will clear the ambient set.  Ambient  capabili‐
411              ties  are  added to the permitted set and assigned to the effec‐
412              tive set when execve(2) is called.
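
       The ambient invariant described above can be modeled on plain integer
       bit masks.  This is only a sketch: both function names are
       hypothetical, and the real checks are performed by the kernel when
       prctl(2) or capset(2) is called.

```python
def can_raise_ambient(permitted, inheritable, cap_bit):
    """A capability may become ambient only if it is both permitted and
    inheritable (the invariant stated in the text)."""
    mask = 1 << cap_bit
    return bool(permitted & mask) and bool(inheritable & mask)


def clamp_ambient(permitted, inheritable, ambient):
    """Model the automatic lowering of ambient capabilities after the
    permitted or inheritable set has been reduced."""
    return ambient & permitted & inheritable
```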
413
414       A child created via fork(2) inherits copies of its parent's  capability
415       sets.  See below for a discussion of the treatment of capabilities dur‐
416       ing execve(2).
417
418       Using capset(2), a thread may manipulate its own capability  sets  (see
419       below).
420
421       Since  Linux  3.2,  the  file /proc/sys/kernel/cap_last_cap exposes the
422       numerical value of the highest capability supported by the running ker‐
423       nel; this can be used to determine the highest bit that may be set in a
424       capability set.
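
       A minimal sketch of using cap_last_cap to build a mask of every
       capability the running kernel supports.  The helper names are
       hypothetical; the file is assumed to contain a single decimal number.

```python
def read_cap_last_cap(path="/proc/sys/kernel/cap_last_cap"):
    """Return the number of the highest capability known to the kernel."""
    with open(path) as f:
        return int(f.read().strip())


def full_capability_mask(last_cap):
    """Return a mask with bits 0..last_cap (inclusive) set."""
    return (1 << (last_cap + 1)) - 1
```

       For example, full_capability_mask(read_cap_last_cap()) gives the
       largest value that a capability set can hold on the running system.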
425
426   File capabilities
427       Since kernel 2.6.24, the kernel supports  associating  capability  sets
428       with  an executable file using setcap(8).  The file capability sets are
429       stored in an extended attribute (see setxattr(2)  and  xattr(7))  named
430       security.capability.   Writing  to this extended attribute requires the
431       CAP_SETFCAP capability.  The file capability sets, in conjunction  with
432       the  capability  sets  of  the  thread, determine the capabilities of a
433       thread after an execve(2).
434
435       The three file capability sets are:
436
437       Permitted (formerly known as forced):
438              These capabilities are automatically permitted  to  the  thread,
439              regardless of the thread's inheritable capabilities.
440
441       Inheritable (formerly known as allowed):
442              This set is ANDed with the thread's inheritable set to determine
443              which inheritable capabilities are enabled in the permitted  set
444              of the thread after the execve(2).
445
446       Effective:
447              This is not a set, but rather just a single bit.  If this bit is
448              set, then during an execve(2) all of the new permitted capabili‐
449              ties  for  the  thread are also raised in the effective set.  If
450              this bit is not set, then after an execve(2), none  of  the  new
451              permitted capabilities is in the new effective set.
452
453              Enabling the file effective capability bit implies that any file
454              permitted or inheritable capability  that  causes  a  thread  to
455              acquire   the   corresponding  permitted  capability  during  an
456              execve(2) (see the transformation rules  described  below)  will
457              also  acquire  that capability in its effective set.  Therefore,
458              when   assigning   capabilities   to    a    file    (setcap(8),
459              cap_set_file(3),  cap_set_fd(3)),  if  we  specify the effective
460              flag as being enabled for any  capability,  then  the  effective
              flag must also be specified as enabled for all other capabilities
              for which the corresponding permitted or inheritable flag is
              enabled.
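
       The consistency rule for the file effective flag can be expressed as
       a small check.  This is an illustrative sketch only: the function name
       is hypothetical, and each argument is a set of capability names for
       which the corresponding flag was requested.

```python
def file_effective_flags_consistent(permitted, inheritable, effective):
    """Return True if the requested flags can be represented with a single
    file effective bit."""
    effective = set(effective)
    if not effective:
        # Effective flag requested for no capability: the bit stays clear.
        return True
    # Once the flag is requested for any capability, it must cover every
    # capability that has the permitted or inheritable flag enabled.
    return (set(permitted) | set(inheritable)) <= effective
```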
464
465   File capability mask versioning
466       To  allow  extensibility, the kernel supports a scheme to encode a ver‐
467       sion number inside the security.capability extended attribute  that  is
468       used  to implement file capabilities.  These version numbers are inter‐
469       nal to the implementation,  and  not  directly  visible  to  user-space
470       applications.  To date, the following versions are supported:
471
472       VFS_CAP_REVISION_1
473              This was the original file capability implementation, which sup‐
474              ported 32-bit masks for file capabilities.
475
476       VFS_CAP_REVISION_2 (since Linux 2.6.25)
477              This version allows for file capability masks that are  64  bits
478              in  size, and was necessary as the number of supported capabili‐
479              ties grew beyond 32.  The kernel transparently continues to sup‐
480              port  the execution of files that have 32-bit version 1 capabil‐
481              ity masks, but when adding capabilities to files  that  did  not
482              previously  have  capabilities, or modifying the capabilities of
483              existing files, it automatically uses the version 2  scheme  (or
484              possibly the version 3 scheme, as described below).
485
486       VFS_CAP_REVISION_3 (since Linux 4.14)
487              Version  3  file capabilities are provided to support namespaced
488              file capabilities (described below).
489
490              As with version 2 file capabilities, version 3 capability  masks
              are 64 bits in size.  But in addition, the root user ID of the
              namespace is encoded in the security.capability extended
493              attribute.   (A  namespace's root user ID is the value that user
494              ID 0 inside that namespace maps to in the  initial  user  names‐
495              pace.)
496
497              Version 3 file capabilities are designed to coexist with version
498              2 capabilities; that is, on a modern Linux system, there may  be
499              some files with version 2 capabilities while others have version
500              3 capabilities.
501
502       Before Linux 4.14, the only kind  of  capability  mask  that  could  be
503       attached  to  a  file was a VFS_CAP_REVISION_2 mask.  Since Linux 4.14,
504       the version of the capability mask that is attached to a  file  depends
505       on   the   circumstances  in  which  the  security.capability  extended
506       attribute was created.
507
508       Starting with Linux 4.14, a security.capability extended  attribute  is
509       automatically  created  as (or converted to) a version 3 (VFS_CAP_REVI‐
510       SION_3) attribute if both of the following are true:
511
       (1) The thread writing the attribute resides in a noninitial user
           namespace.  (More precisely: the thread resides in a user namespace
           other than the one from which the underlying filesystem was mounted.)
515
516       (2) The thread has the CAP_SETFCAP  capability  over  the  file  inode,
517           meaning  that  (a) the thread has the CAP_SETFCAP capability in its
518           own user namespace; and (b) the UID and GID of the file inode  have
519           mappings in the writer's user namespace.
520
521       When  a  VFS_CAP_REVISION_3  security.capability  extended attribute is
522       created, the root user ID of the creating thread's  user  namespace  is
523       saved in the extended attribute.
524
525       By  contrast,  creating a security.capability extended attribute from a
526       privileged (CAP_SETFCAP) thread that resides in the namespace where the
527       underlying filesystem was mounted (this normally means the initial user
528       namespace) automatically results in a  version  2  (VFS_CAP_REVISION_2)
529       attribute.
530
531       Note  that  a  file  can  have  either a version 2 or a version 3 secu‐
532       rity.capability extended attribute associated with it,  but  not  both:
533       creation  or modification of the security.capability extended attribute
534       will automatically modify the version according to the circumstances in
535       which the extended attribute is created or modified.
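
       To make the three versions concrete, the following sketch parses the
       raw value of a security.capability extended attribute.  The binary
       layout and constants are assumptions taken from the kernel's
       <linux/capability.h> (struct vfs_cap_data and struct vfs_ns_cap_data),
       not from this page, and parse_security_capability() is a hypothetical
       helper.

```python
import struct

# Assumed constants from <linux/capability.h>.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
REVISIONS = {0x01000000: 1, 0x02000000: 2, 0x03000000: 3}


def parse_security_capability(raw):
    """Decode a raw security.capability value (little-endian 32-bit words)."""
    if len(raw) < 4:
        raise ValueError("attribute too short")
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    revision = REVISIONS.get(magic_etc & VFS_CAP_REVISION_MASK)
    if revision is None:
        raise ValueError("unknown revision")
    words = 1 if revision == 1 else 2      # 32-bit or 64-bit masks
    expected = 4 + 8 * words + (4 if revision == 3 else 0)
    if len(raw) != expected:
        raise ValueError("unexpected attribute size")
    permitted = inheritable = 0
    for i in range(words):
        p, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= p << (32 * i)
        inheritable |= inh << (32 * i)
    rootid = None
    if revision == 3:
        # Version 3 appends the root user ID of the namespace.
        (rootid,) = struct.unpack_from("<I", raw, 4 + 8 * words)
    return {
        "revision": revision,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
        "rootid": rootid,
    }
```

       On Linux, the raw bytes could be obtained with
       os.getxattr(path, "security.capability").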
536
537   Transformation of capabilities during execve()
538       During  an execve(2), the kernel calculates the new capabilities of the
539       process using the following algorithm:
540
541           P'(ambient)     = (file is privileged) ? 0 : P(ambient)
542
543           P'(permitted)   = (P(inheritable) & F(inheritable)) |
544                             (F(permitted) & cap_bset) | P'(ambient)
545
546           P'(effective)   = F(effective) ? P'(permitted) : P'(ambient)
547
548           P'(inheritable) = P(inheritable)    [i.e., unchanged]
549
550       where:
551
552           P         denotes the value of a thread capability set  before  the
553                     execve(2)
554
555           P'        denotes  the  value  of a thread capability set after the
556                     execve(2)
557
558           F         denotes a file capability set
559
560           cap_bset  is the value of the capability  bounding  set  (described
561                     below).
562
563       A  privileged  file is one that has capabilities or has the set-user-ID
564       or set-group-ID bit set.
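
       The algorithm above translates directly into bitwise operations.  The
       following sketch models capability sets as integers; the type and
       function names are hypothetical, and the caller is assumed to decide
       whether the file is privileged.

```python
from collections import namedtuple

# Thread sets are integer bit masks; the file effective value is one bit.
ThreadCaps = namedtuple("ThreadCaps", "permitted inheritable effective ambient")
FileCaps = namedtuple("FileCaps", "permitted inheritable effective")


def execve_transform(p, f, cap_bset, file_is_privileged):
    """Return the thread capability sets after execve(2) (P' in the text)."""
    ambient = 0 if file_is_privileged else p.ambient
    permitted = ((p.inheritable & f.inheritable)
                 | (f.permitted & cap_bset)
                 | ambient)
    effective = permitted if f.effective else ambient
    # The inheritable set is carried over unchanged.
    return ThreadCaps(permitted, p.inheritable, effective, ambient)
```

       For instance, a thread with empty sets that executes a file whose
       permitted set contains one capability and whose effective bit is set
       ends up with that capability in both its permitted and effective
       sets, provided the bounding set does not mask it out.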
565
566       Note: the capability transitions described above may not  be  performed
567       (i.e.,  file capabilities may be ignored) for the same reasons that the
568       set-user-ID and set-group-ID bits are ignored; see execve(2).
569
570       Note: according to the rules above, if a process with nonzero user  IDs
571       performs  an  execve(2)  then  any capabilities that are present in its
572       permitted and effective sets will be cleared.   For  the  treatment  of
573       capabilities  when  a  process  with  a  user  ID  of  zero performs an
574       execve(2), see below under Capabilities and execution  of  programs  by
575       root.
576
577   Safety checking for capability-dumb binaries
578       A capability-dumb binary is an application that has been marked to have
579       file capabilities, but has not been converted to use the libcap(3)  API
580       to manipulate its capabilities.  (In other words, this is a traditional
581       set-user-ID-root program that has been switched to use  file  capabili‐
582       ties, but whose code has not been modified to understand capabilities.)
583       For such applications, the effective capability bit is set on the file,
584       so  that  the  file permitted capabilities are automatically enabled in
585       the process effective set when executing the file.  The  kernel  recog‐
586       nizes  a file which has the effective capability bit set as capability-
587       dumb for the purpose of the check described here.
588
589       When executing a capability-dumb  binary,  the  kernel  checks  if  the
590       process  obtained all permitted capabilities that were specified in the
591       file permitted set,  after  the  capability  transformations  described
592       above  have  been  performed.   (The  typical reason why this might not
593       occur is that the capability bounding set masked out some of the  capa‐
594       bilities in the file permitted set.)  If the process did not obtain the
595       full set of file permitted capabilities, then execve(2) fails with  the
596       error  EPERM.   This  prevents possible security risks that could arise
597       when a capability-dumb application is executed with less privilege than
598       it  needs.   Note that, by definition, the application could not itself
599       recognize this problem, since it does not employ the libcap(3) API.
600
601   Capabilities and execution of programs by root
602       In order to provide an all-powerful root using capability sets,  during
603       an execve(2):
604
605       1. If  a  set-user-ID-root  program  is  being executed, or the real or
606          effective user ID of the process is 0 (root) then the file inherita‐
607          ble  and  permitted sets are defined to be all ones (i.e., all capa‐
608          bilities enabled).
609
610       2. If a set-user-ID-root program is being executed,  or  the  effective
611          user  ID  of  the process is 0 (root) then the file effective bit is
612          defined to be one (enabled).
613
614       The upshot of the above rules, combined with the capabilities transfor‐
615       mations described above, is as follows:
616
617       *  When  a  process  execve(2)s  a  set-user-ID-root program, or when a
618          process with an effective UID of 0 execve(2)s a  program,  it  gains
619          all  capabilities  in  its  permitted and effective capability sets,
620          except those masked out by the capability bounding set.
621
622       *  When a process with a real UID of 0 execve(2)s a program,  it  gains
623          all  capabilities  in  its  permitted  capability  set, except those
624          masked out by the capability bounding set.
625
626       The above steps yield semantics that are the same as those provided  by
627       traditional UNIX systems.
628
629   Set-user-ID-root programs that have file capabilities
630       Executing a program that is both set-user-ID root and has file capabil‐
631       ities will cause the process to gain just the capabilities  granted  by
632       the  program (i.e., not all capabilities, as would occur when executing
633       a set-user-ID-root program that does not have any associated file capa‐
634       bilities).  Note that one can assign empty capability sets to a program
635       file, and thus it is possible to create a set-user-ID-root program that
636       changes  the  effective  and saved set-user-ID of the process that exe‐
637       cutes the program to 0, but confers no capabilities to that process.
638
639   Capability bounding set
640       The capability bounding set is a security mechanism that can be used to
641       limit  the  capabilities  that  can be gained during an execve(2).  The
642       bounding set is used in the following ways:
643
644       * During an execve(2), the capability bounding set is  ANDed  with  the
645         file  permitted  capability  set, and the result of this operation is
646         assigned to the thread's permitted capability  set.   The  capability
647         bounding  set  thus places a limit on the permitted capabilities that
648         may be granted by an executable file.
649
650       * (Since Linux 2.6.25) The capability bounding set acts as  a  limiting
651         superset  for the capabilities that a thread can add to its inherita‐
652         ble set using capset(2).  This means that if a capability is  not  in
653         the  bounding  set,  then  a  thread can't add this capability to its
654         inheritable set, even if it was in its  permitted  capabilities,  and
655         thereby  cannot  have  this capability preserved in its permitted set
656         when it execve(2)s a file that has the capability in its  inheritable
657         set.
658
659       Note  that  the bounding set masks the file permitted capabilities, but
660       not the inheritable capabilities.  If a thread maintains  a  capability
661       in  its  inheritable  set  that is not in its bounding set, then it can
662       still gain that capability in its permitted set  by  executing  a  file
663       that has the capability in its inheritable set.
664
665       Depending  on the kernel version, the capability bounding set is either
666       a system-wide attribute, or a per-process attribute.
667
668       Capability bounding set prior to Linux 2.6.25
669
670       In kernels before 2.6.25, the capability bounding set is a  system-wide
671       attribute  that affects all threads on the system.  The bounding set is
672       accessible via the file /proc/sys/kernel/cap-bound.  (Confusingly, this
673       bit  mask  parameter  is  expressed  as  a  signed  decimal  number  in
674       /proc/sys/kernel/cap-bound.)
675
676       Only the init process may set capabilities in the  capability  bounding
677       set; other than that, the superuser (more precisely: a process with the
678       CAP_SYS_MODULE capability) may only clear capabilities from this set.
679
680       On a standard system the capability bounding set always masks  out  the
681       CAP_SETPCAP  capability.  To remove this restriction (dangerous!), mod‐
682       ify the definition of  CAP_INIT_EFF_SET  in  include/linux/capability.h
683       and rebuild the kernel.
684
685       The  system-wide  capability  bounding  set  feature was added to Linux
686       starting with kernel version 2.2.11.
687
688       Capability bounding set from Linux 2.6.25 onward
689
690       From  Linux  2.6.25,  the  capability  bounding  set  is  a  per-thread
691       attribute.  (There is no longer a system-wide capability bounding set.)
692
693       The  bounding set is inherited at fork(2) from the thread's parent, and
694       is preserved across an execve(2).
695
696       A thread may remove capabilities from its capability bounding set using
697       the prctl(2) PR_CAPBSET_DROP operation, provided it has the CAP_SETPCAP
698       capability.  Once a capability has been dropped from the bounding  set,
699       it  cannot  be restored to that set.  A thread can determine if a capa‐
700       bility is in its bounding set using the prctl(2) PR_CAPBSET_READ opera‐
701       tion.
702
703       Removing  capabilities  from the bounding set is supported only if file
704       capabilities are compiled into the kernel.   In  kernels  before  Linux
705       2.6.33, file capabilities were an optional feature configurable via the
706       CONFIG_SECURITY_FILE_CAPABILITIES option.  Since Linux 2.6.33, the con‐
707       figuration  option  has  been  removed and file capabilities are always
708       part of the kernel.  When file capabilities are compiled into the  ker‐
709       nel,  the  init  process  (the ancestor of all processes) begins with a
710       full bounding set.  If file capabilities are not compiled into the ker‐
711       nel,  then  init  begins  with  a  full bounding set minus CAP_SETPCAP,
712       because this capability has a different meaning when there are no  file
713       capabilities.
714
715       Removing a capability from the bounding set does not remove it from the
716       thread's inheritable set.  However, it does prevent the capability from
717       being added back into the thread's inheritable set in the future.
718
719   Effect of user ID changes on capabilities
720       To  preserve  the  traditional  semantics for transitions between 0 and
721       nonzero user IDs, the kernel makes the following changes to a  thread's
722       capability  sets on changes to the thread's real, effective, saved set,
723       and filesystem user IDs (using setuid(2), setresuid(2), or similar):
724
725       1. If one or more of the real, effective or saved set user IDs was pre‐
726          viously  0, and as a result of the UID changes all of these IDs have
727          a nonzero value, then all capabilities are cleared from the  permit‐
728          ted, effective, and ambient capability sets.
729
730       2. If  the  effective  user  ID  is changed from 0 to nonzero, then all
731          capabilities are cleared from the effective set.
732
733       3. If the effective user ID is changed from nonzero to 0, then the per‐
734          mitted set is copied to the effective set.
735
736       4. If  the  filesystem  user ID is changed from 0 to nonzero (see setf‐
737          suid(2)), then the  following  capabilities  are  cleared  from  the
738          effective  set:  CAP_CHOWN,  CAP_DAC_OVERRIDE,  CAP_DAC_READ_SEARCH,
739          CAP_FOWNER, CAP_FSETID, CAP_LINUX_IMMUTABLE  (since  Linux  2.6.30),
740          CAP_MAC_OVERRIDE,  and  CAP_MKNOD  (since  Linux  2.6.30).   If  the
741          filesystem UID is changed from nonzero to 0, then any of these capa‐
742          bilities  that  are  enabled in the permitted set are enabled in the
743          effective set.
744
745       If a thread that has a 0 value for one or more of its user IDs wants to
746       prevent  its  permitted capability set being cleared when it resets all
747       of  its  user  IDs  to  nonzero  values,  it  can  do  so   using   the
748       SECBIT_KEEP_CAPS securebits flag described below.
749
750   Programmatically adjusting capability sets
751       A  thread  can  retrieve  and  change  its  capability  sets  using the
752       capget(2)  and  capset(2)  system   calls.    However,   the   use   of
753       cap_get_proc(3)  and cap_set_proc(3), both provided in the libcap pack‐
754       age, is preferred for this purpose.  The following rules govern changes
755       to the thread capability sets:
756
757       1. If  the  caller  does  not  have the CAP_SETPCAP capability, the new
758          inheritable set must be a subset of the combination of the  existing
759          inheritable and permitted sets.
760
761       2. (Since Linux 2.6.25) The new inheritable set must be a subset of the
762          combination of the  existing  inheritable  set  and  the  capability
763          bounding set.
764
765       3. The new permitted set must be a subset of the existing permitted set
766          (i.e., it is not possible to acquire permitted capabilities that the
767          thread does not currently have).
768
769       4. The new effective set must be a subset of the new permitted set.
770
771   The securebits flags: establishing a capabilities-only environment
772       Starting  with kernel 2.6.26, and with a kernel in which file capabili‐
773       ties are enabled, Linux implements a set of per-thread securebits flags
774       that  can be used to disable special handling of capabilities for UID 0
775       (root).  These flags are as follows:
776
777       SECBIT_KEEP_CAPS
778              Setting this flag allows a thread that has one or more 0 UIDs to
779              retain capabilities in its permitted set when it switches all of
780              its UIDs to nonzero values.  If this flag is not set, then such
781              a UID switch causes the thread to lose all capabilities in its
782              permitted set.  This flag is always cleared on an
783              execve(2).
784
785              The  setting  of  the  SECBIT_KEEP_CAPS  flag  is ignored if the
786              SECBIT_NO_SETUID_FIXUP flag is set.  (The latter flag provides a
787              superset of the effect of the former flag.)
788
789              This  flag provides the same functionality as the older prctl(2)
790              PR_SET_KEEPCAPS operation.
791
792       SECBIT_NO_SETUID_FIXUP
793              Setting this flag stops the kernel from adjusting the  process's
794              permitted,  effective,  and  ambient  capability  sets  when the
795              thread's effective and filesystem UIDs are switched between zero
796              and  nonzero  values.   (See  the  subsection  Effect of user ID
797              changes on capabilities.)
798
799       SECBIT_NOROOT
800              If this bit is set, then the kernel does not grant  capabilities
801              when  a  set-user-ID-root program is executed, or when a process
802              with an effective or real UID of 0 calls  execve(2).   (See  the
803              subsection Capabilities and execution of programs by root.)
804
805       SECBIT_NO_CAP_AMBIENT_RAISE
806              Setting this flag disallows raising ambient capabilities via the
807              prctl(2) PR_CAP_AMBIENT_RAISE operation.
808
809       Each of the above "base" flags has a companion "locked" flag.   Setting
810       any  of  the "locked" flags is irreversible, and has the effect of pre‐
811       venting further changes to the corresponding "base" flag.   The  locked
812       flags   are:   SECBIT_KEEP_CAPS_LOCKED,  SECBIT_NO_SETUID_FIXUP_LOCKED,
813       SECBIT_NOROOT_LOCKED, and SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED.
814
815       The securebits flags can be modified and retrieved using  the  prctl(2)
816       PR_SET_SECUREBITS  and  PR_GET_SECUREBITS  operations.  The CAP_SETPCAP
817       capability is required to modify the flags.
818
819       The securebits flags are  inherited  by  child  processes.   During  an
820       execve(2),  all  of  the  flags  are preserved, except SECBIT_KEEP_CAPS
821       which is always cleared.
822
823       An application can use the following call to lock itself,  and  all  of
824       its  descendants,  into  an  environment  where the only way of gaining
825       capabilities is by executing a program with associated  file  capabili‐
826       ties:
827
828           prctl(PR_SET_SECUREBITS,
829                /* SECBIT_KEEP_CAPS off */
830                   SECBIT_KEEP_CAPS_LOCKED |
831                   SECBIT_NO_SETUID_FIXUP |
832                   SECBIT_NO_SETUID_FIXUP_LOCKED |
833                   SECBIT_NOROOT |
834                   SECBIT_NOROOT_LOCKED);
835                   /* Setting/locking SECBIT_NO_CAP_AMBIENT_RAISE
836                      is not required */
837
838   Interaction with user namespaces
839       For  a  discussion  of  the interaction of capabilities and user names‐
840       paces, see user_namespaces(7).
841
842   Namespaced file capabilities
843       Traditional (i.e., version 2) file capabilities associate only a set of
844       capability  masks  with  a binary executable file.  When a process exe‐
845       cutes a binary with such capabilities, it gains the associated capabil‐
846       ities  (within  its user namespace) as per the rules described above in
847       "Transformation of capabilities during execve()".
848
849       Because version 2 file capabilities confer capabilities to the  execut‐
850       ing  process  regardless  of  which  user namespace it resides in, only
851       privileged processes are permitted to  associate  capabilities  with  a
852       file.   Here,  "privileged"  means  a  process that has the CAP_SETFCAP
853       capability in the user namespace where the filesystem was mounted (nor‐
854       mally  the initial user namespace).  This limitation renders file capa‐
855       bilities useless for certain use cases.  For  example,  in  user-names‐
856       paced  containers,  it  can  be desirable to be able to create a binary
857       that confers capabilities only to processes executed inside  that  con‐
858       tainer, but not to processes that are executed outside the container.
859
860       Linux 4.14 added so-called namespaced file capabilities to support such
861       use cases.  Namespaced file capabilities  are  recorded  as  version  3
862       (i.e.,  VFS_CAP_REVISION_3)  security.capability  extended  attributes.
863       Such an attribute is automatically created when a process that  resides
864       in  a noninitial user namespace associates (setxattr(2)) file capabili‐
865       ties with a file whose user ID matches the user ID of  the  creator  of
866       the  namespace.  In this case, the kernel records not just the capabil‐
867       ity masks in the extended attribute, but also the namespace  root  user
868       ID.  For further details, see File capability mask versioning, above.
869
870       As  with  a  binary  that  has  VFS_CAP_REVISION_2 file capabilities, a
871       binary with VFS_CAP_REVISION_3 file capabilities  confers  capabilities
872       to a process during execve().  However, capabilities are conferred only
873       if the binary is executed by a process that resides in a user namespace
874       whose  UID  0  maps  to  the root user ID that is saved in the extended
875       attribute, or when executed by a process that resides in a descendant of
876       such a namespace.
877

CONFORMING TO

879       No  standards govern capabilities, but the Linux capability implementa‐
880       tion  is  based  on  the  withdrawn  POSIX.1e   draft   standard;   see
881       ⟨http://wt.tuxomania.net/publications/posix.1e/⟩.
882

NOTES

884       From kernel 2.5.27 to kernel 2.6.26, capabilities were an optional ker‐
885       nel component, and  could  be  enabled/disabled  via  the  CONFIG_SECU‐
886       RITY_CAPABILITIES kernel configuration option.
887
888       The /proc/[pid]/task/TID/status file can be used to view the capability
889       sets of a thread.  The /proc/[pid]/status  file  shows  the  capability
890       sets  of  a process's main thread.  Before Linux 3.8, nonexistent capa‐
891       bilities were shown as being enabled (1) in these  sets.   Since  Linux
892       3.8,  all  nonexistent  capabilities  (above CAP_LAST_CAP) are shown as
893       disabled (0).
894
895       The libcap package provides a suite of routines for setting and getting
896       capabilities  that  is  more comfortable and less likely to change than
897       the interface provided by capset(2) and capget(2).  This  package  also
898       provides the setcap(8) and getcap(8) programs.  It can be found at
899       ⟨http://www.kernel.org/pub/linux/libs/security/linux-privs⟩.
900
901       Before  kernel  2.6.24, and from kernel 2.6.24 to kernel 2.6.32 if file
902       capabilities are not enabled, a thread with the CAP_SETPCAP  capability
903       can manipulate the capabilities of threads other than itself.  However,
904       this is only theoretically possible, since no thread ever has CAP_SETP‐
905       CAP in either of these cases:
906
907       * In  the pre-2.6.25 implementation the system-wide capability bounding
908         set, /proc/sys/kernel/cap-bound, always masks  out  this  capability,
909         and  this  cannot be changed without modifying the kernel source and
910         rebuilding.
911
912       * If file capabilities are disabled in the current implementation, then
913         init  starts  out  with  this capability removed from its per-process
914         bounding set, and that bounding set is inherited by  all  other  pro‐
915         cesses created on the system.
916

COLOPHON

927       This  page  is  part of release 4.16 of the Linux man-pages project.  A
928       description of the project, information about reporting bugs,  and  the
929       latest     version     of     this    page,    can    be    found    at
930       https://www.kernel.org/doc/man-pages/.
931
932
933
934Linux                             2018-02-02                   CAPABILITIES(7)
