FCNTL(2)                   Linux Programmer's Manual                  FCNTL(2)

NAME

       fcntl - manipulate file descriptor

SYNOPSIS

       #include <unistd.h>
       #include <fcntl.h>

       int fcntl(int fd, int cmd, ... /* arg */ );

DESCRIPTION

       fcntl() performs one of the operations described below on the open file
       descriptor fd.  The operation is determined by cmd.

       fcntl() can take an optional third argument.  Whether or not this argu‐
       ment  is  required is determined by cmd.  The required argument type is
       indicated in parentheses after each cmd name (in most  cases,  the  re‐
       quired  type  is int, and we identify the argument using the name arg),
       or void is specified if the argument is not required.

       Certain of the operations below are supported only since  a  particular
       Linux  kernel  version.   The  preferred method of checking whether the
       host kernel supports a particular operation is to invoke  fcntl()  with
       the desired cmd value and then test whether the call failed with
       EINVAL, indicating that the kernel does not recognize this value.

   Duplicating a file descriptor
       F_DUPFD (int)
              Duplicate the  file  descriptor  fd  using  the  lowest-numbered
              available file descriptor greater than or equal to arg.  This is
              different from dup2(2), which uses exactly the  file  descriptor
              specified.

              On success, the new file descriptor is returned.

              See dup(2) for further details.

       F_DUPFD_CLOEXEC (int; since Linux 2.6.24)
              As  for F_DUPFD, but additionally set the close-on-exec flag for
              the duplicate file descriptor.  Specifying this flag  permits  a
              program  to avoid an additional fcntl() F_SETFD operation to set
              the FD_CLOEXEC flag.  For an explanation of  why  this  flag  is
              useful, see the description of O_CLOEXEC in open(2).

   File descriptor flags
       The  following commands manipulate the flags associated with a file de‐
       scriptor.  Currently, only one such flag is  defined:  FD_CLOEXEC,  the
       close-on-exec  flag.  If the FD_CLOEXEC bit is set, the file descriptor
       will automatically be closed during a successful  execve(2).   (If  the
       execve(2)  fails, the file descriptor is left open.)  If the FD_CLOEXEC
       bit is not set, the file descriptor will remain open across an
       execve(2).

       F_GETFD (void)
              Return  (as  the function result) the file descriptor flags; arg
              is ignored.

       F_SETFD (int)
              Set the file descriptor flags to the value specified by arg.

       In multithreaded programs, using fcntl() F_SETFD to set  the  close-on-
       exec  flag  at  the same time as another thread performs a fork(2) plus
       execve(2) is vulnerable to a race condition  that  may  unintentionally
       leak  the file descriptor to the program executed in the child process.
       See the discussion of the O_CLOEXEC flag in open(2) for details  and  a
       remedy to the problem.

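       A common way to use F_GETFD and F_SETFD together is to read the
       current flags, change only the FD_CLOEXEC bit, and write the result
       back.  The sketch below assumes a hypothetical helper named
       set_cloexec(); because it takes two separate calls, it remains
       subject to the multithreaded race described above.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: turn the close-on-exec flag on (enable != 0)
 * or off (enable == 0) while preserving any other descriptor flags.
 * Returns 0 on success, or -1 with errno set.
 */
int set_cloexec(int fd, int enable)
{
    int flags = fcntl(fd, F_GETFD);     /* arg is ignored for F_GETFD */

    if (flags == -1)
        return -1;
    if (enable)
        flags |= FD_CLOEXEC;
    else
        flags &= ~FD_CLOEXEC;
    return fcntl(fd, F_SETFD, flags) == -1 ? -1 : 0;
}
```
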
   File status flags
       Each  open  file  description has certain associated status flags, ini‐
       tialized by open(2) and possibly modified by fcntl().  Duplicated  file
       descriptors  (made with dup(2), fcntl(F_DUPFD), fork(2), etc.) refer to
       the same open file description, and thus share  the  same  file  status
       flags.

       The file status flags and their semantics are described in open(2).

       F_GETFL (void)
              Return  (as  the  function  result) the file access mode and the
              file status flags; arg is ignored.

       F_SETFL (int)
              Set the file status flags to the value specified by  arg.   File
              access mode (O_RDONLY, O_WRONLY, O_RDWR) and file creation flags
              (i.e., O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC) in arg  are  ignored.
              On  Linux,  this  command can change only the O_APPEND, O_ASYNC,
              O_DIRECT, O_NOATIME, and O_NONBLOCK flags.  It is  not  possible
              to change the O_DSYNC and O_SYNC flags; see BUGS, below.

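       Because F_SETFL replaces the whole set of changeable status flags,
       a single flag is usually added by first fetching the current value
       with F_GETFL.  The following sketch, with the hypothetical helper
       name set_nonblocking(), enables O_NONBLOCK this way.  The access
       mode bits returned by F_GETFL can be passed back unchanged, since
       F_SETFL ignores them.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: add O_NONBLOCK to the file status flags of the
 * open file description behind fd, leaving the other flags as they are.
 * Returns 0 on success, or -1 with errno set.
 */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

       Since the flags belong to the open file description, the change is
       also visible through any duplicate of fd.
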
   Advisory record locking
       Linux  implements traditional ("process-associated") UNIX record locks,
       as standardized by POSIX.  For a Linux-specific alternative with better
       semantics, see the discussion of open file description locks below.

       F_SETLK,  F_SETLKW,  and F_GETLK are used to acquire, release, and test
       for the existence of record locks (also known as byte-range,  file-seg‐
       ment, or file-region locks).  The third argument, lock, is a pointer to
       a structure that has at least the following fields (in unspecified  or‐
       der).

           struct flock {
               ...
               short l_type;    /* Type of lock: F_RDLCK,
                                   F_WRLCK, F_UNLCK */
               short l_whence;  /* How to interpret l_start:
                                   SEEK_SET, SEEK_CUR, SEEK_END */
               off_t l_start;   /* Starting offset for lock */
               off_t l_len;     /* Number of bytes to lock */
               pid_t l_pid;     /* PID of process blocking our lock
                                   (set by F_GETLK and F_OFD_GETLK) */
               ...
           };

       The  l_whence,  l_start, and l_len fields of this structure specify the
       range of bytes we wish to lock.  Bytes past the end of the file may  be
       locked, but not bytes before the start of the file.

       l_start  is  the starting offset for the lock, and is interpreted rela‐
       tive to either: the start of the file (if l_whence  is  SEEK_SET);  the
       current  file  offset (if l_whence is SEEK_CUR); or the end of the file
       (if l_whence is SEEK_END).  In the final two cases, l_start  can  be  a
       negative  number  provided  the offset does not lie before the start of
       the file.

       l_len specifies the number of bytes to be locked.  If  l_len  is  posi‐
       tive,  then  the  range to be locked covers bytes l_start up to and in‐
       cluding l_start+l_len-1.  Specifying 0 for l_len has the special  mean‐
       ing:  lock all bytes starting at the location specified by l_whence and
       l_start through to the end of file, no matter how large the file grows.

       POSIX.1-2001 allows (but does not require) an implementation to support
       a negative l_len value; if l_len is negative, the interval described by
       lock covers bytes l_start+l_len up to and including l_start-1.  This is
       supported by Linux since kernel versions 2.4.21 and 2.5.49.

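       The range rules can be worked through with a little arithmetic.
       With l_whence set to SEEK_SET and l_start equal to 100, an l_len of
       10 covers bytes 100 through 109, an l_len of -10 covers bytes 90
       through 99, and an l_len of 0 covers byte 100 through to the end of
       the file.  The sketch below encodes the same rules; struct
       byte_range and lock_range() are hypothetical names used only for
       illustration, and l_start is assumed to be an absolute offset.

```c
#include <sys/types.h>

/* Hypothetical result type: inclusive byte range covered by a lock. */
struct byte_range {
    off_t first;    /* first byte covered */
    off_t last;     /* last byte covered (unused when to_eof is set) */
    int   to_eof;   /* nonzero: range extends to end of file */
};

/* Compute the range described by l_start and l_len (SEEK_SET case). */
struct byte_range lock_range(off_t l_start, off_t l_len)
{
    struct byte_range r = { 0, 0, 0 };

    if (l_len > 0) {            /* l_start .. l_start+l_len-1 */
        r.first = l_start;
        r.last = l_start + l_len - 1;
    } else if (l_len < 0) {     /* l_start+l_len .. l_start-1 */
        r.first = l_start + l_len;
        r.last = l_start - 1;
    } else {                    /* l_start .. end of file, however large */
        r.first = l_start;
        r.to_eof = 1;
    }
    return r;
}
```
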
       The  l_type  field  can  be  used  to place a read (F_RDLCK) or a write
       (F_WRLCK) lock on a file.  Any number of processes may hold a read lock
       (shared  lock)  on a file region, but only one process may hold a write
       lock (exclusive lock).  An exclusive lock  excludes  all  other  locks,
       both  shared and exclusive.  A single process can hold only one type of
       lock on a file region; if a new lock is applied  to  an  already-locked
       region,  then  the  existing  lock  is  converted to the new lock type.
       (Such conversions may involve splitting, shrinking, or coalescing  with
       an  existing  lock if the byte range specified by the new lock does not
       precisely coincide with the range of the existing lock.)

       F_SETLK (struct flock *)
              Acquire a lock (when l_type is F_RDLCK or F_WRLCK) or release  a
              lock  (when  l_type  is  F_UNLCK)  on the bytes specified by the
              l_whence, l_start, and l_len fields of lock.  If  a  conflicting
              lock  is  held by another process, this call returns -1 and sets
              errno to EACCES or EAGAIN.  (The error  returned  in  this  case
              differs across implementations, so POSIX requires a portable ap‐
              plication to check for both errors.)

       F_SETLKW (struct flock *)
              As for F_SETLK, but if a conflicting lock is held on  the  file,
              then  wait  for that lock to be released.  If a signal is caught
              while waiting, then the call is interrupted and (after the  sig‐
              nal handler has returned) returns immediately (with return value
              -1 and errno set to EINTR; see signal(7)).

       F_GETLK (struct flock *)
              On input to this call, lock describes a lock we  would  like  to
              place  on  the  file.  If the lock could be placed, fcntl() does
              not actually place it, but returns F_UNLCK in the  l_type  field
              of lock and leaves the other fields of the structure unchanged.

              If  one or more incompatible locks would prevent this lock being
              placed, then fcntl() returns details about one of those locks in
              the l_type, l_whence, l_start, and l_len fields of lock.  If the
              conflicting lock is a  traditional  (process-associated)  record
              lock,  then  the  l_pid  field  is set to the PID of the process
              holding that lock.  If the conflicting lock is an open file  de‐
              scription lock, then l_pid is set to -1.  Note that the returned
              information may already be out of date by the  time  the  caller
              inspects it.

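       The following sketch shows F_SETLK and F_GETLK working together.  A
       parent write-locks bytes 0 through 9 of a scratch file; a child
       created by fork(2), which does not inherit the lock, then uses
       F_GETLK to ask who would block a write lock on byte 5.  The helper
       names try_lock(), lock_holder(), and record_lock_demo(), and the
       scratch path under /tmp, are assumptions made for the example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Place (or, with F_UNLCK, release) a lock on len bytes from start. */
int try_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl = { 0 };

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);     /* -1 with EACCES/EAGAIN on conflict */
}

/*
 * Returns 1 and stores the holder's PID if the described lock would be
 * blocked, 0 if it could be placed, or -1 on error.
 */
int lock_holder(int fd, short type, off_t start, off_t len, pid_t *holder)
{
    struct flock fl = { 0 };

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    if (fl.l_type == F_UNLCK)           /* nothing in the way */
        return 0;
    *holder = fl.l_pid;                 /* -1 for an OFD lock */
    return 1;
}

/* Returns 1 if the child sees the parent as holder of the lock. */
int record_lock_demo(void)
{
    char path[] = "/tmp/fcntl-demo-XXXXXX";    /* assumed scratch location */
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);
    if (try_lock(fd, F_WRLCK, 0, 10) == -1) {
        close(fd);
        return -1;
    }

    pid_t parent = getpid();
    pid_t child = fork();
    if (child == -1) {
        close(fd);
        return -1;
    }
    if (child == 0) {
        pid_t holder = 0;
        int blocked = lock_holder(fd, F_WRLCK, 5, 1, &holder);
        _exit(blocked == 1 && holder == parent ? 0 : 1);
    }

    int status = 0;
    waitpid(child, &status, 0);
    close(fd);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

       As noted above, the answer from F_GETLK is only a snapshot; code
       that needs the lock should attempt F_SETLK and handle the failure.
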
       In  order  to place a read lock, fd must be open for reading.  In order
       to place a write lock, fd must be open  for  writing.   To  place  both
       types of lock, open a file read-write.

       When placing locks with F_SETLKW, the kernel detects deadlocks, whereby
       two or more processes have their  lock  requests  mutually  blocked  by
       locks  held  by  the  other  processes.  For example, suppose process A
       holds a write lock on byte 100 of a file, and process B holds  a  write
       lock  on  byte 200.  If each process then attempts to lock the byte al‐
       ready locked by the other process using F_SETLKW, then,  without  dead‐
       lock detection, both processes would remain blocked indefinitely.  When
       the kernel detects such deadlocks, it causes one of the  blocking  lock
       requests  to  immediately  fail  with the error EDEADLK; an application
       that encounters such an error should release some of its locks to allow
       other applications to proceed before attempting to regain the locks
       that it requires.  Circular deadlocks involving more than two processes
       are also detected.  Note, however, that there are limitations to the
       kernel's deadlock-detection algorithm; see BUGS.

       As well as being removed by an explicit F_UNLCK, record locks are auto‐
       matically released when the process terminates.

       Record  locks are not inherited by a child created via fork(2), but are
       preserved across an execve(2).

       Because of the buffering performed by the stdio(3) library, the use  of
       record  locking  with  routines  in that package should be avoided; use
       read(2) and write(2) instead.

       The record locks described above are associated with the  process  (un‐
       like  the  open file description locks described below).  This has some
       unfortunate consequences:

       *  If a process closes any file descriptor referring to  a  file,  then
          all  of the process's locks on that file are released, regardless of
          the file descriptor(s) on which the locks were  obtained.   This  is
          bad:  it  means  that a process can lose its locks on a file such as
          /etc/passwd or /etc/mtab when for some reason a library function de‐
          cides to open, read, and close the same file.

       *  The  threads  in  a  process  share locks.  In other words, a multi‐
          threaded program can't use record locking  to  ensure  that  threads
          don't simultaneously access the same region of a file.

       Open file description locks solve both of these problems.

   Open file description locks (non-POSIX)
       Open  file description locks are advisory byte-range locks whose opera‐
       tion is in most respects identical to the traditional record locks  de‐
       scribed  above.   This lock type is Linux-specific, and available since
       Linux 3.15.  (There is a proposal with the Austin Group to include this
       lock type in the next revision of POSIX.1.)  For an explanation of open
       file descriptions, see open(2).

       The principal difference between the two lock  types  is  that  whereas
       traditional  record  locks are associated with a process, open file de‐
       scription locks are associated with the open file description on  which
       they  are  acquired,  much  like  locks acquired with flock(2).  Conse‐
       quently (and unlike traditional advisory record locks), open  file  de‐
       scription  locks  are  inherited  across  fork(2)  (and  clone(2)  with
       CLONE_FILES), and are only automatically released on the last close  of
       the  open  file  description, instead of being released on any close of
       the file.

       Conflicting lock combinations (i.e., a read lock and a  write  lock  or
       two  write  locks)  where one lock is an open file description lock and
       the other is a traditional record lock conflict even when they are  ac‐
       quired by the same process on the same file descriptor.

       Open  file  description locks placed via the same open file description
       (i.e., via the same file descriptor, or via a duplicate of the file de‐
       scriptor  created  by  fork(2), dup(2), fcntl() F_DUPFD, and so on) are
       always compatible: if a new lock is placed on an already locked region,
       then  the  existing lock is converted to the new lock type.  (Such con‐
       versions may result in splitting, shrinking, or coalescing with an  ex‐
       isting lock as discussed above.)

       On  the  other hand, open file description locks may conflict with each
       other when they are acquired  via  different  open  file  descriptions.
       Thus, the threads in a multithreaded program can use open file descrip‐
       tion locks to synchronize access to a file region by having each thread
       perform  its own open(2) on the file and applying locks via the result‐
       ing file descriptor.

       As with traditional advisory locks,  the  third  argument  to  fcntl(),
       lock, is a pointer to an flock structure.  By contrast with traditional
       record locks, the l_pid field of that structure must  be  set  to  zero
       when using the commands described below.

       The commands for working with open file description locks are analogous
       to those used with traditional locks:

       F_OFD_SETLK (struct flock *)
              Acquire an open file description lock (when l_type is F_RDLCK or
              F_WRLCK)  or  release an open file description lock (when l_type
              is F_UNLCK) on the bytes specified by the l_whence, l_start, and
              l_len  fields of lock.  If a conflicting lock is held by another
              process, this call returns -1 and sets errno to EAGAIN.

       F_OFD_SETLKW (struct flock *)
              As for F_OFD_SETLK, but if a conflicting lock  is  held  on  the
              file,  then  wait  for that lock to be released.  If a signal is
              caught while waiting, then the call is  interrupted  and  (after
              the  signal  handler has returned) returns immediately (with re‐
              turn value -1 and errno set to EINTR; see signal(7)).

       F_OFD_GETLK (struct flock *)
              On input to this call, lock describes an open  file  description
              lock  we  would like to place on the file.  If the lock could be
              placed, fcntl() does not actually place it, but returns  F_UNLCK
              in  the  l_type field of lock and leaves the other fields of the
              structure unchanged.  If one or more  incompatible  locks  would
              prevent  this lock being placed, then details about one of these
              locks are returned via lock, as described above for F_GETLK.

       In the current implementation, no deadlock detection is  performed  for
       open  file  description locks.  (This contrasts with process-associated
       record locks, for which the kernel does perform deadlock detection.)

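       The difference between one open file description and two can be
       sketched within a single process.  Below, the same scratch file is
       opened twice: a write lock taken through the first description
       blocks the second, while a dup(2) of the first descriptor shares
       the lock.  The names ofd_try_lock() and ofd_lock_demo() and the
       /tmp path are assumptions made for the example, and the fallback
       definitions are only intended for C libraries whose headers do not
       provide the F_OFD_* constants.

```c
#define _GNU_SOURCE             /* exposes F_OFD_* in glibc's <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef F_OFD_SETLK             /* fallback: values from the Linux UAPI headers */
#define F_OFD_GETLK  36
#define F_OFD_SETLK  37
#define F_OFD_SETLKW 38
#endif

/* Place or release an open file description lock without blocking. */
int ofd_try_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl = { 0 };    /* also zeroes l_pid, as required */

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_OFD_SETLK, &fl);
}

/*
 * Returns 1 if a lock held via fd1 blocks a second open file
 * description (EAGAIN) but not a duplicate of fd1.
 */
int ofd_lock_demo(void)
{
    char path[] = "/tmp/fcntl-ofd-XXXXXX";      /* assumed scratch location */
    int fd1 = mkstemp(path);

    if (fd1 == -1)
        return -1;

    int fd2 = open(path, O_RDWR);       /* second open file description */
    int fd1dup = dup(fd1);              /* same description as fd1 */
    int ok = 0;

    unlink(path);
    if (fd2 != -1 && fd1dup != -1 &&
        ofd_try_lock(fd1, F_WRLCK, 0, 10) == 0) {
        int second = ofd_try_lock(fd2, F_WRLCK, 0, 10);
        int second_errno = errno;
        int shared = ofd_try_lock(fd1dup, F_WRLCK, 0, 10);

        ok = (second == -1 && second_errno == EAGAIN && shared == 0);
    }

    if (fd2 != -1)
        close(fd2);
    if (fd1dup != -1)
        close(fd1dup);
    close(fd1);                 /* last close releases the lock */
    return ok;
}
```
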
   Mandatory locking
       Warning: the Linux implementation of mandatory locking  is  unreliable.
       See  BUGS  below.  Because of these bugs, and the fact that the feature
       is believed to be little used, since Linux 4.5, mandatory  locking  has
       been made an optional feature, governed by a configuration option
       (CONFIG_MANDATORY_FILE_LOCKING).  This is an initial step toward
       removing this feature completely.

       By  default,  both  traditional  (process-associated) and open file de‐
       scription record locks are advisory.  Advisory locks are  not  enforced
       and are useful only between cooperating processes.

       Both  lock  types  can also be mandatory.  Mandatory locks are enforced
       for all processes.  If a process tries to perform an  incompatible  ac‐
       cess (e.g., read(2) or write(2)) on a file region that has an incompat‐
       ible mandatory lock, then the result depends upon whether the
       O_NONBLOCK flag is enabled for its open file description.  If the
       O_NONBLOCK flag is not enabled, then the system call is blocked until
       the lock is removed or converted to a mode that is compatible with the
       access.  If the O_NONBLOCK flag is enabled, then the system call fails
       with the error EAGAIN.

       To  make use of mandatory locks, mandatory locking must be enabled both
       on the filesystem that contains the file to be locked, and on the  file
       itself.   Mandatory  locking  is  enabled on a filesystem using the "-o
       mand" option to mount(8), or the MS_MANDLOCK flag for mount(2).  Manda‐
       tory locking is enabled on a file by disabling group execute permission
       on the file and enabling the set-group-ID permission bit (see  chmod(1)
       and chmod(2)).

       Mandatory  locking  is not specified by POSIX.  Some other systems also
       support mandatory locking, although the details of  how  to  enable  it
       vary across systems.

   Lost locks
       When an advisory lock is obtained on a networked filesystem such as NFS
       it is possible that the lock might get lost.  This may  happen  due  to
       administrative  action  on  the  server,  or due to a network partition
       (i.e., loss of network connectivity with the server) which  lasts  long
       enough  for the server to assume that the client is no longer function‐
       ing.

       When the filesystem determines  that  a  lock  has  been  lost,  future
       read(2)  or  write(2) requests may fail with the error EIO.  This error
       will persist until the lock  is  removed  or  the  file  descriptor  is
       closed.   Since  Linux 3.12, this happens at least for NFSv4 (including
       all minor versions).

       Some versions of UNIX send a signal  (SIGLOST)  in  this  circumstance.
       Linux  does  not define this signal, and does not provide any asynchro‐
       nous notification of lost locks.

   Managing signals
       F_GETOWN, F_SETOWN, F_GETOWN_EX, F_SETOWN_EX,  F_GETSIG,  and  F_SETSIG
       are used to manage I/O availability signals:

       F_GETOWN (void)
              Return  (as the function result) the process ID or process group
              ID currently receiving SIGIO and SIGURG signals  for  events  on
              file  descriptor  fd.  Process IDs are returned as positive val‐
              ues; process group IDs are returned as negative values (but  see
              BUGS below).  arg is ignored.

       F_SETOWN (int)
              Set  the  process ID or process group ID that will receive SIGIO
              and SIGURG signals for events on the file  descriptor  fd.   The
              target  process  or  process  group  ID  is specified in arg.  A
              process ID is specified as a positive value; a process group  ID
              is  specified  as  a negative value.  Most commonly, the calling
              process specifies itself as the owner (that is, arg is specified
              as getpid(2)).

              As  well as setting the file descriptor owner, one must also en‐
              able generation of signals on the file descriptor.  This is done
              by  using  the  fcntl()  F_SETFL command to set the O_ASYNC file
              status flag on the file descriptor.  Subsequently, a SIGIO  sig‐
              nal  is  sent  whenever  input or output becomes possible on the
              file descriptor.  The fcntl() F_SETSIG command can  be  used  to
              obtain delivery of a signal other than SIGIO.

              Sending a signal to the owner process (group) specified by
              F_SETOWN is subject to the same permissions checks as are
              described for kill(2), where the sending process is the one that
              employs F_SETOWN (but see BUGS below).  If this permission check
              fails, then the signal is silently discarded.  Note: The F_SETOWN
              operation records the caller's credentials at the time of the
              fcntl() call, and it is these saved credentials that are used for
              the permission checks.

              If the file descriptor fd refers to a socket, F_SETOWN also  se‐
              lects  the  recipient  of SIGURG signals that are delivered when
              out-of-band data arrives on that socket.  (SIGURG is sent in any
              situation  where  select(2) would report the socket as having an
              "exceptional condition".)

              The following was true in 2.6.x kernels up to and including ker‐
              nel 2.6.11:

                     If a nonzero value is given to F_SETSIG in a multithreaded
                     process running with a threading library that supports
                     thread groups (e.g., NPTL), then a positive value given to
                     F_SETOWN has a different meaning: instead of being a
                     process ID identifying a whole process, it is a thread ID
                     identifying a specific thread within a process.
                     Consequently, it may be necessary to pass F_SETOWN the
                     result of gettid(2) instead of getpid(2) to get sensible
                     results when F_SETSIG is used.  (In current Linux
                     threading implementations, a main thread's thread ID is
                     the same as its process ID.  This means that a
                     single-threaded program can equally use gettid(2) or
                     getpid(2) in this scenario.)  Note, however, that the
                     statements in this paragraph do not apply to the SIGURG
                     signal generated for out-of-band data on a socket: this
                     signal is always sent to either a process or a process
                     group, depending on the value given to F_SETOWN.

              The above behavior was accidentally dropped in Linux 2.6.12, and
              won't be restored.  From Linux 2.6.32 onward, use F_SETOWN_EX to
              target SIGIO and SIGURG signals at a particular thread.

       F_GETOWN_EX (struct f_owner_ex *) (since Linux 2.6.32)
              Return  the current file descriptor owner settings as defined by
              a previous F_SETOWN_EX operation.  The information  is  returned
              in  the  structure  pointed  to  by arg, which has the following
              form:

                  struct f_owner_ex {
                      int   type;
                      pid_t pid;
                  };

              The  type  field  will  have  one  of  the  values  F_OWNER_TID,
              F_OWNER_PID, or F_OWNER_PGRP.  The pid field is a positive inte‐
              ger representing a thread ID, process ID, or process  group  ID.
              See F_SETOWN_EX for more details.

       F_SETOWN_EX (struct f_owner_ex *) (since Linux 2.6.32)
              This  operation  performs a similar task to F_SETOWN.  It allows
              the caller to direct I/O  availability  signals  to  a  specific
              thread,  process,  or  process  group.  The caller specifies the
              target of signals via arg, which is a pointer to an f_owner_ex
              structure.   The  type  field  has  one of the following values,
              which define how pid is interpreted:

              F_OWNER_TID
                     Send the signal to the thread whose thread ID (the  value
                     returned by a call to clone(2) or gettid(2)) is specified
                     in pid.

              F_OWNER_PID
                     Send the signal to the process whose ID is  specified  in
                     pid.

              F_OWNER_PGRP
                     Send  the  signal to the process group whose ID is speci‐
                     fied in pid.  (Note that, unlike with F_SETOWN, a process
                     group ID is specified as a positive value here.)

       F_GETSIG (void)
              Return  (as  the  function result) the signal sent when input or
              output becomes possible.  A value of zero means SIGIO  is  sent.
              Any  other  value  (including SIGIO) is the signal sent instead,
              and in this case additional info is available to the signal han‐
              dler if installed with SA_SIGINFO.  arg is ignored.

       F_SETSIG (int)
              Set the signal sent when input or output becomes possible to the
              value given in arg.  A value of zero means to send  the  default
              SIGIO  signal.   Any other value (including SIGIO) is the signal
              to send instead, and in this case additional info  is  available
              to the signal handler if installed with SA_SIGINFO.

              By  using  F_SETSIG with a nonzero value, and setting SA_SIGINFO
              for the signal handler  (see  sigaction(2)),  extra  information
              about  I/O events is passed to the handler in a siginfo_t struc‐
              ture.  If the si_code field indicates the  source  is  SI_SIGIO,
              the  si_fd  field  gives the file descriptor associated with the
              event.  Otherwise, there is no indication which file descriptors
              are pending, and you should use the usual mechanisms (select(2),
              poll(2), read(2) with O_NONBLOCK set, etc.) to determine which
              file descriptors are available for I/O.

              Note  that the file descriptor provided in si_fd is the one that
              was specified during the F_SETSIG operation.  This can  lead  to
              an  unusual  corner  case.  If the file descriptor is duplicated
              (dup(2) or similar), and the original file descriptor is closed,
              then  I/O  events  will  continue to be generated, but the si_fd
              field will contain the number of the now closed file descriptor.

              By selecting a real-time signal (value >= SIGRTMIN), multiple
              I/O  events may be queued using the same signal numbers.  (Queu‐
              ing is dependent on available  memory.)   Extra  information  is
              available if SA_SIGINFO is set for the signal handler, as above.

              Note  that Linux imposes a limit on the number of real-time sig‐
              nals that may be queued to a process (see getrlimit(2) and
              signal(7)) and if this limit is reached, then the kernel reverts
              to delivering SIGIO, and this signal is delivered to the entire
              process rather than to a specific thread.

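       A sketch of F_SETSIG with a real-time signal follows.  It installs
       an SA_SIGINFO handler for SIGRTMIN, routes I/O availability signals
       for the read end of a pipe to that signal, and records the
       descriptor number reported in si_fd.  The names on_io() and
       setsig_demo() are hypothetical, the handler does not examine
       si_code because the signal is used for nothing else here, and the
       fallback definition is only intended for C libraries whose headers
       do not provide F_SETSIG.

```c
#define _GNU_SOURCE             /* exposes F_SETSIG in glibc's <fcntl.h> */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#ifndef F_SETSIG                /* fallback: value from the Linux UAPI headers */
#define F_SETSIG 10
#endif

static volatile sig_atomic_t ready_fd = -1;

/* SA_SIGINFO handler: remember which descriptor the event refers to. */
static void on_io(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    ready_fd = si->si_fd;
}

/* Returns 1 if the handler reported the pipe's read end in si_fd. */
int setsig_demo(void)
{
    int p[2];
    int signo = SIGRTMIN;
    struct sigaction sa, old;
    struct timespec tick = { 0, 10 * 1000 * 1000 };     /* 10 ms */

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_io;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (pipe(p) == -1)
        return -1;
    if (sigaction(signo, &sa, &old) == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    ready_fd = -1;
    int flags = fcntl(p[0], F_GETFL);
    int ok = flags != -1 &&
             fcntl(p[0], F_SETOWN, getpid()) != -1 &&
             fcntl(p[0], F_SETSIG, signo) != -1 &&
             fcntl(p[0], F_SETFL, flags | O_ASYNC) != -1 &&
             write(p[1], "x", 1) == 1;

    /* Wait up to about one second for the handler to run. */
    for (int i = 0; ok && i < 100 && ready_fd == -1; i++)
        nanosleep(&tick, NULL);

    int result = ok && ready_fd == p[0];

    sigaction(signo, &old, NULL);       /* restore previous disposition */
    close(p[0]);
    close(p[1]);
    return result;
}
```
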
       Using  these mechanisms, a program can implement fully asynchronous I/O
       without using select(2) or poll(2) most of the time.

       The use of O_ASYNC is specific to BSD  and  Linux.   The  only  use  of
       F_GETOWN  and  F_SETOWN specified in POSIX.1 is in conjunction with the
       use of the SIGURG signal on sockets.  (POSIX does not specify the SIGIO
       signal.)   F_GETOWN_EX,  F_SETOWN_EX, F_GETSIG, and F_SETSIG are Linux-
       specific.  POSIX has asynchronous I/O and the aio_sigevent structure to
       achieve  similar  things;  these are also available in Linux as part of
       the GNU C Library (Glibc).

511   Leases
512       F_SETLEASE and F_GETLEASE (Linux 2.4 onward) are used  to  establish  a
513       new lease, and retrieve the current lease, on the open file description
514       referred to by the file descriptor fd.  A file lease provides a  mecha‐
515       nism  whereby the process holding the lease (the "lease holder") is no‐
516       tified (via delivery of a signal) when a process (the "lease  breaker")
517       tries  to  open(2) or truncate(2) the file referred to by that file de‐
518       scriptor.
519
520       F_SETLEASE (int)
521              Set or remove a file lease according to which of  the  following
522              values is specified in the integer arg:
523
524              F_RDLCK
525                     Take  out  a  read  lease.   This  will cause the calling
526                     process to be notified when the file is opened for  writ‐
527                     ing  or is truncated.  A read lease can be placed only on
528                     a file descriptor that is opened read-only.
529
530              F_WRLCK
531                     Take out a write lease.  This will cause the caller to be
532                     notified  when  the file is opened for reading or writing
533                     or is truncated.  A write lease may be placed on  a  file
534                     only  if there are no other open file descriptors for the
535                     file.
536
537              F_UNLCK
538                     Remove our lease from the file.
539
540       Leases are associated with an  open  file  description  (see  open(2)).
541       This  means  that  duplicate file descriptors (created by, for example,
542       fork(2) or dup(2)) refer to the same lease, and this lease may be modi‐
543       fied  or  released  using  any  of these descriptors.  Furthermore, the
544       lease is released by either an explicit F_UNLCK  operation  on  any  of
545       these  duplicate  file  descriptors,  or when all such file descriptors
546       have been closed.
547
548       Leases may be taken out only on regular files.  An unprivileged process
549       may  take  out  a  lease  only  on a file whose UID (owner) matches the
550       filesystem UID of the process.  A process with the CAP_LEASE capability
551       may take out leases on arbitrary files.
552
553       F_GETLEASE (void)
554              Indicates  what  type  of  lease is associated with the file de‐
555              scriptor fd by returning either F_RDLCK,  F_WRLCK,  or  F_UNLCK,
556              indicating,  respectively,  a  read lease, a write lease, or no
557              lease.  arg is ignored.
558
559       When a process (the "lease breaker") performs an open(2) or truncate(2)
560       that conflicts with a lease established via F_SETLEASE, the system call
561       is blocked by the kernel and the kernel notifies the  lease  holder  by
562       sending  it  a  signal (SIGIO by default).  The lease holder should re‐
563       spond to receipt of this signal by doing whatever cleanup  is  required
564       in  preparation  for  the file to be accessed by another process (e.g.,
565       flushing cached buffers) and then either remove or downgrade its lease.
566       A  lease  is removed by performing an F_SETLEASE command specifying arg
567       as F_UNLCK.  If the lease holder currently holds a write lease  on  the
568       file, and the lease breaker is opening the file for reading, then it is
569       sufficient for the lease holder to downgrade the lease to a read lease.
570       This  is  done  by  performing  an F_SETLEASE command specifying arg as
571       F_RDLCK.
572
573       If the lease holder fails to downgrade or remove the lease  within  the
574       number  of seconds specified in /proc/sys/fs/lease-break-time, then the
575       kernel forcibly removes or downgrades the lease holder's lease.
576
577       Once a lease break has been initiated, F_GETLEASE  returns  the  target
578       lease  type (either F_RDLCK or F_UNLCK, depending on what would be com‐
579       patible with the lease breaker)  until  the  lease  holder  voluntarily
580       downgrades  or  removes  the lease or the kernel forcibly does so after
581       the lease break timer expires.
582
583       Once the lease has been voluntarily or forcibly removed or  downgraded,
584       and  assuming  the lease breaker has not unblocked its system call, the
585       kernel permits the lease breaker's system call to proceed.
586
587       If the lease breaker's blocked open(2) or truncate(2) is interrupted by
588       a  signal handler, then the system call fails with the error EINTR, but
589       the other steps still occur as described above.  If the  lease  breaker
590       is killed by a signal while blocked in open(2) or truncate(2), then the
591       other steps still occur as described above.  If the lease breaker spec‐
592       ifies  the  O_NONBLOCK flag when calling open(2), then the call immedi‐
593       ately fails with the error EWOULDBLOCK, but the other steps still occur
594       as described above.
595
596       The  default  signal used to notify the lease holder is SIGIO, but this
597       can be changed using the F_SETSIG command to fcntl().  If an  F_SETSIG
598       command  is  performed (even one specifying SIGIO), and the signal han‐
599       dler is established using SA_SIGINFO, then the handler will  receive  a
600       siginfo_t structure as its second argument, and the si_fd field of this
601       argument will hold the file descriptor of the leased file that has been
602       accessed  by  another  process.   (This  is  useful if the caller holds
603       leases against multiple files.)
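
       The lease-holder side of this protocol can be sketched as  follows.
       This is a minimal illustration, not a complete program: the helper
       names take_read_lease(), release_lease(), and lease_handler(),  and
       the choice of SIGRTMIN as the notification signal, are assumptions
       made for the example.  The handler only records si_fd; the  actual
       cleanup is left to the main program.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* F_SETLEASE, F_GETLEASE, F_SETSIG */
#endif
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical bookkeeping: descriptor whose lease is being broken. */
static volatile sig_atomic_t broken_fd = -1;

static void lease_handler(int sig, siginfo_t *si, void *ctx)
{
    (void) sig;
    (void) ctx;
    broken_fd = si->si_fd;      /* available because F_SETSIG was used */
}

/* Take a read lease on fd, which must have been opened read-only.
   Returns 0 on success, or -1 with errno set. */
int take_read_lease(int fd)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = lease_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN, &sa, NULL) == -1)
        return -1;

    /* Replace the default SIGIO so that the handler receives si_fd. */
    if (fcntl(fd, F_SETSIG, SIGRTMIN) == -1)
        return -1;

    return fcntl(fd, F_SETLEASE, F_RDLCK);
}

/* To be called from the main program once broken_fd has been set and
   any cleanup is finished; lets the lease breaker proceed. */
int release_lease(int fd)
{
    return fcntl(fd, F_SETLEASE, F_UNLCK);
}
```

       A caller would typically poll broken_fd in its main loop and call
       release_lease() well within the lease-break-time limit.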
604
605   File and directory change notification (dnotify)
606       F_NOTIFY (int)
607              (Linux 2.4 onward) Provide notification when the  directory  re‐
608              ferred to by fd or any of the files that it contains is changed.
609              The events to be notified are specified in arg, which is  a  bit
610              mask  specified  by ORing together zero or more of the following
611              bits:
612
613              DN_ACCESS
614                     A file was accessed  (read(2),  pread(2),  readv(2),  and
615                     similar).
616              DN_MODIFY
617                     A  file  was  modified  (write(2),  pwrite(2), writev(2),
618                     truncate(2), ftruncate(2), and similar).
619              DN_CREATE
620                     A  file  was  created   (open(2),   creat(2),   mknod(2),
621                     mkdir(2), link(2), symlink(2), rename(2) into this direc‐
622                     tory).
623              DN_DELETE
624                     A file was unlinked (unlink(2), rename(2) to another  di‐
625                     rectory, rmdir(2)).
626              DN_RENAME
627                     A file was renamed within this directory (rename(2)).
628              DN_ATTRIB
629                     The   attributes   of  a  file  were  changed  (chown(2),
630                     chmod(2), utime(2), utimensat(2), and similar).
631
632              (In order to obtain these definitions, the  _GNU_SOURCE  feature
633              test macro must be defined before including any header files.)
634
635              Directory  notifications are normally "one-shot", and the appli‐
636              cation must reregister to receive further notifications.  Alter‐
637              natively,  if DN_MULTISHOT is included in arg, then notification
638              will remain in effect until explicitly removed.
639
640              A series of F_NOTIFY requests is cumulative, with the events  in
641              arg  being added to the set already monitored.  To disable noti‐
642              fication of all events, make an F_NOTIFY call specifying arg  as
643              0.
644
645              Notification  occurs via delivery of a signal.  The default sig‐
646              nal is SIGIO, but this can be changed using the F_SETSIG command
647              to  fcntl().  (Note that SIGIO is one of the nonqueuing standard
648              signals; switching to the use of a real-time signal  means  that
649              multiple  notifications  can  be queued to the process.)  In the
650              latter case, the signal handler receives a  siginfo_t  structure
651              as  its  second  argument  (if the handler was established using
652              SA_SIGINFO) and the si_fd field of this structure  contains  the
653              file  descriptor  which  generated the notification (useful when
654              establishing notification on multiple directories).
655
656              Especially when using DN_MULTISHOT, a real-time signal should be
657              used  for  notification,  so  that multiple notifications can be
658              queued.
659
660              NOTE: New applications should use the inotify interface  (avail‐
661              able since kernel 2.6.13), which provides a much superior inter‐
662              face for obtaining notifications of filesystem events.  See ino‐
663              tify(7).
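
              For existing code that still relies on dnotify, the registra‐
              tion steps above might look roughly like the  sketch  below.
              The  helper  names  watch_directory() and wait_for_change(),
              and the choice of SIGRTMIN, are  assumptions  made  for  the
              example.   The  signal  is  blocked and collected with sig‐
              timedwait(2) rather than handled asynchronously.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* F_NOTIFY, DN_*, F_SETSIG */
#endif
#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Register for create and delete events on the directory dirfd.
   Notifications arrive as the real-time signal SIGRTMIN, which is
   blocked here so that it can be fetched synchronously later.
   Returns 0 on success, or -1 with errno set. */
int watch_directory(int dirfd)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
        return -1;

    /* A real-time signal queues, and carries si_fd in its siginfo_t. */
    if (fcntl(dirfd, F_SETSIG, SIGRTMIN) == -1)
        return -1;

    return fcntl(dirfd, F_NOTIFY, DN_CREATE | DN_DELETE | DN_MULTISHOT);
}

/* Wait up to timeout_sec seconds for one notification.
   Returns the descriptor of the directory that changed (si_fd),
   or -1 on timeout or error. */
int wait_for_change(int timeout_sec)
{
    sigset_t set;
    siginfo_t si;
    struct timespec ts = { .tv_sec = timeout_sec, .tv_nsec = 0 };

    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigtimedwait(&set, &si, &ts) == -1)
        return -1;
    return si.si_fd;
}
```

              Closing dirfd ends the notification; an F_NOTIFY call with
              arg set to 0 does the same while keeping the descriptor open.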
664
665   Changing the capacity of a pipe
666       F_SETPIPE_SZ (int; since Linux 2.6.35)
667              Change the capacity of the pipe referred to by fd to be at least
668              arg bytes.  An unprivileged process can adjust the pipe capacity
669              to  any value between the system page size and the limit defined
670              in /proc/sys/fs/pipe-max-size (see proc(5)).   Attempts  to  set
671              the pipe capacity below the page size are silently rounded up to
672              the page size.  Attempts by an unprivileged process to  set  the
673              pipe  capacity  above  the  limit  in /proc/sys/fs/pipe-max-size
674              yield the error EPERM; a privileged  process  (CAP_SYS_RESOURCE)
675              can override the limit.
676
677              When  allocating  the  buffer for the pipe, the kernel may use a
678              capacity larger than arg, if that is convenient for  the  imple‐
679              mentation.   (In  the  current implementation, the allocation is
680              the next higher power-of-two page-size multiple of the requested
681              size.)   The  actual capacity (in bytes) that is set is returned
682              as the function result.
683
684              Attempting to set the pipe capacity smaller than the  amount  of
685              buffer  space  currently  used  to store data produces the error
686              EBUSY.
687
688              Note that because of the way the pages of the  pipe  buffer  are
689              employed  when  data is written to the pipe, the number of bytes
690              that can be written may be less than the nominal size, depending
691              on the size of the writes.
692
693       F_GETPIPE_SZ (void; since Linux 2.6.35)
694              Return  (as  the  function  result) the capacity of the pipe re‐
695              ferred to by fd.
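
       As an illustration of the two operations  above,  the  following
       sketch enlarges a pipe only when its current capacity is too small.
       The helper name grow_pipe() is hypothetical.  Because  the  kernel
       may round the request up, the caller should use the returned value
       rather than the value it asked for.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* F_SETPIPE_SZ, F_GETPIPE_SZ */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Ensure that the pipe referred to by pipefd can hold at least `want`
   bytes.  Returns the resulting capacity (possibly larger than `want`),
   or -1 with errno set (for example EPERM or EBUSY). */
int grow_pipe(int pipefd, int want)
{
    int cur = fcntl(pipefd, F_GETPIPE_SZ);

    if (cur == -1)
        return -1;
    if (cur >= want)
        return cur;             /* already large enough */

    /* On success the return value is the capacity actually set. */
    return fcntl(pipefd, F_SETPIPE_SZ, want);
}
```

       Either end of the pipe may be passed as pipefd.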
696
697   File Sealing
698       File seals limit the set of allowed operations on a  given  file.   For
699       each seal that is set on a file, a specific set of operations will fail
700       with EPERM on this file from now on.  The file is said  to  be  sealed.
701       The default set of seals depends on the type of the underlying file and
702       filesystem.  For an overview of file sealing, a discussion of its  pur‐
703       pose, and some code examples, see memfd_create(2).
704
705       Currently, file seals can be applied only to a file descriptor returned
706       by memfd_create(2) (if the MFD_ALLOW_SEALING flag was  employed).   On
707       other filesystems, all fcntl() operations that operate on  seals  will
708       return EINVAL.
709
710       Seals are a property of an inode.  Thus, all open file descriptors  re‐
711       ferring  to  the  same inode share the same set of seals.  Furthermore,
712       seals can never be removed, only added.
713
714       F_ADD_SEALS (int; since Linux 3.17)
715              Add the seals given in the bit-mask argument arg to the  set  of
716              seals of the inode referred to by the file descriptor fd.  Seals
717              cannot be removed again.  Once this call succeeds, the seals are
718              enforced by the kernel immediately.  If the current set of seals
719              includes F_SEAL_SEAL (see below), then this call will be rejected
720              with EPERM.  Adding a seal that is already set is a no-op, provided
721              that F_SEAL_SEAL is not already set.  In order to place a seal, the
722              file descriptor fd must be writable.
723
724       F_GET_SEALS (void; since Linux 3.17)
725              Return  (as the function result) the current set of seals of the
726              inode referred to by fd.  If no seals are set,  0  is  returned.
727              If  the  file does not support sealing, -1 is returned and errno
728              is set to EINVAL.
729
730       The following seals are available:
731
732       F_SEAL_SEAL
733              If  this  seal  is  set,  any  further  call  to  fcntl()   with
734              F_ADD_SEALS  fails  with  the error EPERM.  Therefore, this seal
735              prevents any modifications to the set of seals itself.   If  the
736              initial  set  of seals of a file includes F_SEAL_SEAL, then this
737              effectively causes the set of seals to be constant and locked.
738
739       F_SEAL_SHRINK
740              If this seal is set, the file in question cannot be  reduced  in
741              size.   This  affects  open(2)  with the O_TRUNC flag as well as
742              truncate(2) and ftruncate(2).  Those calls fail  with  EPERM  if
743              you  try  to  shrink  the file in question.  Increasing the file
744              size is still possible.
745
746       F_SEAL_GROW
747              If this seal is set, the size of the file in question cannot  be
748              increased.   This  affects  write(2) beyond the end of the file,
749              truncate(2), ftruncate(2), and fallocate(2).  These  calls  fail
750              with  EPERM  if  you use them to increase the file size.  If you
751              keep the size or shrink it, those calls still work as expected.
752
753       F_SEAL_WRITE
754              If this seal is set, you cannot modify the contents of the file.
755              Note  that  shrinking  or  growing the size of the file is still
756              possible and allowed.  Thus, this seal is normally used in  com‐
757              bination  with  one  of  the  other  seals.   This  seal affects
758              write(2) and fallocate(2) (only in  combination  with  the  FAL‐
759              LOC_FL_PUNCH_HOLE  flag).   Those  calls fail with EPERM if this
760              seal is set.  Furthermore, trying to create new shared, writable
761              memory-mappings via mmap(2) will also fail with EPERM.
762
763              Using  the  F_ADD_SEALS  operation  to set the F_SEAL_WRITE seal
764              fails with EBUSY if any writable, shared mapping  exists.   Such
765              mappings  must  be  unmapped before you can add this seal.  Fur‐
766              thermore, if there are any asynchronous I/O operations  (io_sub‐
767              mit(2)) pending on the file, all outstanding writes will be dis‐
768              carded.
769
770       F_SEAL_FUTURE_WRITE (since Linux 5.1)
771              The effect of this seal is similar to F_SEAL_WRITE, but the con‐
772              tents of the file can still be modified via shared writable map‐
773              pings that were created prior to the seal being  set.   Any  at‐
774              tempt  to  create a new writable mapping on the file via mmap(2)
775              will fail with EPERM.  Likewise, an attempt to write to the file
776              via write(2) will fail with EPERM.
777
778              Using  this seal, one process can create a memory buffer that it
779              can continue to modify while sharing that  buffer  on  a  "read-
780              only" basis with other processes.
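
       A minimal sketch of sealing in practice follows.  It fills a memfd
       with a message and then applies every seal except F_SEAL_FU‐
       TURE_WRITE, so that neither the contents, the size, nor the set of
       seals can change afterward.  The helper name make_sealed_buffer()
       and the memfd name "sealed-buf" are chosen only for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* memfd_create(), F_ADD_SEALS, F_SEAL_* */
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an anonymous file containing `msg` and seal it against
   shrinking, growing, writing, and further sealing.
   Returns the file descriptor, or -1 on error. */
int make_sealed_buffer(const char *msg)
{
    size_t len = strlen(msg);
    int seals = F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL;
    int fd;

    /* Sealing must be requested when the file is created. */
    fd = memfd_create("sealed-buf", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd == -1)
        return -1;

    /* Write the data first: once F_SEAL_WRITE is set, write(2) fails. */
    if (write(fd, msg, len) != (ssize_t) len ||
        fcntl(fd, F_ADD_SEALS, seals) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

       A recipient of the descriptor can call F_GET_SEALS and check that
       the seals it depends on are present before trusting the contents.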
781
782   File read/write hints
783       Write  lifetime  hints can be used to inform the kernel about the rela‐
784       tive expected lifetime of writes on a given inode or via  a  particular
785       open  file  description.   (See open(2) for an explanation of open file
786       descriptions.)  In this context, the term "write  lifetime"  means  the
787       expected  time the data will live on media, before being overwritten or
788       erased.
789
790       An application may use the different hint  values  specified  below  to
791       separate writes into different write classes, so that multiple users or
792       applications running on a single storage back-end can  aggregate  their
793       I/O  patterns in a consistent manner.  However, there are no functional
794       semantics implied by these flags, and different I/O classes can use the
795       write  lifetime  hints in arbitrary ways, so long as the hints are used
796       consistently.
797
798       The following operations can be applied to the file descriptor, fd:
799
800       F_GET_RW_HINT (uint64_t *; since Linux 4.13)
801              Returns the value of the read/write hint associated with the un‐
802              derlying inode referred to by fd.
803
804       F_SET_RW_HINT (uint64_t *; since Linux 4.13)
805              Sets  the  read/write  hint value associated with the underlying
806              inode referred to by fd.  This hint persists until either it  is
807              explicitly modified or the underlying filesystem is unmounted.
808
809       F_GET_FILE_RW_HINT (uint64_t *; since Linux 4.13)
810              Returns  the  value  of  the read/write hint associated with the
811              open file description referred to by fd.
812
813       F_SET_FILE_RW_HINT (uint64_t *; since Linux 4.13)
814              Sets the read/write hint value associated with the open file de‐
815              scription referred to by fd.
816
817       If  an  open  file description has not been assigned a read/write hint,
818       then it shall use the value assigned to the inode, if any.
819
820       The following read/write hints are valid since Linux 4.13:
821
822       RWH_WRITE_LIFE_NOT_SET
823              No specific hint has been set.  This is the default value.
824
825       RWH_WRITE_LIFE_NONE
826              No specific write lifetime is associated with this file  or  in‐
827              ode.
828
829       RWH_WRITE_LIFE_SHORT
830              Data  written to this inode or via this open file description is
831              expected to have a short lifetime.
832
833       RWH_WRITE_LIFE_MEDIUM
834              Data written to this inode or via this open file description  is
835              expected  to  have  a  lifetime  longer  than  data written with
836              RWH_WRITE_LIFE_SHORT.
837
838       RWH_WRITE_LIFE_LONG
839              Data written to this inode or via this open file description  is
840              expected  to  have  a  lifetime  longer  than  data written with
841              RWH_WRITE_LIFE_MEDIUM.
842
843       RWH_WRITE_LIFE_EXTREME
844              Data written to this inode or via this open file description  is
845              expected  to  have  a  lifetime  longer  than  data written with
846              RWH_WRITE_LIFE_LONG.
847
848       All the write-specific hints are relative to each other, and  no  indi‐
849       vidual absolute meaning should be attributed to them.
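
       The point most easily missed is that these operations take a point‐
       er to a uint64_t rather than the hint value itself.  The following
       sketch sets and reads back the inode hint; the helper name
       hint_short_lived() is hypothetical, and the call may fail on ker‐
       nels or filesystems without support for write lifetime hints.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* F_SET_RW_HINT, RWH_WRITE_LIFE_* */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Mark data written to the file at `path` as short-lived, then read
   the inode hint back into *out.
   Returns 0 on success, or -1 with errno set. */
int hint_short_lived(const char *path, uint64_t *out)
{
    uint64_t hint = RWH_WRITE_LIFE_SHORT;
    int fd, rc, saved;

    fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;

    /* The third argument is the address of the hint, not its value. */
    rc = fcntl(fd, F_SET_RW_HINT, &hint);
    if (rc == 0)
        rc = fcntl(fd, F_GET_RW_HINT, out);

    saved = errno;              /* close() must not clobber errno */
    close(fd);
    errno = saved;
    return rc;
}
```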
850

RETURN VALUE

852       For a successful call, the return value depends on the operation:
853
854       F_DUPFD
855              The new file descriptor.
856
857       F_GETFD
858              Value of file descriptor flags.
859
860       F_GETFL
861              Value of file status flags.
862
863       F_GETLEASE
864              Type of lease held on file descriptor.
865
866       F_GETOWN
867              Value of file descriptor owner.
868
869       F_GETSIG
870              Value  of  signal  sent  when read or write becomes possible, or
871              zero for traditional SIGIO behavior.
872
873       F_GETPIPE_SZ, F_SETPIPE_SZ
874              The pipe capacity.
875
876       F_GET_SEALS
877              A bit mask identifying the seals that have been set for the  in‐
878              ode referred to by fd.
879
880       All other commands
881              Zero.
882
883       On error, -1 is returned, and errno is set appropriately.
884

ERRORS

886       EACCES or EAGAIN
887              Operation is prohibited by locks held by other processes.
888
889       EAGAIN The  operation  is  prohibited because the file has been memory-
890              mapped by another process.
891
892       EBADF  fd is not an open file descriptor.
893
894       EBADF  cmd is F_SETLK or F_SETLKW and the  file  descriptor  open  mode
895              doesn't match with the type of lock requested.
896
897       EBUSY  cmd  is  F_SETPIPE_SZ and the new pipe capacity specified in arg
898              is smaller than the amount of buffer  space  currently  used  to
899              store data in the pipe.
900
901       EBUSY  cmd  is F_ADD_SEALS, arg includes F_SEAL_WRITE, and there exists
902              a writable, shared mapping on the file referred to by fd.
903
904       EDEADLK
905              It was detected that the specified F_SETLKW command would  cause
906              a deadlock.
907
908       EFAULT lock is outside your accessible address space.
909
910       EINTR  cmd  is  F_SETLKW  or  F_OFD_SETLKW and the operation was inter‐
911              rupted by a signal; see signal(7).
912
913       EINTR  cmd is F_GETLK, F_SETLK, F_OFD_GETLK, or  F_OFD_SETLK,  and  the
914              operation  was  interrupted  by  a  signal  before  the lock was
915              checked or acquired.  Most likely when  locking  a  remote  file
916              (e.g., locking over NFS), but can sometimes happen locally.
917
918       EINVAL The value specified in cmd is not recognized by this kernel.
919
920       EINVAL cmd is F_ADD_SEALS and arg includes an unrecognized sealing bit.
921
922       EINVAL cmd  is F_ADD_SEALS or F_GET_SEALS and the filesystem containing
923              the inode referred to by fd does not support sealing.
924
925       EINVAL cmd is F_DUPFD and arg is negative or is greater than the  maxi‐
926              mum  allowable  value  (see  the  discussion of RLIMIT_NOFILE in
927              getrlimit(2)).
928
929       EINVAL cmd is F_SETSIG and arg is not an allowable signal number.
930
931       EINVAL cmd is F_OFD_SETLK, F_OFD_SETLKW, or F_OFD_GETLK, and l_pid  was
932              not specified as zero.
933
934       EMFILE cmd  is  F_DUPFD and the per-process limit on the number of open
935              file descriptors has been reached.
936
937       ENOLCK Too many segment locks open, lock table is  full,  or  a  remote
938              locking protocol failed (e.g., locking over NFS).
939
940       ENOTDIR
941              F_NOTIFY was specified in cmd, but fd does not refer to a direc‐
942              tory.
943
944       EPERM  cmd is F_SETPIPE_SZ and the soft or hard  user  pipe  limit  has
945              been reached; see pipe(7).
946
947       EPERM  Attempted  to clear the O_APPEND flag on a file that has the ap‐
948              pend-only attribute set.
949
950       EPERM  cmd was F_ADD_SEALS, but fd was not open for writing or the cur‐
951              rent set of seals on the file already includes F_SEAL_SEAL.
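
       As an example of acting on these errors, the sketch below places a
       blocking write lock and retries when the wait is interrupted by  a
       signal handler (EINTR), while passing every other error,  such  as
       EDEADLK or EBADF, back to the caller.  The helper name
       lock_whole_file() is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Place a blocking write lock on the whole file referred to by fd.
   Returns 0 on success, or -1 with errno set by fcntl(). */
int lock_whole_file(int fd)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,             /* 0 means "through end of file" */
    };

    while (fcntl(fd, F_SETLKW, &fl) == -1) {
        if (errno != EINTR)
            return -1;          /* e.g., EDEADLK, ENOLCK, EBADF */
        /* Interrupted by a signal handler: wait for the lock again. */
    }
    return 0;
}
```

       Retrying unconditionally on EINTR is a design choice; a program
       that uses signals to time out a lock request would return instead.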
952

CONFORMING TO

954       SVr4,  4.3BSD,  POSIX.1-2001.   Only  the  operations F_DUPFD, F_GETFD,
955       F_SETFD, F_GETFL, F_SETFL, F_GETLK, F_SETLK, and F_SETLKW are specified
956       in POSIX.1-2001.
957
958       F_GETOWN  and  F_SETOWN  are  specified in POSIX.1-2001.  (To get their
959       definitions, define either _XOPEN_SOURCE with the value 500 or greater,
960       or _POSIX_C_SOURCE with the value 200809L or greater.)
961
962       F_DUPFD_CLOEXEC is specified in POSIX.1-2008.  (To get this definition,
963       define  _POSIX_C_SOURCE  with  the  value  200809L   or   greater,   or
964       _XOPEN_SOURCE with the value 700 or greater.)
965
966       F_GETOWN_EX,  F_SETOWN_EX, F_SETPIPE_SZ, F_GETPIPE_SZ, F_GETSIG, F_SET‐
967       SIG, F_NOTIFY, F_GETLEASE, and F_SETLEASE are Linux-specific.   (Define
968       the _GNU_SOURCE macro to obtain these definitions.)
969
970       F_OFD_SETLK,  F_OFD_SETLKW, and F_OFD_GETLK are Linux-specific (and one
971       must define _GNU_SOURCE to obtain their definitions), but work is being
972       done to have them included in the next version of POSIX.1.
973
974       F_ADD_SEALS and F_GET_SEALS are Linux-specific.
975

NOTES

977       The  errors  returned  by  dup2(2) are different from those returned by
978       F_DUPFD.
979
980   File locking
981       The original Linux fcntl() system call was not designed to handle large
982       file offsets (in the flock structure).  Consequently, an fcntl64() sys‐
983       tem call was added in Linux 2.4.  The newer system call employs a  dif‐
984       ferent structure for file locking, flock64, and corresponding commands,
985       F_GETLK64, F_SETLK64, and F_SETLKW64.  However, these  details  can  be
986       ignored  by  applications  using  glibc, whose fcntl() wrapper function
987       transparently employs the more recent system call where  it  is  avail‐
988       able.
989
990   Record locks
991       Since  kernel  2.0,  there  is no interaction between the types of lock
992       placed by flock(2) and fcntl().
993
994       Several systems have more fields in struct flock such as, for  example,
995       l_sysid  (to  identify  the  machine where the lock is held).  Clearly,
996       l_pid alone is not going to be very useful if the process  holding  the
997       lock  may  live on a different machine; on Linux, while present on some
998       architectures (such as MIPS32), this field is not used.
1008
1009   Record locking and NFS
1010       Before Linux 3.12, if an NFSv4 client loses contact with the server for
1011       a period of time (defined as more than 90 seconds  with  no  communica‐
1012       tion),  it might lose and regain a lock without ever being aware of the
1013       fact.  (The period of time after which contact is assumed lost is known
1014       as  the NFSv4 leasetime.  On a Linux NFS server, this can be determined
1015       by looking at /proc/fs/nfsd/nfsv4leasetime, which expresses the  period
1016       in seconds.  The default value for this file is 90.)  This scenario po‐
1017       tentially risks data corruption, since another process might acquire  a
1018       lock in the intervening period and perform file I/O.
1019
1020       Since Linux 3.12, if an NFSv4 client loses contact with the server, any
1021       I/O to the file by a process which "thinks" it holds a lock  will  fail
1022       until  that  process  closes and reopens the file.  A kernel parameter,
1023       nfs.recover_lost_locks, can be set to 1 to obtain the  pre-3.12  behav‐
1024       ior, whereby the client will attempt to recover lost locks when contact
1025       is reestablished with the server.  Because of  the  attendant  risk  of
1026       data corruption, this parameter defaults to 0 (disabled).
1027

BUGS

1029   F_SETFL
1030       It  is  not  possible to use F_SETFL to change the state of the O_DSYNC
1031       and O_SYNC flags.  Attempts to change the  state  of  these  flags  are
1032       silently ignored.
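
       Because the request is ignored without an error, a  program  that
       depends on synchronized writes can only detect the situation by
       reading the flags back.  A minimal sketch, using the hypothetical
       helper name try_enable_sync():

```c
#include <fcntl.h>

/* Try to turn on O_SYNC for an already open file descriptor.
   Returns 1 if the flag is now set, 0 if the request was silently
   ignored, or -1 on error. */
int try_enable_sync(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_SYNC) == -1)
        return -1;

    /* F_SETFL reports success either way, so read the flags back. */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (flags & O_SYNC) == O_SYNC;
}
```

       When the result is 0, the usual alternatives are to specify O_SYNC
       in the original open(2) call or to call fsync(2) after writing.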
1033
1034   F_GETOWN
1035       A limitation of the Linux system call conventions on some architectures
1036       (notably i386) means that if a (negative) process group ID  to  be  re‐
1037       turned  by  F_GETOWN  falls  in  the range -1 to -4095, then the return
1038       value is wrongly interpreted by glibc as an error in the  system  call;
1039       that is, the return value of fcntl() will be -1, and errno will contain
1040       the (positive) process group ID.  The Linux-specific F_GETOWN_EX opera‐
1041       tion  avoids  this  problem.  Since glibc version 2.11, glibc makes the
1042       kernel  F_GETOWN  problem  invisible  by  implementing  F_GETOWN  using
1043       F_GETOWN_EX.
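
       Code that must also run with older glibc versions can call
       F_GETOWN_EX directly, as in the following sketch.  The helper name
       get_owner() is hypothetical.  The ID is returned in a structure
       and is always positive, so it cannot be mistaken for an error.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* F_GETOWN_EX, struct f_owner_ex */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Fetch the owner of fd without the F_GETOWN return-value ambiguity.
   On success, returns 0 and stores the ID type (F_OWNER_TID,
   F_OWNER_PID, or F_OWNER_PGRP) in *type and the ID in *id.
   Returns -1 with errno set on error. */
int get_owner(int fd, int *type, pid_t *id)
{
    struct f_owner_ex owner;

    if (fcntl(fd, F_GETOWN_EX, &owner) == -1)
        return -1;
    *type = owner.type;
    *id = owner.pid;            /* positive even for a process group */
    return 0;
}
```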
1044
1045   F_SETOWN
1046       In Linux 2.4 and earlier, there is a bug that can occur when an
1047       unprivileged process uses F_SETOWN to specify the owner of a socket
1048       file descriptor as a process (group) other than the caller.  In this
1049       case, fcntl() can return -1 with errno set to EPERM, even when the
1050       owner process (group) is one that the caller has permission to send
1051       signals to.  Despite this error return, the file descriptor owner is
1052       set, and signals will be sent to the owner.
1053
1054   Deadlock detection
1055       The  deadlock-detection  algorithm  employed by the kernel when dealing
1056       with F_SETLKW requests can yield both false negatives (failures to  de‐
1057       tect  deadlocks,  leaving a set of deadlocked processes blocked indefi‐
1058       nitely) and false positives (EDEADLK errors when there is no deadlock).
1059       For  example, the kernel limits the lock depth of its dependency search
1060       to 10 steps, meaning that circular deadlock  chains  that  exceed  that
1061       size  will  not be detected.  In addition, the kernel may falsely indi‐
1062       cate a deadlock when two or more processes created using  the  clone(2)
1063       CLONE_FILES flag place locks that appear (to the kernel) to conflict.
1064
1065   Mandatory locking
1066       The Linux implementation of mandatory locking is subject to race condi‐
1067       tions which render it unreliable: a write(2) call that overlaps with  a
1068       lock  may  modify  data after the mandatory lock is acquired; a read(2)
1069       call that overlaps with a lock may detect changes  to  data  that  were
1070       made only after a write lock was acquired.  Similar races exist between
1071       mandatory locks and mmap(2).  It is therefore inadvisable  to  rely  on
1072       mandatory locking.
1073

COLOPHON

1084       This  page  is  part of release 5.10 of the Linux man-pages project.  A
1085       description of the project, information about reporting bugs,  and  the
1086       latest     version     of     this    page,    can    be    found    at
1087       https://www.kernel.org/doc/man-pages/.
1088
1089
1090
1091Linux                             2020-12-21                          FCNTL(2)