fcntl(2)                      System Calls Manual                     fcntl(2)

NAME

       fcntl - manipulate file descriptor

LIBRARY

       Standard C library (libc, -lc)

SYNOPSIS

       #include <fcntl.h>

       int fcntl(int fd, int cmd, ... /* arg */ );

DESCRIPTION

       fcntl() performs one of the operations described below on the open file
       descriptor fd.  The operation is determined by cmd.

       fcntl() can take an optional third argument.  Whether or not this
       argument is required is determined by cmd.  The required argument type
       is indicated in parentheses after each cmd name (in most cases, the
       required type is int, and we identify the argument using the name arg),
       or void is specified if the argument is not required.

       Certain of the operations below are supported only since a particular
       Linux kernel version.  The preferred method of checking whether the
       host kernel supports a particular operation is to invoke fcntl() with
       the desired cmd value and then test whether the call failed with
       EINVAL, indicating that the kernel does not recognize this value.
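       The probing technique described above might be sketched as follows.
       The helper name kernel_has_ofd_locks() is hypothetical, and
       F_OFD_GETLK is used only as an example of a command that older
       kernels do not recognize.  Because EINVAL can also result from an
       invalid argument, the sketch passes a well-formed flock structure.

```c
#define _GNU_SOURCE             /* exposes F_OFD_GETLK in <fcntl.h> */
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: returns 1 if the kernel recognizes F_OFD_GETLK,
   0 if the call fails with EINVAL, and -1 on any other error. */
int kernel_has_ofd_locks(int fd)
{
    struct flock fl = {0};      /* l_pid must be 0 for OFD commands */

    fl.l_type = F_RDLCK;
    fl.l_whence = SEEK_SET;     /* l_start = 0, l_len = 0: whole file */

    if (fcntl(fd, F_OFD_GETLK, &fl) == 0)
        return 1;
    return (errno == EINVAL) ? 0 : -1;
}
```

       F_OFD_GETLK is a convenient probe because it only tests for a lock
       and does not change any state on the file.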

   Duplicating a file descriptor
       F_DUPFD (int)
              Duplicate the file descriptor fd using the lowest-numbered
              available file descriptor greater than or equal to arg.  This is
              different from dup2(2), which uses exactly the file descriptor
              specified.

              On success, the new file descriptor is returned.

              See dup(2) for further details.

       F_DUPFD_CLOEXEC (int; since Linux 2.6.24)
              As for F_DUPFD, but additionally set the close-on-exec flag for
              the duplicate file descriptor.  Specifying this flag permits a
              program to avoid an additional fcntl() F_SETFD operation to set
              the FD_CLOEXEC flag.  For an explanation of why this flag is
              useful, see the description of O_CLOEXEC in open(2).
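       As an illustration, a wrapper along the following lines duplicates a
       descriptor at or above a chosen number with close-on-exec already
       set.  The name dup_cloexec_from() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* F_DUPFD_CLOEXEC is in POSIX.1-2008 */
#include <fcntl.h>

/* Hypothetical helper: duplicate fd onto the lowest free descriptor
   that is >= min_fd, with FD_CLOEXEC set on the duplicate.
   Returns the new descriptor, or -1 with errno set. */
int dup_cloexec_from(int fd, int min_fd)
{
    return fcntl(fd, F_DUPFD_CLOEXEC, min_fd);
}
```

       The original descriptor keeps whatever close-on-exec setting it
       already had; only the duplicate is affected.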

   File descriptor flags
       The following commands manipulate the flags associated with a file
       descriptor.  Currently, only one such flag is defined: FD_CLOEXEC, the
       close-on-exec flag.  If the FD_CLOEXEC bit is set, the file descriptor
       will automatically be closed during a successful execve(2).  (If the
       execve(2) fails, the file descriptor is left open.)  If the FD_CLOEXEC
       bit is not set, the file descriptor will remain open across an
       execve(2).

       F_GETFD (void)
              Return (as the function result) the file descriptor flags; arg
              is ignored.

       F_SETFD (int)
              Set the file descriptor flags to the value specified by arg.

       In multithreaded programs, using fcntl() F_SETFD to set the
       close-on-exec flag at the same time as another thread performs a
       fork(2) plus execve(2) is vulnerable to a race condition that may
       unintentionally leak the file descriptor to the program executed in
       the child process.  See the discussion of the O_CLOEXEC flag in open(2)
       for details and a remedy to the problem.
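       Because F_SETFD replaces the whole flag set, a common pattern is to
       read the flags with F_GETFD first and then write back a modified
       copy.  A minimal sketch, using the hypothetical helper name
       set_cloexec(), and subject to the race described above:

```c
#include <fcntl.h>

/* Hypothetical helper: turn the close-on-exec flag on or off while
   preserving any other file descriptor flags.
   Returns 0 on success, -1 with errno set on failure. */
int set_cloexec(int fd, int on)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    if (on)
        flags |= FD_CLOEXEC;
    else
        flags &= ~FD_CLOEXEC;
    return fcntl(fd, F_SETFD, flags);
}
```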

   File status flags
       Each open file description has certain associated status flags,
       initialized by open(2) and possibly modified by fcntl().  Duplicated
       file descriptors (made with dup(2), fcntl(F_DUPFD), fork(2), etc.)
       refer to the same open file description, and thus share the same file
       status flags.

       The file status flags and their semantics are described in open(2).

       F_GETFL (void)
              Return (as the function result) the file access mode and the
              file status flags; arg is ignored.

       F_SETFL (int)
              Set the file status flags to the value specified by arg.  File
              access mode (O_RDONLY, O_WRONLY, O_RDWR) and file creation flags
              (i.e., O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC) in arg are ignored.
              On Linux, this command can change only the O_APPEND, O_ASYNC,
              O_DIRECT, O_NOATIME, and O_NONBLOCK flags.  It is not possible
              to change the O_DSYNC and O_SYNC flags; see BUGS, below.
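       The two commands are typically combined as a read-modify-write
       sequence.  The following sketch uses hypothetical helper names
       (set_status_flag() and access_mode()) and extracts the access mode
       with the O_ACCMODE mask:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/* Hypothetical helper: enable or disable one file status flag
   (for example O_NONBLOCK or O_APPEND), leaving the others alone.
   Returns 0 on success, -1 with errno set on failure. */
int set_status_flag(int fd, int flag, int on)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    flags = on ? (flags | flag) : (flags & ~flag);
    return fcntl(fd, F_SETFL, flags);
}

/* Hypothetical helper: return O_RDONLY, O_WRONLY, or O_RDWR for fd,
   or -1 on error.  The access mode is not a bit mask, so it is
   compared after masking with O_ACCMODE. */
int access_mode(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    return (flags == -1) ? -1 : (flags & O_ACCMODE);
}
```

       Since the flags belong to the open file description, a change made
       through one descriptor is visible through all of its duplicates.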

   Advisory record locking
       Linux implements traditional ("process-associated") UNIX record locks,
       as standardized by POSIX.  For a Linux-specific alternative with better
       semantics, see the discussion of open file description locks below.

       F_SETLK, F_SETLKW, and F_GETLK are used to acquire, release, and test
       for the existence of record locks (also known as byte-range,
       file-segment, or file-region locks).  The third argument, lock, is a
       pointer to a structure that has at least the following fields (in
       unspecified order).

           struct flock {
               ...
               short l_type;    /* Type of lock: F_RDLCK,
                                   F_WRLCK, F_UNLCK */
               short l_whence;  /* How to interpret l_start:
                                   SEEK_SET, SEEK_CUR, SEEK_END */
               off_t l_start;   /* Starting offset for lock */
               off_t l_len;     /* Number of bytes to lock */
               pid_t l_pid;     /* PID of process blocking our lock
                                   (set by F_GETLK and F_OFD_GETLK) */
               ...
           };

       The l_whence, l_start, and l_len fields of this structure specify the
       range of bytes we wish to lock.  Bytes past the end of the file may be
       locked, but not bytes before the start of the file.

       l_start is the starting offset for the lock, and is interpreted
       relative to either: the start of the file (if l_whence is SEEK_SET);
       the current file offset (if l_whence is SEEK_CUR); or the end of the
       file (if l_whence is SEEK_END).  In the final two cases, l_start can be
       a negative number provided the offset does not lie before the start of
       the file.

       l_len specifies the number of bytes to be locked.  If l_len is
       positive, then the range to be locked covers bytes l_start up to and
       including l_start+l_len-1.  Specifying 0 for l_len has the special
       meaning: lock all bytes starting at the location specified by l_whence
       and l_start through to the end of file, no matter how large the file
       grows.

       POSIX.1-2001 allows (but does not require) an implementation to support
       a negative l_len value; if l_len is negative, the interval described by
       lock covers bytes l_start+l_len up to and including l_start-1.  This is
       supported since Linux 2.4.21 and Linux 2.5.49.
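       The three cases for l_len can be worked through with a small
       function.  This is only an illustration of the arithmetic above: the
       struct byte_range type and the lock_range() name are hypothetical,
       and start is assumed to be the absolute offset obtained after
       l_whence has been applied to l_start.

```c
#include <sys/types.h>

/* Hypothetical description of the bytes covered by a lock request. */
struct byte_range {
    off_t first;    /* first byte covered */
    off_t last;     /* last byte covered; not meaningful if to_eof */
    int   to_eof;   /* nonzero: range runs to end of file, however
                       large the file grows */
};

/* Compute the covered range from an absolute start offset and l_len. */
struct byte_range lock_range(off_t start, off_t len)
{
    struct byte_range r = { start, start, 0 };

    if (len > 0) {
        r.last = start + len - 1;       /* start .. start+len-1 */
    } else if (len < 0) {
        r.first = start + len;          /* start+len .. start-1 */
        r.last = start - 1;
    } else {
        r.to_eof = 1;                   /* start .. end of file */
    }
    return r;
}
```

       For example, a start of 100 with l_len 10 covers bytes 100 to 109,
       while a start of 100 with l_len -10 covers bytes 90 to 99.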

       The l_type field can be used to place a read (F_RDLCK) or a write
       (F_WRLCK) lock on a file.  Any number of processes may hold a read lock
       (shared lock) on a file region, but only one process may hold a write
       lock (exclusive lock).  An exclusive lock excludes all other locks,
       both shared and exclusive.  A single process can hold only one type of
       lock on a file region; if a new lock is applied to an already-locked
       region, then the existing lock is converted to the new lock type.
       (Such conversions may involve splitting, shrinking, or coalescing with
       an existing lock if the byte range specified by the new lock does not
       precisely coincide with the range of the existing lock.)

       F_SETLK (struct flock *)
              Acquire a lock (when l_type is F_RDLCK or F_WRLCK) or release a
              lock (when l_type is F_UNLCK) on the bytes specified by the
              l_whence, l_start, and l_len fields of lock.  If a conflicting
              lock is held by another process, this call returns -1 and sets
              errno to EACCES or EAGAIN.  (The error returned in this case
              differs across implementations, so POSIX requires a portable
              application to check for both errors.)

       F_SETLKW (struct flock *)
              As for F_SETLK, but if a conflicting lock is held on the file,
              then wait for that lock to be released.  If a signal is caught
              while waiting, then the call is interrupted and (after the
              signal handler has returned) returns immediately (with return
              value -1 and errno set to EINTR; see signal(7)).

       F_GETLK (struct flock *)
              On input to this call, lock describes a lock we would like to
              place on the file.  If the lock could be placed, fcntl() does
              not actually place it, but returns F_UNLCK in the l_type field
              of lock and leaves the other fields of the structure unchanged.

              If one or more incompatible locks would prevent this lock being
              placed, then fcntl() returns details about one of those locks in
              the l_type, l_whence, l_start, and l_len fields of lock.  If the
              conflicting lock is a traditional (process-associated) record
              lock, then the l_pid field is set to the PID of the process
              holding that lock.  If the conflicting lock is an open file
              description lock, then l_pid is set to -1.  Note that the
              returned information may already be out of date by the time the
              caller inspects it.
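       A nonblocking lock attempt and a lock query might be wrapped as
       shown below.  The helper names try_record_lock() and
       probe_record_lock() are hypothetical; the sketch checks for both
       EACCES and EAGAIN, as recommended above, and always uses SEEK_SET.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: apply F_RDLCK, F_WRLCK, or F_UNLCK to len bytes
   starting at absolute offset start, without blocking.
   Returns 0 on success, 1 if another process holds a conflicting
   lock, and -1 on any other error. */
int try_record_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    if (fcntl(fd, F_SETLK, &fl) == 0)
        return 0;
    return (errno == EACCES || errno == EAGAIN) ? 1 : -1;
}

/* Hypothetical helper: test whether a lock of the given type could be
   placed.  Returns 0 if it could, 1 if a conflicting lock exists (in
   which case *holder receives l_pid), and -1 on error. */
int probe_record_lock(int fd, short type, off_t start, off_t len,
                      pid_t *holder)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    if (fl.l_type == F_UNLCK)
        return 0;
    *holder = fl.l_pid;
    return 1;
}
```

       A process's own record locks never conflict with its own requests,
       so probe_record_lock() reports only locks held by other processes.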

       In order to place a read lock, fd must be open for reading.  In order
       to place a write lock, fd must be open for writing.  To place both
       types of lock, open a file read-write.

       When placing locks with F_SETLKW, the kernel detects deadlocks, whereby
       two or more processes have their lock requests mutually blocked by
       locks held by the other processes.  For example, suppose process A
       holds a write lock on byte 100 of a file, and process B holds a write
       lock on byte 200.  If each process then attempts to lock the byte
       already locked by the other process using F_SETLKW, then, without
       deadlock detection, both processes would remain blocked indefinitely.
       When the kernel detects such deadlocks, it causes one of the blocking
       lock requests to immediately fail with the error EDEADLK; an
       application that encounters such an error should release some of its
       locks to allow other applications to proceed before attempting to
       regain the locks that it requires.  Circular deadlocks involving more
       than two processes are also detected.  Note, however, that there are
       limitations to the kernel's deadlock-detection algorithm; see BUGS.

       As well as being removed by an explicit F_UNLCK, record locks are
       automatically released when the process terminates.

       Record locks are not inherited by a child created via fork(2), but are
       preserved across an execve(2).

       Because of the buffering performed by the stdio(3) library, the use of
       record locking with routines in that package should be avoided; use
       read(2) and write(2) instead.

       The record locks described above are associated with the process
       (unlike the open file description locks described below).  This has
       some unfortunate consequences:

       •  If a process closes any file descriptor referring to a file, then
          all of the process's locks on that file are released, regardless of
          the file descriptor(s) on which the locks were obtained.  This is
          bad: it means that a process can lose its locks on a file such as
          /etc/passwd or /etc/mtab when for some reason a library function
          decides to open, read, and close the same file.

       •  The threads in a process share locks.  In other words, a
          multithreaded program can't use record locking to ensure that
          threads don't simultaneously access the same region of a file.

       Open file description locks solve both of these problems.

   Open file description locks (non-POSIX)
       Open file description locks are advisory byte-range locks whose
       operation is in most respects identical to the traditional record locks
       described above.  This lock type is Linux-specific, and available since
       Linux 3.15.  (There is a proposal with the Austin Group to include this
       lock type in the next revision of POSIX.1.)  For an explanation of open
       file descriptions, see open(2).

       The principal difference between the two lock types is that whereas
       traditional record locks are associated with a process, open file
       description locks are associated with the open file description on
       which they are acquired, much like locks acquired with flock(2).
       Consequently (and unlike traditional advisory record locks), open file
       description locks are inherited across fork(2) (and clone(2) with
       CLONE_FILES), and are only automatically released on the last close of
       the open file description, instead of being released on any close of
       the file.

       Conflicting lock combinations (i.e., a read lock and a write lock or
       two write locks) where one lock is an open file description lock and
       the other is a traditional record lock conflict even when they are
       acquired by the same process on the same file descriptor.

       Open file description locks placed via the same open file description
       (i.e., via the same file descriptor, or via a duplicate of the file
       descriptor created by fork(2), dup(2), fcntl() F_DUPFD, and so on) are
       always compatible: if a new lock is placed on an already locked region,
       then the existing lock is converted to the new lock type.  (Such
       conversions may result in splitting, shrinking, or coalescing with an
       existing lock as discussed above.)

       On the other hand, open file description locks may conflict with each
       other when they are acquired via different open file descriptions.
       Thus, the threads in a multithreaded program can use open file
       description locks to synchronize access to a file region by having each
       thread perform its own open(2) on the file and applying locks via the
       resulting file descriptor.

       As with traditional advisory locks, the third argument to fcntl(),
       lock, is a pointer to an flock structure.  By contrast with traditional
       record locks, the l_pid field of that structure must be set to zero
       when using the commands described below.

       The commands for working with open file description locks are analogous
       to those used with traditional locks:

       F_OFD_SETLK (struct flock *)
              Acquire an open file description lock (when l_type is F_RDLCK or
              F_WRLCK) or release an open file description lock (when l_type
              is F_UNLCK) on the bytes specified by the l_whence, l_start, and
              l_len fields of lock.  If a conflicting lock is held by another
              process, this call returns -1 and sets errno to EAGAIN.

       F_OFD_SETLKW (struct flock *)
              As for F_OFD_SETLK, but if a conflicting lock is held on the
              file, then wait for that lock to be released.  If a signal is
              caught while waiting, then the call is interrupted and (after
              the signal handler has returned) returns immediately (with
              return value -1 and errno set to EINTR; see signal(7)).

       F_OFD_GETLK (struct flock *)
              On input to this call, lock describes an open file description
              lock we would like to place on the file.  If the lock could be
              placed, fcntl() does not actually place it, but returns F_UNLCK
              in the l_type field of lock and leaves the other fields of the
              structure unchanged.  If one or more incompatible locks would
              prevent this lock being placed, then details about one of these
              locks are returned via lock, as described above for F_GETLK.

       In the current implementation, no deadlock detection is performed for
       open file description locks.  (This contrasts with process-associated
       record locks, for which the kernel does perform deadlock detection.)
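       The following sketch shows a nonblocking open file description lock
       request.  The helper name ofd_try_lock() is hypothetical.  The flock
       structure is zero-initialized so that l_pid is 0, as required for
       these commands.

```c
#define _GNU_SOURCE             /* exposes F_OFD_SETLK in <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: apply F_RDLCK, F_WRLCK, or F_UNLCK as an open
   file description lock on len bytes at absolute offset start.
   Returns 0 on success, 1 if a conflicting lock is held via another
   open file description, and -1 on any other error. */
int ofd_try_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};      /* l_pid must be zero */

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    if (fcntl(fd, F_OFD_SETLK, &fl) == 0)
        return 0;
    return (errno == EAGAIN) ? 1 : -1;
}
```

       If one process calls open(2) twice on the same file, a write lock
       placed via the first descriptor causes a write lock request via the
       second to fail with EAGAIN, which is the property that makes these
       locks usable between threads.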

   Mandatory locking
       Warning: the Linux implementation of mandatory locking is unreliable.
       See BUGS below.  Because of these bugs, and the fact that the feature
       is believed to be little used, since Linux 4.5, mandatory locking has
       been made an optional feature, governed by a configuration option
       (CONFIG_MANDATORY_FILE_LOCKING).  This feature is no longer supported
       at all in Linux 5.15 and above.

       By default, both traditional (process-associated) and open file
       description record locks are advisory.  Advisory locks are not enforced
       and are useful only between cooperating processes.

       Both lock types can also be mandatory.  Mandatory locks are enforced
       for all processes.  If a process tries to perform an incompatible
       access (e.g., read(2) or write(2)) on a file region that has an
       incompatible mandatory lock, then the result depends upon whether the
       O_NONBLOCK flag is enabled for its open file description.  If the
       O_NONBLOCK flag is not enabled, then the system call is blocked until
       the lock is removed or converted to a mode that is compatible with the
       access.  If the O_NONBLOCK flag is enabled, then the system call fails
       with the error EAGAIN.

       To make use of mandatory locks, mandatory locking must be enabled both
       on the filesystem that contains the file to be locked, and on the file
       itself.  Mandatory locking is enabled on a filesystem using the "-o
       mand" option to mount(8), or the MS_MANDLOCK flag for mount(2).
       Mandatory locking is enabled on a file by disabling group execute
       permission on the file and enabling the set-group-ID permission bit
       (see chmod(1) and chmod(2)).

       Mandatory locking is not specified by POSIX.  Some other systems also
       support mandatory locking, although the details of how to enable it
       vary across systems.

   Lost locks
       When an advisory lock is obtained on a networked filesystem such as
       NFS, it is possible that the lock might get lost.  This may happen due
       to administrative action on the server, or due to a network partition
       (i.e., loss of network connectivity with the server) which lasts long
       enough for the server to assume that the client is no longer
       functioning.

       When the filesystem determines that a lock has been lost, future
       read(2) or write(2) requests may fail with the error EIO.  This error
       will persist until the lock is removed or the file descriptor is
       closed.  Since Linux 3.12, this happens at least for NFSv4 (including
       all minor versions).

       Some versions of UNIX send a signal (SIGLOST) in this circumstance.
       Linux does not define this signal, and does not provide any
       asynchronous notification of lost locks.

   Managing signals
       F_GETOWN, F_SETOWN, F_GETOWN_EX, F_SETOWN_EX, F_GETSIG, and F_SETSIG
       are used to manage I/O availability signals:

       F_GETOWN (void)
              Return (as the function result) the process ID or process group
              ID currently receiving SIGIO and SIGURG signals for events on
              file descriptor fd.  Process IDs are returned as positive
              values; process group IDs are returned as negative values (but
              see BUGS below).  arg is ignored.

       F_SETOWN (int)
              Set the process ID or process group ID that will receive SIGIO
              and SIGURG signals for events on the file descriptor fd.  The
              target process or process group ID is specified in arg.  A
              process ID is specified as a positive value; a process group ID
              is specified as a negative value.  Most commonly, the calling
              process specifies itself as the owner (that is, arg is specified
              as getpid(2)).

              As well as setting the file descriptor owner, one must also
              enable generation of signals on the file descriptor.  This is
              done by using the fcntl() F_SETFL command to set the O_ASYNC
              file status flag on the file descriptor.  Subsequently, a SIGIO
              signal is sent whenever input or output becomes possible on the
              file descriptor.  The fcntl() F_SETSIG command can be used to
              obtain delivery of a signal other than SIGIO.

              Sending a signal to the owner process (group) specified by
              F_SETOWN is subject to the same permissions checks as are
              described for kill(2), where the sending process is the one that
              employs F_SETOWN (but see BUGS below).  If this permission check
              fails, then the signal is silently discarded.  Note: The
              F_SETOWN operation records the caller's credentials at the time
              of the fcntl() call, and it is these saved credentials that are
              used for the permission checks.

              If the file descriptor fd refers to a socket, F_SETOWN also
              selects the recipient of SIGURG signals that are delivered when
              out-of-band data arrives on that socket.  (SIGURG is sent in any
              situation where select(2) would report the socket as having an
              "exceptional condition".)

              The following was true in Linux 2.6.x up to and including Linux
              2.6.11:

                     If a nonzero value is given to F_SETSIG in a
                     multithreaded process running with a threading library
                     that supports thread groups (e.g., NPTL), then a positive
                     value given to F_SETOWN has a different meaning: instead
                     of being a process ID identifying a whole process, it is
                     a thread ID identifying a specific thread within a
                     process.  Consequently, it may be necessary to pass
                     F_SETOWN the result of gettid(2) instead of getpid(2) to
                     get sensible results when F_SETSIG is used.  (In current
                     Linux threading implementations, a main thread's thread
                     ID is the same as its process ID.  This means that a
                     single-threaded program can equally use gettid(2) or
                     getpid(2) in this scenario.)  Note, however, that the
                     statements in this paragraph do not apply to the SIGURG
                     signal generated for out-of-band data on a socket: this
                     signal is always sent to either a process or a process
                     group, depending on the value given to F_SETOWN.

              The above behavior was accidentally dropped in Linux 2.6.12, and
              won't be restored.  From Linux 2.6.32 onward, use F_SETOWN_EX to
              target SIGIO and SIGURG signals at a particular thread.

       F_GETOWN_EX (struct f_owner_ex *) (since Linux 2.6.32)
              Return the current file descriptor owner settings as defined by
              a previous F_SETOWN_EX operation.  The information is returned
              in the structure pointed to by arg, which has the following
              form:

                  struct f_owner_ex {
                      int   type;
                      pid_t pid;
                  };

              The type field will have one of the values F_OWNER_TID,
              F_OWNER_PID, or F_OWNER_PGRP.  The pid field is a positive
              integer representing a thread ID, process ID, or process group
              ID.  See F_SETOWN_EX for more details.

       F_SETOWN_EX (struct f_owner_ex *) (since Linux 2.6.32)
              This operation performs a similar task to F_SETOWN.  It allows
              the caller to direct I/O availability signals to a specific
              thread, process, or process group.  The caller specifies the
              target of signals via arg, which is a pointer to an f_owner_ex
              structure.  The type field has one of the following values,
              which define how pid is interpreted:

              F_OWNER_TID
                     Send the signal to the thread whose thread ID (the value
                     returned by a call to clone(2) or gettid(2)) is specified
                     in pid.

              F_OWNER_PID
                     Send the signal to the process whose ID is specified in
                     pid.

              F_OWNER_PGRP
                     Send the signal to the process group whose ID is
                     specified in pid.  (Note that, unlike with F_SETOWN, a
                     process group ID is specified as a positive value here.)
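       Setting and reading back the owner with these two commands might
       look like the following sketch.  The helper names set_fd_owner() and
       get_fd_owner() are hypothetical.

```c
#define _GNU_SOURCE             /* exposes struct f_owner_ex */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: direct I/O availability signals for fd to the
   thread, process, or process group identified by id.  type is one
   of F_OWNER_TID, F_OWNER_PID, or F_OWNER_PGRP; id is always given
   as a positive value.  Returns 0 on success, -1 on failure. */
int set_fd_owner(int fd, int type, pid_t id)
{
    struct f_owner_ex owner = { .type = type, .pid = id };

    return fcntl(fd, F_SETOWN_EX, &owner);
}

/* Hypothetical helper: fetch the current owner settings into *owner.
   Returns 0 on success, -1 on failure. */
int get_fd_owner(int fd, struct f_owner_ex *owner)
{
    return fcntl(fd, F_GETOWN_EX, owner);
}
```

       For instance, set_fd_owner(fd, F_OWNER_PID, getpid()) makes the
       calling process the owner; a thread would pass F_OWNER_TID together
       with its own thread ID instead.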

       F_GETSIG (void)
              Return (as the function result) the signal sent when input or
              output becomes possible.  A value of zero means SIGIO is sent.
              Any other value (including SIGIO) is the signal sent instead,
              and in this case additional info is available to the signal
              handler if installed with SA_SIGINFO.  arg is ignored.

       F_SETSIG (int)
              Set the signal sent when input or output becomes possible to the
              value given in arg.  A value of zero means to send the default
              SIGIO signal.  Any other value (including SIGIO) is the signal
              to send instead, and in this case additional info is available
              to the signal handler if installed with SA_SIGINFO.

              By using F_SETSIG with a nonzero value, and setting SA_SIGINFO
              for the signal handler (see sigaction(2)), extra information
              about I/O events is passed to the handler in a siginfo_t
              structure.  If the si_code field indicates the source is
              SI_SIGIO, the si_fd field gives the file descriptor associated
              with the event.  Otherwise, there is no indication which file
              descriptors are pending, and you should use the usual mechanisms
              (select(2), poll(2), read(2) with O_NONBLOCK set, etc.) to
              determine which file descriptors are available for I/O.

              Note that the file descriptor provided in si_fd is the one that
              was specified during the F_SETSIG operation.  This can lead to
              an unusual corner case.  If the file descriptor is duplicated
              (dup(2) or similar), and the original file descriptor is closed,
              then I/O events will continue to be generated, but the si_fd
              field will contain the number of the now closed file descriptor.

              By selecting a real-time signal (value >= SIGRTMIN), multiple
              I/O events may be queued using the same signal numbers.
              (Queuing is dependent on available memory.)  Extra information
              is available if SA_SIGINFO is set for the signal handler, as
              above.

              Note that Linux imposes a limit on the number of real-time
              signals that may be queued to a process (see getrlimit(2) and
              signal(7)) and if this limit is reached, then the kernel reverts
              to delivering SIGIO, and this signal is delivered to the entire
              process rather than to a specific thread.
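       The steps needed to receive a signal carrying si_fd can be sketched
       as follows.  The names last_ready_fd, on_io_signal(), and watch_fd()
       are hypothetical.  As a simplifying assumption, the handler is
       installed only for the signal passed to F_SETSIG and records si_fd
       without inspecting si_code.

```c
#define _GNU_SOURCE             /* exposes F_SETSIG and O_ASYNC */
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Descriptor reported by the most recent I/O signal, or -1 if none. */
static volatile sig_atomic_t last_ready_fd = -1;

/* SA_SIGINFO handler: remember which descriptor became ready. */
static void on_io_signal(int sig, siginfo_t *si, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    last_ready_fd = si->si_fd;
}

/* Hypothetical helper: arrange for signo (for example SIGRTMIN) to be
   sent to this process when I/O becomes possible on fd.
   Returns 0 on success, -1 on failure. */
int watch_fd(int fd, int signo)
{
    struct sigaction sa = {0};
    int flags;

    sa.sa_sigaction = on_io_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    if (fcntl(fd, F_SETOWN, getpid()) == -1)    /* who is signaled */
        return -1;
    if (fcntl(fd, F_SETSIG, signo) == -1)       /* which signal */
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC); /* enable signals */
}
```

       The handler only stores a value; the actual read(2) or write(2) is
       best done outside the handler, after checking last_ready_fd.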

       Using these mechanisms, a program can implement fully asynchronous I/O
       without using select(2) or poll(2) most of the time.

       The use of O_ASYNC is specific to BSD and Linux.  The only use of
       F_GETOWN and F_SETOWN specified in POSIX.1 is in conjunction with the
       use of the SIGURG signal on sockets.  (POSIX does not specify the SIGIO
       signal.)  F_GETOWN_EX, F_SETOWN_EX, F_GETSIG, and F_SETSIG are
       Linux-specific.  POSIX has asynchronous I/O and the aio_sigevent
       structure to achieve similar things; these are also available in Linux
       as part of the GNU C Library (glibc).

   Leases
       F_SETLEASE and F_GETLEASE (Linux 2.4 onward) are used to establish a
       new lease, and retrieve the current lease, on the open file description
       referred to by the file descriptor fd.  A file lease provides a
       mechanism whereby the process holding the lease (the "lease holder") is
       notified (via delivery of a signal) when a process (the "lease
       breaker") tries to open(2) or truncate(2) the file referred to by that
       file descriptor.

       F_SETLEASE (int)
              Set or remove a file lease according to which of the following
              values is specified in the integer arg:

              F_RDLCK
                     Take out a read lease.  This will cause the calling
                     process to be notified when the file is opened for
                     writing or is truncated.  A read lease can be placed only
                     on a file descriptor that is opened read-only.

              F_WRLCK
                     Take out a write lease.  This will cause the caller to be
                     notified when the file is opened for reading or writing
                     or is truncated.  A write lease may be placed on a file
                     only if there are no other open file descriptors for the
                     file.

              F_UNLCK
                     Remove our lease from the file.

       Leases are associated with an open file description (see open(2)).
       This means that duplicate file descriptors (created by, for example,
       fork(2) or dup(2)) refer to the same lease, and this lease may be
       modified or released using any of these descriptors.  Furthermore, the
       lease is released by either an explicit F_UNLCK operation on any of
       these duplicate file descriptors, or when all such file descriptors
       have been closed.
549
550       Leases may be taken out only on regular files.  An unprivileged process
551       may  take  out  a  lease  only  on a file whose UID (owner) matches the
552       filesystem UID of the process.  A process with the CAP_LEASE capability
553       may take out leases on arbitrary files.
554
555       F_GETLEASE (void)
556              Indicates  what  type  of  lease is associated with the file de‐
557              scriptor fd by returning either F_RDLCK,  F_WRLCK,  or  F_UNLCK,
558              indicating, respectively, a read lease, a write lease, or no
559              lease.  arg is ignored.
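
       The following is a minimal sketch of taking out and querying a read
       lease as described above.  The helper names open_with_read_lease() and
       holds_lease() are hypothetical and are not part of any library; the
       sketch assumes a Linux system where the caller owns the file (or has
       CAP_LEASE) and the filesystem supports leases.

```c
#define _GNU_SOURCE             /* exposes F_SETLEASE and F_GETLEASE */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open 'path' read-only and take out a read lease.
   Returns the file descriptor, or -1 with errno set. */
int
open_with_read_lease(const char *path)
{
    int fd = open(path, O_RDONLY);      /* read leases need a read-only fd */
    if (fd == -1)
        return -1;

    if (fcntl(fd, F_SETLEASE, F_RDLCK) == -1) {
        int saved = errno;              /* keep the fcntl() error */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: 1 if F_GETLEASE reports a read or write lease,
   0 if it reports F_UNLCK, -1 on error. */
int
holds_lease(int fd)
{
    int type = fcntl(fd, F_GETLEASE);
    if (type == -1)
        return -1;
    return type != F_UNLCK;
}
```

       A caller would typically release the lease again with
       fcntl(fd, F_SETLEASE, F_UNLCK) or simply close the descriptor.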
560
561       When a process (the "lease breaker") performs an open(2) or truncate(2)
562       that conflicts with a lease established via F_SETLEASE, the system call
563       is blocked by the kernel and the kernel notifies the  lease  holder  by
564       sending  it  a  signal (SIGIO by default).  The lease holder should re‐
565       spond to receipt of this signal by doing whatever cleanup  is  required
566       in  preparation  for  the file to be accessed by another process (e.g.,
567       flushing cached buffers) and then either remove or downgrade its lease.
568       A  lease  is removed by performing an F_SETLEASE command specifying arg
569       as F_UNLCK.  If the lease holder currently holds a write lease  on  the
570       file, and the lease breaker is opening the file for reading, then it is
571       sufficient for the lease holder to downgrade the lease to a read lease.
572       This  is  done  by  performing  an F_SETLEASE command specifying arg as
573       F_RDLCK.
574
575       If the lease holder fails to downgrade or remove the lease  within  the
576       number  of seconds specified in /proc/sys/fs/lease-break-time, then the
577       kernel forcibly removes or downgrades the lease holder's lease.
578
579       Once a lease break has been initiated, F_GETLEASE  returns  the  target
580       lease  type (either F_RDLCK or F_UNLCK, depending on what would be com‐
581       patible with the lease breaker)  until  the  lease  holder  voluntarily
582       downgrades  or  removes  the lease or the kernel forcibly does so after
583       the lease break timer expires.
584
585       Once the lease has been voluntarily or forcibly removed or  downgraded,
586       and  assuming  the lease breaker has not unblocked its system call, the
587       kernel permits the lease breaker's system call to proceed.
588
589       If the lease breaker's blocked open(2) or truncate(2) is interrupted by
590       a  signal handler, then the system call fails with the error EINTR, but
591       the other steps still occur as described above.  If the  lease  breaker
592       is killed by a signal while blocked in open(2) or truncate(2), then the
593       other steps still occur as described above.  If the lease breaker spec‐
594       ifies  the  O_NONBLOCK flag when calling open(2), then the call immedi‐
595       ately fails with the error EWOULDBLOCK, but the other steps still occur
596       as described above.
597
598       The  default  signal used to notify the lease holder is SIGIO, but this
599       can be changed using the F_SETSIG command to fcntl().  If an F_SETSIG
600       command  is  performed (even one specifying SIGIO), and the signal han‐
601       dler is established using SA_SIGINFO, then the handler will  receive  a
602       siginfo_t structure as its second argument, and the si_fd field of this
603       argument will hold the file descriptor of the leased file that has been
604       accessed  by  another  process.   (This  is  useful if the caller holds
605       leases against multiple files.)
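
       The notification scheme above might be wired up roughly as follows.
       This is only a sketch: the names broken_fd, on_lease_break(),
       install_lease_break_handler(), and release_broken_lease() are
       hypothetical, the handler merely records si_fd, and only the most
       recent lease break is tracked.  A program holding many leases would
       need a queue instead.

```c
#define _GNU_SOURCE             /* exposes F_SETSIG and F_SETLEASE */
#include <fcntl.h>
#include <signal.h>
#include <string.h>

/* Descriptor of the most recently broken lease, or -1 if none is pending. */
static volatile sig_atomic_t broken_fd = -1;

/* SA_SIGINFO handler: only record which leased descriptor was hit. */
static void
on_lease_break(int sig, siginfo_t *si, void *ucontext)
{
    (void) sig;
    (void) ucontext;
    broken_fd = si->si_fd;
}

/* Hypothetical helper: deliver lease-break notifications for 'fd' as
   signal 'sig' with siginfo.  Returns 0 on success, -1 on error. */
int
install_lease_break_handler(int fd, int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_lease_break;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) == -1)
        return -1;

    /* Without F_SETSIG, si_fd is not filled in for the handler. */
    return fcntl(fd, F_SETSIG, sig);
}

/* Hypothetical helper, called from the main loop (not from the handler):
   remove the lease that was reported broken.  Returns 1 if a lease was
   removed, 0 if nothing was pending, -1 on error. */
int
release_broken_lease(void)
{
    int fd = broken_fd;

    if (fd == -1)
        return 0;
    broken_fd = -1;
    /* Flush any cached state for 'fd' here before giving up the lease. */
    return fcntl(fd, F_SETLEASE, F_UNLCK) == -1 ? -1 : 1;
}
```

       Doing the cleanup in the main loop rather than in the handler keeps
       the handler async-signal-safe; the lease must still be released
       within the lease-break-time window.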
606
607   File and directory change notification (dnotify)
608       F_NOTIFY (int)
609              (Linux 2.4 onward) Provide notification when the  directory  re‐
610              ferred to by fd or any of the files that it contains is changed.
611              The events to be notified are specified in arg, which is  a  bit
612              mask  specified  by ORing together zero or more of the following
613              bits:
614
615              DN_ACCESS
616                     A file was accessed  (read(2),  pread(2),  readv(2),  and
617                     similar).
618              DN_MODIFY
619                     A  file  was  modified  (write(2),  pwrite(2), writev(2),
620                     truncate(2), ftruncate(2), and similar).
621              DN_CREATE
622                     A  file  was  created   (open(2),   creat(2),   mknod(2),
623                     mkdir(2), link(2), symlink(2), rename(2) into this direc‐
624                     tory).
625              DN_DELETE
626                     A file was unlinked (unlink(2), rename(2) to another  di‐
627                     rectory, rmdir(2)).
628              DN_RENAME
629                     A file was renamed within this directory (rename(2)).
630              DN_ATTRIB
631                     The   attributes   of  a  file  were  changed  (chown(2),
632                     chmod(2), utime(2), utimensat(2), and similar).
633
634              (In order to obtain these definitions, the  _GNU_SOURCE  feature
635              test macro must be defined before including any header files.)
636
637              Directory  notifications are normally "one-shot", and the appli‐
638              cation must reregister to receive further notifications.  Alter‐
639              natively,  if DN_MULTISHOT is included in arg, then notification
640              will remain in effect until explicitly removed.
641
642              A series of F_NOTIFY requests is cumulative, with the events  in
643              arg  being added to the set already monitored.  To disable noti‐
644              fication of all events, make an F_NOTIFY call specifying arg  as
645              0.
646
647              Notification  occurs via delivery of a signal.  The default sig‐
648              nal is SIGIO, but this can be changed using the F_SETSIG command
649              to  fcntl().  (Note that SIGIO is one of the nonqueuing standard
650              signals; switching to the use of a real-time signal  means  that
651              multiple  notifications  can  be queued to the process.)  In the
652              latter case, the signal handler receives a  siginfo_t  structure
653              as  its  second  argument  (if the handler was established using
654              SA_SIGINFO) and the si_fd field of this structure  contains  the
655              file  descriptor  which  generated the notification (useful when
656              establishing notification on multiple directories).
657
658              Especially when using DN_MULTISHOT, a real-time signal should be
659              used  for  notification,  so  that multiple notifications can be
660              queued.
661
662              NOTE: New applications should use the inotify interface  (avail‐
663              able  since Linux 2.6.13), which provides a much superior inter‐
664              face for obtaining notifications of filesystem events.  See ino‐
665              tify(7).
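
       For completeness, registering a dnotify watch could look like the
       sketch below.  The helper name watch_directory() and the choice of
       events are assumptions made for illustration; the caller is expected
       to have installed a handler for the chosen signal first, since the
       default action of a real-time signal terminates the process.

```c
#define _GNU_SOURCE             /* exposes F_NOTIFY, F_SETSIG, and DN_* */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>             /* callers pass, for example, SIGRTMIN */
#include <unistd.h>

/* Hypothetical helper: watch 'path' for created, deleted, and renamed
   entries, delivering signal 'sig' on each event.  Returns the directory
   file descriptor (which must stay open), or -1 with errno set. */
int
watch_directory(const char *path, int sig)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return -1;

    /* DN_MULTISHOT keeps the watch armed after the first event. */
    if (fcntl(fd, F_SETSIG, sig) == -1 ||
        fcntl(fd, F_NOTIFY,
              DN_CREATE | DN_DELETE | DN_RENAME | DN_MULTISHOT) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

       The watch is removed with fcntl(fd, F_NOTIFY, 0) or by closing fd.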
666
667   Changing the capacity of a pipe
668       F_SETPIPE_SZ (int; since Linux 2.6.35)
669              Change the capacity of the pipe referred to by fd to be at least
670              arg bytes.  An unprivileged process can adjust the pipe capacity
671              to  any value between the system page size and the limit defined
672              in /proc/sys/fs/pipe-max-size (see proc(5)).   Attempts  to  set
673              the pipe capacity below the page size are silently rounded up to
674              the page size.  Attempts by an unprivileged process to  set  the
675              pipe  capacity  above  the  limit  in /proc/sys/fs/pipe-max-size
676              yield the error EPERM; a privileged  process  (CAP_SYS_RESOURCE)
677              can override the limit.
678
679              When  allocating  the  buffer for the pipe, the kernel may use a
680              capacity larger than arg, if that is convenient for  the  imple‐
681              mentation.   (In  the  current implementation, the allocation is
682              the next higher power-of-two page-size multiple of the requested
683              size.)   The  actual capacity (in bytes) that is set is returned
684              as the function result.
685
686              Attempting to set the pipe capacity smaller than the  amount  of
687              buffer  space  currently  used  to store data produces the error
688              EBUSY.
689
690              Note that because of the way the pages of the  pipe  buffer  are
691              employed  when  data is written to the pipe, the number of bytes
692              that can be written may be less than the nominal size, depending
693              on the size of the writes.
694
695       F_GETPIPE_SZ (void; since Linux 2.6.35)
696              Return  (as  the  function  result) the capacity of the pipe re‐
697              ferred to by fd.
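
       A short sketch of how the two operations fit together is shown
       below.  The helper names grow_pipe() and pipe_with_capacity() are
       hypothetical.  Because the kernel may round the request up, the
       value returned by F_SETPIPE_SZ, not the requested size, is treated
       as the resulting capacity.

```c
#define _GNU_SOURCE             /* exposes F_SETPIPE_SZ and F_GETPIPE_SZ */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make sure the pipe behind 'fd' can hold at least
   'want' bytes.  Returns the resulting capacity, or -1 with errno set. */
int
grow_pipe(int fd, int want)
{
    int cur = fcntl(fd, F_GETPIPE_SZ);
    if (cur == -1)
        return -1;
    if (cur >= want)
        return cur;                     /* already large enough */
    return fcntl(fd, F_SETPIPE_SZ, want);
}

/* Hypothetical helper: create a pipe with at least 'want' bytes of
   capacity.  Returns the capacity, or -1 with errno set. */
int
pipe_with_capacity(int fds[2], int want)
{
    if (pipe(fds) == -1)
        return -1;

    int got = grow_pipe(fds[1], want);
    if (got == -1) {
        int saved = errno;
        close(fds[0]);
        close(fds[1]);
        errno = saved;
    }
    return got;
}
```

       An unprivileged caller should expect EPERM when 'want' exceeds the
       limit in /proc/sys/fs/pipe-max-size.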
698
699   File Sealing
700       File seals limit the set of allowed operations on a  given  file.   For
701       each seal that is set on a file, a specific set of operations will fail
702       with EPERM on this file from now on.  The file is said  to  be  sealed.
703       The default set of seals depends on the type of the underlying file and
704       filesystem.  For an overview of file sealing, a discussion of its  pur‐
705       pose, and some code examples, see memfd_create(2).
706
707       Currently, file seals can be applied only to a file descriptor returned
708       by memfd_create(2) (if the MFD_ALLOW_SEALING flag was employed).  On other
709       filesystems,  all  fcntl() operations that operate on seals will return
710       EINVAL.
711
712       Seals are a property of an inode.  Thus, all open file descriptors  re‐
713       ferring  to  the  same inode share the same set of seals.  Furthermore,
714       seals can never be removed, only added.
715
716       F_ADD_SEALS (int; since Linux 3.17)
717              Add the seals given in the bit-mask argument arg to the  set  of
718              seals of the inode referred to by the file descriptor fd.  Seals
719              cannot be removed again.  Once this call succeeds, the seals are
720              enforced by the kernel immediately.  If the current set of seals
721              includes F_SEAL_SEAL (see below), then this call will be re‐
722              jected with EPERM.  Adding a seal that is already set is a
723              no-op, provided that F_SEAL_SEAL is not already set.  In order
724              to place a seal, the file descriptor fd must be writable.
725
726       F_GET_SEALS (void; since Linux 3.17)
727              Return  (as the function result) the current set of seals of the
728              inode referred to by fd.  If no seals are set,  0  is  returned.
729              If  the  file does not support sealing, -1 is returned and errno
730              is set to EINVAL.
731
732       The following seals are available:
733
734       F_SEAL_SEAL
735              If  this  seal  is  set,  any  further  call  to  fcntl()   with
736              F_ADD_SEALS  fails  with  the error EPERM.  Therefore, this seal
737              prevents any modifications to the set of seals itself.   If  the
738              initial  set  of seals of a file includes F_SEAL_SEAL, then this
739              effectively causes the set of seals to be constant and locked.
740
741       F_SEAL_SHRINK
742              If this seal is set, the file in question cannot be  reduced  in
743              size.   This  affects  open(2)  with the O_TRUNC flag as well as
744              truncate(2) and ftruncate(2).  Those calls fail  with  EPERM  if
745              you  try  to  shrink  the file in question.  Increasing the file
746              size is still possible.
747
748       F_SEAL_GROW
749              If this seal is set, the size of the file in question cannot  be
750              increased.   This  affects  write(2) beyond the end of the file,
751              truncate(2), ftruncate(2), and fallocate(2).  These  calls  fail
752              with  EPERM  if  you use them to increase the file size.  If you
753              keep the size or shrink it, those calls still work as expected.
754
755       F_SEAL_WRITE
756              If this seal is set, you cannot modify the contents of the file.
757              Note  that  shrinking  or  growing the size of the file is still
758              possible and allowed.  Thus, this seal is normally used in  com‐
759              bination  with  one  of  the  other  seals.   This  seal affects
760              write(2) and fallocate(2) (only in  combination  with  the  FAL‐
761              LOC_FL_PUNCH_HOLE  flag).   Those  calls fail with EPERM if this
762              seal is set.  Furthermore, trying to create new shared, writable
763              memory-mappings via mmap(2) will also fail with EPERM.
764
765              Using  the  F_ADD_SEALS  operation  to set the F_SEAL_WRITE seal
766              fails with EBUSY if any writable, shared mapping  exists.   Such
767              mappings  must  be  unmapped before you can add this seal.  Fur‐
768              thermore, if there are any asynchronous I/O operations  (io_sub‐
769              mit(2)) pending on the file, all outstanding writes will be dis‐
770              carded.
771
772       F_SEAL_FUTURE_WRITE (since Linux 5.1)
773              The effect of this seal is similar to F_SEAL_WRITE, but the con‐
774              tents of the file can still be modified via shared writable map‐
775              pings that were created prior to the seal being  set.   Any  at‐
776              tempt  to  create a new writable mapping on the file via mmap(2)
777              will fail with EPERM.  Likewise, an attempt to write to the file
778              via write(2) will fail with EPERM.
779
780              Using  this seal, one process can create a memory buffer that it
781              can continue to modify while sharing that  buffer  on  a  "read-
782              only" basis with other processes.
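
       As an illustration of the seals described above, the following
       sketch creates a memfd, fills it, and then freezes both its size and
       its contents.  The helper name make_sealed_buffer() and the file
       name "sealed-buffer" are hypothetical; the sketch assumes a glibc
       that provides the memfd_create() wrapper.

```c
#define _GNU_SOURCE             /* exposes memfd_create() and F_SEAL_* */
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create an anonymous file containing the 'len'
   bytes at 'buf', then seal it against resizing, writing, and further
   sealing.  Returns the file descriptor, or -1 on failure. */
int
make_sealed_buffer(const void *buf, size_t len)
{
    int fd = memfd_create("sealed-buffer", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd == -1)
        return -1;

    /* All data must be in place before F_SEAL_WRITE is added.  A short
       write is treated as failure (errno may not be meaningful then). */
    if (write(fd, buf, len) != (ssize_t) len ||
        fcntl(fd, F_ADD_SEALS,
              F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

       A recipient of such a descriptor can call F_GET_SEALS and check that
       the expected bits are present before trusting that the contents will
       not change.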
783
784   File read/write hints
785       Write  lifetime  hints can be used to inform the kernel about the rela‐
786       tive expected lifetime of writes on a given inode or via  a  particular
787       open  file  description.   (See open(2) for an explanation of open file
788       descriptions.)  In this context, the term "write  lifetime"  means  the
789       expected  time the data will live on media, before being overwritten or
790       erased.
791
792       An application may use the different hint  values  specified  below  to
793       separate writes into different write classes, so that multiple users or
794       applications running on a single storage back-end can  aggregate  their
795       I/O  patterns in a consistent manner.  However, there are no functional
796       semantics implied by these flags, and different I/O classes can use the
797       write  lifetime  hints in arbitrary ways, so long as the hints are used
798       consistently.
799
800       The following operations can be applied to the file descriptor, fd:
801
802       F_GET_RW_HINT (uint64_t *; since Linux 4.13)
803              Returns the value of the read/write hint associated with the un‐
804              derlying inode referred to by fd.
805
806       F_SET_RW_HINT (uint64_t *; since Linux 4.13)
807              Sets  the  read/write  hint value associated with the underlying
808              inode referred to by fd.  This hint persists until either it  is
809              explicitly modified or the underlying filesystem is unmounted.
810
811       F_GET_FILE_RW_HINT (uint64_t *; since Linux 4.13)
812              Returns  the  value  of  the read/write hint associated with the
813              open file description referred to by fd.
814
815       F_SET_FILE_RW_HINT (uint64_t *; since Linux 4.13)
816              Sets the read/write hint value associated with the open file de‐
817              scription referred to by fd.
818
819       If  an  open  file description has not been assigned a read/write hint,
820       then it shall use the value assigned to the inode, if any.
821
822       The following read/write hints are valid since Linux 4.13:
823
824       RWH_WRITE_LIFE_NOT_SET
825              No specific hint has been set.  This is the default value.
826
827       RWH_WRITE_LIFE_NONE
828              No specific write lifetime is associated with this file  or  in‐
829              ode.
830
831       RWH_WRITE_LIFE_SHORT
832              Data  written to this inode or via this open file description is
833              expected to have a short lifetime.
834
835       RWH_WRITE_LIFE_MEDIUM
836              Data written to this inode or via this open file description  is
837              expected  to  have  a  lifetime  longer  than  data written with
838              RWH_WRITE_LIFE_SHORT.
839
840       RWH_WRITE_LIFE_LONG
841              Data written to this inode or via this open file description  is
842              expected  to  have  a  lifetime  longer  than  data written with
843              RWH_WRITE_LIFE_MEDIUM.
844
845       RWH_WRITE_LIFE_EXTREME
846              Data written to this inode or via this open file description  is
847              expected  to  have  a  lifetime  longer  than  data written with
848              RWH_WRITE_LIFE_LONG.
849
850       All the write-specific hints are relative to each other, and  no  indi‐
851       vidual absolute meaning should be attributed to them.
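
       Note that the hint operations take a pointer to a uint64_t rather
       than an integer argument.  A minimal sketch for the inode-level
       operations follows; the helper names are hypothetical, and the
       sketch assumes a kernel and C library that define F_SET_RW_HINT and
       the RWH_* values.

```c
#define _GNU_SOURCE             /* exposes F_SET_RW_HINT and RWH_* */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: read the inode's write-lifetime hint into *hint.
   Returns 0 on success, -1 with errno set on error. */
int
get_inode_write_hint(int fd, uint64_t *hint)
{
    return fcntl(fd, F_GET_RW_HINT, hint);
}

/* Hypothetical helper: create (or open) 'path' for writing and attach
   'hint' (one of the RWH_WRITE_LIFE_* values) to its inode.  Returns the
   file descriptor, or -1 with errno set. */
int
open_with_write_hint(const char *path, uint64_t hint)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd == -1)
        return -1;

    /* The argument is a pointer to the 64-bit hint value. */
    if (fcntl(fd, F_SET_RW_HINT, &hint) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

       Since the hints carry no functional semantics, a caller may
       reasonably treat a failure to set one as non-fatal.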
852

RETURN VALUE

854       For a successful call, the return value depends on the operation:
855
856       F_DUPFD
857              The new file descriptor.
858
859       F_GETFD
860              Value of file descriptor flags.
861
862       F_GETFL
863              Value of file status flags.
864
865       F_GETLEASE
866              Type of lease held on file descriptor.
867
868       F_GETOWN
869              Value of file descriptor owner.
870
871       F_GETSIG
872              Value  of  signal  sent  when read or write becomes possible, or
873              zero for traditional SIGIO behavior.
874
875       F_GETPIPE_SZ, F_SETPIPE_SZ
876              The pipe capacity.
877
878       F_GET_SEALS
879              A bit mask identifying the seals that have been set for the  in‐
880              ode referred to by fd.
881
882       All other commands
883              Zero.
884
885       On error, -1 is returned, and errno is set to indicate the error.
886

ERRORS

888       EACCES or EAGAIN
889              Operation is prohibited by locks held by other processes.
890
891       EAGAIN The  operation  is  prohibited because the file has been memory-
892              mapped by another process.
893
894       EBADF  fd is not an open file descriptor.
895
896       EBADF  cmd is F_SETLK or F_SETLKW and the  file  descriptor  open  mode
897              doesn't match with the type of lock requested.
898
899       EBUSY  cmd  is  F_SETPIPE_SZ and the new pipe capacity specified in arg
900              is smaller than the amount of buffer  space  currently  used  to
901              store data in the pipe.
902
903       EBUSY  cmd  is F_ADD_SEALS, arg includes F_SEAL_WRITE, and there exists
904              a writable, shared mapping on the file referred to by fd.
905
906       EDEADLK
907              It was detected that the specified F_SETLKW command would  cause
908              a deadlock.
909
910       EFAULT lock is outside your accessible address space.
911
912       EINTR  cmd  is  F_SETLKW  or  F_OFD_SETLKW and the operation was inter‐
913              rupted by a signal; see signal(7).
914
915       EINTR  cmd is F_GETLK, F_SETLK, F_OFD_GETLK, or  F_OFD_SETLK,  and  the
916              operation  was  interrupted  by  a  signal  before  the lock was
917              checked or acquired.  This is most likely when locking a remote
918              file (e.g., locking over NFS), but can sometimes happen locally.
919
920       EINVAL The value specified in cmd is not recognized by this kernel.
921
922       EINVAL cmd is F_ADD_SEALS and arg includes an unrecognized sealing bit.
923
924       EINVAL cmd  is F_ADD_SEALS or F_GET_SEALS and the filesystem containing
925              the inode referred to by fd does not support sealing.
926
927       EINVAL cmd is F_DUPFD and arg is negative or is greater than the  maxi‐
928              mum  allowable  value  (see  the  discussion of RLIMIT_NOFILE in
929              getrlimit(2)).
930
931       EINVAL cmd is F_SETSIG and arg is not an allowable signal number.
932
933       EINVAL cmd is F_OFD_SETLK, F_OFD_SETLKW, or F_OFD_GETLK, and l_pid  was
934              not specified as zero.
935
936       EMFILE cmd  is  F_DUPFD and the per-process limit on the number of open
937              file descriptors has been reached.
938
939       ENOLCK Too many segment locks open, lock table is  full,  or  a  remote
940              locking protocol failed (e.g., locking over NFS).
941
942       ENOTDIR
943              F_NOTIFY was specified in cmd, but fd does not refer to a direc‐
944              tory.
945
946       EPERM  cmd is F_SETPIPE_SZ and the soft or hard  user  pipe  limit  has
947              been reached; see pipe(7).
948
949       EPERM  Attempted  to clear the O_APPEND flag on a file that has the ap‐
950              pend-only attribute set.
951
952       EPERM  cmd was F_ADD_SEALS, but fd was not open for writing or the cur‐
953              rent set of seals on the file already includes F_SEAL_SEAL.
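
       Of the errors above, EINTR on F_SETLKW is one that callers often
       handle by retrying.  The sketch below shows one way to do so.  The
       helper name lock_whole_file() is hypothetical, and retrying is an
       assumption about what the application wants: a program that uses a
       signal to cancel the wait would return instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: apply a record lock of 'type' (F_RDLCK, F_WRLCK,
   or F_UNLCK) to the whole file, waiting if necessary and restarting the
   wait when it is interrupted by a signal handler.  Returns 0 on success,
   -1 with errno set (for example EBADF, EDEADLK, or ENOLCK). */
int
lock_whole_file(int fd, short type)
{
    struct flock fl;
    int rc;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */

    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc == -1 && errno == EINTR);

    return rc;
}
```

       All other errors are passed back unchanged so that the caller can
       distinguish, say, a detected deadlock from a bad descriptor.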
954

STANDARDS

956       POSIX.1-2008.
957
958       F_GETOWN_EX,  F_SETOWN_EX, F_SETPIPE_SZ, F_GETPIPE_SZ, F_GETSIG, F_SET‐
959       SIG, F_NOTIFY, F_GETLEASE, and F_SETLEASE are Linux-specific.   (Define
960       the _GNU_SOURCE macro to obtain these definitions.)
961
962       F_OFD_SETLK,  F_OFD_SETLKW, and F_OFD_GETLK are Linux-specific (and one
963       must define _GNU_SOURCE to obtain their definitions), but work is being
964       done to have them included in the next version of POSIX.1.
965
966       F_ADD_SEALS and F_GET_SEALS are Linux-specific.
967

HISTORY

969       SVr4, 4.3BSD, POSIX.1-2001.
970
971       Only  the  operations  F_DUPFD,  F_GETFD,  F_SETFD,  F_GETFL,  F_SETFL,
972       F_GETLK, F_SETLK, and F_SETLKW are specified in POSIX.1-2001.
973
974       F_GETOWN and F_SETOWN are specified in  POSIX.1-2001.   (To  get  their
975       definitions, define either _XOPEN_SOURCE with the value 500 or greater,
976       or _POSIX_C_SOURCE with the value 200809L or greater.)
977
978       F_DUPFD_CLOEXEC is specified in POSIX.1-2008.  (To get this definition,
979       define   _POSIX_C_SOURCE   with   the  value  200809L  or  greater,  or
980       _XOPEN_SOURCE with the value 700 or greater.)
981

NOTES

983       The errors returned by dup2(2) are different  from  those  returned  by
984       F_DUPFD.
985
986   File locking
987       The original Linux fcntl() system call was not designed to handle large
988       file offsets (in the flock structure).  Consequently, an fcntl64() sys‐
989       tem  call was added in Linux 2.4.  The newer system call employs a dif‐
990       ferent structure for file locking, flock64, and corresponding commands,
991       F_GETLK64,  F_SETLK64,  and  F_SETLKW64.  However, these details can be
992       ignored by applications using glibc,  whose  fcntl()  wrapper  function
993       transparently  employs  the  more recent system call where it is avail‐
994       able.
995
996   Record locks
997       Since Linux 2.0, there is no interaction  between  the  types  of  lock
998       placed by flock(2) and fcntl().
999
1000       Several  systems have more fields in struct flock such as, for example,
1001       l_sysid (to identify the machine where the  lock  is  held).   Clearly,
1002       l_pid  alone  is not going to be very useful if the process holding the
1003       lock may live on a different machine; on Linux, while present  on  some
1004       architectures (such as MIPS32), this field is not used.
1014
1015   Record locking and NFS
1016       Before Linux 3.12, if an NFSv4 client loses contact with the server for
1017       a  period  of  time (defined as more than 90 seconds with no communica‐
1018       tion), it might lose and regain a lock without ever being aware of  the
1019       fact.  (The period of time after which contact is assumed lost is known
1020       as the NFSv4 leasetime.  On a Linux NFS server, this can be  determined
1021       by  looking at /proc/fs/nfsd/nfsv4leasetime, which expresses the period
1022       in seconds.  The default value for this file is 90.)  This scenario po‐
1023       tentially  risks data corruption, since another process might acquire a
1024       lock in the intervening period and perform file I/O.
1025
1026       Since Linux 3.12, if an NFSv4 client loses contact with the server, any
1027       I/O  to  the file by a process which "thinks" it holds a lock will fail
1028       until that process closes and reopens the file.   A  kernel  parameter,
1029       nfs.recover_lost_locks,  can  be set to 1 to obtain the pre-3.12 behav‐
1030       ior, whereby the client will attempt to recover lost locks when contact
1031       is  reestablished  with  the  server.  Because of the attendant risk of
1032       data corruption, this parameter defaults to 0 (disabled).
1033

BUGS

1035   F_SETFL
1036       It is not possible to use F_SETFL to change the state  of  the  O_DSYNC
1037       and  O_SYNC  flags.   Attempts  to  change the state of these flags are
1038       silently ignored.
1039
1040   F_GETOWN
1041       A limitation of the Linux system call conventions on some architectures
1042       (notably  i386)  means  that if a (negative) process group ID to be re‐
1043       turned by F_GETOWN falls in the range -1  to  -4095,  then  the  return
1044       value  is  wrongly interpreted by glibc as an error in the system call;
1045       that is, the return value of fcntl() will be -1, and errno will contain
1046       the (positive) process group ID.  The Linux-specific F_GETOWN_EX opera‐
1047       tion avoids this problem.  Since glibc 2.11,  glibc  makes  the  kernel
1048       F_GETOWN problem invisible by implementing F_GETOWN using F_GETOWN_EX.
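
       Code that cannot rely on the glibc workaround can query the owner
       with F_GETOWN_EX directly, as in the following sketch.  The helper
       name get_fd_owner() and its output parameters are hypothetical.

```c
#define _GNU_SOURCE             /* exposes F_GETOWN_EX and struct f_owner_ex */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report the owner of 'fd' without the ambiguity of
   the F_GETOWN return value.  On success returns 0, stores the ID in *id,
   and sets *is_group to 1 if the ID is a process group ID, 0 otherwise. */
int
get_fd_owner(int fd, int *is_group, pid_t *id)
{
    struct f_owner_ex owner;

    if (fcntl(fd, F_GETOWN_EX, &owner) == -1)
        return -1;

    /* The ID is returned in a structure, always as a positive number,
       so it can never be mistaken for an error code. */
    *is_group = (owner.type == F_OWNER_PGRP);
    *id = owner.pid;
    return 0;
}
```

       Here the process group case is signalled by the type field rather
       than by a negative return value.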
1049
1050   F_SETOWN
1051       In Linux 2.4 and earlier, there is a bug that can occur when an
1052       unprivileged process uses F_SETOWN to specify the owner of a socket
1053       file descriptor as a process (group) other than the caller.  In this
1054       case, fcntl() can return -1 with errno set to EPERM, even when the
1055       owner process (group) is one that the caller has permission to send
1056       signals to.  Despite this error return, the file descriptor owner is
1057       set, and signals will be sent to the owner.
1058
1059   Deadlock detection
1060       The  deadlock-detection  algorithm  employed by the kernel when dealing
1061       with F_SETLKW requests can yield both false negatives (failures to  de‐
1062       tect  deadlocks,  leaving a set of deadlocked processes blocked indefi‐
1063       nitely) and false positives (EDEADLK errors when there is no deadlock).
1064       For  example, the kernel limits the lock depth of its dependency search
1065       to 10 steps, meaning that circular deadlock  chains  that  exceed  that
1066       size  will  not be detected.  In addition, the kernel may falsely indi‐
1067       cate a deadlock when two or more processes created using  the  clone(2)
1068       CLONE_FILES flag place locks that appear (to the kernel) to conflict.
1069
1070   Mandatory locking
1071       The Linux implementation of mandatory locking is subject to race condi‐
1072       tions which render it unreliable: a write(2) call that overlaps with  a
1073       lock  may  modify  data after the mandatory lock is acquired; a read(2)
1074       call that overlaps with a lock may detect changes  to  data  that  were
1075       made only after a write lock was acquired.  Similar races exist between
1076       mandatory locks and mmap(2).  It is therefore inadvisable  to  rely  on
1077       mandatory locking.
1078

SEE ALSO

1080       dup2(2),  flock(2), open(2), socket(2), lockf(3), capabilities(7), fea‐
1081       ture_test_macros(7), lslocks(8)
1082
1083       locks.txt, mandatory-locking.txt, and dnotify.txt in the  Linux  kernel
1084       source  directory  Documentation/filesystems/  (on older kernels, these
1085       files are directly  under  the  Documentation/  directory,  and  manda‐
1086       tory-locking.txt is called mandatory.txt)
1087
1088
1089
1090Linux man-pages 6.04              2023-03-30                          fcntl(2)