1FUTEX(2)                   Linux Programmer's Manual                  FUTEX(2)
2
3
4

NAME

6       futex - fast user-space locking
7

SYNOPSIS

9       #include <linux/futex.h>
10       #include <stdint.h>
11       #include <sys/time.h>
12
13       long futex(uint32_t *uaddr, int futex_op, uint32_t val,
14                 const struct timespec *timeout,   /* or: uint32_t val2 */
15                 uint32_t *uaddr2, uint32_t val3);
16
17       Note: There is no glibc wrapper for this system call; see NOTES.
18

DESCRIPTION

20       The  futex()  system call provides a method for waiting until a certain
21       condition becomes true.  It is typically used as a  blocking  construct
22       in  the  context of shared-memory synchronization.  When using futexes,
23       the majority of the synchronization operations are  performed  in  user
24       space.   A user-space program employs the futex() system call only when
25       it is likely that the program has to block for a longer time until  the
26       condition  becomes  true.  Other futex() operations can be used to wake
27       any processes or threads waiting for a particular condition.
28
29       A futex is a 32-bit value—referred to below as a futex  word—whose  ad‐
30       dress  is supplied to the futex() system call.  (Futexes are 32 bits in
31       size on all platforms, including 64-bit systems.)  All futex operations
32       are  governed  by  this  value.  In order to share a futex between pro‐
33       cesses, the futex is placed in a region of shared memory, created using
34       (for example) mmap(2) or shmat(2).  (Thus, the futex word may have dif‐
35       ferent virtual addresses in different processes,  but  these  addresses
36       all refer to the same location in physical memory.)  In a multithreaded
37       program, it is sufficient to place the futex word in a global  variable
38       shared by all threads.
39
40       When  executing  a futex operation that requests to block a thread, the
41       kernel will block only if the futex word has the value that the calling
42       thread  supplied  (as  one of the arguments of the futex() call) as the
43       expected value of the futex word.  The  loading  of  the  futex  word's
44       value,  the  comparison  of that value with the expected value, and the
45       actual blocking will happen atomically and will be totally ordered with
46       respect to concurrent operations performed by other threads on the same
47       futex word.  Thus, the futex word is used to connect  the  synchroniza‐
48       tion  in  user space with the implementation of blocking by the kernel.
49       Analogously to an atomic  compare-and-exchange  operation  that  poten‐
50       tially  changes  shared  memory, blocking via a futex is an atomic com‐
51       pare-and-block operation.
52
53       One use of futexes is for implementing locks.  The state  of  the  lock
54       (i.e.,  acquired  or  not acquired) can be represented as an atomically
55       accessed flag in shared memory.  In the uncontended case, a thread  can
56       access  or  modify the lock state with atomic instructions, for example
57       atomically changing it from not acquired to acquired  using  an  atomic
58       compare-and-exchange instruction.  (Such instructions are performed en‐
59       tirely in user mode, and the kernel maintains no information about  the
60       lock  state.)   On  the other hand, a thread may be unable to acquire a
61       lock because it is already acquired by another  thread.   It  then  may
62       pass the lock's flag as a futex word and the value representing the ac‐
63       quired state as the expected value to a futex() wait  operation.   This
64       futex()  operation will block if and only if the lock is still acquired
65       (i.e., the value in the futex word still matches the "acquired state").
66       When  releasing the lock, a thread has to first reset the lock state to
67       not acquired and then execute a  futex  operation  that  wakes  threads
68       blocked  on the lock flag used as a futex word (this can be further op‐
69       timized to avoid unnecessary wake-ups).  See futex(7) for  more  detail
70       on how to use futexes.
71
72       Besides  the basic wait and wake-up futex functionality, there are fur‐
73       ther futex operations aimed at supporting more complex use cases.
74
75       Note that no explicit initialization or destruction is necessary to use
76       futexes; the kernel maintains a futex (i.e., the kernel-internal imple‐
77       mentation artifact) only while operations such as FUTEX_WAIT, described
78       below, are being performed on a particular futex word.
79
80   Arguments
81       The uaddr argument points to the futex word.  On all platforms, futexes
82       are four-byte integers that must be aligned on  a  four-byte  boundary.
83       The  operation to perform on the futex is specified in the futex_op ar‐
84       gument; val is a value whose meaning and purpose depends on futex_op.
85
86       The remaining arguments (timeout, uaddr2, and val3) are  required  only
87       for  certain  of  the  futex  operations described below.  Where one of
88       these arguments is not required, it is ignored.
89
90       For several blocking operations, the timeout argument is a pointer to a
91       timespec  structure  that  specifies a timeout for the operation.  How‐
92       ever,  notwithstanding the prototype shown above, for some  operations,
93       the  least  significant four bytes of this argument are instead used as
94       an integer whose meaning is determined by the operation.  For these op‐
95       erations,  the  kernel  casts the timeout value first to unsigned long,
96       then to uint32_t, and in the remainder of this page, this  argument  is
97       referred to as val2 when interpreted in this fashion.
98
99       Where  it is required, the uaddr2 argument is a pointer to a second fu‐
100       tex word that is employed by the operation.
101
102       The interpretation of the final integer argument, val3, depends on  the
103       operation.
104
105   Futex operations
106       The  futex_op  argument consists of two parts: a command that specifies
107       the operation to be performed, bitwise ORed with zero or  more  options
108       that  modify  the  behaviour of the operation.  The options that may be
109       included in futex_op are as follows:
110
111       FUTEX_PRIVATE_FLAG (since Linux 2.6.22)
112              This option bit can be employed with all futex  operations.   It
113              tells  the  kernel  that  the  futex  is process-private and not
114              shared with another process (i.e., it is being used for synchro‐
115              nization only between threads of the same process).  This allows
116              the kernel to make some additional performance optimizations.
117
118              As a convenience, <linux/futex.h> defines  a  set  of  constants
119              with  the suffix _PRIVATE that are equivalents of all of the op‐
120              erations listed below, but with the FUTEX_PRIVATE_FLAG ORed into
121              the  constant  value.   Thus,  there are FUTEX_WAIT_PRIVATE, FU‐
122              TEX_WAKE_PRIVATE, and so on.
123
124       FUTEX_CLOCK_REALTIME (since Linux 2.6.28)
125              This option bit can be employed only with the FUTEX_WAIT_BITSET,
126              FUTEX_WAIT_REQUEUE_PI,  and  (since Linux 4.5) FUTEX_WAIT opera‐
127              tions.
128
129              If this option is set, the kernel measures the  timeout  against
130              the CLOCK_REALTIME clock.
131
132              If  this  option  is  not  set,  the kernel measures the timeout
133              against the CLOCK_MONOTONIC clock.
134
135       The operation specified in futex_op is one of the following:
136
137       FUTEX_WAIT (since Linux 2.6.0)
138              This operation tests that the value at the futex word pointed to
139              by  the address uaddr still contains the expected value val, and
140              if so, then sleeps waiting for a FUTEX_WAKE operation on the fu‐
141              tex  word.  The load of the value of the futex word is an atomic
142              memory access (i.e., using atomic machine  instructions  of  the
143              respective  architecture).   This  load, the comparison with the
144              expected value, and starting to sleep are  performed  atomically
145              and  totally  ordered  with respect to other futex operations on
146              the same futex word.  If the thread starts to sleep, it is  con‐
147              sidered  a  waiter  on this futex word.  If the futex value does
148              not match val, then the call fails immediately  with  the  error
149              EAGAIN.
150
151              The purpose of the comparison with the expected value is to pre‐
152              vent lost wake-ups.  If another thread changed the value of  the
153              futex  word  after  the calling thread decided to block based on
154              the prior value, and if the other thread executed  a  FUTEX_WAKE
155              operation (or similar wake-up) after the value change and before
156              this FUTEX_WAIT operation, then the calling thread will  observe
157              the value change and will not start to sleep.
158
159              If the timeout is not NULL, the structure it points to specifies
160              a timeout for the wait.  (This interval will be  rounded  up  to
161              the  system  clock  granularity, and is guaranteed not to expire
162              early.)  The timeout is by default  measured  according  to  the
163              CLOCK_MONOTONIC  clock, but, since Linux 4.5, the CLOCK_REALTIME
164              clock can be selected by specifying FUTEX_CLOCK_REALTIME in  fu‐
165              tex_op.  If timeout is NULL, the call blocks indefinitely.
166
167              Note:  for  FUTEX_WAIT,  timeout  is  interpreted  as a relative
168              value.  This differs from other futex operations, where  timeout
169              is  interpreted  as an absolute value.  To obtain the equivalent
170              of FUTEX_WAIT with an absolute timeout, employ FUTEX_WAIT_BITSET
171              with val3 specified as FUTEX_BITSET_MATCH_ANY.
172
173              The arguments uaddr2 and val3 are ignored.
174
175       FUTEX_WAKE (since Linux 2.6.0)
176              This operation wakes at most val of the waiters that are waiting
177              (e.g., inside FUTEX_WAIT) on  the  futex  word  at  the  address
178              uaddr.   Most  commonly, val is specified as either 1 (wake up a
179              single waiter) or INT_MAX (wake up all waiters).   No  guarantee
180              is  provided about which waiters are awoken (e.g., a waiter with
181              a higher scheduling priority is not guaranteed to be  awoken  in
182              preference to a waiter with a lower priority).
183
184              The arguments timeout, uaddr2, and val3 are ignored.
185
186       FUTEX_FD (from Linux 2.6.0 up to and including Linux 2.6.25)
187              This operation creates a file descriptor that is associated with
188              the futex at uaddr.  The caller must close the returned file de‐
189              scriptor  after  use.  When another process or thread performs a
190              FUTEX_WAKE on the futex word, the file descriptor  indicates  as
191              being readable with select(2), poll(2), and epoll(7)
192
193              The file descriptor can be used to obtain asynchronous notifica‐
194              tions: if val is nonzero, then, when another process  or  thread
195              executes a FUTEX_WAKE, the caller will receive the signal number
196              that was passed in val.
197
198              The arguments timeout, uaddr2, and val3 are ignored.
199
200              Because it was inherently racy, FUTEX_FD has been  removed  from
201              Linux 2.6.26 onward.
202
203       FUTEX_REQUEUE (since Linux 2.6.0)
204              This  operation performs the same task as FUTEX_CMP_REQUEUE (see
205              below), except that no check is made using the  value  in  val3.
206              (The argument val3 is ignored.)
207
208       FUTEX_CMP_REQUEUE (since Linux 2.6.7)
209              This  operation  first  checks  whether the location uaddr still
210              contains the value val3.  If not, the operation fails  with  the
211              error  EAGAIN.   Otherwise,  the operation wakes up a maximum of
212              val waiters that are waiting on the futex at  uaddr.   If  there
213              are  more  than  val waiters, then the remaining waiters are re‐
214              moved from the wait queue of the source futex at uaddr and added
215              to the wait queue of the target futex at uaddr2.  The val2 argu‐
216              ment specifies an upper limit on the number of waiters that  are
217              requeued to the futex at uaddr2.
218
219              The  load  from  uaddr  is  an atomic memory access (i.e., using
220              atomic machine instructions  of  the  respective  architecture).
221              This  load,  the comparison with val3, and the requeueing of any
222              waiters are performed atomically and totally  ordered  with  re‐
223              spect to other operations on the same futex word.
224
225              Typical  values  to  specify  for  val  are 0 or 1.  (Specifying
226              INT_MAX is not useful, because it would make  the  FUTEX_CMP_RE‐
227              QUEUE  operation  equivalent  to  FUTEX_WAKE.)   The limit value
228              specified via val2 is typically either 1 or INT_MAX.   (Specify‐
229              ing  the  argument as 0 is not useful, because it would make the
230              FUTEX_CMP_REQUEUE operation equivalent to FUTEX_WAIT.)
231
232              The FUTEX_CMP_REQUEUE operation was added as a  replacement  for
233              the  earlier FUTEX_REQUEUE.  The difference is that the check of
234              the value at uaddr can be used to ensure that requeueing happens
235              only  under  certain conditions, which allows race conditions to
236              be avoided in certain use cases.
237
238              Both FUTEX_REQUEUE and FUTEX_CMP_REQUEUE can be  used  to  avoid
239              "thundering  herd"  wake-ups  that  could  occur  when using FU‐
240              TEX_WAKE in cases where all of the waiters that are  woken  need
241              to  acquire  another  futex.   Consider  the following scenario,
242              where multiple waiter threads are waiting on B, a wait queue im‐
243              plemented using a futex:
244
245                  lock(A)
246                  while (!check_value(V)) {
247                      unlock(A);
248                      block_on(B);
249                      lock(A);
250                  };
251                  unlock(A);
252
253              If a waker thread used FUTEX_WAKE, then all waiters waiting on B
254              would be woken up, and they would all try  to  acquire  lock  A.
255              However,  waking  all  of  the  threads  in this manner would be
256              pointless because all except one of the  threads  would  immedi‐
257              ately  block  on lock A again.  By contrast, a requeue operation
258              wakes just one waiter and moves the other waiters to lock A, and
259              when  the  woken  waiter unlocks A then the next waiter can pro‐
260              ceed.
261
262       FUTEX_WAKE_OP (since Linux 2.6.14)
263              This operation was added to support some  user-space  use  cases
264              where more than one futex must be handled at the same time.  The
265              most notable example is the implementation of  pthread_cond_sig‐
266              nal(3),  which  requires operations on two futexes, the one used
267              to implement the mutex and the one used in the implementation of
268              the  wait  queue  associated  with  the condition variable.  FU‐
269              TEX_WAKE_OP allows such cases to be implemented without  leading
270              to high rates of contention and context switching.
271
272              The  FUTEX_WAKE_OP operation is equivalent to executing the fol‐
273              lowing code atomically and totally ordered with respect to other
274              futex operations on any of the two supplied futex words:
275
276                  uint32_t oldval = *(uint32_t *) uaddr2;
277                  *(uint32_t *) uaddr2 = oldval op oparg;
278                  futex(uaddr, FUTEX_WAKE, val, 0, 0, 0);
279                  if (oldval cmp cmparg)
280                      futex(uaddr2, FUTEX_WAKE, val2, 0, 0, 0);
281
282              In other words, FUTEX_WAKE_OP does the following:
283
284              *  saves the original value of the futex word at uaddr2 and per‐
285                 forms an operation to  modify  the  value  of  the  futex  at
286                 uaddr2;  this  is  an  atomic read-modify-write memory access
287                 (i.e., using atomic machine instructions  of  the  respective
288                 architecture)
289
290              *  wakes  up a maximum of val waiters on the futex for the futex
291                 word at uaddr; and
292
293              *  dependent on the results of a test of the original  value  of
294                 the  futex word at uaddr2, wakes up a maximum of val2 waiters
295                 on the futex for the futex word at uaddr2.
296
297              The operation and comparison that are to be  performed  are  en‐
298              coded in the bits of the argument val3.  Pictorially, the encod‐
299              ing is:
300
301                  +---+---+-----------+-----------+
302                  |op |cmp|   oparg   |  cmparg   |
303                  +---+---+-----------+-----------+
304                    4   4       12          12    <== # of bits
305
306              Expressed in code, the encoding is:
307
308                  #define FUTEX_OP(op, oparg, cmp, cmparg) \
309                                  (((op & 0xf) << 28) | \
310                                  ((cmp & 0xf) << 24) | \
311                                  ((oparg & 0xfff) << 12) | \
312                                  (cmparg & 0xfff))
313
314              In the above, op and cmp are each one of the codes listed below.
315              The  oparg and cmparg components are literal numeric values, ex‐
316              cept as noted below.
317
318              The op component has one of the following values:
319
320                  FUTEX_OP_SET        0  /* uaddr2 = oparg; */
321                  FUTEX_OP_ADD        1  /* uaddr2 += oparg; */
322                  FUTEX_OP_OR         2  /* uaddr2 |= oparg; */
323                  FUTEX_OP_ANDN       3  /* uaddr2 &= ~oparg; */
324                  FUTEX_OP_XOR        4  /* uaddr2 ^= oparg; */
325
326              In addition, bitwise ORing the following value  into  op  causes
327              (1 << oparg) to be used as the operand:
328
329                  FUTEX_OP_ARG_SHIFT  8  /* Use (1 << oparg) as operand */
330
331              The cmp field is one of the following:
332
333                  FUTEX_OP_CMP_EQ     0  /* if (oldval == cmparg) wake */
334                  FUTEX_OP_CMP_NE     1  /* if (oldval != cmparg) wake */
335                  FUTEX_OP_CMP_LT     2  /* if (oldval < cmparg) wake */
336                  FUTEX_OP_CMP_LE     3  /* if (oldval <= cmparg) wake */
337                  FUTEX_OP_CMP_GT     4  /* if (oldval > cmparg) wake */
338                  FUTEX_OP_CMP_GE     5  /* if (oldval >= cmparg) wake */
339
340              The  return  value  of FUTEX_WAKE_OP is the sum of the number of
341              waiters woken on the futex uaddr plus the number of waiters  wo‐
342              ken on the futex uaddr2.
343
344       FUTEX_WAIT_BITSET (since Linux 2.6.25)
345              This  operation  is  like FUTEX_WAIT except that val3 is used to
346              provide a 32-bit bit mask to the  kernel.   This  bit  mask,  in
347              which  at least one bit must be set, is stored in the kernel-in‐
348              ternal  state  of  the  waiter.   See  the  description  of  FU‐
349              TEX_WAKE_BITSET for further details.
350
351              If  timeout is not NULL, the structure it points to specifies an
352              absolute timeout for the wait operation.  If  timeout  is  NULL,
353              the operation can block indefinitely.
354
355              The uaddr2 argument is ignored.
356
357       FUTEX_WAKE_BITSET (since Linux 2.6.25)
358              This  operation  is  the same as FUTEX_WAKE except that the val3
359              argument is used to provide a 32-bit bit  mask  to  the  kernel.
360              This bit mask, in which at least one bit must be set, is used to
361              select which waiters should be woken up.  The selection is  done
362              by  a  bitwise  AND  of  the "wake" bit mask (i.e., the value in
363              val3) and the bit mask which is stored  in  the  kernel-internal
364              state  of  the waiter (the "wait" bit mask that is set using FU‐
365              TEX_WAIT_BITSET).  All of the waiters for which  the  result  of
366              the  AND is nonzero are woken up; the remaining waiters are left
367              sleeping.
368
369              The effect of FUTEX_WAIT_BITSET and FUTEX_WAKE_BITSET is to  al‐
370              low  selective  wake-ups among multiple waiters that are blocked
371              on the same futex.  However, note that,  depending  on  the  use
372              case,  employing  this  bit-mask multiplexing feature on a futex
373              can be less efficient than simply using  multiple  futexes,  be‐
374              cause  employing  bit-mask  multiplexing  requires the kernel to
375              check all waiters on a futex, including those that are  not  in‐
376              terested  in being woken up (i.e., they do not have the relevant
377              bit set in their "wait" bit mask).
378
379              The constant FUTEX_BITSET_MATCH_ANY, which corresponds to all 32
380              bits  set  in the bit mask, can be used as the val3 argument for
381              FUTEX_WAIT_BITSET and FUTEX_WAKE_BITSET.  Other than differences
382              in  the  handling of the timeout argument, the FUTEX_WAIT opera‐
383              tion is equivalent to FUTEX_WAIT_BITSET with val3  specified  as
384              FUTEX_BITSET_MATCH_ANY;  that  is, allow a wake-up by any waker.
385              The FUTEX_WAKE operation is equivalent to FUTEX_WAKE_BITSET with
386              val3  specified  as FUTEX_BITSET_MATCH_ANY; that is, wake up any
387              waiter(s).
388
389              The uaddr2 and timeout arguments are ignored.
390
391   Priority-inheritance futexes
392       Linux supports priority-inheritance (PI) futexes  in  order  to  handle
393       priority-inversion  problems  that can be encountered with normal futex
394       locks.  Priority inversion is the problem that occurs when a  high-pri‐
395       ority  task is blocked waiting to acquire a lock held by a low-priority
396       task, while tasks at an intermediate priority continuously preempt  the
397       low-priority  task  from  the CPU.  Consequently, the low-priority task
398       makes no progress toward releasing the lock, and the high-priority task
399       remains blocked.
400
401       Priority  inheritance  is a mechanism for dealing with the priority-in‐
402       version problem.  With this mechanism, when a  high-priority  task  be‐
403       comes  blocked  by  a lock held by a low-priority task, the priority of
404       the low-priority task is temporarily raised to that of the  high-prior‐
405       ity  task, so that it is not preempted by any intermediate level tasks,
406       and can thus make progress toward releasing the lock.  To be effective,
407       priority  inheritance must be transitive, meaning that if a high-prior‐
408       ity task blocks on a lock held by a lower-priority task that is  itself
409       blocked  by  a  lock held by another intermediate-priority task (and so
410       on, for chains of arbitrary length), then both of those tasks (or  more
411       generally,  all  of  the  tasks  in a lock chain) have their priorities
412       raised to be the same as the high-priority task.
413
414       From a user-space perspective, what makes a futex PI-aware is a  policy
415       agreement (described below) between user space and the kernel about the
416       value of the futex word, coupled with the use of  the  PI-futex  opera‐
417       tions  described  below.   (Unlike the other futex operations described
418       above, the PI-futex operations are designed for the  implementation  of
419       very specific IPC mechanisms.)
420
421       The PI-futex operations described below differ from the other futex op‐
422       erations in that they impose policy on the use of the value of the  fu‐
423       tex word:
424
425       *  If the lock is not acquired, the futex word's value shall be 0.
426
427       *  If  the lock is acquired, the futex word's value shall be the thread
428          ID (TID; see gettid(2)) of the owning thread.
429
430       *  If the lock is owned and there are threads contending for the  lock,
431          then  the  FUTEX_WAITERS bit shall be set in the futex word's value;
432          in other words, this value is:
433
434              FUTEX_WAITERS | TID
435
436          (Note that is invalid for a PI futex word to have no owner  and  FU‐
437          TEX_WAITERS set.)
438
439       With  this policy in place, a user-space application can acquire an un‐
440       acquired lock or release a lock using atomic instructions  executed  in
441       user  mode  (e.g.,  a compare-and-swap operation such as cmpxchg on the
442       x86 architecture).  Acquiring a lock simply consists of using  compare-
443       and-swap  to  atomically set the futex word's value to the caller's TID
444       if its previous value was 0.  Releasing a lock requires using  compare-
445       and-swap  to  set the futex word's value to 0 if the previous value was
446       the expected TID.
447
448       If a futex is already acquired (i.e., has  a  nonzero  value),  waiters
449       must  employ the FUTEX_LOCK_PI operation to acquire the lock.  If other
450       threads are waiting for the lock, then the FUTEX_WAITERS bit is set  in
451       the futex value; in this case, the lock owner must employ the FUTEX_UN‐
452       LOCK_PI operation to release the lock.
453
454       In the cases where callers are forced into the kernel  (i.e.,  required
455       to  perform  a  futex() call), they then deal directly with a so-called
456       RT-mutex, a kernel locking mechanism which implements the required pri‐
457       ority-inheritance semantics.  After the RT-mutex is acquired, the futex
458       value is updated accordingly, before the calling thread returns to user
459       space.
460
461       It  is  important  to note that the kernel will update the futex word's
462       value prior to returning to user space.  (This prevents the possibility
463       of the futex word's value ending up in an invalid state, such as having
464       an owner but the value being 0, or having waiters but  not  having  the
465       FUTEX_WAITERS bit set.)
466
467       If  a  futex  has an associated RT-mutex in the kernel (i.e., there are
468       blocked waiters) and the owner of the futex/RT-mutex dies unexpectedly,
469       then  the  kernel  cleans up the RT-mutex and hands it over to the next
470       waiter.  This in turn requires that the user-space value is updated ac‐
471       cordingly.   To indicate that this is required, the kernel sets the FU‐
472       TEX_OWNER_DIED bit in the futex word along with the thread  ID  of  the
473       new  owner.   User  space can detect this situation via the presence of
474       the FUTEX_OWNER_DIED bit and is then responsible for  cleaning  up  the
475       stale state left over by the dead owner.
476
477       PI futexes are operated on by specifying one of the values listed below
478       in futex_op.  Note that the PI futex operations must be used as  paired
479       operations and are subject to some additional requirements:
480
481       *  FUTEX_LOCK_PI  and  FUTEX_TRYLOCK_PI pair with FUTEX_UNLOCK_PI.  FU‐
482          TEX_UNLOCK_PI must be called only on a futex owned  by  the  calling
483          thread,  as  defined  by the value policy, otherwise the error EPERM
484          results.
485
486       *  FUTEX_WAIT_REQUEUE_PI pairs with FUTEX_CMP_REQUEUE_PI.  This must be
487          performed  from  a non-PI futex to a distinct PI futex (or the error
488          EINVAL results).  Additionally, val (the number of waiters to be wo‐
489          ken) must be 1 (or the error EINVAL results).
490
491       The PI futex operations are as follows:
492
493       FUTEX_LOCK_PI (since Linux 2.6.18)
494              This  operation is used after an attempt to acquire the lock via
495              an atomic user-mode instruction failed because  the  futex  word
496              has a nonzero value—specifically, because it contained the (PID-
497              namespace-specific) TID of the lock owner.
498
499              The operation checks the value of the futex word at the  address
500              uaddr.   If  the value is 0, then the kernel tries to atomically
501              set the futex value to the caller's TID.  If  the  futex  word's
502              value  is  nonzero, the kernel atomically sets the FUTEX_WAITERS
503              bit, which signals the futex owner that it cannot unlock the fu‐
504              tex  in  user  space atomically by setting the futex value to 0.
505              After that, the kernel:
506
507              1. Tries to find the thread which is associated with  the  owner
508                 TID.
509
510              2. Creates  or  reuses kernel state on behalf of the owner.  (If
511                 this is the first waiter, there is no kernel state  for  this
512                 futex, so kernel state is created by locking the RT-mutex and
513                 the futex owner is made the owner of the RT-mutex.  If  there
514                 are existing waiters, then the existing state is reused.)
515
516              3. Attaches  the  waiter  to  the futex (i.e., the waiter is en‐
517                 queued on the RT-mutex waiter list).
518
519              If more than one waiter exists, the enqueueing of the waiter  is
520              in  descending priority order.  (For information on priority or‐
521              dering, see the discussion of  the  SCHED_DEADLINE,  SCHED_FIFO,
522              and SCHED_RR scheduling policies in sched(7).)  The owner inher‐
523              its either the waiter's CPU bandwidth (if the waiter  is  sched‐
524              uled  under  the SCHED_DEADLINE policy) or the waiter's priority
525              (if the waiter is scheduled under  the  SCHED_RR  or  SCHED_FIFO
526              policy).  This inheritance follows the lock chain in the case of
527              nested locking and performs deadlock detection.
528
529              The timeout argument provides a timeout for  the  lock  attempt.
530              If  timeout is not NULL, the structure it points to specifies an
531              absolute timeout, measured against the CLOCK_REALTIME clock.  If
532              timeout is NULL, the operation will block indefinitely.
533
534              The uaddr2, val, and val3 arguments are ignored.
535
536       FUTEX_TRYLOCK_PI (since Linux 2.6.18)
537              This  operation  tries  to acquire the lock at uaddr.  It is in‐
538              voked when a user-space atomic acquire did not  succeed  because
539              the futex word was not 0.
540
541              Because  the  kernel  has  access to more state information than
542              user space, acquisition of the lock might succeed  if  performed
543              by the kernel in cases where the futex word (i.e., the state in‐
544              formation accessible to use-space)  contains  stale  state  (FU‐
545              TEX_WAITERS  and/or FUTEX_OWNER_DIED).  This can happen when the
546              owner of the futex died.  User space cannot handle  this  condi‐
547              tion  in  a race-free manner, but the kernel can fix this up and
548              acquire the futex.
549
550              The uaddr2, val, timeout, and val3 arguments are ignored.
551
552       FUTEX_UNLOCK_PI (since Linux 2.6.18)
553              This operation wakes the top priority waiter that is waiting  in
554              FUTEX_LOCK_PI  on  the futex address provided by the uaddr argu‐
555              ment.
556
557              This is called when the user-space  value  at  uaddr  cannot  be
558              changed atomically from a TID (of the owner) to 0.
559
560              The uaddr2, val, timeout, and val3 arguments are ignored.
561
562       FUTEX_CMP_REQUEUE_PI (since Linux 2.6.31)
563              This  operation  is a PI-aware variant of FUTEX_CMP_REQUEUE.  It
564              requeues waiters that are blocked via  FUTEX_WAIT_REQUEUE_PI  on
565              uaddr  from  a  non-PI source futex (uaddr) to a PI target futex
566              (uaddr2).
567
568              As with FUTEX_CMP_REQUEUE, this operation wakes up a maximum  of
569              val  waiters  that  are waiting on the futex at uaddr.  However,
570              for FUTEX_CMP_REQUEUE_PI, val is required to  be  1  (since  the
571              main  point is to avoid a thundering herd).  The remaining wait‐
572              ers are removed from the wait queue of the source futex at uaddr
573              and added to the wait queue of the target futex at uaddr2.
574
575              The  val2  and val3 arguments serve the same purposes as for FU‐
576              TEX_CMP_REQUEUE.
577
578       FUTEX_WAIT_REQUEUE_PI (since Linux 2.6.31)
579              Wait on a non-PI futex at uaddr and potentially be requeued (via
580              a  FUTEX_CMP_REQUEUE_PI operation in another task) onto a PI fu‐
581              tex at uaddr2.  The wait operation on uaddr is the same  as  for
582              FUTEX_WAIT.
583
584              The  waiter  can  be  removed from the wait on uaddr without re‐
585              queueing on uaddr2 via a FUTEX_WAKE operation in  another  task.
586              In this case, the FUTEX_WAIT_REQUEUE_PI operation fails with the
587              error EAGAIN.
588
589              If timeout is not NULL, the structure it points to specifies  an
590              absolute  timeout  for  the wait operation.  If timeout is NULL,
591              the operation can block indefinitely.
592
593              The val3 argument is ignored.
594
595              The FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI were added to
596              support a fairly specific use case: support for priority-inheri‐
597              tance-aware POSIX threads condition variables.  The idea is that
598              these  operations  should  always  be paired, in order to ensure
599              that user space and the kernel remain in sync.  Thus, in the FU‐
600              TEX_WAIT_REQUEUE_PI  operation,  the user-space application pre-
601              specifies the target of the requeue that takes place in the  FU‐
602              TEX_CMP_REQUEUE_PI operation.
603

RETURN VALUE

605       In  the  event  of  an error (and assuming that futex() was invoked via
606       syscall(2)), all operations return -1 and set  errno  to  indicate  the
607       cause of the error.
608
609       The  return  value on success depends on the operation, as described in
610       the following list:
611
612       FUTEX_WAIT
613              Returns 0 if the caller was woken up.  Note that a  wake-up  can
614              also  be caused by common futex usage patterns in unrelated code
615              that happened to have previously used the  futex  word's  memory
616              location  (e.g., typical futex-based implementations of Pthreads
617              mutexes can cause this under some conditions).  Therefore, call‐
618              ers should always conservatively assume that a return value of 0
619              can mean a spurious wake-up, and  use  the  futex  word's  value
620              (i.e.,  the user-space synchronization scheme) to decide whether
621              to continue to block or not.
622
623       FUTEX_WAKE
624              Returns the number of waiters that were woken up.
625
626       FUTEX_FD
627              Returns the new file descriptor associated with the futex.
628
629       FUTEX_REQUEUE
630              Returns the number of waiters that were woken up.
631
632       FUTEX_CMP_REQUEUE
633              Returns the total number of waiters that were woken  up  or  re‐
634              queued to the futex for the futex word at uaddr2.  If this value
635              is greater than val, then the difference is the number of  wait‐
636              ers requeued to the futex for the futex word at uaddr2.
637
638       FUTEX_WAKE_OP
639              Returns the total number of waiters that were woken up.  This is
640              the sum of the woken waiters on the two futexes  for  the  futex
641              words at uaddr and uaddr2.
642
643       FUTEX_WAIT_BITSET
644              Returns 0 if the caller was woken up.  See FUTEX_WAIT for how to
645              interpret this correctly in practice.
646
647       FUTEX_WAKE_BITSET
648              Returns the number of waiters that were woken up.
649
650       FUTEX_LOCK_PI
651              Returns 0 if the futex was successfully locked.
652
653       FUTEX_TRYLOCK_PI
654              Returns 0 if the futex was successfully locked.
655
656       FUTEX_UNLOCK_PI
657              Returns 0 if the futex was successfully unlocked.
658
659       FUTEX_CMP_REQUEUE_PI
660              Returns the total number of waiters that were woken  up  or  re‐
661              queued to the futex for the futex word at uaddr2.  If this value
662              is greater than val, then difference is the  number  of  waiters
663              requeued to the futex for the futex word at uaddr2.
664
665       FUTEX_WAIT_REQUEUE_PI
666              Returns  0  if the caller was successfully requeued to the futex
667              for the futex word at uaddr2.
668

ERRORS

670       EACCES No read access to the memory of a futex word.
671
672       EAGAIN (FUTEX_WAIT, FUTEX_WAIT_BITSET, FUTEX_WAIT_REQUEUE_PI) The value
673              pointed  to  by uaddr was not equal to the expected value val at
674              the time of the call.
675
676              Note: on Linux, the symbolic names EAGAIN and EWOULDBLOCK  (both
677              of  which  appear  in  different parts of the kernel futex code)
678              have the same value.
679
680       EAGAIN (FUTEX_CMP_REQUEUE, FUTEX_CMP_REQUEUE_PI) The value  pointed  to
681              by uaddr is not equal to the expected value val3.
682
683       EAGAIN (FUTEX_LOCK_PI,  FUTEX_TRYLOCK_PI, FUTEX_CMP_REQUEUE_PI) The fu‐
684              tex owner thread ID of uaddr (for FUTEX_CMP_REQUEUE_PI:  uaddr2)
685              is  about  to  exit,  but has not yet handled the internal state
686              cleanup.  Try again.
687
688       EDEADLK
689              (FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, FUTEX_CMP_REQUEUE_PI) The  fu‐
690              tex word at uaddr is already locked by the caller.
691
692       EDEADLK
693              (FUTEX_CMP_REQUEUE_PI) While requeueing a waiter to the PI futex
694              for the futex word at uaddr2, the kernel detected a deadlock.
695
696       EFAULT A required pointer argument (i.e., uaddr,  uaddr2,  or  timeout)
697              did not point to a valid user-space address.
698
699       EINTR  A FUTEX_WAIT or FUTEX_WAIT_BITSET operation was interrupted by a
700              signal (see signal(7)).  In kernels before  Linux  2.6.22,  this
701              error  could also be returned for a spurious wakeup; since Linux
702              2.6.22, this no longer happens.
703
704       EINVAL The operation in futex_op is one of those that employs  a  time‐
705              out,  but  the supplied timeout argument was invalid (tv_sec was
706              less than zero, or tv_nsec was not less than 1,000,000,000).
707
708       EINVAL The operation specified in futex_op employs one or both  of  the
709              pointers  uaddr and uaddr2, but one of these does not point to a
710              valid object—that is, the address is not four-byte-aligned.
711
712       EINVAL (FUTEX_WAIT_BITSET, FUTEX_WAKE_BITSET) The bit mask supplied  in
713              val3 is zero.
714
715       EINVAL (FUTEX_CMP_REQUEUE_PI) uaddr equals uaddr2 (i.e., an attempt was
716              made to requeue to the same futex).
717
718       EINVAL (FUTEX_FD) The signal number supplied in val is invalid.
719
720       EINVAL (FUTEX_WAKE,  FUTEX_WAKE_OP,  FUTEX_WAKE_BITSET,  FUTEX_REQUEUE,
721              FUTEX_CMP_REQUEUE)  The kernel detected an inconsistency between
722              the user-space state at uaddr and the kernel state—that  is,  it
723              detected a waiter which waits in FUTEX_LOCK_PI on uaddr.
724
725       EINVAL (FUTEX_LOCK_PI,  FUTEX_TRYLOCK_PI,  FUTEX_UNLOCK_PI)  The kernel
726              detected an inconsistency between the user-space state at  uaddr
727              and the kernel state.  This indicates either state corruption or
728              that the kernel found a waiter on uaddr which is waiting via FU‐
729              TEX_WAIT or FUTEX_WAIT_BITSET.
730
731       EINVAL (FUTEX_CMP_REQUEUE_PI)  The kernel detected an inconsistency be‐
732              tween the user-space state at uaddr2 and the kernel state;  that
733              is,  the  kernel detected a waiter which waits via FUTEX_WAIT or
734              FUTEX_WAIT_BITSET on uaddr2.
735
736       EINVAL (FUTEX_CMP_REQUEUE_PI) The kernel detected an inconsistency  be‐
737              tween  the  user-space state at uaddr and the kernel state; that
738              is, the kernel detected a waiter which waits via  FUTEX_WAIT  or
739              FUTEX_WAIT_BITESET on uaddr.
740
741       EINVAL (FUTEX_CMP_REQUEUE_PI)  The kernel detected an inconsistency be‐
742              tween the user-space state at uaddr and the kernel  state;  that
743              is,  the  kernel  detected a waiter which waits on uaddr via FU‐
744              TEX_LOCK_PI (instead of FUTEX_WAIT_REQUEUE_PI).
745
746       EINVAL (FUTEX_CMP_REQUEUE_PI) An attempt was made to requeue  a  waiter
747              to  a  futex  other  than  that  specified  by  the matching FU‐
748              TEX_WAIT_REQUEUE_PI call for that waiter.
749
750       EINVAL (FUTEX_CMP_REQUEUE_PI) The val argument is not 1.
751
752       EINVAL Invalid argument.
753
754       ENFILE (FUTEX_FD) The system-wide limit on the  total  number  of  open
755              files has been reached.
756
757       ENOMEM (FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, FUTEX_CMP_REQUEUE_PI) The ker‐
758              nel could not allocate memory to hold state information.
759
760       ENOSYS Invalid operation specified in futex_op.
761
762       ENOSYS The FUTEX_CLOCK_REALTIME option was specified in  futex_op,  but
763              the   accompanying   operation   was   neither  FUTEX_WAIT,  FU‐
764              TEX_WAIT_BITSET, nor FUTEX_WAIT_REQUEUE_PI.
765
766       ENOSYS (FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, FUTEX_UNLOCK_PI, FUTEX_CMP_RE‐
767              QUEUE_PI,  FUTEX_WAIT_REQUEUE_PI)  A  run-time  check determined
768              that the operation is not available.   The  PI-futex  operations
769              are  not  implemented on all architectures and are not supported
770              on some CPU variants.
771
772       EPERM  (FUTEX_LOCK_PI,  FUTEX_TRYLOCK_PI,   FUTEX_CMP_REQUEUE_PI)   The
773              caller  is  not  allowed  to attach itself to the futex at uaddr
774              (for FUTEX_CMP_REQUEUE_PI: the futex at uaddr2).  (This  may  be
775              caused by a state corruption in user space.)
776
777       EPERM  (FUTEX_UNLOCK_PI)  The  caller does not own the lock represented
778              by the futex word.
779
780       ESRCH  (FUTEX_LOCK_PI,  FUTEX_TRYLOCK_PI,   FUTEX_CMP_REQUEUE_PI)   The
781              thread ID in the futex word at uaddr does not exist.
782
783       ESRCH  (FUTEX_CMP_REQUEUE_PI) The thread ID in the futex word at uaddr2
784              does not exist.
785
786       ETIMEDOUT
787              The operation in futex_op  employed  the  timeout  specified  in
788              timeout, and the timeout expired before the operation completed.
789

VERSIONS

791       Futexes were first made available in a stable kernel release with Linux
792       2.6.0.
793
794       Initial futex support was merged in Linux 2.5.7 but with different  se‐
795       mantics  from  what  was  described above.  A four-argument system call
796       with the semantics described in  this  page  was  introduced  in  Linux
797       2.5.40.   A fifth argument was added in Linux 2.5.70, and a sixth argu‐
798       ment was added in Linux 2.6.7.
799

CONFORMING TO

801       This system call is Linux-specific.
802

NOTES

804       Glibc does not provide a wrapper for this system call;  call  it  using
805       syscall(2).
806
807       Several  higher-level  programming abstractions are implemented via fu‐
808       texes, including POSIX semaphores and various  POSIX  threads  synchro‐
809       nization  mechanisms  (mutexes,  condition variables, read-write locks,
810       and barriers).
811

EXAMPLES

813       The program below demonstrates use of futexes in a program where a par‐
814       ent  process and a child process use a pair of futexes located inside a
815       shared anonymous mapping to synchronize access to  a  shared  resource:
816       the  terminal.  The two processes each write nloops (a command-line ar‐
817       gument that defaults to 5 if omitted) messages to the terminal and  em‐
818       ploy  a  synchronization  protocol  that ensures that they alternate in
819       writing messages.  Upon running this program we see output such as  the
820       following:
821
822           $ ./futex_demo
823           Parent (18534) 0
824           Child  (18535) 0
825           Parent (18534) 1
826           Child  (18535) 1
827           Parent (18534) 2
828           Child  (18535) 2
829           Parent (18534) 3
830           Child  (18535) 3
831           Parent (18534) 4
832           Child  (18535) 4
833
834   Program source
835
836       /* futex_demo.c
837
838          Usage: futex_demo [nloops]
839                           (Default: 5)
840
841          Demonstrate the use of futexes in a program where parent and child
842          use a pair of futexes located inside a shared anonymous mapping to
843          synchronize access to a shared resource: the terminal. The two
844          processes each write 'num-loops' messages to the terminal and employ
845          a synchronization protocol that ensures that they alternate in
846          writing messages.
847       */
848       #define _GNU_SOURCE
849       #include <stdio.h>
850       #include <errno.h>
851       #include <stdatomic.h>
852       #include <stdint.h>
853       #include <stdlib.h>
854       #include <unistd.h>
855       #include <sys/wait.h>
856       #include <sys/mman.h>
857       #include <sys/syscall.h>
858       #include <linux/futex.h>
859       #include <sys/time.h>
860
861       #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
862                               } while (0)
863
864       static uint32_t *futex1, *futex2, *iaddr;
865
866       static int
867       futex(uint32_t *uaddr, int futex_op, uint32_t val,
868             const struct timespec *timeout, uint32_t *uaddr2, uint32_t val3)
869       {
870           return syscall(SYS_futex, uaddr, futex_op, val,
871                          timeout, uaddr2, val3);
872       }
873
874       /* Acquire the futex pointed to by 'futexp': wait for its value to
875          become 1, and then set the value to 0. */
876
877       static void
878       fwait(uint32_t *futexp)
879       {
880           long s;
881
882           /* atomic_compare_exchange_strong(ptr, oldval, newval)
883              atomically performs the equivalent of:
884
885                  if (*ptr == *oldval)
886                      *ptr = newval;
887
888              It returns true if the test yielded true and *ptr was updated. */
889
890           while (1) {
891
892               /* Is the futex available? */
893               const uint32_t one = 1;
894               if (atomic_compare_exchange_strong(futexp, &one, 0))
895                   break;      /* Yes */
896
897               /* Futex is not available; wait */
898
899               s = futex(futexp, FUTEX_WAIT, 0, NULL, NULL, 0);
900               if (s == -1 && errno != EAGAIN)
901                   errExit("futex-FUTEX_WAIT");
902           }
903       }
904
905       /* Release the futex pointed to by 'futexp': if the futex currently
906          has the value 0, set its value to 1 and the wake any futex waiters,
907          so that if the peer is blocked in fwait(), it can proceed. */
908
909       static void
910       fpost(uint32_t *futexp)
911       {
912           long s;
913
914           /* atomic_compare_exchange_strong() was described
915              in comments above */
916
917           const uint32_t zero = 0;
918           if (atomic_compare_exchange_strong(futexp, &zero, 1)) {
919               s = futex(futexp, FUTEX_WAKE, 1, NULL, NULL, 0);
920               if (s  == -1)
921                   errExit("futex-FUTEX_WAKE");
922           }
923       }
924
925       int
926       main(int argc, char *argv[])
927       {
928           pid_t childPid;
929           int nloops;
930
931           setbuf(stdout, NULL);
932
933           nloops = (argc > 1) ? atoi(argv[1]) : 5;
934
935           /* Create a shared anonymous mapping that will hold the futexes.
936              Since the futexes are being shared between processes, we
937              subsequently use the "shared" futex operations (i.e., not the
938              ones suffixed "_PRIVATE") */
939
940           iaddr = mmap(NULL, sizeof(*iaddr) * 2, PROT_READ | PROT_WRITE,
941                       MAP_ANONYMOUS | MAP_SHARED, -1, 0);
942           if (iaddr == MAP_FAILED)
943               errExit("mmap");
944
945           futex1 = &iaddr[0];
946           futex2 = &iaddr[1];
947
948           *futex1 = 0;        /* State: unavailable */
949           *futex2 = 1;        /* State: available */
950
951           /* Create a child process that inherits the shared anonymous
952              mapping */
953
954           childPid = fork();
955           if (childPid == -1)
956               errExit("fork");
957
958           if (childPid == 0) {        /* Child */
959               for (int j = 0; j < nloops; j++) {
960                   fwait(futex1);
961                   printf("Child  (%jd) %d\n", (intmax_t) getpid(), j);
962                   fpost(futex2);
963               }
964
965               exit(EXIT_SUCCESS);
966           }
967
968           /* Parent falls through to here */
969
970           for (int j = 0; j < nloops; j++) {
971               fwait(futex2);
972               printf("Parent (%jd) %d\n", (intmax_t) getpid(), j);
973               fpost(futex1);
974           }
975
976           wait(NULL);
977
978           exit(EXIT_SUCCESS);
979       }
980

SEE ALSO

982       get_robust_list(2), restart_syscall(2), pthread_mutexattr_getproto‐
983       col(3), futex(7), sched(7)
984
985       The following kernel source files:
986
987       * Documentation/pi-futex.txt
988
989       * Documentation/futex-requeue-pi.txt
990
991       * Documentation/locking/rt-mutex.txt
992
993       * Documentation/locking/rt-mutex-design.txt
994
995       * Documentation/robust-futex-ABI.txt
996
997       Franke, H., Russell, R., and Kirwood, M., 2002.  Fuss, Futexes and Fur‐
998       wocks: Fast Userlevel Locking in Linux (from proceedings of the Ottawa
999       Linux Symposium 2002),
1000http://kernel.org/doc/ols/2002/ols2002-pages-479-495.pdf
1001
1002       Hart, D., 2009. A futex overview and update,
1003http://lwn.net/Articles/360699/
1004
1005       Hart, D. and Guniguntala, D., 2009.  Requeue-PI: Making Glibc Condvars
1006       PI-Aware (from proceedings of the 2009 Real-Time Linux Workshop),
1007http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
1008
1009       Drepper, U., 2011. Futexes Are Tricky,
1010http://www.akkadia.org/drepper/futex.pdf
1011
1012       Futex example library, futex-*.tar.bz2 at
1013       ⟨ftp://ftp.kernel.org/pub/linux/kernel/people/rusty/⟩
1014

COLOPHON

1016       This page is part of release 5.10 of the Linux man-pages project.  A
1017       description of the project, information about reporting bugs, and the
1018       latest version of this page, can be found at
1019       https://www.kernel.org/doc/man-pages/.
1020
1021
1022
1023Linux                             2020-11-01                          FUTEX(2)
Impressum