1FUTEX(2) Linux Programmer's Manual FUTEX(2)
2
3
4
6 futex - fast user-space locking
7
9 #include <linux/futex.h>
10 #include <stdint.h>
11 #include <sys/time.h>
12
13 long futex(uint32_t *uaddr, int futex_op, uint32_t val,
14 const struct timespec *timeout, /* or: uint32_t val2 */
15 uint32_t *uaddr2, uint32_t val3);
16
17 Note: There is no glibc wrapper for this system call; see NOTES.
18
20 The futex() system call provides a method for waiting until a certain
21 condition becomes true. It is typically used as a blocking construct
22 in the context of shared-memory synchronization. When using futexes,
23 the majority of the synchronization operations are performed in user
24 space. A user-space program employs the futex() system call only when
25 it is likely that the program has to block for a longer time until the
26 condition becomes true. Other futex() operations can be used to wake
27 any processes or threads waiting for a particular condition.
28
29 A futex is a 32-bit value—referred to below as a futex word—whose ad‐
30 dress is supplied to the futex() system call. (Futexes are 32 bits in
31 size on all platforms, including 64-bit systems.) All futex operations
32 are governed by this value. In order to share a futex between pro‐
33 cesses, the futex is placed in a region of shared memory, created using
34 (for example) mmap(2) or shmat(2). (Thus, the futex word may have dif‐
35 ferent virtual addresses in different processes, but these addresses
36 all refer to the same location in physical memory.) In a multithreaded
37 program, it is sufficient to place the futex word in a global variable
38 shared by all threads.
39
40 When executing a futex operation that requests to block a thread, the
41 kernel will block only if the futex word has the value that the calling
42 thread supplied (as one of the arguments of the futex() call) as the
43 expected value of the futex word. The loading of the futex word's
44 value, the comparison of that value with the expected value, and the
45 actual blocking will happen atomically and will be totally ordered with
46 respect to concurrent operations performed by other threads on the same
47 futex word. Thus, the futex word is used to connect the synchroniza‐
48 tion in user space with the implementation of blocking by the kernel.
49 Analogously to an atomic compare-and-exchange operation that poten‐
50 tially changes shared memory, blocking via a futex is an atomic com‐
51 pare-and-block operation.
52
53 One use of futexes is for implementing locks. The state of the lock
54 (i.e., acquired or not acquired) can be represented as an atomically
55 accessed flag in shared memory. In the uncontended case, a thread can
56 access or modify the lock state with atomic instructions, for example
57 atomically changing it from not acquired to acquired using an atomic
58 compare-and-exchange instruction. (Such instructions are performed en‐
59 tirely in user mode, and the kernel maintains no information about the
60 lock state.) On the other hand, a thread may be unable to acquire a
61 lock because it is already acquired by another thread. It then may
62 pass the lock's flag as a futex word and the value representing the ac‐
63 quired state as the expected value to a futex() wait operation. This
64 futex() operation will block if and only if the lock is still acquired
65 (i.e., the value in the futex word still matches the "acquired state").
66 When releasing the lock, a thread has to first reset the lock state to
67 not acquired and then execute a futex operation that wakes threads
68 blocked on the lock flag used as a futex word (this can be further op‐
69 timized to avoid unnecessary wake-ups). See futex(7) for more detail
70 on how to use futexes.
71
72 Besides the basic wait and wake-up futex functionality, there are fur‐
73 ther futex operations aimed at supporting more complex use cases.
74
75 Note that no explicit initialization or destruction is necessary to use
76 futexes; the kernel maintains a futex (i.e., the kernel-internal imple‐
77 mentation artifact) only while operations such as FUTEX_WAIT, described
78 below, are being performed on a particular futex word.
79
80 Arguments
81 The uaddr argument points to the futex word. On all platforms, futexes
82 are four-byte integers that must be aligned on a four-byte boundary.
83 The operation to perform on the futex is specified in the futex_op ar‐
84 gument; val is a value whose meaning and purpose depends on futex_op.
85
86 The remaining arguments (timeout, uaddr2, and val3) are required only
87 for certain of the futex operations described below. Where one of
88 these arguments is not required, it is ignored.
89
90 For several blocking operations, the timeout argument is a pointer to a
91 timespec structure that specifies a timeout for the operation. How‐
92 ever, notwithstanding the prototype shown above, for some operations,
93 the least significant four bytes of this argument are instead used as
94 an integer whose meaning is determined by the operation. For these op‐
95 erations, the kernel casts the timeout value first to unsigned long,
96 then to uint32_t, and in the remainder of this page, this argument is
97 referred to as val2 when interpreted in this fashion.
98
99 Where it is required, the uaddr2 argument is a pointer to a second fu‐
100 tex word that is employed by the operation.
101
102 The interpretation of the final integer argument, val3, depends on the
103 operation.
104
105 Futex operations
106 The futex_op argument consists of two parts: a command that specifies
107 the operation to be performed, bitwise ORed with zero or more options
108 that modify the behaviour of the operation. The options that may be
109 included in futex_op are as follows:
110
111 FUTEX_PRIVATE_FLAG (since Linux 2.6.22)
112 This option bit can be employed with all futex operations. It
113 tells the kernel that the futex is process-private and not
114 shared with another process (i.e., it is being used for synchro‐
115 nization only between threads of the same process). This allows
116 the kernel to make some additional performance optimizations.
117
118 As a convenience, <linux/futex.h> defines a set of constants
119 with the suffix _PRIVATE that are equivalents of all of the op‐
120 erations listed below, but with the FUTEX_PRIVATE_FLAG ORed into
121 the constant value. Thus, there are FUTEX_WAIT_PRIVATE, FU‐
122 TEX_WAKE_PRIVATE, and so on.
123
124 FUTEX_CLOCK_REALTIME (since Linux 2.6.28)
125 This option bit can be employed only with the FUTEX_WAIT_BITSET,
126 FUTEX_WAIT_REQUEUE_PI, and (since Linux 4.5) FUTEX_WAIT opera‐
127 tions.
128
129 If this option is set, the kernel measures the timeout against
130 the CLOCK_REALTIME clock.
131
132 If this option is not set, the kernel measures the timeout
133 against the CLOCK_MONOTONIC clock.
134
135 The operation specified in futex_op is one of the following:
136
137 FUTEX_WAIT (since Linux 2.6.0)
138 This operation tests that the value at the futex word pointed to
139 by the address uaddr still contains the expected value val, and
140 if so, then sleeps waiting for a FUTEX_WAKE operation on the fu‐
141 tex word. The load of the value of the futex word is an atomic
142 memory access (i.e., using atomic machine instructions of the
143 respective architecture). This load, the comparison with the
144 expected value, and starting to sleep are performed atomically
145 and totally ordered with respect to other futex operations on
146 the same futex word. If the thread starts to sleep, it is con‐
147 sidered a waiter on this futex word. If the futex value does
148 not match val, then the call fails immediately with the error
149 EAGAIN.
150
151 The purpose of the comparison with the expected value is to pre‐
152 vent lost wake-ups. If another thread changed the value of the
153 futex word after the calling thread decided to block based on
154 the prior value, and if the other thread executed a FUTEX_WAKE
155 operation (or similar wake-up) after the value change and before
156 this FUTEX_WAIT operation, then the calling thread will observe
157 the value change and will not start to sleep.
158
159 If the timeout is not NULL, the structure it points to specifies
160 a timeout for the wait. (This interval will be rounded up to
161 the system clock granularity, and is guaranteed not to expire
162 early.) The timeout is by default measured according to the
163 CLOCK_MONOTONIC clock, but, since Linux 4.5, the CLOCK_REALTIME
164 clock can be selected by specifying FUTEX_CLOCK_REALTIME in fu‐
165 tex_op. If timeout is NULL, the call blocks indefinitely.
166
167 Note: for FUTEX_WAIT, timeout is interpreted as a relative
168 value. This differs from other futex operations, where timeout
169 is interpreted as an absolute value. To obtain the equivalent
170 of FUTEX_WAIT with an absolute timeout, employ FUTEX_WAIT_BITSET
171 with val3 specified as FUTEX_BITSET_MATCH_ANY.
172
173 The arguments uaddr2 and val3 are ignored.
174
175 FUTEX_WAKE (since Linux 2.6.0)
176 This operation wakes at most val of the waiters that are waiting
177 (e.g., inside FUTEX_WAIT) on the futex word at the address
178 uaddr. Most commonly, val is specified as either 1 (wake up a
179 single waiter) or INT_MAX (wake up all waiters). No guarantee
180 is provided about which waiters are awoken (e.g., a waiter with
181 a higher scheduling priority is not guaranteed to be awoken in
182 preference to a waiter with a lower priority).
183
184 The arguments timeout, uaddr2, and val3 are ignored.
185
186 FUTEX_FD (from Linux 2.6.0 up to and including Linux 2.6.25)
187 This operation creates a file descriptor that is associated with
188 the futex at uaddr. The caller must close the returned file de‐
189 scriptor after use. When another process or thread performs a
190 FUTEX_WAKE on the futex word, the file descriptor indicates as
191 being readable with select(2), poll(2), and epoll(7)
192
193 The file descriptor can be used to obtain asynchronous notifica‐
194 tions: if val is nonzero, then, when another process or thread
195 executes a FUTEX_WAKE, the caller will receive the signal number
196 that was passed in val.
197
198 The arguments timeout, uaddr2, and val3 are ignored.
199
200 Because it was inherently racy, FUTEX_FD has been removed from
201 Linux 2.6.26 onward.
202
203 FUTEX_REQUEUE (since Linux 2.6.0)
204 This operation performs the same task as FUTEX_CMP_REQUEUE (see
205 below), except that no check is made using the value in val3.
206 (The argument val3 is ignored.)
207
208 FUTEX_CMP_REQUEUE (since Linux 2.6.7)
209 This operation first checks whether the location uaddr still
210 contains the value val3. If not, the operation fails with the
211 error EAGAIN. Otherwise, the operation wakes up a maximum of
212 val waiters that are waiting on the futex at uaddr. If there
213 are more than val waiters, then the remaining waiters are re‐
214 moved from the wait queue of the source futex at uaddr and added
215 to the wait queue of the target futex at uaddr2. The val2 argu‐
216 ment specifies an upper limit on the number of waiters that are
217 requeued to the futex at uaddr2.
218
219 The load from uaddr is an atomic memory access (i.e., using
220 atomic machine instructions of the respective architecture).
221 This load, the comparison with val3, and the requeueing of any
222 waiters are performed atomically and totally ordered with re‐
223 spect to other operations on the same futex word.
224
225 Typical values to specify for val are 0 or 1. (Specifying
226 INT_MAX is not useful, because it would make the FUTEX_CMP_RE‐
227 QUEUE operation equivalent to FUTEX_WAKE.) The limit value
228 specified via val2 is typically either 1 or INT_MAX. (Specify‐
229 ing the argument as 0 is not useful, because it would make the
230 FUTEX_CMP_REQUEUE operation equivalent to FUTEX_WAIT.)
231
232 The FUTEX_CMP_REQUEUE operation was added as a replacement for
233 the earlier FUTEX_REQUEUE. The difference is that the check of
234 the value at uaddr can be used to ensure that requeueing happens
235 only under certain conditions, which allows race conditions to
236 be avoided in certain use cases.
237
238 Both FUTEX_REQUEUE and FUTEX_CMP_REQUEUE can be used to avoid
239 "thundering herd" wake-ups that could occur when using FU‐
240 TEX_WAKE in cases where all of the waiters that are woken need
241 to acquire another futex. Consider the following scenario,
242 where multiple waiter threads are waiting on B, a wait queue im‐
243 plemented using a futex:
244
245 lock(A)
246 while (!check_value(V)) {
247 unlock(A);
248 block_on(B);
249 lock(A);
250 };
251 unlock(A);
252
253 If a waker thread used FUTEX_WAKE, then all waiters waiting on B
254 would be woken up, and they would all try to acquire lock A.
255 However, waking all of the threads in this manner would be
256 pointless because all except one of the threads would immedi‐
257 ately block on lock A again. By contrast, a requeue operation
258 wakes just one waiter and moves the other waiters to lock A, and
259 when the woken waiter unlocks A then the next waiter can pro‐
260 ceed.
261
262 FUTEX_WAKE_OP (since Linux 2.6.14)
263 This operation was added to support some user-space use cases
264 where more than one futex must be handled at the same time. The
265 most notable example is the implementation of pthread_cond_sig‐
266 nal(3), which requires operations on two futexes, the one used
267 to implement the mutex and the one used in the implementation of
268 the wait queue associated with the condition variable. FU‐
269 TEX_WAKE_OP allows such cases to be implemented without leading
270 to high rates of contention and context switching.
271
272 The FUTEX_WAKE_OP operation is equivalent to executing the fol‐
273 lowing code atomically and totally ordered with respect to other
274 futex operations on any of the two supplied futex words:
275
276 uint32_t oldval = *(uint32_t *) uaddr2;
277 *(uint32_t *) uaddr2 = oldval op oparg;
278 futex(uaddr, FUTEX_WAKE, val, 0, 0, 0);
279 if (oldval cmp cmparg)
280 futex(uaddr2, FUTEX_WAKE, val2, 0, 0, 0);
281
282 In other words, FUTEX_WAKE_OP does the following:
283
284 * saves the original value of the futex word at uaddr2 and per‐
285 forms an operation to modify the value of the futex at
286 uaddr2; this is an atomic read-modify-write memory access
287 (i.e., using atomic machine instructions of the respective
288 architecture)
289
290 * wakes up a maximum of val waiters on the futex for the futex
291 word at uaddr; and
292
293 * dependent on the results of a test of the original value of
294 the futex word at uaddr2, wakes up a maximum of val2 waiters
295 on the futex for the futex word at uaddr2.
296
297 The operation and comparison that are to be performed are en‐
298 coded in the bits of the argument val3. Pictorially, the encod‐
299 ing is:
300
301 +---+---+-----------+-----------+
302 |op |cmp| oparg | cmparg |
303 +---+---+-----------+-----------+
304 4 4 12 12 <== # of bits
305
306 Expressed in code, the encoding is:
307
308 #define FUTEX_OP(op, oparg, cmp, cmparg) \
309 (((op & 0xf) << 28) | \
310 ((cmp & 0xf) << 24) | \
311 ((oparg & 0xfff) << 12) | \
312 (cmparg & 0xfff))
313
314 In the above, op and cmp are each one of the codes listed below.
315 The oparg and cmparg components are literal numeric values, ex‐
316 cept as noted below.
317
318 The op component has one of the following values:
319
320 FUTEX_OP_SET 0 /* uaddr2 = oparg; */
321 FUTEX_OP_ADD 1 /* uaddr2 += oparg; */
322 FUTEX_OP_OR 2 /* uaddr2 |= oparg; */
323 FUTEX_OP_ANDN 3 /* uaddr2 &= ~oparg; */
324 FUTEX_OP_XOR 4 /* uaddr2 ^= oparg; */
325
326 In addition, bitwise ORing the following value into op causes
327 (1 << oparg) to be used as the operand:
328
329 FUTEX_OP_ARG_SHIFT 8 /* Use (1 << oparg) as operand */
330
331 The cmp field is one of the following:
332
333 FUTEX_OP_CMP_EQ 0 /* if (oldval == cmparg) wake */
334 FUTEX_OP_CMP_NE 1 /* if (oldval != cmparg) wake */
335 FUTEX_OP_CMP_LT 2 /* if (oldval < cmparg) wake */
336 FUTEX_OP_CMP_LE 3 /* if (oldval <= cmparg) wake */
337 FUTEX_OP_CMP_GT 4 /* if (oldval > cmparg) wake */
338 FUTEX_OP_CMP_GE 5 /* if (oldval >= cmparg) wake */
339
340 The return value of FUTEX_WAKE_OP is the sum of the number of
341 waiters woken on the futex uaddr plus the number of waiters wo‐
342 ken on the futex uaddr2.
343
344 FUTEX_WAIT_BITSET (since Linux 2.6.25)
345 This operation is like FUTEX_WAIT except that val3 is used to
346 provide a 32-bit bit mask to the kernel. This bit mask, in
347 which at least one bit must be set, is stored in the kernel-in‐
348 ternal state of the waiter. See the description of FU‐
349 TEX_WAKE_BITSET for further details.
350
351 If timeout is not NULL, the structure it points to specifies an
352 absolute timeout for the wait operation. If timeout is NULL,
353 the operation can block indefinitely.
354
355 The uaddr2 argument is ignored.
356
357 FUTEX_WAKE_BITSET (since Linux 2.6.25)
358 This operation is the same as FUTEX_WAKE except that the val3
359 argument is used to provide a 32-bit bit mask to the kernel.
360 This bit mask, in which at least one bit must be set, is used to
361 select which waiters should be woken up. The selection is done
362 by a bitwise AND of the "wake" bit mask (i.e., the value in
363 val3) and the bit mask which is stored in the kernel-internal
364 state of the waiter (the "wait" bit mask that is set using FU‐
365 TEX_WAIT_BITSET). All of the waiters for which the result of
366 the AND is nonzero are woken up; the remaining waiters are left
367 sleeping.
368
369 The effect of FUTEX_WAIT_BITSET and FUTEX_WAKE_BITSET is to al‐
370 low selective wake-ups among multiple waiters that are blocked
371 on the same futex. However, note that, depending on the use
372 case, employing this bit-mask multiplexing feature on a futex
373 can be less efficient than simply using multiple futexes, be‐
374 cause employing bit-mask multiplexing requires the kernel to
375 check all waiters on a futex, including those that are not in‐
376 terested in being woken up (i.e., they do not have the relevant
377 bit set in their "wait" bit mask).
378
379 The constant FUTEX_BITSET_MATCH_ANY, which corresponds to all 32
380 bits set in the bit mask, can be used as the val3 argument for
381 FUTEX_WAIT_BITSET and FUTEX_WAKE_BITSET. Other than differences
382 in the handling of the timeout argument, the FUTEX_WAIT opera‐
383 tion is equivalent to FUTEX_WAIT_BITSET with val3 specified as
384 FUTEX_BITSET_MATCH_ANY; that is, allow a wake-up by any waker.
385 The FUTEX_WAKE operation is equivalent to FUTEX_WAKE_BITSET with
386 val3 specified as FUTEX_BITSET_MATCH_ANY; that is, wake up any
387 waiter(s).
388
389 The uaddr2 and timeout arguments are ignored.
390
391 Priority-inheritance futexes
392 Linux supports priority-inheritance (PI) futexes in order to handle
393 priority-inversion problems that can be encountered with normal futex
394 locks. Priority inversion is the problem that occurs when a high-pri‐
395 ority task is blocked waiting to acquire a lock held by a low-priority
396 task, while tasks at an intermediate priority continuously preempt the
397 low-priority task from the CPU. Consequently, the low-priority task
398 makes no progress toward releasing the lock, and the high-priority task
399 remains blocked.
400
401 Priority inheritance is a mechanism for dealing with the priority-in‐
402 version problem. With this mechanism, when a high-priority task be‐
403 comes blocked by a lock held by a low-priority task, the priority of
404 the low-priority task is temporarily raised to that of the high-prior‐
405 ity task, so that it is not preempted by any intermediate level tasks,
406 and can thus make progress toward releasing the lock. To be effective,
407 priority inheritance must be transitive, meaning that if a high-prior‐
408 ity task blocks on a lock held by a lower-priority task that is itself
409 blocked by a lock held by another intermediate-priority task (and so
410 on, for chains of arbitrary length), then both of those tasks (or more
411 generally, all of the tasks in a lock chain) have their priorities
412 raised to be the same as the high-priority task.
413
414 From a user-space perspective, what makes a futex PI-aware is a policy
415 agreement (described below) between user space and the kernel about the
416 value of the futex word, coupled with the use of the PI-futex opera‐
417 tions described below. (Unlike the other futex operations described
418 above, the PI-futex operations are designed for the implementation of
419 very specific IPC mechanisms.)
420
421 The PI-futex operations described below differ from the other futex op‐
422 erations in that they impose policy on the use of the value of the fu‐
423 tex word:
424
425 * If the lock is not acquired, the futex word's value shall be 0.
426
427 * If the lock is acquired, the futex word's value shall be the thread
428 ID (TID; see gettid(2)) of the owning thread.
429
430 * If the lock is owned and there are threads contending for the lock,
431 then the FUTEX_WAITERS bit shall be set in the futex word's value;
432 in other words, this value is:
433
434 FUTEX_WAITERS | TID
435
436 (Note that is invalid for a PI futex word to have no owner and FU‐
437 TEX_WAITERS set.)
438
439 With this policy in place, a user-space application can acquire an un‐
440 acquired lock or release a lock using atomic instructions executed in
441 user mode (e.g., a compare-and-swap operation such as cmpxchg on the
442 x86 architecture). Acquiring a lock simply consists of using compare-
443 and-swap to atomically set the futex word's value to the caller's TID
444 if its previous value was 0. Releasing a lock requires using compare-
445 and-swap to set the futex word's value to 0 if the previous value was
446 the expected TID.
447
448 If a futex is already acquired (i.e., has a nonzero value), waiters
449 must employ the FUTEX_LOCK_PI operation to acquire the lock. If other
450 threads are waiting for the lock, then the FUTEX_WAITERS bit is set in
451 the futex value; in this case, the lock owner must employ the FUTEX_UN‐
452 LOCK_PI operation to release the lock.
453
454 In the cases where callers are forced into the kernel (i.e., required
455 to perform a futex() call), they then deal directly with a so-called
456 RT-mutex, a kernel locking mechanism which implements the required pri‐
457 ority-inheritance semantics. After the RT-mutex is acquired, the futex
458 value is updated accordingly, before the calling thread returns to user
459 space.
460
461 It is important to note that the kernel will update the futex word's
462 value prior to returning to user space. (This prevents the possibility
463 of the futex word's value ending up in an invalid state, such as having
464 an owner but the value being 0, or having waiters but not having the
465 FUTEX_WAITERS bit set.)
466
467 If a futex has an associated RT-mutex in the kernel (i.e., there are
468 blocked waiters) and the owner of the futex/RT-mutex dies unexpectedly,
469 then the kernel cleans up the RT-mutex and hands it over to the next
470 waiter. This in turn requires that the user-space value is updated ac‐
471 cordingly. To indicate that this is required, the kernel sets the FU‐
472 TEX_OWNER_DIED bit in the futex word along with the thread ID of the
473 new owner. User space can detect this situation via the presence of
474 the FUTEX_OWNER_DIED bit and is then responsible for cleaning up the
475 stale state left over by the dead owner.
476
477 PI futexes are operated on by specifying one of the values listed below
478 in futex_op. Note that the PI futex operations must be used as paired
479 operations and are subject to some additional requirements:
480
481 * FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI pair with FUTEX_UNLOCK_PI. FU‐
482 TEX_UNLOCK_PI must be called only on a futex owned by the calling
483 thread, as defined by the value policy, otherwise the error EPERM
484 results.
485
486 * FUTEX_WAIT_REQUEUE_PI pairs with FUTEX_CMP_REQUEUE_PI. This must be
487 performed from a non-PI futex to a distinct PI futex (or the error
488 EINVAL results). Additionally, val (the number of waiters to be wo‐
489 ken) must be 1 (or the error EINVAL results).
490
491 The PI futex operations are as follows:
492
493 FUTEX_LOCK_PI (since Linux 2.6.18)
494 This operation is used after an attempt to acquire the lock via
495 an atomic user-mode instruction failed because the futex word
496 has a nonzero value—specifically, because it contained the (PID-
497 namespace-specific) TID of the lock owner.
498
499 The operation checks the value of the futex word at the address
500 uaddr. If the value is 0, then the kernel tries to atomically
501 set the futex value to the caller's TID. If the futex word's
502 value is nonzero, the kernel atomically sets the FUTEX_WAITERS
503 bit, which signals the futex owner that it cannot unlock the fu‐
504 tex in user space atomically by setting the futex value to 0.
505 After that, the kernel:
506
507 1. Tries to find the thread which is associated with the owner
508 TID.
509
510 2. Creates or reuses kernel state on behalf of the owner. (If
511 this is the first waiter, there is no kernel state for this
512 futex, so kernel state is created by locking the RT-mutex and
513 the futex owner is made the owner of the RT-mutex. If there
514 are existing waiters, then the existing state is reused.)
515
516 3. Attaches the waiter to the futex (i.e., the waiter is en‐
517 queued on the RT-mutex waiter list).
518
519 If more than one waiter exists, the enqueueing of the waiter is
520 in descending priority order. (For information on priority or‐
521 dering, see the discussion of the SCHED_DEADLINE, SCHED_FIFO,
522 and SCHED_RR scheduling policies in sched(7).) The owner inher‐
523 its either the waiter's CPU bandwidth (if the waiter is sched‐
524 uled under the SCHED_DEADLINE policy) or the waiter's priority
525 (if the waiter is scheduled under the SCHED_RR or SCHED_FIFO
526 policy). This inheritance follows the lock chain in the case of
527 nested locking and performs deadlock detection.
528
529 The timeout argument provides a timeout for the lock attempt.
530 If timeout is not NULL, the structure it points to specifies an
531 absolute timeout, measured against the CLOCK_REALTIME clock. If
532 timeout is NULL, the operation will block indefinitely.
533
534 The uaddr2, val, and val3 arguments are ignored.
535
536 FUTEX_TRYLOCK_PI (since Linux 2.6.18)
537 This operation tries to acquire the lock at uaddr. It is in‐
538 voked when a user-space atomic acquire did not succeed because
539 the futex word was not 0.
540
541 Because the kernel has access to more state information than
542 user space, acquisition of the lock might succeed if performed
543 by the kernel in cases where the futex word (i.e., the state in‐
544 formation accessible to use-space) contains stale state (FU‐
545 TEX_WAITERS and/or FUTEX_OWNER_DIED). This can happen when the
546 owner of the futex died. User space cannot handle this condi‐
547 tion in a race-free manner, but the kernel can fix this up and
548 acquire the futex.
549
550 The uaddr2, val, timeout, and val3 arguments are ignored.
551
552 FUTEX_UNLOCK_PI (since Linux 2.6.18)
553 This operation wakes the top priority waiter that is waiting in
554 FUTEX_LOCK_PI on the futex address provided by the uaddr argu‐
555 ment.
556
557 This is called when the user-space value at uaddr cannot be
558 changed atomically from a TID (of the owner) to 0.
559
560 The uaddr2, val, timeout, and val3 arguments are ignored.
561
562 FUTEX_CMP_REQUEUE_PI (since Linux 2.6.31)
563 This operation is a PI-aware variant of FUTEX_CMP_REQUEUE. It
564 requeues waiters that are blocked via FUTEX_WAIT_REQUEUE_PI on
565 uaddr from a non-PI source futex (uaddr) to a PI target futex
566 (uaddr2).
567
568 As with FUTEX_CMP_REQUEUE, this operation wakes up a maximum of
569 val waiters that are waiting on the futex at uaddr. However,
570 for FUTEX_CMP_REQUEUE_PI, val is required to be 1 (since the
571 main point is to avoid a thundering herd). The remaining wait‐
572 ers are removed from the wait queue of the source futex at uaddr
573 and added to the wait queue of the target futex at uaddr2.
574
575 The val2 and val3 arguments serve the same purposes as for FU‐
576 TEX_CMP_REQUEUE.
577
578 FUTEX_WAIT_REQUEUE_PI (since Linux 2.6.31)
579 Wait on a non-PI futex at uaddr and potentially be requeued (via
580 a FUTEX_CMP_REQUEUE_PI operation in another task) onto a PI fu‐
581 tex at uaddr2. The wait operation on uaddr is the same as for
582 FUTEX_WAIT.
583
584 The waiter can be removed from the wait on uaddr without re‐
585 queueing on uaddr2 via a FUTEX_WAKE operation in another task.
586 In this case, the FUTEX_WAIT_REQUEUE_PI operation fails with the
587 error EAGAIN.
588
589 If timeout is not NULL, the structure it points to specifies an
590 absolute timeout for the wait operation. If timeout is NULL,
591 the operation can block indefinitely.
592
593 The val3 argument is ignored.
594
595 The FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI were added to
596 support a fairly specific use case: support for priority-inheri‐
597 tance-aware POSIX threads condition variables. The idea is that
598 these operations should always be paired, in order to ensure
599 that user space and the kernel remain in sync. Thus, in the FU‐
600 TEX_WAIT_REQUEUE_PI operation, the user-space application pre-
601 specifies the target of the requeue that takes place in the FU‐
602 TEX_CMP_REQUEUE_PI operation.
603
605 In the event of an error (and assuming that futex() was invoked via
606 syscall(2)), all operations return -1 and set errno to indicate the
607 cause of the error.
608
609 The return value on success depends on the operation, as described in
610 the following list:
611
612 FUTEX_WAIT
613 Returns 0 if the caller was woken up. Note that a wake-up can
614 also be caused by common futex usage patterns in unrelated code
615 that happened to have previously used the futex word's memory
616 location (e.g., typical futex-based implementations of Pthreads
617 mutexes can cause this under some conditions). Therefore, call‐
618 ers should always conservatively assume that a return value of 0
619 can mean a spurious wake-up, and use the futex word's value
620 (i.e., the user-space synchronization scheme) to decide whether
621 to continue to block or not.
622
623 FUTEX_WAKE
624 Returns the number of waiters that were woken up.
625
626 FUTEX_FD
627 Returns the new file descriptor associated with the futex.
628
629 FUTEX_REQUEUE
630 Returns the number of waiters that were woken up.
631
632 FUTEX_CMP_REQUEUE
633 Returns the total number of waiters that were woken up or re‐
634 queued to the futex for the futex word at uaddr2. If this value
635 is greater than val, then the difference is the number of wait‐
636 ers requeued to the futex for the futex word at uaddr2.
637
638 FUTEX_WAKE_OP
639 Returns the total number of waiters that were woken up. This is
640 the sum of the woken waiters on the two futexes for the futex
641 words at uaddr and uaddr2.
642
643 FUTEX_WAIT_BITSET
644 Returns 0 if the caller was woken up. See FUTEX_WAIT for how to
645 interpret this correctly in practice.
646
647 FUTEX_WAKE_BITSET
648 Returns the number of waiters that were woken up.
649
650 FUTEX_LOCK_PI
651 Returns 0 if the futex was successfully locked.
652
653 FUTEX_TRYLOCK_PI
654 Returns 0 if the futex was successfully locked.
655
656 FUTEX_UNLOCK_PI
657 Returns 0 if the futex was successfully unlocked.
658
659 FUTEX_CMP_REQUEUE_PI
660 Returns the total number of waiters that were woken up or re‐
661 queued to the futex for the futex word at uaddr2. If this value
662 is greater than val, then difference is the number of waiters
663 requeued to the futex for the futex word at uaddr2.
664
665 FUTEX_WAIT_REQUEUE_PI
666 Returns 0 if the caller was successfully requeued to the futex
667 for the futex word at uaddr2.
668
670 EACCES No read access to the memory of a futex word.
671
672 EAGAIN (FUTEX_WAIT, FUTEX_WAIT_BITSET, FUTEX_WAIT_REQUEUE_PI) The value
673 pointed to by uaddr was not equal to the expected value val at
674 the time of the call.
675
676 Note: on Linux, the symbolic names EAGAIN and EWOULDBLOCK (both
677 of which appear in different parts of the kernel futex code)
678 have the same value.
679
680 EAGAIN (FUTEX_CMP_REQUEUE, FUTEX_CMP_REQUEUE_PI) The value pointed to
681 by uaddr is not equal to the expected value val3.
682
683 EAGAIN (FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, FUTEX_CMP_REQUEUE_PI) The fu‐
684 tex owner thread ID of uaddr (for FUTEX_CMP_REQUEUE_PI: uaddr2)
685 is about to exit, but has not yet handled the internal state
686 cleanup. Try again.
687
688 EDEADLK
689 (FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, FUTEX_CMP_REQUEUE_PI) The fu‐
690 tex word at uaddr is already locked by the caller.
691
692 EDEADLK
693 (FUTEX_CMP_REQUEUE_PI) While requeueing a waiter to the PI futex
694 for the futex word at uaddr2, the kernel detected a deadlock.
695
696 EFAULT A required pointer argument (i.e., uaddr, uaddr2, or timeout)
697 did not point to a valid user-space address.
698
699 EINTR A FUTEX_WAIT or FUTEX_WAIT_BITSET operation was interrupted by a
700 signal (see signal(7)). In kernels before Linux 2.6.22, this
701 error could also be returned for a spurious wakeup; since Linux
702 2.6.22, this no longer happens.
703
704 EINVAL The operation in futex_op is one of those that employs a time‐
705 out, but the supplied timeout argument was invalid (tv_sec was
706 less than zero, or tv_nsec was not less than 1,000,000,000).
707
708 EINVAL The operation specified in futex_op employs one or both of the
709 pointers uaddr and uaddr2, but one of these does not point to a
710 valid object—that is, the address is not four-byte-aligned.
711
712 EINVAL (FUTEX_WAIT_BITSET, FUTEX_WAKE_BITSET) The bit mask supplied in
713 val3 is zero.
714
715 EINVAL (FUTEX_CMP_REQUEUE_PI) uaddr equals uaddr2 (i.e., an attempt was
716 made to requeue to the same futex).
717
718 EINVAL (FUTEX_FD) The signal number supplied in val is invalid.
719
720 EINVAL (FUTEX_WAKE, FUTEX_WAKE_OP, FUTEX_WAKE_BITSET, FUTEX_REQUEUE,
721 FUTEX_CMP_REQUEUE) The kernel detected an inconsistency between
722 the user-space state at uaddr and the kernel state—that is, it
723 detected a waiter which waits in FUTEX_LOCK_PI on uaddr.
724
725 EINVAL (FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, FUTEX_UNLOCK_PI) The kernel
726 detected an inconsistency between the user-space state at uaddr
727 and the kernel state. This indicates either state corruption or
728 that the kernel found a waiter on uaddr which is waiting via FU‐
729 TEX_WAIT or FUTEX_WAIT_BITSET.
730
731 EINVAL (FUTEX_CMP_REQUEUE_PI) The kernel detected an inconsistency be‐
732 tween the user-space state at uaddr2 and the kernel state; that
733 is, the kernel detected a waiter which waits via FUTEX_WAIT or
734 FUTEX_WAIT_BITSET on uaddr2.
735
736 EINVAL (FUTEX_CMP_REQUEUE_PI) The kernel detected an inconsistency be‐
737 tween the user-space state at uaddr and the kernel state; that
738 is, the kernel detected a waiter which waits via FUTEX_WAIT or
739 FUTEX_WAIT_BITESET on uaddr.
740
741 EINVAL (FUTEX_CMP_REQUEUE_PI) The kernel detected an inconsistency be‐
742 tween the user-space state at uaddr and the kernel state; that
743 is, the kernel detected a waiter which waits on uaddr via FU‐
744 TEX_LOCK_PI (instead of FUTEX_WAIT_REQUEUE_PI).
745
746 EINVAL (FUTEX_CMP_REQUEUE_PI) An attempt was made to requeue a waiter
747 to a futex other than that specified by the matching FU‐
748 TEX_WAIT_REQUEUE_PI call for that waiter.
749
750 EINVAL (FUTEX_CMP_REQUEUE_PI) The val argument is not 1.
751
752 EINVAL Invalid argument.
753
754 ENFILE (FUTEX_FD) The system-wide limit on the total number of open
755 files has been reached.
756
757 ENOMEM (FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, FUTEX_CMP_REQUEUE_PI) The ker‐
758 nel could not allocate memory to hold state information.
759
760 ENOSYS Invalid operation specified in futex_op.
761
762 ENOSYS The FUTEX_CLOCK_REALTIME option was specified in futex_op, but
763 the accompanying operation was neither FUTEX_WAIT, FU‐
764 TEX_WAIT_BITSET, nor FUTEX_WAIT_REQUEUE_PI.
765
766 ENOSYS (FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, FUTEX_UNLOCK_PI, FUTEX_CMP_RE‐
767 QUEUE_PI, FUTEX_WAIT_REQUEUE_PI) A run-time check determined
768 that the operation is not available. The PI-futex operations
769 are not implemented on all architectures and are not supported
770 on some CPU variants.
771
772 EPERM (FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, FUTEX_CMP_REQUEUE_PI) The
773 caller is not allowed to attach itself to the futex at uaddr
774 (for FUTEX_CMP_REQUEUE_PI: the futex at uaddr2). (This may be
775 caused by a state corruption in user space.)
776
777 EPERM (FUTEX_UNLOCK_PI) The caller does not own the lock represented
778 by the futex word.
779
780 ESRCH (FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, FUTEX_CMP_REQUEUE_PI) The
781 thread ID in the futex word at uaddr does not exist.
782
783 ESRCH (FUTEX_CMP_REQUEUE_PI) The thread ID in the futex word at uaddr2
784 does not exist.
785
786 ETIMEDOUT
787 The operation in futex_op employed the timeout specified in
788 timeout, and the timeout expired before the operation completed.
789
791 Futexes were first made available in a stable kernel release with Linux
792 2.6.0.
793
794 Initial futex support was merged in Linux 2.5.7 but with different se‐
795 mantics from what was described above. A four-argument system call
796 with the semantics described in this page was introduced in Linux
797 2.5.40. A fifth argument was added in Linux 2.5.70, and a sixth argu‐
798 ment was added in Linux 2.6.7.
799
801 This system call is Linux-specific.
802
804 Glibc does not provide a wrapper for this system call; call it using
805 syscall(2).
806
807 Several higher-level programming abstractions are implemented via fu‐
808 texes, including POSIX semaphores and various POSIX threads synchro‐
809 nization mechanisms (mutexes, condition variables, read-write locks,
810 and barriers).
811
813 The program below demonstrates use of futexes in a program where a par‐
814 ent process and a child process use a pair of futexes located inside a
815 shared anonymous mapping to synchronize access to a shared resource:
816 the terminal. The two processes each write nloops (a command-line ar‐
817 gument that defaults to 5 if omitted) messages to the terminal and em‐
818 ploy a synchronization protocol that ensures that they alternate in
819 writing messages. Upon running this program we see output such as the
820 following:
821
822 $ ./futex_demo
823 Parent (18534) 0
824 Child (18535) 0
825 Parent (18534) 1
826 Child (18535) 1
827 Parent (18534) 2
828 Child (18535) 2
829 Parent (18534) 3
830 Child (18535) 3
831 Parent (18534) 4
832 Child (18535) 4
833
834 Program source
835
836 /* futex_demo.c
837
838 Usage: futex_demo [nloops]
839 (Default: 5)
840
841 Demonstrate the use of futexes in a program where parent and child
842 use a pair of futexes located inside a shared anonymous mapping to
843 synchronize access to a shared resource: the terminal. The two
844 processes each write 'num-loops' messages to the terminal and employ
845 a synchronization protocol that ensures that they alternate in
846 writing messages.
847 */
848 #define _GNU_SOURCE
849 #include <stdio.h>
850 #include <errno.h>
851 #include <stdatomic.h>
852 #include <stdint.h>
853 #include <stdlib.h>
854 #include <unistd.h>
855 #include <sys/wait.h>
856 #include <sys/mman.h>
857 #include <sys/syscall.h>
858 #include <linux/futex.h>
859 #include <sys/time.h>
860
861 #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
862 } while (0)
863
864 static uint32_t *futex1, *futex2, *iaddr;
865
866 static int
867 futex(uint32_t *uaddr, int futex_op, uint32_t val,
868 const struct timespec *timeout, uint32_t *uaddr2, uint32_t val3)
869 {
870 return syscall(SYS_futex, uaddr, futex_op, val,
871 timeout, uaddr2, val3);
872 }
873
874 /* Acquire the futex pointed to by 'futexp': wait for its value to
875 become 1, and then set the value to 0. */
876
877 static void
878 fwait(uint32_t *futexp)
879 {
880 long s;
881
882 /* atomic_compare_exchange_strong(ptr, oldval, newval)
883 atomically performs the equivalent of:
884
885 if (*ptr == *oldval)
886 *ptr = newval;
887
888 It returns true if the test yielded true and *ptr was updated. */
889
890 while (1) {
891
892 /* Is the futex available? */
893 const uint32_t one = 1;
894 if (atomic_compare_exchange_strong(futexp, &one, 0))
895 break; /* Yes */
896
897 /* Futex is not available; wait */
898
899 s = futex(futexp, FUTEX_WAIT, 0, NULL, NULL, 0);
900 if (s == -1 && errno != EAGAIN)
901 errExit("futex-FUTEX_WAIT");
902 }
903 }
904
905 /* Release the futex pointed to by 'futexp': if the futex currently
906 has the value 0, set its value to 1 and the wake any futex waiters,
907 so that if the peer is blocked in fwait(), it can proceed. */
908
909 static void
910 fpost(uint32_t *futexp)
911 {
912 long s;
913
914 /* atomic_compare_exchange_strong() was described
915 in comments above */
916
917 const uint32_t zero = 0;
918 if (atomic_compare_exchange_strong(futexp, &zero, 1)) {
919 s = futex(futexp, FUTEX_WAKE, 1, NULL, NULL, 0);
920 if (s == -1)
921 errExit("futex-FUTEX_WAKE");
922 }
923 }
924
925 int
926 main(int argc, char *argv[])
927 {
928 pid_t childPid;
929 int nloops;
930
931 setbuf(stdout, NULL);
932
933 nloops = (argc > 1) ? atoi(argv[1]) : 5;
934
935 /* Create a shared anonymous mapping that will hold the futexes.
936 Since the futexes are being shared between processes, we
937 subsequently use the "shared" futex operations (i.e., not the
938 ones suffixed "_PRIVATE") */
939
940 iaddr = mmap(NULL, sizeof(*iaddr) * 2, PROT_READ | PROT_WRITE,
941 MAP_ANONYMOUS | MAP_SHARED, -1, 0);
942 if (iaddr == MAP_FAILED)
943 errExit("mmap");
944
945 futex1 = &iaddr[0];
946 futex2 = &iaddr[1];
947
948 *futex1 = 0; /* State: unavailable */
949 *futex2 = 1; /* State: available */
950
951 /* Create a child process that inherits the shared anonymous
952 mapping */
953
954 childPid = fork();
955 if (childPid == -1)
956 errExit("fork");
957
958 if (childPid == 0) { /* Child */
959 for (int j = 0; j < nloops; j++) {
960 fwait(futex1);
961 printf("Child (%jd) %d\n", (intmax_t) getpid(), j);
962 fpost(futex2);
963 }
964
965 exit(EXIT_SUCCESS);
966 }
967
968 /* Parent falls through to here */
969
970 for (int j = 0; j < nloops; j++) {
971 fwait(futex2);
972 printf("Parent (%jd) %d\n", (intmax_t) getpid(), j);
973 fpost(futex1);
974 }
975
976 wait(NULL);
977
978 exit(EXIT_SUCCESS);
979 }
980
982 get_robust_list(2), restart_syscall(2), pthread_mutexattr_getproto‐
983 col(3), futex(7), sched(7)
984
985 The following kernel source files:
986
987 * Documentation/pi-futex.txt
988
989 * Documentation/futex-requeue-pi.txt
990
991 * Documentation/locking/rt-mutex.txt
992
993 * Documentation/locking/rt-mutex-design.txt
994
995 * Documentation/robust-futex-ABI.txt
996
997 Franke, H., Russell, R., and Kirwood, M., 2002. Fuss, Futexes and Fur‐
998 wocks: Fast Userlevel Locking in Linux (from proceedings of the Ottawa
999 Linux Symposium 2002),
1000 ⟨http://kernel.org/doc/ols/2002/ols2002-pages-479-495.pdf⟩
1001
1002 Hart, D., 2009. A futex overview and update,
1003 ⟨http://lwn.net/Articles/360699/⟩
1004
1005 Hart, D. and Guniguntala, D., 2009. Requeue-PI: Making Glibc Condvars
1006 PI-Aware (from proceedings of the 2009 Real-Time Linux Workshop),
1007 ⟨http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf⟩
1008
1009 Drepper, U., 2011. Futexes Are Tricky,
1010 ⟨http://www.akkadia.org/drepper/futex.pdf⟩
1011
1012 Futex example library, futex-*.tar.bz2 at
1013 ⟨ftp://ftp.kernel.org/pub/linux/kernel/people/rusty/⟩
1014
1016 This page is part of release 5.10 of the Linux man-pages project. A
1017 description of the project, information about reporting bugs, and the
1018 latest version of this page, can be found at
1019 https://www.kernel.org/doc/man-pages/.
1020
1021
1022
1023Linux 2020-11-01 FUTEX(2)