MEMBARRIER(2)              Linux Programmer's Manual             MEMBARRIER(2)

NAME
       membarrier - issue memory barriers on a set of threads

SYNOPSIS
       #include <linux/membarrier.h> /* Definition of MEMBARRIER_* constants */
       #include <sys/syscall.h>      /* Definition of SYS_* constants */
       #include <unistd.h>

       int syscall(SYS_membarrier, int cmd, unsigned int flags, int cpu_id);

       Note: glibc provides no wrapper for membarrier(), necessitating the
       use of syscall(2).

DESCRIPTION
       The membarrier() system call helps reduce the overhead of the memory
       barrier instructions required to order memory accesses on multi-core
       systems.  However, this system call is heavier than a memory barrier,
       so using it effectively is not as simple as replacing memory barriers
       with this system call; it requires an understanding of the details
       below.

       When using memory barriers, bear in mind that a memory barrier always
       needs either to be matched with its memory barrier counterparts, or
       that the architecture's memory model does not require the matching
       barriers.

       There are cases where one side of the matching barriers (which we
       will refer to as the "fast side") is executed much more often than
       the other (which we will refer to as the "slow side").  This is a
       prime target for the use of membarrier().  The key idea is to
       replace, for these matching barriers, the fast-side memory barriers
       by simple compiler barriers, for example:

           asm volatile ("" : : : "memory")

       and to replace the slow-side memory barriers by calls to
       membarrier().

       This will add overhead to the slow side, and remove overhead from
       the fast side, thus resulting in an overall performance increase as
       long as the slow side is infrequent enough that the overhead of the
       membarrier() calls does not outweigh the performance gain on the
       fast side.  (See EXAMPLES below for a complete program performing
       this transformation.)

       The cmd argument is one of the following:

       MEMBARRIER_CMD_QUERY (since Linux 4.3)
              Query the set of supported commands.  The return value of the
              call is a bit mask of supported commands.
              MEMBARRIER_CMD_QUERY, which has the value 0, is not itself
              included in this bit mask.  This command is always supported
              (on kernels where membarrier() is provided).

       MEMBARRIER_CMD_GLOBAL (since Linux 4.16)
              Ensure that all threads from all processes on the system pass
              through a state where all memory accesses to user-space
              addresses match program order between entry to and return
              from the membarrier() system call.  All threads on the system
              are targeted by this command.

       MEMBARRIER_CMD_GLOBAL_EXPEDITED (since Linux 4.16)
              Execute a memory barrier on all running threads of all
              processes that previously registered with
              MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.

              Upon return from the system call, the calling thread has a
              guarantee that all running threads have passed through a
              state where all memory accesses to user-space addresses match
              program order between entry to and return from the system
              call (non-running threads are de facto in such a state).
              This guarantee is provided only for the threads of processes
              that previously registered with
              MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.

              Given that registration is about the intent to receive the
              barriers, it is valid to invoke
              MEMBARRIER_CMD_GLOBAL_EXPEDITED from a process that has not
              employed MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.

              The "expedited" commands complete faster than the
              non-expedited ones; they never block, but have the downside
              of causing extra overhead.

       MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED (since Linux 4.16)
              Register the process's intent to receive
              MEMBARRIER_CMD_GLOBAL_EXPEDITED memory barriers.

       MEMBARRIER_CMD_PRIVATE_EXPEDITED (since Linux 4.14)
              Execute a memory barrier on each running thread belonging to
              the same process as the calling thread.

              Upon return from the system call, the calling thread has a
              guarantee that all its running thread siblings have passed
              through a state where all memory accesses to user-space
              addresses match program order between entry to and return
              from the system call (non-running threads are de facto in
              such a state).  This guarantee is provided only for threads
              in the same process as the calling thread.

              The "expedited" commands complete faster than the
              non-expedited ones; they never block, but have the downside
              of causing extra overhead.

              A process must register its intent to use the private
              expedited command prior to using it (see the sketch following
              this list).

       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED (since Linux 4.14)
              Register the process's intent to use
              MEMBARRIER_CMD_PRIVATE_EXPEDITED.

       MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE (since Linux 4.16)
              In addition to providing the memory ordering guarantees
              described in MEMBARRIER_CMD_PRIVATE_EXPEDITED, upon return
              from the system call the calling thread has a guarantee that
              all its running thread siblings have executed a core
              serializing instruction.  This guarantee is provided only for
              threads in the same process as the calling thread.

              The "expedited" commands complete faster than the
              non-expedited ones; they never block, but have the downside
              of causing extra overhead.

              A process must register its intent to use the private
              expedited sync core command prior to using it.

       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE (since Linux 4.16)
              Register the process's intent to use
              MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.

       MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ (since Linux 5.10)
              Ensure that, upon return from the system call, all running
              thread siblings of the calling thread have their currently
              running rseq critical sections restarted if the flags
              parameter is 0; if flags is MEMBARRIER_CMD_FLAG_CPU, this
              operation is performed only on the CPU indicated by cpu_id
              (see the sketch following the description of the cpu_id
              argument below).  This guarantee is provided only for threads
              in the same process as the calling thread.

              RSEQ membarrier is available only in the "private expedited"
              form.

              A process must register its intent to use the private
              expedited rseq command prior to using it.

       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ (since Linux 5.10)
              Register the process's intent to use
              MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.

       MEMBARRIER_CMD_SHARED (since Linux 4.3)
              This is an alias for MEMBARRIER_CMD_GLOBAL that exists for
              header backward compatibility.
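
       As an illustration of the register-then-use pattern required by the
       private expedited commands, the following minimal sketch registers
       at startup and later issues the barrier on an infrequently executed
       path.  The helper names are hypothetical, not part of any API:

           #define _GNU_SOURCE
           #include <linux/membarrier.h>
           #include <stdio.h>
           #include <stdlib.h>
           #include <sys/syscall.h>
           #include <unistd.h>

           /* Register once, early (e.g., at program startup). */
           static void
           register_private_expedited(void)
           {
               if (syscall(SYS_membarrier,
                           MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED,
                           0, 0) == -1) {
                   perror("membarrier register");
                   exit(EXIT_FAILURE);
               }
           }

           /* Issue the barrier on the slow path; without the prior
              registration, this call would fail with EPERM. */
           static void
           slow_path_barrier(void)
           {
               syscall(SYS_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED,
                       0, 0);
           }

           int
           main(void)
           {
               register_private_expedited();
               slow_path_barrier();
               exit(EXIT_SUCCESS);
           }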

       The flags argument must be specified as 0 unless the command is
       MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ, in which case flags can be
       either 0 or MEMBARRIER_CMD_FLAG_CPU.

       The cpu_id argument is ignored unless flags is
       MEMBARRIER_CMD_FLAG_CPU, in which case it must specify the CPU
       targeted by this membarrier command.
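
       For example, after registering with
       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ, a call restricted to
       a single CPU might look as follows.  This is a sketch; the helper
       name and the choice of CPU 0 are arbitrary, for illustration only:

           #define _GNU_SOURCE
           #include <linux/membarrier.h>
           #include <sys/syscall.h>
           #include <unistd.h>

           /* Restart the rseq critical sections of sibling threads
              running on the given CPU only. */
           static long
           rseq_barrier_on_cpu(int cpu)
           {
               return syscall(SYS_membarrier,
                              MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ,
                              MEMBARRIER_CMD_FLAG_CPU, cpu);
           }

           int
           main(void)
           {
               /* Registration must precede use. */
               if (syscall(SYS_membarrier,
                           MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ,
                           0, 0) == -1)
                   return 1;

               return rseq_barrier_on_cpu(0) == -1;
           }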

       All memory accesses performed in program order from each targeted
       thread are guaranteed to be ordered with respect to membarrier().

       If we use the semantic barrier() to represent a compiler barrier
       forcing memory accesses to be performed in program order across the
       barrier, and smp_mb() to represent explicit memory barriers forcing
       full memory ordering across the barrier, we have the following
       ordering table for each pairing of barrier(), membarrier(), and
       smp_mb().  The pair ordering is detailed as (O: ordered, X: not
       ordered):

                              barrier()   smp_mb()   membarrier()
              barrier()           X           X            O
              smp_mb()            X           O            O
              membarrier()        O           O            O

RETURN VALUE
       On success, the MEMBARRIER_CMD_QUERY operation returns a bit mask of
       supported commands, and the MEMBARRIER_CMD_GLOBAL,
       MEMBARRIER_CMD_GLOBAL_EXPEDITED,
       MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED,
       MEMBARRIER_CMD_PRIVATE_EXPEDITED,
       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED,
       MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, and
       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE operations
       return zero.  On error, -1 is returned, and errno is set to indicate
       the error.

       For a given command, with flags set to 0, this system call is
       guaranteed to always return the same value until reboot.  Further
       calls with the same arguments will lead to the same result.
       Therefore, with flags set to 0, error handling is required only for
       the first call to membarrier().
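
       A minimal sketch of that pattern, querying once at startup and
       caching the result (the cache variable and helper name are
       hypothetical):

           #define _GNU_SOURCE
           #include <linux/membarrier.h>
           #include <sys/syscall.h>
           #include <unistd.h>

           /* Bit mask of supported commands; stable until reboot, so it
              can be cached after a single query. */
           static int supported_cmds;

           static void
           membarrier_init(void)
           {
               long ret;

               /* Only this first call needs error handling. */
               ret = syscall(SYS_membarrier, MEMBARRIER_CMD_QUERY, 0, 0);
               supported_cmds = (ret >= 0) ? (int) ret : 0;
           }

           int
           main(void)
           {
               membarrier_init();
               return !(supported_cmds & MEMBARRIER_CMD_GLOBAL);
           }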

ERRORS
       EINVAL cmd is invalid, or flags is nonzero, or the
              MEMBARRIER_CMD_GLOBAL command is disabled because the
              nohz_full CPU parameter has been set, or the
              MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE and
              MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE commands
              are not implemented by the architecture.

       ENOSYS The membarrier() system call is not implemented by this
              kernel.

       EPERM  The current process was not registered prior to using private
              expedited commands.

VERSIONS
       The membarrier() system call was added in Linux 4.3.

       Before Linux 5.10, the prototype for membarrier() was:

           int membarrier(int cmd, int flags);

CONFORMING TO
       membarrier() is Linux-specific.

NOTES
       A memory barrier instruction is part of the instruction set of
       architectures with weakly ordered memory models.  It orders memory
       accesses prior to the barrier and after the barrier with respect to
       matching barriers on other cores.  For instance, a load fence can
       order loads prior to and following that fence with respect to
       stores ordered by store fences.

       Program order is the order in which instructions are ordered in the
       program assembly code.

       Examples where membarrier() can be useful include implementations
       of Read-Copy-Update libraries and garbage collectors.

EXAMPLES
       Assuming a multithreaded application where "fast_path()" is executed
       very frequently, and where "slow_path()" is executed infrequently,
       the following code (x86) can be transformed using membarrier():

           #include <stdlib.h>

           static volatile int a, b;

           static void
           fast_path(int *read_b)
           {
               a = 1;
               asm volatile ("mfence" : : : "memory");
               *read_b = b;
           }

           static void
           slow_path(int *read_a)
           {
               b = 1;
               asm volatile ("mfence" : : : "memory");
               *read_a = a;
           }

           int
           main(int argc, char *argv[])
           {
               int read_a, read_b;

               /*
                * Real applications would call fast_path() and
                * slow_path() from different threads.  Call those from
                * main() to keep this example short.
                */

               slow_path(&read_a);
               fast_path(&read_b);

               /*
                * read_b == 0 implies read_a == 1 and
                * read_a == 0 implies read_b == 1.
                */

               if (read_b == 0 && read_a == 0)
                   abort();

               exit(EXIT_SUCCESS);
           }

       The code above transformed to use membarrier() becomes:

           #define _GNU_SOURCE
           #include <stdlib.h>
           #include <stdio.h>
           #include <unistd.h>
           #include <sys/syscall.h>
           #include <linux/membarrier.h>

           static volatile int a, b;

           static int
           membarrier(int cmd, unsigned int flags, int cpu_id)
           {
               return syscall(__NR_membarrier, cmd, flags, cpu_id);
           }

           static int
           init_membarrier(void)
           {
               int ret;

               /* Check that membarrier() is supported. */

               ret = membarrier(MEMBARRIER_CMD_QUERY, 0, 0);
               if (ret < 0) {
                   perror("membarrier");
                   return -1;
               }

               if (!(ret & MEMBARRIER_CMD_GLOBAL)) {
                   fprintf(stderr,
                       "membarrier does not support MEMBARRIER_CMD_GLOBAL\n");
                   return -1;
               }

               return 0;
           }

           static void
           fast_path(int *read_b)
           {
               a = 1;
               asm volatile ("" : : : "memory");
               *read_b = b;
           }

           static void
           slow_path(int *read_a)
           {
               b = 1;
               membarrier(MEMBARRIER_CMD_GLOBAL, 0, 0);
               *read_a = a;
           }

           int
           main(int argc, char *argv[])
           {
               int read_a, read_b;

               if (init_membarrier())
                   exit(EXIT_FAILURE);

               /*
                * Real applications would call fast_path() and
                * slow_path() from different threads.  Call those from
                * main() to keep this example short.
                */

               slow_path(&read_a);
               fast_path(&read_b);

               /*
                * read_b == 0 implies read_a == 1 and
                * read_a == 0 implies read_b == 1.
                */

               if (read_b == 0 && read_a == 0)
                   abort();

               exit(EXIT_SUCCESS);
           }

COLOPHON
       This page is part of release 5.13 of the Linux man-pages project.  A
       description of the project, information about reporting bugs, and
       the latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.


Linux                             2021-08-27                   MEMBARRIER(2)