MEMBARRIER(2)              Linux Programmer's Manual             MEMBARRIER(2)

NAME

       membarrier - issue memory barriers on a set of threads

SYNOPSIS

       #include <linux/membarrier.h>

       int membarrier(int cmd, int flags);

DESCRIPTION

       The membarrier() system call helps reduce the overhead of the memory
       barrier instructions required to order memory accesses on multi-core
       systems.  However, this system call is heavier than a memory barrier,
       so using it effectively is not as simple as replacing memory barriers
       with this system call; it requires an understanding of the details
       below.

       Memory barriers must be used with care: a memory barrier always needs
       to be matched with its memory barrier counterparts, unless the
       architecture's memory model doesn't require the matching barriers.

       There are cases where one side of the matching barriers (which we will
       refer to as the "fast side") is executed much more often than the other
       (which we will refer to as the "slow side").  This is a prime target
       for the use of membarrier().  The key idea is to replace, for these
       matching barriers, the fast-side memory barriers by simple compiler
       barriers, for example:

           asm volatile ("" : : : "memory")

       and replace the slow-side memory barriers by calls to membarrier().

       This will add overhead to the slow side and remove overhead from the
       fast side, resulting in an overall performance increase as long as the
       slow side is infrequent enough that the overhead of the membarrier()
       calls does not outweigh the performance gain on the fast side.

       The cmd argument is one of the following:

       MEMBARRIER_CMD_QUERY
              Query the set of supported commands.  The return value of the
              call is a bit mask of supported commands.  MEMBARRIER_CMD_QUERY,
              which has the value 0, is not itself included in this bit mask.
              This command is always supported (on kernels where membarrier()
              is provided).

       MEMBARRIER_CMD_SHARED
              Ensure that all threads from all processes on the system pass
              through a state where all memory accesses to user-space
              addresses match program order between entry to and return from
              the membarrier() system call.  All threads on the system are
              targeted by this command.

       MEMBARRIER_CMD_PRIVATE_EXPEDITED (since Linux 4.14)
              Execute a memory barrier on each running thread belonging to the
              same process as the calling thread.  Upon return from the system
              call, the calling thread is assured that all its running thread
              siblings have passed through a state where all memory accesses
              to user-space addresses match program order between entry to and
              return from the system call (non-running threads are de facto in
              such a state).  This covers only threads from the same process
              as the calling thread.

              The "expedited" commands complete faster than the non-expedited
              ones; they never block, but have the downside of causing extra
              overhead.  A process needs to register its intent to use the
              private expedited command prior to using it.

       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED (since Linux 4.14)
              Register the process's intent to use
              MEMBARRIER_CMD_PRIVATE_EXPEDITED.

       The flags argument is currently unused and must be specified as 0.

       All memory accesses performed in program order from each targeted
       thread are guaranteed to be ordered with respect to membarrier().

       If we use the semantic barrier() to represent a compiler barrier
       forcing memory accesses to be performed in program order across the
       barrier, and smp_mb() to represent explicit memory barriers forcing
       full memory ordering across the barrier, we have the following ordering
       table for each pairing of barrier(), membarrier() and smp_mb().  The
       pair ordering is detailed as (O: ordered, X: not ordered):

                              barrier()  smp_mb()  membarrier()
              barrier()          X          X          O
              smp_mb()           X          O          O
              membarrier()       O          O          O
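
       The registration requirement of the private expedited command can be
       sketched as follows.  This is a minimal illustration under stated
       assumptions, not a definitive implementation: the helper names
       init_private_expedited() and slow_side_barrier() are hypothetical, and
       the membarrier() wrapper goes through syscall(2) on the assumption that
       the C library provides no wrapper of its own.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

/* Thin wrapper around the raw system call. */
static int
membarrier(int cmd, int flags)
{
    return (int) syscall(__NR_membarrier, cmd, flags);
}

/*
 * Hypothetical helper: check that the private expedited commands are
 * supported, then register the process's intent to use them.
 * Returns 0 on success; on failure returns -1 with errno set
 * (ENOSYS is used here when the kernel does not report the commands).
 */
static int
init_private_expedited(void)
{
    int mask;

    mask = membarrier(MEMBARRIER_CMD_QUERY, 0);
    if (mask < 0)
        return -1;

    if (!(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED) ||
        !(mask & MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)) {
        errno = ENOSYS;
        return -1;
    }

    return membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);
}

/*
 * Hypothetical helper for the slow side: issue a barrier on the running
 * threads of this process.  Must only be called after
 * init_private_expedited() has succeeded; otherwise the call fails
 * with EPERM.
 */
static int
slow_side_barrier(void)
{
    return membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
}
```

       A program would call init_private_expedited() once at startup and then
       use slow_side_barrier() wherever the slow side previously issued a
       memory barrier, falling back to ordinary memory barriers on both sides
       if initialization fails.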

RETURN VALUE

       On success, the MEMBARRIER_CMD_QUERY operation returns a bit mask of
       supported commands, and the MEMBARRIER_CMD_SHARED,
       MEMBARRIER_CMD_PRIVATE_EXPEDITED, and
       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED operations return zero.  On
       error, -1 is returned, and errno is set appropriately.

       For a given command, with flags set to 0, this system call is
       guaranteed to always return the same value until reboot.  Further calls
       with the same arguments will lead to the same result.  Therefore, with
       flags set to 0, error handling is required only for the first call to
       membarrier().
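
       Because the result for a given command does not change, one way to use
       this guarantee is to query the supported commands once and cache the
       answer.  The following is only a sketch: the helper names
       cached_membarrier_mask() and membarrier_supports() are hypothetical,
       and the caching is not thread-safe as written.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

/* Thin wrapper around the raw system call. */
static int
membarrier(int cmd, int flags)
{
    return (int) syscall(__NR_membarrier, cmd, flags);
}

/*
 * Hypothetical helper: return the bit mask of supported commands,
 * asking the kernel only on the first call.  Returns -1 if that first
 * query failed (errno is meaningful only right after the first call).
 * Not thread-safe: call it once before starting other threads.
 */
static int
cached_membarrier_mask(void)
{
    static int queried;
    static int mask;

    if (!queried) {
        mask = membarrier(MEMBARRIER_CMD_QUERY, 0);
        queried = 1;
    }
    return mask;
}

/* Nonzero if cmd (a single command bit) is reported as supported. */
static int
membarrier_supports(int cmd)
{
    int mask = cached_membarrier_mask();

    return mask > 0 && (mask & cmd) != 0;
}
```

       With these helpers, a check such as
       membarrier_supports(MEMBARRIER_CMD_SHARED) costs one system call for
       the whole lifetime of the process.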

ERRORS

       EINVAL cmd is invalid, or flags is nonzero, or the
              MEMBARRIER_CMD_SHARED command is disabled because the nohz_full
              CPU parameter has been set.

       ENOSYS The membarrier() system call is not implemented by this kernel.

       EPERM  The current process was not registered prior to using private
              expedited commands.
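
       As an illustration of handling these errors, the sketch below maps
       each documented errno value to a short diagnostic.  The function name
       membarrier_strerror() and the message texts are hypothetical; they
       merely restate the list above.

```c
#include <errno.h>

/*
 * Hypothetical helper: translate an errno value left by a failed
 * membarrier() call into a message based on the ERRORS list.
 */
static const char *
membarrier_strerror(int err)
{
    switch (err) {
    case EINVAL:
        return "invalid cmd, nonzero flags, or MEMBARRIER_CMD_SHARED "
               "disabled by nohz_full";
    case ENOSYS:
        return "membarrier() is not implemented by this kernel";
    case EPERM:
        return "process not registered for private expedited commands";
    default:
        return "unexpected error";
    }
}
```

       A caller might print membarrier_strerror(errno) right after a failed
       call, before any other library call can overwrite errno.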

VERSIONS

       The membarrier() system call was added in Linux 4.3.

CONFORMING TO

       membarrier() is Linux-specific.

NOTES

       A memory barrier instruction is part of the instruction set of
       architectures with weakly-ordered memory models.  It orders memory
       accesses prior to the barrier and after the barrier with respect to
       matching barriers on other cores.  For instance, a load fence can order
       loads prior to and following that fence with respect to stores ordered
       by store fences.

       Program order is the order in which instructions appear in the program
       assembly code.

       Examples where membarrier() can be useful include implementations of
       Read-Copy-Update libraries and garbage collectors.
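
       The pairing of matching barriers described at the start of these notes
       can be sketched in portable C11.  This is an analogy rather than the
       instruction-level fences themselves: C11 release and acquire fences
       stand in for the store and load fences, and the names payload, ready,
       publish() and try_consume() are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* plain data protected by the fences */
static atomic_bool ready;    /* flag announcing that payload is valid */

/* Writer side: store the data, fence, then raise the flag. */
static void
publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/*
 * Reader side: read the flag, fence, then read the data.  The acquire
 * fence pairs with the release fence in publish(), so a reader that
 * sees ready == true also sees the payload written before it.
 * Returns false if nothing has been published yet.
 */
static bool
try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

       Neither fence is sufficient on its own: dropping one of them removes
       the ordering guarantee, which is why each barrier needs its matching
       counterpart.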

EXAMPLE

       Assuming a multithreaded application where "fast_path()" is executed
       very frequently, and where "slow_path()" is executed infrequently, the
       following code (x86) can be transformed using membarrier():

           #include <stdlib.h>

           static volatile int a, b;

           static void
           fast_path(int *read_b)
           {
               a = 1;
               asm volatile ("mfence" : : : "memory");
               *read_b = b;
           }

           static void
           slow_path(int *read_a)
           {
               b = 1;
               asm volatile ("mfence" : : : "memory");
               *read_a = a;
           }

           int
           main(int argc, char **argv)
           {
               int read_a, read_b;

               /*
                * Real applications would call fast_path() and slow_path()
                * from different threads. Call those from main() to keep
                * this example short.
                */

               slow_path(&read_a);
               fast_path(&read_b);

               /*
                * read_b == 0 implies read_a == 1 and
                * read_a == 0 implies read_b == 1.
                */

               if (read_b == 0 && read_a == 0)
                   abort();

               exit(EXIT_SUCCESS);
           }

       The code above transformed to use membarrier() becomes:

           #define _GNU_SOURCE
           #include <stdlib.h>
           #include <stdio.h>
           #include <unistd.h>
           #include <sys/syscall.h>
           #include <linux/membarrier.h>

           static volatile int a, b;

           static int
           membarrier(int cmd, int flags)
           {
               return syscall(__NR_membarrier, cmd, flags);
           }

           static int
           init_membarrier(void)
           {
               int ret;

               /* Check that membarrier() is supported. */

               ret = membarrier(MEMBARRIER_CMD_QUERY, 0);
               if (ret < 0) {
                   perror("membarrier");
                   return -1;
               }

               if (!(ret & MEMBARRIER_CMD_SHARED)) {
                   fprintf(stderr,
                       "membarrier does not support MEMBARRIER_CMD_SHARED\n");
                   return -1;
               }

               return 0;
           }

           static void
           fast_path(int *read_b)
           {
               a = 1;
               asm volatile ("" : : : "memory");
               *read_b = b;
           }

           static void
           slow_path(int *read_a)
           {
               b = 1;
               membarrier(MEMBARRIER_CMD_SHARED, 0);
               *read_a = a;
           }

           int
           main(int argc, char **argv)
           {
               int read_a, read_b;

               if (init_membarrier())
                   exit(EXIT_FAILURE);

               /*
                * Real applications would call fast_path() and slow_path()
                * from different threads. Call those from main() to keep
                * this example short.
                */

               slow_path(&read_a);
               fast_path(&read_b);

               /*
                * read_b == 0 implies read_a == 1 and
                * read_a == 0 implies read_b == 1.
                */

               if (read_b == 0 && read_a == 0)
                   abort();

               exit(EXIT_SUCCESS);
           }

COLOPHON

       This page is part of release 4.15 of the Linux man-pages project.  A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2017-11-15                     MEMBARRIER(2)