MEMBARRIER(2)              Linux Programmer's Manual             MEMBARRIER(2)

NAME
       membarrier - issue memory barriers on a set of threads

SYNOPSIS
       #include <linux/membarrier.h>

       int membarrier(int cmd, int flags);

DESCRIPTION
       The membarrier() system call helps reduce the overhead of the memory
       barrier instructions required to order memory accesses on multi-core
       systems. However, this system call is heavier than a memory barrier,
       so using it effectively is not as simple as replacing memory barriers
       with this system call; it requires understanding of the details below.

       When using memory barriers, keep in mind that a memory barrier must
       always be matched with its counterpart barriers, unless the
       architecture's memory model does not require the matching barriers.

       There are cases where one side of the matching barriers (which we will
       refer to as the "fast side") is executed much more often than the
       other (which we will refer to as the "slow side"). This is a prime
       target for the use of membarrier(). The key idea is to replace, for
       these matching barriers, the fast-side memory barriers with simple
       compiler barriers, for example:

           asm volatile ("" : : : "memory")

       and to replace the slow-side memory barriers with calls to
       membarrier().

       This adds overhead to the slow side and removes overhead from the fast
       side. The result is an overall performance increase as long as the
       slow side is infrequent enough that the overhead of the membarrier()
       calls does not outweigh the performance gain on the fast side.

       The cmd argument is one of the following:

       MEMBARRIER_CMD_QUERY (since Linux 4.3)
              Query the set of supported commands. The return value of the
              call is a bit mask of supported commands. MEMBARRIER_CMD_QUERY,
              which has the value 0, is not itself included in this bit mask.
              This command is always supported (on kernels where membarrier()
              is provided).
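
              Every command other than MEMBARRIER_CMD_QUERY occupies a single
              bit, so support for a given command can be tested by masking
              the query result. The sketch below is minimal and assumes a raw
              syscall(2) wrapper like the one in the example program at the
              end of this page. membarrier_cmd_supported() is a hypothetical
              helper; it is not a kernel or C library interface:

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

/* Raw wrapper around the system call, as in the example program. */
static int
membarrier(int cmd, int flags)
{
    return syscall(__NR_membarrier, cmd, flags);
}

/*
 * Hypothetical helper: returns 1 if cmd is supported, 0 if it is not,
 * and -1 (with errno set) if the query itself failed, for example with
 * ENOSYS on a kernel that lacks membarrier().
 */
static int
membarrier_cmd_supported(int cmd)
{
    int mask = membarrier(MEMBARRIER_CMD_QUERY, 0);

    if (mask < 0)
        return -1;
    if (cmd == MEMBARRIER_CMD_QUERY)
        return 1;       /* always supported, but never reported in mask */
    return (mask & cmd) != 0;
}
```

              A program would typically run such a check once at startup. If
              the check returns 0 or -1, the program falls back to ordinary
              memory barriers on both sides.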

       MEMBARRIER_CMD_GLOBAL (since Linux 4.16)
              Ensure that all threads from all processes on the system pass
              through a state where all memory accesses to user-space
              addresses match program order between entry to and return from
              the membarrier() system call. All threads on the system are
              targeted by this command.

       MEMBARRIER_CMD_GLOBAL_EXPEDITED (since Linux 4.16)
              Execute a memory barrier on all running threads of all
              processes that previously registered with
              MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.

              Upon return from the system call, the calling thread has a
              guarantee that all running threads have passed through a state
              where all memory accesses to user-space addresses match program
              order between entry to and return from the system call
              (non-running threads are de facto in such a state). This
              guarantee is provided only for the threads of processes that
              previously registered with
              MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.

              Given that registration is about the intent to receive the
              barriers, it is valid to invoke MEMBARRIER_CMD_GLOBAL_EXPEDITED
              from a process that has not employed
              MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.

              The "expedited" commands complete faster than the
              non-expedited ones; they never block, but have the downside of
              causing extra overhead.

       MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED (since Linux 4.16)
              Register the process's intent to receive
              MEMBARRIER_CMD_GLOBAL_EXPEDITED memory barriers.
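
              Registering and issuing are separate roles. Each process whose
              threads run the fast side registers once. The slow side may
              issue the barrier from any process. The sketch below shows the
              two roles; both helper names are hypothetical:

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

static int
membarrier(int cmd, int flags)
{
    return syscall(__NR_membarrier, cmd, flags);
}

/*
 * Receiver role (hypothetical name): called once, early, by each process
 * whose threads execute the fast side with only compiler barriers.
 * Returns 0 on success, -1 with errno set otherwise.
 */
static int
global_expedited_receiver_init(void)
{
    return membarrier(MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, 0);
}

/*
 * Issuer role (hypothetical name): the slow side. The calling process
 * need not be registered; only registered processes are covered.
 */
static int
global_expedited_barrier(void)
{
    return membarrier(MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0);
}
```

              MEMBARRIER_CMD_GLOBAL_EXPEDITED does not cover the threads of a
              process that skipped the registration. Their fast side would
              therefore still need real memory barriers.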

       MEMBARRIER_CMD_PRIVATE_EXPEDITED (since Linux 4.14)
              Execute a memory barrier on each running thread belonging to
              the same process as the calling thread.

              Upon return from the system call, the calling thread has a
              guarantee that all its running thread siblings have passed
              through a state where all memory accesses to user-space
              addresses match program order between entry to and return from
              the system call (non-running threads are de facto in such a
              state). This guarantee is provided only for threads in the same
              process as the calling thread.

              The "expedited" commands complete faster than the
              non-expedited ones; they never block, but have the downside of
              causing extra overhead.

              A process must register its intent to use the private
              expedited command prior to using it; otherwise the command
              fails with EPERM.

       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED (since Linux 4.14)
              Register the process's intent to use
              MEMBARRIER_CMD_PRIVATE_EXPEDITED.
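
              Registration must precede use. A program would therefore
              normally register once during initialization, before any thread
              relies on the slow side. The sketch below makes that assumption
              and uses hypothetical helper names:

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

static int
membarrier(int cmd, int flags)
{
    return syscall(__NR_membarrier, cmd, flags);
}

/*
 * Hypothetical init step: register once, for example from main() before
 * starting worker threads. Fails with EINVAL if the kernel does not know
 * the command, or with ENOSYS if it lacks membarrier() entirely.
 */
static int
private_expedited_init(void)
{
    return membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);
}

/*
 * Slow-side barrier covering only the calling process's threads.
 * Without the registration above, this call fails with EPERM.
 */
static int
private_expedited_barrier(void)
{
    return membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
}
```

              Calling private_expedited_barrier() before
              private_expedited_init() would fail with EPERM.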

       MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE (since Linux 4.16)
              In addition to providing the memory ordering guarantees
              described in MEMBARRIER_CMD_PRIVATE_EXPEDITED, upon return from
              the system call the calling thread has a guarantee that all its
              running thread siblings have executed a core serializing
              instruction. This guarantee is provided only for threads in the
              same process as the calling thread.

              The "expedited" commands complete faster than the
              non-expedited ones; they never block, but have the downside of
              causing extra overhead.

              A process must register its intent to use the private
              expedited sync core command prior to using it.

       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE (since Linux 4.16)
              Register the process's intent to use
              MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.
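
              A core serializing instruction typically matters when one
              thread writes instructions that sibling threads later execute,
              as in a just-in-time (JIT) compiler. The sketch below assumes a
              hypothetical JIT that has already written new machine code (and
              performed any instruction-cache maintenance its architecture
              requires) and now publishes a pointer to it. The names jit_fn,
              current_fn, jit_membarrier_init(), and jit_publish() are
              illustrative only:

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

static int
membarrier(int cmd, int flags)
{
    return syscall(__NR_membarrier, cmd, flags);
}

typedef int (*jit_fn)(void);

/* Hypothetical entry point that other threads load and call. */
static _Atomic(jit_fn) current_fn;

/* Register once at startup; returns 0 on success, -1 if unavailable. */
static int
jit_membarrier_init(void)
{
    int mask = membarrier(MEMBARRIER_CMD_QUERY, 0);

    if (mask < 0 || !(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE))
        return -1;      /* e.g., not implemented by this architecture */
    return membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}

/*
 * Publish code already written at fn. The sync-core barrier makes every
 * running sibling thread execute a core serializing instruction before
 * the new entry point is stored, so a thread that later loads it does
 * not run stale instructions from a reused code buffer.
 */
static int
jit_publish(jit_fn fn)
{
    if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0) < 0)
        return -1;
    atomic_store_explicit(&current_fn, fn, memory_order_release);
    return 0;
}
```

              If jit_membarrier_init() fails, the JIT needs some other way to
              make sibling threads serialize before they run modified code.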

       MEMBARRIER_CMD_SHARED (since Linux 4.3)
              This is an alias for MEMBARRIER_CMD_GLOBAL that exists for
              header backward compatibility.

       The flags argument is currently unused and must be specified as 0.

       All memory accesses performed in program order from each targeted
       thread are guaranteed to be ordered with respect to membarrier().

       Let barrier() denote a compiler barrier, which forces memory accesses
       to be performed in program order across the barrier. Let smp_mb()
       denote an explicit memory barrier, which forces full memory ordering
       across the barrier. The following table gives the ordering for each
       pairing of barrier(), smp_mb(), and membarrier() (O: ordered, X: not
       ordered):

                            barrier()   smp_mb()   membarrier()
              barrier()        X           X           O
              smp_mb()         X           O           O
              membarrier()     O           O           O

RETURN VALUE
       On success, the MEMBARRIER_CMD_QUERY operation returns a bit mask of
       supported commands, and the MEMBARRIER_CMD_GLOBAL,
       MEMBARRIER_CMD_GLOBAL_EXPEDITED,
       MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED,
       MEMBARRIER_CMD_PRIVATE_EXPEDITED,
       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED,
       MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, and
       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE operations return
       zero. On error, -1 is returned, and errno is set to indicate the
       error.

       For a given command, with flags set to 0, this system call is
       guaranteed to always return the same value until reboot: further
       calls with the same arguments lead to the same result. Therefore,
       with flags set to 0, error handling is required only for the first
       call to membarrier().
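
       Because the result does not change, a program can probe once at
       startup and then call membarrier() without checking its return value.
       The sketch below takes that approach. It also assumes a fallback in
       which both sides use full fences (here the GCC/Clang builtin
       __atomic_thread_fence()) when MEMBARRIER_CMD_GLOBAL is unavailable.
       A slow-side fence alone cannot pair with a fast side that has only a
       compiler barrier. The names use_membarrier, barriers_init(),
       fast_side_barrier(), and slow_side_barrier() are hypothetical:

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

static int
membarrier(int cmd, int flags)
{
    return syscall(__NR_membarrier, cmd, flags);
}

/* Decided once in barriers_init(), read-only afterwards. */
static int use_membarrier;

/* Call once, before any thread runs the fast or slow side. */
static void
barriers_init(void)
{
    /* Only this first call needs its result checked. */
    use_membarrier = (membarrier(MEMBARRIER_CMD_GLOBAL, 0) == 0);
}

static inline void
fast_side_barrier(void)
{
    if (use_membarrier)
        __asm__ __volatile__ ("" : : : "memory");   /* compiler barrier */
    else
        __atomic_thread_fence(__ATOMIC_SEQ_CST);    /* full fence */
}

static inline void
slow_side_barrier(void)
{
    if (use_membarrier)
        (void) membarrier(MEMBARRIER_CMD_GLOBAL, 0);  /* checked at init */
    else
        __atomic_thread_fence(__ATOMIC_SEQ_CST);
}
```

       The choice is made once, so the fast side pays only for a
       well-predicted branch in addition to the compiler barrier.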

ERRORS
       EINVAL cmd is invalid, or flags is nonzero, or the
              MEMBARRIER_CMD_GLOBAL command is disabled because the nohz_full
              CPU parameter has been set, or the
              MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE and
              MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE commands are
              not implemented by the architecture.

       ENOSYS The membarrier() system call is not implemented by this kernel.

       EPERM  The current process was not registered prior to using private
              expedited commands.
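
       These three errors have distinct causes, so a caller may want to
       report them differently. A minimal sketch follows. The helpers
       membarrier_error_hint() and membarrier_checked() are hypothetical:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

static int
membarrier(int cmd, int flags)
{
    return syscall(__NR_membarrier, cmd, flags);
}

/* Map an errno value from membarrier() to a short explanation. */
static const char *
membarrier_error_hint(int err)
{
    switch (err) {
    case EINVAL:
        return "invalid cmd, nonzero flags, or command disabled/unimplemented";
    case ENOSYS:
        return "membarrier() not implemented by this kernel";
    case EPERM:
        return "process not registered for this private expedited command";
    default:
        return strerror(err);
    }
}

/* Issue cmd with flags == 0; on failure, print a diagnostic. */
static int
membarrier_checked(int cmd)
{
    int ret = membarrier(cmd, 0);

    if (ret < 0) {
        int err = errno;            /* fprintf() may clobber errno */

        fprintf(stderr, "membarrier(%d): %s\n", cmd,
                membarrier_error_hint(err));
        errno = err;
    }
    return ret;
}
```

       membarrier_checked() returns the raw result, so it also works for
       MEMBARRIER_CMD_QUERY, whose successful return value is a bit mask.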

VERSIONS
       The membarrier() system call was added in Linux 4.3.

CONFORMING TO
       membarrier() is Linux-specific.

NOTES
       A memory barrier instruction is part of the instruction set of
       architectures with weakly ordered memory models. It orders memory
       accesses prior to the barrier and after the barrier with respect to
       matching barriers on other cores. For instance, a load fence can order
       loads prior to and following that fence with respect to stores ordered
       by store fences.

       Program order is the order in which instructions appear in the
       program's assembly code.

       Examples where membarrier() can be useful include implementations of
       Read-Copy-Update libraries and garbage collectors.

EXAMPLES
       Assuming a multithreaded application where "fast_path()" is executed
       very frequently, and where "slow_path()" is executed infrequently, the
       following code (x86) can be transformed using membarrier():

           #include <stdlib.h>

           static volatile int a, b;

           static void
           fast_path(int *read_b)
           {
               a = 1;
               asm volatile ("mfence" : : : "memory");
               *read_b = b;
           }

           static void
           slow_path(int *read_a)
           {
               b = 1;
               asm volatile ("mfence" : : : "memory");
               *read_a = a;
           }

           int
           main(int argc, char **argv)
           {
               int read_a, read_b;

               /*
                * Real applications would call fast_path() and slow_path()
                * from different threads. Call those from main() to keep
                * this example short.
                */

               slow_path(&read_a);
               fast_path(&read_b);

               /*
                * read_b == 0 implies read_a == 1 and
                * read_a == 0 implies read_b == 1.
                */

               if (read_b == 0 && read_a == 0)
                   abort();

               exit(EXIT_SUCCESS);
           }

       The code above transformed to use membarrier() becomes:

           #define _GNU_SOURCE
           #include <stdlib.h>
           #include <stdio.h>
           #include <unistd.h>
           #include <sys/syscall.h>
           #include <linux/membarrier.h>

           static volatile int a, b;

           static int
           membarrier(int cmd, int flags)
           {
               return syscall(__NR_membarrier, cmd, flags);
           }

           static int
           init_membarrier(void)
           {
               int ret;

               /* Check that membarrier() is supported. */

               ret = membarrier(MEMBARRIER_CMD_QUERY, 0);
               if (ret < 0) {
                   perror("membarrier");
                   return -1;
               }

               if (!(ret & MEMBARRIER_CMD_GLOBAL)) {
                   fprintf(stderr,
                       "membarrier does not support MEMBARRIER_CMD_GLOBAL\n");
                   return -1;
               }

               return 0;
           }

           static void
           fast_path(int *read_b)
           {
               a = 1;
               asm volatile ("" : : : "memory");
               *read_b = b;
           }

           static void
           slow_path(int *read_a)
           {
               b = 1;
               membarrier(MEMBARRIER_CMD_GLOBAL, 0);
               *read_a = a;
           }

           int
           main(int argc, char **argv)
           {
               int read_a, read_b;

               if (init_membarrier())
                   exit(EXIT_FAILURE);

               /*
                * Real applications would call fast_path() and slow_path()
                * from different threads. Call those from main() to keep
                * this example short.
                */

               slow_path(&read_a);
               fast_path(&read_b);

               /*
                * read_b == 0 implies read_a == 1 and
                * read_a == 0 implies read_b == 1.
                */

               if (read_b == 0 && read_a == 0)
                   abort();

               exit(EXIT_SUCCESS);
           }

COLOPHON
       This page is part of release 5.07 of the Linux man-pages project. A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2020-06-09                    MEMBARRIER(2)