memfd_create(2)            System Calls Manual            memfd_create(2)

NAME
memfd_create - create an anonymous file

LIBRARY
Standard C library (libc, -lc)

SYNOPSIS
    #define _GNU_SOURCE         /* See feature_test_macros(7) */
    #include <sys/mman.h>

    int memfd_create(const char *name, unsigned int flags);

DESCRIPTION
memfd_create() creates an anonymous file and returns a file descriptor
that refers to it. The file behaves like a regular file, and so can be
modified, truncated, memory-mapped, and so on. However, unlike a
regular file, it lives in RAM and has volatile backing storage. Once
all references to the file are dropped, it is automatically released.
Anonymous memory is used for all backing pages of the file. Therefore,
files created by memfd_create() have the same semantics as other
anonymous memory allocations such as those allocated using mmap(2) with
the MAP_ANONYMOUS flag.

The initial size of the file is set to 0. Following the call, the file
size should be set using ftruncate(2). (Alternatively, the file may be
populated by calls to write(2) or similar.)

The name supplied in name is used as a filename and will be displayed
as the target of the corresponding symbolic link in the directory
/proc/self/fd/. The displayed name is always prefixed with memfd: and
serves only for debugging purposes. Names do not affect the behavior
of the file descriptor, and as such multiple files can have the same
name without any side effects.

The following values may be bitwise ORed in flags to change the
behavior of memfd_create():

MFD_CLOEXEC
    Set the close-on-exec (FD_CLOEXEC) flag on the new file
    descriptor. See the description of the O_CLOEXEC flag in open(2)
    for reasons why this may be useful.

MFD_ALLOW_SEALING
    Allow sealing operations on this file. See the discussion of
    the F_ADD_SEALS and F_GET_SEALS operations in fcntl(2), and also
    NOTES, below. The initial set of seals is empty. If this flag
    is not set, the initial set of seals will be F_SEAL_SEAL,
    meaning that no other seals can be set on the file.

MFD_HUGETLB (since Linux 4.14)
    The anonymous file will be created in the hugetlbfs filesystem
    using huge pages. See the Linux kernel source file
    Documentation/admin-guide/mm/hugetlbpage.rst for more information
    about hugetlbfs. Specifying both MFD_HUGETLB and
    MFD_ALLOW_SEALING in flags is supported since Linux 4.16.

MFD_HUGE_2MB, MFD_HUGE_1GB, ...
    Used in conjunction with MFD_HUGETLB to select alternative
    hugetlb page sizes (respectively, 2 MB, 1 GB, ...) on systems
    that support multiple hugetlb page sizes. Definitions for known
    huge page sizes are included in the header file <linux/memfd.h>.

    For details on encoding huge page sizes not included in the
    header file, see the discussion of the similarly named constants
    in mmap(2).

Unused bits in flags must be 0.

As its return value, memfd_create() returns a new file descriptor that
can be used to refer to the file. This file descriptor is opened for
both reading and writing (O_RDWR) and O_LARGEFILE is set for the file
descriptor.

With respect to fork(2) and execve(2), the usual semantics apply for
the file descriptor created by memfd_create(). A copy of the file
descriptor is inherited by the child produced by fork(2) and refers to
the same file. The file descriptor is preserved across execve(2),
unless the close-on-exec flag has been set.

RETURN VALUE
On success, memfd_create() returns a new file descriptor. On error, -1
is returned and errno is set to indicate the error.

ERRORS
EFAULT The address in name points to invalid memory.

EINVAL flags included unknown bits.

EINVAL name was too long. (The limit is 249 bytes, excluding the
       terminating null byte.)

EINVAL Both MFD_HUGETLB and MFD_ALLOW_SEALING were specified in flags
       (before Linux 4.16).

EMFILE The per-process limit on the number of open file descriptors has
       been reached.

ENFILE The system-wide limit on the total number of open files has been
       reached.

ENOMEM There was insufficient memory to create a new anonymous file.

EPERM  The MFD_HUGETLB flag was specified, but the caller was not
       privileged (did not have the CAP_IPC_LOCK capability) and was
       not a member of the sysctl_hugetlb_shm_group group; see the
       description of /proc/sys/vm/sysctl_hugetlb_shm_group in proc(5).

STANDARDS
Linux.

HISTORY
Linux 3.17, glibc 2.27.

NOTES
The memfd_create() system call provides a simple alternative to
manually mounting a tmpfs(5) filesystem and creating and opening a file
in that filesystem. The primary purpose of memfd_create() is to create
files and associated file descriptors that are used with the
file-sealing APIs provided by fcntl(2).

The memfd_create() system call also has uses without file sealing
(which is why file sealing is disabled, unless explicitly requested
with the MFD_ALLOW_SEALING flag). In particular, it can be used as an
alternative to creating files in /tmp or as an alternative to using the
open(2) O_TMPFILE flag in cases where there is no intention to actually
link the resulting file into the filesystem.

File sealing
In the absence of file sealing, processes that communicate via shared
memory must either trust each other, or take measures to deal with the
possibility that an untrusted peer may manipulate the shared memory
region in problematic ways. For example, an untrusted peer might
modify the contents of the shared memory at any time, or shrink the
shared memory region. The former possibility leaves the local process
vulnerable to time-of-check-to-time-of-use race conditions (typically
dealt with by copying data from the shared memory region before
checking and using it). The latter possibility leaves the local
process vulnerable to SIGBUS signals when an attempt is made to access
a now-nonexistent location in the shared memory region. (Dealing with
this possibility necessitates the use of a handler for the SIGBUS
signal.)

Dealing with untrusted peers imposes extra complexity on code that
employs shared memory. File sealing enables that extra complexity to
be eliminated, by allowing a process to operate secure in the knowledge
that its peer can't modify the shared memory in an undesired fashion.

An example of the usage of the sealing mechanism is as follows:

(1) The first process creates a tmpfs(5) file using memfd_create().
    The call yields a file descriptor used in subsequent steps.

(2) The first process sizes the file created in the previous step
    using ftruncate(2), maps it using mmap(2), and populates the shared
    memory with the desired data.

(3) The first process uses the fcntl(2) F_ADD_SEALS operation to place
    one or more seals on the file, in order to restrict further
    modifications on the file. (If the F_SEAL_WRITE seal is to be
    placed, then it will be necessary to first unmap the shared
    writable mapping created in the previous step. Alternatively,
    behavior similar to F_SEAL_WRITE can be achieved by using
    F_SEAL_FUTURE_WRITE, which will prevent future writes via mmap(2)
    and write(2) from succeeding while keeping existing shared
    writable mappings.)

(4) A second process obtains a file descriptor for the tmpfs(5) file
    and maps it. Among the possible ways in which this could happen
    are the following:

    • The process that called memfd_create() could transfer the
      resulting file descriptor to the second process via a UNIX
      domain socket (see unix(7) and cmsg(3)). The second process
      then maps the file using mmap(2).

    • The second process is created via fork(2) and thus automatically
      inherits the file descriptor and mapping. (Note that in this
      case and the next, there is a natural trust relationship between
      the two processes, since they are running under the same user
      ID. Therefore, file sealing would not normally be necessary.)

    • The second process opens the file /proc/<pid>/fd/<fd>, where
      <pid> is the PID of the first process (the one that called
      memfd_create()), and <fd> is the number of the file descriptor
      returned by the call to memfd_create() in that process. The
      second process then maps the file using mmap(2).

(5) The second process uses the fcntl(2) F_GET_SEALS operation to
    retrieve the bit mask of seals that has been applied to the file.
    This bit mask can be inspected in order to determine what kinds of
    restrictions have been placed on file modifications. If desired,
    the second process can apply further seals to impose additional
    restrictions (so long as the F_SEAL_SEAL seal has not yet been
    applied).

EXAMPLES
The two example programs below demonstrate the use of memfd_create()
and the file sealing API.

The first program, t_memfd_create.c, creates a tmpfs(5) file using
memfd_create(), sets a size for the file, maps it into memory, and
optionally places some seals on the file. The program accepts up to
three command-line arguments, of which the first two are required. The
first argument is the name to associate with the file, the second
argument is the size to be set for the file, and the optional third
argument is a string of characters that specify seals to be set on the
file.

The second program, t_get_seals.c, can be used to open an existing file
that was created via memfd_create() and inspect the set of seals that
have been applied to that file.

The following shell session demonstrates the use of these programs.
First we create a tmpfs(5) file and set some seals on it:

    $ ./t_memfd_create my_memfd_file 4096 sw &
    [1] 11775
    PID: 11775; fd: 3; /proc/11775/fd/3

At this point, the t_memfd_create program continues to run in the
background. From another program, we can obtain a file descriptor for
the file created by memfd_create() by opening the /proc/<pid>/fd/<fd>
file that corresponds to the file descriptor opened by memfd_create().
Using that pathname, we inspect the content of the /proc/<pid>/fd/<fd>
symbolic link, and use our t_get_seals program to view the seals that
have been placed on the file:

    $ readlink /proc/11775/fd/3
    /memfd:my_memfd_file (deleted)
    $ ./t_get_seals /proc/11775/fd/3
    Existing seals: WRITE SHRINK

Program source: t_memfd_create.c

    #define _GNU_SOURCE
    #include <err.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int
    main(int argc, char *argv[])
    {
        int           fd;
        char          *name, *seals_arg;
        ssize_t       len;
        unsigned int  seals;

        if (argc < 3) {
            fprintf(stderr, "%s name size [seals]\n", argv[0]);
            fprintf(stderr, "\t'seals' can contain any of the "
                    "following characters:\n");
            fprintf(stderr, "\t\tg - F_SEAL_GROW\n");
            fprintf(stderr, "\t\ts - F_SEAL_SHRINK\n");
            fprintf(stderr, "\t\tw - F_SEAL_WRITE\n");
            fprintf(stderr, "\t\tW - F_SEAL_FUTURE_WRITE\n");
            fprintf(stderr, "\t\tS - F_SEAL_SEAL\n");
            exit(EXIT_FAILURE);
        }

        name = argv[1];
        len = atoi(argv[2]);
        seals_arg = argv[3];    /* NULL if absent: argv[argc] is NULL */

        /* Create an anonymous file in tmpfs; allow seals to be
           placed on the file. */

        fd = memfd_create(name, MFD_ALLOW_SEALING);
        if (fd == -1)
            err(EXIT_FAILURE, "memfd_create");

        /* Size the file as specified on the command line. */

        if (ftruncate(fd, len) == -1)
            err(EXIT_FAILURE, "ftruncate");

        printf("PID: %jd; fd: %d; /proc/%jd/fd/%d\n",
               (intmax_t) getpid(), fd, (intmax_t) getpid(), fd);

        /* Code to map the file and populate the mapping with data
           omitted. */

        /* If a 'seals' command-line argument was supplied, set some
           seals on the file. */

        if (seals_arg != NULL) {
            seals = 0;

            if (strchr(seals_arg, 'g') != NULL)
                seals |= F_SEAL_GROW;
            if (strchr(seals_arg, 's') != NULL)
                seals |= F_SEAL_SHRINK;
            if (strchr(seals_arg, 'w') != NULL)
                seals |= F_SEAL_WRITE;
            if (strchr(seals_arg, 'W') != NULL)
                seals |= F_SEAL_FUTURE_WRITE;
            if (strchr(seals_arg, 'S') != NULL)
                seals |= F_SEAL_SEAL;

            if (fcntl(fd, F_ADD_SEALS, seals) == -1)
                err(EXIT_FAILURE, "fcntl");
        }

        /* Keep running, so that the file created by memfd_create()
           continues to exist. */

        pause();

        exit(EXIT_SUCCESS);
    }

Program source: t_get_seals.c

    #define _GNU_SOURCE
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
        int  fd;
        int  seals;     /* Signed, so the -1 error return compares cleanly */

        if (argc != 2) {
            fprintf(stderr, "%s /proc/PID/fd/FD\n", argv[0]);
            exit(EXIT_FAILURE);
        }

        fd = open(argv[1], O_RDWR);
        if (fd == -1)
            err(EXIT_FAILURE, "open");

        /* Retrieve the bit mask of seals applied to the file. */

        seals = fcntl(fd, F_GET_SEALS);
        if (seals == -1)
            err(EXIT_FAILURE, "fcntl");

        printf("Existing seals:");
        if (seals & F_SEAL_SEAL)
            printf(" SEAL");
        if (seals & F_SEAL_GROW)
            printf(" GROW");
        if (seals & F_SEAL_WRITE)
            printf(" WRITE");
        if (seals & F_SEAL_FUTURE_WRITE)
            printf(" FUTURE_WRITE");
        if (seals & F_SEAL_SHRINK)
            printf(" SHRINK");
        printf("\n");

        /* Code to map the file and access the contents of the
           resulting mapping omitted. */

        exit(EXIT_SUCCESS);
    }

SEE ALSO
fcntl(2), ftruncate(2), memfd_secret(2), mmap(2), shmget(2),
shm_open(3)

Linux man-pages 6.04              2023-04-03              memfd_create(2)