memfd_create(2)               System Calls Manual              memfd_create(2)

NAME

       memfd_create - create an anonymous file

LIBRARY

       Standard C library (libc, -lc)

SYNOPSIS

       #define _GNU_SOURCE         /* See feature_test_macros(7) */
       #include <sys/mman.h>

       int memfd_create(const char *name, unsigned int flags);

DESCRIPTION

       memfd_create() creates an anonymous file and returns a file descriptor
       that refers to it.  The file behaves like a regular file, and so can be
       modified, truncated, memory-mapped, and so on.  However, unlike a
       regular file, it lives in RAM and has volatile backing storage.  Once
       all references to the file are dropped, it is automatically released.
       Anonymous memory is used for all backing pages of the file.  Therefore,
       files created by memfd_create() have the same semantics as other
       anonymous memory allocations such as those allocated using mmap(2)
       with the MAP_ANONYMOUS flag.

       The initial size of the file is set to 0.  Following the call, the file
       size should be set using ftruncate(2).  (Alternatively, the file may be
       populated by calls to write(2) or similar.)

       The name supplied in name is used as a filename and will be displayed
       as the target of the corresponding symbolic link in the directory
       /proc/self/fd/.  The displayed name is always prefixed with memfd: and
       serves only for debugging purposes.  Names do not affect the behavior
       of the file descriptor, and as such multiple files can have the same
       name without any side effects.

       The following values may be bitwise ORed in flags to change the
       behavior of memfd_create():

       MFD_CLOEXEC
              Set the close-on-exec (FD_CLOEXEC) flag on the new file
              descriptor.  See the description of the O_CLOEXEC flag in
              open(2) for reasons why this may be useful.

       MFD_ALLOW_SEALING
              Allow sealing operations on this file.  See the discussion of
              the F_ADD_SEALS and F_GET_SEALS operations in fcntl(2), and also
              NOTES, below.  The initial set of seals is empty.  If this flag
              is not set, the initial set of seals will be F_SEAL_SEAL,
              meaning that no other seals can be set on the file.

       MFD_HUGETLB (since Linux 4.14)
              The anonymous file will be created in the hugetlbfs filesystem
              using huge pages.  See the Linux kernel source file
              Documentation/admin-guide/mm/hugetlbpage.rst for more
              information about hugetlbfs.  Specifying both MFD_HUGETLB and
              MFD_ALLOW_SEALING in flags is supported since Linux 4.16.

       MFD_HUGE_2MB, MFD_HUGE_1GB, ...
              Used in conjunction with MFD_HUGETLB to select alternative
              hugetlb page sizes (respectively, 2 MB, 1 GB, ...) on systems
              that support multiple hugetlb page sizes.  Definitions for known
              huge page sizes are included in the header file <linux/memfd.h>.

              For details on encoding huge page sizes not included in the
              header file, see the discussion of the similarly named constants
              in mmap(2).

       Unused bits in flags must be 0.

       As its return value, memfd_create() returns a new file descriptor that
       can be used to refer to the file.  This file descriptor is opened for
       both reading and writing (O_RDWR) and O_LARGEFILE is set for the file
       descriptor.

       With respect to fork(2) and execve(2), the usual semantics apply for
       the file descriptor created by memfd_create().  A copy of the file
       descriptor is inherited by the child produced by fork(2) and refers to
       the same file.  The file descriptor is preserved across execve(2),
       unless the close-on-exec flag has been set.

RETURN VALUE

       On success, memfd_create() returns a new file descriptor.  On error, -1
       is returned and errno is set to indicate the error.

ERRORS

       EFAULT The address in name points to invalid memory.

       EINVAL flags included unknown bits.

       EINVAL name was too long.  (The limit is 249 bytes, excluding the
              terminating null byte.)

       EINVAL Both MFD_HUGETLB and MFD_ALLOW_SEALING were specified in flags
              (before Linux 4.16).

       EMFILE The per-process limit on the number of open file descriptors has
              been reached.

       ENFILE The system-wide limit on the total number of open files has been
              reached.

       ENOMEM There was insufficient memory to create a new anonymous file.

       EPERM  The MFD_HUGETLB flag was specified, but the caller was not
              privileged (did not have the CAP_IPC_LOCK capability) and is not
              a member of the sysctl_hugetlb_shm_group group; see the
              description of /proc/sys/vm/sysctl_hugetlb_shm_group in proc(5).

STANDARDS

       Linux.

HISTORY

       Linux 3.17, glibc 2.27.

NOTES

       The memfd_create() system call provides a simple alternative to
       manually mounting a tmpfs(5) filesystem and creating and opening a file
       in that filesystem.  The primary purpose of memfd_create() is to create
       files and associated file descriptors that are used with the
       file-sealing APIs provided by fcntl(2).

       The memfd_create() system call also has uses without file sealing
       (which is why file sealing is disabled, unless explicitly requested
       with the MFD_ALLOW_SEALING flag).  In particular, it can be used as an
       alternative to creating files in /tmp or as an alternative to using the
       open(2) O_TMPFILE flag in cases where there is no intention to actually
       link the resulting file into the filesystem.

   File sealing
       In the absence of file sealing, processes that communicate via shared
       memory must either trust each other, or take measures to deal with the
       possibility that an untrusted peer may manipulate the shared memory
       region in problematic ways.  For example, an untrusted peer might
       modify the contents of the shared memory at any time, or shrink the
       shared memory region.  The former possibility leaves the local process
       vulnerable to time-of-check-to-time-of-use race conditions (typically
       dealt with by copying data from the shared memory region before
       checking and using it).  The latter possibility leaves the local
       process vulnerable to SIGBUS signals when an attempt is made to access
       a now-nonexistent location in the shared memory region.  (Dealing with
       this possibility necessitates the use of a handler for the SIGBUS
       signal.)

       Dealing with untrusted peers imposes extra complexity on code that
       employs shared memory.  File sealing enables that extra complexity to
       be eliminated, by allowing a process to operate secure in the knowledge
       that its peer can't modify the shared memory in an undesired fashion.

       An example of the usage of the sealing mechanism is as follows:

       (1)  The first process creates a tmpfs(5) file using memfd_create().
            The call yields a file descriptor used in subsequent steps.

       (2)  The first process sizes the file created in the previous step
            using ftruncate(2), maps it using mmap(2), and populates the
            shared memory with the desired data.

       (3)  The first process uses the fcntl(2) F_ADD_SEALS operation to place
            one or more seals on the file, in order to restrict further
            modifications on the file.  (If placing the seal F_SEAL_WRITE,
            then it will be necessary to first unmap the shared writable
            mapping created in the previous step.  Otherwise, behavior similar
            to F_SEAL_WRITE can be achieved by using F_SEAL_FUTURE_WRITE,
            which will prevent future writes via mmap(2) and write(2) from
            succeeding while keeping existing shared writable mappings.)

       (4)  A second process obtains a file descriptor for the tmpfs(5) file
            and maps it.  Among the possible ways in which this could happen
            are the following:

            •  The process that called memfd_create() could transfer the
               resulting file descriptor to the second process via a UNIX
               domain socket (see unix(7) and cmsg(3)).  The second process
               then maps the file using mmap(2).

            •  The second process is created via fork(2) and thus
               automatically inherits the file descriptor and mapping.  (Note
               that in this case and the next, there is a natural trust
               relationship between the two processes, since they are running
               under the same user ID.  Therefore, file sealing would not
               normally be necessary.)

            •  The second process opens the file /proc/<pid>/fd/<fd>, where
               <pid> is the PID of the first process (the one that called
               memfd_create()), and <fd> is the number of the file descriptor
               returned by the call to memfd_create() in that process.  The
               second process then maps the file using mmap(2).

       (5)  The second process uses the fcntl(2) F_GET_SEALS operation to
            retrieve the bit mask of seals that have been applied to the file.
            This bit mask can be inspected in order to determine what kinds of
            restrictions have been placed on file modifications.  If desired,
            the second process can apply further seals to impose additional
            restrictions (so long as the F_SEAL_SEAL seal has not yet been
            applied).

EXAMPLES

       Below are shown two example programs that demonstrate the use of
       memfd_create() and the file sealing API.

       The first program, t_memfd_create.c, creates a tmpfs(5) file using
       memfd_create(), sets a size for the file, and optionally places some
       seals on the file (the code that would map the file into memory and
       populate it is omitted).  The program accepts up to three command-line
       arguments, of which the first two are required.  The first argument is
       the name to associate with the file, the second argument is the size
       to be set for the file, and the optional third argument is a string of
       characters that specify seals to be set on the file.

       The second program, t_get_seals.c, can be used to open an existing file
       that was created via memfd_create() and inspect the set of seals that
       have been applied to that file.

       The following shell session demonstrates the use of these programs.
       First we create a tmpfs(5) file and set some seals on it:

           $ ./t_memfd_create my_memfd_file 4096 sw &
           [1] 11775
           PID: 11775; fd: 3; /proc/11775/fd/3

       At this point, the t_memfd_create program continues to run in the
       background.  From another program, we can obtain a file descriptor for
       the file created by memfd_create() by opening the /proc/<pid>/fd/<fd>
       file that corresponds to the file descriptor opened by memfd_create().
       Using that pathname, we inspect the content of the symbolic link, and
       use our t_get_seals program to view the seals that have been placed on
       the file:

           $ readlink /proc/11775/fd/3
           /memfd:my_memfd_file (deleted)
           $ ./t_get_seals /proc/11775/fd/3
           Existing seals: WRITE SHRINK

   Program source: t_memfd_create.c

       #define _GNU_SOURCE
       #include <err.h>
       #include <fcntl.h>
       #include <stdint.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/mman.h>
       #include <unistd.h>

       int
       main(int argc, char *argv[])
       {
           int           fd;
           char          *name, *seals_arg, *end;
           off_t         len;
           unsigned int  seals;

           if (argc < 3) {
               fprintf(stderr, "%s name size [seals]\n", argv[0]);
               fprintf(stderr, "\t'seals' can contain any of the "
                       "following characters:\n");
               fprintf(stderr, "\t\tg - F_SEAL_GROW\n");
               fprintf(stderr, "\t\ts - F_SEAL_SHRINK\n");
               fprintf(stderr, "\t\tw - F_SEAL_WRITE\n");
               fprintf(stderr, "\t\tW - F_SEAL_FUTURE_WRITE\n");
               fprintf(stderr, "\t\tS - F_SEAL_SEAL\n");
               exit(EXIT_FAILURE);
           }

           name = argv[1];

           /* Parse the size strictly; atoi() silently accepts garbage
              and cannot represent sizes larger than INT_MAX. */

           len = strtoll(argv[2], &end, 0);
           if (end == argv[2] || *end != '\0' || len < 0)
               errx(EXIT_FAILURE, "invalid size: %s", argv[2]);

           seals_arg = argv[3];        /* NULL if no 'seals' argument */

           /* Create an anonymous file in tmpfs; allow seals to be
              placed on the file. */

           fd = memfd_create(name, MFD_ALLOW_SEALING);
           if (fd == -1)
               err(EXIT_FAILURE, "memfd_create");

           /* Size the file as specified on the command line. */

           if (ftruncate(fd, len) == -1)
               err(EXIT_FAILURE, "ftruncate");

           printf("PID: %jd; fd: %d; /proc/%jd/fd/%d\n",
                  (intmax_t) getpid(), fd, (intmax_t) getpid(), fd);

           /* Code to map the file and populate the mapping with data
              omitted. */

           /* If a 'seals' command-line argument was supplied, set some
              seals on the file. */

           if (seals_arg != NULL) {
               seals = 0;

               if (strchr(seals_arg, 'g') != NULL)
                   seals |= F_SEAL_GROW;
               if (strchr(seals_arg, 's') != NULL)
                   seals |= F_SEAL_SHRINK;
               if (strchr(seals_arg, 'w') != NULL)
                   seals |= F_SEAL_WRITE;
               if (strchr(seals_arg, 'W') != NULL)
                   seals |= F_SEAL_FUTURE_WRITE;
               if (strchr(seals_arg, 'S') != NULL)
                   seals |= F_SEAL_SEAL;

               if (fcntl(fd, F_ADD_SEALS, seals) == -1)
                   err(EXIT_FAILURE, "fcntl");
           }

           /* Keep running, so that the file created by memfd_create()
              continues to exist. */

           pause();

           exit(EXIT_SUCCESS);
       }

   Program source: t_get_seals.c

       #define _GNU_SOURCE
       #include <err.h>
       #include <fcntl.h>
       #include <stdio.h>
       #include <stdlib.h>

       int
       main(int argc, char *argv[])
       {
           int  fd;
           int  seals;     /* int, so that the -1 error return can be tested */

           if (argc != 2) {
               fprintf(stderr, "%s /proc/PID/fd/FD\n", argv[0]);
               exit(EXIT_FAILURE);
           }

           fd = open(argv[1], O_RDWR);
           if (fd == -1)
               err(EXIT_FAILURE, "open");

           seals = fcntl(fd, F_GET_SEALS);
           if (seals == -1)
               err(EXIT_FAILURE, "fcntl");

           printf("Existing seals:");
           if (seals & F_SEAL_SEAL)
               printf(" SEAL");
           if (seals & F_SEAL_GROW)
               printf(" GROW");
           if (seals & F_SEAL_WRITE)
               printf(" WRITE");
           if (seals & F_SEAL_FUTURE_WRITE)
               printf(" FUTURE_WRITE");
           if (seals & F_SEAL_SHRINK)
               printf(" SHRINK");
           printf("\n");

           /* Code to map the file and access the contents of the
              resulting mapping omitted. */

           exit(EXIT_SUCCESS);
       }

SEE ALSO

       fcntl(2), ftruncate(2), memfd_secret(2), mmap(2), shmget(2),
       shm_open(3)

Linux man-pages 6.05              2023-05-03                   memfd_create(2)