memfd_secret(2)            System Calls Manual            memfd_secret(2)

NAME
       memfd_secret - create an anonymous RAM-based file to access secret
       memory regions

LIBRARY
       Standard C library (libc, -lc)

SYNOPSIS
       #include <sys/syscall.h>      /* Definition of SYS_* constants */
       #include <unistd.h>

       int syscall(SYS_memfd_secret, unsigned int flags);

       Note: glibc provides no wrapper for memfd_secret(), necessitating the
       use of syscall(2).

DESCRIPTION
       memfd_secret() creates an anonymous RAM-based file and returns a file
       descriptor that refers to it. The file provides a way to create and
       access memory regions with stronger protection than usual RAM-based
       files and anonymous memory mappings. Once all open references to the
       file are closed, it is automatically released. The initial size of
       the file is set to 0. Following the call, the file size should be set
       using ftruncate(2).
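
       The sequence described above (create the file, size it with
       ftruncate(2), then map it) can be sketched as follows. This is a
       minimal illustration, not a reference implementation: the helper name
       secret_alloc() is hypothetical, the fallback value for
       SYS_memfd_secret is an assumption for systems whose headers predate
       the call, and the use of a shared mapping is likewise an assumption.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447   /* assumption: value for headers lacking it */
#endif

/* Hypothetical helper: map len bytes of secret memory.
 * Returns NULL with errno set by the step that failed. */
static void *secret_alloc(size_t len)
{
    int fd = (int)syscall(SYS_memfd_secret, 0U);
    if (fd == -1)
        return NULL;               /* e.g. ENOSYS when unavailable */

    /* The file starts out with size 0, so size it before mapping. */
    if (ftruncate(fd, (off_t)len) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return NULL;
    }

    /* Assumption: the region is mapped shared, readable and writable. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int saved = errno;
    close(fd);                     /* the mapping still references the file */
    errno = saved;
    return p == MAP_FAILED ? NULL : p;
}
```

       A caller might obtain a buffer with secret_alloc(4096), keep key
       material in it, and release it with munmap(2) when done.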

       The memory areas backing the file created with memfd_secret(2) are
       visible only to the processes that have access to the file descriptor.
       The memory region is removed from the kernel page tables and only the
       page tables of the processes holding the file descriptor map the
       corresponding physical memory. (Thus, the pages in the region can't
       be accessed by the kernel itself, so that, for example, pointers to
       the region can't be passed to system calls.)

       The following values may be bitwise ORed in flags to control the
       behavior of memfd_secret():

       FD_CLOEXEC
              Set the close-on-exec flag on the new file descriptor, which
              causes the region to be removed from the process on execve(2).
              See the description of the O_CLOEXEC flag in open(2).
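
       The close-on-exec flag can also be set or inspected after the
       descriptor exists, using fcntl(2). The sketch below shows that route;
       unlike a creation-time flag it is not atomic with creation, so another
       thread could call execve(2) in between. The helper names are
       hypothetical, and fd stands for a descriptor returned by
       memfd_secret().

```c
#include <fcntl.h>

/* Hypothetical helper: mark an open descriptor close-on-exec.
 * Returns 0 on success, -1 with errno set on failure. */
static int set_cloexec(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);
    if (fdflags == -1)
        return -1;
    return fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC);
}

/* Hypothetical helper: 1 if close-on-exec is set, 0 if it is clear,
 * -1 if the descriptor flags could not be read. */
static int has_cloexec(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);
    return fdflags == -1 ? -1 : (fdflags & FD_CLOEXEC) != 0;
}
```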

       As its return value, memfd_secret() returns a new file descriptor that
       refers to an anonymous file. This file descriptor is opened for both
       reading and writing (O_RDWR) and O_LARGEFILE is set for the file
       descriptor.

       With respect to fork(2) and execve(2), the usual semantics apply for
       the file descriptor created by memfd_secret(). A copy of the file
       descriptor is inherited by the child produced by fork(2) and refers to
       the same file. The file descriptor is preserved across execve(2),
       unless the close-on-exec flag has been set.
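
       The fork(2) behavior can be illustrated with a small sketch in which
       the child maps the inherited descriptor itself and checks that it sees
       the data written by the parent. The function name, the 4096-byte size,
       the shared mappings, and the fallback value for SYS_memfd_secret are
       assumptions made for the example.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447   /* assumption: value for headers lacking it */
#endif

/* Returns 1 if the child read the parent's data through the inherited
 * descriptor, 0 if it did not, -1 if the setup failed. */
static int child_sees_secret(void)
{
    static const char msg[] = "secret";
    int fd = (int)syscall(SYS_memfd_secret, 0U);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, 4096) == -1) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(p, msg, sizeof(msg));

    pid_t pid = fork();
    if (pid == -1) {
        munmap(p, 4096);
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: map the inherited descriptor afresh and compare. */
        char *q = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
        _exit(q != MAP_FAILED && memcmp(q, msg, sizeof(msg)) == 0 ? 0 : 1);
    }

    int status;
    int ok = waitpid(pid, &status, 0) == pid &&
             WIFEXITED(status) && WEXITSTATUS(status) == 0;
    munmap(p, 4096);
    close(fd);
    return ok;
}
```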

       The memory region is locked into memory in the same way as with
       mlock(2), so that it will never be written into swap, and hibernation
       is inhibited for as long as any memfd_secret() descriptions exist.
       However, the implementation of memfd_secret() will not try to populate
       the whole range during the mmap(2) call that attaches the region into
       the process's address space; instead, the pages are only actually
       allocated as they are faulted in. The amount of memory allowed for
       memory mappings of the file descriptor obeys the same rules as
       mlock(2) and cannot exceed RLIMIT_MEMLOCK.
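
       Because mappings count against RLIMIT_MEMLOCK, a program may want to
       compare the intended size with the limit before mapping. The sketch
       below does only that; the helper name is hypothetical, and the check
       is advisory because it ignores memory the process has already locked.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: 1 if len bytes fit within the soft RLIMIT_MEMLOCK
 * limit, 0 if they do not, -1 if the limit could not be read.
 * Memory already locked by the process is not taken into account. */
static int fits_memlock_limit(size_t len)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return (rlim_t)len <= rl.rlim_cur;
}
```

       If the check fails, the soft limit can be raised up to the hard limit
       with setrlimit(2), or the request can be made smaller.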

RETURN VALUE
       On success, memfd_secret() returns a new file descriptor. On error,
       -1 is returned and errno is set to indicate the error.

ERRORS
       EINVAL flags included unknown bits.

       EMFILE The per-process limit on the number of open file descriptors has
              been reached.

       ENFILE The system-wide limit on the total number of open files has been
              reached.

       ENOMEM There was insufficient memory to create a new anonymous file.

       ENOSYS memfd_secret() is not implemented on this architecture, or has
              not been enabled on the kernel command line with
              secretmem.enable=1.
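
       One way a caller might react to these errors is sketched below. The
       enum, the function name, and the chosen reactions are hypothetical and
       reflect one possible policy, not behavior mandated by the system call.

```c
#include <errno.h>

/* Hypothetical policy for reacting to a failed memfd_secret() call. */
enum secret_action {
    SECRET_FALL_BACK,    /* ENOSYS: use ordinary memory, e.g. with mlock(2) */
    SECRET_RETRY_LATER,  /* EMFILE, ENFILE, ENOMEM: resource exhaustion */
    SECRET_FIX_CALLER,   /* EINVAL: the flags argument is wrong */
    SECRET_GIVE_UP       /* any error not listed for this call */
};

/* Map an errno value from memfd_secret() to a suggested reaction. */
static enum secret_action secret_classify(int err)
{
    switch (err) {
    case ENOSYS:
        return SECRET_FALL_BACK;
    case EMFILE:
    case ENFILE:
    case ENOMEM:
        return SECRET_RETRY_LATER;
    case EINVAL:
        return SECRET_FIX_CALLER;
    default:
        return SECRET_GIVE_UP;
    }
}
```

       A caller would invoke secret_classify(errno) immediately after the
       system call returns -1, before any other call can overwrite errno.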

STANDARDS
       Linux.

HISTORY
       Linux 5.14.

NOTES
       The memfd_secret() system call is designed to allow a user-space
       process to create a range of memory that is inaccessible to anybody
       else, the kernel included. There is no 100% guarantee that the kernel
       won't be able to access memory ranges backed by memfd_secret() under
       any circumstances, but it is nevertheless much harder to exfiltrate
       data from these regions.

       memfd_secret() provides the following protections:

       • Enhanced protection (in conjunction with all the other in-kernel
         attack prevention systems) against ROP attacks. The absence of any
         in-kernel primitive for accessing memory backed by memfd_secret()
         means that a one-gadget ROP attack can't be used to exfiltrate data.
         The attacker would need to find enough ROP gadgets to reconstruct
         the missing page table entries, which significantly increases the
         difficulty of the attack, especially when other protections like the
         kernel stack size limit and address space layout randomization are
         in place.

       • Prevent cross-process user-space memory exposures. Once a region
         for a memfd_secret() memory mapping is allocated, the user can't
         accidentally pass it into the kernel to be transmitted somewhere.
         The memory pages in this region cannot be accessed via the direct
         map and are disallowed in get_user_pages().

       • Harden against exploited kernel flaws. In order to access memory
         areas backed by memfd_secret(), a kernel-side attack would need to
         either walk the page tables and create new ones, or spawn a new
         privileged user-space process to perform secrets exfiltration using
         ptrace(2).

       The way memfd_secret() allocates and locks the memory may impact
       overall system performance; therefore, the system call is disabled by
       default and is available only if the system administrator has enabled
       it with the "secretmem.enable=y" kernel parameter.

       To prevent potential data leaks of memory regions backed by
       memfd_secret() from a hibernation image, hibernation is prevented
       when there are active memfd_secret() users.

SEE ALSO
       fcntl(2), ftruncate(2), mlock(2), memfd_create(2), mmap(2),
       setrlimit(2)

Linux man-pages 6.05              2023-03-30              memfd_secret(2)