VIRTIOFSD(1) QEMU VIRTIOFSD(1)

NAME
virtiofsd - QEMU virtio-fs shared file system daemon

SYNOPSIS
virtiofsd [OPTIONS]

DESCRIPTION
Share a host directory tree with a guest through a virtio-fs device.
This program is a vhost-user backend that implements the virtio-fs
device. Each virtio-fs device instance requires its own virtiofsd
process.

This program is designed to work with QEMU's --device vhost-user-fs-pci
but should work with any virtual machine monitor (VMM) that supports
vhost-user. See the Examples section below.

This program must be run as the root user. The program drops privileges
where possible during startup, although it must be able to create and
access files with any uid/gid:

• The ability to invoke syscalls is limited using seccomp(2).

• Linux capabilities(7) are dropped.

In "namespace" sandbox mode the program switches into a new file system
namespace and invokes pivot_root(2) to make the shared directory tree
its root. New pid and net namespaces are also created to isolate the
process.

In "chroot" sandbox mode the program invokes chroot(2) to make the
shared directory tree its root. This mode is intended for container
environments where the container runtime has already set up the
namespaces and the program does not have permission to create
namespaces itself.

Both sandbox modes prevent "file system escapes" due to symlinks and
other file system objects that might lead to files outside the shared
directory.
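
As a minimal sketch of choosing between the two sandbox modes, the helper
below wraps the daemon invocation and rejects any mode other than the two
described above. The function name run_virtiofsd_sandboxed and its argument
order are illustrative assumptions and are not part of virtiofsd; only the
--socket-path, -o source= and -o sandbox= options come from this page.

```shell
# Hypothetical helper (not part of virtiofsd): start the daemon with an
# explicit sandbox mode.
# $1 = vhost-user socket path, $2 = shared directory, $3 = namespace|chroot
run_virtiofsd_sandboxed() {
    case "$3" in
        namespace|chroot) ;;
        *) echo "unknown sandbox mode: $3" >&2; return 1 ;;
    esac
    virtiofsd --socket-path="$1" -o source="$2" -o sandbox="$3"
}

# Example for a container whose runtime already set up the namespaces
# (run as root):
# run_virtiofsd_sandboxed /var/run/vm001-vhost-fs.sock /var/lib/fs/vm001 chroot
```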

OPTIONS
-h, --help
Print help.

-V, --version
Print version.

-d Enable debug output.

--syslog
Print log messages to syslog instead of stderr.

-o OPTION

• debug - Enable debug output.

• flock|no_flock - Enable/disable flock. The default is no_flock.

• modcaps=CAPLIST - Modify the list of capabilities allowed; CAPLIST is
  a colon-separated list of capabilities, each preceded by either + or
  -, e.g. '+sys_admin:-chown'.

• log_level=LEVEL - Print only log messages matching LEVEL or more
  severe. LEVEL is one of err, warn, info, or debug. The default is
  info.

• posix_lock|no_posix_lock - Enable/disable remote POSIX locks. The
  default is no_posix_lock.

• readdirplus|no_readdirplus - Enable/disable readdirplus. The default
  is readdirplus.

• sandbox=namespace|chroot - Sandbox mode. The default is "namespace".
  - namespace: Create mount, pid, and net namespaces and pivot_root(2)
    into the shared directory.
  - chroot: chroot(2) into the shared directory (use in containers).

• source=PATH - Share the host directory tree located at PATH. This
  option is required.

• timeout=TIMEOUT - I/O timeout in seconds. The default depends on the
  cache= option.

• writeback|no_writeback - Enable/disable writeback cache. The cache
  allows the FUSE client to buffer and merge write requests. The
  default is no_writeback.

• xattr|no_xattr - Enable/disable extended attributes (xattr) on files
  and directories. The default is no_xattr.

• posix_acl|no_posix_acl - Enable/disable POSIX ACL support. POSIX ACLs
  are disabled by default.
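
To illustrate the CAPLIST syntax used by modcaps, the sketch below joins
individual capability changes with colons and checks that each one starts
with + or -. The helper name build_caplist is a hypothetical convenience
and is not provided by virtiofsd; the commented invocation reuses the paths
from the Examples section and assumes that several -o options may be given
on one command line.

```shell
# Hypothetical helper (not part of virtiofsd): join capability changes
# into a modcaps CAPLIST. Each argument must start with + or -.
build_caplist() {
    caplist=""
    for cap in "$@"; do
        case "$cap" in
            +?*|-?*) ;;
            *) echo "capability must start with + or -: $cap" >&2; return 1 ;;
        esac
        # Append with a ':' separator, except before the first entry.
        caplist="${caplist:+$caplist:}$cap"
    done
    printf '%s\n' "$caplist"
}

build_caplist +sys_admin -chown   # prints +sys_admin:-chown

# Example (run as root):
# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock \
#     -o source=/var/lib/fs/vm001 \
#     -o modcaps="$(build_caplist +sys_admin -chown)" \
#     -o xattr -o log_level=debug
```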

--socket-path=PATH
Listen on vhost-user UNIX domain socket at PATH.

--socket-group=GROUP
Set the vhost-user UNIX domain socket gid to GROUP.

--fd=FDNUM
Accept connections from vhost-user UNIX domain socket file descriptor
FDNUM. The file descriptor must already be listening for connections.

--thread-pool-size=NUM
Restrict the number of worker threads per request queue to NUM. The
default is 64.

--cache=none|auto|always
Select the desired trade-off between coherency and performance. none
forbids the FUSE client from caching to achieve best coherency at the
cost of performance. auto behaves similarly to NFS with a 1 second
metadata cache timeout. always sets a long cache lifetime at the
expense of coherency. The default is auto.

XATTR-MAPPING
By default the names of xattrs used by the client are passed through to
the server file system. This can be a problem where either those xattr
names are used by something on the server (e.g. SELinux client/server
confusion) or if virtiofsd is running in a container with restricted
privileges where it cannot access some attributes.

Mapping syntax
A mapping of xattr names can be made using -o xattrmap=mapping where
the mapping string consists of a series of rules.

The first matching rule terminates the mapping. The set of rules must
include a terminating rule to match any remaining attributes at the
end.

Each rule consists of a number of fields separated with a separator
that is the first non-white space character in the rule. This separator
must then be used for the whole rule. White space may be added before
and after each rule.

Using ':' as the separator a rule is of the form:

:type:scope:key:prepend:
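
The field layout can be made concrete with a small shell sketch that takes
one four-field rule, reads the separator from its first non-white space
character, and prints the four fields. The function name split_rule is
illustrative only; this is not how the daemon parses rules, and the sketch
handles neither the shorter 'map' form nor several rules in one string.

```shell
# Illustrative only: split ONE rule of the form <sep>type<sep>scope<sep>key<sep>prepend<sep>.
split_rule() {
    # Strip leading white space, then take the first character as separator.
    rule=$(printf '%s' "$1" | sed 's/^[[:space:]]*//')
    sep=$(printf '%s' "$rule" | cut -c1)

    rest=${rule#"$sep"}                              # drop the leading separator
    type=${rest%%"$sep"*};  rest=${rest#*"$sep"}     # text before the next separator
    scope=${rest%%"$sep"*}; rest=${rest#*"$sep"}
    key=${rest%%"$sep"*};   rest=${rest#*"$sep"}
    prepend=${rest%%"$sep"*}

    printf 'type=%s scope=%s key=%s prepend=%s\n' "$type" "$scope" "$key" "$prepend"
}

split_rule ':prefix:client:trusted.:user.virtiofs.:'
# prints: type=prefix scope=client key=trusted. prepend=user.virtiofs.
split_rule '/bad/server//trusted./'
# prints: type=bad scope=server key= prepend=trusted.
```

Empty fields, such as the key in the second call, come out as empty
strings, which matches the way empty key and prepend values are written in
rules.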

scope is:

• 'client' - match 'key' against an xattr name from the client for
  setxattr/getxattr/removexattr

• 'server' - match 'prepend' against an xattr name from the server for
  listxattr

• 'all' - can be used to make a single rule where both the server and
  client matches are triggered.

type is one of:

• 'prefix' - is designed to prepend and strip a prefix; the modified
  attributes are then passed on to the client/server.

• 'ok' - Causes the rule set to be terminated when a match is found
  while allowing matching xattrs through unchanged. It is intended both
  as a way of explicitly terminating the list of rules, and to allow
  some xattrs to skip following rules.

• 'bad' - If a client tries to use a name matching 'key' it's denied
  using EPERM; when the server passes an attribute name matching
  'prepend' it's hidden. In many ways its use is very like 'ok' as
  either an explicit terminator or for special handling of certain
  patterns.

• 'unsupported' - If a client tries to use a name matching 'key' it's
  denied using ENOTSUP; when the server passes an attribute name
  matching 'prepend' it's hidden. In many ways its use is very like
  'ok' as either an explicit terminator or for special handling of
  certain patterns.

key is a string tested as a prefix on an attribute name originating on
the client. It may be empty, in which case a 'client' rule will always
match on client names.

prepend is a string tested as a prefix on an attribute name originating
on the server, and used as a new prefix. It may be empty, in which case
a 'server' rule will always match on all names from the server.
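
The way key and prepend are used by a 'prefix' rule can be modelled in a
few lines of shell. This is a simplified sketch of the behaviour described
above, not the daemon's code: prefix_client and prefix_server are
hypothetical names, each models a single rule in one direction, and a
non-zero return stands for "no match, try the next rule".

```shell
# Client direction: if the name starts with KEY, put PREPEND in front of it.
# $1 = key, $2 = prepend, $3 = xattr name coming from the client
prefix_client() {
    case "$3" in
        "$1"*) printf '%s%s\n' "$2" "$3" ;;
        *) return 1 ;;    # no match
    esac
}

# Server direction: if the name starts with PREPEND, strip PREPEND.
# $1 = prepend, $2 = xattr name coming from the server
prefix_server() {
    case "$2" in
        "$1"*) printf '%s\n' "${2#"$1"}" ;;
        *) return 1 ;;    # no match
    esac
}

prefix_client trusted. user.virtiofs. trusted.overlay.opaque
# prints: user.virtiofs.trusted.overlay.opaque
prefix_server user.virtiofs. user.virtiofs.trusted.overlay.opaque
# prints: trusted.overlay.opaque
```

An empty key or prepend matches every name in this model, in line with
the description of empty fields above.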

e.g.:

:prefix:client:trusted.:user.virtiofs.:

will match 'trusted.' attributes in client calls and prefix them before
passing them to the server.

:prefix:server::user.virtiofs.:

will strip 'user.virtiofs.' from all server replies.

:prefix:all:trusted.:user.virtiofs.:

combines the previous two cases into a single rule.

:ok:client:user.::

will allow get/set xattr for 'user.' xattrs and ignore following rules.

:ok:server::security.:

will pass 'security.' xattrs in listxattr from the server and ignore
following rules.

:ok:all:::

will terminate the rule search passing any remaining attributes in both
directions.

:bad:server::security.:

would hide 'security.' xattrs in listxattr from the server.

A simpler 'map' type provides a shorter syntax for the common case:

:map:key:prepend:

The 'map' type adds a number of separate rules to add prepend as a
prefix to the matched key (or all attributes if key is empty). There
may be at most one 'map' rule and it must be the last rule in the set.

Note: When the 'security.capability' xattr is remapped, the daemon has
to do extra work to remove it during many operations, which the host
kernel normally does itself.

Security considerations
Operating systems typically partition the xattr namespace using
well-defined name prefixes. Each partition may have different access
controls applied. For example, on Linux there are multiple partitions:

• system.* - access varies depending on attribute & filesystem

• security.* - only processes with CAP_SYS_ADMIN

• trusted.* - only processes with CAP_SYS_ADMIN

• user.* - any process granted by file permissions / ownership

Other operating systems, such as FreeBSD, have different name prefixes
and access control rules.

When remapping attributes on the host, it is important to ensure that
the remapping does not allow a guest user to evade the guest access
control rules.

Consider if trusted.* from the guest was remapped to
user.virtiofs.trusted.* in the host. An unprivileged user in a Linux
guest has the ability to write to xattrs under user.*. Thus the user
can evade the access control restriction on trusted.* by instead
writing to user.virtiofs.trusted.*.

As noted above, the partitions used and the access controls applied
will vary across guest OSes, so it is not wise to try to predict what
the guest OS will use.

The simplest way to avoid an insecure configuration is to remap all
xattrs at once, to a given fixed prefix. This is shown in example (1)
below.

If selectively mapping only a subset of xattr prefixes, then rules must
be added to explicitly block direct access to the target of the
remapping. This is shown in example (2) below.

Mapping examples
1. Prefix all attributes with 'user.virtiofs.'

-o xattrmap=":prefix:all::user.virtiofs.::bad:all:::"

This uses two rules, using : as the field separator; the first rule
prefixes and strips 'user.virtiofs.', the second rule hides any
non-prefixed attributes that the host set.

This is equivalent to the 'map' rule:

-o xattrmap=":map::user.virtiofs.:"

2. Prefix 'trusted.' attributes, allow others through

"/prefix/all/trusted./user.virtiofs./
/bad/server//trusted./
/bad/client/user.virtiofs.//
/ok/all///"

Here there are four rules, using / as the field separator, and also
demonstrating that new lines can be included between rules. The first
rule is the prefixing of 'trusted.' and stripping of 'user.virtiofs.'.
The second rule hides unprefixed 'trusted.' attributes on the host.
The third rule stops a guest from explicitly setting the
'user.virtiofs.' path directly to prevent access control bypass on the
target of the earlier prefix remapping. Finally, the fourth rule lets
all remaining attributes through.

This is equivalent to the 'map' rule:

-o xattrmap="/map/trusted./user.virtiofs./"

3. Hide 'security.' attributes, and allow everything else

"/bad/all/security./security./
/ok/all///"

The first rule combines what could be separate client and server rules
into a single 'all' rule, matching 'security.' in either client
arguments or lists returned from the host. This stops the client seeing
any 'security.' attributes on the server and stops it setting any.
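
The two 'map' equivalences stated in examples (1) and (2) can be written
out as a small helper that prints the longer rule set for a given
separator, key and prepend. The name expand_map is hypothetical, and the
sketch reproduces only the two expansions shown in those examples; the
rules the daemon generates internally may differ in detail.

```shell
# Illustrative only: print the explicit rules that examples (1) and (2)
# give as equivalent to a 'map' rule.
# $1 = separator, $2 = key (may be empty), $3 = prepend
expand_map() {
    s=$1 key=$2 prepend=$3
    if [ -z "$key" ]; then
        # Example (1): prefix everything, hide whatever is not prefixed.
        printf '%s\n' "${s}prefix${s}all${s}${s}${prepend}${s}" \
                      "${s}bad${s}all${s}${s}${s}"
    else
        # Example (2): prefix only KEY, block direct use of the target
        # prefix, and let everything else through.
        printf '%s\n' "${s}prefix${s}all${s}${key}${s}${prepend}${s}" \
                      "${s}bad${s}server${s}${s}${key}${s}" \
                      "${s}bad${s}client${s}${prepend}${s}${s}" \
                      "${s}ok${s}all${s}${s}${s}"
    fi
}

expand_map / trusted. user.virtiofs.
# prints:
# /prefix/all/trusted./user.virtiofs./
# /bad/server//trusted./
# /bad/client/user.virtiofs.//
# /ok/all///
```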

EXAMPLES
Export /var/lib/fs/vm001/ on vhost-user UNIX domain socket
/var/run/vm001-vhost-fs.sock:

host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
host# qemu-system-x86_64 \
    -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
    -device vhost-user-fs-pci,chardev=char0,tag=myfs \
    -object memory-backend-memfd,id=mem,size=4G,share=on \
    -numa node,memdev=mem \
    ...
guest# mount -t virtiofs myfs /mnt
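
When both host commands are started from one script, the script may need
to wait until the daemon has created its socket before launching QEMU.
The following is a minimal sketch under that assumption; wait_for_socket
is a hypothetical helper, not something shipped with virtiofsd or QEMU,
and the 10 second limit is an arbitrary choice.

```shell
# Hypothetical helper: wait until a UNIX domain socket exists at $1,
# polling once per second for at most $2 seconds (default 10).
wait_for_socket() {
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        [ -S "$1" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    [ -S "$1" ]    # final check; non-zero status if still missing
}

# Example (run as root; paths taken from the example above):
# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001 &
# wait_for_socket /var/run/vm001-vhost-fs.sock 10 && qemu-system-x86_64 ...
```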

AUTHOR
Stefan Hajnoczi <stefanha@redhat.com>, Masayoshi Mizuma
<m.mizuma@jp.fujitsu.com>

COPYRIGHT
2022, The QEMU Project Developers

6.2.0 Jun 11, 2022 VIRTIOFSD(1)