VIRTIOFSD(1)                         QEMU                         VIRTIOFSD(1)

NAME
virtiofsd - QEMU virtio-fs shared file system daemon

SYNOPSIS
virtiofsd [OPTIONS]

DESCRIPTION
Share a host directory tree with a guest through a virtio-fs device.
This program is a vhost-user backend that implements the virtio-fs
device. Each virtio-fs device instance requires its own virtiofsd
process.

This program is designed to work with QEMU's --device vhost-user-fs-pci
but should work with any virtual machine monitor (VMM) that supports
vhost-user. See the Examples section below.

This program must be run as the root user. The program drops
privileges where possible during startup, although it must be able to
create and access files with any uid/gid:

• The ability to invoke syscalls is limited using seccomp(2).

• Linux capabilities(7) are dropped.

In "namespace" sandbox mode the program switches into a new file system
namespace and invokes pivot_root(2) to make the shared directory tree
its root. New pid and net namespaces are also created to isolate the
process.

In "chroot" sandbox mode the program invokes chroot(2) to make the
shared directory tree its root. This mode is intended for container
environments where the container runtime has already set up the
namespaces and the program does not have permission to create
namespaces itself.

Both sandbox modes prevent "file system escapes" due to symlinks and
other file system objects that might lead to files outside the shared
directory.

OPTIONS
-h, --help
    Print help.

-V, --version
    Print version.

-d
    Enable debug output.

--syslog
    Print log messages to syslog instead of stderr.

-o OPTION

    • debug - Enable debug output.

    • flock|no_flock - Enable/disable flock. The default is no_flock.

    • modcaps=CAPLIST - Modify the list of capabilities allowed.
      CAPLIST is a colon-separated list of capabilities, each preceded
      by either + or -, e.g. "+sys_admin:-chown".

    • log_level=LEVEL - Print only log messages matching LEVEL or more
      severe. LEVEL is one of err, warn, info, or debug. The default
      is info.

    • posix_lock|no_posix_lock - Enable/disable remote POSIX locks.
      The default is no_posix_lock.

    • readdirplus|no_readdirplus - Enable/disable readdirplus. The
      default is readdirplus.

    • sandbox=namespace|chroot - Sandbox mode:
      - namespace: Create mount, pid, and net namespaces and
        pivot_root(2) into the shared directory.
      - chroot: chroot(2) into the shared directory (use in
        containers).
      The default is "namespace".

    • source=PATH - Share host directory tree located at PATH. This
      option is required.

    • timeout=TIMEOUT - I/O timeout in seconds. The default depends on
      the cache= option.

    • writeback|no_writeback - Enable/disable writeback cache. The
      cache allows the FUSE client to buffer and merge write requests.
      The default is no_writeback.

    • xattr|no_xattr - Enable/disable extended attributes (xattr) on
      files and directories. The default is no_xattr.

    • posix_acl|no_posix_acl - Enable/disable POSIX ACL support.
      POSIX ACLs are disabled by default.

    • security_label|no_security_label - Enable/disable security label
      support. Security labels are disabled by default. When enabled,
      the client can send a MAC label for a file during file creation;
      typically this is an SELinux security label. The server tries to
      set that label on the newly created file atomically wherever
      possible.

--socket-path=PATH
    Listen on vhost-user UNIX domain socket at PATH.

--socket-group=GROUP
    Set the vhost-user UNIX domain socket gid to GROUP.

--fd=FDNUM
    Accept connections from vhost-user UNIX domain socket file
    descriptor FDNUM. The file descriptor must already be listening
    for connections.

--thread-pool-size=NUM
    Restrict the number of worker threads per request queue to NUM.
    The default is 64.

--cache=none|auto|always
    Select the desired trade-off between coherency and performance.
    none forbids the FUSE client from caching to achieve best
    coherency at the cost of performance. auto acts similarly to NFS
    with a 1 second metadata cache timeout. always sets a long cache
    lifetime at the expense of coherency. The default is auto.

XATTR-MAPPING
By default the names of xattrs used by the client are passed through to
the server file system. This can be a problem where either those xattr
names are used by something on the server (e.g. SELinux client/server
confusion) or virtiofsd is running in a container with restricted
privileges where it cannot access some attributes.

Mapping syntax
A mapping of xattr names can be made using -o xattrmap=mapping where
the mapping string consists of a series of rules.

The first matching rule terminates the mapping. The set of rules must
include a terminating rule to match any remaining attributes at the
end.

Each rule consists of a number of fields separated with a separator
that is the first non-white space character in the rule. This
separator must then be used for the whole rule. White space may be
added before and after each rule.

Using ':' as the separator a rule is of the form:

:type:scope:key:prepend:

scope is:

• 'client' - match 'key' against an xattr name from the client for
  setxattr/getxattr/removexattr

• 'server' - match 'prepend' against an xattr name from the server
  for listxattr

• 'all' - can be used to make a single rule where both the server
  and client matches are triggered.

type is one of:

• 'prefix' - is designed to prepend and strip a prefix; the modified
  attributes are then passed on to the client/server.

• 'ok' - Causes the rule set to be terminated when a match is found
  while allowing matching xattrs through unchanged. It is intended
  both as a way of explicitly terminating the list of rules, and to
  allow some xattrs to skip following rules.

• 'bad' - If a client tries to use a name matching 'key' it is denied
  using EPERM; when the server passes an attribute name matching
  'prepend' it is hidden. In many ways its use is very like 'ok' as
  either an explicit terminator or for special handling of certain
  patterns.

• 'unsupported' - If a client tries to use a name matching 'key' it is
  denied using ENOTSUP; when the server passes an attribute name
  matching 'prepend' it is hidden. In many ways its use is very like
  'ok' as either an explicit terminator or for special handling of
  certain patterns.

key is a string tested as a prefix on an attribute name originating on
the client. It may be empty, in which case a 'client' rule will always
match on client names.

prepend is a string tested as a prefix on an attribute name originating
on the server, and used as a new prefix. It may be empty, in which case
a 'server' rule will always match on all names from the server.
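
The field layout described above can be sketched as a small parser. This
is a minimal Python illustration of the syntax only, written from the
description in this section; it is not taken from the virtiofsd source,
and the names Rule and parse_rules are hypothetical. It handles the
four-field rule form only, not the shorter 'map' form.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    """One xattrmap rule (hypothetical helper type, not a virtiofsd API)."""
    kind: str     # the 'type' field: prefix, ok, bad or unsupported
    scope: str    # client, server or all
    key: str
    prepend: str


def parse_rules(spec: str) -> List[Rule]:
    """Split a mapping string into its four-field rules."""
    rules = []
    pos = 0
    while True:
        # White space is allowed before and after each rule.
        while pos < len(spec) and spec[pos].isspace():
            pos += 1
        if pos == len(spec):
            return rules
        # The first non-white-space character is this rule's separator.
        sep = spec[pos]
        pos += 1
        fields = []
        for name in ("type", "scope", "key", "prepend"):
            end = spec.find(sep, pos)
            if end < 0:
                raise ValueError(f"missing separator after {name!r} field")
            fields.append(spec[pos:end])
            pos = end + 1
        rules.append(Rule(*fields))


for rule in parse_rules(":prefix:all::user.virtiofs.::bad:all:::"):
    print(rule)
```

Because the separator is re-read at the start of every rule, a string
may mix separators between rules, as long as each rule uses one
separator throughout.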

e.g.:

:prefix:client:trusted.:user.virtiofs.:

    will match 'trusted.' attributes in client calls and prefix them
    before passing them to the server.

:prefix:server::user.virtiofs.:

    will strip 'user.virtiofs.' from all server replies.

:prefix:all:trusted.:user.virtiofs.:

    combines the previous two cases into a single rule.

:ok:client:user.::

    will allow get/set xattr for 'user.' xattrs and ignore following
    rules.

:ok:server::security.:

    will pass 'security.' xattrs in listxattr from the server and
    ignore following rules.

:ok:all:::

    will terminate the rule search passing any remaining attributes in
    both directions.

:bad:server::security.:

    would hide 'security.' xattrs in listxattr from the server.
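
How the rules in these examples act on a name can be sketched as
follows. This is a simplified Python model of the matching described
above, not the daemon's code: rules are plain (type, scope, key,
prepend) tuples, the function names are hypothetical, and a name that
matches no rule is treated as denied or hidden, which is an assumption
based on the requirement that a rule set end with a terminating rule.

```python
import errno


def map_client(rules, name):
    """Map a name the client passes to setxattr/getxattr/removexattr.

    Returns the name to use on the server, or raises OSError.
    """
    for kind, scope, key, prepend in rules:
        if scope not in ("client", "all") or not name.startswith(key):
            continue
        if kind == "prefix":
            return prepend + name       # add the prefix
        if kind == "ok":
            return name                 # pass through unchanged
        if kind == "bad":
            raise OSError(errno.EPERM, "xattr name denied", name)
        if kind == "unsupported":
            raise OSError(errno.ENOTSUP, "xattr name unsupported", name)
    # Assumption: a name that no rule matches is denied.
    raise OSError(errno.EPERM, "no rule matched", name)


def map_server(rules, name):
    """Map a name the server returns from listxattr; None means hidden."""
    for kind, scope, key, prepend in rules:
        if scope not in ("server", "all") or not name.startswith(prepend):
            continue
        if kind == "prefix":
            return name[len(prepend):]  # strip the prefix
        if kind == "ok":
            return name
        return None                     # 'bad' and 'unsupported' hide it
    return None                         # assumption: unmatched names are hidden


rules = [
    ("prefix", "all", "trusted.", "user.virtiofs."),
    ("ok", "all", "", ""),
]
print(map_client(rules, "trusted.foo"))                # user.virtiofs.trusted.foo
print(map_server(rules, "user.virtiofs.trusted.foo"))  # trusted.foo
print(map_client(rules, "user.bar"))                   # user.bar
```

The first rule that matches decides the outcome, so the order of the
tuples matters in the same way as the order of rules in the mapping
string.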

A simpler 'map' type provides a shorter syntax for the common case:

:map:key:prepend:

The 'map' type adds a number of separate rules to add prepend as a
prefix to the matched key (or all attributes if key is empty). There
may be at most one 'map' rule and it must be the last rule in the set.

Note: When the 'security.capability' xattr is remapped, the daemon has
to do extra work to remove it during many operations, which the host
kernel normally does itself.

Security considerations
Operating systems typically partition the xattr namespace using
well-defined name prefixes. Each partition may have different access
controls applied. For example, on Linux there are multiple partitions:

• system.* - access varies depending on attribute & filesystem

• security.* - only processes with CAP_SYS_ADMIN

• trusted.* - only processes with CAP_SYS_ADMIN

• user.* - any process granted by file permissions / ownership

Other operating systems, such as FreeBSD, have different name prefixes
and access control rules.

When remapping attributes on the host, it is important to ensure that
the remapping does not allow a guest user to evade the guest access
control rules.

Consider if trusted.* from the guest was remapped to
user.virtiofs.trusted.* in the host. An unprivileged user in a Linux
guest has the ability to write to xattrs under user.*. Thus the user
can evade the access control restriction on trusted.* by instead
writing to user.virtiofs.trusted.*.
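
The collision can be made concrete with a short sketch. This simplified
Python model (hypothetical function name, client-side rules only,
written as (type, key, prepend) tuples and covering just the 'prefix',
'ok' and 'bad' types) shows a privileged and an unprivileged guest
write ending up on the same host attribute.

```python
def host_name(rules, guest_name):
    """Return the host xattr name for a guest name, or None if denied."""
    for kind, key, prepend in rules:
        if guest_name.startswith(key):
            if kind == "prefix":
                return prepend + guest_name
            if kind == "ok":
                return guest_name
            return None  # 'bad'
    return None


# Remap trusted.* and let every other name through unchanged.
insecure = [("prefix", "trusted.", "user.virtiofs."), ("ok", "", "")]

# Written by a privileged guest process (needs CAP_SYS_ADMIN in the guest).
privileged = host_name(insecure, "trusted.secret")
# Written by any guest user who owns the file.
unprivileged = host_name(insecure, "user.virtiofs.trusted.secret")

print(privileged)
print(unprivileged)
print(privileged == unprivileged)
```

Both calls resolve to the same host attribute, which is why the target
prefix of a remapping has to be protected from direct guest access.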

As noted above, the partitions used and the access controls applied
will vary across guest operating systems, so it is not wise to try to
predict what the guest OS will use.

The simplest way to avoid an insecure configuration is to remap all
xattrs at once, to a given fixed prefix. This is shown in example (1)
below.

If selectively mapping only a subset of xattr prefixes, then rules must
be added to explicitly block direct access to the target of the
remapping. This is shown in example (2) below.

Mapping examples
1. Prefix all attributes with 'user.virtiofs.'

-o xattrmap=":prefix:all::user.virtiofs.::bad:all:::"

This uses two rules, using : as the field separator; the first rule
prefixes and strips 'user.virtiofs.', the second rule hides any
non-prefixed attributes that the host set.

This is equivalent to the 'map' rule:

-o xattrmap=":map::user.virtiofs.:"

2. Prefix 'trusted.' attributes, allow others through

"/prefix/all/trusted./user.virtiofs./
/bad/server//trusted./
/bad/client/user.virtiofs.//
/ok/all///"

Here there are four rules, using / as the field separator, and also
demonstrating that new lines can be included between rules. The first
rule is the prefixing of 'trusted.' and stripping of 'user.virtiofs.'.
The second rule hides unprefixed 'trusted.' attributes on the host.
The third rule stops a guest from explicitly setting the
'user.virtiofs.' path directly to prevent access control bypass on the
target of the earlier prefix remapping. Finally, the fourth rule lets
all remaining attributes through.

This is equivalent to the 'map' rule:

-o xattrmap="/map/trusted./user.virtiofs./"
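
The two equivalences stated in examples 1 and 2 suggest how a 'map'
rule unfolds into ordinary rules. The following Python sketch writes
that expansion down; expand_map is a hypothetical name and the result
is inferred from those two examples, not taken from the virtiofsd
source.

```python
def expand_map(key, prepend):
    """Expand ':map:key:prepend:' into (type, scope, key, prepend) rules."""
    rules = [("prefix", "all", key, prepend)]
    if key == "":
        # Everything is prefixed: hide names the host set without the prefix.
        rules.append(("bad", "all", "", ""))
    else:
        # Hide unprefixed names matching key that exist on the host.
        rules.append(("bad", "server", "", key))
        # Block direct client access to the target of the remapping.
        rules.append(("bad", "client", prepend, ""))
        # Let all remaining attributes through.
        rules.append(("ok", "all", "", ""))
    return rules


for rule in expand_map("trusted.", "user.virtiofs."):
    print("/" + "/".join(rule) + "/")
```

Printed with / as the separator, the output lines match the four rules
of example 2 one for one.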

3. Hide 'security.' attributes, and allow everything else

"/bad/all/security./security./
/ok/all///"

The first rule combines what could be separate client and server rules
into a single 'all' rule, matching 'security.' in either client
arguments or lists returned from the host. This stops the client
seeing any 'security.' attributes on the server and stops it setting
any.

SELINUX SUPPORT
One can enable support for SELinux by running virtiofsd with the option
"-o security_label". However, this will try to save the guest's
security context in the xattr security.selinux on the host, and it
might fail if the host's SELinux policy does not permit virtiofsd to do
this operation.

Hence, it is preferred to remap the guest's "security.selinux" xattr
to, say, "trusted.virtiofs.security.selinux" on the host.

"-o xattrmap=:map:security.selinux:trusted.virtiofs.:"

This will make sure that the guest's and host's SELinux xattrs on the
same file remain separate and do not interfere with each other. It
also allows both host and guest to implement their own separate
SELinux policies.

Setting a trusted xattr on the host requires CAP_SYS_ADMIN, so this
capability needs to be added to the daemon:

"-o modcaps=+sys_admin"

Giving CAP_SYS_ADMIN increases the risk to the system. virtiofsd
becomes more powerful and, if it gets compromised, it can do a lot of
damage to the host system. Keep this trade-off in mind when making a
decision.
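
Putting the options from this section together, a launch command might
be assembled as in the sketch below. The Python code only builds and
prints the argument list and does not start the daemon. The socket path
and shared directory are borrowed from the Examples section, and
passing several separate -o flags is an assumption based on how the
options are written above.

```python
import shlex

argv = [
    "virtiofsd",
    "--socket-path=/var/run/vm001-vhost-fs.sock",
    "-o", "source=/var/lib/fs/vm001",
    # Let the guest send SELinux labels at file creation.
    "-o", "security_label",
    # Keep guest labels apart from the host's own security.selinux.
    "-o", "xattrmap=:map:security.selinux:trusted.virtiofs.:",
    # trusted.* xattrs on the host require CAP_SYS_ADMIN.
    "-o", "modcaps=+sys_admin",
]

# Print a shell-quoted form that can be reviewed before running it as root.
print(shlex.join(argv))
```

Building the list first makes it easy to drop the modcaps entry again
if the added CAP_SYS_ADMIN risk is not acceptable.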

EXAMPLES
Export /var/lib/fs/vm001/ on vhost-user UNIX domain socket
/var/run/vm001-vhost-fs.sock:

host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
host# qemu-system-x86_64 \
    -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
    -device vhost-user-fs-pci,chardev=char0,tag=myfs \
    -object memory-backend-memfd,id=mem,size=4G,share=on \
    -numa node,memdev=mem \
    ...
guest# mount -t virtiofs myfs /mnt

AUTHOR
Stefan Hajnoczi <stefanha@redhat.com>, Masayoshi Mizuma
<m.mizuma@jp.fujitsu.com>

COPYRIGHT
2023, The QEMU Project Developers

7.0.0                          Jan 19, 2023                       VIRTIOFSD(1)