VIRTIOFSD(1) QEMU VIRTIOFSD(1)

NAME
virtiofsd - QEMU virtio-fs shared file system daemon

SYNOPSIS
virtiofsd [OPTIONS]

DESCRIPTION
Share a host directory tree with a guest through a virtio-fs device.
This program is a vhost-user backend that implements the virtio-fs
device. Each virtio-fs device instance requires its own virtiofsd
process.

This program is designed to work with QEMU's --device vhost-user-fs-pci
but should work with any virtual machine monitor (VMM) that supports
vhost-user. See the Examples section below.

This program must be run as the root user. The program drops
privileges where possible during startup, although it must be able to
create and access files with any uid/gid:

• The ability to invoke syscalls is limited using seccomp(2).

• Linux capabilities(7) are dropped.

In "namespace" sandbox mode the program switches into a new file system
namespace and invokes pivot_root(2) to make the shared directory tree
its root. New pid and net namespaces are also created to isolate the
process.

In "chroot" sandbox mode the program invokes chroot(2) to make the
shared directory tree its root. This mode is intended for container
environments where the container runtime has already set up the
namespaces and the program does not have permission to create
namespaces itself.

Both sandbox modes prevent "file system escapes" due to symlinks and
other file system objects that might lead to files outside the shared
directory.

OPTIONS
-h, --help
Print help.

-V, --version
Print version.

-d
Enable debug output.

--syslog
Print log messages to syslog instead of stderr.

-o OPTION

• debug - Enable debug output.

• flock|no_flock - Enable/disable flock. The default is no_flock.

• modcaps=CAPLIST - Modify the list of capabilities allowed; CAPLIST
  is a colon-separated list of capabilities, each preceded by either
  + or -, e.g. '+sys_admin:-chown'.

• log_level=LEVEL - Print only log messages matching LEVEL or more
  severe. LEVEL is one of err, warn, info, or debug. The default is
  info.

• posix_lock|no_posix_lock - Enable/disable remote POSIX locks. The
  default is no_posix_lock.

• readdirplus|no_readdirplus - Enable/disable readdirplus. The default
  is readdirplus.

• sandbox=namespace|chroot - Sandbox mode:

  - namespace: Create mount, pid, and net namespaces and pivot_root(2)
    into the shared directory.

  - chroot: chroot(2) into the shared directory (use in containers).

  The default is "namespace".

• source=PATH - Share host directory tree located at PATH. This option
  is required.

• timeout=TIMEOUT - I/O timeout in seconds. The default depends on the
  --cache option.

• writeback|no_writeback - Enable/disable writeback cache. The cache
  allows the FUSE client to buffer and merge write requests. The
  default is no_writeback.

• posix_acl|no_posix_acl - Enable/disable POSIX ACL support. POSIX
  ACLs are disabled by default.

• security_label|no_security_label - Enable/disable security label
  support. Security labels are disabled by default. Enabling this
  allows the client to send a MAC label for a file during file
  creation. Typically this is expected to be an SELinux security
  label. The server will try to set that label on the newly created
  file atomically wherever possible.

• killpriv_v2|no_killpriv_v2 - Enable/disable FUSE_HANDLE_KILLPRIV_V2
  support. KILLPRIV_V2 is enabled by default as long as the client
  supports it. Enabling this option helps with performance in the
  write path.

--socket-path=PATH
Listen on vhost-user UNIX domain socket at PATH.

--socket-group=GROUP
Set the vhost-user UNIX domain socket gid to GROUP.

--fd=FDNUM
Accept connections from vhost-user UNIX domain socket file descriptor
FDNUM. The file descriptor must already be listening for connections.

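The --fd option lets a supervising process create the listening socket
itself and hand it to the daemon. The following is a minimal Python
sketch of that hand-off, not a reference implementation. The helper
names make_listening_socket and virtiofsd_argv are hypothetical; only
the --fd and -o source= options are taken from this page.

```python
import socket


def make_listening_socket(path: str) -> socket.socket:
    """Create a UNIX domain socket that is already listening on path."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)  # --fd expects a descriptor that is already listening
    return sock


def virtiofsd_argv(fd: int, source: str) -> list[str]:
    """Build a virtiofsd command line that uses an inherited descriptor.

    Hypothetical helper: it only assembles the argument list and does not
    start the daemon.
    """
    return ["virtiofsd", f"--fd={fd}", "-o", f"source={source}"]
```

A caller could then start the daemon with subprocess.Popen, passing
pass_fds=[sock.fileno()] so that the descriptor stays open in the child
under the same number.
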
--thread-pool-size=NUM
Restrict the number of worker threads per request queue to NUM. The
default is 0.

--cache=none|auto|always
Select the desired trade-off between coherency and performance. none
forbids the FUSE client from caching to achieve best coherency at the
cost of performance. auto acts similarly to NFS with a 1 second
metadata cache timeout. always sets a long cache lifetime at the
expense of coherency. The default is auto.

XATTR-MAPPING
By default the names of xattrs used by the client are passed through to
the server file system. This can be a problem either where those xattr
names are used by something on the server (e.g. SELinux client/server
confusion) or where virtiofsd is running in a container with restricted
privileges and cannot access some attributes.

Mapping syntax
A mapping of xattr names can be made using -o xattrmap=mapping where
the mapping string consists of a series of rules.

The first matching rule terminates the mapping. The set of rules must
include a terminating rule to match any remaining attributes at the
end.

Each rule consists of a number of fields separated with a separator
that is the first non-white space character in the rule. This
separator must then be used for the whole rule. White space may be
added before and after each rule.

Using ':' as the separator a rule is of the form:

    :type:scope:key:prepend:

scope is:

• 'client' - match 'key' against an xattr name from the client for
  setxattr/getxattr/removexattr

• 'server' - match 'prepend' against an xattr name from the server for
  listxattr

• 'all' - can be used to make a single rule where both the server and
  client matches are triggered.

type is one of:

• 'prefix' - is designed to prepend and strip a prefix; the modified
  attributes are then passed on to the client/server.

• 'ok' - Causes the rule set to be terminated when a match is found
  while allowing matching xattrs through unchanged. It is intended
  both as a way of explicitly terminating the list of rules, and to
  allow some xattrs to skip following rules.

• 'bad' - If a client tries to use a name matching 'key' it is denied
  using EPERM; when the server passes an attribute name matching
  'prepend' it is hidden. In many ways its use is very like 'ok', as
  either an explicit terminator or for special handling of certain
  patterns.

• 'unsupported' - If a client tries to use a name matching 'key' it is
  denied using ENOTSUP; when the server passes an attribute name
  matching 'prepend' it is hidden. In many ways its use is very like
  'ok', as either an explicit terminator or for special handling of
  certain patterns.

key is a string tested as a prefix on an attribute name originating on
the client. It may be empty, in which case a 'client' rule will always
match on client names.

prepend is a string tested as a prefix on an attribute name originating
on the server, and used as a new prefix. It may be empty, in which case
a 'server' rule will always match on all names from the server.

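The field layout described above can be sketched as a small parser.
This is an illustrative Python sketch based only on the syntax rules in
this section, not the daemon's own parser. The names Rule and
parse_xattrmap are hypothetical, the sketch handles only the four-field
form, and it does not validate the type or scope values.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    type: str
    scope: str
    key: str
    prepend: str


def parse_xattrmap(mapping: str) -> list[Rule]:
    """Split a mapping string into (type, scope, key, prepend) rules."""
    rules = []
    pos = 0
    while True:
        # White space may appear before and after each rule.
        while pos < len(mapping) and mapping[pos].isspace():
            pos += 1
        if pos == len(mapping):
            return rules
        # The first non-white space character is this rule's separator.
        sep = mapping[pos]
        pos += 1
        fields = []
        for _ in range(4):
            end = mapping.find(sep, pos)
            if end == -1:
                raise ValueError(f"unterminated field at offset {pos}")
            fields.append(mapping[pos:end])
            pos = end + 1
        rules.append(Rule(*fields))
```

Because the separator is re-read at the start of every rule, a mapping
string may in principle use a different separator for each rule, as long
as each rule is internally consistent.
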
e.g.:

    :prefix:client:trusted.:user.virtiofs.:

will match 'trusted.' attributes in client calls and prefix them
before passing them to the server.

    :prefix:server::user.virtiofs.:

will strip 'user.virtiofs.' from all server replies.

    :prefix:all:trusted.:user.virtiofs.:

combines the previous two cases into a single rule.

    :ok:client:user.::

will allow get/set xattr for 'user.' xattrs and ignore following
rules.

    :ok:server::security.:

will pass 'security.' xattrs in listxattr from the server and ignore
following rules.

    :ok:all:::

will terminate the rule search passing any remaining attributes in
both directions.

    :bad:server::security.:

would hide 'security.' xattrs in listxattr from the server.

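The first-match behaviour of these rules can be modelled in a few lines
of Python. This is a simplified sketch of the semantics described in
this section, not the daemon's implementation. The names RULES,
map_client and map_server are hypothetical, the rule list is the one
from example (2) under Mapping examples, and raising LookupError when no
rule matches is an assumption, since this page only states that a
terminating rule must be present.

```python
import errno

# Each rule is (type, scope, key, prepend), as in ':type:scope:key:prepend:'.
RULES = [
    ("prefix", "all", "trusted.", "user.virtiofs."),
    ("bad", "server", "", "trusted."),
    ("bad", "client", "user.virtiofs.", ""),
    ("ok", "all", "", ""),
]


def map_client(rules, name):
    """Map a name the client passes to setxattr/getxattr/removexattr."""
    for rtype, scope, key, prepend in rules:
        if scope not in ("client", "all") or not name.startswith(key):
            continue
        if rtype == "prefix":
            return prepend + name
        if rtype == "ok":
            return name
        if rtype == "bad":
            raise OSError(errno.EPERM, "xattr name denied", name)
        if rtype == "unsupported":
            raise OSError(errno.ENOTSUP, "xattr name not supported", name)
    raise LookupError("no rule matched (assumed behaviour)")


def map_server(rules, name):
    """Map a name the server returns from listxattr; None means hidden."""
    for rtype, scope, key, prepend in rules:
        if scope not in ("server", "all") or not name.startswith(prepend):
            continue
        if rtype == "prefix":
            return name[len(prepend):]
        if rtype == "ok":
            return name
        return None  # 'bad' and 'unsupported' both hide the name
    raise LookupError("no rule matched (assumed behaviour)")
```

With this rule list, a client name such as 'trusted.foo' is stored as
'user.virtiofs.trusted.foo', a direct client request for
'user.virtiofs.trusted.foo' is denied with EPERM, and a host attribute
named 'trusted.host' is hidden from the client's listxattr results.
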
A simpler 'map' type provides a shorter syntax for the common case:

    :map:key:prepend:

The 'map' type adds a number of separate rules to add prepend as a
prefix to the matched key (or all attributes if key is empty). There
may be at most one 'map' rule and it must be the last rule in the set.

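One reading of this expansion, inferred from the two equivalences listed
under Mapping examples, is sketched below in Python. The function name
expand_map is hypothetical, and the rule sets the daemon actually
generates may differ in detail.

```python
def expand_map(key, prepend):
    """Expand ':map:key:prepend:' into (type, scope, key, prepend) rules.

    Assumption: the expansion mirrors examples (1) and (2) under
    Mapping examples.
    """
    # Prefix matching client names and strip the prefix from server names.
    rules = [("prefix", "all", key, prepend)]
    if key == "":
        # Everything is prefixed, so hide whatever is left on the host.
        rules.append(("bad", "all", "", ""))
    else:
        rules += [
            # Hide unprefixed host attributes that match the key.
            ("bad", "server", "", key),
            # Stop the client from naming the remap target directly.
            ("bad", "client", prepend, ""),
            # Let all remaining attributes through.
            ("ok", "all", "", ""),
        ]
    return rules
```

Under this reading, expand_map("trusted.", "user.virtiofs.") yields the
four rules of example (2), and expand_map("", "user.virtiofs.") yields
the two rules of example (1).
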
Note: When the 'security.capability' xattr is remapped, the daemon has
to do extra work to remove it during many operations, which the host
kernel normally does itself.

Security considerations
Operating systems typically partition the xattr namespace using
well-defined name prefixes. Each partition may have different access
controls applied. For example, on Linux there are multiple partitions:

• system.* - access varies depending on attribute & filesystem

• security.* - only processes with CAP_SYS_ADMIN

• trusted.* - only processes with CAP_SYS_ADMIN

• user.* - any process granted by file permissions / ownership

Other operating systems, such as FreeBSD, have different name prefixes
and access control rules.

When remapping attributes on the host, it is important to ensure that
the remapping does not allow a guest user to evade the guest access
control rules.

Consider the case where trusted.* from the guest is remapped to
user.virtiofs.trusted.* in the host. An unprivileged user in a Linux
guest has the ability to write to xattrs under user.*. Thus the user
can evade the access control restriction on trusted.* by instead
writing to user.virtiofs.trusted.*.

As noted above, the partitions used and the access controls applied
will vary across guest OSes, so it is not wise to try to predict what
the guest OS will use.

The simplest way to avoid an insecure configuration is to remap all
xattrs at once, to a given fixed prefix. This is shown in example (1)
below.

If selectively mapping only a subset of xattr prefixes, then rules must
be added to explicitly block direct access to the target of the
remapping. This is shown in example (2) below.

Mapping examples
1. Prefix all attributes with 'user.virtiofs.'

    -o xattrmap=":prefix:all::user.virtiofs.::bad:all:::"

This uses two rules, using : as the field separator; the first rule
prefixes and strips 'user.virtiofs.', the second rule hides any
non-prefixed attributes that the host set.

This is equivalent to the 'map' rule:

    -o xattrmap=":map::user.virtiofs.:"

2. Prefix 'trusted.' attributes, allow others through

    "/prefix/all/trusted./user.virtiofs./
    /bad/server//trusted./
    /bad/client/user.virtiofs.//
    /ok/all///"

Here there are four rules, using / as the field separator, and also
demonstrating that new lines can be included between rules. The first
rule is the prefixing of 'trusted.' and stripping of 'user.virtiofs.'.
The second rule hides unprefixed 'trusted.' attributes on the host.
The third rule stops a guest from explicitly setting the
'user.virtiofs.' path directly to prevent access control bypass on the
target of the earlier prefix remapping. Finally, the fourth rule lets
all remaining attributes through.

This is equivalent to the 'map' rule:

    -o xattrmap="/map/trusted./user.virtiofs./"

3. Hide 'security.' attributes, and allow everything else

    "/bad/all/security./security./
    /ok/all///"

The first rule combines what could be separate client and server rules
into a single 'all' rule, matching 'security.' in either client
arguments or lists returned from the host. This stops the client seeing
any 'security.' attributes on the server and stops it setting any.

SELINUX SUPPORT
One can enable support for SELinux by running virtiofsd with the option
"-o security_label". However, this will try to save the guest's
security context in the security.selinux xattr on the host, and it
might fail if the host's SELinux policy does not permit virtiofsd to
perform this operation.

Hence, it is preferable to remap the guest's "security.selinux" xattr
to, say, "trusted.virtiofs.security.selinux" on the host.

"-o xattrmap=:map:security.selinux:trusted.virtiofs.:"

This ensures that the guest's and the host's SELinux xattrs on the same
file remain separate and do not interfere with each other. It also
allows the host and the guest to implement their own separate SELinux
policies.

Setting a trusted xattr on the host requires CAP_SYS_ADMIN, so this
capability needs to be added to the daemon.

"-o modcaps=+sys_admin"

Granting CAP_SYS_ADMIN increases the risk to the system. virtiofsd
becomes more powerful and, if it is compromised, it can do a lot of
damage to the host system. Keep this trade-off in mind when making a
decision.

EXAMPLES
Export /var/lib/fs/vm001/ on vhost-user UNIX domain socket
/var/run/vm001-vhost-fs.sock:

    host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
    host# qemu-system-x86_64 \
          -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
          -device vhost-user-fs-pci,chardev=char0,tag=myfs \
          -object memory-backend-memfd,id=mem,size=4G,share=on \
          -numa node,memdev=mem \
          ...
    guest# mount -t virtiofs myfs /mnt

AUTHOR
Stefan Hajnoczi <stefanha@redhat.com>, Masayoshi Mizuma
<m.mizuma@jp.fujitsu.com>

COPYRIGHT
2023, The QEMU Project Developers

7.2.6 Sep 26, 2023 VIRTIOFSD(1)