VIRTIOFSD(1)                         QEMU                         VIRTIOFSD(1)

NAME

       virtiofsd - QEMU virtio-fs shared file system daemon

SYNOPSIS

       virtiofsd [OPTIONS]

DESCRIPTION

       Share a host directory tree with a guest through a virtio-fs device.
       This program is a vhost-user backend that implements the virtio-fs
       device.  Each virtio-fs device instance requires its own virtiofsd
       process.

       This program is designed to work with QEMU's --device vhost-user-fs-pci
       but should work with any virtual machine monitor (VMM) that supports
       vhost-user.  See the EXAMPLES section below.

       This program must be run as the root user.  The program drops
       privileges where possible during startup, although it must be able to
       create and access files with any uid/gid:

       • The ability to invoke syscalls is limited using seccomp(2).

       • Linux capabilities(7) are dropped.

       In "namespace" sandbox mode the program switches into a new file system
       namespace and invokes pivot_root(2) to make the shared directory tree
       its root.  New pid and net namespaces are also created to isolate the
       process.

       In "chroot" sandbox mode the program invokes chroot(2) to make the
       shared directory tree its root.  This mode is intended for container
       environments where the container runtime has already set up the
       namespaces and the program does not have permission to create
       namespaces itself.

       Both sandbox modes prevent "file system escapes" due to symlinks and
       other file system objects that might lead to files outside the shared
       directory.

OPTIONS

       -h, --help
              Print help.

       -V, --version
              Print version.

       -d     Enable debug output.

       --syslog
              Print log messages to syslog instead of stderr.

       -o OPTION

              • debug - Enable debug output.

              • flock|no_flock - Enable/disable flock.  The default is
                no_flock.

              • modcaps=CAPLIST - Modify the list of capabilities allowed;
                CAPLIST is a colon-separated list of capabilities, each
                preceded by either + or -, e.g. "+sys_admin:-chown".

              • log_level=LEVEL - Print only log messages matching LEVEL or
                more severe.  LEVEL is one of err, warn, info, or debug.  The
                default is info.

              • posix_lock|no_posix_lock - Enable/disable remote POSIX locks.
                The default is no_posix_lock.

              • readdirplus|no_readdirplus - Enable/disable readdirplus.  The
                default is readdirplus.

              • sandbox=namespace|chroot - Sandbox mode:
                - namespace: Create mount, pid, and net namespaces and
                  pivot_root(2) into the shared directory.
                - chroot: chroot(2) into the shared directory (use in
                  containers).
                The default is "namespace".

              • source=PATH - Share the host directory tree located at PATH.
                This option is required.

              • timeout=TIMEOUT - I/O timeout in seconds.  The default depends
                on the cache= option.

              • writeback|no_writeback - Enable/disable writeback cache.  The
                cache allows the FUSE client to buffer and merge write
                requests.  The default is no_writeback.

              • xattr|no_xattr - Enable/disable extended attributes (xattr) on
                files and directories.  The default is no_xattr.

              • posix_acl|no_posix_acl - Enable/disable POSIX ACL support.
                POSIX ACLs are disabled by default.

              • security_label|no_security_label - Enable/disable security
                label support.  Security labels are disabled by default.
                When enabled, the client can send a MAC label for a file
                during file creation.  Typically this is expected to be an
                SELinux security label.  The server will try to set that label
                on the newly created file atomically wherever possible.

       --socket-path=PATH
              Listen on vhost-user UNIX domain socket at PATH.

       --socket-group=GROUP
              Set the vhost-user UNIX domain socket gid to GROUP.

       --fd=FDNUM
              Accept connections from vhost-user UNIX domain socket file
              descriptor FDNUM.  The file descriptor must already be listening
              for connections.
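
       A sketch of how a supervising process might prepare the descriptor for
       --fd, assuming Python is used on the host.  The helper names and the
       paths passed to them are hypothetical; only the --fd and -o source=
       options come from this page.  The command line is built, not executed.

```python
import socket


def make_listening_socket(path):
    """Create a UNIX domain stream socket that is already listening."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)
    return sock


def virtiofsd_argv(sock, source):
    """Build (but do not run) a virtiofsd command line that uses --fd."""
    return ["virtiofsd", f"--fd={sock.fileno()}", "-o", f"source={source}"]
```

       The descriptor has to remain open in the child process, for example by
       passing pass_fds=[sock.fileno()] to subprocess.Popen when launching
       the resulting argv.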

       --thread-pool-size=NUM
              Restrict the number of worker threads per request queue to NUM.
              The default is 64.

       --cache=none|auto|always
              Select the desired trade-off between coherency and performance.
              none forbids the FUSE client from caching to achieve best
              coherency at the cost of performance.  auto acts similarly to
              NFS with a 1 second metadata cache timeout.  always sets a long
              cache lifetime at the expense of coherency.  The default is
              auto.

EXTENDED ATTRIBUTE (XATTR) MAPPING

       By default the names of xattrs used by the client are passed through to
       the server file system.  This can be a problem where either those xattr
       names are used by something on the server (e.g. SELinux client/server
       confusion) or where virtiofsd is running in a container with
       restricted privileges where it cannot access some attributes.

   Mapping syntax
       A mapping of xattr names can be made using -o xattrmap=mapping where
       the mapping string consists of a series of rules.

       The first matching rule terminates the mapping.  The set of rules must
       include a terminating rule to match any remaining attributes at the
       end.

       Each rule consists of a number of fields separated with a separator
       that is the first non-white space character in the rule.  This
       separator must then be used for the whole rule.  White space may be
       added before and after each rule.

       Using ':' as the separator a rule is of the form:

       :type:scope:key:prepend:

       scope is:

       • 'client' - match 'key' against an xattr name from the client for
         setxattr/getxattr/removexattr

       • 'server' - match 'prepend' against an xattr name from the server
         for listxattr

       • 'all' - can be used to make a single rule where both the server
         and client matches are triggered.

       type is one of:

       • 'prefix' - is designed to prepend and strip a prefix; the modified
         attributes are then passed on to the client/server.

       • 'ok' - Causes the rule set to be terminated when a match is found
         while allowing matching xattrs through unchanged.  It is intended
         both as a way of explicitly terminating the list of rules, and to
         allow some xattrs to skip following rules.

       • 'bad' - If a client tries to use a name matching 'key' it is denied
         using EPERM; when the server passes an attribute name matching
         'prepend' it is hidden.  In many ways its use is very like 'ok' as
         either an explicit terminator or for special handling of certain
         patterns.

       • 'unsupported' - If a client tries to use a name matching 'key' it is
         denied using ENOTSUP; when the server passes an attribute name
         matching 'prepend' it is hidden.  In many ways its use is very like
         'ok' as either an explicit terminator or for special handling of
         certain patterns.

       key is a string tested as a prefix on an attribute name originating on
       the client.  It may be empty, in which case a 'client' rule will always
       match on client names.

       prepend is a string tested as a prefix on an attribute name originating
       on the server, and used as a new prefix.  It may be empty, in which
       case a 'server' rule will always match on all names from the server.
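
       The rule syntax described above can be illustrated with a small
       parser.  This is a sketch written for this page, not virtiofsd's own
       parser: the names Rule, parse_xattrmap and _take are hypothetical, and
       only the four-field :type:scope:key:prepend: form is handled.

```python
from typing import List, NamedTuple

RULE_TYPES = {"prefix", "ok", "bad", "unsupported"}
SCOPES = {"client", "server", "all"}


class Rule(NamedTuple):
    type: str
    scope: str
    key: str
    prepend: str


def _take(text, pos, sep):
    """Return the field starting at pos and the index just past its separator."""
    end = text.find(sep, pos)
    if end == -1:
        raise ValueError(f"missing {sep!r} after position {pos}")
    return text[pos:end], end + 1


def parse_xattrmap(mapping: str) -> List[Rule]:
    """Split a mapping string into rules, one separator character per rule."""
    rules = []
    pos = 0
    while True:
        # White space may appear before and after each rule.
        while pos < len(mapping) and mapping[pos].isspace():
            pos += 1
        if pos == len(mapping):
            return rules
        # The first non-white-space character is this rule's separator.
        sep = mapping[pos]
        rtype, pos = _take(mapping, pos + 1, sep)
        scope, pos = _take(mapping, pos, sep)
        key, pos = _take(mapping, pos, sep)
        prepend, pos = _take(mapping, pos, sep)
        if rtype not in RULE_TYPES:
            raise ValueError(f"unknown rule type {rtype!r}")
        if scope not in SCOPES:
            raise ValueError(f"unknown scope {scope!r}")
        rules.append(Rule(rtype, scope, key, prepend))
```

       Because the separator is re-read at the start of every rule, rules
       using different separators could be mixed in one string under this
       sketch, and an empty field is simply two separators in a row.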

       e.g.:
          :prefix:client:trusted.:user.virtiofs.:

          will match 'trusted.' attributes in client calls and prefix them
          before passing them to the server.

          :prefix:server::user.virtiofs.:

          will strip 'user.virtiofs.' from all server replies.

          :prefix:all:trusted.:user.virtiofs.:

          combines the previous two cases into a single rule.

          :ok:client:user.::

          will allow get/set xattr for 'user.' xattrs and ignore following
          rules.

          :ok:server::security.:

          will pass 'security.' xattrs in listxattr from the server and ignore
          following rules.

          :ok:all:::

          will terminate the rule search passing any remaining attributes in
          both directions.

          :bad:server::security.:

          would hide 'security.' xattrs in listxattr from the server.
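
       The first-match behaviour of these rules can be modelled in a few
       lines.  This is a toy model of the semantics as documented here, not
       virtiofsd's implementation: the function names are hypothetical, rules
       are plain (type, scope, key, prepend) tuples, and the handling of
       names that match no rule is an assumption.

```python
import errno


def map_client_name(rules, name):
    """Translate a client xattr name to the name used on the server."""
    for rtype, scope, key, prepend in rules:
        if scope not in ("client", "all") or not name.startswith(key):
            continue
        # The first matching rule terminates the search.
        if rtype == "prefix":
            return prepend + name
        if rtype == "ok":
            return name
        if rtype == "bad":
            raise OSError(errno.EPERM, "xattr name denied", name)
        if rtype == "unsupported":
            raise OSError(errno.ENOTSUP, "xattr name not supported", name)
    # Assumption: without a terminating rule the name is denied.
    raise OSError(errno.EPERM, "no matching rule", name)


def map_server_name(rules, name):
    """Translate a server xattr name for listxattr; None means hidden."""
    for rtype, scope, key, prepend in rules:
        if scope not in ("server", "all") or not name.startswith(prepend):
            continue
        if rtype == "prefix":
            return name[len(prepend):]
        if rtype == "ok":
            return name
        return None  # 'bad' and 'unsupported' both hide the name
    return None  # assumption: unmatched names are hidden
```

       With the rules ("prefix", "all", "trusted.", "user.virtiofs."),
       ("bad", "server", "", "security.") and ("ok", "all", "", ""), the
       client name 'trusted.foo' is stored as 'user.virtiofs.trusted.foo',
       the server name 'user.virtiofs.trusted.foo' is listed as
       'trusted.foo', and the server name 'security.selinux' is hidden.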

       A simpler 'map' type provides a shorter syntax for the common case:

       :map:key:prepend:

       The 'map' type adds a number of separate rules to add prepend as a
       prefix to the matched key (or all attributes if key is empty).  There
       may be at most one 'map' rule and it must be the last rule in the set.

       Note: When the 'security.capability' xattr is remapped, the daemon has
       to do extra work to remove it during many operations, which the host
       kernel normally does itself.

   Security considerations
       Operating systems typically partition the xattr namespace using
       well-defined name prefixes.  Each partition may have different access
       controls applied.  For example, on Linux there are multiple partitions:

       • system.* - access varies depending on attribute & filesystem

       • security.* - only processes with CAP_SYS_ADMIN

       • trusted.* - only processes with CAP_SYS_ADMIN

       • user.* - any process granted by file permissions / ownership

       Other operating systems, such as FreeBSD, have different name prefixes
       and access control rules.

       When remapping attributes on the host, it is important to ensure that
       the remapping does not allow a guest user to evade the guest access
       control rules.

       Consider if trusted.* from the guest was remapped to
       user.virtiofs.trusted.* in the host.  An unprivileged user in a Linux
       guest has the ability to write to xattrs under user.*.  Thus the user
       can evade the access control restriction on trusted.* by instead
       writing to user.virtiofs.trusted.*.

       As noted above, the partitions used and the access controls applied
       will vary across guest OSes, so it is not wise to try to predict what
       the guest OS will use.

       The simplest way to avoid an insecure configuration is to remap all
       xattrs at once, to a given fixed prefix.  This is shown in example (1)
       below.

       If selectively mapping only a subset of xattr prefixes, then rules must
       be added to explicitly block direct access to the target of the
       remapping.  This is shown in example (2) below.

   Mapping examples
       1. Prefix all attributes with 'user.virtiofs.'

          -o xattrmap=":prefix:all::user.virtiofs.::bad:all:::"

       This uses two rules, using : as the field separator; the first rule
       prefixes and strips 'user.virtiofs.', the second rule hides any
       non-prefixed attributes that the host set.

       This is equivalent to the 'map' rule:

          -o xattrmap=":map::user.virtiofs.:"

       2. Prefix 'trusted.' attributes, allow others through

          "/prefix/all/trusted./user.virtiofs./
           /bad/server//trusted./
           /bad/client/user.virtiofs.//
           /ok/all///"

       Here there are four rules, using / as the field separator, and also
       demonstrating that new lines can be included between rules.  The first
       rule is the prefixing of 'trusted.' and stripping of 'user.virtiofs.'.
       The second rule hides unprefixed 'trusted.' attributes on the host.
       The third rule stops a guest from explicitly setting the
       'user.virtiofs.' path directly to prevent access control bypass on the
       target of the earlier prefix remapping.  Finally, the fourth rule lets
       all remaining attributes through.

       This is equivalent to the 'map' rule:

          -o xattrmap="/map/trusted./user.virtiofs./"
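
       The two 'map' equivalences stated in examples (1) and (2) can be
       written as an expansion function.  This is a sketch inferred only
       from those two examples; applying it to other key and prepend values
       is an assumption, and virtiofsd's own expansion may differ in detail.
       The names expand_map and render are hypothetical.

```python
def expand_map(key, prepend):
    """Expand a 'map' rule into (type, scope, key, prepend) tuples."""
    if key == "":
        # Example (1): prefix everything, hide anything non-prefixed.
        return [
            ("prefix", "all", "", prepend),
            ("bad", "all", "", ""),
        ]
    # Example (2): prefix the matched key and block both bypass routes.
    return [
        ("prefix", "all", key, prepend),
        ("bad", "server", "", key),      # hide unprefixed host attributes
        ("bad", "client", prepend, ""),  # block direct access to the target
        ("ok", "all", "", ""),           # let everything else through
    ]


def render(rules, sep):
    """Serialise rules back into an xattrmap string using one separator."""
    return "".join(
        f"{sep}{rtype}{sep}{scope}{sep}{key}{sep}{prepend}{sep}"
        for rtype, scope, key, prepend in rules
    )
```

       Under these assumptions, render(expand_map("", "user.virtiofs."), ":")
       reproduces the two-rule string from example (1).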

       3. Hide 'security.' attributes, and allow everything else

          "/bad/all/security./security./
           /ok/all///"

       The first rule combines what could be separate client and server rules
       into a single 'all' rule, matching 'security.' in either client
       arguments or lists returned from the host.  This stops the client
       seeing any 'security.' attributes on the server and stops it setting
       any.

SELINUX SUPPORT

       SELinux support can be enabled by running virtiofsd with the option
       "-o security_label".  However, this will try to save the guest's
       security context in the xattr security.selinux on the host, and it
       might fail if the host's SELinux policy does not permit virtiofsd to
       do this operation.

       Hence, it is preferred to remap the guest's "security.selinux" xattr
       to, say, "trusted.virtiofs.security.selinux" on the host.

       "-o xattrmap=:map:security.selinux:trusted.virtiofs.:"

       This will make sure that the guest's and host's SELinux xattrs on the
       same file remain separate and do not interfere with each other.  It
       also allows both host and guest to implement their own separate
       SELinux policies.

       Setting a trusted xattr on the host requires CAP_SYS_ADMIN, so this
       capability needs to be added to the daemon.

       "-o modcaps=+sys_admin"

       Giving CAP_SYS_ADMIN increases the risk to the system.  virtiofsd is
       then more powerful and, if it gets compromised, it can do a lot of
       damage to the host system.  Keep this trade-off in mind when making a
       decision.
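
       To make the CAPLIST format used by modcaps concrete, the following
       sketch splits such a string into added and dropped capabilities.  It
       only illustrates the documented syntax (colon-separated names, each
       preceded by + or -); parse_modcaps is a hypothetical helper, not part
       of virtiofsd.

```python
def parse_modcaps(caplist):
    """Split a CAPLIST string into (added, dropped) capability names."""
    added, dropped = [], []
    for item in caplist.split(":"):
        # Each entry needs a sign followed by a capability name.
        if len(item) < 2 or item[0] not in ("+", "-"):
            raise ValueError(f"bad capability entry {item!r}")
        if item[0] == "+":
            added.append(item[1:])
        else:
            dropped.append(item[1:])
    return added, dropped
```

       For the value shown above, parse_modcaps("+sys_admin") yields one
       added capability and none dropped.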

EXAMPLES

       Export /var/lib/fs/vm001/ on vhost-user UNIX domain socket
       /var/run/vm001-vhost-fs.sock:

          host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
          host# qemu-system-x86_64 \
                -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
                -device vhost-user-fs-pci,chardev=char0,tag=myfs \
                -object memory-backend-memfd,id=mem,size=4G,share=on \
                -numa node,memdev=mem \
                ...
          guest# mount -t virtiofs myfs /mnt
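
       The example above ties three values together: the socket path is
       shared by virtiofsd and the QEMU chardev, and the tag is shared by the
       QEMU device and the guest mount.  The sketch below builds the same
       argument lists from those values without running anything.  The
       function name build_commands is hypothetical, and the QEMU options
       elided with "..." in the example are deliberately not filled in.

```python
def build_commands(socket_path, source, tag, mem_size="4G"):
    """Return (virtiofsd, qemu, guest_mount) argument lists."""
    virtiofsd = [
        "virtiofsd",
        f"--socket-path={socket_path}",
        "-o", f"source={source}",
    ]
    qemu = [
        "qemu-system-x86_64",
        # The chardev must point at the socket virtiofsd listens on.
        "-chardev", f"socket,id=char0,path={socket_path}",
        "-device", f"vhost-user-fs-pci,chardev=char0,tag={tag}",
        # Shared memory backend, as in the example above.
        "-object", f"memory-backend-memfd,id=mem,size={mem_size},share=on",
        "-numa", "node,memdev=mem",
    ]
    # Run inside the guest; the tag selects the virtio-fs device.
    guest_mount = ["mount", "-t", "virtiofs", tag, "/mnt"]
    return virtiofsd, qemu, guest_mount
```

       Keeping the socket path and tag as single parameters avoids the two
       sides drifting apart when several guests are configured.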

AUTHOR

       Stefan Hajnoczi <stefanha@redhat.com>, Masayoshi Mizuma
       <m.mizuma@jp.fujitsu.com>

       2023, The QEMU Project Developers

7.0.0                            Jan 19, 2023                     VIRTIOFSD(1)