STACD.CONF(5)                                                    STACD.CONF(5)

NAME

       stacd.conf - stacd(8) configuration file

SYNOPSIS

       /etc/stas/stacd.conf

DESCRIPTION

       When stacd(8) starts up, it reads its configuration from stacd.conf.

CONFIGURATION FILE FORMAT

       stacd.conf is a plain text file divided into sections, with
       configuration entries in the style key=value. Spaces immediately before
       or after the = are ignored. Empty lines are ignored, as are lines
       starting with #, which may be used for comments.
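
       As an illustration of this format, the minimal Python sketch below
       parses a small fragment with the standard configparser module. The
       fragment and its values are made up for the example, and the use of
       configparser is an assumption; it is not necessarily how stacd(8)
       reads the file.

```python
import configparser

# Hypothetical stacd.conf fragment, for illustration only.
SAMPLE = """
# Lines starting with '#' are comments.
[Global]
tron = true
kato=30

[Controllers]
"""


def parse_conf(text):
    """Parse 'key=value' entries grouped in sections."""
    parser = configparser.ConfigParser(
        delimiters=("=",),         # only '=' separates key and value
        comment_prefixes=("#",),   # only '#' starts a comment line
        inline_comment_prefixes=None,
    )
    parser.read_string(text)
    return parser


conf = parse_conf(SAMPLE)
print(conf.getboolean("Global", "tron"))
print(conf.getint("Global", "kato"))
```

       Keys that may appear more than once, such as controller=, need extra
       handling that this sketch leaves out: with its default settings
       configparser rejects duplicate keys within a section.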

OPTIONS

   [Global] section
       The following options are available in the [Global] section:

       tron=
           Trace ON. Takes a boolean argument. If true, enables full code
           tracing. The trace is written to the system log (e.g. systemd's
           journal). Defaults to false.

       hdr-digest=
           Enable Protocol Data Unit (PDU) Header Digest. Takes a boolean
           argument. NVMe/TCP supports an optional PDU Header Digest.
           Digests are calculated using the CRC32C algorithm. If true, Header
           Digests are inserted in PDUs and checked for errors. Defaults to
           false.

       data-digest=
           Enable Protocol Data Unit (PDU) Data Digest. Takes a boolean
           argument. NVMe/TCP supports an optional PDU Data Digest. Digests
           are calculated using the CRC32C algorithm. If true, Data Digests
           are inserted in PDUs and checked for errors. Defaults to false.
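
           For readers unfamiliar with CRC32C, the sketch below is a plain
           bitwise Python version of the checksum (Castagnoli polynomial,
           reflected form). It only shows what the algorithm computes over a
           byte string; it is not code from nvme-stas, and real
           implementations are typically table-driven or hardware-assisted.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC32C check value
```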

       kato=
           Keep Alive Timeout (KATO). Takes an unsigned integer that specifies
           the timeout value, in seconds, for the Keep Alive feature. Defaults
           to 30 seconds for Discovery Controller connections and 120 seconds
           for I/O Controller connections.

       ip-family=
           Takes a string argument that specifies whether IPv4, IPv6, or both
           may be used when connecting to a Controller. Connections will not
           be attempted to IP addresses (whether discovered or manually
           configured with controller=) disabled by this option. If an invalid
           value is entered, then the default (see below) will apply.

           Choices are ipv4, ipv6, or ipv4+ipv6.

           Defaults to ipv4+ipv6.
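
           The filtering described above can be sketched in Python as follows.
           The function names are hypothetical and the code is only an
           illustration of the documented behavior (fallback to the default
           on an invalid value, then a check of the address's IP version); it
           expects literal IP addresses, not host names.

```python
import ipaddress

DEFAULT_IP_FAMILY = "ipv4+ipv6"
VALID_IP_FAMILIES = ("ipv4", "ipv6", "ipv4+ipv6")


def normalize_ip_family(value):
    """Return value if it is a valid choice, else the default."""
    value = value.strip()
    return value if value in VALID_IP_FAMILIES else DEFAULT_IP_FAMILY


def address_allowed(traddr, ip_family):
    """True if the IP version of traddr is enabled by ip_family."""
    version = ipaddress.ip_address(traddr).version  # 4 or 6
    return f"ipv{version}" in normalize_ip_family(ip_family).split("+")


print(address_allowed("192.168.1.10", "ipv6"))
print(address_allowed("2001:db8::1", "ipv4+ipv6"))
```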

       ignore-iface=
           Takes a boolean argument. This option controls how connections with
           I/O Controllers (IOC) are made.

           There is no guarantee that the routing tables provide a route to
           reach a given IOC. However, the socket option SO_BINDTODEVICE can be
           used to force the connection to be made on a specific interface
           instead of letting the routing tables decide where to make the
           connection.

           This option determines whether stacd will use SO_BINDTODEVICE to
           force connections on an interface or just rely on the routing
           tables. The default is to use SO_BINDTODEVICE; in other words,
           stacd does not ignore the interface.

           BACKGROUND: By default, stacd will connect to IOCs on the same
           interface that was used to retrieve the discovery log pages. If
           stafd discovers a DC on an interface using mDNS, and stafd connects
           to that DC and retrieves the log pages, it is expected that the
           storage subsystems listed in the log pages are reachable on the
           same interface where the DC was discovered.

           For example, let's say a DC is discovered on interface ens102. Then
           all the subsystems listed in the log pages retrieved from that DC
           must be reachable on interface ens102. If this doesn't work, for
           example if you cannot "ping -I ens102 [storage-ip]", then the most
           likely explanation is that proxy ARP is not enabled on the switch
           that the host is connected to on interface ens102. Whatever you do,
           resist the temptation to manually set up the routing tables or to
           add alternate routes going over a different interface than the one
           where the DC is located. That simply won't work. Make sure proxy
           ARP is enabled on the switch first.

           Setting routes won't work because, by default, stacd uses the
           SO_BINDTODEVICE socket option when it connects to IOCs. This option
           forces a socket connection to be made on a specific interface
           instead of letting the routing tables decide where to connect the
           socket. Even if you were to manually configure an alternate route
           on a different interface, the connections (i.e. host to IOC) will
           still be made on the interface where the DC was discovered by
           stafd.

           Defaults to false.
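
           To make the effect of SO_BINDTODEVICE concrete, the Python sketch
           below binds a TCP socket to an interface unless told to ignore it.
           The function name and its ignore_iface argument are hypothetical,
           chosen to mirror this option; the actual mechanism used by
           nvme-stas and the kernel may differ. SO_BINDTODEVICE is
           Linux-specific and may require elevated privileges.

```python
import socket


def make_ioc_socket(iface, ignore_iface=False):
    """Create a TCP socket, optionally bound to a network interface.

    iface is an interface name such as "ens102".
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if iface and not ignore_iface:
        # Force traffic onto iface regardless of the routing tables.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                        iface.encode() + b"\0")
    return sock


# With ignore_iface=True the routing tables pick the interface.
sock = make_ioc_socket("ens102", ignore_iface=True)
sock.close()
```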

       udev-rule=
           Takes a string argument enabled or disabled. This option determines
           whether nvme-cli's udev rules for TCP connections will be executed
           or ignored.

           nvme-cli installs a set of udev rules that tell the udev daemon
           (udevd) to look for Asynchronous Event Notifications (AEN)
           indicating a change of Discovery Log Page Entries (DLPE). These
           udev rules are typically installed as:
           /usr/lib/udev/rules.d/70-nvmf-autoconnect.rules

           When an AEN is detected, udevd instructs systemd to start a service
           that invokes nvme-cli's connect-all command. This command retrieves
           the DLPEs from the Discovery Controller (DC) that sent the AEN and
           connects to all the I/O Controllers (IOC) listed in the DLPEs.

           In parallel, stafd and stacd react to the AEN in the same way. This
           results in a race condition between udevd and nvme-stas. nvme-stas
           is written in Python and runs slower than nvme-cli, which is
           written in C. In other words, nvme-stas usually loses the race.

           This can be a problem for TCP connections because nvme-cli
           traditionally doesn't specify the interface (host-iface) when
           making TCP connections and leaves it to the kernel (and the routing
           table) to select the best interface. nvme-stas, on the other hand,
           always tries to make connections on a specific interface (per
           configuration). Note that a fix was added to nvme-cli so that TCP
           connections to IOCs will now be made with host-iface specified.
           That, however, will only be available in post-2.1.2 versions of
           nvme-cli.

           To make matters worse, when a connection is made without
           specifying the host-iface, and therefore the kernel decides which
           interface to use, there is no way to tell from user space (i.e. by
           nvme-stas) which interface the kernel actually used. A fix was made
           to the kernel to make a TCP connection's interface available to
           user space applications, but that will only be available in Linux
           6.1 (or later).

           Being able to identify the interface (host-iface) is important to
           nvme-stas. That's because it uses a Transport Identifier (TID)
           containing all the parameters (including the host-iface) needed to
           make connections (see table below). The parameters that compose the
           TID can be retrieved from sysfs under /sys/class/nvme/.

           Table 1. Transport Identifier
           ┌────────────┬────────────────────────────┐
           │trtype      │ Transport type (tcp, rdma, │
           │            │ fc, loop)                  │
           ├────────────┼────────────────────────────┤
           │traddr      │ Transport address (e.g. IP │
           │            │ address)                   │
           ├────────────┼────────────────────────────┤
           │trsvcid     │ Transport service ID (e.g. │
           │            │ IP port)                   │
           ├────────────┼────────────────────────────┤
           │subnqn      │ Subsystem NQN              │
           ├────────────┼────────────────────────────┤
           │host-traddr │ Host transport address     │
           │            │ (e.g. source IP address)   │
           ├────────────┼────────────────────────────┤
           │host-iface  │ Host interface (e.g. eth1) │
           └────────────┴────────────────────────────┘

           When nvme-stas makes a connection, it first looks for an existing
           connection that matches the TID (including a matching host-iface).
           Since connections made by nvme-cli lack the host-iface, nvme-stas
           does not find a match. Therefore, nvme-stas will try to make a new
           connection, which will often be refused by the kernel because a
           connection already exists.
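
           The mismatch can be illustrated with a small Python sketch. The TID
           class below simply mirrors the fields of Table 1, and the
           addresses and NQN are made up; this is an assumption-laden
           illustration, not the data structure nvme-stas actually uses.

```python
from typing import NamedTuple


class TID(NamedTuple):
    """Transport Identifier; field names follow Table 1."""
    trtype: str
    traddr: str
    trsvcid: str
    subnqn: str
    host_traddr: str = ""
    host_iface: str = ""


def find_existing(wanted, existing):
    """Return the first existing connection whose TID equals wanted, else None."""
    return next((tid for tid in existing if tid == wanted), None)


# Connection made by nvme-cli without host-iface (values are made up).
by_nvme_cli = TID("tcp", "192.168.1.20", "4420", "nqn.2014-08.org.example:subsys1")
# nvme-stas wants the same controller, but on a specific interface.
wanted = by_nvme_cli._replace(host_iface="ens102")

print(find_existing(wanted, [by_nvme_cli]))  # None: host-iface differs
```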

           Suffice it to say that issues may arise when both nvme-stas and
           nvme-cli operate in parallel. These issues may vary depending on
           your version of Linux, nvme-cli, nvme-stas, and/or libnvme. They
           will often result in messages printed by the kernel to the syslog.
           Typical error messages look like these:

               [...] failed to connect controller, error 1006
               [...] failed to connect socket: -111
               [...] failed to write to nvme-fabrics device
               [...] Failed to write to /dev/nvme-fabrics: Connection refused

           The udev-rule option allows a user to disable nvme-cli's udev rule
           for TCP connections. Only TCP connections rely on the host-iface
           parameter, and therefore the udev rule need only be disabled for
           this type of transport.

           Defaults to disabled.

   [I/O controller connection management] section
       Connectivity between hosts and subsystems in a fabric is controlled by
       Fabric Zoning. Entities that share a common zone (i.e., are zoned
       together) are allowed to discover each other and establish connections
       between them. Fabric Zoning is configured on Discovery Controllers
       (DC). Users can add/remove controllers and/or hosts to/from zones.

       Hosts have no direct knowledge of the Fabric Zoning configuration that
       is active on a given DC. As a result, if a host is impacted by a Fabric
       Zoning configuration change, it will be notified of the connectivity
       configuration change by the DC via Asynchronous Event Notifications
       (AEN).

       Table 2. List of terms used in this section:
       ┌─────────────────┬────────────────────────────┐
       │Term             │ Description                │
       ├─────────────────┼────────────────────────────┤
       │AEN              │ Asynchronous Event         │
       │                 │ Notification. A CQE        │
       │                 │ (Completion Queue Entry)   │
       │                 │ for an Asynchronous Event  │
       │                 │ Request that was           │
       │                 │ previously transmitted by  │
       │                 │ the host to a Discovery    │
       │                 │ Controller. AENs are used  │
       │                 │ by DCs to notify hosts     │
       │                 │ that a change (e.g., a     │
       │                 │ connectivity configuration │
       │                 │ change) has occurred.      │
       ├─────────────────┼────────────────────────────┤
       │DC               │ Discovery Controller.      │
       ├─────────────────┼────────────────────────────┤
       │DLP              │ Discovery Log Page. A host │
       │                 │ will issue a Get Log Page  │
       │                 │ command to retrieve the    │
       │                 │ list of controllers it may │
       │                 │ connect to.                │
       ├─────────────────┼────────────────────────────┤
       │DLPE             │ Discovery Log Page Entry.  │
       │                 │ The response to a Get Log  │
       │                 │ Page command contains a    │
       │                 │ list of DLPEs identifying  │
       │                 │ each controller that the   │
       │                 │ host is allowed to connect │
       │                 │ with.                      │
       │                 │                            │
       │                 │ Note that DLPEs may        │
       │                 │ contain both I/O           │
       │                 │ Controllers (IOCs) and     │
       │                 │ Discovery Controllers      │
       │                 │ (DCs). DCs listed in DLPEs │
       │                 │ are called referrals.      │
       │                 │ stacd only deals with      │
       │                 │ IOCs. Referrals (DCs) are  │
       │                 │ handled by stafd.          │
       ├─────────────────┼────────────────────────────┤
       │IOC              │ I/O Controller.            │
       ├─────────────────┼────────────────────────────┤
       │Manual Config    │ Refers to manually adding  │
       │                 │ entries to stacd.conf with │
       │                 │ the controller= parameter. │
       ├─────────────────┼────────────────────────────┤
       │Automatic Config │ Refers to receiving        │
       │                 │ configuration from a DC as │
       │                 │ DLPEs.                     │
       ├─────────────────┼────────────────────────────┤
       │External Config  │ Refers to configuration    │
       │                 │ done outside of the        │
       │                 │ nvme-stas framework, for   │
       │                 │ example using nvme-cli     │
       │                 │ commands.                  │
       └─────────────────┴────────────────────────────┘

       DCs notify hosts of connectivity configuration changes by sending AENs
       indicating a "Discovery Log" change. The host uses these AENs as a
       trigger to issue a Get Log Page command. The response to this command
       is used to update the list of DLPEs containing the controllers the host
       is allowed to access. Upon reception of the current DLPEs, the host
       will determine whether DLPEs were added and/or removed, which will
       trigger the addition and/or removal of controller connections. This
       happens in real time and may affect active connections to controllers
       including controllers that support I/O operations (IOCs). A host that
       was previously connected to an IOC may suddenly be told that it is no
       longer allowed to connect to that IOC and should disconnect from it.
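
       Determining which DLPEs were added or removed amounts to comparing
       the previous list with the current one. A minimal Python sketch,
       assuming each DLPE is reduced to a hashable tuple (the tuples and
       addresses below are made up, and diff_dlpes is a hypothetical helper):

```python
def diff_dlpes(previous, current):
    """Return (added, removed) between two collections of DLPEs."""
    previous, current = set(previous), set(current)
    return current - previous, previous - current


# Each DLPE is represented here as (trtype, traddr, trsvcid).
old = {("tcp", "10.0.0.1", "4420"), ("tcp", "10.0.0.2", "4420")}
new = {("tcp", "10.0.0.2", "4420"), ("tcp", "10.0.0.3", "4420")}

added, removed = diff_dlpes(old, new)
print("connect to:", added)
print("candidates for disconnect:", removed)
```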

       IOC connection creation. There are 3 ways to configure IOC connections
       on a host:

        1. Manual Config by adding controller= entries to the [Controllers]
           section (see below).

        2. Automatic Config received in the form of DLPEs from a remote DC.

        3. External Config using nvme-cli (e.g. "nvme connect").

       IOC connection removal/prevention. There are 3 ways to remove (or
       prevent) connections to an IOC:

        1. Manual Config.

            1. by adding exclude= entries to the [Controllers] section (see
               below).

            2. by removing controller= entries from the [Controllers] section.

        2. Automatic Config. As explained above, a host gets a new list of
           DLPEs upon connectivity configuration changes. On DLPE removal, the
           host should remove the connection to the IOC matching that DLPE.
           This behavior is configurable using the disconnect-scope= parameter
           described below.

        3. External Config using nvme-cli (e.g. "nvme disconnect" or "nvme
           disconnect-all").

       The decision by the host to automatically disconnect from an IOC
       following connectivity configuration changes is controlled by two
       parameters: disconnect-scope and disconnect-trtypes.

       disconnect-scope=
           Takes one of: only-stas-connections,
           all-connections-matching-disconnect-trtypes, or no-disconnect.

           In theory, hosts should only connect to IOCs that have been zoned
           for them. Connections to IOCs that a host is not zoned to have
           access to should simply not exist. In practice, however, users may
           not want hosts to disconnect from IOCs (or at least not from some
           of them) in reaction to connectivity configuration changes.

           Some users may prefer for IOC connections to be "sticky" and only
           be removed manually (nvme-cli or exclude=) or removed by a system
           reboot. Specifically, they don't want IOC connections to be removed
           unexpectedly on DLPE removal. These users may want to set
           disconnect-scope to no-disconnect.

           It is important to note that when IOC connections are removed,
           ongoing I/O transactions will be terminated immediately. There is
           no way to tell what happens to the data being exchanged when such
           an abrupt termination happens. If a host was in the middle of
           writing to a storage subsystem, there is a chance that outstanding
           I/O operations may not successfully complete.

           Values:
               only-stas-connections
                   Only remove connections previously made by stacd.

                   In this mode, when a DLPE is removed as a result of
                   connectivity configuration changes, the corresponding IOC
                   connection will be removed by stacd.

                   Connections to IOCs made externally, e.g. using nvme-cli,
                   will not be affected, unless they happen to be duplicates
                   of connections made by stacd. It's simply not possible for
                   stacd to tell that a connection was previously made with
                   nvme-cli (or any other external tool). So, it's good
                   practice to avoid duplicating configuration between stacd
                   and external tools.

                   Users wanting to persist some of their IOC connections
                   regardless of connectivity configuration changes should not
                   use nvme-cli to make those connections. Instead, they
                   should hard-code them in stacd.conf with the controller=
                   parameter. Using the controller= parameter is the only way
                   for a user to tell stacd that a connection must be made and
                   not be deleted "no-matter-what".

               all-connections-matching-disconnect-trtypes
                   All connections that match the transport type specified by
                   disconnect-trtypes=, whether they were made automatically
                   by stacd or externally (e.g., nvme-cli), will be audited
                   and are subject to removal on DLPE removal.

                   In this mode, as DLPEs are removed as a result of
                   connectivity configuration changes, the corresponding IOC
                   connections will be removed by the host immediately whether
                   they were made by stacd, nvme-cli, or any other way.
                   Basically, stacd audits all IOC connections matching the
                   transport type specified by disconnect-trtypes=.

                   NOTE. This mode implies that stacd will only allow Manually
                   Configured or Automatically Configured IOC connections to
                   exist. Externally Configured connections using nvme-cli (or
                   other external mechanism) that do not match any Manual
                   Config (stacd.conf) or Automatic Config (DLPEs) will get
                   deleted immediately by stacd.

               no-disconnect
                   stacd does not disconnect from IOCs when a DLPE is removed
                   or a controller= entry is removed from stacd.conf. All IOC
                   connections are "sticky".

                   Instead, users can remove connections by issuing the
                   nvme-cli command "nvme disconnect", by adding an exclude=
                   entry to stacd.conf, or by waiting until the next system
                   reboot, at which time all connections will be removed.

           Defaults to only-stas-connections.
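
           The three modes can be summarized with the Python sketch below. It
           is a simplified model built on assumptions: connections are
           reduced to (trtype, traddr, subnqn) tuples, the function and
           argument names are hypothetical, and stacd's real bookkeeping is
           certainly more involved.

```python
def connections_to_remove(scope, existing, wanted, made_by_stacd,
                          trtypes=("tcp",)):
    """Pick connections to drop after the list of DLPEs changes.

    existing:      set of (trtype, traddr, subnqn) currently connected
    wanted:        set allowed by Manual Config and Automatic Config
    made_by_stacd: subset of existing that stacd created itself
    trtypes:       transport types audited (see disconnect-trtypes=)
    """
    stale = existing - wanted
    if scope == "no-disconnect":
        return set()
    if scope == "only-stas-connections":
        return stale & made_by_stacd
    if scope == "all-connections-matching-disconnect-trtypes":
        return {conn for conn in stale if conn[0] in trtypes}
    raise ValueError(f"unknown disconnect-scope: {scope}")


# Made-up connections: a is still wanted; b, c and d are stale.
a = ("tcp", "10.0.0.1", "nqn.a")
b = ("tcp", "10.0.0.2", "nqn.b")   # made by stacd
c = ("tcp", "10.0.0.3", "nqn.c")   # made externally (e.g. nvme-cli)
d = ("fc", "nn-0x1:pn-0x2", "nqn.d")  # made externally
existing, wanted, by_stacd = {a, b, c, d}, {a}, {a, b}

print(connections_to_remove("only-stas-connections", existing, wanted, by_stacd))
```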

       disconnect-trtypes=
           This parameter only applies when disconnect-scope is set to
           all-connections-matching-disconnect-trtypes. It limits the scope of
           the audit to specific transport types.

           Can take the values tcp, rdma, fc, or a combination thereof by
           separating them with a plus (+) sign. For example: tcp+fc. No
           spaces are allowed between values and the plus (+) sign.

           Values:
               tcp
                   Audit TCP connections.

               rdma
                   Audit RDMA connections.

               fc
                   Audit Fibre Channel connections.

           Defaults to tcp.
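
           A value such as tcp+fc can be split as in the Python sketch below.
           The helper name is hypothetical, and raising an error on an
           invalid value is an assumption made for the example; this page
           does not say how stacd reacts to one.

```python
VALID_TRTYPES = {"tcp", "rdma", "fc"}


def parse_disconnect_trtypes(value="tcp"):
    """Split a value such as 'tcp+fc' into a set of transport types."""
    trtypes = set(value.split("+"))
    unknown = trtypes - VALID_TRTYPES
    if unknown:
        # Also triggered by spaces around '+', e.g. 'tcp + fc'.
        raise ValueError(f"invalid transport type(s): {sorted(unknown)}")
    return trtypes


print(parse_disconnect_trtypes("tcp+fc"))
```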

       connect-attempts-on-ncc=
           The NCC bit (Not Connected to CDC) is a bit returned by the CDC in
           the EFLAGS field of the DLPE. Only CDCs will set the NCC bit. DDCs
           will always clear NCC to 0. The NCC bit is a way for the CDC to let
           hosts know that the subsystem is currently not reachable by the
           CDC. This may indicate that the subsystem is currently down or that
           there is an outage on the section of the network connecting the CDC
           to the subsystem.

           If a host is currently failing to connect to an I/O controller and
           if the NCC bit associated with that I/O controller is asserted, the
           host can decide to stop trying to connect to that subsystem until
           connectivity is restored. This will be indicated by the CDC when it
           clears the NCC bit.

           The parameter connect-attempts-on-ncc= controls whether stacd will
           take the NCC bit into account when attempting to connect to an I/O
           Controller. Setting connect-attempts-on-ncc= to 0 means that stacd
           will ignore the NCC bit and will keep trying to connect. Setting
           connect-attempts-on-ncc= to a non-zero value indicates the number
           of connection attempts that will be made before stacd gives up
           trying. Note that this value should be greater than 1. In fact,
           when set to 1, stacd will automatically use 2 instead. The reason
           is that a first connect attempt may fail, especially if nvme-cli's
           udev rule is enabled (see race condition discussion under the
           udev-rule= parameter above).

           Defaults to 0.
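
           The rules above can be condensed into the Python sketch below. The
           function names are hypothetical and the retry bookkeeping is an
           assumption; the sketch only restates the documented semantics of
           0, 1, and larger values.

```python
def effective_attempts(configured):
    """0 means 'ignore NCC'; 1 is raised to 2, as described above."""
    return 2 if configured == 1 else configured


def should_retry(configured, ncc, failed_attempts):
    """Decide whether to try connecting to an I/O controller again."""
    limit = effective_attempts(configured)
    if limit == 0 or not ncc:
        return True  # NCC ignored or not asserted: keep trying
    return failed_attempts < limit


print(should_retry(configured=1, ncc=True, failed_attempts=1))
```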

   [Controllers] section
       The following options are available in the [Controllers] section:

       controller=
           Controllers are specified with the controller option. This option
           may be specified more than once to specify more than one
           controller. The format is one line per Controller composed of a
           series of fields separated by semi-colons as follows:

               controller=transport=[trtype];traddr=[traddr];trsvcid=[trsvcid];host-traddr=[traddr];host-iface=[iface];nqn=[nqn]

           Fields
               transport=
                   This is a mandatory field that specifies the network fabric
                   being used for a NVMe-over-Fabrics network. Current trtype
                   values understood are:

                   Table 3. Transport type
                   ┌───────┬────────────────────────────┐
                   │trtype │ Definition                 │
                   ├───────┼────────────────────────────┤
                   │rdma   │ The network fabric is an   │
                   │       │ rdma network (RoCE, iWARP, │
                   │       │ Infiniband, basic rdma,    │
                   │       │ etc)                       │
                   ├───────┼────────────────────────────┤
                   │fc     │ The network fabric is a    │
                   │       │ Fibre Channel network.     │
                   ├───────┼────────────────────────────┤
                   │tcp    │ The network fabric is a    │
                   │       │ TCP/IP network.            │
                   ├───────┼────────────────────────────┤
                   │loop   │ Connect to a NVMe over     │
                   │       │ Fabrics target on the      │
                   │       │ local host                 │
                   └───────┴────────────────────────────┘

               traddr=
                   This is a mandatory field that specifies the network
                   address of the Controller. For transports using IP
                   addressing (e.g. rdma) this should be an IP-based address
                   (e.g. IPv4, IPv6). It could also be a resolvable host name
                   (e.g. localhost).

               trsvcid=
                   This is an optional field that specifies the transport
                   service id. For transports using IP addressing (e.g. rdma,
                   tcp) this field is the port number.

                   Depending on the transport type, this field will default to
                   either 8009 or 4420 as follows.

                   UDP port 4420 and TCP port 4420 have been assigned by IANA
                   for use by NVMe over Fabrics. NVMe/RoCEv2 controllers use
                   UDP port 4420 by default. NVMe/iWARP controllers use TCP
                   port 4420 by default.

                   TCP port 4420 has been assigned for use by NVMe over
                   Fabrics and TCP port 8009 has been assigned by IANA for use
                   by NVMe over Fabrics discovery. TCP port 8009 is the
                   default TCP port for NVMe/TCP discovery controllers. There
                   is no default TCP port for NVMe/TCP I/O controllers; the
                   Transport Service Identifier (TRSVCID) field in the
                   Discovery Log Entry indicates the TCP port to use.

                   The TCP ports that may be used for NVMe/TCP I/O controllers
                   include TCP port 4420, and the Dynamic and/or Private TCP
                   ports (i.e., ports in the TCP port number range from 49152
                   to 65535). NVMe/TCP I/O controllers should not use TCP port
                   8009. TCP port 4420 shall not be used for both NVMe/iWARP
                   and NVMe/TCP at the same IP address on the same network.

                   Ref: IANA Service names port numbers[1]

               nqn=
                   This field specifies the Controller's NVMe Qualified Name.

                   This field is mandatory for I/O Controllers, but is
                   optional for Discovery Controllers (DC). For the latter,
                   the NQN will default to the well-known DC NQN:
                   nqn.2014-08.org.nvmexpress.discovery if left undefined.

               host-traddr=
                   This is an optional field that specifies the network
                   address used on the host to connect to the Controller. For
                   TCP, this sets the source address on the socket.

               host-iface=
                   This is an optional field that specifies the network
                   interface used on the host to connect to the Controller
                   (e.g. eth1, enp2s0, enx78e7d1ea46da). This forces the
                   connection to be made on a specific interface instead of
                   letting the system decide.

           Examples:

               controller = transport=tcp;traddr=localhost;trsvcid=8009
               controller = transport=tcp;traddr=2001:db8::370:7334;host-iface=enp0s8
               controller = transport=fc;traddr=nn-0x204600a098cbcac6:pn-0x204700a098cbcac6
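
           To show how such an entry breaks down into fields, the Python
           sketch below splits the value on semi-colons and checks the two
           mandatory fields. parse_controller is a hypothetical helper written
           for this example, not the parser used by stacd.

```python
def parse_controller(value):
    """Split 'key=val;key=val' into a dict and check mandatory fields."""
    fields = {}
    for item in value.split(";"):
        # Split on the first '=' only; values may contain ':' (IPv6, FC).
        key, sep, val = item.strip().partition("=")
        if sep:
            fields[key.strip()] = val.strip()
    for mandatory in ("transport", "traddr"):
        if mandatory not in fields:
            raise ValueError(f"missing mandatory field: {mandatory}")
    return fields


entry = "transport=tcp;traddr=2001:db8::370:7334;host-iface=enp0s8"
print(parse_controller(entry))
```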

       exclude=
           Controllers that should be excluded can be specified with the
           exclude= option. Using mDNS to automatically discover and connect
           to controllers can result in unintentional connections being made.
           This keyword allows configuring the controllers that should not be
           connected to.

           The syntax is the same as for controller=, except that the
           parameter host-traddr does not apply. Multiple exclude= keywords
           may appear in the config file to specify more than one excluded
           controller.

           Note 1: A minimal match approach is used to eliminate unwanted
           controllers. That is, you do not need to specify all the parameters
           to identify a controller. Just specifying the host-iface, for
           example, can be used to exclude all controllers on an interface.

           Note 2: exclude= takes precedence over controller=. A controller
           specified by the controller= keyword can be eliminated by the
           exclude= keyword.

           Examples:

               exclude = transport=tcp;traddr=fe80::2c6e:dee7:857:26bb # Eliminate a specific address
               exclude = host-iface=enp0s8                             # Eliminate everything on this interface
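
           The minimal match approach of Note 1 can be sketched in Python as
           follows. Controllers and exclude entries are modeled as plain
           dictionaries of fields, which is an assumption made for the
           example; is_excluded is a hypothetical helper.

```python
def is_excluded(controller, excludes):
    """True if every field of at least one exclude entry matches controller."""
    return any(
        bool(exclude)
        and all(controller.get(key) == val for key, val in exclude.items())
        for exclude in excludes
    )


excludes = [
    {"transport": "tcp", "traddr": "fe80::2c6e:dee7:857:26bb"},
    {"host-iface": "enp0s8"},
]
ctrl = {"transport": "tcp", "traddr": "10.0.0.5", "host-iface": "enp0s8"}

print(is_excluded(ctrl, excludes))  # True: the host-iface entry matches
```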

SEE ALSO

       stacd(8)

NOTES

        1. IANA Service names port numbers
           https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?search=nvme

nvme-stas 2.0-rc5                                                STACD.CONF(5)