STACD.CONF(5)                                                    STACD.CONF(5)

NAME

       stacd.conf - stacd(8) configuration file

SYNOPSIS

       /etc/stas/stacd.conf

DESCRIPTION

       When stacd(8) starts up, it reads its configuration from stacd.conf.

CONFIGURATION FILE FORMAT

       stacd.conf is a plain text file divided into sections, with
       configuration entries in the style key=value. Spaces immediately before
       or after the = are ignored. Empty lines are ignored, as are lines
       starting with #, which may be used for comments.
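       As an illustration of the format described above, the following
       Python sketch reads stacd.conf-style text into a dictionary. It is not
       stacd's own parser: the function name parse_stacd_conf is
       hypothetical, repeated keys (such as controller=) are kept as lists,
       and raising an error on any other malformed line is an assumption of
       this sketch.

```python
from collections import defaultdict


def parse_stacd_conf(text: str) -> dict:
    """Parse stacd.conf-style text into {section: {key: [values, ...]}}.

    Sketch only: every occurrence of a key is kept (controller= and exclude=
    may repeat), spaces around '=' are stripped, and blank or '#' lines are
    skipped.
    """
    config = defaultdict(lambda: defaultdict(list))
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # empty line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()  # e.g. "Global", "Controllers"
            continue
        # Split on the FIRST '=' only, so values like
        # "transport=tcp;traddr=localhost" keep their own '=' signs.
        key, sep, value = line.partition("=")
        if not sep or section is None:
            raise ValueError(f"unexpected line: {raw!r}")
        config[section][key.strip()].append(value.strip())
    return {name: dict(entries) for name, entries in config.items()}


sample = """
[Global]
tron = true

[Controllers]
controller = transport=tcp;traddr=localhost;trsvcid=8009
"""
print(parse_stacd_conf(sample)["Controllers"]["controller"])
```

       Splitting each line on its first = is what allows controller= values,
       which contain = signs of their own, to pass through unchanged.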

OPTIONS

   [Global] section
       The following options are available in the [Global] section:

       tron=
           Trace ON. Takes a boolean argument. If true, enables full code
           tracing. The trace will be displayed in the system log, such as
           systemd's journal. Defaults to false.

       hdr-digest=
           Enable Protocol Data Unit (PDU) Header Digest. Takes a boolean
           argument. NVMe/TCP supports an optional PDU Header Digest.
           Digests are calculated using the CRC32C algorithm. If true, Header
           Digests are inserted in PDUs and checked for errors. Defaults to
           false.

       data-digest=
           Enable Protocol Data Unit (PDU) Data Digest. Takes a boolean
           argument. NVMe/TCP supports an optional PDU Data Digest. Digests
           are calculated using the CRC32C algorithm. If true, Data Digests
           are inserted in PDUs and checked for errors. Defaults to false.

       kato=
           Keep Alive Timeout (KATO). Takes an unsigned integer that
           specifies the timeout value, in seconds, for the Keep Alive
           feature. Defaults to 30 seconds for Discovery Controller
           connections and 120 seconds for I/O Controller connections.

       ip-family=
           Takes a string argument that specifies whether IPv4, IPv6, or both
           are supported when connecting to a Controller. No connection will
           be attempted to an IP address (whether discovered or manually
           configured with controller=) whose family is disabled by this
           option. If an invalid value is entered, the default (see below)
           applies.

           Choices are ipv4, ipv6, or ipv4+ipv6.

           Defaults to ipv4+ipv6.

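           The filtering rule above can be sketched in Python with the
           standard ipaddress module. The helper names are hypothetical, and
           the sketch only handles literal IP addresses; how stacd treats
           host names is not covered here.

```python
import ipaddress

# Allowed IP versions for each documented ip-family= choice.
IP_FAMILIES = {
    "ipv4": frozenset({4}),
    "ipv6": frozenset({6}),
    "ipv4+ipv6": frozenset({4, 6}),
}
DEFAULT_IP_FAMILY = "ipv4+ipv6"


def allowed_versions(ip_family: str) -> frozenset:
    """Map an ip-family= value to IP versions; invalid values use the default."""
    return IP_FAMILIES.get(ip_family.strip(), IP_FAMILIES[DEFAULT_IP_FAMILY])


def may_connect(traddr: str, ip_family: str) -> bool:
    """Return True if ip-family= permits a connection to the literal address traddr."""
    # ip_address() raises ValueError for host names; resolving names is
    # outside the scope of this sketch.
    return ipaddress.ip_address(traddr).version in allowed_versions(ip_family)


print(may_connect("192.168.1.20", "ipv6"))  # prints False
```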
       nr-io-queues=
           Takes a value in the range 1...N. Overrides the default number of
           I/O queues created by the driver.

           Note: This parameter is identical to that provided by nvme-cli.

           Default: Depends on the kernel and other runtime factors (e.g.
           number of CPUs).

       nr-write-queues=
           Takes a value in the range 1...N. Adds additional queues that will
           be used for write I/O.

           Note: This parameter is identical to that provided by nvme-cli.

           Default: Depends on the kernel and other runtime factors (e.g.
           number of CPUs).

       nr-poll-queues=
           Takes a value in the range 1...N. Adds additional queues that will
           be used for polling latency-sensitive I/O.

           Note: This parameter is identical to that provided by nvme-cli.

           Default: Depends on the kernel and other runtime factors (e.g.
           number of CPUs).

       queue-size=
           Takes a value in the range 16...1024.

           Overrides the default number of elements in the I/O queues created
           by the driver. This option will be ignored for discovery, but will
           be passed on to the subsequent connect call.

           Note: This parameter is identical to that provided by nvme-cli.

           Defaults to 128.

       reconnect-delay=
           Takes a value in the range 1 to N seconds.

           Overrides the default delay before a reconnect is attempted after
           a connection loss.

           Note: This parameter is identical to that provided by nvme-cli.

           Defaults to 10 (i.e., a reconnect is attempted every 10 seconds).

       ctrl-loss-tmo=
           Takes a value in the range -1, 0, ..., N seconds. -1 means retry
           forever. 0 means do not retry.

           Overrides the default controller loss timeout period (in seconds).

           Note: This parameter is identical to that provided by nvme-cli.

           Defaults to 600 seconds (10 minutes).

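           reconnect-delay= and ctrl-loss-tmo= work together: the timeout
           bounds how long reconnection goes on, and the delay spaces the
           attempts. Assuming the number of attempts is the timeout divided
           by the delay, rounded up (an assumption about the kernel's NVMe
           over Fabrics host behavior, not something this page states), the
           documented defaults allow about 60 reconnect attempts. The Python
           helper below is hypothetical and only reproduces that arithmetic.

```python
from typing import Optional


def max_reconnect_attempts(ctrl_loss_tmo: int, reconnect_delay: int) -> Optional[int]:
    """Estimate how many reconnects fit within ctrl-loss-tmo.

    Assumption: ceiling division of ctrl-loss-tmo by reconnect-delay.
    Returns None for ctrl-loss-tmo = -1 ("retry forever").
    """
    if reconnect_delay < 1:
        raise ValueError("reconnect-delay must be at least 1 second")
    if ctrl_loss_tmo < 0:
        return None  # -1: retry forever
    # Integer ceiling division; ctrl-loss-tmo = 0 yields 0 ("do not retry").
    return -(-ctrl_loss_tmo // reconnect_delay)


print(max_reconnect_attempts(600, 10))  # prints 60 (documented defaults)
```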
       disable-sqflow=
           Takes a boolean argument. Disables SQ flow control to omit head
           doorbell update for submission queues when sending nvme
           completions.

           Note: This parameter is identical to that provided by nvme-cli.

           Defaults to false.

       ignore-iface=
           Takes a boolean argument. This option controls how connections with
           I/O Controllers (IOC) are made.

           There is no guarantee that there will be a route to reach an IOC.
           However, the socket option SO_BINDTODEVICE can be used to force the
           connection to be made on a specific interface instead of letting
           the routing tables decide where to make the connection.

           This option determines whether stacd will use SO_BINDTODEVICE to
           force connections on an interface or just rely on the routing
           tables. The default is to use SO_BINDTODEVICE; in other words,
           stacd does not ignore the interface.

           BACKGROUND: By default, stacd will connect to IOCs on the same
           interface that was used to retrieve the discovery log pages. If
           stafd discovers a DC on an interface using mDNS, and stafd connects
           to that DC and retrieves the log pages, it is expected that the
           storage subsystems listed in the log pages are reachable on the
           same interface where the DC was discovered.

           For example, let's say a DC is discovered on interface ens102. Then
           all the subsystems listed in the log pages retrieved from that DC
           must be reachable on interface ens102. If this doesn't work (for
           example, you cannot "ping -I ens102 [storage-ip]"), then the most
           likely explanation is that proxy ARP is not enabled on the switch
           that the host is connected to on interface ens102. Whatever you do,
           resist the temptation to manually set up the routing tables or to
           add alternate routes going over a different interface than the one
           where the DC is located. That simply won't work. Make sure proxy
           ARP is enabled on the switch first.

           Setting routes won't work because, by default, stacd uses the
           SO_BINDTODEVICE socket option when it connects to IOCs. This option
           is used to force a socket connection to be made on a specific
           interface instead of letting the routing tables decide where to
           connect the socket. Even if you were to manually configure an
           alternate route on a different interface, the connections (i.e.
           host to IOC) will still be made on the interface where the DC was
           discovered by stafd.

           Defaults to false.

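           To make the SO_BINDTODEVICE discussion concrete, the Python sketch
           below applies the socket option to a plain TCP socket. It does not
           reproduce how stacd establishes NVMe connections: make_ioc_socket
           is a hypothetical name, the option is Linux-specific, and setting
           it may require elevated privileges (e.g. CAP_NET_RAW).

```python
import socket

# Python exposes SO_BINDTODEVICE on Linux; 25 is its Linux value.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def make_ioc_socket(host_iface: str, ignore_iface: bool = False) -> socket.socket:
    """Create a TCP socket, pinned to host_iface unless ignore_iface is True.

    With ignore_iface=False (the documented default), traffic is sent only
    through host_iface, so a route via another interface is not used.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if not ignore_iface:
        try:
            # The interface name is passed as bytes.
            sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, host_iface.encode())
        except OSError:
            sock.close()  # e.g. missing privileges or unknown interface
            raise
    return sock


# ignore-iface=true: leave the choice of interface to the routing tables.
sock = make_ioc_socket("ens102", ignore_iface=True)
sock.close()
```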
   [I/O controller connection management] section
       Connectivity between hosts and subsystems in a fabric is controlled by
       Fabric Zoning. Entities that share a common zone (i.e., are zoned
       together) are allowed to discover each other and establish connections
       between them. Fabric Zoning is configured on Discovery Controllers
       (DC). Users can add/remove controllers and/or hosts to/from zones.

       Hosts have no direct knowledge of the Fabric Zoning configuration that
       is active on a given DC. As a result, if a host is impacted by a Fabric
       Zoning configuration change, it will be notified of the connectivity
       configuration change by the DC via Asynchronous Event Notifications
       (AEN).

       Table 1. List of terms used in this section:
       ┌─────────────────┬────────────────────────────┐
       │Term             │ Description                │
       ├─────────────────┼────────────────────────────┤
       │AEN              │ Asynchronous Event         │
       │                 │ Notification. A CQE        │
       │                 │ (Completion Queue Entry)   │
       │                 │ for an Asynchronous Event  │
       │                 │ Request that was           │
       │                 │ previously transmitted by  │
       │                 │ the host to a Discovery    │
       │                 │ Controller. AENs are used  │
       │                 │ by DCs to notify hosts     │
       │                 │ that a change (e.g., a     │
       │                 │ connectivity configuration │
       │                 │ change) has occurred.      │
       ├─────────────────┼────────────────────────────┤
       │DC               │ Discovery Controller.      │
       ├─────────────────┼────────────────────────────┤
       │DLP              │ Discovery Log Page. A host │
       │                 │ will issue a Get Log Page  │
       │                 │ command to retrieve the    │
       │                 │ list of controllers it may │
       │                 │ connect to.                │
       ├─────────────────┼────────────────────────────┤
       │DLPE             │ Discovery Log Page Entry.  │
       │                 │ The response to a Get Log  │
       │                 │ Page command contains a    │
       │                 │ list of DLPEs identifying  │
       │                 │ each controller that the   │
       │                 │ host is allowed to connect │
       │                 │ with.                      │
       │                 │                            │
       │                 │ Note that DLPEs may        │
       │                 │ contain both I/O           │
       │                 │ Controllers (IOCs) and     │
       │                 │ Discovery Controllers      │
       │                 │ (DCs). DCs listed in DLPEs │
       │                 │ are called referrals.      │
       │                 │ stacd only deals with      │
       │                 │ IOCs. Referrals (DCs) are  │
       │                 │ handled by stafd.          │
       ├─────────────────┼────────────────────────────┤
       │IOC              │ I/O Controller.            │
       ├─────────────────┼────────────────────────────┤
       │Manual Config    │ Refers to manually adding  │
       │                 │ entries to stacd.conf with │
       │                 │ the controller= parameter. │
       ├─────────────────┼────────────────────────────┤
       │Automatic Config │ Refers to receiving        │
       │                 │ configuration from a DC as │
       │                 │ DLPEs                      │
       ├─────────────────┼────────────────────────────┤
       │External Config  │ Refers to configuration    │
       │                 │ done outside of the        │
       │                 │ nvme-stas framework, for   │
       │                 │ example using nvme-cli     │
       │                 │ commands                   │
       └─────────────────┴────────────────────────────┘

       DCs notify hosts of connectivity configuration changes by sending AENs
       indicating a "Discovery Log" change. The host uses these AENs as a
       trigger to issue a Get Log Page command. The response to this command
       is used to update the list of DLPEs containing the controllers the host
       is allowed to access. Upon reception of the current DLPEs, the host
       will determine whether DLPEs were added and/or removed, which will
       trigger the addition and/or removal of controller connections. This
       happens in real time and may affect active connections to controllers
       including controllers that support I/O operations (IOCs). A host that
       was previously connected to an IOC may suddenly be told that it is no
       longer allowed to connect to that IOC and should disconnect from it.

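       The add/remove step described above amounts to comparing two sets of
       DLPEs. The Python sketch below illustrates that comparison only. The
       Dlpe fields are a hypothetical, reduced subset of a real Discovery Log
       Page Entry, the NQNs are made-up example values, and whether a removed
       entry actually leads to a disconnect depends on disconnect-scope=.

```python
from typing import NamedTuple, Set, Tuple


class Dlpe(NamedTuple):
    """Hypothetical, reduced view of a Discovery Log Page Entry."""
    trtype: str
    traddr: str
    trsvcid: str
    subnqn: str


def diff_dlpes(previous: Set[Dlpe], current: Set[Dlpe]) -> Tuple[Set[Dlpe], Set[Dlpe]]:
    """Return (added, removed) entries between two Get Log Page responses."""
    added = current - previous    # controllers the host may now connect to
    removed = previous - current  # controllers the host is no longer zoned for
    return added, removed


old = {
    Dlpe("tcp", "192.168.1.10", "4420", "nqn.2014-08.org.example:sub1"),
    Dlpe("tcp", "192.168.1.11", "4420", "nqn.2014-08.org.example:sub2"),
}
new = {
    Dlpe("tcp", "192.168.1.11", "4420", "nqn.2014-08.org.example:sub2"),
    Dlpe("tcp", "192.168.1.12", "4420", "nqn.2014-08.org.example:sub3"),
}
added, removed = diff_dlpes(old, new)
```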
       IOC connection creation. There are 3 ways to configure IOC connections
       on a host:

        1. Manual Config by adding controller= entries to the [Controllers]
           section (see below).

        2. Automatic Config received in the form of DLPEs from a remote DC.

        3. External Config using nvme-cli (e.g. "nvme connect").

       IOC connection removal/prevention. There are 3 ways to remove (or
       prevent) connections to an IOC:

        1. Manual Config:

            1. by adding exclude= entries to the [Controllers] section (see
               below).

            2. by removing controller= entries from the [Controllers] section.

        2. Automatic Config. As explained above, a host gets a new list of
           DLPEs upon connectivity configuration changes. On DLPE removal, the
           host should remove the connection to the IOC matching that DLPE.
           This behavior is configurable using the disconnect-scope= parameter
           described below.

        3. External Config using nvme-cli (e.g. "nvme disconnect" or "nvme
           disconnect-all").

       The decision by the host to automatically disconnect from an IOC
       following connectivity configuration changes is controlled by two
       parameters: disconnect-scope and disconnect-trtypes.

       disconnect-scope=
           Takes one of: only-stas-connections,
           all-connections-matching-disconnect-trtypes, or no-disconnect.

           In theory, hosts should only connect to IOCs that have been zoned
           for them. Connections to IOCs that a host is not zoned to have
           access to should simply not exist. In practice, however, users may
           not want hosts to disconnect from all IOCs in reaction to
           connectivity configuration changes (or at least for some of the IOC
           connections).

           Some users may prefer for IOC connections to be "sticky" and only
           be removed manually (nvme-cli or exclude=) or removed by a system
           reboot. Specifically, they don't want IOC connections to be removed
           unexpectedly on DLPE removal. These users may want to set
           disconnect-scope to no-disconnect.

           It is important to note that when IOC connections are removed,
           ongoing I/O transactions will be terminated immediately. There is
           no way to tell what happens to the data being exchanged when such
           an abrupt termination happens. If a host was in the middle of
           writing to a storage subsystem, there is a chance that outstanding
           I/O operations may not successfully complete.

           Values:
               only-stas-connections
                   Only remove connections previously made by stacd.

                   In this mode, when a DLPE is removed as a result of
                   connectivity configuration changes, the corresponding IOC
                   connection will be removed by stacd.

                   Connections to IOCs made externally, e.g. using nvme-cli,
                   will not be affected, unless they happen to be duplicates
                   of connections made by stacd. It's simply not possible for
                   stacd to tell that a connection was previously made with
                   nvme-cli (or any other external tool). So, it's good
                   practice to avoid duplicating configuration between stacd
                   and external tools.

                   Users wanting to persist some of their IOC connections
                   regardless of connectivity configuration changes should not
                   use nvme-cli to make those connections. Instead, they
                   should hard-code them in stacd.conf with the controller=
                   parameter. Using the controller= parameter is the only way
                   for a user to tell stacd that a connection must be made and
                   not be deleted "no-matter-what".

               all-connections-matching-disconnect-trtypes
                   All connections that match the transport type specified by
                   disconnect-trtypes=, whether they were made automatically
                   by stacd or externally (e.g., nvme-cli), will be audited
                   and are subject to removal on DLPE removal.

                   In this mode, as DLPEs are removed as a result of
                   connectivity configuration changes, the corresponding IOC
                   connections will be removed by the host immediately whether
                   they were made by stacd, nvme-cli, or any other way.
                   Basically, stacd audits all IOC connections matching the
                   transport type specified by disconnect-trtypes=.

                   NOTE: This mode implies that stacd will only allow Manually
                   Configured or Automatically Configured IOC connections to
                   exist. Externally Configured connections using nvme-cli (or
                   another external mechanism) that do not match any Manual
                   Config (stacd.conf) or Automatic Config (DLPEs) will get
                   deleted immediately by stacd.

               no-disconnect
                   stacd does not disconnect from IOCs when a DLPE is removed
                   or a controller= entry is removed from stacd.conf. All IOC
                   connections are "sticky".

                   Instead, users can remove connections by issuing the
                   nvme-cli command "nvme disconnect", add an exclude= entry
                   to stacd.conf, or wait until the next system reboot at
                   which time all connections will be removed.

           Defaults to only-stas-connections.

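           The three disconnect-scope= values can be summarized as a decision
           function. The Python sketch below is an illustration, not stacd's
           logic. The IocConnection fields are hypothetical; tracked_by_stacd
           idealizes a distinction that, as explained above, stacd cannot
           always make; exclude= handling is left out; and treating stacd's
           own connections as removable in
           all-connections-matching-disconnect-trtypes mode regardless of
           transport is an assumption.

```python
from dataclasses import dataclass
from typing import AbstractSet


@dataclass(frozen=True)
class IocConnection:
    """Hypothetical summary of one existing IOC connection."""
    trtype: str             # "tcp", "rdma", "fc", ...
    tracked_by_stacd: bool  # made by stacd (or an identical duplicate)
    still_configured: bool  # matches a controller= entry or a current DLPE


def should_disconnect(conn: IocConnection, scope: str, trtypes: AbstractSet[str]) -> bool:
    """Decide whether conn is removed after a connectivity configuration change."""
    if scope == "no-disconnect" or conn.still_configured:
        return False  # sticky connections, or nothing changed for this IOC
    if scope == "only-stas-connections":
        return conn.tracked_by_stacd
    if scope == "all-connections-matching-disconnect-trtypes":
        # Assumption: stacd's own connections stay removable as in the
        # default mode; external ones are audited only for matching trtypes.
        return conn.tracked_by_stacd or conn.trtype in trtypes
    raise ValueError(f"unknown disconnect-scope: {scope!r}")


external_tcp = IocConnection("tcp", tracked_by_stacd=False, still_configured=False)
print(should_disconnect(external_tcp, "only-stas-connections", {"tcp"}))  # prints False
```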
       disconnect-trtypes=
           This parameter only applies when disconnect-scope is set to
           all-connections-matching-disconnect-trtypes. It limits the scope of
           the audit to specific transport types.

           Takes the values tcp, rdma, fc, or a combination of them separated
           by a plus (+) sign. For example: tcp+fc. No spaces are allowed
           between values and the plus (+) sign.

           Values:
               tcp
                   Audit TCP connections.

               rdma
                   Audit RDMA connections.

               fc
                   Audit Fibre Channel connections.

           Defaults to tcp.

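           The plus-separated syntax can be validated as in the Python sketch
           below. It is illustrative only: the function name is hypothetical,
           and rejecting unknown or space-containing values with an exception
           is a choice made here, not documented stacd behavior.

```python
VALID_TRTYPES = frozenset({"tcp", "rdma", "fc"})


def parse_disconnect_trtypes(value: str) -> frozenset:
    """Turn a disconnect-trtypes= value such as "tcp+fc" into a set of trtypes."""
    if not value:
        return frozenset({"tcp"})  # documented default
    if any(ch.isspace() for ch in value):
        raise ValueError("no spaces are allowed around the '+' sign")
    parts = value.split("+")
    unknown = [p for p in parts if p not in VALID_TRTYPES]
    if unknown:
        raise ValueError(f"unknown transport type(s): {unknown}")
    return frozenset(parts)


print(sorted(parse_disconnect_trtypes("tcp+fc")))  # prints ['fc', 'tcp']
```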
       connect-attempts-on-ncc=
           The NCC bit (Not Connected to CDC) is a bit returned by the
           Centralized Discovery Controller (CDC) in the EFLAGS field of the
           DLPE. Only CDCs will set the NCC bit; Direct Discovery Controllers
           (DDCs) always clear NCC to 0. The NCC bit is a way for the CDC to
           let hosts know that the subsystem is currently not reachable by the
           CDC. This may indicate that the subsystem is currently down or that
           there is an outage on the section of the network connecting the CDC
           to the subsystem.

           If a host is currently failing to connect to an I/O controller and
           if the NCC bit associated with that I/O controller is asserted, the
           host can decide to stop trying to connect to that subsystem until
           connectivity is restored. The CDC indicates that connectivity has
           been restored by clearing the NCC bit.

           The parameter connect-attempts-on-ncc= controls whether stacd will
           take the NCC bit into account when attempting to connect to an I/O
           Controller. Setting connect-attempts-on-ncc= to 0 means that stacd
           will ignore the NCC bit and will keep trying to connect. Setting
           connect-attempts-on-ncc= to a non-zero value indicates the number
           of connection attempts that will be made before stacd gives up
           trying. Note that this value should be greater than 1; when it is
           set to 1, stacd automatically uses 2 instead, because it is
           possible for a first connection attempt to fail.

           Defaults to 0.

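           The rules above can be captured in a small decision function. The
           Python sketch below is hypothetical (its names, the failed-attempt
           counter, and rejecting negative values are assumptions); it only
           encodes what this page states: 0 ignores NCC, 1 is treated as 2,
           and otherwise stacd stops after that many attempts while NCC is
           set.

```python
def effective_attempt_limit(connect_attempts_on_ncc: int) -> int:
    """Apply the documented adjustments: 0 disables the limit, 1 becomes 2."""
    if connect_attempts_on_ncc < 0:
        raise ValueError("connect-attempts-on-ncc must not be negative")
    return 2 if connect_attempts_on_ncc == 1 else connect_attempts_on_ncc


def keep_trying(failed_attempts: int, ncc: bool, connect_attempts_on_ncc: int) -> bool:
    """Decide whether to make another connection attempt to an IOC.

    failed_attempts: consecutive failed attempts so far (hypothetical counter).
    ncc: NCC bit from the EFLAGS field of the IOC's DLPE.
    """
    limit = effective_attempt_limit(connect_attempts_on_ncc)
    if limit == 0 or not ncc:
        return True  # NCC ignored, or the CDC does not report an outage
    return failed_attempts < limit


print(keep_trying(1, ncc=True, connect_attempts_on_ncc=1))  # prints True
```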
   [Controllers] section
       The following options are available in the [Controllers] section:

       controller=
           Controllers are specified with the controller option. This option
           may be specified more than once to specify more than one
           controller. The format is one line per Controller composed of a
           series of fields separated by semicolons as follows:

               controller=transport=[trtype];traddr=[traddr];trsvcid=[trsvcid];host-traddr=[traddr];host-iface=[iface];nqn=[nqn]

           Fields
               transport=
                   This is a mandatory field that specifies the network fabric
                   being used for an NVMe-over-Fabrics network. Current trtype
                   values understood are:

                   Table 2. Transport type
                   ┌───────┬────────────────────────────┐
                   │trtype │ Definition                 │
                   ├───────┼────────────────────────────┤
                   │rdma   │ The network fabric is an   │
                   │       │ rdma network (RoCE, iWARP, │
                   │       │ InfiniBand, basic rdma,    │
                   │       │ etc)                       │
                   ├───────┼────────────────────────────┤
                   │fc     │ The network fabric is a    │
                   │       │ Fibre Channel network.     │
                   ├───────┼────────────────────────────┤
                   │tcp    │ The network fabric is a    │
                   │       │ TCP/IP network.            │
                   ├───────┼────────────────────────────┤
                   │loop   │ Connect to a NVMe over     │
                   │       │ Fabrics target on the      │
                   │       │ local host                 │
                   └───────┴────────────────────────────┘

               traddr=
                   This is a mandatory field that specifies the network
                   address of the Controller. For transports using IP
                   addressing (e.g. rdma, tcp) this should be an IP-based
                   address (e.g. IPv4, IPv6). It could also be a resolvable
                   host name (e.g. localhost).

               trsvcid=
                   This is an optional field that specifies the transport
                   service ID. For transports using IP addressing (e.g. rdma,
                   tcp) this field is the port number.

                   Depending on the transport type, this field will default to
                   either 8009 or 4420 as follows:

                   UDP port 4420 and TCP port 4420 have been assigned by IANA
                   for use by NVMe over Fabrics. NVMe/RoCEv2 controllers use
                   UDP port 4420 by default. NVMe/iWARP controllers use TCP
                   port 4420 by default.

                   TCP port 4420 has been assigned for use by NVMe over
                   Fabrics and TCP port 8009 has been assigned by IANA for use
                   by NVMe over Fabrics discovery. TCP port 8009 is the
                   default TCP port for NVMe/TCP discovery controllers. There
                   is no default TCP port for NVMe/TCP I/O controllers; the
                   Transport Service Identifier (TRSVCID) field in the
                   Discovery Log Entry indicates the TCP port to use.

                   The TCP ports that may be used for NVMe/TCP I/O controllers
                   include TCP port 4420, and the Dynamic and/or Private TCP
                   ports (i.e., ports in the TCP port number range from 49152
                   to 65535). NVMe/TCP I/O controllers should not use TCP port
                   8009. TCP port 4420 shall not be used for both NVMe/iWARP
                   and NVMe/TCP at the same IP address on the same network.

                   Ref: IANA Service names port numbers[1]

               nqn=
                   This field specifies the Controller's NVMe Qualified Name.

                   This field is mandatory for I/O Controllers, but is
                   optional for Discovery Controllers (DC). For the latter,
                   the NQN will default to the well-known DC NQN:
                   nqn.2014-08.org.nvmexpress.discovery if left undefined.

               host-traddr=
                   This is an optional field that specifies the network
                   address used on the host to connect to the Controller. For
                   TCP, this sets the source address on the socket.

               host-iface=
                   This is an optional field that specifies the network
                   interface used on the host to connect to the Controller
                   (e.g. eth1, enp2s0, enx78e7d1ea46da). This forces the
                   connection to be made on a specific interface instead of
                   letting the system decide.

               dhchap-ctrl-secret=
                   This is an optional field that specifies the NVMe In-band
                   authentication controller secret (i.e. key) for
                   bi-directional authentication. It must be in the ASCII
                   format specified in NVMe 2.0 section 8.13.5.8, 'Secret
                   representation'. Bi-directional authentication will be
                   attempted when present.

               hdr-digest=
                   See definition in [Global] section. This is an optional
                   field used to override the value specified in the [Global]
                   section.

               data-digest=
                   See definition in [Global] section. This is an optional
                   field used to override the value specified in the [Global]
                   section.

               nr-io-queues=
                   See definition in [Global] section. This is an optional
                   field used to override the value specified in the [Global]
                   section.

               nr-write-queues=
                   See definition in [Global] section. This is an optional
                   field used to override the value specified in the [Global]
                   section.

               nr-poll-queues=
                   See definition in [Global] section. This is an optional
                   field used to override the value specified in the [Global]
                   section.

               queue-size=
                   See definition in [Global] section. This is an optional
                   field used to override the value specified in the [Global]
                   section.

               kato=
                   See definition in [Global] section. This is an optional
                   field used to override the value specified in the [Global]
                   section.

               reconnect-delay=
                   See definition in [Global] section. This is an optional
                   field used to override the value specified in the [Global]
                   section.

               ctrl-loss-tmo=
                   See definition in [Global] section. This is an optional
                   field used to override the value specified in the [Global]
                   section.

               disable-sqflow=
                   See definition in [Global] section. This is an optional
                   field used to override the value specified in the [Global]
                   section.

           Examples:

               controller = transport=tcp;traddr=localhost;trsvcid=8009
               controller = transport=tcp;traddr=2001:db8::370:7334;host-iface=enp0s8
               controller = transport=fc;traddr=nn-0x204600a098cbcac6:pn-0x204700a098cbcac6

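           To illustrate the field syntax, the Python sketch below splits a
           controller= value into a dictionary. parse_controller is a
           hypothetical helper, not part of stacd. It only checks that
           transport and traddr are present, because whether nqn= is required
           depends on whether the target is an I/O or a Discovery Controller.

```python
def parse_controller(value: str) -> dict:
    """Split a controller= value into its field=value pairs.

    Fields are separated by ';'. Each field is split on its first '=', so
    values containing ':' (IPv6 addresses, FC WWNs) stay intact.
    """
    fields = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        key, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"field without '=': {item!r}")
        fields[key.strip()] = val.strip()
    for mandatory in ("transport", "traddr"):
        if mandatory not in fields:
            raise ValueError(f"missing mandatory field: {mandatory}")
    return fields


print(parse_controller("transport=tcp;traddr=2001:db8::370:7334;host-iface=enp0s8"))
```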
       exclude=
           Controllers that should be excluded can be specified with the
           exclude= option. Using mDNS to automatically discover and connect
           to controllers can result in unintentional connections being made.
           This keyword allows configuring the controllers that should not be
           connected to.

           The syntax is the same as for controller=, except that only
           transport, traddr, trsvcid, nqn, and host-iface apply. Multiple
           exclude= keywords may appear in the config file to specify more
           than one excluded controller.

           Note 1: A minimal match approach is used to eliminate unwanted
           controllers. That is, you do not need to specify all the parameters
           to identify a controller. Just specifying the host-iface, for
           example, can be used to exclude all controllers on an interface.

           Note 2: exclude= takes precedence over controller=. A controller
           specified by the controller= keyword can be eliminated by the
           exclude= keyword.

           Examples:

               exclude = transport=tcp;traddr=fe80::2c6e:dee7:857:26bb # Eliminate a specific address
               exclude = host-iface=enp0s8                             # Eliminate everything on this interface
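           The minimal-match rule in Note 1 can be sketched as follows. This
           is an illustration under stated assumptions, not stacd's code:
           controllers and exclude= entries are represented as hypothetical
           field dictionaries, and fields are compared as plain strings, so
           differently written forms of the same IPv6 address would not match
           in this sketch.

```python
from typing import Dict, Iterable

# Only these fields are honored in exclude= entries.
EXCLUDE_FIELDS = ("transport", "traddr", "trsvcid", "nqn", "host-iface")


def is_excluded(controller: Dict[str, str], excludes: Iterable[Dict[str, str]]) -> bool:
    """Return True if any exclude= entry minimally matches the controller.

    An entry matches when every applicable field it specifies equals the
    controller's value for that field; unspecified fields act as wildcards.
    """
    for rule in excludes:
        relevant = {k: v for k, v in rule.items() if k in EXCLUDE_FIELDS}
        if relevant and all(controller.get(k) == v for k, v in relevant.items()):
            return True
    return False


ctrl = {"transport": "tcp", "traddr": "192.168.1.20", "host-iface": "enp0s8"}
print(is_excluded(ctrl, [{"host-iface": "enp0s8"}]))  # prints True
```

           Because exclude= takes precedence over controller=, a check like
           this would apply to manually configured controllers as well as to
           those learned from DLPEs.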

SEE ALSO

       stacd(8)

NOTES

        1. IANA Service names port numbers
           https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?search=nvme
nvme-stas 2.3.1                                                  STACD.CONF(5)