STACD.CONF(5) STACD.CONF(5)

NAME
stacd.conf - stacd(8) configuration file

SYNOPSIS
/etc/stas/stacd.conf

DESCRIPTION
When stacd(8) starts up, it reads its configuration from stacd.conf.

stacd.conf is a plain text file divided into sections, with
configuration entries in the style key=value. Spaces immediately before
or after the = are ignored. Empty lines are ignored, as are lines
starting with #, which may be used for comments.
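As a rough illustration of the format described above, a minimal stacd.conf might look like the following sketch. The section and key names are taken from this page; the values are arbitrary placeholders, not recommendations.

```ini
# Lines starting with '#' are comments; empty lines are ignored.

[Global]
# Spaces immediately before or after '=' are ignored,
# so both spellings below are accepted.
tron=false
kato = 30

[Controllers]
controller = transport=tcp;traddr=localhost;trsvcid=8009
```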

[Global] section
The following options are available in the [Global] section:

tron=
Trace ON. Takes a boolean argument. If true, enables full code
tracing. The trace is written to the system log (e.g., systemd's
journal). Defaults to false.

hdr-digest=
Enable Protocol Data Unit (PDU) Header Digest. Takes a boolean
argument. NVMe/TCP supports an optional PDU Header Digest.
Digests are calculated using the CRC32C algorithm. If true, Header
Digests are inserted in PDUs and checked for errors. Defaults to
false.

data-digest=
Enable Protocol Data Unit (PDU) Data Digest. Takes a boolean
argument. NVMe/TCP supports an optional PDU Data Digest. Digests
are calculated using the CRC32C algorithm. If true, Data Digests
are inserted in PDUs and checked for errors. Defaults to false.
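For instance, a configuration that turns on both digests for all connections could be sketched as follows. The values are illustrative only; both options default to false.

```ini
[Global]
# Insert and check CRC32C Header Digests in PDUs.
hdr-digest=true
# Insert and check CRC32C Data Digests in PDUs.
data-digest=true
```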

kato=
Keep Alive Timeout (KATO). Takes an unsigned integer that specifies
the timeout value, in seconds, for the Keep Alive feature. Defaults
to 30 seconds for Discovery Controller connections and 120 seconds
for I/O Controller connections.

ip-family=
Takes a string argument that specifies whether IPv4, IPv6, or both
are supported when connecting to a Controller. Connections will not
be attempted to IP addresses (whether discovered or manually
configured with controller=) whose family is disabled by this
option. If an invalid value is entered, the default (see below)
applies.

Choices are ipv4, ipv6, or ipv4+ipv6.

Defaults to ipv4+ipv6.
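A short sketch of the two options above, assuming a host that should only use IPv4 and wants a shorter Keep Alive Timeout; the numeric value is an arbitrary example.

```ini
[Global]
# Keep Alive Timeout, in seconds.
kato=60
# Only attempt connections to IPv4 addresses; IPv6 addresses are skipped.
ip-family=ipv4
```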

nr-io-queues=
Takes a value in the range 1...N. Overrides the default number of
I/O queues created by the driver.

Note: This parameter is identical to that provided by nvme-cli.

Default: Depends on kernel and other run time factors (e.g. number
of CPUs).

nr-write-queues=
Takes a value in the range 1...N. Adds additional queues that will
be used for write I/O.

Note: This parameter is identical to that provided by nvme-cli.

Default: Depends on kernel and other run time factors (e.g. number
of CPUs).

nr-poll-queues=
Takes a value in the range 1...N. Adds additional queues that will
be used for polling latency-sensitive I/O.

Note: This parameter is identical to that provided by nvme-cli.

Default: Depends on kernel and other run time factors (e.g. number
of CPUs).

queue-size=
Takes a value in the range 16...1024.

Overrides the default number of elements in the I/O queues created
by the driver. This option will be ignored for discovery, but will
be passed on to the subsequent connect call.

Note: This parameter is identical to that provided by nvme-cli.

Defaults to 128.
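The queue-related options above could be combined as in the sketch below. The numbers are placeholders chosen to fall within the documented ranges; suitable values depend on the kernel, the CPU count, and the workload.

```ini
[Global]
# Override the driver's default number of I/O queues.
nr-io-queues=4
# Additional queues dedicated to write I/O.
nr-write-queues=2
# Additional queues dedicated to polling latency-sensitive I/O.
nr-poll-queues=2
# Elements per I/O queue (allowed range: 16 to 1024).
queue-size=256
```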

reconnect-delay=
Takes a value in the range 1 to N seconds.

Overrides the default delay before a reconnect is attempted after a
connection loss.

Note: This parameter is identical to that provided by nvme-cli.

Defaults to 10, that is, a reconnect is attempted every 10 seconds.

ctrl-loss-tmo=
Takes a value in the range -1, 0, ..., N seconds. -1 means retry
forever. 0 means do not retry.

Overrides the default controller loss timeout period (in seconds).

Note: This parameter is identical to that provided by nvme-cli.

Defaults to 600 seconds (10 minutes).
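As an example of how the two reconnect options interact, the following sketch retries every 5 seconds and never gives up. The delay value is arbitrary.

```ini
[Global]
# Attempt a reconnect every 5 seconds after a connection loss.
reconnect-delay=5
# -1 means retry forever; 0 would mean do not retry at all.
ctrl-loss-tmo=-1
```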

disable-sqflow=
Takes a boolean argument. Disables SQ flow control to omit head
doorbell update for submission queues when sending NVMe
completions.

Note: This parameter is identical to that provided by nvme-cli.

Defaults to false.

ignore-iface=
Takes a boolean argument. This option controls how connections with
I/O Controllers (IOC) are made.

There is no guarantee that there will be a route to reach an IOC.
However, the socket option SO_BINDTODEVICE can be used to force the
connection to be made on a specific interface instead of letting
the routing tables decide where to make the connection.

This option determines whether stacd will use SO_BINDTODEVICE to
force connections on an interface or just rely on the routing
tables. The default is to use SO_BINDTODEVICE; in other words,
stacd does not ignore the interface.

BACKGROUND: By default, stacd will connect to IOCs on the same
interface that was used to retrieve the discovery log pages. If
stafd discovers a DC on an interface using mDNS, and stafd connects
to that DC and retrieves the log pages, it is expected that the
storage subsystems listed in the log pages are reachable on the
same interface where the DC was discovered.

For example, let's say a DC is discovered on interface ens102. Then
all the subsystems listed in the log pages retrieved from that DC
must be reachable on interface ens102. If this doesn't work, for
example you cannot "ping -I ens102 [storage-ip]", then the most
likely explanation is that proxy ARP is not enabled on the switch
that the host is connected to on interface ens102. Whatever you do,
resist the temptation to manually set up the routing tables or to
add alternate routes going over a different interface than the one
where the DC is located. That simply won't work. Make sure proxy
ARP is enabled on the switch first.

Setting routes won't work because, by default, stacd uses the
SO_BINDTODEVICE socket option when it connects to IOCs. This option
is used to force a socket connection to be made on a specific
interface instead of letting the routing tables decide where to
connect the socket. Even if you were to manually configure an
alternate route on a different interface, the connections (i.e.
host to IOC) will still be made on the interface where the DC was
discovered by stafd.

Defaults to false.
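If relying on the routing tables really is the intended behavior, the default can be overridden as in this sketch:

```ini
[Global]
# Do not use SO_BINDTODEVICE for IOC connections; let the routing
# tables decide which interface is used.
ignore-iface=true
```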

[I/O controller connection management] section
Connectivity between hosts and subsystems in a fabric is controlled by
Fabric Zoning. Entities that share a common zone (i.e., are zoned
together) are allowed to discover each other and establish connections
between them. Fabric Zoning is configured on Discovery Controllers
(DC). Users can add/remove controllers and/or hosts to/from zones.

Hosts have no direct knowledge of the Fabric Zoning configuration that
is active on a given DC. As a result, if a host is impacted by a Fabric
Zoning configuration change, it will be notified of the connectivity
configuration change by the DC via Asynchronous Event Notifications
(AEN).

Table 1. List of terms used in this section:
┌─────────────────┬────────────────────────────┐
│Term             │ Description                │
├─────────────────┼────────────────────────────┤
│AEN              │ Asynchronous Event         │
│                 │ Notification. A CQE        │
│                 │ (Completion Queue Entry)   │
│                 │ for an Asynchronous Event  │
│                 │ Request that was           │
│                 │ previously transmitted by  │
│                 │ the host to a Discovery    │
│                 │ Controller. AENs are used  │
│                 │ by DCs to notify hosts     │
│                 │ that a change (e.g., a     │
│                 │ connectivity configuration │
│                 │ change) has occurred.      │
├─────────────────┼────────────────────────────┤
│DC               │ Discovery Controller.      │
├─────────────────┼────────────────────────────┤
│DLP              │ Discovery Log Page. A host │
│                 │ will issue a Get Log Page  │
│                 │ command to retrieve the    │
│                 │ list of controllers it may │
│                 │ connect to.                │
├─────────────────┼────────────────────────────┤
│DLPE             │ Discovery Log Page Entry.  │
│                 │ The response to a Get Log  │
│                 │ Page command contains a    │
│                 │ list of DLPEs identifying  │
│                 │ each controller that the   │
│                 │ host is allowed to connect │
│                 │ with.                      │
│                 │                            │
│                 │ Note that DLPEs may        │
│                 │ contain both I/O           │
│                 │ Controllers (IOCs) and     │
│                 │ Discovery Controllers      │
│                 │ (DCs). DCs listed in DLPEs │
│                 │ are called referrals.      │
│                 │ stacd only deals with      │
│                 │ IOCs. Referrals (DCs) are  │
│                 │ handled by stafd.          │
├─────────────────┼────────────────────────────┤
│IOC              │ I/O Controller.            │
├─────────────────┼────────────────────────────┤
│Manual Config    │ Refers to manually adding  │
│                 │ entries to stacd.conf with │
│                 │ the controller= parameter. │
├─────────────────┼────────────────────────────┤
│Automatic Config │ Refers to receiving        │
│                 │ configuration from a DC as │
│                 │ DLPEs.                     │
├─────────────────┼────────────────────────────┤
│External Config  │ Refers to configuration    │
│                 │ done outside of the        │
│                 │ nvme-stas framework, for   │
│                 │ example using nvme-cli     │
│                 │ commands.                  │
└─────────────────┴────────────────────────────┘

DCs notify hosts of connectivity configuration changes by sending AENs
indicating a "Discovery Log" change. The host uses these AENs as a
trigger to issue a Get Log Page command. The response to this command
is used to update the list of DLPEs containing the controllers the host
is allowed to access. Upon reception of the current DLPEs, the host
will determine whether DLPEs were added and/or removed, which will
trigger the addition and/or removal of controller connections. This
happens in real time and may affect active connections to controllers,
including controllers that support I/O operations (IOCs). A host that
was previously connected to an IOC may suddenly be told that it is no
longer allowed to connect to that IOC and should disconnect from it.

IOC connection creation. There are 3 ways to configure IOC connections
on a host:

1. Manual Config by adding controller= entries to the [Controllers]
   section (see below).

2. Automatic Config received in the form of DLPEs from a remote DC.

3. External Config using nvme-cli (e.g. "nvme connect").

IOC connection removal/prevention. There are 3 ways to remove (or
prevent) connections to an IOC:

1. Manual Config:

   a. by adding exclude= entries to the [Controllers] section (see
      below).

   b. by removing controller= entries from the [Controllers] section.

2. Automatic Config. As explained above, a host gets a new list of
   DLPEs upon connectivity configuration changes. On DLPE removal, the
   host should remove the connection to the IOC matching that DLPE.
   This behavior is configurable using the disconnect-scope= parameter
   described below.

3. External Config using nvme-cli (e.g. "nvme disconnect" or "nvme
   disconnect-all").

The decision by the host to automatically disconnect from an IOC
following connectivity configuration changes is controlled by two
parameters: disconnect-scope= and disconnect-trtypes=.

disconnect-scope=
Takes one of: only-stas-connections,
all-connections-matching-disconnect-trtypes, or no-disconnect.

In theory, hosts should only connect to IOCs that have been zoned
for them. Connections to IOCs that a host is not zoned to have
access to should simply not exist. In practice, however, users may
not want hosts to disconnect from IOCs (or at least from some of
them) in reaction to connectivity configuration changes.

Some users may prefer for IOC connections to be "sticky" and only
be removed manually (nvme-cli or exclude=) or removed by a system
reboot. Specifically, they don't want IOC connections to be removed
unexpectedly on DLPE removal. These users may want to set
disconnect-scope= to no-disconnect.

It is important to note that when IOC connections are removed,
ongoing I/O transactions will be terminated immediately. There is
no way to tell what happens to the data being exchanged when such
an abrupt termination happens. If a host was in the middle of
writing to a storage subsystem, there is a chance that outstanding
I/O operations may not successfully complete.

Values:
only-stas-connections
Only remove connections previously made by stacd.

In this mode, when a DLPE is removed as a result of
connectivity configuration changes, the corresponding IOC
connection will be removed by stacd.

Connections to IOCs made externally, e.g. using nvme-cli,
will not be affected, unless they happen to be duplicates
of connections made by stacd. It's simply not possible for
stacd to tell that a connection was previously made with
nvme-cli (or any other external tool). So, it's good
practice to avoid duplicating configuration between stacd
and external tools.

Users wanting to persist some of their IOC connections
regardless of connectivity configuration changes should not
use nvme-cli to make those connections. Instead, they
should hard-code them in stacd.conf with the controller=
parameter. Using the controller= parameter is the only way
for a user to tell stacd that a connection must be made and
not be deleted "no-matter-what".

all-connections-matching-disconnect-trtypes
All connections that match the transport type specified by
disconnect-trtypes=, whether they were made automatically
by stacd or externally (e.g., nvme-cli), will be audited
and are subject to removal on DLPE removal.

In this mode, as DLPEs are removed as a result of
connectivity configuration changes, the corresponding IOC
connections will be removed by the host immediately whether
they were made by stacd, nvme-cli, or any other way.
Basically, stacd audits all IOC connections matching the
transport type specified by disconnect-trtypes=.

NOTE. This mode implies that stacd will only allow Manually
Configured or Automatically Configured IOC connections to
exist. Externally Configured connections using nvme-cli (or
other external mechanism) that do not match any Manual
Config (stacd.conf) or Automatic Config (DLPEs) will get
deleted immediately by stacd.

no-disconnect
stacd does not disconnect from IOCs when a DLPE is removed
or a controller= entry is removed from stacd.conf. All IOC
connections are "sticky".

Instead, users can remove connections by issuing the
nvme-cli command "nvme disconnect", by adding an exclude=
entry to stacd.conf, or by waiting until the next system
reboot, at which time all connections will be removed.

Defaults to only-stas-connections.
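For users who want the "sticky" behavior described above, a configuration along these lines could be used:

```ini
[I/O controller connection management]
# Never disconnect from an IOC on DLPE removal; connections are only
# removed manually (nvme-cli or exclude=) or by a system reboot.
disconnect-scope=no-disconnect
```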

disconnect-trtypes=
This parameter only applies when disconnect-scope= is set to
all-connections-matching-disconnect-trtypes. It limits the scope of
the audit to specific transport types.

Can take the values tcp, rdma, fc, or a combination thereof by
separating them with a plus (+) sign. For example: tcp+fc. No
spaces are allowed between values and the plus (+) sign.

Values:
tcp
Audit TCP connections.

rdma
Audit RDMA connections.

fc
Audit Fibre Channel connections.

Defaults to tcp.
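The sketch below shows the two parameters working together, so that stacd audits both TCP and Fibre Channel connections regardless of how they were created:

```ini
[I/O controller connection management]
# Audit every connection of the listed transport types, including
# connections made externally (e.g. with nvme-cli).
disconnect-scope=all-connections-matching-disconnect-trtypes
# Combine transport types with '+' and no surrounding spaces.
disconnect-trtypes=tcp+fc
```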

connect-attempts-on-ncc=
The NCC bit (Not Connected to CDC) is a bit returned by the CDC in
the EFLAGS field of the DLPE. Only CDCs will set the NCC bit. DDCs
will always clear NCC to 0. The NCC bit is a way for the CDC to let
hosts know that the subsystem is currently not reachable by the
CDC. This may indicate that the subsystem is currently down or that
there is an outage on the section of the network connecting the CDC
to the subsystem.

If a host is currently failing to connect to an I/O controller and
if the NCC bit associated with that I/O controller is asserted, the
host can decide to stop trying to connect to that subsystem until
connectivity is restored. This will be indicated by the CDC when it
clears the NCC bit.

The parameter connect-attempts-on-ncc= controls whether stacd will
take the NCC bit into account when attempting to connect to an I/O
Controller. Setting connect-attempts-on-ncc= to 0 means that stacd
will ignore the NCC bit and will keep trying to connect. Setting
connect-attempts-on-ncc= to a non-zero value indicates the number
of connection attempts that will be made before stacd gives up
trying. Note that a non-zero value should be greater than 1; when
set to 1, stacd will automatically use 2 instead, because it is
possible that a first connect attempt may fail.

Defaults to 0.
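As an illustration, the following sketch makes stacd take the NCC bit into account and stop after three connection attempts. The count is an arbitrary example.

```ini
[I/O controller connection management]
# 0 (the default) ignores the NCC bit and keeps retrying.
# A non-zero value is the number of attempts before stacd gives up.
connect-attempts-on-ncc=3
```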

[Controllers] section
The following options are available in the [Controllers] section:

controller=
Controllers are specified with the controller option. This option
may be specified more than once to specify more than one
controller. The format is one line per Controller composed of a
series of fields separated by semicolons as follows:

controller=transport=[trtype];traddr=[traddr];trsvcid=[trsvcid];host-traddr=[traddr];host-iface=[iface];nqn=[nqn]

Fields
transport=
This is a mandatory field that specifies the network fabric
being used for an NVMe-over-Fabrics network. Current trtype
values understood are:

Table 2. Transport type
┌───────┬────────────────────────────┐
│trtype │ Definition                 │
├───────┼────────────────────────────┤
│rdma   │ The network fabric is an   │
│       │ rdma network (RoCE, iWARP, │
│       │ InfiniBand, basic rdma,    │
│       │ etc.)                      │
├───────┼────────────────────────────┤
│fc     │ The network fabric is a    │
│       │ Fibre Channel network.     │
├───────┼────────────────────────────┤
│tcp    │ The network fabric is a    │
│       │ TCP/IP network.            │
├───────┼────────────────────────────┤
│loop   │ Connect to an NVMe over    │
│       │ Fabrics target on the      │
│       │ local host.                │
└───────┴────────────────────────────┘

traddr=
This is a mandatory field that specifies the network
address of the Controller. For transports using IP
addressing (e.g. rdma) this should be an IP-based address
(e.g. IPv4, IPv6). It could also be a resolvable host name
(e.g. localhost).

trsvcid=
This is an optional field that specifies the transport
service id. For transports using IP addressing (e.g. rdma,
tcp) this field is the port number.

Depending on the transport type, this field will default to
either 8009 or 4420 as follows.

UDP port 4420 and TCP port 4420 have been assigned by IANA
for use by NVMe over Fabrics. NVMe/RoCEv2 controllers use
UDP port 4420 by default. NVMe/iWARP controllers use TCP
port 4420 by default.

TCP port 4420 has been assigned for use by NVMe over
Fabrics and TCP port 8009 has been assigned by IANA for use
by NVMe over Fabrics discovery. TCP port 8009 is the
default TCP port for NVMe/TCP discovery controllers. There
is no default TCP port for NVMe/TCP I/O controllers; the
Transport Service Identifier (TRSVCID) field in the
Discovery Log Entry indicates the TCP port to use.

The TCP ports that may be used for NVMe/TCP I/O controllers
include TCP port 4420, and the Dynamic and/or Private TCP
ports (i.e., ports in the TCP port number range from 49152
to 65535). NVMe/TCP I/O controllers should not use TCP port
8009. TCP port 4420 shall not be used for both NVMe/iWARP
and NVMe/TCP at the same IP address on the same network.

Ref: IANA Service names port numbers[1]

nqn=
This field specifies the Controller's NVMe Qualified Name.

This field is mandatory for I/O Controllers, but is
optional for Discovery Controllers (DC). For the latter,
the NQN will default to the well-known DC NQN:
nqn.2014-08.org.nvmexpress.discovery if left undefined.

host-traddr=
This is an optional field that specifies the network
address used on the host to connect to the Controller. For
TCP, this sets the source address on the socket.

host-iface=
This is an optional field that specifies the network
interface used on the host to connect to the Controller
(e.g. eth1, enp2s0, enx78e7d1ea46da). This forces the
connection to be made on a specific interface instead of
letting the system decide.

dhchap-ctrl-secret=
This is an optional field that specifies the NVMe In-band
authentication controller secret (i.e. key) for
bi-directional authentication; needs to be in ASCII format
as specified in NVMe 2.0 section 8.13.5.8 'Secret
representation'. Bi-directional authentication will be
attempted when present.

hdr-digest=
See definition in [Global] section. This is an optional
field used to override the value specified in the [Global]
section.

data-digest=
See definition in [Global] section. This is an optional
field used to override the value specified in the [Global]
section.

nr-io-queues=
See definition in [Global] section. This is an optional
field used to override the value specified in the [Global]
section.

nr-write-queues=
See definition in [Global] section. This is an optional
field used to override the value specified in the [Global]
section.

nr-poll-queues=
See definition in [Global] section. This is an optional
field used to override the value specified in the [Global]
section.

queue-size=
See definition in [Global] section. This is an optional
field used to override the value specified in the [Global]
section.

kato=
See definition in [Global] section. This is an optional
field used to override the value specified in the [Global]
section.

reconnect-delay=
See definition in [Global] section. This is an optional
field used to override the value specified in the [Global]
section.

ctrl-loss-tmo=
See definition in [Global] section. This is an optional
field used to override the value specified in the [Global]
section.

disable-sqflow=
See definition in [Global] section. This is an optional
field used to override the value specified in the [Global]
section.

Examples:

controller = transport=tcp;traddr=localhost;trsvcid=8009
controller = transport=tcp;traddr=2001:db8::370:7334;host-iface=enp0s8
controller = transport=fc;traddr=nn-0x204600a098cbcac6:pn-0x204700a098cbcac6
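The per-controller override fields listed above could be used as in the following sketch. It assumes that override fields are appended to the controller= line as additional semicolon-separated key=value fields, like the other fields. The address and NQN are hypothetical placeholders.

```ini
[Global]
hdr-digest=false
kato=120

[Controllers]
# Hypothetical address and NQN, for illustration only.
# hdr-digest and kato given here override the [Global] values
# for this controller alone.
controller = transport=tcp;traddr=192.0.2.10;trsvcid=4420;nqn=nqn.2014-08.com.example:subsys1;hdr-digest=true;kato=60
```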

exclude=
Controllers that should be excluded can be specified with the
exclude= option. Using mDNS to automatically discover and connect
to controllers can result in unintentional connections being made.
This keyword allows configuring the controllers that should not be
connected to.

The syntax is the same as for controller=, except that only
transport, traddr, trsvcid, nqn, and host-iface apply. Multiple
exclude= keywords may appear in the config file to specify more
than one excluded controller.

Note 1: A minimal match approach is used to eliminate unwanted
controllers. That is, you do not need to specify all the parameters
to identify a controller. Just specifying the host-iface, for
example, can be used to exclude all controllers on an interface.

Note 2: exclude= takes precedence over controller=. A controller
specified by the controller= keyword can be eliminated by the
exclude= keyword.

Examples:

exclude = transport=tcp;traddr=fe80::2c6e:dee7:857:26bb # Eliminate a specific address
exclude = host-iface=enp0s8 # Eliminate everything on this interface
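To illustrate Note 1 and Note 2 together, here is a sketch in which an exclude= entry eliminates one of two manually configured controllers. The addresses are hypothetical documentation addresses.

```ini
[Controllers]
controller = transport=tcp;traddr=192.0.2.10;trsvcid=8009
controller = transport=tcp;traddr=192.0.2.20;trsvcid=8009;host-iface=enp0s8

# Minimal match: this excludes every controller on enp0s8. Because
# exclude= takes precedence, the second controller= entry above is
# eliminated as well.
exclude = host-iface=enp0s8
```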

SEE ALSO
stacd(8)

NOTES
1. IANA Service names port numbers
   https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?search=nvme

nvme-stas 2.3.1 STACD.CONF(5)