INFINIBAND-DIAGS(8) Open IB Diagnostics INFINIBAND-DIAGS(8)

infiniband-diags - Diagnostics for InfiniBand Fabrics

infiniband-diags is a set of utilities designed to help configure,
debug, and maintain InfiniBand fabrics. Many tools and utilities are
provided, some with similar functionality.

The base utilities use directed route MADs to perform their operations.
They may therefore work even in unconfigured subnets. Other, higher
level utilities require LID routed MADs and, to some extent, SA/SM
access.

Many of the tools in this package rely on the use of SMPs via QP0 to
acquire data directly from the SMA. While this mode of operation is
not technically in compliance with the InfiniBand specification,
practical experience has found that this level of diagnostics is
valuable when working with a fabric which is broken or only partially
configured. For this reason, many of these tools may require the use
of an MKey, and operation from Virtual Machines may be restricted for
security reasons.

Most OpenIB diagnostics take some of the following common flags. The
exact list of supported flags per utility can be found in the
documentation for those commands.

Addressing Flags
The -D and -G options have two forms:

-D, --Direct The address specified is a directed route

Examples:
[options] -D [options] "0"          # self port
[options] -D [options] "0,1,2,1,4"  # out via port 1, then 2, ...

(Note the second number in the path specified must match the port being
used. This can be specified using the port selection flag '-P' or the
port found through the automatic selection process.)

-D, --Direct <dr_path> The address specified is a directed route

Examples:
-D "0"          # self port
-D "0,1,2,1,4"  # out via port 1, then 2, ...

(Note the second number in the path specified must match the port being
used. This can be specified using the port selection flag '-P' or the
port found through the automatic selection process.)
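
The directed route syntax above can be illustrated with a short Python
sketch. It is not part of infiniband-diags: the helper names
parse_dr_path and matches_local_port are hypothetical, and the rule
that a path starts with 0 is an assumption drawn from the two examples.

```python
from typing import List


def parse_dr_path(path: str) -> List[int]:
    """Split a directed route such as "0,1,2,1,4" into its numbers."""
    # int() raises ValueError on empty or non-numeric fields.
    hops = [int(field) for field in path.split(",")]
    # Assumption: every path begins with 0, as in the examples above.
    if hops[0] != 0:
        raise ValueError("directed route is expected to start with 0")
    if any(hop < 0 for hop in hops):
        raise ValueError("port numbers cannot be negative")
    return hops


def matches_local_port(path: str, local_port: int) -> bool:
    """Check that the second number in the path is the port being used."""
    hops = parse_dr_path(path)
    # A bare "0" addresses the local port, so there is nothing to compare.
    return len(hops) < 2 or hops[1] == local_port
```

With this sketch, matches_local_port("0,1,2,1,4", 1) is true, while the
same path checked against local port 2 is rejected.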

-G, --Guid The address specified is a Port GUID

--port-guid, -G <port_guid> Specify a port_guid

-L, --Lid The address specified is a LID

-s, --sm_port <smlid> use 'smlid' as the target lid for SA queries.

Port Selection flags
-C, --Ca <ca_name> use the specified ca_name.

-P, --Port <ca_port> use the specified ca_port.

Local port Selection
Multiple port/Multiple CA support: when no IB device or port is
specified (see the "local umad parameters" below), the libibumad
library selects the port to use by the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the libibumad library
attempts to fulfill the user request, and will fail if it is not
possible.

For example:

ibaddr                 # use the first port (criteria #1 above)
ibaddr -C mthca1       # pick the best port from "mthca1" only.
ibaddr -P 2            # use the second (active/up) port from the first available IB device.
ibaddr -C mthca0 -P 2  # use the specified port only.
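
The selection criteria above can be sketched in Python as follows. This
is an illustration only and does not reproduce the libibumad code: the
Port record and select_port function are hypothetical names, and the
treatment of partially specified requests (filter first, then apply the
two criteria) is an assumption.

```python
from typing import List, NamedTuple, Optional


class Port(NamedTuple):
    ca_name: str
    port_num: int
    state: str     # logical state, e.g. "ACTIVE", "INIT", "DOWN"
    link_up: bool  # True when the physical link is up


def select_port(ports: List[Port],
                ca_name: Optional[str] = None,
                port_num: Optional[int] = None) -> Optional[Port]:
    """Return the chosen port, or None when the request cannot be met."""
    # Keep only the ports that satisfy whatever the user specified.
    candidates = [p for p in ports
                  if (ca_name is None or p.ca_name == ca_name)
                  and (port_num is None or p.port_num == port_num)]
    # Criterion 1: the first port that is ACTIVE.
    for port in candidates:
        if port.state == "ACTIVE":
            return port
    # Criterion 2: otherwise, the first port whose physical link is up.
    for port in candidates:
        if port.link_up:
            return port
    return None
```

Calling select_port(ports) mirrors a bare "ibaddr", and passing
ca_name="mthca1" mirrors "ibaddr -C mthca1".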

Debugging flags
-d raise the IB debugging level. May be used several times (-ddd
or -d -d -d).

-e show send and receive errors (timeouts and others)

-h, --help show the usage message

-v, --verbose
increase the application verbosity level. May be used several
times (-vvv or -v -v -v)

-V, --version show the version info.

Configuration flags
-t, --timeout <timeout_ms> override the default timeout for the
solicited mads.

--outstanding_smps, -o <val>
Specify the number of outstanding SMPs which should be issued
during the scan

Default: 2

--node-name-map <node-name-map> Specify a node name map.
This file maps GUIDs to more user friendly names. See FILES
section.

--config, -z <config_file> Specify alternate config file.
Default: /etc/infiniband-diags/ibdiag.conf

The following config files are common amongst many of the utilities.

CONFIG FILE
/etc/infiniband-diags/ibdiag.conf

A global config file is provided to set some of the common options for
all tools. See supplied config file for details.

NODE NAME MAP FILE FORMAT
The node name map is used to specify user friendly names for nodes in
the output. GUIDs are used to perform the lookup.

This functionality is provided by the opensm-libs package. See
opensm(8) for the file location for your installation.

Generically:

# comment
<guid> "<name>"

Example:

# IB1
# Line cards
0x0008f104003f125c "IB1 (Rack 11 slot 1 ) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f125d "IB1 (Rack 11 slot 1 ) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10d2 "IB1 (Rack 11 slot 2 ) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10d3 "IB1 (Rack 11 slot 2 ) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10bf "IB1 (Rack 11 slot 12 ) ISR9288/ISR9096 Voltaire sLB-24D"

# Spines
0x0008f10400400e2d "IB1 (Rack 11 spine 1 ) ISR9288 Voltaire sFB-12D"
0x0008f10400400e2e "IB1 (Rack 11 spine 1 ) ISR9288 Voltaire sFB-12D"
0x0008f10400400e2f "IB1 (Rack 11 spine 1 ) ISR9288 Voltaire sFB-12D"
0x0008f10400400e31 "IB1 (Rack 11 spine 2 ) ISR9288 Voltaire sFB-12D"
0x0008f10400400e32 "IB1 (Rack 11 spine 2 ) ISR9288 Voltaire sFB-12D"

# GUID Node Name
0x0008f10400411a08 "SW1 (Rack 3) ISR9024 Voltaire 9024D"
0x0008f10400411a28 "SW2 (Rack 3) ISR9024 Voltaire 9024D"
0x0008f10400411a34 "SW3 (Rack 3) ISR9024 Voltaire 9024D"
0x0008f104004119d0 "SW4 (Rack 3) ISR9024 Voltaire 9024D"
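
As an illustration of the format above, the following Python sketch
reads node name map lines into a GUID-to-name dictionary. It is not the
parser used by the tools: parse_node_name_map is a hypothetical helper,
and both the required 0x prefix and the silent skipping of malformed
lines are assumptions based on the example entries.

```python
import re
from typing import Dict, Iterable

# One entry: a hexadecimal GUID followed by a double-quoted name.
_ENTRY = re.compile(r'^(0x[0-9a-fA-F]+)\s+"(.*)"$')


def parse_node_name_map(lines: Iterable[str]) -> Dict[int, str]:
    """Map each GUID (as an integer) to its user friendly name."""
    names: Dict[int, str] = {}
    for line in lines:
        stripped = line.strip()
        # Blank lines and '#' comment lines carry no mapping.
        if not stripped or stripped.startswith("#"):
            continue
        match = _ENTRY.match(stripped)
        if match is None:
            continue  # assumption: ignore lines that do not fit the format
        names[int(match.group(1), 16)] = match.group(2)
    return names
```

Storing GUIDs as integers means that lookups do not depend on leading
zeros or letter case in the file.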

TOPOLOGY FILE FORMAT
The topology file format is human readable and largely intuitive. Most
identifiers are given textual names like vendor ID (vendid), device ID
(devid), GUIDs of various types (sysimgguid, caguid, switchguid,
etc.). PortGUIDs are shown in parentheses (). For switches, this is
shown on the switchguid line. For CA and router ports, it is shown on
the connectivity lines. The IB node is identified, followed by the
number of ports and the node GUID in quotes. On the right of this line
is a comment (#) followed by the NodeDescription in quotes. If the node
is a switch, this line also contains whether switch port 0 is base or
enhanced, and the LID and LMC of port 0. Subsequent lines pertaining
to this node show the connectivity. On the left is the port number of
the current node. On the right is the peer node (node at other end of
link). It is identified in quotes with nodetype followed by - followed
by NodeGUID with the port number in square brackets. Further on the
right is a comment (#). What follows the comment is dependent on the
node type. If it is a switch node, it is followed by the
NodeDescription in quotes and the LID of the peer node. If it is a CA
or router node, it is followed by the local LID and LMC and then
followed by the NodeDescription in quotes and the LID of the peer node.
The active link width and speed are then appended to the end of this
output line.

An example of this is:

#
# Topology file: generated on Tue Jun 5 14:15:10 2007
#
# Max of 3 hops discovered
# Initiated from node 0008f10403960558 port 0008f10403960559

Non-Chassis Nodes

vendid=0x8f1
devid=0x5a06
sysimgguid=0x5442ba00003000
switchguid=0x5442ba00003080(5442ba00003080)
Switch 24 "S-005442ba00003080" # "ISR9024 Voltaire" base port 0 lid 6 lmc 0
[22] "H-0008f10403961354"[1](8f10403961355) # "MT23108 InfiniHost Mellanox Technologies" lid 4 4xSDR
[10] "S-0008f10400410015"[1] # "SW-6IB4 Voltaire" lid 3 4xSDR
[8] "H-0008f10403960558"[2](8f1040396055a) # "MT23108 InfiniHost Mellanox Technologies" lid 14 4xSDR
[6] "S-0008f10400410015"[3] # "SW-6IB4 Voltaire" lid 3 4xSDR
[12] "H-0008f10403960558"[1](8f10403960559) # "MT23108 InfiniHost Mellanox Technologies" lid 10 4xSDR

vendid=0x8f1
devid=0x5a05
switchguid=0x8f10400410015(8f10400410015)
Switch 8 "S-0008f10400410015" # "SW-6IB4 Voltaire" base port 0 lid 3 lmc 0
[6] "H-0008f10403960984"[1](8f10403960985) # "MT23108 InfiniHost Mellanox Technologies" lid 16 4xSDR
[4] "H-005442b100004900"[1](5442b100004901) # "MT23108 InfiniHost Mellanox Technologies" lid 12 4xSDR
[1] "S-005442ba00003080"[10] # "ISR9024 Voltaire" lid 6 1xSDR
[3] "S-005442ba00003080"[6] # "ISR9024 Voltaire" lid 6 4xSDR

vendid=0x2c9
devid=0x5a44
caguid=0x8f10403960984
Ca 2 "H-0008f10403960984" # "MT23108 InfiniHost Mellanox Technologies"
[1](8f10403960985) "S-0008f10400410015"[6] # lid 16 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR

vendid=0x2c9
devid=0x5a44
caguid=0x5442b100004900
Ca 2 "H-005442b100004900" # "MT23108 InfiniHost Mellanox Technologies"
[1](5442b100004901) "S-0008f10400410015"[4] # lid 12 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR

vendid=0x2c9
devid=0x5a44
caguid=0x8f10403961354
Ca 2 "H-0008f10403961354" # "MT23108 InfiniHost Mellanox Technologies"
[1](8f10403961355) "S-005442ba00003080"[22] # lid 4 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR

vendid=0x2c9
devid=0x5a44
caguid=0x8f10403960558
Ca 2 "H-0008f10403960558" # "MT23108 InfiniHost Mellanox Technologies"
[2](8f1040396055a) "S-005442ba00003080"[8] # lid 14 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR
[1](8f10403960559) "S-005442ba00003080"[12] # lid 10 lmc 1 "ISR9024 Voltaire" lid 6 1xSDR

When grouping is used, IB nodes are organized into chassis which are
numbered. Nodes which cannot be determined to be in a chassis are
displayed as "Non-Chassis Nodes". External ports are also shown on the
connectivity lines.
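
The connectivity lines in the example topology file can be picked apart
with a regular expression. The Python sketch below is a rough
illustration rather than a complete parser: the names Link and
parse_link are hypothetical, the pattern is derived only from the
example lines shown here, and it assumes that the last
whitespace-separated token of the comment is the link width and speed.

```python
import re
from typing import NamedTuple, Optional

# [port](optional PortGUID) "T-NodeGUID"[peer port](optional PortGUID) # comment
_LINK = re.compile(
    r'^\[(?P<port>\d+)\](?:\([0-9a-fA-F]+\))?\s+'
    r'"(?P<peer_type>\w)-(?P<peer_guid>[0-9a-fA-F]+)"'
    r'\[(?P<peer_port>\d+)\](?:\([0-9a-fA-F]+\))?\s*'
    r'#\s*(?P<comment>.*)$'
)


class Link(NamedTuple):
    port: int         # port number on the current node
    peer_type: str    # node type letter, e.g. "S" or "H" in the example
    peer_guid: int    # NodeGUID of the peer
    peer_port: int    # port number on the peer
    width_speed: str  # e.g. "4xSDR"


def parse_link(line: str) -> Optional[Link]:
    """Parse one connectivity line; return None if it does not match."""
    match = _LINK.match(line.strip())
    if match is None:
        return None
    tokens = match.group("comment").split()
    return Link(
        port=int(match.group("port")),
        peer_type=match.group("peer_type"),
        peer_guid=int(match.group("peer_guid"), 16),
        peer_port=int(match.group("peer_port")),
        # Assumption: width and speed are the final token on the line.
        width_speed=tokens[-1] if tokens else "",
    )
```

Node header lines such as 'Switch 24 "S-..."' do not start with a
bracketed port number, so parse_link returns None for them.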

Basic fabric connectivity
See: ibnetdiscover, iblinkinfo

Node information
See: ibnodes, ibswitches, ibhosts, ibrouters

Port information
See: ibportstate, ibaddr

Switch Forwarding Table info
See: ibtracert, ibroute, dump_lfts, dump_mfts, check_lft_balance,
ibfindnodesusing

Performance counters
See: ibqueryerrors, perfquery

Local HCA info
See: ibstat, ibstatus

Connectivity check
See: ibping, ibsysstat

Low level query tools
See: smpquery, smpdump, saquery, sminfo

Fabric verification tools
See: ibidsverify

The following scripts have been identified as redundant and/or lower
performing compared to the above utilities. They are provided as
legacy scripts when --enable-compat-utils is specified at build time.

ibcheckerrors, ibclearcounters, ibclearerrors, ibdatacounters,
ibchecknet, ibchecknode, ibcheckport, ibcheckportstate,
ibcheckportwidth, ibcheckstate, ibcheckwidth, ibswportwatch,
ibprintca, ibprintrt, ibprintswitch, set_nodedesc.sh

Ira Weiny
<ira.weiny@intel.com>

2017-08-21 INFINIBAND-DIAGS(8)