IBQUERYERRORS(8) OpenIB Diagnostics IBQUERYERRORS(8)

NAME
IBQUERYERRORS - query and report IB port counters

SYNOPSIS
ibqueryerrors [options]

DESCRIPTION
The default behavior is to report the port error counters which exceed
a threshold for each port in the fabric. The default threshold is zero
(0). Error fields can also be suppressed entirely.

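The reporting rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the tool's implementation: the function name, the dictionary layout, and the sample counter values are assumptions made for this example.

```python
def counters_to_report(counters, thresholds=None, suppress=(), default_threshold=0):
    """Return {name: value} for counters that exceed their threshold.

    Names listed in `suppress` are dropped entirely, as with -s.
    """
    thresholds = thresholds or {}
    suppressed = set(suppress)
    return {
        name: value
        for name, value in counters.items()
        if name not in suppressed and value > thresholds.get(name, default_threshold)
    }


# Hypothetical counter values for one port.
port_counters = {"SymbolErrorCounter": 12, "LinkErrorRecoveryCounter": 0, "VL15Dropped": 3}

# Default threshold of zero: every non-zero counter is reported.
print(counters_to_report(port_counters))
# Raise the threshold for one field and suppress another.
print(counters_to_report(port_counters, {"SymbolErrorCounter": 20}, suppress=["VL15Dropped"]))
```

With the default threshold of zero, any counter that has incremented at all is reported, which is why suppression or a threshold file is usually needed on a busy fabric.
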
In addition to reporting errors on every port, ibqueryerrors can report
the port transmit and receive data as well as report full link
information to the remote port if available.

OPTIONS
-s, --suppress <err1,err2,...> Suppress the errors listed in the
comma-separated list provided.

-c, --suppress-common Suppress some of the common "side effect"
counters. These counters usually do not indicate an error condition and
can usually be safely ignored.

-r, --report-port Report the port information. This includes LID,
port, external port (if applicable), link speed setting, remote GUID,
remote port, remote external port (if applicable), and remote node
description information.

--data Include the optional transmit and receive data counters.

--threshold-file <filename> Specify an alternate threshold file. The
default is /etc/infiniband-diags/error_thresholds

--switch print data for switches only

--ca print data for CAs only

--skip-sl Use the default SL for queries. This is not recommended when
using a QoS-aware routing engine as it can cause a credit deadlock.

--router print data for routers only

--clear-errors, -k Clear error counters after read.

--clear-counts, -K Clear data counters after read.

CAUTION: data or error counters are cleared regardless of whether they
are printed. See --counters and --data for details on controlling which
counters are printed.

--details include receive error and transmit discard details

--counters print data counters only

Partial Scan flags
The node at which to start a partial scan can be specified with one of
the following addresses.

--port-guid, -G <port_guid> Specify a port_guid

-D, --Direct <dr_path> The address specified is a directed route

Examples:
-D "0" # self port
-D "0,1,2,1,4" # out via port 1, then 2, ...

(Note the second number in the path specified must match the port being
used. This can be specified using the port selection flag '-P' or the
port found through the automatic selection process.)

Note: For switches, results are printed for all ports, not just switch
port 0.

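The note about the second number in a directed route can be checked before a scan is started. The sketch below is an assumption-laden illustration: the helper names are invented for this example, and the rule that a path begins with 0 is inferred from the two examples above rather than stated by this page.

```python
def parse_dr_path(text):
    """Split a directed-route string such as "0,1,2,1,4" into a list of port numbers."""
    hops = [int(part) for part in text.split(",")]
    # Assumption taken from the examples above: a path starts at the local node (0).
    if hops[0] != 0:
        raise ValueError("directed route is expected to start with 0")
    return hops


def first_hop_matches(text, local_port):
    """Check that the second number in the path equals the local port in use."""
    hops = parse_dr_path(text)
    # A path of just "0" addresses the local port, so there is nothing to compare.
    return len(hops) < 2 or hops[1] == local_port


print(parse_dr_path("0,1,2,1,4"))
print(first_hop_matches("0,1,2,1,4", local_port=1))
print(first_hop_matches("0,1,2,1,4", local_port=2))
```

A mismatch here suggests that either the path or the value given to '-P' needs to change before the route can be used.
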
-S <port_guid> same as "-G" (provided only for backward compatibility).

Cache File flags
--load-cache <filename> Load and use the cached ibnetdiscover data
stored in the specified filename. May be useful for outputting and
learning about other fabrics or a previous state of a fabric.

Port Selection flags
-C, --Ca <ca_name> use the specified ca_name.

-P, --Port <ca_port> use the specified ca_port.

Local port Selection
Multiple port/Multiple CA support: when no IB device or port is
specified (see the "local umad parameters" below), the libibumad library
selects the port to use by the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the libibumad library
attempts to fulfill the user request, and will fail if it is not
possible.

For example:

ibaddr # use the first port (criteria #1 above)
ibaddr -C mthca1 # pick the best port from "mthca1" only.
ibaddr -P 2 # use the second (active/up) port from the first available IB device.
ibaddr -C mthca0 -P 2 # use the specified port only.

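The selection criteria above can be modelled as a small function. This is a simplified sketch and not the libibumad implementation: the record fields (`ca`, `port`, `state`, `phys_up`), the function name, and the sample port list are all hypothetical, and treating `-P` as an exact port-number filter is an assumption.

```python
def select_port(ports, ca_name=None, port_num=None):
    """Pick a port: first ACTIVE one, else first physically UP one, else None."""
    candidates = [
        p for p in ports
        if (ca_name is None or p["ca"] == ca_name)
        and (port_num is None or p["port"] == port_num)
    ]
    # Criterion 1: the first port that is ACTIVE.
    for port in candidates:
        if port["state"] == "ACTIVE":
            return port
    # Criterion 2: the first port whose physical link is up.
    for port in candidates:
        if port["phys_up"]:
            return port
    # Nothing satisfies the request; the real library reports a failure here.
    return None


# Hypothetical local ports, in discovery order.
local_ports = [
    {"ca": "mthca0", "port": 1, "state": "DOWN", "phys_up": False},
    {"ca": "mthca0", "port": 2, "state": "INIT", "phys_up": True},
    {"ca": "mthca1", "port": 1, "state": "ACTIVE", "phys_up": True},
]

print(select_port(local_ports))
print(select_port(local_ports, ca_name="mthca0"))
print(select_port(local_ports, ca_name="mthca0", port_num=1))
```

In this model a request that cannot be fulfilled returns `None`, which corresponds to the failure case described above.
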
Configuration flags
--config, -z <config_file> Specify alternate config file.
Default: /etc/infiniband-diags/ibdiag.conf

--outstanding_smps, -o <val>
Specify the number of outstanding SMPs which should be issued
during the scan

Default: 2

--node-name-map <node-name-map> Specify a node name map.
This file maps GUIDs to more user friendly names. See FILES
section.

-t, --timeout <timeout_ms> override the default timeout for the
solicited mads.

-y, --m_key <key>
use the specified M_key for requests. If a non-numeric value (like
'x') is specified then a value will be prompted for.

Debugging flags
-d raise the IB debugging level. May be used several times (-ddd
or -d -d -d).

-e show send and receive errors (timeouts and others)

-h, --help show the usage message

-v, --verbose
increase the application verbosity level. May be used several
times (-vv or -v -v -v)

-V, --version show the version info.

-R (This option is obsolete and does nothing)

EXIT STATUS
-1 if scan fails.

0 if scan succeeds without errors beyond thresholds

1 if errors are found beyond thresholds or inconsistencies are found in
check mode.

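A monitoring script can branch on these values. The sketch below is illustrative only, and the function name and message strings are invented for this example. One detail worth knowing: an exit status of -1 is truncated to 8 bits by the operating system, so a shell or Python's `subprocess` module sees it as 255.

```python
def describe_exit_status(code):
    """Map an ibqueryerrors exit status to a short description."""
    if code == 0:
        return "scan succeeded, no errors beyond thresholds"
    if code == 1:
        return "errors beyond thresholds, or inconsistencies in check mode"
    # exit(-1) is reported to the parent process as 255.
    if code in (-1, 255):
        return "scan failed"
    return "unexpected exit status"


for status in (0, 1, 255):
    print(status, describe_exit_status(status))
```

In practice the value passed in would be the `returncode` attribute of the object returned by `subprocess.run`.
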
FILES
ERROR THRESHOLD
/etc/infiniband-diags/error_thresholds

Define threshold values for errors. File format is simple "name=val".
Comments begin with '#'

Example:

# Define thresholds for error counters
SymbolErrorCounter=10
LinkErrorRecoveryCounter=10
VL15Dropped=100

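The "name=val" format is simple enough to read with a short parser, which can be handy for validating a threshold file before deploying it. This is a sketch under stated assumptions, not the tool's own parser: the function name is invented, values are assumed to be decimal integers, and anything after a '#' on a line is treated as a comment.

```python
def parse_thresholds(text):
    """Parse "name=val" lines into a dict, ignoring blank lines and comments."""
    thresholds = {}
    for raw in text.splitlines():
        # Assumption: a '#' starts a comment wherever it appears on the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected name=val, got: {raw!r}")
        thresholds[name.strip()] = int(value.strip())
    return thresholds


example = """\
# Define thresholds for error counters
SymbolErrorCounter=10
LinkErrorRecoveryCounter=10
VL15Dropped=100
"""

print(parse_thresholds(example))
```
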
CONFIG FILE
/etc/infiniband-diags/ibdiag.conf

A global config file is provided to set some of the common options for
all tools. See supplied config file for details.

NODE NAME MAP FILE FORMAT
The node name map is used to specify user friendly names for nodes in
the output. GUIDs are used to perform the lookup.

This functionality is provided by the opensm-libs package. See
opensm(8) for the file location for your installation.

Generically:

# comment
<guid> "<name>"

Example:

# IB1
# Line cards
0x0008f104003f125c "IB1 (Rack 11 slot 1 ) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f125d "IB1 (Rack 11 slot 1 ) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10d2 "IB1 (Rack 11 slot 2 ) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10d3 "IB1 (Rack 11 slot 2 ) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10bf "IB1 (Rack 11 slot 12 ) ISR9288/ISR9096 Voltaire sLB-24D"

# Spines
0x0008f10400400e2d "IB1 (Rack 11 spine 1 ) ISR9288 Voltaire sFB-12D"
0x0008f10400400e2e "IB1 (Rack 11 spine 1 ) ISR9288 Voltaire sFB-12D"
0x0008f10400400e2f "IB1 (Rack 11 spine 1 ) ISR9288 Voltaire sFB-12D"
0x0008f10400400e31 "IB1 (Rack 11 spine 2 ) ISR9288 Voltaire sFB-12D"
0x0008f10400400e32 "IB1 (Rack 11 spine 2 ) ISR9288 Voltaire sFB-12D"

# GUID Node Name
0x0008f10400411a08 "SW1 (Rack 3) ISR9024 Voltaire 9024D"
0x0008f10400411a28 "SW2 (Rack 3) ISR9024 Voltaire 9024D"
0x0008f10400411a34 "SW3 (Rack 3) ISR9024 Voltaire 9024D"
0x0008f104004119d0 "SW4 (Rack 3) ISR9024 Voltaire 9024D"

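A map in this format can be loaded with a short parser, for example to check a hand-edited file for malformed lines. The sketch below is not the parser shipped with opensm-libs. The function name and regular expression are assumptions for this example, and it accepts only the `<guid> "<name>"` form shown above, with the GUID written in hexadecimal.

```python
import re

# One entry: a hexadecimal GUID, whitespace, then a double-quoted name.
ENTRY_RE = re.compile(r'^(0x[0-9a-fA-F]+)\s+"(.*)"$')


def parse_node_name_map(text):
    """Return {guid_as_int: name} for every entry in a node name map."""
    names = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = ENTRY_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        names[int(match.group(1), 16)] = match.group(2)
    return names


sample = """\
# GUID Node Name
0x0008f10400411a08 "SW1 (Rack 3) ISR9024 Voltaire 9024D"
0x0008f10400411a28 "SW2 (Rack 3) ISR9024 Voltaire 9024D"
"""

node_names = parse_node_name_map(sample)
print(node_names[0x0008F10400411A08])
```

Storing the GUID as an integer makes the lookup independent of letter case and leading zeros in the file.
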
AUTHOR
Ira Weiny
<ira.weiny@intel.com>



2016-09-26 IBQUERYERRORS(8)