opafabricanalysis(8) Master map: IFSFFCLIRG (Man Page) opafabricanalysis(8)

opafabricanalysis

(All) Performs analysis of the fabric.

opafabricanalysis [-b|-e] [-s] [-d dir] [-c file] [-t portsfile]
[-p ports] [-T topology_input]

--help Produces full help text.

-b Specifies the baseline mode, default is compare/check mode.

-e Evaluates health only, default is compare/check mode.

-s Saves history of failures (errors/differences).

-d dir Specifies the top-level directory for saving baseline and
history of failed checks. Default = /var/usr/lib/opa/analysis

-c file Specifies the error thresholds config file. Default =
/etc/opa/opamon.conf

-t portsfile
Specifies the file with list of local HFI ports used to
access fabric(s) for analysis. Default = /etc/opa/ports

-p ports Specifies the list of local HFI ports used to access fabrics
for analysis.

Default is first active port. The first HFI in the system is
1. The first port on an HFI is 1. Uses the format hfi:port,
for example:

0:0 First active port in system.

0:y Port y within system.

x:0 First active port on HFI x.

x:y HFI x, port y.

-T topology_input
Specifies the name of topology input file to use. Any %P
markers in this filename are replaced with the HFI:port being
operated on (such as 0:0 or 1:2). Default =
/etc/opa/topology.%P.xml. If -T NONE is specified, no topology
input file is used. See Details and opareport for more
information.

opafabricanalysis
opafabricanalysis -p '1:1 1:2 2:1 2:2'

The fabric analysis tool checks the following:

· Fabric links (both internal to switch chassis and external
cables)

· Fabric components (nodes, links, SMs, systems, and their SMA
configuration)

· Fabric PMA error counters and link speed mismatches

NOTE: The comparison includes components on the fabric. Therefore,
operations such as shutting down a server cause the server to no longer
appear on the fabric and are flagged as a fabric change or failure by
opafabricanalysis.

The following environment variables are also used by this command:

PORTS List of ports, used in absence of -t and -p.

PORTS_FILE
File containing list of ports, used in absence of -t and -p.

FF_TOPOLOGY_FILE
File containing topology_input (may have %P marker in
filename), used in absence of -T.

FF_ANALYSIS_DIR
Top-level directory for baselines and failed health checks.

For simple fabrics, the Intel(R) Omni-Path Fabric Suite FastFabric
Toolset host is connected to a single fabric. By default, the first
active port on the FastFabric Toolset host is used to analyze the
fabric. However, in more complex fabrics, the FastFabric Toolset host
may be connected to more than one fabric or subnet. In this case, you
can specify the ports or HFIs to use with one of the following methods:

· On the command line using the -p option.

· In a file specified using the -t option.

· Through the environment variables PORTS or PORTS_FILE.

· Using the PORTS_FILE configuration option in opafastfabric.conf.

If the specified port does not exist or is empty, the first active port
on the local system is used. In more complex configurations, you must
specify the exact ports to use for all fabrics to be analyzed.
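
The hfi:port notation accepted by -p can be illustrated with a small
shell sketch. The function name describe_port is hypothetical and is not
part of the FastFabric Toolset; it only mirrors the four cases listed
for the -p option (0:0, 0:y, x:0, x:y) and rejects anything that is not
two non-negative integers separated by a colon.

```shell
# Hypothetical helper (not shipped with FastFabric): classify one hfi:port
# token according to the format documented for the -p option.
describe_port() {
    spec=$1
    case $spec in
        # Reject non-digit characters, extra colons, and empty halves.
        *[!0-9:]* | *:*:* | :* | *: ) echo "invalid"; return 1 ;;
        *:*) ;;
        *) echo "invalid"; return 1 ;;
    esac
    hfi=${spec%%:*}     # text before the colon
    port=${spec##*:}    # text after the colon
    if [ "$hfi" -eq 0 ] && [ "$port" -eq 0 ]; then
        echo "first active port in system"
    elif [ "$hfi" -eq 0 ]; then
        echo "port $port within system"
    elif [ "$port" -eq 0 ]; then
        echo "first active port on HFI $hfi"
    else
        echo "HFI $hfi, port $port"
    fi
}
```

A port list such as '1:1 1:2 2:1 2:2' could be checked before it is
passed to -p by calling describe_port on each space-separated token and
stopping at the first token reported as invalid.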

You can specify the topology_input file to be used with one of the
following methods:

· On the command line using the -T option.

· In a file specified through the environment variable
FF_TOPOLOGY_FILE.

· Using the ff_topology_file configuration option in
opafastfabric.conf.

If the specified file does not exist, no topology_input file is used.
Alternatively, the filename can be specified as NONE to prevent use of
an input file.

For more information on topology_input, refer to opareport.
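
The %P substitution and the two "no topology file" cases can be sketched
in shell as follows. This is an illustration under stated assumptions,
not the tool's own code: resolve_topology is a hypothetical name, and the
sketch only reproduces the documented rules (each %P is replaced by the
HFI:port, NONE disables the input file, and a file that does not exist is
treated as no input file).

```shell
# Hypothetical helper: print the topology file that would apply to one port,
# or print nothing when no topology input file would be used.
resolve_topology() {
    template=$1   # for example /etc/opa/topology.%P.xml
    port=$2       # for example 0:0 or 1:2
    # NONE explicitly disables the topology input file.
    [ "$template" = "NONE" ] && return 0
    # Replace every %P marker with the HFI:port being operated on.
    path=$(printf '%s\n' "$template" | sed "s|%P|$port|g")
    # A file that does not exist means no topology input file is used.
    [ -f "$path" ] && printf '%s\n' "$path"
    return 0
}
```

For example, resolve_topology '/etc/opa/topology.%P.xml' 1:2 prints
/etc/opa/topology.1:2.xml only if that file exists on the host.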

By default, the error analysis includes PMA counters and slow links
(that is, links running below enabled speeds). You can change this
using the FF_FABRIC_HEALTH configuration parameter in
opafastfabric.conf. This parameter specifies the opareport options and
reports to be used for the health analysis. It can also specify the PMA
counter clearing behavior (-I seconds, -C, or none at all).

When a topology_input file is used, it can also be useful to extend
FF_FABRIC_HEALTH to include fabric topology verification options such
as -o verifylinks.

The thresholds for PMA counter analysis default to
/etc/opa/opamon.conf. However, you can specify an alternate
configuration file for thresholds using the -c option. The
opamon.si.conf file can also be used to check for any non-zero values
for signal integrity (SI) counters.

All files generated by opafabricanalysis start with fabric in their
file name. This is followed by the port selection option identifying
the port used for the analysis. Default is 0:0.

The opafabricanalysis tool generates files such as the following within
FF_ANALYSIS_DIR:

Health Check

· latest/fabric.0:0.errors stdout of opareport for errors
encountered during fabric error analysis.

· latest/fabric.0:0.errors.stderr stderr of opareport during
fabric error analysis.

Baseline

During a baseline run, the following files are also created in
FF_ANALYSIS_DIR/latest.

· baseline/fabric.0:0.snapshot.xml opareport snapshot of complete
fabric components and SMA configuration.

· baseline/fabric.0:0.comps opareport summary of fabric components
and basic SMA configuration.

· baseline/fabric.0:0.links opareport summary of internal and
external links.

Full Analysis

· latest/fabric.0:0.snapshot.xml opareport snapshot of complete
fabric components and SMA configuration.

· latest/fabric.0:0.snapshot.stderr stderr of opareport during
snapshot.

· latest/fabric.0:0.errors stdout of opareport for errors
encountered during fabric error analysis.

· latest/fabric.0:0.errors.stderr stderr of opareport during
fabric error analysis.

· latest/fabric.0:0.comps stdout of opareport for fabric
components and SMA configuration.

· latest/fabric.0:0.comps.stderr stderr of opareport for fabric
components.

· latest/fabric.0:0.comps.diff diff of baseline and latest fabric
components.

· latest/fabric.0:0.links stdout of opareport summary of internal
and external links.

· latest/fabric.0:0.links.stderr stderr of opareport summary of
internal and external links.

· latest/fabric.0:0.links.diff diff of baseline and latest fabric
internal and external links.

· latest/fabric.0:0.links.changes.stderr stderr of opareport
comparison of links.

· latest/fabric.0:0.links.changes opareport comparison of links
against baseline. This is typically easier to read than the
links.diff file and contains the same information.

· latest/fabric.0:0.comps.changes.stderr stderr of opareport
comparison of components.

· latest/fabric.0:0.comps.changes opareport comparison of
components against baseline. This is typically easier to read than
the comps.diff file and contains the same information.

The .diff and .changes files are only created if differences are
detected.

If the -s option is used and failures are detected, files related to
the checks that failed are also copied to the time-stamped directory
name under FF_ANALYSIS_DIR.
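
Because the .diff and .changes files exist only when differences were
detected, their presence can serve as a quick indicator after a run. The
following shell sketch lists whichever of those files are present for a
given port. The function name list_fabric_changes is hypothetical, and
the sketch assumes that the latest directory reflects the most recent
run; it relies only on the file names listed above.

```shell
# Hypothetical helper: print the .diff and .changes files that exist for a
# port under <analysis_dir>/latest. No output means none were found.
list_fabric_changes() {
    # Directory argument, else FF_ANALYSIS_DIR, else the documented default.
    dir=${1:-${FF_ANALYSIS_DIR:-/var/usr/lib/opa/analysis}}
    port=${2:-0:0}   # port selection used in the file names
    for kind in links comps; do
        for suffix in diff changes; do
            f="$dir/latest/fabric.$port.$kind.$suffix"
            if [ -f "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
}
```

Called with no arguments, the function inspects the 0:0 files under
FF_ANALYSIS_DIR, falling back to /var/usr/lib/opa/analysis when that
variable is unset.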

Based on opareport -o links:

· Unconnected/down/missing cables

· Added/moved cables

· Changes in link width and speed

· Changes to Node GUIDs in fabric (replacement of HFI or Switch
hardware)

· Adding/Removing Nodes [FI, Virtual FIs, Virtual Switches,
Physical Switches, Physical Switch internal switching cards
(leaf/spine)]

· Changes to server or switch names

Based on opareport -o comps:

· Overlap with items from links report

· Changes in port MTU, LMC, number of VLs

· Changes in port speed/width enabled or supported

· Changes in HFI or switch device IDs/revisions/VendorID (for
example, ASIC hardware changes)

· Changes in port Capability mask (which features/agents run on
port/server)

· Changes to ErrorLimits and PKey enforcement per port

· Changes to IOUs/IOCs/IOC Services provided

Location (port, node) and number of SMs in fabric. Includes:

· Primary and backups

· Configured priority for SM

Based on opareport -s -C -o errors -o slowlinks:

· PMA error counters on all Intel(R) Omni-Path Fabric ports (HFI,
switch external and switch internal) checked against
configurable thresholds.

· Counters are cleared each time a health check is run. Each
health check reflects a counter delta since last health check.

· Typically identifies potential fabric errors, such as symbol
errors.

· May also identify transient congestion, depending on the
counters that are monitored.

· Link active speed/width as compared to Enabled speed.

· Identifies links whose active speed/width is < min (enabled
speed/width on each side of link).

· This typically reflects bad cables or bad ports or poor
connections.

· Side effect is the verification of SA health.

Copyright(C) 2015-2018 Intel Corporation opafabricanalysis(8)