opafabricanalysis(8)   Master map: IFSFFCLIRG (Man Page)  opafabricanalysis(8)

NAME

       opafabricanalysis

       (All) Performs analysis of the fabric.

Syntax

       opafabricanalysis [-b|-e] [-s] [-d dir] [-c file] [-t portsfile]
       [-p ports] [-T topology_input]

Options

       --help    Produces full help text.

       -b        Specifies the baseline mode, default is compare/check mode.

       -e        Evaluates health only, default is compare/check mode.

       -s        Saves history of failures (errors/differences).

       -d dir    Specifies the top-level directory for saving baseline and
                 history of failed checks. Default = /var/usr/lib/opa/analysis

       -c file   Specifies the error thresholds config file. Default =
                 /etc/opa/opamon.conf

       -t portsfile
                 Specifies the file with list of local HFI ports used to
                 access fabric(s) for analysis. Default = /etc/opa/ports

       -p ports  Specifies the list of local HFI ports used to access fabrics
                 for analysis.

                 Default is first active port. The first HFI in the system is
                 1. The first port on an HFI is 1. Uses the format hfi:port,
                 for example:

                 0:0       First active port in system.

                 0:y       Port y within system.

                 x:0       First active port on HFI x.

                 x:y       HFI x, port y.

       -T topology_input
                 Specifies the name of topology input file to use. Any %P
                 markers in this filename are replaced with the HFI:port being
                 operated on (such as 0:0 or 1:2). Default =
                 /etc/opa/topology.%P.xml. If -T NONE is specified, no
                 topology input file is used. See Details and opareport for
                 more information.
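       As a rough illustration of the %P substitution described for -T, the
       following shell sketch expands a topology file name for a given
       HFI:port. The function name expand_topology_path is hypothetical and
       not part of the toolset; the tool's own implementation may differ.

```shell
# Hypothetical helper (not part of FastFabric): replace every %P marker
# in a topology file name with the hfi:port being operated on.
expand_topology_path() {
    # $1 = file name template, $2 = hfi:port (for example 0:0 or 1:2)
    printf '%s\n' "$1" | sed "s|%P|$2|g"
}

# Expand the documented default template for HFI 1, port 2.
expand_topology_path '/etc/opa/topology.%P.xml' '1:2'
```

       With the default template and port 1:2, this prints
       /etc/opa/topology.1:2.xml.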

Example

       opafabricanalysis
       opafabricanalysis -p '1:1 1:2 2:1 2:2'

       The fabric analysis tool checks the following:

       ·      Fabric links (both internal to switch chassis and external
              cables)

       ·      Fabric components (nodes, links, SMs, systems, and their SMA
              configuration)

       ·      Fabric PMA error counters and link speed mismatches

       NOTE: The comparison includes components on the fabric. Therefore,
       operations such as shutting down a server cause the server to no longer
       appear on the fabric and are flagged as a fabric change or failure by
       opafabricanalysis.

Environment Variables

       The following environment variables are also used by this command:

       PORTS     List of ports, used in absence of -t and -p.

       PORTS_FILE
                 File containing list of ports, used in absence of -t and -p.

       FF_TOPOLOGY_FILE
                 File containing topology_input (may have %P marker in
                 filename), used in absence of -T.

       FF_ANALYSIS_DIR
                 Top-level directory for baselines and failed health checks.

Details

       For simple fabrics, the Intel(R) Omni-Path Fabric Suite FastFabric
       Toolset host is connected to a single fabric. By default, the first
       active port on the FastFabric Toolset host is used to analyze the
       fabric. However, in more complex fabrics, the FastFabric Toolset host
       may be connected to more than one fabric or subnet. In this case, you
       can specify the ports or HFIs to use with one of the following methods:

       ·      On the command line using the -p option.

       ·      In a file specified using the -t option.

       ·      Through the environment variables PORTS or PORTS_FILE.

       ·      Using the PORTS_FILE configuration option in opafastfabric.conf.

       If the specified port does not exist or is empty, the first active port
       on the local system is used. In more complex configurations, you must
       specify the exact ports to use for all fabrics to be analyzed.
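
       To make the hfi:port notation used by -p, -t, and PORTS concrete, the
       following shell sketch classifies a port specification according to
       the four forms listed under Options. The function name describe_port
       is hypothetical and exists only for illustration; it does not query
       any hardware.

```shell
# Hypothetical helper (not part of FastFabric): explain an hfi:port
# specification using the four documented forms 0:0, 0:y, x:0, and x:y.
describe_port() {
    hfi=${1%%:*}     # text before the colon
    port=${1##*:}    # text after the colon
    if [ "$hfi" = 0 ] && [ "$port" = 0 ]; then
        echo "first active port in system"
    elif [ "$hfi" = 0 ]; then
        echo "port $port within system"
    elif [ "$port" = 0 ]; then
        echo "first active port on HFI $hfi"
    else
        echo "HFI $hfi, port $port"
    fi
}

# Walk through a port list of the kind passed to -p.
for spec in 0:0 1:1 2:0; do
    printf '%s -> %s\n' "$spec" "$(describe_port "$spec")"
done
```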

       You can specify the topology_input file to be used with one of the
       following methods:

       ·      On the command line using the -T option.

       ·      In a file specified through the environment variable
              FF_TOPOLOGY_FILE.

       ·      Using the ff_topology_file configuration option in
              opafastfabric.conf.

       If the specified file does not exist, no topology_input file is used.
       Alternatively, the filename can be specified as NONE to prevent use of
       an input file.

       For more information on topology_input, refer to opareport.

       By default, the error analysis includes PMA counters and slow links
       (that is, links running below enabled speeds). You can change this
       using the FF_FABRIC_HEALTH configuration parameter in
       opafastfabric.conf. This parameter specifies the opareport options and
       reports to be used for the health analysis. It also can specify the
       PMA counter clearing behavior (-I seconds, -C, or none at all).

       When a topology_input file is used, it can also be useful to extend
       FF_FABRIC_HEALTH to include fabric topology verification options such
       as -o verifylinks.
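
       One possible way to extend the health check is sketched below. It
       assumes that opafastfabric.conf accepts shell-style variable
       assignments and that the starting point is the option set listed
       under "Fabric Items Also Checked During Health Check" (-s -C -o errors
       -o slowlinks). Check the installed opafastfabric.conf for the actual
       default before changing it.

```shell
# Assumed shell-style setting for opafastfabric.conf (syntax is an
# assumption): keep the documented health options and add topology
# verification of links.
FF_FABRIC_HEALTH="-s -C -o errors -o slowlinks -o verifylinks"
export FF_FABRIC_HEALTH
```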

       The thresholds for PMA counter analysis default to
       /etc/opa/opamon.conf. However, you can specify an alternate
       configuration file for thresholds using the -c option. The
       opamon.si.conf file can also be used to check for any non-zero values
       for signal integrity (SI) counters.

       All files generated by opafabricanalysis start with fabric in their
       file name. This is followed by the port selection option identifying
       the port used for the analysis. Default is 0:0.
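
       The naming rule can be sketched as a small shell function that builds
       the path of a generated file from its subdirectory, port, and suffix.
       The function name analysis_file is hypothetical, and falling back to
       the documented -d default when FF_ANALYSIS_DIR is unset is an
       assumption made for this sketch.

```shell
# Hypothetical helper (not part of FastFabric): build the path of a file
# generated by opafabricanalysis.
analysis_file() {
    # $1 = subdirectory (latest or baseline)
    # $2 = hfi:port used for the analysis (for example 0:0)
    # $3 = suffix (for example links, comps.diff, errors.stderr)
    # Assumption: use the documented -d default if FF_ANALYSIS_DIR is unset.
    dir=${FF_ANALYSIS_DIR:-/var/usr/lib/opa/analysis}
    printf '%s/%s/fabric.%s.%s\n' "$dir" "$1" "$2" "$3"
}

analysis_file latest 0:0 links.diff
```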

       The opafabricanalysis tool generates files such as the following within
       FF_ANALYSIS_DIR:

       Health Check

       ·      latest/fabric.0:0.errors stdout of opareport for errors
              encountered during fabric error analysis.

       ·      latest/fabric.0:0.errors.stderr stderr of opareport during
              fabric error analysis.

       Baseline

       During a baseline run, the following files are also created in
       FF_ANALYSIS_DIR/latest.

       ·      baseline/fabric.0:0.snapshot.xml opareport snapshot of complete
              fabric components and SMA configuration.

       ·      baseline/fabric.0:0.comps opareport summary of fabric components
              and basic SMA configuration.

       ·      baseline/fabric.0:0.links opareport summary of internal and
              external links.

       Full Analysis

       ·      latest/fabric.0:0.snapshot.xml opareport snapshot of complete
              fabric components and SMA configuration.

       ·      latest/fabric.0:0.snapshot.stderr stderr of opareport during
              snapshot.

       ·      latest/fabric.0:0.errors stdout of opareport for errors
              encountered during fabric error analysis.

       ·      latest/fabric.0:0.errors.stderr stderr of opareport during
              fabric error analysis.

       ·      latest/fabric.0:0.comps stdout of opareport for fabric
              components and SMA configuration.

       ·      latest/fabric.0:0.comps.stderr stderr of opareport for fabric
              components.

       ·      latest/fabric.0:0.comps.diff diff of baseline and latest fabric
              components.

       ·      latest/fabric.0:0.links stdout of opareport summary of internal
              and external links.

       ·      latest/fabric.0:0.links.stderr stderr of opareport summary of
              internal and external links.

       ·      latest/fabric.0:0.links.diff diff of baseline and latest fabric
              internal and external links.

       ·      latest/fabric.0:0.links.changes.stderr stderr of opareport
              comparison of links.

       ·      latest/fabric.0:0.links.changes opareport comparison of links
              against baseline. This is typically easier to read than the
              links.diff file and contains the same information.

       ·      latest/fabric.0:0.comps.changes.stderr stderr of opareport
              comparison of components.

       ·      latest/fabric.0:0.comps.changes opareport comparison of
              components against baseline. This is typically easier to read
              than the comps.diff file and contains the same information.

       The .diff and .changes files are only created if differences are
       detected.
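
       Because the .diff and .changes files exist only when differences were
       found, their presence can serve as a simple indicator after a run. The
       sketch below lists any such files in a given directory. The function
       name list_fabric_changes is hypothetical, and the directory passed to
       it is assumed to be the latest directory under FF_ANALYSIS_DIR.

```shell
# Hypothetical helper (not part of FastFabric): print every .diff and
# .changes file found in the given directory, one per line.
list_fabric_changes() {
    # $1 = directory to inspect, for example "$FF_ANALYSIS_DIR/latest"
    for f in "$1"/fabric.*.diff "$1"/fabric.*.changes; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}
```

       An empty result suggests that no differences from the baseline were
       recorded in that directory.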

       If the -s option is used and failures are detected, files related to
       the checks that failed are also copied to the time-stamped directory
       name under FF_ANALYSIS_DIR.

Fabric Items Checked Against the Baseline

       Based on opareport -o links:

       ·      Unconnected/down/missing cables

       ·      Added/moved cables

       ·      Changes in link width and speed

       ·      Changes to Node GUIDs in fabric (replacement of HFI or Switch
              hardware)

       ·      Adding/Removing Nodes [FI, Virtual FIs, Virtual Switches,
              Physical Switches, Physical Switch internal switching cards
              (leaf/spine)]

       ·      Changes to server or switch names

       Based on opareport -o comps:

       ·      Overlap with items from links report

       ·      Changes in port MTU, LMC, number of VLs

       ·      Changes in port speed/width enabled or supported

       ·      Changes in HFI or switch device IDs/revisions/VendorID (for
              example, ASIC hardware changes)

       ·      Changes in port Capability mask (which features/agents run on
              port/server)

       ·      Changes to ErrorLimits and PKey enforcement per port

       ·      Changes to IOUs/IOCs/IOC Services provided

       Location (port, node) and number of SMs in fabric. Includes:

       ·      Primary and backups

       ·      Configured priority for SM

Fabric Items Also Checked During Health Check

       Based on opareport -s -C -o errors -o slowlinks:

       ·      PMA error counters on all Intel(R) Omni-Path Fabric ports (HFI,
              switch external and switch internal) checked against
              configurable thresholds.

       ·      Counters are cleared each time a health check is run. Each
              health check reflects a counter delta since last health check.

       ·      Typically identifies potential fabric errors, such as symbol
              errors.

       ·      May also identify transient congestion, depending on the
              counters that are monitored.

       ·      Link active speed/width as compared to Enabled speed.

       ·      Identifies links whose active speed/width is < min (enabled
              speed/width on each side of link).

       ·      This typically reflects bad cables or bad ports or poor
              connections.

       ·      Side effect is the verification of SA health.



Copyright(C) 2015-2018         Intel Corporation          opafabricanalysis(8)