opahostadmin(8)        Master map: IFSFFCLIRG (Man Page)       opahostadmin(8)

NAME

       opahostadmin

       (Host) Performs a number of multi-step host initialization and
       verification operations, including upgrading software or firmware,
       rebooting hosts, and other operations. In general, operations performed
       by opahostadmin involve a login to one or more host systems.

Syntax

       opahostadmin [-c] [-i ipoib_suffix] [-f hostfile] [-h 'hosts']
       [-r release] [-I install_options] [-U upgrade_options] [-d dir]
       [-T product] [-P packages] [-m netmask] [-S] operation...

Options

       --help    Produces full help text.

       -c        Overwrites the result files from any previous run before
                 starting this run.

       -i ipoib_suffix
                 Specifies the suffix to apply to host names to create IPoIB
                 host names. Default is -opa.

       -f hostfile
                 Specifies the file with the names of hosts in a cluster.
                 Default is /etc/opa/hosts.

       -h hosts  Specifies the list of hosts to execute the operation against.

       -r release
                 Specifies the software version to load or upgrade to. Default
                 is the version of Intel(R) Omni-Path Software presently
                 running on the server.

       -d dir    Specifies the directory from which to retrieve
                 product.release.tgz for load or upgrade.

       -I install_options
                 Specifies the software install options.

       -U upgrade_options
                 Specifies the software upgrade options.

       -T product
                 Specifies the product type to install. Default is
                 IntelOPA-Basic.<distro> or IntelOPA-IFS.<distro>, where
                 <distro> is the distribution and CPU.

       -P packages
                 Specifies the packages to install. Default is oftools ipoib
                 psm_mpi.

       -m netmask
                 Specifies the IPoIB netmask to use for the configipoib
                 operation.

       -S        Securely prompts for user password on remote system.

       operation Performs the specified operation, which can be one or more of
                 the following:

                 load      Starts initial installation of all hosts.

                 upgrade   Upgrades installation of all hosts.

                 configipoib
                           Creates ifcfg-ib1 using the host IP address from
                           the /etc/hosts file.

                 reboot    Reboots hosts and ensures they go down and come
                           back up.

                 sacache   Confirms sacache has all hosts in it.

                 ipoibping Verifies this host can ping each host through
                           IPoIB.

                 mpiperf   Verifies latency and bandwidth for each host.

                 mpiperfdeviation
                           Verifies latency and bandwidth for each host
                           against a defined threshold (or relative to average
                           host performance).

Example

       opahostadmin -c reboot
       opahostadmin upgrade
       opahostadmin -h 'elrond arwen' reboot
       HOSTS='elrond arwen' opahostadmin reboot

Details

       opahostadmin provides detailed logging of its results. During each run,
       the following files are produced:

       ·      test.res : Appended with summary results of run.

       ·      test.log : Appended with detailed results of run.

       ·      save_tmp/ : Contains a directory per failed test with detailed
              logs.

       ·      test_tmp*/ : Intermediate result files while test is running.

       The -c option removes all log files.

       Results from opahostadmin are grouped into test suites, test cases, and
       test items. A given run of opahostadmin represents a single test suite.
       Within a test suite, multiple test cases occur, typically one test case
       per host being operated on. Some of the more complex operations may
       have multiple test items per test case. Each test item represents a
       major step in the overall test case.

       Each opahostadmin run appends to test.res and test.log, and creates
       temporary files in test_tmp$PID in the current directory. test.res
       provides an overall summary of operations performed and their results.
       The same information is also displayed while opahostadmin is executing.
       test.log contains detailed information about what was performed,
       including the specific commands executed and the resulting output. The
       test_tmp directories contain temporary files that reflect tests in
       progress (or killed). The logs for any failures are saved in the
       save_tmp directory, with a directory per failed test case. If the same
       test case fails more than once, save_tmp retains the information from
       the first failure. Subsequent runs of opahostadmin are appended to
       test.log. Intel recommends reviewing failures and using the -c option
       to remove old logs before subsequent runs of opahostadmin.

       opahostadmin performs its operations in parallel by default. As with
       the other tools, FF_MAX_PARALLEL can be exported to change the degree
       of parallelism. The default is twenty (20) parallel operations.
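
       The recommendation above to review failures before clearing old logs
       with -c could be supported by a small helper such as the following.
       This is only a sketch: archive_opa_logs and the archive directory
       naming are hypothetical and not part of FastFabric. It copies test.res,
       test.log, and save_tmp (when present) from the run directory into a
       timestamped subdirectory.

```shell
# Sketch: preserve opahostadmin logs before a later run clears them with -c.
# archive_opa_logs is a hypothetical helper, not part of FastFabric.
archive_opa_logs() {
    run_dir=${1:-.}                              # directory where opahostadmin was run
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$run_dir/opahostadmin-logs-$stamp"     # assumed naming scheme
    found=0
    for item in test.res test.log save_tmp; do
        if [ -e "$run_dir/$item" ]; then
            mkdir -p "$dest"
            cp -R "$run_dir/$item" "$dest/"      # copy, so the originals stay in place
            found=1
        fi
    done
    if [ "$found" -eq 1 ]; then
        echo "$dest"                             # print the archive location
    else
        echo "no opahostadmin logs found in $run_dir" >&2
        return 1
    fi
}
```

       Because the files are copied rather than moved, the originals remain
       for opahostadmin -c to remove on the next run.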

Environment Variables

       The following environment variables are also used by this command:

       HOSTS     List of hosts, used if -h option not supplied.

       HOSTS_FILE
                 File containing list of hosts, used in absence of -f and -h.

       FF_MAX_PARALLEL
                 Maximum number of concurrent operations to perform.

       FF_SERIALIZE_OUTPUT
                 Serialize output of parallel operations (yes or no).

       FF_TIMEOUT_MULT
                 Multiplier for all timeouts associated with this command.
                 Used if the systems are slow for some reason.
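
       As an illustration of the variables above, the following sketch runs a
       single command with temporary FF_MAX_PARALLEL and FF_TIMEOUT_MULT
       values without changing the calling shell's environment. The wrapper
       name with_ff_limits and the values shown are assumptions chosen for the
       example.

```shell
# Sketch: run one command with temporary FastFabric tuning variables.
# with_ff_limits is a hypothetical wrapper, not part of FastFabric.
# Usage: with_ff_limits MAX_PARALLEL TIMEOUT_MULT command [args...]
with_ff_limits() {
    max_parallel=$1
    timeout_mult=$2
    shift 2
    # env sets the variables only for the command it launches.
    env FF_MAX_PARALLEL="$max_parallel" FF_TIMEOUT_MULT="$timeout_mult" "$@"
}

# Intended use (requires FastFabric; values are illustrative only):
#   with_ff_limits 5 2 opahostadmin -h 'elrond arwen' reboot
```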

opahostadmin Operation Details

       (Host) Intel recommends that you set up password-less SSH or SCP for
       use during this operation. Alternatively, the -S option can be used to
       securely prompt for a password, in which case the same password is used
       for all hosts. The password may also be put in the environment or the
       opafastfabric.conf file using FF_PASSWORD and FF_ROOTPASS.

       load      Performs an initial installation of Intel(R) Omni-Path
                 Software on a group of hosts. Any existing installation is
                 uninstalled and existing configuration files are removed.
                 Subsequently, the hosts are installed with a default Intel(R)
                 Omni-Path Software configuration. The -P option can be used
                 to select different install packages. Default is oftools
                 ipoib psm_mpi. The -r option can be used to specify a release
                 to install other than the one that this host is presently
                 running. The FF_PRODUCT.FF_PRODUCT_VERSION.tgz file (for
                 example, IntelOPA-Basic.version.tgz) is expected to exist in
                 the directory specified by -d. Default is the current working
                 directory. The specified software is copied to all the
                 selected hosts and installed.

       upgrade   Upgrades all selected hosts without modifying existing
                 configurations. This operation is comparable to the -U option
                 when running ./INSTALL manually. The -r option can be used to
                 upgrade to a release different from this host. The default is
                 to upgrade to the same release as this host. The
                 FF_PRODUCT.FF_PRODUCT_VERSION.tgz file (for example,
                 IntelOPA-Basic.version.tgz) is expected to exist in the
                 directory specified by -d. (The default is the current
                 working directory.) The specified software is copied to all
                 the end nodes and installed.

       NOTE: Only components that are currently installed are upgraded. This
       operation fails for hosts that do not have Intel(R) Omni-Path Software
       installed.

       configipoib
                 Creates an ifcfg-ib1 configuration file for each node using
                 the IP address found using the resolver on the node. The
                 standard Linux* resolver is used through the host command.
                 (If running OFA Delta, this option configures ifcfg-ib0.)

                 If the host is not found, /etc/hosts on the node is checked.
                 The -i option specifies an IPoIB suffix to apply to the host
                 name to create the IPoIB host name for the node. The default
                 suffix is -opa. The -m option specifies a netmask other than
                 the default for the given class of IP address, such as when
                 dividing a class A or B address into smaller IP subnets.
                 IPoIB is configured for a static IP address and is
                 autostarted at boot. For the Intel(R) OP Software Stack, the
                 default /etc/ipoib.cfg file is used, which provides a
                 redundant IPoIB configuration using both ports of the first
                 HFI in the system.

       NOTE: opahostadmin configipoib now supports DHCP (auto or static
       options) for configuring the IPoIB interface. You must specify these
       options in /etc/opa/opafastfabric.conf using the FF_IPOIB_CONFIG
       variable. If no options are found, the static IP configuration is used
       by default. If auto is specified, then one IP address from either
       static or dhcp is chosen. Static is used if the IP address can be
       obtained from /etc/hosts or the resolver; otherwise DHCP is used.
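
       The auto rule in the NOTE above can be illustrated with a simplified
       sketch. It is not the opahostadmin implementation: ipoib_auto_mode is a
       hypothetical helper, it builds the IPoIB name as host name plus suffix,
       and it checks only a hosts file, whereas the real tool also consults
       the resolver.

```shell
# Sketch of the "auto" choice between static and dhcp, hosts file only.
# ipoib_auto_mode is a hypothetical helper, not part of FastFabric.
# Usage: ipoib_auto_mode HOST SUFFIX [HOSTS_FILE]
ipoib_auto_mode() {
    host=$1
    suffix=$2                          # IPoIB suffix, as given with -i
    hosts_file=${3:-/etc/hosts}
    ipoib_name="${host}${suffix}"
    # Search the name columns of every non-comment line for the IPoIB name.
    if awk -v name="$ipoib_name" '
        $1 ~ /^#/ { next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break
                if ($i == name) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"; then
        echo static                    # an address is known, so use it
    else
        echo dhcp                      # no address found, fall back to DHCP
    fi
}
```

       For example, ipoib_auto_mode elrond -opa prints static when an entry
       for elrond-opa exists in /etc/hosts, and dhcp otherwise.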

       reboot    Reboots the given hosts and ensures they go down and come
                 back up by pinging them during the reboot process. The ping
                 rate is slow (5 seconds), so if the servers boot faster than
                 this, false failures may be seen.

       sacache   Verifies that the given hosts can properly communicate with
                 the SA and that any cached SA data is up to date. To run this
                 command, Intel(R) Omni-Path Fabric software must be installed
                 and running on the given hosts. The subnet manager and
                 switches must be up. If this test fails, opacmdall
                 'opasaquery -o desc' can be run against any problem hosts.

       NOTE: This operation requires that the hosts being queried are
       specified by a resolvable TCP/IP host name. This operation FAILS if the
       selected hosts are specified by IP address.

       ipoibping Verifies IPoIB basic operation by ensuring that the host can
                 ping all other nodes through IPoIB. To run this command,
                 Intel(R) Omni-Path Fabric software must be installed, IPoIB
                 must be configured and running on the host, and the given
                 hosts, the SM, and switches must be up. The -i option can
                 specify an alternate IPoIB hostname suffix.

       mpiperf   Verifies that MPI is operational and checks MPI end-to-end
                 latency and bandwidth between pairs of nodes (for example,
                 1-2, 3-4, 5-6). Use this to verify switch latency/hops, PCI
                 bandwidth, and overall MPI performance. The test.res file
                 contains the results of each pair of nodes tested.

       NOTE: This option is available for the Intel(R) Omni-Path Fabric Host
       Software OFA Delta packaging, but is not presently available for other
       packagings of OFED.

              To obtain accurate results, this test should be run at a time
              when no other stressful applications (for example, MPI jobs or
              high stress file system operations) are running on the given
              hosts.

              Bandwidth issues typically indicate server configuration issues
              (for example, incorrect slot used, incorrect BIOS settings, or
              incorrect HFI model), or fabric issues (for example, symbol
              errors, incorrect link width, or speed). Assuming opareport has
              previously been used to check for link errors and link speed
              issues, the server configuration should be verified.

              Note that BIOS settings and differences between server models
              can account for 10-20% differences in bandwidth. For more
              details about BIOS settings, consult the documentation from the
              server supplier and/or the server PCI chipset manufacturer.

       mpiperfdeviation
                 An enhanced version of mpiperf that verifies MPI
                 performance. Can be used to verify switch latency/hops, PCI
                 bandwidth, and overall MPI performance. It performs assorted
                 pair-wise bandwidth and latency tests, and reports pairs
                 outside an acceptable tolerance range. The tool identifies
                 specific nodes that have problems and provides a concise
                 summary of results. The test.res file contains the results of
                 each pair of nodes tested.

                 By default, concurrent mode is used to quickly analyze the
                 fabric and host performance. Pairs that have 20% less
                 bandwidth or 50% more latency than the average pair are
                 reported as failures.

                 The tool can be run in a sequential or a concurrent mode.
                 Sequential mode runs each host against a reference host. By
                 default, the reference host is selected based on the best
                 performance from a quick test of the first 40 hosts. In
                 concurrent mode, hosts are paired up and all pairs are run
                 concurrently. Since there may be fabric contention during
                 such a run, any poorly performing pairs are then rerun
                 sequentially against the reference host.

                 Concurrent mode runs the tests in the shortest amount of
                 time; however, the results could be slightly less accurate
                 due to switch contention. In heavily oversubscribed fabric
                 designs, if concurrent mode is producing unexpectedly low
                 performance, try sequential mode.

       NOTE: This option is available for the Intel(R) Omni-Path Fabric Host
       Software OFA Delta packaging, but is not presently available for other
       packagings of OFED.

              To obtain accurate results, this test should be run at a time
              when no other stressful applications (for example, MPI jobs or
              high stress file system operations) are running on the given
              hosts.

              Bandwidth issues typically indicate server configuration issues
              (for example, incorrect slot used, incorrect BIOS settings, or
              incorrect HFI model), or fabric issues (for example, symbol
              errors, incorrect link width, or speed). Assuming opareport has
              previously been used to check for link errors and link speed
              issues, the server configuration should be verified.

              Note that BIOS settings and differences between server models
              can account for 10-20% differences in bandwidth. A result 5-10%
              below the average is typically not cause for serious alarm, but
              may reflect limitations in the server design or the chosen BIOS
              settings.

              For more details about BIOS settings, consult the documentation
              from the server supplier and/or the server PCI chipset
              manufacturer.

              The deviation application supports a number of parameters that
              allow for more precise control over the mode, benchmark, and
              pass/fail criteria. The parameters to use can be selected using
              the FF_DEVIATION_ARGS configuration parameter in
              opafastfabric.conf.

              Available parameters for deviation application:

              [-bwtol bwtol] [-bwdelta MBs] [-bwthres MBs]
              [-bwloop count] [-bwsize size] [-lattol lattol]
              [-latdelta usec] [-latthres usec] [-latloop count]
              [-latsize size] [-c] [-b] [-v] [-vv]
              [-h reference_host]

              -bwtol    Specifies the percent of bandwidth degradation allowed
                        below average value.

              -bwbidir  Performs a bidirectional bandwidth test.

              -bwunidir Performs a unidirectional bandwidth test (default).

              -bwdelta  Specifies the limit in MB/s of bandwidth degradation
                        allowed below average value.

              -bwthres  Specifies the lower limit in MB/s of bandwidth
                        allowed.

              -bwloop   Specifies the number of loops to execute each
                        bandwidth test.

              -bwsize   Specifies the size of message to use for bandwidth
                        test.

              -lattol   Specifies the percent of latency degradation allowed
                        above average value.

              -latdelta Specifies the limit in µsec of latency degradation
                        allowed above average value.

              -latthres Specifies the upper limit in µsec of latency
                        allowed.

              -latloop  Specifies the number of loops to execute each latency
                        test.

              -latsize  Specifies the size of message to use for latency test.

              -c        Runs test pairs concurrently instead of the default of
                        sequential.

              -b        When comparing results against tolerance and delta,
                        uses best instead of average.

              -v        Produces verbose output.

              -vv       Produces very verbose output.

              -h        Specifies the reference host to use for sequential
                        pairing.

              Both bwtol and bwdelta must be exceeded to fail the bandwidth
              test.

              When bwthres is supplied, bwtol and bwdelta are ignored.

              Both lattol and latdelta must be exceeded to fail the latency
              test.

              When latthres is supplied, lattol and latdelta are ignored.
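
              The bandwidth rules above can be read as the following decision
              sketch. It is an interpretation of the text, not the deviation
              source code: bw_fails is a hypothetical helper, and all values
              are in MB/s except the tolerance, which is a percentage.

```shell
# Sketch of the bandwidth pass/fail rule (an interpretation, not the real tool).
# Usage: bw_fails AVG MEASURED BWTOL_PERCENT BWDELTA_MBS [BWTHRES_MBS]
# Returns 0 (true) when the measured bandwidth would count as a failure.
bw_fails() {
    awk -v avg="$1" -v bw="$2" -v tol="$3" -v delta="$4" -v thres="${5:-}" 'BEGIN {
        if (thres != "") {
            # An absolute threshold replaces the tolerance and delta checks.
            exit ((bw < thres) ? 0 : 1)
        }
        drop = avg - bw                          # degradation below average
        exceeds_tol = (drop > avg * tol / 100)   # beyond the percent tolerance
        exceeds_delta = (drop > delta)           # beyond the absolute delta
        # Both limits must be exceeded for the test to fail.
        exit ((exceeds_tol && exceeds_delta) ? 0 : 1)
    }'
}
```

              With an average of 10000 MB/s, a tolerance of 20, and a delta of
              150, a pair measuring 7900 MB/s fails because it is more than
              2000 MB/s and more than 150 MB/s below average, while a pair
              measuring 8500 MB/s passes. The latency rules mirror this with
              degradation measured above the average.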

              For consistency with OSU benchmarks, MB/s is defined as 1000000
              bytes/s.
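
              As a hedged illustration of FF_DEVIATION_ARGS, the fragment
              below combines several of the parameters listed above. It
              assumes opafastfabric.conf accepts shell-style export
              assignments, and the values are illustrative rather than
              recommended settings.

```shell
# Hypothetical opafastfabric.conf fragment; values are illustrative only.
# -bwtol 20   bandwidth tolerance of 20 percent below average
# -lattol 50  latency tolerance of 50 percent above average
# -c          run test pairs concurrently
# -v          verbose output
export FF_DEVIATION_ARGS="-bwtol 20 -lattol 50 -c -v"
```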

Copyright(C) 2015-2018         Intel Corporation               opahostadmin(8)