opafindgood(8)         Master map: IFSFFCLIRG (Man Page)        opafindgood(8)

NAME

       opafindgood

       Checks for hosts that can be pinged, can be accessed via SSH, and are
       active on the Intel(R) Omni-Path Fabric, and produces a list of good
       hosts that meet all of these criteria. Typically used during initial
       cluster staging and startup to identify good hosts for further testing
       and benchmarking.

       The resulting good file lists each good host exactly once and can be
       used as input to create mpi_hosts files for running mpi_apps and the
       HFI-SW cable test. The files alive, running, active, good, and bad are
       created in the selected directory. The alive, running, and active files
       list the hosts that passed the corresponding test, good lists the hosts
       that passed all of them, and bad lists the remaining hosts.

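       The relationship between these files can be illustrated with a short
       shell sketch. It is not opafindgood's implementation: it assumes each
       file holds one host name per line, ignores the quarantine check and any
       test skipped with -R or -A, and derive_good_bad is a hypothetical helper
       name.

```shell
#!/bin/sh
# Hypothetical helper illustrating how the result files relate:
#   good = hosts listed in alive AND running AND active
#   bad  = hosts from the checked host list that are not in good
# Assumes one host name per line; ignores the quarantine check and any
# test skipped with -R or -A. This is not opafindgood's own code.
derive_good_bad() {
    dir=$1        # directory holding alive, running, and active
    hostfile=$2   # the list of hosts that were checked
    # grep -F: fixed strings, -x: whole-line match, -f: patterns from a file.
    grep -Fx -f "$dir/running" "$dir/alive" |
        grep -Fx -f "$dir/active" | sort -u > "$dir/good"
    # -v inverts the match: keep checked hosts that are absent from good.
    grep -Fvx -f "$dir/good" "$hostfile" | sort -u > "$dir/bad"
}
```

       grep -Fx is used instead of comm so that the input files do not need to
       be sorted first. Running the sketch against a real results directory
       would overwrite the good and bad files that opafindgood produced.
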
       This command assumes that the Node Description for each host is based
       on the output of hostname -s, optionally followed by an hfi1_# suffix.
       If the /etc/opa/hosts file lists host names that differ from the
       hostname -s output, this assumption may not hold.

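       This naming assumption can be checked with a minimal sketch that strips
       the suffix from a Node Description. It assumes the suffix is separated
       from the host name by a single space, as in the Node Description
       "phs1fnivd13u07n1 hfi1_0" that appears in the sample punch list;
       nodedesc_to_host is a hypothetical helper, not part of the FastFabric
       tools.

```shell
#!/bin/sh
# Hypothetical helper: reduce a Node Description such as
# "phs1fnivd13u07n1 hfi1_0" to the short host name it is assumed to be
# based on. Descriptions without the " hfi1_<N>" suffix are printed unchanged.
nodedesc_to_host() {
    # Remove a trailing space + "hfi1_" + one or more digits, if present.
    printf '%s\n' "$1" | sed 's/ hfi1_[0-9][0-9]*$//'
}
```

       The result can then be compared with the output of hostname -s on the
       host itself; a mismatch suggests that the assumption above does not
       hold for that host.
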
       This command automatically generates the file
       FF_RESULT_DIR/punchlist.csv, which provides a concise summary of the
       bad hosts found. The file can be imported into Excel directly as a
       *.csv file. Alternatively, its contents can be cut and pasted into
       Excel, and the Data/Text to Columns toolbar can be used to split the
       information into columns at the semicolons.

       A sample of the generated output is:

       # opafindgood
       3 hosts will be checked
       2 hosts are pingable (alive)
       2 hosts are ssh'able (running)
       2 total hosts have FIs active on one or more fabrics (active)
       No Quarantine Node Records Returned
       1 hosts are alive, running, active (good)
       2 hosts are bad (bad)
       Bad hosts have been added to /root/punchlist.csv
       # cat /root/punchlist.csv
       2015/10/04 11:33:22;phs1fnivd13u07n1 hfi1_0 p1 phs1swivd13u06  p16;Link errors
       2015/10/07 10:21:05;phs1swivd13u06;Switch not found in SA DB
       2015/10/09 14:36:48;phs1fnivd13u07n4;Doesn't ping
       2015/10/09 14:36:48;phs1fnivd13u07n3;No active port

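       Because each punch list entry is a semicolon-separated record, the file
       can also be summarized from the shell. This minimal sketch assumes the
       three-field layout seen in the sample (date and time; component;
       reason); punchlist_reasons is a hypothetical helper that counts entries
       per failure reason.

```shell
#!/bin/sh
# Hypothetical helper: count punch list entries per failure reason.
# Assumes each line is "date time;component;reason" with ';' separators,
# as in the sample output. Prints "count;reason" lines, most frequent
# first, ties ordered by reason.
punchlist_reasons() {
    awk -F';' 'NF >= 3 { count[$3]++ } END { for (r in count) print count[r] ";" r }' "$1" |
        LC_ALL=C sort -t ';' -k1,1nr -k2
}
```

       For example, punchlist_reasons /root/punchlist.csv summarizes the file
       from the sample output. Since the sample file holds entries from
       several dates, the counts may span more than one run.
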
       For a given run, one line is generated for each failing host, and each
       host is reported exactly once. Therefore, a host that does not ping is
       NOT also listed as "can't ssh" or "No active port". There may be cases
       where ports are active for hosts that do not ping, especially if
       Ethernet host names are used for the ping test. However, the lack of
       ping often implies other fundamental issues, such as PXE boot problems
       or an inability to access DNS or DHCP to obtain the proper host name
       and IP address. Therefore, reporting further failures for hosts that do
       not ping is typically of limited value.

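       The one-line-per-host rule can be sketched as a first-failure check.
       This assumes the checks are evaluated in the order ping, SSH, active
       port, and that the alive, running, and active files hold one host per
       line. first_failure is a hypothetical helper, and "Can't ssh" is
       placeholder wording, since the sample output shows only "Doesn't ping"
       and "No active port".

```shell
#!/bin/sh
# Hypothetical helper illustrating the "report each host once" rule.
# Assumes DIR contains alive, running, and active files with one host
# name per line, and that checks are evaluated ping -> SSH -> active.
first_failure() {
    host=$1
    dir=$2
    # -F fixed string, -x whole-line match, -q quiet;
    # -e protects host names that begin with '-'.
    if ! grep -Fqx -e "$host" "$dir/alive"; then
        echo "Doesn't ping"
    elif ! grep -Fqx -e "$host" "$dir/running"; then
        echo "Can't ssh"            # placeholder wording (assumption)
    elif ! grep -Fqx -e "$host" "$dir/active"; then
        echo "No active port"
    fi
    # No output means the host passed all three checks.
}
```

       With this ordering, a host whose port is active but that does not ping
       yields only "Doesn't ping".
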
       Note that opafindgood queries the SA for NodeDescriptions to determine
       which hosts have active ports. As a result, ports may be reported as
       active for hosts that cannot be pinged or accessed via SSH.

       By default, opafindgood checks for and reports nodes that are
       quarantined for security reasons. To skip this check, use the -Q
       option.


Syntax

       opafindgood [-R|-A|-Q] [-d dir] [-f hostfile] [-h 'hosts']
           [-t portsfile] [-p ports] [-T timelimit]


Options

       --help    Produces full help text.

       -R        Skips the running test (SSH). Recommended if password-less
                 SSH is not set up.

       -A        Skips the active test. Recommended if the Intel(R) Omni-Path
                 Fabric software or fabric is not up.

       -Q        Skips the quarantine test. Recommended if the Intel(R)
                 Omni-Path Fabric software or fabric is not up.

       -d dir    Specifies the directory in which to create the alive, active,
                 running, good, and bad files. Default is the /etc/opa
                 directory.

       -f hostfile
                 Specifies the file with the hosts in the cluster. Default is
                 the /etc/opa/hosts file.

       -h hosts  Specifies the list of hosts to ping.

       -t portsfile
                 Specifies the file with the list of local HFI ports used to
                 access the fabric(s) for analysis. Default is the
                 /etc/opa/ports file.

       -p ports  Specifies the list of local HFI ports used to access the
                 fabric(s) for analysis.

                 Default is the first active port. The first HFI in the
                 system is 1. The first port on an HFI is 1. Uses the format
                 hfi:port, for example:

                 0:0       First active port in system.
                 0:y       Port y within system.
                 x:0       First active port on HFI x.
                 x:y       HFI x, port y.

       -T timelimit
                 Specifies the time limit in seconds for a host to respond to
                 SSH. Default is 20 seconds.

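       The hfi:port format used by -p can be illustrated with a small decoder
       that follows the table of values given for that option. This is a
       minimal sketch: describe_port_spec is a hypothetical helper that only
       interprets the string and does not check which HFIs or ports exist or
       are active.

```shell
#!/bin/sh
# Hypothetical helper: explain one hfi:port value as described for -p.
# HFIs and ports are numbered from 1; 0 means "first active".
# Only the string is interpreted; no hardware is queried.
describe_port_spec() {
    case $1 in
        *:*) ;;
        *) echo "invalid port spec '$1': expected hfi:port" >&2; return 1 ;;
    esac
    hfi=${1%%:*}    # text before the first colon
    port=${1#*:}    # text after the first colon
    # Both parts must be non-empty and contain only digits.
    for part in "$hfi" "$port"; do
        case $part in
            ''|*[!0-9]*) echo "invalid port spec '$1': expected hfi:port" >&2; return 1 ;;
        esac
    done
    if [ "$hfi" -eq 0 ] && [ "$port" -eq 0 ]; then
        echo "First active port in system"
    elif [ "$hfi" -eq 0 ]; then
        echo "Port $port within system"
    elif [ "$port" -eq 0 ]; then
        echo "First active port on HFI $hfi"
    else
        echo "HFI $hfi, port $port"
    fi
}
```

       For example, describe_port_spec 2:1 prints "HFI 2, port 1", and each
       value in a list such as '1:1 1:2 2:1 2:2' can be checked the same way
       in a loop.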

Environment Variables

       The following environment variables are also used by this command:

       HOSTS     List of hosts; used if the -h option is not supplied.

       HOSTS_FILE
                 File containing the list of hosts; used in the absence of -f
                 and -h.

       PORTS     List of ports; used in the absence of -t and -p.

       PORTS_FILE
                 File containing the list of ports; used in the absence of -t
                 and -p.

       FF_MAX_PARALLEL
                 Maximum number of concurrent operations.


Examples

       opafindgood
       opafindgood -f allhosts
       opafindgood -h 'arwen elrond'
       HOSTS='arwen elrond' opafindgood
       HOSTS_FILE=allhosts opafindgood
       opafindgood -p '1:1 1:2 2:1 2:2'

Copyright(C) 2015-2018         Intel Corporation                opafindgood(8)