ethfindgood(8)              EFSFFCLIRG (Man Page)              ethfindgood(8)

NAME

ethfindgood

Checks which hosts can be pinged, can be accessed via SSH, and are
active on the Intel(R) Ethernet Fabric. Produces a list of good hosts
that meet all criteria. Typically used to identify good hosts for
further testing and benchmarking during initial cluster staging and
startup.

The resulting good file lists each good host exactly once and can be
used as input to create mpi_hosts files for running mpi_apps and the
NIC-SW cable test. The files alive, running, active, good, and bad are
created in the selected directory, listing the hosts that meet each
criterion. If a plane name is provided, each filename takes the form
<name>_<plane>, for example good_plane1.

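The naming rule above can be sketched in shell. Both helpers below are hypothetical and are not part of ethfindgood. The first only applies the stated <name>_<plane> rule. The second assumes the good file holds one host per line and that blank lines and lines starting with # can be ignored, which the text above does not state.

```shell
#!/bin/sh
# Hypothetical helper (not part of ethfindgood): print the five result
# file paths for a directory and an optional plane name, following the
# naming rule described above (for example, good_plane1).
result_files() {
    dir=$1
    plane=${2:-}
    for name in alive running active good bad; do
        if [ -n "$plane" ]; then
            printf '%s/%s_%s\n' "$dir" "$name" "$plane"
        else
            printf '%s/%s\n' "$dir" "$name"
        fi
    done
}

# Hypothetical helper: count the hosts in a result file. Assumes one
# host per line; skipping blank and '#' lines is an assumption.
count_hosts() {
    grep -c -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}
```

For example, result_files /etc/eth-tools plane1 prints five paths, one of which is /etc/eth-tools/good_plane1.
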
This command automatically generates the file
FF_RESULT_DIR/punchlist.csv, which provides a concise summary of the
bad hosts found. The file can be imported into Excel directly as a
*.csv file. Alternatively, it can be cut and pasted into Excel, and the
Data/Text to Columns toolbar can then be used to separate the
information into multiple columns at the semicolons.

A sample generated output is:

    # ethfindgood
    3 hosts will be checked
    2 hosts are pingable (alive)
    2 hosts are ssh'able (running)
    2 total hosts have RDMA active on one or more fabrics (active)
    1 hosts are alive, running, active (good)
    2 hosts are bad (bad)
    Bad hosts have been added to /root/punchlist.csv

    # cat /root/punchlist.csv
    2015/10/09 14:36:48;phs1fnivd13u07n4;Doesn't ping
    2015/10/09 14:36:48;phs1fnivd13u07n4;Can't ssh
    2015/10/09 14:36:48;phs1fnivd13u07n3;No active RDMA port

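Because punchlist.csv is semicolon-delimited, it can also be summarized from the shell rather than in Excel. The following is a minimal sketch, assuming every record has the three fields seen in the sample (timestamp;host;issue). The function names are hypothetical and are not part of the tool.

```shell
#!/bin/sh
# Hypothetical helper: list each bad host once from a punchlist file
# whose records look like timestamp;host;issue.
punchlist_hosts() {
    awk -F';' 'NF >= 3 { print $2 }' "$1" | sort -u
}

# Hypothetical helper: print "count;issue" for each distinct issue
# text, in byte-order sorted output so results are stable.
punchlist_issue_counts() {
    awk -F';' 'NF >= 3 { n[$3]++ } END { for (k in n) print n[k] ";" k }' "$1" |
        LC_ALL=C sort
}
```

A typical call would be punchlist_hosts /root/punchlist.csv, using the path shown in the sample output.
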
For a given run, a line is generated for each failing host. Hosts are
reported exactly once for a given run. Therefore, a host that does not
ping is not also listed as Can't ssh or No active RDMA port. There may
be cases where ports could be active for hosts that do not ping.
However, the lack of ping often implies there are other fundamental
issues, such as PXE boot failures or an inability to access DNS or DHCP
to get the proper host name and IP address. Therefore, reporting
further failures for hosts that do not ping is typically of limited
value.

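The report-once rule described above amounts to stopping at the first failing check. The sketch below illustrates that rule only and is not the tool's actual implementation. The function name and the "yes"/"no" argument encoding are hypothetical, and the issue strings are copied from the sample punchlist.

```shell
#!/bin/sh
# Sketch of the reporting rule: print only the first failing check.
# Arguments are the results of the ping, SSH, and active checks, each
# given as "yes" or "no" (a hypothetical encoding for illustration).
first_failure() {
    alive=$1
    running=$2
    active=$3
    if [ "$alive" != yes ]; then
        echo "Doesn't ping"
    elif [ "$running" != yes ]; then
        echo "Can't ssh"
    elif [ "$active" != yes ]; then
        echo "No active RDMA port"
    else
        echo "good"
    fi
}
```

With this ordering, a host that fails every check is reported only as failing to ping.
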
SYNOPSIS

ethfindgood [-R|-A] [-d dir] [-p plane] [-f hostfile] [-h 'hosts']
            [-T timelimit]

OPTIONS

--help
    Produces full help text.

-R
    Skips the running test (SSH). Recommended if password-less SSH is
    not set up.

-A
    Skips the active test. Recommended if the Intel(R) Ethernet Fabric
    Suite software or the fabric is not up.

-p plane
    Specifies the name of the plane to use.

-d dir
    Specifies the directory in which to create the alive, active,
    running, good, and bad files. Default is the /etc/eth-tools
    directory.

-f hostfile
    Specifies the file listing the hosts in the cluster. Default is
    /etc/eth-tools/hosts.

-h hosts
    Specifies the list of hosts to ping.

-T timelimit
    Specifies the time limit, in seconds, for a host to respond to
    SSH. Default is 20 seconds.

ENVIRONMENT

The following environment variables are also used by this command:

HOSTS
    List of hosts, used if the -h option is not supplied.

HOSTS_FILE
    File containing the list of hosts, used in the absence of -f and
    -h.

FF_MAX_PARALLEL
    Maximum number of concurrent operations.

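One literal reading of the descriptions above gives the following order for choosing the host list: -h, then HOSTS, then -f, then HOSTS_FILE, then the default host file. The sketch below encodes that reading. The relative order of HOSTS and -f is an assumption, since the text does not say which wins when both are present, and the function name and the list:/file: output prefixes are hypothetical.

```shell
#!/bin/sh
# Sketch of one possible host-list resolution order. Not taken from
# the ethfindgood source; the HOSTS-before--f ordering is an assumption.
#   $1: value given to -h (empty if -h was not supplied)
#   $2: value given to -f (empty if -f was not supplied)
resolve_hosts() {
    opt_h=$1
    opt_f=$2
    if [ -n "$opt_h" ]; then
        printf 'list:%s\n' "$opt_h"
    elif [ -n "${HOSTS:-}" ]; then
        printf 'list:%s\n' "$HOSTS"
    elif [ -n "$opt_f" ]; then
        printf 'file:%s\n' "$opt_f"
    elif [ -n "${HOSTS_FILE:-}" ]; then
        printf 'file:%s\n' "$HOSTS_FILE"
    else
        # Default host file named in the -f description.
        printf 'file:%s\n' /etc/eth-tools/hosts
    fi
}
```

The output prefix shows whether the result is an explicit host list or a path to a host file.
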
EXAMPLES

    ethfindgood
    ethfindgood -f allhosts
    ethfindgood -h 'arwen elrond'
    HOSTS='arwen elrond' ethfindgood
    HOSTS_FILE=allhosts ethfindgood

Copyright(C) 2020-2022 Intel Corporation                    ethfindgood(8)