condor_gpu_discovery(1)     General Commands Manual     condor_gpu_discovery(1)

condor_gpu_discovery - Output GPU-related ClassAd attributes

condor_gpu_discovery -help

condor_gpu_discovery [<options>]

condor_gpu_discovery outputs ClassAd attributes corresponding to a
host's GPU capabilities. It can presently report CUDA and OpenCL
devices. Which types of devices it reports is determined by which
libraries, if any, it can find when it runs; this reflects what GPU
jobs will find on that host when they run. (Note that some HTCondor
configuration settings may cause the environment to differ between jobs
and the HTCondor daemons in ways that change library discovery.)

If CUDA_VISIBLE_DEVICES or GPU_DEVICE_ORDINAL is set in the environment
when condor_gpu_discovery is run, it will report only devices present
in those lists.

This tool is not available for macOS platforms.

With no command line options, the single ClassAd attribute DetectedGPUs
is printed. If the value is 0, no GPUs were detected. If one or more
GPUs were detected, the value is a string, presented as a comma and
space separated list of the GPUs discovered, where each is given a name
further used as the prefix string in other attribute names. Where there
is more than one GPU of a particular type, the prefix string includes
an integer value numbering the device; these integer values
monotonically increase from 0 (unless otherwise specified in the
environment; see above). For example, a discovery of two GPUs may
output

    DetectedGPUs="CUDA0, CUDA1"

Further command line options use "CUDA" either with or without one of
the integer values 0 or 1 as the prefix string in attribute names.

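As an illustration only, the value format described above can be split
into its prefix strings with a few lines of Python. This is a minimal
sketch and not part of the tool: the function name parse_detected_gpus
is hypothetical, and it assumes the line has the form shown in the
example above, or the value 0 when no GPUs were detected.

```python
def parse_detected_gpus(line):
    """Return the list of prefix strings from a DetectedGPUs line.

    Hypothetical helper: assumes 'Name=value' where value is either 0
    or a quoted, comma and space separated list such as "CUDA0, CUDA1".
    """
    _name, _sep, value = line.partition("=")
    value = value.strip()
    if value == "0":
        # A value of 0 means no GPUs were detected.
        return []
    # Drop the surrounding quotes, then split the list on commas.
    return [item.strip() for item in value.strip('"').split(",") if item.strip()]


devices = parse_detected_gpus('DetectedGPUs="CUDA0, CUDA1"')
print(devices)
```

Each returned name is the prefix string that begins the per-device
attribute names printed by the options described below.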
-help

Print usage information and exit.

-properties

In addition to the DetectedGPUs attribute, display some of the
attributes of the GPUs. Each of these attributes will have a prefix
string at the beginning of its name. The displayed CUDA attributes
are Capability, DeviceName, DriverVersion, ECCEnabled,
GlobalMemoryMb, and RuntimeVersion. The displayed OpenCL attributes
are DeviceName, ECCEnabled, OpenCLVersion, and GlobalMemoryMb.

-extra

Display more attributes of the GPUs. Each of these attribute names
will have a prefix string at the beginning of its name. The
additional CUDA attributes are ClockMhz, ComputeUnits, and
CoresPerCU. The additional OpenCL attributes are ClockMhz and
ComputeUnits.

-dynamic

Display attributes of NVIDIA devices that change values as the GPU
is working. Each of these attribute names will have a prefix
string at the beginning of its name. These are FanSpeedPct,
BoardTempC, DieTempC, EccErrorsSingleBit, and EccErrorsDoubleBit.

-mixed

When displaying attribute values, assume that the machine has a
heterogeneous set of GPUs, so always include the integer value in the
prefix string.

-device <N>

Display properties only for GPU device <N>, where <N> is the integer
value defined for the prefix string. This option may be specified
more than once; additional <N> are listed along with the first. This
option adds to the device(s) specified by the environment variables
CUDA_VISIBLE_DEVICES and GPU_DEVICE_ORDINAL, if any.

-tag string

Set the resource tag portion of the intended machine ClassAd
attribute Detected<ResourceTag> to be string. If this option is not
specified, the resource tag is "GPUs", resulting in attribute name
DetectedGPUs.

-prefix str

When naming attributes, use str as the prefix string. When this
option is not specified, the prefix string is either CUDA or OCL.

-simulate:D,N

For testing purposes, assume that N devices of type D were detected.
No discovery software is invoked. If D is 0, it refers to GeForce GT
330, and the default value for N is 1. If D is 1, it refers to
GeForce GTX 480, and the default value for N is 2.

-opencl

Prefer detection via OpenCL rather than CUDA. Without this option,
CUDA detection software is invoked first, and no further OpenCL
software is invoked if CUDA devices are detected.

-cuda

Do only CUDA detection.

-nvcuda

For Windows platforms only, use a CUDA driver rather than the CUDA
runtime.

-config

Output in the syntax of HTCondor configuration, instead of ClassAd
language. An additional attribute, NUM_DETECTED_GPUs, is produced; it
is set to the number of GPUs detected.

-cron

This option suppresses the DetectedGPUs attribute so that the output
is suitable for use with condor_startd cron. Combine this option with
the -dynamic option to periodically refresh the dynamic GPU
information such as temperature. For example, to refresh GPU
temperatures every 5 minutes:

    use FEATURE : StartdCronPeriodic(DYNGPUS, 5*60, $(LIBEXEC)/condor_gpu_discovery, -dynamic -cron)

-verbose

For interactive use of the tool, output extra information to show
detection while in progress.

-diagnostic

Show diagnostic information, to aid in tool development.

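The prefix string naming used by -properties, -extra, -dynamic, and
-mixed above can be illustrated with a short Python sketch that groups
attributes by device. This is an assumption-laden example, not part of
the tool: the helper name group_by_prefix is hypothetical, the sample
lines and their values are made up for illustration, and the exact
spacing around the equals sign in real output may differ.

```python
import re


def group_by_prefix(lines, prefixes=("CUDA", "OCL")):
    """Group 'PrefixNAttr=value' lines into {prefix string: {attr: value}}.

    Hypothetical helper: the default prefixes are the documented CUDA
    and OCL; pass the -prefix value if one was used.
    """
    alternatives = "|".join(re.escape(p) for p in prefixes)
    # prefix, optional device integer, attribute name, value
    pattern = re.compile(r"^(%s)(\d*)(\w+)\s*=\s*(.*)$" % alternatives)
    grouped = {}
    for line in lines:
        match = pattern.match(line.strip())
        if not match:
            # Lines such as DetectedGPUs=... carry no prefix string.
            continue
        prefix, index, attr, value = match.groups()
        grouped.setdefault(prefix + index, {})[attr] = value.strip().strip('"')
    return grouped


# Made-up sample lines in the documented naming scheme.
sample = [
    'DetectedGPUs="CUDA0, CUDA1"',
    'CUDA0DeviceName="GeForce GTX 480"',
    'CUDA0GlobalMemoryMb=1536',
    'CUDA1DeviceName="GeForce GTX 480"',
]
for device, attrs in group_by_prefix(sample).items():
    print(device, attrs)
```

Values are kept as strings; converting them to numbers or booleans is
left to the caller.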
condor_gpu_discovery will exit with a status value of 0 (zero) upon
success, and it will exit with the value 1 (one) upon failure.

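A calling script can rely on this exit status to distinguish success
from failure. The following Python sketch shows one possible wrapper;
the function name run_discovery is hypothetical, and the demonstration
uses a stand-in command rather than the real tool, whose location
depends on the installation.

```python
import subprocess
import sys


def run_discovery(argv):
    """Run a discovery command and return its output lines.

    Hypothetical wrapper: raises RuntimeError on a nonzero exit status.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError("discovery failed with status %d" % result.returncode)
    return result.stdout.splitlines()


# Stand-in for the real tool so the sketch runs anywhere. In practice
# argv would name condor_gpu_discovery followed by its options.
fake_tool = [sys.executable, "-c", "print('DetectedGPUs=0')"]
print(run_discovery(fake_tool))
```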
Center for High Throughput Computing, University of Wisconsin–Madison

Copyright © 1990-2019 Center for High Throughput Computing, Computer
Sciences Department, University of Wisconsin-Madison, Madison, WI. All
Rights Reserved. Licensed under the Apache License, Version 2.0.