condor_gpu_discovery(1)     General Commands Manual    condor_gpu_discovery(1)

Name

       condor_gpu_discovery - Output GPU-related ClassAd attributes

Synopsis

       condor_gpu_discovery -help

       condor_gpu_discovery [<options>]

Description

       condor_gpu_discovery outputs ClassAd attributes corresponding to a
       host's GPU capabilities. It can presently report CUDA and OpenCL
       devices. Which type(s) of device(s) it reports is determined by which
       libraries, if any, it can find when it runs; this reflects what GPU
       jobs will find on that host when they run. (Note that some HTCondor
       configuration settings may cause the environment to differ between
       jobs and the HTCondor daemons in ways that change library discovery.)

       If CUDA_VISIBLE_DEVICES or GPU_DEVICE_ORDINAL is set in the environment
       when condor_gpu_discovery is run, it will report only devices present
       in those lists.

       This tool is not available for macOS platforms.

       With no command line options, the single ClassAd attribute
       DetectedGPUs is printed. If the value is 0, no GPUs were detected. If
       one or more GPUs were detected, the value is a string, presented as a
       comma and space separated list of the GPUs discovered, where each is
       given a name further used as the prefix string in other attribute
       names. Where there is more than one GPU of a particular type, the
       prefix string includes an integer value numbering the device; these
       integer values monotonically increase from 0 (unless otherwise
       specified in the environment; see above). For example, a discovery of
       two GPUs may output:

       DetectedGPUs="CUDA0, CUDA1"

       Further command line options use "CUDA" either with or without one of
       the integer values 0 or 1 as the prefix string in attribute names.
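
       The DetectedGPUs value described above can be split into its prefix
       strings with a few lines of code. The following is a minimal sketch in
       Python and is not part of HTCondor; the function name
       parse_detected_gpus and its tag parameter are hypothetical, and the
       sketch assumes the single-line Name=Value form shown in the example.

```python
def parse_detected_gpus(line, tag="GPUs"):
    """Return the list of prefix strings from a Detected<tag> line.

    Hypothetical helper: assumes one 'Name=Value' line in which the value
    is either 0 or a quoted, comma and space separated list.
    """
    name, sep, value = line.partition("=")
    if not sep or name.strip().lower() != ("Detected" + tag).lower():
        raise ValueError("not a Detected%s line: %r" % (tag, line))
    value = value.strip()
    if value == "0":
        # A value of 0 means that no GPUs were detected.
        return []
    # Drop the surrounding quotes, then split on commas and trim spaces.
    return [item.strip() for item in value.strip('"').split(",") if item.strip()]
```

       With the example line above, this returns the two prefix strings
       CUDA0 and CUDA1; for a value of 0 it returns an empty list.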

Options

       -help

          Print usage information and exit.

       -properties

          In addition to the DetectedGPUs attribute, display some of the
          attributes of the GPUs. Each of these attribute names begins with
          the prefix string. The displayed CUDA attributes are Capability,
          DeviceName, DriverVersion, ECCEnabled, GlobalMemoryMb, and
          RuntimeVersion. The displayed OpenCL attributes are DeviceName,
          ECCEnabled, OpenCLVersion, and GlobalMemoryMb.

       -extra

          Display more attributes of the GPUs. Each of these attribute names
          begins with the prefix string. The additional CUDA attributes are
          ClockMhz, ComputeUnits, and CoresPerCU. The additional OpenCL
          attributes are ClockMhz and ComputeUnits.

       -dynamic

          Display attributes of NVIDIA devices that change values as the GPU
          is working. Each of these attribute names begins with the prefix
          string. These are FanSpeedPct, BoardTempC, DieTempC,
          EccErrorsSingleBit, and EccErrorsDoubleBit.

       -mixed

          When displaying attribute values, assume that the machine has a
          heterogeneous set of GPUs, so always include the integer value in
          the prefix string.

       -device <N>

          Display properties only for GPU device <N>, where <N> is the integer
          value defined for the prefix string. This option may be specified
          more than once; additional <N> are listed along with the first. This
          option adds to the device(s) specified by the environment variables
          CUDA_VISIBLE_DEVICES and GPU_DEVICE_ORDINAL, if any.

       -tag string

          Set the resource tag portion of the intended machine ClassAd
          attribute Detected<ResourceTag> to be string. If this option is not
          specified, the resource tag is "GPUs", resulting in attribute name
          DetectedGPUs.

       -prefix str

          When naming attributes, use str as the prefix string. When this
          option is not specified, the prefix string is either CUDA or OCL.

       -simulate:D,N

          For testing purposes, assume that N devices of type D were detected.
          No discovery software is invoked. If D is 0, it refers to GeForce GT
          330, and the default value for N is 1. If D is 1, it refers to
          GeForce GTX 480, and the default value for N is 2.

       -opencl

          Prefer detection via OpenCL rather than CUDA. Without this option,
          CUDA detection software is invoked first, and no further OpenCL
          software is invoked if CUDA devices are detected.

       -cuda

          Do only CUDA detection.

       -nvcuda

          For Windows platforms only, use a CUDA driver rather than the CUDA
          runtime.

       -config

          Output in the syntax of HTCondor configuration, instead of ClassAd
          language. An additional attribute, NUM_DETECTED_GPUs, is produced;
          it is set to the number of GPUs detected.

       -cron

          This option suppresses the DetectedGPUs attribute so that the output
          is suitable for use with condor_startd cron. Combine this option
          with the -dynamic option to periodically refresh the dynamic GPU
          information such as temperature. For example, to refresh GPU
          temperatures every 5 minutes:

          use FEATURE : StartdCronPeriodic(DYNGPUS, 5*60, $(LIBEXEC)/condor_gpu_discovery, -dynamic -cron)

       -verbose

          For interactive use of the tool, output extra information to show
          detection while in progress.

       -diagnostic

          Show diagnostic information, to aid in tool development.
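
       Because -properties, -extra, and -dynamic all name their attributes
       with the prefix string, output can be grouped per device by splitting
       each attribute name at that prefix. The following Python sketch
       illustrates the idea and is not part of HTCondor. The function name
       group_by_prefix is hypothetical, the values in SAMPLE are made up for
       illustration, and the sketch assumes one Name=Value pair per line as
       in the DetectedGPUs example.

```python
import re

# Hypothetical sample; the attribute names follow the prefix convention
# described above, but the values are invented for illustration.
SAMPLE = '''DetectedGPUs="CUDA0, CUDA1"
CUDA0DeviceName="GeForce GTX 480"
CUDA0GlobalMemoryMb=1536
CUDA1DeviceName="GeForce GT 330"
CUDA1GlobalMemoryMb=1024
'''

def group_by_prefix(output, prefixes=("CUDA", "OCL")):
    """Group Name=Value lines by the prefix string that starts each name."""
    # Prefix, optional device number, then the rest of the attribute name.
    alternatives = "|".join(re.escape(p) for p in prefixes)
    pattern = re.compile(r"^(%s)(\d*)(\w+)$" % alternatives)
    groups = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a Name=Value line
        match = pattern.match(name.strip())
        if match is None:
            continue  # no known prefix, for example DetectedGPUs
        device = match.group(1) + match.group(2)
        groups.setdefault(device, {})[match.group(3)] = value.strip().strip('"')
    return groups
```

       Calling group_by_prefix(SAMPLE) yields one dictionary per device,
       keyed by CUDA0 and CUDA1. If -prefix is used, the custom prefix would
       need to be passed in the prefixes argument.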

Exit Status

       condor_gpu_discovery will exit with a status value of 0 (zero) upon
       success, and it will exit with the value 1 (one) upon failure.
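
       A calling script can rely on this exit status to decide whether the
       output is usable. The following Python sketch shows one way to do so;
       the function name run_discovery is hypothetical, and the sketch
       assumes that the command given in argv can be found on the PATH.

```python
import subprocess

def run_discovery(argv):
    """Run a command and return its standard output on exit status 0.

    Hypothetical wrapper: raises RuntimeError on a non-zero exit status.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            "%s exited with status %d: %s"
            % (argv[0], result.returncode, result.stderr.strip())
        )
    return result.stdout

# Example call, assuming condor_gpu_discovery is installed and on the PATH:
#   output = run_discovery(["condor_gpu_discovery", "-properties"])
```

       Raising an exception on a non-zero status keeps the failure case from
       being mistaken for a host with no GPUs, which is reported through a
       DetectedGPUs value of 0.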

Author

       Center for High Throughput Computing, University of Wisconsin-Madison

       Copyright (C) 1990-2019 Center for High Throughput Computing, Computer
       Sciences Department, University of Wisconsin-Madison, Madison, WI. All
       Rights Reserved. Licensed under the Apache License, Version 2.0.

                                     date              condor_gpu_discovery(1)