CONDOR_GPU_DISCOVERY(1) HTCondor Manual CONDOR_GPU_DISCOVERY(1)

NAME
condor_gpu_discovery - HTCondor Manual

Output GPU-related ClassAd attributes

SYNOPSIS
condor_gpu_discovery -help

condor_gpu_discovery [<options>]

DESCRIPTION
condor_gpu_discovery outputs ClassAd attributes corresponding to a
host's GPU capabilities. It can presently report CUDA and OpenCL
devices. Which types of devices it reports is determined by which
libraries, if any, it can find when it runs; this reflects what GPU jobs
will find on that host when they run. (Note that some HTCondor
configuration settings may cause the environment to differ between jobs
and the HTCondor daemons in ways that change library discovery.)

If CUDA_VISIBLE_DEVICES or GPU_DEVICE_ORDINAL is set in the environment
when condor_gpu_discovery is run, it will report only devices present
in those lists.
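
The filtering described above can be pictured with a short Python sketch. This is only an illustration, not the tool's implementation: the function name visible_devices is hypothetical, the variable is assumed to hold a comma-separated list of integer device indices, and checking CUDA_VISIBLE_DEVICES before GPU_DEVICE_ORDINAL is an assumption made for the example.

```python
def visible_devices(all_devices, env):
    """Return the device indices permitted by the environment.

    Assumptions (not taken from the HTCondor sources): the variable, if
    set, is a comma-separated list of integer indices such as "0,2";
    entries that are not plain integers are skipped; the first variable
    found wins.
    """
    for var in ("CUDA_VISIBLE_DEVICES", "GPU_DEVICE_ORDINAL"):
        value = env.get(var)
        if value is None:
            continue
        wanted = [int(tok) for tok in value.split(",") if tok.strip().isdigit()]
        # Keep only the requested indices that actually exist on the host.
        return [d for d in wanted if d in all_devices]
    # Neither variable is set: every device is reported.
    return list(all_devices)


print(visible_devices([0, 1, 2, 3], {"CUDA_VISIBLE_DEVICES": "0,2"}))  # [0, 2]
print(visible_devices([0, 1], {}))                                     # [0, 1]
```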

This tool is not available for macOS platforms.

With no command line options, the single ClassAd attribute DetectedGPUs
is printed. If the value is 0, no GPUs were detected. If one or more
GPUs were detected, the value is a string, presented as a comma and
space separated list of the GPUs discovered, where each is given a name
that is further used as the prefix string in other attribute names.
Where there is more than one GPU of a particular type, the prefix string
includes an integer value numbering the device; these integer values
monotonically increase from 0 (unless otherwise specified in the
environment; see above). For example, a discovery of two GPUs may output

DetectedGPUs="CUDA0, CUDA1"

In this example, further command line options use "CUDA", either with or
without one of the integer values 0 or 1, as the prefix string in
attribute names.
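
A script that consumes this output might split the value into individual prefix strings. The following Python sketch shows one way to do so, assuming the output is a single line of the form shown above; the function name parse_detected_gpus is hypothetical and not part of HTCondor.

```python
def parse_detected_gpus(line):
    """Parse a DetectedGPUs line into a list of prefix strings.

    A value of 0 means that no GPUs were detected and yields an empty list.
    """
    _, _, value = line.partition("=")
    value = value.strip()
    if value == "0":
        return []
    # Drop the surrounding double quotes, then split the comma and
    # space separated list into its items.
    return [item.strip() for item in value.strip('"').split(",") if item.strip()]


print(parse_detected_gpus('DetectedGPUs="CUDA0, CUDA1"'))  # ['CUDA0', 'CUDA1']
print(parse_detected_gpus("DetectedGPUs=0"))               # []
```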

OPTIONS
-help
    Print usage information and exit.

-properties
    In addition to the DetectedGPUs attribute, display some of the
    attributes of the GPUs. Each of these attribute names begins with
    the prefix string. The displayed CUDA attributes are Capability,
    DeviceName, DriverVersion, ECCEnabled, GlobalMemoryMb, and
    RuntimeVersion. The displayed OpenCL attributes are DeviceName,
    ECCEnabled, OpenCLVersion, and GlobalMemoryMb.

-extra
    Display more attributes of the GPUs. Each of these attribute names
    begins with the prefix string. The additional CUDA attributes are
    ClockMhz, ComputeUnits, and CoresPerCU. The additional OpenCL
    attributes are ClockMhz and ComputeUnits.

-dynamic
    Display attributes of NVIDIA devices that change values as the GPU
    is working. Each of these attribute names begins with the prefix
    string. The attributes are FanSpeedPct, BoardTempC, DieTempC,
    EccErrorsSingleBit, and EccErrorsDoubleBit.

-mixed
    When displaying attribute values, assume that the machine has a
    heterogeneous set of GPUs, so always include the integer value in
    the prefix string.
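
To make the naming scheme concrete, the Python sketch below composes attribute names from a prefix string, an optional device number, and the CUDA attribute names listed under -properties. It is an illustration of the description above rather than the tool's own code; the function name attribute_names is hypothetical.

```python
# CUDA attribute names listed under the -properties option.
CUDA_PROPERTIES = [
    "Capability", "DeviceName", "DriverVersion",
    "ECCEnabled", "GlobalMemoryMb", "RuntimeVersion",
]


def attribute_names(prefix, attributes, index=None):
    """Build attribute names by placing the prefix string first.

    When index is given (as the -mixed option always requires), the
    integer value becomes part of the prefix string.
    """
    full_prefix = prefix if index is None else f"{prefix}{index}"
    return [full_prefix + attr for attr in attributes]


print(attribute_names("CUDA", ["DeviceName"]))     # ['CUDADeviceName']
print(attribute_names("CUDA", ["DeviceName"], 1))  # ['CUDA1DeviceName']
```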

-device <N>
    Display properties only for GPU device <N>, where <N> is the integer
    value defined for the prefix string. This option may be specified
    more than once; additional <N> are listed along with the first. This
    option adds to the device(s) specified by the environment variables
    CUDA_VISIBLE_DEVICES and GPU_DEVICE_ORDINAL, if any.
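
Because -device may be repeated, a wrapper script can build the argument list in a loop. The Python sketch below only constructs the list and does not run the tool; the function name discovery_argv is hypothetical, and only options documented on this page are used.

```python
def discovery_argv(devices, properties=True):
    """Build an argument list that repeats -device once per device number."""
    argv = ["condor_gpu_discovery"]
    if properties:
        argv.append("-properties")
    for n in devices:
        # Each device gets its own "-device <N>" pair.
        argv += ["-device", str(n)]
    return argv


print(discovery_argv([0, 1]))
# ['condor_gpu_discovery', '-properties', '-device', '0', '-device', '1']
```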

-tag string
    Set the resource tag portion of the intended machine ClassAd
    attribute Detected<ResourceTag> to be string. If this option is not
    specified, the resource tag is "GPUs", resulting in the attribute
    name DetectedGPUs.

-prefix str
    When naming attributes, use str as the prefix string. When this
    option is not specified, the prefix string is either CUDA or OCL.

-simulate:D,N
    For testing purposes, assume that N devices of type D were detected.
    No discovery software is invoked. If D is 0, it refers to GeForce GT
    330, and the default value for N is 1. If D is 1, it refers to
    GeForce GTX 480, and the default value for N is 2.
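
The simulated device types and their default counts can be summarized in a small table. The Python sketch below formats a -simulate option string from that table, always writing N explicitly. The function name simulate_option is hypothetical, and how the tool itself accepts an omitted N is not covered here.

```python
# Device types and default counts as described for -simulate:D,N.
SIMULATED_TYPES = {
    0: ("GeForce GT 330", 1),
    1: ("GeForce GTX 480", 2),
}


def simulate_option(device_type, count=None):
    """Format a -simulate option, filling in the documented default for N."""
    if device_type not in SIMULATED_TYPES:
        raise ValueError(f"unknown simulated device type: {device_type}")
    _, default_count = SIMULATED_TYPES[device_type]
    n = default_count if count is None else count
    return f"-simulate:{device_type},{n}"


print(simulate_option(0))     # -simulate:0,1
print(simulate_option(1, 4))  # -simulate:1,4
```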

-opencl
    Prefer detection via OpenCL rather than CUDA. Without this option,
    CUDA detection software is invoked first, and no further OpenCL
    software is invoked if CUDA devices are detected.

-cuda
    Do only CUDA detection.

-nvcuda
    For Windows platforms only, use a CUDA driver rather than the CUDA
    run time.

-config
    Output in the syntax of HTCondor configuration, instead of ClassAd
    language. An additional attribute, NUM_DETECTED_GPUs, is produced
    and set to the number of GPUs detected.

-cron
    This option suppresses the DetectedGPUs attribute so that the output
    is suitable for use with condor_startd cron. Combine this option
    with the -dynamic option to periodically refresh the dynamic GPU
    information such as temperature. For example, to refresh GPU
    temperatures every 5 minutes:

    use FEATURE : StartdCronPeriodic(DYNGPUS, 5*60, $(LIBEXEC)/condor_gpu_discovery, -dynamic -cron)

-verbose
    For interactive use of the tool, output extra information to show
    detection while in progress.

-diagnostic
    Show diagnostic information, to aid in tool development.

EXIT STATUS
condor_gpu_discovery will exit with a status value of 0 (zero) upon
success, and it will exit with the value 1 (one) upon failure.
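
A calling script can rely on this exit status to tell success from failure. The Python sketch below wraps a command with the standard subprocess module and raises an error on a nonzero status. The function name run_discovery is hypothetical, and a stand-in command is used so that the example runs on a machine without HTCondor installed.

```python
import subprocess
import sys


def run_discovery(argv):
    """Run a command and return its standard output.

    Raises RuntimeError when the exit status is not 0.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} failed with exit status {result.returncode}"
        )
    return result.stdout


# Stand-in for the real tool; in practice argv would start with
# "condor_gpu_discovery" followed by the desired options.
stand_in = [sys.executable, "-c", "print('DetectedGPUs=0')"]
print(run_discovery(stand_in).strip())  # DetectedGPUs=0
```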

AUTHOR
HTCondor Team

COPYRIGHT
1990-2021, Center for High Throughput Computing, Computer Sciences
Department, University of Wisconsin-Madison, Madison, WI, US. Licensed
under the Apache License, Version 2.0.

8.8 Aug 23, 2021 CONDOR_GPU_DISCOVERY(1)