CONDOR_GPU_DISCOVERY(1)         HTCondor Manual        CONDOR_GPU_DISCOVERY(1)

NAME

       condor_gpu_discovery - HTCondor Manual

       Output GPU-related ClassAd attributes

SYNOPSIS

       condor_gpu_discovery -help

       condor_gpu_discovery [<options>]

DESCRIPTION

       condor_gpu_discovery outputs ClassAd attributes corresponding to a
       host's GPU capabilities. It can presently report CUDA and OpenCL
       devices. Which type(s) of device(s) it reports is determined by which
       libraries, if any, it can find when it runs; this reflects what GPU
       jobs will find on that host when they run. (Note that some HTCondor
       configuration settings may cause the environment to differ between
       jobs and the HTCondor daemons in ways that change library discovery.)

       If CUDA_VISIBLE_DEVICES or GPU_DEVICE_ORDINAL is set in the environment
       when condor_gpu_discovery is run, it will report only devices present
       in those lists.

       This tool is not available for macOS platforms.

       With no command line options, the single ClassAd attribute DetectedGPUs
       is printed. If the value is 0, no GPUs were detected. If one or more
       GPUs were detected, the value is a string, presented as a comma and
       space separated list of the GPUs discovered, where each is given a name
       that is further used as the prefix string in other attribute names.
       Where there is more than one GPU of a particular type, the prefix
       string includes an integer value numbering the device; these integer
       values monotonically increase from 0 (unless otherwise specified in
       the environment; see above). For example, a discovery of two GPUs may
       output

          DetectedGPUs="CUDA0, CUDA1"

       Further command line options use "CUDA" either with or without one of
       the integer values 0 or 1 as the prefix string in attribute names.
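
       A script that consumes the default output might split the
       DetectedGPUs value into individual prefix strings. The following is a
       minimal sketch in Python, not part of HTCondor; the function name
       parse_detected_gpus is hypothetical, and the sketch assumes the
       attribute arrives as a single Name=Value line in the form shown above.

```python
def parse_detected_gpus(line):
    """Return the list of GPU names from one 'DetectedGPUs=...' line.

    Assumes the value is either 0 (no GPUs) or a quoted, comma and
    space separated list such as "CUDA0, CUDA1".
    """
    name, _, value = line.strip().partition("=")
    if name.strip().lower() != "detectedgpus":
        raise ValueError(f"not a DetectedGPUs attribute: {line!r}")
    value = value.strip()
    if value == "0":
        return []  # a value of 0 means no GPUs were detected
    # Drop the surrounding quotes, then split on commas and trim spaces.
    return [gpu.strip() for gpu in value.strip('"').split(",") if gpu.strip()]


print(parse_detected_gpus('DetectedGPUs="CUDA0, CUDA1"'))
```

       Each returned name can then serve as the prefix when looking up other
       attributes for that device.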

OPTIONS

          -help  Print usage information and exit.

          -properties
                 In addition to the DetectedGPUs attribute, display some of
                 the attributes of the GPUs. Each of these attribute names
                 will begin with a prefix string. The displayed CUDA
                 attributes are Capability, DeviceName, DriverVersion,
                 ECCEnabled, GlobalMemoryMb, and RuntimeVersion. The
                 displayed OpenCL attributes are DeviceName, ECCEnabled,
                 OpenCLVersion, and GlobalMemoryMb.

          -extra Display more attributes of the GPUs. Each of these attribute
                 names will begin with a prefix string. The additional CUDA
                 attributes are ClockMhz, ComputeUnits, and CoresPerCU. The
                 additional OpenCL attributes are ClockMhz and ComputeUnits.

          -dynamic
                 Display attributes of NVIDIA devices that change values as
                 the GPU is working. Each of these attribute names will begin
                 with a prefix string. These are FanSpeedPct, BoardTempC,
                 DieTempC, EccErrorsSingleBit, and EccErrorsDoubleBit.

          -mixed When displaying attribute values, assume that the machine has
                 a heterogeneous set of GPUs, so always include the integer
                 value in the prefix string.

          -device <N>
                 Display properties only for GPU device <N>, where <N> is the
                 integer value defined for the prefix string. This option may
                 be specified more than once; additional <N> are listed along
                 with the first. This option adds to the device(s) specified
                 by the environment variables CUDA_VISIBLE_DEVICES and
                 GPU_DEVICE_ORDINAL, if any.

          -tag string
                 Set the resource tag portion of the intended machine ClassAd
                 attribute Detected<ResourceTag> to be string. If this option
                 is not specified, the resource tag is "GPUs", resulting in
                 attribute name DetectedGPUs.

          -prefix str
                 When naming attributes, use str as the prefix string. When
                 this option is not specified, the prefix string is either
                 CUDA or OCL.

          -simulate:D,N
                 For testing purposes, assume that N devices of type D were
                 detected. No discovery software is invoked. If D is 0, it
                 refers to GeForce GT 330, and the default value for N is 1.
                 If D is 1, it refers to GeForce GTX 480, and the default
                 value for N is 2.

          -opencl
                 Prefer detection via OpenCL rather than CUDA. Without this
                 option, CUDA detection software is invoked first, and no
                 further OpenCL software is invoked if CUDA devices are
                 detected.

          -cuda  Do only CUDA detection.

          -nvcuda
                 For Windows platforms only, use a CUDA driver rather than the
                 CUDA run time.

          -config
                 Output in the syntax of HTCondor configuration, instead of
                 ClassAd language. An additional attribute,
                 NUM_DETECTED_GPUs, is produced; it is set to the number of
                 GPUs detected.

          -cron  This option suppresses the DetectedGPUs attribute so that the
                 output is suitable for use with condor_startd cron. Combine
                 this option with the -dynamic option to periodically refresh
                 the dynamic GPU information such as temperature. For example,
                 to refresh GPU temperatures every 5 minutes:

                     use FEATURE : StartdCronPeriodic(DYNGPUS, 5*60, $(LIBEXEC)/condor_gpu_discovery, -dynamic -cron)

          -verbose
                 For interactive use of the tool, output extra information to
                 show detection while in progress.

          -diagnostic
                 Show diagnostic information, to aid in tool development.
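
       To illustrate how the prefix string ties attributes to a device, the
       sketch below groups prefixed attributes into one dictionary per GPU.
       It is a hedged example, not HTCondor code: the function name
       group_by_device is hypothetical, the sample lines are made up for
       illustration (the memory sizes are not real measurements), and it
       assumes output produced with -properties -mixed, one Name=Value pair
       per line, so that every attribute name carries a device number.

```python
import re


def group_by_device(lines, prefix="CUDA"):
    """Group '<prefix><N><Attribute>=<value>' lines by device name.

    Lines that do not start with the prefix followed by a device
    number (for example the DetectedGPUs line) are skipped.
    Values are kept as strings with surrounding quotes removed.
    """
    pattern = re.compile(
        rf"^{re.escape(prefix)}(\d+)([A-Za-z]\w*)\s*=\s*(.*)$"
    )
    devices = {}
    for line in lines:
        match = pattern.match(line.strip())
        if match is None:
            continue
        index, attribute, value = match.groups()
        devices.setdefault(f"{prefix}{index}", {})[attribute] = value.strip('"')
    return devices


# Made-up sample input; real output depends on the host's hardware.
sample = [
    'DetectedGPUs="CUDA0, CUDA1"',
    'CUDA0DeviceName="GeForce GT 330"',
    'CUDA0GlobalMemoryMb=1024',
    'CUDA1DeviceName="GeForce GTX 480"',
    'CUDA1GlobalMemoryMb=1536',
]

for device, attributes in group_by_device(sample).items():
    print(device, attributes)
```

       Without -mixed, attributes shared by all devices may carry the prefix
       with no integer, so a consumer would need to handle both forms.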

EXIT STATUS

       condor_gpu_discovery will exit with a status value of 0 (zero) upon
       success, and it will exit with the value 1 (one) upon failure.
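
       A wrapper script can rely on this exit status to decide whether the
       output is usable. The following Python sketch shows one way to do so;
       the helper name run_and_check is hypothetical. So that the sketch runs
       on a machine without HTCondor installed, the demonstration invokes the
       Python interpreter as a stand-in command; in real use the argument
       list would begin with condor_gpu_discovery.

```python
import subprocess
import sys


def run_and_check(argv):
    """Run a command; return its stdout on exit status 0, raise otherwise."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} failed with exit status {result.returncode}"
        )
    return result.stdout


# Stand-in for: run_and_check(["condor_gpu_discovery", "-properties"])
output = run_and_check([sys.executable, "-c", "print('DetectedGPUs=0')"])
print(output.strip())
```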

AUTHOR

       HTCondor Team

       1990-2020, Center for High Throughput Computing, Computer Sciences
       Department, University of Wisconsin-Madison, Madison, WI, US. Licensed
       under the Apache License, Version 2.0.



8.8                              Aug 06, 2020          CONDOR_GPU_DISCOVERY(1)