pbs_gpureset(3B)                     Local                     pbs_gpureset(3B)

NAME
pbs_gpureset - reset GPU error counts

SYNOPSIS
#include <pbs_error.h>
#include <pbs_ifl.h>

int pbs_gpureset(int connect, char *mom_node, int gpu_id, int ecc_perm, int ecc_vol)

DESCRIPTION
Issue a batch request for the pbs_mom to reset the ECC error counts on
one of its NVIDIA GPUs. The error counts are reset by sending a GPU
Control batch request to the batch server.

The argument, mom_node, specifies the host within the cluster on which
the GPU is located. The argument is the name of a host that is a member
of the cluster of hosts managed by the server.

The argument, gpu_id, specifies the ID of the GPU on the MOM node.

The argument, ecc_perm, specifies whether or not to reset the GPU's
permanent ECC error count. A value of 1 resets the count; a value of 0
leaves it unchanged.

The argument, ecc_vol, specifies whether or not to reset the GPU's
volatile ECC error count. A value of 1 resets the count; a value of 0
leaves it unchanged.

This call requires PBS Operator or Manager privilege. It also requires
that Torque be configured with --enable-nvidia-gpu.

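As a minimal sketch of how the arguments fit together, the two
hypothetical helpers below reset either the volatile count alone or both
counts on one GPU. The call follows the prototype as printed in this
page; it is worth checking it against the installed pbs_ifl.h. The
helper names are illustrative only, and the stand-in definitions under
#ifndef USE_TORQUE are an assumption that exists solely so the sketch
compiles on a machine without the Torque headers.

```c
#include <stdio.h>

#ifdef USE_TORQUE
#include <pbs_error.h>
#include <pbs_ifl.h>
#else
/* Stand-in (assumption): lets the sketch build without Torque installed.
 * It mimics the documented contract: 0 on success, a nonzero error
 * otherwise, with the same value stored in pbs_errno. */
static int pbs_errno = 0;

static int pbs_gpureset(int connect, char *mom_node, int gpu_id,
                        int ecc_perm, int ecc_vol)
{
    (void)ecc_perm;
    (void)ecc_vol;
    if (connect < 0 || mom_node == NULL || gpu_id < 0) {
        pbs_errno = 1;          /* arbitrary nonzero value for the stand-in */
        return pbs_errno;
    }
    pbs_errno = 0;
    return 0;
}
#endif

/* Hypothetical helper: reset only the volatile ECC error count
 * (ecc_perm = 0 leaves the permanent count alone, ecc_vol = 1 resets). */
static int reset_volatile_ecc(int connect, char *mom_node, int gpu_id)
{
    return pbs_gpureset(connect, mom_node, gpu_id, 0, 1);
}

/* Hypothetical helper: reset both the permanent and the volatile count. */
static int reset_all_ecc(int connect, char *mom_node, int gpu_id)
{
    return pbs_gpureset(connect, mom_node, gpu_id, 1, 1);
}
```

In a real program, connect would be a connection handle obtained from
the batch server beforehand, and the caller would need Operator or
Manager privilege for the request to succeed.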
SEE ALSO
qgpureset(1B)

DIAGNOSTICS
When the batch request generated by the pbs_gpureset() function has
been completed successfully by a batch server, the routine will return
0 (zero). Otherwise, a nonzero error number is returned. The error
number is also set in pbs_errno.
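
The return convention described here might be handled along the
following lines. The helper name and the message wording are
hypothetical. In real code the two values would come from the
pbs_gpureset() return value and from pbs_errno; here they are passed in
as plain integers so that the sketch compiles without Torque.

```c
#include <stdio.h>

/* Hypothetical helper: turn the result of a pbs_gpureset() call into a
 * process exit status, printing a diagnostic to stderr on failure.
 *   rc  - value returned by pbs_gpureset()
 *   err - value of pbs_errno read immediately after the call
 */
static int gpureset_exit_status(int rc, int err,
                                const char *mom_node, int gpu_id)
{
    if (rc == 0) {
        return 0;               /* request completed successfully */
    }
    fprintf(stderr,
            "pbs_gpureset: node %s, GPU %d: error %d (pbs_errno %d)\n",
            mom_node, gpu_id, rc, err);
    return 1;
}
```

A caller would typically store the return value in rc and then finish
with something like gpureset_exit_status(rc, pbs_errno, node, gpu_id).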

pbs_gpureset(3B)