UMR(1)                           User Manuals                           UMR(1)

NAME

       umr - AMDGPU Userspace Register Debugger

DESCRIPTION

       umr is a tool to read, display, and write AMDGPU device MMIO, PCIE,
       SMC, and DIDT registers from userspace.  It can autodetect and scan
       AMDGPU devices (SI and newer).

Device Selection

       --database-path, -dbp <path>
              Specify the path to the database of register, IP, and ASIC
              model data.

       --gpu, -g <asicname>(@<instance> | =<pcidevice>)
              Select a GPU by ASIC name and either the instance number or the
              PCI bus identifier.  For instance, "raven1@1" would pick the
              raven1 device in the 2nd DRI instance slot.  Similarly,
              "raven1=0000:06:00.0" would pick a raven1 device with the PCI
              bus address '0000:06:00.0'.

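              Scripts that build --gpu arguments can check the selector
              syntax before invoking umr.  The Python sketch below splits a
              selector into its parts.  parse_gpu_spec is a hypothetical
              helper, not umr's own parser.  It assumes the four-digit-domain
              hex PCI form used in the example above.

```python
import re

# domain:bus:slot.function in hex, e.g. 0000:06:00.0 (four-digit domain assumed)
_PCI_RE = re.compile(r"([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])")


def parse_gpu_spec(spec):
    """Split a --gpu argument into (asicname, instance, pci_address).

    Exactly one of instance / pci_address is set; the other is None.
    """
    if "@" in spec:
        name, instance = spec.split("@", 1)
        # The instance is the index of the /sys/kernel/debug/dri/<n> directory.
        return name, int(instance, 10), None
    if "=" in spec:
        name, pci = spec.split("=", 1)
        m = _PCI_RE.fullmatch(pci)
        if m is None:
            raise ValueError(f"bad PCI address: {pci!r}")
        domain, bus, slot, function = (int(g, 16) for g in m.groups())
        return name, None, (domain, bus, slot, function)
    raise ValueError("expected <asicname>@<instance> or <asicname>=<pcidevice>")


print(parse_gpu_spec("raven1@1"))             # ('raven1', 1, None)
print(parse_gpu_spec("raven1=0000:06:00.0"))  # ('raven1', None, (0, 6, 0, 0))
```
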
       --instance, -i <number>
              Pick a device instance to work with.  Defaults to the 0th
              device.  The instance refers to a directory under
              /sys/kernel/debug/dri/ where 0 is the first card probed.

       --force, -f <number>
              Force a PCIE device ID, given in hex or as an ASIC name.  This
              is used when the amdgpu driver is not yet loaded or a display
              is not yet attached.  A '.' prefix specifies a virtual device,
              which is handy for looking up register decodings for a device
              not present in the system, for instance, '.vega10'.

       --pci <device>
              Force a specific PCI device using the domain:bus:slot.function
              format in hex.  This is useful when more than one GPU is
              available.  If the amdgpu driver is loaded, the corresponding
              instance will be automatically detected.

       --gfxoff, -go <0 | 1>
              Turn GFXOFF on or off on select hardware.  A non-zero value
              enables the GFXOFF feature and a zero value disables it.

       --vm-partition, -vmp <-1, 0...n>
              Select a VM partition for all GPUVM accesses.  The default is
              -1, which refers to the 0th instance of the VM hub; this is not
              the same as specifying '0'.  Values above -1 are for ASICs with
              multiple IP instances.

       --vgpr-granularity, -vgpr <-1, 0...n>
              Specify the VGPR size granularity as a power of 2, e.g., '2'
              means 4 DWORDs per increment.

       --option, -O <string>[,<string>,...]
              Specify options to the tool.  Multiple options can be specified
              as comma-separated strings.  Options should be specified before
              the commands they affect (for example, --update or --force) for
              them to take effect.

              quiet
                   Disable various informative outputs that are not required
                   for functionality.

              read_smc
                   Enable scanning of SMC registers.

              bits
                   Enable displaying bitfields for scanned blocks.

              bitsfull
                   Enable displaying bitfields using their entire path for
                   scanned blocks.

              empty_log
                   Empty the MMIO log after reading it.

              follow
                   Cause the --logscan command to repeatedly produce output
                   without exiting.

              no_follow_ib
                   Instruct the --ring-stream command not to attempt to follow
                   IBs pointed to by the packets in the ring.

              use_pci
                   Enable PCI access for MMIO instead of using debugfs.  Used
                   by the --read, --scan, --top, --write, and --writebit
                   commands.  Does not currently support multiple instances of
                   the same GPU (PCI device ID).  Note that access to non-MMIO
                   registers might be disabled when using this flag.

              use_colour
                   Enable colour output for the --top command, scaling from
                   blue through green and yellow to red.  'use_color' is also
                   accepted.

              no_kernel
                   Disable using kernel files to access the device.  Implies
                   'use_pci'.  This is meant to be used only if the KMD is hung
                   or otherwise not working correctly.  Using it on live
                   systems may result in race conditions.

              verbose
                   Enable verbose diagnostics (used by --vram).

              halt_waves
                   Halt/resume all waves while reading wave status.

              disasm_early_term
                   Terminate shader disassembly when the first s_endpgm is
                   hit.  This is required for older UMDs (or non-Mesa UMDs)
                   that don't use the quintuple 0xBF9F0000 to signal the true
                   end of a shader.

              no_disasm
                   Disable the shader disassembler logic (text is still
                   output, but LLVM is not used to decode it).  Useful if the
                   linked llvm-dev doesn't support the hardware being
                   debugged.  Avoids segfaults/asserts.

              disasm_anyways
                   Enable shader disassembly in --waves even if the rings
                   aren't halted.

              wave64
                   Enable full wave64 disassembly.

              full_shader
                   Enable full shader disassembly in --waves when '-O bits' is
                   used and the shader is found in a gfx or compute ring.

              no_fold_vm_decode
                   Disable folding of PDEs when VM decoding multiple pages of
                   memory.  By default, when subsequent pages are decoded,
                   PDEs that match those of previous pages are omitted to cut
                   down on the verbosity of the output.  This option disables
                   folding and prints the full chain of PDEs for every page
                   decoded.

              no_scan_waves
                   Disable scanning wave data during --ring-stream output.

              force_asic_file
                   Force using the database .asic file matched via pci.did
                   instead of IP discovery.

Bank Selection

       --bank, -b <se> <sh> <instance>
              Select a GRBM se/sh/instance bank in decimal.  Can use 'x' to
              denote a broadcast selection.

       --sbank, -sb <me> <pipe> <queue> [vmid]
              Select an SRBM me/pipe/queue bank in decimal.  VMID is optional
              (default: 0).

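              Scripts that generate --bank and --sbank arguments can use the
              Python sketch below, which models the documented rules.  Fields
              are decimal, 'x' selects broadcast, and the --sbank VMID
              defaults to 0.  The helper names and the use of None as the
              broadcast marker are hypothetical.  The sketch also assumes
              that 'x' may appear in any of the three --bank positions.

```python
BROADCAST = None  # hypothetical marker for an 'x' (broadcast) field


def _bank_field(token):
    # 'x' selects broadcast; anything else must be a decimal index.
    return BROADCAST if token == "x" else int(token, 10)


def parse_bank(args):
    """Model of --bank <se> <sh> <instance>."""
    if len(args) != 3:
        raise ValueError("--bank takes exactly three values")
    se, sh, instance = (_bank_field(t) for t in args)
    return {"se": se, "sh": sh, "instance": instance}


def parse_sbank(args):
    """Model of --sbank <me> <pipe> <queue> [vmid]."""
    if len(args) not in (3, 4):
        raise ValueError("--sbank takes three or four values")
    me, pipe, queue = (int(t, 10) for t in args[:3])
    vmid = int(args[3], 10) if len(args) == 4 else 0  # VMID defaults to 0
    return {"me": me, "pipe": pipe, "queue": queue, "vmid": vmid}


print(parse_bank(["1", "x", "0"]))   # {'se': 1, 'sh': None, 'instance': 0}
print(parse_sbank(["1", "0", "2"]))  # {'me': 1, 'pipe': 0, 'queue': 2, 'vmid': 0}
```
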
       --cbank, -cb <context_reg_bank>
              Select a context register bank (value is multiplied by 0x1000).
              Used for context registers in the range 0xA000..0xAFFF.

Device Information

       --config, -c
              Print out configuration data read from the kernel driver.

       --enumerate, -e
              Enumerate all AMDGPU supported devices.

       --list-blocks, -lb
              List all blocks attached to the ASIC that have been detected.

       --list-regs, -lr <string>
              List all registers in an IP block (use '-O bits' to also list
              bitfields).

Register Access

       --lookup, -lu <address_or_regname> <number>
              Look up an MMIO register by address and bitfield-decode the
              value specified (with 0x prefix) or by register name.  The
              register name string must include the ipname, e.g.,
              uvd6.mmUVD_CONTEXT_ID.

       --write, -w <string> <number>
              Write a value specified in hex to a register specified with a
              complete register path in the form <asicname.ipname.regname>,
              for example, fiji.uvd6.mmUVD_CGC_GATE.  The value of asicname
              and/or ipname can be * to simplify scripting.  This command can
              be used multiple times to write to multiple registers in a
              single invocation.

       --writebit, -wb <string> <number>
              Write a value specified in hex to a register bitfield specified
              with a complete register path as in the --write command.

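              Conceptually, the bitfield decode done by --lookup and the
              field write done by --writebit reduce to masking and shifting.
              The Python sketch below illustrates this with a hypothetical
              register layout (EXAMPLE_FIELDS).  Real field positions come
              from the umr database, and this is not umr's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    shift: int  # bit position of the field's least significant bit
    width: int  # field width in bits

    @property
    def mask(self):
        return ((1 << self.width) - 1) << self.shift


def decode(value, fields):
    """Extract every field of a 32-bit register value."""
    return {f.name: (value & f.mask) >> f.shift for f in fields}


def write_field(reg_value, field, field_value):
    """Read-modify-write: replace one field and keep all other bits."""
    if field_value < 0 or field_value >> field.width:
        raise ValueError(f"{field_value:#x} does not fit in {field.width} bits")
    cleared = reg_value & ~field.mask & 0xFFFFFFFF
    return cleared | (field_value << field.shift)


# Hypothetical register layout, for illustration only.
ENABLE, MODE, COUNT = Field("ENABLE", 0, 1), Field("MODE", 1, 3), Field("COUNT", 8, 8)
EXAMPLE_FIELDS = [ENABLE, MODE, COUNT]

print(decode(0x2A0B, EXAMPLE_FIELDS))         # {'ENABLE': 1, 'MODE': 5, 'COUNT': 42}
print(hex(write_field(0x2A0B, COUNT, 0xFF)))  # 0xff0b
```
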
       --read, -r <string>
              Read the value of a register specified by a register path and
              print it to stdout.  This command uses the same syntax as the
              --write command but also allows * for the regname field to read
              an entire block.  Additionally, a * can be appended to a
              register name to read any register that contains a partial
              match.  For instance, "*.vcn10.ADDR*" would read any register
              from the 'VCN10' block which contains 'ADDR' in its name.

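              The path-matching rules for --read can be modelled as follows.
              In this Python sketch, path_matches is a hypothetical helper
              and the register names are made up.  As the 'ADDR*' example
              describes, a trailing * matches the text anywhere in the
              register name.  Case-sensitive comparison is an assumption.

```python
def path_matches(pattern, path):
    """Return True if an asic.ip.reg path is selected by a --read pattern."""
    p_asic, p_ip, p_reg = pattern.split(".")
    asic, ip, reg = path.split(".")
    if p_asic not in ("*", asic) or p_ip not in ("*", ip):
        return False
    if p_reg == "*":
        return True                 # whole block
    if p_reg.endswith("*"):
        return p_reg[:-1] in reg    # partial match anywhere in the name
    return p_reg == reg             # exact register name


regs = [  # hypothetical register paths
    "asic0.vcn10.mmVCN_EXAMPLE_ADDR_LO",
    "asic0.vcn10.mmVCN_EXAMPLE_STATUS",
    "asic0.gfx10.mmGFX_EXAMPLE_ADDR",
]
print([r for r in regs if path_matches("*.vcn10.ADDR*", r)])
# ['asic0.vcn10.mmVCN_EXAMPLE_ADDR_LO']
```
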
       --scan, -s <string>
              Scan and print an IP block by name, for example, uvd6 or
              carrizo.uvd6.  Can be used multiple times in a single
              invocation.

Device Utilization

       --top, -t
              Summarize GPU utilization.  An SE block can be selected with
              --bank.  Relevant options are use_colour and use_pci.

       --waves, -wa [ <ring_name> | <vmid>@<addr>.<size> ]
              Print out information about any active CU waves.  Note that if
              GFX power gating is enabled, this command may result in a GPU
              hang, although this is unlikely unless it is invoked very
              rapidly.  Unlike the wave count reading in --top, this command
              operates regardless of whether GFX PG is enabled.  The bits
              option ('-O bits') can be used to decode the wave bitfields.
              An optional ring name can be specified (default: gfx) to search
              for pointers to active shaders to find extra debugging
              information.  Alternatively, an IB can be specified by a vmid,
              address, and size (in hex bytes) triplet.

       --profiler, -prof [pixel= | vertex= | compute=]<nsamples> [ring]
              Capture 'nsamples' samples of wave data.  Optionally specify a
              ring to use when searching for IBs that point to shaders
              (default: 'gfx').  Additionally, a shader type can be selected
              to profile only shaders of that type.

Virtual Memory Access

       VMIDs are specified in umr as 16-bit numbers where the lower 8 bits
       indicate the hardware VMID and the upper 8 bits indicate which VM
       space to use:

       0 - GFX hub
       1 - MM hub
       2 - VC0 hub
       3 - VC1 hub

       For instance, 0x107 would specify hardware VMID 7 on the MM hub.

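       The 16-bit VMID layout and the [vmid@]address syntax used by the
       commands below can be sketched in Python.  The helper names are
       hypothetical, and this is a model of the documented encoding rather
       than umr code.  The VMID is read as hex when it has a 0x prefix and as
       decimal otherwise, and the address is always read as hex.

```python
HUBS = {0: "GFX hub", 1: "MM hub", 2: "VC0 hub", 3: "VC1 hub"}


def encode_vmid(hub, hw_vmid):
    """Pack a hub index (upper 8 bits) and hardware VMID (lower 8 bits)."""
    if not (0 <= hub <= 0xFF and 0 <= hw_vmid <= 0xFF):
        raise ValueError("hub and hardware VMID must each fit in 8 bits")
    return (hub << 8) | hw_vmid


def decode_vmid(vmid):
    """Split a 16-bit umr VMID into (hub name, hardware VMID)."""
    hub, hw_vmid = (vmid >> 8) & 0xFF, vmid & 0xFF
    return HUBS.get(hub, f"unknown hub {hub}"), hw_vmid


def parse_vm_address(arg):
    """Split '[vmid@]address' into (vmid or None, address)."""
    vmid, sep, addr = arg.rpartition("@")
    if not sep:
        return None, int(addr, 16)
    base = 16 if vmid.lower().startswith("0x") else 10
    return int(vmid, base), int(addr, 16)


print(hex(encode_vmid(1, 7)))            # 0x107
print(decode_vmid(0x107))                # ('MM hub', 7)
print(parse_vm_address("0x107@0x1000"))  # (263, 4096)
```
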
       --vm-decode, -vm vmid@<address> <num_of_pages>
              Decode page mappings at a specified address (in hex) from the
              VMID specified.  The VMID can be specified in hexadecimal (with
              a leading '0x') or in decimal.  Implies '-O verbose' for the
              duration of the command, so it does not need to be specified
              manually.

       --vm-read, -vr [vmid@]<address> <size>
              Read 'size' bytes (in hex) from the specified address (in hex)
              in VRAM to stdout.  Optionally specify the VMID (in decimal or
              in hex with a 0x prefix) to treat the address as a virtual
              address instead.  Can use 'use_pci' to directly access VRAM.

       --vm-write, -vw [vmid@]<address> <size>
              Write 'size' bytes (in hex), read from stdin, to the specified
              address (in hex) in VRAM.

       --vm-write-word, -vww [vmid@]<address> <data>
              Write a 32-bit word 'data' (in hex) to a given address (in hex)
              in host machine order.

       --vm-disasm, -vdis [<vmid>@]<address> <size>
              Disassemble 'size' bytes (in hex) from a given address (in hex).
              The size can be specified as zero to have umr try to compute
              the shader size.

Ring and PM4 Decoding

       --ring-stream, -RS <string>[range]
              Read the contents of the ring named amdgpu_ring_<string>; the
              name is given without the amdgpu_ring_ prefix.  By default the
              entire ring is read and printed.  A range is optional and has
              the format '[start:end]'.  The start and end addresses are
              non-negative integers or the '.' (dot) symbol, which indicates
              the rptr when on the left side and the wptr when on the right
              side of the range.  For instance, "-RS gfx" prints the entire
              gfx ring, "-RS gfx[0:16]" prints the contents from 0 to 16
              inclusive, and "-RS gfx[.]" or "-RS gfx[.:.]" prints the range
              [rptr, wptr].  When one of the range limits is a number and the
              other is a dot, the number gives a relative offset before or
              after the corresponding ring pointer.  For instance,
              "-RS sdma0[16:.]" prints [wptr-16, wptr] dwords of the SDMA0
              ring, and "-RS sdma1[.:32]" prints [rptr, rptr+32] dwords of
              the SDMA1 ring.  The contents of the ring are always decoded
              when possible.

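              The range rules above can be sketched in Python.  ring_range is
              a hypothetical helper that resolves an argument to inclusive
              [start, end] positions, given the current rptr and wptr.  The
              sketch assumes decimal numbers and does not model wraparound at
              the end of the ring.

```python
def ring_range(arg, rptr, wptr):
    """Resolve a --ring-stream argument such as 'sdma0[16:.]'.

    Returns (ring_name, None) for a whole-ring read, otherwise
    (ring_name, (start, end)) with both ends inclusive.
    """
    if "[" not in arg:
        return arg, None
    name, rest = arg.split("[", 1)
    spec = rest.rstrip("]")
    if spec == ".":               # "[.]" is shorthand for "[.:.]"
        spec = ".:."
    left, sep, right = spec.partition(":")
    if not sep:
        raise ValueError(f"range must be [start:end] or [.]: {spec!r}")
    if left == "." and right == ".":
        return name, (rptr, wptr)
    if left == ".":               # dot on the left is rptr; count forward
        return name, (rptr, rptr + int(right))
    if right == ".":              # dot on the right is wptr; count back
        return name, (wptr - int(left), wptr)
    return name, (int(left), int(right))


print(ring_range("sdma0[16:.]", rptr=100, wptr=180))  # ('sdma0', (164, 180))
print(ring_range("sdma1[.:32]", rptr=100, wptr=180))  # ('sdma1', (100, 132))
```

              In this model the number is always an offset from whichever
              pointer the dot on the other side denotes.  That is why "16:."
              counts back from wptr while ".:32" counts forward from rptr.
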
       --dump-ib, -di [vmid@]address length [pm]
              Dump an IB at an address with an optional VMID.  The length is
              specified in bytes.  The decoder type [pm] is optional and
              defaults to PM4 packets.  Specify '3' for SDMA packets or '2'
              for MES packets.

       --dump-ib-file, -df filename [pm]
              Dump an IB stored in a file as a series of hexadecimal DWORDs,
              one per line.  Optionally supply a PM type: '2' for MES
              packets, '3' for SDMA IBs, or '4' for PM4 IBs.  The default is
              PM4.

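              The Python sketch below converts a raw IB (for example, bytes
              saved from --vm-read) into the one-DWORD-per-line text that
              --dump-ib-file reads.  ib_bytes_to_text is a hypothetical
              helper.  The page only specifies hexadecimal DWORDs, one per
              line, so the 0x prefix and zero padding here are assumptions.
              The raw bytes are assumed to be little-endian.

```python
import struct


def ib_bytes_to_text(raw):
    """Format a raw little-endian IB as one hex DWORD per line."""
    if len(raw) % 4:
        raise ValueError("IB size must be a multiple of 4 bytes")
    dwords = struct.unpack(f"<{len(raw) // 4}I", raw)
    # Assumed line format: 0x-prefixed, zero-padded to 8 hex digits.
    return "".join(f"0x{dw:08x}\n" for dw in dwords)


raw = bytes.fromhex("001000c0" "00000000")  # two arbitrary example DWORDs
print(ib_bytes_to_text(raw), end="")
# 0xc0001000
# 0x00000000
```

              Saving the result to a file such as ib.txt and running
              "umr -df ib.txt" would then decode it with the default PM4
              decoder.
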
       --header-dump, -hd [HEADER_DUMP_reg]
              Dump the contents of the HEADER_DUMP buffer and decode the
              opcode into a human-readable string.

       --print-cpc, -cpc
              Dump CPC register data.

       --print-sdma, -sdma
              Dump SDMA register data.

       --logscan, -ls
              Read and display the contents of the MMIO register log.
              Usually specified with '-O bits,follow,empty_log' to enable
              continual dumping of the trace log.

Power and Clock

       --power, -p
              Read clock, temperature, and GPU load values at runtime.  Use
              the 'use_colour' option to colourize the output.

       --clock-scan, -cs [clock]
              Scan the current hierarchy values of each clock.  By default,
              the hierarchy values of all clocks are listed; otherwise only
              the specified clock (e.g., sclk) is listed.

       --clock-manual, -cm [clock] [value]
              Set the value of the corresponding clock.  Use the -cs command
              to check the hierarchy values of the clock, then use -cm with a
              value to set the clock.

       --clock-high, -ch
              Set power_dpm_force_performance_level to high.

       --clock-low, -cl
              Set power_dpm_force_performance_level to low.

       --clock-auto, -ca
              Set power_dpm_force_performance_level to auto.

       --ppt-read, -pptr [ppt_field_name]
              Read powerplay table values and print them to stdout.  Without
              an argument, all powerplay table information is printed;
              otherwise only the named field is printed.

       --gpu-metrics, -gm
              Print the GPU metrics table for the device.

Notes

       - The "Waves" field in the DRM section of --top only works if GFX PG
       has been disabled; otherwise, GPU hangs occur frequently.  When PG is
       enabled, the field reads a constant 0.

Environment Variables

       UMR_LOGGER
           Directory in which to write the "umr.log" file when capturing
           samples with the --top command.

       UMR_DATABASE_PATH
           Should be set to the top directory of the database tree used for
           register, IP, and ASIC model data.

FILES

       ${CMAKE_INSTALL_PREFIX}/share/bash-completion/completions/umr contains
       completion for bash shells.  You'd normally source this file in your
       ~/.bashrc.

       ${CMAKE_INSTALL_PREFIX}/share/umr/database contains database files for
       ASICs, IPs, and registers.  UMR_DATABASE_PATH is usually set to point
       here.

AMD (c) 2022                     February 2022                          UMR(1)