LLVM-EXEGESIS(1)                     LLVM                     LLVM-EXEGESIS(1)

NAME

       llvm-exegesis - LLVM Machine Instruction Benchmark

SYNOPSIS

       llvm-exegesis [options]

DESCRIPTION

       llvm-exegesis is a benchmarking tool that uses information available in
       LLVM to measure host machine instruction characteristics like latency,
       throughput, or port decomposition.

       Given an LLVM opcode name and a benchmarking mode, llvm-exegesis
       generates a code snippet that makes execution as serial (resp. as
       parallel) as possible so that we can measure the latency (resp.
       inverse throughput/uop decomposition) of the instruction. The code
       snippet is jitted and executed on the host subtarget. The time taken
       (resp. resource usage) is measured using hardware performance
       counters. The result is printed out as YAML to the standard output.

       The main goal of this tool is to automatically (in)validate LLVM's
       TableGen scheduling models. To that end, we also provide analysis of
       the results.

       llvm-exegesis can also benchmark arbitrary user-provided code snippets.

EXAMPLE 1: BENCHMARKING INSTRUCTIONS

       Assume you have an X86-64 machine. To measure the latency of a single
       instruction, run:

          $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

       Measuring the uop decomposition or inverse throughput of an instruction
       works similarly:

          $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
          $ llvm-exegesis -mode=inverse_throughput -opcode-name=ADD64rr

       The output is a YAML document (the default is to write to stdout, but
       you can redirect the output to a file using -benchmarks-file):

          ---
          key:
            opcode_name:     ADD64rr
            mode:            latency
            config:          ''
          cpu_name:        haswell
          llvm_triple:     x86_64-unknown-linux-gnu
          num_repetitions: 10000
          measurements:
            - { key: latency, value: 1.0058, debug_string: '' }
          error:           ''
          info:            'explicit self cycles, selecting one aliasing configuration.
          Snippet:
          ADD64rr R8, R8, R10
          '
          ...

       To measure the latency of all instructions for the host architecture,
       run:

          #!/bin/bash
          readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
          for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
          do
            ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
          done

       FIXME: Provide an llvm-exegesis option to test all instructions.

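       The YAML output shown in this example can be reduced to one line per
       measurement with standard tools. The following is only a sketch:
       summarize_benchmarks is a hypothetical helper, not part of
       llvm-exegesis, and it assumes that each measurement is printed on a
       single line in the "- { key: ..., value: ..., ... }" form shown above.

```shell
# Hypothetical helper (not part of llvm-exegesis).
# Reads benchmark YAML on stdin and prints "<opcode_name> <key> <value>"
# for every measurement line. Assumes the one-line measurement layout
# shown in the example output; other layouts are silently ignored.
summarize_benchmarks() {
  awk '
    # Remember the opcode of the benchmark currently being read.
    $1 == "opcode_name:" { opcode = $2 }
    # Measurement line: - { key: <key>, value: <value>, ... }
    $1 == "-" && $2 == "{" && $3 == "key:" {
      key = $4; value = $6
      sub(/,$/, "", key)      # drop the trailing comma after the key
      sub(/,$/, "", value)    # drop the trailing comma after the value
      print opcode, key, value
    }
  '
}
```

       It could be used as a filter, for example
       "llvm-exegesis -mode=latency -opcode-name=ADD64rr | summarize_benchmarks".
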
EXAMPLE 2: BENCHMARKING A CUSTOM CODE SNIPPET

       To measure the latency/uops of a custom piece of code, you can specify
       the -snippets-file option (- reads from standard input).

          $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

       Real-life code snippets typically depend on registers or memory.
       llvm-exegesis checks the liveness of registers (i.e. any register use
       has a corresponding def or is a "live in"). If your code depends on the
       value of some registers, you have two options:

       • Mark the register as requiring a definition. llvm-exegesis will
         automatically assign a value to the register. This can be done using
         the directive LLVM-EXEGESIS-DEFREG <reg name> <hex_value>, where
         <hex_value> is a bit pattern used to fill <reg name>. If <hex_value>
         is smaller than the register width, it will be sign-extended.

       • Mark the register as a "live in". llvm-exegesis will benchmark using
         whatever value was in this register on entry. This can be done using
         the directive LLVM-EXEGESIS-LIVEIN <reg name>.

       For example, the following code snippet depends on the values of XMM1
       (which will be set by the tool) and the memory buffer passed in RDI
       (live in).

          # LLVM-EXEGESIS-LIVEIN RDI
          # LLVM-EXEGESIS-DEFREG XMM1 42
          vmulps        (%rdi), %xmm1, %xmm2
          vhaddps       %xmm2, %xmm2, %xmm3
          addq $0x10, %rdi

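       As a sketch of how the annotated snippet above might be benchmarked
       from a file rather than from standard input, the script below writes
       it to disk and passes the path to -snippets-file. The file location
       is an arbitrary choice made for this illustration, and the benchmark
       is only attempted when llvm-exegesis is found on PATH.

```shell
# Write the annotated snippet to a file. The location is an assumption
# made for this sketch; any writable path works.
snippet_file="${TMPDIR:-/tmp}/exegesis-snippet.s"
cat > "${snippet_file}" <<'EOF'
# LLVM-EXEGESIS-LIVEIN RDI
# LLVM-EXEGESIS-DEFREG XMM1 42
vmulps        (%rdi), %xmm1, %xmm2
vhaddps       %xmm2, %xmm2, %xmm3
addq $0x10, %rdi
EOF

# Benchmark the snippet only if the tool is available on this machine.
if command -v llvm-exegesis >/dev/null 2>&1; then
  llvm-exegesis -mode=latency -snippets-file="${snippet_file}" ||
    echo "llvm-exegesis failed on ${snippet_file}" >&2
else
  echo "llvm-exegesis not found; snippet left in ${snippet_file}" >&2
fi
```
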
EXAMPLE 3: ANALYSIS

       Assuming you have a set of benchmarked instructions (either latency or
       uops) as YAML in file /tmp/benchmarks.yaml, you can analyze the results
       using the following command:

          $ llvm-exegesis -mode=analysis \
              -benchmarks-file=/tmp/benchmarks.yaml \
              -analysis-clusters-output-file=/tmp/clusters.csv \
              -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

       This will group the instructions into clusters with the same
       performance characteristics. The clusters will be written out to
       /tmp/clusters.csv in the following format (the last field of each row
       is the measured value):

          cluster_id,opcode_name,config,sched_class
          ...
          2,ADD32ri8_DB,,WriteALU,1.00
          2,ADD32ri_DB,,WriteALU,1.01
          2,ADD32rr,,WriteALU,1.01
          2,ADD32rr_DB,,WriteALU,1.00
          2,ADD32rr_REV,,WriteALU,1.00
          2,ADD64i32,,WriteALU,1.01
          2,ADD64ri32,,WriteALU,1.01
          2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
          2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
          2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
          2,ADD64ri8,,WriteALU,1.00
          2,SETBr,,WriteSETCC,1.01
          ...

       llvm-exegesis will also analyze the clusters to point out
       inconsistencies in the scheduling information. The output is an HTML
       file. For example, /tmp/inconsistencies.html will contain messages
       like the following: [image]

       Note that the scheduling class names will be resolved only when
       llvm-exegesis is compiled in debug mode; otherwise only the class id
       will be shown. This does not invalidate any of the analysis results
       though.

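       The clusters CSV described in this example lends itself to quick
       summaries. The sketch below counts opcodes per cluster and scheduling
       class. count_by_sched_class is a hypothetical helper, not part of
       llvm-exegesis, and it assumes that no field contains an embedded
       comma, which holds for the sample rows above but is not guaranteed in
       general.

```shell
# Hypothetical helper (not part of llvm-exegesis).
# Reads a clusters CSV on stdin and prints "<cluster_id>,<sched_class>,<count>".
# Assumes plain comma-separated fields with no quoting. The header line and
# any row whose first field is not a number are skipped.
count_by_sched_class() {
  awk -F',' '
    $1 ~ /^[0-9]+$/ { count[$1 "," $4]++ }
    END { for (k in count) print k "," count[k] }
  ' | sort
}
```

       A possible invocation is "count_by_sched_class < /tmp/clusters.csv".
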
OPTIONS

       -help  Print a summary of command line options.

       -opcode-index=<LLVM opcode index>
              Specify the opcode to measure, by index. Specifying -1 will
              result in measuring every existing opcode. See example 1 for
              details. One of -opcode-index, -opcode-name or -snippets-file
              must be set.

       -opcode-name=<opcode name 1>,<opcode name 2>,...
              Specify the opcode to measure, by name. Several opcodes can be
              specified as a comma-separated list. See example 1 for details.
              One of -opcode-index, -opcode-name or -snippets-file must be
              set.

       -snippets-file=<filename>
              Specify the custom code snippet to measure. See example 2 for
              details. One of -opcode-index, -opcode-name or -snippets-file
              must be set.

       -mode=[latency|uops|inverse_throughput|analysis]
              Specify the run mode. Note that if you pick analysis mode, you
              also need to specify at least one of
              -analysis-clusters-output-file= and
              -analysis-inconsistencies-output-file=.

       -repetition-mode=[duplicate|loop|min]
              Specify the repetition mode. duplicate will create a large,
              straight line basic block with num-repetitions copies of the
              snippet. loop will wrap the snippet in a loop which will be run
              num-repetitions times. The loop mode tends to better hide the
              effects of the CPU frontend on architectures that cache decoded
              instructions, but consumes a register for counting iterations.
              If performing an analysis over many opcodes, it may be best to
              instead use the min mode, which runs each of the other modes
              and reports the minimum measured result.

       -num-repetitions=<Number of repetitions>
              Specify the number of repetitions of the asm snippet. Higher
              values lead to more accurate measurements but lengthen the
              benchmark.

       -max-configs-per-opcode=<value>
              Specify the maximum number of configurations that can be
              generated for each opcode. By default this is 1, meaning that
              we assume that a single measurement is enough to characterize
              an opcode. This might not be true of all instructions: for
              example, the performance characteristics of the LEA instruction
              on X86 depend on the value of assigned registers and
              immediates. Setting -max-configs-per-opcode to a value larger
              than 1 allows llvm-exegesis to explore more configurations to
              discover if some register or immediate assignments lead to
              different performance characteristics.

       -benchmarks-file=</path/to/file>
              File to read (analysis mode) or write
              (latency/uops/inverse_throughput modes) benchmark results. "-"
              uses stdin/stdout.

       -analysis-clusters-output-file=</path/to/file>
              If provided, write the analysis clusters as CSV to this file.
              "-" prints to stdout. By default, this analysis is not run.

       -analysis-inconsistencies-output-file=</path/to/file>
              If non-empty, write inconsistencies found during analysis to
              this file. "-" prints to stdout. By default, this analysis is
              not run.

       -analysis-clustering=[dbscan,naive]
              Specify the clustering algorithm to use. By default DBSCAN will
              be used. The naive clustering algorithm is better for doing
              further work on the -analysis-inconsistencies-output-file=
              output: it will create one cluster per opcode, and check that
              the cluster is stable (all points are neighbours).

       -analysis-numpoints=<dbscan numPoints parameter>
              Specify the numPoints parameter to be used for DBSCAN
              clustering (analysis mode, DBSCAN only).

       -analysis-clustering-epsilon=<dbscan epsilon parameter>
              Specify the epsilon parameter used for clustering of benchmark
              points (analysis mode).

       -analysis-inconsistency-epsilon=<epsilon>
              Specify the epsilon parameter used for detection of when the
              cluster is different from the LLVM schedule profile values
              (analysis mode).

       -analysis-display-unstable-clusters
              If there is more than one benchmark for an opcode, said
              benchmarks may end up not being clustered into the same cluster
              if the measured performance characteristics are different. By
              default all such opcodes are filtered out. This flag will
              instead show only such unstable opcodes.

       -ignore-invalid-sched-class=false
              If set, ignore instructions that do not have a sched class
              (class idx = 0).

       -mcpu=<cpu name>
              If set, measure the CPU characteristics using the counters for
              this CPU. This is useful when creating new sched models (when
              the host CPU is unknown to LLVM).

       --dump-object-to-disk=true
              By default, llvm-exegesis will dump the generated code to a
              temporary file to enable code inspection. You may disable it to
              speed up the execution and save disk space.

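       The -mode option requires that analysis mode be given at least one of
       the two analysis output files. A small wrapper can enforce that
       requirement before the tool is started. The sketch below is
       illustrative only: run_analysis and the EXEGESIS variable are
       hypothetical names introduced here, not part of llvm-exegesis.

```shell
# Hypothetical wrapper (not part of llvm-exegesis).
# Usage: run_analysis <benchmarks.yaml> [clusters.csv] [inconsistencies.html]
# Returns 2 without running anything if neither output file is given.
# EXEGESIS may name an alternative binary; it defaults to llvm-exegesis.
run_analysis() {
  ra_benchmarks="$1"
  ra_clusters="${2:-}"
  ra_inconsistencies="${3:-}"

  if [ -z "${ra_clusters}" ] && [ -z "${ra_inconsistencies}" ]; then
    echo "run_analysis: give a clusters and/or an inconsistencies output file" >&2
    return 2
  fi

  # Rebuild the positional parameters as the final command line.
  set -- -mode=analysis "-benchmarks-file=${ra_benchmarks}"
  if [ -n "${ra_clusters}" ]; then
    set -- "$@" "-analysis-clusters-output-file=${ra_clusters}"
  fi
  if [ -n "${ra_inconsistencies}" ]; then
    set -- "$@" "-analysis-inconsistencies-output-file=${ra_inconsistencies}"
  fi

  "${EXEGESIS:-llvm-exegesis}" "$@"
}
```

       For instance, "run_analysis /tmp/benchmarks.yaml /tmp/clusters.csv"
       would request only the clusters CSV.
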
EXIT STATUS

       llvm-exegesis returns 0 on success. Otherwise, an error message is
       printed to standard error, and the tool returns a non-zero value.

AUTHOR

       Maintained by the LLVM Team (https://llvm.org/).

COPYRIGHT

       2003-2021, LLVM Project

11                                2021-07-22                  LLVM-EXEGESIS(1)