llvm-exegesis-10(1)

LLVM-EXEGESIS(1)                     LLVM                     LLVM-EXEGESIS(1)

NAME

       llvm-exegesis - LLVM Machine Instruction Benchmark

SYNOPSIS

       llvm-exegesis [options]

DESCRIPTION

       llvm-exegesis is a benchmarking tool that uses information available in
       LLVM to measure host machine instruction characteristics such as
       latency, throughput, or port decomposition.

       Given an LLVM opcode name and a benchmarking mode, llvm-exegesis
       generates a code snippet that makes execution as serial (resp. as
       parallel) as possible so that we can measure the latency (resp. inverse
       throughput/uop decomposition) of the instruction. The code snippet is
       jitted and executed on the host subtarget. The time taken (resp.
       resource usage) is measured using hardware performance counters. The
       result is printed out as YAML to the standard output.

       The main goal of this tool is to automatically (in)validate LLVM's
       TableGen scheduling models. To that end, we also provide analysis of
       the results.

       llvm-exegesis can also benchmark arbitrary user-provided code snippets.

EXAMPLE 1: BENCHMARKING INSTRUCTIONS

       Assume you have an X86-64 machine. To measure the latency of a single
       instruction, run:

          $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

       Measuring the uop decomposition or inverse throughput of an instruction
       works similarly:

          $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
          $ llvm-exegesis -mode=inverse_throughput -opcode-name=ADD64rr

       The output is a YAML document (the default is to write to stdout, but
       you can redirect the output to a file using -benchmarks-file):

          ---
          key:
            opcode_name:     ADD64rr
            mode:            latency
            config:          ''
          cpu_name:        haswell
          llvm_triple:     x86_64-unknown-linux-gnu
          num_repetitions: 10000
          measurements:
            - { key: latency, value: 1.0058, debug_string: '' }
          error:           ''
          info:            'explicit self cycles, selecting one aliasing configuration.
          Snippet:
          ADD64rr R8, R8, R10
          '
          ...
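       
       Because each measurement sits on a single line, its value can be
       pulled out of the YAML with standard text tools. The following is a
       minimal sketch rather than a feature of llvm-exegesis: the function
       name extract_measurement is hypothetical, and it assumes the one-line
       "- { key: ..., value: ... }" layout of the sample output above.

```shell
#!/bin/sh
# extract_measurement KEY: read llvm-exegesis YAML on stdin and print the
# value of every measurement whose key is KEY. Hypothetical helper; assumes
# the one-line "- { key: K, value: V, ... }" layout shown above.
extract_measurement() {
  sed -n "s/^ *- { key: $1, value: \([^,]*\),.*/\1/p"
}

# Demo on an abbreviated copy of the sample document from this page.
printf '%s\n' \
  '---' \
  'key:' \
  '  opcode_name:     ADD64rr' \
  '  mode:            latency' \
  'measurements:' \
  "  - { key: latency, value: 1.0058, debug_string: '' }" \
  '...' | extract_measurement latency
```

       On the sample above this prints 1.0058. With the tool installed, the
       same function can be placed at the end of a pipeline that starts with
       llvm-exegesis -mode=latency -opcode-name=ADD64rr.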
       
       To measure the latency of all instructions for the host architecture,
       run:

          #!/bin/bash
          # The generated instruction enum ends with INSTRUCTION_LIST_END;
          # its value minus one is the index of the last opcode.
          readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
          for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
          do
            # Benchmark one opcode; print from the first '---' line onward.
            ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
          done

       FIXME: Provide an llvm-exegesis option to test all instructions.

EXAMPLE 2: BENCHMARKING A CUSTOM CODE SNIPPET

       To measure the latency/uops of a custom piece of code, you can specify
       the -snippets-file option ("-" reads from standard input).

          $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

       Real-life code snippets typically depend on registers or memory.
       llvm-exegesis checks the liveness of registers (i.e. any register use
       has a corresponding def or is a "live in"). If your code depends on the
       value of some registers, you have two options:

       • Mark the register as requiring a definition. llvm-exegesis will
         automatically assign a value to the register. This can be done using
         the directive LLVM-EXEGESIS-DEFREG <reg_name> <hex_value>, where
         <hex_value> is a bit pattern used to fill <reg_name>. If <hex_value>
         is smaller than the register width, it will be sign-extended.

       • Mark the register as a "live in". llvm-exegesis will benchmark using
         whatever value was in this register on entry. This can be done using
         the directive LLVM-EXEGESIS-LIVEIN <reg_name>.

       For example, the following code snippet depends on the values of XMM1
       (which will be set by the tool) and the memory buffer passed in RDI
       (live in).

          # LLVM-EXEGESIS-LIVEIN RDI
          # LLVM-EXEGESIS-DEFREG XMM1 42
          vmulps        (%rdi), %xmm1, %xmm2
          vhaddps       %xmm2, %xmm2, %xmm3
          addq $0x10, %rdi
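       
       One way to reuse such a snippet is to store it in a file and pass the
       file name to -snippets-file. A minimal sketch follows; the path
       /tmp/snippet.s and the guard around the tool invocation are
       assumptions made for illustration, and the benchmark only runs if
       llvm-exegesis is found on PATH.

```shell
#!/bin/sh
# Store the snippet from the example above in a file (hypothetical path).
SNIPPET_FILE=/tmp/snippet.s
printf '%s\n' \
  '# LLVM-EXEGESIS-LIVEIN RDI' \
  '# LLVM-EXEGESIS-DEFREG XMM1 42' \
  'vmulps        (%rdi), %xmm1, %xmm2' \
  'vhaddps       %xmm2, %xmm2, %xmm3' \
  'addq $0x10, %rdi' > "$SNIPPET_FILE"

# Run the benchmark only when the tool is installed.
if command -v llvm-exegesis >/dev/null 2>&1; then
  llvm-exegesis -mode=latency -snippets-file="$SNIPPET_FILE"
else
  echo "llvm-exegesis not found; wrote $SNIPPET_FILE only" >&2
fi
```

       Keeping the directives in the same file as the instructions means the
       register setup travels with the snippet.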

EXAMPLE 3: ANALYSIS

       Assuming you have a set of benchmarked instructions (either latency or
       uops) as YAML in file /tmp/benchmarks.yaml, you can analyze the results
       using the following command:

          $ llvm-exegesis -mode=analysis \
                -benchmarks-file=/tmp/benchmarks.yaml \
                -analysis-clusters-output-file=/tmp/clusters.csv \
                -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

       This will group the instructions into clusters with the same
       performance characteristics. The clusters will be written out to
       /tmp/clusters.csv in the following format (the last field of each row
       is the measured value):

          cluster_id,opcode_name,config,sched_class
          ...
          2,ADD32ri8_DB,,WriteALU,1.00
          2,ADD32ri_DB,,WriteALU,1.01
          2,ADD32rr,,WriteALU,1.01
          2,ADD32rr_DB,,WriteALU,1.00
          2,ADD32rr_REV,,WriteALU,1.00
          2,ADD64i32,,WriteALU,1.01
          2,ADD64ri32,,WriteALU,1.01
          2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
          2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
          2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
          2,ADD64ri8,,WriteALU,1.00
          2,SETBr,,WriteSETCC,1.01
          ...
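       
       The clusters file is easy to post-process. As a rough sketch, the
       following counts how many benchmarked opcodes fall under each
       sched_class. The helper name opcodes_per_sched_class is hypothetical,
       and the script assumes that no field contains an embedded comma.

```shell
#!/bin/sh
# opcodes_per_sched_class: read a clusters CSV on stdin and print
# "<sched_class> <count>" lines, sorted by class name.
# Hypothetical helper; assumes plain comma-separated fields and skips the
# header row and the "..." elision markers used on this page.
opcodes_per_sched_class() {
  awk -F, 'NF >= 4 && $1 != "cluster_id" { count[$4]++ }
           END { for (c in count) print c, count[c] }' | sort
}

# Demo on a few rows taken from the listing above.
printf '%s\n' \
  'cluster_id,opcode_name,config,sched_class' \
  '2,ADD32rr,,WriteALU,1.01' \
  '2,ADD64i32,,WriteALU,1.01' \
  '2,SETBr,,WriteSETCC,1.01' | opcodes_per_sched_class
```

       For the three demo rows this prints "WriteALU 2" followed by
       "WriteSETCC 1".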
       
       llvm-exegesis will also analyze the clusters to point out
       inconsistencies in the scheduling information. The output is an HTML
       file. For example, /tmp/inconsistencies.html will contain messages like
       the following: [image]

       Note that the scheduling class names will be resolved only when
       llvm-exegesis is compiled in debug mode; otherwise only the class id
       will be shown. This does not invalidate any of the analysis results.

OPTIONS

       -help  Print a summary of command line options.

       -opcode-index=<LLVM opcode index>
              Specify the opcode to measure, by index. See example 1 for
              details. One of -opcode-index, -opcode-name or -snippets-file
              must be set.

       -opcode-name=<opcode name 1>,<opcode name 2>,...
              Specify the opcode to measure, by name. Several opcodes can be
              specified as a comma-separated list. See example 1 for details.
              One of -opcode-index, -opcode-name or -snippets-file must be
              set.

       -snippets-file=<filename>
              Specify the custom code snippet to measure. See example 2 for
              details. One of -opcode-index, -opcode-name or -snippets-file
              must be set.

       -mode=[latency|uops|inverse_throughput|analysis]
              Specify the run mode. Note that if you pick analysis mode, you
              also need to specify at least one of the
              -analysis-clusters-output-file= and
              -analysis-inconsistencies-output-file= options.
       
       -num-repetitions=<Number of repetitions>
              Specify the number of repetitions of the asm snippet. Higher
              values lead to more accurate measurements but lengthen the
              benchmark.

       -max-configs-per-opcode=<value>
              Specify the maximum number of configurations that can be
              generated for each opcode. By default this is 1, meaning that we
              assume that a single measurement is enough to characterize an
              opcode. This might not be true of all instructions: for example,
              the performance characteristics of the LEA instruction on X86
              depend on the value of assigned registers and immediates.
              Setting -max-configs-per-opcode to a value larger than 1 allows
              llvm-exegesis to explore more configurations to discover if some
              register or immediate assignments lead to different performance
              characteristics.

       -benchmarks-file=</path/to/file>
              File to read (analysis mode) or write
              (latency/uops/inverse_throughput modes) benchmark results. "-"
              uses stdin/stdout.

       -analysis-clusters-output-file=</path/to/file>
              If provided, write the analysis clusters as CSV to this file.
              "-" prints to stdout. By default, this analysis is not run.

       -analysis-inconsistencies-output-file=</path/to/file>
              If non-empty, write inconsistencies found during analysis to
              this file. "-" prints to stdout. By default, this analysis is
              not run.
       
       -analysis-clustering=[dbscan|naive]
              Specify the clustering algorithm to use. By default, DBSCAN is
              used. The naive clustering algorithm is better suited to further
              work on the -analysis-inconsistencies-output-file= output: it
              creates one cluster per opcode and checks that the cluster is
              stable (all points are neighbours).

       -analysis-numpoints=<dbscan numPoints parameter>
              Specify the numPoints parameter to be used for DBSCAN
              clustering (analysis mode, DBSCAN only).

       -analysis-clustering-epsilon=<dbscan epsilon parameter>
              Specify the epsilon parameter used for clustering of benchmark
              points (analysis mode).

       -analysis-inconsistency-epsilon=<epsilon>
              Specify the epsilon parameter used to detect when a cluster
              differs from the LLVM schedule profile values (analysis mode).

       -analysis-display-unstable-clusters
              If there is more than one benchmark for an opcode, said
              benchmarks may end up not being clustered into the same cluster
              if the measured performance characteristics are different. By
              default all such opcodes are filtered out. This flag will
              instead show only such unstable opcodes.

       -ignore-invalid-sched-class=false
              If set, ignore instructions that do not have a sched class
              (class idx = 0).

       -mcpu=<cpu name>
              If set, measure the CPU characteristics using the counters for
              this CPU. This is useful when creating new sched models (the
              host CPU is unknown to LLVM).

       --dump-object-to-disk=true
              By default, llvm-exegesis will dump the generated code to a
              temporary file to enable code inspection. You may disable it to
              speed up the execution and save disk space.
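       
       Several of these options combine on one command line. The sketch
       below assembles such a command from a list of opcode names, joined
       with commas for -opcode-name. build_exegesis_cmd is a hypothetical
       helper, the repetition count and output path are example values, and
       the command is printed rather than executed so that it can be
       inspected first.

```shell
#!/bin/sh
# build_exegesis_cmd MODE OPCODE...: print an llvm-exegesis command line that
# benchmarks all given opcodes in one run. Hypothetical helper; only options
# documented above are used, and the values are placeholders.
build_exegesis_cmd() {
  mode=$1
  shift
  # Join the remaining arguments with commas for -opcode-name.
  opcodes=$(IFS=,; printf '%s' "$*")
  printf 'llvm-exegesis -mode=%s -opcode-name=%s -num-repetitions=%s -benchmarks-file=%s\n' \
    "$mode" "$opcodes" 10000 /tmp/benchmarks.yaml
}

build_exegesis_cmd latency ADD64rr ADD32rr
```

       The printed line can be pasted into a shell once it looks right.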

EXIT STATUS

       llvm-exegesis returns 0 on success. Otherwise, an error message is
       printed to standard error, and the tool returns a non-zero value.
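       
       In scripts, the exit status is enough to separate successful runs
       from failed ones. A minimal sketch, assuming a hypothetical wrapper
       named run_checked that takes the full command as its arguments:

```shell
#!/bin/sh
# run_checked CMD [ARG...]: run a command and report a non-zero exit status
# on standard error. Hypothetical wrapper, not part of llvm-exegesis.
run_checked() {
  "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with status $status: $*" >&2
  fi
  return "$status"
}

# Example use (requires llvm-exegesis on PATH):
#   run_checked llvm-exegesis -mode=latency -opcode-name=ADD64rr
```

       The wrapper passes the original status through, so it can be used
       inside a loop over opcodes without hiding failures from the caller.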

AUTHOR

       Maintained by the LLVM Team (https://llvm.org/).

COPYRIGHT

       2003-2021, LLVM Project

10                                2021-07-22                  LLVM-EXEGESIS(1)