llvm-exegesis(1)

LLVM-EXEGESIS(1)                     LLVM                     LLVM-EXEGESIS(1)

NAME

       llvm-exegesis - LLVM Machine Instruction Benchmark

SYNOPSIS

       llvm-exegesis [options]

DESCRIPTION

       llvm-exegesis is a benchmarking tool that uses information available in
       LLVM to measure host machine instruction characteristics such as latency
       or port decomposition.

       Given an LLVM opcode name and a benchmarking mode, llvm-exegesis
       generates a code snippet that makes execution as serial (resp. as
       parallel) as possible, so that the latency (resp. uop decomposition) of
       the instruction can be measured. The code snippet is jitted and executed
       on the host subtarget. The time taken (resp. resource usage) is measured
       using hardware performance counters. The result is printed as YAML to
       the standard output.

       The main goal of this tool is to automatically (in)validate LLVM's
       TableGen scheduling models. To that end, llvm-exegesis also provides an
       analysis of the results.

       llvm-exegesis can also benchmark arbitrary user-provided code snippets.

EXAMPLE 1: BENCHMARKING INSTRUCTIONS

       Assume you have an X86-64 machine. To measure the latency of a single
       instruction, run:

          $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

       Measuring the uop decomposition of an instruction works similarly:

          $ llvm-exegesis -mode=uops -opcode-name=ADD64rr

       The output is a YAML document. By default it is written to stdout, but
       you can write it to a file instead using -benchmarks-file:

          ---
          key:
            opcode_name:     ADD64rr
            mode:            latency
            config:          ''
          cpu_name:        haswell
          llvm_triple:     x86_64-unknown-linux-gnu
          num_repetitions: 10000
          measurements:
            - { key: latency, value: 1.0058, debug_string: '' }
          error:           ''
          info:            'explicit self cycles, selecting one aliasing configuration.
          Snippet:
          ADD64rr R8, R8, R10
          '
          ...

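       The measurement fields in a result like the one above can be pulled out
       with standard text tools. The following is a minimal sketch, not part of
       llvm-exegesis: the function name extract_measurements is hypothetical,
       and it assumes that each measurement is printed on its own line in the
       flow style shown above.

```shell
#!/bin/bash
# Hypothetical helper: print "<opcode> <key> <value>" for every measurement
# found in llvm-exegesis YAML read from standard input.
# Assumes each measurement sits on its own "- { key: ..., value: ..., ... }" line.
extract_measurements() {
  awk '
    $1 == "opcode_name:" { opcode = $2 }
    $1 == "-" && $2 == "{" && $3 == "key:" && $5 == "value:" {
      key = $4; value = $6
      sub(/,$/, "", key)      # drop the trailing comma after the key
      sub(/,$/, "", value)    # drop the trailing comma after the value
      print opcode, key, value
    }
  '
}
```

       If the results were saved to a file with -benchmarks-file, feeding that
       file to extract_measurements on standard input prints one line per
       measurement; for the document above that line is
       "ADD64rr latency 1.0058".
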
       To measure the latency of all instructions for the host architecture,
       run:

          #!/bin/bash
          # Highest opcode index, derived from INSTRUCTION_LIST_END in the
          # generated X86 instruction table.
          readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
          for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
          do
            # Keep only the YAML document, dropping anything printed before "---".
            ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
          done

       FIXME: Provide an llvm-exegesis option to test all instructions.

EXAMPLE 2: BENCHMARKING A CUSTOM CODE SNIPPET

       To measure the latency/uops of a custom piece of code, specify the
       -snippets-file option ("-" reads from standard input):

          $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

       Real-life code snippets typically depend on registers or memory.
       llvm-exegesis checks the liveness of registers (i.e. any register use
       has a corresponding def or is a "live in"). If your code depends on the
       value of some registers, you have two options:

       · Mark the register as requiring a definition. llvm-exegesis will
         automatically assign a value to the register. This can be done using
         the directive LLVM-EXEGESIS-DEFREG <reg name> <hex_value>, where
         <hex_value> is a bit pattern used to fill <reg name>. If <hex_value>
         is smaller than the register width, it will be sign-extended.

       · Mark the register as a "live in". llvm-exegesis will benchmark using
         whatever value was in this register on entry. This can be done using
         the directive LLVM-EXEGESIS-LIVEIN <reg name>.

       For example, the following code snippet depends on the values of XMM1
       (which will be set by the tool) and the memory buffer passed in RDI
       (live in).

          # LLVM-EXEGESIS-LIVEIN RDI
          # LLVM-EXEGESIS-DEFREG XMM1 42
          vmulps        (%rdi), %xmm1, %xmm2
          vhaddps       %xmm2, %xmm2, %xmm3
          addq $0x10, %rdi
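
       A multi-line snippet like the one above can be passed to the tool
       without a temporary file, since -snippets-file=- reads from standard
       input. The following is a minimal sketch, assuming an X86-64 host; the
       function name make_snippet is hypothetical, and the benchmark is only
       attempted if llvm-exegesis is found on PATH.

```shell
#!/bin/bash
# Hypothetical helper: emit the annotated snippet on standard output.
make_snippet() {
  cat <<'EOF'
# LLVM-EXEGESIS-LIVEIN RDI
# LLVM-EXEGESIS-DEFREG XMM1 42
vmulps        (%rdi), %xmm1, %xmm2
vhaddps       %xmm2, %xmm2, %xmm3
addq $0x10, %rdi
EOF
}

# Benchmark the snippet only when the tool is installed; "-" reads stdin.
if command -v llvm-exegesis >/dev/null 2>&1; then
  make_snippet | llvm-exegesis -mode=latency -snippets-file=- \
    || echo "llvm-exegesis failed" >&2
fi
```

       The quoted here-document delimiter ('EOF') keeps the shell from
       expanding $0x10 and the % register names inside the snippet.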

EXAMPLE 3: ANALYSIS

       Assuming you have a set of benchmarked instructions (either latency or
       uops) as YAML in the file /tmp/benchmarks.yaml, you can analyze the
       results using the following command:

          $ llvm-exegesis -mode=analysis \
              -benchmarks-file=/tmp/benchmarks.yaml \
              -analysis-clusters-output-file=/tmp/clusters.csv \
              -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

       This will group the instructions into clusters with the same
       performance characteristics. The clusters will be written out to
       /tmp/clusters.csv in the following format:

          cluster_id,opcode_name,config,sched_class
          ...
          2,ADD32ri8_DB,,WriteALU,1.00
          2,ADD32ri_DB,,WriteALU,1.01
          2,ADD32rr,,WriteALU,1.01
          2,ADD32rr_DB,,WriteALU,1.00
          2,ADD32rr_REV,,WriteALU,1.00
          2,ADD64i32,,WriteALU,1.01
          2,ADD64ri32,,WriteALU,1.01
          2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
          2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
          2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
          2,ADD64ri8,,WriteALU,1.00
          2,SETBr,,WriteSETCC,1.01
          ...

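       To get a quick overview of such a file, the rows can be tallied per
       cluster and scheduling class. This is only a sketch: summarize_clusters
       is a hypothetical helper, not part of llvm-exegesis, and it assumes
       that cluster_id is the first field and sched_class the fourth, as in
       the header above.

```shell
#!/bin/bash
# Hypothetical helper: read a clusters CSV on standard input and print
# "<cluster_id>,<sched_class>,<number of opcodes>" lines, sorted.
summarize_clusters() {
  awk -F, '
    $1 == "cluster_id" || NF < 4 { next }   # skip the header and short lines
    { count[$1 "," $4]++ }                  # tally rows per cluster and class
    END { for (k in count) print k "," count[k] }
  ' | LC_ALL=C sort
}
```

       Run it as summarize_clusters < /tmp/clusters.csv. Sorting with
       LC_ALL=C keeps the output order stable across locales.
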
       llvm-exegesis will also analyze the clusters to point out
       inconsistencies in the scheduling information. The output is an HTML
       file. For example, /tmp/inconsistencies.html will contain messages like
       the following: [image]

       Note that the scheduling class names will be resolved only when
       llvm-exegesis is compiled in debug mode; otherwise only the class id
       will be shown. This does not invalidate any of the analysis results,
       though.

OPTIONS

       -help  Print a summary of command line options.

       -opcode-index=<LLVM opcode index>
              Specify the opcode to measure, by index. See example 1 for
              details. Either opcode-index, opcode-name or snippets-file must
              be set.

       -opcode-name=<opcode name 1>,<opcode name 2>,...
              Specify the opcode to measure, by name. Several opcodes can be
              specified as a comma-separated list. See example 1 for details.
              Either opcode-index, opcode-name or snippets-file must be set.

       -snippets-file=<filename>
              Specify the custom code snippet to measure. See example 2 for
              details. Either opcode-index, opcode-name or snippets-file must
              be set.

       -mode=[latency|uops|analysis]
              Specify the run mode.

       -num-repetitions=<Number of repetitions>
              Specify the number of repetitions of the asm snippet. Higher
              values lead to more accurate measurements but lengthen the
              benchmark.

       -benchmarks-file=</path/to/file>
              File to read (analysis mode) or write (latency/uops modes)
              benchmark results. "-" uses stdin/stdout.

       -analysis-clusters-output-file=</path/to/file>
              If provided, write the analysis clusters as CSV to this file.
              "-" prints to stdout.

       -analysis-inconsistencies-output-file=</path/to/file>
              If non-empty, write inconsistencies found during analysis to
              this file. "-" prints to stdout.

       -analysis-numpoints=<dbscan numPoints parameter>
              Specify the numPoints parameter to be used for DBSCAN
              clustering (analysis mode).

       -analysis-epsilon=<dbscan epsilon parameter>
              Specify the epsilon parameter to be used for DBSCAN clustering
              (analysis mode).

       -ignore-invalid-sched-class=false
              If set, ignore instructions that do not have a sched class
              (class idx = 0).

       -mcpu=<cpu name>
              If set, measure the CPU characteristics using the counters for
              this CPU. This is useful when creating new sched models (the
              host CPU is unknown to LLVM).

EXIT STATUS

       llvm-exegesis returns 0 on success. Otherwise, an error message is
       printed to standard error, and the tool returns a non-zero value.
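
       In scripts, the exit status can be used to tell failed runs apart from
       successful ones, for instance when looping over many opcodes. The
       following is a minimal sketch; run_checked is a hypothetical wrapper
       and not part of llvm-exegesis.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command; on failure, report its exit status
# on standard error and return that same status to the caller.
run_checked() {
  "$@"
  local status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with status $status: $*" >&2
  fi
  return "$status"
}
```

       For example, run_checked llvm-exegesis -mode=latency
       -opcode-name=ADD64rr runs the benchmark and logs the status only if
       the tool reports an error.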

AUTHOR

       Maintained by the LLVM Team (https://llvm.org/).

COPYRIGHT

       2003-2019, LLVM Project

8                                 2019-04-25                  LLVM-EXEGESIS(1)