LLVM-EXEGESIS(1)                     LLVM                     LLVM-EXEGESIS(1)

NAME
llvm-exegesis - LLVM Machine Instruction Benchmark

SYNOPSIS
llvm-exegesis [options]

DESCRIPTION
llvm-exegesis is a benchmarking tool that uses information available in
LLVM to measure host machine instruction characteristics like latency,
throughput, or port decomposition.

Given an LLVM opcode name and a benchmarking mode, llvm-exegesis
generates a code snippet that makes execution as serial (resp. as
parallel) as possible so that we can measure the latency (resp. inverse
throughput/uop decomposition) of the instruction. The code snippet is
jitted and executed on the host subtarget. The time taken (resp.
resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's
TableGen scheduling models. To that end, we also provide analysis of
the results.

llvm-exegesis can also benchmark arbitrary user-provided code snippets.

EXAMPLE 1: BENCHMARKING INSTRUCTIONS
Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

    $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition or inverse throughput of an instruction
works similarly:

    $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
    $ llvm-exegesis -mode=inverse_throughput -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but
you can redirect the output to a file using -benchmarks-file):

    ---
    key:
      opcode_name: ADD64rr
      mode: latency
      config: ''
    cpu_name: haswell
    llvm_triple: x86_64-unknown-linux-gnu
    num_repetitions: 10000
    measurements:
      - { key: latency, value: 1.0058, debug_string: '' }
    error: ''
    info: 'explicit self cycles, selecting one aliasing configuration.
    Snippet:
    ADD64rr R8, R8, R10
    '
    ...
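
When only the measured number is needed, the value can be pulled out of the YAML with a line-oriented filter. The following is a minimal sketch, not a YAML parser: it assumes each measurement is written on a single line in the flow style shown above, and the function name extract_measurement is hypothetical.

```shell
#!/bin/bash
# Print the value of every measurement whose key matches $1.
# Reads llvm-exegesis YAML output on stdin. Assumes the one-line
# "{ key: ..., value: ..., debug_string: ... }" layout shown above.
extract_measurement() {
  local key="$1"
  sed -n "s/.*key: ${key}, value: \([^,]*\),.*/\1/p"
}
```

For example, `llvm-exegesis -mode=latency -opcode-name=ADD64rr | extract_measurement latency` would print one value per benchmarked configuration.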

To measure the latency of all instructions for the host architecture,
run:

    #!/bin/bash
    # Highest opcode index: one less than INSTRUCTION_LIST_END in the generated table.
    readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
    for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
    do
      # Keep only the YAML document, dropping anything printed before '---'.
      ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
    done

FIXME: Provide an llvm-exegesis option to test all instructions.

EXAMPLE 2: BENCHMARKING A CUSTOM CODE SNIPPET
To measure the latency/uops of a custom piece of code, you can specify
the snippets-file option (- reads from standard input).

    $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

Real-life code snippets typically depend on registers or memory.
llvm-exegesis checks the liveness of registers (i.e. any register use
has a corresponding def or is a "live in"). If your code depends on the
value of some registers, you have two options:

• Mark the register as requiring a definition. llvm-exegesis will
  automatically assign a value to the register. This can be done using
  the directive LLVM-EXEGESIS-DEFREG <reg name> <hex_value>, where
  <hex_value> is a bit pattern used to fill <reg name>. If <hex_value>
  is smaller than the register width, it will be sign-extended.

• Mark the register as a "live in". llvm-exegesis will benchmark using
  whatever value was in this register on entry. This can be done using
  the directive LLVM-EXEGESIS-LIVEIN <reg name>.

For example, the following code snippet depends on the values of XMM1
(which will be set by the tool) and the memory buffer passed in RDI
(live in).

    # LLVM-EXEGESIS-LIVEIN RDI
    # LLVM-EXEGESIS-DEFREG XMM1 42
    vmulps (%rdi), %xmm1, %xmm2
    vhaddps %xmm2, %xmm2, %xmm3
    addq $0x10, %rdi
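
Longer snippets are easier to keep in a file than to pipe through echo. The sketch below writes the annotated snippet above to a file and passes it via -snippets-file. The function name bench_snippet and the EXEGESIS variable (an override for the binary path, handy for dry runs) are assumptions of this example, not features of llvm-exegesis. Latency mode is chosen arbitrarily.

```shell
#!/bin/bash
# Hypothetical override for the binary to run; defaults to llvm-exegesis on PATH.
EXEGESIS="${EXEGESIS:-llvm-exegesis}"

# Write the annotated snippet to the file named by $1, then benchmark it.
bench_snippet() {
  local snippet_file="$1"
  cat > "$snippet_file" <<'EOF'
# LLVM-EXEGESIS-LIVEIN RDI
# LLVM-EXEGESIS-DEFREG XMM1 42
vmulps (%rdi), %xmm1, %xmm2
vhaddps %xmm2, %xmm2, %xmm3
addq $0x10, %rdi
EOF
  "$EXEGESIS" -mode=latency -snippets-file="$snippet_file"
}
```

Calling `bench_snippet /tmp/snippet.s` runs the benchmark. Setting `EXEGESIS=echo` first prints the command line instead of executing it.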

EXAMPLE 3: ANALYSIS
Assuming you have a set of benchmarked instructions (either latency or
uops) as YAML in file /tmp/benchmarks.yaml, you can analyze the results
using the following command:

    $ llvm-exegesis -mode=analysis \
          -benchmarks-file=/tmp/benchmarks.yaml \
          -analysis-clusters-output-file=/tmp/clusters.csv \
          -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same
performance characteristics. The clusters will be written out to
/tmp/clusters.csv in the following format:

    cluster_id,opcode_name,config,sched_class
    ...
    2,ADD32ri8_DB,,WriteALU,1.00
    2,ADD32ri_DB,,WriteALU,1.01
    2,ADD32rr,,WriteALU,1.01
    2,ADD32rr_DB,,WriteALU,1.00
    2,ADD32rr_REV,,WriteALU,1.00
    2,ADD64i32,,WriteALU,1.01
    2,ADD64ri32,,WriteALU,1.01
    2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
    2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
    2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
    2,ADD64ri8,,WriteALU,1.00
    2,SETBr,,WriteSETCC,1.01
    ...
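
A quick way to get a feel for such a file is to count how many opcodes of each scheduling class landed in each cluster. The following is a rough sketch, assuming the comma-separated layout shown above with the cluster id in the first field and the scheduling class in the fourth. The function name summarize_clusters is hypothetical.

```shell
#!/bin/bash
# Print "cluster_id,sched_class,count" for each (cluster, class) pair.
# Reads the files given as arguments, or stdin when none are given.
# Lines whose first field is not a number (header, "...") are skipped.
summarize_clusters() {
  awk -F, '
    $1 ~ /^[0-9]+$/ && NF >= 4 { count[$1 "," $4]++ }
    END { for (k in count) print k "," count[k] }
  ' "$@" | LC_ALL=C sort
}
```

For example, `summarize_clusters /tmp/clusters.csv` lists one line per pair. A cluster that mixes several scheduling classes is not necessarily wrong, since different classes may legitimately share the same performance characteristics.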

llvm-exegesis will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML
file. For example, /tmp/inconsistencies.html will contain messages like
the following: [image]

Note that the scheduling class names will be resolved only when
llvm-exegesis is compiled in debug mode; otherwise only the class id
will be shown. This does not invalidate any of the analysis results.

OPTIONS
-help  Print a summary of command line options.

-opcode-index=<LLVM opcode index>
    Specify the opcode to measure, by index. Specifying -1 will result
    in measuring every existing opcode. See example 1 for details.
    Either opcode-index, opcode-name or snippets-file must be set.

-opcode-name=<opcode name 1>,<opcode name 2>,...
    Specify the opcode to measure, by name. Several opcodes can be
    specified as a comma-separated list. See example 1 for details.
    Either opcode-index, opcode-name or snippets-file must be set.

-snippets-file=<filename>
    Specify the custom code snippet to measure. See example 2 for
    details. Either opcode-index, opcode-name or snippets-file must
    be set.

-mode=[latency|uops|inverse_throughput|analysis]
    Specify the run mode. Note that if you pick analysis mode, you
    also need to specify at least one of the
    -analysis-clusters-output-file= and
    -analysis-inconsistencies-output-file=.

-repetition-mode=[duplicate|loop|min]
    Specify the repetition mode. duplicate will create a large,
    straight line basic block with num-repetitions copies of the
    snippet. loop will wrap the snippet in a loop which will be run
    num-repetitions times. The loop mode tends to better hide the
    effects of the CPU frontend on architectures that cache decoded
    instructions, but consumes a register for counting iterations.
    If performing an analysis over many opcodes, it may be best to
    instead use the min mode, which runs each of the other modes and
    reports the minimal measured result.

-num-repetitions=<Number of repetitions>
    Specify the number of repetitions of the asm snippet. Higher
    values lead to more accurate measurements but lengthen the
    benchmark.

-max-configs-per-opcode=<value>
    Specify the maximum configurations that can be generated for
    each opcode. By default this is 1, meaning that we assume that
    a single measurement is enough to characterize an opcode. This
    might not be true of all instructions: for example, the
    performance characteristics of the LEA instruction on X86 depend
    on the value of assigned registers and immediates. Setting a
    value of -max-configs-per-opcode larger than 1 allows
    llvm-exegesis to explore more configurations to discover if some
    register or immediate assignments lead to different performance
    characteristics.

-benchmarks-file=</path/to/file>
    File to read (analysis mode) or write
    (latency/uops/inverse_throughput modes) benchmark results. "-"
    uses stdin/stdout.

-analysis-clusters-output-file=</path/to/file>
    If provided, write the analysis clusters as CSV to this file.
    "-" prints to stdout. By default, this analysis is not run.

-analysis-inconsistencies-output-file=</path/to/file>
    If non-empty, write inconsistencies found during analysis to
    this file. "-" prints to stdout. By default, this analysis is not
    run.

-analysis-clustering=[dbscan,naive]
    Specify the clustering algorithm to use. By default DBSCAN will
    be used. The naive clustering algorithm is better for doing
    further work on the -analysis-inconsistencies-output-file=
    output: it will create one cluster per opcode, and check that
    the cluster is stable (all points are neighbours).

-analysis-numpoints=<dbscan numPoints parameter>
    Specify the numPoints parameter to be used for DBSCAN clustering
    (analysis mode, DBSCAN only).

-analysis-clustering-epsilon=<dbscan epsilon parameter>
    Specify the epsilon parameter used for clustering of benchmark
    points (analysis mode).

-analysis-inconsistency-epsilon=<epsilon>
    Specify the epsilon parameter used for detection of when the
    cluster is different from the LLVM schedule profile values
    (analysis mode).

-analysis-display-unstable-clusters
    If there is more than one benchmark for an opcode, said
    benchmarks may end up not being clustered into the same cluster
    if the measured performance characteristics are different. By
    default all such opcodes are filtered out. This flag will instead
    show only such unstable opcodes.

-ignore-invalid-sched-class=false
    If set, ignore instructions that do not have a sched class
    (class idx = 0).

-mcpu=<cpu name>
    If set, measure the cpu characteristics using the counters for
    this CPU. This is useful when creating new sched models (the
    host CPU is unknown to LLVM).

--dump-object-to-disk=true
    By default, llvm-exegesis will dump the generated code to a
    temporary file to enable code inspection. You may disable it to
    speed up the execution and save disk space.
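
Several of the options above are typically combined. The sketch below chains a benchmark run and an analysis run, using only flags described in this section. The specific choices (latency mode, a limit of 4 configurations per opcode, naive clustering, the output file names) are arbitrary assumptions for illustration, as are the function name benchmark_then_analyze and the EXEGESIS variable used to override the binary path.

```shell
#!/bin/bash
# Hypothetical override for the binary to run; defaults to llvm-exegesis on PATH.
EXEGESIS="${EXEGESIS:-llvm-exegesis}"

# $1: comma-separated opcode names, $2: directory for result files.
benchmark_then_analyze() {
  local opcodes="$1" workdir="$2"
  # Benchmark step: try both repetition modes and keep the minimum,
  # exploring up to 4 register/immediate configurations per opcode.
  "$EXEGESIS" -mode=latency -opcode-name="$opcodes" \
    -repetition-mode=min -max-configs-per-opcode=4 \
    -benchmarks-file="$workdir/benchmarks.yaml" || return 1
  # Analysis step: one cluster per opcode, checked for stability.
  "$EXEGESIS" -mode=analysis \
    -benchmarks-file="$workdir/benchmarks.yaml" \
    -analysis-clustering=naive \
    -analysis-clusters-output-file="$workdir/clusters.csv" \
    -analysis-inconsistencies-output-file="$workdir/inconsistencies.html"
}
```

A call such as `benchmark_then_analyze ADD64rr,LEA64r /tmp` would leave the results in /tmp. With `EXEGESIS=echo` the two command lines are printed rather than executed, which is useful for checking the flags before a long run.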

EXIT STATUS
llvm-exegesis returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a nonzero value.
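
In scripts that benchmark many opcodes, the exit status can be used to record failures without aborting the whole run. The following is a minimal sketch; the wrapper name run_checked is hypothetical.

```shell
#!/bin/bash
# Run the given command; on a nonzero exit status, report it on
# standard error and pass the status back to the caller.
run_checked() {
  "$@"
  local status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with status $status: $*" >&2
  fi
  return "$status"
}
```

For example, `run_checked llvm-exegesis -mode=latency -opcode-name=ADD64rr || echo ADD64rr >> failed.txt` would log the opcode name when the benchmark fails.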

AUTHOR
Maintained by the LLVM Team (https://llvm.org/).

COPYRIGHT
2003-2021, LLVM Project

11                              2021-07-22                    LLVM-EXEGESIS(1)