LLVM-EXEGESIS(1) LLVM LLVM-EXEGESIS(1)

NAME
llvm-exegesis - LLVM Machine Instruction Benchmark

SYNOPSIS
llvm-exegesis [options]

DESCRIPTION
llvm-exegesis is a benchmarking tool that uses information available in
LLVM to measure host machine instruction characteristics such as latency
or port decomposition.

Given an LLVM opcode name and a benchmarking mode, llvm-exegesis
generates a code snippet that makes execution as serial (resp. as
parallel) as possible so that the latency (resp. uop decomposition) of
the instruction can be measured. The code snippet is jitted and executed
on the host subtarget. The time taken (resp. resource usage) is measured
using hardware performance counters. The result is printed as YAML to
the standard output.

The main goal of this tool is to automatically (in)validate LLVM's
TableGen scheduling models. To that end, it also provides analysis of
the results.

llvm-exegesis can also benchmark arbitrary user-provided code snippets.

EXAMPLE 1: BENCHMARKING INSTRUCTIONS
Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

$ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition of an instruction works similarly:

$ llvm-exegesis -mode=uops -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but
you can redirect the output to a file using -benchmarks-file):

---
key:
  opcode_name: ADD64rr
  mode: latency
  config: ''
cpu_name: haswell
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: latency, value: 1.0058, debug_string: '' }
error: ''
info: 'explicit self cycles, selecting one aliasing configuration.
Snippet:
ADD64rr R8, R8, R10
'
...
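
When the results are post-processed in a script, the measured value can
be pulled out of this YAML with standard tools. The following is a
minimal sketch and not part of llvm-exegesis: the helper name
extract_latency is hypothetical, and it assumes the exact field layout
shown above (an opcode_name line and a flow-style measurement entry
containing "key: latency, value: <n>").

```shell
#!/bin/sh
# Hypothetical helper: print "<opcode_name> <latency>" for each benchmark
# document read from stdin. Assumes the YAML layout shown above.
extract_latency() {
  awk '
    # Remember the opcode of the document currently being read.
    $1 == "opcode_name:" { opcode = $2 }
    # The measurement entry looks like: - { key: latency, value: 1.0058, ... }
    /key: latency/ {
      value = $0
      sub(/.*value:[ \t]*/, "", value)   # drop everything up to the number
      sub(/[ \t]*[,}].*/, "", value)     # drop everything after the number
      print opcode, value
    }
  '
}

# Sample input copied from the example output above.
extract_latency <<'EOF'
---
key:
  opcode_name: ADD64rr
  mode: latency
  config: ''
cpu_name: haswell
measurements:
  - { key: latency, value: 1.0058, debug_string: '' }
error: ''
...
EOF
# prints: ADD64rr 1.0058
```

For anything beyond this fixed layout, a real YAML parser is the safer
choice.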

To measure the latency of all instructions for the host architecture,
run:

#!/bin/bash
# The last valid opcode index is INSTRUCTION_LIST_END - 1.
readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
do
  # Keep only the YAML document, i.e. everything from the first '---' on.
  ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
done

FIXME: Provide an llvm-exegesis option to test all instructions.

EXAMPLE 2: BENCHMARKING A CUSTOM CODE SNIPPET
To measure the latency/uops of a custom piece of code, specify the
-snippets-file option (- reads from standard input):

$ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

Real-life code snippets typically depend on registers or memory.
llvm-exegesis checks the liveness of registers (i.e. any register use
has a corresponding def or is a "live in"). If your code depends on the
value of some registers, you have two options:

· Mark the register as requiring a definition. llvm-exegesis will
  automatically assign a value to the register. This can be done using
  the directive LLVM-EXEGESIS-DEFREG <reg_name> <hex_value>, where
  <hex_value> is a bit pattern used to fill <reg_name>. If <hex_value>
  is smaller than the register width, it will be sign-extended.

· Mark the register as a "live in". llvm-exegesis will benchmark using
  whatever value was in this register on entry. This can be done using
  the directive LLVM-EXEGESIS-LIVEIN <reg_name>.

For example, the following code snippet depends on the values of XMM1
(which will be set by the tool) and the memory buffer passed in RDI
(live in).

# LLVM-EXEGESIS-LIVEIN RDI
# LLVM-EXEGESIS-DEFREG XMM1 42
vmulps (%rdi), %xmm1, %xmm2
vhaddps %xmm2, %xmm2, %xmm3
addq $0x10, %rdi
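
To benchmark such an annotated snippet, one option is to save it to a
file and pass that file with -snippets-file. The sketch below is only
an illustration: the temporary file name is an arbitrary choice, and the
tool is invoked only if it is found on PATH.

```shell
#!/bin/sh
# Write the annotated snippet to a temporary file (name chosen for illustration).
snippet_file=$(mktemp "${TMPDIR:-/tmp}/exegesis-snippet.XXXXXX")
cat > "$snippet_file" <<'EOF'
# LLVM-EXEGESIS-LIVEIN RDI
# LLVM-EXEGESIS-DEFREG XMM1 42
vmulps (%rdi), %xmm1, %xmm2
vhaddps %xmm2, %xmm2, %xmm3
addq $0x10, %rdi
EOF

# Run the benchmark only when the tool is installed, so the sketch is safe
# to execute on machines without llvm-exegesis.
if command -v llvm-exegesis >/dev/null 2>&1; then
  llvm-exegesis -mode=latency -snippets-file="$snippet_file" \
    || echo "benchmark failed for $snippet_file" >&2
else
  echo "llvm-exegesis not found; snippet written to $snippet_file"
fi
```

The quoted here-document delimiter ('EOF') keeps the shell from
expanding $0x10 and the % register names inside the snippet.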

EXAMPLE 3: ANALYSIS
Assuming you have a set of benchmarked instructions (either latency or
uops) as YAML in file /tmp/benchmarks.yaml, you can analyze the results
using the following command:

$ llvm-exegesis -mode=analysis \
    -benchmarks-file=/tmp/benchmarks.yaml \
    -analysis-clusters-output-file=/tmp/clusters.csv \
    -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same
performance characteristics. The clusters will be written out to
/tmp/clusters.csv in the following format:

cluster_id,opcode_name,config,sched_class,latency
...
2,ADD32ri8_DB,,WriteALU,1.00
2,ADD32ri_DB,,WriteALU,1.01
2,ADD32rr,,WriteALU,1.01
2,ADD32rr_DB,,WriteALU,1.00
2,ADD32rr_REV,,WriteALU,1.00
2,ADD64i32,,WriteALU,1.01
2,ADD64ri32,,WriteALU,1.01
2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
2,ADD64ri8,,WriteALU,1.00
2,SETBr,,WriteSETCC,1.01
...
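
A quick way to inspect such a file is to count how many opcodes of each
scheduling class landed in each cluster. The following is a rough
sketch, not something llvm-exegesis provides: summarize_clusters is a
hypothetical helper, and it assumes rows shaped like the ones above,
with the cluster id in the first field and the scheduling class in the
fourth.

```shell
#!/bin/sh
# Hypothetical helper: read the clusters CSV on stdin and print
# "<cluster_id>,<sched_class>,<number of opcodes>" for each pair.
summarize_clusters() {
  awk -F, '
    # Skip the header row and any line that does not start with a numeric id.
    $1 !~ /^[0-9]+$/ || NF < 4 { next }
    { count[$1 "," $4]++ }
    END { for (k in count) print k "," count[k] }
  ' | LC_ALL=C sort
}

# Sample rows copied from the listing above.
summarize_clusters <<'EOF'
cluster_id,opcode_name,config,sched_class
2,ADD32rr,,WriteALU,1.01
2,ADD64i32,,WriteALU,1.01
2,SETBr,,WriteSETCC,1.01
EOF
# prints:
# 2,WriteALU,2
# 2,WriteSETCC,1
```

A cluster that spans several scheduling classes, like cluster 2 here,
holds instructions that measured alike even though the model assigns
them to different classes.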

llvm-exegesis will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML
file. For example, /tmp/inconsistencies.html will contain messages like
the following: [image]

Note that the scheduling class names will be resolved only when
llvm-exegesis is compiled in debug mode; otherwise only the class id
will be shown. This does not invalidate any of the analysis results.

OPTIONS
-help
    Print a summary of command line options.

-opcode-index=<LLVM opcode index>
    Specify the opcode to measure, by index. See example 1 for
    details. Either opcode-index, opcode-name or snippets-file must
    be set.

-opcode-name=<opcode name 1>,<opcode name 2>,...
    Specify the opcode to measure, by name. Several opcodes can be
    specified as a comma-separated list. See example 1 for details.
    Either opcode-index, opcode-name or snippets-file must be set.

-snippets-file=<filename>
    Specify the custom code snippet to measure. See example 2 for
    details. Either opcode-index, opcode-name or snippets-file must
    be set.

-mode=[latency|uops|analysis]
    Specify the run mode.

-num-repetitions=<Number of repetitions>
    Specify the number of repetitions of the asm snippet. Higher
    values lead to more accurate measurements but lengthen the
    benchmark.

-benchmarks-file=</path/to/file>
    File to read (analysis mode) or write (latency/uops modes)
    benchmark results. "-" uses stdin/stdout.

-analysis-clusters-output-file=</path/to/file>
    If provided, write the analysis clusters as CSV to this file.
    "-" prints to stdout.

-analysis-inconsistencies-output-file=</path/to/file>
    If non-empty, write inconsistencies found during analysis to
    this file. "-" prints to stdout.

-analysis-numpoints=<dbscan numPoints parameter>
    Specify the numPoints parameter to be used for DBSCAN
    clustering (analysis mode).

-analysis-epsilon=<dbscan epsilon parameter>
    Specify the epsilon parameter to be used for DBSCAN
    clustering (analysis mode).

-ignore-invalid-sched-class=false
    If set, ignore instructions that do not have a sched class
    (class idx = 0).

-mcpu=<cpu name>
    If set, measure the CPU characteristics using the counters for
    this CPU. This is useful when creating new sched models (the
    host CPU is unknown to LLVM).
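
Since -opcode-name takes a comma-separated list, a script that holds
opcode names as separate words needs to join them first. The sketch
below shows one way to do that in plain sh. join_opcodes is a
hypothetical helper, the opcode names are taken from the cluster
listing in example 3, and the final command is only printed, not run.

```shell
#!/bin/sh
# Hypothetical helper: join its arguments with commas, producing the
# list form accepted by -opcode-name.
join_opcodes() {
  # "$*" joins the arguments using the first character of IFS; the
  # subshell keeps the IFS change local.
  ( IFS=,; printf '%s\n' "$*" )
}

opcodes=$(join_opcodes ADD64rr ADD32rr SETBr)

# Print the resulting command line instead of executing it.
echo "llvm-exegesis -mode=latency -opcode-name=$opcodes -benchmarks-file=/tmp/benchmarks.yaml"
# prints: llvm-exegesis -mode=latency -opcode-name=ADD64rr,ADD32rr,SETBr -benchmarks-file=/tmp/benchmarks.yaml
```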

EXIT STATUS
llvm-exegesis returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.
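
In scripts, this exit status can be used to detect failed runs. The
following is a minimal sketch of such a check. run_checked is a
hypothetical wrapper that works with any command, and the llvm-exegesis
invocation in the trailing comment is only an illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, report a non-zero exit status on
# standard error, and pass that status on to the caller.
run_checked() {
  if "$@"; then
    return 0
  else
    status=$?
    echo "command failed with status $status: $*" >&2
    return "$status"
  fi
}

# Possible usage (assumes llvm-exegesis is on PATH):
# run_checked llvm-exegesis -mode=latency -opcode-name=ADD64rr || exit 1
```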

AUTHOR
Maintained by the LLVM Team (https://llvm.org/).

COPYRIGHT
2003-2019, LLVM Project

8 2019-04-25 LLVM-EXEGESIS(1)