LLVM-EXEGESIS(1) LLVM LLVM-EXEGESIS(1)

llvm-exegesis - LLVM Machine Instruction Benchmark

llvm-exegesis [options]

llvm-exegesis is a benchmarking tool that uses information available in
LLVM to measure host machine instruction characteristics like latency
or port decomposition.

Given an LLVM opcode name and a benchmarking mode, llvm-exegesis
generates a code snippet that makes execution as serial as possible (to
measure latency) or as parallel as possible (to measure uop
decomposition). The code snippet is JIT-compiled and executed on the
host subtarget. The time taken (for latency) or the resource usage (for
uops) is measured using hardware performance counters. The result is
printed as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's
TableGen scheduling models. To that end, we also provide analysis of
the results.

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

   $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition of an instruction works similarly:

   $ llvm-exegesis -mode=uops -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but
you can redirect the output to a file using -benchmarks-file):

   ---
   key:
     opcode_name: ADD64rr
     mode: latency
     config: ''
   cpu_name: haswell
   llvm_triple: x86_64-unknown-linux-gnu
   num_repetitions: 10000
   measurements:
     - { key: latency, value: 1.0058, debug_string: '' }
   error: ''
   info: 'explicit self cycles, selecting one aliasing configuration.
   Snippet:
   ADD64rr R8, R8, R10
   '
   ...
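
To post-process such a report, the opcode name and the measurements can be pulled out with a few lines of Python. The following is a minimal sketch that assumes the report layout shown above. The function name extract_measurements is hypothetical, and the regular expressions are a stand-in for a real YAML parser, which would be more robust.

```python
import re

# Sample report copied from the example above.
REPORT = """\
---
key:
  opcode_name: ADD64rr
  mode: latency
  config: ''
cpu_name: haswell
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: latency, value: 1.0058, debug_string: '' }
error: ''
info: 'explicit self cycles, selecting one aliasing configuration.
Snippet:
ADD64rr R8, R8, R10
'
...
"""

# Assumption: one "opcode_name:" line and flow-style measurement entries,
# exactly as in the sample. Other layouts are not handled.
OPCODE_RE = re.compile(r"^\s*opcode_name:\s*(\S+)", re.MULTILINE)
MEASUREMENT_RE = re.compile(r"\{\s*key:\s*([^,]+?)\s*,\s*value:\s*([-+0-9.eE]+)")


def extract_measurements(report):
    """Return (opcode_name, {measurement_key: value}) for one report."""
    opcode = OPCODE_RE.search(report)
    values = {key: float(value) for key, value in MEASUREMENT_RE.findall(report)}
    return (opcode.group(1) if opcode else None), values


opcode, values = extract_measurements(REPORT)
print(opcode, values)
```

The same function can be applied to the contents of a file written with -benchmarks-file, one YAML document at a time.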

To measure the latency of all instructions for the host architecture,
run:

   #!/bin/bash
   # Highest opcode index, read from the generated X86 instruction table.
   readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
   for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
   do
     # Benchmark one opcode; sed keeps only the YAML document (from '---' on).
     ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
   done

FIXME: Provide an llvm-exegesis option to test all instructions.

Assuming you have a set of benchmarked instructions (either latency or
uops) as YAML in file /tmp/benchmarks.yaml, you can analyze the results
using the following command:

   $ llvm-exegesis -mode=analysis \
       -benchmarks-file=/tmp/benchmarks.yaml \
       -analysis-clusters-output-file=/tmp/clusters.csv \
       -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same
performance characteristics. The clusters will be written out to
/tmp/clusters.csv in the following format:

   cluster_id,opcode_name,config,sched_class,latency
   ...
   2,ADD32ri8_DB,,WriteALU,1.00
   2,ADD32ri_DB,,WriteALU,1.01
   2,ADD32rr,,WriteALU,1.01
   2,ADD32rr_DB,,WriteALU,1.00
   2,ADD32rr_REV,,WriteALU,1.00
   2,ADD64i32,,WriteALU,1.01
   2,ADD64ri32,,WriteALU,1.01
   2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
   2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
   2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
   2,ADD64ri8,,WriteALU,1.00
   2,SETBr,,WriteSETCC,1.01
   ...
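
A clusters file in this format is easy to inspect by script. The sketch below, in Python, maps each scheduling class to the set of cluster ids its opcodes landed in, and lists the classes that are split across several clusters. This is only one plausible way to look for suspicious entries and is not the analysis llvm-exegesis itself performs. The function names are hypothetical, and the column order is assumed to match the rows above.

```python
import csv
import io
from collections import defaultdict

# A few data rows copied from the sample clusters file above.
SAMPLE = """\
2,ADD32ri8_DB,,WriteALU,1.00
2,ADD32rr,,WriteALU,1.01
2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
2,SETBr,,WriteSETCC,1.01
"""


def clusters_by_sched_class(csv_text):
    """Map each sched_class to the set of cluster ids its opcodes landed in."""
    clusters = defaultdict(set)
    for row in csv.reader(io.StringIO(csv_text)):
        # Assumption of this sketch: data rows have at least four columns and
        # start with a numeric cluster id. Header lines, blank lines and any
        # other rows are skipped.
        if len(row) < 4 or not row[0].strip().isdigit():
            continue
        cluster_id, _opcode_name, _config, sched_class = row[:4]
        clusters[sched_class].add(cluster_id.strip())
    return dict(clusters)


def split_sched_classes(csv_text):
    """Return the sched classes whose opcodes fall into more than one cluster."""
    return {
        sched_class: sorted(ids)
        for sched_class, ids in clusters_by_sched_class(csv_text).items()
        if len(ids) > 1
    }


print(clusters_by_sched_class(SAMPLE))
print(split_sched_classes(SAMPLE))
```

For the sample rows every opcode is in cluster 2, so no scheduling class is reported as split.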

llvm-exegesis will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML
file. For example, /tmp/inconsistencies.html will contain messages like
the following: [image]

Note that the scheduling class names will be resolved only when
llvm-exegesis is compiled in debug mode; otherwise only the class id
will be shown. This does not invalidate any of the analysis results.

-help  Print a summary of command line options.

-opcode-index=<LLVM opcode index>
       Specify the opcode to measure, by index. Either opcode-index or
       opcode-name must be set.

-opcode-name=<LLVM opcode name>
       Specify the opcode to measure, by name. Either opcode-index or
       opcode-name must be set.

-mode=[latency|uops|analysis]
       Specify the run mode.

-num-repetitions=<Number of repetitions>
       Specify the number of repetitions of the asm snippet. Higher
       values lead to more accurate measurements but lengthen the
       benchmark.

-benchmarks-file=</path/to/file>
       File to read (analysis mode) or write (latency/uops modes)
       benchmark results. "-" uses stdin/stdout.

-analysis-clusters-output-file=</path/to/file>
       If provided, write the analysis clusters as CSV to this file.
       "-" prints to stdout.

-analysis-inconsistencies-output-file=</path/to/file>
       If non-empty, write inconsistencies found during analysis to
       this file. "-" prints to stdout.

-analysis-numpoints=<dbscan numPoints parameter>
       Specify the numPoints parameter to be used for DBSCAN
       clustering (analysis mode).

-analysis-epsilon=<dbscan epsilon parameter>
       Specify the epsilon parameter to be used for DBSCAN
       clustering (analysis mode).

-ignore-invalid-sched-class=false
       If set, ignore instructions that do not have a sched class
       (class idx = 0).
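
To give a feel for what the two DBSCAN parameters among the options above control, here is a minimal sketch of textbook DBSCAN on scalar measurements, in Python. It is not the clustering code used by llvm-exegesis. The function name dbscan_1d is hypothetical, and the sketch assumes that numPoints and epsilon play their usual DBSCAN roles: the minimum neighbourhood size for a core point, and the neighbourhood radius. The sample values other than those near 1.0 are made up for illustration.

```python
def dbscan_1d(values, epsilon, num_points):
    """Textbook DBSCAN on scalar values.

    Returns one label per value: a cluster id (0, 1, ...) or -1 for noise.
    """
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(values)

    def neighbours(i):
        # Indices of all values within epsilon of values[i], including i.
        return [j for j, v in enumerate(values) if abs(v - values[i]) <= epsilon]

    cluster_id = 0
    for i in range(len(values)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < num_points:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # values[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point, not expanded further
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= num_points:
                queue.extend(j_neighbours)  # j is also a core point
        cluster_id += 1
    return labels


# Hypothetical latencies: three near 1 cycle, three near 3, one outlier.
latencies = [1.00, 1.01, 1.02, 3.00, 3.01, 3.02, 7.5]
print(dbscan_1d(latencies, epsilon=0.1, num_points=3))
```

In this sketch a larger epsilon merges measurements that are further apart into one cluster, while a larger numPoints turns small groups into noise.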

llvm-exegesis returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.
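
A driver script can rely on this convention to separate successful runs from failures. The following is a minimal Python sketch. The helper name run_tool is hypothetical, and the example invocation assumes llvm-exegesis is on PATH.

```python
import subprocess
import sys


def run_tool(argv):
    """Run a command; return its stdout on exit status 0, raise otherwise."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # The error message is expected on standard error.
        raise RuntimeError(
            f"{argv[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Assumption: llvm-exegesis is installed and on PATH.
    try:
        print(run_tool(["llvm-exegesis", "-mode=latency", "-opcode-name=ADD64rr"]))
    except (RuntimeError, FileNotFoundError) as err:
        print(f"benchmark failed: {err}", file=sys.stderr)
```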

Maintained by The LLVM Team (http://llvm.org/).

2003-2023, LLVM Project

7 2023-07-20 LLVM-EXEGESIS(1)