PERF-STAT(1)                     perf Manual                     PERF-STAT(1)

NAME
perf-stat - Run a command and gather performance counter statistics

SYNOPSIS
perf stat [-e <EVENT> | --event=EVENT] [-a] <command>
perf stat [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
perf stat [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>]
perf stat report [-i file]

DESCRIPTION
This command runs a command and gathers performance counter statistics
from it.

OPTIONS
<command>...
    Any command you can specify in a shell.

record
    See STAT RECORD.

report
    See STAT REPORT.

-e, --event=
    Select the PMU event. Selection can be:

    · a symbolic event name (use perf list to list all events)

    · a raw PMU event (eventsel+umask) in the form of rNNN, where NNN
      is a hexadecimal event descriptor

    · a symbolically formed event like pmu/param1=0x3,param2/, where
      param1 and param2 are defined as formats for the PMU in
      /sys/bus/event_source/devices/<pmu>/format/*

    · a symbolically formed event like
      pmu/config=M,config1=N,config2=K/, where M, N, and K are numbers
      (in decimal, hex, or octal format). Acceptable values for each of
      the config, config1, and config2 parameters are defined by the
      corresponding entries in
      /sys/bus/event_source/devices/<pmu>/format/*

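The raw and symbolically formed event syntaxes for -e can be sketched as plain string formatting. This is an illustrative sketch only: raw_event and pmu_event are hypothetical helper names that are not part of perf, and the numeric values in the usage note are arbitrary placeholders, not real event encodings for any CPU.

```python
from typing import Optional


def raw_event(descriptor: int) -> str:
    """Format a raw PMU event as rNNN, where NNN is hexadecimal."""
    return f"r{descriptor:x}"


def pmu_event(pmu: str, **params: Optional[int]) -> str:
    """Format a symbolic event as pmu/name=value,.../.

    A parameter whose value is None is emitted as a bare name,
    matching the 'param2' form in pmu/param1=0x3,param2/.
    """
    terms = []
    for name, value in params.items():
        terms.append(name if value is None else f"{name}={value:#x}")
    return f"{pmu}/{','.join(terms)}/"
```

For instance, pmu_event("pmu", param1=0x3, param2=None) yields the string pmu/param1=0x3,param2/ used in the text above. Which parameter names a given PMU accepts still has to be read from its format/ directory in sysfs.
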
-i, --no-inherit
    child tasks do not inherit counters

-p, --pid=<pid>
    stat events on existing process id (comma-separated list)

-t, --tid=<tid>
    stat events on existing thread id (comma-separated list)

-a, --all-cpus
    system-wide collection from all CPUs (default if no target is
    specified)

-c, --scale
    scale/normalize counter values

-d, --detailed
    print more detailed statistics, can be specified up to 3 times

    -d:          detailed events, L1 and LLC data cache
    -d -d:       more detailed events, dTLB and iTLB events
    -d -d -d:    very detailed events, adding prefetch events

-r, --repeat=<n>
    repeat command and print average + stddev (max: 100). 0 means
    forever.

-B, --big-num
    print large numbers with thousands' separators according to locale

-C, --cpu=
    Count only on the list of CPUs provided. Multiple CPUs can be
    provided as a comma-separated list with no space: 0,1. Ranges of
    CPUs are specified with -: 0-2. In per-thread mode, this option is
    ignored. The -a option is still necessary to activate system-wide
    monitoring. Default is to count on all CPUs.

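The CPU list syntax accepted by -C (comma-separated entries, with - for ranges) can be expanded as in the sketch below. parse_cpu_list is a hypothetical helper written for illustration; it is not perf's own parser, and sorting and de-duplicating the result is a choice made here rather than documented behavior.

```python
from typing import List, Set


def parse_cpu_list(spec: str) -> List[int]:
    """Expand a CPU list such as '0,1' or '0-2' into CPU numbers."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            # A range like '0-2' includes both endpoints.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)
```

With this sketch, parse_cpu_list("0,1") gives [0, 1] and parse_cpu_list("0-2") gives [0, 1, 2], matching the two forms described above.
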
-A, --no-aggr
    Do not aggregate counts across all monitored CPUs.

-n, --null
    null run - don’t start any counters

-v, --verbose
    be more verbose (show counter open errors, etc.)

-x SEP, --field-separator SEP
    print counts using a CSV-style output to make it easy to import
    directly into spreadsheets. Columns are separated by the string
    specified in SEP.

-G name, --cgroup name
    monitor only in the container (cgroup) called "name". This option
    is available only in per-cpu mode. The cgroup filesystem must be
    mounted. All threads belonging to container "name" are monitored
    when they run on the monitored CPUs. Multiple cgroups can be
    provided. Each cgroup is applied to the corresponding event, i.e.,
    first cgroup to first event, second cgroup to second event, and so
    on. It is possible to provide an empty cgroup (monitor all the
    time) using, e.g., -G foo,,bar. Cgroups must have corresponding
    events, i.e., they always refer to events defined earlier on the
    command line.

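The positional pairing of cgroups with events described for -G can be sketched as follows. pair_cgroups is a hypothetical helper, not perf code. Treating events that have no corresponding cgroup entry as unrestricted is an assumption made for this sketch; rejecting more cgroups than events follows from the rule that cgroups must refer to earlier events.

```python
from typing import List, Optional, Tuple


def pair_cgroups(events: List[str], cgroup_arg: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each event with the cgroup in the same position of a -G list.

    An empty entry (as in 'foo,,bar') means 'monitor all the time' and
    is represented here as None.
    """
    names = cgroup_arg.split(",")
    if len(names) > len(events):
        raise ValueError("each cgroup must refer to an event defined earlier")
    # Assumption: events beyond the end of the cgroup list are unrestricted.
    names += [""] * (len(events) - len(names))
    return [(event, name or None) for event, name in zip(events, names)]
```

For three events and -G foo,,bar, the first event is restricted to foo, the second is monitored all the time, and the third is restricted to bar.
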
-o file, --output file
    Print the output into the designated file.

--append
    Append to the output file designated with the -o option. Ignored if
    -o is not specified.

--log-fd
    Log output to fd, instead of stderr. Complementary to --output, and
    mutually exclusive with it. --append may be used here. Examples:

        3>results perf stat --log-fd 3 -- $cmd
        3>>results perf stat --log-fd 3 --append -- $cmd

--pre, --post
    Pre and post measurement hooks, e.g.:

        perf stat --repeat 10 --null --sync \
            --pre 'make -s O=defconfig-build/clean' \
            -- make -s -j64 O=defconfig-build/ bzImage

-I msecs, --interval-print msecs
    Print count deltas every msecs milliseconds (minimum: 10ms). The
    overhead percentage can be high in some cases, for instance with
    small, sub-100ms intervals. Use with caution. Example:

        perf stat -I 1000 -e cycles -a sleep 5

--metric-only
    Only print computed metrics. Print them in a single line. Don’t
    show any raw values. Not supported with --per-thread.

--per-socket
    Aggregate counts per processor socket for system-wide mode
    measurements. This is a useful mode to detect imbalance between
    sockets. To enable this mode, use --per-socket in addition to -a
    (system-wide). The output includes the socket number and the number
    of online processors on that socket. This is useful to gauge the
    amount of aggregation.

--per-core
    Aggregate counts per physical processor for system-wide mode
    measurements. This is a useful mode to detect imbalance between
    physical cores. To enable this mode, use --per-core in addition to
    -a (system-wide). The output includes the core number and the
    number of online logical processors on that physical processor.

--per-thread
    Aggregate counts per monitored thread, when monitoring threads (-t
    option) or processes (-p option).

-D msecs, --delay msecs
    After starting the program, wait msecs before measuring. This is
    useful to filter out the startup phase of the program, which is
    often very different.

-T, --transaction
    Print statistics of transactional execution if supported.

STAT RECORD
Stores stat data into perf data file.

-o file, --output file
    Output file name.

STAT REPORT
Reads and reports stat data from perf data file.

-i file, --input file
    Input file name.

--per-socket
    Aggregate counts per processor socket for system-wide mode
    measurements.

--per-core
    Aggregate counts per physical processor for system-wide mode
    measurements.

-M, --metrics
    Print metrics or metricgroups specified in a comma-separated list.
    For a group, all metrics from the group are added. The events from
    the metrics are automatically measured. See perf list output for
    the possible metrics and metricgroups.

-A, --no-aggr
    Do not aggregate counts across all monitored CPUs.

--topdown
    Print top-down level 1 metrics if supported by the CPU. This helps
    identify bottlenecks in the CPU pipeline for CPU-bound workloads by
    breaking the cycles consumed down into frontend bound, backend
    bound, bad speculation, and retiring.

    Frontend bound means that the CPU cannot fetch and decode
    instructions fast enough. Backend bound means that computation or
    memory access is the bottleneck. Bad speculation means that the CPU
    wasted cycles due to branch mispredictions and similar issues.
    Retiring means that the CPU computed without an apparent
    bottleneck. The reported bottleneck is only the real bottleneck if
    the workload is actually bound by the CPU and not by something
    else.

    For best results it is usually a good idea to use it with interval
    mode, like -I 1000, as the bottleneck of a workload can change
    often.

    The top-down metrics are collected per core instead of per CPU
    thread. Per-core mode is automatically enabled and -a (global
    monitoring) is needed, requiring root rights or
    kernel.perf_event_paranoid=-1.

    Topdown uses the full Performance Monitoring Unit, and for best
    results needs the NMI watchdog disabled (as root):

        echo 0 > /proc/sys/kernel/nmi_watchdog

    Otherwise the bottlenecks may be inconsistent on workloads with
    changing phases.

    This enables --metric-only, unless overridden with
    --no-metric-only.

    To interpret the results it is usually necessary to know which CPUs
    the workload runs on. If needed, the CPUs can be forced using
    taskset.

--no-merge
    Do not merge results from same PMUs.

--smi-cost
    Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.

    During the measurement, /sys/devices/cpu/freeze_on_smi is set to
    freeze core counters on SMI. The aperf counter is not affected by
    the setting. The cost of SMI can be measured as (aperf - unhalted
    core cycles).

    In practice, the percentage of SMI cycles is very useful for
    performance-oriented analysis. --metric-only is applied by default.
    The output is SMI cycles%, which equals (aperf - unhalted core
    cycles) / aperf.

    Users who want the actual values can apply --no-metric-only.

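The SMI cycles% formula given for --smi-cost can be written out as a small function. This is a sketch of the arithmetic only: smi_cycles_percent is a hypothetical name, and the counter values passed to it would come from a real measurement, not from this example.

```python
def smi_cycles_percent(aperf: int, unhalted_core_cycles: int) -> float:
    """Return (aperf - unhalted core cycles) / aperf as a percentage."""
    if aperf <= 0:
        raise ValueError("aperf must be positive")
    # Cycles counted by aperf but frozen out of the core counter
    # are the ones attributed to SMI.
    smi_cycles = aperf - unhalted_core_cycles
    return 100.0 * smi_cycles / aperf
```

The result is zero when both counters agree and grows as more cycles go missing from the core cycle counter while it is frozen.
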
EXAMPLES
$ perf stat -- make -j

 Performance counter stats for 'make -j':

    8117.370256  task clock ticks  #      11.281 CPU utilization factor
            678  context switches  #       0.000 M/sec
            133  CPU migrations    #       0.000 M/sec
         235724  pagefaults        #       0.029 M/sec
    24821162526  CPU cycles        #    3057.784 M/sec
    18687303457  instructions      #    2302.138 M/sec
      172158895  cache references  #      21.209 M/sec
       27075259  cache misses      #       3.335 M/sec

 Wall-clock time elapsed:   719.554352 msecs

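The right-hand column of the example output can be reproduced from the raw counts. The sketch below assumes that the task clock value is in milliseconds, that the utilization factor is task clock divided by wall-clock time, and that each M/sec rate is the count divided by task-clock time. These assumptions match the printed figures to three decimals but are inferred from the numbers, not taken from perf's source.

```python
# Values copied from the example output above.
TASK_CLOCK_MSEC = 8117.370256
WALL_CLOCK_MSEC = 719.554352
COUNTS = {
    "CPU cycles": 24821162526,
    "instructions": 18687303457,
    "cache references": 172158895,
    "cache misses": 27075259,
}


def cpu_utilization(task_clock_msec: float, wall_clock_msec: float) -> float:
    """CPU time consumed per unit of wall-clock time."""
    return task_clock_msec / wall_clock_msec


def rate_m_per_sec(count: int, task_clock_msec: float) -> float:
    """Events per microsecond of task-clock time, i.e. millions per second."""
    return count / (task_clock_msec * 1000.0)


if __name__ == "__main__":
    print(f"CPU utilization factor: {cpu_utilization(TASK_CLOCK_MSEC, WALL_CLOCK_MSEC):.3f}")
    for name, count in COUNTS.items():
        print(f"{name}: {rate_m_per_sec(count, TASK_CLOCK_MSEC):.3f} M/sec")
```

A utilization factor above 1 simply reflects that make -j kept several CPUs busy at once during the run.
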
CSV FORMAT
With -x, perf stat can produce output in a not-quite-CSV format. Commas
in the output are not put into "". To make the output easy to parse, it
is recommended to use a different separator character, such as -x \;

The fields are in this order:

· optional usec time stamp in fractions of a second (with -I xxx)

· optional CPU, core, or socket identifier

· optional number of logical CPUs aggregated

· counter value

· unit of the counter value or empty

· event name

· run time of counter

· percentage of measurement time the counter was running

· optional variance if multiple values are collected with -r

· optional metric value

· optional unit of metric

Additional metrics may be printed with all earlier fields being empty.

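The field order above can be turned into a small parser. The sketch below covers only the simplest case and rests on stated assumptions: the output was produced with -x \; and without -I, -r, or any per-CPU, per-core, or per-socket aggregation, so none of the optional leading fields or the variance field is present. StatRow and parse_stat_line are hypothetical names, and the sample line in the usage note is hand-written for illustration, not captured perf output.

```python
from typing import NamedTuple, Optional


class StatRow(NamedTuple):
    value: Optional[float]         # counter value, None if not numeric
    unit: str                      # unit of the counter value, may be empty
    event: str                     # event name
    run_time: Optional[float]      # run time of counter
    pct_running: Optional[float]   # percentage of time the counter ran
    metric_value: Optional[float]  # optional metric value
    metric_unit: str               # optional unit of metric


def _number(text: str) -> Optional[float]:
    """Convert a field to float, returning None for empty or non-numeric text."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_stat_line(line: str, sep: str = ";") -> StatRow:
    """Split one output line into the fields listed above (simple case only)."""
    fields = line.rstrip("\n").split(sep)
    fields += [""] * (7 - len(fields))  # pad missing optional trailing fields
    return StatRow(
        _number(fields[0]), fields[1], fields[2],
        _number(fields[3]), _number(fields[4]),
        _number(fields[5]), fields[6],
    )
```

Given the hand-written line 24821162526;;cycles;8117370256;100.00;; the parser returns a row whose event is cycles and whose metric fields are empty. Handling the optional leading fields would require knowing which options were passed, since the line itself does not say which fields are present.
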
SEE ALSO
perf-top(1), perf-list(1)

perf                              06/18/2019                     PERF-STAT(1)