PERF-REPORT(1)                    perf Manual                   PERF-REPORT(1)

NAME
       perf-report - Read perf.data (created by perf record) and display the
       profile

SYNOPSIS
       perf report [-i <file> | --input=file]

DESCRIPTION
       This command displays the performance counter profile information
       recorded via perf record.

OPTIONS
       -i, --input=
           Input file name. (default: perf.data unless stdin is a fifo)

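As an illustration of -i, the following shell sketch records a profile into a
named file and then reports from that file. The function name
record_and_report and the file name used in the usage note are placeholders,
and -o is the perf record option that names the output file; none of these
come from this page.

```shell
# Sketch only: record_and_report is a hypothetical helper name.
# $1 is the data file; the remaining arguments are the command to profile.
record_and_report() {
    data_file="$1"
    shift
    # perf record -o writes the profile to the named file instead of perf.data.
    perf record -o "$data_file" -- "$@" &&
    # perf report -i reads that file back; --stdio selects the text interface.
    perf report -i "$data_file" --stdio
}
```

For example, record_and_report profile.data sleep 1 would profile "sleep 1"
and then print the report for profile.data.
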
       -v, --verbose
           Be more verbose (show symbol addresses, etc.).

       -n, --show-nr-samples
           Show the number of samples for each symbol.

       --showcpuutilization
           Show sample percentage for different CPU modes.

       -T, --threads
           Show per-thread event counters.

       -c, --comms=
           Only consider symbols in these comms. The argument is a
           comma-separated list that also understands file://filename
           entries. This option affects the percentage of the overhead
           column. See --percentage for more info.

       --pid=
           Only show events for the given process IDs (comma-separated list).

       --tid=
           Only show events for the given thread IDs (comma-separated list).

       -d, --dsos=
           Only consider symbols in these dsos. The argument is a
           comma-separated list that also understands file://filename
           entries. This option affects the percentage of the overhead
           column. See --percentage for more info.

       -S, --symbols=
           Only consider these symbols. The argument is a comma-separated
           list that also understands file://filename entries. This option
           affects the percentage of the overhead column. See --percentage
           for more info.

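The comm and dso filters above can be combined in one invocation. The sketch
below is only an illustration; the helper name report_filtered and the comm
and DSO names in the usage note are hypothetical.

```shell
# Sketch only: report_filtered is a hypothetical helper name.
# $1: comma-separated list of comms, $2: comma-separated list of DSOs.
report_filtered() {
    # Both filters narrow the histogram; see --percentage for how the
    # overhead column is computed once filters are active.
    perf report --stdio --comms="$1" --dsos="$2"
}
```

For example, report_filtered myapp,worker libc.so.6 would restrict the report
to samples from tasks named myapp or worker that fall inside libc.so.6.
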
       --symbol-filter=
           Only show symbols that (partially) match this filter.

       -U, --hide-unresolved
           Only display entries resolved to a symbol.

       -s, --sort=
           Sort histogram entries by the given key(s). Multiple keys can be
           specified in CSV format. The following sort keys are available:
           pid, comm, dso, symbol, parent, cpu, srcline, weight,
           local_weight.

           Each key has the following meaning:

           · comm: command (name) of the task, which can be read via
             /proc/<pid>/comm

           · pid: command and tid of the task

           · dso: name of the library or module executed at the time of the
             sample

           · symbol: name of the function executed at the time of the sample

           · parent: name of the function matched by the parent regex
             filter. Unmatched entries are displayed as "[other]".

           · cpu: CPU number the task ran on at the time of the sample

           · srcline: filename and line number executed at the time of the
             sample. The DWARF debugging info must be provided.

           · weight: Event-specific weight, e.g. memory latency or
             transaction abort cost. This is the global weight.

           · local_weight: Local weight version of the weight above.

           · transaction: Transaction abort flags.

           · overhead: Overhead percentage of the sample

           · overhead_sys: Overhead percentage of the sample running in
             system mode

           · overhead_us: Overhead percentage of the sample running in user
             mode

           · overhead_guest_sys: Overhead percentage of the sample running
             in system mode on a guest machine

           · overhead_guest_us: Overhead percentage of the sample running in
             user mode on a guest machine

           · sample: Number of samples

           · period: Raw event count of the sample

           By default, the comm, dso and symbol keys are used
           (i.e. --sort comm,dso,symbol).

           If the --branch-stack option is used, the following sort keys are
           also available:
           dso_from, dso_to, symbol_from, symbol_to, mispredict.

           · dso_from: name of the library or module branched from

           · dso_to: name of the library or module branched to

           · symbol_from: name of the function branched from

           · symbol_to: name of the function branched to

           · mispredict: "N" for a predicted branch, "Y" for a mispredicted
             branch

           · in_tx: branch in TSX transaction

           · abort: TSX transaction abort.

           · cycles: Cycles in basic block

           The default sort keys then change to comm, dso_from, symbol_from,
           dso_to and symbol_to; see --branch-stack.

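To make the effect of --sort concrete, the sketch below wraps a report
invocation that takes the key list as an argument. The helper name
report_sorted is hypothetical; the keys passed to it are the ones listed
above.

```shell
# Sketch only: report_sorted is a hypothetical helper name.
# $1: comma-separated sort keys, e.g. "comm,dso" or "dso,symbol".
report_sorted() {
    # -n adds a sample-count column next to the overhead percentage.
    perf report --stdio -n --sort "$1"
}
```

For example, report_sorted comm,dso groups samples by task name and then by
library or module, while report_sorted cpu groups them by CPU number.
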
       -F, --fields=
           Specify output fields. Multiple keys can be specified in CSV
           format. The following fields are available: overhead,
           overhead_sys, overhead_us, overhead_children, sample and period.
           The list can also contain any sort key(s).

           By default, every sort key not specified in -F will be appended
           automatically.

           If the --mem-mode option is used, the following sort keys are
           also available (incompatible with --branch-stack):
           symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.

           · symbol_daddr: name of the data symbol being executed on at the
             time of the sample

           · dso_daddr: name of the library or module containing the data
             being executed on at the time of the sample

           · locked: whether the bus was locked at the time of the sample

           · tlb: type of TLB access for the data at the time of the sample

           · mem: type of memory access for the data at the time of the
             sample

           · snoop: type of snoop (if any) for the data at the time of the
             sample

           · dcacheline: the cacheline the data address is on at the time of
             the sample

           The default sort keys then change to local_weight, mem, sym, dso,
           symbol_daddr, dso_daddr, snoop, tlb, locked; see --mem-mode.

       -p, --parent=<regex>
           A regex filter to identify the parent. The parent is a caller of
           this function and is searched for through the callchain, so it
           requires callchain information to have been recorded. The pattern
           is in the extended regex format and defaults to
           "^sys_|^do_page_fault"; see --sort parent.

       -x, --exclude-other
           Only display entries with a parent match.

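The parent filter is typically used together with the parent sort key. The
following sketch shows one way to combine them; the helper name
report_by_parent is hypothetical, and the data file is assumed to contain
callchains (for example, recorded with perf record -g).

```shell
# Sketch only: report_by_parent is a hypothetical helper name.
# $1: extended regex that identifies parent functions in the callchain.
report_by_parent() {
    # --sort parent groups entries by the matched parent function;
    # --exclude-other drops the entries that would be shown as "[other]".
    perf report --stdio --sort parent --parent="$1" --exclude-other
}
```

For example, report_by_parent '^sys_' would group samples under the sys_*
function found in their callchain and hide samples with no such caller.
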
       -w, --column-widths=<width[,width...]>
           Force each column width to the provided list, for large terminal
           readability. 0 means no limit (default behavior).

       -t, --field-separator=
           Use a special separator character and don't pad with spaces. All
           occurrences of this separator in symbol names (and other output)
           are replaced with a . character, which is therefore the only
           invalid separator.

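A field separator makes the stdio output easier to feed to other tools. The
sketch below is an illustration only; the helper name report_separated is
hypothetical.

```shell
# Sketch only: report_separated is a hypothetical helper name.
# $1: separator character; any character except "." can be used.
report_separated() {
    # Columns are joined with the separator and are not padded with spaces.
    perf report --stdio --field-separator="$1"
}
```

For example, report_separated , emits comma-separated columns that can be
redirected to a file and loaded into a spreadsheet or script.
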
       -D, --dump-raw-trace
           Dump raw trace in ASCII.

       -g [type,min[,limit],order[,key][,branch]], --call-graph
           Display call chains using type, min percent threshold, optional
           print limit and order. type can be either:

           · flat: single column, linear exposure of call chains.

           · graph: use a graph tree, displaying absolute overhead rates.

           · fractal: like graph, but displays relative rates. Each branch
             of the tree is considered as a new profiled object.

           order can be either:

           · callee: callee based call graph.

           · caller: inverted caller based call graph.

           key can be:

           · function: compare on functions

           · address: compare on individual code addresses

           branch can be:

           · branch: include last branch information in the callgraph when
             available. It is usually more convenient to use
             --branch-history for this.

           Default: fractal,0.5,callee,function.

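The call-graph argument is a comma-separated tuple, which the following
sketch assembles from its parts. The helper names record_callchains and
report_call_graph are hypothetical; perf record -g, which records callchains,
is not described on this page and is used here as an assumption about how the
data was collected.

```shell
# Sketch only: both function names are hypothetical helpers.
# Record callchains for the given command so that -g has data to display.
record_callchains() {
    perf record -g -- "$@"
}

# $1: type (flat, graph or fractal), $2: min percent, $3: order.
report_call_graph() {
    perf report --stdio --call-graph="$1,$2,$3"
}
```

For example, report_call_graph graph 0.5 caller requests a graph tree with
absolute overhead, hides chains under 0.5%, and uses the caller order.
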
       --children
           Accumulate the callchain of children to the parent entry so that
           they can show up in the output. The output will have a new
           "Children" column and will be sorted on that data. It requires
           callchains to have been recorded.

       --max-stack
           Set the stack depth limit when parsing the callchain; anything
           beyond the specified depth will be ignored. This is a trade-off
           between information loss and faster processing, especially for
           workloads that can have a very long callchain stack.

           Default: 127

       -G, --inverted
           Alias for inverted caller based call graph.

       --ignore-callees=<regex>
           Ignore callees of the function(s) matching the given regex. This
           has the effect of collecting the callers of each such function
           into one place in the call-graph tree.

       --pretty=<key>
           Pretty printing style. key: normal, raw

       --stdio
           Use the stdio interface.

       --tui
           Use the TUI interface, which is integrated with annotate and
           allows zooming into DSOs or threads, among other features. Use of
           --tui requires a tty; if one is not present, as when piping to
           other commands, the stdio interface is used.

       --gtk
           Use the GTK2 interface.

       -k, --vmlinux=<file>
           vmlinux pathname

       --kallsyms=<file>
           kallsyms pathname

       -m, --modules
           Load module symbols. WARNING: This should only be used with -k
           and a LIVE kernel.

       -f, --force
           Don't complain, do it.

       --symfs=<directory>
           Look for files with symbols relative to this directory.

       -C, --cpu
           Only report samples for the list of CPUs provided. Multiple CPUs
           can be provided as a comma-separated list with no space: 0,1.
           Ranges of CPUs are specified with -: 0-2. Default is to report
           samples on all CPUs.

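The CPU list syntax described for -C can be exercised with a small wrapper.
This is a sketch only; the helper name report_cpus is hypothetical.

```shell
# Sketch only: report_cpus is a hypothetical helper name.
# $1: CPU list such as "0,1" (individual CPUs) or "0-2" (a range).
report_cpus() {
    perf report --stdio -C "$1"
}
```

For example, report_cpus 0,1 reports only samples taken on CPUs 0 and 1, and
report_cpus 0-2 reports samples from CPUs 0 through 2.
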
       -M, --disassembler-style=
           Set disassembler style for objdump.

       --source
           Interleave source code with assembly code. Enabled by default,
           disable with --no-source.

       --asm-raw
           Show raw instruction encoding of assembly instructions.

       --show-total-period
           Show a column with the sum of periods.

       -I, --show-info
           Display extended information about the perf.data file. This adds
           information which may be very large and thus may clutter the
           display. It currently includes: CPU and NUMA topology of the host
           system.

       -b, --branch-stack
           Use the addresses of sampled taken branches instead of the
           instruction address to build the histograms. To generate
           meaningful output, the perf.data file must have been obtained
           using perf record -b or perf record --branch-filter xxx where xxx
           is a branch filter option. perf report is able to auto-detect
           whether a perf.data file contains branch stacks and it will
           automatically switch to the branch view mode, unless
           --no-branch-stack is used.

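The record and report halves of the branch view can be sketched as two small
wrappers. The helper names record_branches and report_branches are
hypothetical, and branch sampling is assumed to be supported by the hardware
and kernel in use.

```shell
# Sketch only: both function names are hypothetical helpers.
# Record taken-branch samples for the given command.
record_branches() {
    perf record -b -- "$@"
}

# Report in branch view, sorted by source and target function and by
# whether the branch was mispredicted.
report_branches() {
    perf report --stdio --branch-stack \
        --sort symbol_from,symbol_to,mispredict
}
```

For example, record_branches ./myprog followed by report_branches would show
which function-to-function branches were sampled most often; ./myprog is a
placeholder for the program being profiled.
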
       --branch-history
           Add the addresses of sampled taken branches to the callstack.
           This allows examining the path the program took to each sample.
           The data collection must have used -b (or -j) and -g.

       --objdump=<path>
           Path to objdump binary.

       --group
           Show event group information together.

       --demangle
           Demangle symbol names to human readable form. It's enabled by
           default, disable with --no-demangle.

       --demangle-kernel
           Demangle kernel symbol names to human readable form (for C++
           kernels).

       --mem-mode
           Use the data addresses of samples in addition to instruction
           addresses to build the histograms. To generate meaningful output,
           the perf.data file must have been obtained using perf record -d
           -W and using a special event -e cpu/mem-loads/ or -e
           cpu/mem-stores/. See perf mem for simpler access.

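The record options named under --mem-mode can be put together as in the
sketch below. The helper names record_mem_loads and report_mem are
hypothetical, and the cpu/mem-loads/ event is taken verbatim from the text
above; it may be unavailable or may need extra modifiers on some systems.

```shell
# Sketch only: both function names are hypothetical helpers.
# Record load samples with data addresses (-d) and weights (-W).
record_mem_loads() {
    perf record -d -W -e cpu/mem-loads/ -- "$@"
}

# Report using data addresses, sorted by a few of the mem-mode keys.
report_mem() {
    perf report --stdio --mem-mode \
        --sort symbol_daddr,mem,local_weight
}
```

For example, record_mem_loads ./myprog followed by report_mem would list the
data symbols that were accessed, the type of memory access, and the local
weight; ./myprog is a placeholder for the program being profiled.
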
       --percent-limit
           Do not show entries which have an overhead under that percent.
           (Default: 0).

       --percentage
           Determine how to display the overhead percentage of filtered
           entries. Filters can be applied with the --comms, --dsos and/or
           --symbols options and by Zoom operations in the TUI (thread, dso,
           etc).

           "relative" means it's relative to filtered entries only, so that
           the sum of shown entries will always be 100%. "absolute" means it
           retains the original value before and after the filter is
           applied.

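The difference between the two percentage modes is easiest to see by running
the same filtered report twice. The sketch below assumes a hypothetical
helper name, report_comm_share, and a hypothetical comm name in the usage
note.

```shell
# Sketch only: report_comm_share is a hypothetical helper name.
# $1: comm to filter on, $2: percentage mode ("relative" or "absolute").
report_comm_share() {
    perf report --stdio --comms="$1" --percentage="$2"
}
```

For example, report_comm_share myapp relative shows overheads that sum to
100% within myapp, while report_comm_share myapp absolute keeps each entry's
share of the whole profile.
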
       --header
           Show header information in the perf.data file. This includes
           various information like hostname, OS and perf version, cpu/mem
           info, perf command line, event list and so on. Currently only
           --stdio output supports this feature.

       --header-only
           Show only perf.data header (forces --stdio).

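Inspecting the header is a quick way to check how and where a data file was
recorded. The following is a minimal sketch; the helper name show_header is
hypothetical.

```shell
# Sketch only: show_header is a hypothetical helper name.
# $1: optional data file; falls back to perf.data when omitted.
show_header() {
    perf report --header-only -i "${1:-perf.data}"
}
```

For example, show_header prints the header of perf.data in the current
directory, and show_header old.data does the same for a file named old.data.
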
       --full-source-path
           Show the full path for source files for srcline output.

SEE ALSO
       perf-stat(1), perf-annotate(1)

perf                              06/18/2019                    PERF-REPORT(1)