1PERF-RECORD(1) perf Manual PERF-RECORD(1)
2
3
4
6 perf-record - Run a command and record its profile into perf.data
7
9 perf record [-e <EVENT> | --event=EVENT] [-a] <command>
10 perf record [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
11
13 This command runs a command and gathers a performance counter profile
14 from it, into perf.data - without displaying anything.
15
16 This file can then be inspected later on, using perf report.
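The record-then-inspect workflow described above can be sketched as follows. This is only an illustration: the helper name record_cmd and the workload "sleep 1" are assumptions, and the command lines are printed rather than executed so that the sketch runs without perf installed.

```shell
# record_cmd OUTPUT COMMAND [ARGS...]
# Prints a perf record command line that profiles COMMAND into OUTPUT.
# The "--" keeps perf from parsing the workload's own options.
record_cmd() {
    out=$1
    shift
    printf 'perf record -o %s --' "$out"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

record_cmd perf.data sleep 1
# The resulting file is inspected later with perf report.
printf 'perf report -i %s\n' perf.data
```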
17
19 <command>...
20 Any command you can specify in a shell.
21
22 -e, --event=
23 Select the PMU event. Selection can be:
24
25 · a symbolic event name (use perf list to list all events)
26
27 · a raw PMU event (eventsel+umask) in the form of rNNN where NNN
28 is a hexadecimal event descriptor.
29
30 · a symbolically formed PMU event like pmu/param1=0x3,param2/
31 where param1, param2, etc are defined as formats for the PMU in
32 /sys/bus/event_source/devices/<pmu>/format/*.
33
34 · a symbolically formed event like
35 pmu/config=M,config1=N,config2=K/
36
37 where M, N, K are numbers (in decimal, hex, octal format). Acceptable
38 values for each of 'config', 'config1' and 'config2' are defined by
39 corresponding entries in /sys/bus/event_source/devices/<pmu>/format/*
40 param1 and param2 are defined as formats for the PMU in:
41 /sys/bus/event_source/devices/<pmu>/format/*
42
43 There are also some parameters which are not defined in .../<pmu>/format/*.
44 These params can be used to overload default config values per event.
45 Here are some common parameters:
46 - 'period': Set event sampling period
47 - 'freq': Set event sampling frequency
48 - 'time': Disable/enable time stamping. Acceptable values are 1 to
49 enable time stamping and 0 to disable it.
50 The default is 1.
51 - 'call-graph': Disable/enable callgraph. Acceptable strings are "fp" for
52 FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and
53 "no" to disable callgraph.
54 - 'stack-size': user stack size for dwarf mode
55 - 'name' : User defined event name. Single quotes (') may be used to
56 escape symbols in the name from parsing by shell and tool
57 like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'.
58
59 See the perf-list(1) man page for more parameters.
60
61 Note: If the user explicitly sets options which conflict with the params,
62 the value set by the parameters will be overridden.
63
64 Also not defined in .../<pmu>/format/* are PMU driver specific
65 configuration parameters. Any configuration parameter preceded by
66 the letter '@' is not interpreted in user space and sent down directly
67 to the PMU driver. For example:
68
69 perf record -e some_event/@cfg1,@cfg2=config/ ...
70
71 will see 'cfg1' and 'cfg2=config' pushed to the PMU driver associated
72 with the event for further processing. There is no restriction on
73 what the configuration parameters are, as long as their semantics are
74 understood and supported by the PMU driver.
75
76 · a hardware breakpoint event in the form of
77 mem:addr[/len][:access] where addr is the address in memory
78 you want to break on. access is the memory access type (read,
79 write, execute), which can be passed as follows:
80 mem:addr[:[r][w][x]]. len is the range, in bytes from the
81 specified addr, which the breakpoint will cover. If you want to
82 profile read-write accesses at 0x1000, just set mem:0x1000:rw.
83 If you want to profile write accesses in [0x1000, 0x1008), just
84 set mem:0x1000/8:w.
85
86 · a group of events surrounded by a pair of braces
87 ("{event1,event2,...}"). Events are separated by commas and
88 the group should be quoted to prevent shell interpretation.
89 You also need to use --group on "perf report" to view group
90 events together.
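The selector forms listed above can be composed mechanically. The sketch below is a hedged illustration only: the helper names raw_event and mem_event are hypothetical, the raw encoding assumes the x86 layout (event select in bits 0-7, unit mask in bits 8-15), and 0x2e/0x41 is used purely as sample input.

```shell
# raw_event EVENTSEL UMASK
# Prints an rNNN selector, assuming the x86 layout in which the event
# select occupies bits 0-7 and the unit mask bits 8-15.
raw_event() {
    printf 'r%x\n' $(( ($2 << 8) | $1 ))
}

# mem_event ADDR LEN ACCESS
# Prints a hardware breakpoint selector of the form mem:addr/len:access.
mem_event() {
    printf 'mem:0x%x/%d:%s\n' "$(( $1 ))" "$2" "$3"
}

raw_event 0x2e 0x41        # prints r412e
mem_event 0x1000 8 w       # prints mem:0x1000/8:w
# A group is quoted so that the shell does not expand the braces.
printf "perf record -e '%s' -a -- sleep 1\n" '{cycles,instructions}'
```

On other architectures the raw descriptor layout differs, so the bit positions used here should be checked against the PMU documentation.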
91
92 --filter=<filter>
93 Event filter. This option should follow an event selector (-e) which
94 selects either tracepoint event(s) or a hardware trace PMU (e.g.
95 Intel PT or CoreSight).
96
97 · tracepoint filters
98
99 In the case of tracepoints, multiple '--filter' options are combined
100 using '&&'.
101
102 · address filters
103
104 A hardware trace PMU advertises its ability to accept a number of
105 address filters by specifying a non-zero value in
106 /sys/bus/event_source/devices/<pmu>/nr_addr_filters.
107
108 Address filters have the format:
109
110 filter|start|stop|tracestop <start> [/ <size>] [@<file name>]
111
112 Where:
113 - 'filter': defines a region that will be traced.
114 - 'start': defines an address at which tracing will begin.
115 - 'stop': defines an address at which tracing will stop.
116 - 'tracestop': defines a region in which tracing will stop.
117
118 <file name> is the name of the object file, <start> is the offset to the
119 code to trace in that file, and <size> is the size of the region to
120 trace. 'start' and 'stop' filters need not specify a <size>.
121
122 If no object file is specified then the kernel is assumed, in which case
123 the start address must be a current kernel memory address.
124
125 <start> can also be specified by providing the name of a symbol. If the
126 symbol name is not unique, it can be disambiguated by inserting #n where
127 'n' selects the n'th symbol in address order. Alternatively, #0, #g or #G
128 select only a global symbol. <size> can also be specified by providing
129 the name of a symbol, in which case the size is calculated to the end
130 of that symbol. For 'filter' and 'tracestop' filters, if <size> is
131 omitted and <start> is a symbol, then the size is calculated to the end
132 of that symbol.
133
134 If <size> is omitted and <start> is '*', then the start and size will
135 be calculated from the first and last symbols, i.e. to trace the whole
136 file.
137
138 If symbol names (or '*') are provided, they must be surrounded by white
139 space.
140
141 The filter passed to the kernel is not necessarily the same as entered.
142 To see the filter that is passed, use the -v option.
143
144 The kernel may not be able to configure a trace region if it is not
145 within a single mapping. MMAP events (or /proc/<pid>/maps) can be
146 examined to determine if that is a possibility.
147
148 Multiple filters can be separated with space or comma.
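The address filter format given above can be assembled as in the following sketch. The helper name addr_filter and the object file ./myprog are hypothetical, and the final command line (which assumes an Intel PT PMU named intel_pt) is printed rather than executed.

```shell
# addr_filter ACTION START [SIZE] [FILE]
# Prints an address filter in the form
#   filter|start|stop|tracestop <start> [/ <size>] [@<file name>]
# START and SIZE are surrounded by white space, as symbol names require.
addr_filter() {
    action=$1 start=$2 size=$3 file=$4
    out="$action $start"
    [ -n "$size" ] && out="$out / $size"
    [ -n "$file" ] && out="$out @$file"
    printf '%s\n' "$out"
}

# Trace only the symbol main in the hypothetical binary ./myprog.
f=$(addr_filter filter main "" ./myprog)
printf "perf record -e intel_pt//u --filter '%s' -- ./myprog\n" "$f"
```

Since the filter passed to the kernel is not necessarily the same as entered, adding -v to the real command shows what was actually installed.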
149
150 --exclude-perf
151 Don’t record events issued by perf itself. This option should
152 follow an event selector (-e) which selects tracepoint event(s). It
153 adds a filter expression common_pid != $PERFPID to filters. If
154 other --filter exists, the new filter expression will be combined
155 with them by &&.
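How several tracepoint filter expressions end up as one expression joined by && can be sketched as below. The helper name join_filters is hypothetical, the parentheses are added here for readability and are an assumption about the final form, and 1234 stands in for the PID of the perf process.

```shell
# join_filters EXPR...
# Joins tracepoint filter expressions with '&&', mirroring how multiple
# --filter options (and the one added by --exclude-perf) are combined.
join_filters() {
    joined=
    for expr in "$@"; do
        if [ -z "$joined" ]; then
            joined="($expr)"
        else
            joined="$joined && ($expr)"
        fi
    done
    printf '%s\n' "$joined"
}

# 1234 is a placeholder for the PID of perf itself.
join_filters 'prev_pid != 0' 'common_pid != 1234'
```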
156
157 -a, --all-cpus
158 System-wide collection from all CPUs (default if no target is
159 specified).
160
161 -p, --pid=
162 Record events on existing process ID (comma separated list).
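A small sketch of building the comma separated list that -p expects. The helper name pid_list is hypothetical and 1201 and 1202 are placeholder process IDs; the command line is printed, not run.

```shell
# pid_list PID...
# Joins process IDs into a comma separated list.
pid_list() {
    list=
    for pid in "$@"; do
        list="${list:+$list,}$pid"
    done
    printf '%s\n' "$list"
}

# 1201 and 1202 are placeholder PIDs.
printf 'perf record -p %s\n' "$(pid_list 1201 1202)"
```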
163
164 -t, --tid=
165 Record events on existing thread ID (comma separated list). This
166 option also disables inheritance by default. Enable it by adding
167 --inherit.
168
169 -u, --uid=
170 Record events in threads owned by uid. Name or number.
171
172 -r, --realtime=
173 Collect data with this RT SCHED_FIFO priority.
174
175 --no-buffering
176 Collect data without buffering.
177
178 -c, --count=
179 Event period to sample.
180
181 -o, --output=
182 Output file name.
183
184 -i, --no-inherit
185 Child tasks do not inherit counters.
186
187 -F, --freq=
188 Profile at this frequency. Use max to select the current maximum
189 allowed frequency, i.e. the value in the
190 kernel.perf_event_max_sample_rate sysctl. A higher request is throttled
191 down to the current maximum allowed frequency. See --strict-freq.
192
193 --strict-freq
194 Fail if the specified frequency can’t be used.
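The throttling rule for -F can be illustrated with the sketch below, which works under stated assumptions: the helper name effective_freq is hypothetical, the limit is read from /proc/sys/kernel/perf_event_max_sample_rate when that file is readable, and 100000 is only a placeholder otherwise.

```shell
# effective_freq REQUESTED MAX
# Prints the frequency that would be used once REQUESTED is throttled
# down to MAX; the word "max" selects MAX itself.
effective_freq() {
    if [ "$1" = max ] || [ "$1" -gt "$2" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$1"
    fi
}

# The limit is exposed by the kernel.perf_event_max_sample_rate sysctl.
limit_file=/proc/sys/kernel/perf_event_max_sample_rate
if [ -r "$limit_file" ]; then
    limit=$(cat "$limit_file")
else
    limit=100000   # placeholder for systems without the sysctl
fi
printf 'perf record -F %s -a -- sleep 1\n' "$(effective_freq 999 "$limit")"
```

With --strict-freq the real tool fails instead of throttling, so a check like this is one way to pick a value that will be accepted.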
195
196 -m, --mmap-pages=
197 Number of mmap data pages (must be a power of two) or size
198 specification with appended unit character - B/K/M/G. The size is
199 rounded up to the nearest power-of-two number of pages. Also, by
200 adding a comma, the number of mmap pages for AUX area tracing can
201 be specified.
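The rounding described for -m can be worked through as in this sketch. It illustrates the stated rule rather than perf's own code; the helper name mmap_pages is hypothetical and the 4096-byte default page size is an assumption.

```shell
# mmap_pages BYTES [PAGE_SIZE]
# Prints the number of pages for a buffer of BYTES, rounded up to a
# power of two. PAGE_SIZE defaults to 4096, which is an assumption.
mmap_pages() {
    page_size=${2:-4096}
    needed=$(( ($1 + page_size - 1) / page_size ))
    pages=1
    while [ "$pages" -lt "$needed" ]; do
        pages=$(( pages * 2 ))
    done
    printf '%s\n' "$pages"
}

mmap_pages $(( 3 * 1024 * 1024 ))   # 3M is 768 pages, prints 1024
```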
202
203 --group
204 Put all events in a single event group. This precedes the --event
205 option and remains only for backward compatibility. See --event.
206
207 -g
208 Enables call-graph (stack chain/backtrace) recording.
209
210 --call-graph
211 Setup and enable call-graph (stack chain/backtrace) recording,
212 implies -g. Default is "fp".
213
214 Allows specifying "fp" (frame pointer) or "dwarf"
215 (DWARF's CFI - Call Frame Information) or "lbr"
216 (Hardware Last Branch Record facility) as the method to collect
217 the information used to show the call graphs.
218
219 On some systems, where binaries are built with gcc
220 -fomit-frame-pointer, the "fp" method will produce bogus
221 call graphs; "dwarf", if available (perf tools linked to
222 the libunwind or libdw library), should be used instead.
223 Using the "lbr" method doesn't require any compiler options. It
224 will produce call graphs from the hardware LBR registers. The
225 main limitation is that it is only available on new Intel
226 platforms, such as Haswell. It can only get the user call chain. It
227 doesn't work with branch stack sampling at the same time.
228
229 When "dwarf" recording is used, perf also records a (user) stack dump
230 with each sample. The default size of the stack dump is 8192 (bytes).
231 The user can change the size by passing it after a comma, like
232 "--call-graph dwarf,4096".
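The accepted --call-graph values can be assembled as in the sketch below. The helper name call_graph_arg is hypothetical, it only knows the three methods named above, and the command line is printed rather than executed.

```shell
# call_graph_arg METHOD [STACK_SIZE]
# Prints the --call-graph value; a stack size only applies to dwarf.
call_graph_arg() {
    case $1 in
        fp|lbr) printf '%s\n' "$1" ;;
        dwarf)  printf 'dwarf%s\n' "${2:+,$2}" ;;
        *)      echo "unknown call-graph method: $1" >&2; return 1 ;;
    esac
}

printf 'perf record --call-graph %s -- sleep 1\n' "$(call_graph_arg dwarf 4096)"
```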
233
234 -q, --quiet
235 Don’t print any message, useful for scripting.
236
237 -v, --verbose
238 Be more verbose (show counter open errors, etc).
239
240 -s, --stat
241 Record per-thread event counts. Use it with perf report -T to see
242 the values.
243
244 -d, --data
245 Record the sample virtual addresses.
246
247 --phys-data
248 Record the sample physical addresses.
249
250 -T, --timestamp
251 Record the sample timestamps. Use it with perf report -D to see the
252 timestamps, for instance.
253
254 -P, --period
255 Record the sample period.
256
257 --sample-cpu
258 Record the sample cpu.
259
260 -n, --no-samples
261 Don’t sample.
262
263 -R, --raw-samples
264 Collect raw sample records from all opened counters (default for
265 tracepoint counters).
266
267 -C, --cpu
268 Collect samples only on the list of CPUs provided. Multiple CPUs
269 can be provided as a comma-separated list with no space: 0,1.
270 Ranges of CPUs are specified with -: 0-2. In per-thread mode with
271 inheritance mode on (default), samples are captured only when the
272 thread executes on the designated CPUs. Default is to monitor all
273 CPUs.
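The list syntax accepted by -C (commas and ranges) can be expanded as in this sketch. The helper name expand_cpu_list is hypothetical and the function does no validation of its input.

```shell
# expand_cpu_list LIST
# Expands a -C style list such as 0,2-4 into space separated CPU numbers.
expand_cpu_list() {
    result=
    for item in $(printf '%s' "$1" | tr ',' ' '); do
        case $item in
            *-*) first=${item%-*}; last=${item#*-} ;;
            *)   first=$item; last=$item ;;
        esac
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            result="${result:+$result }$cpu"
            cpu=$(( cpu + 1 ))
        done
    done
    printf '%s\n' "$result"
}

expand_cpu_list 0,2-4   # prints 0 2 3 4
```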
274
275 -B, --no-buildid
276 Do not save the build ids of binaries in the perf.data files. This
277 skips post processing after recording, which sometimes makes the
278 final step in the recording process take a long time, as it
279 needs to process all events looking for mmap records. The downside
280 is that it can misresolve symbols if the workload binaries used
281 when recording get locally rebuilt or upgraded, because the only
282 key available in this case is the pathname. You can also set the
283 "record.build-id" config variable to "skip" to have this behaviour
284 permanently.
285
286 -N, --no-buildid-cache
287 Do not update the buildid cache. This saves some overhead in
288 situations where the information in the perf.data file (which
289 includes buildids) is sufficient. You can also set the
290 "record.build-id" config variable to no-cache to have the same
291 effect.
292
293 -G name,..., --cgroup name,...
294 Monitor only in the container (cgroup) called "name". This option
295 is available only in per-cpu mode. The cgroup filesystem must be
296 mounted. All threads belonging to container "name" are monitored
297 when they run on the monitored CPUs. Multiple cgroups can be
298 provided. Each cgroup is applied to the corresponding event, i.e.,
299 first cgroup to first event, second cgroup to second event and so
300 on. It is possible to provide an empty cgroup (monitor all the
301 time) using, e.g., -G foo,,bar. Cgroups must have corresponding
302 events, i.e., they always refer to events defined earlier on the
303 command line. If the user wants to track multiple events for a
304 specific cgroup, the user can use -e e1 -e e2 -G foo,foo or just
305 use -e e1 -e e2 -G foo.
306
307 To monitor, say, cycles for a cgroup and also system
308 wide, this command line can be used: perf record -e cycles -G cgroup_name
309 -a -e cycles.
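The positional pairing of cgroups with events can be made concrete with the sketch below. The helper name pair_cgroups is hypothetical, "(all)" is just a label chosen here for an empty entry, and only the cases described above are covered: one cgroup per event, empty entries, and a single cgroup applied to every event.

```shell
# pair_cgroups EVENTS CGROUPS
# Prints one "event cgroup" pair per line: the first cgroup goes with
# the first event, the second with the second, and so on. An empty
# entry is shown as "(all)"; a single cgroup applies to every event.
pair_cgroups() {
    events=$1 cgroups=$2
    n=1
    for event in $(printf '%s' "$events" | tr ',' ' '); do
        case $cgroups in
            *,*) cgroup=$(printf '%s' "$cgroups" | cut -d, -f"$n") ;;
            *)   cgroup=$cgroups ;;
        esac
        printf '%s %s\n' "$event" "${cgroup:-(all)}"
        n=$(( n + 1 ))
    done
}

# Corresponds to: -e e1 -e e2 -e e3 -G foo,,bar
pair_cgroups e1,e2,e3 foo,,bar
```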
310
311 -b, --branch-any
312 Enable taken branch stack sampling. Any type of taken branch may be
313 sampled. This is a shortcut for --branch-filter any. See
314 --branch-filter for more information.
315
316 -j, --branch-filter
317 Enable taken branch stack sampling. Each sample captures a series
318 of consecutive taken branches. The number of branches captured with
319 each sample depends on the underlying hardware, the type of
320 branches of interest, and the executed code. It is possible to
321 select the types of branches captured by enabling filters. The
322 following filters are defined:
323
324 · any: any type of branches
325
326 · any_call: any function call or system call
327
328 · any_ret: any function return or system call return
329
330 · ind_call: any indirect call
331
332 · call: direct calls, including far (to/from kernel) calls
333
334 · u: only when the branch target is at the user level
335
336 · k: only when the branch target is in the kernel
337
338 · hv: only when the target is at the hypervisor level
339
340 · in_tx: only when the target is in a hardware transaction
341
342 · no_tx: only when the target is not in a hardware transaction
343
344 · abort_tx: only when the target is a hardware transaction abort
345
346 · cond: conditional branches
347
348 · save_type: save branch type during sampling in case binary is
349 not available later
350
351 The option requires at least one branch type among any, any_call,
352 any_ret, ind_call, cond. The privilege levels may be omitted, in
353 which case, the privilege levels of the associated event are
354 applied to the branch filter. Both kernel (k) and hypervisor (hv)
355 privilege levels are subject to permissions. When sampling on
356 multiple events, branch stack sampling is enabled for all the
357 sampling events. The sampled branch type is the same for all
358 events. The various filters must be specified as a comma separated
359 list, e.g. --branch-filter any_ret,u,k. Note that this feature may not be
360 available on all processors.
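The requirement that a branch filter list names at least one branch type can be checked up front, as in this sketch. The helper name check_branch_filter is hypothetical, and it tests only the rule stated above, not whether the processor supports the filter.

```shell
# check_branch_filter LIST
# Succeeds when the comma separated LIST names at least one branch type
# among any, any_call, any_ret, ind_call and cond.
check_branch_filter() {
    for item in $(printf '%s' "$1" | tr ',' ' '); do
        case $item in
            any|any_call|any_ret|ind_call|cond) return 0 ;;
        esac
    done
    return 1
}

check_branch_filter any_ret,u,k && echo "any_ret,u,k: ok"
check_branch_filter u,k || echo "u,k: needs a branch type"
```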
361
362 --weight
363 Enable weighted sampling. An additional weight is recorded per
364 sample and can be displayed with the weight and local_weight sort
365 keys. This currently works for TSX abort events and some memory
366 events in precise mode on modern Intel CPUs.
367
368 --namespaces
369 Record events of type PERF_RECORD_NAMESPACES.
370
371 --transaction
372 Record transaction flags for transaction related events.
373
374 --per-thread
375 Use per-thread mmaps. By default per-cpu mmaps are created. This
376 option overrides that and uses per-thread mmaps. A side-effect of
377 that is that inheritance is automatically disabled. --per-thread is
378 ignored with a warning if combined with -a or -C options.
379
380 -D, --delay=
381 After starting the program, wait msecs before measuring. This is
382 useful to filter out the startup phase of the program, which is
383 often very different.
384
385 -I, --intr-regs
386 Capture machine state (registers) at interrupt, i.e., on counter
387 overflows for each sample. List of captured registers depends on
388 the architecture. This option is off by default. It is possible to
389 select the registers to sample using their symbolic names, e.g. on
390 x86, ax, si. To list the available registers use --intr-regs=\?. To
391 name registers, pass a comma separated list such as
392 --intr-regs=ax,bx. The list of registers is architecture dependent.
393
394 --user-regs
395 Capture user registers at sample time. Same arguments as -I.
396
397 --running-time
398 Record running and enabled time for read events (:S).
399
400 -k, --clockid
401 Sets the clock id to use for the various time fields in the
402 perf_event_type records. See clock_gettime(). In particular
403 CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW are supported, some events
404 might also allow CLOCK_BOOTTIME, CLOCK_REALTIME and CLOCK_TAI.
405
406 -S, --snapshot
407 Select AUX area tracing Snapshot Mode. This option is valid only
408 with an AUX area tracing event. Optionally the number of bytes to
409 capture per snapshot can be specified. In Snapshot Mode, trace data
410 is captured only when signal SIGUSR2 is received.
411
412 --proc-map-timeout
413 Processing /proc/XXX/maps for pre-existing threads may take a
414 long time, because the file may be huge. A timeout is needed in
415 such cases. This option sets the timeout limit. The default value
416 is 500 ms.
417
418 --switch-events
419 Record context switch events i.e. events of type PERF_RECORD_SWITCH
420 or PERF_RECORD_SWITCH_CPU_WIDE.
421
422 --clang-path=PATH
423 Path to clang binary to use for compiling BPF scriptlets. (enabled
424 when BPF support is on)
425
426 --clang-opt=OPTIONS
427 Options passed to clang when compiling BPF scriptlets. (enabled
428 when BPF support is on)
429
430 --vmlinux=PATH
431 Specify vmlinux path which has debuginfo. (enabled when BPF
432 prologue is on)
433
434 --buildid-all
435 Record build-id of all DSOs regardless of whether it’s actually hit or
436 not.
437
438 --all-kernel
439 Configure all used events to run in kernel space.
440
441 --all-user
442 Configure all used events to run in user space.
443
444 --timestamp-filename Append timestamp to output file name.
445
446 --timestamp-boundary
447 Record timestamp boundary (time of first/last samples).
448
449 --switch-output[=mode]
450 Generate multiple perf.data files, timestamp prefixed, switching to
451 a new one based on the mode value: "signal" - when receiving a SIGUSR2
452 (default value); <size> - when reaching the size threshold, where size
453 is expected to be a number with an appended unit character - B/K/M/G;
454 <time> - when reaching the time threshold, where time is expected to be a
455 number with an appended unit character - s/m/h/d.
456
457 Note: the precision of the size threshold depends heavily
458 on your configuration - the number and size of your ring
459 buffers (-m). It is generally more precise for larger sizes
460 (like >5M); for lower values expect different sizes.
461
462 A possible use case is to slice the perf.data file on an external event;
463 each slice can then be processed, possibly via a perf script, to decide
464 whether that particular perf.data snapshot should be kept.
465
466 Implies --timestamp-filename, --no-buildid and --no-buildid-cache. The
467 reason for the latter two is to reduce the data file switching
468 overhead. You can still switch them on with:
469
470 --switch-output --no-no-buildid --no-no-buildid-cache
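How a --switch-output value maps to one of the three modes can be sketched by looking at its unit character. The helper name switch_output_mode is hypothetical, and the classification follows only the unit characters listed above; it does not validate the numeric part.

```shell
# switch_output_mode VALUE
# Classifies a --switch-output value as signal, size or time. An empty
# value means the default, signal.
switch_output_mode() {
    case $1 in
        ''|signal) echo signal ;;
        *[BKMG])   echo size ;;
        *[smhd])   echo time ;;
        *)         echo unknown ;;
    esac
}

switch_output_mode 10M   # prints size
switch_output_mode 30s   # prints time
printf 'perf record --switch-output=%s -a\n' 10M
```

Note that the units are case sensitive in this sketch: M is read as a size, m as a time.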
471
472 --dry-run
473 Parse options then exit. --dry-run can be used to detect errors in
474 cmdline options.
475
476 perf record --dry-run -e can act as a BPF script compiler if
477 llvm.dump-obj in config file is set to true.
478
479 --tail-synthesize
480 Instead of collecting non-sample events (for example, fork, comm,
481 mmap) at the beginning of record, collect them while finalizing the
482 output file. The collected non-sample events reflect the status of
483 the system when record is finished.
484
485 --overwrite
486 Makes all events use an overwritable ring buffer. An overwritable
487 ring buffer works like a flight recorder: when it gets full, the
488 kernel will overwrite the oldest records, which thus never make
489 it to the perf.data file.
490
491 When --overwrite and --switch-output are used, perf records and drops
492 events until it receives a signal, meaning that something unusual was
493 detected that warrants taking a snapshot of the most current events,
494 those fitting in the ring buffer at that moment.
495
496 The overwrite attribute can also be set or canceled for an event using
497 config terms. For example: cycles/overwrite/ and
498 instructions/no-overwrite/.
499
500 Implies --tail-synthesize.
501
503 perf-stat(1), perf-list(1)
504
505
506
507perf 09/24/2019 PERF-RECORD(1)