perf-record(1)

PERF-RECORD(1)                    perf Manual                   PERF-RECORD(1)

NAME

       perf-record - Run a command and record its profile into perf.data

SYNOPSIS

       perf record [-e <EVENT> | --event=EVENT] [-a] <command>
       perf record [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]

DESCRIPTION

       This command runs a command and gathers a performance counter profile
       from it into perf.data, without displaying anything.

       This file can then be inspected later on, using perf report.

OPTIONS

       <command>...
           Any command you can specify in a shell.

       -e, --event=
           Select the PMU event. Selection can be:

           ·   a symbolic event name (use perf list to list all events)

           ·   a raw PMU event (eventsel+umask) in the form of rNNN where NNN
               is a hexadecimal event descriptor.

           ·   a symbolically formed PMU event like pmu/param1=0x3,param2/
               where param1, param2, etc are defined as formats for the PMU in
               /sys/bus/event_source/devices/<pmu>/format/*.

           ·   a symbolically formed event like
               pmu/config=M,config1=N,config2=K/

                   where M, N, K are numbers (in decimal, hex or octal format).
                   Acceptable values for each of 'config', 'config1' and 'config2'
                   are defined by the corresponding entries in
                   /sys/bus/event_source/devices/<pmu>/format/*

                   There are also some parameters which are not defined in .../<pmu>/format/*.
                   These params can be used to overload default config values per event.
                   Here are some common parameters:
                   - 'period': Set event sampling period
                   - 'freq': Set event sampling frequency
                   - 'time': Disable/enable time stamping. Acceptable values are 1 for
                             enabling time stamping and 0 for disabling time stamping.
                             The default is 1.
                   - 'call-graph': Disable/enable callgraph. Acceptable strings are "fp" for
                                  FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and
                                  "no" to disable callgraph.
                   - 'stack-size': user stack size for dwarf mode
                   - 'name' : User defined event name. Single quotes (') may be used to
                             escape symbols in the name from parsing by shell and tool
                             like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'.

                   See the perf-list(1) man page for more parameters.

                   Note: If the user explicitly sets options which conflict with the
                   params, the value set by the parameters will be overridden.

                   Also not defined in .../<pmu>/format/* are PMU driver specific
                   configuration parameters.  Any configuration parameter preceded by
                   the letter '@' is not interpreted in user space and is sent down
                   directly to the PMU driver.  For example:

                   perf record -e some_event/@cfg1,@cfg2=config/ ...

                   will see 'cfg1' and 'cfg2=config' pushed to the PMU driver associated
                   with the event for further processing.  There is no restriction on
                   what the configuration parameters are, as long as their semantics are
                   understood and supported by the PMU driver.

           ·   a hardware breakpoint event in the form of
               mem:addr[/len][:access] where addr is the address in memory
               you want to break in. access is the memory access type (read,
               write, execute); it can be passed as follows:
               mem:addr[:[r][w][x]]. len is the range, i.e. the number of bytes
               from the specified addr, which the breakpoint will cover. If you
               want to profile read-write accesses at 0x1000, just set
               mem:0x1000:rw. If you want to profile write accesses in
               [0x1000, 0x1008), just set mem:0x1000/8:w.

           ·   a group of events surrounded by a pair of braces
               ("{event1,event2,...}"). Events are separated by commas and
               the group should be quoted to prevent shell interpretation.
               You also need to use --group on "perf report" to view group
               events together.
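
           As a rough illustration of the selector forms listed above, the
           sketch below only assembles and prints a few command lines; it does
           not invoke perf. The workload ./myapp, the raw code r1a8 and the
           address 0x1000 are placeholders chosen for this example.

```shell
#!/bin/sh
# Placeholder workload; substitute a real command.
workload="./myapp"

# Symbolic event name, as reported by `perf list`.
symbolic="perf record -e cycles -- $workload"

# Raw PMU event: 'r' followed by a hexadecimal descriptor (arbitrary value here).
raw="perf record -e r1a8 -- $workload"

# Hardware breakpoint on writes to the 8 bytes starting at 0x1000.
bp="perf record -e mem:0x1000/8:w -- $workload"

# Event group, quoted so the shell does not interpret the braces.
group="perf record -e '{cycles,instructions}' -- $workload"

printf '%s\n' "$symbolic" "$raw" "$bp" "$group"
```

           Data recorded with the grouped form is meant to be viewed with
           perf report --group.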

       --filter=<filter>
           Event filter. This option should follow an event selector (-e) which
           selects either tracepoint event(s) or a hardware trace PMU (e.g.
           Intel PT or CoreSight).

           ·   tracepoint filters

                   In the case of tracepoints, multiple '--filter' options are combined
                   using '&&'.

           ·   address filters

                   A hardware trace PMU advertises its ability to accept a number of
                   address filters by specifying a non-zero value in
                   /sys/bus/event_source/devices/<pmu>/nr_addr_filters.

                   Address filters have the format:

                   filter|start|stop|tracestop <start> [/ <size>] [@<file name>]

                   Where:
                   - 'filter': defines a region that will be traced.
                   - 'start': defines an address at which tracing will begin.
                   - 'stop': defines an address at which tracing will stop.
                   - 'tracestop': defines a region in which tracing will stop.

                   <file name> is the name of the object file, <start> is the offset to the
                   code to trace in that file, and <size> is the size of the region to
                   trace. 'start' and 'stop' filters need not specify a <size>.

                   If no object file is specified then the kernel is assumed, in which case
                   the start address must be a current kernel memory address.

                   <start> can also be specified by providing the name of a symbol. If the
                   symbol name is not unique, it can be disambiguated by inserting #n where
                   'n' selects the n'th symbol in address order. Alternatively #0, #g or #G
                   select only a global symbol. <size> can also be specified by providing
                   the name of a symbol, in which case the size is calculated to the end
                   of that symbol. For 'filter' and 'tracestop' filters, if <size> is
                   omitted and <start> is a symbol, then the size is calculated to the end
                   of that symbol.

                   If <size> is omitted and <start> is '*', then the start and size will
                   be calculated from the first and last symbols, i.e. to trace the whole
                   file.

                   If symbol names (or '*') are provided, they must be surrounded by white
                   space.

                   The filter passed to the kernel is not necessarily the same as entered.
                   To see the filter that is passed, use the -v option.

                   The kernel may not be able to configure a trace region if it is not
                   within a single mapping.  MMAP events (or /proc/<pid>/maps) can be
                   examined to determine if that is a possibility.

                   Multiple filters can be separated with space or comma.
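
           The address filter format above can be sketched with a small
           helper that assembles the filter string. addr_filter is a
           hypothetical function written for this example, ./myapp and the
           numeric offsets are placeholders, and intel_pt//u assumes an Intel
           PT capable system. The script only prints the command line.

```shell
#!/bin/sh
# Hypothetical helper: format an address filter as
#   <action> <start> [/ <size>] [@<file name>]
# Arguments: action, start (address or symbol), size (may be empty),
# object file (may be empty, in which case the kernel is assumed).
addr_filter() {
    action=$1
    start=$2
    size=$3
    file=$4
    out="$action $start"
    if [ -n "$size" ]; then
        out="$out / $size"
    fi
    if [ -n "$file" ]; then
        out="$out @$file"
    fi
    printf '%s\n' "$out"
}

# Trace all of the symbol 'main' in ./myapp (size runs to the end of the symbol).
f1=$(addr_filter filter main "" ./myapp)
# Trace a 0x200-byte region starting at offset 0x1000 in ./myapp.
f2=$(addr_filter filter 0x1000 0x200 ./myapp)

echo "perf record -e intel_pt//u --filter '$f1' -- ./myapp"
echo "perf record -e intel_pt//u --filter '$f2' -- ./myapp"
```

           Adding -v to the real command shows the filter that is actually
           passed to the kernel, which may differ from the string entered.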

       --exclude-perf
           Don’t record events issued by perf itself. This option should
           follow an event selector (-e) which selects tracepoint event(s). It
           adds a filter expression common_pid != $PERFPID to filters. If
           other --filter options exist, the new filter expression will be
           combined with them by &&.

       -a, --all-cpus
           System-wide collection from all CPUs (default if no target is
           specified).

       -p, --pid=
           Record events on existing process ID (comma separated list).

       -t, --tid=
           Record events on existing thread ID (comma separated list). This
           option also disables inheritance by default. Enable it by adding
           --inherit.

       -u, --uid=
           Record events in threads owned by uid. Name or number.

       -r, --realtime=
           Collect data with this RT SCHED_FIFO priority.

       --no-buffering
           Collect data without buffering.

       -c, --count=
           Event period to sample.

       -o, --output=
           Output file name.

       -i, --no-inherit
           Child tasks do not inherit counters.

       -F, --freq=
           Profile at this frequency. Use max to select the current maximum
           allowed frequency, i.e. the value in the
           kernel.perf_event_max_sample_rate sysctl. A higher requested
           frequency is throttled down to the current maximum allowed
           frequency. See --strict-freq.

       --strict-freq
           Fail if the specified frequency can’t be used.

       -m, --mmap-pages=
           Number of mmap data pages (must be a power of two) or a size
           specification with an appended unit character - B/K/M/G. The size
           is rounded up to the nearest power-of-two number of pages. Also, by
           adding a comma, the number of mmap pages for AUX area tracing can
           be specified.
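
           The rounding rule for size specifications can be worked through
           with a little arithmetic. The helper below is a hypothetical
           sketch of the rule as described above, not perf's actual code, and
           it assumes a 4096-byte page unless told otherwise (check with
           getconf PAGESIZE).

```shell
#!/bin/sh
# Round a byte size up to a power-of-two number of pages.
# $1 = size in bytes, $2 = page size in bytes (assumed 4096 if omitted).
mmap_pages_for() {
    bytes=$1
    page_size=${2:-4096}
    # Pages needed to hold the requested size, rounding up.
    needed=$(( (bytes + page_size - 1) / page_size ))
    # Smallest power of two that is >= needed.
    pages=1
    while [ "$pages" -lt "$needed" ]; do
        pages=$(( pages * 2 ))
    done
    echo "$pages"
}

mmap_pages_for $(( 1024 * 1024 ))       # 1M is exactly 256 pages
mmap_pages_for $(( 3 * 1024 * 1024 ))   # 3M needs 768 pages, rounded up to 1024
```

           Under these assumptions, -m 3M would behave like -m 1024.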

       --group
           Put all events in a single event group. This precedes the --event
           option and remains only for backward compatibility. See --event.

       -g
           Enables call-graph (stack chain/backtrace) recording.

       --call-graph
           Setup and enable call-graph (stack chain/backtrace) recording,
           implies -g. Default is "fp".

               Allows specifying "fp" (frame pointer) or "dwarf"
               (DWARF's CFI - Call Frame Information) or "lbr"
               (Hardware Last Branch Record facility) as the method to collect
               the information used to show the call graphs.

               On some systems, where binaries are built with gcc
               -fomit-frame-pointer, using the "fp" method will produce bogus
               call graphs; "dwarf", if available (perf tools linked to
               the libunwind or libdw library), should be used instead.
               Using the "lbr" method doesn't require any compiler options. It
               will produce call graphs from the hardware LBR registers. The
               main limitation is that it is only available on newer Intel
               platforms, such as Haswell. It can only get the user call
               chain. It doesn't work with branch stack sampling at the same
               time.

               When "dwarf" recording is used, perf also records a (user) stack
               dump when sampled.  The default size of the stack dump is 8192
               (bytes). The user can change the size by passing it after a
               comma, like "--call-graph dwarf,4096".
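
           For comparison, the three collection methods might be requested as
           follows. This is a print-only sketch: ./myapp is a placeholder
           workload, and which methods actually work depends on how the
           binary and the perf tools were built and on the CPU.

```shell
#!/bin/sh
workload="./myapp"   # placeholder; substitute a real command

# Frame pointers (the default method).
fp_cmd="perf record --call-graph fp -- $workload"
# DWARF unwinding with a 4096-byte user stack dump instead of the 8192 default.
dwarf_cmd="perf record --call-graph dwarf,4096 -- $workload"
# Hardware Last Branch Record, user call chains only.
lbr_cmd="perf record --call-graph lbr -- $workload"

printf '%s\n' "$fp_cmd" "$dwarf_cmd" "$lbr_cmd"
```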

       -q, --quiet
           Don’t print any message, useful for scripting.

       -v, --verbose
           Be more verbose (show counter open errors, etc).

       -s, --stat
           Record per-thread event counts. Use it with perf report -T to see
           the values.

       -d, --data
           Record the sample virtual addresses.

       --phys-data
           Record the sample physical addresses.

       -T, --timestamp
           Record the sample timestamps. Use it with perf report -D to see the
           timestamps, for instance.

       -P, --period
           Record the sample period.

       --sample-cpu
           Record the sample cpu.

       -n, --no-samples
           Don’t sample.

       -R, --raw-samples
           Collect raw sample records from all opened counters (default for
           tracepoint counters).

       -C, --cpu
           Collect samples only on the list of CPUs provided. Multiple CPUs
           can be provided as a comma-separated list with no space: 0,1.
           Ranges of CPUs are specified with -: 0-2. In per-thread mode with
           inheritance mode on (default), samples are captured only when the
           thread executes on the designated CPUs. Default is to monitor all
           CPUs.
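
           To make the list syntax concrete, the following sketch expands a
           -C style argument into the individual CPU numbers it names.
           expand_cpu_list is a hypothetical helper for illustration only; it
           does no validation and is not how perf parses the option.

```shell
#!/bin/sh
# Expand a CPU list such as "0,2-4" into "0 2 3 4".
expand_cpu_list() {
    result=""
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        case $part in
            *-*) first=${part%-*}; last=${part#*-} ;;   # range, e.g. 2-4
            *)   first=$part; last=$part ;;             # single CPU
        esac
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            result="$result${result:+ }$cpu"
            cpu=$(( cpu + 1 ))
        done
    done
    IFS=$old_ifs
    echo "$result"
}

expand_cpu_list "0,2-4"   # prints: 0 2 3 4
```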

       -B, --no-buildid
           Do not save the build ids of binaries in the perf.data files. This
           skips the post-processing after recording, which can make the
           final step of the recording process take a long time because it
           needs to process all events looking for mmap records. The downside
           is that it can misresolve symbols if the workload binaries used
           when recording get locally rebuilt or upgraded, because the only
           key available in this case is the pathname. You can also set the
           "record.build-id" config variable to skip to have this behaviour
           permanently.

       -N, --no-buildid-cache
           Do not update the buildid cache. This saves some overhead in
           situations where the information in the perf.data file (which
           includes buildids) is sufficient. You can also set the
           "record.build-id" config variable to no-cache to have the same
           effect.

       -G name,..., --cgroup name,...
           Monitor only in the container (cgroup) called "name". This option
           is available only in per-cpu mode. The cgroup filesystem must be
           mounted. All threads belonging to container "name" are monitored
           when they run on the monitored CPUs. Multiple cgroups can be
           provided. Each cgroup is applied to the corresponding event, i.e.,
           first cgroup to first event, second cgroup to second event and so
           on. It is possible to provide an empty cgroup (monitor all the
           time) using, e.g., -G foo,,bar. Cgroups must have corresponding
           events, i.e., they always refer to events defined earlier on the
           command line. If the user wants to track multiple events for a
           specific cgroup, the user can use -e e1 -e e2 -G foo,foo or just
           use -e e1 -e e2 -G foo.

       To monitor, say, cycles for a cgroup and also system wide, this
       command line can be used: perf record -e cycles -G cgroup_name
       -a -e cycles.

       -b, --branch-any
           Enable taken branch stack sampling. Any type of taken branch may be
           sampled. This is a shortcut for --branch-filter any. See
           --branch-filter for more information.

       -j, --branch-filter
           Enable taken branch stack sampling. Each sample captures a series
           of consecutive taken branches. The number of branches captured with
           each sample depends on the underlying hardware, the type of
           branches of interest, and the executed code. It is possible to
           select the types of branches captured by enabling filters. The
           following filters are defined:

           ·   any: any type of branches

           ·   any_call: any function call or system call

           ·   any_ret: any function return or system call return

           ·   ind_call: any indirect branch

           ·   call: direct calls, including far (to/from kernel) calls

           ·   u: only when the branch target is at the user level

           ·   k: only when the branch target is in the kernel

           ·   hv: only when the target is at the hypervisor level

           ·   in_tx: only when the target is in a hardware transaction

           ·   no_tx: only when the target is not in a hardware transaction

           ·   abort_tx: only when the target is a hardware transaction abort

           ·   cond: conditional branches

           ·   save_type: save branch type during sampling in case binary is
               not available later

           The option requires at least one branch type among any, any_call,
           any_ret, ind_call, cond. The privilege levels may be omitted, in
           which case the privilege levels of the associated event are
           applied to the branch filter. Both kernel (k) and hypervisor (hv)
           privilege levels are subject to permissions. When sampling on
           multiple events, branch stack sampling is enabled for all the
           sampling events. The sampled branch type is the same for all
           events. The various filters must be specified as a comma separated
           list: --branch-filter any_ret,u,k. Note that this feature may not
           be available on all processors.
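
           The requirement that a filter list name at least one branch type
           can be checked before invoking perf. has_branch_type below is a
           hypothetical helper that encodes only the rule as stated above;
           perf's own validation may accept or reject other combinations.

```shell
#!/bin/sh
# Succeed if a comma-separated -j list contains at least one of the branch
# types named as required: any, any_call, any_ret, ind_call, cond.
has_branch_type() {
    old_ifs=$IFS
    IFS=,
    found=1
    for f in $1; do
        case $f in
            any|any_call|any_ret|ind_call|cond) found=0 ;;
        esac
    done
    IFS=$old_ifs
    return $found
}

has_branch_type "any_ret,u,k" && echo "any_ret,u,k: has a branch type"
has_branch_type "u,k" || echo "u,k: privilege levels only, add a branch type"
```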

       --weight
           Enable weighted sampling. An additional weight is recorded per
           sample and can be displayed with the weight and local_weight sort
           keys. This currently works for TSX abort events and some memory
           events in precise mode on modern Intel CPUs.

       --namespaces
           Record events of type PERF_RECORD_NAMESPACES.

       --transaction
           Record transaction flags for transaction related events.

       --per-thread
           Use per-thread mmaps. By default per-cpu mmaps are created. This
           option overrides that and uses per-thread mmaps. A side-effect of
           that is that inheritance is automatically disabled. --per-thread is
           ignored with a warning if combined with -a or -C options.

       -D, --delay=
           After starting the program, wait msecs before measuring. This is
           useful to filter out the startup phase of the program, which is
           often very different.

       -I, --intr-regs
           Capture machine state (registers) at interrupt, i.e., on counter
           overflows for each sample. The list of captured registers depends
           on the architecture. This option is off by default. It is possible
           to select the registers to sample using their symbolic names, e.g.
           on x86, ax, si. To list the available registers use
           --intr-regs=\?. To name registers, pass a comma separated list
           such as --intr-regs=ax,bx.

       --user-regs
           Capture user registers at sample time. Same arguments as -I.

       --running-time
           Record running and enabled time for read events (:S).

       -k, --clockid
           Sets the clock id to use for the various time fields in the
           perf_event_type records. See clock_gettime(). In particular
           CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW are supported; some events
           might also allow CLOCK_BOOTTIME, CLOCK_REALTIME and CLOCK_TAI.

       -S, --snapshot
           Select AUX area tracing Snapshot Mode. This option is valid only
           with an AUX area tracing event. Optionally the number of bytes to
           capture per snapshot can be specified. In Snapshot Mode, trace data
           is captured only when signal SIGUSR2 is received.

       --proc-map-timeout
           Processing /proc/XXX/maps of pre-existing threads may take a
           long time, because the file may be huge. A time out is needed in
           such cases. This option sets the time out limit. The default value
           is 500 ms.

       --switch-events
           Record context switch events, i.e. events of type
           PERF_RECORD_SWITCH or PERF_RECORD_SWITCH_CPU_WIDE.

       --clang-path=PATH
           Path to clang binary to use for compiling BPF scriptlets. (enabled
           when BPF support is on)

       --clang-opt=OPTIONS
           Options passed to clang when compiling BPF scriptlets. (enabled
           when BPF support is on)

       --vmlinux=PATH
           Specify vmlinux path which has debuginfo. (enabled when BPF
           prologue is on)

       --buildid-all
           Record build-id of all DSOs regardless of whether they are
           actually hit or not.

       --all-kernel
           Configure all used events to run in kernel space.

       --all-user
           Configure all used events to run in user space.

       --timestamp-filename
           Append timestamp to output file name.

       --timestamp-boundary
           Record timestamp boundary (time of first/last samples).

       --switch-output[=mode]
           Generate multiple perf.data files, timestamp prefixed, switching to
           a new one based on mode value:

           ·   "signal" - when receiving a SIGUSR2 (default value)

           ·   <size> - when reaching the size threshold, size is expected to
               be a number with appended unit character - B/K/M/G

           ·   <time> - when reaching the time threshold, time is expected to
               be a number with appended unit character - s/m/h/d

               Note: the precision of the size threshold hugely depends
               on your configuration - the number and size of your ring
               buffers (-m). It is generally more precise for higher sizes
               (like >5M), for lower values expect different sizes.

       A possible use case is to, given an external event, slice the perf.data
       file that gets then processed, possibly via a perf script, to decide if
       that particular perf.data snapshot should be kept or not.

       Implies --timestamp-filename, --no-buildid and --no-buildid-cache. The
       reason for the latter two is to reduce the data file switching
       overhead. You can still switch them on with:

           --switch-output --no-no-buildid  --no-no-buildid-cache
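
       The three kinds of --switch-output mode value can be told apart by
       their suffix. The helper below is a hypothetical sketch of that
       classification based only on the description above; perf's own parser
       may accept other spellings. Note that the unit letters are case
       sensitive here: M is a size (megabytes) and m is a time (minutes).

```shell
#!/bin/sh
# Classify a --switch-output mode value as signal, size or time.
switch_mode_kind() {
    case $1 in
        ""|signal)     echo signal ;;    # no value means the default, SIGUSR2
        *[0-9][BKMG])  echo size ;;      # number with B/K/M/G suffix
        *[0-9][smhd])  echo time ;;      # number with s/m/h/d suffix
        *)             echo unknown ;;
    esac
}

for mode in signal 10M 30s; do
    echo "--switch-output=$mode -> $(switch_mode_kind "$mode")"
done
```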

       --dry-run
           Parse options then exit. --dry-run can be used to detect errors in
           cmdline options.

       perf record --dry-run -e can act as a BPF script compiler if
       llvm.dump-obj in config file is set to true.

       --tail-synthesize
           Instead of collecting non-sample events (for example, fork, comm,
           mmap) at the beginning of record, collect them while finalizing
           the output file. The collected non-sample events reflect the
           status of the system when record is finished.

       --overwrite
           Makes all events use an overwritable ring buffer. An overwritable
           ring buffer works like a flight recorder: when it gets full, the
           kernel will overwrite the oldest records, which thus will never
           make it to the perf.data file.

       When --overwrite and --switch-output are used, perf records and drops
       events until it receives a signal, meaning that something unusual was
       detected that warrants taking a snapshot of the most current events,
       those fitting in the ring buffer at that moment.

       The overwrite attribute can also be set or canceled for an event using
       config terms. For example: cycles/overwrite/ and
       instructions/no-overwrite/.

       Implies --tail-synthesize.
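
       A flight-recorder session built from the options above might look
       like the following print-only sketch. The event choice and the
       <perf_pid> placeholder are assumptions for illustration; SIGUSR2 is
       the default --switch-output trigger described earlier.

```shell
#!/bin/sh
# Nothing here starts perf; the commands are only printed.

# Record system wide into an overwritable ring buffer, dropping old events
# until a signal asks for a snapshot.
record_cmd="perf record -a -e cycles --overwrite --switch-output"

# Sent when something unusual is detected; <perf_pid> stands for the PID of
# the running perf record process.
trigger_cmd="kill -USR2 <perf_pid>"

printf '%s\n' "$record_cmd" "$trigger_cmd"
```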