PERF-RECORD(1) perf Manual PERF-RECORD(1)

NAME
perf-record - Run a command and record its profile into perf.data

SYNOPSIS
perf record [-e <EVENT> | --event=EVENT] [-l] [-a] <command>
perf record [-e <EVENT> | --event=EVENT] [-l] [-a] -- <command> [<options>]

DESCRIPTION
This command runs a command and gathers a performance counter profile
from it into perf.data, without displaying anything.

This file can then be inspected later using perf report.
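
As a rough illustration of the record-then-report workflow, the Python
sketch below assembles a perf record command line from options shown on
this page. The helper name build_record_argv is hypothetical and not part
of perf; it only builds the argument list and does not run anything.

```python
import shlex


def build_record_argv(command, events=None, all_cpus=False, output=None):
    """Build an argv list for 'perf record' (hypothetical helper)."""
    argv = ["perf", "record"]
    for event in events or []:
        argv += ["-e", event]      # one -e per selected event
    if all_cpus:
        argv.append("-a")          # system-wide collection from all CPUs
    if output is not None:
        argv += ["-o", output]     # output file name
    # "--" separates perf's own options from the profiled command.
    return argv + ["--"] + list(command)


argv = build_record_argv(["sleep", "1"], events=["cycles"])
print(shlex.join(argv))  # shell-quoted form of the command line
```

On a machine with perf installed, the list could be passed to
subprocess.run(argv, check=True), after which perf report reads the
resulting perf.data.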

OPTIONS
<command>...
Any command you can specify in a shell.

-e, --event=
Select the PMU event. Selection can be:

· a symbolic event name (use perf list to list all events)

· a raw PMU event (eventsel+umask) in the form of rNNN where NNN
is a hexadecimal event descriptor.

· a symbolically formed PMU event like pmu/param1=0x3,param2/
where param1, param2, etc. are defined as formats for the PMU in
/sys/bus/event_source/devices/<pmu>/format/*.

· a symbolically formed event like
pmu/config=M,config1=N,config2=K/

where M, N, K are numbers (in decimal, hex, or octal format). Acceptable
values for each of 'config', 'config1' and 'config2' are defined by the
corresponding entries in /sys/bus/event_source/devices/<pmu>/format/*.

· a group of events surrounded by a pair of braces
("{event1,event2,...}"). Events are separated by commas, and
the group should be quoted to prevent shell interpretation.
You also need to use --group on "perf report" to view group
events together.
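
The raw and grouped event forms above can be sketched in Python as
follows. This is a minimal sketch, assuming the common x86 layout in
which eventsel occupies the low 8 bits of the descriptor and umask the
next 8 bits; other PMUs may encode events differently. The helper names
raw_event and event_group are hypothetical.

```python
import shlex


def raw_event(eventsel, umask=0):
    """Format a raw event as rNNN, NNN being hex (assumed x86 layout)."""
    if not 0 <= eventsel <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("eventsel and umask must each fit in 8 bits")
    # Assumption: umask in bits 8-15, eventsel in bits 0-7.
    return "r{:x}".format((umask << 8) | eventsel)


def event_group(*events):
    """Join events into a brace group such as {cycles,instructions}."""
    if not events:
        raise ValueError("a group needs at least one event")
    return "{" + ",".join(events) + "}"


group = event_group("cycles", "instructions")
# Quote the group when it is pasted into a shell command line, so the
# shell does not interpret the braces.
print("perf record -e", shlex.quote(group), "-- sleep 1")
```

When the arguments are passed as a list (for example to subprocess.run),
no shell is involved and the group needs no quoting.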

--filter=<filter>
Event filter.

-a, --all-cpus
System-wide collection from all CPUs.

-p, --pid=
Record events on existing process ID (comma separated list).

-t, --tid=
Record events on existing thread ID (comma separated list). This
option also disables inheritance by default. Enable it by adding
--inherit.

-u, --uid=
Record events in threads owned by uid. Name or number.

-r, --realtime=
Collect data with this RT SCHED_FIFO priority.

--no-buffering
Collect data without buffering.

-c, --count=
Event period to sample: one sample is taken every <count> events.

-o, --output=
Output file name.

-i, --no-inherit
Child tasks do not inherit counters.

-F, --freq=
Profile at this frequency.

-m, --mmap-pages=
Number of mmap data pages (must be a power of two), or a size
specification with an appended unit character: B, K, M, or G. A size
is rounded up to the nearest power-of-two number of pages.
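
The rounding rule for -m can be worked through with a short Python
sketch. It assumes a 4096-byte page (the real value depends on the
system) and is only an approximation of the behaviour described above,
not perf's own parser. The function name mmap_pages is hypothetical.

```python
UNITS = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def mmap_pages(spec, page_size=4096):
    """Return the page count implied by a -m style argument (sketch)."""
    spec = spec.strip()
    unit = spec[-1:].upper()
    if unit in UNITS:
        size = int(spec[:-1]) * UNITS[unit]
        if size <= 0:
            raise ValueError("size must be positive")
        pages = -(-size // page_size)          # ceiling division
        return 1 << (pages - 1).bit_length()   # next power of two >= pages
    pages = int(spec)
    if pages <= 0 or pages & (pages - 1):
        raise ValueError("page count must be a power of two")
    return pages


print(mmap_pages("3M"))  # 3 MiB is 768 pages of 4 KiB, rounded up
```

With these assumptions, "1M" maps to 256 pages, while "3M" (768 pages)
is rounded up to 1024.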

--group
Put all events in a single event group. This predates the --event
option and remains only for backward compatibility. See --event.

-g
Enables call-graph (stack chain/backtrace) recording.

--call-graph
Set up and enable call-graph (stack chain/backtrace) recording;
implies -g.

Allows specifying "fp" (frame pointer), "dwarf"
(DWARF's CFI - Call Frame Information), or "lbr"
(Hardware Last Branch Record facility) as the method to collect
the information used to show the call graphs.

On some systems, where binaries are built with gcc
-fomit-frame-pointer, the "fp" method will produce bogus
call graphs; "dwarf", if available (perf tools linked to
the libunwind library), should be used instead.

The "lbr" method doesn't require any compiler options. It
produces call graphs from the hardware LBR registers. The
main limitation is that it is only available on new Intel
platforms, such as Haswell. It can only capture the user call chain,
and it cannot be used together with branch stack sampling.
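
The trade-offs between the three methods can be summarised as a small
decision helper. This is only one possible reading of the text above:
the function name, its parameters, and the order of preference are all
assumptions made for illustration, and perf itself performs no such
selection.

```python
def call_graph_args(has_frame_pointers, perf_has_libunwind,
                    cpu_has_lbr=False, need_kernel_chain=False,
                    branch_stack_sampling=False):
    """Pick a --call-graph method from the constraints (hypothetical)."""
    if has_frame_pointers:
        # Binaries keep frame pointers, so "fp" gives usable chains.
        return ["--call-graph", "fp"]
    if perf_has_libunwind:
        # Without frame pointers, fall back to DWARF CFI unwinding.
        return ["--call-graph", "dwarf"]
    if cpu_has_lbr and not need_kernel_chain and not branch_stack_sampling:
        # LBR: user call chain only, not alongside branch stack sampling.
        return ["--call-graph", "lbr"]
    raise ValueError("no suitable call-graph method for these constraints")


print(call_graph_args(has_frame_pointers=False, perf_has_libunwind=True))
```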

-q, --quiet
Don't print any message, useful for scripting.

-v, --verbose
Be more verbose (show counter open errors, etc).

-s, --stat
Per thread counts.

-d, --data
Sample addresses.

-T, --timestamp
Sample timestamps. Use it with perf report -D to see the
timestamps, for instance.

-n, --no-samples
Don't sample.

-R, --raw-samples
Collect raw sample records from all opened counters (default for
tracepoint counters).

-C, --cpu
Collect samples only on the list of CPUs provided. Multiple CPUs
can be provided as a comma-separated list with no space: 0,1.
Ranges of CPUs are specified with -: 0-2. In per-thread mode with
inheritance mode on (default), samples are captured only when the
thread executes on the designated CPUs. Default is to monitor all
CPUs.
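
The CPU list syntax can be illustrated with a short Python sketch that
expands such a list into individual CPU numbers. The function name
parse_cpu_list is hypothetical, and the sketch follows only the syntax
described above (commas, ranges, no spaces), not perf's actual parser.

```python
def parse_cpu_list(spec):
    """Expand a list such as '0,2-4' into sorted CPU numbers (sketch)."""
    if any(ch.isspace() for ch in spec):
        raise ValueError("the CPU list must not contain spaces")
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-", 1)   # a range such as 0-2
            start, end = int(first), int(last)
            if start > end:
                raise ValueError("invalid range: " + part)
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))                # a single CPU number
    return sorted(cpus)


print(parse_cpu_list("0,2-4"))
```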

-N, --no-buildid-cache
Do not update the buildid cache. This saves some overhead in
situations where the information in the perf.data file (which
includes buildids) is sufficient.

-G name,..., --cgroup name,...
Monitor only in the container (cgroup) called "name". This option
is available only in per-cpu mode. The cgroup filesystem must be
mounted. All threads belonging to container "name" are monitored
when they run on the monitored CPUs. Multiple cgroups can be
provided. Each cgroup is applied to the corresponding event, i.e.,
first cgroup to first event, second cgroup to second event and so
on. It is possible to provide an empty cgroup (monitor all the
time) using, e.g., -G foo,,bar. Cgroups must have corresponding
events, i.e., they always refer to events defined earlier on the
command line.
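
The positional pairing of cgroups with events can be sketched as
follows. The function name pair_cgroups is hypothetical; None stands for
an empty cgroup entry (monitor all the time), and only events that have
a matching entry in the -G list are returned.

```python
def pair_cgroups(events, cgroup_spec):
    """Pair each -G entry with the event in the same position (sketch)."""
    # An empty name between commas means "no cgroup restriction".
    cgroups = [name or None for name in cgroup_spec.split(",")]
    if len(cgroups) > len(events):
        raise ValueError("each cgroup must refer to an earlier event")
    return list(zip(events, cgroups))


pairs = pair_cgroups(["cycles", "instructions", "cache-misses"], "foo,,bar")
for event, cgroup in pairs:
    print(event, "->", cgroup)
```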

-b, --branch-any
Enable taken branch stack sampling. Any type of taken branch may be
sampled. This is a shortcut for --branch-filter any. See
--branch-filter for more information.

-j, --branch-filter
Enable taken branch stack sampling. Each sample captures a series
of consecutive taken branches. The number of branches captured with
each sample depends on the underlying hardware, the type of
branches of interest, and the executed code. It is possible to
select the types of branches captured by enabling filters. The
following filters are defined:

· any: any type of branches

· any_call: any function call or system call

· any_ret: any function return or system call return

· ind_call: any indirect call

· u: only when the branch target is at the user level

· k: only when the branch target is in the kernel

· hv: only when the target is at the hypervisor level

· in_tx: only when the target is in a hardware transaction

· no_tx: only when the target is not in a hardware transaction

· abort_tx: only when the target is a hardware transaction abort

The option requires at least one branch type among any,
any_call, any_ret, ind_call. The privilege levels may be
omitted, in which case the privilege levels of the associated
event are applied to the branch filter. Both kernel (k) and
hypervisor (hv) privilege levels are subject to permissions.
When sampling on multiple events, branch stack sampling is
enabled for all the sampling events. The sampled branch type is
the same for all events. The various filters must be specified
as a comma separated list: --branch-filter any_ret,u,k. Note
that this feature may not be available on all processors.
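
The rule that a filter list needs at least one branch type can be
checked with a small Python sketch. The filter names are taken only from
the list above (other perf versions may define more), and the function
name check_branch_filter is hypothetical.

```python
BRANCH_TYPES = {"any", "any_call", "any_ret", "ind_call"}
PRIVILEGE_LEVELS = {"u", "k", "hv"}
TX_FILTERS = {"in_tx", "no_tx", "abort_tx"}
KNOWN_FILTERS = BRANCH_TYPES | PRIVILEGE_LEVELS | TX_FILTERS


def check_branch_filter(spec):
    """Validate a comma separated --branch-filter argument (sketch)."""
    filters = spec.split(",")
    unknown = [name for name in filters if name not in KNOWN_FILTERS]
    if unknown:
        raise ValueError("unknown filter(s): " + ", ".join(unknown))
    if not BRANCH_TYPES.intersection(filters):
        # Privilege levels alone are not enough; a branch type is required.
        raise ValueError("at least one branch type is required")
    return filters


print(check_branch_filter("any_ret,u,k"))
```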

--weight
Enable weighted sampling. An additional weight is recorded
per sample and can be displayed with the weight and
local_weight sort keys. This currently works for TSX abort
events and some memory events in precise mode on modern Intel
CPUs.

--transaction
Record transaction flags for transaction related events.

--per-thread
Use per-thread mmaps. By default per-cpu mmaps are created.
This option overrides that and uses per-thread mmaps. A
side effect is that inheritance is automatically
disabled. --per-thread is ignored with a warning if combined
with the -a or -C options.

-D, --delay=
After starting the program, wait the given number of milliseconds
before measuring. This is useful to filter out the startup phase of
the program, which is often very different.

-I, --intr-regs
Capture machine state (registers) at interrupt, i.e., on
counter overflows for each sample. The list of captured registers
depends on the architecture. This option is off by default.

--running-time
Record running and enabled time for read events (:S).

SEE ALSO
perf-stat(1), perf-list(1)

perf 06/18/2019 PERF-RECORD(1)