profile(8)                System Manager's Manual                profile(8)

NAME
profile - Profile CPU usage by sampling stack traces. Uses Linux
eBPF/bcc.

SYNOPSIS
profile [-adfhI] [-p PID | -L TID] [-U | -K] [-F FREQUENCY | -c COUNT]
        [--stack-storage-size COUNT] [-C CPU] [--cgroupmap MAPPATH]
        [--mntnsmap MAPPATH] [duration]

DESCRIPTION
This is a CPU profiler. It works by sampling stack traces at timed
intervals, which helps you understand and quantify CPU usage: which
code is executing, and by how much, including both user-level and
kernel code.

By default this samples at 49 Hertz (samples per second), across all
CPUs. The frequency can be tuned using the -F option. The reason for
49, and not 50, is to avoid lock-step sampling: sampling in step with
periodic activity that runs at a round-number rate, which would bias
the results.
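
The effect of lock-step sampling can be illustrated with a small
simulation. This is only a sketch under stated assumptions: the workload
(it wakes every 100 ms and runs for 10 ms) and the function name are
hypothetical, and nothing here is taken from the tool itself.

```python
from fractions import Fraction


def measured_busy_fraction(sample_hz, seconds, period, busy):
    """Fraction of timer samples that land while the workload is on-CPU.

    The hypothetical workload wakes every `period` seconds and runs for
    the first `busy` seconds of each period. Exact rationals are used so
    that the result does not depend on floating-point rounding.
    """
    total = sample_hz * seconds
    hits = sum(
        1
        for k in range(total)
        if Fraction(k, sample_hz) % period < busy
    )
    return Fraction(hits, total)


# Workload: wakes every 100 ms and runs for 10 ms, so it is truly 10% busy.
PERIOD = Fraction(1, 10)
BUSY = Fraction(1, 100)

at_50_hz = measured_busy_fraction(50, 10, PERIOD, BUSY)
at_49_hz = measured_busy_fraction(49, 10, PERIOD, BUSY)
print("50 Hz estimate:", float(at_50_hz))
print("49 Hz estimate:", float(at_49_hz))
```

In this model the 50 Hertz sampler always lands on the same five phases
of the workload and reports 20% busy, while the 49 Hertz sampler drifts
through every phase and reports about 10%.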

This is also an efficient profiler: stack traces are frequency counted
in kernel context, rather than each stack being passed to user space to
be counted there. Only the unique stacks and their counts are passed to
user space at the end of the profile, greatly reducing the
kernel-to-user transfer.
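
The saving from frequency counting can be sketched in a few lines. The
stacks below are hypothetical, and a Python Counter merely stands in for
the in-kernel count; the real tool does this counting in a BPF program,
not in Python.

```python
from collections import Counter

# Hypothetical sampled stacks (root frame first). Real samples come from
# the kernel's timer-driven BPF program, not from a Python list.
samples = [
    ("main", "parse", "read"),
    ("main", "compute"),
    ("main", "compute"),
    ("main", "parse", "read"),
    ("main", "compute"),
]

# Per-sample transfer: every sample would cross to user space.
per_sample_records = len(samples)

# Aggregated transfer: count identical stacks first, then hand over only
# the unique stacks and their counts.
stack_counts = Counter(samples)
aggregated_records = len(stack_counts)

for stack, count in stack_counts.most_common():
    print(";".join(stack), count)
print(per_sample_records, "samples reduced to", aggregated_records, "records")
```

The longer the profile runs on a steady workload, the larger the gap
between the number of samples and the number of unique stacks tends to
become.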

REQUIREMENTS
CONFIG_BPF and bcc.

This also requires Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). See
tools/old for an older version that may work on Linux 4.6 - 4.8.

OPTIONS
-h     Print usage message.

-p PID Profile only the processes with the given PIDs, supplied as a
       comma-separated list (filtered in-kernel).

-L TID Profile only the threads with the given TIDs, supplied as a
       comma-separated list (filtered in-kernel).

-F FREQUENCY
       Frequency at which to sample stacks, in Hertz (default 49).

-c COUNT
       Sample one stack every COUNT events.

-f     Print output in folded stack format.

-d     Include an output delimiter between kernel and user stacks
       (either "--", or, in folded mode, "-").

-U     Show stacks from user space only (no kernel space stacks).

-K     Show stacks from kernel space only (no user space stacks).

-I     Include CPU idle stacks (by default these are excluded).

--stack-storage-size COUNT
       The maximum number of unique stack traces that the kernel will
       count (default 16384). If the sampled count exceeds this, a
       warning will be printed.

-C CPU Collect stacks only from the specified CPU.

--cgroupmap MAPPATH
       Profile cgroups in this BPF map only (filtered in-kernel).

duration
       Duration to profile, in seconds.
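
The "|" alternatives in the SYNOPSIS mark options that cannot be combined.
As a rough illustration of that notation, the following argparse sketch
models a subset of the options. It is a hypothetical parser written for
this page, not the tool's actual source, and the destination names (pid,
folded, and so on) are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring part of the SYNOPSIS, for illustration."""
    parser = argparse.ArgumentParser(prog="profile")

    # [-p PID | -L TID]: at most one of the two may be given.
    target = parser.add_mutually_exclusive_group()
    target.add_argument("-p", dest="pid", help="comma-separated PIDs")
    target.add_argument("-L", dest="tid", help="comma-separated TIDs")

    # [-U | -K]: user-only or kernel-only stacks, not both.
    space = parser.add_mutually_exclusive_group()
    space.add_argument("-U", dest="user_only", action="store_true")
    space.add_argument("-K", dest="kernel_only", action="store_true")

    # [-F FREQUENCY | -c COUNT]: sample by frequency or by event count.
    rate = parser.add_mutually_exclusive_group()
    rate.add_argument("-F", dest="frequency", type=int)
    rate.add_argument("-c", dest="count", type=int)

    parser.add_argument("-f", dest="folded", action="store_true")
    parser.add_argument("-d", dest="delimited", action="store_true")
    parser.add_argument("duration", nargs="?", type=int)
    return parser


args = build_parser().parse_args(["-F", "99", "-df", "5"])
print(args.frequency, args.folded, args.delimited, args.duration)
```

With this sketch, passing both -U and -K (or both -F and -c) makes
argparse exit with a usage error, which matches how the bracketed
alternatives are meant to be read.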

EXAMPLES
Profile (sample) stack traces system-wide at 49 Hertz (samples per
second) until Ctrl-C:
       # profile

Profile for 5 seconds only:
       # profile 5

Profile at 99 Hertz for 5 seconds only:
       # profile -F 99 5

Profile 1 in a million events for 5 seconds only:
       # profile -c 1000000 5

Profile process with PID 181 only:
       # profile -p 181

Profile thread with TID 181 only:
       # profile -L 181

Profile for 5 seconds and output in folded stack format (suitable as
input for flame graphs), including a delimiter between kernel and user
stacks:
       # profile -df 5

Profile kernel stacks only:
       # profile -K

Profile a set of cgroups only (see special_filtering.md from bcc
sources for more details):
       # profile --cgroupmap /sys/fs/bpf/test01
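
To give a feel for what the folded output of the -df example looks like,
here is a minimal sketch of how one counted stack could be rendered as a
folded line. The function name, the leaf-first input order, and the frame
names are assumptions made for illustration; check the tool's real output
before relying on the exact layout.

```python
def fold_stack(comm, user_stack, kernel_stack, count, delimiter=False):
    """Render one aggregated stack as a folded line: frames joined by ';',
    root first, followed by a space and the sample count.

    Stacks are given leaf-first here (an assumption of this sketch), so
    they are reversed to put the root on the left.
    """
    frames = [comm]
    frames.extend(reversed(user_stack))
    if delimiter and user_stack and kernel_stack:
        frames.append("-")  # folded-mode delimiter described under -d
    frames.extend(reversed(kernel_stack))
    return "%s %d" % (";".join(frames), count)


# Hypothetical frame names, for illustration only.
line = fold_stack(
    "myapp",
    user_stack=["write", "flush_log", "main"],
    kernel_stack=["vfs_write", "ksys_write", "do_syscall_64"],
    count=7,
    delimiter=True,
)
print(line)
```

Each folded line carries one unique stack and its count, which is the
input form that flame graph generators typically expect.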

DEBUGGING
See "[unknown]" frames with bogus addresses? This can happen for
different reasons. Your best approach is to get Linux perf to work
first, and then to try this tool. For example, run "perf record -F 49
-a -g -- sleep 1; perf script" and check for unknown frames there.

The most common reason for "[unknown]" frames is that the target
software has not been compiled with frame pointers, so the simple
frame-pointer method of walking the stack cannot be used. The fix in
that case is to use software that does have frame pointers, e.g., gcc
-fno-omit-frame-pointer, or Java's -XX:+PreserveFramePointer.

Another reason for "[unknown]" frames is JIT compilers, which don't use
a traditional symbol table. The fix in that case is to populate a
/tmp/perf-PID.map file with the symbols, which this tool should read.
How you do this depends on the runtime (Java, Node.js).
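
As an illustration of what such a map file provides, the sketch below
parses map text and resolves an address to a symbol. It assumes the usual
perf map layout of one "START SIZE NAME" entry per line with START and
SIZE in hexadecimal; the map contents and function names are
hypothetical, and this is not the lookup code the tool itself uses.

```python
import bisect


def parse_perf_map(text):
    """Parse perf map text: one 'START SIZE NAME' entry per line, with
    START and SIZE in hexadecimal. Returns entries sorted by start."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size = int(parts[0], 16), int(parts[1], 16)
        entries.append((start, size, parts[2].strip()))
    entries.sort()
    return entries


def resolve(entries, addr):
    """Return the symbol covering addr, or '[unknown]' if none does."""
    starts = [entry[0] for entry in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size, name = entries[i]
        if start <= addr < start + size:
            return name
    return "[unknown]"


# Hypothetical map contents, as a JIT runtime or agent might write them.
example_map = """\
7f3a1c000000 40 Interpreter
7f3a1c000040 1a0 MyClass::compute
"""
entries = parse_perf_map(example_map)
print(resolve(entries, 0x7F3A1C000050))
print(resolve(entries, 0x7F3A1D000000))
```

An address that falls outside every listed range stays "[unknown]",
which is what a missing or stale map file looks like in the output.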

If you seem to have unrelated samples in the output, check for other
sampling or tracing tools that may be running. The current version of
this tool can include their events if profiling happened concurrently.
Those samples may be filtered in a future version.

OVERHEAD
This is an efficient profiler, as stack traces are frequency counted in
kernel context, and only the unique stacks and their counts are passed
to user space. Contrast this with the current "perf record -F 99 -a"
method of profiling, which writes each sample to user space (via a ring
buffer), and then to the file system (perf.data), which must be
post-processed.

This uses perf_event_open to set up a timer which is instrumented by
BPF, and for efficiency it does not initialize the perf ring buffer, so
the redundant perf samples are not collected.

It's expected that the overhead while sampling at 49 Hertz (the
default), across all CPUs, should be negligible. If you increase the
sample rate, the overhead might begin to be measurable.
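
How quickly the work grows with the sample rate can be estimated with
simple arithmetic: one timer sample per CPU per tick. The sketch below
computes that upper bound for a hypothetical 16-CPU host; the function
name and the host size are assumptions, and it says nothing about the
cost of each individual sample.

```python
def timer_samples(frequency_hz, cpus, seconds):
    """Upper bound on timer samples taken: one per CPU per tick."""
    return frequency_hz * cpus * seconds


# Hypothetical 16-CPU host profiled for 5 seconds.
for hz in (49, 99, 999):
    print(hz, "Hz:", timer_samples(hz, cpus=16, seconds=5), "samples")
```

The count scales linearly with the frequency, so a twentyfold increase
in the rate means roughly twenty times as many stack walks in the same
interval.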

SOURCE
This is from bcc.

https://github.com/iovisor/bcc

Also look in the bcc distribution for a companion _examples.txt file
containing example usage, output, and commentary for this tool.

OS
Linux

STABILITY
Unstable - in development.

AUTHOR
Brendan Gregg

SEE ALSO
offcputime(8)



USER COMMANDS                    2020-03-18                    profile(8)