profile(8) System Manager's Manual profile(8)

NAME
profile - Profile CPU usage by sampling stack traces. Uses Linux
eBPF/bcc.

SYNOPSIS
profile [-adfh] [-p PID] [-U | -K] [-F FREQUENCY | -c COUNT] [-C CPU]
[--stack-storage-size COUNT] [duration]

DESCRIPTION
This is a CPU profiler. It works by taking samples of stack traces at
timed intervals. It will help you understand and quantify CPU usage:
which code is executing, and by how much, including both user-level and
kernel code.

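By default this samples at 49 Hertz (samples per second), across all
CPUs. This frequency can be tuned using the -F option. The reason for
49, and not 50, is to avoid lock-step sampling: sampling at the same
rate as periodic activity (such as timers firing at 50 or 100 Hertz)
would keep catching the same point in that activity and skew the
profile.
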
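The effect of lock-step sampling can be shown with a toy model. It assumes a hypothetical periodic task that repeats every 20 ms (50 Hertz), and counts how many distinct points in that cycle one second of samples lands on. This is an illustration in Python, not code from profile or bcc.

```python
from fractions import Fraction

def sample_phases(sample_hz, period_ms, samples):
    """Return the distinct phases (ms into the task's cycle) hit by samples."""
    interval_ms = Fraction(1000, sample_hz)  # exact time between samples
    return {(i * interval_ms) % period_ms for i in range(samples)}

# Hypothetical periodic task: one cycle every 20 ms (50 Hertz).
TASK_PERIOD_MS = 20

# One second's worth of samples (49) at each profiler rate.
lockstep = sample_phases(50, TASK_PERIOD_MS, 49)  # profiler at 50 Hertz
drifting = sample_phases(49, TASK_PERIOD_MS, 49)  # profiler at 49 Hertz

print(len(lockstep))  # 1: every sample lands at the same point in the cycle
print(len(drifting))  # 49: samples spread across the cycle
```

At 50 Hertz, whatever the task is doing at that one instant dominates the profile. At 49 Hertz, the samples walk through the whole cycle.
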
This is also an efficient profiler, as stack traces are frequency
counted in kernel context, rather than passing each stack to user space
for frequency counting there. Only the unique stacks and counts are
passed to user space at the end of the profile, greatly reducing the
kernel<->user transfer.

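Frequency counting before transfer can be modeled in user space as follows. This is a simplified Python sketch, not the BPF program that profile runs. A dictionary stands in for the kernel-side map, and the sample stacks are hypothetical.

```python
from collections import Counter

def count_stacks(samples):
    """Frequency-count stack traces, keyed by the whole stack."""
    counts = Counter()
    for stack in samples:
        counts[tuple(stack)] += 1  # only the count grows, not the record list
    return counts

# Hypothetical samples: 1,000 samples but only 3 distinct stacks.
samples = (
    [["main", "parse", "read"]] * 600
    + [["main", "compute"]] * 350
    + [["main", "write"]] * 50
)

counts = count_stacks(samples)
print(len(samples), "samples ->", len(counts), "records to transfer")
# 1000 samples -> 3 records to transfer
```

Only the (stack, count) pairs need to cross to user space. This is why the transfer cost tracks the number of unique stacks rather than the number of samples.
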
REQUIREMENTS
CONFIG_BPF and bcc.

This also requires Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). See
tools/old for an older version that may work on Linux 4.6 - 4.8.

OPTIONS
-h Print usage message.

-p PID Trace this process ID only (filtered in-kernel). Without this,
all processes on all CPUs are profiled.

-F frequency
Frequency at which to sample stacks, in Hertz (default 49).

-c count
Sample stacks once every count events, instead of at a fixed
frequency.

-f Print output in folded stack format.

-d Include an output delimiter between kernel and user stacks ("--"
in the default output, or "-" in folded mode).

-U Show stacks from user space only (no kernel space stacks).

-K Show stacks from kernel space only (no user space stacks).

--stack-storage-size COUNT
The maximum number of unique stack traces that the kernel will count
(default 16384). If more unique stacks are sampled than this, a warning
will be printed.

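The following is a simplified model of what happens when stack storage fills up. A table with a fixed capacity accepts new unique stacks until it is full. After that, samples whose stacks can no longer be stored are counted as missing. This Python sketch assumes a simple first-come table. It is not the kernel's stack map implementation, and the warning text is made up for illustration.

```python
def store_stacks(samples, capacity):
    """Count stacks in a table that holds at most `capacity` unique stacks."""
    table = {}   # unique stack -> sample count
    missing = 0  # samples whose new stack did not fit
    for stack in samples:
        key = tuple(stack)
        if key in table:
            table[key] += 1          # already stored: just count it
        elif len(table) < capacity:
            table[key] = 1           # room left: store the new stack
        else:
            missing += 1             # table full: this stack is lost
    if missing:
        # Illustrative message only, not the tool's exact wording.
        print(f"warning: {missing} samples lost, consider a larger "
              "--stack-storage-size")
    return table, missing

# Hypothetical samples, with room for only 2 unique stacks.
samples = [["main", "a"], ["main", "b"], ["main", "a"],
           ["main", "c"], ["main", "d"], ["main", "c"]]
table, missing = store_stacks(samples, capacity=2)
```

Samples of stacks that were stored before the table filled keep being counted. Only stacks first seen after that point are lost.
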
-C cpu Collect stacks only from the specified CPU.

duration
Duration to trace, in seconds.

EXAMPLES
Profile (sample) stack traces system-wide at 49 Hertz (samples per
second) until Ctrl-C:
# profile

Profile for 5 seconds only:
# profile 5

Profile at 99 Hertz for 5 seconds only:
# profile -F 99 5

Profile 1 in a million events for 5 seconds only:
# profile -c 1000000 5

Profile PID 181 only:
# profile -p 181

Profile for 5 seconds and output in folded stack format (suitable as
input for flame graphs), including a delimiter between kernel and user
stacks:
# profile -df 5

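Folded output can be post-processed with a few lines of code. This minimal Python sketch assumes the following about each line:

- It consists of semicolon-separated frames, then a space, then a sample count.
- A frame named "-" appears only as the -d delimiter.

The input lines are hypothetical, not captured profile output. The sketch sums samples by leaf frame and splits a stack into the frames before and after the delimiter.

```python
from collections import Counter

def parse_folded(line):
    """Split 'f1;f2;...;fN count' into (frames, count)."""
    stack, _, count = line.rpartition(" ")  # last space: frames may hold spaces
    return stack.split(";"), int(count)

def leaf_totals(lines):
    """Sum samples by the last (leaf) frame of each stack."""
    totals = Counter()
    for line in lines:
        frames, count = parse_folded(line)
        totals[frames[-1]] += count
    return totals

def split_at_delimiter(frames, delim="-"):
    """Return the frames before and after the -d delimiter, if present."""
    if delim in frames:
        i = frames.index(delim)
        return frames[:i], frames[i + 1:]
    return frames, []

# Hypothetical folded lines.
lines = [
    "myapp;main;do_work;-;entry_SYSCALL_64;sys_read 12",
    "myapp;main;do_work;compute 30",
    "myapp;main;idle 5",
]
print(leaf_totals(lines).most_common(1))  # [('compute', 30)]
```

Splitting on the last space keeps frame names that contain spaces intact. This is a simple choice, assuming counts never contain spaces.
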
Profile kernel stacks only:
# profile -K

DEBUGGING
See "[unknown]" frames with bogus addresses? This can happen for
different reasons. Your best approach is to get Linux perf to work
first, and then to try this tool. For example, run
"perf record -F 49 -a -g -- sleep 1; perf script" and check for
unknown frames there.

The most common reason for "[unknown]" frames is that the target
software has not been compiled with frame pointers, so this tool can't
use that simple method for walking the stack. The fix in that case is
to use software that does have frame pointers, e.g., built with
gcc -fno-omit-frame-pointer, or Java's -XX:+PreserveFramePointer.

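A simplified model shows why missing frame pointers break stack walking. It assumes x86-64 conventions: each frame stores the caller's saved frame pointer at [fp] and the return address at [fp + 8], so the frames form a linked list. The Python sketch below simulates memory as a dictionary of hypothetical addresses. It is not how the kernel's unwinder is written.

```python
def walk_frame_pointers(memory, fp, max_depth=127):
    """Collect return addresses by following saved frame pointers."""
    addrs = []
    while fp and len(addrs) < max_depth:  # max_depth: arbitrary safety cap
        ret = memory.get(fp + 8)
        if ret is None:          # unreadable address: stop the walk
            break
        addrs.append(ret)
        fp = memory.get(fp, 0)   # follow the saved frame pointer
    return addrs

# Hypothetical stack of three frames; the outermost saved fp is 0.
memory = {
    0x7000: 0x7040, 0x7008: 0x401a10,  # innermost frame
    0x7040: 0x7080, 0x7048: 0x401b20,
    0x7080: 0x0,    0x7088: 0x401c30,  # outermost frame
}
print([hex(a) for a in walk_frame_pointers(memory, 0x7000)])
# ['0x401a10', '0x401b20', '0x401c30']

# Same stack, but the saved frame pointer slot holds an unrelated value,
# as can happen when code built without frame pointers uses that
# register for other data. The walk ends early.
broken = dict(memory)
broken[0x7000] = 0x1234
print([hex(a) for a in walk_frame_pointers(broken, 0x7000)])
# ['0x401a10']
```
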
Another reason for "[unknown]" frames is JIT compilers, which don't use
a traditional symbol table. The fix in that case is to populate a
/tmp/perf-PID.map file with the symbols, which this tool should read.
How you do this depends on the runtime (Java, Node.js).

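This minimal Python sketch shows how a perf map resolves JIT addresses. It assumes the line format documented for Linux perf: "START SIZE symbolname". START and SIZE are hex without a 0x prefix, and the symbol name is the rest of the line. The map contents and addresses are hypothetical.

```python
def parse_perf_map(text):
    """Parse perf map text into (start, end, name) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split(" ", 2)  # the name may itself contain spaces
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        try:
            start, size = int(parts[0], 16), int(parts[1], 16)
        except ValueError:
            continue  # skip lines whose fields are not hex
        entries.append((start, start + size, parts[2]))
    return entries

def resolve(entries, addr):
    """Return the name whose [start, end) range holds addr."""
    for start, end, name in entries:
        if start <= addr < end:
            return name
    return "[unknown]"

perf_map = """\
7f00a0001000 80 Interpreter
7f00a0001080 1c0 LazyCompile:*handleRequest server.js:42
"""
entries = parse_perf_map(perf_map)
print(resolve(entries, 0x7f00a0001100))  # LazyCompile:*handleRequest server.js:42
print(resolve(entries, 0x7f00a0002000))  # [unknown]
```

A real run would read /tmp/perf-PID.map for the target process. Writing that file is up to the runtime.
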
If you seem to have unrelated samples in the output, check for other
sampling or tracing tools that may be running. The current version of
this tool can include their events if profiling happened concurrently.
Those samples may be filtered in a future version.

OVERHEAD
This is an efficient profiler, as stack traces are frequency counted in
kernel context, and only the unique stacks and their counts are passed
to user space. Contrast this with the current "perf record -F 99 -a"
method of profiling, which writes each sample to user space (via a ring
buffer), and then to the file system (perf.data), which must then be
post-processed.

This uses perf_event_open to set up a timer which is instrumented by
BPF, and for efficiency it does not initialize the perf ring buffer, so
the redundant perf samples are not collected.

The overhead while sampling at 49 Hertz (the default) across all CPUs
is expected to be negligible. If you increase the sample rate, the
overhead might begin to be measurable.

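The sample rate scales the work linearly. As rough arithmetic, assume each CPU takes at most one sample per period. The number of samples profile must handle is then at most frequency times CPUs times seconds. The Python sketch below uses hypothetical CPU counts and does not model the cost of each sample.

```python
def max_samples(frequency_hz, cpus, seconds):
    """Upper bound on samples taken: one per CPU per sampling period."""
    return frequency_hz * cpus * seconds

print(max_samples(49, 8, 5))  # 1960 (default rate, 8 CPUs, 5 seconds)
print(max_samples(99, 8, 5))  # 3960 (roughly double the work)
```
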
SOURCE
This is from bcc.

https://github.com/iovisor/bcc

Also look in the bcc distribution for a companion _examples.txt file
containing example usage, output, and commentary for this tool.

OS
Linux

STABILITY
Unstable - in development.

AUTHOR
Brendan Gregg

SEE ALSO
offcputime(8)

USER COMMANDS 2016-07-17 profile(8)