bcc-profile(8)

profile(8)                  System Manager's Manual                 profile(8)

NAME

       profile - Profile CPU usage by sampling stack traces. Uses Linux
       eBPF/bcc.

SYNOPSIS

       profile [-adfh] [-p PID] [-U | -K] [-F FREQUENCY | -c COUNT]
       [--stack-storage-size COUNT] [-C CPU] [duration]

DESCRIPTION

       This is a CPU profiler. It works by taking samples of stack traces at
       timed intervals. It will help you understand and quantify CPU usage:
       which code is executing, and by how much, including both user-level and
       kernel code.

       By default this samples at 49 Hertz (samples per second), across all
       CPUs. This frequency can be tuned using a command line option. The
       reason for 49, and not 50, is to avoid sampling in lock-step with
       other periodic activity (such as timers running at round frequencies),
       which would skew the results.

       This is also an efficient profiler, as stack traces are frequency
       counted in kernel context, rather than passing each stack to user space
       for frequency counting there. Only the unique stacks and counts are
       passed to user space at the end of the profile, greatly reducing the
       kernel<->user transfer.
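
       The lock-step concern can be illustrated with a small simulation. This
       is only a sketch under stated assumptions: the workload is hypothetical
       (a task that wakes every 20 ms and stays on-CPU for 2 ms, so its true
       utilization is 10%), and the function name and parameters are invented
       for illustration. It is not part of this tool.

```python
from fractions import Fraction


def busy_fraction_seen(sample_hz, seconds=10, period_ms=20, busy_ms=2,
                       offset_ms=5):
    """Fraction of timer samples that land while a hypothetical periodic
    task is on-CPU. Exact rational arithmetic avoids float drift."""
    period = Fraction(period_ms, 1000)    # task wakes once per period
    busy = Fraction(busy_ms, 1000)        # on-CPU time at the start of each period
    interval = Fraction(1, sample_hz)     # time between samples
    start = Fraction(offset_ms, 1000)     # phase of the first sample
    total = sample_hz * seconds
    hits = 0
    for i in range(total):
        t = start + i * interval
        if t % period < busy:             # sample falls inside the busy window
            hits += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    # Sampling at the task's own frequency always hits the same phase,
    # so the task is either never seen or always seen.
    print(busy_fraction_seen(50, offset_ms=5))   # 0
    print(busy_fraction_seen(50, offset_ms=1))   # 1
    # At 49 Hz the sample phase drifts across the period.
    print(busy_fraction_seen(49))                # 5/49
```

       With the assumed workload, 50 Hertz sampling reports either 0% or 100%
       depending on the starting phase, while 49 Hertz reports 5/49 (about
       10.2%), close to the true 10%.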

REQUIREMENTS

       CONFIG_BPF and bcc.

       This also requires Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). See
       tools/old for an older version that may work on Linux 4.6 - 4.8.

OPTIONS

       -h     Print usage message.

       -p PID Profile this process ID only (filtered in-kernel). Without this,
              all processes on all CPUs are profiled.

       -F frequency
              Frequency to sample stacks, in Hertz.

       -c count
              Sample one in this many events.

       -f     Print output in folded stack format.

       -d     Include an output delimiter between kernel and user stacks
              (either "--", or, in folded mode, "-").

       -U     Show stacks from user space only (no kernel space stacks).

       -K     Show stacks from kernel space only (no user space stacks).

       --stack-storage-size COUNT
              The maximum number of unique stack traces that the kernel will
              count (default 16384). If the sampled count exceeds this, a
              warning will be printed.

       -C cpu Collect stacks only from the specified CPU.

       duration
              Duration to trace, in seconds.
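
       How the options above combine can be sketched with a small argument
       parser. This is a hypothetical reconstruction from the SYNOPSIS, not
       the tool's actual source: the dest names, the build_parser() and
       parse() helpers, and the way the 49 Hertz default is applied are all
       assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(
        prog="profile",
        description="Profile CPU usage by sampling stack traces")
    parser.add_argument("-p", dest="pid", type=int,
                        help="profile this process ID only")
    # -U and -K are alternatives in the synopsis: [-U | -K]
    stack = parser.add_mutually_exclusive_group()
    stack.add_argument("-U", dest="user_only", action="store_true",
                       help="user space stacks only")
    stack.add_argument("-K", dest="kernel_only", action="store_true",
                       help="kernel space stacks only")
    # -F and -c are alternatives in the synopsis: [-F FREQUENCY | -c COUNT]
    rate = parser.add_mutually_exclusive_group()
    rate.add_argument("-F", dest="frequency", type=int,
                      help="sample frequency, in Hertz")
    rate.add_argument("-c", dest="count", type=int,
                      help="sample one in this many events")
    parser.add_argument("-d", dest="delimited", action="store_true",
                        help="delimiter between kernel and user stacks")
    parser.add_argument("-f", dest="folded", action="store_true",
                        help="folded stack output")
    parser.add_argument("--stack-storage-size", type=int, default=16384,
                        help="maximum number of unique stack traces")
    parser.add_argument("-C", dest="cpu", type=int,
                        help="collect stacks only from this CPU")
    parser.add_argument("duration", nargs="?", type=int,
                        help="duration to trace, in seconds")
    return parser


def parse(argv):
    """Parse argv and apply the documented 49 Hertz default when neither
    -F nor -c was given."""
    args = build_parser().parse_args(argv)
    if args.frequency is None and args.count is None:
        args.frequency = 49
    return args
```

       For instance, parse(["-df", "5"]) yields folded, delimited output for
       5 seconds at the default rate, while parse(["-U", "-K"]) is rejected
       because the two flags are mutually exclusive.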

EXAMPLES

       Profile (sample) stack traces system-wide at 49 Hertz (samples per
       second) until Ctrl-C:
              # profile

       Profile for 5 seconds only:
              # profile 5

       Profile at 99 Hertz for 5 seconds only:
              # profile -F 99 5

       Profile 1 in a million events for 5 seconds only:
              # profile -c 1000000 5

       Profile PID 181 only:
              # profile -p 181

       Profile for 5 seconds and output in folded stack format (suitable as
       input for flame graphs), including a delimiter between kernel and user
       stacks:
              # profile -df 5

       Profile kernel stacks only:
              # profile -K
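
       The folded stack format mentioned above puts one unique stack on each
       line: frames separated by semicolons, root first, followed by a space
       and the sample count. The following sketch shows one way to
       post-process such output. The sample lines and function names are
       invented for illustration and are not real output from this tool; the
       "-" frame is the delimiter that -d inserts in folded mode.

```python
from collections import Counter

# Invented sample data in folded format (not real tool output).
SAMPLE_FOLDED = """\
app;main;handle_request;-;sys_read;vfs_read 12
app;main;handle_request;parse_json 30
app;main;idle_wait;-;sys_poll 7
"""


def parse_folded(text):
    """Return a list of (frames, count) pairs from folded stack text."""
    stacks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field on the line.
        frames, _, count = line.rpartition(" ")
        stacks.append((frames.split(";"), int(count)))
    return stacks


def leaf_totals(stacks, delimiter="-"):
    """Sum sample counts by the innermost (on-CPU) frame, skipping the
    kernel/user delimiter."""
    totals = Counter()
    for frames, count in stacks:
        real = [f for f in frames if f != delimiter]
        totals[real[-1]] += count
    return totals


if __name__ == "__main__":
    for name, count in leaf_totals(parse_folded(SAMPLE_FOLDED)).most_common():
        print(count, name)
```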

DEBUGGING

       See "[unknown]" frames with bogus addresses? This can happen for
       different reasons. Your best approach is to get Linux perf to work
       first, and then to try this tool. For example, run
       "perf record -F 49 -a -g -- sleep 1; perf script" and check for
       unknown frames there.

       The most common reason for "[unknown]" frames is that the target
       software has not been compiled with frame pointers, and so we can't
       use that simple method for walking the stack. The fix in that case is
       to use software that does have frame pointers, e.g., code built with
       gcc -fno-omit-frame-pointer, or Java run with
       -XX:+PreserveFramePointer.

       Another reason for "[unknown]" frames is JIT compilers, which don't use
       a traditional symbol table. The fix in that case is to populate a
       /tmp/perf-PID.map file with the symbols, which this tool should read.
       How you do this depends on the runtime (Java, Node.js).

       If you seem to have unrelated samples in the output, check for other
       sampling or tracing tools that may be running. The current version of
       this tool can include their events if profiling happened concurrently.
       Those samples may be filtered in a future version.
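
       To make the /tmp/perf-PID.map fix more concrete: each line of such a
       file holds a start address and a size (both hexadecimal, without a 0x
       prefix) followed by the symbol name. The sketch below parses that
       layout and resolves an address to a symbol. The sample entries and the
       function names are invented for illustration, and this is not how the
       tool itself performs the lookup.

```python
import bisect

# Invented sample entries: START SIZE symbolname (hex, no 0x prefix).
SAMPLE_MAP = """\
7f3a10000000 40 Lcom/example/Main;::run
7f3a10000040 1c0 Lcom/example/Parser;::parse
7f3a10000400 80 LazyCompile:*handler /srv/app.js:10
"""


def parse_perf_map(text):
    """Return a sorted list of (start, size, name) tuples."""
    entries = []
    for line in text.splitlines():
        # Split only twice: symbol names may themselves contain spaces.
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue
        entries.append((int(parts[0], 16), int(parts[1], 16), parts[2]))
    entries.sort()
    return entries


def resolve(entries, addr):
    """Map an instruction address to a symbol name, or "[unknown]"."""
    starts = [start for start, _, _ in entries]
    i = bisect.bisect_right(starts, addr) - 1   # last entry starting <= addr
    if i >= 0:
        start, size, name = entries[i]
        if addr < start + size:
            return name
    return "[unknown]"


if __name__ == "__main__":
    table = parse_perf_map(SAMPLE_MAP)
    print(resolve(table, 0x7f3a10000050))   # inside the Parser::parse range
    print(resolve(table, 0x7f3a10000300))   # in a gap between symbols
```

       An address that falls in a gap between mapped ranges resolves to
       "[unknown]", which is what a missing or incomplete map file looks like
       in the profile output.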

OVERHEAD

       This is an efficient profiler, as stack traces are frequency counted in
       kernel context, and only the unique stacks and their counts are passed
       to user space. Contrast this with the current "perf record -F 99 -a"
       method of profiling, which writes each sample to user space (via a ring
       buffer), and then to the file system (perf.data), which must be
       post-processed.

       This uses perf_event_open to set up a timer which is instrumented by
       BPF, and for efficiency it does not initialize the perf ring buffer, so
       the redundant perf samples are not collected.

       It's expected that the overhead while sampling at 49 Hertz (the
       default), across all CPUs, should be negligible. If you increase the
       sample rate, the overhead might begin to be measurable.
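
       The saving from frequency counting can be sketched in user-space
       Python. This is only an analogy: a Counter stands in for the in-kernel
       map, and the CPU count, stack names, and simulate() helper are
       assumptions for illustration, not measurements of this tool.

```python
import random
from collections import Counter


def simulate(num_samples, stacks, seed=0):
    """Compare per-sample transfer with frequency counting.

    Returns (records sent if every sample is passed along,
             Counter of unique stacks -> sample count)."""
    rng = random.Random(seed)
    per_sample_records = 0
    counts = Counter()
    for _ in range(num_samples):
        stack = rng.choice(stacks)
        per_sample_records += 1      # one record per sample, perf-record style
        counts[stack] += 1           # one map entry per unique stack
    return per_sample_records, counts


if __name__ == "__main__":
    # Hypothetical run: 49 Hertz on 8 CPUs for 10 seconds.
    samples = 49 * 8 * 10
    hot_stacks = [
        ("main", "handle_request", "parse_json"),
        ("main", "handle_request", "sys_read"),
        ("main", "idle_wait", "sys_poll"),
    ]
    records, counts = simulate(samples, hot_stacks)
    print("records if every sample is transferred:", records)
    print("entries if only unique stacks are transferred:", len(counts))
```

       The number of transferred entries is bounded by the number of distinct
       stacks rather than by the number of samples, which is why the
       --stack-storage-size limit, not the sample rate, caps the size of the
       final transfer.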

SOURCE

       This is from bcc.

              https://github.com/iovisor/bcc

       Also look in the bcc distribution for a companion _examples.txt file
       containing example usage, output, and commentary for this tool.

OS

       Linux

STABILITY

       Unstable - in development.

AUTHOR

       Brendan Gregg