profile(8)                  System Manager's Manual                 profile(8)

NAME

       profile - Profile CPU usage by sampling stack traces. Uses Linux
       eBPF/bcc.

SYNOPSIS

       profile [-adfhI] [-p PID | -L TID] [-U | -K] [-F FREQUENCY | -c COUNT]
       [-C CPU] [--stack-storage-size COUNT] [duration]

DESCRIPTION

       This is a CPU profiler. It works by sampling stack traces at timed
       intervals. It helps you understand and quantify CPU usage: which code
       is executing, and by how much, including both user-level and kernel
       code.

       By default this samples at 49 Hertz (samples per second), across all
       CPUs. This frequency can be tuned using the -F option. The reason for
       49, and not 50, is to avoid lock-step sampling, where samples line up
       with activity that repeats at the same rate and misrepresent it.

       This is also an efficient profiler, as stack traces are frequency
       counted in kernel context, rather than passing each stack to user space
       for frequency counting there. Only the unique stacks and counts are
       passed to user space at the end of the profile, greatly reducing the
       kernel<->user transfer.
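
       The lock-step effect described above can be illustrated with a small
       simulation. This is a minimal sketch and not part of the tool: the
       periodic task (50 wakeups per second, on-CPU for 2 ms each time) and
       the busy_fraction() helper are hypothetical, chosen only to show why
       an off-by-one sampling rate helps.

```python
from fractions import Fraction

# Hypothetical workload (an assumption for illustration only):
# a task that wakes 50 times per second and runs for 2 ms each time,
# so it really uses 10% of one CPU.
TASK_HZ = 50
BUSY = Fraction(1, 500)  # 2 ms, expressed in seconds


def busy_fraction(sample_hz, seconds=1):
    """Return the fraction of timer samples that land while the task runs."""
    period = Fraction(1, TASK_HZ)
    total = sample_hz * seconds
    hits = 0
    for k in range(total):
        t = Fraction(k, sample_hz)   # exact time of the k-th sample
        if t % period < BUSY:        # position inside the task's period
            hits += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    for hz in (50, 49):
        print(hz, "Hz ->", float(busy_fraction(hz)))
```

       Under these assumptions a 50 Hertz sampler finds the task on-CPU in
       every sample, while a 49 Hertz sampler reports roughly 10%, which is
       close to the task's real share.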

REQUIREMENTS

       CONFIG_BPF and bcc.

       This also requires Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). See
       tools/old for an older version that may work on Linux 4.6 - 4.8.
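
       The kernel version requirement above can be checked before running
       the tool. This is a minimal sketch: parse_kernel_release() and
       kernel_is_supported() are hypothetical helpers, not part of bcc, and
       the check does not verify CONFIG_BPF or that bcc is installed.

```python
import platform
import re

MIN_KERNEL = (4, 9)  # minimum version stated in REQUIREMENTS


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '5.15.0-91'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(m.group(1)), int(m.group(2))


def kernel_is_supported(release=None):
    """Return True if the release is at least MIN_KERNEL."""
    if release is None:
        release = platform.release()  # release of the running kernel
    return parse_kernel_release(release) >= MIN_KERNEL


if __name__ == "__main__":
    print("kernel new enough for profile(8):", kernel_is_supported())
```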

OPTIONS

       -h     Print usage message.

       -a     Add _[k] annotations to kernel frames.

       -p PID Trace this process ID only (filtered in-kernel).

       -L TID Trace this thread ID only (filtered in-kernel).

       -F frequency
              Frequency at which to sample stacks, in Hertz.

       -c count
              Sample stacks once every this many events.

       -f     Print output in folded stack format.

       -d     Include an output delimiter between kernel and user stacks
              (either "--", or, in folded mode, "-").

       -U     Show stacks from user space only (no kernel space stacks).

       -K     Show stacks from kernel space only (no user space stacks).

       -I     Include CPU idle stacks (by default these are excluded).

       --stack-storage-size COUNT
              The maximum number of unique stack traces that the kernel will
              count (default 16384). If the number of unique stacks sampled
              exceeds this, a warning is printed.

       -C cpu Collect stacks only from the specified CPU.

       duration
              Duration to trace, in seconds.

EXAMPLES

       Profile (sample) stack traces system-wide at 49 Hertz (samples per
       second) until Ctrl-C:
              # profile

       Profile for 5 seconds only:
              # profile 5

       Profile at 99 Hertz for 5 seconds only:
              # profile -F 99 5

       Profile 1 in a million events for 5 seconds only:
              # profile -c 1000000 5

       Profile process with PID 181 only:
              # profile -p 181

       Profile thread with TID 181 only:
              # profile -L 181

       Profile for 5 seconds and output in folded stack format (suitable as
       input for flame graphs), including a delimiter between kernel and user
       stacks:
              # profile -df 5

       Profile kernel stacks only:
              # profile -K
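
       Folded output from the -f and -d options can be post-processed with a
       few lines of code. This is a minimal sketch, assuming each folded line
       is a semicolon-separated list of frames followed by a space and a
       sample count; parse_folded() and leaf_totals() are hypothetical
       helpers, and the frame names in the usage section are made up.

```python
from collections import Counter


def parse_folded(lines):
    """Map each stack (a tuple of frames) to its total sample count."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field on the line.
        stack, _, count = line.rpartition(" ")
        counts[tuple(stack.split(";"))] += int(count)
    return counts


def leaf_totals(counts):
    """Sum samples by the last frame of each stack (the code on-CPU)."""
    totals = Counter()
    for frames, n in counts.items():
        totals[frames[-1]] += n
    return totals


if __name__ == "__main__":
    # Illustrative data only; "-" is the delimiter added by -d.
    sample = [
        "java;main;run;-;do_syscall_64 3",
        "java;main;run;compute 7",
    ]
    for frame, n in leaf_totals(parse_folded(sample)).most_common():
        print(n, frame)
```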

DEBUGGING

       Seeing "[unknown]" frames with bogus addresses? This can happen for
       different reasons. Your best approach is to get Linux perf to work
       first, and then to try this tool. E.g., run "perf record -F 49 -a -g
       -- sleep 1; perf script", and check for unknown frames there.

       The most common reason for "[unknown]" frames is that the target
       software has not been compiled with frame pointers, and so we can't use
       that simple method for walking the stack. The fix in that case is to
       use software that does have frame pointers, e.g., built with gcc
       -fno-omit-frame-pointer, or run with Java's -XX:+PreserveFramePointer.

       Another reason for "[unknown]" frames is JIT compilers, which don't use
       a traditional symbol table. The fix in that case is to populate a
       /tmp/perf-PID.map file with the symbols, which this tool should read.
       How you do this depends on the runtime (Java, Node.js).

       If you seem to have unrelated samples in the output, check for other
       sampling or tracing tools that may be running. The current version of
       this tool can include their events if profiling happened concurrently.
       Those samples may be filtered in a future version.
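
       The /tmp/perf-PID.map file mentioned above is a plain text file. This
       is a minimal sketch of writing one, assuming the perf map convention
       of one symbol per line as "START SIZE name", with START and SIZE in
       hexadecimal without a 0x prefix. The helpers and the symbol below are
       hypothetical; real runtimes usually provide an agent or a flag that
       produces this file for you.

```python
import os


def perf_map_lines(symbols):
    """Format (start_address, size, name) tuples as perf map lines."""
    return ["%x %x %s" % (start, size, name) for start, size, name in symbols]


def write_perf_map(symbols, pid=None, directory="/tmp"):
    """Write perf-PID.map into the given directory and return its path."""
    pid = os.getpid() if pid is None else pid
    path = os.path.join(directory, "perf-%d.map" % pid)
    with open(path, "w") as f:
        f.write("\n".join(perf_map_lines(symbols)) + "\n")
    return path


if __name__ == "__main__":
    # Made-up address, size, and name for one JIT-compiled function.
    print(perf_map_lines([(0x7F0000001000, 0x40, "jit_compiled_add")])[0])
```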

OVERHEAD

       This is an efficient profiler, as stack traces are frequency counted in
       kernel context, and only the unique stacks and their counts are passed
       to user space. Contrast this with the current "perf record -F 99 -a"
       method of profiling, which writes each sample to user space (via a ring
       buffer), and then to the file system (perf.data), which must be
       post-processed.

       This uses perf_event_open to set up a timer which is instrumented by
       BPF, and for efficiency it does not initialize the perf ring buffer, so
       the redundant perf samples are not collected.

       It's expected that the overhead while sampling at 49 Hertz (the
       default), across all CPUs, should be negligible. If you increase the
       sample rate, the overhead might begin to be measurable.
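
       The saving from frequency counting can be sketched in user-space code.
       This is an illustration only, not the tool's BPF implementation:
       aggregate() is a hypothetical helper, the sample data is made up, and
       dropping stacks once max_unique is reached is a simplifying assumption
       standing in for the --stack-storage-size limit.

```python
from collections import Counter


def aggregate(samples, max_unique=16384):
    """Count identical stacks, keeping at most max_unique distinct stacks.

    Returns (counts, dropped), where dropped is the number of samples
    whose stack could not be stored because the table was full.
    """
    counts = Counter()
    dropped = 0
    for stack in samples:
        if stack in counts or len(counts) < max_unique:
            counts[stack] += 1
        else:
            dropped += 1
    return counts, dropped


if __name__ == "__main__":
    # Made-up workload: 500 samples that fall into two distinct stacks.
    samples = [("main", "work")] * 490 + [("main", "idle")] * 10
    counts, dropped = aggregate(samples)
    print("records without counting:", len(samples))
    print("records with counting:   ", len(counts))
```

       In this example 500 samples collapse into 2 records, which is the
       kind of reduction in kernel<->user transfer described above.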

SOURCE

       This is from bcc.

              https://github.com/iovisor/bcc

       Also look in the bcc distribution for a companion _examples.txt file
       containing example usage, output, and commentary for this tool.

OS

       Linux

STABILITY

       Unstable - in development.

AUTHOR

       Brendan Gregg

SEE ALSO

       offcputime(8)

USER COMMANDS                     2016-07-17                        profile(8)