stapprobes(5)

STAPPROBES(5)                 File Formats Manual                STAPPROBES(5)

NAME

       stapprobes - systemtap probe points

DESCRIPTION

       The following sections enumerate the variety of probe points supported
       by the systemtap translator, and additional aliases defined by standard
       tapset scripts.

       The general probe point syntax is a dotted-symbol sequence.  This
       allows a breakdown of the event namespace into parts, somewhat like the
       Domain Name System does on the Internet.  Each component identifier may
       be parametrized by a string or number literal, with a syntax like a
       function call.  A component may include a "*" character, to expand to a
       set of matching probe points.  Probe aliases likewise expand to other
       probe points.  Each and every resulting probe point is normally
       resolved to some low-level system instrumentation facility (e.g., a
       kprobe address, marker, or a timer configuration), otherwise the
       elaboration phase will fail.

       However, a probe point may be followed by a "?" character, to indicate
       that it is optional, and that no error should result if it fails to
       resolve.  Optionality passes down through all levels of alias/wildcard
       expansion.  Alternatively, a probe point may be followed by a "!"
       character, to indicate that it is both optional and sufficient.  (Think
       vaguely of the Prolog cut operator.)  If it does resolve, then no
       further probe points in the same comma-separated list will be resolved.
       Therefore, the "!" sufficiency mark only makes sense in a list of
       probe point alternatives.

       Additionally, a probe point may be followed by an "if (expr)"
       statement, in order to enable/disable the probe point on-the-fly.  With
       the "if" statement, if the "expr" is false when the probe point is hit,
       the whole probe body, including any alias bodies, is skipped.  The
       conditions stack up through all levels of alias/wildcard expansion, so
       the final condition is the logical AND of the conditions of all
       expanded aliases/wildcards.

       These are all syntactically valid probe points:

              kernel.function("foo").return
              syscall(22)
              user.inode("/bin/vi").statement(0x2222)
              end
              syscall.*
              kernel.function("no_such_function") ?
              module("awol").function("no_such_function") !
              signal.*? if (switch)
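
       A minimal sketch of how the "?", "!" and "if" suffixes might be
       combined in one script.  The function names "first_choice" and
       "second_choice" and the global variable "enabled" are hypothetical
       placeholders, not names taken from any particular kernel.

```systemtap
global enabled = 1    # hypothetical switch tested by the "if" condition

# Optional: the script still compiles if this function does not exist.
probe kernel.function("no_such_function") ?
{
    printf("unexpectedly resolved: %s\n", pp())
}

# Alternatives: if "first_choice" resolves it is sufficient, and
# "second_choice" is then not resolved at all.
probe kernel.function("first_choice") !,
      kernel.function("second_choice") ?
{
    printf("hit %s\n", pp())
}

# Conditional: the body is skipped whenever "enabled" is zero.
probe timer.s(1) if (enabled)
{
    printf("tick\n")
}
```

       The timer probe is included so that at least one probe point always
       resolves; the other two declarations may legitimately match nothing.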

       Probes may be broadly classified into "synchronous" and "asynchronous".
       A "synchronous" event is deemed to occur when any processor executes an
       instruction matched by the specification.  This gives these probes a
       reference point (instruction address) from which more contextual data
       may be available.  Other families of probe points refer to
       "asynchronous" events such as timers/counters rolling over, where there
       is no fixed reference point that is related.  Each probe point
       specification may match multiple locations (for example, using
       wildcards or aliases), and all of them are then probed.  A probe
       declaration may also contain several comma-separated specifications,
       all of which are probed.


   BEGIN/END/ERROR
       The probe points begin and end are defined by the translator to refer
       to the time of session startup and shutdown.  All "begin" probe
       handlers are run, in some sequence, during the startup of the session.
       All global variables will have been initialized prior to this point.
       All "end" probes are run, in some sequence, during the normal shutdown
       of a session, such as in the aftermath of an exit() function call, or
       an interruption from the user.  In the case of an error-triggered
       shutdown, "end" probes are not run.  There are no target variables
       available in either context.

       If the order of execution among "begin" or "end" probes is significant,
       then an optional sequence number may be provided:

              begin(N)
              end(N)

       The number N may be positive or negative.  The probe handlers are run
       in increasing order, and the order between handlers with the same
       sequence number is unspecified.  When "begin" or "end" are given
       without a sequence number, they are effectively sequence zero.
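
       The ordering rules can be illustrated with a short sketch.  The
       sequence numbers and messages are arbitrary choices made for this
       example.

```systemtap
# begin handlers run in increasing sequence order: -1, then 0, then 1.
probe begin(-1) { printf("begin(-1): runs first\n") }
probe begin     { printf("begin: sequence zero\n") }
probe begin(1)  { printf("begin(1): runs after the unnumbered probe\n") }

# end handlers follow the same rule: 0 before 10.
probe end(10)   { printf("end(10): runs last\n") }
probe end       { printf("end: sequence zero\n") }

# Trigger a normal shutdown after one second so the end probes run.
probe timer.s(1) { exit() }
```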

       The error probe point is similar to the end probe, except that each
       such probe handler is run when the session ends after errors have
       occurred.  In such cases, "end" probes are skipped, but each "error"
       probe is still attempted.  This kind of probe can be used to clean up
       or emit a "final gasp".  It may also be numerically parametrized to set
       a sequence.
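
       A sketch of the difference between "end" and "error" probes, assuming
       the tapset function error() is used to put the session into an error
       state.  The messages are illustrative only.

```systemtap
probe begin
{
    # Simulate a failure; this should cause an error-triggered shutdown.
    error("simulated failure")
}

probe end
{
    # Per the description above, skipped after an error.
    printf("end: normal shutdown\n")
}

probe error
{
    # Attempted instead of the end probes when errors have occurred.
    printf("error: final gasp, cleaning up\n")
}
```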


   NEVER
       The probe point never is specially defined by the translator to mean
       "never".  Its probe handler is never run, though its statements are
       analyzed for symbol / type correctness as usual.  This probe point may
       be useful in conjunction with optional probes.
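
       One plausible use, sketched under the assumption that pairing "never"
       with an optional probe point keeps the handler subject to symbol and
       type checks even when the optional point does not resolve.  The
       function name is a placeholder.

```systemtap
# If the optional probe point fails to resolve, "never" remains, so
# the handler body is still analyzed but is never run.
probe kernel.function("no_such_function") ?, never
{
    printf("%s\n", pp())
}

# A probe that does run, so the script has something to do.
probe begin
{
    printf("script loaded\n")
    exit()
}
```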


   TIMERS
       Intervals defined by the standard kernel "jiffies" timer may be used to
       trigger probe handlers asynchronously.  Two probe point variants are
       supported by the translator:

              timer.jiffies(N)
              timer.jiffies(N).randomize(M)

       The probe handler is run every N jiffies (a kernel-defined unit of
       time, typically between 1 and 60 ms).  If the "randomize" component is
       given, a linearly distributed random value in the range [-M..+M] is
       added to N every time the handler is run.  N is restricted to a
       reasonable range (1 to around a million), and M is restricted to be
       smaller than N.  There are no target variables provided in either
       context.  It is possible for such probes to be run concurrently on a
       multi-processor computer.

       Alternatively, intervals may be specified in units of time.  There are
       two probe point variants similar to the jiffies timer:

              timer.ms(N)
              timer.ms(N).randomize(M)

       Here, N and M are specified in milliseconds, but the full options for
       units are seconds (s/sec), milliseconds (ms/msec), microseconds
       (us/usec), nanoseconds (ns/nsec), and hertz (hz).  Randomization is not
       supported for hertz timers.
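
       The timer variants might be combined as in the following sketch.  The
       intervals and the "ticks" array are arbitrary choices for illustration.

```systemtap
global ticks

# Every 1000 jiffies, plus a random offset in [-200..+200].
probe timer.jiffies(1000).randomize(200) { ticks["jiffies"]++ }

# Every 250 milliseconds.
probe timer.ms(250) { ticks["ms"]++ }

# Ten times per second; hertz timers take no "randomize" component.
probe timer.hz(10) { ticks["hz"]++ }

# After thirty seconds, report how often each handler ran, then stop.
probe timer.s(30)
{
    foreach (name in ticks)
        printf("%s: %d\n", name, ticks[name])
    exit()
}
```

       How many times the jiffies probe fires in thirty seconds depends on
       the length of a jiffy on the target kernel.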

       The actual resolution of the timers depends on the target kernel.  For
       kernels prior to 2.6.17, timers are limited to jiffies resolution, so
       intervals are rounded up to the nearest jiffies interval.  After
       2.6.17, the implementation uses hrtimers for tighter precision, though
       the actual resolution will be arch-dependent.  In either case, if the
       "randomize" component is given, then the random value will be added to
       the interval before any rounding occurs.

       Profiling timers are also available to provide probes that execute on
       all CPUs at the rate of the system tick.  This probe takes no
       parameters.

              timer.profile

       Full context information of the interrupted process is available,
       making this probe suitable for a time-based sampling profiler.
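
       A minimal sketch of such a sampling profiler, counting samples per
       process name.  The ten-second run time and the "samples" array are
       assumptions made for this example.

```systemtap
global samples

# Runs on every CPU at each system tick; execname() names the
# process that was interrupted.
probe timer.profile
{
    samples[execname()]++
}

# Stop sampling after ten seconds.
probe timer.s(10)
{
    exit()
}

# Print the process names, sorted by descending sample count.
probe end
{
    foreach (name in samples-)
        printf("%s %d\n", name, samples[name])
}
```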


   DWARF
       This family of probe points uses symbolic debugging information for the
       target kernel/module/program, as may be found in unstripped
       executables, or the separate debuginfo packages.  They allow placement
       of probes logically into the execution path of the target program, by
       specifying a set of points in the source or object code.  When a
       matching statement executes on any processor, the probe handler is run
       in that context.

       Points in a kernel are identified by module, source file, line number,
       function name, or some combination of these.

       Here is a list of probe point families currently supported.  The
       .function variant places a probe near the beginning of the named
       function, so that parameters are available as context variables.  The
       .return variant places a probe at the moment of return from the named
       function, so the return value is available as the "$return" context
       variable.  The .inline modifier for .function filters the results to
       include only instances of inlined functions.  The .call modifier
       selects the opposite subset.  Inline functions do not have an
       identifiable return point, so .return is not supported on .inline
       probes.  The .statement variant places a probe at the exact spot,
       exposing those local variables that are visible there.

              kernel.function(PATTERN)
              kernel.function(PATTERN).call
              kernel.function(PATTERN).return
              kernel.function(PATTERN).inline
              module(MPATTERN).function(PATTERN)
              module(MPATTERN).function(PATTERN).call
              module(MPATTERN).function(PATTERN).return
              module(MPATTERN).function(PATTERN).inline
              kernel.statement(PATTERN)
              kernel.statement(ADDRESS).absolute
              module(MPATTERN).statement(PATTERN)

       In the above list, MPATTERN stands for a string literal that aims to
       identify the loaded kernel module of interest.  It may include "*",
       "[]", and "?" wildcards.  PATTERN stands for a string literal that aims
       to identify a point in the program.  It is made up of three parts:

       ·   The first part is the name of a function, as would appear in the nm
           program's output.  This part may use the "*" and "?" wildcarding
           operators to match multiple names.

       ·   The second part is optional and begins with the "@" character.  It
           is followed by the path to the source file containing the function,
           which may include a wildcard pattern, such as mm/slab*.  In most
           cases, the path should be relative to the top of the Linux source
           directory, although an absolute path may be necessary for some
           kernels.  If a relative pathname doesn't work, try absolute.

       ·   The third part is also optional, and may be given only if the file
           name part was given.  It identifies the line number in the source
           file, preceded by a ":".

       As an alternative, PATTERN may be a numeric constant, indicating an
       (module-relative or kernel-_stext-relative) address.  In guru mode
       only, absolute kernel addresses may be specified with the ".absolute"
       suffix.
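
       The PATTERN forms can be sketched as follows.  The function name, file
       names and line numbers are illustrative and may not match the source
       of the kernel being probed, so the file- and module-specific points are
       marked optional with "?".

```systemtap
probe
    # Function name only.
    kernel.function("schedule"),
    # Name and source file: every function in files matching mm/slab*.
    kernel.function("*@mm/slab*").call ?,
    # Name, file and line: any function in the file that spans line 240.
    kernel.function("*@kernel/sched.c:240") ?,
    # The statement at a given line of a given file.
    kernel.statement("*@kernel/sched.c:2917") ?,
    # MPATTERN with a wildcard, probing at function return.
    module("usb*").function("*sync*").return ?
{
    printf("%s\n", pp())
}

# Stop after five seconds.
probe timer.s(5) { exit() }
```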

       Some of the source-level variables, such as function parameters,
       locals, globals visible in the compilation unit, may be visible to
       probe handlers.  They may refer to these variables by prefixing their
       name with "$" within the scripts.  In addition, a special syntax allows
       limited traversal of structures, pointers, and arrays.

       $var   refers to an in-scope variable "var".  If it's an integer-like
              type, it will be cast to a 64-bit int for systemtap script use.
              String-like pointers (char *) may be copied to systemtap string
              values using the kernel_string or user_string functions.

       $var->field
              traversal to a structure's field.  The indirection operator may
              be repeated to follow more levels of pointers.

       $var[N]
              indexes into an array.  The index is given with a literal
              number.
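
       A sketch of these context variables in use.  It assumes a kernel in
       which vfs_read has parameters named "file" and "count" and sys_open has
       a parameter named "filename"; those names vary between kernel versions
       and are not guaranteed.

```systemtap
probe kernel.function("vfs_read")
{
    # $count is integer-like; $file->f_flags follows a pointer to a
    # structure field.
    printf("%s reads %d bytes, f_flags=0x%x\n",
           execname(), $count, $file->f_flags)
}

probe kernel.function("sys_open")
{
    # $filename points into user space, so copy it with user_string().
    printf("%s opens %s\n", execname(), user_string($filename))
}

probe kernel.function("sys_open").return
{
    # $return holds the value being returned by the function.
    printf("sys_open returned %d\n", $return)
}

# Stop after five seconds.
probe timer.s(5) { exit() }
```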


   USER-SPACE
       Early prototype support for user-space probing is available in the form
       of a non-symbolic probe point:

              process(PID).statement(ADDRESS).absolute

       This is analogous to kernel.statement(ADDRESS).absolute in that both
       use raw (unverified) virtual addresses and provide no $variables.  The
       target PID parameter must identify a running process, and ADDRESS
       should identify a valid instruction address.  All threads of that
       process will be probed.
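
       A sketch of such a probe.  The PID 1234 and the address 0x08048400 are
       placeholders that must be replaced with a running process and a valid
       instruction address in it.

```systemtap
# Fires whenever any thread of process 1234 executes the instruction
# at the given raw virtual address (both values are placeholders).
probe process(1234).statement(0x08048400).absolute
{
    printf("pid %d tid %d reached the probed address\n", pid(), tid())
}
```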


   PROCFS
       These probe points allow procfs "files" in /proc/systemtap/MODNAME to
       be created, read and written (MODNAME is the name of the systemtap
       module).  The proc filesystem is a pseudo-filesystem which is used as
       an interface to kernel data structures.  There are four probe point
       variants supported by the translator:

              procfs("PATH").read
              procfs("PATH").write
              procfs.read
              procfs.write

       PATH is the file name (relative to /proc/systemtap/MODNAME) to be
       created.  If no PATH is specified (as in the last two variants above),
       PATH defaults to "command".

       When a user reads /proc/systemtap/MODNAME/PATH, the corresponding
       procfs read probe is triggered.  The string data to be read should be
       assigned to a variable named $value, like this:

              procfs("PATH").read { $value = "100\n" }

       When a user writes into /proc/systemtap/MODNAME/PATH, the corresponding
       procfs write probe is triggered.  The data the user wrote is available
       in the string variable named $value, like this:

              procfs("PATH").write { printf("user wrote: %s", $value) }
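
       The read and write probes can be paired to expose a script variable as
       a procfs file.  In this sketch the file name "setting" and the global
       variable of the same name are hypothetical.

```systemtap
global setting = "100\n"

# Reading /proc/systemtap/MODNAME/setting returns the current value.
probe procfs("setting").read
{
    $value = setting
}

# Writing to the same file replaces the stored value.
probe procfs("setting").write
{
    setting = $value
    printf("setting changed to %s", setting)
}
```

       While the script runs, the file can be read with cat and written with
       a shell redirection into /proc/systemtap/MODNAME/setting.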


   MARKERS
       This family of probe points hooks up to static probing markers inserted
       into the kernel or modules.  These markers are special macro calls
       inserted by kernel developers to make probing faster and more reliable
       than with DWARF-based probes.  Further, DWARF debugging information is
       not required to probe markers.

       Marker probe points begin with kernel.  The next part names the marker
       itself: mark("name").  The marker name string, which may contain the
       usual wildcard characters, is matched against the names given to the
       marker macros when the kernel and/or module was compiled.  Optionally,
       you can specify format("format").  Specifying the marker format string
       allows differentiation between two markers with the same name but
       different marker format strings.

       The handler associated with a marker-based probe may read the optional
       parameters specified at the macro call site.  These are named $arg1
       through $argNN, where NN is the number of parameters supplied by the
       macro.  Number and string parameters are passed in a type-safe manner.

       The marker format string associated with a marker is available in
       $format.
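
       A sketch that lists every marker as it fires, together with its format
       string.  It assumes the target kernel was built with markers; the
       wildcard is used because no particular marker name can be assumed to
       exist.

```systemtap
# Matches all kernel markers.  $format is the format string given at
# the macro call site; $arg1 and later are omitted here because their
# number and types differ from marker to marker.
probe kernel.mark("*")
{
    printf("%s format=%s\n", pp(), $format)
}

# Stop after five seconds.
probe timer.s(5) { exit() }
```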


   PERFORMANCE MONITORING HARDWARE
       The perfmon family of probe points is used to access the performance
       monitoring hardware available in modern processors.  This family of
       probe points needs the perfmon2 support in the kernel to access the
       performance monitoring hardware.

       Performance monitor hardware points begin with perfmon.  The next part
       names the event being counted: counter("event").  The event names are
       processor implementation specific, with the exception of the generic
       cycles and instructions events, which are available on all processors.
       This sets up a counter on the processor to count the number of events
       occurring on the processor.  For more details on the performance
       monitoring events available on a specific processor, use the perfmon2
       command:

              pfmon -l

       $counter
              is a handle used in the body of the probe for operations
              involving the counter associated with the probe.

       read_counter
              is a function that is passed the handle for the perfmon probe
              and returns the current count for the event.

EXAMPLES

       Here are some example probe points, defining the associated events.

       begin, end, end
              refers to the startup and normal shutdown of the session.  In
              this case, the handler would run once during startup and twice
              during shutdown.

       timer.jiffies(1000).randomize(200)
              refers to a periodic interrupt, every 1000 +/- 200 jiffies.

       kernel.function("*init*"), kernel.function("*exit*")
              refers to all kernel functions with "init" or "exit" in the
              name.

       kernel.function("*@kernel/sched.c:240")
              refers to any functions within the "kernel/sched.c" file that
              span line 240.

       kernel.mark("getuid")
              refers to an STAP_MARK(getuid, ...) macro call in the kernel.

       module("usb*").function("*sync*").return
              refers to the moment of return from all functions with "sync" in
              the name in any of the USB drivers.

       kernel.statement(0xc0044852)
              refers to the first byte of the statement whose compiled
              instructions include the given address in the kernel.

       kernel.statement("*@kernel/sched.c:2917")
              refers to the statement at line 2917 within the "kernel/sched.c"
              file.

       syscall.*.return
              refers to the group of probe aliases with any name in the second
              position.
