stapprobes(3stap)

STAPPROBES(3stap)                                            STAPPROBES(3stap)

NAME

       stapprobes - systemtap probe points

DESCRIPTION

       The following sections enumerate the variety of probe points supported
       by the systemtap translator, and some of the additional aliases defined
       by standard tapset scripts.  Many are individually documented in the
       3stap manual section, with the probe:: prefix.

SYNTAX

              probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }

       A probe declaration may list multiple comma-separated probe points in
       order to attach a handler to all of the named events.  Normally, the
       handler statements are run whenever any of the events occurs.
       Depending on the type of probe point, the handler statements may refer
       to context variables (denoted with a dollar-sign prefix like $foo) to
       read or write state.  This may include function parameters for
       function probes, or local variables for statement probes.

       The syntax of a single probe point is a general dotted-symbol sequence.
       This allows a breakdown of the event namespace into parts, somewhat
       like the Domain Name System does on the Internet.  Each component
       identifier may be parametrized by a string or number literal, with a
       syntax like a function call.  A component may include a "*" character,
       to expand to a set of matching probe points.  It may also include "**"
       to match multiple sequential components at once.  Probe aliases
       likewise expand to other probe points.

       Probe aliases can be given on their own, or with a suffix.  The suffix
       attaches to the underlying probe point that the alias is expanded to.
       For example,

              syscall.read.return.maxactive(10)

       expands to

              kernel.function("sys_read").return.maxactive(10)

       with the component maxactive(10) being recognized as a suffix.

       Normally, each and every probe point resulting from wildcard- and
       alias-expansion must be resolved to some low-level system
       instrumentation facility (e.g., a kprobe address, marker, or a timer
       configuration), otherwise the elaboration phase will fail.

       However, a probe point may be followed by a "?" character, to indicate
       that it is optional, and that no error should result if it fails to
       resolve.  Optionalness passes down through all levels of
       alias/wildcard expansion.  Alternatively, a probe point may be followed
       by a "!" character, to indicate that it is both optional and
       sufficient.  (Think vaguely of the Prolog cut operator.)  If it does
       resolve, then no further probe points in the same comma-separated list
       will be resolved.  Therefore, the "!" sufficiency mark only makes sense
       in a list of probe point alternatives.

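       The following sketch shows both markers in a complete script.  It
       assumes a Linux kernel in which vfs_read exists and can be resolved.
       The name no_such_function is hypothetical and deliberately missing.
       Treat the script as an illustration of the "?" and "!" markers rather
       than a tested recipe.

```stap
              # "?" marks this probe point optional: elaboration should still
              # succeed although no_such_function (hypothetical) is missing.
              probe kernel.function("no_such_function") ? {
                println("not expected to resolve")
              }

              # "!" is optional and sufficient: if vfs_read resolves, the
              # syscall.read alternative is not resolved at all.
              probe kernel.function("vfs_read") !, syscall.read {
                printf("%s via %s\n", execname(), pp())
              }

              probe timer.s(5) { exit() }
```

       The pp() output shows which of the two alternatives was actually used.
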
       Additionally, a probe point may be followed by an "if (expr)"
       statement, in order to enable/disable the probe point on-the-fly.  With
       the "if" statement, if "expr" is false when the probe point is hit,
       the whole probe body, including the alias's body, is skipped.  The
       condition is stacked up through all levels of alias/wildcard
       expansion, so the final condition becomes the logical AND of the
       conditions of all expanded aliases/wildcards.  The expressions are
       necessarily restricted to global variables.

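       A minimal sketch of such a condition, assuming the standard
       syscall.write alias is available.  The global flag enabled and the
       array writes are illustrative names only.  A timer toggles the flag,
       and the syscall.write handler is skipped whenever the flag is zero.

```stap
              global enabled = 0
              global writes

              # Flip the flag once per second.
              probe timer.s(1) { enabled = !enabled }

              # Skipped entirely while enabled is 0; the condition may only
              # refer to global variables.
              probe syscall.write if (enabled) {
                writes[execname()]++
              }

              probe timer.s(10) { exit() }

              probe end {
                foreach (exe in writes- limit 5)
                  printf("%-16s %d\n", exe, writes[exe])
              }
```
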
       These are all syntactically valid probe points.  (They are generally
       semantically invalid, depending on the contents of the tapsets, and the
       versions of kernel/user software installed.)

              kernel.function("foo").return
              process("/bin/vi").statement(0x2222)
              end
              syscall.*
              syscall.*.return.maxactive(10)
              syscall.{open,close}
              sys**open
              kernel.function("no_such_function") ?
              module("awol").function("no_such_function") !
              signal.*? if (switch)
              kprobe.function("foo")

       Probes may be broadly classified into "synchronous" and "asynchronous".
       A "synchronous" event is deemed to occur when any processor executes an
       instruction matched by the specification.  This gives these probes a
       reference point (instruction address) from which more contextual data
       may be available.  Other families of probe points refer to
       "asynchronous" events such as timers/counters rolling over, where there
       is no fixed reference point that is related.  Each probe point
       specification may match multiple locations (for example, using
       wildcards or aliases), and all of them are then probed.  A probe
       declaration may also contain several comma-separated specifications,
       all of which are probed.

       Brace expansion is a mechanism which allows a list of probe points to
       be generated.  It is very similar to shell brace expansion.  A
       component may be surrounded by a pair of curly braces to indicate that
       each subcomponent in the comma-separated sequence of one or more
       subcomponents will constitute a new probe point.  The braces may be
       arbitrarily nested.  The ordering of expanded results is based on
       product order.

       The question mark (?) and exclamation mark (!) indicators and probe
       point conditions may not be placed in any expansions that are before
       the last component.

       The following is an example of brace expansion.

              syscall.{write,read}
              # Expands to
              syscall.write, syscall.read

              {kernel,module("nfs")}.function("nfs*")!
              # Expands to
              kernel.function("nfs*")!, module("nfs").function("nfs*")!

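       In a complete script, a brace group combines naturally with a trailing
       "?", which is placed after the last component.  This sketch assumes the
       standard syscall tapset.  Some architectures (aarch64, for example)
       have no open system call, so both expanded probe points are marked
       optional.

```stap
              # Expands to: syscall.open ?, syscall.openat ?
              probe syscall.{open,openat} ? {
                printf("%s(%d) %s(%s)\n", execname(), pid(), name, argstr)
              }

              probe timer.s(5) { exit() }
```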

DWARF DEBUGINFO

       Resolving some probe points requires DWARF debuginfo or "debug symbols"
       for the specific program being instrumented.  For some others, DWARF is
       automatically synthesized on the fly from source code header files.
       For others, it is not needed at all.  Since a systemtap script may use
       any mixture of probe points together, the union of their DWARF
       requirements has to be met on the computer where script compilation
       occurs.  (See the --use-server option and the stap-server(8) man page
       for information about the remote compilation facility, which allows
       these requirements to be met on a different machine.)

       The following table lists many of the available probe point families,
       classifying them by their need for DWARF debuginfo for the specific
       program being probed.

       DWARF                          NON-DWARF                    SYMBOL-TABLE

       kernel.function, .statement    kernel.mark                  kernel.function*
       module.function, .statement    process.mark, process.plt    module.function*
       process.function, .statement   begin, end, error, never     process.function*
       process.mark*                  timer
       .function.callee               perf
       python2, python3               procfs
       debuginfod                     kernel.statement.absolute
                                      kernel.data
       AUTO-GENERATED-DWARF           kprobe.function
       kernel.trace                   process.statement.absolute
                                      process.begin, .end
                                      netfilter
                                      java

       The probe types marked with asterisks (*) are fallbacks, where
       systemtap can sometimes infer subset or substitute information.  In
       general, the more symbolic / debugging information is available, the
       higher the quality of the probing.

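       As a rough illustration, the following script uses only probe points
       from the NON-DWARF column (kprobe.function and timer), so it should not
       need kernel debuginfo.  The kernel function name vfs_read is an
       assumption about the target kernel.  The handler does not use any
       $context variables, so it only counts hits.

```stap
              global calls

              # kprobe.function locates the function through the kernel
              # symbol table rather than DWARF debuginfo.
              probe kprobe.function("vfs_read") { calls++ }

              probe timer.s(5) {
                printf("vfs_read was called %d times\n", calls)
                exit()
              }
```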

ON-THE-FLY ARMING

       The following types of probe points may be armed/disarmed on-the-fly to
       save overheads during uninteresting times.  Arming conditions may also
       be added to other types of probes, but will be treated as a wrapping
       conditional and won't benefit from overhead savings.

       DISARMABLE                                exceptions
       kernel.function, kernel.statement
       module.function, module.statement
       process.*.function, process.*.statement
       process.*.plt, process.*.mark
       timer.*                                   timer.profile
       java

PROBE POINT FAMILIES

   BEGIN/END/ERROR
       The probe points begin and end are defined by the translator to refer
       to the time of session startup and shutdown.  All "begin" probe
       handlers are run, in some sequence, during the startup of the session.
       All global variables will have been initialized prior to this point.
       All "end" probes are run, in some sequence, during the normal shutdown
       of a session, such as in the aftermath of an exit() function call, or
       an interruption from the user.  In the case of an error-triggered
       shutdown, "end" probes are not run.  There are no target variables
       available in either context.

       If the order of execution among "begin" or "end" probes is significant,
       then an optional sequence number may be provided:

              begin(N)
              end(N)

       The number N may be positive or negative.  The probe handlers are run
       in increasing order, and the order between handlers with the same
       sequence number is unspecified.  When "begin" or "end" are given
       without a sequence, they are effectively sequence zero.

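       A small sketch of sequence numbers follows.  The begin handlers are
       deliberately declared out of order.  Following the rule above, they
       should still run in increasing sequence order, printing first,
       second and third.  The exit() call in the last begin handler starts
       a normal shutdown, so the end probe should then run as well.

```stap
              probe begin(1)  { println("third: sequence 1"); exit() }
              probe begin     { println("second: sequence 0 (default)") }
              probe begin(-1) { println("first: sequence -1") }
              probe end       { println("end: normal shutdown") }
```
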
       The error probe point is similar to the end probe, except that each
       such probe handler is run when the session ends after errors have
       occurred.  In such cases, "end" probes are skipped, but each "error"
       probe is still attempted.  This kind of probe can be used to clean up
       or emit a "final gasp".  It may also be numerically parametrized to set
       a sequence.

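       The following sketch contrasts end and error probes.  It assumes the
       default settings, under which the first error raised by the error()
       function terminates the session (see MAXERRORS in stap(1)).

```stap
              # error() aborts this handler and, with default settings,
              # ends the session because of an error.
              probe begin { error("simulated failure") }

              # Skipped: the session ended after an error.
              probe end   { println("end: not expected to run") }

              # Still attempted: a "final gasp".
              probe error { println("error: final gasp") }
```
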
   NEVER
       The probe point never is specially defined by the translator to mean
       "never".  Its probe handler is never run, though its statements are
       analyzed for symbol / type correctness as usual.  This probe point may
       be useful in conjunction with optional probes.

   SYSCALL and ND_SYSCALL
       The syscall.* and nd_syscall.* aliases define several hundred probes,
       too many to detail here.  They are of the general form:

              syscall.NAME
              nd_syscall.NAME
              syscall.NAME.return
              nd_syscall.NAME.return

       Generally, a pair of probes is defined for each normal system call as
       listed in the syscalls(2) manual page, one for entry and one for
       return.  Those system calls that never return do not have a
       corresponding .return probe.  The nd_* family of probes is about the
       same, except that it uses non-DWARF based searching mechanisms, which
       may result in a lower quality of symbolic context data (parameters),
       and may miss some system calls.  You may want to try them first, in
       case kernel debugging information is not immediately available.

       Each probe alias provides a variety of variables.  Looking at the
       tapset source code is the most reliable way to find out which ones.
       Generally, each variable listed in the standard manual page is made
       available as a script-level variable, so syscall.open exposes
       filename, flags, and mode.  In addition, a standard suite of variables
       is available at most aliases:

       argstr A pretty-printed form of the entire argument list, without
              parentheses.

       name   The name of the system call.

       retval For return probes, the raw numeric system-call result.

       retstr For return probes, a pretty-printed string form of the
              system-call result.

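       A minimal sketch using only the standard suite of variables.  It
       assumes the installed tapsets define syscall.openat and its .return
       probe.  Current tapsets do, but check them against your installation.

```stap
              # Entry: name and argstr are provided by the alias.
              probe syscall.openat {
                printf("%s(%d) %s(%s)\n", execname(), pid(), name, argstr)
              }

              # Return: retstr is the pretty-printed result.
              probe syscall.openat.return {
                printf("%s(%d) %s = %s\n", execname(), pid(), name, retstr)
              }

              probe timer.s(5) { exit() }
```

       Replacing syscall. with nd_syscall. in both probe points should work
       the same way wherever the non-DWARF aliases are available.  The
       context data may then be of lower quality.
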
       As usual for probe aliases, these variables are all initialized once
       from the underlying $context variables, so that later changes to
       $context variables are not automatically reflected.  Not all probe
       aliases obey all of these general guidelines.  Please report any
       bothersome ones you encounter as a bug.  Note that on some
       kernel/userspace architecture combinations (e.g., 32-bit userspace on
       64-bit kernel), the underlying $context variables may need explicit
       sign extension / masking.  When this is an issue, consider using the
       tapset-provided variables instead of raw $context variables.

       If debuginfo availability is a problem, you may try using the
       non-DWARF syscall probe aliases instead.  Use the nd_syscall. prefix
       instead of syscall.  The same context variables are available, as far
       as possible.

       nd_syscall probes on kernels that use syscall wrappers to pass
       arguments via pt_regs (currently 4.17+ on x86_64 and 4.19+ on aarch64)
       support syscall argument writing when guru mode is enabled.  If a probe
       syscall parameter is modified in the probe body, then immediately
       before the probe exits the parameter's current value will be written
       to pt_regs.  This overwrites the previous value.  nd_syscall probes
       also include two parameters for each of the syscall's string
       parameters.  One holds a quoted version of the string passed to the
       syscall.  The other holds an unquoted version of the string intended
       to be used when modifying the parameter.  If the probe modifies the
       unquoted string variable, then as the probe is about to exit the
       contents of this variable will be written to the user space buffer
       passed to the syscall.  It is the user's responsibility to ensure that
       this buffer is large enough to hold the modified string and that it is
       located in a writable memory segment.

   TIMERS
       There are two main types of timer probes: "jiffies" timer probes and
       time interval timer probes.

       Intervals defined by the standard kernel "jiffies" timer may be used to
       trigger probe handlers asynchronously.  Two probe point variants are
       supported by the translator:

              timer.jiffies(N)
              timer.jiffies(N).randomize(M)

       The probe handler is run every N jiffies (a kernel-defined unit of
       time, typically between 1 and 60 ms).  If the "randomize" component is
       given, a linearly distributed random value in the range [-M..+M] is
       added to N every time the handler is run.  N is restricted to a
       reasonable range (1 to around a million), and M is restricted to be
       smaller than N.  There are no target variables provided in either
       context.  It is possible for such probes to be run concurrently on a
       multi-processor computer.

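       A small sketch of a randomized jiffies timer.  The wall-clock period
       depends on the kernel's jiffy length, so the interval between lines
       will vary from system to system.

```stap
              global ticks

              # Every 100 jiffies, plus or minus up to 10 jiffies.
              probe timer.jiffies(100).randomize(10) {
                ticks++
                printf("tick %d on cpu %d\n", ticks, cpu())
                if (ticks >= 5)
                  exit()
              }
```
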
       Alternatively, intervals may be specified in units of time.  There are
       two probe point variants similar to the jiffies timer:

              timer.ms(N)
              timer.ms(N).randomize(M)

       Here, N and M are specified in milliseconds, but the full options for
       units are seconds (s/sec), milliseconds (ms/msec), microseconds
       (us/usec), nanoseconds (ns/nsec), and hertz (hz).  Randomization is not
       supported for hertz timers.

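       The following sketch mixes several of these units in one script.
       Following the restriction above, only the millisecond timer could
       take a randomize(M) suffix; the hertz timer could not.

```stap
              global count

              probe timer.ms(250) { count++ }                           # every 250 ms
              probe timer.hz(2)   { printf("count = %d\n", count) }     # twice per second
              probe timer.s(3)    { exit() }                            # stop after ~3 s
```
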
       The actual resolution of the timers depends on the target kernel.  For
       kernels prior to 2.6.17, timers are limited to jiffies resolution, so
       intervals are rounded up to the nearest jiffies interval.  Starting
       with 2.6.17, the implementation uses hrtimers for tighter precision,
       though the actual resolution will be arch-dependent.  In either case,
       if the "randomize" component is given, then the random value will be
       added to the interval before any rounding occurs.

       Profiling timers are also available to provide probes that execute on
       all CPUs at the rate of the system tick (CONFIG_HZ) or at a given
       frequency (hz).  On some kernels, this is a one-concurrent-user-only or
       disabled facility, resulting in error -16 (EBUSY) during probe
       registration.

              timer.profile.tick
              timer.profile.freq.hz(N)

       Full context information of the interrupted process is available,
       making this probe suitable for a time-based sampling profiler.

       It is recommended to use the tapset probe timer.profile rather than
       timer.profile.tick.  This probe point behaves identically to
       timer.profile.tick when the underlying functionality is available, and
       falls back to using perf.sw.cpu_clock on some recent kernels which lack
       the corresponding profile timer facility.

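       A minimal sampling-profiler sketch built on timer.profile.  It counts
       samples per interrupted process name and prints the ten busiest
       processes after about five seconds.

```stap
              global samples

              # Fires on every CPU at the tick rate; execname() names the
              # process that was interrupted.
              probe timer.profile {
                samples[execname()]++
              }

              probe timer.s(5) {
                foreach (exe in samples- limit 10)
                  printf("%-16s %d\n", exe, samples[exe])
                exit()
              }
```
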
       Profiling timers with specified frequencies are only accurate up to
       around 100 hz.  You may need to provide a larger value to achieve the
       desired rate.

       Note that if a timer probe is set to fire at a very high rate and if
       the probe body is complex, succeeding timer probes can get skipped,
       since the time for them to run has already passed.  Normally systemtap
       reports missed probes, but it will not report these skipped probes.

   DWARF
       This family of probe points uses symbolic debugging information for the
       target kernel/module/program, as may be found in unstripped
       executables, or the separate debuginfo packages.  They allow placement
       of probes logically into the execution path of the target program, by
       specifying a set of points in the source or object code.  When a
       matching statement executes on any processor, the probe handler is run
       in that context.

       Probe points in the DWARF family can be identified by the target kernel
       module (or user process), source file, line number, function name, or
       some combination of these.

       Here is a list of DWARF probe points currently supported:

              kernel.function(PATTERN)
              kernel.function(PATTERN).call
              kernel.function(PATTERN).callee(PATTERN)
              kernel.function(PATTERN).callee(PATTERN).return
              kernel.function(PATTERN).callee(PATTERN).call
              kernel.function(PATTERN).callees(DEPTH)
              kernel.function(PATTERN).return
              kernel.function(PATTERN).inline
              kernel.function(PATTERN).label(LPATTERN)
              module(MPATTERN).function(PATTERN)
              module(MPATTERN).function(PATTERN).call
              module(MPATTERN).function(PATTERN).callee(PATTERN)
              module(MPATTERN).function(PATTERN).callee(PATTERN).return
              module(MPATTERN).function(PATTERN).callee(PATTERN).call
              module(MPATTERN).function(PATTERN).callees(DEPTH)
              module(MPATTERN).function(PATTERN).return
              module(MPATTERN).function(PATTERN).inline
              module(MPATTERN).function(PATTERN).label(LPATTERN)
              kernel.statement(PATTERN)
              kernel.statement(PATTERN).nearest
              kernel.statement(ADDRESS).absolute
              module(MPATTERN).statement(PATTERN)
              process("PATH").function("NAME")
              process("PATH").statement("*@FILE.c:123")
              process("PATH").library("PATH").function("NAME")
              process("PATH").library("PATH").statement("*@FILE.c:123")
              process("PATH").library("PATH").statement("*@FILE.c:123").nearest
              process("PATH").function("*").return
              process("PATH").function("myfun").label("foo")
              process("PATH").function("foo").callee("bar")
              process("PATH").function("foo").callee("bar").return
              process("PATH").function("foo").callee("bar").call
              process("PATH").function("foo").callees(DEPTH)
              process(PID).function("NAME")
              process(PID).function("myfun").label("foo")
              process(PID).plt("NAME")
              process(PID).plt("NAME").return
              process(PID).statement("*@FILE.c:123")
              process(PID).statement("*@FILE.c:123").nearest
              process(PID).statement(ADDRESS).absolute
              debuginfod.process("PATH").**

       (See the USER-SPACE section below for more information on the process
       probes.)

       The list above includes multiple variants and modifiers which provide
       additional functionality or filters.  They are:

              .function
                     Places a probe near the beginning of the named function,
                     so that parameters are available as context variables.

              .return
                     Places a probe at the moment after the return from the
                     named function, so the return value is available as the
                     "$return" context variable.

              .inline
                     Filters the results to include only instances of inlined
                     functions.  Note that inlined functions do not have an
                     identifiable return point, so .return is not supported on
                     .inline probes.

              .call  Filters the results to include only non-inlined functions
                     (the opposite set of .inline).

              .exported
                     Filters the results to include only exported functions.

              .statement
                     Places a probe at the exact spot, exposing those local
                     variables that are visible there.

              .statement.nearest
                     Places a probe at the nearest available line number for
                     each line number given in the statement.

              .callee
                     Places a probe on the callee function given in the
                     .callee modifier, where the callee must be a function
                     called by the target function given in .function.  The
                     advantage of doing this over directly probing the callee
                     function is that this probe point is run only when the
                     callee is called from the target function (add the
                     -DSTAP_CALLEE_MATCHALL directive to override this when
                     calling stap(1)).

                     Note that only callees that can be statically determined
                     are available.  For example, calls through function
                     pointers are not available.  Additionally, calls to
                     functions located in other objects (e.g. libraries) are
                     not available (instead use another probe point).  This
                     feature will only work for code compiled with GCC 4.7+.

              .callees
                     Shortcut for .callee("*"), which places a probe on all
                     callees of the function.

              .callees(DEPTH)
                     Recursively places probes on callees.  For example,
                     .callees(2) will probe both callees of the target
                     function, as well as callees of those callees, and
                     .callees(3) goes one level deeper, and so on.  A callee
                     probe at depth N is only triggered when the N callers in
                     the callstack match those that were statically determined
                     during analysis (this also may be overridden using
                     -DSTAP_CALLEE_MATCHALL).

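       The following sketch places an entry probe and a .return probe on the
       same function.  It assumes matching kernel debuginfo.  It also assumes
       that vfs_read has a parameter named count, as in mainline Linux.
       Parameter names differ between kernel versions, so list them first
       (for example with stap -L) before relying on them.

```stap
              # Entry: function parameters such as $count are context variables.
              probe kernel.function("vfs_read") {
                printf("%s: vfs_read(count=%d)\n", execname(), $count)
              }

              # Return: the function result is available as $return.
              probe kernel.function("vfs_read").return {
                printf("%s: vfs_read returned %d\n", execname(), $return)
              }

              probe timer.s(2) { exit() }
```
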
       In the above list of probe points, MPATTERN stands for a string literal
       that aims to identify the loaded kernel module of interest.  For
       in-tree kernel modules, the name suffices (e.g. "btrfs").  The name may
       also include the "*", "[]", and "?" wildcards to match multiple
       in-tree modules.  Out-of-tree modules are also supported by specifying
       the full path to the .ko file; wildcards are not supported in that
       case.  The file must follow the convention of being named
       <module_name>.ko (characters ',' and '-' are replaced by '_').

       LPATTERN stands for a source program label.  It may also contain "*",
       "[]", and "?" wildcards.  PATTERN stands for a string literal that aims
       to identify a point in the program.  It is made up of three parts:

       •   The first part is the name of a function, as would appear in the nm
           program's output.  This part may use the "*" and "?" wildcarding
           operators to match multiple names.

       •   The second part is optional and begins with the "@" character.  It
           is followed by the path to the source file containing the function,
           which may include a wildcard pattern, such as mm/slab*.  If it does
           not match as is, an implicit "*/" is optionally added before the
           pattern, so that a script need only name the last few components of
           a possibly long source directory path.

       •   Finally, the third part is optional if the file name part was
           given, and identifies the line number in the source file preceded by
           a ":" or a "+".  The line number is assumed to be an absolute line
           number if preceded by a ":", or relative to the declaration line of
           the function if preceded by a "+".  All the lines in the function
           can be matched with ":*".  A range of lines x through y can be
           matched with ":x-y".  Ranges and specific lines can be mixed using
           commas, e.g. ":x,y-z".

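       The following sketch shows each of the three PATTERN parts.  The
       kernel source path fs/open.c is an assumption about the target kernel.
       The program ./myprog and its source file myprog.c are hypothetical and
       assumed to be built with debuginfo.  Every probe point is marked
       optional, so missing targets do not abort elaboration.

```stap
              # Function wildcard restricted to one source file; "@fs/open.c"
              # may match as a path suffix via the implicit "*/".
              probe kernel.function("*open*@fs/open.c") ? {
                printf("%s\n", ppfunc())
              }

              # Hypothetical ./myprog: one line relative to main()'s
              # declaration ("+1"), and an absolute line range (":10-12").
              probe process("./myprog").statement("main@myprog.c+1") ?,
                    process("./myprog").statement("*@myprog.c:10-12") ? {
                printf("%s\n", pp())
              }

              probe timer.s(5) { exit() }
```
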
       As an alternative, PATTERN may be a numeric constant, indicating an
       address.  Such an address may be found from symbol tables of the
       appropriate kernel / module object file.  It is verified against known
       statement code boundaries, and will be relocated for use at run time.

       In guru mode only, absolute kernel-space addresses may be specified
       with the ".absolute" suffix.  Such an address is considered already
       relocated, as if it came from /proc/kallsyms, so it cannot be checked
       against statement/instruction boundaries.

   CONTEXT VARIABLES
       Many of the source-level context variables, such as function
       parameters, locals, and globals visible in the compilation unit, may be
       visible to probe handlers.  They may refer to these variables by
       prefixing their name with "$" within the scripts.  In addition, a
       special syntax allows limited traversal of structures, pointers, and
       arrays.  More syntax allows pretty-printing of individual variables or
       their groups.  See also @cast.  Note that variables may be inaccessible
       due to them being paged out, or for a few other reasons.  See also man
       error::fault(7stap).

       Functions called from DWARF class probe points and from process.mark
       probes may also refer to context variables.

       $var   refers to an in-scope variable or thread local storage variable
              "var".  If it's an integer-like type, it will be cast to a
              64-bit int for systemtap script use.  String-like pointers (char
              *) may be copied to systemtap string values using the
              kernel_string or user_string functions.

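       For example, a user-space char * parameter can be copied into a script
       string as in the following sketch.  It assumes a kernel whose
       do_sys_open() takes a __user filename and an integer flags parameter,
       as in mainline Linux.  The function may be inlined or renamed in other
       versions, so the probe point is marked optional.  Note that
       user_string() raises an error if the memory cannot be read.

```stap
              # $filename is a user-space char *, copied with user_string();
              # $flags is integer-like and used directly.
              probe kernel.function("do_sys_open") ? {
                printf("%s open(%s, 0x%x)\n", execname(),
                       user_string($filename), $flags)
              }

              probe timer.s(5) { exit() }
```
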
       @var("varname")
              an alternative syntax for $varname

       @var("varname","module")
              The global variable or global thread local storage variable in
              scope of the given module already loaded into the current probed
              process.  Useful to get an exported variable in a shared library
              loaded into the process being probed, or a global variable in a
              process while a shared library probe is being executed.  For
              user-space modules only.  For example:
              @var("_r_debug","/lib/ld-linux.so.2")

       @var("varname@src/file.c")
              refers to the global (either file local or external) variable
              varname defined when the file src/file.c was compiled.  The CU in
              which the variable is resolved is the first CU in the module of
              the probe point which matches the given file name at the end and
              has the shortest file name path (e.g. given @var("foo@bar/baz.c")
              and CUs with file name paths src/sub/module/bar/baz.c and
              src/bar/baz.c, the second CU will be chosen to resolve the
              (file) global variable foo).

       @var("varname@src/file.c","module")
              The global variable in scope of the given CU, defined in the
              given module, even if the variable is static (so the name is not
              unique without the CU name).

       $var->field traversal via a structure's or a pointer's field.  This
              generalized indirection operator may be repeated to follow more
              levels.  Note that the . operator is not used for plain
              structure members, only -> for both purposes.  (This is because
              "." is reserved for string concatenation.)  Also note that for
              direct dereferencing of a $var pointer, the
              {kernel,user}_{char,int,...}($var) functions should be used.
              (Refer to stapfuncs(5) for more details.)

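       A short sketch of chained -> traversal.  It assumes the mainline
       Linux layout, in which struct file embeds a struct path f_path and
       struct dentry embeds a struct qstr d_name.  Both embedded structs and
       pointers are followed with ->.  The final char * lives in kernel
       memory, so kernel_string() copies it.

```stap
              # vfs_read(struct file *file, ...): follow
              # file->f_path.dentry->d_name.name using "->" throughout.
              probe kernel.function("vfs_read") {
                printf("%s read %s\n", execname(),
                       kernel_string($file->f_path->dentry->d_name->name))
              }

              probe timer.s(2) { exit() }
```
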
       $return
              is available in return probes only for functions that are
              declared with a return value, which can be determined using
              @defined($return).

       $var[N]
              indexes into an array.  The index may be given as a literal
              number or even an arbitrary numeric expression.

       A number of operators exist for such basic context variable
       expressions:

       $$vars expands to a character string that is equivalent to

              sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
                      parm1, ..., parmN, var1, ..., varN)

              for each variable in scope at the probe point.  Some values may
              be printed as =? if their run-time location cannot be found.

       $$locals
              expands to a subset of $$vars for only local variables.

       $$parms
              expands to a subset of $$vars for only function parameters.

       $$return
              is available in return probes only.  It expands to a string that
              is equivalent to sprintf("return=%x", $return) if the probed
              function has a return value, or else an empty string.

620       & $EXPR
621              expands to the address of the given context variable expression,
622              if it is addressable.
623
624       @defined($EXPR)
625              expands to 1 if the given context variable expression is
626              resolvable and to 0 otherwise, for use in conditionals such as
627
628              @defined($foo->bar) ? $foo->bar : 0
629
630
631       @probewrite($VAR)
632              see the PROBES section of stap(1).
633
634       $EXPR$ expands to a string with all of $EXPR's members, equivalent to
635
636              sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}",
637                       $EXPR->a, $EXPR->b)
638
639
640       $EXPR$$
641              expands to a string with all of $EXPR's members and submembers,
642              equivalent to
643
644              sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}",
645                      $EXPR->a, $EXPR->b, $EXPR->c->x, $EXPR->c->y, $EXPR->d[0])
646
647
648       @errno expands to the last value the C library  global  variable  errno
649              was set to.
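       The following minimal sketch combines several of these operators.  It
       again assumes vfs_read() with a struct file *file parameter.  The "cat"
       filter only keeps the output readable:

              probe kernel.function("vfs_read") {
                  if (execname() == "cat") {
                      printf("%s: %s\n", ppfunc(), $$parms)
                      # @defined() guards a member that may not resolve everywhere
                      flags = @defined($file->f_flags) ? $file->f_flags : -1
                      printf("f_flags=%x path=%s\n", flags, $file->f_path$)
                  }
              }

              probe kernel.function("vfs_read").return {
                  if (execname() == "cat")
                      printf("%s: %s\n", ppfunc(), $$return)
              }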
650
651
652   MORE ON RETURN PROBES
653       For the kernel ".return" probes, only a certain fixed number of returns
654       may be outstanding.  The default is a relatively small number,  on  the
655       order  of  a  few times the number of physical CPUs.  If many different
656       threads concurrently call the same blocking function, such as  futex(2)
657       or  read(2),  this  limit  could  be exceeded, and skipped "kretprobes"
658       would be reported by "stap -t".  To work around this, specify a
659
660              probe FOO.return.maxactive(NNN)
661
662       suffix, with NNN large enough to cover all expected concurrently
663       blocked threads.  Alternatively, use the
664
665              stap -DKRETACTIVE=NNNN
666
667       stap command-line macro setting to override the default for all
668       ".return" probes.
669
670
671       For ".return" probes, context variables other than $return may be
672       accessible, as a convenience for a script programmer wishing to access
673       function parameters.  These values are snapshots taken at the time of
674       function entry.  (Local variables within the function are not generally
675       accessible, since those variables did not exist in allocated/initialized
676       form at the snapshot moment.)  These entry-snapshot variables
677       should be accessed via @entry($var).
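       The return probe below is a sketch of both mechanisms.  It raises the
       kretprobe limit for a single probe point and reads the entry-time
       value of a parameter with @entry().  The value 100 is arbitrary.  The
       vfs_read() parameter name count is an assumption about the kernel
       being probed:

              probe kernel.function("vfs_read").return.maxactive(100) {
                  # $return is the byte count or a negative errno;
                  # @entry($count) is the size requested at entry
                  if (execname() == "cat") {
                      printf("asked for %d bytes, got %d\n",
                             @entry($count), $return)
                  }
              }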
678
679       In addition, arbitrary entry-time expressions can  also  be  saved  for
680       ".return" probes using the @entry(expr) operator.  For example, one can
681       compute the elapsed time of a function:
682
683              probe kernel.function("do_filp_open").return {
684                  println(gettimeofday_us() - @entry(gettimeofday_us()))
685              }
686
687
688
689       The following table summarizes how values related to a function parame‐
690       ter context variable, a pointer named addr, may be accessed from a .re‐
691       turn probe.
692
693       at-entry value   past-exit value
694
695       $addr            not available
696       $addr->x->y      @cast(@entry($addr),"struct zz")->x->y
697       $addr[0]         {kernel,user}_{char,int,...}(& $addr[0])
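       The sketch below makes the last row concrete.  It makes two assumptions:
       the kernel is 64-bit, and its vfs_read() has a loff_t *pos parameter.
       Under those assumptions, kernel_long() reads the same eight bytes as a
       loff_t.  The sketch compares *pos as it was at entry with *pos after
       the function returned:

              probe kernel.function("vfs_read").return {
                  # pos is NULL for stream-like files, so skip those calls
                  if (execname() != "cat" || @entry($pos) == 0)
                      next
                  # at-entry value, evaluated when vfs_read() was entered
                  before = @entry($pos != 0 ? $pos[0] : -1)
                  # past-exit value: dereference the saved pointer now
                  after = kernel_long(@entry($pos))
                  printf("pos %d -> %d\n", before, after)
              }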
698
699
700
701   DWARFLESS
702       In the absence of debugging information, entry and exit points of kernel
703       and module functions can be probed using the "kprobe" family of probes.
704       However, these do not permit looking up the arguments or local variables
705       of the function.  The following constructs are supported:
706
707              kprobe.function(FUNCTION)
708              kprobe.function(FUNCTION).call
709              kprobe.function(FUNCTION).return
710              kprobe.module(NAME).function(FUNCTION)
711              kprobe.module(NAME).function(FUNCTION).call
712              kprobe.module(NAME).function(FUNCTION).return
713              kprobe.statement(ADDRESS).absolute
714
715
716       Probes of type function are recommended for kernel functions, whereas
717       probes of type module are recommended for probing functions of the
718       specified module.  If the absolute address of a kernel or module
719       function is known, statement probes can be used.
720
721       Note that FUNCTION and MODULE names must not contain wildcards, or the
722       probe will not be registered.  Also, statement probes can only be used
723       in guru mode.
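       The following is a minimal dwarfless sketch.  Because kprobe probes
       expose no $variables, it relies only on generic context functions such
       as execname().  It counts vfs_read() calls per process name for five
       seconds.  The name vfs_read is only an example function:

              global calls

              probe kprobe.function("vfs_read") {
                  calls[execname()]++
              }

              probe timer.s(5) {
                  # print the five busiest callers, then stop
                  foreach (name in calls- limit 5)
                      printf("%-16s %d\n", name, calls[name])
                  exit()
              }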
724
725
726
727   USER-SPACE
728       Support for user-space probing is available for kernels that are
729       configured with the utrace extensions, or that have the uprobes facility
730       introduced in Linux 3.5.  (Various kernel build configuration options need
731       to be enabled; systemtap will advise if these are missing.)
732
733
734       There are several forms.  First, a non-symbolic probe point:
735
736              process(PID).statement(ADDRESS).absolute
737
738       is analogous to kernel.statement(ADDRESS).absolute in that both use raw
739       (unverified) virtual addresses and provide no $variables.   The  target
740       PID parameter must identify a running process, and ADDRESS should iden‐
741       tify a valid instruction address.  All threads of that process will  be
742       probed.
743
744       Second, non-symbolic user-kernel interface events handled by utrace may
745       be probed:
746
747              process(PID).begin
748              process("FULLPATH").begin
749              process.begin
750              process(PID).thread.begin
751              process("FULLPATH").thread.begin
752              process.thread.begin
753              process(PID).end
754              process("FULLPATH").end
755              process.end
756              process(PID).thread.end
757              process("FULLPATH").thread.end
758              process.thread.end
759              process(PID).syscall
760              process("FULLPATH").syscall
761              process.syscall
762              process(PID).syscall.return
763              process("FULLPATH").syscall.return
764              process.syscall.return
765
766
767
768       A process.begin probe gets called when a new process described by PID or
769       FULLPATH gets created.  In addition, it is called once from the context
770       of each preexisting process, at systemtap script startup.  This is useful
771       to track live processes.  A process.thread.begin probe gets called
772       when a new thread described by PID or FULLPATH gets created.  A
773       process.end probe gets called when a process described by PID or FULLPATH
774       dies.  A process.thread.end probe gets called when a thread described
775       by PID or FULLPATH dies.  A process.syscall probe gets called when a
776       thread described by PID or FULLPATH makes a system call.  The system
777       call number is available in the $syscall context variable, and the
778       first 6 arguments of the system call are available in the $argN (e.g.
779       $arg1, $arg2, ...) context variables.  A process.syscall.return probe
780       gets called when a thread described by PID or FULLPATH returns from a
781       system call.  The system call number is available in the $syscall context
782       variable, and the return value of the system call is available in
783       the $return context variable.
784
785
786       If a process probe is specified without a PID or FULLPATH, all user
787       threads will be probed.  However, if systemtap was invoked with the -c
788       or -x options, then process probes are restricted to the process
789       hierarchy associated with the target process.  If a process probe is
790       specified without a PID or FULLPATH and the -c option is used, the PATH
791       of the -c command will be heuristically filled in as the process PATH.
792       In that case, only command parameters are allowed in the -c command
793       (i.e. no command substitution is allowed and none of these characters
794       may occur: '|&;<>(){}').
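       The sketch below uses unspecified process probes together with -c, so
       it follows only the command being run.  The file name procsys.stp and
       the command "ls /tmp" are placeholders:

              # Run as:  stap procsys.stp -c "ls /tmp"
              global nsys

              probe process.begin {
                  printf("begin %s[%d]\n", execname(), pid())
              }

              probe process.syscall {
                  # $syscall is the number; $arg1..$arg6 are the arguments
                  nsys[pid()]++
              }

              probe process.end {
                  printf("end %s[%d] after %d system calls\n",
                         execname(), pid(), nsys[pid()])
              }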
795
796
797       Third,  symbolic  static  instrumentation  compiled  into  programs and
798       shared libraries may be probed:
799
800              process("PATH").mark("LABEL")
801              process("PATH").provider("PROVIDER").mark("LABEL")
802              process(PID).mark("LABEL")
803              process(PID).provider("PROVIDER").mark("LABEL")
804
805
806       A .mark probe is triggered by a static probe defined in the
807       application with STAP_PROBE1(PROVIDER,LABEL,arg1), one of the macros
808       defined in sys/sdt.h.  The PROVIDER is an arbitrary application identifier,
809       LABEL is the marker site identifier, and arg1 is the integer-typed
810       argument.  STAP_PROBE1 is used for probes with 1 argument,  STAP_PROBE2
811       is  used  for probes with 2 arguments, and so on.  The arguments of the
812       probe are available in the context variables $arg1, $arg2, ...  An  al‐
813       ternative to using the STAP_PROBE macros is to use the dtrace script to
814       create  custom  macros.   Additionally,  the   variables   $$name   and
815       $$provider  are  available  as  parts  of  the  probe  point name.  The
816       sys/sdt.h macro  names  DTRACE_PROBE*  are  available  as  aliases  for
817       STAP_PROBE*.
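       For illustration, assume a hypothetical program /usr/local/bin/myapp
       built with the sys/sdt.h call STAP_PROBE2(myapp, request_done,
       request_id, elapsed_us).  A probe on that marker could print both
       arguments along with the parts of the probe point name:

              probe process("/usr/local/bin/myapp").provider("myapp").mark("request_done") {
                  # $arg1 = request_id, $arg2 = elapsed_us (hypothetical meanings)
                  printf("%s:%s id=%d elapsed=%dus\n",
                         $$provider, $$name, $arg1, $arg2)
              }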
818
819
820       Finally,  full  symbolic source-level probes in user-space programs and
821       shared libraries are supported.  These are  exactly  analogous  to  the
822       symbolic DWARF-based kernel/module probes described above.  They expose
823       the same sorts of context $variables  for  function  parameters,  local
824       variables, and so on.
825
826              process("PATH").function("NAME")
827              process("PATH").statement("*@FILE.c:123")
828              process("PATH").plt("NAME")
829              process("PATH").library("PATH").plt("NAME")
830              process("PATH").library("PATH").function("NAME")
831              process("PATH").library("PATH").statement("*@FILE.c:123")
832              process("PATH").function("*").return
833              process("PATH").function("myfun").label("foo")
834              process("PATH").function("foo").callee("bar")
835              process("PATH").plt("NAME").return
836              debuginfod.process("PATH").**
837              process(PID).function("NAME")
838              process(PID).statement("*@FILE.c:123")
839              process(PID).plt("NAME")
840
841
842
843       Note  that for all process probes, PATH names refer to executables that
844       are searched the same way shells do: relative to the working  directory
845       if they contain a "/" character, otherwise in $PATH.  If PATH names re‐
846       fer to scripts, the actual interpreters (specified in the script in the
847       first  line  after  the  #!  characters) are probed.  In the debuginfod
848       probe family PATH names likewise refer to executables, but are searched
849       for in the currently defined $DEBUGINFOD_URLS.
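       The following is a minimal sketch of a source-level user-space probe.
       The PATH "ls" contains no "/", so it is looked up in $PATH as described
       above.  The sketch assumes that debuginfo for that binary is installed,
       because $$parms depends on it:

              # Run as:  stap lsmain.stp -c ls
              probe process("ls").function("main") {
                  printf("%s\n", $$parms)    # e.g. the argc and argv values
              }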
850
851
852
853       Tapset process probes placed in the special directory
854       $prefix/share/systemtap/tapset/PATH/ with relative paths will have their
855       process parameter prefixed with the location of the tapset.  For
856       example,
857
858
859              process("foo").function("NAME")
860
861
862       expands to
863
864              process("/usr/bin/foo").function("NAME")
865
866
867
868       when placed in $prefix/share/systemtap/tapset/PATH/usr/bin/
869
870
871       If PATH is a process component parameter referring to shared  libraries
872       then  all  processes that map it at runtime would be selected for prob‐
873       ing.  If PATH is a library component parameter referring to shared  li‐
874       braries  then  the  process specified by the process component would be
875       selected.  Note that the PATH pattern in a library component  will  al‐
876       ways  apply  to  libraries  statically  determined  to be in use by the
877       process. However, you may also specify the full  path  to  any  library
878       file even if not statically needed by the process.
879
880
881       A .plt probe will probe functions in the procedure linkage table
882       corresponding to the rest of the probe point.  .plt can be specified as a
883       shorthand for .plt("*").  The symbol name is available as a $$name context
884       variable; function arguments are not available, since PLTs are
885       processed without debuginfo.  A .plt.return probe places a probe at the
886       moment after the return from the named function.
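       The script below is a sketch that counts which PLT entries a program
       passes through, using the bare .plt shorthand.  No debuginfo is
       needed.  The program ls and the file name are placeholders:

              # Run as:  stap pltcount.stp -c ls
              global plt_calls

              probe process("ls").plt {
                  plt_calls[$$name]++    # arguments are not available here
              }

              probe end {
                  foreach (sym in plt_calls- limit 10)
                      printf("%-24s %d\n", sym, plt_calls[sym])
              }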
887
888
889       If the PATH string contains wildcards as in  the  MPATTERN  case,  then
890       standard  globbing  is  performed  to find all matching paths.  In this
891       case, the $PATH environment variable is not used.
892
893
894       If systemtap was invoked with the -c or -x options, then process probes
895       are  restricted  to  the  process  hierarchy associated with the target
896       process.
897
898
899   DEBUGINFOD
900       These probes take the form
901
902              debuginfod.process("PATH").**
903
904
905       They are very similar to the process("PATH").** probe family.  The  key
906       difference  is  that  the  process  probes  search for PATH in the host
907       filesystem, while debuginfod probes search the  current  federation  of
908       debuginfod  servers,  using the currently defined $DEBUGINFOD_URLS (see
909       debuginfod(8)).
910
911
912       To probe the contents of one or more ELF/archive files and/or
913       directories containing ELF/archive files, the commands below start a
914       debuginfod server that scans the ELF files within and prepares them for
915       systemtap, and then run stap against that server.
916
917              $ debuginfod [options] [-F -R -Z etc.] /path1 /path2
918              $ env DEBUGINFOD_URLS=http://localhost:8002/ stap ...
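       With that setup in place, a script names the executable as the
       debuginfod servers know it.  This is only a sketch: /usr/bin/ls is a
       placeholder, and the sketch assumes the server started above indexes
       that binary:

              probe debuginfod.process("/usr/bin/ls").function("main") {
                  printf("main() entered in %s[%d]\n", execname(), pid())
              }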
919
920
921
922   JAVA
923       Support  for probing Java methods is available using Byteman as a back‐
924       end. Byteman is an instrumentation tool from the  JBoss  project  which
925       systemtap  can use to monitor invocations for a specific method or line
926       in a Java program.
927
928       Systemtap does so by generating a Byteman script listing the probes  to
929       instrument and then invoking the Byteman bminstall utility.
930
931       This Java instrumentation support is currently a prototype feature with
932       major limitations.  Moreover, Java probing currently does not work
933       across users; the stap script must run (with appropriate permissions)
934       as the same user as the Java process being probed.  (Thus a stap
935       script run as root currently cannot probe Java methods in a Java process
936       owned by a non-root user.)
937
938
939       The first probe type refers to Java processes by the name of  the  Java
940       process:
941
942              java("PNAME").class("CLASSNAME").method("PATTERN")
943              java("PNAME").class("CLASSNAME").method("PATTERN").return
944
945       The PNAME argument must name a pre-existing JVM process that is
946       identifiable via a jps listing.
947
948       The PATTERN parameter specifies the signature of  the  Java  method  to
949       probe. The signature must consist of the exact name of the method, fol‐
950       lowed by a bracketed list of the types of the arguments,  for  instance
951       "myMethod(int,double,Foo)". Wildcards are not supported.
952
953       The probe can be set to trigger at a specific line within the method by
954       appending a colon and a line number, just as in other types of probes:
955       "myMethod(int,double,Foo):245".
956
957       The  CLASSNAME  parameter  identifies the Java class the method belongs
958       to, either with or without the package qualification. By  default,  the
959       probe  only  triggers  on descendants of the class that do not override
960       the method definition of the original  class.  However,  CLASSNAME  can
961       take  an  optional caret prefix, as in ^org.my.MyClass, which specifies
962       that the probe should also trigger on all descendants of  MyClass  that
963       override the original method. For instance, every method with signature
964       foo(int) in program org.my.MyApp can be probed at once using
965
966              java("org.my.MyApp").class("^java.lang.Object").method("foo(int)")
967
968
969       The second probe type works analogously, but refers to  Java  processes
970       by PID:
971
972              java(PID).class("CLASSNAME").method("PATTERN")
973              java(PID).class("CLASSNAME").method("PATTERN").return
974
975       (PIDs  for  an already running process can be obtained using the jps(1)
976       utility.)
977
978       Context variables defined within  java  probes  include  $arg1  through
979       $arg10  (for  up to the first 10 arguments of a method), represented as
980       character-pointers for the toString() form  of  each  actual  argument.
981       The  arg1 through arg10 script variables provide access to these as or‐
982       dinary strings, fetched via user_string_warn().
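       The following is a minimal sketch.  It assumes a hypothetical Java
       application whose main class org.my.MyApp defines a method foo(int).
       Because arg1 holds the toString() form of the argument, it is printed
       with %s:

              probe java("org.my.MyApp").class("org.my.MyApp").method("foo(int)") {
                  printf("foo(%s) called\n", arg1)
              }

              probe java("org.my.MyApp").class("org.my.MyApp").method("foo(int)").return {
                  printf("foo returned\n")
              }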
983
984       Prior to systemtap version 3.1, $arg1 through $arg10 could contain  ei‐
985       ther  integers or character pointers, depending on the types of the ob‐
986       jects being passed to each particular java method.  This  previous  be‐
987       haviour may be invoked with the stap --compatible=3.0 flag.
988
989
990   PROCFS
991       These  probe  points allow procfs "files" in /proc/systemtap/MODNAME to
992       be created, read and written using a permission that  may  be  modified
993       using  the  proper  umask  value. Default permissions are 0400 for read
994       probes, and 0200 for write probes. If both a read and write  probe  are
995       being used on the same file, a default permission of 0600 will be used.
996       Using procfs.umask(0040).read would result in a 0404 permission set for
997       the  file.   (MODNAME  is  the  name of the systemtap module). The proc
998       filesystem is a pseudo-filesystem which is used as an interface to ker‐
999       nel  data  structures. There are several probe point variants supported
1000       by the translator:
1001
1002
1003              procfs("PATH").read
1004              procfs("PATH").umask(UMASK).read
1005              procfs("PATH").read.maxsize(MAXSIZE)
1006              procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE)
1007              procfs("PATH").write
1008              procfs("PATH").umask(UMASK).write
1009              procfs.read
1010              procfs.umask(UMASK).read
1011              procfs.read.maxsize(MAXSIZE)
1012              procfs.umask(UMASK).read.maxsize(MAXSIZE)
1013              procfs.write
1014              procfs.umask(UMASK).write
1015
1016
1017       Note that there are a few differences when procfs probes are used in
1018       the stapbpf runtime.  FIFO special files are used instead of proc
1019       filesystem files.  These files are created in
1020       /var/tmp/systemtap-USER/MODNAME.  (USER is the name of the user.)
1021       Additionally, users cannot create both read and write probes on the same file.
1022
1023       PATH is the file name (relative to /proc/systemtap/MODNAME or
1024       /var/tmp/systemtap-USER/MODNAME) to be created.  If no PATH is
1025       specified (as in the last six variants above), PATH defaults to "command".
1026       The file name "__stdin" is used internally by systemtap for input
1027       probes and should not be used as a PATH for procfs probes; see the
1028       input probe section below.
1029
1030       When a user reads /proc/systemtap/MODNAME/PATH (normal runtime) or
1031       /var/tmp/systemtap-USER/MODNAME/PATH (stapbpf runtime), the corresponding
1032       procfs read probe is triggered.  The string data to be read should be
1033       assigned to a variable named $value, like this:
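       The sketch below puts the read and write forms together.  It keeps a
       small string "knob" that can be inspected with cat and changed with
       echo.  The module name knob (fixed with stap -m) and the file name
       setting are placeholders.  Because both probes are present, the file
       gets the 0600 default permission:

              # Run as:  stap -m knob knob.stp
              # Then:    echo on > /proc/systemtap/knob/setting
              #          cat /proc/systemtap/knob/setting
              global setting = "off\n"

              probe procfs("setting").read {
                  $value = setting    # string handed back to the reader
              }

              probe procfs("setting").write {
                  setting = $value    # what was written, newline included
              }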
1034
1035
1036              procfs("PATH").read { $value = "100\n" }
1037
1038
1039       When a user writes into /proc/systemtap/MODNAME/PATH (normal runtime)
1040       or /var/tmp/systemtap-USER/MODNAME/PATH (stapbpf runtime), the
1041       corresponding procfs write probe is triggered.  The data the user wrote
1042       is available in the string variable named $value, like this:
1043
1044
1045              procfs("PATH").write { printf("user wrote: %s", $value) }
1046
1047
1048       MAXSIZE  is the size of the procfs read buffer.  Specifying MAXSIZE al‐
1049       lows larger procfs output.  If no MAXSIZE is specified, the procfs read
1050       buffer  defaults to STP_PROCFS_BUFSIZE (which defaults to MAXSTRINGLEN,
1051       the maximum length of a string).  If setting the  procfs  read  buffers
1052       for  more  than  one  file is needed, it may be easiest to override the
1053       STP_PROCFS_BUFSIZE definition.  Here's an example of using MAXSIZE:
1054
1055
1056              procfs.read.maxsize(1024) {
1057                  $value = "long string..."
1058                  $value .= "another long string..."
1059                  $value .= "another long string..."
1060                  $value .= "another long string..."
1061              }
1062
1063
1064
1065   INPUT
1066       These probe points make input from stdin available to the script during
1067       runtime.   The translator currently supports two variants of this fami‐
1068       ly:
1069
1070              input.char
1071              input.line
1072
1073
1074       input.char is triggered each time a character is read from  stdin.  The
1075       current  character  is  available  in  the  string variable named char.
1076       There is no newline buffering; the next character is read from stdin as
1077       soon as it becomes available.
1078
1079       input.line causes all characters read from stdin to be buffered until a
1080       newline is read, at which point the probe will be triggered.  The  cur‐
1081       rent  line of characters (including the newline) is made available in a
1082       string variable named line.  Note that no more than MAXSTRINGLEN  char‐
1083       acters will be buffered. Any additional characters will not be included
1084       in line.
1085
1086
1087       Input probes are aliases for procfs("__stdin").write.  Systemtap
1088       reconfigures stdin if it detects this procfs probe; therefore "__stdin"
1089       should not be used as a path argument for procfs probes.
1090       Additionally, input probes will not work with the -F and --remote
1091       options.
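       The following is a minimal sketch of input.line.  It echoes each line
       typed on stdin until the user enters quit.  The file name echo.stp is a
       placeholder:

              # Run as:  stap echo.stp
              probe input.line {
                  if (line == "quit\n")
                      exit()
                  else
                      printf("you typed: %s", line)   # line keeps its newline
              }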
1092
1093
1094   NETFILTER HOOKS
1095       These probe points allow observation of network packets using the  net‐
1096       filter  mechanism. A netfilter probe in systemtap corresponds to a net‐
1097       filter hook function in the original netfilter probes API. It is proba‐
1098       bly  more  convenient  to use tapset::netfilter(3stap), which wraps the
1099       primitive netfilter hooks and does the work of extracting useful infor‐
1100       mation from the context variables.
1101
1102
1103       There are several probe point variants supported by the translator:
1104
1105
1106              netfilter.hook("HOOKNAME").pf("PROTOCOL_F")
1107              netfilter.pf("PROTOCOL_F").hook("HOOKNAME")
1108              netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY")
1109              netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY")
1110
1111
1112
1113       PROTOCOL_F is the protocol family to listen for, currently one of
1114       NFPROTO_IPV4, NFPROTO_IPV6, NFPROTO_ARP, or NFPROTO_BRIDGE.
1115
1116
1117       HOOKNAME is the point, or 'hook', in the protocol stack at which to
1118       intercept the packet.  The available hook names for each protocol family
1119       are taken from the kernel header files <linux/netfilter_ipv4.h>,
1120       <linux/netfilter_ipv6.h>, <linux/netfilter_arp.h> and
1121       <linux/netfilter_bridge.h>.  For instance, allowable hook names for
1122       NFPROTO_IPV4 are NF_INET_PRE_ROUTING, NF_INET_LOCAL_IN, NF_INET_FORWARD,
1123       NF_INET_LOCAL_OUT, and NF_INET_POST_ROUTING.
1124
1125
1126       PRIORITY is an integer priority giving the order  in  which  the  probe
1127       point  should  be  triggered relative to any other netfilter hook func‐
1128       tions which trigger on the same packet. Hook functions execute on  each
1129       packet  in order from smallest priority number to largest priority num‐
1130       ber. If no PRIORITY is specified (as in the first two probe point vari‐
1131       ants above), PRIORITY defaults to "0".
1132
1133       There are a number of predefined priority names of the form NF_IP_PRI_*
1134       and NF_IP6_PRI_* which are defined in the kernel header files
1135       <linux/netfilter_ipv4.h> and <linux/netfilter_ipv6.h> respectively.  The
1136       script is permitted to use these instead of specifying an integer
1137       priority.  (The probe points for NFPROTO_ARP and NFPROTO_BRIDGE currently
1138       do not expose any named hook priorities to the script writer.)  Thus,
1139       allowable ways to specify the priority include:
1140
1141
1142              priority("255")
1143              priority("NF_IP_PRI_SELINUX_LAST")
1144
1145
1146       A script using guru mode is permitted to specify any identifier or num‐
1147       ber as the parameter for hook, pf, and priority. This feature should be
1148       used  with  caution,  as  the parameter is inserted verbatim into the C
1149       code generated by systemtap.
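       The script below is a sketch that needs no guru mode, because it only
       observes packets and never assigns $verdict.  Every five seconds it
       reports the number of IPv4 packets seen at the local-delivery hook,
       using the default priority:

              global packets

              probe netfilter.pf("NFPROTO_IPV4").hook("NF_INET_LOCAL_IN") {
                  packets++
              }

              probe timer.s(5) {
                  printf("%d IPv4 packets delivered locally\n", packets)
                  packets = 0
              }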
1150
1151       The netfilter probe points define the following context variables:
1152
1153       $hooknum
1154              The hook number.
1155
1156       $skb   The address of the sk_buff struct representing the  packet.  See
1157              <linux/skbuff.h>  for  details on how to use this struct, or al‐
1158              ternatively use the tapset tapset::netfilter(3stap) for easy ac‐
1159              cess to key information.
1160
1161
1162       $in    The  address  of  the net_device struct representing the network
1163              device on which the packet was received (if any). May  be  0  if
1164              the device is unknown or undefined at that stage in the protocol
1165              stack.
1166
1167
1168       $out   The address of the net_device struct  representing  the  network
1169              device  on  which  the packet will be sent (if any). May be 0 if
1170              the device is unknown or undefined at that stage in the protocol
1171              stack.
1172
1173
1174       $verdict
1175              (Guru mode only.) Assigning one of the verdict values defined in
1176              <linux/netfilter.h> to this variable alters the further progress
1177              of the packet through the protocol stack. For instance, the fol‐
1178              lowing guru mode script forces all ipv6 network  packets  to  be
1179              dropped:
1180
1181
1182              probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") {
1183                $verdict = 0 /* nf_drop */
1184              }
1185
1186
1187              For  convenience,  unlike  the  primitive probe points discussed
1188              here, the probes defined in tapset::netfilter(3stap) export  the
1189              lowercase  names  of the verdict constants (e.g. NF_DROP becomes
1190              nf_drop) as local variables.
1191
1192
1193   KERNEL TRACEPOINTS
1194       This family of probe points hooks up to static probing tracepoints  in‐
1195       serted  into the kernel or modules.  As with markers, these tracepoints
1196       are special macro calls inserted by kernel developers to  make  probing
1197       faster and more reliable than with DWARF-based probes, and DWARF debug‐
1198       ging information is not required  to  probe  tracepoints.   Tracepoints
1199       have an extra advantage of more strongly-typed parameters than markers.
1200
1201       Tracepoint probes look like: kernel.trace("name").  The tracepoint name
1202       string, which may contain the usual  wildcard  characters,  is  matched
1203       against  the  names  defined by the kernel developers in the tracepoint
1204       header files.  To restrict the search to specific subsystems (e.g.
1205       sched, ext3, etc.), the following syntax can be used:
1206       kernel.trace("system:name").  The tracepoint system string may also
1207       contain the usual wildcard characters.
1208
1209       The  handler  associated with a tracepoint-based probe may read the op‐
1210       tional parameters specified at the macro call site.   These  are  named
1211       according  to  the  declaration by the tracepoint author.  For example,
1212       the tracepoint probe  kernel.trace("sched:sched_switch")  provides  the
1213       parameters  $prev and $next.  If the parameter is a complex type, as in
1214       a struct pointer, then a script can access fields with the same  syntax
1215       as DWARF $target variables.  Also, tracepoint parameters cannot be mod‐
1216       ified, but in guru-mode a script may modify fields of parameters.
1217
1218       The subsystem and name of the tracepoint are available in $$system  and
1219       $$name  and a string of name=value pairs for all parameters of the tra‐
1220       cepoint is available in $$vars or $$parms.
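       The following is a minimal sketch using the sched_switch tracepoint
       mentioned above.  It reads the pid field of the $prev and $next task
       pointers with ->, and stops after 20 events to keep the output short:

              global n

              probe kernel.trace("sched:sched_switch") {
                  printf("%s: %d -> %d\n", $$name, $prev->pid, $next->pid)
                  if (++n >= 20)
                      exit()
              }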
1221
1222
1223   KERNEL MARKERS (OBSOLETE)
1224       This family of probe points hooks up to an older style of static  prob‐
1225       ing  markers inserted into older kernels or modules.  These markers are
1226       special STAP_MARK macro calls inserted by  kernel  developers  to  make
1227       probing  faster  and  more reliable than with DWARF-based probes.  Fur‐
1228       ther, DWARF debugging information is not required to probe markers.
1229
1230       Marker probe points begin with kernel.  The next part names the  marker
1231       itself:  mark("name").   The  marker name string, which may contain the
1232       usual wildcard characters, is matched against the names  given  to  the
1233       marker macros when the kernel and/or module was compiled.  Optionally,
1234       you can specify format("format").  Specifying the marker format
1235       string  allows  differentiation  between two markers with the same name
1236       but different marker format strings.
1237
1238       The handler associated with a marker-based probe may read the  optional
1239       parameters  specified  at  the  macro call site.  These are named $arg1
1240       through $argNN, where NN is the number of parameters  supplied  by  the
1241       macro.  Number and string parameters are passed in a type-safe manner.
1242
1243       The marker format string associated with a marker is available in
1244       $format, and the marker name string is available in $name.
1245
1246
1247   HARDWARE BREAKPOINTS
1248       This family of probes is used to set hardware watchpoints for a given
1249       (global) kernel symbol.  The probes take three components as inputs:
1250
1251       1. The virtual address / name of the kernel symbol to be traced is sup‐
1252       plied  as argument to this class of probes. ( Probes for only data seg‐
1253       ment variables are supported. Probing local  variables  of  a  function
1254       cannot be done.)
1255
1256       2. Nature of access to be probed : a.  .write probe gets triggered when
1257       a write happens at the specified address/symbol name.  b.  rw probe  is
1258       triggered when either a read or write happens.
1259
1260       3.   .length (optional) Users have the option of specifying the address
1261       interval to be probed using  "length"  constructs.  The  user-specified
1262       length  gets  approximated  to the closest possible address length that
1263       the architecture can support. If the specified length exceeds the  lim‐
1264       its imposed by architecture, an error message is flagged and probe reg‐
1265       istration fails.  Wherever 'length' is not  specified,  the  translator
1266       requests  a  hardware  breakpoint probe of length 1. It should be noted
1267       that the "length" construct is not valid with symbol names.
1268
1269       Following constructs are supported :
1270
1271              probe kernel.data(ADDRESS).write
1272              probe kernel.data(ADDRESS).rw
1273              probe kernel.data(ADDRESS).length(LEN).write
1274              probe kernel.data(ADDRESS).length(LEN).rw
1275              probe kernel.data("SYMBOL_NAME").write
1276              probe kernel.data("SYMBOL_NAME").rw
1277
1278
       This set of probes makes use of the debug registers of the processor,
       which are a scarce resource (4 on x86, 1 on PowerPC).  The script
       translation flags a warning if a user requests more hardware
       breakpoint probes than the limit set by the architecture.  For
       example, a pass-2 warning is issued when an input script requests 5
       hardware breakpoint probes on an x86 system, while the x86
       architecture supports a maximum of 4 breakpoints.  Users are cautioned
       to set probes judiciously.

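       For instance, a minimal sketch of the symbol form, watching pid_max
       (the symbol used in the EXAMPLES section).  It occupies one debug
       register and reports each task that writes the variable, for example
       through /proc/sys/kernel/pid_max:

              probe kernel.data("pid_max").write {
                  // execname() and pid() identify the task doing the write
                  printf("pid_max written by %s (pid %d)\n", execname(), pid())
              }
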

   PERF
       This family of probe points interfaces to the kernel "perf event"
       infrastructure for controlling hardware performance counters.  The
       events being attached to are described by the "type" and "config"
       fields of the perf_event_attr structure, and are sampled at an
       interval governed by the "sample_period" and "sample_freq" fields.

       These fields are made available to systemtap scripts using the
       following syntax:

              probe perf.type(NN).config(MM).sample(XX)
              probe perf.type(NN).config(MM).hz(XX)
              probe perf.type(NN).config(MM)
              probe perf.type(NN).config(MM).process("PROC")
              probe perf.type(NN).config(MM).counter("COUNTER")
              probe perf.type(NN).config(MM).process("PROC").counter("NAME")

       The systemtap probe handler is called once per XX increments of the
       underlying performance counter when using the .sample field, or at a
       frequency of XX hertz when using the .hz field.  When neither is
       specified, the default behavior is to sample at a count of 1000000.
       The range of valid type/config values is described by the
       perf_event_open(2) system call and/or the linux/perf_event.h file.
       Invalid combinations or exhausted hardware counter resources result in
       errors during systemtap script startup.  Systemtap does not
       sanity-check the values: it merely passes them through to the kernel
       for error- and safety-checking.  By default the perf event probe is
       systemwide unless .process is specified, which binds the probe to a
       specific task.  If the process name is omitted, it is inferred from
       the stap -c argument.  A perf event can be read on demand using
       .counter.  The body of the perf probe handler is not invoked for a
       .counter probe; instead, the counter is read in a user space probe
       via:

              probe process("PROC").statement("func@file") { stat <<< @perf("NAME") }

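       As a rough sketch of the .sample form, the script below counts samples
       per process name for five seconds and prints the ten busiest.  In
       linux/perf_event.h, type 0 and config 0 are PERF_TYPE_HARDWARE and
       PERF_COUNT_HW_CPU_CYCLES; the sketch assumes the CPU actually exposes
       a cycle counter (some virtual machines do not), otherwise startup
       fails as described above:

              global samples

              // type 0 / config 0: PERF_TYPE_HARDWARE / CPU cycles;
              // the handler runs once per 1000000 counted cycles
              probe perf.type(0).config(0).sample(1000000) {
                  samples[execname()]++
              }

              probe timer.s(5) { exit() }

              probe end {
                  // sort by sample count, highest first
                  foreach (name in samples- limit 10)
                      printf("%-16s %d\n", name, samples[name])
              }
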
   PYTHON
       Support for probing python 2 and python 3 functions is available with
       the help of an extra python support module.  Note that the debuginfo
       for the version of python being probed is required.  To run a python
       script with the extra python support module, add the '-m HelperSDT'
       option to your python command, like this:

              stap foo.stp -c "python -m HelperSDT foo.py"

       Python probes look like the following:

              python2.module("MPATTERN").function("PATTERN")
              python2.module("MPATTERN").function("PATTERN").call
              python2.module("MPATTERN").function("PATTERN").return
              python3.module("MPATTERN").function("PATTERN")
              python3.module("MPATTERN").function("PATTERN").call
              python3.module("MPATTERN").function("PATTERN").return

       The list above includes multiple variants and modifiers which provide
       additional functionality or filters.  They are:

              .function
                     Places a probe at the beginning of the named function by
                     default, unless modified by PATTERN.  Parameters are
                     available as context variables.

              .call  Places a probe at the beginning of the named function.
                     Parameters are available as context variables.

              .return
                     Places a probe at the moment before the return from the
                     named function.  Parameters and local/global python
                     variables are available as context variables.

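       A minimal sketch of the .call and .return variants, assuming a
       hypothetical script foo.py that defines a function bar() and is started
       with 'stap calls.stp -c "python3 -m HelperSDT foo.py"'.  The module
       name "foo" is likewise an assumption about how the script is named:

              global calls

              // hypothetical module "foo" and function "bar"
              probe python3.module("foo").function("bar").call {
                  calls++
              }

              probe python3.module("foo").function("bar").return {
                  printf("bar returned, %d call(s) so far\n", calls)
              }
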
       PATTERN stands for a string literal that aims to identify a point in
       the python program.  It is made up of three parts:

       •   The first part is the name of a function (e.g. "foo") or class
           method (e.g. "bar.baz").  This part may use the "*" and "?"
           wildcarding operators to match multiple names.

       •   The second part is optional and begins with the "@" character.  It
           is followed by the path to the source file containing the
           function, which may include a wildcard pattern.  The python path is
           searched for a matching filename.

       •   Finally, the third part is optional and may only be given together
           with the file name part.  It identifies the line number in the
           source file, preceded by a ":" or a "+".  The line number is
           assumed to be an absolute line number if preceded by a ":", or
           relative to the declaration line of the function if preceded by a
           "+".  All the lines in the function can be matched with ":*".  A
           range of lines x through y can be matched with ":x-y".  Ranges and
           specific lines can be mixed using commas, e.g. ":x,y-z".

       In the above list of probe points, MPATTERN stands for a python module
       or script name that names the python module of interest.  This part
       may use the "*" and "?" wildcarding operators to match multiple names.
       The python path is searched for a matching filename.

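       The following sketch shows how the PATTERN parts combine.  The module
       "foo", the file foo.py, the class Worker and the function bar are all
       hypothetical, and each probe point only resolves if the named function
       and lines exist in that file:

              // class method matched by name only
              probe python3.module("foo").function("Worker.run") {
                  printf("Worker.run entered\n")
              }

              // two lines after the line declaring bar() in foo.py
              probe python3.module("foo").function("bar@foo.py+2") {
                  printf("bar: line +2 reached\n")
              }

              // absolute lines 10 through 20 of any function in foo.py
              probe python3.module("foo").function("*@foo.py:10-20") {
                  printf("foo.py: a line in 10-20 reached\n")
              }
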

EXAMPLES

       Here are some example probe points, defining the associated events.

       begin, end, end
              refers to the startup and normal shutdown of the session.  In
              this case, the handler would run once during startup and twice
              during shutdown.

       timer.jiffies(1000).randomize(200)
              refers to a periodic interrupt, every 1000 +/- 200 jiffies.

       kernel.function("*init*"), kernel.function("*exit*")
              refers to all kernel functions with "init" or "exit" in the
              name.

       kernel.function("*@kernel/time.c:240")
              refers to any functions within the "kernel/time.c" file that
              span line 240.  Note that this is not a probe at the statement
              at that line number.  Use the kernel.statement probe instead.

       kernel.trace("sched_*")
              refers to all scheduler-related (more precisely, all
              "sched_"-prefixed) tracepoints in the kernel.

       kernel.mark("getuid")
              refers to an obsolete STAP_MARK(getuid, ...) macro call in the
              kernel.

       module("usb*").function("*sync*").return
              refers to the moment of return from all functions with "sync" in
              the name in any of the USB drivers.

       kernel.statement(0xc0044852)
              refers to the first byte of the statement whose compiled
              instructions include the given address in the kernel.

       kernel.statement("*@kernel/time.c:296")
              refers to the statement of line 296 within "kernel/time.c".

       kernel.statement("bio_init@fs/bio.c+3")
              refers to the statement three lines after the line declaring
              bio_init (bio_init+3) within "fs/bio.c".

       kernel.data("pid_max").write
              refers to a hardware breakpoint of type "write" set on pid_max.

       syscall.*.return
              refers to the group of probe aliases with any name in the second
              position.

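       A few of these probe points can be combined into one small script.
       This is a sketch only: pp() reports which listed probe point fired,
       and the syscall tapset aliases are assumed to set the variable name
       to the system call name:

              global calls

              // runs once at startup and twice at shutdown
              probe begin, end, end {
                  printf("%s\n", pp())
              }

              // count returns from every system call alias, by name
              probe syscall.*.return {
                  calls[name]++
              }

              probe timer.s(5) { exit() }

              probe end {
                  foreach (n in calls- limit 5)
                      printf("%-12s %d\n", n, calls[n])
              }
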