STAPPROBES(3stap) STAPPROBES(3stap)

stapprobes - systemtap probe points

11 The following sections enumerate the variety of probe points supported
12 by the systemtap translator, and some of the additional aliases defined
13 by standard tapset scripts. Many are individually documented in the
14 3stap manual section, with the probe:: prefix.
15
16
18 probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
19
20
A probe declaration may list multiple comma-separated probe points in
order to attach a handler to all of the named events. Normally, the
handler statements are run whenever any of the events occur.
24
25 The syntax of a single probe point is a general dotted-symbol sequence.
26 This allows a breakdown of the event namespace into parts, somewhat
27 like the Domain Name System does on the Internet. Each component iden‐
28 tifier may be parametrized by a string or number literal, with a syntax
29 like a function call. A component may include a "*" character, to ex‐
30 pand to a set of matching probe points. It may also include "**" to
31 match multiple sequential components at once. Probe aliases likewise
32 expand to other probe points.
33
34 Probe aliases can be given on their own, or with a suffix. The suffix
35 attaches to the underlying probe point that the alias is expanded to.
36 For example,
37
38 syscall.read.return.maxactive(10)
39
40 expands to
41
42 kernel.function("sys_read").return.maxactive(10)
43
44 with the component maxactive(10) being recognized as a suffix.
45
46 Normally, each and every probe point resulting from wildcard- and
47 alias-expansion must be resolved to some low-level system instrumenta‐
48 tion facility (e.g., a kprobe address, marker, or a timer configura‐
49 tion), otherwise the elaboration phase will fail.
50
51 However, a probe point may be followed by a "?" character, to indicate
52 that it is optional, and that no error should result if it fails to re‐
53 solve. Optionalness passes down through all levels of alias/wildcard
54 expansion. Alternately, a probe point may be followed by a "!" charac‐
55 ter, to indicate that it is both optional and sufficient. (Think
56 vaguely of the Prolog cut operator.) If it does resolve, then no fur‐
57 ther probe points in the same comma-separated list will be resolved.
58 Therefore, the "!" sufficiency mark only makes sense in a list of
59 probe point alternatives.
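
As a rough illustration of the "!" and "?" marks in a list of
alternatives, the sketch below prefers one kernel function and falls
back to another only if the first does not resolve. Both function names
are assumptions about the running kernel, and kernel debuginfo is
assumed to be installed.

```systemtap
# Prefer ksys_read; consider sys_read only if ksys_read does not resolve.
# Both names are assumptions and may not exist on a given kernel.
probe kernel.function("ksys_read") !,
      kernel.function("sys_read") ?
{
  # pp() returns the fully expanded probe point that fired.
  printf("read path hit at %s\n", pp())
  exit()
}
```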
60
Additionally, a probe point may be followed by an "if (expr)" statement,
in order to enable/disable the probe point on-the-fly. With the "if"
statement, if the "expr" is false when the probe point is hit, the
whole probe body, including the bodies of any aliases, is skipped. The
condition is stacked up through all levels of alias/wildcard expansion,
so the final condition becomes the logical-and of the conditions of all
expanded aliases/wildcards. The expressions are necessarily restricted
to global variables.
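
A minimal sketch of an "if" condition, assuming the syscall tapset is
usable on the target system. The global names tracing and reads are
made up for this example.

```systemtap
global tracing = 0   # condition variable; must be a global
global reads

# Flip the condition every five seconds.
probe timer.s(5) { tracing = !tracing }

# The handler is skipped whenever tracing is 0.
probe syscall.read if (tracing) { reads++ }

probe timer.s(30) { exit() }
probe end { printf("reads counted while enabled: %d\n", reads) }
```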
69
These are all syntactically valid probe points. (They may nonetheless
be semantically invalid, depending on the contents of the tapsets, and
the versions of kernel/user software installed.)
73
74
75 kernel.function("foo").return
76 process("/bin/vi").statement(0x2222)
77 end
78 syscall.*
79 syscall.*.return.maxactive(10)
80 sys**open
81 kernel.function("no_such_function") ?
82 module("awol").function("no_such_function") !
83 signal.*? if (switch)
84 kprobe.function("foo")
85
86
87 Probes may be broadly classified into "synchronous" and "asynchronous".
88 A "synchronous" event is deemed to occur when any processor executes an
89 instruction matched by the specification. This gives these probes a
90 reference point (instruction address) from which more contextual data
91 may be available. Other families of probe points refer to "asynchro‐
92 nous" events such as timers/counters rolling over, where there is no
fixed reference point that is related. Each probe point specification
may match multiple locations (for example, using wildcards or aliases),
and all of them are then probed. A probe declaration may also contain
several comma-separated specifications, all of which are probed.
97
98
100 Resolving some probe points requires DWARF debuginfo or "debug symbols"
101 for the specific program being instrumented. For some others, DWARF is
102 automatically synthesized on the fly from source code header files.
103 For others, it is not needed at all. Since a systemtap script may use
104 any mixture of probe points together, the union of their DWARF require‐
105 ments has to be met on the computer where script compilation occurs.
106 (See the --use-server option and the stap-server(8) man page for infor‐
107 mation about the remote compilation facility, which allows these re‐
108 quirements to be met on a different machine.)
109
The following table lists many of the available probe point families,
to classify them with respect to their need for DWARF debuginfo for the
specific program for that probe point.
113
114
DWARF                         NON-DWARF                    SYMBOL-TABLE

kernel.function, .statement   kernel.mark                  kernel.function*
module.function, .statement   process.mark, process.plt    module.function*
process.function, .statement  begin, end, error, never     process.function*
process.mark*                 timer
.function.callee              perf
                              procfs
AUTO-GENERATED-DWARF          kernel.statement.absolute
                              kernel.data
kernel.trace                  kprobe.function
                              process.statement.absolute
                              process.begin, .end
                              netfilter
                              java
130
131
Probe types marked with an asterisk (*) are fallbacks, where systemtap
can sometimes infer subset or substitute information. In general, the
more symbolic / debugging information available, the higher the quality
of probing that will be available.
136
137
138
140 The following types of probe points may be armed/disarmed on-the-fly to
141 save overheads during uninteresting times. Arming conditions may also
142 be added to other types of probes, but will be treated as a wrapping
143 conditional and won't benefit from overhead savings.
144
145
DISARMABLE                                exceptions
kernel.function, kernel.statement
module.function, module.statement
process.*.function, process.*.statement
process.*.plt, process.*.mark
timer.                                    timer.profile
java
153
154
156 BEGIN/END/ERROR
157 The probe points begin and end are defined by the translator to refer
158 to the time of session startup and shutdown. All "begin" probe han‐
159 dlers are run, in some sequence, during the startup of the session.
160 All global variables will have been initialized prior to this point.
161 All "end" probes are run, in some sequence, during the normal shutdown
162 of a session, such as in the aftermath of an exit () function call, or
163 an interruption from the user. In the case of an error-triggered shut‐
164 down, "end" probes are not run. There are no target variables avail‐
165 able in either context.
166
167 If the order of execution among "begin" or "end" probes is significant,
168 then an optional sequence number may be provided:
169
170
171 begin(N)
172 end(N)
173
174
175 The number N may be positive or negative. The probe handlers are run
176 in increasing order, and the order between handlers with the same se‐
177 quence number is unspecified. When "begin" or "end" are given without
178 a sequence, they are effectively sequence zero.
179
The error probe point is similar to the end probe, except that each
such probe handler is run when the session ends after errors have
occurred. In such cases, "end" probes are skipped, but each "error"
probe is still attempted. This kind of probe can be used to clean up
or emit a "final gasp". It may also be numerically parametrized to set
a sequence.
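
The sequencing rules above can be sketched as follows. The messages are
only illustrative.

```systemtap
probe begin(-1) { println("begin, sequence -1: runs before the others") }
probe begin     { println("begin, no sequence: treated as sequence 0") }
probe begin(1)  { println("begin, sequence 1: runs last"); exit() }

# Runs on normal shutdown, for example after exit().
probe end       { println("end: normal shutdown") }

# Runs instead of "end" when the session ends after errors.
probe error     { println("error: shutdown after an error") }
```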
186
187
188 NEVER
189 The probe point never is specially defined by the translator to mean
190 "never". Its probe handler is never run, though its statements are an‐
191 alyzed for symbol / type correctness as usual. This probe point may be
192 useful in conjunction with optional probes.
193
194
195 SYSCALL and ND_SYSCALL
196 The syscall.* and nd_syscall.* aliases define several hundred probes,
197 too many to detail here. They are of the general form:
198
199
200 syscall.NAME
201 nd_syscall.NAME
202 syscall.NAME.return
203 nd_syscall.NAME.return
204
205
206 Generally, a pair of probes are defined for each normal system call as
207 listed in the syscalls(2) manual page, one for entry and one for re‐
208 turn. Those system calls that never return do not have a corresponding
.return probe. The nd_* family of probes is about the same, except that
it uses non-DWARF based searching mechanisms, which may result in a
lower quality of symbolic context data (parameters), and may miss some
system calls. You may want to try them first, in case kernel debugging
information is not immediately available.
214
Each probe alias provides a variety of variables. Looking at the tapset
source code is the most reliable way to find them. Generally, each
variable listed
217 in the standard manual page is made available as a script-level vari‐
218 able, so syscall.open exposes filename, flags, and mode. In addition,
219 a standard suite of variables is available at most aliases:
220
221 argstr A pretty-printed form of the entire argument list, without
222 parentheses.
223
224 name The name of the system call.
225
226 retstr For return probes, a pretty-printed form of the system-call re‐
227 sult.
228
229 As usual for probe aliases, these variables are all initialized once
230 from the underlying $context variables, so that later changes to $con‐
231 text variables are not automatically reflected. Not all probe aliases
232 obey all of these general guidelines. Please report any bothersome
233 ones you encounter as a bug. Note that on some kernel/userspace archi‐
234 tecture combinations (e.g., 32-bit userspace on 64-bit kernel), the un‐
235 derlying $context variables may need explicit sign extension / masking.
236 When this is an issue, consider using the tapset-provided variables in‐
237 stead of raw $context variables.
238
239 If debuginfo availability is a problem, you may try using the non-DWARF
240 syscall probe aliases instead. Use the nd_syscall. prefix instead of
241 syscall. The same context variables are available, as far as possible.
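
A short sketch of the standard alias variables name, argstr, and retstr
on one system call. It assumes the syscall.read alias resolves on the
target kernel; replacing the syscall. prefix with nd_syscall. gives the
non-DWARF variant.

```systemtap
probe syscall.read {
  # name and argstr are set by the tapset alias.
  printf("%s[%d] %s(%s)\n", execname(), pid(), name, argstr)
}

probe syscall.read.return {
  # retstr is the pretty-printed result of the call.
  printf("%s[%d] %s = %s\n", execname(), pid(), name, retstr)
}

# Stop after two seconds to keep the output short.
probe timer.s(2) { exit() }
```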
242
243
244 TIMERS
245 Intervals defined by the standard kernel "jiffies" timer may be used to
246 trigger probe handlers asynchronously. Two probe point variants are
247 supported by the translator:
248
249
250 timer.jiffies(N)
251 timer.jiffies(N).randomize(M)
252
253
254 The probe handler is run every N jiffies (a kernel-defined unit of
255 time, typically between 1 and 60 ms). If the "randomize" component is
256 given, a linearly distributed random value in the range [-M..+M] is
257 added to N every time the handler is run. N is restricted to a reason‐
258 able range (1 to around a million), and M is restricted to be smaller
259 than N. There are no target variables provided in either context. It
260 is possible for such probes to be run concurrently on a multi-processor
261 computer.
262
263 Alternatively, intervals may be specified in units of time. There are
264 two probe point variants similar to the jiffies timer:
265
266
267 timer.ms(N)
268 timer.ms(N).randomize(M)
269
270
271 Here, N and M are specified in milliseconds, but the full options for
272 units are seconds (s/sec), milliseconds (ms/msec), microseconds
273 (us/usec), nanoseconds (ns/nsec), and hertz (hz). Randomization is not
274 supported for hertz timers.
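
The timer variants above might be combined as in this sketch. The
interval values are arbitrary, and ticks is a name chosen for the
example.

```systemtap
global ticks

# Four times per second; hertz timers take no randomize component.
probe timer.hz(4) { ticks++ }

# Roughly once per second, with up to 200 ms of jitter either way.
probe timer.ms(1000).randomize(200) {
  printf("%d ticks so far\n", ticks)
}

probe timer.s(5) { exit() }
```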
275
276 The actual resolution of the timers depends on the target kernel. For
277 kernels prior to 2.6.17, timers are limited to jiffies resolution, so
278 intervals are rounded up to the nearest jiffies interval. After
279 2.6.17, the implementation uses hrtimers for tighter precision, though
280 the actual resolution will be arch-dependent. In either case, if the
281 "randomize" component is given, then the random value will be added to
282 the interval before any rounding occurs.
283
284 Profiling timers are also available to provide probes that execute on
285 all CPUs at the rate of the system tick (CONFIG_HZ). This probe takes
286 no parameters. On some kernels, this is a one-concurrent-user-only or
287 disabled facility, resulting in error -16 (EBUSY) during probe regis‐
288 tration.
289
290
291 timer.profile.tick
292
293
294 Full context information of the interrupted process is available, mak‐
295 ing this probe suitable for a time-based sampling profiler.
296
297 It is recommended to use the tapset probe timer.profile rather than
298 timer.profile.tick. This probe point behaves identically to timer.pro‐
299 file.tick when the underlying functionality is available, and falls
300 back to using perf.sw.cpu_clock on some recent kernels which lack the
301 corresponding profile timer facility.
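
A very small time-based sampling profiler can be sketched with
timer.profile. This version only counts samples per executable name;
the array name samples is an assumption of the example.

```systemtap
global samples

# Fires on every CPU at the profiling rate; count by process name.
probe timer.profile { samples[execname()]++ }

probe timer.s(5) { exit() }

probe end {
  # Print the ten most frequently sampled executables.
  foreach (name in samples- limit 10)
    printf("%-20s %d\n", name, samples[name])
}
```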
302
303
304 DWARF
305 This family of probe points uses symbolic debugging information for the
306 target kernel/module/program, as may be found in unstripped executa‐
307 bles, or the separate debuginfo packages. They allow placement of
308 probes logically into the execution path of the target program, by
309 specifying a set of points in the source or object code. When a match‐
310 ing statement executes on any processor, the probe handler is run in
311 that context.
312
313 Probe points in the DWARF family can be identified by the target kernel
314 module (or user process), source file, line number, function name, or
315 some combination of these.
316
317 Here is a list of DWARF probe points currently supported:
318
319 kernel.function(PATTERN)
320 kernel.function(PATTERN).call
321 kernel.function(PATTERN).callee(PATTERN)
322 kernel.function(PATTERN).callee(PATTERN).return
323 kernel.function(PATTERN).callee(PATTERN).call
324 kernel.function(PATTERN).callees(DEPTH)
325 kernel.function(PATTERN).return
326 kernel.function(PATTERN).inline
327 kernel.function(PATTERN).label(LPATTERN)
328 module(MPATTERN).function(PATTERN)
329 module(MPATTERN).function(PATTERN).call
330 module(MPATTERN).function(PATTERN).callee(PATTERN)
331 module(MPATTERN).function(PATTERN).callee(PATTERN).return
332 module(MPATTERN).function(PATTERN).callee(PATTERN).call
333 module(MPATTERN).function(PATTERN).callees(DEPTH)
334 module(MPATTERN).function(PATTERN).return
335 module(MPATTERN).function(PATTERN).inline
336 module(MPATTERN).function(PATTERN).label(LPATTERN)
337 kernel.statement(PATTERN)
338 kernel.statement(PATTERN).nearest
339 kernel.statement(ADDRESS).absolute
340 module(MPATTERN).statement(PATTERN)
341 process("PATH").function("NAME")
342 process("PATH").statement("*@FILE.c:123")
343 process("PATH").library("PATH").function("NAME")
344 process("PATH").library("PATH").statement("*@FILE.c:123")
345 process("PATH").library("PATH").statement("*@FILE.c:123").nearest
346 process("PATH").function("*").return
347 process("PATH").function("myfun").label("foo")
348 process("PATH").function("foo").callee("bar")
349 process("PATH").function("foo").callee("bar").return
350 process("PATH").function("foo").callee("bar").call
351 process("PATH").function("foo").callees(DEPTH)
352 process(PID).function("NAME")
353 process(PID).function("myfun").label("foo")
354 process(PID).plt("NAME")
355 process(PID).plt("NAME").return
356 process(PID).statement("*@FILE.c:123")
357 process(PID).statement("*@FILE.c:123").nearest
358 process(PID).statement(ADDRESS).absolute
359
360 (See the USER-SPACE section below for more information on the process
361 probes.)
362
363 The list above includes multiple variants and modifiers which provide
364 additional functionality or filters. They are:
365
366 .function
367 Places a probe near the beginning of the named function,
368 so that parameters are available as context variables.
369
370 .return
371 Places a probe at the moment after the return from the
372 named function, so the return value is available as the
373 "$return" context variable.
374
375 .inline
376 Filters the results to include only instances of inlined
377 functions. Note that inlined functions do not have an
378 identifiable return point, so .return is not supported on
379 .inline probes.
380
381 .call Filters the results to include only non-inlined functions
382 (the opposite set of .inline)
383
384 .exported
385 Filters the results to include only exported functions.
386
387 .statement
388 Places a probe at the exact spot, exposing those local
389 variables that are visible there.
390
391 .statement.nearest
392 Places a probe at the nearest available line number for
393 each line number given in the statement.
394
395 .callee
396 Places a probe on the callee function given in the
397 .callee modifier, where the callee must be a function
398 called by the target function given in .function. The ad‐
399 vantage of doing this over directly probing the callee
400 function is that this probe point is run only when the
401 callee is called from the target function (add the
402 -DSTAP_CALLEE_MATCHALL directive to override this when
403 calling stap(1)).
404
405 Note that only callees that can be statically determined
406 are available. For example, calls through function
407 pointers are not available. Additionally, calls to func‐
408 tions located in other objects (e.g. libraries) are not
409 available (instead use another probe point). This feature
410 will only work for code compiled with GCC 4.7+.
411
412 .callees
413 Shortcut for .callee("*"), which places a probe on all
414 callees of the function.
415
416 .callees(DEPTH)
417 Recursively places probes on callees. For example,
418 .callees(2) will probe both callees of the target func‐
419 tion, as well as callees of those callees. And
420 .callees(3) goes one level deeper, etc... A callee probe
421 at depth N is only triggered when the N callers in the
422 callstack match those that were statically determined
423 during analysis (this also may be overridden using
424 -DSTAP_CALLEE_MATCHALL).
425
426 In the above list of probe points, MPATTERN stands for a string literal
427 that aims to identify the loaded kernel module of interest. For in-tree
428 kernel modules, the name suffices (e.g. "btrfs"). The name may also in‐
429 clude the "*", "[]", and "?" wildcards to match multiple in-tree mod‐
430 ules. Out-of-tree modules are also supported by specifying the full
431 path to the ko file. Wildcards are not supported. The file must follow
432 the convention of being named <module_name>.ko (characters ',' and '-'
433 are replaced by '_').
434
435 LPATTERN stands for a source program label. It may also contain "*",
436 "[]", and "?" wildcards. PATTERN stands for a string literal that aims
437 to identify a point in the program. It is made up of three parts:
438
439 · The first part is the name of a function, as would appear in the nm
440 program's output. This part may use the "*" and "?" wildcarding
441 operators to match multiple names.
442
443 · The second part is optional and begins with the "@" character. It
444 is followed by the path to the source file containing the function,
445 which may include a wildcard pattern, such as mm/slab*. If it does
446 not match as is, an implicit "*/" is optionally added before the
447 pattern, so that a script need only name the last few components of
448 a possibly long source directory path.
449
450 · Finally, the third part is optional if the file name part was giv‐
451 en, and identifies the line number in the source file preceded by a
452 ":" or a "+". The line number is assumed to be an absolute line
453 number if preceded by a ":", or relative to the declaration line of
454 the function if preceded by a "+". All the lines in the function
455 can be matched with ":*". A range of lines x through y can be
456 matched with ":x-y". Ranges and specific lines can be mixed using
457 commas, e.g. ":x,y-z".
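
The three parts of PATTERN can be illustrated with the sketch below. The
function name vfs_read and the file fs/read_write.c are assumptions
about the kernel source layout, kernel debuginfo is assumed to be
installed, and the relative line offset is arbitrary (hence the "?").

```systemtap
global hits

# Part 1 and 2: wildcarded function name, restricted to one source file.
probe kernel.function("vfs_*@fs/read_write.c") { hits[pp()]++ }

# Part 3: every line of one function in that file.
probe kernel.statement("vfs_read@fs/read_write.c:*") { hits[pp()]++ }

# Part 3, relative form: one line after the function's declaration line.
probe kernel.statement("vfs_read@fs/read_write.c+1") ? { hits[pp()]++ }

probe timer.s(2) { exit() }

probe end {
  foreach (point in hits- limit 10)
    printf("%6d %s\n", hits[point], point)
}
```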
458
459 As an alternative, PATTERN may be a numeric constant, indicating an ad‐
460 dress. Such an address may be found from symbol tables of the appro‐
461 priate kernel / module object file. It is verified against known
462 statement code boundaries, and will be relocated for use at run time.
463
464 In guru mode only, absolute kernel-space addresses may be specified
465 with the ".absolute" suffix. Such an address is considered already re‐
466 located, as if it came from /proc/kallsyms, so it cannot be checked
467 against statement/instruction boundaries.
468
469 CONTEXT VARIABLES
470 Many of the source-level context variables, such as function parame‐
471 ters, locals, globals visible in the compilation unit, may be visible
472 to probe handlers. They may refer to these variables by prefixing
473 their name with "$" within the scripts. In addition, a special syntax
474 allows limited traversal of structures, pointers, and arrays. More
475 syntax allows pretty-printing of individual variables or their groups.
476 See also @cast. Note that variables may be inaccessible due to them
477 being paged out, or for a few other reasons. See also man er‐
478 ror::fault(7stap).
479
480
481 $var refers to an in-scope variable "var". If it's an integer-like
482 type, it will be cast to a 64-bit int for systemtap script use.
483 String-like pointers (char *) may be copied to systemtap string
484 values using the kernel_string or user_string functions.
485
486 @var("varname")
487 an alternative syntax for $varname
488
489 @var("varname@src/file.c")
refers to the global (either file local or external) variable
varname defined when the file src/file.c was compiled. The CU in
which the variable is resolved is the first CU in the module of
the probe point which matches the given file name at the end and
has the shortest file name path (e.g. given
@var("foo@bar/baz.c") and CUs with file name paths
src/sub/module/bar/baz.c and src/bar/baz.c, the second CU will be
chosen to resolve the (file) global variable foo).
498
499 $var->field traversal via a structure's or a pointer's field. This
500 generalized indirection operator may be repeated to follow more
501 levels. Note that the . operator is not used for plain struc‐
502 ture members, only -> for both purposes. (This is because "."
503 is reserved for string concatenation.)
504
505 $return
506 is available in return probes only for functions that are de‐
507 clared with a return value, which can be determined using @de‐
508 fined($return).
509
$var[N]
indexes into an array. The index may be given as a literal number
or even an arbitrary numeric expression.
513
514 A number of operators exist for such basic context variable expres‐
515 sions:
516
517 $$vars expands to a character string that is equivalent to
518
519 sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
520 parm1, ..., parmN, var1, ..., varN)
521
522 for each variable in scope at the probe point. Some values may
523 be printed as =? if their run-time location cannot be found.
524
525 $$locals
526 expands to a subset of $$vars for only local variables.
527
528 $$parms
529 expands to a subset of $$vars for only function parameters.
530
531 $$return
532 is available in return probes only. It expands to a string that
533 is equivalent to sprintf("return=%x", $return) if the probed
534 function has a return value, or else an empty string.
535
536 & $EXPR
537 expands to the address of the given context variable expression,
538 if it is addressable.
539
@defined($EXPR)
expands to 1 if the given context variable expression is
resolvable, or to 0 otherwise, for use in conditionals such as
543
544 @defined($foo->bar) ? $foo->bar : 0
545
546
547 $EXPR$ expands to a string with all of $EXPR's members, equivalent to
548
549 sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}",
550 $EXPR->a, $EXPR->b)
551
552
$EXPR$$
expands to a string with all of $EXPR's members and submembers,
equivalent to
556
557 sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}",
558 $EXPR->a, $EXPR->b, $EXPR->c->x, $EXPR->c->y, $EXPR->d[0])
559
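
Several of the context variable operators above can be tried together,
as in this sketch. It assumes kernel debuginfo is available and that
vfs_read has parameters named file and count on the running kernel; the
@defined guards keep the script compilable if they are not present.

```systemtap
probe kernel.function("vfs_read") {
  # Fall back to 0 if $count cannot be resolved here.
  bytes = @defined($count) ? $count : 0

  # $$parms pretty-prints all function parameters.
  printf("%s: %s (count=%d)\n", execname(), $$parms, bytes)

  # $file$ pretty-prints the members of the structure behind $file.
  if (@defined($file))
    printf("file = %s\n", $file$)
}

probe timer.s(1) { exit() }
```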
560
561
562 MORE ON RETURN PROBES
563 For the kernel ".return" probes, only a certain fixed number of returns
564 may be outstanding. The default is a relatively small number, on the
565 order of a few times the number of physical CPUs. If many different
566 threads concurrently call the same blocking function, such as futex(2)
567 or read(2), this limit could be exceeded, and skipped "kretprobes"
568 would be reported by "stap -t". To work around this, specify a
569
570 probe FOO.return.maxactive(NNN)
571
572 suffix, with a large enough NNN to cover all expected concurrently
573 blocked threads. Alternately, use the
574
575 stap -DKRETACTIVE=NNNN
576
577 stap command line macro setting to override the default for all ".re‐
578 turn" probes.
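
A sketch of the maxactive suffix on a return probe. The function name
vfs_read and the value 200 are assumptions chosen for illustration, and
kernel debuginfo is assumed to be installed.

```systemtap
global returns

# Allow up to 200 outstanding returns for this probe.
probe kernel.function("vfs_read").return.maxactive(200) {
  returns++
}

probe timer.s(5) { exit() }
probe end { printf("vfs_read returns seen: %d\n", returns) }
```

Running the script with "stap -t" reports skipped kretprobes, as noted
above, which may help in judging whether the chosen value is large
enough.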
579
580
581 For ".return" probes, context variables other than the "$return" may be
582 accessible, as a convenience for a script programmer wishing to access
583 function parameters. These values are snapshots taken at the time of
584 function entry. Local variables within the function are not generally
585 accessible, since those variables did not exist in allocated/initial‐
586 ized form at the snapshot moment.
587
588 In addition, arbitrary entry-time expressions can also be saved for
589 ".return" probes using the @entry(expr) operator. For example, one can
590 compute the elapsed time of a function:
591
592 probe kernel.function("do_filp_open").return {
593 println( get_timeofday_us() - @entry(get_timeofday_us()) )
594 }
595
596
597
598 The following table summarizes how values related to a function parame‐
599 ter context variable, a pointer named addr, may be accessed from a .re‐
600 turn probe.
601
602
at-entry value   past-exit value

$addr            not available
$addr->x->y      @cast(@entry($addr),"struct zz")->x->y
$addr[0]         {kernel,user}_{char,int,...}(& $addr[0])
608
609
610
611 DWARFLESS
In the absence of debugging information, entry & exit points of kernel &
module functions can be probed using the "kprobe" family of probes.
However, these do not permit looking up the arguments / local variables
of the function. The following constructs are supported:
616
617 kprobe.function(FUNCTION)
618 kprobe.function(FUNCTION).call
619 kprobe.function(FUNCTION).return
620 kprobe.module(NAME).function(FUNCTION)
621 kprobe.module(NAME).function(FUNCTION).call
622 kprobe.module(NAME).function(FUNCTION).return
623 kprobe.statement(ADDRESS).absolute
624
625
626 Probes of type function are recommended for kernel functions, whereas
627 probes of type module are recommended for probing functions of the
628 specified module. In case the absolute address of a kernel or module
629 function is known, statement probes can be utilized.
630
Note that FUNCTION and MODULE names must not contain wildcards, or the
probe will not be registered. Also, statement probes may be used in
guru mode only.
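
A minimal DWARF-less sketch that counts entries to and returns from one
kernel function. The function name vfs_read is an assumption; since no
debuginfo is used, no arguments or locals are read.

```systemtap
global entries, returns

probe kprobe.function("vfs_read")        { entries++ }
probe kprobe.function("vfs_read").return { returns++ }

probe timer.s(5) { exit() }

probe end {
  printf("entries=%d returns=%d\n", entries, returns)
}
```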
634
635
636
637 USER-SPACE
Support for user-space probing is available for kernels that are
configured with the utrace extensions, or have the uprobes facility in
Linux 3.5. (Various kernel build configuration options need to be
enabled; systemtap will advise if these are missing.)
642
643
644 There are several forms. First, a non-symbolic probe point:
645
646 process(PID).statement(ADDRESS).absolute
647
648 is analogous to kernel.statement(ADDRESS).absolute in that both use raw
649 (unverified) virtual addresses and provide no $variables. The target
650 PID parameter must identify a running process, and ADDRESS should iden‐
651 tify a valid instruction address. All threads of that process will be
652 probed.
653
654 Second, non-symbolic user-kernel interface events handled by utrace may
655 be probed:
656
657 process(PID).begin
658 process("FULLPATH").begin
659 process.begin
660 process(PID).thread.begin
661 process("FULLPATH").thread.begin
662 process.thread.begin
663 process(PID).end
664 process("FULLPATH").end
665 process.end
666 process(PID).thread.end
667 process("FULLPATH").thread.end
668 process.thread.end
669 process(PID).syscall
670 process("FULLPATH").syscall
671 process.syscall
672 process(PID).syscall.return
673 process("FULLPATH").syscall.return
674 process.syscall.return
675 process(PID).insn
676 process("FULLPATH").insn
677 process(PID).insn.block
678 process("FULLPATH").insn.block
679
680
A .begin probe gets called when a new process described by PID or
FULLPATH gets created. A .thread.begin probe gets called when a new
thread described by PID or FULLPATH gets created. A .end probe gets
called when a process described by PID or FULLPATH dies. A .thread.end
probe gets called when a thread described by PID or FULLPATH dies. A
.syscall probe gets called when a thread described by PID or FULLPATH
makes a system call. The system call number is available in the
$syscall context variable, and the first 6 arguments of the system call
are available in the $argN (e.g. $arg1, $arg2, ...) context variables.
A .syscall.return probe gets called when a thread described by PID or
FULLPATH returns from a system call. The system call number is
available in the $syscall context variable, and the return value of the
system call is available in the $return context variable. A .insn probe
gets called for every single-stepped instruction of the process
described by PID or FULLPATH. A .insn.block probe gets called for every
block-stepped instruction of the process described by PID or FULLPATH.
697
698 If a process probe is specified without a PID or FULLPATH, all user
699 threads will be probed. However, if systemtap was invoked with the -c
700 or -x options, then process probes are restricted to the process hier‐
701 archy associated with the target process. If a process probe is un‐
702 specified (i.e. without a PID or FULLPATH), but with the -c option, the
703 PATH of the -c cmd will be heuristically filled into the process PATH.
704 In that case, only command parameters are allowed in the -c command
705 (i.e. no command substitution allowed and no occurrences of any of
706 these characters: '|&;<>(){}').
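
The process lifecycle and syscall probes, combined with the -c
restriction described above, could be exercised with a sketch like this
one. The script file name trace.stp and the traced command are
hypothetical.

```systemtap
# Hypothetical invocation:  stap trace.stp -c "ls /tmp"
# With -c, the unqualified process probes below are restricted to the
# process hierarchy of the given command.

probe process.begin {
  printf("start %s[%d]\n", execname(), pid())
}

probe process.end {
  printf("exit  %s[%d]\n", execname(), pid())
}

probe process.syscall {
  # $syscall is the system call number; $arg1 is its first argument.
  printf("%s[%d] syscall %d arg1=0x%x\n",
         execname(), pid(), $syscall, $arg1)
}
```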
707
708
709 Third, symbolic static instrumentation compiled into programs and
710 shared libraries may be probed:
711
712 process("PATH").mark("LABEL")
713 process("PATH").provider("PROVIDER").mark("LABEL")
714 process(PID).mark("LABEL")
715 process(PID).provider("PROVIDER").mark("LABEL")
716
717
A .mark probe gets called via a static probe which is defined in the
application by STAP_PROBE1(PROVIDER,LABEL,arg1), one of the macros
defined in sys/sdt.h. The PROVIDER is an arbitrary application
identifier, LABEL is the marker site identifier, and arg1 is the
integer-typed argument. STAP_PROBE1 is used for probes with 1 argument,
STAP_PROBE2
723 is used for probes with 2 arguments, and so on. The arguments of the
724 probe are available in the context variables $arg1, $arg2, ... An al‐
725 ternative to using the STAP_PROBE macros is to use the dtrace script to
726 create custom macros. Additionally, the variables $$name and
727 $$provider are available as parts of the probe point name. The
728 sys/sdt.h macro names DTRACE_PROBE* are available as aliases for
729 STAP_PROBE*.
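
On the script side, a marker might be consumed as sketched below. The
executable path, the provider name myapp, and the label request_start
are all hypothetical; the sketch assumes the application was built with
a matching STAP_PROBE1(myapp, request_start, id) call.

```systemtap
# Path, provider, and label are placeholders for a real application.
probe process("/usr/local/bin/myapp").provider("myapp").mark("request_start")
{
  # $$provider and $$name come from the probe point name;
  # $arg1 is the single argument passed to STAP_PROBE1.
  printf("%s:%s arg1=%d\n", $$provider, $$name, $arg1)
}
```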
730
731
732 Finally, full symbolic source-level probes in user-space programs and
733 shared libraries are supported. These are exactly analogous to the
734 symbolic DWARF-based kernel/module probes described above. They expose
735 the same sorts of context $variables for function parameters, local
736 variables, and so on.
737
738 process("PATH").function("NAME")
739 process("PATH").statement("*@FILE.c:123")
740 process("PATH").plt("NAME")
741 process("PATH").library("PATH").plt("NAME")
742 process("PATH").library("PATH").function("NAME")
743 process("PATH").library("PATH").statement("*@FILE.c:123")
744 process("PATH").function("*").return
745 process("PATH").function("myfun").label("foo")
746 process("PATH").function("foo").callee("bar")
747 process("PATH").plt("NAME").return
748 process(PID).function("NAME")
749 process(PID).statement("*@FILE.c:123")
750 process(PID).plt("NAME")
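The following sketch shows how two of these forms might be used together. It assumes that DWARF debuginfo for /bin/ls is available to systemtap; the choice of /bin/ls and of the function main is only an illustration.

```systemtap
# Assumes debuginfo for /bin/ls is installed; both names are examples.
probe process("/bin/ls").function("main")
{
  # $$parms expands to a string of name=value pairs for the parameters.
  printf("%s[%d] entered %s: %s\n", execname(), pid(), ppfunc(), $$parms)
}

probe process("/bin/ls").function("main").return
{
  printf("%s[%d] returned from %s\n", execname(), pid(), ppfunc())
}
```

Such a script could be run against a fresh process with something like stap example.stp -c /bin/ls, where example.stp is a placeholder file name.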
751
752
753
Note that for all process probes, PATH names refer to executables that
are searched for the same way shells do: relative to the working
directory if they contain a "/" character, otherwise in $PATH. If PATH
names refer to scripts, the actual interpreters (specified in the
script in the first line after the #! characters) are probed.
759
760
761 If PATH is a process component parameter referring to shared libraries
762 then all processes that map it at runtime would be selected for prob‐
763 ing. If PATH is a library component parameter referring to shared li‐
764 braries then the process specified by the process component would be
765 selected. Note that the PATH pattern in a library component will al‐
766 ways apply to libraries statically determined to be in use by the
767 process. However, you may also specify the full path to any library
768 file even if not statically needed by the process.
769
770
771 A .plt probe will probe functions in the program linkage table corre‐
772 sponding to the rest of the probe point. .plt can be specified as a
773 shorthand for .plt("*"). The symbol name is available as a $$name con‐
774 text variable; function arguments are not available, since PLTs are
775 processed without debuginfo. A .plt.return probe places a probe at the
776 moment after the return from the named function.
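A short sketch of the shorthand form, using /bin/ls purely as an example target: every call that goes through the program linkage table is reported by symbol name, with no debuginfo involved.

```systemtap
# .plt with no argument is shorthand for .plt("*").
probe process("/bin/ls").plt
{
  # Only the symbol name is available; function arguments are not.
  printf("%s[%d] PLT call: %s\n", execname(), pid(), $$name)
}
```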
777
778
779 If the PATH string contains wildcards as in the MPATTERN case, then
780 standard globbing is performed to find all matching paths. In this
781 case, the $PATH environment variable is not used.
782
783
784 If systemtap was invoked with the -c or -x options, then process probes
785 are restricted to the process hierarchy associated with the target
786 process.
787
788
789 JAVA
790 Support for probing Java methods is available using Byteman as a back‐
791 end. Byteman is an instrumentation tool from the JBoss project which
792 systemtap can use to monitor invocations for a specific method or line
793 in a Java program.
794
795 Systemtap does so by generating a Byteman script listing the probes to
796 instrument and then invoking the Byteman bminstall utility.
797
This Java instrumentation support is currently a prototype feature with
major limitations. Moreover, Java probing currently does not work
across users; the stap script must run (with appropriate permissions)
under the same user as the Java process being probed. (Thus a stap
script run as root currently cannot probe Java methods in a Java
process owned by a non-root user.)
804
805
806 The first probe type refers to Java processes by the name of the Java
807 process:
808
809 java("PNAME").class("CLASSNAME").method("PATTERN")
810 java("PNAME").class("CLASSNAME").method("PATTERN").return
811
The PNAME argument must name an already running JVM process, and be
identifiable via a jps listing.
814
815 The PATTERN parameter specifies the signature of the Java method to
816 probe. The signature must consist of the exact name of the method, fol‐
817 lowed by a bracketed list of the types of the arguments, for instance
818 "myMethod(int,double,Foo)". Wildcards are not supported.
819
820 The probe can be set to trigger at a specific line within the method by
821 appending a line number with colon, just as in other types of probes:
822 "myMethod(int,double,Foo):245".
823
824 The CLASSNAME parameter identifies the Java class the method belongs
825 to, either with or without the package qualification. By default, the
826 probe only triggers on descendants of the class that do not override
827 the method definition of the original class. However, CLASSNAME can
828 take an optional caret prefix, as in ^org.my.MyClass, which specifies
829 that the probe should also trigger on all descendants of MyClass that
830 override the original method. For instance, every method with signature
831 foo(int) in program org.my.MyApp can be probed at once using
832
833 java("org.my.MyApp").class("^java.lang.Object").method("foo(int)")
834
835
836 The second probe type works analogously, but refers to Java processes
837 by PID:
838
839 java(PID).class("CLASSNAME").method("PATTERN")
840 java(PID).class("CLASSNAME").method("PATTERN").return
841
842 (PIDs for an already running process can be obtained using the jps(1)
843 utility.)
844
845 Context variables defined within java probes include $arg1 through
846 $arg10 (for up to the first 10 arguments of a method), represented as
847 integers or strings.
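Reusing the hypothetical names from this section (org.my.MyApp, org.my.MyClass, and a method foo(int)), a probe pair might be sketched as follows. It assumes the JVM is already running under the same user and that $arg1 arrives as an integer for an int parameter.

```systemtap
# All class, method, and process names here are illustrative.
probe java("org.my.MyApp").class("^org.my.MyClass").method("foo(int)")
{
  # $arg1 is the first method argument, assumed here to be an integer.
  printf("foo called with %d\n", $arg1)
}

probe java("org.my.MyApp").class("^org.my.MyClass").method("foo(int)").return
{
  printf("foo returned\n")
}
```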
848
849
850 PROCFS
851 These probe points allow procfs "files" in /proc/systemtap/MODNAME to
852 be created, read and written using a permission that may be modified
853 using the proper umask value. Default permissions are 0400 for read
854 probes, and 0200 for write probes. If both a read and write probe are
855 being used on the same file, a default permission of 0600 will be used.
856 Using procfs.umask(0040).read would result in a 0404 permission set for
857 the file. (MODNAME is the name of the systemtap module). The proc
858 filesystem is a pseudo-filesystem which is used as an interface to ker‐
859 nel data structures. There are several probe point variants supported
860 by the translator:
861
862
procfs("PATH").read
procfs("PATH").umask(UMASK).read
procfs("PATH").read.maxsize(MAXSIZE)
procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE)
procfs("PATH").write
procfs("PATH").umask(UMASK).write
procfs.read
procfs.umask(UMASK).read
procfs.read.maxsize(MAXSIZE)
procfs.umask(UMASK).read.maxsize(MAXSIZE)
procfs.write
procfs.umask(UMASK).write
875
876
PATH is the file name (relative to /proc/systemtap/MODNAME) to be
created. If no PATH is specified (as in the last six variants above),
PATH defaults to "command".
880
881 When a user reads /proc/systemtap/MODNAME/PATH, the corresponding
882 procfs read probe is triggered. The string data to be read should be
883 assigned to a variable named $value, like this:
884
885
886 procfs("PATH").read { $value = "100\n" }
887
888
889 When a user writes into /proc/systemtap/MODNAME/PATH, the corresponding
890 procfs write probe is triggered. The data the user wrote is available
891 in the string variable named $value, like this:
892
893
894 procfs("PATH").write { printf("user wrote: %s", $value) }
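Read and write probes can share one file. The sketch below keeps an integer in a script global and exposes it through a procfs file; the file name "counter" and the global name are arbitrary choices for illustration.

```systemtap
global counter = 0

# Reading /proc/systemtap/MODNAME/counter returns the current value.
probe procfs("counter").read
{
  $value = sprintf("%d\n", counter)
}

# Writing a decimal number to the same file replaces the value.
probe procfs("counter").write
{
  counter = strtol($value, 10)
}
```

Because both a read and a write probe are attached to the same file, the default permission described above (0600) would apply.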
895
896
897 MAXSIZE is the size of the procfs read buffer. Specifying MAXSIZE al‐
898 lows larger procfs output. If no MAXSIZE is specified, the procfs read
899 buffer defaults to STP_PROCFS_BUFSIZE (which defaults to MAXSTRINGLEN,
900 the maximum length of a string). If setting the procfs read buffers
901 for more than one file is needed, it may be easiest to override the
902 STP_PROCFS_BUFSIZE definition. Here's an example of using MAXSIZE:
903
904
905 procfs.read.maxsize(1024) {
906 $value = "long string..."
907 $value .= "another long string..."
908 $value .= "another long string..."
909 $value .= "another long string..."
910 }
911
912
913
914 NETFILTER HOOKS
915 These probe points allow observation of network packets using the net‐
916 filter mechanism. A netfilter probe in systemtap corresponds to a net‐
917 filter hook function in the original netfilter probes API. It is proba‐
918 bly more convenient to use tapset::netfilter(3stap), which wraps the
919 primitive netfilter hooks and does the work of extracting useful infor‐
920 mation from the context variables.
921
922
923 There are several probe point variants supported by the translator:
924
925
926 netfilter.hook("HOOKNAME").pf("PROTOCOL_F")
927 netfilter.pf("PROTOCOL_F").hook("HOOKNAME")
928 netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY")
929 netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY")
930
931
932
PROTOCOL_F is the protocol family to listen for, currently one of
NFPROTO_IPV4, NFPROTO_IPV6, NFPROTO_ARP, or NFPROTO_BRIDGE.
935
936
HOOKNAME is the point, or 'hook', in the protocol stack at which to
intercept the packet. The available hook names for each protocol family
are taken from the kernel header files <linux/netfilter_ipv4.h>,
<linux/netfilter_ipv6.h>, <linux/netfilter_arp.h> and
<linux/netfilter_bridge.h>. For instance, allowable hook names for
NFPROTO_IPV4 are NF_INET_PRE_ROUTING, NF_INET_LOCAL_IN,
NF_INET_FORWARD, NF_INET_LOCAL_OUT, and NF_INET_POST_ROUTING.
944
945
946 PRIORITY is an integer priority giving the order in which the probe
947 point should be triggered relative to any other netfilter hook func‐
948 tions which trigger on the same packet. Hook functions execute on each
949 packet in order from smallest priority number to largest priority num‐
950 ber. If no PRIORITY is specified (as in the first two probe point vari‐
951 ants above), PRIORITY defaults to "0".
952
There are a number of predefined priority names of the form NF_IP_PRI_*
and NF_IP6_PRI_* which are defined in the kernel header files
<linux/netfilter_ipv4.h> and <linux/netfilter_ipv6.h> respectively. The
script is permitted to use these instead of specifying an integer
priority. (The probe points for NFPROTO_ARP and NFPROTO_BRIDGE currently
do not expose any named hook priorities to the script writer.) Thus,
allowable ways to specify the priority include:
960
961
962 priority("255")
963 priority("NF_IP_PRI_SELINUX_LAST")
964
965
966 A script using guru mode is permitted to specify any identifier or num‐
967 ber as the parameter for hook, pf, and priority. This feature should be
968 used with caution, as the parameter is inserted verbatim into the C
969 code generated by systemtap.
970
971 The netfilter probe points define the following context variables:
972
973 $hooknum
974 The hook number.
975
976 $skb The address of the sk_buff struct representing the packet. See
977 <linux/skbuff.h> for details on how to use this struct, or al‐
978 ternatively use the tapset tapset::netfilter(3stap) for easy ac‐
979 cess to key information.
980
981
982 $in The address of the net_device struct representing the network
983 device on which the packet was received (if any). May be 0 if
984 the device is unknown or undefined at that stage in the protocol
985 stack.
986
987
988 $out The address of the net_device struct representing the network
989 device on which the packet will be sent (if any). May be 0 if
990 the device is unknown or undefined at that stage in the protocol
991 stack.
992
993
994 $verdict
995 (Guru mode only.) Assigning one of the verdict values defined in
996 <linux/netfilter.h> to this variable alters the further progress
997 of the packet through the protocol stack. For instance, the fol‐
998 lowing guru mode script forces all ipv6 network packets to be
999 dropped:
1000
1001
1002 probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") {
1003 $verdict = 0 /* nf_drop */
1004 }
1005
1006
1007 For convenience, unlike the primitive probe points discussed
1008 here, the probes defined in tapset::netfilter(3stap) export the
1009 lowercase names of the verdict constants (e.g. NF_DROP becomes
1010 nf_drop) as local variables.
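For plain observation, without guru mode, the probe point components and context variables described in this section might be combined as in the following sketch. The particular hook and priority are just one combination chosen from the names listed above.

```systemtap
probe netfilter.pf("NFPROTO_IPV4").hook("NF_INET_LOCAL_IN").priority("NF_IP_PRI_SELINUX_LAST")
{
  # $hooknum is the hook number; $skb, $in and $out are kernel addresses
  # and may be 0 where the device is unknown at this stage.
  printf("hook=%d skb=%p in=%p out=%p\n", $hooknum, $skb, $in, $out)
}
```

Since $verdict is not assigned, the packet continues through the protocol stack unchanged.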
1011
1012
1013 KERNEL TRACEPOINTS
1014 This family of probe points hooks up to static probing tracepoints in‐
1015 serted into the kernel or modules. As with markers, these tracepoints
1016 are special macro calls inserted by kernel developers to make probing
1017 faster and more reliable than with DWARF-based probes, and DWARF debug‐
1018 ging information is not required to probe tracepoints. Tracepoints
1019 have an extra advantage of more strongly-typed parameters than markers.
1020
Tracepoint probes look like: kernel.trace("name"). The tracepoint name
string, which may contain the usual wildcard characters, is matched
against the names defined by the kernel developers in the tracepoint
header files. To restrict the search to specific subsystems (e.g.
sched, ext3, etc.), the following syntax can be used:
kernel.trace("system:name"). The tracepoint system string may also
contain the usual wildcard characters.
1028
1029 The handler associated with a tracepoint-based probe may read the op‐
1030 tional parameters specified at the macro call site. These are named
1031 according to the declaration by the tracepoint author. For example,
1032 the tracepoint probe kernel.trace("sched:sched_switch") provides the
1033 parameters $prev and $next. If the parameter is a complex type, as in
1034 a struct pointer, then a script can access fields with the same syntax
1035 as DWARF $target variables. Also, tracepoint parameters cannot be mod‐
1036 ified, but in guru-mode a script may modify fields of parameters.
1037
1038 The subsystem and name of the tracepoint are available in $$system and
1039 $$name and a string of name=value pairs for all parameters of the tra‐
1040 cepoint is available in $$vars or $$parms.
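As a sketch of reading tracepoint parameters, the sched_switch example mentioned above could be written as follows. It assumes a kernel that provides this tracepoint and whose task_struct has the comm and pid fields.

```systemtap
probe kernel.trace("sched:sched_switch")
{
  # $prev and $next are struct pointers; fields are read with the same
  # syntax as DWARF $target variables.
  printf("%s:%s %s[%d] -> %s[%d]\n", $$system, $$name,
         kernel_string($prev->comm), $prev->pid,
         kernel_string($next->comm), $next->pid)
}
```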
1041
1042
1043 KERNEL MARKERS (OBSOLETE)
1044 This family of probe points hooks up to an older style of static prob‐
1045 ing markers inserted into older kernels or modules. These markers are
1046 special STAP_MARK macro calls inserted by kernel developers to make
1047 probing faster and more reliable than with DWARF-based probes. Fur‐
1048 ther, DWARF debugging information is not required to probe markers.
1049
1050 Marker probe points begin with kernel. The next part names the marker
1051 itself: mark("name"). The marker name string, which may contain the
1052 usual wildcard characters, is matched against the names given to the
1053 marker macros when the kernel and/or module was compiled. Optional‐
1054 ly, you can specify format("format"). Specifying the marker format
1055 string allows differentiation between two markers with the same name
1056 but different marker format strings.
1057
1058 The handler associated with a marker-based probe may read the optional
1059 parameters specified at the macro call site. These are named $arg1
1060 through $argNN, where NN is the number of parameters supplied by the
1061 macro. Number and string parameters are passed in a type-safe manner.
1062
The marker format string associated with a marker is available in
$format, and the marker name string is available in $name.
1065
1066
1067 HARDWARE BREAKPOINTS
This family of probes is used to set hardware watchpoints for a given
(global) kernel symbol. The probes take three components as inputs:

1. The virtual address or name of the kernel symbol to be traced is
supplied as the argument to this class of probes. (Only data segment
variables can be probed. Local variables of a function cannot be
probed.)

2. The nature of the access to be probed:
a. A .write probe is triggered when a write happens at the specified
address/symbol name.
b. A .rw probe is triggered when either a read or a write happens.

3. .length (optional): Users may specify the address interval to be
probed using the "length" construct. The user-specified length is
approximated to the closest address length that the architecture can
support. If the specified length exceeds the limits imposed by the
architecture, an error is reported and probe registration fails.
Wherever "length" is not specified, the translator requests a hardware
breakpoint probe of length 1. Note that the "length" construct is not
valid with symbol names.

The following constructs are supported:
1090
1091 probe kernel.data(ADDRESS).write
1092 probe kernel.data(ADDRESS).rw
1093 probe kernel.data(ADDRESS).length(LEN).write
1094 probe kernel.data(ADDRESS).length(LEN).rw
1095 probe kernel.data("SYMBOL_NAME").write
1096 probe kernel.data("SYMBOL_NAME").rw
1097
1098
This set of probes makes use of the debug registers of the processor,
which are a scarce resource (4 on x86, 1 on powerpc). The script
translation flags a warning if a user requests more hardware breakpoint
probes than the limit set by the architecture. For example, a pass-2
warning is issued when an input script requests 5 hardware breakpoint
probes on an x86 system, since the x86 architecture supports a maximum
of 4 breakpoints. Users are cautioned to set probes judiciously.
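A minimal sketch of a write watchpoint on a global kernel variable follows. It uses pid_max as the symbol and assumes that the running kernel still exposes it as a global data symbol.

```systemtap
# Uses one of the processor's scarce debug registers.
probe kernel.data("pid_max").write
{
  printf("pid_max written by %s[%d]\n", execname(), pid())
}
```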
1106
1107
1108 PERF
This family of probe points interfaces to the kernel "perf event"
infrastructure for controlling hardware performance counters. The events
being attached to are described by the "type" and "config" fields of the
perf_event_attr structure, and are sampled at an interval governed by
the "sample_period" field.
1114
1115 These fields are made available to systemtap scripts using the follow‐
1116 ing syntax:
1117
1118 probe perf.type(NN).config(MM).sample(XX)
1119 probe perf.type(NN).config(MM)
1120 probe perf.type(NN).config(MM).process("PROC")
1121 probe perf.type(NN).config(MM).counter("COUNTER")
1122 probe perf.type(NN).config(MM).process("PROC").counter("COUNTER")
1123
The systemtap probe handler is called once per XX increments of the
underlying performance counter. The default sampling count is 1000000.
The range of valid type/config values is described by the
perf_event_open(2) system call, and/or the linux/perf_event.h file.
Invalid combinations or exhausted hardware counter resources result in
errors during systemtap script startup. Systemtap does not sanity-check
the values: it merely passes them through to the kernel for error- and
safety-checking.

By default the perf event probe is systemwide unless .process is
specified, which binds the probe to a specific task. If the process
name is omitted then it is inferred from the stap -c argument. A perf
event can be read on demand using .counter. The body of the perf probe
handler will not be invoked for a .counter probe; instead, the counter
is read in a user space probe via:
1137
1138 process("PROCESS").statement("func@file") {stat <<< @perf("NAME")}
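As a systemwide sampling sketch, the script below counts samples per executable name for ten seconds. It relies on type 0 being PERF_TYPE_HARDWARE and config 0 being PERF_COUNT_HW_CPU_CYCLES in linux/perf_event.h; the global name and the ten-second limit are arbitrary.

```systemtap
global hits

# One handler call per 1000000 CPU cycles (type 0, config 0).
probe perf.type(0).config(0).sample(1000000)
{
  hits[execname()]++
}

# Stop the session after ten seconds.
probe timer.s(10)
{
  exit()
}

# Print the ten most frequently sampled executables.
probe end
{
  foreach (name in hits- limit 10)
    printf("%-16s %d\n", name, hits[name])
}
```

On machines without a usable hardware cycle counter (some virtual machines, for instance), the probe would be expected to fail at script startup, as described above.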
1139
1140
1141
1143 Here are some example probe points, defining the associated events.
1144
1145 begin, end, end
1146 refers to the startup and normal shutdown of the session. In
1147 this case, the handler would run once during startup and twice
1148 during shutdown.
1149
1150 timer.jiffies(1000).randomize(200)
1151 refers to a periodic interrupt, every 1000 +/- 200 jiffies.
1152
1153 kernel.function("*init*"), kernel.function("*exit*")
1154 refers to all kernel functions with "init" or "exit" in the
1155 name.
1156
1157 kernel.function("*@kernel/time.c:240")
1158 refers to any functions within the "kernel/time.c" file that
1159 span line 240. Note that this is not a probe at the statement
1160 at that line number. Use the kernel.statement probe instead.
1161
1162 kernel.trace("sched_*")
1163 refers to all scheduler-related (really, prefixed) tracepoints
1164 in the kernel.
1165
1166 kernel.mark("getuid")
1167 refers to an obsolete STAP_MARK(getuid, ...) macro call in the
1168 kernel.
1169
1170 module("usb*").function("*sync*").return
1171 refers to the moment of return from all functions with "sync" in
1172 the name in any of the USB drivers.
1173
1174 kernel.statement(0xc0044852)
1175 refers to the first byte of the statement whose compiled in‐
1176 structions include the given address in the kernel.
1177
1178 kernel.statement("*@kernel/time.c:296")
1179 refers to the statement of line 296 within "kernel/time.c".
1180
1181 kernel.statement("bio_init@fs/bio.c+3")
1182 refers to the statement at line bio_init+3 within "fs/bio.c".
1183
1184 kernel.data("pid_max").write
1185 refers to a hardware breakpoint of type "write" set on pid_max
1186
syscall.*.return
refers to the group of probe aliases with any name in the second
position
1190
1191
stap(1),
probe::*(3stap),
tapset::*(3stap)