STAPPROBES(3stap) STAPPROBES(3stap)

stapprobes - systemtap probe points
7
8
9
11 The following sections enumerate the variety of probe points supported
12 by the systemtap translator, and some of the additional aliases defined
13 by standard tapset scripts. Many are individually documented in the
14 3stap manual section, with the probe:: prefix.
15
16
18 probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
19
20
A probe declaration may list multiple comma-separated probe points in
order to attach a handler to all of the named events. Normally, the
handler statements are run whenever any of the events occur. Depending
on the type of probe point, the handler statements may refer to context
variables (denoted with a dollar-sign prefix like $foo) to read or
write state. This may include function parameters for function probes,
or local variables for statement probes.
28
29 The syntax of a single probe point is a general dotted-symbol sequence.
30 This allows a breakdown of the event namespace into parts, somewhat
31 like the Domain Name System does on the Internet. Each component iden‐
32 tifier may be parametrized by a string or number literal, with a syntax
33 like a function call. A component may include a "*" character, to ex‐
34 pand to a set of matching probe points. It may also include "**" to
35 match multiple sequential components at once. Probe aliases likewise
36 expand to other probe points.
37
38 Probe aliases can be given on their own, or with a suffix. The suffix
39 attaches to the underlying probe point that the alias is expanded to.
40 For example,
41
42 syscall.read.return.maxactive(10)
43
44 expands to
45
46 kernel.function("sys_read").return.maxactive(10)
47
48 with the component maxactive(10) being recognized as a suffix.
49
50 Normally, each and every probe point resulting from wildcard- and
51 alias-expansion must be resolved to some low-level system instrumenta‐
52 tion facility (e.g., a kprobe address, marker, or a timer configura‐
53 tion), otherwise the elaboration phase will fail.
54
55 However, a probe point may be followed by a "?" character, to indicate
56 that it is optional, and that no error should result if it fails to re‐
57 solve. Optionalness passes down through all levels of alias/wildcard
58 expansion. Alternately, a probe point may be followed by a "!" charac‐
59 ter, to indicate that it is both optional and sufficient. (Think
60 vaguely of the Prolog cut operator.) If it does resolve, then no fur‐
61 ther probe points in the same comma-separated list will be resolved.
62 Therefore, the "!" sufficiency mark only makes sense in a list of
63 probe point alternatives.
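
As a minimal sketch of how the "?" and "!" marks combine in a list of
alternatives, the following declaration prefers one kernel function and
falls back to another. Both function names are assumptions about the
running kernel, chosen only for illustration.

```systemtap
# If the first alternative resolves, the "!" mark stops resolution and
# the second alternative is never considered.  The second is marked "?"
# so that no error is reported for it if it does not resolve.
# Both function names are illustrative assumptions.
probe kernel.function("do_sys_openat2") !,
      kernel.function("do_sys_open") ?
{
  printf("%s hit %s\n", execname(), pp())
}
```

If neither alternative resolves and the script contains no other
probes, the translator still rejects the script because no probes were
found.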
64
Additionally, a probe point may be followed by an "if (expr)" clause, in
order to enable or disable the probe point on the fly. If "expr" is
false when the probe point is hit, the whole probe body, including the
bodies of any aliases, is skipped. Conditions stack through all levels
of alias/wildcard expansion, so the final condition is the logical AND
of the conditions of all expanded aliases and wildcards. The expressions
are necessarily restricted to global variables.
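
A probe condition might be used as in the following sketch, where a
timer toggles a global variable and a syscall probe is active only
while that variable is nonzero. The variable name "enabled" and the
five-second period are arbitrary choices for illustration.

```systemtap
global enabled = 0

# Toggle the condition every five seconds.
probe timer.s(5) { enabled = !enabled }

# The handler body runs only while the global is nonzero.
probe syscall.read if (enabled)
{
  printf("%s: %s(%s)\n", execname(), name, argstr)
}
```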
73
These are all syntactically valid probe points. (They may nevertheless
be semantically invalid, depending on the contents of the tapsets and
the versions of kernel/user software installed.)
77
78
kernel.function("foo").return
process("/bin/vi").statement(0x2222)
end
syscall.*
syscall.*.return.maxactive(10)
syscall.{open,close}
sys**open
kernel.function("no_such_function") ?
module("awol").function("no_such_function") !
signal.*? if (switch)
kprobe.function("foo")
90
91
Probes may be broadly classified into "synchronous" and "asynchronous".
A "synchronous" event is deemed to occur when any processor executes an
instruction matched by the specification. This gives these probes a
reference point (instruction address) from which more contextual data
may be available. Other families of probe points refer to
"asynchronous" events such as timers/counters rolling over, where there
is no related fixed reference point. Each probe point specification
may match multiple locations (for example, using wildcards or aliases),
and all of them are then probed. A probe declaration may also contain
several comma-separated specifications, all of which are probed.
102
103 Brace expansion is a mechanism which allows a list of probe points to
104 be generated. It is very similar to shell expansion. A component may be
105 surrounded by a pair of curly braces to indicate that the comma-sepa‐
106 rated sequence of one or more subcomponents will each constitute a new
107 probe point. The braces may be arbitrarily nested. The ordering of ex‐
108 panded results is based on product order.
109
110 The question mark (?), exclamation mark (!) indicators and probe point
111 conditions may not be placed in any expansions that are before the last
112 component.
113
114 The following is an example of brace expansion.
115
116
syscall.{write,read}
# Expands to
syscall.write, syscall.read

{kernel,module("nfs")}.function("nfs*")!
# Expands to
kernel.function("nfs*")!, module("nfs").function("nfs*")!
124
125
126
128 Resolving some probe points requires DWARF debuginfo or "debug symbols"
129 for the specific program being instrumented. For some others, DWARF is
130 automatically synthesized on the fly from source code header files.
131 For others, it is not needed at all. Since a systemtap script may use
132 any mixture of probe points together, the union of their DWARF require‐
133 ments has to be met on the computer where script compilation occurs.
134 (See the --use-server option and the stap-server(8) man page for infor‐
135 mation about the remote compilation facility, which allows these re‐
136 quirements to be met on a different machine.)
137
The following table lists many of the available probe point families,
classifying them by their need for DWARF debuginfo for the specific
program being probed.
141
142
DWARF                          NON-DWARF                    SYMBOL-TABLE

kernel.function, .statement    kernel.mark                  kernel.function*
module.function, .statement    process.mark, process.plt    module.function*
process.function, .statement   begin, end, error, never     process.function*
process.mark*                  timer
.function.callee               perf
python2, python3               procfs
                               kernel.statement.absolute
AUTO-GENERATED-DWARF           kernel.data
                               kprobe.function
kernel.trace                   process.statement.absolute
                               process.begin, .end
                               netfilter
                               java
158
159
Probe types marked with an asterisk (*) are fallbacks, where systemtap
can sometimes infer a subset of the information or substitute for it.
In general, the more symbolic / debugging information is available, the
higher the quality of probing will be.
164
165
166
168 The following types of probe points may be armed/disarmed on-the-fly to
169 save overheads during uninteresting times. Arming conditions may also
170 be added to other types of probes, but will be treated as a wrapping
171 conditional and won't benefit from overhead savings.
172
173
DISARMABLE                                 exceptions
kernel.function, kernel.statement
module.function, module.statement
process.*.function, process.*.statement
process.*.plt, process.*.mark
timer.                                     timer.profile
java
181
182
184 BEGIN/END/ERROR
185 The probe points begin and end are defined by the translator to refer
186 to the time of session startup and shutdown. All "begin" probe han‐
187 dlers are run, in some sequence, during the startup of the session.
188 All global variables will have been initialized prior to this point.
189 All "end" probes are run, in some sequence, during the normal shutdown
190 of a session, such as in the aftermath of an exit () function call, or
191 an interruption from the user. In the case of an error-triggered shut‐
192 down, "end" probes are not run. There are no target variables avail‐
193 able in either context.
194
195 If the order of execution among "begin" or "end" probes is significant,
196 then an optional sequence number may be provided:
197
198
199 begin(N)
200 end(N)
201
202
203 The number N may be positive or negative. The probe handlers are run
204 in increasing order, and the order between handlers with the same se‐
205 quence number is unspecified. When "begin" or "end" are given without
206 a sequence, they are effectively sequence zero.
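
The ordering rule can be illustrated with a small sketch. The sequence
numbers and messages below are arbitrary.

```systemtap
probe begin(-1) { println("runs first") }
probe begin     { println("runs second (sequence zero)") }
probe begin(1)  { println("runs third"); exit() }
probe end(10)   { println("runs during shutdown") }
```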
207
The error probe point is similar to the end probe, except that each
such probe handler is run when the session ends after errors have
occurred. In such cases, "end" probes are skipped, but each "error"
probe is still attempted. This kind of probe can be used to clean up
or emit a "final gasp". It may also be numerically parametrized to set
a sequence.
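
A minimal sketch of the difference between "end" and "error" probes,
assuming the session is terminated by a call to the tapset function
error():

```systemtap
probe begin
{
  # error() puts the session into an error state and begins shutdown.
  error("simulated failure")
}

probe end   { println("skipped: shutdown was triggered by an error") }
probe error { println("final gasp: cleaning up after the error") }
```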
214
215
216 NEVER
217 The probe point never is specially defined by the translator to mean
218 "never". Its probe handler is never run, though its statements are an‐
219 alyzed for symbol / type correctness as usual. This probe point may be
220 useful in conjunction with optional probes.
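
One plausible use, sketched below, is to pair never with an optional
probe point so that the declaration still contributes a probe when the
optional alternative fails to resolve. The function name is a
deliberately nonexistent placeholder.

```systemtap
# If the optional probe point does not resolve, "never" remains, so the
# script is not rejected for containing no probes.  The handler is
# still checked for symbol and type correctness.
probe kernel.function("no_such_function") ?, never
{
  printf("%s\n", pp())
}
```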
221
222
223 SYSCALL and ND_SYSCALL
224 The syscall.* and nd_syscall.* aliases define several hundred probes,
225 too many to detail here. They are of the general form:
226
227
228 syscall.NAME
229 nd_syscall.NAME
230 syscall.NAME.return
231 nd_syscall.NAME.return
232
233
Generally, a pair of probes is defined for each normal system call
listed in the syscalls(2) manual page, one for entry and one for return.
System calls that never return do not have a corresponding .return
probe. The nd_* family of probes is about the same, except that it uses
non-DWARF-based searching mechanisms, which may result in lower-quality
symbolic context data (parameters) and may miss some system calls. You
may want to try the nd_* probes first, in case kernel debugging
information is not immediately available.
242
Each probe alias provides a variety of variables. Looking at the tapset
source code is the most reliable way to find them. Generally, each
variable listed in the standard manual page is made available as a
script-level variable, so syscall.open exposes filename, flags, and
mode. In addition, a standard suite of variables is available at most
aliases:
248
249 argstr A pretty-printed form of the entire argument list, without
250 parentheses.
251
252 name The name of the system call.
253
254 retval For return probes, the raw numeric system-call result.
255
256 retstr For return probes, a pretty-printed string form of the system-
257 call result.
258
259 As usual for probe aliases, these variables are all initialized once
260 from the underlying $context variables, so that later changes to $con‐
261 text variables are not automatically reflected. Not all probe aliases
262 obey all of these general guidelines. Please report any bothersome
263 ones you encounter as a bug. Note that on some kernel/userspace archi‐
264 tecture combinations (e.g., 32-bit userspace on 64-bit kernel), the un‐
265 derlying $context variables may need explicit sign extension / masking.
266 When this is an issue, consider using the tapset-provided variables in‐
267 stead of raw $context variables.
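
As an illustration of the standard variables, the following sketch
prints the arguments and results of read system calls. It assumes the
syscall.read alias is available on the running kernel. The nd_syscall.
prefix could be substituted if debuginfo is missing.

```systemtap
# Entry probe: name and argstr are set by the alias.
probe syscall.read
{
  printf("%s[%d] %s(%s)\n", execname(), pid(), name, argstr)
}

# Return probe: retstr is the pretty-printed result.
probe syscall.read.return
{
  printf("%s[%d] %s = %s\n", execname(), pid(), name, retstr)
}
```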
268
269 If debuginfo availability is a problem, you may try using the non-DWARF
270 syscall probe aliases instead. Use the nd_syscall. prefix instead of
271 syscall. The same context variables are available, as far as possible.
272
273 nd_syscall probes on kernels that use syscall wrappers to pass argu‐
274 ments via pt_regs (currently 4.17+ on x86_64 and 4.19+ on aarch64) sup‐
275 port syscall argument writing when guru mode is enabled. If a probe
276 syscall parameter is modified in the probe body then immediately before
277 the probe exits the parameter's current value will be written to
278 pt_regs. This overwrites the previous value. nd_syscall probes also
279 include two parameters for each of the syscall's string parameters.
280 One holds a quoted version of the string passed to the syscall. The
281 other holds an unquoted version of the string intended to be used when
282 modifying the parameter. If the probe modifies the unquoted string
283 variable then as the probe is about to exit the contents of this vari‐
284 able will be written to the user space buffer passed to the syscall. It
285 is the user's responsibility to ensure that this buffer is large enough
286 to hold the modified string and that it is located in a writable memory
287 segment.
288
289
290 TIMERS
291 There are two main types of timer probes: "jiffies" timer probes and
292 time interval timer probes.
293
294 Intervals defined by the standard kernel "jiffies" timer may be used to
295 trigger probe handlers asynchronously. Two probe point variants are
296 supported by the translator:
297
298
299 timer.jiffies(N)
300 timer.jiffies(N).randomize(M)
301
302
The probe handler is run every N jiffies (a kernel-defined unit of
time, typically between 1 and 60 ms). If the "randomize" component is
given, a uniformly distributed random value in the range [-M..+M] is
added to N every time the handler is run. N is restricted to a
reasonable range (1 to around a million), and M is restricted to be
smaller than N. There are no target variables provided in either
context. It is possible for such probes to be run concurrently on a
multi-processor computer.
311
312 Alternatively, intervals may be specified in units of time. There are
313 two probe point variants similar to the jiffies timer:
314
315
316 timer.ms(N)
317 timer.ms(N).randomize(M)
318
319
320 Here, N and M are specified in milliseconds, but the full options for
321 units are seconds (s/sec), milliseconds (ms/msec), microseconds
322 (us/usec), nanoseconds (ns/nsec), and hertz (hz). Randomization is not
323 supported for hertz timers.
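
The interval timers might be combined as in this sketch, which counts
firings of a randomized 100 ms timer and reports once per second. The
intervals and the variable name "ticks" are arbitrary.

```systemtap
global ticks

# Fires roughly every 100 ms, with up to 20 ms of jitter either way.
probe timer.ms(100).randomize(20) { ticks++ }

# Report and reset once per second.
probe timer.s(1)
{
  printf("%d ticks in the last second\n", ticks)
  ticks = 0
}

# Stop after ten seconds.
probe timer.s(10) { exit() }
```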
324
325 The actual resolution of the timers depends on the target kernel. For
326 kernels prior to 2.6.17, timers are limited to jiffies resolution, so
327 intervals are rounded up to the nearest jiffies interval. After
328 2.6.17, the implementation uses hrtimers for tighter precision, though
329 the actual resolution will be arch-dependent. In either case, if the
330 "randomize" component is given, then the random value will be added to
331 the interval before any rounding occurs.
332
333 Profiling timers are also available to provide probes that execute on
334 all CPUs at the rate of the system tick (CONFIG_HZ) or at a given fre‐
335 quency (hz). On some kernels, this is a one-concurrent-user-only or
336 disabled facility, resulting in error -16 (EBUSY) during probe regis‐
337 tration.
338
339
340 timer.profile.tick
341 timer.profile.freq.hz(N)
342
343
344 Full context information of the interrupted process is available, mak‐
345 ing this probe suitable for a time-based sampling profiler.
346
347 It is recommended to use the tapset probe timer.profile rather than
348 timer.profile.tick. This probe point behaves identically to timer.pro‐
349 file.tick when the underlying functionality is available, and falls
350 back to using perf.sw.cpu_clock on some recent kernels which lack the
351 corresponding profile timer facility.
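
A time-based sampling profiler along these lines could be sketched as
follows. The ten-second duration and the choice to aggregate by
executable name are assumptions made for the example.

```systemtap
global samples

# Count one sample for whichever process was interrupted by the tick.
probe timer.profile { samples[execname()]++ }

probe timer.s(10) { exit() }

# Print the ten most frequently sampled executables.
probe end
{
  foreach (comm in samples- limit 10)
    printf("%-16s %d\n", comm, samples[comm])
}
```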
352
353 Profiling timers with specified frequencies are only accurate up to
354 around 100 hz. You may need to provide a larger value to achieve the
355 desired rate.
356
357 Note that if a timer probe is set to fire at a very high rate and if
358 the probe body is complex, succeeding timer probes can get skipped,
359 since the time for them to run has already passed. Normally systemtap
360 reports missed probes, but it will not report these skipped probes.
361
362
363 DWARF
364 This family of probe points uses symbolic debugging information for the
365 target kernel/module/program, as may be found in unstripped executa‐
366 bles, or the separate debuginfo packages. They allow placement of
367 probes logically into the execution path of the target program, by
368 specifying a set of points in the source or object code. When a match‐
369 ing statement executes on any processor, the probe handler is run in
370 that context.
371
372 Probe points in the DWARF family can be identified by the target kernel
373 module (or user process), source file, line number, function name, or
374 some combination of these.
375
376 Here is a list of DWARF probe points currently supported:
377
378 kernel.function(PATTERN)
379 kernel.function(PATTERN).call
380 kernel.function(PATTERN).callee(PATTERN)
381 kernel.function(PATTERN).callee(PATTERN).return
382 kernel.function(PATTERN).callee(PATTERN).call
383 kernel.function(PATTERN).callees(DEPTH)
384 kernel.function(PATTERN).return
385 kernel.function(PATTERN).inline
386 kernel.function(PATTERN).label(LPATTERN)
387 module(MPATTERN).function(PATTERN)
388 module(MPATTERN).function(PATTERN).call
389 module(MPATTERN).function(PATTERN).callee(PATTERN)
390 module(MPATTERN).function(PATTERN).callee(PATTERN).return
391 module(MPATTERN).function(PATTERN).callee(PATTERN).call
392 module(MPATTERN).function(PATTERN).callees(DEPTH)
393 module(MPATTERN).function(PATTERN).return
394 module(MPATTERN).function(PATTERN).inline
395 module(MPATTERN).function(PATTERN).label(LPATTERN)
396 kernel.statement(PATTERN)
397 kernel.statement(PATTERN).nearest
398 kernel.statement(ADDRESS).absolute
399 module(MPATTERN).statement(PATTERN)
400 process("PATH").function("NAME")
401 process("PATH").statement("*@FILE.c:123")
402 process("PATH").library("PATH").function("NAME")
403 process("PATH").library("PATH").statement("*@FILE.c:123")
404 process("PATH").library("PATH").statement("*@FILE.c:123").nearest
405 process("PATH").function("*").return
406 process("PATH").function("myfun").label("foo")
407 process("PATH").function("foo").callee("bar")
408 process("PATH").function("foo").callee("bar").return
409 process("PATH").function("foo").callee("bar").call
410 process("PATH").function("foo").callees(DEPTH)
411 process(PID).function("NAME")
412 process(PID).function("myfun").label("foo")
413 process(PID).plt("NAME")
414 process(PID).plt("NAME").return
415 process(PID).statement("*@FILE.c:123")
416 process(PID).statement("*@FILE.c:123").nearest
417 process(PID).statement(ADDRESS).absolute
418
419 (See the USER-SPACE section below for more information on the process
420 probes.)
421
422 The list above includes multiple variants and modifiers which provide
423 additional functionality or filters. They are:
424
425 .function
426 Places a probe near the beginning of the named function,
427 so that parameters are available as context variables.
428
429 .return
430 Places a probe at the moment after the return from the
431 named function, so the return value is available as the
432 "$return" context variable.
433
434 .inline
435 Filters the results to include only instances of inlined
436 functions. Note that inlined functions do not have an
437 identifiable return point, so .return is not supported on
438 .inline probes.
439
.call  Filters the results to include only non-inlined functions
(the opposite of .inline).
442
443 .exported
444 Filters the results to include only exported functions.
445
446 .statement
447 Places a probe at the exact spot, exposing those local
448 variables that are visible there.
449
450 .statement.nearest
451 Places a probe at the nearest available line number for
452 each line number given in the statement.
453
454 .callee
455 Places a probe on the callee function given in the
456 .callee modifier, where the callee must be a function
457 called by the target function given in .function. The ad‐
458 vantage of doing this over directly probing the callee
459 function is that this probe point is run only when the
460 callee is called from the target function (add the
461 -DSTAP_CALLEE_MATCHALL directive to override this when
462 calling stap(1)).
463
464 Note that only callees that can be statically determined
465 are available. For example, calls through function
466 pointers are not available. Additionally, calls to func‐
467 tions located in other objects (e.g. libraries) are not
468 available (instead use another probe point). This feature
469 will only work for code compiled with GCC 4.7+.
470
471 .callees
472 Shortcut for .callee("*"), which places a probe on all
473 callees of the function.
474
475 .callees(DEPTH)
476 Recursively places probes on callees. For example,
477 .callees(2) will probe both callees of the target func‐
478 tion, as well as callees of those callees. And
479 .callees(3) goes one level deeper, etc... A callee probe
480 at depth N is only triggered when the N callers in the
481 callstack match those that were statically determined
482 during analysis (this also may be overridden using
483 -DSTAP_CALLEE_MATCHALL).
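
The modifiers above might be used as in the following sketch. It
assumes kernel debuginfo is installed and that the kernel has a
vfs_read function with a count parameter that calls rw_verify_area.
These names are illustrative and may differ between kernel versions.

```systemtap
# Entry of the non-inlined instances only; parameters are in scope.
probe kernel.function("vfs_read").call
{
  printf("%s: vfs_read count=%d\n", execname(), $count)
}

# Return from the same function; $return holds the result.
probe kernel.function("vfs_read").return
{
  printf("%s: vfs_read returned %d\n", execname(), $return)
}

# Fires only when rw_verify_area is called from vfs_read.  Marked
# optional because the call may not be statically determinable.
probe kernel.function("vfs_read").callee("rw_verify_area") ?
{
  printf("%s\n", pp())
}
```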
484
In the above list of probe points, MPATTERN stands for a string literal
that aims to identify the loaded kernel module of interest. For in-tree
kernel modules, the name suffices (e.g. "btrfs"). The name may also
include the "*", "[]", and "?" wildcards to match multiple in-tree
modules. Out-of-tree modules are also supported by specifying the full
path to the ko file. Wildcards are not supported in such paths. The
file must follow the convention of being named <module_name>.ko
(characters ',' and '-' are replaced by '_').
493
494 LPATTERN stands for a source program label. It may also contain "*",
495 "[]", and "?" wildcards. PATTERN stands for a string literal that aims
496 to identify a point in the program. It is made up of three parts:
497
498 · The first part is the name of a function, as would appear in the nm
499 program's output. This part may use the "*" and "?" wildcarding
500 operators to match multiple names.
501
502 · The second part is optional and begins with the "@" character. It
503 is followed by the path to the source file containing the function,
504 which may include a wildcard pattern, such as mm/slab*. If it does
505 not match as is, an implicit "*/" is optionally added before the
506 pattern, so that a script need only name the last few components of
507 a possibly long source directory path.
508
509 · Finally, the third part is optional if the file name part was giv‐
510 en, and identifies the line number in the source file preceded by a
511 ":" or a "+". The line number is assumed to be an absolute line
512 number if preceded by a ":", or relative to the declaration line of
513 the function if preceded by a "+". All the lines in the function
514 can be matched with ":*". A range of lines x through y can be
515 matched with ":x-y". Ranges and specific lines can be mixed using
516 commas, e.g. ":x,y-z".
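
The three parts of PATTERN can be combined as in the sketch below. The
function and file names are assumptions about the kernel source tree,
and every probe point is marked optional since it may not resolve on a
given kernel.

```systemtap
probe
  # Function-name wildcard, restricted to one source file.
  kernel.function("vfs_*@fs/read_write.c").call ?,
  # Every line of one function.
  kernel.statement("vfs_read@fs/read_write.c:*") ?,
  # Two lines after the function's declaration line, or the nearest
  # line with code.
  kernel.statement("vfs_read@fs/read_write.c+2").nearest ?
{
  printf("%s\n", pp())
}
```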
517
518 As an alternative, PATTERN may be a numeric constant, indicating an ad‐
519 dress. Such an address may be found from symbol tables of the appro‐
520 priate kernel / module object file. It is verified against known
521 statement code boundaries, and will be relocated for use at run time.
522
523 In guru mode only, absolute kernel-space addresses may be specified
524 with the ".absolute" suffix. Such an address is considered already re‐
525 located, as if it came from /proc/kallsyms, so it cannot be checked
526 against statement/instruction boundaries.
527
528
529 CONTEXT VARIABLES
530 Many of the source-level context variables, such as function parame‐
531 ters, locals, globals visible in the compilation unit, may be visible
532 to probe handlers. They may refer to these variables by prefixing
533 their name with "$" within the scripts. In addition, a special syntax
534 allows limited traversal of structures, pointers, and arrays. More
535 syntax allows pretty-printing of individual variables or their groups.
536 See also @cast. Note that variables may be inaccessible due to them
537 being paged out, or for a few other reasons. See also man er‐
538 ror::fault(7stap).
539
540
541 Functions called from DWARF class probe points and from process.mark
542 probes may also refer to context variables.
543
544
545 $var refers to an in-scope variable or thread local storage variable
546 "var". If it's an integer-like type, it will be cast to a
547 64-bit int for systemtap script use. String-like pointers (char
548 *) may be copied to systemtap string values using the ker‐
549 nel_string or user_string functions.
550
551 @var("varname")
552 an alternative syntax for $varname
553
554 @var("varname","module")
555 The global variable or global thread local storage variable in
556 scope of the given module already loaded into the current probed
557 process. Useful to get an exported variable in a shared library
558 loaded into the process being probed, or a global variable in a
559 process while a shared library probe is being executed. For us‐
560 er-space modules only. For example: @var("_r_debug","/lib/ld-
561 linux.so.2")
562
@var("varname@src/file.c")
refers to the global (either file local or external) variable
varname defined when the file src/file.c was compiled. The CU in
which the variable is resolved is the first CU in the module of
the probe point which matches the given file name at the end and
has the shortest file name path (e.g. given
@var("foo@bar/baz.c") and CUs with file name paths
src/sub/module/bar/baz.c and src/bar/baz.c, the second CU will be
chosen to resolve the (file) global variable foo).
572
573
574 @var("varname@src/file.c","module")
575 The global variable in scope of the given CU, defined in the
576 given module, even if the variable is static (so the name is not
577 unique without the CU name).
578
579
580 $var->field traversal via a structure's or a pointer's field. This
581 generalized indirection operator may be repeated to follow more
582 levels. Note that the . operator is not used for plain struc‐
583 ture members, only -> for both purposes. (This is because "."
584 is reserved for string concatenation.) Also note that for direct
585 dereferencing of $var pointer {kernel,user}_{char,int,...}($var)
586 should be used. (Refer to stapfuncs(5) for more details.)
587
588 $return
589 is available in return probes only for functions that are de‐
590 clared with a return value, which can be determined using @de‐
591 fined($return).
592
$var[N]
indexes into an array. The index may be given as a literal number
or as an arbitrary numeric expression.
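
A sketch of these access forms follows. It assumes kernel debuginfo and
a vfs_read function whose parameters include a struct file pointer
named file and a size named count. The structure members named here may
differ between kernel versions.

```systemtap
probe kernel.function("vfs_read")
{
  # "->" is used for both pointer and embedded-structure members.
  fname = kernel_string($file->f_path->dentry->d_name->name)

  # @var("count") is an alternative spelling of $count.
  n = @var("count")

  printf("%s reads %d bytes from %s\n", execname(), n, fname)
}
```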
596
597 A number of operators exist for such basic context variable expres‐
598 sions:
599
600 $$vars expands to a character string that is equivalent to
601
602 sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
603 parm1, ..., parmN, var1, ..., varN)
604
605 for each variable in scope at the probe point. Some values may
606 be printed as =? if their run-time location cannot be found.
607
608 $$locals
609 expands to a subset of $$vars for only local variables.
610
611 $$parms
612 expands to a subset of $$vars for only function parameters.
613
614 $$return
615 is available in return probes only. It expands to a string that
616 is equivalent to sprintf("return=%x", $return) if the probed
617 function has a return value, or else an empty string.
618
619 & $EXPR
620 expands to the address of the given context variable expression,
621 if it is addressable.
622
@defined($EXPR)
expands to 1 if the given context variable expression is resolvable,
or to 0 otherwise, for use in conditionals such as
626
627 @defined($foo->bar) ? $foo->bar : 0
628
629
630 @probewrite($VAR)
631 see the PROBES section of stap(1).
632
633 $EXPR$ expands to a string with all of $EXPR's members, equivalent to
634
635 sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}",
636 $EXPR->a, $EXPR->b)
637
638
$EXPR$$
expands to a string with all of $EXPR's members and submembers,
equivalent to
642
643 sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}",
644 $EXPR->a, $EXPR->b, $EXPR->c->x, $EXPR->c->y, $EXPR->d[0])
645
646
647 @errno expands to the last value the C library global variable errno
648 was set to.
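
The pretty-printing operators might be used as in this sketch, which
again assumes kernel debuginfo and a vfs_read function with a struct
file pointer parameter named file.

```systemtap
probe kernel.function("vfs_read")
{
  # All parameters, formatted as "name=value" pairs.
  printf("parms: %s\n", $$parms)

  # Guard against kernels where the member cannot be resolved.
  if (@defined($file->f_flags))
    printf("flags=%#x file=%s\n", $file->f_flags, $file$)

  exit()
}
```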
649
650
651 MORE ON RETURN PROBES
652 For the kernel ".return" probes, only a certain fixed number of returns
653 may be outstanding. The default is a relatively small number, on the
654 order of a few times the number of physical CPUs. If many different
655 threads concurrently call the same blocking function, such as futex(2)
656 or read(2), this limit could be exceeded, and skipped "kretprobes"
657 would be reported by "stap -t". To work around this, specify a
658
659 probe FOO.return.maxactive(NNN)
660
661 suffix, with a large enough NNN to cover all expected concurrently
662 blocked threads. Alternately, use the
663
664 stap -DKRETACTIVE=NNNN
665
666 stap command line macro setting to override the default for all ".re‐
667 turn" probes.
668
669
670 For ".return" probes, context variables other than the "$return" may be
671 accessible, as a convenience for a script programmer wishing to access
672 function parameters. These values are snapshots taken at the time of
673 function entry. (Local variables within the function are not generally
674 accessible, since those variables did not exist in allocated/initial‐
675 ized form at the snapshot moment.) These entry-snapshot variables
676 should be accessed via @entry($var).
677
678 In addition, arbitrary entry-time expressions can also be saved for
679 ".return" probes using the @entry(expr) operator. For example, one can
680 compute the elapsed time of a function:
681
probe kernel.function("do_filp_open").return {
    println( get_timeofday_us() - @entry(get_timeofday_us()) )
}
685
686
687
688 The following table summarizes how values related to a function parame‐
689 ter context variable, a pointer named addr, may be accessed from a .re‐
690 turn probe.
691
at-entry value         past-exit value

$addr                  not available
$addr->x->y            @cast(@entry($addr),"struct zz")->x->y
$addr[0]               {kernel,user}_{char,int,...}(& $addr[0])
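
As a sketch of reading a past-exit value, the following probe saves a
pointer parameter at entry and dereferences it after the function has
returned. It assumes a 64-bit kernel whose vfs_read function has a
pointer parameter named pos that refers to a 64-bit file offset.

```systemtap
probe kernel.function("vfs_read").return
{
  # The pointer value itself is a snapshot taken at function entry.
  pos = @entry($pos)

  # Dereference it now to see the value as of function exit.
  if (pos != 0)
    printf("file position after read: %d\n", kernel_long(pos))
}
```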
697
698
699
700 DWARFLESS
In the absence of debugging information, entry and exit points of
kernel and module functions can be probed using the "kprobe" family of
probes. However, these do not permit looking up the arguments or local
variables of the function. The following constructs are supported:
705
kprobe.function(FUNCTION)
kprobe.function(FUNCTION).call
kprobe.function(FUNCTION).return
kprobe.module(NAME).function(FUNCTION)
kprobe.module(NAME).function(FUNCTION).call
kprobe.module(NAME).function(FUNCTION).return
kprobe.statement(ADDRESS).absolute
713
714
715 Probes of type function are recommended for kernel functions, whereas
716 probes of type module are recommended for probing functions of the
717 specified module. In case the absolute address of a kernel or module
718 function is known, statement probes can be utilized.
719
Note that FUNCTION and module NAME values must not contain wildcards,
or the probe will not be registered. Also, statement probes may be used
only in guru mode.
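
A DWARF-less probe pair might look like the following sketch, which
counts calls without touching any arguments. The function name vfs_read
is an assumption about the running kernel.

```systemtap
global entries, returns

probe kprobe.function("vfs_read")        { entries++ }
probe kprobe.function("vfs_read").return { returns++ }

probe timer.s(5)
{
  printf("vfs_read: %d entries, %d returns\n", entries, returns)
  exit()
}
```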
723
724
725
726 USER-SPACE
Support for user-space probing is available for kernels that are
configured with the utrace extensions, or that have the uprobes
facility found in Linux 3.5 and later. (Various kernel build
configuration options need to be enabled; systemtap will advise if
these are missing.)
731
732
733 There are several forms. First, a non-symbolic probe point:
734
735 process(PID).statement(ADDRESS).absolute
736
737 is analogous to kernel.statement(ADDRESS).absolute in that both use raw
738 (unverified) virtual addresses and provide no $variables. The target
739 PID parameter must identify a running process, and ADDRESS should iden‐
740 tify a valid instruction address. All threads of that process will be
741 probed.
742
743 Second, non-symbolic user-kernel interface events handled by utrace may
744 be probed:
745
746 process(PID).begin
747 process("FULLPATH").begin
748 process.begin
749 process(PID).thread.begin
750 process("FULLPATH").thread.begin
751 process.thread.begin
752 process(PID).end
753 process("FULLPATH").end
754 process.end
755 process(PID).thread.end
756 process("FULLPATH").thread.end
757 process.thread.end
758 process(PID).syscall
759 process("FULLPATH").syscall
760 process.syscall
761 process(PID).syscall.return
762 process("FULLPATH").syscall.return
763 process.syscall.return
764 process(PID).insn
765 process("FULLPATH").insn
766 process(PID).insn.block
767 process("FULLPATH").insn.block
768
769
770
A process.begin probe gets called when a new process described by PID
or FULLPATH gets created. In addition, it is called once from the
context of each preexisting process, at systemtap script startup. This
is useful to track live processes.

A process.thread.begin probe gets called when a new thread described by
PID or FULLPATH gets created.

A process.end probe gets called when a process described by PID or
FULLPATH dies. A process.thread.end probe gets called when a thread
described by PID or FULLPATH dies.

A process.syscall probe gets called when a thread described by PID or
FULLPATH makes a system call. The system call number is available in
the $syscall context variable, and the first 6 arguments of the system
call are available in the $argN context variables (e.g. $arg1, $arg2,
...).

A process.syscall.return probe gets called when a thread described by
PID or FULLPATH returns from a system call. The system call number is
available in the $syscall context variable, and the return value of the
system call is available in the $return context variable.

A process.insn probe gets called for every single-stepped instruction
of the process described by PID or FULLPATH. A process.insn.block probe
gets called for every block-stepped instruction of the process
described by PID or FULLPATH.
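
The following sketch shows how the utrace-style events and their context variables might be combined to trace one program's lifetime and system calls. The path /bin/ls is only an illustrative assumption; any executable path could be substituted.

```systemtap
# Assumption: /bin/ls exists and is the program of interest.
probe process("/bin/ls").begin {
  printf("begin: pid %d\n", pid())
}

probe process("/bin/ls").syscall {
  # $syscall is the system call number, $arg1 its first argument.
  printf("syscall %d, arg1=0x%x\n", $syscall, $arg1)
}

probe process("/bin/ls").syscall.return {
  printf("syscall %d returned %d\n", $syscall, $return)
}

probe process("/bin/ls").end {
  printf("end: pid %d\n", pid())
}
```

A script like this would typically be run with the -c option, for example stap trace.stp -c /bin/ls, where trace.stp is a hypothetical file name.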
790
791
792 If a process probe is specified without a PID or FULLPATH, all user
793 threads will be probed. However, if systemtap was invoked with the -c
794 or -x options, then process probes are restricted to the process hier‐
795 archy associated with the target process. If a process probe is un‐
796 specified (i.e. without a PID or FULLPATH), but with the -c option, the
797 PATH of the -c cmd will be heuristically filled into the process PATH.
798 In that case, only command parameters are allowed in the -c command
799 (i.e. no command substitution allowed and no occurrences of any of
800 these characters: '|&;<>(){}').
801
802
803 Third, symbolic static instrumentation compiled into programs and
804 shared libraries may be probed:
805
806 process("PATH").mark("LABEL")
807 process("PATH").provider("PROVIDER").mark("LABEL")
808 process(PID).mark("LABEL")
809 process(PID).provider("PROVIDER").mark("LABEL")
810
811
A .mark probe is triggered by a static probe defined in the application
with STAP_PROBE1(PROVIDER,LABEL,arg1), one of a family of macros
defined in sys/sdt.h. PROVIDER is an arbitrary application identifier,
LABEL is the marker site identifier, and arg1 is an integer-typed
argument. STAP_PROBE1 is used for probes with 1 argument, STAP_PROBE2
for probes with 2 arguments, and so on. The arguments of the probe are
available in the context variables $arg1, $arg2, and so on. An
alternative to using the STAP_PROBE macros is to use the dtrace script
to create custom macros. Additionally, the variables $$name and
$$provider are available as parts of the probe point name. The
sys/sdt.h macro names DTRACE_PROBE* are available as aliases for
STAP_PROBE*.
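
To illustrate the script side of a static marker, here is a hedged sketch. It assumes a hypothetical program ./myapp that contains STAP_PROBE1(myapp, request_start, id); the program, provider, and label names are invented for this example.

```systemtap
# Assumption: ./myapp was built with
#   STAP_PROBE1(myapp, request_start, id)
# Both the binary and the marker names are hypothetical.
probe process("./myapp").provider("myapp").mark("request_start") {
  # $$provider and $$name are strings; $arg1 is the marker's argument.
  printf("%s:%s id=%d\n", $$provider, $$name, $arg1)
}
```

Dropping the .provider("myapp") component would match a marker named request_start from any provider in the same binary.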
824
825
826 Finally, full symbolic source-level probes in user-space programs and
827 shared libraries are supported. These are exactly analogous to the
828 symbolic DWARF-based kernel/module probes described above. They expose
829 the same sorts of context $variables for function parameters, local
830 variables, and so on.
831
832 process("PATH").function("NAME")
833 process("PATH").statement("*@FILE.c:123")
834 process("PATH").plt("NAME")
835 process("PATH").library("PATH").plt("NAME")
836 process("PATH").library("PATH").function("NAME")
837 process("PATH").library("PATH").statement("*@FILE.c:123")
838 process("PATH").function("*").return
839 process("PATH").function("myfun").label("foo")
840 process("PATH").function("foo").callee("bar")
841 process("PATH").plt("NAME").return
842 process(PID).function("NAME")
843 process(PID).statement("*@FILE.c:123")
844 process(PID).plt("NAME")
845
846
847
Note that for all process probes, PATH names refer to executables that
are looked up the same way shells look them up: relative to the working
directory if they contain a "/" character, otherwise in $PATH. If PATH
names refer to scripts, the actual interpreters (specified in the
script in the first line after the #! characters) are probed.
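
A short sketch of symbolic user-space probing follows. It assumes that /bin/bash contains a function named readline and that matching debuginfo is installed; neither is guaranteed on a given system.

```systemtap
# Assumption: /bin/bash has a readline function and debuginfo is available.
probe process("/bin/bash").function("readline") {
  # $$parms expands to a string of the function's parameters.
  printf("readline(%s)\n", $$parms)
}

probe process("/bin/bash").function("readline").return {
  # $return is a user-space char pointer here, so fetch it as a string.
  printf("readline -> %s\n", user_string($return))
}
```

The .return variant pairs naturally with the entry probe when both the arguments and the result of a call are of interest.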
853
854
855 Tapset process probes placed in the special directory $pre‐
856 fix/share/systemtap/tapset/PATH/ with relative paths will have their
857 process parameter prefixed with the location of the tapset. For exam‐
858 ple,
859
860
861 process("foo").function("NAME")
862
863
864 expands to
865
866 process("/usr/bin/foo").function("NAME")
867
868
869
870 when placed in $prefix/share/systemtap/tapset/PATH/usr/bin/
871
872
873 If PATH is a process component parameter referring to shared libraries
874 then all processes that map it at runtime would be selected for prob‐
875 ing. If PATH is a library component parameter referring to shared li‐
876 braries then the process specified by the process component would be
877 selected. Note that the PATH pattern in a library component will al‐
878 ways apply to libraries statically determined to be in use by the
879 process. However, you may also specify the full path to any library
880 file even if not statically needed by the process.
881
882
883 A .plt probe will probe functions in the program linkage table corre‐
884 sponding to the rest of the probe point. .plt can be specified as a
885 shorthand for .plt("*"). The symbol name is available as a $$name con‐
886 text variable; function arguments are not available, since PLTs are
887 processed without debuginfo. A .plt.return probe places a probe at the
888 moment after the return from the named function.
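
As a rough illustration of .plt probes, the sketch below tallies calls made through the program linkage table of one binary. The path /bin/ls is an assumption chosen only for the example.

```systemtap
# Assumption: /bin/ls is dynamically linked and therefore has PLT entries.
global plt_calls

probe process("/bin/ls").plt {
  # $$name holds the symbol name of the PLT entry that was hit.
  plt_calls[$$name]++
}

probe end {
  # Report the twenty most frequently called entries.
  foreach (sym in plt_calls- limit 20)
    printf("%6d %s\n", plt_calls[sym], sym)
}
```

Only the symbol name is used, since function arguments are not available in PLT probes.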
889
890
891 If the PATH string contains wildcards as in the MPATTERN case, then
892 standard globbing is performed to find all matching paths. In this
893 case, the $PATH environment variable is not used.
894
895
896 If systemtap was invoked with the -c or -x options, then process probes
897 are restricted to the process hierarchy associated with the target
898 process.
899
900
901 JAVA
902 Support for probing Java methods is available using Byteman as a back‐
903 end. Byteman is an instrumentation tool from the JBoss project which
904 systemtap can use to monitor invocations for a specific method or line
905 in a Java program.
906
907 Systemtap does so by generating a Byteman script listing the probes to
908 instrument and then invoking the Byteman bminstall utility.
909
This Java instrumentation support is currently a prototype feature with
major limitations. Moreover, Java probing currently does not work
across users; the stap script must run (with appropriate permissions)
under the same user as the Java process being probed. (Thus a stap
script under root currently cannot probe Java methods in a
non-root-user Java process.)
916
917
918 The first probe type refers to Java processes by the name of the Java
919 process:
920
921 java("PNAME").class("CLASSNAME").method("PATTERN")
922 java("PNAME").class("CLASSNAME").method("PATTERN").return
923
924 The PNAME argument must be a pre-existing jvm pid, and be identifiable
925 via a jps listing.
926
927 The PATTERN parameter specifies the signature of the Java method to
928 probe. The signature must consist of the exact name of the method, fol‐
929 lowed by a bracketed list of the types of the arguments, for instance
930 "myMethod(int,double,Foo)". Wildcards are not supported.
931
932 The probe can be set to trigger at a specific line within the method by
933 appending a line number with colon, just as in other types of probes:
934 "myMethod(int,double,Foo):245".
935
936 The CLASSNAME parameter identifies the Java class the method belongs
937 to, either with or without the package qualification. By default, the
938 probe only triggers on descendants of the class that do not override
939 the method definition of the original class. However, CLASSNAME can
940 take an optional caret prefix, as in ^org.my.MyClass, which specifies
941 that the probe should also trigger on all descendants of MyClass that
942 override the original method. For instance, every method with signature
943 foo(int) in program org.my.MyApp can be probed at once using
944
945 java("org.my.MyApp").class("^java.lang.Object").method("foo(int)")
946
947
948 The second probe type works analogously, but refers to Java processes
949 by PID:
950
951 java(PID).class("CLASSNAME").method("PATTERN")
952 java(PID).class("CLASSNAME").method("PATTERN").return
953
954 (PIDs for an already running process can be obtained using the jps(1)
955 utility.)
956
957 Context variables defined within java probes include $arg1 through
958 $arg10 (for up to the first 10 arguments of a method), represented as
959 character-pointers for the toString() form of each actual argument.
960 The arg1 through arg10 script variables provide access to these as or‐
961 dinary strings, fetched via user_string_warn().
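
A minimal sketch of a Java probe handler is given below. It reuses the org.my.MyApp and foo(int) names from the example earlier in this section, which are placeholders rather than a real application.

```systemtap
# Assumption: a JVM running org.my.MyApp is visible in a jps listing.
probe java("org.my.MyApp").class("^java.lang.Object").method("foo(int)") {
  # arg1 is the toString() form of the first argument, as a script string.
  printf("foo called with %s\n", arg1)
}
```

The script variable arg1 is used instead of $arg1 so that the handler receives an ordinary string rather than a character pointer.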
962
963 Prior to systemtap version 3.1, $arg1 through $arg10 could contain ei‐
964 ther integers or character pointers, depending on the types of the ob‐
965 jects being passed to each particular java method. This previous be‐
966 haviour may be invoked with the stap --compatible=3.0 flag.
967
968
969 PROCFS
970 These probe points allow procfs "files" in /proc/systemtap/MODNAME to
971 be created, read and written using a permission that may be modified
972 using the proper umask value. Default permissions are 0400 for read
973 probes, and 0200 for write probes. If both a read and write probe are
974 being used on the same file, a default permission of 0600 will be used.
975 Using procfs.umask(0040).read would result in a 0404 permission set for
976 the file. (MODNAME is the name of the systemtap module). The proc
977 filesystem is a pseudo-filesystem which is used as an interface to ker‐
978 nel data structures. There are several probe point variants supported
979 by the translator:
980
981
982 procfs("PATH").read
983 procfs("PATH").umask(UMASK).read
984 procfs("PATH").read.maxsize(MAXSIZE)
985 procfs("PATH").umask(UMASK).maxsize(MAXSIZE)
986 procfs("PATH").write
987 procfs("PATH").umask(UMASK).write
988 procfs.read
989 procfs.umask(UMASK).read
990 procfs.read.maxsize(MAXSIZE)
991 procfs.umask(UMASK).read.maxsize(MAXSIZE)
992 procfs.write
993 procfs.umask(UMASK).write
994
995
996 Note that there are a few differences when procfs probes are used in
997 the stapbpf runtime. FIFO special files are used instead of proc
998 filesystem files. These files are created in /var/tmp/systemtap-US‐
999 ER/MODNAME. (USER is the name of the user). Additionally, users can‐
1000 not create both read and write probes on the same file.
1001
1002 PATH is the file name (relative to /proc/systemtap/MODNAME or
1003 /var/tmp/systemtap-USER/MODNAME) to be created. If no PATH is speci‐
1004 fied (as in the last two variants above), PATH defaults to "command".
1005 The file name "__stdin" is used internally by systemtap for input
1006 probes and should not be used as a PATH for procfs probes; see the in‐
1007 put probe section below.
1008
1009 When a user reads /proc/systemtap/MODNAME/PATH (normal runtime) or
1010 /var/tmp/systemtap-USER/MODNAME (stapbpf runtime), the corresponding
1011 procfs read probe is triggered. The string data to be read should be
1012 assigned to a variable named $value, like this:
1013
1014
    probe procfs("PATH").read { $value = "100\n" }
1016
1017
1018 When a user writes into /proc/systemtap/MODNAME/PATH (normal runtime)
1019 or /var/tmp/systemtap-USER/MODNAME (stapbpf runtime), the corresponding
1020 procfs write probe is triggered. The data the user wrote is available
1021 in the string variable named $value, like this:
1022
1023
    probe procfs("PATH").write { printf("user wrote: %s", $value) }
1025
1026
1027 MAXSIZE is the size of the procfs read buffer. Specifying MAXSIZE al‐
1028 lows larger procfs output. If no MAXSIZE is specified, the procfs read
1029 buffer defaults to STP_PROCFS_BUFSIZE (which defaults to MAXSTRINGLEN,
1030 the maximum length of a string). If setting the procfs read buffers
1031 for more than one file is needed, it may be easiest to override the
1032 STP_PROCFS_BUFSIZE definition. Here's an example of using MAXSIZE:
1033
1034
    probe procfs.read.maxsize(1024) {
        $value = "long string..."
        $value .= "another long string..."
        $value .= "another long string..."
        $value .= "another long string..."
    }
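
The read and write variants can be combined on one file under the normal (non-stapbpf) runtime. The sketch below exposes a script global as a tunable; the file name "threshold" and the variable name are assumptions made for illustration.

```systemtap
# Assumption: normal kernel-module runtime (not stapbpf), which allows
# both a read and a write probe on the same procfs file.
global threshold = 100

probe procfs("threshold").read {
  # Reading the file returns the current value.
  $value = sprintf("%d\n", threshold)
}

probe procfs("threshold").write {
  # Writing a decimal number to the file updates the value.
  threshold = strtol($value, 10)
}

probe timer.s(10) {
  printf("threshold is now %d\n", threshold)
}
```

With both probes present, the file would get the default 0600 permission described above.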
1041
1042
1043
1044 INPUT
1045 These probe points make input from stdin available to the script during
1046 runtime. The translator currently supports two variants of this fami‐
1047 ly:
1048
1049 input.char
1050 input.line
1051
1052
1053 input.char is triggered each time a character is read from stdin. The
1054 current character is available in the string variable named char.
1055 There is no newline buffering; the next character is read from stdin as
1056 soon as it becomes available.
1057
1058 input.line causes all characters read from stdin to be buffered until a
1059 newline is read, at which point the probe will be triggered. The cur‐
1060 rent line of characters (including the newline) is made available in a
1061 string variable named line. Note that no more than MAXSTRINGLEN char‐
1062 acters will be buffered. Any additional characters will not be included
1063 in line.
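
As a small sketch of line-oriented input handling, the script below echoes each line typed on stdin and stops when the user types "quit". The command word is an arbitrary choice for the example.

```systemtap
probe input.line {
  # The line variable includes the trailing newline.
  if (line == "quit\n")
    exit()
  else
    printf("got: %s", line)
}
```

Since the buffered line keeps its newline, the comparison string and the printf format are written to account for it.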
1064
1065
1066 Input probes are aliases for procfs("__stdin").write. Systemtap recon‐
1067 figures stdin if the presence of this procfs probe is detected, there‐
1068 fore "__stdin" should not be used as a path argument for procfs probes.
1069 Additionally, input probes will not work with the -F and --remote op‐
1070 tions.
1071
1072
1073 NETFILTER HOOKS
1074 These probe points allow observation of network packets using the net‐
1075 filter mechanism. A netfilter probe in systemtap corresponds to a net‐
1076 filter hook function in the original netfilter probes API. It is proba‐
1077 bly more convenient to use tapset::netfilter(3stap), which wraps the
1078 primitive netfilter hooks and does the work of extracting useful infor‐
1079 mation from the context variables.
1080
1081
1082 There are several probe point variants supported by the translator:
1083
1084
1085 netfilter.hook("HOOKNAME").pf("PROTOCOL_F")
1086 netfilter.pf("PROTOCOL_F").hook("HOOKNAME")
1087 netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY")
1088 netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY")
1089
1090
1091
1092 PROTOCOL_F is the protocol family to listen for, currently one of NF‐
1093 PROTO_IPV4, NFPROTO_IPV6, NFPROTO_ARP, or NFPROTO_BRIDGE.
1094
1095
1096 HOOKNAME is the point, or 'hook', in the protocol stack at which to in‐
1097 tercept the packet. The available hook names for each protocol family
1098 are taken from the kernel header files <linux/netfilter_ipv4.h>, <lin‐
1099 ux/netfilter_ipv6.h>, <linux/netfilter_arp.h> and <linux/netfil‐
1100 ter_bridge.h>. For instance, allowable hook names for NFPROTO_IPV4 are
1101 NF_INET_PRE_ROUTING, NF_INET_LOCAL_IN, NF_INET_FORWARD, NF_INET_LO‐
1102 CAL_OUT, and NF_INET_POST_ROUTING.
1103
1104
1105 PRIORITY is an integer priority giving the order in which the probe
1106 point should be triggered relative to any other netfilter hook func‐
1107 tions which trigger on the same packet. Hook functions execute on each
1108 packet in order from smallest priority number to largest priority num‐
1109 ber. If no PRIORITY is specified (as in the first two probe point vari‐
1110 ants above), PRIORITY defaults to "0".
1111
1112 There are a number of predefined priority names of the form NF_IP_PRI_*
1113 and NF_IP6_PRI_* which are defined in the kernel header files <lin‐
1114 ux/netfilter_ipv4.h> and <linux/netfilter_ipv6.h> respectively. The
1115 script is permitted to use these instead of specifying an integer pri‐
1116 ority. (The probe points for NFPROTO_ARP and NFPROTO_BRIDGE currently
1117 do not expose any named hook priorities to the script writer.) Thus,
1118 allowable ways to specify the priority include:
1119
1120
1121 priority("255")
1122 priority("NF_IP_PRI_SELINUX_LAST")
1123
1124
1125 A script using guru mode is permitted to specify any identifier or num‐
1126 ber as the parameter for hook, pf, and priority. This feature should be
1127 used with caution, as the parameter is inserted verbatim into the C
1128 code generated by systemtap.
1129
1130 The netfilter probe points define the following context variables:
1131
1132 $hooknum
1133 The hook number.
1134
1135 $skb The address of the sk_buff struct representing the packet. See
1136 <linux/skbuff.h> for details on how to use this struct, or al‐
1137 ternatively use the tapset tapset::netfilter(3stap) for easy ac‐
1138 cess to key information.
1139
1140
1141 $in The address of the net_device struct representing the network
1142 device on which the packet was received (if any). May be 0 if
1143 the device is unknown or undefined at that stage in the protocol
1144 stack.
1145
1146
1147 $out The address of the net_device struct representing the network
1148 device on which the packet will be sent (if any). May be 0 if
1149 the device is unknown or undefined at that stage in the protocol
1150 stack.
1151
1152
1153 $verdict
1154 (Guru mode only.) Assigning one of the verdict values defined in
1155 <linux/netfilter.h> to this variable alters the further progress
1156 of the packet through the protocol stack. For instance, the fol‐
1157 lowing guru mode script forces all ipv6 network packets to be
1158 dropped:
1159
1160
    probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") {
        $verdict = 0 /* nf_drop */
    }
1164
1165
1166 For convenience, unlike the primitive probe points discussed
1167 here, the probes defined in tapset::netfilter(3stap) export the
1168 lowercase names of the verdict constants (e.g. NF_DROP becomes
1169 nf_drop) as local variables.
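
For passive observation that does not require guru mode, a primitive netfilter probe can simply count packets. This sketch assumes IPv4 traffic at the pre-routing hook and uses the named priority NF_IP_PRI_FIRST from <linux/netfilter_ipv4.h>; the 10-second window is arbitrary.

```systemtap
global packets

probe netfilter.pf("NFPROTO_IPV4").hook("NF_INET_PRE_ROUTING").priority("NF_IP_PRI_FIRST") {
  # $hooknum identifies the hook at which the packet was seen.
  packets[$hooknum]++
}

probe timer.s(10) {
  foreach (hook in packets)
    printf("hook %d: %d packets\n", hook, packets[hook])
  exit()
}
```

Because $verdict is never assigned, packets continue through the protocol stack unchanged.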
1170
1171
1172 KERNEL TRACEPOINTS
1173 This family of probe points hooks up to static probing tracepoints in‐
1174 serted into the kernel or modules. As with markers, these tracepoints
1175 are special macro calls inserted by kernel developers to make probing
1176 faster and more reliable than with DWARF-based probes, and DWARF debug‐
1177 ging information is not required to probe tracepoints. Tracepoints
1178 have an extra advantage of more strongly-typed parameters than markers.
1179
Tracepoint probes look like: kernel.trace("name"). The tracepoint name
string, which may contain the usual wildcard characters, is matched
against the names defined by the kernel developers in the tracepoint
header files. To restrict the search to specific subsystems (e.g.
sched or ext3), the following syntax can be used:
kernel.trace("system:name"). The tracepoint system string may also
contain the usual wildcard characters.
1187
1188 The handler associated with a tracepoint-based probe may read the op‐
1189 tional parameters specified at the macro call site. These are named
1190 according to the declaration by the tracepoint author. For example,
1191 the tracepoint probe kernel.trace("sched:sched_switch") provides the
1192 parameters $prev and $next. If the parameter is a complex type, as in
1193 a struct pointer, then a script can access fields with the same syntax
1194 as DWARF $target variables. Also, tracepoint parameters cannot be mod‐
1195 ified, but in guru-mode a script may modify fields of parameters.
1196
1197 The subsystem and name of the tracepoint are available in $$system and
1198 $$name and a string of name=value pairs for all parameters of the tra‐
1199 cepoint is available in $$vars or $$parms.
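
A hedged sketch of a tracepoint handler follows, using the sched:sched_switch tracepoint and its $prev and $next parameters mentioned above. It assumes a systemtap version that accepts the "system:name" form.

```systemtap
probe kernel.trace("sched:sched_switch") {
  # $prev and $next are task_struct pointers; task_execname() reads
  # the command name of each task.
  printf("%s:%s %s -> %s\n", $$system, $$name,
         task_execname($prev), task_execname($next))
}
```

On a busy system this handler fires very often, so in practice it would usually aggregate into a global array rather than print every event.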
1200
1201
1202 KERNEL MARKERS (OBSOLETE)
1203 This family of probe points hooks up to an older style of static prob‐
1204 ing markers inserted into older kernels or modules. These markers are
1205 special STAP_MARK macro calls inserted by kernel developers to make
1206 probing faster and more reliable than with DWARF-based probes. Fur‐
1207 ther, DWARF debugging information is not required to probe markers.
1208
1209 Marker probe points begin with kernel. The next part names the marker
1210 itself: mark("name"). The marker name string, which may contain the
1211 usual wildcard characters, is matched against the names given to the
1212 marker macros when the kernel and/or module was compiled. Optional‐
1213 ly, you can specify format("format"). Specifying the marker format
1214 string allows differentiation between two markers with the same name
1215 but different marker format strings.
1216
1217 The handler associated with a marker-based probe may read the optional
1218 parameters specified at the macro call site. These are named $arg1
1219 through $argNN, where NN is the number of parameters supplied by the
1220 macro. Number and string parameters are passed in a type-safe manner.
1221
The marker format string associated with a marker is available in
$format, and the marker name string is available in $name.
1224
1225
1226 HARDWARE BREAKPOINTS
This family of probes is used to set hardware watchpoints for a given
(global) kernel symbol. The probes take three components as inputs:

1. The virtual address or name of the kernel symbol to be traced,
supplied as the argument to this class of probes. (Only data segment
variables can be probed; local variables of a function cannot.)

2. The nature of the access to be probed:
a. A .write probe is triggered when a write happens at the specified
address or symbol name.
b. A .rw probe is triggered when either a read or a write happens.

3. .length (optional): the address interval to be probed. The
user-specified length is approximated to the closest address length
that the architecture can support. If the specified length exceeds the
limits imposed by the architecture, an error is reported and probe
registration fails. When no length is specified, the translator
requests a hardware breakpoint probe of length 1. Note that the length
construct is not valid with symbol names.

The following constructs are supported:
1249
1250 probe kernel.data(ADDRESS).write
1251 probe kernel.data(ADDRESS).rw
1252 probe kernel.data(ADDRESS).length(LEN).write
1253 probe kernel.data(ADDRESS).length(LEN).rw
1254 probe kernel.data("SYMBOL_NAME").write
1255 probe kernel.data("SYMBOL_NAME").rw
1256
1257
This set of probes makes use of the debug registers of the processor,
which are a scarce resource (4 on x86, 1 on powerpc). The script
translation flags a warning if a user requests more hardware breakpoint
probes than the architecture allows. For example, a pass-2 warning is
issued when an input script requests 5 hardware breakpoint probes on an
x86 system, since the x86 architecture supports a maximum of 4
breakpoints. Users are cautioned to set probes judiciously.
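
As an illustration, the sketch below sets a single write watchpoint on the pid_max kernel variable, which also appears in the examples later in this page. It consumes one debug register for as long as the script runs.

```systemtap
probe kernel.data("pid_max").write {
  # Report which task modified the variable.
  printf("pid_max written by %s (pid %d)\n", execname(), pid())
}
```

A symbol name is used here, so no .length component is given.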
1265
1266
1267 PERF
1268 This family of probe points interfaces to the kernel "perf event" in‐
1269 frastructure for controlling hardware performance counters. The events
1270 being attached to are described by the "type", "config" fields of the
1271 perf_event_attr structure, and are sampled at an interval governed by
1272 the "sample_period" and "sample_freq" fields.
1273
1274 These fields are made available to systemtap scripts using the follow‐
1275 ing syntax:
1276
1277 probe perf.type(NN).config(MM).sample(XX)
1278 probe perf.type(NN).config(MM).hz(XX)
1279 probe perf.type(NN).config(MM)
1280 probe perf.type(NN).config(MM).process("PROC")
1281 probe perf.type(NN).config(MM).counter("COUNTER")
1282 probe perf.type(NN).config(MM).process("PROC").counter("NAME")
1283
1284 The systemtap probe handler is called once per XX increments of the un‐
1285 derlying performance counter when using the .sample field or at a fre‐
1286 quency in hertz when using the .hz field. When not specified, the de‐
1287 fault behavior is to sample at a count of 1000000. The range of valid
1288 type/config is described by the perf_event_open(2) system call, and/or
1289 the linux/perf_event.h file. Invalid combinations or exhausted hard‐
1290 ware counter resources result in errors during systemtap script start‐
1291 up. Systemtap does not sanity-check the values: it merely passes them
1292 through to the kernel for error- and safety-checking. By default the
1293 perf event probe is systemwide unless .process is specified, which will
1294 bind the probe to a specific task. If the name is omitted then it is
1295 inferred from the stap -c argument. A perf event can be read on de‐
1296 mand using .counter. The body of the perf probe handler will not be
1297 invoked for a .counter probe; instead, the counter is read in a user
1298 space probe via:
1299
    probe process("PROC").statement("func@file") { stat <<< @perf("NAME") }
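
The sampling form can be sketched as follows. The values type 0 and config 0 correspond to PERF_TYPE_HARDWARE and PERF_COUNT_HW_CPU_CYCLES in linux/perf_event.h; the sampling period and the 5-second window are arbitrary choices for the example.

```systemtap
global samples

# Systemwide sampling: one handler call per 1000000 CPU cycles.
probe perf.type(0).config(0).sample(1000000) {
  samples[execname()]++
}

probe timer.s(5) {
  # Show the ten executables that accumulated the most samples.
  foreach (name in samples- limit 10)
    printf("%-16s %d\n", name, samples[name])
  exit()
}
```

If the hardware counter is unavailable (for example inside some virtual machines), the script is expected to fail at startup, as described above.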
1301
1302
1303
1304 PYTHON
Support for probing python 2 and python 3 functions is available with
the help of an extra python support module. Note that the debuginfo for
the version of python being probed is required. To run a python script
with the extra python support module, add the '-m HelperSDT' option to
your python command, like this:
1310
    stap foo.stp -c "python -m HelperSDT foo.py"
1312
1313 Python probes look like the following:
1314
1315 python2.module("MPATTERN").function("PATTERN")
1316 python2.module("MPATTERN").function("PATTERN").call
1317 python2.module("MPATTERN").function("PATTERN").return
1318 python3.module("MPATTERN").function("PATTERN")
1319 python3.module("MPATTERN").function("PATTERN").call
1320 python3.module("MPATTERN").function("PATTERN").return
1321
1322 The list above includes multiple variants and modifiers which provide
1323 additional functionality or filters. They are:
1324
1325 .function
1326 Places a probe at the beginning of the named function by
1327 default, unless modified by PATTERN. Parameters are
1328 available as context variables.
1329
1330 .call Places a probe at the beginning of the named function.
1331 Parameters are available as context variables.
1332
1333 .return
1334 Places a probe at the moment before the return from the
1335 named function. Parameters and local/global python vari‐
1336 ables are available as context variables.
1337
1338 PATTERN stands for a string literal that aims to identify a point in
1339 the python program. It is made up of three parts:
1340
1341 · The first part is the name of a function (e.g. "foo") or class
1342 method (e.g. "bar.baz"). This part may use the "*" and "?" wild‐
1343 carding operators to match multiple names.
1344
1345 · The second part is optional and begins with the "@" character. It
1346 is followed by the path to the source file containing the function,
1347 which may include a wildcard pattern. The python path is searched
1348 for a matching filename.
1349
1350 · Finally, the third part is optional if the file name part was giv‐
1351 en, and identifies the line number in the source file preceded by a
1352 ":" or a "+". The line number is assumed to be an absolute line
1353 number if preceded by a ":", or relative to the declaration line of
1354 the function if preceded by a "+". All the lines in the function
1355 can be matched with ":*". A range of lines x through y can be
1356 matched with ":x-y". Ranges and specific lines can be mixed using
1357 commas, e.g. ":x,y-z".
1358
1359 In the above list of probe points, MPATTERN stands for a python module
1360 or script name that names the python module of interest. This part may
1361 use the "*" and "?" wildcarding operators to match multiple names. The
1362 python path is searched for a matching filename.
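
A minimal sketch of a python probe script is shown below. It assumes a hypothetical python 3 script foo.py run under the HelperSDT module as described above, and it counts how often each function in that module is entered.

```systemtap
# Assumption: the target is a python 3 module named foo (foo.py).
global hits

probe python3.module("foo").function("*") {
  # pp() gives the resolved probe point, which identifies the function.
  hits[pp()]++
}

probe end {
  foreach (p in hits- limit 20)
    printf("%6d %s\n", hits[p], p)
}
```

The wildcard in the function pattern matches every function in the module; a specific name such as "bar" would narrow the probe to one function.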
1363
1364
1365
1367 Here are some example probe points, defining the associated events.
1368
1369 begin, end, end
1370 refers to the startup and normal shutdown of the session. In
1371 this case, the handler would run once during startup and twice
1372 during shutdown.
1373
1374 timer.jiffies(1000).randomize(200)
1375 refers to a periodic interrupt, every 1000 +/- 200 jiffies.
1376
1377 kernel.function("*init*"), kernel.function("*exit*")
1378 refers to all kernel functions with "init" or "exit" in the
1379 name.
1380
1381 kernel.function("*@kernel/time.c:240")
1382 refers to any functions within the "kernel/time.c" file that
1383 span line 240. Note that this is not a probe at the statement
1384 at that line number. Use the kernel.statement probe instead.
1385
1386 kernel.trace("sched_*")
1387 refers to all scheduler-related (really, prefixed) tracepoints
1388 in the kernel.
1389
1390 kernel.mark("getuid")
1391 refers to an obsolete STAP_MARK(getuid, ...) macro call in the
1392 kernel.
1393
1394 module("usb*").function("*sync*").return
1395 refers to the moment of return from all functions with "sync" in
1396 the name in any of the USB drivers.
1397
1398 kernel.statement(0xc0044852)
1399 refers to the first byte of the statement whose compiled in‐
1400 structions include the given address in the kernel.
1401
1402 kernel.statement("*@kernel/time.c:296")
1403 refers to the statement of line 296 within "kernel/time.c".
1404
1405 kernel.statement("bio_init@fs/bio.c+3")
1406 refers to the statement at line bio_init+3 within "fs/bio.c".
1407
1408 kernel.data("pid_max").write
1409 refers to a hardware breakpoint of type "write" set on pid_max
1410
1411 syscall.*.return
1412 refers to the group of probe aliases with any name in the third
1413 position
1414
1415
stap(1),
probe::*(3stap),
tapset::*(3stap)