STAPPROBES(3stap)                                            STAPPROBES(3stap)

stapprobes - systemtap probe points
7
8
9
11 The following sections enumerate the variety of probe points supported
12 by the systemtap translator, and some of the additional aliases defined
13 by standard tapset scripts. Many are individually documented in the
14 3stap manual section, with the probe:: prefix.
15
16
18 probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
19
20
A probe declaration may list multiple comma-separated probe points in
order to attach a handler to all of the named events. Normally, the
handler statements are run whenever any of the events occurs. Depending
on the type of probe point, the handler statements may refer to context
variables (denoted with a dollar-sign prefix like $foo) to read or
write state. This may include function parameters for function probes,
or local variables for statement probes.
28
29 The syntax of a single probe point is a general dotted-symbol sequence.
30 This allows a breakdown of the event namespace into parts, somewhat
31 like the Domain Name System does on the Internet. Each component iden‐
32 tifier may be parametrized by a string or number literal, with a syntax
33 like a function call. A component may include a "*" character, to ex‐
34 pand to a set of matching probe points. It may also include "**" to
35 match multiple sequential components at once. Probe aliases likewise
36 expand to other probe points.
37
38 Probe aliases can be given on their own, or with a suffix. The suffix
39 attaches to the underlying probe point that the alias is expanded to.
40 For example,
41
42 syscall.read.return.maxactive(10)
43
44 expands to
45
46 kernel.function("sys_read").return.maxactive(10)
47
48 with the component maxactive(10) being recognized as a suffix.
49
50 Normally, each and every probe point resulting from wildcard- and
51 alias-expansion must be resolved to some low-level system instrumenta‐
52 tion facility (e.g., a kprobe address, marker, or a timer configura‐
53 tion), otherwise the elaboration phase will fail.
54
55 However, a probe point may be followed by a "?" character, to indicate
56 that it is optional, and that no error should result if it fails to re‐
57 solve. Optionalness passes down through all levels of alias/wildcard
58 expansion. Alternately, a probe point may be followed by a "!" charac‐
59 ter, to indicate that it is both optional and sufficient. (Think
60 vaguely of the Prolog cut operator.) If it does resolve, then no fur‐
61 ther probe points in the same comma-separated list will be resolved.
62 Therefore, the "!" sufficiency mark only makes sense in a list of
63 probe point alternatives.
64
Additionally, a probe point may be followed by an "if (expr)" condition,
in order to enable or disable the probe point on the fly. If "expr" is
false when the probe point is hit, the whole probe body, including the
bodies of any aliases, is skipped. The condition is stacked up through
all levels of alias/wildcard expansion, so the final condition is the
logical AND of the conditions of all expanded aliases and wildcards.
The expressions are necessarily restricted to global variables.
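
The optional ("?"), sufficient ("!"), and conditional ("if") suffixes described above can be combined in one script. The following is a minimal sketch, not a definitive recipe: the global names tracing and opens are made up for illustration, and the kernel function names do_sys_openat2 and do_sys_open are assumptions about the running kernel.

```systemtap
global tracing = 0
global opens

# Flip the condition every five seconds.
probe timer.s(5) { tracing = !tracing }

# "?" tolerates systems where the alias does not resolve;
# the "if" condition may refer only to global variables.
probe syscall.read ? if (tracing) {
  printf("%s: %s(%s)\n", execname(), name, argstr)
}

# "!" alternatives: the second name is tried only if the first
# fails to resolve. Both function names are assumptions.
probe kernel.function("do_sys_openat2") !,
      kernel.function("do_sys_open") ?
{
  opens++
}

probe end { printf("opens seen: %d\n", opens) }
```

While tracing is zero, the syscall.read handler is skipped entirely; the script runs until interrupted.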
73
These are all syntactically valid probe points. (Any of them may still
be semantically invalid, depending on the contents of the tapsets, and
the versions of kernel/user software installed.)
77
78
79 kernel.function("foo").return
80 process("/bin/vi").statement(0x2222)
81 end
82 syscall.*
83 syscall.*.return.maxactive(10)
84 syscall.{open,close}
85 sys**open
86 kernel.function("no_such_function") ?
87 module("awol").function("no_such_function") !
88 signal.*? if (switch)
89 kprobe.function("foo")
90
91
Probes may be broadly classified into "synchronous" and "asynchronous".
A "synchronous" event is deemed to occur when any processor executes an
instruction matched by the specification. This gives these probes a
reference point (instruction address) from which more contextual data
may be available. Other families of probe points refer to
"asynchronous" events such as timers/counters rolling over, where there
is no related fixed reference point. Each probe point specification
may match multiple locations (for example, using wildcards or aliases),
and all of them are then probed. A probe declaration may also contain
several comma-separated specifications, all of which are probed.
102
103 Brace expansion is a mechanism which allows a list of probe points to
104 be generated. It is very similar to shell expansion. A component may be
105 surrounded by a pair of curly braces to indicate that the comma-sepa‐
106 rated sequence of one or more subcomponents will each constitute a new
107 probe point. The braces may be arbitrarily nested. The ordering of ex‐
108 panded results is based on product order.
109
110 The question mark (?), exclamation mark (!) indicators and probe point
111 conditions may not be placed in any expansions that are before the last
112 component.
113
114 The following is an example of brace expansion.
115
116
117 syscall.{write,read}
118 # Expands to
119 syscall.write, syscall.read
120
121 {kernel,module("nfs")}.function("nfs*")!
122 # Expands to
123 kernel.function("nfs*")!, module("nfs").function("nfs*")!
124
125
126
128 Resolving some probe points requires DWARF debuginfo or "debug symbols"
129 for the specific program being instrumented. For some others, DWARF is
130 automatically synthesized on the fly from source code header files.
131 For others, it is not needed at all. Since a systemtap script may use
132 any mixture of probe points together, the union of their DWARF require‐
133 ments has to be met on the computer where script compilation occurs.
134 (See the --use-server option and the stap-server(8) man page for infor‐
135 mation about the remote compilation facility, which allows these re‐
136 quirements to be met on a different machine.)
137
The following table lists many of the available probe point families,
classified by their need for DWARF debuginfo for the specific program
being probed.
141
142
DWARF                          NON-DWARF                    SYMBOL-TABLE

kernel.function, .statement    kernel.mark                  kernel.function*
module.function, .statement    process.mark, process.plt    module.function*
process.function, .statement   begin, end, error, never     process.function*
process.mark*                  timer
.function.callee               perf
python2, python3               procfs
debuginfod                     kernel.statement.absolute
                               kernel.data
AUTO-GENERATED-DWARF           kprobe.function
kernel.trace                   process.statement.absolute
                               process.begin, .end
                               netfilter
                               java
158
159
Probe types marked with an asterisk (*) are fallbacks, where systemtap
can sometimes infer a subset of the information or substitute for it.
In general, the more symbolic / debugging information is available, the
higher the quality of probing.
164
165
166
168 The following types of probe points may be armed/disarmed on-the-fly to
169 save overheads during uninteresting times. Arming conditions may also
170 be added to other types of probes, but will be treated as a wrapping
171 conditional and won't benefit from overhead savings.
172
173
DISARMABLE                                 exceptions
kernel.function, kernel.statement
module.function, module.statement
process.*.function, process.*.statement
process.*.plt, process.*.mark
timer.*                                    timer.profile
java
181
182
184 BEGIN/END/ERROR
185 The probe points begin and end are defined by the translator to refer
186 to the time of session startup and shutdown. All "begin" probe han‐
187 dlers are run, in some sequence, during the startup of the session.
188 All global variables will have been initialized prior to this point.
189 All "end" probes are run, in some sequence, during the normal shutdown
190 of a session, such as in the aftermath of an exit () function call, or
191 an interruption from the user. In the case of an error-triggered shut‐
192 down, "end" probes are not run. There are no target variables avail‐
193 able in either context.
194
195 If the order of execution among "begin" or "end" probes is significant,
196 then an optional sequence number may be provided:
197
198
199 begin(N)
200 end(N)
201
202
203 The number N may be positive or negative. The probe handlers are run
204 in increasing order, and the order between handlers with the same se‐
205 quence number is unspecified. When "begin" or "end" are given without
206 a sequence, they are effectively sequence zero.
207
The error probe point is similar to the end probe, except that each
such probe handler is run when the session ends after errors have
occurred. In such cases, "end" probes are skipped, but each "error"
probe is still attempted. This kind of probe can be used to clean up
or emit a "final gasp". It may also be numerically parametrized to set
a sequence.
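
As an illustration of sequence numbers, the sketch below records a start time before any unnumbered begin probe runs and reports the elapsed time after the unnumbered end or error probes. The global name start_us is hypothetical; gettimeofday_us() is the standard tapset function.

```systemtap
global start_us

# Sequence -1 runs before the unnumbered (sequence 0) begin probes.
probe begin(-1) { start_us = gettimeofday_us() }

probe begin { println("tracing; press Ctrl-C to stop") }

# Sequence 1 runs after the unnumbered end/error probes.
# Listing both covers normal and error-triggered shutdown.
probe end(1), error(1) {
  printf("session lasted %d us\n", gettimeofday_us() - start_us)
}
```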
214
215
216 NEVER
217 The probe point never is specially defined by the translator to mean
218 "never". Its probe handler is never run, though its statements are an‐
219 alyzed for symbol / type correctness as usual. This probe point may be
220 useful in conjunction with optional probes.
221
222
223 SYSCALL and ND_SYSCALL
224 The syscall.* and nd_syscall.* aliases define several hundred probes,
225 too many to detail here. They are of the general form:
226
227
228 syscall.NAME
229 nd_syscall.NAME
230 syscall.NAME.return
231 nd_syscall.NAME.return
232
233
Generally, a pair of probes is defined for each normal system call
listed in the syscalls(2) manual page, one for entry and one for return.
Those system calls that never return do not have a corresponding
.return probe. The nd_* family of probes is about the same, except that
it uses non-DWARF based searching mechanisms, which may result in a
lower quality of symbolic context data (parameters), and may miss some
system calls. You may want to try the nd_* probes first, in case kernel
debugging information is not immediately available.
242
Each probe alias provides a variety of variables. Looking at the tapset
source code is the most reliable way to find out which ones. Generally,
each argument listed in the system call's standard manual page is made
available as a script-level variable, so syscall.open exposes filename,
flags, and mode. In addition, a standard suite of variables is
available at most aliases:
248
249 argstr A pretty-printed form of the entire argument list, without
250 parentheses.
251
252 name The name of the system call.
253
254 retval For return probes, the raw numeric system-call result.
255
256 retstr For return probes, a pretty-printed string form of the system-
257 call result.
258
259 As usual for probe aliases, these variables are all initialized once
260 from the underlying $context variables, so that later changes to $con‐
261 text variables are not automatically reflected. Not all probe aliases
262 obey all of these general guidelines. Please report any bothersome
263 ones you encounter as a bug. Note that on some kernel/userspace archi‐
264 tecture combinations (e.g., 32-bit userspace on 64-bit kernel), the un‐
265 derlying $context variables may need explicit sign extension / masking.
266 When this is an issue, consider using the tapset-provided variables in‐
267 stead of raw $context variables.
268
269 If debuginfo availability is a problem, you may try using the non-DWARF
270 syscall probe aliases instead. Use the nd_syscall. prefix instead of
271 syscall. The same context variables are available, as far as possible.
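
A short sketch of the standard variables in use follows. It assumes the script is started with stap -c CMD or stap -x PID so that target() names a process of interest, and that the syscall.openat alias resolves on the running kernel.

```systemtap
# Entry probe: name and argstr describe the call being made.
probe syscall.openat {
  if (pid() == target())
    printf("%s(%s)\n", name, argstr)
}

# Return probe: retstr is the pretty-printed result.
probe syscall.openat.return {
  if (pid() == target())
    printf("%s = %s\n", name, retstr)
}
```

Replacing the syscall. prefix with nd_syscall. in both probe points gives the non-DWARF variant of the same script.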
272
273 nd_syscall probes on kernels that use syscall wrappers to pass argu‐
274 ments via pt_regs (currently 4.17+ on x86_64 and 4.19+ on aarch64) sup‐
275 port syscall argument writing when guru mode is enabled. If a probe
276 syscall parameter is modified in the probe body then immediately before
277 the probe exits the parameter's current value will be written to
278 pt_regs. This overwrites the previous value. nd_syscall probes also
279 include two parameters for each of the syscall's string parameters.
280 One holds a quoted version of the string passed to the syscall. The
281 other holds an unquoted version of the string intended to be used when
282 modifying the parameter. If the probe modifies the unquoted string
283 variable then as the probe is about to exit the contents of this vari‐
284 able will be written to the user space buffer passed to the syscall. It
285 is the user's responsibility to ensure that this buffer is large enough
286 to hold the modified string and that it is located in a writable memory
287 segment.
288
289
290 TIMERS
291 There are two main types of timer probes: "jiffies" timer probes and
292 time interval timer probes.
293
294 Intervals defined by the standard kernel "jiffies" timer may be used to
295 trigger probe handlers asynchronously. Two probe point variants are
296 supported by the translator:
297
298
299 timer.jiffies(N)
300 timer.jiffies(N).randomize(M)
301
302
303 The probe handler is run every N jiffies (a kernel-defined unit of
304 time, typically between 1 and 60 ms). If the "randomize" component is
305 given, a linearly distributed random value in the range [-M..+M] is
306 added to N every time the handler is run. N is restricted to a reason‐
307 able range (1 to around a million), and M is restricted to be smaller
308 than N. There are no target variables provided in either context. It
309 is possible for such probes to be run concurrently on a multi-processor
310 computer.
311
312 Alternatively, intervals may be specified in units of time. There are
313 two probe point variants similar to the jiffies timer:
314
315
316 timer.ms(N)
317 timer.ms(N).randomize(M)
318
319
320 Here, N and M are specified in milliseconds, but the full options for
321 units are seconds (s/sec), milliseconds (ms/msec), microseconds
322 (us/usec), nanoseconds (ns/nsec), and hertz (hz). Randomization is not
323 supported for hertz timers.
324
325 The actual resolution of the timers depends on the target kernel. For
326 kernels prior to 2.6.17, timers are limited to jiffies resolution, so
327 intervals are rounded up to the nearest jiffies interval. After
328 2.6.17, the implementation uses hrtimers for tighter precision, though
329 the actual resolution will be arch-dependent. In either case, if the
330 "randomize" component is given, then the random value will be added to
331 the interval before any rounding occurs.
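
The two interval forms can be sketched as follows; the global name ticks is made up for illustration. The first handler fires roughly every 100 ms with up to 20 ms of jitter in either direction, and the second ends the session after five seconds.

```systemtap
global ticks

# N = 100 ms, M = 20 ms; M must be smaller than N.
probe timer.ms(100).randomize(20) { ticks++ }

# Other units use the same shape, e.g. timer.s, timer.us, timer.hz.
probe timer.s(5) {
  printf("%d ticks in about 5 s\n", ticks)
  exit()
}
```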
332
333 Profiling timers are also available to provide probes that execute on
334 all CPUs at the rate of the system tick (CONFIG_HZ) or at a given fre‐
335 quency (hz). On some kernels, this is a one-concurrent-user-only or
336 disabled facility, resulting in error -16 (EBUSY) during probe regis‐
337 tration.
338
339
340 timer.profile.tick
341 timer.profile.freq.hz(N)
342
343
344 Full context information of the interrupted process is available, mak‐
345 ing this probe suitable for a time-based sampling profiler.
346
347 It is recommended to use the tapset probe timer.profile rather than
348 timer.profile.tick. This probe point behaves identically to timer.pro‐
349 file.tick when the underlying functionality is available, and falls
350 back to using perf.sw.cpu_clock on some recent kernels which lack the
351 corresponding profile timer facility.
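
A minimal time-based sampling profiler built on timer.profile might look like the sketch below. The global name samples and the ten-second run time are arbitrary choices made here for illustration.

```systemtap
global samples

# Runs on every CPU at each profiling tick; count by command name.
probe timer.profile { samples[execname()]++ }

probe timer.s(10) { exit() }

# Print the ten most frequently sampled commands, busiest first.
probe end {
  foreach (comm in samples- limit 10)
    printf("%-16s %d\n", comm, samples[comm])
}
```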
352
353 Profiling timers with specified frequencies are only accurate up to
354 around 100 hz. You may need to provide a larger value to achieve the
355 desired rate.
356
357 Note that if a timer probe is set to fire at a very high rate and if
358 the probe body is complex, succeeding timer probes can get skipped,
359 since the time for them to run has already passed. Normally systemtap
360 reports missed probes, but it will not report these skipped probes.
361
362
363 DWARF
364 This family of probe points uses symbolic debugging information for the
365 target kernel/module/program, as may be found in unstripped executa‐
366 bles, or the separate debuginfo packages. They allow placement of
367 probes logically into the execution path of the target program, by
368 specifying a set of points in the source or object code. When a match‐
369 ing statement executes on any processor, the probe handler is run in
370 that context.
371
372 Probe points in the DWARF family can be identified by the target kernel
373 module (or user process), source file, line number, function name, or
374 some combination of these.
375
376 Here is a list of DWARF probe points currently supported:
377
378 kernel.function(PATTERN)
379 kernel.function(PATTERN).call
380 kernel.function(PATTERN).callee(PATTERN)
381 kernel.function(PATTERN).callee(PATTERN).return
382 kernel.function(PATTERN).callee(PATTERN).call
383 kernel.function(PATTERN).callees(DEPTH)
384 kernel.function(PATTERN).return
385 kernel.function(PATTERN).inline
386 kernel.function(PATTERN).label(LPATTERN)
387 module(MPATTERN).function(PATTERN)
388 module(MPATTERN).function(PATTERN).call
389 module(MPATTERN).function(PATTERN).callee(PATTERN)
390 module(MPATTERN).function(PATTERN).callee(PATTERN).return
391 module(MPATTERN).function(PATTERN).callee(PATTERN).call
392 module(MPATTERN).function(PATTERN).callees(DEPTH)
393 module(MPATTERN).function(PATTERN).return
394 module(MPATTERN).function(PATTERN).inline
395 module(MPATTERN).function(PATTERN).label(LPATTERN)
396 kernel.statement(PATTERN)
397 kernel.statement(PATTERN).nearest
398 kernel.statement(ADDRESS).absolute
399 module(MPATTERN).statement(PATTERN)
400 process("PATH").function("NAME")
401 process("PATH").statement("*@FILE.c:123")
402 process("PATH").library("PATH").function("NAME")
403 process("PATH").library("PATH").statement("*@FILE.c:123")
404 process("PATH").library("PATH").statement("*@FILE.c:123").nearest
405 process("PATH").function("*").return
406 process("PATH").function("myfun").label("foo")
407 process("PATH").function("foo").callee("bar")
408 process("PATH").function("foo").callee("bar").return
409 process("PATH").function("foo").callee("bar").call
410 process("PATH").function("foo").callees(DEPTH)
411 process(PID).function("NAME")
412 process(PID).function("myfun").label("foo")
413 process(PID).plt("NAME")
414 process(PID).plt("NAME").return
415 process(PID).statement("*@FILE.c:123")
416 process(PID).statement("*@FILE.c:123").nearest
417 process(PID).statement(ADDRESS).absolute
418 debuginfod.process("PATH").**
419
420 (See the USER-SPACE section below for more information on the process
421 probes.)
422
423 The list above includes multiple variants and modifiers which provide
424 additional functionality or filters. They are:
425
426 .function
427 Places a probe near the beginning of the named function,
428 so that parameters are available as context variables.
429
430 .return
431 Places a probe at the moment after the return from the
432 named function, so the return value is available as the
433 "$return" context variable.
434
435 .inline
436 Filters the results to include only instances of inlined
437 functions. Note that inlined functions do not have an
438 identifiable return point, so .return is not supported on
439 .inline probes.
440
441 .call Filters the results to include only non-inlined functions
442 (the opposite set of .inline)
443
444 .exported
445 Filters the results to include only exported functions.
446
447 .statement
448 Places a probe at the exact spot, exposing those local
449 variables that are visible there.
450
451 .statement.nearest
452 Places a probe at the nearest available line number for
453 each line number given in the statement.
454
455 .callee
456 Places a probe on the callee function given in the
457 .callee modifier, where the callee must be a function
458 called by the target function given in .function. The ad‐
459 vantage of doing this over directly probing the callee
460 function is that this probe point is run only when the
461 callee is called from the target function (add the
462 -DSTAP_CALLEE_MATCHALL directive to override this when
463 calling stap(1)).
464
465 Note that only callees that can be statically determined
466 are available. For example, calls through function
467 pointers are not available. Additionally, calls to func‐
468 tions located in other objects (e.g. libraries) are not
469 available (instead use another probe point). This feature
470 will only work for code compiled with GCC 4.7+.
471
472 .callees
473 Shortcut for .callee("*"), which places a probe on all
474 callees of the function.
475
476 .callees(DEPTH)
477 Recursively places probes on callees. For example,
478 .callees(2) will probe both callees of the target func‐
479 tion, as well as callees of those callees. And
480 .callees(3) goes one level deeper, etc... A callee probe
481 at depth N is only triggered when the N callers in the
482 callstack match those that were statically determined
483 during analysis (this also may be overridden using
484 -DSTAP_CALLEE_MATCHALL).
485
In the above list of probe points, MPATTERN stands for a string literal
that aims to identify the loaded kernel module of interest. For in-tree
kernel modules, the name suffices (e.g. "btrfs"). The name may also
include the "*", "[]", and "?" wildcards to match multiple in-tree
modules. Out-of-tree modules are also supported by specifying the full
path to the .ko file; wildcards are not supported in such a path. The
file must follow the convention of being named <module_name>.ko
(the characters ',' and '-' are replaced by '_').
494
495 LPATTERN stands for a source program label. It may also contain "*",
496 "[]", and "?" wildcards. PATTERN stands for a string literal that aims
497 to identify a point in the program. It is made up of three parts:
498
499 • The first part is the name of a function, as would appear in the nm
500 program's output. This part may use the "*" and "?" wildcarding
501 operators to match multiple names.
502
503 • The second part is optional and begins with the "@" character. It
504 is followed by the path to the source file containing the function,
505 which may include a wildcard pattern, such as mm/slab*. If it does
506 not match as is, an implicit "*/" is optionally added before the
507 pattern, so that a script need only name the last few components of
508 a possibly long source directory path.
509
510 • Finally, the third part is optional if the file name part was giv‐
511 en, and identifies the line number in the source file preceded by a
512 ":" or a "+". The line number is assumed to be an absolute line
513 number if preceded by a ":", or relative to the declaration line of
514 the function if preceded by a "+". All the lines in the function
515 can be matched with ":*". A range of lines x through y can be
516 matched with ":x-y". Ranges and specific lines can be mixed using
517 commas, e.g. ":x,y-z".
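
The three parts of PATTERN can be sketched with the probes below. They assume kernel debuginfo is installed and that vfs_read is defined in fs/read_write.c on the target kernel; the global name hits is hypothetical.

```systemtap
global hits

# Part 1 only: a function name with a wildcard.
probe kernel.function("vfs_r*").call { hits[pp()]++ }

# Parts 1 and 2: function name plus the tail of a source path.
probe kernel.function("vfs_read@fs/read_write.c") { hits[pp()]++ }

# All three parts: ":*" matches every line of the function.
probe kernel.statement("vfs_read@fs/read_write.c:*") { hits[pp()]++ }

probe timer.s(5) { exit() }

# pp() is the fully resolved probe point, so each match is listed.
probe end {
  foreach (p in hits-)
    printf("%6d %s\n", hits[p], p)
}
```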
518
519 As an alternative, PATTERN may be a numeric constant, indicating an ad‐
520 dress. Such an address may be found from symbol tables of the appro‐
521 priate kernel / module object file. It is verified against known
522 statement code boundaries, and will be relocated for use at run time.
523
524 In guru mode only, absolute kernel-space addresses may be specified
525 with the ".absolute" suffix. Such an address is considered already re‐
526 located, as if it came from /proc/kallsyms, so it cannot be checked
527 against statement/instruction boundaries.
528
529
530 CONTEXT VARIABLES
531 Many of the source-level context variables, such as function parame‐
532 ters, locals, globals visible in the compilation unit, may be visible
533 to probe handlers. They may refer to these variables by prefixing
534 their name with "$" within the scripts. In addition, a special syntax
535 allows limited traversal of structures, pointers, and arrays. More
536 syntax allows pretty-printing of individual variables or their groups.
537 See also @cast. Note that variables may be inaccessible due to them
538 being paged out, or for a few other reasons. See also man er‐
539 ror::fault(7stap).
540
541
542 Functions called from DWARF class probe points and from process.mark
543 probes may also refer to context variables.
544
545
546 $var refers to an in-scope variable or thread local storage variable
547 "var". If it's an integer-like type, it will be cast to a
548 64-bit int for systemtap script use. String-like pointers (char
549 *) may be copied to systemtap string values using the ker‐
550 nel_string or user_string functions.
551
552 @var("varname")
553 an alternative syntax for $varname
554
555 @var("varname","module")
556 The global variable or global thread local storage variable in
557 scope of the given module already loaded into the current probed
558 process. Useful to get an exported variable in a shared library
559 loaded into the process being probed, or a global variable in a
560 process while a shared library probe is being executed. For us‐
561 er-space modules only. For example: @var("_r_debug","/lib/ld-
562 linux.so.2")
563
@var("varname@src/file.c")
refers to the global (either file local or external) variable
varname defined when the file src/file.c was compiled. The CU in
which the variable is resolved is the first CU in the module of
the probe point which matches the given file name at the end and
has the shortest file name path (e.g. given
@var("foo@bar/baz.c") and CUs with file name paths
src/sub/module/bar/baz.c and src/bar/baz.c, the second CU will be
chosen to resolve the (file) global variable foo).
573
574
575 @var("varname@src/file.c","module")
576 The global variable in scope of the given CU, defined in the
577 given module, even if the variable is static (so the name is not
578 unique without the CU name).
579
580
581 $var->field traversal via a structure's or a pointer's field. This
582 generalized indirection operator may be repeated to follow more
583 levels. Note that the . operator is not used for plain struc‐
584 ture members, only -> for both purposes. (This is because "."
585 is reserved for string concatenation.) Also note that for direct
586 dereferencing of $var pointer {kernel,user}_{char,int,...}($var)
587 should be used. (Refer to stapfuncs(5) for more details.)
588
589 $return
590 is available in return probes only for functions that are de‐
591 clared with a return value, which can be determined using @de‐
592 fined($return).
593
$var[N]
indexes into an array. The index may be given as a literal number or
as an arbitrary numeric expression.
597
598 A number of operators exist for such basic context variable expres‐
599 sions:
600
601 $$vars expands to a character string that is equivalent to
602
603 sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
604 parm1, ..., parmN, var1, ..., varN)
605
606 for each variable in scope at the probe point. Some values may
607 be printed as =? if their run-time location cannot be found.
608
609 $$locals
610 expands to a subset of $$vars for only local variables.
611
612 $$parms
613 expands to a subset of $$vars for only function parameters.
614
615 $$return
616 is available in return probes only. It expands to a string that
617 is equivalent to sprintf("return=%x", $return) if the probed
618 function has a return value, or else an empty string.
619
620 & $EXPR
621 expands to the address of the given context variable expression,
622 if it is addressable.
623
@defined($EXPR)
expands to 1 if the given context variable expression is resolvable,
and to 0 otherwise, for use in conditionals such as
627
628 @defined($foo->bar) ? $foo->bar : 0
629
630
631 @probewrite($VAR)
632 see the PROBES section of stap(1).
633
634 $EXPR$ expands to a string with all of $EXPR's members, equivalent to
635
636 sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}",
637 $EXPR->a, $EXPR->b)
638
639
$EXPR$$
expands to a string with all of $EXPR's members and submembers,
equivalent to
643
644 sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}",
645 $EXPR->a, $EXPR->b, $EXPR->c->x, $EXPR->c->y, $EXPR->d[0])
646
647
648 @errno expands to the last value the C library global variable errno
649 was set to.
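
Several of the context variable forms above can be seen together in one handler. The sketch assumes kernel debuginfo is available and that vfs_read has parameters named file and count, with an f_flags member in struct file, as in common Linux kernels; other kernels may differ.

```systemtap
probe kernel.function("vfs_read") {
  # $$parms: all function parameters as one string.
  printf("%s(%s)\n", ppfunc(), $$parms)

  # -> traversal, guarded by @defined in case the member is absent.
  flags = @defined($file->f_flags) ? $file->f_flags : 0
  printf("  f_flags=0x%x count=%d\n", flags, $count)

  # $EXPR$: pretty-print the members of the pointed-to structure.
  printf("  %s\n", $file$)

  exit()  # one hit is enough for a demonstration
}
```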
650
651
652 MORE ON RETURN PROBES
653 For the kernel ".return" probes, only a certain fixed number of returns
654 may be outstanding. The default is a relatively small number, on the
655 order of a few times the number of physical CPUs. If many different
656 threads concurrently call the same blocking function, such as futex(2)
657 or read(2), this limit could be exceeded, and skipped "kretprobes"
658 would be reported by "stap -t". To work around this, specify a
659
660 probe FOO.return.maxactive(NNN)
661
662 suffix, with a large enough NNN to cover all expected concurrently
663 blocked threads. Alternately, use the
664
665 stap -DKRETACTIVE=NNNN
666
667 stap command line macro setting to override the default for all ".re‐
668 turn" probes.
669
670
671 For ".return" probes, context variables other than the "$return" may be
672 accessible, as a convenience for a script programmer wishing to access
673 function parameters. These values are snapshots taken at the time of
674 function entry. (Local variables within the function are not generally
675 accessible, since those variables did not exist in allocated/initial‐
676 ized form at the snapshot moment.) These entry-snapshot variables
677 should be accessed via @entry($var).
678
679 In addition, arbitrary entry-time expressions can also be saved for
680 ".return" probes using the @entry(expr) operator. For example, one can
681 compute the elapsed time of a function:
682
probe kernel.function("do_filp_open").return {
    println( gettimeofday_us() - @entry(gettimeofday_us()) )
}
686
687
688
689 The following table summarizes how values related to a function parame‐
690 ter context variable, a pointer named addr, may be accessed from a .re‐
691 turn probe.
692
at-entry value       past-exit value

$addr                not available
$addr->x->y          @cast(@entry($addr),"struct zz")->x->y
$addr[0]             {kernel,user}_{char,int,...}(& $addr[0])
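
As a concrete case of reading entry-time values in a .return probe, the sketch below compares the requested byte count with the result. It assumes vfs_read has a parameter named count on the target kernel.

```systemtap
probe kernel.function("vfs_read").return {
  # @entry($count) is the snapshot taken at function entry;
  # $return is the value the function is returning now.
  printf("%s asked for %d bytes, got %d\n",
         execname(), @entry($count), $return)
  exit()
}
```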
698
699
700
701 DWARFLESS
In the absence of debugging information, entry and exit points of kernel
and module functions can be probed using the "kprobe" family of probes.
However, these do not permit looking up the arguments or local variables
of the function. The following constructs are supported:
706
707 kprobe.function(FUNCTION)
708 kprobe.function(FUNCTION).call
709 kprobe.function(FUNCTION).return
710 kprobe.module(NAME).function(FUNCTION)
711 kprobe.module(NAME).function(FUNCTION).call
712 kprobe.module(NAME).function(FUNCTION).return
713 kprobe.statement(ADDRESS).absolute
714
715
716 Probes of type function are recommended for kernel functions, whereas
717 probes of type module are recommended for probing functions of the
718 specified module. In case the absolute address of a kernel or module
719 function is known, statement probes can be utilized.
720
Note that FUNCTION and module NAME arguments must not contain wildcards,
or the probe will not be registered. Also, statement probes may be used
only in guru mode.
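
Because no arguments or locals are available, kprobe probes are best suited to counting and timing. The sketch below counts entries and returns of one function without debuginfo; the function name vfs_read and the global names are assumptions made for illustration.

```systemtap
global entered, returned

# No wildcards are allowed in the function name.
probe kprobe.function("vfs_read") { entered[execname()]++ }
probe kprobe.function("vfs_read").return { returned[execname()]++ }

probe timer.s(5) { exit() }

probe end {
  foreach (comm in entered- limit 10)
    printf("%-16s entered=%d returned=%d\n",
           comm, entered[comm], returned[comm])
}
```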
724
725
726
727 USER-SPACE
Support for user-space probing is available for kernels that are
configured with the utrace extensions, or that have the uprobes facility
(Linux 3.5 and later). (Various kernel build configuration options need
to be enabled; systemtap will advise if these are missing.)
732
733
734 There are several forms. First, a non-symbolic probe point:
735
736 process(PID).statement(ADDRESS).absolute
737
738 is analogous to kernel.statement(ADDRESS).absolute in that both use raw
739 (unverified) virtual addresses and provide no $variables. The target
740 PID parameter must identify a running process, and ADDRESS should iden‐
741 tify a valid instruction address. All threads of that process will be
742 probed.
743
744 Second, non-symbolic user-kernel interface events handled by utrace may
745 be probed:
746
747 process(PID).begin
748 process("FULLPATH").begin
749 process.begin
750 process(PID).thread.begin
751 process("FULLPATH").thread.begin
752 process.thread.begin
753 process(PID).end
754 process("FULLPATH").end
755 process.end
756 process(PID).thread.end
757 process("FULLPATH").thread.end
758 process.thread.end
759 process(PID).syscall
760 process("FULLPATH").syscall
761 process.syscall
762 process(PID).syscall.return
763 process("FULLPATH").syscall.return
764 process.syscall.return
765
766
767
768 A process.begin probe gets called when a new process described by PID or
769 FULLPATH gets created. In addition, it is called once from the context
770 of each preexisting process, at systemtap script startup. This is use‐
771 ful to track live processes. A process.thread.begin probe gets called
772 when a new thread described by PID or FULLPATH gets created. A
773 process.end probe gets called when a process described by PID or FULLPATH
774 dies. A process.thread.end probe gets called when a thread described
775 by PID or FULLPATH dies. A process.syscall probe gets called when a
776 thread described by PID or FULLPATH makes a system call. The system
777 call number is available in the $syscall context variable, and the
778 first 6 arguments of the system call are available in the $argN (e.g.
779 $arg1, $arg2, ...) context variables. A process.syscall.return probe
780 gets called when a thread described by PID or FULLPATH returns from a
781 system call. The system call number is available in the $syscall con‐
782 text variable, and the return value of the system call is available in
783 the $return context variable.
784
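The events described above might be combined as in the following sketch, which logs process lifetime and system call activity. It is meant to be run with the -c or -x option so that the unqualified process probes are restricted to the target process hierarchy; without those options every user thread would be traced.

```systemtap
# Sketch only: intended for use with "stap -c CMD" or "stap -x PID".
probe process.begin {
  printf("begin  %s(%d)\n", execname(), pid())
}

probe process.end {
  printf("end    %s(%d)\n", execname(), pid())
}

probe process.syscall {
  # $syscall is the system call number; $arg1 is its first argument.
  printf("%s(%d) syscall %d arg1=0x%x\n", execname(), pid(), $syscall, $arg1)
}

probe process.syscall.return {
  printf("%s(%d) syscall %d returned %d\n", execname(), pid(), $syscall, $return)
}
```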
785
786 If a process probe is specified without a PID or FULLPATH, all user
787 threads will be probed. However, if systemtap was invoked with the -c
788 or -x options, then process probes are restricted to the process hier‐
789 archy associated with the target process. If a process probe is given
790 without a PID or FULLPATH but systemtap was invoked with -c, the path
791 of the -c command is heuristically filled in as the process PATH.
792 In that case, only command parameters are allowed in the -c command
793 (i.e. no command substitution and no occurrences of any of
794 these characters: '|&;<>(){}').
795
796
797 Third, symbolic static instrumentation compiled into programs and
798 shared libraries may be probed:
799
800 process("PATH").mark("LABEL")
801 process("PATH").provider("PROVIDER").mark("LABEL")
802 process(PID).mark("LABEL")
803 process(PID).provider("PROVIDER").mark("LABEL")
804
805
806 A .mark probe gets called via a static probe which is defined in the
807 application by STAP_PROBE1(PROVIDER,LABEL,arg1), a macro defined in
808 sys/sdt.h. The PROVIDER is an arbitrary application identifier, LABEL
809 is the marker site identifier, and arg1 is the integer-typed
810 argument. STAP_PROBE1 is used for probes with 1 argument, STAP_PROBE2
811 is used for probes with 2 arguments, and so on. The arguments of the
812 probe are available in the context variables $arg1, $arg2, ... An al‐
813 ternative to using the STAP_PROBE macros is to use the dtrace script to
814 create custom macros. Additionally, the variables $$name and
815 $$provider are available as parts of the probe point name. The
816 sys/sdt.h macro names DTRACE_PROBE* are available as aliases for
817 STAP_PROBE*.
818
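A script attaching to such a marker could look like the sketch below. The executable path /path/to/myapp, the provider name myapp, and the marker name request_start are hypothetical; the sketch assumes the application contains a matching STAP_PROBE1(myapp, request_start, id) call site.

```systemtap
# Sketch only: path, provider and marker names are hypothetical.
probe process("/path/to/myapp").provider("myapp").mark("request_start") {
  # $$provider and $$name echo the probe point; $arg1 is the marker's argument.
  printf("%s:%s fired, arg1=%d\n", $$provider, $$name, $arg1)
}
```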
819
820 Finally, full symbolic source-level probes in user-space programs and
821 shared libraries are supported. These are exactly analogous to the
822 symbolic DWARF-based kernel/module probes described above. They expose
823 the same sorts of context $variables for function parameters, local
824 variables, and so on.
825
826 process("PATH").function("NAME")
827 process("PATH").statement("*@FILE.c:123")
828 process("PATH").plt("NAME")
829 process("PATH").library("PATH").plt("NAME")
830 process("PATH").library("PATH").function("NAME")
831 process("PATH").library("PATH").statement("*@FILE.c:123")
832 process("PATH").function("*").return
833 process("PATH").function("myfun").label("foo")
834 process("PATH").function("foo").callee("bar")
835 process("PATH").plt("NAME").return
836 debuginfod.process("PATH").**
837 process(PID).function("NAME")
838 process(PID).statement("*@FILE.c:123")
839 process(PID).plt("NAME")
840
841
842
843 Note that for all process probes, PATH names refer to executables that
844 are searched the same way shells do: relative to the working directory
845 if they contain a "/" character, otherwise in $PATH. If PATH names re‐
846 fer to scripts, the actual interpreters (specified in the script in the
847 first line after the #! characters) are probed. In the debuginfod
848 probe family PATH names likewise refer to executables, but are searched
849 for in the currently defined $DEBUGINFOD_URLS.
850
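The symbolic forms listed above can be sketched as follows. The target /bin/ls is an assumption, and the script only resolves if debugging information for that executable is available to systemtap.

```systemtap
# Sketch only: assumes debuginfo for /bin/ls is installed.
probe process("/bin/ls").function("main") {
  # $$parms expands to a string of the function's parameters.
  printf("main called: %s\n", $$parms)
}

probe process("/bin/ls").function("*").return {
  # ppfunc() reports the function named by the probe point.
  printf("return from %s\n", ppfunc())
}
```

Running this with stap -c "ls" restricts the probes to the launched command.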
851
852
853 Tapset process probes placed in the special directory $pre‐
854 fix/share/systemtap/tapset/PATH/ with relative paths will have their
855 process parameter prefixed with the location of the tapset. For exam‐
856 ple,
857
858
859 process("foo").function("NAME")
860
861
862 expands to
863
864 process("/usr/bin/foo").function("NAME")
865
866
867
868 when placed in $prefix/share/systemtap/tapset/PATH/usr/bin/
869
870
871 If PATH is a process component parameter referring to shared libraries
872 then all processes that map it at runtime would be selected for prob‐
873 ing. If PATH is a library component parameter referring to shared li‐
874 braries then the process specified by the process component would be
875 selected. Note that the PATH pattern in a library component will al‐
876 ways apply to libraries statically determined to be in use by the
877 process. However, you may also specify the full path to any library
878 file even if not statically needed by the process.
879
880
881 A .plt probe will probe functions in the program linkage table corre‐
882 sponding to the rest of the probe point. .plt can be specified as a
883 shorthand for .plt("*"). The symbol name is available as a $$name con‐
884 text variable; function arguments are not available, since PLTs are
885 processed without debuginfo. A .plt.return probe places a probe at the
886 moment after the return from the named function.
887
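A minimal sketch of a .plt probe follows; it tallies calls made through the program linkage table of an assumed target, /bin/ls, and prints the totals at shutdown. Only $$name is used, since function arguments are not available in .plt probes.

```systemtap
# Sketch only: /bin/ls is an assumed target executable.
global plt_calls

probe process("/bin/ls").plt {
  # $$name holds the symbol name of the PLT entry being called.
  plt_calls[$$name]++
}

probe end {
  foreach (sym in plt_calls- limit 20)
    printf("%-24s %d\n", sym, plt_calls[sym])
}
```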
888
889 If the PATH string contains wildcards as in the MPATTERN case, then
890 standard globbing is performed to find all matching paths. In this
891 case, the $PATH environment variable is not used.
892
893
894 If systemtap was invoked with the -c or -x options, then process probes
895 are restricted to the process hierarchy associated with the target
896 process.
897
898
899 DEBUGINFOD
900 These probes take the form
901
902 debuginfod.process("PATH").**
903
904
905 They are very similar to the process("PATH").** probe family. The key
906 difference is that the process probes search for PATH in the host
907 filesystem, while debuginfod probes search the current federation of
908 debuginfod servers, using the currently defined $DEBUGINFOD_URLS (see
909 debuginfod(8) ).
910
911
912 To probe the contents of one or more ELF/archive files, or directories
913 containing such files, start a debuginfod server that scans those paths
914 and prepares the ELF files for systemtap, then point stap at the server
915 through DEBUGINFOD_URLS:
916
917 $ debuginfod [options] [-F -R -Z etc.] /path1 /path2
918 $ env DEBUGINFOD_URLS=http://localhost:8002/ stap ...
919
920
921
922 JAVA
923 Support for probing Java methods is available using Byteman as a back‐
924 end. Byteman is an instrumentation tool from the JBoss project which
925 systemtap can use to monitor invocations for a specific method or line
926 in a Java program.
927
928 Systemtap does so by generating a Byteman script listing the probes to
929 instrument and then invoking the Byteman bminstall utility.
930
931 This Java instrumentation support is currently a prototype feature with
932 major limitations. Moreover, Java probing currently does not work
933 across users; the stap script must run (with appropriate permissions)
934 under the same user as the Java process being probed. (Thus a stap
935 script under root currently cannot probe Java methods in a non-root-us‐
936 er Java process.)
937
938
939 The first probe type refers to Java processes by the name of the Java
940 process:
941
942 java("PNAME").class("CLASSNAME").method("PATTERN")
943 java("PNAME").class("CLASSNAME").method("PATTERN").return
944
945 The PNAME argument must name a pre-existing JVM process, and be
946 identifiable via a jps listing.
947
948 The PATTERN parameter specifies the signature of the Java method to
949 probe. The signature must consist of the exact name of the method, fol‐
950 lowed by a bracketed list of the types of the arguments, for instance
951 "myMethod(int,double,Foo)". Wildcards are not supported.
952
953 The probe can be set to trigger at a specific line within the method by
954 appending a line number with colon, just as in other types of probes:
955 "myMethod(int,double,Foo):245".
956
957 The CLASSNAME parameter identifies the Java class the method belongs
958 to, either with or without the package qualification. By default, the
959 probe only triggers on descendants of the class that do not override
960 the method definition of the original class. However, CLASSNAME can
961 take an optional caret prefix, as in ^org.my.MyClass, which specifies
962 that the probe should also trigger on all descendants of MyClass that
963 override the original method. For instance, every method with signature
964 foo(int) in program org.my.MyApp can be probed at once using
965
966 java("org.my.MyApp").class("^java.lang.Object").method("foo(int)")
967
968
969 The second probe type works analogously, but refers to Java processes
970 by PID:
971
972 java(PID).class("CLASSNAME").method("PATTERN")
973 java(PID).class("CLASSNAME").method("PATTERN").return
974
975 (PIDs for an already running process can be obtained using the jps(1)
976 utility.)
977
978 Context variables defined within java probes include $arg1 through
979 $arg10 (for up to the first 10 arguments of a method), represented as
980 character-pointers for the toString() form of each actual argument.
981 The arg1 through arg10 script variables provide access to these as or‐
982 dinary strings, fetched via user_string_warn().
983
984 Prior to systemtap version 3.1, $arg1 through $arg10 could contain ei‐
985 ther integers or character pointers, depending on the types of the ob‐
986 jects being passed to each particular java method. This previous be‐
987 haviour may be invoked with the stap --compatible=3.0 flag.
988
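Putting these pieces together, a Java probe pair might be written as in the sketch below. The process name org.my.MyApp, the class org.my.MyClass, and the method signature foo(int) are the illustrative names used earlier in this section, not real classes. The handler reads the first argument through the arg1 script variable.

```systemtap
# Sketch only: process, class and method names are illustrative.
probe java("org.my.MyApp").class("^org.my.MyClass").method("foo(int)") {
  # arg1 is the toString() form of the first argument, as a string.
  printf("foo called with %s\n", arg1)
}

probe java("org.my.MyApp").class("^org.my.MyClass").method("foo(int)").return {
  printf("foo returned\n")
}
```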
989
990 PROCFS
991 These probe points allow procfs "files" in /proc/systemtap/MODNAME to
992 be created, read and written using a permission that may be modified
993 using the proper umask value. Default permissions are 0400 for read
994 probes, and 0200 for write probes. If both a read and write probe are
995 being used on the same file, a default permission of 0600 will be used.
996 Using procfs.umask(0040).read would result in a 0404 permission set for
997 the file. (MODNAME is the name of the systemtap module). The proc
998 filesystem is a pseudo-filesystem which is used as an interface to ker‐
999 nel data structures. There are several probe point variants supported
1000 by the translator:
1001
1002
1003 procfs("PATH").read
1004 procfs("PATH").umask(UMASK).read
1005 procfs("PATH").read.maxsize(MAXSIZE)
1006 procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE)
1007 procfs("PATH").write
1008 procfs("PATH").umask(UMASK).write
1009 procfs.read
1010 procfs.umask(UMASK).read
1011 procfs.read.maxsize(MAXSIZE)
1012 procfs.umask(UMASK).read.maxsize(MAXSIZE)
1013 procfs.write
1014 procfs.umask(UMASK).write
1015
1016
1017 Note that there are a few differences when procfs probes are used in
1018 the stapbpf runtime. FIFO special files are used instead of proc
1019 filesystem files. These files are created in /var/tmp/systemtap-US‐
1020 ER/MODNAME. (USER is the name of the user). Additionally, users can‐
1021 not create both read and write probes on the same file.
1022
1023 PATH is the file name (relative to /proc/systemtap/MODNAME or
1024 /var/tmp/systemtap-USER/MODNAME) to be created. If no PATH is speci‐
1025 fied (as in the last six variants above), PATH defaults to "command".
1026 The file name "__stdin" is used internally by systemtap for input
1027 probes and should not be used as a PATH for procfs probes; see the in‐
1028 put probe section below.
1029
1030 When a user reads /proc/systemtap/MODNAME/PATH (normal runtime) or
1031 /var/tmp/systemtap-USER/MODNAME/PATH (stapbpf runtime), the corresponding
1032 procfs read probe is triggered. The string data to be read should be
1033 assigned to a variable named $value, like this:
1034
1035
1036 probe procfs("PATH").read { $value = "100\n" }
1037
1038
1039 When a user writes into /proc/systemtap/MODNAME/PATH (normal runtime)
1040 or /var/tmp/systemtap-USER/MODNAME/PATH (stapbpf runtime), the corresponding
1041 procfs write probe is triggered. The data the user wrote is available
1042 in the string variable named $value, like this:
1043
1044
1045 probe procfs("PATH").write { printf("user wrote: %s", $value) }
1046
1047
1048 MAXSIZE is the size of the procfs read buffer. Specifying MAXSIZE al‐
1049 lows larger procfs output. If no MAXSIZE is specified, the procfs read
1050 buffer defaults to STP_PROCFS_BUFSIZE (which defaults to MAXSTRINGLEN,
1051 the maximum length of a string). If setting the procfs read buffers
1052 for more than one file is needed, it may be easiest to override the
1053 STP_PROCFS_BUFSIZE definition. Here's an example of using MAXSIZE:
1054
1055
1056 probe procfs.read.maxsize(1024) {
1057 $value = "long string..."
1058 $value .= "another long string..."
1059 $value .= "another long string..."
1060 $value .= "another long string..."
1061 }
1062
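As a sketch of how read and write probes can cooperate on one file under the normal (kernel module) runtime, the script below exposes a tunable value. The file name threshold and the initial value 100 are assumptions. This pairing would not work under stapbpf, which does not allow read and write probes on the same file.

```systemtap
# Sketch only: the file name "threshold" is hypothetical.
global threshold = 100

probe procfs("threshold").read {
  # Reading the file returns the current value.
  $value = sprintf("%d\n", threshold)
}

probe procfs("threshold").write {
  # Writing a decimal number to the file updates the value.
  threshold = strtol($value, 10)
  printf("threshold set to %d\n", threshold)
}
```

While the script runs, the value can be read or changed through /proc/systemtap/MODNAME/threshold.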
1063
1064
1065 INPUT
1066 These probe points make input from stdin available to the script during
1067 runtime. The translator currently supports two variants of this fami‐
1068 ly:
1069
1070 input.char
1071 input.line
1072
1073
1074 input.char is triggered each time a character is read from stdin. The
1075 current character is available in the string variable named char.
1076 There is no newline buffering; the next character is read from stdin as
1077 soon as it becomes available.
1078
1079 input.line causes all characters read from stdin to be buffered until a
1080 newline is read, at which point the probe will be triggered. The cur‐
1081 rent line of characters (including the newline) is made available in a
1082 string variable named line. Note that no more than MAXSTRINGLEN char‐
1083 acters will be buffered. Any additional characters will not be included
1084 in line.
1085
1086
1087 Input probes are aliases for procfs("__stdin").write. Systemtap recon‐
1088 figures stdin if the presence of this procfs probe is detected, there‐
1089 fore "__stdin" should not be used as a path argument for procfs probes.
1090 Additionally, input probes will not work with the -F and --remote op‐
1091 tions.
1092
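A small sketch of line-oriented input handling follows. The "quit" command is an invented convention for this example; note that the comparison includes the trailing newline, since line contains it.

```systemtap
# Sketch only: "quit" is an assumed command word, not part of systemtap.
global lines

probe input.line {
  if (line == "quit\n")
    exit()
  lines++
  # line already ends in a newline, so none is added here.
  printf("%d: %s", lines, line)
}
```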
1093
1094 NETFILTER HOOKS
1095 These probe points allow observation of network packets using the net‐
1096 filter mechanism. A netfilter probe in systemtap corresponds to a net‐
1097 filter hook function in the original netfilter probes API. It is proba‐
1098 bly more convenient to use tapset::netfilter(3stap), which wraps the
1099 primitive netfilter hooks and does the work of extracting useful infor‐
1100 mation from the context variables.
1101
1102
1103 There are several probe point variants supported by the translator:
1104
1105
1106 netfilter.hook("HOOKNAME").pf("PROTOCOL_F")
1107 netfilter.pf("PROTOCOL_F").hook("HOOKNAME")
1108 netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY")
1109 netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY")
1110
1111
1112
1113 PROTOCOL_F is the protocol family to listen for, currently one of NF‐
1114 PROTO_IPV4, NFPROTO_IPV6, NFPROTO_ARP, or NFPROTO_BRIDGE.
1115
1116
1117 HOOKNAME is the point, or 'hook', in the protocol stack at which to in‐
1118 tercept the packet. The available hook names for each protocol family
1119 are taken from the kernel header files <linux/netfilter_ipv4.h>, <lin‐
1120 ux/netfilter_ipv6.h>, <linux/netfilter_arp.h> and <linux/netfil‐
1121 ter_bridge.h>. For instance, allowable hook names for NFPROTO_IPV4 are
1122 NF_INET_PRE_ROUTING, NF_INET_LOCAL_IN, NF_INET_FORWARD, NF_INET_LO‐
1123 CAL_OUT, and NF_INET_POST_ROUTING.
1124
1125
1126 PRIORITY is an integer priority giving the order in which the probe
1127 point should be triggered relative to any other netfilter hook func‐
1128 tions which trigger on the same packet. Hook functions execute on each
1129 packet in order from smallest priority number to largest priority num‐
1130 ber. If no PRIORITY is specified (as in the first two probe point vari‐
1131 ants above), PRIORITY defaults to "0".
1132
1133 There are a number of predefined priority names of the form NF_IP_PRI_*
1134 and NF_IP6_PRI_* which are defined in the kernel header files <lin‐
1135 ux/netfilter_ipv4.h> and <linux/netfilter_ipv6.h> respectively. The
1136 script is permitted to use these instead of specifying an integer pri‐
1137 ority. (The probe points for NFPROTO_ARP and NFPROTO_BRIDGE currently
1138 do not expose any named hook priorities to the script writer.) Thus,
1139 allowable ways to specify the priority include:
1140
1141
1142 priority("255")
1143 priority("NF_IP_PRI_SELINUX_LAST")
1144
1145
1146 A script using guru mode is permitted to specify any identifier or num‐
1147 ber as the parameter for hook, pf, and priority. This feature should be
1148 used with caution, as the parameter is inserted verbatim into the C
1149 code generated by systemtap.
1150
1151 The netfilter probe points define the following context variables:
1152
1153 $hooknum
1154 The hook number.
1155
1156 $skb The address of the sk_buff struct representing the packet. See
1157 <linux/skbuff.h> for details on how to use this struct, or al‐
1158 ternatively use the tapset tapset::netfilter(3stap) for easy ac‐
1159 cess to key information.
1160
1161
1162 $in The address of the net_device struct representing the network
1163 device on which the packet was received (if any). May be 0 if
1164 the device is unknown or undefined at that stage in the protocol
1165 stack.
1166
1167
1168 $out The address of the net_device struct representing the network
1169 device on which the packet will be sent (if any). May be 0 if
1170 the device is unknown or undefined at that stage in the protocol
1171 stack.
1172
1173
1174 $verdict
1175 (Guru mode only.) Assigning one of the verdict values defined in
1176 <linux/netfilter.h> to this variable alters the further progress
1177 of the packet through the protocol stack. For instance, the fol‐
1178 lowing guru mode script forces all ipv6 network packets to be
1179 dropped:
1180
1181
1182 probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") {
1183 $verdict = 0 /* nf_drop */
1184 }
1185
1186
1187 For convenience, unlike the primitive probe points discussed
1188 here, the probes defined in tapset::netfilter(3stap) export the
1189 lowercase names of the verdict constants (e.g. NF_DROP becomes
1190 nf_drop) as local variables.
1191
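For observation only, without guru mode, a primitive netfilter probe can simply read the context variables. The sketch below prints them for the first ten IPv4 packets seen at the pre-routing hook and reports a total after ten seconds; the limits are arbitrary choices for the example.

```systemtap
# Sketch only: packet and time limits are arbitrary.
global seen

probe netfilter.pf("NFPROTO_IPV4").hook("NF_INET_PRE_ROUTING") {
  seen++
  if (seen <= 10)
    printf("hook=%d skb=0x%x in=0x%x out=0x%x\n", $hooknum, $skb, $in, $out)
}

probe timer.s(10) {
  printf("%d packets seen\n", seen)
  exit()
}
```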
1192
1193 KERNEL TRACEPOINTS
1194 This family of probe points hooks up to static probing tracepoints in‐
1195 serted into the kernel or modules. As with markers, these tracepoints
1196 are special macro calls inserted by kernel developers to make probing
1197 faster and more reliable than with DWARF-based probes, and DWARF debug‐
1198 ging information is not required to probe tracepoints. Tracepoints
1199 have an extra advantage of more strongly-typed parameters than markers.
1200
1201 Tracepoint probes look like: kernel.trace("name"). The tracepoint name
1202 string, which may contain the usual wildcard characters, is matched
1203 against the names defined by the kernel developers in the tracepoint
1204 header files. To restrict the search to specific subsystems (e.g.
1205 sched, ext3, etc...), the following syntax can be used: ker‐
1206 nel.trace("system:name"). The tracepoint system string may also con‐
1207 tain the usual wildcard characters.
1208
1209 The handler associated with a tracepoint-based probe may read the op‐
1210 tional parameters specified at the macro call site. These are named
1211 according to the declaration by the tracepoint author. For example,
1212 the tracepoint probe kernel.trace("sched:sched_switch") provides the
1213 parameters $prev and $next. If the parameter is a complex type, as in
1214 a struct pointer, then a script can access fields with the same syntax
1215 as DWARF $target variables. Also, tracepoint parameters cannot be mod‐
1216 ified, but in guru-mode a script may modify fields of parameters.
1217
1218 The subsystem and name of the tracepoint are available in $$system and
1219 $$name and a string of name=value pairs for all parameters of the tra‐
1220 cepoint is available in $$vars or $$parms.
1221
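The sched_switch tracepoint mentioned above can be probed as in the following sketch. It assumes that $prev and $next are task_struct pointers, as declared by that tracepoint, and reads their pid fields; the one-second timer merely bounds the output.

```systemtap
# Sketch only: assumes $prev and $next point to struct task_struct.
probe kernel.trace("sched:sched_switch") {
  printf("%s:%s prev=%d next=%d\n", $$system, $$name, $prev->pid, $next->pid)
}

probe timer.s(1) {
  exit()
}
```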
1222
1223 KERNEL MARKERS (OBSOLETE)
1224 This family of probe points hooks up to an older style of static prob‐
1225 ing markers inserted into older kernels or modules. These markers are
1226 special STAP_MARK macro calls inserted by kernel developers to make
1227 probing faster and more reliable than with DWARF-based probes. Fur‐
1228 ther, DWARF debugging information is not required to probe markers.
1229
1230 Marker probe points begin with kernel. The next part names the marker
1231 itself: mark("name"). The marker name string, which may contain the
1232 usual wildcard characters, is matched against the names given to the
1233 marker macros when the kernel and/or module was compiled. Optional‐
1234 ly, you can specify format("format"). Specifying the marker format
1235 string allows differentiation between two markers with the same name
1236 but different marker format strings.
1237
1238 The handler associated with a marker-based probe may read the optional
1239 parameters specified at the macro call site. These are named $arg1
1240 through $argNN, where NN is the number of parameters supplied by the
1241 macro. Number and string parameters are passed in a type-safe manner.
1242
1243 The marker format string associated with a marker is available in $for‐
1244 mat. The marker name string is available in $name.
1245
1246
1247 HARDWARE BREAKPOINTS
1248 This family of probes is used to set hardware watchpoints for a given
1249 (global) kernel symbol. The probes take three components as inputs:
1250
1251 1. The virtual address or name of the kernel symbol to be traced is
1252 supplied as the argument to this class of probes. (Only data segment
1253 variables can be probed; local variables of a function cannot be
1254 probed.)
1255
1256 2. The nature of the access to be probed: a. A .write probe is triggered
1257 when a write happens at the specified address or symbol name. b. A .rw
1258 probe is triggered when either a read or a write happens.
1259
1260 3. .length (optional). Users have the option of specifying the address
1261 interval to be probed using "length" constructs. The user-specified
1262 length gets approximated to the closest possible address length that
1263 the architecture can support. If the specified length exceeds the lim‐
1264 its imposed by the architecture, an error is reported and probe reg‐
1265 istration fails. Wherever 'length' is not specified, the translator
1266 requests a hardware breakpoint probe of length 1. It should be noted
1267 that the "length" construct is not valid with symbol names.
1268
1269 The following constructs are supported:
1270
1271 probe kernel.data(ADDRESS).write
1272 probe kernel.data(ADDRESS).rw
1273 probe kernel.data(ADDRESS).length(LEN).write
1274 probe kernel.data(ADDRESS).length(LEN).rw
1275 probe kernel.data("SYMBOL_NAME").write
1276 probe kernel.data("SYMBOL_NAME").rw
1277
1278
1279 This set of probes makes use of the debug registers of the processor,
1280 which are a scarce resource (4 on x86, 1 on powerpc). The script
1281 translation flags a warning if a user requests more hardware breakpoint
1282 probes than the limits set by the architecture. For example, a pass-2 warn‐
1283 ing is issued when an input script requests 5 hardware breakpoint
1284 probes on an x86 system, while the x86 architecture supports a maximum of 4
1285 breakpoints. Users are cautioned to set probes judiciously.
1286
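A single watchpoint is usually enough to see who modifies a variable. The sketch below uses the pid_max symbol, which also appears in the examples at the end of this page, and reports the writing process.

```systemtap
# Sketch only: consumes one of the processor's debug registers.
probe kernel.data("pid_max").write {
  printf("pid_max written by %s(%d)\n", execname(), pid())
}
```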
1287
1288 PERF
1289 This family of probe points interfaces to the kernel "perf event" in‐
1290 frastructure for controlling hardware performance counters. The events
1291 being attached to are described by the "type" and "config" fields of the
1292 perf_event_attr structure, and are sampled at an interval governed by
1293 the "sample_period" and "sample_freq" fields.
1294
1295 These fields are made available to systemtap scripts using the follow‐
1296 ing syntax:
1297
1298 probe perf.type(NN).config(MM).sample(XX)
1299 probe perf.type(NN).config(MM).hz(XX)
1300 probe perf.type(NN).config(MM)
1301 probe perf.type(NN).config(MM).process("PROC")
1302 probe perf.type(NN).config(MM).counter("COUNTER")
1303 probe perf.type(NN).config(MM).process("PROC").counter("NAME")
1304
1305 The systemtap probe handler is called once per XX increments of the un‐
1306 derlying performance counter when using the .sample field or at a fre‐
1307 quency in hertz when using the .hz field. When not specified, the de‐
1308 fault behavior is to sample at a count of 1000000. The range of valid
1309 type/config is described by the perf_event_open(2) system call, and/or
1310 the linux/perf_event.h file. Invalid combinations or exhausted hard‐
1311 ware counter resources result in errors during systemtap script start‐
1312 up. Systemtap does not sanity-check the values: it merely passes them
1313 through to the kernel for error- and safety-checking. By default the
1314 perf event probe is systemwide unless .process is specified, which will
1315 bind the probe to a specific task. If the name is omitted then it is
1316 inferred from the stap -c argument. A perf event can be read on de‐
1317 mand using .counter. The body of the perf probe handler will not be
1318 invoked for a .counter probe; instead, the counter is read in a user
1319 space probe via:
1320
1321 process("PROC").statement("func@file") {stat <<< @perf("NAME")}
1322
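As an illustration of the sampling form, the sketch below attributes CPU-cycle samples to process names. It assumes the numeric values from linux/perf_event.h, where type 0 is PERF_TYPE_HARDWARE and config 0 is PERF_COUNT_HW_CPU_CYCLES; the ten-second window is arbitrary.

```systemtap
# Sketch only: type(0)/config(0) are assumed to select hardware CPU cycles.
global samples

probe perf.type(0).config(0).sample(1000000) {
  # Called once per 1000000 counter increments, system-wide.
  samples[execname()]++
}

probe timer.s(10) {
  foreach (name in samples- limit 10)
    printf("%-16s %d\n", name, samples[name])
  exit()
}
```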
1323
1324
1325 PYTHON
1326 Support for probing python 2 and python 3 functions is available with
1327 the help of an extra python support module. Note that the debuginfo for
1328 the version of python being probed is required. To run a python script
1329 with the extra python support module, add the '-m HelperSDT' op‐
1330 tion to your python command, like this:
1331
1332 stap foo.stp -c "python -m HelperSDT foo.py"
1333
1334 Python probes look like the following:
1335
1336 python2.module("MPATTERN").function("PATTERN")
1337 python2.module("MPATTERN").function("PATTERN").call
1338 python2.module("MPATTERN").function("PATTERN").return
1339 python3.module("MPATTERN").function("PATTERN")
1340 python3.module("MPATTERN").function("PATTERN").call
1341 python3.module("MPATTERN").function("PATTERN").return
1342
1343 The list above includes multiple variants and modifiers which provide
1344 additional functionality or filters. They are:
1345
1346 .function
1347 Places a probe at the beginning of the named function by
1348 default, unless modified by PATTERN. Parameters are
1349 available as context variables.
1350
1351 .call Places a probe at the beginning of the named function.
1352 Parameters are available as context variables.
1353
1354 .return
1355 Places a probe at the moment before the return from the
1356 named function. Parameters and local/global python vari‐
1357 ables are available as context variables.
1358
1359 PATTERN stands for a string literal that aims to identify a point in
1360 the python program. It is made up of three parts:
1361
1362 • The first part is the name of a function (e.g. "foo") or class
1363 method (e.g. "bar.baz"). This part may use the "*" and "?" wild‐
1364 carding operators to match multiple names.
1365
1366 • The second part is optional and begins with the "@" character. It
1367 is followed by the path to the source file containing the function,
1368 which may include a wildcard pattern. The python path is searched
1369 for a matching filename.
1370
1371 • Finally, the third part is optional if the file name part was giv‐
1372 en, and identifies the line number in the source file preceded by a
1373 ":" or a "+". The line number is assumed to be an absolute line
1374 number if preceded by a ":", or relative to the declaration line of
1375 the function if preceded by a "+". All the lines in the function
1376 can be matched with ":*". A range of lines x through y can be
1377 matched with ":x-y". Ranges and specific lines can be mixed using
1378 commas, e.g. ":x,y-z".
1379
1380 In the above list of probe points, MPATTERN stands for a python module
1381 or script name that names the python module of interest. This part may
1382 use the "*" and "?" wildcarding operators to match multiple names. The
1383 python path is searched for a matching filename.
1384
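A minimal sketch of a python probe pair follows. The module name foo and the function name bar are hypothetical and stand for a script foo.py that defines bar(); the script is assumed to be started through the HelperSDT module as shown earlier in this section.

```systemtap
# Sketch only: module "foo" and function "bar" are hypothetical names.
probe python3.module("foo").function("bar").call {
  printf("entering bar\n")
}

probe python3.module("foo").function("bar").return {
  printf("leaving bar\n")
}
```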
1385
1386
1388 Here are some example probe points, defining the associated events.
1389
1390 begin, end, end
1391 refers to the startup and normal shutdown of the session. In
1392 this case, the handler would run once during startup and twice
1393 during shutdown.
1394
1395 timer.jiffies(1000).randomize(200)
1396 refers to a periodic interrupt, every 1000 +/- 200 jiffies.
1397
1398 kernel.function("*init*"), kernel.function("*exit*")
1399 refers to all kernel functions with "init" or "exit" in the
1400 name.
1401
1402 kernel.function("*@kernel/time.c:240")
1403 refers to any functions within the "kernel/time.c" file that
1404 span line 240. Note that this is not a probe at the statement
1405 at that line number. Use the kernel.statement probe instead.
1406
1407 kernel.trace("sched_*")
1408 refers to all scheduler-related (really, prefixed) tracepoints
1409 in the kernel.
1410
1411 kernel.mark("getuid")
1412 refers to an obsolete STAP_MARK(getuid, ...) macro call in the
1413 kernel.
1414
1415 module("usb*").function("*sync*").return
1416 refers to the moment of return from all functions with "sync" in
1417 the name in any of the USB drivers.
1418
1419 kernel.statement(0xc0044852)
1420 refers to the first byte of the statement whose compiled in‐
1421 structions include the given address in the kernel.
1422
1423 kernel.statement("*@kernel/time.c:296")
1424 refers to the statement of line 296 within "kernel/time.c".
1425
1426 kernel.statement("bio_init@fs/bio.c+3")
1427 refers to the statement at line bio_init+3 within "fs/bio.c".
1428
1429 kernel.data("pid_max").write
1430 refers to a hardware breakpoint of type "write" set on pid_max
1431
1432 syscall.*.return
1433 refers to the group of probe aliases with any name in the third
1434 position
1435
1436
1438 stap(1),
1439 probe::*(3stap),
1440 tapset::*(3stap)
1441
1442
1443
1444