1STAP(1) General Commands Manual STAP(1)
2
3
4
5 NAME
6 stap - systemtap script translator/driver
7
8
9
10 SYNOPSIS
11 stap [ OPTIONS ] FILENAME [ ARGUMENTS ]
12 stap [ OPTIONS ] - [ ARGUMENTS ]
13 stap [ OPTIONS ] -e SCRIPT [ ARGUMENTS ]
14 stap [ OPTIONS ] -l PROBE [ ARGUMENTS ]
15 stap [ OPTIONS ] -L PROBE [ ARGUMENTS ]
16 stap [ OPTIONS ] --dump-probe-types
17 stap [ OPTIONS ] --dump-probe-aliases
18 stap [ OPTIONS ] --dump-functions
19
20
21
22
23 DESCRIPTION
24 The stap program is the front-end to the Systemtap tool. It accepts
25 probing instructions written in a simple domain-specific language,
26 translates those instructions into C code, compiles this C code, and
27 loads the resulting module into a running Linux kernel or a Dyninst
28 user-space mutator, to perform the requested system trace/probe func‐
29 tions. You can supply the script in a named file (FILENAME), from
30 standard input (use - instead of FILENAME), or from the command line
31 (using -e SCRIPT). The program runs until it is interrupted by the
32 user, the script voluntarily invokes the exit() function, or a
33 sufficient number of soft errors has accumulated.
34
35 The language, which is described in the SCRIPT LANGUAGE section below, is
36 strictly typed, expressive, declaration free, procedural, prototyping-
37 friendly, and inspired by awk and C. It allows source code points or
38 events in the system to be associated with handlers, which are subrou‐
39 tines that are executed synchronously. It is somewhat similar concep‐
40 tually to "breakpoint command lists" in the gdb debugger.
41
42
43 DOCUMENTATION
44 systemtap comes with a variety of educational, documentation, and
45 reference resources. They are available online and/or packaged for
46 offline use. Some systemtap diagnostic warning/error messages specifically
47 suggest reading a man page by including a string like [man error::pass5]. For
48 online documentation, see the project web site,
49 https://sourceware.org/systemtap/
50
51
52 ┌──────────────────────────┬──────────────────────────────────────────────────────┐
53 │man pages │ │
54 ├──────────────────────────┼──────────────────────────────────────────────────────┤
55 │stap (this page) │ language syntax, concepts, operation, options │
56 ├──────────────────────────┼──────────────────────────────────────────────────────┤
57 │error::* │ further explanation of error conditions │
58 ├──────────────────────────┼──────────────────────────────────────────────────────┤
59 │warning::* │ further explanation of warning conditions │
60 ├──────────────────────────┼──────────────────────────────────────────────────────┤
61 │stapprobes │ probe points and their $context variables │
62 ├──────────────────────────┼──────────────────────────────────────────────────────┤
63 │stapref │ quick reference to language syntax │
64 ├──────────────────────────┼──────────────────────────────────────────────────────┤
65 │stappaths │ list of directories, including books & references │
66 ├──────────────────────────┼──────────────────────────────────────────────────────┤
67 │stap-prep │ program to install auxiliary dependencies like ker‐ │
68 │ │ nel debuginfo │
69 ├──────────────────────────┼──────────────────────────────────────────────────────┤
70 │tapset::* │ generated list of tapsets │
71 ├──────────────────────────┼──────────────────────────────────────────────────────┤
72 │probe::* │ generated list of tapset probe aliases │
73 ├──────────────────────────┼──────────────────────────────────────────────────────┤
74 │function::* │ generated list of tapset functions │
75 ├──────────────────────────┼──────────────────────────────────────────────────────┤
76 │macro::* │ generated list of tapset macros │
77 ├──────────────────────────┼──────────────────────────────────────────────────────┤
78 │stapvars │ some of the tapset global variables │
79 ├──────────────────────────┼──────────────────────────────────────────────────────┤
80 │staprun, stapdyn, stapbpf │ programs for executing compiled systemtap scripts │
81 ├──────────────────────────┼──────────────────────────────────────────────────────┤
82 │systemtap │ initscript, boot-time probing │
83 ├──────────────────────────┼──────────────────────────────────────────────────────┤
84 │stap-server │ compilation server │
85 ├──────────────────────────┼──────────────────────────────────────────────────────┤
86 │stapex │ a few very basic script examples │
87 ├──────────────────────────┼──────────────────────────────────────────────────────┤
88 │books │ │
89 ├──────────────────────────┼──────────────────────────────────────────────────────┤
90 │Beginner's Guide │ tutorial book, language essentials, examples │
91 ├──────────────────────────┼──────────────────────────────────────────────────────┤
92 │Tutorial │ shorter tutorial, exercises │
93 ├──────────────────────────┼──────────────────────────────────────────────────────┤
94 │Language Reference │ detailed language manual, covers statistics/analysis │
95 ├──────────────────────────┼──────────────────────────────────────────────────────┤
96 │Tapset Reference │ the tapset man pages, reformatted into a book │
97 ├──────────────────────────┼──────────────────────────────────────────────────────┤
98 │references │ │
99 ├──────────────────────────┼──────────────────────────────────────────────────────┤
100 │example scripts │ over a hundred directly usable sysadmin tools, toys, │
101 │ │ hacks to learn from │
102 └──────────────────────────┴──────────────────────────────────────────────────────┘
103
104 OPTIONS
105 The systemtap translator supports the following options. Any other op‐
106 tion prints a list of supported options. Options may be given on the
107 command line, as usual. If the file $SYSTEMTAP_DIR/rc exists, options
108 are also loaded from there and interpreted first. ($SYSTEMTAP_DIR de‐
109 faults to $HOME/.systemtap if unset.)
110
111
112 In some cases, the default value of an option depends on particular
113 system configuration and thus can't be mentioned here directly. In
114 some of those cases running "stap --help" might display the default.
115
116
117 - Use standard input instead of a given FILENAME as probe language
118 input, unless -e SCRIPT is given.
119
120 -h --help
121 Show help message.
122
123 -V --version
124 Show version message.
125
126 -p NUM Stop after pass NUM. The passes are numbered 1-5: parse, elabo‐
127 rate, translate, compile, run. See the PROCESSING section for
128 details.
129
130 -v Increase verbosity for all passes. Produce a larger volume of
131 informative (?) output each time the option is repeated.
132
133 --vp ABCDE
134 Increase verbosity on a per-pass basis. For example, "--vp 002"
135 adds 2 units of verbosity to pass 3 only. The combination
136 "-v --vp 00004" adds 1 unit of verbosity for all passes, and 4
137 more for pass 5.
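The arithmetic behind the two examples above can be sketched in shell. This is only a model of the documented behavior, not stap's own code: the variable names are made up, and treating digits missing from a short --vp string as 0 is an assumption inferred from the "--vp 002" example.

```shell
# Model: effective verbosity per pass = (number of -v flags) + (matching --vp digit).
# Assumption: digits missing from a short --vp string count as 0.
v_count=1        # as if "-v" were given once
vp="00004"       # as if "--vp 00004" were given
levels=""
pass=1
while [ "$pass" -le 5 ]; do
    # Pick the digit for this pass; empty if the string is too short.
    digit=$(printf '%s' "$vp" | cut -c "$pass")
    levels="$levels $((v_count + ${digit:-0}))"
    pass=$((pass + 1))
done
levels=${levels# }
echo "verbosity for passes 1-5: $levels"
```

Setting v_count=0 and vp="002" in the same sketch models the first example, where only pass 3 is affected.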
138
139 -k Keep the temporary directory after all processing. This may be
140 useful in order to examine the generated C code, or to reuse the
141 compiled kernel object.
142
143 -g Guru mode. Enable parsing of unsafe expert-level constructs
144 like embedded C.
145
146 -P Prologue-searching mode. This is equivalent to --pro‐
147 logue-searching=always. Activate heuristics to work around in‐
148 correct debugging information for function parameter $context
149 variables.
150
151 -u Unoptimized mode. Disable unused code elision and many other
152 optimizations during elaboration / translation.
153
154 -w Suppressed warnings mode. Disables all warning messages.
155
156 -W Treat all warnings as errors.
157
158 -b Use bulk mode (percpu files) for kernel-to-user data transfer.
159 Use the stap-merge program to multiplex them back together lat‐
160 er.
161
162 -i --interactive
163 Interactive mode. Enable an interface to build the systemtap
164 script incrementally and interactively.
165
166 -t Collect timing information on the number of times each probe
167 executes and the average amount of time spent in each probe-point.
168 Also show the derivation for each probe-point.
169
170 -s NUM Use NUM megabyte buffers for kernel-to-user data transfer. On a
171 multiprocessor in bulk mode, this is a per-processor amount.
172
173 -I DIR Add the given directory to the tapset search directory. See the
174 description of pass 2 for details.
175
176 -D NAME=VALUE
177 Add the given C preprocessor directive to the module Makefile.
178 These can be used to override limit parameters described below.
179
180 -B NAME=VALUE
181 In kernel-runtime mode, add the given make directive to the ker‐
182 nel module build's make invocation. These can be used to add or
183 override kconfig options. For example, use
184
185 -B CONFIG_DEBUG_INFO=y
186
187 to add debugging information.
188
189 -B FLAG
190 In dyninst-runtime mode, add the given parameter to the compiler
191 CFLAGS used for building the dyninst shared library. For exam‐
192 ple, use
193
194 -B -g
195
196 to add debugging information.
197
198 -a ARCH
199 Use a cross-compilation mode for the given target architecture.
200 This requires access to the cross-compiler and the kernel build
201 tree, and goes along with the
202
203 -B CROSS_COMPILE=arch-tool-prefix-
204 and
205 -r /build/tree
206
207 options.
208
209 --modinfo NAME=VALUE
210 Add the name/value pair as a MODULE_INFO macro call to the gen‐
211 erated module. This may be useful to inform or override various
212 module-related checks in the kernel.
213
214 -G NAME=VALUE
215 Sets the value of global variable NAME to VALUE when staprun is
216 invoked. This applies to scalar variables declared global in
217 the script/tapset.
218
219 -R DIR Look for the systemtap runtime sources in the given directory.
220 Your DIR default can be seen using "stap --help".
221
222 -r /DIR
223 Build for kernel in given build tree. Can also be set with the
224 SYSTEMTAP_RELEASE environment variable.
225
226 -r RELEASE
227 Build for kernel in build tree /lib/modules/RELEASE/build. Can
228 also be set with the SYSTEMTAP_RELEASE environment variable.
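As a small illustration of the RELEASE form, the following shell sketch derives the build tree path described above for the running kernel and checks whether it exists. The variable names are illustrative, and the hint about a missing development package is a guess, not something stap reports.

```shell
# Derive the build tree that "-r RELEASE" refers to, per the text above.
release=$(uname -r)
build_dir="/lib/modules/$release/build"
if [ -d "$build_dir" ]; then
    echo "kernel build tree found: $build_dir"
else
    # Assumption: a missing tree usually means kernel build files are not installed.
    echo "no build tree at $build_dir"
fi
```

The same path can be handed to stap directly in its /DIR form when the tree lives somewhere non-standard.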
229
230 -m MODULE
231 Use the given name for the generated kernel object module, in‐
232 stead of a unique randomized name. The generated kernel object
233 module is copied to the current directory.
234
235 -d MODULE
236 Add symbol/unwind information for the given module into the ker‐
237 nel object module. This may enable symbolic tracebacks from
238 those modules/programs, even if they do not have an explicit
239 probe placed into them.
240
241 --ldd Add symbol/unwind information for all user-space shared li‐
242 braries suspected by ldd to be necessary for user-space binaries
243 being probed or listed with the -d option. Caution: this can
244 make the probe modules considerably larger. Note that this op‐
245 tion does not deal with kernel-space modules: see instead
246 --all-modules below.
247
248 --all-modules
249 Equivalent to specifying "-dkernel" and a "-d" for each kernel
250 module that is currently loaded. Caution: this can make the
251 probe modules considerably larger.
252
253 -o FILE
254 Send standard output to named file. In bulk mode, percpu files
255 will start with FILE_ (FILE_cpu with -F) followed by the cpu
256 number. This supports strftime(3) formats for FILE.
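Because date(1) accepts the same strftime(3) conversion specifications, it can be used to preview what a FILE pattern might expand to. The pattern below is hypothetical, and the exact moment at which stap evaluates the pattern is not covered here, so the preview is only approximate.

```shell
# date(1) expands strftime(3) conversions, so it can preview a -o pattern.
pattern='trace-%Y%m%d-%H%M%S.log'   # hypothetical file name pattern
preview=$(date +"$pattern")
echo "stap -o '$pattern' would write to something like: $preview"
```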
257
258 -c CMD Start the probes, run CMD, and exit when CMD finishes. This
259 also has the effect of setting target() to the pid of the command
260 that was run.
261
262 -x PID Sets target() to PID. This allows scripts to be written that
263 filter on a specific process. Scripts run independently of the
264 PID's lifespan.
265
266 -e SCRIPT
267 Run the given SCRIPT specified on the command line.
268
269 -E SCRIPT
270 Run the given SCRIPT in addition to the main script, whether the
271 main script is given through -e or as a script file.
272 This option can be repeated to run multiple scripts, and can be
273 used in listing mode (-l/-L).
274
275 -l PROBE
276 Instead of running a probe script, just list all available probe
277 points matching the given single probe point. The pattern may
278 include wildcards and aliases, but not comma-separated multiple
279 probe points. The process result code will indicate failure if
280 there are no matches.
281
282 % stap -e 'probe syscall.* { }'
283 [...]
284 % stap -l 'syscall.*'
285 syscall.accept
286 [...]
287 syscall.writev
288
289
290 -L PROBE
291 Similar to "-l", but list matching probe points plus their
292 available context variables. When -v is set with -L, the output
293 includes duplicate probe points which are distinguished by their
294 PC address.
295
296 % stap -L 'process("/lib64/libpython*.so.*").mark("*")'
297 process("/usr/lib64/libpython2.7.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
298 process("/usr/lib64/libpython2.7.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
299 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
300 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
301 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__done") $arg1:long
302 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__start") $arg1:long
303 process("/usr/lib64/libpython3.6m.so.1.0").mark("line") $arg1:long $arg2:long $arg3:long
304
305
306 -F Without -o option, load module and start probes, then detach
307 from the module leaving the probes running. With -o option, run
308 staprun in background as a daemon and show its pid.
309
310 -S size[,N]
311 Set the maximum size of each output file and the maximum number
312 of output files. When an output file would exceed size, systemtap
313 switches to the next output file. When the number of output files
314 exceeds N, systemtap removes the oldest output file. The second
315 argument (N) may be omitted.
316
317 -T TIMEOUT
318 Exit the script after TIMEOUT seconds.
319
320 --skip-badvars
321 Ignore unresolvable or run-time-inaccessible context variables
322 and substitute with 0, without errors.
323
324
325 --prologue-searching[=WHEN]
326 Prologue-searching mode. Activate heuristics to work around in‐
327 correct debugging information for function parameter $context
328 variables. WHEN can be either "never", "always", or "auto" (i.e.
329 enabled by heuristic). If WHEN is missing, then "always" is as‐
330 sumed. If the option is missing, then "auto" is assumed.
331
332
333 --suppress-handler-errors
334 Wrap all probe handlers into something like this
335
336 try { ... } catch { next }
337
338 block, which causes any runtime errors to be quietly suppressed.
339 Suppressed errors do not count against MAXERRORS limits. In
340 this mode, the MAXSKIPPED limits are also suppressed, so that
341 many errors and skipped probes may be accumulated during a
342 script's runtime. Any overall counts will still be reported at
343 shutdown.
344
345
346 --compatible VERSION
347 Suppress recent script language or tapset changes which are in‐
348 compatible with given older version of systemtap. This may be
349 useful if a much older systemtap script fails to run. See the
350 DEPRECATION section for more details.
351
352
353 --check-version
354 This option is used to check if the active script has any con‐
355 structs that may be systemtap version specific. See the DEPRE‐
356 CATION section for more details.
357
358
359 --clean-cache
360 This option prunes stale entries from the cache directory. This
361 is normally done automatically after successful runs, but this
362 option will trigger the cleanup manually and then exit. See the
363 CACHING section for more details about cache limits.
364
365
366 --color[=WHEN], --colour[=WHEN]
367 This option controls coloring of error messages. WHEN can be ei‐
368 ther "never", "always", or "auto" (i.e. enable only if at a ter‐
369 minal). If WHEN is missing, then "always" is assumed. If the op‐
370 tion is missing, then "auto" is assumed.
371
372 Colors can be modified using the SYSTEMTAP_COLORS environment
373 variable. The format must be of the form
374 key1=val1:key2=val2:key3=val3 ...etc. Valid keys are "error",
375 "warning", "source", "caret", and "token". Values constitute
376 Select Graphic Rendition (SGR) parameter(s). Consult the docu‐
377 mentation of your terminal for the SGRs it supports. As an exam‐
378 ple, the default colors would be expressed as
379 error=01;31:warning=00;33:source=00;34:caret=01:token=01. If
380 SYSTEMTAP_COLORS is absent, the default colors will be used. If
381 it is empty or invalid, coloring is turned off.
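To make the key=value format concrete, this shell sketch splits the default value quoted above into its pairs and prints a sample word in each SGR setting before exporting the variable. It only illustrates the format; how stap itself parses the variable is not shown, and the loop variable names are arbitrary.

```shell
# Split a SYSTEMTAP_COLORS-style value into its key=value pairs.
# The value below is the default quoted in the text above.
colors='error=01;31:warning=00;33:source=00;34:caret=01:token=01'
old_ifs=$IFS
IFS=:
for pair in $colors; do
    key=${pair%%=*}     # text before the first "="
    sgr=${pair#*=}      # text after the first "="
    # ESC[<SGR>m switches the attributes on, ESC[0m resets them.
    printf '%-8s \033[%sm%s\033[0m\n' "$key" "$sgr" "sample"
done
IFS=$old_ifs
export SYSTEMTAP_COLORS="$colors"
```

Editing a single pair (for example the value after error=) and re-running the loop gives a quick preview of the change in the current terminal.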
382
383
384 --disable-cache
385 This option disables all use of the cache directory. No files
386 will be either read from or written to the cache.
387
388
389 --poison-cache
390 This option treats files in the cache directory as invalid. No
391 files will be read from the cache, but resulting files from this
392 run will still be written to the cache. This is meant as a
393 troubleshooting aid when stap's caching seems to be misbehaving.
394 If it helps, there is probably a bug in systemtap that the
395 developers would like you to report.
396
397
398 --privilege[=stapusr | =stapsys | =stapdev]
399 This option instructs stap to examine the script looking for
400 constructs which are not allowed for the specified privilege
401 level (see UNPRIVILEGED USERS). Compilation fails if any such
402 constructs are used. If stapusr or stapsys is specified when
403 using a compile server (see --use-server), the server will
404 examine the script and, if compilation succeeds, the server will
405 cryptographically sign the resulting kernel module, certifying
406 that it is safe for use by users at the specified privilege
407 level.
408
409 If --privilege has not been specified, -pN has not been speci‐
410 fied with N < 5, and the invoking user is not root, and is not a
411 member of the group stapdev, then stap will automatically add
412 the appropriate --privilege option to the options already speci‐
413 fied.
414
415
416 --unprivileged
417 This option is equivalent to --privilege=stapusr.
418
419
420 --use-server[=HOSTNAME[:PORT] | =IP_ADDRESS[:PORT] | =CERT_SERIAL]
421 Specify compile-server(s) to be used for compilation and/or in
422 conjunction with --list-servers and --trust-servers (see below)
423 for listing. If no argument is supplied, then the default in un‐
424 privileged mode (see --privilege) is to select compatible
425 servers which are trusted as SSL peers and as module signers and
426 currently online. Otherwise the default is to select compatible
427 servers which are trusted as SSL peers and currently online.
428 --use-server may be specified more than once, in which case a
429 list of servers is accumulated in the order specified. Servers
430 may be specified by host name, ip address, or by certificate se‐
431 rial number (obtained using --list-servers). The latter is most
432 commonly used when adding or revoking trust in a server (see
433 --trust-servers below). If a server is specified by host name or
434 ip address, then an optional port number may be specified. This
435 is useful for accessing servers which are not on the local net‐
436 work or to specify a particular server.
437
438 IP addresses may be IPv4 or IPv6 addresses.
439
440 If a particular IPv6 address is link local and exists on more
441 than one interface, the intended interface may be specified by
442 appending the address with a percent sign (%) followed by the
443 intended interface name. For example,
444 "fe80::5eff:35ff:fe07:55ca%eth0".
445
446 In order to specify a port number with an IPv6 address, it is
447 necessary to enclose the IPv6 address in square brackets ([]) in
448 order to separate the port number from the rest of the address.
449 For example, "[fe80::5eff:35ff:fe07:55ca]:5000" or
450 "[fe80::5eff:35ff:fe07:55ca%eth0]:5000".
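The bracketed form can be taken apart mechanically, which shows why the brackets are needed: without them the final colon could not be told apart from the colons inside the address. The sketch below uses plain shell tools on the example from the text; the variable names are illustrative and this is not how stap parses the option.

```shell
# Split "[ADDRESS%IFACE]:PORT" into its parts.
server='[fe80::5eff:35ff:fe07:55ca%eth0]:5000'   # example from the text
port=${server##*]:}          # text after the closing "]:"
bracketed=${server%:*}       # drop the trailing ":PORT"
address=${bracketed#\[}      # strip the leading "["
address=${address%\]}        # strip the trailing "]"
host=$(printf '%s' "$address" | cut -d% -f1)    # address before "%"
iface=$(printf '%s' "$address" | cut -d% -f2)   # interface after "%"
echo "host=$host iface=$iface port=$port"
```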
451
452 If --use-server has not been specified, -pN has not been
453 specified with N < 5, and the invoking user is not root, is not a
454 member of the group stapdev, but is a member of the group stapusr,
455 then stap will automatically add --use-server to the options already
456 specified.
457
458
459 --use-server-on-error[=yes|=no]
460 Instructs stap to retry compilation of a script using a compile
461 server if compilation on the local host fails in a manner which
462 suggests that it might succeed using a server. If this option
463 is not specified, the default is no. If no argument is provid‐
464 ed, then the default is yes. Compilation will be retried for
465 certain types of errors (e.g. insufficient data or resources)
466 which may not occur during re-compilation by a compile server.
467 Compile servers will be selected automatically for the re-compi‐
468 lation attempt as if --use-server was specified with no argu‐
469 ments.
470
471
472 --list-servers[=SERVERS]
473 Display the status of the requested SERVERS, where SERVERS is a
474 comma-separated list of server attributes. The list of at‐
475 tributes is combined to filter the list of servers displayed.
476 Supported attributes are:
477
478 all specifies all known servers (trusted SSL peers, trusted
479 module signers, online servers).
480
481 specified
482 specifies servers specified using --use-server.
483
484 online filters the output by retaining information about servers
485 which are currently online.
486
487 trusted
488 filters the output by retaining information about servers
489 which are trusted as SSL peers.
490
491 signer filters the output by retaining information about servers
492 which are trusted as module signers (see --privilege).
493
494 compatible
495 filters the output by retaining information about servers
496 which are compatible with the current kernel release and
497 architecture.
498
499 If no argument is provided, then the default is specified. If
500 no servers were specified using --use-server, then the default
501 servers for --use-server are listed.
502
503 Note that --list-servers uses the avahi-daemon service to detect
504 online servers. If this service is not available, then
505 --list-servers will fail to detect any online servers. In order
506 for --list-servers to detect servers listening on IPv6 address‐
507 es, the avahi-daemon configuration file /etc/avahi/avahi-dae‐
508 mon.conf must contain an active "use-ipv6=yes" line. The service
509 must be restarted after adding this line in order for IPv6 to be
510 enabled.
511
512
513 --trust-servers[=TRUST_SPEC]
514 Grant or revoke trust in compile-servers, specified using
515 --use-server as specified by TRUST_SPEC, where TRUST_SPEC is a
516 comma-separated list specifying the trust which is to be granted
517 or revoked. Supported elements are:
518
519 ssl trust the specified servers as SSL peers.
520
521 signer trust the specified servers as module signers (see
522 --privilege). Only root can specify signer.
523
524 all-users
525 grant trust as an ssl peer for all users on the local
526 host. The default is to grant trust as an ssl peer for
527 the current user only. Trust as a module signer is always
528 granted for all users. Only root can specify all-users.
529
530 revoke revoke the specified trust. The default is to grant it.
531
532 no-prompt
533 do not prompt the user for confirmation before carrying
534 out the requested action. The default is to prompt the
535 user for confirmation.
536
537 If no argument is provided, then the default is ssl. If no
538 servers were specified using --use-server, then no trust will be
539 granted or revoked.
540
541 Unless no-prompt has been specified, the user will be prompted
542 to confirm the trust to be granted or revoked before the opera‐
543 tion is performed.
544
545
546 --dump-probe-types
547 Dumps a list of supported probe types and exits. If --privi‐
548 lege=stapusr is also specified, the list will be limited to
549 probe types available to unprivileged users.
550
551
552 --dump-probe-aliases
553 Dumps a list of all probe aliases found in library files and ex‐
554 its.
555
556
557 --dump-functions
558 Dumps a list of all the public functions found in library files
559 and exits. Also includes their parameters and types. A function
560 of type 'unknown' indicates a function that does not return a
561 value. Note that not all function/parameter types may be
562 resolved (these are also shown as 'unknown'). This feature is
563 very memory-intensive and thus may not work properly with
564 --use-server if the target server imposes an rlimit on process memory
565 (i.e. through the ~stap-server/.systemtap/rc configuration file,
566 see stap-server(8)).
567
568
569 --remote URL
570 Set the execution target to the given host. This option may be
571 repeated to target multiple execution targets. Passes 1-4 are
572 completed locally as normal to build the script, and then pass 5
573 will copy the module to the target and run it. Acceptable URL
574 forms include:
575
576 [USER@]HOSTNAME, ssh://[USER@]HOSTNAME
577 This mode uses ssh, optionally using a username not
578 matching your own. If a custom ssh_config file is in use,
579 add SendEnv LANG to retain internationalization function‐
580 ality.
581
582 libvirt://DOMAIN, libvirt://DOMAIN/LIBVIRT_URI
583 This mode uses stapvirt to execute the script on a domain
584 managed by libvirt. Optionally, LIBVIRT_URI may be speci‐
585 fied to connect to a specific driver and/or a remote
586 host. For example, to connect to the local privileged QE‐
587 MU driver, use:
588
589 --remote libvirt://MyDomain/qemu:///system
590
591 See the page at <http://libvirt.org/uri.html> for sup‐
592 ported URIs. Also see stapvirt(1) for more information on
593 how to prepare the domain for stap probing.
594
595 unix:PATH
596 This mode connects to a UNIX socket. This can be used
597 with a QEMU virtio-serial port for executing scripts in‐
598 side a running virtual machine.
599
600 direct://
601 Special loopback mode to run on the local host.
602
603 --remote-prefix
604 Prefix each line of remote output with "N: ", where N is the in‐
605 dex of the remote execution target from which the given line
606 originated.
607
608
609 --download-debuginfo[=OPTION]
610 Enable, disable or set a timeout for the automatic debuginfo
611 downloading feature offered by abrt as specified by OPTION,
612 where OPTION is one of the following:
613
614 yes enable automatic downloading of debuginfo with no
615 timeout. This is the same as not providing an OPTION value to
616 --download-debuginfo.
617
618 no explicitly disable automatic downloading of debuginfo.
619 This is the same as not using the option at all.
620
621 ask show abrt output, and ask before continuing download. No
622 timeout will be set.
623
624 <timeout>
625 specify a timeout as a positive number to stop the down‐
626 load if it is taking longer than <timeout> seconds.
627
628 --rlimit-as=NUM
629 Specify the maximum size of the process's virtual memory (ad‐
630 dress space), in bytes.
631
632
633 --rlimit-cpu=NUM
634 Specify the CPU time limit, in seconds.
635
636
637 --rlimit-nproc=NUM
638 Specify the maximum number of processes that can be created.
639
640
641 --rlimit-stack=NUM
642 Specify the maximum size of the process stack, in bytes.
643
644
645 --rlimit-fsize=NUM
646 Specify the maximum size of files that the process may create,
647 in bytes.
648
649
650 --sysroot=DIR
651 Specify sysroot directory where target files (executables, li‐
652 braries, etc.) are located. With -r RELEASE, the sysroot will
653 be searched for the appropriate kernel build directory. With -r
654 /DIR, however, the sysroot will not be used to find the kernel
655 build.
656
657
658 --sysenv=VAR=VALUE
659 Provide an alternate value for an environment variable where the
660 value on a remote system differs. Path variables (e.g. PATH,
661 LD_LIBRARY_PATH) are assumed to be relative to the directory
662 provided by --sysroot, if provided.
663
664
665 --suppress-time-limits
666 Disable -DSTP_OVERLOAD related options as well as -DMAXACTION
667 and -DMAXTRYLOCK. This option requires guru mode.
668
669
670 --runtime=MODE
671 Set the pass-5 runtime mode. Valid options are kernel (de‐
672 fault), dyninst and bpf. See ALTERNATE RUNTIMES below for more
673 information.
674
675
676 --dyninst
677 Shorthand for --runtime=dyninst.
678
679
680 --bpf Shorthand for --runtime=bpf.
681
682
683 --save-uprobes
684 On machines that require SystemTap to build its own uprobes mod‐
685 ule (kernels prior to version 3.5), this option instructs Sys‐
686 temTap to also save a copy of the module in the current directo‐
687 ry (creating a new "uprobes" directory first).
688
689
690 --target-namespaces=PID
691 Allow for a set of target namespaces to be set based on the
692 namespaces the given PID is in. This is for namespace-aware
693 tapset functions. If the target namespaces are not set, the
694 target defaults to the stap process's namespaces.
695
696
697 --monitor=INTERVAL
698 Enables an interface to display status information about the
699 module (uptime, module name, invoker uid, memory sizes, global
700 variables, list of probes with their statistics). An optional
701 argument INTERVAL can be supplied to set the refresh rate in
702 seconds of the status window. The module can also be controlled
703 by a list of commands using the following keys:
704
705 c Resets all global variables to their initial values or
706 zeroes them if they did not have an initial value.
707
708 s Rotates the attribute used to sort the list of probes.
709
710 t Brings up a prompt to allow toggling (on/off) of probes by
711 index. Probe points are still affected by their condi‐
712 tions.
713
714 r Resumes the script by toggling on all probes.
715
716 p Pauses the script by toggling off all probes.
717
718 x Hides/shows the status window. This allows for more out‐
719 put to be seen.
720
721 navigation-keys
722 The navigation keys can be used to scroll up and down the
723 windows.
724
725 Tab Toggle scrolling between status and output windows.
726
727
728 --example
729 This option is used to run example scripts without having to en‐
730 ter the entire path to the script. Example scripts can be found
731 in the directory specified in the stappaths(7) manual page.
732
733
734 --no-global-var-display
735 This option is used to disable the automatic logging of unused
736 global variables at the end of a stap session.
737
738
739 ARGUMENTS
740 Any additional arguments on the command line are passed to the script
741 parser for substitution. See below.
742
743
744 SCRIPT LANGUAGE
745 The systemtap script language resembles awk and C. There are two main
746 outermost constructs: probes and functions. Within these, statements
747 and expressions use C-like operator syntax and precedence.
748
749
750 GENERAL SYNTAX
751 Whitespace is ignored. Three forms of comments are supported:
752 # ... shell style, to the end of line, except for $# and @#
753 // ... C++ style, to the end of line
754 /* ... C style ... */
755 Literals are either strings enclosed in double-quotes (passing through
756 the usual C escape codes with backslashes, and with adjacent string
757 literals glued together, also as in C), or integers (in decimal, hexa‐
758 decimal, or octal, using the same notation as in C). All strings are
759 limited in length to some reasonable value (a few hundred bytes). In‐
760 tegers are 64-bit signed quantities, although the parser also accepts
761 (and wraps around) values above positive 2**63.
762
763 In addition, script arguments given at the end of the command line may
764 be inserted. Use $1 ... $<NN> for insertion unquoted, @1 ... @<NN> for
765 insertion as a string literal. The number of arguments may be accessed
766 through $# (as an unquoted number) or through @# (as a quoted number).
767 These may be used at any place a token may begin, including within the
768 preprocessing stage. Reference to an argument number beyond what was
769 actually given is an error.
770
771
772 PREPROCESSING
773 A simple conditional preprocessing stage is run as a part of parsing.
774 The general form is similar to the cond ? exp1 : exp2 ternary operator:
775
776 %( CONDITION %? TRUE-TOKENS %)
777 %( CONDITION %? TRUE-TOKENS %: FALSE-TOKENS %)
778
779 The CONDITION is either an expression whose format is determined by its
780 first keyword, or a comparison between two string literals or between
781 two numeric literals. Several such CONDITIONs may also be combined
782 using || (alternative) and && (conjunction). Parentheses are not yet
783 supported, so keep in mind that && takes precedence over ||: A || B && C
784 is evaluated as A || (B && C).
785
786 If the first part is the identifier kernel_vr or kernel_v to refer to
787 the kernel version number, with ("2.6.13-1.322FC3smp") or without
788 ("2.6.13") the release code suffix, then the second part is one of the
789 six standard numeric comparison operators <, <=, ==, !=, >, and >=, and
790 the third part is a string literal that contains an RPM-style version-
791 release value. The condition is deemed satisfied if the version of the
792 target kernel (as optionally overridden by the -r option) compares to
793 the given version string as the operator requires. The comparison is
794 performed by the glibc function strverscmp. As a special case, if the
795 operator is simple equality (==) or inequality (!=), and the third part
796 contains any wildcard characters (* or ? or [), then the expression is
797 treated as a wildcard (mis)match as evaluated by fnmatch.
798
799 If, on the other hand, the first part is the identifier arch to refer
800 to the processor architecture (as named by the kernel build system
801 ARCH/SUBARCH), then the second part is one of the two string comparison
802 operators == or !=, and the third part is a string literal for matching
803 it. This comparison is a wildcard (mis)match.
804
805 Similarly, if the first part is an identifier like CONFIG_something to
806 refer to a kernel configuration option, then the second part is == or
807 !=, and the third part is a string literal for matching the value (com‐
808 monly "y" or "m"). Nonexistent or unset kernel configuration options
809 are represented by the empty string. This comparison is also a wild‐
810 card (mis)match.
811
812 If the first part is the identifier systemtap_v, the test refers to the
813 systemtap compatibility version, which may be overridden for old
814 scripts with the --compatible flag. The comparison operator is as is
815 for kernel_v and the right operand is a version string. See also the
816 DEPRECATION section below.
817
818 If the first part is the identifier systemtap_privilege, the test
819 refers to the privilege level that the systemtap script is compiled
820 with. Here the second part is == or !=, and the third part is a string
821 literal, either "stapusr" or "stapsys" or "stapdev".
822
823 If the first part is the identifier guru_mode, the test checks whether
824 the systemtap script is compiled in guru mode. Here the second part
825 is == or !=, and the third part is a number, either 1 or 0.
826
827 If the first part is the identifier runtime, the test refers to the
828 systemtap runtime mode. See ALTERNATE RUNTIMES below for more informa‐
829 tion on runtimes. The second part is one of the two string comparison
830 operators == or !=, and the third part is a string literal for matching
831 it. This comparison is a wildcard (mis)match.
832
833 Otherwise, the CONDITION is expected to be a comparison between two
834 string literals or two numeric literals. In this case, the arguments
835 are the only variables usable.
836
837 The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser tokens
838 (possibly including nested preprocessor conditionals), and are passed
839 into the input stream if the condition is true or false, respectively.
840 For example, the following code induces a parse error unless the target
841 kernel version is newer than 2.6.5:
842
843 %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence
844
845 The following code might adapt to hypothetical kernel version drift:
846
847 probe kernel.function (
848 %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
849 %( kernel_vr == "2.6.13*smp" %? "do_page_fault" %:
850 UNSUPPORTED %) %)
851 ) { /* ... */ }
852
853 %( arch == "ia64" %?
854 probe syscall.vliw = kernel.function("vliw_widget") {}
855 %)
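
A condition may also compare literals derived from script arguments. The sketch below, which assumes a hypothetical file name greet.stp, falls back to a default string when no argument is supplied. Only the chosen branch reaches the parser, so @1 is harmless when the script is run without arguments.

```systemtap
// greet.stp (hypothetical file name); run as:
//   stap greet.stp          or          stap greet.stp alice
probe oneshot {
  // $# is replaced by the argument count before the comparison is made.
  name = %( $# >= 1 %? @1 %: "world" %)
  printf("hello, %s\n", name)
}
```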
856
857
858
859 PREPROCESSOR MACROS
860 The preprocessor also supports a simple macro facility, run as a sepa‐
861 rate pass before conditional preprocessing.
862
863 Macros are defined using the following construct:
864
865 @define NAME %( BODY %)
866 @define NAME(PARAM_1, PARAM_2, ...) %( BODY %)
867
868 Macros, and parameters inside a macro body, are both invoked by prefix‐
869 ing the macro name with an @ symbol:
870
871 @define foo %( x %)
872 @define add(a,b) %( ((@a)+(@b)) %)
873
874 @foo = @add(2,2)
875
876
877 Macro expansion is currently performed in a separate pass before condi‐
878 tional compilation. Therefore, both TRUE- and FALSE-tokens in condi‐
879 tional expressions will be macroexpanded regardless of how the condi‐
880 tion is evaluated. This can sometimes lead to errors:
881
882 // The following results in a conflict:
883 %( CONFIG_UTRACE == "y" %?
884 @define foo %( process.syscall %)
885 %:
886 @define foo %( **ERROR** %)
887 %)
888
889 // The following works properly as expected:
890 @define foo %(
891 %( CONFIG_UTRACE == "y" %? process.syscall %: **ERROR** %)
892 %)
893
894 The first example is incorrect because both @defines are evaluated in a
895 pass prior to the conditional being evaluated.
896
897 Normally, a macro definition is local to the file it occurs in. Thus,
898 defining a macro in a tapset does not make it available to the user of
899 the tapset. Publicly available library macros can be defined by in‐
900 cluding .stpm files on the tapset search path. These files may only
901 contain @define constructs, which become visible across all tapsets and
902 user scripts. Optionally, within the .stpm files, a public macro defi‐
903 nition can be surrounded by a preprocessor conditional as described
904 above.
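
As a further sketch of the macro facility, the following script defines a parameterized macro and invokes it twice. The macro name greet and its parameter who are made up for this example.

```systemtap
// A file-local macro with one parameter; @who expands to the argument.
@define greet(who) %( printf("hello, %s\n", @who) %)

probe oneshot {
  @greet("world")
  @greet("again")
}
```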
905
906
907 CONSTANTS
908 Tapsets or guru-mode user scripts can access header file constant to‐
909 kens, typically macros, using the built-in @const() operator. The
910 required header file can be included either via the tapset library or
911 with a top-level guru-mode embedded-C construct. Using @const() also
912 sets the appropriate embedded-C pragma comments.
913
914 @const("STP_SKIP_BADVARS")
915
916
917
918 VARIABLES
919 Identifiers for variables and functions are an alphanumeric sequence,
920 and may include _ and $ characters. They may not start with a plain
921 digit, as in C. Each variable is by default local to the probe or
922 function statement block within which it is mentioned, and therefore
923 its scope and lifetime is limited to a particular probe or function in‐
924 vocation.
925
926 Scalar variables are implicitly typed as either string or integer. As‐
927 sociative arrays also have a string or integer value, and a tuple of
928 strings and/or integers serving as a key. Here are a few basic expres‐
929 sions.
930
931 var1 = 5
932 var2 = "bar"
933 array1 [pid()] = "name" # single numeric key
934 array2 ["foo",4,i++] += 5 # vector of string/num/num keys
935 if (["hello",5,4] in array2) println ("yes") # membership test
936
937
938 The translator performs type inference on all identifiers, including
939 array indexes and function parameters. Inconsistent type-related use
940 of identifiers signals an error.
941
942 Variables may be declared global, so that they are shared amongst all
943 probes and functions and live as long as the entire systemtap session.
944 There is one namespace for all global variables, regardless of which
945 script file they are found within. Concurrent access to global vari‐
946 ables is automatically protected with locks, see the SAFETY AND SECURI‐
947 TY section for more details. A global declaration may be written at
948 the outermost level anywhere, not within a block of code. Global vari‐
949 ables which are written but never read will be displayed automatically
950 at session shutdown. The translator will infer for each its value
951 type, and if it is used as an array, its key types. Optionally, scalar
952 globals may be initialized with a string or number literal. The fol‐
953 lowing declaration marks variables as global.
954
955 global var1, var2, var3=4
956
957
958 Global variables can also be set as module options. One can do this
959 either by using the -G option, or by first compiling the module with
960 stap -p4 and then setting the global variables on the command line when
961 calling staprun on the generated module. See staprun(8) for more
962 information.
963
964 The scope of a global variable may be limited to a tapset or user
965 script file using the private keyword. The global keyword is optional
966 when defining a private global variable. The following declarations
967 mark var1 and var2 as private globals.
968
969 private global var1=2
970 private var2
971
972
973 Arrays are limited in size by the MAXMAPENTRIES variable -- see the
974 SAFETY AND SECURITY section for details. Optionally, global arrays may
975 be declared with a maximum size in brackets, overriding MAXMAPENTRIES
976 for that array only. Note that this doesn't indicate the type of keys
977 for the array, just the size.
978
979 global tiny_array[10], normal_array, big_array[50000]
980
981
982 Arrays may be configured for wrapping using the '%' suffix. This caus‐
983 es older elements to be overwritten if more elements are inserted than
984 the array can hold. This works for both associative and statistics
985 typed arrays.
986
987 global wrapped_array1%[10], wrapped_array2%
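
The declarations above can be combined in a short script. This is only a sketch: the names total and recent are invented for the example, and the exact set of surviving elements depends on the wrapping behaviour described above.

```systemtap
global total = 0      // scalar global with an initializer
global recent%[4]     // wrapping array that holds at most 4 elements

probe oneshot {
  for (i = 1; i <= 6; i++) {
    recent[i] = i * i   // later inserts overwrite the oldest elements
    total += i
  }
  foreach (k+ in recent)            // ascending key order
    printf("recent[%d] = %d\n", k, recent[k])
  printf("total = %d\n", total)
}
```

Without the % suffix, inserting a fifth element into an array declared with [4] would be expected to raise a run-time error instead of overwriting older elements.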
988
989
990
991 Many types of probe points provide context variables, which are run-
992 time values, safely extracted from the kernel or userspace program be‐
993 ing probed. These are prefixed with the $ character. The CONTEXT
994 VARIABLES section in stapprobes(3stap) lists what is available for each
995 type of probe point. These context variables become normal string or
996 numeric scalars once they are stored in normal script variables. See
997 the TYPECASTING section below on how to turn them back into typed
998 pointers for further processing as context variables. There is some
999 automation to help!
1000
1001
1002 STATEMENTS
1003 Statements enable procedural control flow. They may occur within func‐
1004 tions and probe handlers. The total number of statements executed in
1005 response to any single probe event is limited to some number defined by
1006 the MAXACTION macro in the translated C code, and is in the neighbour‐
1007 hood of 1000.
1008
1009 EXP Execute the string- or integer-valued expression and throw away
1010 the value.
1011
1012 { STMT1 STMT2 ... }
1013 Execute each statement in sequence in this block. Note that
1014 separators or terminators are generally not necessary between
1015 statements.
1016
1017 ; Null statement, do nothing. It is useful as an optional separa‐
1018 tor between statements to improve syntax-error detection and to
1019 handle certain grammar ambiguities.
1020
1021 if (EXP) STMT1 [ else STMT2 ]
1022 Compare integer-valued EXP to zero. Execute the first (non-ze‐
1023 ro) or second STMT (zero).
1024
1025 while (EXP) STMT
1026 While integer-valued EXP evaluates to non-zero, execute STMT.
1027
1028 for (EXP1; EXP2; EXP3) STMT
1029 Execute EXP1 as initialization. While EXP2 is non-zero, execute
1030 STMT, then the iteration expression EXP3.
1031
1032 foreach (VAR in ARRAY [ limit EXP ]) STMT
1033 Loop over each element of the named global array, assigning the
1034 current key to VAR. The array may not be modified within the
1035 statement. By adding a single + or - operator after the VAR or
1036 the ARRAY identifier, the iteration will proceed in a sorted or‐
1037 der, by ascending or descending index or value. If the array
1038 contains statistics aggregates, adding the desired @operator be‐
1039 tween the ARRAY identifier and the + or - will specify the sort‐
1040 ing aggregate function. See the STATISTICS section below for
1041 the ones available. Default is @count. Using the optional lim‐
1042 it keyword limits the number of loop iterations to EXP times.
1043 EXP is evaluated once at the beginning of the loop.
1044
1045 foreach ([VAR1, VAR2, ...] in ARRAY [ limit EXP ]) STMT
1046 Same as above, used when the array is indexed with a tuple of
1047 keys. A sorting suffix may be used on at most one VAR or ARRAY
1048 identifier.
1049
1050 foreach ([VAR1, VAR2, ...] in ARRAY [INDEX1, INDEX2, ...] [ limit EXP
1051 ]) STMT
1052 Same as above, where iterations are limited to elements in the
1053 array where the keys match the index values specified. The sym‐
1054 bol * can be used to specify an index and will be treated as a
1055 wildcard.
1056
1057 foreach (VAR0 = VAR in ARRAY [ limit EXP ]) STMT
1058 This variant of foreach saves the current value into VAR0 on
1059 each iteration, so it is the same as ARRAY[VAR]. This also works
1060 with a tuple of keys. Sorting suffixes on VAR0 have the same
1061 effect as on ARRAY.
1062
1063 foreach (VAR0 = VAR in ARRAY [INDEX1, INDEX2, ...] [ limit EXP ]) STMT
1064 Same as above, where iterations are limited to elements in the
1065 array where the keys match the index values specified. The sym‐
1066 bol * can be used to specify an index and will be treated as a
1067 wildcard.
1068
1069 break, continue
1070 Exit or iterate the innermost nesting loop (while or for or
1071 foreach) statement.
1072
1073 return EXP
1074 Return EXP value from enclosing function. If the function's
1075 value is not taken anywhere, then a return statement is not
1076 needed, and the function will have a special "unknown" type with
1077 no return value.
1078
1079 next Return now from enclosing probe handler. This is especially
1080 useful in probe aliases that apply event filtering predicates.
1081 When used in functions, the execution will be immediately trans‐
1082 ferred to the next overloaded function.
1083
1084 try { STMT1 } catch { STMT2 }
1085 Run the statements in the first block. Upon any run-time er‐
1086 rors, abort STMT1 and start executing STMT2. Any errors in
1087 STMT2 will propagate to outer try/catch blocks, if any.
1088
1089 try { STMT1 } catch(VAR) { STMT2 }
1090 Same as above, plus assign the error message to the string
1091 scalar variable VAR.
1092
1093 delete ARRAY[INDEX1, INDEX2, ...]
1094 Remove from ARRAY the element specified by the index tuple. If
1095 the index tuple contains a * in place of an index, the * is
1096 treated as a wildcard and all elements with keys that match the
1097 index tuple will be removed from ARRAY. The value will no
1098 longer be available, and subsequent iterations will not report
1099 the element. It is not an error to delete an element that does
1100 not exist.
1101
1102 delete ARRAY
1103 Remove all elements from ARRAY.
1104
1105 delete SCALAR
1106 Removes the value of SCALAR. Integers and strings are cleared
1107 to 0 and "" respectively, while statistics are reset to the ini‐
1108 tial empty state.
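
Several of the statements above can be seen together in the following sketch. The array name calls and its contents are invented for the example. The division by zero is there only to provoke a run-time error for try/catch to handle.

```systemtap
global calls

probe oneshot {
  calls["read"] = 7
  calls["write"] = 3
  calls["open"] = 12

  // Sort by descending value and stop after two iterations.
  foreach (name in calls- limit 2)
    printf("%s: %d\n", name, calls[name])

  // A run-time error inside try transfers control to catch.
  try {
    zero = 0
    println(10 / zero)
  } catch (msg) {
    printf("caught: %s\n", msg)
  }

  delete calls["open"]                       // remove one element
  if (!("open" in calls)) println("open was deleted")
  delete calls                               // remove all elements
}
```

The foreach loop should report the open and read entries, in that order, since those hold the two largest values.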
1109
1110
1111 EXPRESSIONS
1112 Systemtap supports a number of operators that have the same general
1113 syntax, semantics, and precedence as in C and awk. Arithmetic is per‐
1114 formed as per typical C rules for signed integers. Division by zero or
1115 overflow is detected and results in an error.
1116
1117 binary numeric operators
1118 * / % + - >> << & ^ | && ||
1119
1120 binary string operators
1121 . (string concatenation)
1122
1123 numeric assignment operators
1124 = *= /= %= += -= >>= <<= &= ^= |=
1125
1126 string assignment operators
1127 = .=
1128
1129 unary numeric operators
1130 + - ! ~ ++ --
1131
1132 binary numeric, string comparison or regex matching operators
1133 < > <= >= == != =~ !~
1134
1135 ternary operator
1136 cond ? exp1 : exp2
1137
1138 grouping operator
1139 ( exp )
1140
1141 function call
1142 fn ([ arg1, arg2, ... ])
1143
1144 array membership check
1145 exp in array
1146 [exp1, exp2, ... ] in array
1147 [*, *, ... ] in array
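
The following sketch exercises a few of these operators in one place: string concatenation, string and numeric assignment, and the ternary operator. The variable names are arbitrary.

```systemtap
probe oneshot {
  s = "sys" . "tem"      // binary string concatenation
  s .= "tap"             // string assignment operator
  n = (7 * 6) % 5        // numeric operators, C precedence
  println(s . " " . sprint(n) . " " . (n > 1 ? "big" : "small"))
}
```

Since (7 * 6) % 5 is 2, the script should print "systemtap 2 big".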
1148
1149
1150 REGULAR EXPRESSION MATCHING
1151 The scripting language supports regular expression matching. The basic
1152 syntax is as follows:
1153
1154 exp =~ regex
1155 exp !~ regex
1156
1157 (The first operand must be an expression evaluating to a string; the
1158 second operand must be a string literal containing a syntactically
1159 valid regular expression.)
1160
1161 The regular expression syntax supports most of the features of POSIX
1162 Extended Regular Expressions, except for subexpression reuse ("\1")
1163 functionality.
1164
1165 After a successful match, the contents of the matched string and subex‐
1166 pressions can be extracted using the matched() and ngroups() tapset
1167 functions as follows:
1168
1169 if ("an example string" =~ "str(ing)") {
1170 matched(0) // -> returns "string", the matched substring
1171 matched(1) // -> returns "ing", the 1st matched subexpression
1172 ngroups() // -> returns 2, the number of matched groups
1173 }
1174
1175
1176 PROBES
1177 The main construct in the scripting language identifies probes. Probes
1178 associate abstract events with a statement block ("probe handler") that
1179 is to be executed when any of those events occur. The general syntax
1180 is as follows:
1181
1182 probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
1183 probe PROBEPOINT [, PROBEPOINT] if (CONDITION) { [STMT ...] }
1184
1185
1186 Events are specified in a special syntax called "probe points". There
1187 are several varieties of probe points defined by the translator, and
1188 tapset scripts may define further ones using aliases. Probe points may
1189 be wildcarded, grouped, or listed in preference sequences, or declared
1190 optional. More details on probe point syntax and semantics are listed
1191 on the stapprobes(3stap) manual page.
1192
1193 The probe handler is interpreted relative to the context of each event.
1194 For events associated with kernel code, this context may include vari‐
1195 ables defined in the source code at that spot. These "context vari‐
1196 ables" are presented to the script as variables whose names are pre‐
1197 fixed with "$". They may be accessed only if the kernel's compiler
1198 preserved them despite optimization. This is the same constraint that
1199 a debugger user faces when working with optimized code. In addition,
1200 the objects must exist in paged-in memory at the moment of the system‐
1201 tap probe handler's execution, because systemtap must not cause (and
1202 therefore suppresses) any additional paging. Some probe types have
1203 very little context. See the stapprobes(3stap) man page for the context
1204 variables available at each kind of probe point. As of systemtap ver‐
1205 sion 4.3, functions called from the handlers of some probe point types
1206 may also refer to context variables. These are treated as if a clone
1207 of that function was inlined into the calling probe handler and $vari‐
1208 ables evaluated in its context.
1209
1210 Probes may be decorated with an arming condition, consisting of a sim‐
1211 ple boolean expression on read-only global script variables. While
1212 disarmed (inactive, condition evaluates to false), some probe types re‐
1213 duce or eliminate their run-time overheads. When an arming condition
1214 evaluates to true, probes will soon be re-armed, and their probe han‐
1215 dlers will start getting called as the events fire. (Some events may
1216 be lost during the arming interval. If this is unacceptable, do not
1217 use arming conditions for those probes.) Example of the syntax:
1218
1219 probe timer.us(TIMER) if (enabled) {
1220 }
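
A fuller sketch of an arming condition is shown here. The timer intervals and the variable name enabled are arbitrary choices for illustration, and whether disarming actually reduces overhead depends on the probe type.

```systemtap
global enabled = 0

// Toggle the condition every 5 seconds.
probe timer.s(5) { enabled = !enabled }

// This handler runs only while the condition holds.
probe timer.ms(500) if (enabled) { println("tick") }

// End the session after 20 seconds.
probe timer.s(20) { exit() }
```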
1221
1222
1223 New probe points may be defined using "aliases". A probe point alias
1224 looks similar to a probe definition, but instead of activating a probe
1225 at the given point, it just defines a new probe point name as an alias
1226 to an existing one. There are two types of alias: the prologue style
1227 and the epilogue style, which are identified by "=" and "+=" respective‐
1228 ly.
1229
1230 For a prologue-style alias, the statement block that follows the alias
1231 definition is implicitly added as a prologue to any probe that refers
1232 to the alias. For an epilogue-style alias, the statement block that
1233 follows the alias definition is implicitly added as an epilogue to any
1234 probe that refers to the alias. For example:
1235
1236 probe syscall.read = kernel.function("sys_read") {
1237 fildes = $fd
1238 if (execname() == "init") next # skip rest of probe
1239 }
1240
1241 defines a new probe point syscall.read, which expands to
1242 kernel.function("sys_read"), with the given statement as a prologue,
1243 which is useful to predefine some variables for the alias user and/or
1244 to skip probe processing entirely based on some conditions. And
1245
1246 probe syscall.read += kernel.function("sys_read") {
1247 if (tracethis) println ($fd)
1248 }
1249
1250 defines a new probe point with the given statement as an epilogue,
1251 which is useful to take actions based upon variables set or left over
1252 by the alias user. Please note that in each case, the statements
1253 in the alias handler block are treated ordinarily, so that variables
1254 assigned there constitute mere initialization, not a macro substitu‐
1255 tion.
1256
1257 Aliases can also be defined to include both a prologue and an epilogue.
1258
1259 probe syscall.read = kernel.function("sys_read") {
1260 fildes = $fd
1261 if (execname() == "init") next
1262 },{
1263 if (tracethis) println ($fd)
1264 }
1265
1266
1267 An alias is used just like a built-in probe type.
1268
1269 probe syscall.read {
1270 printf("reading fd=%d\n", fildes)
1271 if (fildes > 10) tracethis = 1
1272 }
1273
1274
1275 Probes with an alias can make use of the @probewrite predicate. This
1276 check is used to detect whether a script variable or target variable
1277 has been written to in the probe handler body.
1278
1279 @probewrite(var)
1280 expands to 1 iff var has been written to in the probe handler
1281 body, otherwise it expands to 0.
1282
1283 In the following example, @probewrite(var) expands to 1 because var has
1284 been written to in the probe handler body and consequently, the condi‐
1285 tional statement will run.
1286
1287 probe foo = begin { var = 0 }, { if (@probewrite(var)) println(var) }
1288
1289 probe foo {
1290 var = 1
1291 }
1292
1293
1294
1295 FUNCTIONS
1296 Systemtap scripts may define subroutines to factor out common work.
1297 Functions take any number of scalar (integer or string) arguments, and
1298 must return a single scalar (integer or string). An example function
1299 declaration looks like this:
1300
1301 function thisfn (arg1, arg2) {
1302 return arg1 + arg2
1303 }
1304
1305 Note the general absence of type declarations, which are instead in‐
1306 ferred by the translator. However, if desired, a function definition
1307 may include explicit type declarations for its return value and/or its
1308 arguments. This is especially helpful for embedded-C functions. In
1309 the following example, the type inference engine need only infer the
1310 type of arg2 (a string).
1311
1312 function thatfn:string (arg1:long, arg2) {
1313 return sprint(arg1) . arg2
1314 }
1315
1316 Functions may call others or themselves recursively, up to a fixed
1317 nesting limit. This limit is defined by the MAXNESTING macro in the
1318 translated C code and is in the neighbourhood of 10.
1319
1320 Functions may be marked private using the private keyword to limit
1321 their scope to the tapset or user script file they are defined in. An
1322 example definition of a private function follows:
1323
1324 private function three:long () { return 3 }
1325
1326
1327 Functions terminating without reaching an explicit return statement
1328 will return an implicit 0 or "", determined by type inference.
1329
1330 Functions may be overloaded during both runtime and compile time.
1331
1332 Runtime overloading allows the executed function to be selected while
1333 the module is running based on runtime conditions and is achieved using
1334 the "next" statement in script functions and STAP_NEXT macro for embed‐
1335 ded-C functions. For example,
1336
1337
1338 function f() { if (condition) next; print("first function") }
1339 function f() %{ STAP_NEXT; print("second function") %}
1340 function f() { print("third function") }
1341
1342
1343 During a function call f(), execution transfers to the third function
1344 if condition evaluates to true, and "third function" is printed. Note
1345 that the second function unconditionally passes control on via STAP_NEXT.
1346
1347 Parameter overloading allows the function that is executed to be
1348 selected at compile time based on the number of arguments provided to
1349 the function call. For example,
1350
1351
1352 function g() { print("first function") }
1353 function g(x) { print("second function") }
1354 g() -> "first function"
1355 g(1) -> "second function"
1356
1357
1358 Note that runtime overloading does not occur in the above example, as
1359 exactly one function will be resolved for the function call. The use
1360 of a next statement inside a function while no more overloads remain
1361 will trigger a runtime exception. Runtime overloading will only occur
1362 if the functions have the same arity; functions with the same name but
1363 a different number of parameters are completely unrelated.
1364
1365 Execution order is determined by a priority value which may be speci‐
1366 fied. If no explicit priority is specified, user script functions are
1367 given a higher priority than library functions. User script functions
1368 and library functions are assigned a default priority value of 0 and 1
1369 respectively. Functions with the same priority are executed in decla‐
1370 ration order. For example,
1371
1372
1373 function f():3 { if (condition) next; print("first function") }
1374 function f():1 { if (condition) next; print("second function") }
1375 function f():2 { print("third function") }
1376
1377
1378 Since the second function has the highest priority (lowest value), it is
1379 executed first. The first function is never executed, as there are no
1380 "next" statements in the third function to transfer execution.
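
A complete script using runtime overloading might look like the following sketch. The function name describe is invented, and both overloads are assumed to keep the default user-script priority, so they are tried in declaration order.

```systemtap
function describe:string (n:long) {
  if (n < 0) next            // hand over to the next overload
  return "non-negative"
}

function describe:string (n:long) {
  return "negative"
}

probe oneshot {
  println(describe(5))       // handled by the first overload
  println(describe(-1))      // first overload calls next
}
```

The last overload contains no next statement, so a call can never run out of overloads and trigger the runtime exception mentioned above.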
1381
1382
1383 PRINTING
1384 There is a set of function names that are specially treated by the
1385 translator. They format values for printing to the standard systemtap
1386 output stream in a more convenient way (note that data generated in the
1387 kernel module need to get transferred to user-space in order to get
1388 printed).
1389
1390 The sprint* variants return the formatted string instead of printing
1391 it.
1392
1393 print, sprint
1394 Print one or more values of any type, concatenated directly to‐
1395 gether.
1396
1397 println, sprintln
1398 Print values like print and sprint, but also append a newline.
1399
1400 printd, sprintd
1401 Take a string delimiter and two or more values of any type, and
1402 print the values with the delimiter interposed. The delimiter
1403 must be a literal string constant.
1404
1405 printdln, sprintdln
1406 Print values with a delimiter like printd and sprintd, but also
1407 append a newline.
1408
1409 printf, sprintf
1410 Take a formatting string and a number of values of corresponding
1411 types, and print them all. The format must be a literal string
1412 constant.
1413
1414 The printf formatting directives are similar to those of C, except that
1415 they are fully type-checked by the translator:
1416
1417 %b Writes a binary blob of the value given, instead of ASCII
1418 text. The width specifier determines the number of bytes
1419 to write; valid specifiers are %b %1b %2b %4b %8b. De‐
1420 fault (%b) is 8 bytes.
1421
1422 %c Character.
1423
1424 %d,%i Signed decimal.
1425
1426 %m Safely reads kernel (without #) or user (with #) memory
1427 at the given address, outputs its content. The optional
1428 precision specifier (not field width) determines the num‐
1429 ber of bytes to read - default is 1 byte. %10.4m prints
1430 4 bytes of the memory in a 10-character-wide field.
1431 Note, on some architectures user memory can still be read
1432 without #.
1433
1434 %M Same as %m, but outputs in hexadecimal. The minimal size
1435 of output is double the optional precision specifier -
1436 default is 1 byte (2 hex chars). %10.4M prints 4 bytes
1437 of the memory as 8 hexadecimal characters in a 10-charac‐
1438 ter-wide field. %.*M hex-dumps a given number of bytes
1439 from a given buffer.
1440
1441 %o Unsigned octal.
1442
1443 %p Unsigned pointer address.
1444
1445 %s String.
1446
1447 %u Unsigned decimal.
1448
1449 %x Unsigned hex value, in all lower-case.
1450
1451 %X Unsigned hex value, in all upper-case.
1452
1453 %% Writes a %.
1454
1455 The # flag selects the alternate forms. For octal, this prefixes a 0.
1456 For hex, this prefixes 0x or 0X, depending on case. For characters,
1457 this escapes non-printing values with either C-like escapes or raw oc‐
1458 tal. In the case of %#m/%#M, this safely accesses user space memory
1459 rather than kernel space memory.
1460
1461 Examples:
1462
1463 a = "alice", b = "bob", p = 0x1234abcd, i = 123, j = -1, id[a] = 1234, id[b] = 4567
1464 print("hello")
1465 Prints: hello
1466 println(b)
1467 Prints: bob\n
1468 println(a . " is " . sprint(16))
1469 Prints: alice is 16\n
1470 foreach (name in id) printdln("|", strlen(name), name, id[name])
1471 Prints: 5|alice|1234\n3|bob|4567\n
1472 printf("%c is %s; %x or %X or %p; %d or %u\n",97,a,p,p,p,j,j)
1473 Prints: a is alice; 1234abcd or 1234ABCD or 0x1234abcd; -1 or 18446744073709551615\n
1474 printf("2 bytes of kernel buffer at address %p: %.2m", p, p)
1475 Prints: 2 bytes of kernel buffer at address 0x1234abcd: <binary data>
1476 printf("%4b", p)
1477 Prints (these values as binary data): 0x1234abcd
1478 printf("%#o %#x %#X\n", 1, 2, 3)
1479 Prints: 01 0x2 0X3
1480 printf("%#c %#c %#c\n", 0, 9, 42)
1481 Prints: \000 \t *
1482
1483
1484
1485 STATISTICS
1486 It is often desirable to collect statistics in a way that avoids the
1487 penalties of repeatedly taking exclusive locks on the global variables
1488 those numbers are put into. Systemtap provides a solution using a
1489 special operator to accumulate values, and several pseudo-functions to
1490 extract the statistical aggregates.
1491
1492 The aggregation operator is <<<, and resembles an assignment, or a C++
1493 output-streaming operation. The left operand specifies a scalar or ar‐
1494 ray-index lvalue, which must be declared global. The right operand is
1495 a numeric expression. The meaning is intuitive: add the given number
1496 to the pile of numbers to compute statistics of. (The specific list of
1497 statistics to gather is given separately, by the extraction functions.)
1498
1499 foo <<< 1
1500 stats[pid()] <<< memsize
1501
1502
1503 The extraction functions are also special. For each appearance of a
1504 distinct extraction function operating on a given identifier, the
1505 translator arranges to compute a set of statistics that satisfy it.
1506 The statistics system is thereby "on-demand". Each execution of an ex‐
1507 traction function causes the aggregation to be computed for that moment
1508 across all processors.
1509
1510 Here is the set of extractor functions. The first argument of each is
1511 the same style of lvalue used on the left hand side of the accumulate
1512 operation. The @count(v), @sum(v), @min(v), @max(v), @avg(v), @vari‐
1513 ance(v[, b]) extractor functions compute the number/total/minimum/maxi‐
1514 mum/average/variance of all accumulated values. The resulting values
1515 are all simple integers. Arrays containing aggregates may be sorted
1516 and iterated. See the foreach construct above.
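
As a minimal sketch of accumulation and extraction on a scalar, the following script feeds three made-up samples into a global named latency (a name chosen for the example) and then extracts the basic aggregates.

```systemtap
global latency

probe oneshot {
  latency <<< 10
  latency <<< 20
  latency <<< 60
  printf("count=%d sum=%d min=%d max=%d avg=%d\n",
         @count(latency), @sum(latency), @min(latency),
         @max(latency), @avg(latency))
}
```

For these three samples the line should read count=3 sum=90 min=10 max=60 avg=30.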
1517
1518 Variance uses Welford's online algorithm. The calculations are based
1519 on integer arithmetic, and so may suffer from low precision and over‐
1520 flow. To improve this, @variance(v[, b]) accepts an optional parameter
1521 b, the bit-shift, ranging from 0 (default) to 62, for internal scaling.
1522 Only one bit-shift value may be used with a given global variable. A
1523 larger bit-shift value increases precision, but also increases the
1524 likelihood of overflow.
1525
1526
1527 $ stap -e \
1528 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x)) }'
1529 12
1530 $ stap -e \
1531 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x,1)) }'
1532 2
1533 $ python3 -c 'import statistics; print(statistics.variance([1, 2, 3, 4, 5]))'
1534 2.5
1535 $
1536
1537
1538 Overflow (from internal multiplication of large numbers) may occur and
1539 may cause a negative variance result. Consider normalizing your input
1540 data. Adding or subtracting a fixed value from all variance inputs
1541 preserves the original variance. Dividing the variance inputs by a
1542 fixed value shrinks the original variance by that value squared.
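
The two normalization rules above can be checked outside systemtap.
The following Python sketch uses the standard statistics module; the
sample list, the offset 1000 and the divisor 10 are arbitrary choices
made for illustration, and exact fractions are used so that rounding
does not affect the comparison. It says nothing about how systemtap's
integer arithmetic behaves internally.

```python
from fractions import Fraction
from statistics import variance

# Illustrative inputs only; any list of two or more numbers would do.
samples = [Fraction(n) for n in (1, 2, 3, 4, 5)]
base = variance(samples)            # sample variance of the raw inputs

# Adding a fixed offset to every input leaves the variance unchanged.
shifted = variance([s + 1000 for s in samples])

# Dividing every input by k shrinks the variance by k squared.
k = 10
scaled = variance([s / k for s in samples])

print(base, shifted, scaled * k * k)
```

All three printed values are equal, so subtracting a large common
offset before accumulating is a safe way to keep the inputs small.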
1543
1544
1545
1546 Histograms are also available, but are more complicated because they
1547 have a vector rather than scalar value. @hist_linear(v,start,stop,in‐
1548 terval) represents a linear histogram from "start" to "stop" (inclu‐
1549 sive) by increments of "interval". The interval must be positive. Sim‐
1550 ilarly, @hist_log(v) represents a base-2 logarithmic histogram. Print‐
1551 ing a histogram with the print family of functions renders a histogram
1552 object as a tabular "ASCII art" bar chart.
1553
1554
    probe timer.profile {
        x[1] <<< pid()
        x[2] <<< uid()
        y <<< tid()
    }
    global x // an array containing aggregates
    global y // a scalar
    probe end {
        foreach ([i] in x @count+) {
            printf ("x[%d]: avg %d = sum %d / count %d\n",
                    i, @avg(x[i]), @sum(x[i]), @count(x[i]))
            println (@hist_log(x[i]))
        }
        println ("y:")
        println (@hist_log(y))
    }
1571
1572
1573 The counts of each histogram bucket may be individually accessed via
1574 the [index] operator. Each bucket is addressed from 1 through N (for
1575 each natural bucket). In addition bucket #0 counts all the samples be‐
1576 neath the start value, and bucket #N+1 counts all the samples above the
1577 stop value. Histogram buckets (including the two out-of-range buckets)
1578 may also be iterated with foreach.
1579
1580
    global x
    probe oneshot {
        x <<< -100
        x <<< 1
        x <<< 2
        x <<< 3
        x <<< 100
        foreach (bucket in @hist_linear(x,1,3,1))
            // expecting 1 out-of-range-low bucket
            //           3 payload buckets
            //           1 out-of-range-high bucket
            printf("bucket %d count %d\n",
                   bucket, @hist_linear(x,1,3,1)[bucket])
    }
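
The bucket numbering described above can be modelled in a few lines of
Python. This is only a sketch of the numbering scheme as stated in this
section, not systemtap's implementation; the function names are
hypothetical, and the number of natural buckets is assumed to be
(stop - start) / interval + 1 because "stop" is inclusive.

```python
def linear_bucket(value, start, stop, interval):
    """Return the bucket index for value, using the numbering described above."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    natural = (stop - start) // interval + 1    # assumed count of natural buckets (N)
    if value < start:
        return 0                                # out-of-range-low bucket
    if value > stop:
        return natural + 1                      # out-of-range-high bucket
    return (value - start) // interval + 1      # natural buckets 1..N


def linear_histogram(values, start, stop, interval):
    """Count the values falling into each bucket, including both out-of-range ones."""
    natural = (stop - start) // interval + 1
    counts = [0] * (natural + 2)
    for v in values:
        counts[linear_bucket(v, start, stop, interval)] += 1
    return counts


# Same samples and histogram parameters as the script above.
counts = linear_histogram([-100, 1, 2, 3, 100], 1, 3, 1)
for bucket, count in enumerate(counts):
    print("bucket %d count %d" % (bucket, count))
```

With these inputs each of the five buckets (one low, three payload, one
high) receives exactly one sample.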
1595
1596
1597
1598 TYPECASTING
1599 Once a pointer (see the CONTEXT VARIABLES section of stapprobes(3stap))
1600 has been saved into a script integer variable, the translator attempts
1601 to keep the type information necessary to access members from that
1602 pointer.
1603
1604 The translator attempts to track DWARF typing associated with script
1605 variables assigned from addresses of context $variables, @cast or @var
1606 operators. Depending on the complexity of the script code, this asso‐
1607 ciation may pass to related variables, so that -> and [] operators may
1608 be used on them, just as on the original context variable. For exam‐
1609 ple:
1610
1611
1612 foo = $param->foo; printf("x:%d y:%d\n", foo->x, foo->y)
1613 printf("my value is %d\n", ($type == 42 ? $foo : $bar)->value)
1614 printf("my parent pid is %d\n", task_parent(task_current())->tgid)
1615
1616
1617 However, if this association heuristic doesn't work for a script, using
1618 the @cast() operator tells the translator how to interpret the number
1619 as a typed pointer.
1620
1621 @cast(p, "type_name"[, "module"])->member
1622
1623
1624 This will interpret p as a pointer to a struct/union named type_name
1625 and dereference the member value. Further ->subfield expressions may
1626 be appended to dereference more levels. Note that for direct derefer‐
1627 encing of a pointer {kernel,user}_{char,int,...}($p) should be used.
1628 (Refer to stapfuncs(5) for more details.) NOTE: the same dereferenc‐
1629 ing operator -> is used to refer to both direct containment or pointer
1630 indirection. Systemtap automatically determines which. The optional
1631 module tells the translator where to look for information about that
1632 type. Multiple modules may be specified as a list with : separators.
If the module is not specified, it will default either to the probe
module for dwarf probes, or to "kernel" for functions and all other
probe types.
1636
1637 The translator can create its own module with type information from a
1638 header surrounded by angle brackets, in case normal debuginfo is not
1639 available. For kernel headers, prefix it with "kernel" to use the ap‐
1640 propriate build system. All other headers are built with default GCC
1641 parameters into a user module. Multiple headers may be specified in
1642 sequence to resolve a codependency.
1643
1644 @cast(tv, "timeval", "<sys/time.h>")->tv_sec
1645 @cast(task, "task_struct", "kernel<linux/sched.h>")->tgid
1646 @cast(task, "task_struct",
1647 "kernel<linux/sched.h><linux/fs_struct.h>")->fs->umask
1648
1649 Values acquired by @cast may be pretty-printed by the $ and $$ suffix
1650 operators, the same way as described in the CONTEXT VARIABLES section
1651 of the stapprobes(3stap) manual page.
1652
1653
1654 When in guru mode, the translator will also allow scripts to assign new
1655 values to members of typecasted pointers.
1656
1657 Typecasting is also useful in the case of void* members whose type may
1658 be determinable at runtime.
1659
    probe foo {
        if ($var->type == 1) {
            value = @cast($var->data, "type1")->bar
        } else {
            value = @cast($var->data, "type2")->baz
        }
        print(value)
    }
1668
1669
1670
1671 EMBEDDED C
1672 When in guru mode, the translator accepts embedded C code in the top
1673 level of the script. Such code is enclosed between %{ and %} markers,
1674 and is transcribed verbatim, without analysis, in some sequence, into
1675 the top level of the generated C code. At the outermost level, this
1676 may be useful to add #include instructions, and any auxiliary defini‐
1677 tions for use by other embedded code.
1678
1679 Another place where embedded code is permitted is as a function body.
1680 In this case, the script language body is replaced entirely by a piece
1681 of C code enclosed again between %{ and %} markers. This C code may do
1682 anything reasonable and safe. There are a number of undocumented but
1683 complex safety constraints on atomicity, concurrency, resource consump‐
1684 tion, and run time limits, so this is an advanced technique.
1685
1686 The memory locations set aside for input and output values are made
1687 available to it using macros STAP_ARG_* and STAP_RETVALUE. Errors may
1688 be signalled with STAP_ERROR. Output may be written with STAP_PRINTF.
1689 The function may return early with STAP_RETURN. Here are some exam‐
1690 ples:
1691
    function integer_ops (val) %{
        STAP_PRINTF("%d\n", STAP_ARG_val);
        STAP_RETVALUE = STAP_ARG_val + 1;
        if (STAP_RETVALUE == 4)
            STAP_ERROR("wrong guess: %d", (int) STAP_RETVALUE);
        if (STAP_RETVALUE == 3)
            STAP_RETURN(0);
        STAP_RETVALUE ++;
    %}
    function string_ops (val) %{
        strlcpy (STAP_RETVALUE, STAP_ARG_val, MAXSTRINGLEN);
        strlcat (STAP_RETVALUE, "one", MAXSTRINGLEN);
        if (strcmp (STAP_RETVALUE, "three-two-one"))
            STAP_RETURN("parameter should be three-two-");
    %}
    function no_ops () %{
        STAP_RETURN(); /* function inferred with no return value */
    %}
1710
1711 The function argument and return value types have to be inferred by the
1712 translator from the call sites in order for this to work. The user
1713 should examine C code generated for ordinary script-language functions
1714 in order to write compatible embedded-C ones.
1715
1716 The last place where embedded code is permitted is as an expression
1717 rvalue. In this case, the C code enclosed between %{ and %} markers is
1718 interpreted as an ordinary expression value. It is assumed to be a
1719 normal 64-bit signed number, unless the marker /* string */ is includ‐
1720 ed, in which case it's treated as a string.
1721
1722 function add_one (val) {
1723 return val + %{ 1 %}
1724 }
1725 function add_string_two (val) {
1726 return val . %{ /* string */ "two" %}
1727 }
1728 @define SOME_STAP_MACRO %( %{ SOME_C_MACRO %} %)
1729 probe begin {
1730 printf("SOME_C_MACRO has value: %d\n", @SOME_STAP_MACRO);
1731 }
1732
1733
1734 The embedded-C code may contain markers to assert optimization and
1735 safety properties.
1736
1737 /* pure */
1738 means that the C code has no side effects and may be elided en‐
1739 tirely if its value is not used by script code.
1740
1741 /* stable */
1742 means that the C code always has the same value (in any given
1743 probe handler invocation), so repeated calls may be automatical‐
1744 ly replaced by memoized values. Such functions must take no pa‐
1745 rameters, and also be pure.
1746
1747 /* unprivileged */
1748 means that the C code is so safe that even unprivileged users
1749 are permitted to use it.
1750
1751 /* myproc-unprivileged */
1752 means that the C code is so safe that even unprivileged users
1753 are permitted to use it, provided that the target of the current
1754 probe is within the user's own process.
1755
1756 /* guru */
1757 means that the C code is so unsafe that a systemtap user must
1758 specify -g (guru mode) to use this. (Tapsets are permitted and
1759 presumed to call them safely.)
1760
1761 /* unmangled */
1762 in an embedded-C function, means that the legacy (pre-1.8) argu‐
1763 ment access syntax should be made available inside the function.
1764 Hence, in addition to STAP_ARG_foo and STAP_RETVALUE one can use
1765 THIS->foo and THIS->__retvalue respectively inside the function.
1766 This is useful for quickly migrating code written for SystemTap
1767 version 1.7 and earlier.
1768
1769 /* unmodified-fnargs */
1770 in an embedded-C function, means that the function arguments are
1771 not modified inside the function body.
1772
1773 /* string */
1774 in embedded-C expressions only, means that the expression has
1775 const char * type and should be treated as a string value, in‐
1776 stead of the default long numeric.
1777
Script-level global variables may be accessed in embedded-C functions
and blocks. To read or write the global variable var, the
/* pragma:read:var */ or /* pragma:write:var */ marker must first be
placed in the embedded-C function or block. This provides the
STAP_GLOBAL_GET_* and STAP_GLOBAL_SET_* macros to allow reading and
writing, respectively. For example:
1784
1785 global var
1786 global var2[100]
1787 function increment() %{
1788 /* pragma:read:var */ /* pragma:write:var */
1789 /* pragma:read:var2 */ /* pragma:write:var2 */
1790 STAP_GLOBAL_SET_var(STAP_GLOBAL_GET_var()+1); //var++
1791 STAP_GLOBAL_SET_var2(1, 1, STAP_GLOBAL_GET_var2(1, 1)+1); //var2[1,1]++
1792 %}
1793
1794 Variables may be read and set in both embedded-C functions and expres‐
1795 sions. Strings returned from embedded-C code are decayed to pointers.
1796 Variables must also be assigned at script level to allow for type in‐
1797 ference. Map assignment does not return the value written, so chaining
1798 does not work.
1799
1800
1801 BUILT-INS
A set of built-in probe point aliases is provided by the scripts
installed in the directory specified in the stappaths(7) manual page.
The functions are described in the stapprobes(3stap) manual page.
1805
1806
1807 DEREFERENCING
Integers can be dereferenced from pointers saved in script integer
variables using the @kderef() or @uderef() operators. @kderef() is
used for kernel-space addresses and @uderef() is used for user-space
addresses.
1812
1813 @kderef(SIZE, addr)
1814 @uderef(SIZE, addr)
1815
1816 This will interpret addr as a kernel/user address and read SIZE bytes
1817 starting at that address. SIZE should be either 1, 2, 4 or 8 bytes.
1818
1819
1820 REGISTERS
1821 The value stored within a register can be accessed using the @kregis‐
1822 ter() or @uregister() operators. @kregister() is used for kernel space
1823 registers and @uregister() is used for user space registers. The regis‐
1824 ter of interest is specified using its DWARF number.
1825
1826 @kregister(0)
1827 @uregister(5)
1828
1829
The translator begins pass 1 by parsing the given input script, and all
scripts (files named *.stp) found in a tapset directory. The
directories listed with -I are processed in sequence, each processed in
"guru mode". For each directory, a number of subdirectories are also
searched. These subdirectories are derived from the selected kernel
version (the -r option), in order to allow more kernel-version-specific
scripts to override less specific ones. For example, for a kernel
version 2.6.12-23.FC3 the following patterns would be searched, in
sequence: 2.6.12-23.FC3/*.stp, 2.6.12/*.stp, 2.6/*.stp, and finally
*.stp. Stopping the translator after pass 1 causes it to print the
parse trees.
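
The search order in the 2.6.12-23.FC3 example can be reproduced with a
short Python sketch. The splitting rule used here (drop everything
after the first "-", then keep only the first two dot-separated
components) is inferred from that single example and is an assumption,
not a statement of the translator's exact logic; the function name is
hypothetical.

```python
def tapset_search_patterns(kernel_version):
    """List *.stp glob patterns from most to least kernel-version-specific."""
    prefixes = [kernel_version]
    # Assumed rule 1: drop the release suffix after the first "-".
    base = kernel_version.split("-", 1)[0]
    if base not in prefixes:
        prefixes.append(base)
    # Assumed rule 2: keep only the first two version components.
    major_minor = ".".join(base.split(".")[:2])
    if major_minor not in prefixes:
        prefixes.append(major_minor)
    # The unversioned pattern is always searched last.
    return [p + "/*.stp" for p in prefixes] + ["*.stp"]


print(tapset_search_patterns("2.6.12-23.FC3"))
```

Because more specific directories come first, a script placed in the
full-version subdirectory takes precedence over one of the same name in
a less specific subdirectory.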
1842
1843
1844 In pass 2, the translator analyzes the input script to resolve symbols
1845 and types. References to variables, functions, and probe aliases that
1846 are unresolved internally are satisfied by searching through the parsed
1847 tapset script files. If any tapset script file is selected because it
1848 defines an unresolved symbol, then the entirety of that file is added
1849 to the translator's resolution queue. This process iterates until all
1850 symbols are resolved and a subset of tapset script files is selected.
1851
1852 Next, all probe point descriptions are validated against the wide
1853 variety supported by the translator. Probe points that refer to code
1854 locations ("synchronous probe points") require the appropriate kernel
1855 debugging information to be installed. In the associated probe
1856 handlers, target-side variables (whose names begin with "$") are found
1857 and have their run-time locations decoded.
1858
1859 Next, all probes and functions are analyzed for optimization
1860 opportunities, in order to remove variables, expressions, and functions
1861 that have no useful value and no side-effect. Embedded-C functions are
1862 assumed to have side-effects unless they include the magic string
1863 /* pure */. Since this optimization can hide latent code errors such
1864 as type mismatches or invalid $context variables, it sometimes may be
1865 useful to disable the optimizations with the -u option.
1866
1867 Finally, all variable, function, parameter, array, and index types are
1868 inferred from context (literals and operators). Stopping the
1869 translator after pass 2 causes it to list all the probes, functions,
1870 and variables, along with all inferred types. Any inconsistent or
1871 unresolved types cause an error.
1872
1873
1874 In pass 3, the translator writes C code that represents the actions of
1875 all selected script files, and creates a Makefile to build that into a
1876 kernel object. These files are placed into a temporary directory.
1877 Stopping the translator at this point causes it to print the contents
1878 of the C file.
1879
1880
1881 In pass 4, the translator invokes the Linux kernel build system to
1882 create the actual kernel object file. This involves running make in
1883 the temporary directory, and requires a kernel module build system
1884 (headers, config and Makefiles) to be installed in the usual spot
1885 /lib/modules/VERSION/build. Stopping the translator after pass 4 is
1886 the last chance before running the kernel object. This may be useful
1887 if you want to archive the file.
1888
1889
In pass 5, the translator invokes the systemtap auxiliary program
staprun for the given kernel object. This program arranges to load the
module, then communicates with it, copying trace data from the kernel
into temporary files, until the user sends an interrupt signal. Any
run-time error encountered by the probe handlers, such as running out
of memory, division by zero, or exceeding nesting or runtime limits,
results in a soft error indication. Soft errors in excess of MAXERRORS
block all subsequent probes (except error-handling probes), and
terminate the session. Finally, staprun unloads the module and cleans
up.
1900
1901
1902 ABNORMAL TERMINATION
1903 One should avoid killing the stap process forcibly, for example with
1904 SIGKILL, because the stapio process (a child process of the stap
1905 process) and the loaded module may be left running on the system. If
1906 this happens, send SIGTERM or SIGINT to any remaining stapio processes,
1907 then use rmmod to unload the systemtap module.
1908
1909
1910
1912 See the stapex(3stap) manual page for a brief collection of samples, or
1913 a large set of installed samples under the systemtap
1914 documentation/testsuite directories. See stappaths(7stap) for the
1915 likely location of these on the system.
1916
1917
1919 The systemtap translator caches the pass 3 output (the generated C
1920 code) and the pass 4 output (the compiled kernel module) if pass 4
1921 completes successfully. This cached output is reused if the same
1922 script is translated again assuming the same conditions exist (same
1923 kernel version, same systemtap version, etc.). Cached files are stored
1924 in the $SYSTEMTAP_DIR/cache directory. The cache can be limited by
1925 having the file cache_mb_limit placed in the cache directory (shown
1926 above) containing only an ASCII integer representing how many MiB the
1927 cache should not exceed. In the absence of this file, a default will be
1928 created with the limit set to 256MiB. This is a 'soft' limit in that
1929 the cache will be cleaned after a new entry is added if the cache clean
1930 interval is exceeded, so the total cache size may temporarily exceed
1931 this limit. This interval can be specified by having the file
1932 cache_clean_interval_s placed in the cache directory (shown above)
1933 containing only an ASCII integer representing the interval in seconds.
1934 In the absence of this file, a default will be created with the
1935 interval set to 300 s.
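
The two tuning files can be created by hand or by a small helper. The
Python sketch below writes them into a directory passed by the caller;
the helper name and its parameters are hypothetical, and the caller is
assumed to pass the cache directory described above. Each file receives
only an ASCII integer, as the text requires.

```python
from pathlib import Path


def set_cache_limits(cache_dir, mb_limit=None, clean_interval_s=None):
    """Write the cache tuning files named in the text into cache_dir."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    if mb_limit is not None:
        # Size limit in MiB; the file holds nothing but the integer.
        (cache_dir / "cache_mb_limit").write_text(
            str(int(mb_limit)), encoding="ascii")
    if clean_interval_s is not None:
        # Cleaning interval in seconds; again only the integer.
        (cache_dir / "cache_clean_interval_s").write_text(
            str(int(clean_interval_s)), encoding="ascii")


if __name__ == "__main__":
    import tempfile
    # Demonstrate on a scratch directory so no real cache is modified.
    with tempfile.TemporaryDirectory() as scratch:
        set_cache_limits(scratch, mb_limit=512, clean_interval_s=600)
        print(sorted(p.name for p in Path(scratch).iterdir()))
```

Since the size limit is soft, a shorter cleaning interval reduces how
long the cache may stay above the limit.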
1936
1937
1939 Systemtap may be used as a powerful administrative tool. It can expose
1940 kernel internal data structures and potentially private user
1941 information. (In dyninst runtime mode, this is not the case, see the
1942 ALTERNATE RUNTIMES section below.)
1943
The translator asserts many safety constraints during compilation and
more during run-time. It aims to ensure that no handler routine can
run for very long, allocate boundless memory, perform unsafe
operations, or unintentionally interfere with the system. Uses of
script global variables are automatically read/write locked as
appropriate, to protect against manipulation by concurrent probe
handlers. Locks are taken so as to run the global-variable
manipulation portion of probe handlers atomically (locks are taken
all-or-none). Deadlocks are detected with timeouts. Use the -t flag
to receive reports of excessive lock contention. Experimenting with
scripts is therefore generally safe. The guru-mode -g option allows
administrators to bypass most safety measures, which permits invasive
or state-changing operations and embedded-C code, and increases the
risk of upset. By default, overload prevention is turned on for all
modules. If you would like to disable overload processing, use the
--suppress-time-limits option.
1960
1961 Errors that are caught at run time normally result in a clean script
1962 shutdown and a pass-5 error message. The --suppress-handler-errors
1963 option lets scripts tolerate soft errors without shutting down.
1964
1965
1966
1967 PERMISSIONS
1968 For the normal linux-kernel-module runtime, to run the kernel objects
1969 systemtap builds, a user must be one of the following:
1970
1971 • the root user;
1972
1973 • a member of the stapdev and stapusr groups;
1974
1975 • a member of the stapsys and stapusr groups; or
1976
1977 • a member of the stapusr group.
1978
1979 The root user or a user who is a member of both the stapdev and stapusr
1980 groups can build and run any systemtap script.
1981
1982 A user who is a member of both the stapsys and stapusr groups can only
1983 use pre-built modules under the following conditions:
1984
1985 • The module has been signed by a trusted signer. Trusted signers are
1986 normally systemtap compile-servers which sign modules when the
1987 --privilege option is specified by the client. See the
1988 stap-server(8) manual page for more information.
1989
1990 • The module was built using the --privilege=stapsys or the
1991 --privilege=stapusr options.
1992
1993 Members of only the stapusr group can only use pre-built modules under
1994 the following conditions:
1995
1996 • The module is located in the /lib/modules/VERSION/systemtap
1997 directory. This directory must be owned by root and not be world
1998 writable.
1999
2000 or
2001
2002 • The module has been signed by a trusted signer. Trusted signers are
2003 normally systemtap compile-servers which sign modules when the
2004 --privilege option is specified by the client. See the
2005 stap-server(8) manual page for more information.
2006
2007 • The module was built using the --privilege=stapusr option.
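
The group rules above can be summarized as a decision function. The
Python sketch below is one reading of the conditions listed in this
section, not a description of what staprun actually checks; the
function and parameter names are hypothetical. In particular, for
members of only stapusr it assumes the conditions combine as "module is
in the systemtap module directory, or (signed by a trusted signer and
built with --privilege=stapusr)".

```python
def may_load_module(groups, is_root=False, signed_by_trusted_signer=False,
                    built_privilege=None, in_systemtap_module_dir=False):
    """Apply the PERMISSIONS rules as read from the text above."""
    groups = set(groups)
    # root, or stapdev together with stapusr: any script may be built and run.
    if is_root or {"stapdev", "stapusr"} <= groups:
        return True
    # stapsys together with stapusr: signed modules built for stapsys or stapusr.
    if {"stapsys", "stapusr"} <= groups:
        return (signed_by_trusted_signer
                and built_privilege in ("stapsys", "stapusr"))
    # stapusr only: module directory, or signed and built for stapusr (assumed reading).
    if "stapusr" in groups:
        return in_systemtap_module_dir or (
            signed_by_trusted_signer and built_privilege == "stapusr")
    # Anyone else may not run systemtap kernel objects.
    return False


print(may_load_module({"stapusr"}, signed_by_trusted_signer=True,
                      built_privilege="stapsys"))
```

The final call prints False: under this reading, a module built with
--privilege=stapsys is not usable by a member of only stapusr.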
2008
The kernel modules generated by the stap program are run by the
staprun program. The latter is a part of the Systemtap package,
dedicated to module loading and unloading (but only in the white
zone), and kernel-to-user data transfer. Since staprun does not
perform any additional security checks on the kernel objects it is
given, it would be unwise for a system administrator to add untrusted
users to the stapdev or stapusr groups.
2016
2017
2018 SECUREBOOT
2019 If the current system has SecureBoot turned on in the UEFI firmware,
2020 all kernel modules must be signed. (Some kernels may allow disabling
2021 SecureBoot long after booting with a key sequence such as SysRq-X,
2022 making it unnecessary to sign modules.) The systemtap compile server
2023 can sign modules with a MOK (Machine Owner Key) that it has in common
2024 with a client system. See the following wiki page for more details:
2025
2026 https://sourceware.org/systemtap/wiki/SecureBoot
2027
Some kernels do not let systemtap guess whether module signing is in
effect. On such machines, set the SYSTEMTAP_SIGN environment variable
to any value while running stap.
2031
2032
2033 RESOURCE LIMITS
2034 Many resource use limits are set by macros in the generated C code.
2035 These may be overridden with -D flags. A selection of these is as fol‐
2036 lows:
2037
2038 MAXNESTING
2039 Maximum number of nested function calls. Default determined by
2040 script analysis, with a bonus 10 slots added for recursive
2041 scripts.
2042
2043 MAXSTRINGLEN
2044 Maximum length of strings, default 128.
2045
2046 MAXTRYLOCK
2047 Maximum number of iterations to wait for locks on global vari‐
2048 ables before declaring possible deadlock and skipping the probe,
2049 default 1000.
2050
2051 MAXACTION
2052 Maximum number of statements to execute during any single probe
2053 hit (with interrupts disabled), default 1000. Note that for
2054 straight-through probe handlers lacking loops or recursion, due
2055 to optimization, this parameter may be interpreted too conserva‐
2056 tively.
2057
2058 MAXACTION_INTERRUPTIBLE
2059 Maximum number of statements to execute during any single probe
2060 hit which is executed with interrupts enabled (such as begin/end
2061 probes), default (MAXACTION * 10).
2062
2063 MAXBACKTRACE
Maximum number of stack frames that will be processed by the
stap runtime unwinder as produced by the backtrace functions in
the [u]context-unwind.stp tapsets, default 20.
2067
2068 MAXMAPENTRIES
2069 Maximum number of rows in any single global array, default 2048.
2070 Individual arrays may be declared with a larger or smaller limit
2071 instead:
2072
2073 global big[10000],little[5]
2074
2075 or denoted with % to make them wrap-around (replace old entries)
2076 automatically, as in
2077
2078 global big%
2079
2080 or both.
2081
2082 MAPHASHBIAS
2083 The number of powers-of-two to add or subtract from the natural
2084 size of the hash table backing each global associative array.
2085 Default is 0. Try small positive numbers to get extra perfor‐
2086 mance at the cost of more memory consumption, because that
2087 should reduce hash table collisions. Try small negative numbers
2088 for the opposite tradeoff.
2089
2090 MAXERRORS
2091 Maximum number of soft errors before an exit is triggered, de‐
2092 fault 0, which means that the first error will exit the script.
2093 Note that with the --suppress-handler-errors option, this limit
2094 is not enforced.
2095
2096 MAXSKIPPED
2097 Maximum number of skipped probes before an exit is triggered,
2098 default 100. Running systemtap with -t (timing) mode gives more
2099 details about skipped probes. With the default -DINTERRUPT‐
2100 IBLE=1 setting, probes skipped due to reentrancy are not accumu‐
2101 lated against this limit. Note that with the --suppress-han‐
2102 dler-errors option, this limit is not enforced.
2103
2104 MINSTACKSPACE
2105 Minimum number of free kernel stack bytes required in order to
2106 run a probe handler, default 1024. This number should be large
2107 enough for the probe handler's own needs, plus a safety margin.
2108
2109 MAXUPROBES
2110 Maximum number of concurrently armed user-space probes (up‐
2111 robes), default somewhat larger than the number of user-space
2112 probe points named in the script. This pool needs to be poten‐
2113 tially large because individual uprobe objects (about 64 bytes
2114 each) are allocated for each process for each matching script-
2115 level probe.
2116
2117 STP_MAXMEMORY
2118 Maximum amount of memory (in kilobytes) that the systemtap mod‐
2119 ule should use, default unlimited. The memory size includes the
2120 size of the module itself, plus any additional allocations.
2121 This only tracks direct allocations by the systemtap runtime.
2122 This does not track indirect allocations (as done by kprobes/up‐
2123 robes/etc. internals).
2124
2125 STP_OVERLOAD_THRESHOLD, STP_OVERLOAD_INTERVAL
2126 Maximum number of machine cycles spent in probes on any cpu per
2127 given interval, before an overload condition is declared and the
2128 script shut down. The defaults are 500 million and 1 billion,
2129 so as to limit stap script cpu consumption at around 50%.
2130
2131 STP_PROCFS_BUFSIZE
2132 Size of procfs probe read buffers (in bytes). Defaults to
2133 MAXSTRINGLEN. This value can be overridden on a per-procfs file
2134 basis using the procfs read probe .maxsize(MAXSIZE) parameter.
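
As a worked example of the STP_OVERLOAD_THRESHOLD and
STP_OVERLOAD_INTERVAL entries in the list above, the Python sketch
below computes the CPU share implied by a threshold/interval pair and
formats -D flags for a different budget. The helper names are
hypothetical, the 25% budget is an arbitrary illustration, and the
NAME=VALUE form of the flags is assumed from the -DINTERRUPTIBLE usage
in this section.

```python
def overload_cpu_fraction(threshold_cycles=500_000_000,
                          interval_cycles=1_000_000_000):
    """Fraction of each interval that probes may consume before overload."""
    return threshold_cycles / interval_cycles


def overload_flags(cpu_fraction, interval_cycles=1_000_000_000):
    """Format -D flags for a desired CPU budget (assumed NAME=VALUE form)."""
    threshold = int(cpu_fraction * interval_cycles)
    return ["-DSTP_OVERLOAD_THRESHOLD=%d" % threshold,
            "-DSTP_OVERLOAD_INTERVAL=%d" % interval_cycles]


print(overload_cpu_fraction())            # documented defaults
print(" ".join(overload_flags(0.25)))     # illustrative tighter budget
```

The default pair works out to 0.5, matching the "around 50%" figure
quoted for the defaults.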
2135
2136 With scripts that contain probes on any interrupt path, it is possible
2137 that those interrupts may occur in the middle of another probe handler.
2138 The probe in the interrupt handler would be skipped in this case to
2139 avoid reentrance. To work around this issue, execute stap with the op‐
2140 tion -DINTERRUPTIBLE=0 to mask interrupts throughout the probe handler.
2141 This does add some extra overhead to the probes, but it may prevent
2142 reentrance for common problem cases. However, probes in NMI handlers
2143 and in the callpath of the stap runtime may still be skipped due to
2144 reentrance.
2145
2146
2147 In case something goes wrong with stap or staprun after a probe has al‐
2148 ready started running, one may safely kill both user processes, and re‐
2149 move the active probe kernel module with rmmod. Any pending trace mes‐
2150 sages may be lost.
2151
2152
Systemtap exposes kernel internal data structures and potentially
private user information. Because of this, use of systemtap's full
capabilities is restricted to root and to users who are members of the
groups stapdev and stapusr.
2158
2159 However, a restricted set of systemtap's features can be made available
2160 to trusted, unprivileged users. These users are members of the group
2161 stapusr only, or members of the groups stapusr and stapsys. These
2162 users can load systemtap modules which have been compiled and certified
2163 by a trusted systemtap compile-server. See the descriptions of the op‐
2164 tions --privilege and --use-server. See README.unprivileged in the sys‐
2165 temtap source code for information about setting up a trusted compile
2166 server.
2167
2168 The restrictions enforced when --privilege=stapsys is specified are de‐
2169 signed to prevent unprivileged users from:
2170
2171 • harming the system maliciously.
2172
2173 The restrictions enforced when --privilege=stapusr is specified are de‐
2174 signed to prevent unprivileged users from:
2175
2176 • harming the system maliciously.
2177
2178 • gaining access to information which would not normally be
2179 available to an unprivileged user.
2180
2181 • disrupting the performance of processes owned by other users
2182 of the system. Some overhead to the system in general is
2183 unavoidable since the unprivileged user's probes will be
2184 triggered at the appropriate times. What we would like to
2185 avoid is targeted interruption of another user's processes
2186 which would not normally be possible by an unprivileged us‐
2187 er.
2188
2189
2190 PROBE RESTRICTIONS
2191 A member of the groups stapusr and stapsys may use all probe points.
2192
2193 A member of only the group stapusr may use only the following probes:
2194
2195 • begin, begin(n)
2196
2197 • end, end(n)
2198
2199 • error(n)
2200
2201 • never
2202
2203 • process.*, where the target process is owned by the user.
2204
2205 • timer.{jiffies,s,sec,ms,msec,us,usec,ns,nsec}(n)*
2206
2207 • timer.hz(n)
2208
2209
2210 SCRIPT LANGUAGE RESTRICTIONS
2211 The following scripting language features are unavailable to all un‐
2212 privileged users:
2213
2214
2215 • any feature enabled by the Guru Mode (-g) option.
2216
2217 • embedded C code.
2218
2219
2220 RUNTIME RESTRICTIONS
2221 The following runtime restrictions are placed upon all unprivileged
2222 users:
2223
2224 • Only the default runtime code (see -R) may be used.
2225
2226 Additional restrictions are placed on members of only the group sta‐
2227 pusr:
2228
2229 • Probing of processes owned by other users is not permitted.
2230
2231 • Access of kernel memory (read and write) is not permitted.
2232
2233
2234 COMMAND LINE OPTION RESTRICTIONS
Some command line options provide access to features which must not be
available to any unprivileged user:
2237
2238
2239 • -g may not be specified.
2240
2241 • The following options may not be used by the compile-server
2242 client:
2243
2244 -a, -B, -D, -I, -r, -R
2245
2246
2247
2248 ENVIRONMENT RESTRICTIONS
The following environment variables must not be set by any
unprivileged user:
2251
2252 SYSTEMTAP_RUNTIME
2253 SYSTEMTAP_TAPSET
2254 SYSTEMTAP_DEBUGINFO_PATH
2255
2256
2257
2258 TAPSET RESTRICTIONS
2259 In general, tapset functions are only available for members of the
2260 group stapusr when they do not gather information that an ordinary pro‐
2261 gram running with that user's privileges would be denied access to.
2262
2263 There are two categories of unprivileged tapset functions. The first
2264 category consists of utility functions that are unconditionally avail‐
2265 able to all users; these include such things as:
2266
2267 cpu:long ()
2268 exit ()
2269 str_replace:string (prnt_str:string, srch_str:string, rplc_str:string)
2270
2271
2272 The second category consists of so-called myproc-unprivileged functions
2273 that can only gather information within their own processes. Scripts
2274 that wish to use these functions must test the result of the tapset
2275 function is_myproc and only call these functions if the result is 1.
2276 The script will exit immediately if any of these functions are called
2277 by an unprivileged user within a probe within a process which is not
2278 owned by that user. Examples of myproc-unprivileged functions include:
2279
2280 print_usyms (stk:string)
2281 user_int:long (addr:long)
2282 usymname:string (addr:long)
2283
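The is_myproc guard described above might look like the following
sketch, which writes a small script to a temporary file. The file name
and the probe point are illustrative only, and ubacktrace() is assumed
here to be available under the same myproc-unprivileged rules as the
functions listed above.

```shell
# Write an illustrative systemtap script to a temporary file.
script=$(mktemp "${TMPDIR:-/tmp}/myproc_demo.XXXXXX")

cat > "$script" <<'EOF'
probe process.function("main")
{
  # Call myproc-unprivileged functions only inside our own processes.
  if (is_myproc() == 1)
    print_usyms(ubacktrace())   # ubacktrace() is an assumption, see text
}
EOF

echo "script written to $script"
```

Such a script could then be run against a target process with
something like stap FILE -c COMMAND, where FILE is the path printed
above.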
2284
2285 A compile error is triggered when any function not in either of the
2286 above categories is used by members of only the group stapusr.
2287
2288 No other built-in tapset functions may be used by members of only the
2289 group stapusr.
2290
2291
2293 As described above, systemtap's default runtime mode involves building
2294 and loading kernel modules, with various security tradeoffs presented.
2295 Systemtap now includes two new prototype backends: --runtime=dyninst
2296 and --runtime=bpf.
2297
2298 --runtime=dyninst uses Dyninst to instrument a user's own processes at
2299 runtime. This backend does not use kernel modules, and does not require
2300 root privileges, but is restricted with respect to the kinds of probes
2301 and other constructs that a script may use. The dyninst runtime operates
2302 in target-attach mode, so it requires a -c COMMAND or -x PID target process.
2303 For example:
2304
2305 stap --runtime=dyninst -c 'stap -V' \
2306 -e 'probe process.function("main")
2307 { println("hi from dyninst!") }'
2308
2309
2310 It may be necessary to disable a conflicting SELinux check with:
2311
2312 # setsebool allow_execstack 1
2313
2314
2315 --runtime=bpf compiles the user script into extended Berkeley Packet
2316 Filter (eBPF) programs instead of a kernel module. eBPF programs are
2317 verified by the kernel for safety and are executed by an in-kernel vir‐
2318 tual machine. This runtime is in an early stage of development and
2319 currently lacks support for a number of features available in the de‐
2320 fault runtime. Please see the stapbpf(8) man page for more information.
2321
2322
2324 The systemtap translator generally returns with a success code of 0 if
2325 the requested script was processed and executed successfully through
2326 the requested pass. Otherwise, errors may be printed to stderr and a
2327 failure code is returned. Use -v or -vp N to increase (global or per-
2328 pass) verbosity to identify the source of the trouble.
2329
2330 In listings mode (-l and -L), error messages are normally suppressed.
2331 A success code of 0 is returned if at least one matching probe was
2332 found.
2333
2334 A script executing in pass 5 that is interrupted with ^C / SIGINT is
2335 considered to be successful.
2336
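Because listing mode returns 0 only when at least one matching probe
was found, a shell script can use the exit status to test whether a
probe point is available. The following is a sketch under stated
assumptions: the helper name have_probe and the STAP override variable
are hypothetical, and the probe point is only an illustration.

```shell
# Hypothetical helper: succeed if "stap -l" finds at least one match.
# STAP may be set to point at a different stap binary (assumption).
have_probe() {
    "${STAP:-stap}" -l "$1" >/dev/null 2>&1
}

if have_probe 'kernel.function("vfs_read")'; then
    echo "probe point found"
else
    echo "no matching probe point (or stap failed)"
fi
```

A non-zero status does not distinguish between "no match" and other
failures; rerunning with -v may help to tell them apart.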
2337
2339 Over time, some features of the script language and the tapset library
2340 may undergo incompatible changes, so that a script written against an
2341 old version of systemtap may no longer run. In these cases, it may
2342 help to run systemtap with the --compatible VERSION flag, specifying
2343 the last known working version. Running systemtap with the
2344 --check-version flag will output a warning if any possibly incompatible
2345 elements have been parsed. Historical details of deprecations may be
2346 found in the NEWS file.
2347
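As a sketch of how the two flags might be combined, the following
helper runs a script in compatibility mode while also asking for
version warnings. The helper name run_compat and the STAP override
variable are hypothetical; the version and file name passed to it are
placeholders chosen by the caller.

```shell
# Hypothetical helper: run SCRIPT as if under systemtap VERSION, and
# warn about possibly incompatible elements.
# usage: run_compat VERSION SCRIPT
run_compat() {
    "${STAP:-stap}" --compatible "$1" --check-version "$2"
}
```

For example, run_compat 2.9 old_script.stp would request the behavior
of version 2.9 for a script named old_script.stp; both values are
placeholders.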
2348 The purpose of the deprecation facility is to improve the experience of
2349 scripts written for newer versions of systemtap (by adding better
2350 alternatives and removing conflicting or messy older alternatives), while
2351 at the same time permitting scripts written for older versions of
2352 systemtap to continue running. Deprecation is thus intended as a service
2353 to users (and an inconvenience to systemtap's developers), rather than the
2354 other way around.
2355
2356 Please note that underscore-prefixed identifiers in the tapset
2357 sometimes undergo changes for which compatibility is difficult to
2358 preserve, even with the deprecation mechanisms. Avoid relying on these in
2359 your scripts; instead propose them for promotion to non-underscored
2360 status.
2361
2362
2363
2365 Important files and their corresponding paths are listed in the
2366 stappaths(7) manual page.
2367
2368
2370 stapprobes(3stap),
2371 function::*(3stap),
2372 probe::*(3stap),
2373 tapset::*(3stap),
2374 stappaths(7),
2375 staprun(8),
2376 stapdyn(8),
2377 systemtap(8),
2378 stapvars(3stap),
2379 stapex(3stap),
2380 stap-server(8),
2381 stap-prep(1),
2382 stapref(1),
2383 awk(1),
2384 gdb(1)
2385
2386
2388 Use the Bugzilla link on the project web page or our mailing list:
2389 http://sourceware.org/systemtap/, <systemtap@sourceware.org>.
2390
2391 error::reporting(7stap),
2392 https://sourceware.org/systemtap/wiki/HowToReportBugs
2393
2394
2395
2396 STAP(1)