1STAP(1) General Commands Manual STAP(1)
2
3
4
5 NAME
6 stap - systemtap script translator/driver
7
8
9
10 SYNOPSIS
11 stap [ OPTIONS ] FILENAME [ ARGUMENTS ]
12 stap [ OPTIONS ] - [ ARGUMENTS ]
13 stap [ OPTIONS ] -e SCRIPT [ ARGUMENTS ]
14 stap [ OPTIONS ] -l PROBE [ ARGUMENTS ]
15 stap [ OPTIONS ] -L PROBE [ ARGUMENTS ]
16 stap [ OPTIONS ] --dump-probe-types
17 stap [ OPTIONS ] --dump-probe-aliases
18 stap [ OPTIONS ] --dump-functions
19
20
21
22
23 DESCRIPTION
24 The stap program is the front-end to the Systemtap tool. It accepts
25 probing instructions written in a simple domain-specific language,
26 translates those instructions into C code, compiles this C code, and
27 loads the resulting module into a running Linux kernel or a DynInst
28 user-space mutator, to perform the requested system trace/probe func‐
29 tions. You can supply the script in a named file (FILENAME), from
30 standard input (use - instead of FILENAME), or from the command line
31 (using -e SCRIPT). The program runs until it is interrupted by the
32 user, the script voluntarily invokes the exit() function, or a
33 sufficient number of soft errors accumulate.
34
35 The language, which is described in the SCRIPT LANGUAGE section below, is
36 strictly typed, expressive, declaration free, procedural, prototyping-
37 friendly, and inspired by awk and C. It allows source code points or
38 events in the system to be associated with handlers, which are subrou‐
39 tines that are executed synchronously. It is somewhat similar concep‐
40 tually to "breakpoint command lists" in the gdb debugger.
41
42
43 REFERENCE DOCUMENTATION
44 systemtap comes with a variety of educational, documentation and refer‐
45 ence resources. They come online and/or packaged for offline use. For
46 online documentation, see the project web site,
47 https://sourceware.org/systemtap/
48
49
50 ┌──────────────────────────┬──────────────────────────────────────────────────────┐
51 │man pages │ │
52 ├──────────────────────────┼──────────────────────────────────────────────────────┤
53 │stap (this page) │ language syntax, concepts, operation, options │
54 ├──────────────────────────┼──────────────────────────────────────────────────────┤
55 │stapprobes │ probe points and their $context variables │
56 ├──────────────────────────┼──────────────────────────────────────────────────────┤
57 │stapref │ quick reference to language syntax │
58 ├──────────────────────────┼──────────────────────────────────────────────────────┤
59 │stappaths │ list of directories, including books & references │
60 ├──────────────────────────┼──────────────────────────────────────────────────────┤
61 │stap-prep │ program to install auxiliary dependencies like ker‐ │
62 │ │ nel debuginfo │
63 ├──────────────────────────┼──────────────────────────────────────────────────────┤
64 │tapset::* │ generated list of tapsets │
65 ├──────────────────────────┼──────────────────────────────────────────────────────┤
66 │probe::* │ generated list of tapset probe aliases │
67 ├──────────────────────────┼──────────────────────────────────────────────────────┤
68 │function::* │ generated list of tapset functions │
69 ├──────────────────────────┼──────────────────────────────────────────────────────┤
70 │macro::* │ generated list of tapset macros │
71 ├──────────────────────────┼──────────────────────────────────────────────────────┤
72 │stapvars │ some of the tapset global variables │
73 ├──────────────────────────┼──────────────────────────────────────────────────────┤
74 │staprun, stapdyn, stapbpf │ programs for executing compiled systemtap scripts │
75 ├──────────────────────────┼──────────────────────────────────────────────────────┤
76 │systemtap │ initscript, boot-time probing │
77 ├──────────────────────────┼──────────────────────────────────────────────────────┤
78 │stap-server │ compilation server │
79 ├──────────────────────────┼──────────────────────────────────────────────────────┤
80 │stapex │ a few very basic script examples │
81 ├──────────────────────────┼──────────────────────────────────────────────────────┤
82 │books │ │
83 ├──────────────────────────┼──────────────────────────────────────────────────────┤
84 │Beginner's Guide │ tutorial book, language essentials, examples │
85 ├──────────────────────────┼──────────────────────────────────────────────────────┤
86 │Tutorial │ shorter tutorial, exercises │
87 ├──────────────────────────┼──────────────────────────────────────────────────────┤
88 │Language Reference │ detailed language manual, covers statistics/analysis │
89 ├──────────────────────────┼──────────────────────────────────────────────────────┤
90 │Tapset Reference │ the tapset man pages, reformatted into a book │
91 ├──────────────────────────┼──────────────────────────────────────────────────────┤
92 │references │ │
93 ├──────────────────────────┼──────────────────────────────────────────────────────┤
94 │example scripts │ over a hundred directly usable sysadmin tools, toys, │
95 │ │ hacks to learn from │
96 └──────────────────────────┴──────────────────────────────────────────────────────┘
97
98 OPTIONS
99 The systemtap translator supports the following options. Any other op‐
100 tion prints a list of supported options. Options may be given on the
101 command line, as usual. If the file $SYSTEMTAP_DIR/rc exists, options
102 are also loaded from there and interpreted first. ($SYSTEMTAP_DIR de‐
103 faults to $HOME/.systemtap if unset.)
104
105
106 In some cases, the default value of an option depends on particular
107 system configuration and thus can't be mentioned here directly. In
108 some of those cases running "stap --help" might display the default.
109
110
111 - Use standard input instead of a given FILENAME as probe language
112 input, unless -e SCRIPT is given.
113
114 -h --help
115 Show help message.
116
117 -V --version
118 Show version message.
119
120 -p NUM Stop after pass NUM. The passes are numbered 1-5: parse, elabo‐
121 rate, translate, compile, run. See the PROCESSING section for
122 details.
123
124 -v Increase verbosity for all passes. Produce a larger volume of
125 informative (?) output each time the option is repeated.
126
127 --vp ABCDE
128 Increase verbosity on a per-pass basis. For example, "--vp 002"
129 adds 2 units of verbosity to pass 3 only. The combination
130 "-v --vp 00004" adds 1 unit of verbosity for all passes, and 4
131 more for pass 5.
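
The arithmetic in the two --vp examples can be checked with a short sketch. This is illustrative Python, not part of stap. The function name per_pass_verbosity is hypothetical, and the sketch assumes that positions missing from a short ABCDE string count as 0, which is consistent with the "002" example above.

```python
def per_pass_verbosity(v_count, vp=""):
    """Return the verbosity of passes 1-5, given the number of -v flags
    and an optional --vp digit string such as "002" or "00004"."""
    if len(vp) > 5 or any(ch not in "0123456789" for ch in vp):
        raise ValueError("--vp expects up to five digits")
    # Assumption: positions not covered by a short string add nothing.
    digits = [int(ch) for ch in vp.ljust(5, "0")]
    # Each -v adds one unit to every pass; --vp adds per-pass units on top.
    return [v_count + d for d in digits]


print(per_pass_verbosity(0, "002"))    # [0, 0, 2, 0, 0]
print(per_pass_verbosity(1, "00004"))  # [1, 1, 1, 1, 5]
```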
132
133 -k Keep the temporary directory after all processing. This may be
134 useful in order to examine the generated C code, or to reuse the
135 compiled kernel object.
136
137 -g Guru mode. Enable parsing of unsafe expert-level constructs
138 like embedded C.
139
140 -P Prologue-searching mode. This is equivalent to --pro‐
141 logue-searching=always. Activate heuristics to work around in‐
142 correct debugging information for function parameter $context
143 variables.
144
145 -u Unoptimized mode. Disable unused code elision and many other
146 optimizations during elaboration / translation.
147
148 -w Suppressed warnings mode. Disables all warning messages.
149
150 -W Treat all warnings as errors.
151
152 -b Use bulk mode (percpu files) for kernel-to-user data transfer.
153 Use the stap-merge program to multiplex them back together lat‐
154 er.
155
156 -i --interactive
157 Interactive mode. Enable an interface to build the systemtap
158 script incrementally and interactively.
159
160 -t Collect timing information on the number of times each probe
161 executes and the average amount of time spent in each probe-point.
162 Also shows the derivation for each probe-point.
163
164 -s NUM Use NUM megabyte buffers for kernel-to-user data transfer. On a
165 multiprocessor in bulk mode, this is a per-processor amount.
166
167 -I DIR Add the given directory to the tapset search directory. See the
168 description of pass 2 for details.
169
170 -D NAME=VALUE
171 Add the given C preprocessor directive to the module Makefile.
172 These can be used to override limit parameters described below.
173
174 -B NAME=VALUE
175 In kernel-runtime mode, add the given make directive to the ker‐
176 nel module build's make invocation. These can be used to add or
177 override kconfig options. For example, use
178
179 -B CONFIG_DEBUG_INFO=y
180
181 to add debugging information.
182
183 -B FLAG
184 In dyninst-runtime mode, add the given parameter to the compiler
185 CFLAGS used for building the dyninst shared library. For exam‐
186 ple, use
187
188 -B -g
189
190 to add debugging information.
191
192 -a ARCH
193 Use a cross-compilation mode for the given target architecture.
194 This requires access to the cross-compiler and the kernel build
195 tree, and goes along with the
196
197 -B CROSS_COMPILE=arch-tool-prefix-
198 and
199 -r /build/tree
200
201 options.
202
203 --modinfo NAME=VALUE
204 Add the name/value pair as a MODULE_INFO macro call to the gen‐
205 erated module. This may be useful to inform or override various
206 module-related checks in the kernel.
207
208 -G NAME=VALUE
209 Sets the value of global variable NAME to VALUE when staprun is
210 invoked. This applies to scalar variables declared global in
211 the script/tapset.
212
213 -R DIR Look for the systemtap runtime sources in the given directory.
214 The default DIR can be seen using "stap --help".
215
216 -r /DIR
217 Build for kernel in given build tree. Can also be set with the
218 SYSTEMTAP_RELEASE environment variable.
219
220 -r RELEASE
221 Build for kernel in build tree /lib/modules/RELEASE/build. Can
222 also be set with the SYSTEMTAP_RELEASE environment variable.
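
The two forms of -r differ only in how the build tree is located. The following Python sketch restates that rule. The function name kernel_build_tree and the release string in the usage line are hypothetical, and the sketch assumes that a leading "/" is what distinguishes the /DIR form from the RELEASE form. It does not model --sysroot.

```python
def kernel_build_tree(r_arg):
    """Map a -r argument to the kernel build tree it selects."""
    if r_arg.startswith("/"):
        # -r /DIR: the argument already names the build tree.
        return r_arg
    # -r RELEASE: the tree is looked up under /lib/modules.
    return "/lib/modules/" + r_arg + "/build"


print(kernel_build_tree("/build/tree"))
print(kernel_build_tree("5.14.0-1.example"))
```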
223
224 -m MODULE
225 Use the given name for the generated kernel object module, in‐
226 stead of a unique randomized name. The generated kernel object
227 module is copied to the current directory.
228
229 -d MODULE
230 Add symbol/unwind information for the given module into the ker‐
231 nel object module. This may enable symbolic tracebacks from
232 those modules/programs, even if they do not have an explicit
233 probe placed into them.
234
235 --ldd Add symbol/unwind information for all user-space shared li‐
236 braries suspected by ldd to be necessary for user-space binaries
237 being probed or listed with the -d option. Caution: this can
238 make the probe modules considerably larger. Note that this op‐
239 tion does not deal with kernel-space modules: see instead
240 --all-modules below.
241
242 --all-modules
243 Equivalent to specifying "-dkernel" and a "-d" for each kernel
244 module that is currently loaded. Caution: this can make the
245 probe modules considerably larger.
246
247 -o FILE
248 Send standard output to named file. In bulk mode, percpu files
249 will start with FILE_ (FILE_cpu with -F) followed by the cpu
250 number. This supports strftime(3) formats for FILE.
251
252 -c CMD Start the probes, run CMD, and exit when CMD finishes. This
253 also has the effect of setting target() to the pid of the command
254 that was run.
255
256 -x PID Sets target() to PID. This allows scripts to be written that
257 filter on a specific process. Scripts run independently of the
258 PID's lifespan.
259
260 -e SCRIPT
261 Run the given SCRIPT specified on the command line.
262
263 -E SCRIPT
264 Run the given SCRIPT in addition to the main script, whether the
265 main script was given through -e or as a script file.
266 This option can be repeated to run multiple scripts, and can be
267 used in listing mode (-l/-L).
268
269 -l PROBE
270 Instead of running a probe script, just list all available probe
271 points matching the given single probe point. The pattern may
272 include wildcards and aliases, but not comma-separated multiple
273 probe points. The process result code will indicate failure if
274 there are no matches.
275
276 % stap -e 'probe syscall.* { }'
277 [...]
278 % stap -l 'syscall.*'
279 syscall.accept
280 [...]
281 syscall.writev
282
283
284 -L PROBE
285 Similar to "-l", but list matching probe points plus their
286 available context variables. When -v is set with -L, the output
287 includes duplicate probe points which are distinguished by their
288 PC address.
289
290 % stap -L 'process("/lib64/libpython*.so.*").mark("*")'
291 process("/usr/lib64/libpython2.7.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
292 process("/usr/lib64/libpython2.7.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
293 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
294 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
295 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__done") $arg1:long
296 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__start") $arg1:long
297 process("/usr/lib64/libpython3.6m.so.1.0").mark("line") $arg1:long $arg2:long $arg3:long
298
299
300 -F Without -o option, load module and start probes, then detach
301 from the module leaving the probes running. With -o option, run
302 staprun in background as a daemon and show its pid.
303
304 -S size[,N]
305 Sets the maximum size of an output file and the maximum number of
306 output files. If an output file would exceed size, systemtap
307 switches output to the next file. If the number of output files
308 would exceed N, systemtap removes the oldest output file. The
309 second argument may be omitted.
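
The rotation rule can be pictured with a small simulation. This Python sketch is not stap code. The names parse_S and simulate_rotation are hypothetical, size is treated as an abstract number because its unit is not stated here, and rotating on whole records is an assumption made to keep the model simple.

```python
from collections import deque


def parse_S(arg):
    """Split a "size[,N]" argument into (size, N); N is None if omitted."""
    size_text, _, n_text = arg.partition(",")
    return int(size_text), (int(n_text) if n_text else None)


def simulate_rotation(record_sizes, size, max_files=None):
    """Return the output files that remain, each as a list of record sizes."""
    files = deque([[]])
    for rec in record_sizes:
        # Switch to the next file if this record would push the
        # current file past the size limit.
        if files[-1] and sum(files[-1]) + rec > size:
            files.append([])
            # Remove the oldest file once more than N files exist.
            if max_files is not None and len(files) > max_files:
                files.popleft()
        files[-1].append(rec)
    return list(files)


size, n = parse_S("4,2")
print(simulate_rotation([2, 2, 2, 2, 2], size, n))  # [[2, 2], [2]]
```

With five records of size 2 and "-S 4,2", the third file displaces the first, so only the two newest files remain.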
310
311 -T TIMEOUT
312 Exit the script after TIMEOUT seconds.
313
314 --skip-badvars
315 Ignore unresolvable or run-time-inaccessible context variables
316 and substitute with 0, without errors.
317
318
319 --prologue-searching[=WHEN]
320 Prologue-searching mode. Activate heuristics to work around in‐
321 correct debugging information for function parameter $context
322 variables. WHEN can be either "never", "always", or "auto" (i.e.
323 enabled by heuristic). If WHEN is missing, then "always" is as‐
324 sumed. If the option is missing, then "auto" is assumed.
325
326
327 --suppress-handler-errors
328 Wrap all probe handlers into something like this
329
330 try { ... } catch { next }
331
332 block, which causes any runtime errors to be quietly suppressed.
333 Suppressed errors do not count against MAXERRORS limits. In
334 this mode, the MAXSKIPPED limits are also suppressed, so that
335 many errors and skipped probes may be accumulated during a
336 script's runtime. Any overall counts will still be reported at
337 shutdown.
338
339
340 --compatible VERSION
341 Suppress recent script language or tapset changes which are in‐
342 compatible with given older version of systemtap. This may be
343 useful if a much older systemtap script fails to run. See the
344 DEPRECATION section for more details.
345
346
347 --check-version
348 This option is used to check if the active script has any con‐
349 structs that may be systemtap version specific. See the DEPRE‐
350 CATION section for more details.
351
352
353 --clean-cache
354 This option prunes stale entries from the cache directory. This
355 is normally done automatically after successful runs, but this
356 option will trigger the cleanup manually and then exit. See the
357 CACHING section for more details about cache limits.
358
359
360 --color[=WHEN], --colour[=WHEN]
361 This option controls coloring of error messages. WHEN can be ei‐
362 ther "never", "always", or "auto" (i.e. enable only if at a ter‐
363 minal). If WHEN is missing, then "always" is assumed. If the op‐
364 tion is missing, then "auto" is assumed.
365
366 Colors can be modified using the SYSTEMTAP_COLORS environment
367 variable. The format must be of the form
368 key1=val1:key2=val2:key3=val3 ...etc. Valid keys are "error",
369 "warning", "source", "caret", and "token". Values constitute
370 Select Graphic Rendition (SGR) parameter(s). Consult the docu‐
371 mentation of your terminal for the SGRs it supports. As an exam‐
372 ple, the default colors would be expressed as
373 error=01;31:warning=00;33:source=00;34:caret=01:token=01. If
374 SYSTEMTAP_COLORS is absent, the default colors will be used. If
375 it is empty or invalid, coloring is turned off.
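
The SYSTEMTAP_COLORS format can be read as follows. This Python sketch only models the rules stated above. The name parse_colors is hypothetical, and "invalid" is taken to mean an unknown key or an item without "=", which is an assumption. The sketch does not model what happens to keys that a custom value leaves out.

```python
DEFAULT_COLORS = "error=01;31:warning=00;33:source=00;34:caret=01:token=01"
VALID_KEYS = {"error", "warning", "source", "caret", "token"}


def parse_colors(value):
    """Return a key -> SGR mapping, or None when coloring is turned off.

    Pass None to represent an absent SYSTEMTAP_COLORS variable."""
    if value is None:
        value = DEFAULT_COLORS   # absent: the default colors are used
    if value == "":
        return None              # empty: coloring is turned off
    colors = {}
    for item in value.split(":"):
        key, sep, sgr = item.partition("=")
        if not sep or key not in VALID_KEYS:
            return None          # assumed meaning of "invalid"
        colors[key] = sgr
    return colors


print(parse_colors(None)["error"])   # 01;31
print(parse_colors("error=01;35"))   # {'error': '01;35'}
```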
376
377
378 --disable-cache
379 This option disables all use of the cache directory. No files
380 will be either read from or written to the cache.
381
382
383 --poison-cache
384 This option treats files in the cache directory as invalid. No
385 files will be read from the cache, but resulting files from this
386 run will still be written to the cache. This is meant as a
387 troubleshooting aid when stap's caching seems to be misbehaving.
388 If it helps, there is probably a bug in systemtap that the
389 developers would like you to report.
390
391
392 --privilege[=stapusr | =stapsys | =stapdev]
393 This option instructs stap to examine the script looking for
394 constructs which are not allowed for the specified privilege
395 level (see UNPRIVILEGED USERS). Compilation fails if any such
396 constructs are used. If stapusr or stapsys are specified when
397 using a compile server (see --use-server), the server will exam‐
398 ine the script and, if compilation succeeds, the server will
399 cryptographically sign the resulting kernel module, certifying
400 that it is safe for use by users at the specified privilege
401 level.
402
403 If --privilege has not been specified, -pN has not been speci‐
404 fied with N < 5, and the invoking user is not root, and is not a
405 member of the group stapdev, then stap will automatically add
406 the appropriate --privilege option to the options already speci‐
407 fied.
408
409
410 --unprivileged
411 This option is equivalent to --privilege=stapusr.
412
413
414 --use-server[=HOSTNAME[:PORT] | =IP_ADDRESS[:PORT] | =CERT_SERIAL]
415 Specify compile-server(s) to be used for compilation and/or in
416 conjunction with --list-servers and --trust-servers (see below)
417 for listing. If no argument is supplied, then the default in un‐
418 privileged mode (see --privilege) is to select compatible
419 servers which are trusted as SSL peers and as module signers and
420 currently online. Otherwise the default is to select compatible
421 servers which are trusted as SSL peers and currently online.
422 --use-server may be specified more than once, in which case a
423 list of servers is accumulated in the order specified. Servers
424 may be specified by host name, ip address, or by certificate se‐
425 rial number (obtained using --list-servers). The latter is most
426 commonly used when adding or revoking trust in a server (see
427 --trust-servers below). If a server is specified by host name or
428 ip address, then an optional port number may be specified. This
429 is useful for accessing servers which are not on the local net‐
430 work or to specify a particular server.
431
432 IP addresses may be IPv4 or IPv6 addresses.
433
434 If a particular IPv6 address is link local and exists on more
435 than one interface, the intended interface may be specified by
436 appending a percent sign (%) and the interface name to the
437 address. For example,
438 "fe80::5eff:35ff:fe07:55ca%eth0".
439
440 In order to specify a port number with an IPv6 address, it is
441 necessary to enclose the IPv6 address in square brackets ([]) in
442 order to separate the port number from the rest of the address.
443 For example, "[fe80::5eff:35ff:fe07:55ca]:5000" or
444 "[fe80::5eff:35ff:fe07:55ca%eth0]:5000".
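
The bracket rule exists because an IPv6 address already contains colons. The Python sketch below shows one way to split the host and port forms described above. The name split_server and the host name in the usage lines are hypothetical, and the CERT_SERIAL form is not handled because its format is not described here.

```python
def split_server(spec):
    """Split HOSTNAME[:PORT] or IP_ADDRESS[:PORT] into (host, port)."""
    if spec.startswith("["):
        # Bracketed IPv6: the port, if any, follows "]:".
        host, sep, rest = spec[1:].partition("]")
        if not sep or (rest and not rest.startswith(":")):
            raise ValueError("malformed bracketed address: " + spec)
        return host, (int(rest[1:]) if rest else None)
    if spec.count(":") > 1:
        # Unbracketed IPv6: every colon belongs to the address.
        return spec, None
    host, sep, port = spec.partition(":")
    return host, (int(port) if sep else None)


print(split_server("[fe80::5eff:35ff:fe07:55ca%eth0]:5000"))
print(split_server("fe80::5eff:35ff:fe07:55ca%eth0"))
print(split_server("build.example.org:5000"))
```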
445
446 If --use-server has not been specified, -pN has not been specified
447 with N < 5, and the invoking user is not root, is not a member
448 of the group stapdev, but is a member of the group stapusr, then
449 stap will automatically add --use-server to the options already
450 specified.
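
The conditions for adding --use-server automatically combine as a single conjunction. The Python sketch below restates them. The function name adds_use_server and its parameters are hypothetical, and it is a reading of the paragraph above rather than stap's own logic.

```python
def adds_use_server(use_server_given, last_pass, is_root, groups):
    """Decide whether --use-server would be added automatically.

    last_pass is the N of -pN, or None when -p was not given."""
    stops_early = last_pass is not None and last_pass < 5
    return (not use_server_given
            and not stops_early
            and not is_root
            and "stapdev" not in groups
            and "stapusr" in groups)


print(adds_use_server(False, None, False, {"stapusr"}))             # True
print(adds_use_server(False, None, False, {"stapusr", "stapdev"}))  # False
```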
451
452
453 --use-server-on-error[=yes|=no]
454 Instructs stap to retry compilation of a script using a compile
455 server if compilation on the local host fails in a manner which
456 suggests that it might succeed using a server. If this option
457 is not specified, the default is no. If no argument is provid‐
458 ed, then the default is yes. Compilation will be retried for
459 certain types of errors (e.g. insufficient data or resources)
460 which may not occur during re-compilation by a compile server.
461 Compile servers will be selected automatically for the re-compi‐
462 lation attempt as if --use-server was specified with no argu‐
463 ments.
464
465
466 --list-servers[=SERVERS]
467 Display the status of the requested SERVERS, where SERVERS is a
468 comma-separated list of server attributes. The list of at‐
469 tributes is combined to filter the list of servers displayed.
470 Supported attributes are:
471
472 all specifies all known servers (trusted SSL peers, trusted
473 module signers, online servers).
474
475 specified
476 specifies servers specified using --use-server.
477
478 online filters the output by retaining information about servers
479 which are currently online.
480
481 trusted
482 filters the output by retaining information about servers
483 which are trusted as SSL peers.
484
485 signer filters the output by retaining information about servers
486 which are trusted as module signers (see --privilege).
487
488 compatible
489 filters the output by retaining information about servers
490 which are compatible with the current kernel release and
491 architecture.
492
493 If no argument is provided, then the default is specified. If
494 no servers were specified using --use-server, then the default
495 servers for --use-server are listed.
496
497 Note that --list-servers uses the avahi-daemon service to detect
498 online servers. If this service is not available, then
499 --list-servers will fail to detect any online servers. In order
500 for --list-servers to detect servers listening on IPv6 address‐
501 es, the avahi-daemon configuration file /etc/avahi/avahi-dae‐
502 mon.conf must contain an active "use-ipv6=yes" line. The service
503 must be restarted after adding this line in order for IPv6 to be
504 enabled.
505
506
507 --trust-servers[=TRUST_SPEC]
508 Grant or revoke trust in the compile servers specified using
509 --use-server, as directed by TRUST_SPEC, where TRUST_SPEC is a
510 comma-separated list specifying the trust which is to be granted
511 or revoked. Supported elements are:
512
513 ssl trust the specified servers as SSL peers.
514
515 signer trust the specified servers as module signers (see
516 --privilege). Only root can specify signer.
517
518 all-users
519 grant trust as an ssl peer for all users on the local
520 host. The default is to grant trust as an ssl peer for
521 the current user only. Trust as a module signer is always
522 granted for all users. Only root can specify all-users.
523
524 revoke revoke the specified trust. The default is to grant it.
525
526 no-prompt
527 do not prompt the user for confirmation before carrying
528 out the requested action. The default is to prompt the
529 user for confirmation.
530
531 If no argument is provided, then the default is ssl. If no
532 servers were specified using --use-server, then no trust will be
533 granted or revoked.
534
535 Unless no-prompt has been specified, the user will be prompted
536 to confirm the trust to be granted or revoked before the opera‐
537 tion is performed.
538
539
540 --dump-probe-types
541 Dumps a list of supported probe types and exits. If --privi‐
542 lege=stapusr is also specified, the list will be limited to
543 probe types available to unprivileged users.
544
545
546 --dump-probe-aliases
547 Dumps a list of all probe aliases found in library files and ex‐
548 its.
549
550
551 --dump-functions
552 Dumps a list of all the public functions found in library files
553 and exits. Also includes their parameters and types. A function
554 of type 'unknown' indicates a function that does not return a
555 value. Note that not all function/parameter types may be resolved
556 (these are also shown as 'unknown'). This feature is very
557 memory-intensive and thus may not work properly with --use-server
558 if the target server imposes an rlimit on process memory
559 (i.e. through the ~stap-server/.systemtap/rc configuration file,
560 see stap-server(8)).
561
562
563 --remote URL
564 Set the execution target to the given host. This option may be
565 repeated to target multiple execution targets. Passes 1-4 are
566 completed locally as normal to build the script, and then pass 5
567 will copy the module to the target and run it. Acceptable URL
568 forms include:
569
570 [USER@]HOSTNAME, ssh://[USER@]HOSTNAME
571 This mode uses ssh, optionally using a username not
572 matching your own. If a custom ssh_config file is in use,
573 add SendEnv LANG to retain internationalization function‐
574 ality.
575
576 libvirt://DOMAIN, libvirt://DOMAIN/LIBVIRT_URI
577 This mode uses stapvirt to execute the script on a domain
578 managed by libvirt. Optionally, LIBVIRT_URI may be speci‐
579 fied to connect to a specific driver and/or a remote
580 host. For example, to connect to the local privileged QE‐
581 MU driver, use:
582
583 --remote libvirt://MyDomain/qemu:///system
584
585 See the page at <http://libvirt.org/uri.html> for sup‐
586 ported URIs. Also see stapvirt(1) for more information on
587 how to prepare the domain for stap probing.
588
589 unix:PATH
590 This mode connects to a UNIX socket. This can be used
591 with a QEMU virtio-serial port for executing scripts in‐
592 side a running virtual machine.
593
594 direct://
595 Special loopback mode to run on the local host.
596
597 --remote-prefix
598 Prefix each line of remote output with "N: ", where N is the in‐
599 dex of the remote execution target from which the given line
600 originated.
601
602
603 --download-debuginfo[=OPTION]
604 Enable, disable or set a timeout for the automatic debuginfo
605 downloading feature offered by abrt as specified by OPTION,
606 where OPTION is one of the following:
607
608 yes enable automatic downloading of debuginfo with no time‐
609 out. This is the same as not providing an OPTION value to
610 --download-debuginfo.
611
612 no explicitly disable automatic downloading of debuginfo.
613 This is the same as not using the option at all.
614
615 ask show abrt output, and ask before continuing download. No
616 timeout will be set.
617
618 <timeout>
619 specify a timeout as a positive number to stop the down‐
620 load if it is taking longer than <timeout> seconds.
621
622 --rlimit-as=NUM
623 Specify the maximum size of the process's virtual memory (ad‐
624 dress space), in bytes.
625
626
627 --rlimit-cpu=NUM
628 Specify the CPU time limit, in seconds.
629
630
631 --rlimit-nproc=NUM
632 Specify the maximum number of processes that can be created.
633
634
635 --rlimit-stack=NUM
636 Specify the maximum size of the process stack, in bytes.
637
638
639 --rlimit-fsize=NUM
640 Specify the maximum size of files that the process may create,
641 in bytes.
642
643
644 --sysroot=DIR
645 Specify sysroot directory where target files (executables, li‐
646 braries, etc.) are located. With -r RELEASE, the sysroot will
647 be searched for the appropriate kernel build directory. With -r
648 /DIR, however, the sysroot will not be used to find the kernel
649 build.
650
651
652 --sysenv=VAR=VALUE
653 Provide an alternate value for an environment variable where the
654 value on a remote system differs. Path variables (e.g. PATH,
655 LD_LIBRARY_PATH) are assumed to be relative to the directory
656 provided by --sysroot, if provided.
657
658
659 --suppress-time-limits
660 Disable -DSTP_OVERLOAD related options as well as -DMAXACTION
661 and -DMAXTRYLOCK. This option requires guru mode.
662
663
664 --runtime=MODE
665 Set the pass-5 runtime mode. Valid options are kernel (de‐
666 fault), dyninst and bpf. See ALTERNATE RUNTIMES below for more
667 information.
668
669
670 --dyninst
671 Shorthand for --runtime=dyninst.
672
673
674 --bpf Shorthand for --runtime=bpf.
675
676
677 --save-uprobes
678 On machines that require SystemTap to build its own uprobes mod‐
679 ule (kernels prior to version 3.5), this option instructs Sys‐
680 temTap to also save a copy of the module in the current directo‐
681 ry (creating a new "uprobes" directory first).
682
683
684 --target-namespaces=PID
685 Allow for a set of target namespaces to be set based on the
686 namespaces the given PID is in. This is for namespace-aware
687 tapset functions. If the target namespaces are not set, they
688 default to the stap process's namespaces.
689
690
691 --monitor[=INTERVAL]
692 Enables an interface to display status information about the
693 module (uptime, module name, invoker uid, memory sizes, global
694 variables, list of probes with their statistics). An optional
695 argument INTERVAL can be supplied to set the refresh rate in
696 seconds of the status window. The module can also be controlled
697 by a list of commands using the following keys:
698
699 c Resets all global variables to their initial values or
700 zeroes them if they did not have an initial value.
701
702 s Rotates the attribute used to sort the list of probes.
703
704 t Brings up a prompt to allow toggling (on/off) of probes by
705 index. Probe points are still affected by their condi‐
706 tions.
707
708 r Resumes the script by toggling on all probes.
709
710 p Pauses the script by toggling off all probes.
711
712 x Hides/shows the status window. This allows for more out‐
713 put to be seen.
714
715 navigation-keys
716 The navigation keys can be used to scroll up and down the
717 windows.
718
719 Tab Toggle scrolling between status and output windows.
720
721
722 --example
723 This option is used to run example scripts without having to en‐
724 ter the entire path to the script. Example scripts can be found
725 in the directory specified in the stappaths(7) manual page.
726
727
728 --no-global-var-display
729 This option is used to disable the automatic logging of unused
730 global variables at the end of a stap session.
731
732
733 ARGUMENTS
734 Any additional arguments on the command line are passed to the script
735 parser for substitution. See below.
736
737
738 SCRIPT LANGUAGE
739 The systemtap script language resembles awk and C. There are two main
740 outermost constructs: probes and functions. Within these, statements
741 and expressions use C-like operator syntax and precedence.
742
743
744 GENERAL SYNTAX
745 Whitespace is ignored. Three forms of comments are supported:
746 # ... shell style, to the end of line, except for $# and @#
747 // ... C++ style, to the end of line
748 /* ... C style ... */
749 Literals are either strings enclosed in double-quotes (passing through
750 the usual C escape codes with backslashes, and with adjacent string
751 literals glued together, also as in C), or integers (in decimal, hexa‐
752 decimal, or octal, using the same notation as in C). All strings are
753 limited in length to some reasonable value (a few hundred bytes). In‐
754 tegers are 64-bit signed quantities, although the parser also accepts
755 (and wraps around) values above positive 2**63.
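
The wraparound of large integer literals can be worked through numerically. The Python sketch below assumes ordinary two's-complement wrapping to a signed 64-bit quantity, which the text implies but does not spell out. The name wrap_int64 is hypothetical.

```python
def wrap_int64(value):
    """Reduce a non-negative literal to a signed 64-bit quantity."""
    value &= (1 << 64) - 1   # keep the low 64 bits
    # Values with the top bit set represent negative numbers.
    return value - (1 << 64) if value >= (1 << 63) else value


print(wrap_int64(2**63 - 1))           # 9223372036854775807
print(wrap_int64(2**63))               # -9223372036854775808
print(wrap_int64(0xFFFFFFFFFFFFFFFF))  # -1
```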
756
757 In addition, script arguments given at the end of the command line may
758 be inserted. Use $1 ... $<NN> for insertion unquoted, @1 ... @<NN> for
759 insertion as a string literal. The number of arguments may be accessed
760 through $# (as an unquoted number) or through @# (as a quoted number).
761 These may be used at any place a token may begin, including within the
762 preprocessing stage. Reference to an argument number beyond what was
763 actually given is an error.
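
The difference between the $ and @ forms is easiest to see on a concrete line of script. The Python sketch below imitates the substitution on raw text. The name substitute_args is hypothetical, the real parser works on tokens rather than on a regular expression, and quoting of special characters inside an argument is not modeled.

```python
import re


def substitute_args(script, args):
    """Replace $N, @N, $# and @# in script using the given arguments."""
    def repl(match):
        sigil, ref = match.group(1), match.group(2)
        if ref == "#":
            text = str(len(args))
        else:
            index = int(ref)
            if index < 1 or index > len(args):
                # Referring past the supplied arguments is an error.
                raise ValueError("argument %s%d was not given" % (sigil, index))
            text = args[index - 1]
        # $ inserts the text as-is; @ inserts it as a string literal.
        return text if sigil == "$" else '"' + text + '"'
    return re.sub(r"([$@])(#|\d+)", repl, script)


line = 'probe process(@1).function("main") { if (pid() == $2) println($#) }'
print(substitute_args(line, ["/bin/ls", "42"]))
```

With the arguments /bin/ls and 42, @1 becomes the string literal "/bin/ls", $2 becomes the bare number 42, and $# becomes 2.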
764
765
766 PREPROCESSING
767 A simple conditional preprocessing stage is run as a part of parsing.
768 The general form is similar to the cond ? exp1 : exp2 ternary operator:
769
770 %( CONDITION %? TRUE-TOKENS %)
771 %( CONDITION %? TRUE-TOKENS %: FALSE-TOKENS %)
772
773 The CONDITION is either an expression whose format is determined by its
774 first keyword, or a comparison of string literals or of numeric
775 literals. CONDITIONs of these forms may also be combined using ||
776 (alternative) and && (conjunction). Parentheses are not yet supported,
777 so keep in mind that conjunction (&&) takes precedence over
778 alternative (||).
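
Because parentheses are unavailable, the grouping of a mixed condition is fixed by precedence alone. The Python sketch below evaluates a flat list of already-decided conditions the way the rule above describes. The name evaluate is hypothetical, and the sketch is not the stap preprocessor.

```python
def evaluate(tokens):
    """Evaluate a flat list such as [True, "||", False, "&&", False]."""
    # Split the list at each "||"; the operands left in each group are
    # joined by "&&", so a group holds only if all of them hold.
    groups, current = [], []
    for tok in tokens:
        if tok == "||":
            groups.append(current)
            current = []
        elif tok != "&&":
            current.append(tok)
    groups.append(current)
    return any(all(group) for group in groups)


# A || B && C is read as A || (B && C), not (A || B) && C.
print(evaluate([True, "||", False, "&&", False]))  # True
```

Read as (A || B) && C, the same input would be false, which is why the order of terms matters when writing such conditions.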
779
780 If the first part is the identifier kernel_vr or kernel_v to refer to
781 the kernel version number, with ("2.6.13-1.322FC3smp") or without
782 ("2.6.13") the release code suffix, then the second part is one of the
783 six standard numeric comparison operators <, <=, ==, !=, >, and >=, and
784 the third part is a string literal that contains an RPM-style version-
785 release value. The condition is deemed satisfied if the version of the
786 target kernel (as optionally overridden by the -r option) compares to
787 the given version string as the operator requires. The comparison is
788 performed by the glibc function strverscmp. As a special case, if the
789 operator is for simple equality (==) or inequality (!=), and the third
790 part contains any wildcard characters (* or ? or [), then the expression
791 is treated as a wildcard (mis)match as evaluated by fnmatch.
792
793 If, on the other hand, the first part is the identifier arch to refer
794 to the processor architecture (as named by the kernel build system
795 ARCH/SUBARCH), then the second part is one of the two string comparison
796 operators == or !=, and the third part is a string literal for matching
797 it. This comparison is a wildcard (mis)match.
798
799 Similarly, if the first part is an identifier like CONFIG_something to
800 refer to a kernel configuration option, then the second part is == or
801 !=, and the third part is a string literal for matching the value (com‐
802 monly "y" or "m"). Nonexistent or unset kernel configuration options
803 are represented by the empty string. This comparison is also a wild‐
804 card (mis)match.
805
806 If the first part is the identifier systemtap_v, the test refers to the
807 systemtap compatibility version, which may be overridden for old
808 scripts with the --compatible flag. The comparison operators are the
809 same as for kernel_v, and the right operand is a version string. See also the
810 DEPRECATION section below.
811
812 If the first part is the identifier systemtap_privilege, the test
813 refers to the privilege level that the systemtap script is compiled
814 with. Here the second part is == or !=, and the third part is a string
815 literal, either "stapusr" or "stapsys" or "stapdev".
816
817 If the first part is the identifier guru_mode, the test refers to
818 whether the systemtap script is compiled in guru mode. Here the second
819 part is == or !=, and the third part is a number, either 1 or 0.
820
821 If the first part is the identifier runtime, the test refers to the
822 systemtap runtime mode. See ALTERNATE RUNTIMES below for more informa‐
823 tion on runtimes. The second part is one of the two string comparison
824 operators == or !=, and the third part is a string literal for matching
825 it. This comparison is a wildcard (mis)match.
826
827 Otherwise, the CONDITION is expected to be a comparison between two
828 string literals or two numeric literals. In this case, the arguments
829 are the only variables usable.
830
831 The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser tokens
832 (possibly including nested preprocessor conditionals), and are passed
833 into the input stream if the condition is true or false, respectively.
834 For example, the following code induces a parse error unless the target
835 kernel version is newer than 2.6.5:
836
837 %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence
838
839 The following code might adapt to hypothetical kernel version drift:
840
841 probe kernel.function (
842 %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
843 %( kernel_vr == "2.6.13*smp" %? "do_page_fault" %:
844 UNSUPPORTED %) %)
845 ) { /* ... */ }
846
847 %( arch == "ia64" %?
848 probe syscall.vliw = kernel.function("vliw_widget") {}
849 %)
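
The precedence rule for combined conditions can be illustrated with a small sketch. The kernel version and architecture strings below are chosen for illustration only and are not taken from any particular tapset.

```systemtap
// && binds more tightly than ||, so the condition below is read as
//   (kernel_v >= "3.0" && arch == "x86_64") || arch == "i386"
%( kernel_v >= "3.0" && arch == "x86_64" || arch == "i386" %?
  probe oneshot { println("condition matched") }
%:
  probe oneshot { println("condition did not match") }
%)
```

Since parentheses are not available, a different grouping has to be expressed by nesting one conditional inside the TRUE-TOKENS or FALSE-TOKENS of another.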
850
851
852
853 PREPROCESSOR MACROS
854 The preprocessor also supports a simple macro facility, run as a sepa‐
855 rate pass before conditional preprocessing.
856
857 Macros are defined using the following construct:
858
859 @define NAME %( BODY %)
860 @define NAME(PARAM_1, PARAM_2, ...) %( BODY %)
861
862 Macros, and parameters inside a macro body, are both invoked by prefix‐
863 ing the macro name with an @ symbol:
864
865 @define foo %( x %)
866 @define add(a,b) %( ((@a)+(@b)) %)
867
868 @foo = @add(2,2)
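
A slightly larger sketch shows both macro forms used inside a probe handler. The macro names base and over are hypothetical and exist only for this example.

```systemtap
@define base %( 10 %)
@define over(val, max) %( ((@val) > (@max)) %)

probe oneshot {
  n = @base + 32                    // expands to: n = 10 + 32
  if (@over(n, 40))                 // expands to: if (((n) > (40)))
    println("n is over 40")
}
```

Wrapping each parameter in parentheses inside the macro body, as in the add example above, keeps the expansion well-formed when an argument is itself an expression.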
869
870
871 Macro expansion is currently performed in a separate pass before condi‐
872 tional compilation. Therefore, both TRUE- and FALSE-tokens in condi‐
873 tional expressions will be macroexpanded regardless of how the condi‐
874 tion is evaluated. This can sometimes lead to errors:
875
876 // The following results in a conflict:
877 %( CONFIG_UTRACE == "y" %?
878 @define foo %( process.syscall %)
879 %:
880 @define foo %( **ERROR** %)
881 %)
882
883 // The following works properly as expected:
884 @define foo %(
885 %( CONFIG_UTRACE == "y" %? process.syscall %: **ERROR** %)
886 %)
887
888 The first example is incorrect because both @defines are evaluated in a
889 pass prior to the conditional being evaluated.
890
891 Normally, a macro definition is local to the file it occurs in. Thus,
892 defining a macro in a tapset does not make it available to the user of
893 the tapset. Publicly available library macros can be defined by in‐
894 cluding .stpm files on the tapset search path. These files may only
895 contain @define constructs, which become visible across all tapsets and
896 user scripts. Optionally, within the .stpm files, a public macro defi‐
897 nition can be surrounded by a preprocessor conditional as described
898 above.
899
900
901 CONSTANTS
902 Tapsets or guru-mode user scripts can access header file constant to‐
903 kens, typically macros, using the built-in @const() operator. The re‐
904 spective header file may be included either via the tapset library, or
905 using a top-level guru-mode embedded-C construct. This results in the
906 appropriate embedded-C pragma comments being set.
907
908 @const("STP_SKIP_BADVARS")
909
910
911
912 VARIABLES
913 Identifiers for variables and functions are an alphanumeric sequence,
914 and may include _ and $ characters. They may not start with a plain
915 digit, as in C. Each variable is by default local to the probe or
916 function statement block within which it is mentioned, and therefore
917 its scope and lifetime are limited to a particular probe or function
918 invocation.
919
920 Scalar variables are implicitly typed as either string or integer. As‐
921 sociative arrays also have a string or integer value, and a tuple of
922 strings and/or integers serving as a key. Here are a few basic expres‐
923 sions.
924
925 var1 = 5
926 var2 = "bar"
927 array1 [pid()] = "name" # single numeric key
928 array2 ["foo",4,i++] += 5 # vector of string/num/num keys
929 if (["hello",5,4] in array2) println ("yes") # membership test
930
931
932 The translator performs type inference on all identifiers, including
933 array indexes and function parameters. Inconsistent type-related use
934 of identifiers signals an error.
935
936 Variables may be declared global, so that they are shared amongst all
937 probes and functions and live as long as the entire systemtap session.
938 There is one namespace for all global variables, regardless of which
939 script file they are found within. Concurrent access to global vari‐
940 ables is automatically protected with locks; see the SAFETY AND
941 SECURITY section for more details. A global declaration may be written at
942 the outermost level anywhere, not within a block of code. Global vari‐
943 ables which are written but never read will be displayed automatically
944 at session shutdown. The translator will infer for each its value
945 type, and if it is used as an array, its key types. Optionally, scalar
946 globals may be initialized with a string or number literal. The fol‐
947 lowing declaration marks variables as global.
948
949 global var1, var2, var3=4
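
The following sketch shows a scalar global with an initializer and a global array shared between two probes. The variable names and the timer interval are arbitrary choices for illustration.

```systemtap
global ticks = 0      // scalar global, initialized with a number literal
global names          // global array; key and value types are inferred

probe timer.ms(100) {
  names[ticks] = execname()         // integer key, string value
  if (++ticks >= 5) exit()
}

probe end {
  foreach (t+ in names)             // ascending by key
    printf("%d: %s\n", t, names[t])
}
```

Both probes see the same ticks and names because globals live for the whole session, whereas t exists only within the end probe handler.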
950
951
952 Global variables can also be set as module options. One can do this
953 either by using the -G option, or by first compiling the module with
954 stap -p4 and then setting the variables on the command line when
955 calling staprun on the generated module. See staprun(8) for more
956 information.
957
958 The scope of a global variable may be limited to a tapset or user
959 script file using the private keyword. The global keyword is optional
960 when defining a private global variable. The following declaration
961 marks var1 and var2 as private globals.
962
963 private global var1=2
964 private var2
965
966
967 Arrays are limited in size by the MAXMAPENTRIES variable -- see the
968 SAFETY AND SECURITY section for details. Optionally, global arrays may
969 be declared with a maximum size in brackets, overriding MAXMAPENTRIES
970 for that array only. Note that this doesn't indicate the type of keys
971 for the array, just the size.
972
973 global tiny_array[10], normal_array, big_array[50000]
974
975
976 Arrays may be configured for wrapping using the '%' suffix. This caus‐
977 es older elements to be overwritten if more elements are inserted than
978 the array can hold. This works for both associative and statistics
979 typed arrays.
980
981 global wrapped_array1%[10], wrapped_array2%
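
As a rough sketch of a wrapping array, the script below stores more elements than the declared size allows. The array name recent and the sizes are illustrative, and no particular output is claimed here.

```systemtap
global recent%[3]     // holds at most 3 elements, wrapping when full

probe oneshot {
  for (i = 1; i <= 5; i++)
    recent[i] = i * i               // later inserts overwrite older elements
  foreach (k+ in recent)
    printf("%d -> %d\n", k, recent[k])
}
```

Without the % suffix, the same script would be expected to hit the array size limit instead of discarding older entries.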
982
983
984
985 Many types of probe points provide context variables, which are run-
986 time values, safely extracted from the kernel or userspace program be‐
987 ing probed. These are prefixed with the $ character. The CONTEXT
988 VARIABLES section in stapprobes(3stap) lists what is available for each
989 type of probe point. These context variables become normal string or
990 numeric scalars once they are stored in normal script variables. See
991 the TYPECASTING section below on how to turn them back into typed
992 pointers for further processing as context variables.
993
994
995 STATEMENTS
996 Statements enable procedural control flow. They may occur within func‐
997 tions and probe handlers. The total number of statements executed in
998 response to any single probe event is limited to some number defined by
999 the MAXACTION macro in the translated C code, and is in the neighbour‐
1000 hood of 1000.
1001
1002 EXP Execute the string- or integer-valued expression and throw away
1003 the value.
1004
1005 { STMT1 STMT2 ... }
1006 Execute each statement in sequence in this block. Note that
1007 separators or terminators are generally not necessary between
1008 statements.
1009
1010 ; Null statement, do nothing. It is useful as an optional separa‐
1011 tor between statements to improve syntax-error detection and to
1012 handle certain grammar ambiguities.
1013
1014 if (EXP) STMT1 [ else STMT2 ]
1015 Compare integer-valued EXP to zero. Execute the first (non-ze‐
1016 ro) or second STMT (zero).
1017
1018 while (EXP) STMT
1019 While integer-valued EXP evaluates to non-zero, execute STMT.
1020
1021 for (EXP1; EXP2; EXP3) STMT
1022 Execute EXP1 as initialization. While EXP2 is non-zero, execute
1023 STMT, then the iteration expression EXP3.
1024
1025 foreach (VAR in ARRAY [ limit EXP ]) STMT
1026 Loop over each element of the named global array, assigning the
1027 current key to VAR. The array may not be modified within the
1028 statement. By adding a single + or - operator after the VAR or
1029 the ARRAY identifier, the iteration will proceed in a sorted or‐
1030 der, by ascending or descending index or value. If the array
1031 contains statistics aggregates, adding the desired @operator be‐
1032 tween the ARRAY identifier and the + or - will specify the sort‐
1033 ing aggregate function. See the STATISTICS section below for
1034 the ones available. Default is @count. Using the optional lim‐
1035 it keyword limits the number of loop iterations to EXP times.
1036 EXP is evaluated once at the beginning of the loop.
1037
1038 foreach ([VAR1, VAR2, ...] in ARRAY [ limit EXP ]) STMT
1039 Same as above, used when the array is indexed with a tuple of
1040 keys. A sorting suffix may be used on at most one VAR or ARRAY
1041 identifier.
1042
1043 foreach ([VAR1, VAR2, ...] in ARRAY [INDEX1, INDEX2, ...] [ limit EXP
1044 ]) STMT
1045 Same as above, where iterations are limited to elements in the
1046 array where the keys match the index values specified. The sym‐
1047 bol * can be used to specify an index and will be treated as a
1048 wildcard.
1049
1050 foreach (VAR0 = VAR in ARRAY [ limit EXP ]) STMT
1051 This variant of foreach saves the current value into VAR0 on each
1052 iteration, so it is the same as ARRAY[VAR]. This also works
1053 with a tuple of keys. Sorting suffixes on VAR0 have the same
1054 effect as on ARRAY.
1055
1056 foreach (VAR0 = VAR in ARRAY [INDEX1, INDEX2, ...] [ limit EXP ]) STMT
1057 Same as above, where iterations are limited to elements in the
1058 array where the keys match the index values specified. The sym‐
1059 bol * can be used to specify an index and will be treated as a
1060 wildcard.
1061
1062 break, continue
1063 Exit or iterate the innermost nesting loop (while or for or
1064 foreach) statement.
1065
1066 return EXP
1067 Return EXP value from enclosing function. If the function's
1068 value is not taken anywhere, then a return statement is not
1069 needed, and the function will have a special "unknown" type with
1070 no return value.
1071
1072 next Return now from enclosing probe handler. This is especially
1073 useful in probe aliases that apply event filtering predicates.
1074 When used in functions, the execution will be immediately trans‐
1075 ferred to the next overloaded function.
1076
1077 try { STMT1 } catch { STMT2 }
1078 Run the statements in the first block. Upon any run-time er‐
1079 rors, abort STMT1 and start executing STMT2. Any errors in
1080 STMT2 will propagate to outer try/catch blocks, if any.
1081
1082 try { STMT1 } catch(VAR) { STMT2 }
1083 Same as above, plus assign the error message to the string
1084 scalar variable VAR.
1085
1086 delete ARRAY[INDEX1, INDEX2, ...]
1087 Remove from ARRAY the element specified by the index tuple. If
1088 the index tuple contains a * in place of an index, the * is
1089 treated as a wildcard and all elements with keys that match the
1090 index tuple will be removed from ARRAY. The value will no
1091 longer be available, and subsequent iterations will not report
1092 the element. It is not an error to delete an element that does
1093 not exist.
1094
1095 delete ARRAY
1096 Remove all elements from ARRAY.
1097
1098 delete SCALAR
1099 Removes the value of SCALAR. Integers and strings are cleared
1100 to 0 and "" respectively, while statistics are reset to the ini‐
1101 tial empty state.
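
Several of the statements above can be combined in one short sketch. The array name hits and its contents are made up for illustration.

```systemtap
global hits

probe oneshot {
  hits["a"] = 3; hits["b"] = 1; hits["c"] = 2

  foreach (key in hits- limit 2)    // descending by value, at most two iterations
    printf("%s=%d\n", key, hits[key])

  try {
    x = 1 / (hits["b"] - 1)         // divisor is zero: a run-time error
    println(x)
  } catch (msg) {
    println("caught: " . msg)       // msg holds the error message
  }

  delete hits["a"]                  // remove one element
  if (!("a" in hits)) println("a removed")
  delete hits                       // remove all remaining elements
}
```

The semicolons on the first line of the handler are optional separators. The remaining statements rely on the grammar to find statement boundaries.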
1102
1103
1104 EXPRESSIONS
1105 Systemtap supports a number of operators that have the same general
1106 syntax, semantics, and precedence as in C and awk. Arithmetic is per‐
1107 formed as per typical C rules for signed integers. Division by zero or
1108 overflow is detected and results in an error.
1109
1110 binary numeric operators
1111 * / % + - >> << & ^ | && ||
1112
1113 binary string operators
1114 . (string concatenation)
1115
1116 numeric assignment operators
1117 = *= /= %= += -= >>= <<= &= ^= |=
1118
1119 string assignment operators
1120 = .=
1121
1122 unary numeric operators
1123 + - ! ~ ++ --
1124
1125 binary numeric, string comparison or regex matching operators
1126 < > <= >= == != =~ !~
1127
1128 ternary operator
1129 cond ? exp1 : exp2
1130
1131 grouping operator
1132 ( exp )
1133
1134 function call
1135 fn ([ arg1, arg2, ... ])
1136
1137 array membership check
1138 exp in array
1139 [exp1, exp2, ...] in array
1140 [*, *, ...] in array
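
A brief sketch exercising a few of these operators follows. The variable names are arbitrary.

```systemtap
probe oneshot {
  s = "foo" . "bar"                 // string concatenation
  s .= "!"                          // string append-assignment
  n = (7 % 3) << 2                  // remainder 1, shifted left by 2, gives 4
  println(s . " " . sprint(n))      // prints: foobar! 4
  println(n > 3 ? "big" : "small")  // prints: big
}
```

Note that sprint is needed to turn the integer n into a string before concatenation, since the . operator takes string operands only.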
1141
1142
1143 REGULAR EXPRESSION MATCHING
1144 The scripting language supports regular expression matching. The basic
1145 syntax is as follows:
1146
1147 exp =~ regex
1148 exp !~ regex
1149
1150 (The first operand must be an expression evaluating to a string; the
1151 second operand must be a string literal containing a syntactically
1152 valid regular expression.)
1153
1154 The regular expression syntax supports most of the features of POSIX
1155 Extended Regular Expressions, except for subexpression reuse ("\1")
1156 functionality.
1157
1158 After a successful match, the contents of the matched string and subex‐
1159 pressions can be extracted using the matched() and ngroups() tapset
1160 functions as follows:
1161
1162 if ("an example string" =~ "str(ing)") {
1163 matched(0) // -> returns "string", the matched substring
1164 matched(1) // -> returns "ing", the 1st matched subexpression
1165 ngroups() // -> returns 2, the number of matched groups
1166 }
1167
1168
1169 PROBES
1170 The main construct in the scripting language identifies probes. Probes
1171 associate abstract events with a statement block ("probe handler") that
1172 is to be executed when any of those events occur. The general syntax
1173 is as follows:
1174
1175 probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
1176 probe PROBEPOINT [, PROBEPOINT] if (CONDITION) { [STMT ...] }
1177
1178
1179 Events are specified in a special syntax called "probe points". There
1180 are several varieties of probe points defined by the translator, and
1181 tapset scripts may define further ones using aliases. Probe points may
1182 be wildcarded, grouped, or listed in preference sequences, or declared
1183 optional. More details on probe point syntax and semantics are listed
1184 on the stapprobes(3stap) manual page.
1185
1186 The probe handler is interpreted relative to the context of each event.
1187 For events associated with kernel code, this context may include vari‐
1188 ables defined in the source code at that spot. These "context vari‐
1189 ables" are presented to the script as variables whose names are pre‐
1190 fixed with "$". They may be accessed only if the kernel's compiler
1191 preserved them despite optimization. This is the same constraint that
1192 a debugger user faces when working with optimized code. In addition,
1193 the objects must exist in paged-in memory at the moment of the system‐
1194 tap probe handler's execution, because systemtap must not cause (and
1195 so suppresses) any additional paging. Some probe types have very little
1196 context. See the stapprobes(3stap) man pages to see the kinds of context
1197 variables available at each kind of probe point.
1198
1199 Probes may be decorated with an arming condition, consisting of a sim‐
1200 ple boolean expression on read-only global script variables. While
1201 disarmed (inactive, condition evaluates to false), some probe types re‐
1202 duce or eliminate their run-time overheads. When an arming condition
1203 evaluates to true, probes will soon be re-armed, and their probe han‐
1204 dlers will start getting called as the events fire. (Some events may
1205 be lost during the arming interval. If this is unacceptable, do not
1206 use arming conditions for those probes.) Example of the syntax:
1207
1208 probe timer.us(TIMER) if (enabled) {
1209 }
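
A fuller sketch of an arming condition is given below. The intervals and the variable name enabled are illustrative choices only.

```systemtap
global enabled = 0

probe timer.s(2)  { enabled = !enabled }              // toggle the condition
probe timer.ms(250) if (enabled) { println("tick") }  // fires only while armed
probe timer.s(10) { exit() }
```

The condition itself only reads enabled. The variable is changed from a separate probe handler.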
1210
1211
1212 New probe points may be defined using "aliases". A probe point alias
1213 looks similar to a probe definition, but instead of activating a probe
1214 at the given point, it just defines a new probe point name as an alias
1215 to an existing one. There are two types of alias, the prologue style
1216 and the epilogue style, which are identified by "=" and "+=" respec‐
1217 tively.
1218
1219 For a prologue-style alias, the statement block that follows the alias
1220 definition is implicitly added as a prologue to any probe that refers
1221 to the alias. For an epilogue-style alias, the statement block that
1222 follows the alias definition is implicitly added as an epilogue to
1223 any probe that refers to the alias. For example:
1224
1225 probe syscall.read = kernel.function("sys_read") {
1226 fildes = $fd
1227 if (execname() == "init") next # skip rest of probe
1228 }
1229
1230 defines a new probe point syscall.read, which expands to
1231 kernel.function("sys_read"), with the given statement as a prologue,
1232 which is useful to predefine some variables for the alias user and/or
1233 to skip probe processing entirely based on some conditions. And
1234
1235 probe syscall.read += kernel.function("sys_read") {
1236 if (tracethis) println ($fd)
1237 }
1238
1239 defines a new probe point with the given statement as an epilogue,
1240 which is useful to take actions based upon variables set or left over
1241 by the alias user. Please note that in each case, the statements
1242 in the alias handler block are treated ordinarily, so that variables
1243 assigned there constitute mere initialization, not a macro substitu‐
1244 tion.
1245
1246 An alias is used just like a built-in probe type.
1247
1248 probe syscall.read {
1249 printf("reading fd=%d\n", fildes)
1250 if (fildes > 10) tracethis = 1
1251 }
1252
1253
1254
1255 FUNCTIONS
1256 Systemtap scripts may define subroutines to factor out common work.
1257 Functions take any number of scalar (integer or string) arguments, and
1258 must return a single scalar (integer or string). An example function
1259 declaration looks like this:
1260
1261 function thisfn (arg1, arg2) {
1262 return arg1 + arg2
1263 }
1264
1265 Note the general absence of type declarations, which are instead in‐
1266 ferred by the translator. However, if desired, a function definition
1267 may include explicit type declarations for its return value and/or its
1268 arguments. This is especially helpful for embedded-C functions. In
1269 the following example, the type inference engine need only infer the
1270 type of arg2 (a string).
1271
1272 function thatfn:string (arg1:long, arg2) {
1273 return sprint(arg1) . arg2
1274 }
1275
1276 Functions may call others or themselves recursively, up to a fixed
1277 nesting limit. This limit is defined by the MAXNESTING macro in the
1278 translated C code and is in the neighbourhood of 10.
1279
1280 Functions may be marked private using the private keyword to limit
1281 their scope to the tapset or user script file they are defined in. An
1282 example definition of a private function follows:
1283
1284 private function three:long () { return 3 }
1285
1286
1287 Functions terminating without reaching an explicit return statement
1288 will return an implicit 0 or "", determined by type inference.
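
Explicit types and recursion can be combined, as in this small sketch. The function name fact is hypothetical, and the argument is kept small so that the recursion stays well below the nesting limit.

```systemtap
function fact:long (n:long) {
  if (n <= 1) return 1
  return n * fact(n - 1)            // recursive call, one nesting level deeper
}

probe oneshot {
  println(fact(5))                  // 5! = 120
}
```

A much larger argument would be expected to exceed MAXNESTING and raise a run-time error rather than recurse indefinitely.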
1289
1290 Functions may be overloaded during both runtime and compile time.
1291
1292 Runtime overloading allows the executed function to be selected while
1293 the module is running based on runtime conditions and is achieved using
1294 the "next" statement in script functions and STAP_NEXT macro for embed‐
1295 ded-C functions. For example,
1296
1297
1298 function f() { if (condition) next; print("first function") }
1299 function f() %{ STAP_NEXT; print("second function") %}
1300 function f() { print("third function") }
1301
1302
1303 During a function call f(), if condition evaluates to true, execution
1304 transfers to the third function, which prints "third function". Note
1305 that the second function always passes execution on via STAP_NEXT.
1306
1307 Parameter overloading allows the function that is executed to be
1308 selected at compile time, based on the number of arguments provided to
1309 the function call. For example,
1310
1311
1312 function g() { print("first function") }
1313 function g(x) { print("second function") }
1314 g() -> "first function"
1315 g(1) -> "second function"
1316
1317
1318 Note that runtime overloading does not occur in the above example, as
1319 exactly one function will be resolved for the function call. The use
1320 of a next statement inside a function while no more overloads remain
1321 will trigger a runtime exception. Runtime overloading will only occur
1322 if the functions have the same arity; functions with the same name but
1323 a different number of parameters are completely unrelated.
1324
1325 Execution order is determined by a priority value, which may be speci‐
1326 fied explicitly; a lower value means a higher priority. By default,
1327 user script functions are given a higher priority than library func‐
1328 tions: their default priority values are 0 and 1 respectively. Func‐
1329 tions with the same priority are executed in declaration order. For
1330 example,
1331
1332
1333 function f():3 { if (condition) next; print("first function") }
1334 function f():1 { if (condition) next; print("second function") }
1335 function f():2 { print("third function") }
1336
1337
1338 Since the second function has the highest priority, it is executed
1339 first. The first function is never executed, as there are no "next"
1340 statements in the third function to transfer execution.
1341
1342
1343 PRINTING
1344 There is a set of function names that are specially treated by the
1345 translator. They format values for printing to the standard systemtap
1346 output stream in a more convenient way (note that data generated in the
1347 kernel module need to get transferred to user-space in order to get
1348 printed).
1349
1350 The sprint* variants return the formatted string instead of printing
1351 it.
1352
1353 print, sprint
1354 Print one or more values of any type, concatenated directly to‐
1355 gether.
1356
1357 println, sprintln
1358 Print values like print and sprint, but also append a newline.
1359
1360 printd, sprintd
1361 Take a string delimiter and two or more values of any type, and
1362 print the values with the delimiter interposed. The delimiter
1363 must be a literal string constant.
1364
1365 printdln, sprintdln
1366 Print values with a delimiter like printd and sprintd, but also
1367 append a newline.
1368
1369 printf, sprintf
1370 Take a formatting string and a number of values of corresponding
1371 types, and print them all. The format must be a literal string
1372 constant.
1373
1374 The printf formatting directives are similar to those of C, except that
1375 they are fully type-checked by the translator:
1376
1377 %b Writes a binary blob of the value given, instead of ASCII
1378 text. The width specifier determines the number of bytes
1379 to write; valid specifiers are %b %1b %2b %4b %8b. De‐
1380 fault (%b) is 8 bytes.
1381
1382 %c Character.
1383
1384 %d,%i Signed decimal.
1385
1386 %m Safely reads kernel (without #) or user (with #) memory
1387 at the given address, outputs its content. The optional
1388 precision specifier (not field width) determines the num‐
1389 ber of bytes to read - default is 1 byte. %10.4m prints
1390 4 bytes of the memory in a 10-character-wide field.
1391 Note, on some architectures user memory can still be read
1392 without #.
1393
1394 %M Same as %m, but outputs in hexadecimal. The minimal size
1395 of output is double the optional precision specifier -
1396 default is 1 byte (2 hex chars). %10.4M prints 4 bytes
1397 of the memory as 8 hexadecimal characters in a 10-charac‐
1398 ter-wide field. %.*M hex-dumps a given number of bytes
1399 from a given buffer.
1400
1401 %o Unsigned octal.
1402
1403 %p Unsigned pointer address.
1404
1405 %s String.
1406
1407 %u Unsigned decimal.
1408
1409 %x Unsigned hex value, in all lower-case.
1410
1411 %X Unsigned hex value, in all upper-case.
1412
1413 %% Writes a %.
1414
1415 The # flag selects the alternate forms. For octal, this prefixes a 0.
1416 For hex, this prefixes 0x or 0X, depending on case. For characters,
1417 this escapes non-printing values with either C-like escapes or raw oc‐
1418 tal. In the case of %#m/%#M, this safely accesses user space memory
1419 rather than kernel space memory.
1420
1421 Examples:
1422
1423 a = "alice", b = "bob", p = 0x1234abcd, i = 123, j = -1, id[a] = 1234, id[b] = 4567
1424 print("hello")
1425 Prints: hello
1426 println(b)
1427 Prints: bob\n
1428 println(a . " is " . sprint(16))
1429 Prints: alice is 16\n
1430 foreach (name in id) printdln("|", strlen(name), name, id[name])
1431 Prints: 5|alice|1234\n3|bob|4567\n
1432 printf("%c is %s; %x or %X or %p; %d or %u\n",97,a,p,p,p,j,j)
1433 Prints: a is alice; 1234abcd or 1234ABCD or 0x1234abcd; -1 or 18446744073709551615\n
1434 printf("2 bytes of kernel buffer at address %p: %2m", p, p)
1435 Prints: 2 bytes of kernel buffer at address 0x1234abcd: <binary data>
1436 printf("%4b", p)
1437 Prints (these values as binary data): 0x1234abcd
1438 printf("%#o %#x %#X\n", 1, 2, 3)
1439 Prints: 01 0x2 0X3\n
1440 printf("%#c %#c %#c\n", 0, 9, 42)
1441 Prints: \000 \t *\n
1442
1443
1444
1445 STATISTICS
1446 It is often desirable to collect statistics in a way that avoids the
1447 penalties of repeatedly taking exclusive locks on the global variables
1448 those numbers are being put into. Systemtap provides a solution using
1449 a special operator to accumulate values, and several pseudo-functions
1450 to extract the statistical aggregates.
1451
1452 The aggregation operator is <<<, and resembles an assignment, or a C++
1453 output-streaming operation. The left operand specifies a scalar or ar‐
1454 ray-index lvalue, which must be declared global. The right operand is
1455 a numeric expression. The meaning is intuitive: add the given number
1456 to the pile of numbers to compute statistics of. (The specific list of
1457 statistics to gather is given separately, by the extraction functions.)
1458
1459 foo <<< 1
1460 stats[pid()] <<< memsize
1461
1462
1463 The extraction functions are also special. For each appearance of a
1464 distinct extraction function operating on a given identifier, the
1465 translator arranges to compute a set of statistics that satisfy it.
1466 The statistics system is thereby "on-demand". Each execution of an ex‐
1467 traction function causes the aggregation to be computed for that moment
1468 across all processors.
1469
1470 Here is the set of extractor functions. The first argument of each is
1471 the same style of lvalue used on the left hand side of the accumulate
1472 operation. The @count(v), @sum(v), @min(v), @max(v), @avg(v), @vari‐
1473 ance(v[, b]) extractor functions compute the number/total/minimum/maxi‐
1474 mum/average/variance of all accumulated values. The resulting values
1475 are all simple integers. Arrays containing aggregates may be sorted
1476 and iterated. See the foreach construct above.
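
A minimal sketch of the scalar extractor functions, using three made-up sample values, might look like this:

```systemtap
global lat

probe oneshot {
  lat <<< 10
  lat <<< 20
  lat <<< 60
  // prints: count=3 sum=90 min=10 max=60 avg=30
  printf("count=%d sum=%d min=%d max=%d avg=%d\n",
         @count(lat), @sum(lat), @min(lat), @max(lat), @avg(lat))
}
```

Only the extractors that actually appear in the script for lat need to be maintained by the translator, which is what makes the statistics "on-demand".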
1477
1478 Variance uses Welford's online algorithm. The calculations are based
1479 on integer arithmetic, and so may suffer from low precision and over‐
1480 flow. To improve this, @variance(v[, b]) accepts an optional parameter
1481 b, the bit-shift, ranging from 0 (default) to 62, for internal scaling.
1482 Only one value of bit-shift may be used with a given global variable. A
1483 larger bit-shift value increases precision, but increases the likelihood
1484 of overflow.
1485
1486
1487 $ stap -e \
1488 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x)) }'
1489 12
1490 $ stap -e \
1491 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x,1)) }'
1492 2
1493 $ python3 -c 'import statistics; print(statistics.variance([1, 2, 3, 4, 5]))'
1494 2.5
1495 $
1496
1497
1498 Overflow (from internal multiplication of large numbers) may occur and
1499 may cause a negative variance result. Consider normalizing your input
1500 data. Adding or subtracting a fixed value from all variance inputs
1501 preserves the original variance. Dividing the variance inputs by a
1502 fixed value shrinks the original variance by that value squared.
1503
1504
1505
1506 Histograms are also available, but are more complicated because they
1507 have a vector rather than scalar value. @hist_linear(v,start,stop,in‐
1508 terval) represents a linear histogram from "start" to "stop" (inclu‐
1509 sive) by increments of "interval". The interval must be positive. Sim‐
1510 ilarly, @hist_log(v) represents a base-2 logarithmic histogram. Print‐
1511 ing a histogram with the print family of functions renders a histogram
1512 object as a tabular "ASCII art" bar chart.
1513
1514
1515 probe timer.profile {
1516 x[1] <<< pid()
1517 x[2] <<< uid()
1518 y <<< tid()
1519 }
1520 global x // an array containing aggregates
1521 global y // a scalar
1522 probe end {
1523 foreach ([i] in x @count+) {
1524 printf ("x[%d]: avg %d = sum %d / count %d\n",
1525 i, @avg(x[i]), @sum(x[i]), @count(x[i]))
1526 println (@hist_log(x[i]))
1527 }
1528 println ("y:")
1529 println (@hist_log(y))
1530 }
1531
1532
1533 The counts of each histogram bucket may be individually accessed via
1534 the [index] operator. Each bucket is addressed from 1 through N (for
1535 each natural bucket). In addition bucket #0 counts all the samples be‐
1536 neath the start value, and bucket #N+1 counts all the samples above the
1537 stop value. Histogram buckets (including the two out-of-range buckets)
1538 may also be iterated with foreach.
1539
1540
1541 global x
1542 probe oneshot {
1543 x <<< -100
1544 x <<< 1
1545 x <<< 2
1546 x <<< 3
1547 x <<< 100
1548 foreach (bucket in @hist_linear(x,1,3,1))
1549 // expecting 1 out-of-range-low bucket
1550 // 3 payload buckets
1551 // 1 out-of-range-high bucket
1552 printf("bucket %d count %d\n",
1553 bucket, @hist_linear(x,1,3,1)[bucket])
1554 }
1555
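The bucket numbering used in the example above can be modelled with a few lines of Python. This is a sketch of the numbering scheme as the text describes it, not the runtime's implementation; the function name hist_linear_counts and the formula for the number of payload buckets are assumptions.

```python
def hist_linear_counts(samples, start, stop, interval):
    """Return counts for buckets 0..N+1 of a linear histogram."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    n = (stop - start) // interval + 1       # payload buckets 1..N (assumed formula)
    counts = [0] * (n + 2)
    for v in samples:
        if v < start:
            counts[0] += 1                   # bucket 0: below the start value
        elif v > stop:
            counts[n + 1] += 1               # bucket N+1: above the stop value
        else:
            counts[(v - start) // interval + 1] += 1
    return counts

counts = hist_linear_counts([-100, 1, 2, 3, 100], 1, 3, 1)
for bucket, count in enumerate(counts):
    print(f"bucket {bucket} count {count}")
```

With the five samples from the script above, the sketch yields five buckets: one out-of-range-low, three payload, and one out-of-range-high.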
1556
1557
1558 TYPECASTING
1559 Once a pointer (see the CONTEXT VARIABLES section of stapprobes(3stap))
1560 has been saved into a script integer variable, the translator loses the
1561 type information necessary to access members from that pointer. Using
1562 the @cast() operator tells the translator how to interpret the number
1563 as a typed pointer.
1564
1565 @cast(p, "type_name"[, "module"])->member
1566
1567
1568 This will interpret p as a pointer to a struct/union named type_name
1569 and dereference the member value. Further ->subfield expressions may
1570 be appended to dereference more levels. Note that for direct derefer‐
1571 encing of a pointer {kernel,user}_{char,int,...}($p) should be used.
1572 (Refer to stapfuncs(5) for more details.) NOTE: the same dereferenc‐
1573 ing operator -> is used to refer to both direct containment or pointer
1574 indirection. Systemtap automatically determines which. The optional
1575 module tells the translator where to look for information about that
1576 type. Multiple modules may be specified as a list with : separators.
1577 If the module is not specified, it will default either to the probe
1578 module for dwarf probes, or to "kernel" for functions and all other
1579 probe types.
1580
1581 The translator can create its own module with type information from a
1582 header surrounded by angle brackets, in case normal debuginfo is not
1583 available. For kernel headers, prefix it with "kernel" to use the ap‐
1584 propriate build system. All other headers are built with default GCC
1585 parameters into a user module. Multiple headers may be specified in
1586 sequence to resolve a codependency.
1587
1588 @cast(tv, "timeval", "<sys/time.h>")->tv_sec
1589 @cast(task, "task_struct", "kernel<linux/sched.h>")->tgid
1590 @cast(task, "task_struct",
1591 "kernel<linux/sched.h><linux/fs_struct.h>")->fs->umask
1592
1593 Values acquired by @cast may be pretty-printed by the $ and $$ suffix
1594 operators, the same way as described in the CONTEXT VARIABLES section
1595 of the stapprobes(3stap) manual page.
1596
1597
1598 When in guru mode, the translator will also allow scripts to assign new
1599 values to members of typecasted pointers.
1600
1601 Typecasting is also useful in the case of void* members whose type may
1602 be determinable at runtime.
1603
1604 probe foo {
1605 if ($var->type == 1) {
1606 value = @cast($var->data, "type1")->bar
1607 } else {
1608 value = @cast($var->data, "type2")->baz
1609 }
1610 print(value)
1611 }
1612
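As a loose analogy only, the same idea (an integer that has lost its type, reinterpreted as a typed pointer) can be shown with Python's ctypes module. The Point structure is hypothetical, and nothing here involves systemtap itself.

```python
import ctypes

class Point(ctypes.Structure):               # hypothetical type, for illustration only
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

pt = Point(3, 4)
addr = ctypes.addressof(pt)                  # a plain integer; the type is no longer attached

# Comparable in spirit to @cast(addr, "Point")->y:
# tell the runtime how to interpret the number as a typed pointer.
typed = ctypes.cast(addr, ctypes.POINTER(Point))
y_value = typed.contents.y
print(y_value)
```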
1613
1614
1615 EMBEDDED C
1616 When in guru mode, the translator accepts embedded C code in the top
1617 level of the script. Such code is enclosed between %{ and %} markers,
1618 and is transcribed verbatim, without analysis, in some sequence, into
1619 the top level of the generated C code. At the outermost level, this
1620 may be useful to add #include instructions, and any auxiliary defini‐
1621 tions for use by other embedded code.
1622
1623 Another place where embedded code is permitted is as a function body.
1624 In this case, the script language body is replaced entirely by a piece
1625 of C code enclosed again between %{ and %} markers. This C code may do
1626 anything reasonable and safe. There are a number of undocumented but
1627 complex safety constraints on atomicity, concurrency, resource consump‐
1628 tion, and run time limits, so this is an advanced technique.
1629
1630 The memory locations set aside for input and output values are made
1631 available to it using macros STAP_ARG_* and STAP_RETVALUE. Errors may
1632 be signalled with STAP_ERROR. Output may be written with STAP_PRINTF.
1633 The function may return early with STAP_RETURN. Here are some exam‐
1634 ples:
1635
1636 function integer_ops (val) %{
1637 STAP_PRINTF("%d\n", STAP_ARG_val);
1638 STAP_RETVALUE = STAP_ARG_val + 1;
1639 if (STAP_RETVALUE == 4)
1640 STAP_ERROR("wrong guess: %d", (int) STAP_RETVALUE);
1641 if (STAP_RETVALUE == 3)
1642 STAP_RETURN(0);
1643 STAP_RETVALUE ++;
1644 %}
1645 function string_ops (val) %{
1646 strlcpy (STAP_RETVALUE, STAP_ARG_val, MAXSTRINGLEN);
1647 strlcat (STAP_RETVALUE, "one", MAXSTRINGLEN);
1648 if (strcmp (STAP_RETVALUE, "three-two-one"))
1649 STAP_RETURN("parameter should be three-two-");
1650 %}
1651 function no_ops () %{
1652 STAP_RETURN(); /* function inferred with no return value */
1653 %}
1654
1655 The function argument and return value types have to be inferred by the
1656 translator from the call sites in order for this to work. The user
1657 should examine C code generated for ordinary script-language functions
1658 in order to write compatible embedded-C ones.
1659
1660 The last place where embedded code is permitted is as an expression
1661 rvalue. In this case, the C code enclosed between %{ and %} markers is
1662 interpreted as an ordinary expression value. It is assumed to be a
1663 normal 64-bit signed number, unless the marker /* string */ is includ‐
1664 ed, in which case it's treated as a string.
1665
1666 function add_one (val) {
1667 return val + %{ 1 %}
1668 }
1669 function add_string_two (val) {
1670 return val . %{ /* string */ "two" %}
1671 }
1672
1673
1674 The embedded-C code may contain markers to assert optimization and
1675 safety properties.
1676
1677 /* pure */
1678 means that the C code has no side effects and may be elided en‐
1679 tirely if its value is not used by script code.
1680
1681 /* stable */
1682 means that the C code always has the same value (in any given
1683 probe handler invocation), so repeated calls may be automatical‐
1684 ly replaced by memoized values. Such functions must take no pa‐
1685 rameters, and also be pure.
1686
1687 /* unprivileged */
1688 means that the C code is so safe that even unprivileged users
1689 are permitted to use it.
1690
1691 /* myproc-unprivileged */
1692 means that the C code is so safe that even unprivileged users
1693 are permitted to use it, provided that the target of the current
1694 probe is within the user's own process.
1695
1696 /* guru */
1697 means that the C code is so unsafe that a systemtap user must
1698 specify -g (guru mode) to use this. (Tapsets are permitted and
1699 presumed to call them safely.)
1700
1701 /* unmangled */
1702 in an embedded-C function, means that the legacy (pre-1.8) argu‐
1703 ment access syntax should be made available inside the function.
1704 Hence, in addition to STAP_ARG_foo and STAP_RETVALUE one can use
1705 THIS->foo and THIS->__retvalue respectively inside the function.
1706 This is useful for quickly migrating code written for SystemTap
1707 version 1.7 and earlier.
1708
1709 /* unmodified-fnargs */
1710 in an embedded-C function, means that the function arguments are
1711 not modified inside the function body.
1712
1713 /* string */
1714 in embedded-C expressions only, means that the expression has
1715 const char * type and should be treated as a string value, in‐
1716 stead of the default long numeric.
1717
1718 Script level global variables may be accessed in embedded-C functions
1719 and blocks. To read or write the global variable var, the
1720 /* pragma:read:var */ or /* pragma:write:var */ marker must first be
1721 placed in the embedded-C function or block. This provides the
1722 STAP_GLOBAL_GET_* and STAP_GLOBAL_SET_* macros to allow reading and
1723 writing, respectively. For example:
1724
1725 global var
1726 global var2[100]
1727 function increment() %{
1728 /* pragma:read:var */ /* pragma:write:var */
1729 /* pragma:read:var2 */ /* pragma:write:var2 */
1730 STAP_GLOBAL_SET_var(STAP_GLOBAL_GET_var()+1); //var++
1731 STAP_GLOBAL_SET_var2(1, 1, STAP_GLOBAL_GET_var2(1, 1)+1); //var2[1,1]++
1732 %}
1733
1734 Variables may be read and set in both embedded-C functions and expres‐
1735 sions. Strings returned from embedded-C code are decayed to pointers.
1736 Variables must also be assigned at script level to allow for type in‐
1737 ference. Map assignment does not return the value written, so chaining
1738 does not work.
1739
1740
1741 BUILT-INS
1742 A set of builtin probe point aliases are provided by the scripts in‐
1743 stalled in the directory specified in the stappaths(7) manual page.
1744 The functions are described in the stapprobes(3stap) manual page.
1745
1746
1747 DEREFERENCING
1748 Integers can be dereferenced from pointers saved as script integer
1749 variables using the @kderef() or @uderef() operators. @kderef() is
1750 used for kernel space addresses and @uderef() is used for user space
1751 addresses.
1752
1753 @kderef(SIZE, addr)
1754 @uderef(SIZE, addr)
1755
1756 This will interpret addr as a kernel/user address and read SIZE bytes
1757 starting at that address. SIZE should be 1, 2, 4 or 8.
1758
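A small Python model of reading SIZE bytes at an address may help picture the operation. It treats a bytes object as pretend memory; the little-endian layout and the unsigned result are assumptions made for the sketch, and the helper name deref is hypothetical.

```python
def deref(memory, size, addr):
    """Read `size` bytes at offset `addr` of `memory` as one integer."""
    if size not in (1, 2, 4, 8):
        raise ValueError("SIZE should be 1, 2, 4 or 8")
    chunk = memory[addr:addr + size]
    if len(chunk) != size:
        raise IndexError("read runs past the end of memory")
    # Assumption: little-endian byte order, unsigned interpretation.
    return int.from_bytes(chunk, "little")

memory = bytes([0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x00, 0x00])
value_1 = deref(memory, 1, 0)
value_4 = deref(memory, 4, 0)
print(hex(value_1), hex(value_4))
```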
1759
1760 REGISTERS
1761 The value stored within a register can be accessed using the @kregis‐
1762 ter() or @uregister() operators. @kregister() is used for kernel space
1763 registers and @uregister() is used for user space registers. The regis‐
1764 ter of interest is specified using its DWARF number.
1765
1766 @kregister(0)
1767 @uregister(5)
1768
1769
1771 The translator begins pass 1 by parsing the given input script, and all
1772 scripts (files named *.stp) found in a tapset directory. The
1773 directories listed with -I are processed in sequence, each processed in
1774 "guru mode". For each directory, a number of subdirectories are also
1775 searched. These subdirectories are derived from the selected kernel
1776 version (the -r option), in order to allow more kernel-version-specific
1777 scripts to override less specific ones. For example, for a kernel
1778 version 2.6.12-23.FC3 the following patterns would be searched, in
1779 sequence: 2.6.12-23.FC3/*.stp, 2.6.12/*.stp, 2.6/*.stp, and finally
1780 *.stp. Stopping the translator after pass 1 causes it to print the
1781 parse trees.
1782
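The search order in the example above can be sketched in Python. The function name tapset_patterns and the splitting rule (drop everything after the first "-", then keep the first two dotted components) are assumptions chosen to reproduce the 2.6.12-23.FC3 example; they are not taken from the translator's source.

```python
def tapset_patterns(release):
    """List the *.stp patterns for one tapset directory, most specific first."""
    base = release.split("-", 1)[0]                  # "2.6.12"
    major_minor = ".".join(base.split(".")[:2])      # "2.6"
    prefixes = []
    for candidate in (release, base, major_minor):
        if candidate and candidate not in prefixes:  # skip duplicates and empties
            prefixes.append(candidate)
    return [f"{p}/*.stp" for p in prefixes] + ["*.stp"]

patterns = tapset_patterns("2.6.12-23.FC3")
print(patterns)
```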
1783
1784 In pass 2, the translator analyzes the input script to resolve symbols
1785 and types. References to variables, functions, and probe aliases that
1786 are unresolved internally are satisfied by searching through the parsed
1787 tapset script files. If any tapset script file is selected because it
1788 defines an unresolved symbol, then the entirety of that file is added
1789 to the translator's resolution queue. This process iterates until all
1790 symbols are resolved and a subset of tapset script files is selected.
1791
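The iterative selection described above behaves like a worklist algorithm. The following Python sketch assumes a simplified model in which each tapset file is just a pair of symbol sets; the function name select_tapset_files and the data layout are hypothetical.

```python
def select_tapset_files(script_refs, tapset_files):
    """tapset_files maps file name -> (defined symbols, referenced symbols)."""
    selected = []
    resolved = set()
    queue = list(script_refs)
    while queue:
        symbol = queue.pop(0)
        if symbol in resolved:
            continue
        for name, (defines, refs) in tapset_files.items():
            if symbol in defines:
                selected.append(name)
                resolved.update(defines)      # the whole file is pulled in
                queue.extend(sorted(refs))    # its own references must now resolve
                break
        else:
            raise LookupError(f"unresolved symbol: {symbol}")
    return selected

tapset_files = {
    "a.stp": ({"foo"}, {"bar"}),
    "b.stp": ({"bar", "baz"}, set()),
    "c.stp": ({"unused"}, set()),
}
chosen = select_tapset_files(["foo"], tapset_files)
print(chosen)
```

Only the files reachable from the script's unresolved symbols are selected; c.stp is parsed in pass 1 but never chosen.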
1792 Next, all probe point descriptions are validated against the wide
1793 variety supported by the translator. Probe points that refer to code
1794 locations ("synchronous probe points") require the appropriate kernel
1795 debugging information to be installed. In the associated probe
1796 handlers, target-side variables (whose names begin with "$") are found
1797 and have their run-time locations decoded.
1798
1799 Next, all probes and functions are analyzed for optimization
1800 opportunities, in order to remove variables, expressions, and functions
1801 that have no useful value and no side-effect. Embedded-C functions are
1802 assumed to have side-effects unless they include the magic string
1803 /* pure */. Since this optimization can hide latent code errors such
1804 as type mismatches or invalid $context variables, it sometimes may be
1805 useful to disable the optimizations with the -u option.
1806
1807 Finally, all variable, function, parameter, array, and index types are
1808 inferred from context (literals and operators). Stopping the
1809 translator after pass 2 causes it to list all the probes, functions,
1810 and variables, along with all inferred types. Any inconsistent or
1811 unresolved types cause an error.
1812
1813
1814 In pass 3, the translator writes C code that represents the actions of
1815 all selected script files, and creates a Makefile to build that into a
1816 kernel object. These files are placed into a temporary directory.
1817 Stopping the translator at this point causes it to print the contents
1818 of the C file.
1819
1820
1821 In pass 4, the translator invokes the Linux kernel build system to
1822 create the actual kernel object file. This involves running make in
1823 the temporary directory, and requires a kernel module build system
1824 (headers, config and Makefiles) to be installed in the usual spot
1825 /lib/modules/VERSION/build. Stopping the translator after pass 4 is
1826 the last chance before running the kernel object. This may be useful
1827 if you want to archive the file.
1828
1829
1830 In pass 5, the translator invokes the systemtap auxiliary program
1831 staprun for the given kernel object. This program arranges to
1832 load the module then communicates with it, copying trace data from the
1833 kernel into temporary files, until the user sends an interrupt signal.
1834 Any run-time error encountered by the probe handlers, such as running
1835 out of memory, division by zero, exceeding nesting or runtime limits,
1836 results in a soft error indication. Soft errors in excess of MAXERRORS
1837 block all subsequent probes (except error-handling probes), and
1838 terminate the session. Finally, staprun unloads the module, and cleans
1839 up.
1840
1841
1842 ABNORMAL TERMINATION
1843 One should avoid killing the stap process forcibly, for example with
1844 SIGKILL, because the stapio process (a child process of the stap
1845 process) and the loaded module may be left running on the system. If
1846 this happens, send SIGTERM or SIGINT to any remaining stapio processes,
1847 then use rmmod to unload the systemtap module.
1848
1849
1850
1852 See the stapex(3stap) manual page for a brief collection of samples, or
1853 a large set of installed samples under the systemtap
1854 documentation/testsuite directories. See stappaths(7stap) for the
1855 likely location of these on the system.
1856
1857
1859 The systemtap translator caches the pass 3 output (the generated C
1860 code) and the pass 4 output (the compiled kernel module) if pass 4
1861 completes successfully. This cached output is reused if the same
1862 script is translated again assuming the same conditions exist (same
1863 kernel version, same systemtap version, etc.). Cached files are stored
1864 in the $SYSTEMTAP_DIR/cache directory. The cache can be limited by
1865 having the file cache_mb_limit placed in the cache directory (shown
1866 above) containing only an ASCII integer representing how many MiB the
1867 cache should not exceed. In the absence of this file, a default will be
1868 created with the limit set to 256MiB. This is a 'soft' limit in that
1869 the cache will be cleaned after a new entry is added if the cache clean
1870 interval is exceeded, so the total cache size may temporarily exceed
1871 this limit. This interval can be specified by having the file
1872 cache_clean_interval_s placed in the cache directory (shown above)
1873 containing only an ASCII integer representing the interval in seconds.
1874 In the absence of this file, a default will be created with the
1875 interval set to 300 s.
1876
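A minimal Python sketch of creating the two limit files follows. It writes into a scratch directory rather than the real $SYSTEMTAP_DIR/cache, and the helper name write_cache_limits and the values 512 and 600 are illustrative assumptions.

```python
import tempfile
from pathlib import Path

def write_cache_limits(cache_dir, mb_limit, clean_interval_s):
    """Create the limit files, each containing only an ASCII integer."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    (cache / "cache_mb_limit").write_text(str(int(mb_limit)), encoding="ascii")
    (cache / "cache_clean_interval_s").write_text(
        str(int(clean_interval_s)), encoding="ascii")

# Scratch location standing in for $SYSTEMTAP_DIR/cache.
demo_dir = Path(tempfile.mkdtemp()) / "cache"
write_cache_limits(demo_dir, 512, 600)
print(sorted(p.name for p in demo_dir.iterdir()))
```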
1877
1879 Systemtap may be used as a powerful administrative tool. It can expose
1880 kernel internal data structures and potentially private user
1881 information. (In dyninst runtime mode, this is not the case, see the
1882 ALTERNATE RUNTIMES section below.)
1883
1884 The translator asserts many safety constraints during compilation and
1885 more during run-time. It aims to ensure that no handler routine can
1886 run for very long, allocate boundless memory, perform unsafe
1887 operations, or unintentionally interfere with the system. Uses of
1888 script global variables are automatically read/write locked as
1889 appropriate, to protect against manipulation by concurrent probe
1890 handlers. (Deadlocks are detected with timeouts. Use the -t flag to
1891 receive reports of excessive lock contention.) Experimenting with
1892 scripts is therefore generally safe. The guru-mode -g option allows
1893 administrators to bypass most safety measures, which permits invasive
1894 or state-changing operations and embedded-C code, and increases the
1895 risk of upset. By default, overload prevention is turned on for all
1896 modules. If you would like to disable overload processing, use the
1897 --suppress-time-limits option.
1898
1899 Errors that are caught at run time normally result in a clean script
1900 shutdown and a pass-5 error message. The --suppress-handler-errors
1901 option lets scripts tolerate soft errors without shutting down.
1902
1903
1904
1905 PERMISSIONS
1906 For the normal linux-kernel-module runtime, to run the kernel objects
1907 systemtap builds, a user must be one of the following:
1908
1909 · the root user;
1910
1911 · a member of the stapdev and stapusr groups;
1912
1913 · a member of the stapsys and stapusr groups; or
1914
1915 · a member of the stapusr group.
1916
1917 The root user or a user who is a member of both the stapdev and stapusr
1918 groups can build and run any systemtap script.
1919
1920 A user who is a member of both the stapsys and stapusr groups can only
1921 use pre-built modules under the following conditions:
1922
1923 · The module has been signed by a trusted signer. Trusted signers are
1924 normally systemtap compile-servers which sign modules when the
1925 --privilege option is specified by the client. See the
1926 stap-server(8) manual page for more information.
1927
1928 · The module was built using the --privilege=stapsys or the
1929 --privilege=stapusr options.
1930
1931 Members of only the stapusr group can only use pre-built modules under
1932 the following conditions:
1933
1934 · The module is located in the /lib/modules/VERSION/systemtap
1935 directory. This directory must be owned by root and not be world
1936 writable.
1937
1938 or
1939
1940 · The module has been signed by a trusted signer. Trusted signers are
1941 normally systemtap compile-servers which sign modules when the
1942 --privilege option is specified by the client. See the
1943 stap-server(8) manual page for more information.
1944
1945 · The module was built using the --privilege=stapusr option.
1946
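The group rules above amount to a small decision table, sketched here in Python. The function name stap_access and the returned strings are hypothetical summaries of the text, not output of any systemtap tool.

```python
def stap_access(is_root, groups):
    """Summarize what a user may do, following the rules listed above."""
    groups = set(groups)
    if is_root or {"stapdev", "stapusr"} <= groups:
        return "build and run any script"
    if {"stapsys", "stapusr"} <= groups:
        return ("run signed modules built with "
                "--privilege=stapsys or --privilege=stapusr")
    if "stapusr" in groups:
        return ("run modules from /lib/modules/VERSION/systemtap, or "
                "signed modules built with --privilege=stapusr")
    return "no access"

for who in (["stapdev", "stapusr"], ["stapsys", "stapusr"], ["stapusr"], ["wheel"]):
    print(who, "->", stap_access(False, who))
```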
1947 The kernel modules generated by the stap program are run by the staprun
1948 program. The latter is a part of the Systemtap package, dedicated to
1949 module loading and unloading (but only in the white zone), and kernel-
1950 to-user data transfer. Since staprun does not perform any additional
1951 security checks on the kernel objects it is given, it would be unwise
1952 for a system administrator to add untrusted users to the stapdev or
1953 stapusr groups.
1954
1955
1956 SECUREBOOT
1957 If the current system has SecureBoot turned on in the UEFI firmware,
1958 all kernel modules must be signed. (Some kernels may allow disabling
1959 SecureBoot long after booting with a key sequence such as SysRq-X,
1960 making it unnecessary to sign modules.) The systemtap compile server
1961 can sign modules with a MOK (Machine Owner Key) that it has in common
1962 with a client system. See the following wiki page for more details:
1963
1964 https://sourceware.org/systemtap/wiki/SecureBoot
1965
1966 Some kernels do not let systemtap guess whether module signing
1967 is in effect. On such machines, set the SYSTEMTAP_SIGN environment
1968 variable to any value while running stap.
1969
1970
1971 RESOURCE LIMITS
1972 Many resource use limits are set by macros in the generated C code.
1973 These may be overridden with -D flags. A selection of these is as fol‐
1974 lows:
1975
1976 MAXNESTING
1977 Maximum number of nested function calls. Default determined by
1978 script analysis, with a bonus 10 slots added for recursive
1979 scripts.
1980
1981 MAXSTRINGLEN
1982 Maximum length of strings, default 128.
1983
1984 MAXTRYLOCK
1985 Maximum number of iterations to wait for locks on global vari‐
1986 ables before declaring possible deadlock and skipping the probe,
1987 default 1000.
1988
1989 MAXACTION
1990 Maximum number of statements to execute during any single probe
1991 hit (with interrupts disabled), default 1000. Note that for
1992 straight-through probe handlers lacking loops or recursion, due
1993 to optimization, this parameter may be interpreted too conserva‐
1994 tively.
1995
1996 MAXACTION_INTERRUPTIBLE
1997 Maximum number of statements to execute during any single probe
1998 hit which is executed with interrupts enabled (such as begin/end
1999 probes), default (MAXACTION * 10).
2000
2001 MAXBACKTRACE
2002 Maximum number of stack frames that will be processed by the
2003 stap runtime unwinder as produced by the backtrace functions in
2004 the [u]context-unwind.stp tapsets, default 20.
2005
2006 MAXMAPENTRIES
2007 Maximum number of rows in any single global array, default 2048.
2008 Individual arrays may be declared with a larger or smaller limit
2009 instead:
2010
2011 global big[10000],little[5]
2012
2013 or denoted with % to make them wrap-around (replace old entries)
2014 automatically, as in
2015
2016 global big%
2017
2018 or both.
2019
2020 MAPHASHBIAS
2021 The number of powers-of-two to add or subtract from the natural
2022 size of the hash table backing each global associative array.
2023 Default is 0. Try small positive numbers to get extra perfor‐
2024 mance at the cost of more memory consumption, because that
2025 should reduce hash table collisions. Try small negative numbers
2026 for the opposite tradeoff.
2027
2028 MAXERRORS
2029 Maximum number of soft errors before an exit is triggered, de‐
2030 fault 0, which means that the first error will exit the script.
2031 Note that with the --suppress-handler-errors option, this limit
2032 is not enforced.
2033
2034 MAXSKIPPED
2035 Maximum number of skipped probes before an exit is triggered,
2036 default 100. Running systemtap with -t (timing) mode gives more
2037 details about skipped probes. With the default -DINTERRUPT‐
2038 IBLE=1 setting, probes skipped due to reentrancy are not accumu‐
2039 lated against this limit. Note that with the --suppress-han‐
2040 dler-errors option, this limit is not enforced.
2041
2042 MINSTACKSPACE
2043 Minimum number of free kernel stack bytes required in order to
2044 run a probe handler, default 1024. This number should be large
2045 enough for the probe handler's own needs, plus a safety margin.
2046
2047 MAXUPROBES
2048 Maximum number of concurrently armed user-space probes (up‐
2049 robes), default somewhat larger than the number of user-space
2050 probe points named in the script. This pool needs to be poten‐
2051 tially large because individual uprobe objects (about 64 bytes
2052 each) are allocated for each process for each matching script-
2053 level probe.
2054
2055 STP_MAXMEMORY
2056 Maximum amount of memory (in kilobytes) that the systemtap mod‐
2057 ule should use, default unlimited. The memory size includes the
2058 size of the module itself, plus any additional allocations.
2059 This only tracks direct allocations by the systemtap runtime.
2060 This does not track indirect allocations (as done by kprobes/up‐
2061 robes/etc. internals).
2062
2063 STP_OVERLOAD_THRESHOLD, STP_OVERLOAD_INTERVAL
2064 Maximum number of machine cycles spent in probes on any cpu per
2065 given interval, before an overload condition is declared and the
2066 script shut down. The defaults are 500 million and 1 billion,
2067 so as to limit stap script cpu consumption at around 50%.
2068
2069 STP_PROCFS_BUFSIZE
2070 Size of procfs probe read buffers (in bytes). Defaults to
2071 MAXSTRINGLEN. This value can be overridden on a per-procfs file
2072 basis using the procfs read probe .maxsize(MAXSIZE) parameter.
2073
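The 50% figure quoted for the overload defaults is simply the ratio of the two macros. The Python sketch below shows that arithmetic and how a different budget could be turned into a -D flag; the helper names and the 25% target are illustrative assumptions.

```python
STP_OVERLOAD_THRESHOLD = 500_000_000     # default: cycles allowed in probes
STP_OVERLOAD_INTERVAL = 1_000_000_000    # default: cycles per measuring interval

def overload_fraction(threshold, interval):
    """Fraction of a cpu that probes may consume before overload is declared."""
    return threshold / interval

def threshold_for(fraction, interval):
    """Threshold (in cycles) that corresponds to a desired cpu fraction."""
    return int(fraction * interval)

default_fraction = overload_fraction(STP_OVERLOAD_THRESHOLD, STP_OVERLOAD_INTERVAL)

# Hypothetical tighter budget of 25%, expressed as a -D override.
flags = [f"-DSTP_OVERLOAD_THRESHOLD={threshold_for(0.25, STP_OVERLOAD_INTERVAL)}"]
print(default_fraction, flags)
```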
2074 With scripts that contain probes on any interrupt path, it is possible
2075 that those interrupts may occur in the middle of another probe handler.
2076 The probe in the interrupt handler would be skipped in this case to
2077 avoid reentrance. To work around this issue, execute stap with the op‐
2078 tion -DINTERRUPTIBLE=0 to mask interrupts throughout the probe handler.
2079 This does add some extra overhead to the probes, but it may prevent
2080 reentrance for common problem cases. However, probes in NMI handlers
2081 and in the callpath of the stap runtime may still be skipped due to
2082 reentrance.
2083
2084
2085 In case something goes wrong with stap or staprun after a probe has al‐
2086 ready started running, one may safely kill both user processes, and re‐
2087 move the active probe kernel module with rmmod. Any pending trace mes‐
2088 sages may be lost.
2089
2090
2092 Systemtap exposes kernel internal data structures and potentially
2093 private user information. Because of this, use of systemtap's full
2094 capabilities is restricted to root and to users who are members of the
2095 groups stapdev and stapusr.
2096
2097 However, a restricted set of systemtap's features can be made available
2098 to trusted, unprivileged users. These users are members of the group
2099 stapusr only, or members of the groups stapusr and stapsys. These
2100 users can load systemtap modules which have been compiled and certified
2101 by a trusted systemtap compile-server. See the descriptions of the op‐
2102 tions --privilege and --use-server. See README.unprivileged in the sys‐
2103 temtap source code for information about setting up a trusted compile
2104 server.
2105
2106 The restrictions enforced when --privilege=stapsys is specified are de‐
2107 signed to prevent unprivileged users from:
2108
2109 · harming the system maliciously.
2110
2111 The restrictions enforced when --privilege=stapusr is specified are de‐
2112 signed to prevent unprivileged users from:
2113
2114 · harming the system maliciously.
2115
2116 · gaining access to information which would not normally be
2117 available to an unprivileged user.
2118
2119 · disrupting the performance of processes owned by other users
2120 of the system. Some overhead to the system in general is
2121 unavoidable since the unprivileged user's probes will be
2122 triggered at the appropriate times. What we would like to
2123 avoid is targeted interruption of another user's processes
2124 which would not normally be possible by an unprivileged us‐
2125 er.
2126
2127
2128 PROBE RESTRICTIONS
2129 A member of the groups stapusr and stapsys may use all probe points.
2130
2131 A member of only the group stapusr may use only the following probes:
2132
2133 · begin, begin(n)
2134
2135 · end, end(n)
2136
2137 · error(n)
2138
2139 · never
2140
2141 · process.*, where the target process is owned by the user.
2142
2143 · timer.{jiffies,s,sec,ms,msec,us,usec,ns,nsec}(n)*
2144
2145 · timer.hz(n)
2146
2147
2148 SCRIPT LANGUAGE RESTRICTIONS
2149 The following scripting language features are unavailable to all un‐
2150 privileged users:
2151
2152
2153 · any feature enabled by the Guru Mode (-g) option.
2154
2155 · embedded C code.
2156
2157
2158 RUNTIME RESTRICTIONS
2159 The following runtime restrictions are placed upon all unprivileged
2160 users:
2161
2162 · Only the default runtime code (see -R) may be used.
2163
2164 Additional restrictions are placed on members of only the group sta‐
2165 pusr:
2166
2167 · Probing of processes owned by other users is not permitted.
2168
2169 · Access of kernel memory (read and write) is not permitted.
2170
2171
2172 COMMAND LINE OPTION RESTRICTIONS
2173 Some command line options provide access to features which must not be
2174 available to all unprivileged users:
2175
2176
2177 · -g may not be specified.
2178
2179 · The following options may not be used by the compile-server
2180 client:
2181
2182 -a, -B, -D, -I, -r, -R
2183
2184
2185
2186 ENVIRONMENT RESTRICTIONS
2187 The following environment variables must not be set for all unprivi‐
2188 leged users:
2189
2190 SYSTEMTAP_RUNTIME
2191 SYSTEMTAP_TAPSET
2192 SYSTEMTAP_DEBUGINFO_PATH
2193
2194
2195
2196 TAPSET RESTRICTIONS
2197 In general, tapset functions are only available for members of the
2198 group stapusr when they do not gather information that an ordinary pro‐
2199 gram running with that user's privileges would be denied access to.
2200
2201 There are two categories of unprivileged tapset functions. The first
2202 category consists of utility functions that are unconditionally avail‐
2203 able to all users; these include such things as:
2204
2205 cpu:long ()
2206 exit ()
2207 str_replace:string (prnt_str:string, srch_str:string, rplc_str:string)
2208
2209
2210 The second category consists of so-called myproc-unprivileged functions
2211 that can only gather information within their own processes. Scripts
2212 that wish to use these functions must test the result of the tapset
2213 function is_myproc and only call these functions if the result is 1.
2214 The script will exit immediately if any of these functions are called
2215 by an unprivileged user within a probe within a process which is not
2216 owned by that user. Examples of myproc-unprivileged functions include:
2217
2218 print_usyms (stk:string)
2219 user_int:long (addr:long)
2220 usymname:string (addr:long)
2221
2222
2223 A compile error is triggered when any function not in either of the
2224 above categories is used by members of only the group stapusr.
2225
2226 No other built-in tapset functions may be used by members of only the
2227 group stapusr.
2228
2229
2231 As described above, systemtap's default runtime mode involves building
2232 and loading kernel modules, with various security tradeoffs presented.
2233 Systemtap now includes two new prototype backends: --runtime=dyninst
2234 and --runtime=bpf.
2235
2236 --runtime=dyninst uses Dyninst to instrument a user's own processes at
2237 runtime. This backend does not use kernel modules, and does not require
2238 root privileges, but is restricted with respect to the kinds of probes
2239 and other constructs that a script may use. The dyninst runtime operates
2240 in target-attach mode, so it does require a -c COMMAND or -x PID process.
2241 For example:
2242
2243 stap --runtime=dyninst -c 'stap -V' \
2244 -e 'probe process.function("main")
2245 { println("hi from dyninst!") }'
2246
2247
2248 It may be necessary to disable a conflicting selinux check with
2249
2250 # setsebool allow_execstack 1
2251
2252
2253 --runtime=bpf compiles the user script into extended Berkeley Packet
2254 Filter (eBPF) programs instead of a kernel module. eBPF programs are
2255 verified by the kernel for safety and are executed by an in-kernel vir‐
2256 tual machine. This runtime is in an early stage of development and
2257 currently lacks support for a number of features available in the de‐
2258 fault runtime. Please see the stapbpf(8) man page for more information.
2259
2260
2262 The systemtap translator generally returns with a success code of 0 if
2263 the requested script was processed and executed successfully through
2264 the requested pass. Otherwise, errors may be printed to stderr and a
2265 failure code is returned. Use -v or -vp N to increase (global or per-
2266 pass) verbosity to identify the source of the trouble.
2267
2268 In listings mode (-l and -L), error messages are normally suppressed.
2269 A success code of 0 is returned if at least one matching probe was
2270 found.
2271
2272 A script executing in pass 5 that is interrupted with ^C / SIGINT is
2273 considered to be successful.
2274
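The listing-mode exit status described above can be used from a shell script to test whether a probe point is available before running a full script. The following is a minimal sketch; the helper name probe_exists is hypothetical and not part of systemtap, and it assumes stap is on the PATH.

```shell
# Hypothetical helper (not part of systemtap): succeed only if stap's
# listing mode finds at least one probe matching the given pattern.
probe_exists() {
    # -l prints matching probe points on stdout; discard them and any
    # diagnostics, and keep only the exit status (0 if a match was found).
    stap -l "$1" >/dev/null 2>&1
}
```

For instance, probe_exists 'kernel.function("vfs_read")' && echo available would print "available" only when that probe point resolves on the running system; the probe pattern here is purely illustrative.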
2275
2277 Over time, some features of the script language and the tapset library
2278 may undergo incompatible changes, so that a script written against an
2279 old version of systemtap may no longer run. In these cases, it may
2280 help to run systemtap with the --compatible VERSION flag, specifying
2281 the last known working version. Running systemtap with the
2282 --check-version flag will output a warning if any possibly incompatible
2283 elements have been parsed. Historical details of deprecations may be
2284 found in the NEWS file.
2285
2286 The purpose of the deprecation facility is to improve the experience of
2287 scripts written for newer versions of systemtap (by adding better
2288 alternatives and removing conflicting or messy older alternatives), while
2289 at the same time permitting scripts written for older versions of
2290 systemtap to continue running. Deprecation is thus intended as a service
2291 to users (and an inconvenience to systemtap's developers), rather than
2292 the other way around.
2293
2294 Please note that underscore-prefixed identifiers in the tapset sometimes
2295 undergo changes for which compatibility is difficult to preserve,
2296 even with the deprecation mechanisms. Avoid relying on these in
2297 your scripts; instead propose them for promotion to non-underscored
2298 status.
2299
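The --compatible and --check-version flags described above might be wrapped as follows when maintaining older scripts. This is only a sketch: the function names, the version number, and the script name in the usage note are hypothetical, and stap is assumed to be on the PATH.

```shell
# Hypothetical wrappers (names are illustrative, not part of systemtap).

# Run a script under the semantics of an older systemtap release.
# $1 is the last version known to work; remaining arguments go to stap.
stap_compat() {
    version=$1
    shift
    stap --compatible "$version" "$@"
}

# Run a script normally, asking stap to warn about parsed elements that
# may be affected by incompatible changes.
stap_check_version() {
    stap --check-version "$@"
}
```

A script last known to work with version 4.0 could then be run as stap_compat 4.0 myscript.stp, while stap_check_version myscript.stp reports whether any such elements are present.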
2300
2301
2303 Important files and their corresponding paths can be located in the
2304 stappaths(7) manual page.
2305
2306
2308 stapprobes(3stap),
2309 function::*(3stap),
2310 probe::*(3stap),
2311 tapset::*(3stap),
2312 stappaths(7),
2313 staprun(8),
2314 stapdyn(8),
2315 systemtap(8),
2316 stapvars(3stap),
2317 stapex(3stap),
2318 stap-server(8),
2319 stap-prep(1),
2320 stapref(1),
2321 awk(1),
2322 gdb(1)
2323
2324
2326 Use the Bugzilla link of the project web page or our mailing list.
2327 http://sourceware.org/systemtap/, <systemtap@sourceware.org>.
2328
2329 error::reporting(7stap),
2330 https://sourceware.org/systemtap/wiki/HowToReportBugs
2331
2332
2333
2334 STAP(1)