STAP(1) General Commands Manual STAP(1)

NAME
stap - systemtap script translator/driver

SYNOPSIS
stap [ OPTIONS ] FILENAME [ ARGUMENTS ]
stap [ OPTIONS ] - [ ARGUMENTS ]
stap [ OPTIONS ] -e SCRIPT [ ARGUMENTS ]
stap [ OPTIONS ] -l PROBE [ ARGUMENTS ]
stap [ OPTIONS ] -L PROBE [ ARGUMENTS ]
stap [ OPTIONS ] --dump-probe-types
stap [ OPTIONS ] --dump-probe-aliases
stap [ OPTIONS ] --dump-functions

DESCRIPTION
The stap program is the front-end to the Systemtap tool. It accepts
probing instructions written in a simple domain-specific language,
translates those instructions into C code, compiles this C code, and
loads the resulting module into a running Linux kernel or a DynInst
user-space mutator, to perform the requested system trace/probe
functions. You can supply the script in a named file (FILENAME), from
standard input (use - instead of FILENAME), or from the command line
(using -e SCRIPT). The program runs until it is interrupted by the
user, the script voluntarily invokes the exit() function, or a
sufficient number of soft errors occur.
34
The language, which is described in the SCRIPT LANGUAGE section below,
is strictly typed, expressive, declaration free, procedural,
prototyping-friendly, and inspired by awk and C. It allows source code
points or events in the system to be associated with handlers, which
are subroutines that are executed synchronously. It is somewhat
similar conceptually to "breakpoint command lists" in the gdb debugger.
41
42
DOCUMENTATION
systemtap comes with a variety of educational, documentation and
reference resources. They are available online and/or packaged for
offline use. For online documentation, see the project web site,
https://sourceware.org/systemtap/
48
49
┌──────────────────────────┬──────────────────────────────────────────────────────┐
│man pages                 │                                                      │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│stap (this page)          │ language syntax, concepts, operation, options        │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│stapprobes                │ probe points and their $context variables            │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│stapref                   │ quick reference to language syntax                   │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│stappaths                 │ list of directories, including books & references    │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│stap-prep                 │ program to install auxiliary dependencies like       │
│                          │ kernel debuginfo                                     │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│tapset::*                 │ generated list of tapsets                            │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│probe::*                  │ generated list of tapset probe aliases               │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│function::*               │ generated list of tapset functions                   │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│macro::*                  │ generated list of tapset macros                      │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│stapvars                  │ some of the tapset global variables                  │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│staprun, stapdyn, stapbpf │ programs for executing compiled systemtap scripts    │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│systemtap                 │ initscript, boot-time probing                        │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│stap-server               │ compilation server                                   │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│stapex                    │ a few very basic script examples                     │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│books                     │                                                      │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│Beginner's Guide          │ tutorial book, language essentials, examples         │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│Tutorial                  │ shorter tutorial, exercises                          │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│Language Reference        │ detailed language manual, covers statistics/analysis │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│Tapset Reference          │ the tapset man pages, reformatted into a book        │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│references                │                                                      │
├──────────────────────────┼──────────────────────────────────────────────────────┤
│example scripts           │ over a hundred directly usable sysadmin tools, toys, │
│                          │ hacks to learn from                                  │
└──────────────────────────┴──────────────────────────────────────────────────────┘
97
OPTIONS
The systemtap translator supports the following options. Any other
option prints a list of supported options. Options may be given on the
command line, as usual. If the file $SYSTEMTAP_DIR/rc exists, options
are also loaded from there and interpreted first. ($SYSTEMTAP_DIR
defaults to $HOME/.systemtap if unset.)
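
The option-loading order described above can be sketched in Python as
follows. This is an illustration only, not stap's implementation: the
function names rc_path and effective_options are hypothetical, and
splitting the rc file with shell-like quoting rules (shlex) is an
assumption, since the rc file syntax is not specified here.

```python
import os
import shlex

def rc_path(env):
    """Return $SYSTEMTAP_DIR/rc, where SYSTEMTAP_DIR defaults to
    $HOME/.systemtap if unset (hypothetical helper)."""
    if "SYSTEMTAP_DIR" in env:
        base = env["SYSTEMTAP_DIR"]
    else:
        base = os.path.join(env["HOME"], ".systemtap")
    return os.path.join(base, "rc")

def effective_options(env, argv):
    """Options from the rc file (if it exists) are placed first,
    followed by the command-line options."""
    path = rc_path(env)
    rc_opts = []
    if os.path.exists(path):
        with open(path) as f:
            # Assumption: words are split using shell-like quoting.
            rc_opts = shlex.split(f.read())
    return rc_opts + list(argv)
```
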
104
105
106 In some cases, the default value of an option depends on particular
107 system configuration and thus can't be mentioned here directly. In
108 some of those cases running "stap --help" might display the default.
109
110
111 - Use standard input instead of a given FILENAME as probe language
112 input, unless -e SCRIPT is given.
113
114 -h --help
115 Show help message.
116
117 -V --version
118 Show version message.
119
120 -p NUM Stop after pass NUM. The passes are numbered 1-5: parse, elabo‐
121 rate, translate, compile, run. See the PROCESSING section for
122 details.
123
-v Increase verbosity for all passes. Produce a larger volume of
informative (?) output each time the option is repeated.
126
127 --vp ABCDE
128 Increase verbosity on a per-pass basis. For example, "--vp 002"
129 adds 2 units of verbosity to pass 3 only. The combination
130 "-v --vp 00004" adds 1 unit of verbosity for all passes, and 4
131 more for pass 5.
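
The way -v and --vp combine can be sketched in Python as below. This
is only a model of the arithmetic described above; the function name
pass_verbosity is hypothetical, and treating each character of the
--vp argument as one decimal digit for passes 1 to 5 is inferred from
the two examples given.

```python
PASSES = ("parse", "elaborate", "translate", "compile", "run")

def pass_verbosity(v_count, vp=""):
    """Return the verbosity of passes 1-5 as a list.

    v_count: number of times -v was given (applies to all passes).
    vp:      the --vp argument; digit k adds to the verbosity of pass k.
    """
    if len(vp) > len(PASSES):
        raise ValueError("--vp takes at most five digits")
    levels = [v_count] * len(PASSES)
    for index, digit in enumerate(vp):
        levels[index] += int(digit)
    return levels
```

For example, pass_verbosity(1, "00004") models "-v --vp 00004".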
132
133 -k Keep the temporary directory after all processing. This may be
134 useful in order to examine the generated C code, or to reuse the
135 compiled kernel object.
136
137 -g Guru mode. Enable parsing of unsafe expert-level constructs
138 like embedded C.
139
140 -P Prologue-searching mode. This is equivalent to --pro‐
141 logue-searching=always. Activate heuristics to work around in‐
142 correct debugging information for function parameter $context
143 variables.
144
145 -u Unoptimized mode. Disable unused code elision and many other
146 optimizations during elaboration / translation.
147
148 -w Suppressed warnings mode. Disables all warning messages.
149
150 -W Treat all warnings as errors.
151
152 -b Use bulk mode (percpu files) for kernel-to-user data transfer.
153 Use the stap-merge program to multiplex them back together lat‐
154 er.
155
156 -i --interactive
157 Interactive mode. Enable an interface to build the systemtap
158 script incrementally and interactively.
159
-t Collect timing information on the number of times each probe
executes and the average amount of time spent in each probe point.
Also shows the derivation of each probe point.
163
164 -s NUM Use NUM megabyte buffers for kernel-to-user data transfer. On a
165 multiprocessor in bulk mode, this is a per-processor amount.
166
-I DIR Add the given directory to the tapset search path. See the
description of pass 2 for details.
169
170 -D NAME=VALUE
171 Add the given C preprocessor directive to the module Makefile.
172 These can be used to override limit parameters described below.
173
174 -B NAME=VALUE
175 In kernel-runtime mode, add the given make directive to the ker‐
176 nel module build's make invocation. These can be used to add or
177 override kconfig options. For example, use
178
179 -B CONFIG_DEBUG_INFO=y
180
181 to add debugging information.
182
183 -B FLAG
184 In dyninst-runtime mode, add the given parameter to the compiler
185 CFLAGS used for building the dyninst shared library. For exam‐
186 ple, use
187
188 -B -g
189
190 to add debugging information.
191
192 -a ARCH
193 Use a cross-compilation mode for the given target architecture.
194 This requires access to the cross-compiler and the kernel build
195 tree, and goes along with the
196
197 -B CROSS_COMPILE=arch-tool-prefix-
198 and
199 -r /build/tree
200
201 options.
202
203 --modinfo NAME=VALUE
204 Add the name/value pair as a MODULE_INFO macro call to the gen‐
205 erated module. This may be useful to inform or override various
206 module-related checks in the kernel.
207
208 -G NAME=VALUE
209 Sets the value of global variable NAME to VALUE when staprun is
210 invoked. This applies to scalar variables declared global in
211 the script/tapset.
212
213 -R DIR Look for the systemtap runtime sources in the given directory.
214 Your DIR default can be seen using "stap --help".
215
216 -r /DIR
217 Build for kernel in given build tree. Can also be set with the
218 SYSTEMTAP_RELEASE environment variable.
219
220 -r RELEASE
221 Build for kernel in build tree /lib/modules/RELEASE/build. Can
222 also be set with the SYSTEMTAP_RELEASE environment variable.
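
The two forms of the -r argument can be told apart as in the Python
sketch below. The function name kernel_build_tree is hypothetical. Two
points are assumptions: that a -r option on the command line takes
precedence over SYSTEMTAP_RELEASE, and that "begins with a slash" is
what distinguishes /DIR from RELEASE. The default used when neither is
given is not covered.

```python
def kernel_build_tree(r_arg=None, env=None):
    """Return the kernel build tree selected by -r or SYSTEMTAP_RELEASE.

    Returns None when neither is set (the default is not modeled).
    """
    env = env or {}
    # Assumption: the command-line option wins over the environment.
    value = r_arg if r_arg is not None else env.get("SYSTEMTAP_RELEASE")
    if value is None:
        return None
    if value.startswith("/"):
        return value                          # -r /DIR
    return "/lib/modules/%s/build" % value    # -r RELEASE
```
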
223
224 -m MODULE
225 Use the given name for the generated kernel object module, in‐
226 stead of a unique randomized name. The generated kernel object
227 module is copied to the current directory.
228
229 -d MODULE
230 Add symbol/unwind information for the given module into the ker‐
231 nel object module. This may enable symbolic tracebacks from
232 those modules/programs, even if they do not have an explicit
233 probe placed into them.
234
235 --ldd Add symbol/unwind information for all user-space shared li‐
236 braries suspected by ldd to be necessary for user-space binaries
237 being probed or listed with the -d option. Caution: this can
238 make the probe modules considerably larger. Note that this op‐
239 tion does not deal with kernel-space modules: see instead
240 --all-modules below.
241
242 --all-modules
243 Equivalent to specifying "-dkernel" and a "-d" for each kernel
244 module that is currently loaded. Caution: this can make the
245 probe modules considerably larger.
246
247 -o FILE
248 Send standard output to named file. In bulk mode, percpu files
249 will start with FILE_ (FILE_cpu with -F) followed by the cpu
250 number. This supports strftime(3) formats for FILE.
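
The output file naming for -o can be illustrated with the Python
sketch below. The function name output_names is hypothetical, and the
order of operations (strftime expansion first, per-cpu suffix second)
is an assumption made for the example.

```python
import time

def output_names(file_pattern, ncpus=None, background=False, when=None):
    """Return the output file name(s) for -o FILE.

    ncpus:      number of cpus in bulk mode (-b), or None otherwise.
    background: True when -F is also given.
    when:       a time.struct_time used for strftime expansion.
    """
    when = time.localtime() if when is None else when
    name = time.strftime(file_pattern, when)
    if ncpus is None:
        return [name]
    # Bulk mode: FILE_<cpu>, or FILE_cpu<cpu> with -F.
    prefix = name + ("_cpu" if background else "_")
    return ["%s%d" % (prefix, cpu) for cpu in range(ncpus)]
```
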
251
-c CMD Start the probes, run CMD, and exit when CMD finishes. This also
has the effect of setting target() to the pid of the command that
was run.

-x PID Sets target() to PID. This allows scripts to be written that
filter on a specific process. Scripts run independently of the
PID's lifespan.
259
260 -e SCRIPT
261 Run the given SCRIPT specified on the command line.
262
-E SCRIPT
Run the given SCRIPT in addition to the main script, whether the
main script is given through -e or as a script file. This option
can be repeated to run multiple scripts, and can be used in
listing mode (-l/-L).
268
269 -l PROBE
270 Instead of running a probe script, just list all available probe
271 points matching the given single probe point. The pattern may
272 include wildcards and aliases, but not comma-separated multiple
273 probe points. The process result code will indicate failure if
274 there are no matches.
275
276 % stap -e 'probe syscall.* { }'
277 [...]
278 % stap -l 'syscall.*'
279 syscall.accept
280 [...]
281 syscall.writev
282
283
284 -L PROBE
285 Similar to "-l", but list matching probe points plus their
286 available context variables.
287
288 % stap -L 'process("/lib64/libpython*.so.*").mark("*")'
289 process("/usr/lib64/libpython2.7.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
290 process("/usr/lib64/libpython2.7.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
291 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
292 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
293 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__done") $arg1:long
294 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__start") $arg1:long
295 process("/usr/lib64/libpython3.6m.so.1.0").mark("line") $arg1:long $arg2:long $arg3:long
296
297
-F Without the -o option, load the module and start the probes, then
detach from the module, leaving the probes running. With the -o
option, run staprun in the background as a daemon and show its pid.
301
-S size[,N]
Sets the maximum size of an output file and the maximum number of
output files. When an output file would exceed size, systemtap
switches to the next output file. When the number of output files
exceeds N, systemtap removes the oldest output file. The second
argument may be omitted.
308
309 -T TIMEOUT
310 Exit the script after TIMEOUT seconds.
311
312 --skip-badvars
313 Ignore unresolvable or run-time-inaccessible context variables
314 and substitute with 0, without errors.
315
316
317 --prologue-searching[=WHEN]
318 Prologue-searching mode. Activate heuristics to work around in‐
319 correct debugging information for function parameter $context
320 variables. WHEN can be either "never", "always", or "auto" (i.e.
321 enabled by heuristic). If WHEN is missing, then "always" is as‐
322 sumed. If the option is missing, then "auto" is assumed.
323
324
325 --suppress-handler-errors
326 Wrap all probe handlers into something like this
327
328 try { ... } catch { next }
329
330 block, which causes any runtime errors to be quietly suppressed.
331 Suppressed errors do not count against MAXERRORS limits. In
332 this mode, the MAXSKIPPED limits are also suppressed, so that
333 many errors and skipped probes may be accumulated during a
334 script's runtime. Any overall counts will still be reported at
335 shutdown.
336
337
--compatible VERSION
Suppress recent script language or tapset changes which are
incompatible with the given older version of systemtap. This may
be useful if a much older systemtap script fails to run. See the
DEPRECATION section for more details.
343
344
345 --check-version
346 This option is used to check if the active script has any con‐
347 structs that may be systemtap version specific. See the DEPRE‐
348 CATION section for more details.
349
350
351 --clean-cache
352 This option prunes stale entries from the cache directory. This
353 is normally done automatically after successful runs, but this
354 option will trigger the cleanup manually and then exit. See the
355 CACHING section for more details about cache limits.
356
357
358 --color[=WHEN], --colour[=WHEN]
359 This option controls coloring of error messages. WHEN can be ei‐
360 ther "never", "always", or "auto" (i.e. enable only if at a ter‐
361 minal). If WHEN is missing, then "always" is assumed. If the op‐
362 tion is missing, then "auto" is assumed.
363
364 Colors can be modified using the SYSTEMTAP_COLORS environment
365 variable. The format must be of the form
366 key1=val1:key2=val2:key3=val3 ...etc. Valid keys are "error",
367 "warning", "source", "caret", and "token". Values constitute
368 Select Graphic Rendition (SGR) parameter(s). Consult the docu‐
369 mentation of your terminal for the SGRs it supports. As an exam‐
370 ple, the default colors would be expressed as
371 error=01;31:warning=00;33:source=00;34:caret=01:token=01. If
372 SYSTEMTAP_COLORS is absent, the default colors will be used. If
373 it is empty or invalid, coloring is turned off.
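
The SYSTEMTAP_COLORS format can be illustrated with the Python sketch
below. The function name parse_colors is hypothetical, and the exact
validity rules are an assumption: here a value is treated as invalid
if an item lacks "=", uses an unknown key, or has a value that is not
a semicolon-separated list of digits. How keys missing from a valid
value are colored is not modeled.

```python
DEFAULT_COLORS = "error=01;31:warning=00;33:source=00;34:caret=01:token=01"
VALID_KEYS = {"error", "warning", "source", "caret", "token"}

def parse_colors(value):
    """Parse a SYSTEMTAP_COLORS value into {key: SGR parameters}.

    value is None when the variable is absent (defaults are used).
    Returns None when coloring is turned off (empty or invalid value).
    """
    if value is None:
        value = DEFAULT_COLORS
    if value == "":
        return None
    colors = {}
    for item in value.split(":"):
        key, sep, sgr = item.partition("=")
        if not sep or key not in VALID_KEYS:
            return None
        # Assumption: SGR parameters are digits separated by ';'.
        if not sgr or not all(p.isdigit() for p in sgr.split(";")):
            return None
        colors[key] = sgr
    return colors
```
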
374
375
376 --disable-cache
377 This option disables all use of the cache directory. No files
378 will be either read from or written to the cache.
379
380
--poison-cache
This option treats files in the cache directory as invalid. No
files will be read from the cache, but resulting files from this
run will still be written to the cache. This is meant as a
troubleshooting aid when stap's caching seems to be misbehaving.
If it helps, there is probably a bug in systemtap that the
developers would like you to report.
388
389
--privilege[=stapusr | =stapsys | =stapdev]
This option instructs stap to examine the script looking for
constructs which are not allowed for the specified privilege
level (see UNPRIVILEGED USERS). Compilation fails if any such
constructs are used. If stapusr or stapsys is specified when
using a compile server (see --use-server), the server will
examine the script and, if compilation succeeds, the server will
cryptographically sign the resulting kernel module, certifying
that it is safe for use by users at the specified privilege
level.
400
401 If --privilege has not been specified, -pN has not been speci‐
402 fied with N < 5, and the invoking user is not root, and is not a
403 member of the group stapdev, then stap will automatically add
404 the appropriate --privilege option to the options already speci‐
405 fied.
406
407
408 --unprivileged
409 This option is equivalent to --privilege=stapusr.
410
411
412 --use-server[=HOSTNAME[:PORT] | =IP_ADDRESS[:PORT] | =CERT_SERIAL]
413 Specify compile-server(s) to be used for compilation and/or in
414 conjunction with --list-servers and --trust-servers (see below)
415 for listing. If no argument is supplied, then the default in un‐
416 privileged mode (see --privilege) is to select compatible
417 servers which are trusted as SSL peers and as module signers and
418 currently online. Otherwise the default is to select compatible
419 servers which are trusted as SSL peers and currently online.
420 --use-server may be specified more than once, in which case a
421 list of servers is accumulated in the order specified. Servers
422 may be specified by host name, ip address, or by certificate se‐
423 rial number (obtained using --list-servers). The latter is most
424 commonly used when adding or revoking trust in a server (see
425 --trust-servers below). If a server is specified by host name or
426 ip address, then an optional port number may be specified. This
427 is useful for accessing servers which are not on the local net‐
428 work or to specify a particular server.
429
430 IP addresses may be IPv4 or IPv6 addresses.
431
432 If a particular IPv6 address is link local and exists on more
433 than one interface, the intended interface may be specified by
434 appending the address with a percent sign (%) followed by the
435 intended interface name. For example,
436 "fe80::5eff:35ff:fe07:55ca%eth0".
437
438 In order to specify a port number with an IPv6 address, it is
439 necessary to enclose the IPv6 address in square brackets ([]) in
440 order to separate the port number from the rest of the address.
441 For example, "[fe80::5eff:35ff:fe07:55ca]:5000" or
442 "[fe80::5eff:35ff:fe07:55ca%eth0]:5000".
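
The address forms accepted by --use-server can be taken apart as in
the Python sketch below. The function name parse_server_spec is
hypothetical and the parsing rules are inferred from the examples
above; certificate serial numbers are not handled, and a value with
more than one colon and no brackets is assumed to be a bare IPv6
address without a port.

```python
def parse_server_spec(spec):
    """Split HOSTNAME[:PORT], IP_ADDRESS[:PORT] or [IPV6]:PORT.

    Returns (host, interface, port); interface and port may be None.
    """
    port = None
    if spec.startswith("["):
        # Bracketed IPv6 address, optionally followed by :PORT.
        host, closing, rest = spec[1:].partition("]")
        if not closing:
            raise ValueError("missing ']' in %r" % spec)
        if rest:
            if not rest.startswith(":"):
                raise ValueError("unexpected text after ']' in %r" % spec)
            port = int(rest[1:])
    elif spec.count(":") == 1:
        # Host name or IPv4 address with a port.
        host, _, port_text = spec.partition(":")
        port = int(port_text)
    else:
        # No port: host name, IPv4 address, or bare IPv6 address.
        host = spec
    host, _, interface = host.partition("%")
    return host, (interface or None), port
```
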
443
If --use-server has not been specified, -pN has not been
specified with N < 5, and the invoking user is not root, is not a
member of the group stapdev, but is a member of the group stapusr,
then stap will automatically add --use-server to the options
already specified.
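
The conditions under which stap adds --privilege and --use-server on
its own can be summarized in the Python sketch below. The function
name auto_added_options and its parameters are hypothetical, and
which privilege level counts as "appropriate" is not modeled; only the
decision to add each option is shown.

```python
def auto_added_options(is_root, groups, last_pass=5,
                       privilege_given=False, use_server_given=False):
    """Return the options stap would add automatically.

    groups:    set of group names the invoking user belongs to.
    last_pass: N from -pN, or 5 if -p was not given.
    """
    unprivileged_run = (
        not is_root
        and "stapdev" not in groups
        and last_pass >= 5          # -pN was not given with N < 5
    )
    added = []
    if unprivileged_run and not privilege_given:
        added.append("--privilege")
    if unprivileged_run and not use_server_given and "stapusr" in groups:
        added.append("--use-server")
    return added
```
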
449
450
451 --use-server-on-error[=yes|=no]
452 Instructs stap to retry compilation of a script using a compile
453 server if compilation on the local host fails in a manner which
454 suggests that it might succeed using a server. If this option
455 is not specified, the default is no. If no argument is provid‐
456 ed, then the default is yes. Compilation will be retried for
457 certain types of errors (e.g. insufficient data or resources)
458 which may not occur during re-compilation by a compile server.
459 Compile servers will be selected automatically for the re-compi‐
460 lation attempt as if --use-server was specified with no argu‐
461 ments.
462
463
464 --list-servers[=SERVERS]
465 Display the status of the requested SERVERS, where SERVERS is a
466 comma-separated list of server attributes. The list of at‐
467 tributes is combined to filter the list of servers displayed.
468 Supported attributes are:
469
470 all specifies all known servers (trusted SSL peers, trusted
471 module signers, online servers).
472
473 specified
474 specifies servers specified using --use-server.
475
476 online filters the output by retaining information about servers
477 which are currently online.
478
479 trusted
480 filters the output by retaining information about servers
481 which are trusted as SSL peers.
482
483 signer filters the output by retaining information about servers
484 which are trusted as module signers (see --privilege).
485
486 compatible
487 filters the output by retaining information about servers
488 which are compatible with the current kernel release and
489 architecture.
490
491 If no argument is provided, then the default is specified. If
492 no servers were specified using --use-server, then the default
493 servers for --use-server are listed.
494
495 Note that --list-servers uses the avahi-daemon service to detect
496 online servers. If this service is not available, then
497 --list-servers will fail to detect any online servers. In order
498 for --list-servers to detect servers listening on IPv6 address‐
499 es, the avahi-daemon configuration file /etc/avahi/avahi-dae‐
500 mon.conf must contain an active "use-ipv6=yes" line. The service
501 must be restarted after adding this line in order for IPv6 to be
502 enabled.
503
504
--trust-servers[=TRUST_SPEC]
Grant or revoke trust in the compile-servers specified using
--use-server, as directed by TRUST_SPEC, where TRUST_SPEC is a
comma-separated list specifying the trust which is to be granted
or revoked. Supported elements are:
510
511 ssl trust the specified servers as SSL peers.
512
513 signer trust the specified servers as module signers (see
514 --privilege). Only root can specify signer.
515
516 all-users
517 grant trust as an ssl peer for all users on the local
518 host. The default is to grant trust as an ssl peer for
519 the current user only. Trust as a module signer is always
520 granted for all users. Only root can specify all-users.
521
522 revoke revoke the specified trust. The default is to grant it.
523
524 no-prompt
525 do not prompt the user for confirmation before carrying
526 out the requested action. The default is to prompt the
527 user for confirmation.
528
529 If no argument is provided, then the default is ssl. If no
530 servers were specified using --use-server, then no trust will be
531 granted or revoked.
532
533 Unless no-prompt has been specified, the user will be prompted
534 to confirm the trust to be granted or revoked before the opera‐
535 tion is performed.
536
537
538 --dump-probe-types
539 Dumps a list of supported probe types and exits. If --privi‐
540 lege=stapusr is also specified, the list will be limited to
541 probe types available to unprivileged users.
542
543
544 --dump-probe-aliases
545 Dumps a list of all probe aliases found in library files and ex‐
546 its.
547
548
--dump-functions
Dumps a list of all the public functions found in library files
and exits. Also includes their parameters and types. A function
of type 'unknown' indicates a function that does not return a
value. Note that not all function/parameter types may be
resolved (these are also shown by 'unknown'). This feature is
very memory-intensive and thus may not work properly with
--use-server if the target server imposes an rlimit on process
memory (i.e. through the ~stap-server/.systemtap/rc configuration
file, see stap-server(8)).
559
560
561 --remote URL
562 Set the execution target to the given host. This option may be
563 repeated to target multiple execution targets. Passes 1-4 are
564 completed locally as normal to build the script, and then pass 5
565 will copy the module to the target and run it. Acceptable URL
566 forms include:
567
568 [USER@]HOSTNAME, ssh://[USER@]HOSTNAME
569 This mode uses ssh, optionally using a username not
570 matching your own. If a custom ssh_config file is in use,
571 add SendEnv LANG to retain internationalization function‐
572 ality.
573
574 libvirt://DOMAIN, libvirt://DOMAIN/LIBVIRT_URI
575 This mode uses stapvirt to execute the script on a domain
576 managed by libvirt. Optionally, LIBVIRT_URI may be speci‐
577 fied to connect to a specific driver and/or a remote
578 host. For example, to connect to the local privileged QE‐
579 MU driver, use:
580
581 --remote libvirt://MyDomain/qemu:///system
582
583 See the page at <http://libvirt.org/uri.html> for sup‐
584 ported URIs. Also see stapvirt(1) for more information on
585 how to prepare the domain for stap probing.
586
587 unix:PATH
588 This mode connects to a UNIX socket. This can be used
589 with a QEMU virtio-serial port for executing scripts in‐
590 side a running virtual machine.
591
592 direct://
593 Special loopback mode to run on the local host.
594
595 --remote-prefix
596 Prefix each line of remote output with "N: ", where N is the in‐
597 dex of the remote execution target from which the given line
598 originated.
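
The URL forms accepted by --remote, and the effect of --remote-prefix,
can be illustrated with the Python sketch below. Both function names
are hypothetical. Treating anything without a recognized scheme as the
[USER@]HOSTNAME ssh form is an assumption, and the number at which
target indexes start is not modeled.

```python
def classify_remote(url):
    """Return the execution mode implied by a --remote URL."""
    if url.startswith("direct://"):
        return "direct"
    if url.startswith("libvirt://"):
        return "libvirt"
    if url.startswith("unix:"):
        return "unix"
    # [USER@]HOSTNAME or ssh://[USER@]HOSTNAME
    return "ssh"

def prefix_remote_output(index, text):
    """Prefix each line of text with "N: ", as --remote-prefix does."""
    return "".join("%d: %s" % (index, line)
                   for line in text.splitlines(True))
```
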
599
600
601 --download-debuginfo[=OPTION]
602 Enable, disable or set a timeout for the automatic debuginfo
603 downloading feature offered by abrt as specified by OPTION,
604 where OPTION is one of the following:
605
606 yes enable automatic downloading of debuginfo with no time‐
607 out. This is the same as not providing an OPTION value to
608 --download-debuginfo
609
610 no explicitly disable automatic downloading of debuginfo.
611 This is the same as not using the option at all.
612
613 ask show abrt output, and ask before continuing download. No
614 timeout will be set.
615
616 <timeout>
617 specify a timeout as a positive number to stop the down‐
618 load if it is taking longer than <timeout> seconds.
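
The OPTION values of --download-debuginfo can be summarized in the
Python sketch below. The function name parse_download_debuginfo and
its return shape are hypothetical; the mapping itself follows the list
above.

```python
def parse_download_debuginfo(option_present, value=None):
    """Return (enabled, ask_first, timeout_seconds).

    option_present: whether --download-debuginfo was given at all.
    value:          the OPTION text, or None if no value was given.
    """
    if not option_present or value == "no":
        return (False, False, None)
    if value is None or value == "yes":
        return (True, False, None)      # enabled, no timeout
    if value == "ask":
        return (True, True, None)       # ask before downloading
    timeout = int(value)
    if timeout <= 0:
        raise ValueError("timeout must be a positive number")
    return (True, False, timeout)
```
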
619
620 --rlimit-as=NUM
621 Specify the maximum size of the process's virtual memory (ad‐
622 dress space), in bytes.
623
624
625 --rlimit-cpu=NUM
626 Specify the CPU time limit, in seconds.
627
628
629 --rlimit-nproc=NUM
630 Specify the maximum number of processes that can be created.
631
632
633 --rlimit-stack=NUM
634 Specify the maximum size of the process stack, in bytes.
635
636
637 --rlimit-fsize=NUM
638 Specify the maximum size of files that the process may create,
639 in bytes.
640
641
642 --sysroot=DIR
643 Specify sysroot directory where target files (executables, li‐
644 braries, etc.) are located. With -r RELEASE, the sysroot will
645 be searched for the appropriate kernel build directory. With -r
646 /DIR, however, the sysroot will not be used to find the kernel
647 build.
648
649
650 --sysenv=VAR=VALUE
651 Provide an alternate value for an environment variable where the
652 value on a remote system differs. Path variables (e.g. PATH,
653 LD_LIBRARY_PATH) are assumed to be relative to the directory
654 provided by --sysroot, if provided.
655
656
657 --suppress-time-limits
658 Disable -DSTP_OVERLOAD related options as well as -DMAXACTION
659 and -DMAXTRYLOCK. This option requires guru mode.
660
661
662 --runtime=MODE
663 Set the pass-5 runtime mode. Valid options are kernel (de‐
664 fault), dyninst and bpf. See ALTERNATE RUNTIMES below for more
665 information.
666
667
668 --dyninst
669 Shorthand for --runtime=dyninst.
670
671
672 --bpf Shorthand for --runtime=bpf.
673
674
675 --save-uprobes
676 On machines that require SystemTap to build its own uprobes mod‐
677 ule (kernels prior to version 3.5), this option instructs Sys‐
678 temTap to also save a copy of the module in the current directo‐
679 ry (creating a new "uprobes" directory first).
680
681
--target-namespaces=PID
Allow a set of target namespaces to be set based on the
namespaces the given PID is in. This is for namespace-aware
tapset functions. If the target namespaces are not set, the
target defaults to the stap process's namespaces.
687
688
--monitor[=INTERVAL]
Enables an interface to display status information about the
module (uptime, module name, invoker uid, memory sizes, global
variables, list of probes with their statistics). An optional
argument INTERVAL can be supplied to set the refresh rate in
seconds of the status window. The module can also be controlled
with the following keys:
696
697 c Resets all global variables to their initial values or
698 zeroes them if they did not have an initial value.
699
700 s Rotates the attribute used to sort the list of probes.
701
t Brings up a prompt to allow toggling (on/off) of probes by
index. Probe points are still affected by their conditions.
705
706 r Resumes the script by toggling on all probes.
707
708 p Pauses the script by toggling off all probes.
709
710 x Hides/shows the status window. This allows for more out‐
711 put to be seen.
712
713 navigation-keys
714 The navigation keys can be used to scroll up and down the
715 windows.
716
717 Tab Toggle scrolling between status and output windows.
718
719
720 --example SCRIPT
721 This option is used to run example scripts without having to en‐
722 ter the entire path to the script. SCRIPT must match a valid ex‐
723 ample script. For a list of available examples see
724 http://sourceware.org/systemtap/examples/
725
726
ARGUMENTS
Any additional arguments on the command line are passed to the script
parser for substitution. See the description of $1 ... $<NN> and
@1 ... @<NN> under GENERAL SYNTAX below.
730
731
SCRIPT LANGUAGE
The systemtap script language resembles awk and C. There are two main
outermost constructs: probes and functions. Within these, statements
and expressions use C-like operator syntax and precedence.
736
737
738 GENERAL SYNTAX
739 Whitespace is ignored. Three forms of comments are supported:
740 # ... shell style, to the end of line, except for $# and @#
741 // ... C++ style, to the end of line
742 /* ... C style ... */
743 Literals are either strings enclosed in double-quotes (passing through
744 the usual C escape codes with backslashes, and with adjacent string
745 literals glued together, also as in C), or integers (in decimal, hexa‐
746 decimal, or octal, using the same notation as in C). All strings are
747 limited in length to some reasonable value (a few hundred bytes). In‐
748 tegers are 64-bit signed quantities, although the parser also accepts
749 (and wraps around) values above positive 2**63.
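
The wrap-around of large integer literals can be illustrated with the
Python sketch below. The function name wrap_int64 is hypothetical, and
reduction modulo 2**64 into the signed range (two's complement) is an
assumption about what "wraps around" means here.

```python
def wrap_int64(n):
    """Map an integer onto the 64-bit signed range by wrapping
    modulo 2**64 (assumed two's-complement behaviour)."""
    return (n + 2**63) % 2**64 - 2**63
```
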
750
751 In addition, script arguments given at the end of the command line may
752 be inserted. Use $1 ... $<NN> for insertion unquoted, @1 ... @<NN> for
753 insertion as a string literal. The number of arguments may be accessed
754 through $# (as an unquoted number) or through @# (as a quoted number).
755 These may be used at any place a token may begin, including within the
756 preprocessing stage. Reference to an argument number beyond what was
757 actually given is an error.
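
The substitution of script arguments can be modeled with the Python
sketch below. The function name substitute_args is hypothetical. It
works on raw text with a regular expression, which is a
simplification: the real parser substitutes only where a token may
begin, so this sketch would also rewrite matches inside string
literals or comments.

```python
import re

def substitute_args(script, args):
    """Replace $N / @N / $# / @# in script using the list args."""
    def quote(text):
        # Render text as a double-quoted string literal.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'

    def replace(match):
        sigil, ref = match.group(1), match.group(2)
        if ref == "#":
            text = str(len(args))
        else:
            number = int(ref)
            if number < 1 or number > len(args):
                raise ValueError("argument %s%d was not given" % (sigil, number))
            text = args[number - 1]
        # $ inserts unquoted, @ inserts as a string literal.
        return text if sigil == "$" else quote(text)

    return re.sub(r"([$@])(\d+|#)", replace, script)
```

For instance, with the arguments "begin" and "hi", the text
'probe $1 { println(@2, $#) }' becomes 'probe begin { println("hi", 2) }'.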
758
759
760 PREPROCESSING
761 A simple conditional preprocessing stage is run as a part of parsing.
762 The general form is similar to the cond ? exp1 : exp2 ternary operator:
763
764 %( CONDITION %? TRUE-TOKENS %)
765 %( CONDITION %? TRUE-TOKENS %: FALSE-TOKENS %)
766
The CONDITION is either an expression whose format is determined by its
first keyword, a comparison of string literals, or a comparison of
numeric literals. It can also be composed of several such CONDITIONs
joined with || (alternative) and && (conjunction). However, parentheses
are not supported yet, so it is important to remember that conjunction
takes precedence over alternative.
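
The precedence rule can be illustrated with the Python sketch below,
which evaluates a parenthesis-free combination of named conditions.
The function name eval_condition and the use of plain names in place
of real CONDITIONs are simplifications for the example.

```python
def eval_condition(expr, values):
    """Evaluate names joined by && and ||, with && binding tighter.

    expr:   for example "a || b && c" (no parentheses).
    values: dict mapping each name to True or False.
    """
    return any(
        all(values[name.strip()] for name in conjunction.split("&&"))
        for conjunction in expr.split("||")
    )
```

With a true and b, c false, "a || b && c" is satisfied, because it is
read as a || (b && c) rather than (a || b) && c.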
773
If the first part is the identifier kernel_vr or kernel_v to refer to
the kernel version number, with ("2.6.13-1.322FC3smp") or without
("2.6.13") the release code suffix, then the second part is one of the
six standard numeric comparison operators <, <=, ==, !=, >, and >=, and
the third part is a string literal that contains an RPM-style
version-release value. The condition is deemed satisfied if the version
of the target kernel (as optionally overridden by the -r option)
compares as specified against the given version string. The comparison
is performed by the glibc function strverscmp. As a special case, if
the operator is for simple equality (==), or inequality (!=), and the
third part contains any wildcard characters (* or ? or [), then the
expression is treated as a wildcard (mis)match as evaluated by fnmatch.
786
787 If, on the other hand, the first part is the identifier arch to refer
788 to the processor architecture (as named by the kernel build system
789 ARCH/SUBARCH), then the second part is one of the two string comparison
790 operators == or !=, and the third part is a string literal for matching
791 it. This comparison is a wildcard (mis)match.
792
793 Similarly, if the first part is an identifier like CONFIG_something to
794 refer to a kernel configuration option, then the second part is == or
795 !=, and the third part is a string literal for matching the value (com‐
796 monly "y" or "m"). Nonexistent or unset kernel configuration options
797 are represented by the empty string. This comparison is also a wild‐
798 card (mis)match.
799
800 If the first part is the identifier systemtap_v, the test refers to the
801 systemtap compatibility version, which may be overridden for old
802 scripts with the --compatible flag. The comparison operator is as is
803 for kernel_v and the right operand is a version string. See also the
804 DEPRECATION section below.
805
806 If the first part is the identifier systemtap_privilege, the test
807 refers to the privilege level that the systemtap script is compiled
808 with. Here the second part is == or !=, and the third part is a string
809 literal, either "stapusr" or "stapsys" or "stapdev".
810
811 If the first part is the identifier guru_mode, the test refers to
812 whether the systemtap script is compiled in guru mode. Here the second
813 part is == or !=, and the third part is a number, either 1 or 0.
814
815 If the first part is the identifier runtime, the test refers to the
816 systemtap runtime mode. See ALTERNATE RUNTIMES below for more informa‐
817 tion on runtimes. The second part is one of the two string comparison
818 operators == or !=, and the third part is a string literal for matching
819 it. This comparison is a wildcard (mis)match.
820
821 Otherwise, the CONDITION is expected to be a comparison between two
822 string literals or two numeric literals. In this case, the arguments
823 are the only variables usable.
824
825 The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser tokens
826 (possibly including nested preprocessor conditionals), and are passed
827 into the input stream if the condition is true or false. For example,
828 the following code induces a parse error unless the target kernel ver‐
829 sion is newer than 2.6.5:
830
831 %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence
832
833 The following code might adapt to hypothetical kernel version drift:
834
835 probe kernel.function (
836 %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
837 %( kernel_vr == "2.6.13*smp" %? "do_page_fault" %:
838 UNSUPPORTED %) %)
839 ) { /* ... */ }
840
841 %( arch == "ia64" %?
842 probe syscall.vliw = kernel.function("vliw_widget") {}
843 %)
844
845
846
847 PREPROCESSOR MACROS
848 The preprocessor also supports a simple macro facility, run as a sepa‐
849 rate pass before conditional preprocessing.
850
851 Macros are defined using the following construct:
852
853 @define NAME %( BODY %)
854 @define NAME(PARAM_1, PARAM_2, ...) %( BODY %)
855
856 Macros, and parameters inside a macro body, are both invoked by prefix‐
857 ing the macro name with an @ symbol:
858
859 @define foo %( x %)
860 @define add(a,b) %( ((@a)+(@b)) %)
861
862 @foo = @add(2,2)
863
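As a further sketch, a parameterized macro can wrap a reusable
expression. The macro name clamp and its parameter names are
hypothetical, chosen for illustration.

    @define clamp(v, lo, hi) %(
      ((@v) < (@lo) ? (@lo) : ((@v) > (@hi) ? (@hi) : (@v)))
    %)

    probe begin {
      println(@clamp(15, 0, 10))   # 15 exceeds the upper bound of 10
      exit()
    }

Wrapping each parameter in parentheses, as in @add above, keeps the
expansion safe when an argument is itself a compound expression.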
864
865 Macro expansion is currently performed in a separate pass before condi‐
866 tional compilation. Therefore, both TRUE- and FALSE-tokens in condi‐
867 tional expressions will be macroexpanded regardless of how the condi‐
868 tion is evaluated. This can sometimes lead to errors:
869
870 // The following results in a conflict:
871 %( CONFIG_UTRACE == "y" %?
872 @define foo %( process.syscall %)
873 %:
874 @define foo %( **ERROR** %)
875 %)
876
877 // The following works properly as expected:
878 @define foo %(
879 %( CONFIG_UTRACE == "y" %? process.syscall %: **ERROR** %)
880 %)
881
882 The first example is incorrect because both @defines are evaluated in a
883 pass prior to the conditional being evaluated.
884
885 Normally, a macro definition is local to the file it occurs in. Thus,
886 defining a macro in a tapset does not make it available to the user of
887 the tapset. Publicly available library macros can be defined by in‐
888 cluding .stpm files on the tapset search path. These files may only
889 contain @define constructs, which become visible across all tapsets and
890 user scripts. Optionally, within the .stpm files, a public macro defi‐
891 nition can be surrounded by a preprocessor conditional as described
892 above.
893
894
895 CONSTANTS
896 Tapsets or guru-mode user scripts can access header file constant
897 tokens, typically macros, using the built-in @const() operator. The
898 header file in question can be included either via the tapset library
899 or with a top-level guru-mode embedded-C construct. The translator
900 then sets the appropriate embedded-C pragma comments.
901
902 @const("STP_SKIP_BADVARS")
903
904
905
906 VARIABLES
907 Identifiers for variables and functions are an alphanumeric sequence,
908 and may include _ and $ characters. They may not start with a plain
909 digit, as in C. Each variable is by default local to the probe or
910 function statement block within which it is mentioned, and therefore
911 its scope and lifetime is limited to a particular probe or function in‐
912 vocation.
913
914 Scalar variables are implicitly typed as either string or integer. As‐
915 sociative arrays also have a string or integer value, and a tuple of
916 strings and/or integers serving as a key. Here are a few basic expres‐
917 sions.
918
919 var1 = 5
920 var2 = "bar"
921 array1 [pid()] = "name" # single numeric key
922 array2 ["foo",4,i++] += 5 # vector of string/num/num keys
923 if (["hello",5,4] in array2) println ("yes") # membership test
924
925
926 The translator performs type inference on all identifiers, including
927 array indexes and function parameters. Inconsistent type-related use
928 of identifiers signals an error.
929
930 Variables may be declared global, so that they are shared amongst all
931 probes and functions and live as long as the entire systemtap session.
932 There is one namespace for all global variables, regardless of which
933 script file they are found within. Concurrent access to global vari‐
934 ables is automatically protected with locks, see the SAFETY AND SECURI‐
935 TY section for more details. A global declaration may be written at
936 the outermost level anywhere, not within a block of code. Global vari‐
937 ables which are written but never read will be displayed automatically
938 at session shutdown. The translator will infer for each its value
939 type, and if it is used as an array, its key types. Optionally, scalar
940 globals may be initialized with a string or number literal. The fol‐
941 lowing declaration marks variables as global.
942
943 global var1, var2, var3=4
944
945
946 Global variables can also be set as module options. To do so, either
947 pass the -G option to stap, or first compile the module using
948 stap -p4 and then set the global variables on the command line when
949 calling staprun on the generated module. See staprun(8) for more
950 information.
951
952 The scope of a global variable may be limited to a tapset or user
953 script file using the private keyword. The global keyword is optional
954 when defining a private global variable. The following declaration
955 marks var1 and var2 as private globals.
956
957 private global var1=2
958 private var2
959
960
961 Arrays are limited in size by the MAXMAPENTRIES variable -- see the
962 SAFETY AND SECURITY section for details. Optionally, global arrays may
963 be declared with a maximum size in brackets, overriding MAXMAPENTRIES
964 for that array only. Note that this doesn't indicate the type of keys
965 for the array, just the size.
966
967 global tiny_array[10], normal_array, big_array[50000]
968
969
970 Arrays may be configured for wrapping using the '%' suffix. This caus‐
971 es older elements to be overwritten if more elements are inserted than
972 the array can hold. This works for both associative and statistics
973 typed arrays.
974
975 global wrapped_array1%[10], wrapped_array2%
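
A minimal sketch of a wrapping array, assuming a hypothetical global
named recent that is sized for three elements:

    global recent%[3]

    probe begin {
      for (i = 1; i <= 5; i++)
        recent[i] = i * i     # later stores overwrite the oldest elements
      foreach (k in recent)
        printf("%d -> %d\n", k, recent[k])
      exit()
    }

Without the % suffix, the fourth store would be expected to exceed the
declared size and fail with an error instead of overwriting.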
976
977
978
979 Many types of probe points provide context variables, which are run-
980 time values, safely extracted from the kernel or userspace program be‐
981 ing probed. These are prefixed with the $ character. The CONTEXT
982 VARIABLES section in stapprobes(3stap) lists what is available for each
983 type of probe point. These context variables become normal string or
984 numeric scalars once they are stored in normal script variables. See
985 the TYPECASTING section below on how to turn them back into typed
986 pointers for further processing as context variables.
987
988
989 STATEMENTS
990 Statements enable procedural control flow. They may occur within func‐
991 tions and probe handlers. The total number of statements executed in
992 response to any single probe event is limited to some number defined by
993 the MAXACTION macro in the translated C code, and is in the neighbour‐
994 hood of 1000.
995
996 EXP Execute the string- or integer-valued expression and throw away
997 the value.
998
999 { STMT1 STMT2 ... }
1000 Execute each statement in sequence in this block. Note that
1001 separators or terminators are generally not necessary between
1002 statements.
1003
1004 ; Null statement, do nothing. It is useful as an optional separa‐
1005 tor between statements to improve syntax-error detection and to
1006 handle certain grammar ambiguities.
1007
1008 if (EXP) STMT1 [ else STMT2 ]
1009 Compare integer-valued EXP to zero. Execute the first (non-ze‐
1010 ro) or second STMT (zero).
1011
1012 while (EXP) STMT
1013 While integer-valued EXP evaluates to non-zero, execute STMT.
1014
1015 for (EXP1; EXP2; EXP3) STMT
1016 Execute EXP1 as initialization. While EXP2 is non-zero, execute
1017 STMT, then the iteration expression EXP3.
1018
1019 foreach (VAR in ARRAY [ limit EXP ]) STMT
1020 Loop over each element of the named global array, assigning the
1021 current key to VAR. The array may not be modified within the
1022 statement. By adding a single + or - operator after the VAR or
1023 the ARRAY identifier, the iteration will proceed in a sorted or‐
1024 der, by ascending or descending index or value. If the array
1025 contains statistics aggregates, adding the desired @operator be‐
1026 tween the ARRAY identifier and the + or - will specify the sort‐
1027 ing aggregate function. See the STATISTICS section below for
1028 the ones available. Default is @count. Using the optional lim‐
1029 it keyword limits the number of loop iterations to EXP times.
1030 EXP is evaluated once at the beginning of the loop.
1031
1032 foreach ([VAR1, VAR2, ...] in ARRAY [ limit EXP ]) STMT
1033 Same as above, used when the array is indexed with a tuple of
1034 keys. A sorting suffix may be used on at most one VAR or ARRAY
1035 identifier.
1036
1037 foreach ([VAR1, VAR2, ...] in ARRAY [INDEX1, INDEX2, ...] [ limit EXP
1038 ]) STMT
1039 Same as above, where iterations are limited to elements in the
1040 array where the keys match the index values specified. The sym‐
1041 bol * can be used to specify an index and will be treated as a
1042 wildcard.
1043
1044 foreach (VAR0 = VAR in ARRAY [ limit EXP ]) STMT
1045 This variant of foreach saves the current value into VAR0 on
1046 each iteration, so it is the same as ARRAY[VAR]. This also works
1047 with a tuple of keys. Sorting suffixes on VAR0 have the same
1048 effect as on ARRAY.
1049
1050 foreach (VAR0 = VAR in ARRAY [INDEX1, INDEX2, ...] [ limit EXP ]) STMT
1051 Same as above, where iterations are limited to elements in the
1052 array where the keys match the index values specified. The sym‐
1053 bol * can be used to specify an index and will be treated as a
1054 wildcard.
1055
1056 break, continue
1057 Exit or iterate the innermost nesting loop (while or for or
1058 foreach) statement.
1059
1060 return EXP
1061 Return EXP value from enclosing function. If the function's
1062 value is not taken anywhere, then a return statement is not
1063 needed, and the function will have a special "unknown" type with
1064 no return value.
1065
1066 next Return now from enclosing probe handler. This is especially
1067 useful in probe aliases that apply event filtering predicates.
1068 When used in functions, the execution will be immediately trans‐
1069 ferred to the next overloaded function.
1070
1071 try { STMT1 } catch { STMT2 }
1072 Run the statements in the first block. Upon any run-time er‐
1073 rors, abort STMT1 and start executing STMT2. Any errors in
1074 STMT2 will propagate to outer try/catch blocks, if any.
1075
1076 try { STMT1 } catch(VAR) { STMT2 }
1077 Same as above, plus assign the error message to the string
1078 scalar variable VAR.
1079
1080 delete ARRAY[INDEX1, INDEX2, ...]
1081 Remove from ARRAY the element specified by the index tuple. If
1082 the index tuple contains a * in place of an index, the * is
1083 treated as a wildcard and all elements with keys that match the
1084 index tuple will be removed from ARRAY. The value will no
1085 longer be available, and subsequent iterations will not report
1086 the element. It is not an error to delete an element that does
1087 not exist.
1088
1089 delete ARRAY
1090 Remove all elements from ARRAY.
1091
1092 delete SCALAR
1093 Removes the value of SCALAR. Integers and strings are cleared
1094 to 0 and "" respectively, while statistics are reset to the ini‐
1095 tial empty state.
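
The statements above can be combined as in the following sketch. The
array name hits and its contents are illustrative assumptions.

    global hits

    probe begin {
      hits["a"] = 3; hits["b"] = 7; hits["c"] = 5

      # Visit at most two elements, in descending order of value.
      foreach (name in hits- limit 2)
        printf("%s: %d\n", name, hits[name])

      try {
        # hits["zzz"] was never stored and reads as 0, so this divides by zero.
        println(100 / hits["zzz"])
      } catch (msg) {
        printf("caught: %s\n", msg)
      }

      delete hits["a"]   # remove one element
      delete hits        # remove all remaining elements
      exit()
    }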
1096
1097
1098 EXPRESSIONS
1099 Systemtap supports a number of operators that have the same general
1100 syntax, semantics, and precedence as in C and awk. Arithmetic is per‐
1101 formed as per typical C rules for signed integers. Division by zero or
1102 overflow is detected and results in an error.
1103
1104 binary numeric operators
1105 * / % + - >> << & ^ | && ||
1106
1107 binary string operators
1108 . (string concatenation)
1109
1110 numeric assignment operators
1111 = *= /= %= += -= >>= <<= &= ^= |=
1112
1113 string assignment operators
1114 = .=
1115
1116 unary numeric operators
1117 + - ! ~ ++ --
1118
1119 binary numeric, string comparison or regex matching operators
1120 < > <= >= == != =~ !~
1121
1122 ternary operator
1123 cond ? exp1 : exp2
1124
1125 grouping operator
1126 ( exp )
1127
1128 function call
1129 fn ([ arg1, arg2, ... ])
1130
1131 array membership check
1132 exp in array
1133 [exp1, exp2, ...] in array
1134 [*, *, ...] in array
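
A short sketch exercising several of these operators; the variable
names are arbitrary.

    global seen

    probe begin {
      s = "system" . "tap"       # string concatenation
      s .= "!"                   # string append-assignment
      n = (7 * 6) % 10           # 42 % 10 is 2
      seen[s, n] = 1
      println(s == "systemtap!" ? "match" : "no match")
      println(n << 2)            # 2 shifted left by two bits is 8
      if ([s, n] in seen) println("tuple is a member")
      exit()
    }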
1135
1136
1137 REGULAR EXPRESSION MATCHING
1138 The scripting language supports regular expression matching. The basic
1139 syntax is as follows:
1140
1141 exp =~ regex
1142 exp !~ regex
1143
1144 (The first operand must be an expression evaluating to a string; the
1145 second operand must be a string literal containing a syntactically
1146 valid regular expression.)
1147
1148 The regular expression syntax supports most of the features of POSIX
1149 Extended Regular Expressions, except for subexpression reuse ("\1")
1150 functionality.
1151
1152 After a successful match, the contents of the matched string and subex‐
1153 pressions can be extracted using the matched() and ngroups() tapset
1154 functions as follows:
1155
1156 if ("an example string" =~ "str(ing)") {
1157 matched(0) // -> returns "string", the matched substring
1158 matched(1) // -> returns "ing", the 1st matched subexpression
1159 ngroups() // -> returns 2, the number of matched groups
1160 }
1161
1162
1163 PROBES
1164 The main construct in the scripting language identifies probes. Probes
1165 associate abstract events with a statement block ("probe handler") that
1166 is to be executed when any of those events occur. The general syntax
1167 is as follows:
1168
1169 probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
1170 probe PROBEPOINT [, PROBEPOINT] if (CONDITION) { [STMT ...] }
1171
1172
1173 Events are specified in a special syntax called "probe points". There
1174 are several varieties of probe points defined by the translator, and
1175 tapset scripts may define further ones using aliases. Probe points may
1176 be wildcarded, grouped, or listed in preference sequences, or declared
1177 optional. More details on probe point syntax and semantics are listed
1178 on the stapprobes(3stap) manual page.
1179
1180 The probe handler is interpreted relative to the context of each event.
1181 For events associated with kernel code, this context may include vari‐
1182 ables defined in the source code at that spot. These "context vari‐
1183 ables" are presented to the script as variables whose names are pre‐
1184 fixed with "$". They may be accessed only if the kernel's compiler
1185 preserved them despite optimization. This is the same constraint that
1186 a debugger user faces when working with optimized code. In addition,
1187 the objects must exist in paged-in memory at the moment of the system‐
1188 tap probe handler's execution, because systemtap must not cause (and
1189 therefore suppresses) any additional paging. Some probe types have
1190 very little context. See the stapprobes(3stap) man pages for the kinds
1191 of context variables available at each kind of probe point.
1192
1193 Probes may be decorated with an arming condition, consisting of a sim‐
1194 ple boolean expression on read-only global script variables. While
1195 disarmed (inactive, condition evaluates to false), some probe types re‐
1196 duce or eliminate their run-time overheads. When an arming condition
1197 evaluates to true, probes will be soon re-armed, and their probe han‐
1198 dlers will start getting called as the events fire. (Some events may
1199 be lost during the arming interval. If this is unacceptable, do not
1200 use arming conditions for those probes.) Example of the syntax:
1201
1202 probe timer.us(TIMER) if (enabled) {
1203 }
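
A fuller sketch of the same idea follows; the global name enabled and
all timer intervals are illustrative assumptions.

    global enabled = 0

    probe timer.s(5) { enabled = !enabled }    # toggle the arming condition
    probe timer.ms(100) if (enabled) { println("tick") }
    probe timer.s(30) { exit() }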
1204
1205
1206 New probe points may be defined using "aliases". Probe point aliases
1207 look similar to probe definitions, but instead of activating a probe at
1208 the given point, an alias just defines a new probe point name for an
1209 existing one. There are two types of alias, the prologue style and
1210 the epilogue style, which are identified by "=" and "+=",
1211 respectively.
1212
1213 For a prologue-style alias, the statement block that follows the alias
1214 definition is implicitly added as a prologue to any probe that refers
1215 to the alias. For an epilogue-style alias, the statement block that
1216 follows the alias definition is implicitly added as an epilogue to
1217 any probe that refers to the alias. For example:
1218
1219 probe syscall.read = kernel.function("sys_read") {
1220 fildes = $fd
1221 if (execname() == "init") next # skip rest of probe
1222 }
1223
1224 defines a new probe point syscall.read, which expands to
1225 kernel.function("sys_read"), with the given statement as a prologue,
1226 which is useful to predefine some variables for the alias user and/or
1227 to skip probe processing entirely based on some conditions. And
1228
1229 probe syscall.read += kernel.function("sys_read") {
1230 if (tracethis) println ($fd)
1231 }
1232
1233 defines a new probe point with the given statement as an epilogue,
1234 which is useful to take actions based upon variables set or left over
1235 by the alias user. Please note that in each case, the statements
1236 in the alias handler block are treated ordinarily, so that variables
1237 assigned there constitute mere initialization, not a macro substitu‐
1238 tion.
1239
1240 An alias is used just like a built-in probe type.
1241
1242 probe syscall.read {
1243 printf("reading fd=%d\n", fildes)
1244 if (fildes > 10) tracethis = 1
1245 }
1246
1247
1248
1249 FUNCTIONS
1250 Systemtap scripts may define subroutines to factor out common work.
1251 Functions take any number of scalar (integer or string) arguments, and
1252 must return a single scalar (integer or string). An example function
1253 declaration looks like this:
1254
1255 function thisfn (arg1, arg2) {
1256 return arg1 + arg2
1257 }
1258
1259 Note the general absence of type declarations, which are instead in‐
1260 ferred by the translator. However, if desired, a function definition
1261 may include explicit type declarations for its return value and/or its
1262 arguments. This is especially helpful for embedded-C functions. In
1263 the following example, the type inference engine need only infer the
1264 type of arg2 (a string).
1265
1266 function thatfn:string (arg1:long, arg2) {
1267 return sprint(arg1) . arg2
1268 }
1269
1270 Functions may call others or themselves recursively, up to a fixed
1271 nesting limit. This limit is defined by the MAXNESTING macro in the
1272 translated C code and is in the neighbourhood of 10.
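
A sketch of a recursive function that stays well within that limit;
the function name fact is hypothetical.

    function fact:long (n:long) {
      if (n <= 1) return 1
      return n * fact(n - 1)
    }

    probe begin {
      println(fact(5))     # 5 * 4 * 3 * 2 * 1
      exit()
    }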
1273
1274 Functions may be marked private using the private keyword to limit
1275 their scope to the tapset or user script file they are defined in. An
1276 example definition of a private function follows:
1277
1278 private function three:long () { return 3 }
1279
1280
1281 Functions terminating without reaching an explicit return statement
1282 will return an implicit 0 or "", determined by type inference.
1283
1284 Functions may be overloaded during both runtime and compile time.
1285
1286 Runtime overloading allows the executed function to be selected while
1287 the module is running based on runtime conditions and is achieved using
1288 the "next" statement in script functions and STAP_NEXT macro for embed‐
1289 ded-C functions. For example,
1290
1291
1292 function f() { if (condition) next; print("first function") }
1293 function f() %{ STAP_NEXT; print("second function") %}
1294 function f() { print("third function") }
1295
1296
1297 During a function call f(), the execution will transfer to the third
1298 function if condition evaluates to true and print "third function".
1299 Note that the second function is unconditionally nexted.
1300
1301 Parameter overloading allows the function to be executed to be selected
1302 at compile time based on the number of arguments provided to the
1303 function call. For example,
1304
1305
1306 function g() { print("first function") }
1307 function g(x) { print("second function") }
1308 g() -> "first function"
1309 g(1) -> "second function"
1310
1311
1312 Note that runtime overloading does not occur in the above example, as
1313 exactly one function will be resolved for the function call. The use
1314 of a next statement inside a function while no more overloads remain
1315 will trigger a runtime exception. Runtime overloading will only occur
1316 if the functions have the same arity; functions with the same name but
1317 a different number of parameters are completely unrelated.
1318
1319 Execution order is determined by a priority value which may be speci‐
1320 fied. If no explicit priority is specified, user script functions are
1321 given a higher priority than library functions. User script functions
1322 and library functions are assigned a default priority value of 0 and 1
1323 respectively. Functions with the same priority are executed in decla‐
1324 ration order. For example,
1325
1326
1327 function f():3 { if (condition) next; print("first function") }
1328 function f():1 { if (condition) next; print("second function") }
1329 function f():2 { print("third function") }
1330
1331
1332 Since the second function has the highest priority, it is executed
1333 first. The first function is never executed, as there are no "next"
1334 statements in the third function to transfer execution.
1335
1336
1337 PRINTING
1338 There is a set of function names that are specially treated by the
1339 translator. They format values for printing to the standard systemtap
1340 output stream in a more convenient way (note that data generated in the
1341 kernel module need to get transferred to user-space in order to get
1342 printed).
1343
1344 The sprint* variants return the formatted string instead of printing
1345 it.
1346
1347 print, sprint
1348 Print one or more values of any type, concatenated directly to‐
1349 gether.
1350
1351 println, sprintln
1352 Print values like print and sprint, but also append a newline.
1353
1354 printd, sprintd
1355 Take a string delimiter and two or more values of any type, and
1356 print the values with the delimiter interposed. The delimiter
1357 must be a literal string constant.
1358
1359 printdln, sprintdln
1360 Print values with a delimiter like printd and sprintd, but also
1361 append a newline.
1362
1363 printf, sprintf
1364 Take a formatting string and a number of values of corresponding
1365 types, and print them all. The format must be a literal string
1366 constant.
1367
1368 The printf formatting directives are similar to those of C, except that
1369 they are fully type-checked by the translator:
1370
1371 %b Writes a binary blob of the value given, instead of ASCII
1372 text. The width specifier determines the number of bytes
1373 to write; valid specifiers are %b %1b %2b %4b %8b. De‐
1374 fault (%b) is 8 bytes.
1375
1376 %c Character.
1377
1378 %d,%i Signed decimal.
1379
1380 %m Safely reads kernel (without #) or user (with #) memory
1381 at the given address, outputs its content. The optional
1382 precision specifier (not field width) determines the num‐
1383 ber of bytes to read - default is 1 byte. %10.4m prints
1384 4 bytes of the memory in a 10-character-wide field.
1385 Note, on some architectures user memory can still be read
1386 without #.
1387
1388 %M Same as %m, but outputs in hexadecimal. The minimal size
1389 of output is double the optional precision specifier -
1390 default is 1 byte (2 hex chars). %10.4M prints 4 bytes
1391 of the memory as 8 hexadecimal characters in a 10-charac‐
1392 ter-wide field. %.*M hex-dumps a given number of bytes
1393 from a given buffer.
1394
1395 %o Unsigned octal.
1396
1397 %p Unsigned pointer address.
1398
1399 %s String.
1400
1401 %u Unsigned decimal.
1402
1403 %x Unsigned hex value, in all lower-case.
1404
1405 %X Unsigned hex value, in all upper-case.
1406
1407 %% Writes a %.
1408
1409 The # flag selects the alternate forms. For octal, this prefixes a 0.
1410 For hex, this prefixes 0x or 0X, depending on case. For characters,
1411 this escapes non-printing values with either C-like escapes or raw oc‐
1412 tal. In the case of %#m/%#M, this safely accesses user space memory
1413 rather than kernel space memory.
1414
1415 Examples:
1416
1417 a = "alice", b = "bob", p = 0x1234abcd, i = 123, j = -1, id[a] = 1234, id[b] = 4567
1418 print("hello")
1419 Prints: hello
1420 println(b)
1421 Prints: bob\n
1422 println(a . " is " . sprint(16))
1423 Prints: alice is 16
1424 foreach (name in id) printdln("|", strlen(name), name, id[name])
1425 Prints: 5|alice|1234\n3|bob|4567\n
1426 printf("%c is %s; %x or %X or %p; %d or %u\n",97,a,p,p,p,j,j)
1427 Prints: a is alice; 1234abcd or 1234ABCD or 0x1234abcd; -1 or 18446744073709551615\n
1428 printf("2 bytes of kernel buffer at address %p: %2m", p, p)
1429 Prints: 2 bytes of kernel buffer at address 0x1234abcd: <binary data>
1430 printf("%4b", p)
1431 Prints (these values as binary data): 0x1234abcd
1432 printf("%#o %#x %#X\n", 1, 2, 3)
1433 Prints: 01 0x2 0X3
1434 printf("%#c %#c %#c\n", 0, 9, 42)
1435 Prints: \000 \t *
1436
1437
1438
1439 STATISTICS
1440 It is often desirable to collect statistics in a way that avoids the
1441 penalties of repeatedly taking exclusive locks on the global variables those
1442 numbers are being put into. Systemtap provides a solution using a spe‐
1443 cial operator to accumulate values, and several pseudo-functions to ex‐
1444 tract the statistical aggregates.
1445
1446 The aggregation operator is <<<, and resembles an assignment, or a C++
1447 output-streaming operation. The left operand specifies a scalar or ar‐
1448 ray-index lvalue, which must be declared global. The right operand is
1449 a numeric expression. The meaning is intuitive: add the given number
1450 to the pile of numbers to compute statistics of. (The specific list of
1451 statistics to gather is given separately, by the extraction functions.)
1452
1453 foo <<< 1
1454 stats[pid()] <<< memsize
1455
1456
1457 The extraction functions are also special. For each appearance of a
1458 distinct extraction function operating on a given identifier, the
1459 translator arranges to compute a set of statistics that satisfy it.
1460 The statistics system is thereby "on-demand". Each execution of an ex‐
1461 traction function causes the aggregation to be computed for that moment
1462 across all processors.
1463
1464 Here is the set of extractor functions. The first argument of each is
1465 the same style of lvalue used on the left hand side of the accumulate
1466 operation. The @count(v), @sum(v), @min(v), @max(v), @avg(v), @vari‐
1467 ance(v[, b]) extractor functions compute the number/total/minimum/maxi‐
1468 mum/average/variance of all accumulated values. The resulting values
1469 are all simple integers. Arrays containing aggregates may be sorted
1470 and iterated. See the foreach construct above.
1471
1472 Variance uses Welford's online algorithm. The calculations are based
1473 on integer arithmetic, and so may suffer from low precision and over‐
1474 flow. To improve this, @variance(v[, b]) accepts an optional parameter
1475 b, the bit-shift, ranging from 0 (default) to 62, for internal scaling.
1476 Only one value of bit-shift may be used with a given global variable. A
1477 larger bitshift value increases precision, but increases the likelihood
1478 of overflow.
1479
1480
1481 $ stap -e \
1482 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x)) }'
1483 12
1484 $ stap -e \
1485 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x,1)) }'
1486 2
1487 $ python3 -c 'import statistics; print(statistics.variance([1, 2, 3, 4, 5]))'
1488 2.5
1489 $
1490
1491
1492 Overflow (from internal multiplication of large numbers) may occur and
1493 may cause a negative variance result. Consider normalizing your input
1494 data. Adding or subtracting a fixed value from all variance inputs
1495 preserves the original variance. Dividing the variance inputs by a
1496 fixed value shrinks the original variance by that value squared.
1497
1498
1499
1500 Histograms are also available, but are more complicated because they
1501 have a vector rather than scalar value. @hist_linear(v,start,stop,in‐
1502 terval) represents a linear histogram from "start" to "stop" by incre‐
1503 ments of "interval". The interval must be positive. Similarly,
1504 @hist_log(v) represents a base-2 logarithmic histogram. Printing a his‐
1505 togram with the print family of functions renders a histogram object as
1506 a tabular "ASCII art" bar chart.
1507
1508 probe timer.profile {
1509 x[1] <<< pid()
1510 x[2] <<< uid()
1511 y <<< tid()
1512 }
1513 global x // an array containing aggregates
1514 global y // a scalar
1515 probe end {
1516 foreach ([i] in x @count+) {
1517 printf ("x[%d]: avg %d = sum %d / count %d\n",
1518 i, @avg(x[i]), @sum(x[i]), @count(x[i]))
1519 println (@hist_log(x[i]))
1520 }
1521 println ("y:")
1522 println (@hist_log(y))
1523 }
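
A linear histogram can be sketched the same way. The global name
latency and the synthetic input values are assumptions.

    global latency

    probe begin {
      for (i = 0; i < 100; i++)
        latency <<< i % 25            # values 0 through 24
      println(@hist_linear(latency, 0, 25, 5))
      printf("count %d, min %d, max %d, avg %d\n",
             @count(latency), @min(latency), @max(latency), @avg(latency))
      exit()
    }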
1524
1525
1526
1527 TYPECASTING
1528 Once a pointer (see the CONTEXT VARIABLES section of stapprobes(3stap))
1529 has been saved into a script integer variable, the translator loses the
1530 type information necessary to access members from that pointer. Using
1531 the @cast() operator tells the translator how to interpret the number
1532 as a typed pointer.
1533
1534 @cast(p, "type_name"[, "module"])->member
1535
1536
1537 This will interpret p as a pointer to a struct/union named type_name
1538 and dereference the member value. Further ->subfield expressions may
1539 be appended to dereference more levels. Note that for direct derefer‐
1540 encing of a pointer {kernel,user}_{char,int,...}($p) should be used.
1541 (Refer to stapfuncs(5) for more details.) NOTE: the same dereferenc‐
1542 ing operator -> is used to refer to both direct containment or pointer
1543 indirection. Systemtap automatically determines which. The optional
1544 module tells the translator where to look for information about that
1545 type. Multiple modules may be specified as a list with : separators.
1546 If the module is not specified, it will default either to the probe
1547 module for dwarf probes, or to "kernel" for functions and all other
1548 probe types.
1549
1550 The translator can create its own module with type information from a
1551 header surrounded by angle brackets, in case normal debuginfo is not
1552 available. For kernel headers, prefix it with "kernel" to use the ap‐
1553 propriate build system. All other headers are built with default GCC
1554 parameters into a user module. Multiple headers may be specified in
1555 sequence to resolve a codependency.
1556
1557 @cast(tv, "timeval", "<sys/time.h>")->tv_sec
1558 @cast(task, "task_struct", "kernel<linux/sched.h>")->tgid
1559 @cast(task, "task_struct",
1560 "kernel<linux/sched.h><linux/fs_struct.h>")->fs->umask
1561
1562 Values acquired by @cast may be pretty-printed by the $ and $$ suffix
1563 operators, the same way as described in the CONTEXT VARIABLES section
1564 of the stapprobes(3stap) manual page.
1565
1566
1567 When in guru mode, the translator will also allow scripts to assign new
1568 values to members of typecasted pointers.
1569
1570 Typecasting is also useful in the case of void* members whose type may
1571 be determinable at runtime.
1572
1573 probe foo {
1574 if ($var->type == 1) {
1575 value = @cast($var->data, "type1")->bar
1576 } else {
1577 value = @cast($var->data, "type2")->baz
1578 }
1579 print(value)
1580 }
1581
1582
1583
1584 EMBEDDED C
1585 When in guru mode, the translator accepts embedded C code in the top
1586 level of the script. Such code is enclosed between %{ and %} markers,
1587 and is transcribed verbatim, without analysis, in some sequence, into
1588 the top level of the generated C code. At the outermost level, this
1589 may be useful to add #include instructions, and any auxiliary defini‐
1590 tions for use by other embedded code.
1591
1592 Another place where embedded code is permitted is as a function body.
1593 In this case, the script language body is replaced entirely by a piece
1594 of C code enclosed again between %{ and %} markers. This C code may do
1595 anything reasonable and safe. There are a number of undocumented but
1596 complex safety constraints on atomicity, concurrency, resource consump‐
1597 tion, and run time limits, so this is an advanced technique.
1598
1599 The memory locations set aside for input and output values are made
1600 available to it using macros STAP_ARG_* and STAP_RETVALUE. Errors may
1601 be signalled with STAP_ERROR. Output may be written with STAP_PRINTF.
1602 The function may return early with STAP_RETURN. Here are some exam‐
1603 ples:
1604
1605 function integer_ops (val) %{
1606 STAP_PRINTF("%d\n", STAP_ARG_val);
1607 STAP_RETVALUE = STAP_ARG_val + 1;
1608 if (STAP_RETVALUE == 4)
1609 STAP_ERROR("wrong guess: %d", (int) STAP_RETVALUE);
1610 if (STAP_RETVALUE == 3)
1611 STAP_RETURN(0);
1612 STAP_RETVALUE ++;
1613 %}
1614 function string_ops (val) %{
1615 strlcpy (STAP_RETVALUE, STAP_ARG_val, MAXSTRINGLEN);
1616 strlcat (STAP_RETVALUE, "one", MAXSTRINGLEN);
1617 if (strcmp (STAP_RETVALUE, "three-two-one"))
1618 STAP_RETURN("parameter should be three-two-");
1619 %}
1620 function no_ops () %{
1621 STAP_RETURN(); /* function inferred with no return value */
1622 %}
1623
1624 The function argument and return value types have to be inferred by the
1625 translator from the call sites in order for this to work. The user
1626 should examine C code generated for ordinary script-language functions
1627 in order to write compatible embedded-C ones.
1628
1629 The last place where embedded code is permitted is as an expression
1630 rvalue. In this case, the C code enclosed between %{ and %} markers is
1631 interpreted as an ordinary expression value. It is assumed to be a
1632 normal 64-bit signed number, unless the marker /* string */ is includ‐
1633 ed, in which case it's treated as a string.
1634
1635 function add_one (val) {
1636 return val + %{ 1 %}
1637 }
1638 function add_string_two (val) {
1639 return val . %{ /* string */ "two" %}
1640 }
1641
1642
1643 The embedded-C code may contain markers to assert optimization and
1644 safety properties.
1645
1646 /* pure */
1647 means that the C code has no side effects and may be elided en‐
1648 tirely if its value is not used by script code.
1649
1650 /* stable */
1651 means that the C code always has the same value (in any given
1652 probe handler invocation), so repeated calls may be automatical‐
1653 ly replaced by memoized values. Such functions must take no pa‐
1654 rameters, and also be pure.
1655
1656 /* unprivileged */
1657 means that the C code is so safe that even unprivileged users
1658 are permitted to use it.
1659
1660 /* myproc-unprivileged */
1661 means that the C code is so safe that even unprivileged users
1662 are permitted to use it, provided that the target of the current
1663 probe is within the user's own process.
1664
1665 /* guru */
1666 means that the C code is so unsafe that a systemtap user must
1667 specify -g (guru mode) to use this.
1668
1669 /* unmangled */
1670 in an embedded-C function, means that the legacy (pre-1.8) argu‐
1671 ment access syntax should be made available inside the function.
1672 Hence, in addition to STAP_ARG_foo and STAP_RETVALUE one can use
1673 THIS->foo and THIS->__retvalue respectively inside the function.
1674 This is useful for quickly migrating code written for SystemTap
1675 version 1.7 and earlier.
1676
1677 /* unmodified-fnargs */
1678 in an embedded-C function, means that the function arguments are
1679 not modified inside the function body.
1680
1681 /* string */
1682 in embedded-C expressions only, means that the expression has
1683 const char * type and should be treated as a string value, in‐
1684 stead of the default long numeric.
1685
1686 Script level global variables may be accessed in embedded-C functions
1687 and blocks. To read or write the global variable var, the
1688 /* pragma:read:var */ or /* pragma:write:var */ marker must first be
1689 placed in the embedded-C function or block. This provides the
1690 STAP_GLOBAL_GET_* and STAP_GLOBAL_SET_* macros for reading and writing,
1691 respectively. For example:
1692
1693 global var
1694 global var2[100]
1695 function increment() %{
1696 /* pragma:read:var */ /* pragma:write:var */
1697 /* pragma:read:var2 */ /* pragma:write:var2 */
1698 STAP_GLOBAL_SET_var(STAP_GLOBAL_GET_var()+1); //var++
1699 STAP_GLOBAL_SET_var2(1, 1, STAP_GLOBAL_GET_var2(1, 1)+1); //var2[1,1]++
1700 %}
1701
1702 Variables may be read and set in both embedded-C functions and expres‐
1703 sions. Strings returned from embedded-C code are decayed to pointers.
1704 Variables must also be assigned at script level to allow for type in‐
1705 ference. Map assignment does not return the value written, so chaining
1706 does not work.
1707
1708
1709 BUILT-INS
1710 A set of builtin probe point aliases are provided by the scripts in‐
1711 stalled in the directory specified in the stappaths(7) manual page.
1712 The functions are described in the stapprobes(3stap) manual page.
1713
1714
1716 The translator begins pass 1 by parsing the given input script, and all
1717 scripts (files named *.stp) found in a tapset directory. The directo‐
1718 ries listed with -I are processed in sequence, each processed in "guru
1719 mode". For each directory, a number of subdirectories are also
1720 searched. These subdirectories are derived from the selected kernel
1721 version (the -R option), in order to allow more kernel-version-specific
1722 scripts to override less specific ones. For example, for a kernel ver‐
1723 sion 2.6.12-23.FC3 the following patterns would be searched, in se‐
1724 quence: 2.6.12-23.FC3/*.stp, 2.6.12/*.stp, 2.6/*.stp, and finally
1725 *.stp. Stopping the translator after pass 1 causes it to print the
1726 parse trees.
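
The subdirectory search order in the example above can be sketched in Python as follows. The derivation rule used here (drop the release suffix after the first "-", then drop trailing dotted components down to two) is inferred from that single example and is an assumption; the function name is hypothetical and the translator's actual logic may differ.

```python
def tapset_patterns(kernel_version):
    """List *.stp glob patterns from most to least version-specific."""
    patterns = [kernel_version + "/*.stp"]
    base = kernel_version.split("-", 1)[0]   # "2.6.12-23.FC3" -> "2.6.12"
    parts = base.split(".")
    # Assumed rule: try ever shorter dotted prefixes, stopping at two parts.
    for n in range(len(parts), 1, -1):
        candidate = ".".join(parts[:n]) + "/*.stp"
        if candidate not in patterns:
            patterns.append(candidate)
    patterns.append("*.stp")                 # the directory itself comes last
    return patterns

print(tapset_patterns("2.6.12-23.FC3"))
```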
1727
1728
1729 In pass 2, the translator analyzes the input script to resolve symbols
1730 and types. References to variables, functions, and probe aliases that
1731 are unresolved internally are satisfied by searching through the parsed
1732 tapset script files. If any tapset script file is selected because it
1733 defines an unresolved symbol, then the entirety of that file is added
1734 to the translator's resolution queue. This process iterates until all
1735 symbols are resolved and a subset of tapset script files is selected.
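
The iteration described above behaves like a worklist algorithm. The Python sketch below illustrates the idea only; the data layout (a mapping from file name to the symbols it defines and references) and all names are hypothetical and do not reflect the translator's internal structures.

```python
def select_tapset_files(unresolved, tapset_files):
    """Return the tapset files needed to resolve the given symbols.

    tapset_files maps a file name to a pair:
    (symbols the file defines, symbols the file references).
    """
    selected = []
    resolved = set()
    queue = list(unresolved)
    while queue:
        sym = queue.pop(0)
        if sym in resolved:
            continue
        for name, (defs, refs) in tapset_files.items():
            if sym in defs:
                # The whole file is pulled in, so everything it defines is
                # now resolved and everything it references must be checked.
                selected.append(name)
                resolved.update(defs)
                queue.extend(sorted(refs))
                break
        else:
            raise LookupError("unresolved symbol: " + sym)
    return selected

tapsets = {                                   # hypothetical tapset contents
    "a.stp": ({"foo"}, {"bar"}),
    "b.stp": ({"bar", "baz"}, set()),
    "c.stp": ({"unused"}, set()),
}
print(select_tapset_files(["foo"], tapsets))
```

Here c.stp is never selected, because nothing reachable from the script refers to a symbol it defines.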
1736
1737 Next, all probe point descriptions are validated against the wide vari‐
1738 ety supported by the translator. Probe points that refer to code loca‐
1739 tions ("synchronous probe points") require the appropriate kernel de‐
1740 bugging information to be installed. In the associated probe handlers,
1741 target-side variables (whose names begin with "$") are found and have
1742 their run-time locations decoded.
1743
1744 Next, all probes and functions are analyzed for optimization opportuni‐
1745 ties, in order to remove variables, expressions, and functions that
1746 have no useful value and no side-effect. Embedded-C functions are as‐
1747 sumed to have side-effects unless they include the magic string
1748 /* pure */. Since this optimization can hide latent code errors such
1749 as type mismatches or invalid $context variables, it sometimes may be
1750 useful to disable the optimizations with the -u option.
1751
1752 Finally, all variable, function, parameter, array, and index types are
1753 inferred from context (literals and operators). Stopping the transla‐
1754 tor after pass 2 causes it to list all the probes, functions, and vari‐
1755 ables, along with all inferred types. Any inconsistent or unresolved
1756 types cause an error.
1757
1758
1759 In pass 3, the translator writes C code that represents the actions of
1760 all selected script files, and creates a Makefile to build that into a
1761 kernel object. These files are placed into a temporary directory.
1762 Stopping the translator at this point causes it to print the contents
1763 of the C file.
1764
1765
1766 In pass 4, the translator invokes the Linux kernel build system to cre‐
1767 ate the actual kernel object file. This involves running make in the
1768 temporary directory, and requires a kernel module build system (head‐
1769 ers, config and Makefiles) to be installed in the usual spot /lib/mod‐
1770 ules/VERSION/build. Stopping the translator after pass 4 is the last
1771 chance before running the kernel object. This may be useful if you
1772 want to archive the file.
1773
1774
1775 In pass 5, the translator invokes the systemtap auxiliary program
1776 staprun for the given kernel object. This program arranges to
1777 load the module then communicates with it, copying trace data from the
1778 kernel into temporary files, until the user sends an interrupt signal.
1779 Any run-time error encountered by the probe handlers, such as running
1780 out of memory, division by zero, exceeding nesting or runtime limits,
1781 results in a soft error indication. Soft errors in excess of MAXERRORS
1782 block all subsequent probes (except error-handling probes) and
1783 terminate the session. Finally, staprun unloads the module and cleans
1784 up.
1785
1786
1787 ABNORMAL TERMINATION
1788 One should avoid killing the stap process forcibly, for example with
1789 SIGKILL, because the stapio process (a child process of the stap
1790 process) and the loaded module may be left running on the system. If
1791 this happens, send SIGTERM or SIGINT to any remaining stapio processes,
1792 then use rmmod to unload the systemtap module.
1793
1794
1795
1797 See the stapex(3stap) manual page for a brief collection of samples, or
1798 a large set of installed samples under the systemtap documenta‐
1799 tion/testsuite directories. See stappaths(7stap) for the likely loca‐
1800 tion of these on the system.
1801
1802
1804 The systemtap translator caches the pass 3 output (the generated C
1805 code) and the pass 4 output (the compiled kernel module) if pass 4 com‐
1806 pletes successfully. This cached output is reused if the same script
1807 is translated again assuming the same conditions exist (same kernel
1808 version, same systemtap version, etc.). Cached files are stored in the
1809 $SYSTEMTAP_DIR/cache directory. The cache can be limited by having the
1810 file cache_mb_limit placed in the cache directory (shown above) con‐
1811 taining only an ASCII integer representing how many MiB the cache
1812 should not exceed. In the absence of this file, a default will be cre‐
1813 ated with the limit set to 256MiB. This is a 'soft' limit in that the
1814 cache will be cleaned after a new entry is added if the cache clean in‐
1815 terval is exceeded, so the total cache size may temporarily exceed this
1816 limit. This interval can be specified by having the file
1817 cache_clean_interval_s placed in the cache directory (shown above) con‐
1818 taining only an ASCII integer representing the interval in seconds. In
1819 the absence of this file, a default will be created with the interval
1820 set to 300 s.
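
As an illustration of the two control files described above, the following Python sketch writes both of them into a cache directory. The helper name is hypothetical, and the demonstration deliberately uses a throwaway temporary directory; on a real system the target would be the $SYSTEMTAP_DIR/cache directory.

```python
import tempfile
from pathlib import Path

def set_cache_limits(cache_dir, mb_limit, clean_interval_s):
    """Write the two cache control files, each holding one ASCII integer."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    (cache / "cache_mb_limit").write_text(str(int(mb_limit)))
    (cache / "cache_clean_interval_s").write_text(str(int(clean_interval_s)))

# Demonstration against a temporary directory, not the real cache.
with tempfile.TemporaryDirectory() as tmp:
    demo_cache = Path(tmp) / "cache"
    set_cache_limits(demo_cache, 512, 600)   # 512 MiB, clean every 600 s
    print((demo_cache / "cache_mb_limit").read_text())
    print((demo_cache / "cache_clean_interval_s").read_text())
```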
1821
1822
1824 Systemtap may be used as a powerful administrative tool. It can expose
1825 kernel internal data structures and potentially private user informa‐
1826 tion. (In dyninst runtime mode, this is not the case, see the ALTER‐
1827 NATE RUNTIMES section below.)
1828
1829 The translator asserts many safety constraints during compilation and
1830 more during run-time. It aims to ensure that no handler routine can
1831 run for very long, allocate boundless memory, perform unsafe
1832 operations, or unintentionally interfere with the system. Uses of script
1833 global variables are automatically read/write locked as appropriate, to
1834 protect against manipulation by concurrent probe handlers. (Deadlocks
1835 are detected with timeouts. Use the -t flag to receive reports of ex‐
1836 cessive lock contention.) Experimenting with scripts is therefore gen‐
1837 erally safe. The guru-mode -g option allows administrators to bypass
1838 most safety measures, which permits invasive or state-changing opera‐
1839 tions, embedded-C code, and increases the risk of upset. By default,
1840 overload prevention is turned on for all modules. If you would like to
1841 disable overload processing, use the --suppress-time-limits option.
1842
1843 Errors that are caught at run time normally result in a clean script
1844 shutdown and a pass-5 error message. The --suppress-handler-errors op‐
1845 tion lets scripts tolerate soft errors without shutting down.
1846
1847
1848
1849 PERMISSIONS
1850 For the normal linux-kernel-module runtime, to run the kernel objects
1851 systemtap builds, a user must be one of the following:
1852
1853 · the root user;
1854
1855 · a member of the stapdev and stapusr groups;
1856
1857 · a member of the stapsys and stapusr groups; or
1858
1859 · a member of the stapusr group.
1860
1861 The root user or a user who is a member of both the stapdev and stapusr
1862 groups can build and run any systemtap script.
1863
1864 A user who is a member of both the stapsys and stapusr groups can only
1865 use pre-built modules under the following conditions:
1866
1867 · The module has been signed by a trusted signer. Trusted signers are
1868 normally systemtap compile-servers which sign modules when the
1869 --privilege option is specified by the client. See the stap-serv‐
1870 er(8) manual page for more information.
1871
1872 · The module was built using the --privilege=stapsys or the --privi‐
1873 lege=stapusr options.
1874
1875 Members of only the stapusr group can only use pre-built modules under
1876 the following conditions:
1877
1878 · The module is located in the /lib/modules/VERSION/systemtap direc‐
1879 tory. This directory must be owned by root and not be world
1880 writable.
1881
1882 or
1883
1884 · The module has been signed by a trusted signer. Trusted signers are
1885 normally systemtap compile-servers which sign modules when the
1886 --privilege option is specified by the client. See the stap-serv‐
1887 er(8) manual page for more information.
1888
1889 · The module was built using the --privilege=stapusr option.
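
The group rules above amount to a small decision table. The Python sketch below restates them for illustration; the function name and the returned labels are hypothetical, and the sketch classifies only the user, not the per-module conditions (signature, location, --privilege level) that also apply.

```python
def stap_privilege(is_root, groups):
    """Classify a user according to the group rules listed above."""
    groups = set(groups)
    if is_root or {"stapdev", "stapusr"} <= groups:
        return "full"      # may build and run any systemtap script
    if {"stapsys", "stapusr"} <= groups:
        return "stapsys"   # pre-built modules only, under the stapsys rules
    if "stapusr" in groups:
        return "stapusr"   # pre-built modules only, under the stapusr rules
    return None            # may not run systemtap kernel objects

print(stap_privilege(False, ["stapusr", "stapsys"]))
print(stap_privilege(False, ["stapdev"]))
```

Note that membership in stapdev or stapsys alone is not enough: each must be combined with stapusr.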
1890
1891 The kernel modules generated by the stap program are run by the staprun
1892 program. The latter is a part of the Systemtap package, dedicated to
1893 module loading and unloading (but only in the white zone), and kernel-
1894 to-user data transfer. Since staprun does not perform any additional
1895 security checks on the kernel objects it is given, it would be unwise
1896 for a system administrator to add untrusted users to the stapdev or
1897 stapusr groups.
1898
1899
1900 SECUREBOOT
1901 If the current system has SecureBoot turned on in the UEFI firmware,
1902 all kernel modules must be signed. (Some kernels may allow disabling
1903 SecureBoot long after booting with a key sequence such as SysRq-X, mak‐
1904 ing it unnecessary to sign modules.) The systemtap compile server can
1905 sign modules with a MOK (Machine Owner Key) that it has in common with
1906 a client system. See the following wiki page for more details:
1907
1908 https://sourceware.org/systemtap/wiki/SecureBoot
1909
1910
1911 RESOURCE LIMITS
1912 Many resource use limits are set by macros in the generated C code.
1913 These may be overridden with -D flags. A selection of these is as fol‐
1914 lows:
1915
1916 MAXNESTING
1917 Maximum number of nested function calls. Default determined by
1918 script analysis, with a bonus 10 slots added for recursive
1919 scripts.
1920
1921 MAXSTRINGLEN
1922 Maximum length of strings, default 128.
1923
1924 MAXTRYLOCK
1925 Maximum number of iterations to wait for locks on global vari‐
1926 ables before declaring possible deadlock and skipping the probe,
1927 default 1000.
1928
1929 MAXACTION
1930 Maximum number of statements to execute during any single probe
1931 hit (with interrupts disabled), default 1000. Note that for
1932 straight-through probe handlers lacking loops or recursion, due
1933 to optimization, this parameter may be interpreted too conserva‐
1934 tively.
1935
1936 MAXACTION_INTERRUPTIBLE
1937 Maximum number of statements to execute during any single probe
1938 hit which is executed with interrupts enabled (such as begin/end
1939 probes), default (MAXACTION * 10).
1940
1941 MAXBACKTRACE
1942 Maximum number of stack frames that will be processed by the
1943 stap runtime unwinder as produced by the backtrace functions in
1944 the [u]context-unwind.stp tapsets, default 20.
1945
1946 MAXMAPENTRIES
1947 Maximum number of rows in any single global array, default 2048.
1948 Individual arrays may be declared with a larger or smaller limit
1949 instead:
1950
1951 global big[10000],little[5]
1952
1953 or denoted with % to make them wrap-around (replace old entries)
1954 automatically, as in
1955
1956 global big%
1957
1958 or both.
1959
1960 MAPHASHBIAS
1961 The number of powers-of-two to add or subtract from the natural
1962 size of the hash table backing each global associative array.
1963 Default is 0. Try small positive numbers to get extra perfor‐
1964 mance at the cost of more memory consumption, because that
1965 should reduce hash table collisions. Try small negative numbers
1966 for the opposite tradeoff.
1967
1968 MAXERRORS
1969 Maximum number of soft errors before an exit is triggered, de‐
1970 fault 0, which means that the first error will exit the script.
1971 Note that with the --suppress-handler-errors option, this limit
1972 is not enforced.
1973
1974 MAXSKIPPED
1975 Maximum number of skipped probes before an exit is triggered,
1976 default 100. Running systemtap with -t (timing) mode gives more
1977 details about skipped probes. With the default -DINTERRUPT‐
1978 IBLE=1 setting, probes skipped due to reentrancy are not accumu‐
1979 lated against this limit. Note that with the --suppress-han‐
1980 dler-errors option, this limit is not enforced.
1981
1982 MINSTACKSPACE
1983 Minimum number of free kernel stack bytes required in order to
1984 run a probe handler, default 1024. This number should be large
1985 enough for the probe handler's own needs, plus a safety margin.
1986
1987 MAXUPROBES
1988 Maximum number of concurrently armed user-space probes (up‐
1989 robes), default somewhat larger than the number of user-space
1990 probe points named in the script. This pool needs to be poten‐
1991 tially large because individual uprobe objects (about 64 bytes
1992 each) are allocated for each process for each matching script-
1993 level probe.
1994
1995 STP_MAXMEMORY
1996 Maximum amount of memory (in kilobytes) that the systemtap mod‐
1997 ule should use, default unlimited. The memory size includes the
1998 size of the module itself, plus any additional allocations.
1999 This only tracks direct allocations by the systemtap runtime.
2000 This does not track indirect allocations (as done by kprobes/up‐
2001 robes/etc. internals).
2002
2003 STP_OVERLOAD_THRESHOLD, STP_OVERLOAD_INTERVAL
2004 Maximum number of machine cycles spent in probes on any cpu per
2005 given interval, before an overload condition is declared and the
2006 script shut down. The defaults are 500 million and 1 billion,
2007 so as to limit stap script cpu consumption at around 50%.
2008
2009 STP_PROCFS_BUFSIZE
2010 Size of procfs probe read buffers (in bytes). Defaults to
2011 MAXSTRINGLEN. This value can be overridden on a per-procfs file
2012 basis using the procfs read probe .maxsize(MAXSIZE) parameter.
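
The "around 50%" figure given for STP_OVERLOAD_THRESHOLD and STP_OVERLOAD_INTERVAL is simply the ratio of the two defaults. The short Python sketch below works through that arithmetic; the function name is hypothetical, and the second pair of values is a made-up example of what a -D override would imply.

```python
def overload_cpu_share(threshold_cycles, interval_cycles):
    """Fraction of each interval that probes may consume before overload."""
    return threshold_cycles / interval_cycles

# Defaults quoted above: 500 million cycles per 1 billion cycle interval.
print(overload_cpu_share(500_000_000, 1_000_000_000))

# Hypothetical override, e.g. -DSTP_OVERLOAD_THRESHOLD=250000000
print(overload_cpu_share(250_000_000, 1_000_000_000))
```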
2013
2014 With scripts that contain probes on any interrupt path, it is possible
2015 that those interrupts may occur in the middle of another probe handler.
2016 The probe in the interrupt handler would be skipped in this case to
2017 avoid reentrance. To work around this issue, execute stap with the op‐
2018 tion -DINTERRUPTIBLE=0 to mask interrupts throughout the probe handler.
2019 This does add some extra overhead to the probes, but it may prevent
2020 reentrance for common problem cases. However, probes in NMI handlers
2021 and in the callpath of the stap runtime may still be skipped due to
2022 reentrance.
2023
2024
2025 In case something goes wrong with stap or staprun after a probe has al‐
2026 ready started running, one may safely kill both user processes, and re‐
2027 move the active probe kernel module with rmmod. Any pending trace mes‐
2028 sages may be lost.
2029
2030
2032 Systemtap exposes kernel internal data structures and potentially
2033 private user information. Because of this, use of systemtap's full
2034 capabilities is restricted to root and to users who are members of the
2035 groups stapdev and stapusr.
2036
2037 However, a restricted set of systemtap's features can be made available
2038 to trusted, unprivileged users. These users are members of the group
2039 stapusr only, or members of the groups stapusr and stapsys. These
2040 users can load systemtap modules which have been compiled and certified
2041 by a trusted systemtap compile-server. See the descriptions of the op‐
2042 tions --privilege and --use-server. See README.unprivileged in the sys‐
2043 temtap source code for information about setting up a trusted compile
2044 server.
2045
2046 The restrictions enforced when --privilege=stapsys is specified are de‐
2047 signed to prevent unprivileged users from:
2048
2049 · harming the system maliciously.
2050
2051 The restrictions enforced when --privilege=stapusr is specified are de‐
2052 signed to prevent unprivileged users from:
2053
2054 · harming the system maliciously.
2055
2056 · gaining access to information which would not normally be
2057 available to an unprivileged user.
2058
2059 · disrupting the performance of processes owned by other users
2060 of the system. Some overhead to the system in general is
2061 unavoidable since the unprivileged user's probes will be
2062 triggered at the appropriate times. What we would like to
2063 avoid is targeted interruption of another user's processes
2064 which would not normally be possible by an unprivileged us‐
2065 er.
2066
2067
2068 PROBE RESTRICTIONS
2069 A member of the groups stapusr and stapsys may use all probe points.
2070
2071 A member of only the group stapusr may use only the following probes:
2072
2073 · begin, begin(n)
2074
2075 · end, end(n)
2076
2077 · error(n)
2078
2079 · never
2080
2081 · process.*, where the target process is owned by the user.
2082
2083 · timer.{jiffies,s,sec,ms,msec,us,usec,ns,nsec}(n)*
2084
2085 · timer.hz(n)
2086
2087
2088 SCRIPT LANGUAGE RESTRICTIONS
2089 The following scripting language features are unavailable to all un‐
2090 privileged users:
2091
2092
2093 · any feature enabled by the Guru Mode (-g) option.
2094
2095 · embedded C code.
2096
2097
2098 RUNTIME RESTRICTIONS
2099 The following runtime restrictions are placed upon all unprivileged
2100 users:
2101
2102 · Only the default runtime code (see -R) may be used.
2103
2104 Additional restrictions are placed on members of only the group sta‐
2105 pusr:
2106
2107 · Probing of processes owned by other users is not permitted.
2108
2109 · Access of kernel memory (read and write) is not permitted.
2110
2111
2112 COMMAND LINE OPTION RESTRICTIONS
2113 Some command line options provide access to features which must not be
2114 available to all unprivileged users:
2115
2116
2117 · -g may not be specified.
2118
2119 · The following options may not be used by the compile-server
2120 client:
2121
2122 -a, -B, -D, -I, -r, -R
2123
2124
2125
2126 ENVIRONMENT RESTRICTIONS
2127 The following environment variables must not be set for all unprivi‐
2128 leged users:
2129
2130 SYSTEMTAP_RUNTIME
2131 SYSTEMTAP_TAPSET
2132 SYSTEMTAP_DEBUGINFO_PATH
2133
2134
2135
2136 TAPSET RESTRICTIONS
2137 In general, tapset functions are only available for members of the
2138 group stapusr when they do not gather information that an ordinary pro‐
2139 gram running with that user's privileges would be denied access to.
2140
2141 There are two categories of unprivileged tapset functions. The first
2142 category consists of utility functions that are unconditionally avail‐
2143 able to all users; these include such things as:
2144
2145 cpu:long ()
2146 exit ()
2147 str_replace:string (prnt_str:string, srch_str:string, rplc_str:string)
2148
2149
2150 The second category consists of so-called myproc-unprivileged functions
2151 that can only gather information within their own processes. Scripts
2152 that wish to use these functions must test the result of the tapset
2153 function is_myproc and only call these functions if the result is 1.
2154 The script will exit immediately if any of these functions are called
2155 by an unprivileged user within a probe within a process which is not
2156 owned by that user. Examples of myproc-unprivileged functions include:
2157
2158 print_usyms (stk:string)
2159 user_int:long (addr:long)
2160 usymname:string (addr:long)
2161
2162
2163 A compile error is triggered when any function not in either of the
2164 above categories is used by members of only the group stapusr.
2165
2166 No other built-in tapset functions may be used by members of only the
2167 group stapusr.
2168
2169
2171 As described above, systemtap's default runtime mode involves building
2172 and loading kernel modules, with various security tradeoffs presented.
2173 Systemtap now includes two new prototype backends: --runtime=dyninst
2174 and --runtime=bpf.
2175
2176 --runtime=dyninst uses Dyninst to instrument a user's own processes at
2177 runtime. This backend does not use kernel modules, and does not require
2178 root privileges, but is restricted with respect to the kinds of probes
2179 and other constructs that a script may use. The dyninst runtime operates in
2180 target-attach mode, so it does require a -c COMMAND or -x PID process.
2181 For example:
2182
2183 stap --runtime=dyninst -c 'stap -V' \
2184 -e 'probe process.function("main")
2185 { println("hi from dyninst!") }'
2186
2187
2188 It may be necessary to disable a conflicting selinux check with
2189
2190 # setsebool allow_execstack 1
2191
2192
2193 --runtime=bpf compiles the user script into extended Berkeley Packet
2194 Filter (eBPF) programs instead of a kernel module. eBPF programs are
2195 verified by the kernel for safety and are executed by an in-kernel vir‐
2196 tual machine. This runtime is in an early stage of development and
2197 currently lacks support for a number of features available in the de‐
2198 fault runtime. Please see the stapbpf(8) man page for more information
2199 (only on x86_64 systems).
2200
2201
2203 The systemtap translator generally returns with a success code of 0 if
2204 the requested script was processed and executed successfully through
2205 the requested pass. Otherwise, errors may be printed to stderr and a
2206 failure code is returned. Use -v or -vp N to increase (global or per-
2207 pass) verbosity to identify the source of the trouble.
2208
2209 In listings mode (-l and -L), error messages are normally suppressed.
2210 A success code of 0 is returned if at least one matching probe was
2211 found.
2212
2213 A script executing in pass 5 that is interrupted with ^C / SIGINT is
2214 considered to be successful.
2215
2216
2218 Over time, some features of the script language and the tapset library
2219 may undergo incompatible changes, so that a script written against an
2220 old version of systemtap may no longer run. In these cases, it may
2221 help to run systemtap with the --compatible VERSION flag, specifying
2222 the last known working version. Running systemtap with the
2223 --check-version flag will output a warning if any possibly incompatible
2224 elements have been parsed. Historical details of deprecations may be
2225 found in the NEWS file.
2226
2227 The purpose of the deprecation facility is to improve the experience of
2228 scripts written for newer versions of systemtap (by adding better
2229 alternatives and removing conflicting or messy older alternatives), while
2230 at the same time permitting scripts written for older versions of
2231 systemtap to continue running. Deprecation is thus intended as a service to
2232 users (and an inconvenience to systemtap's developers), rather than the
2233 other way around.
2234
2235 Please note that underscore-prefixed identifiers in the tapset
2236 sometimes undergo changes for which compatibility is difficult to
2237 preserve, even with the deprecation mechanisms. Avoid relying on these in
2238 your scripts; instead propose them for promotion to non-underscored
2239 status.
2240
2241
2242
2244 Important files and their corresponding paths can be located in the
2245 stappaths(7) manual page.
2246
2247
2249 stapprobes(3stap),
2250 function::*(3stap),
2251 probe::*(3stap),
2252 tapset::*(3stap),
2253 stappaths(7),
2254 staprun(8),
2255 stapdyn(8),
2256 systemtap(8),
2257 stapvars(3stap),
2258 stapex(3stap),
2259 stap-server(8),
2260 stap-prep(1),
2261 stapref(1),
2262 awk(1),
2263 gdb(1)
2264
2265
2267 Use the Bugzilla link of the project web page or our mailing list.
2268 http://sourceware.org/systemtap/, <systemtap@sourceware.org>.
2269
2270 error::reporting(7stap),
2271 https://sourceware.org/systemtap/wiki/HowToReportBugs
2272
2273
2274
2275 STAP(1)