STAP(1) General Commands Manual STAP(1)

NAME
6 stap - systemtap script translator/driver

SYNOPSIS
11 stap [ OPTIONS ] FILENAME [ ARGUMENTS ]
12 stap [ OPTIONS ] - [ ARGUMENTS ]
13 stap [ OPTIONS ] -e SCRIPT [ ARGUMENTS ]
14 stap [ OPTIONS ] -l PROBE [ ARGUMENTS ]
15 stap [ OPTIONS ] -L PROBE [ ARGUMENTS ]
16 stap [ OPTIONS ] --dump-probe-types
17 stap [ OPTIONS ] --dump-probe-aliases
18 stap [ OPTIONS ] --dump-functions

DESCRIPTION
The stap program is the front-end to the Systemtap tool. It accepts
probing instructions written in a simple domain-specific language,
translates those instructions into C code, compiles this C code, and
loads the resulting module into a running Linux kernel or a DynInst
user-space mutator, to perform the requested system trace/probe
functions. You can supply the script in a named file (FILENAME), from
standard input (use - instead of FILENAME), or from the command line
(using -e SCRIPT). The program runs until it is interrupted by the
user, until the script voluntarily invokes the exit() function, or
until a sufficient number of soft errors has occurred.
34
The language, which is described in the SCRIPT LANGUAGE section below,
is strictly typed, expressive, declaration free, procedural,
prototyping-friendly, and inspired by awk and C. It allows source code
points or events in the system to be associated with handlers, which
are subroutines that are executed synchronously. It is somewhat
similar conceptually to "breakpoint command lists" in the gdb debugger.

REFERENCE DOCUMENTATION
44 systemtap comes with a variety of educational, documentation and refer‐
45 ence resources. They come online and/or packaged for offline use. For
46 online documentation, see the project web site,
47 https://sourceware.org/systemtap/
48
49
50 ┌──────────────────────────┬──────────────────────────────────────────────────────┐
51 │man pages │ │
52 ├──────────────────────────┼──────────────────────────────────────────────────────┤
53 │stap (this page) │ language syntax, concepts, operation, options │
54 ├──────────────────────────┼──────────────────────────────────────────────────────┤
55 │stapprobes │ probe points and their $context variables │
56 ├──────────────────────────┼──────────────────────────────────────────────────────┤
57 │stapref │ quick reference to language syntax │
58 ├──────────────────────────┼──────────────────────────────────────────────────────┤
59 │stappaths │ list of directories, including books & references │
60 ├──────────────────────────┼──────────────────────────────────────────────────────┤
61 │stap-prep │ program to install auxiliary dependencies like ker‐ │
62 │ │ nel debuginfo │
63 ├──────────────────────────┼──────────────────────────────────────────────────────┤
64 │tapset::* │ generated list of tapsets │
65 ├──────────────────────────┼──────────────────────────────────────────────────────┤
66 │probe::* │ generated list of tapset probe aliases │
67 ├──────────────────────────┼──────────────────────────────────────────────────────┤
68 │function::* │ generated list of tapset functions │
69 ├──────────────────────────┼──────────────────────────────────────────────────────┤
70 │macro::* │ generated list of tapset macros │
71 ├──────────────────────────┼──────────────────────────────────────────────────────┤
72 │stapvars │ some of the tapset global variables │
73 ├──────────────────────────┼──────────────────────────────────────────────────────┤
74 │staprun, stapdyn, stapbpf │ programs for executing compiled systemtap scripts │
75 ├──────────────────────────┼──────────────────────────────────────────────────────┤
76 │systemtap │ initscript, boot-time probing │
77 ├──────────────────────────┼──────────────────────────────────────────────────────┤
78 │stap-server │ compilation server │
79 ├──────────────────────────┼──────────────────────────────────────────────────────┤
80 │stapex │ a few very basic script examples │
81 ├──────────────────────────┼──────────────────────────────────────────────────────┤
82 │books │ │
83 ├──────────────────────────┼──────────────────────────────────────────────────────┤
84 │Beginner's Guide │ tutorial book, language essentials, examples │
85 ├──────────────────────────┼──────────────────────────────────────────────────────┤
86 │Tutorial │ shorter tutorial, exercises │
87 ├──────────────────────────┼──────────────────────────────────────────────────────┤
88 │Language Reference │ detailed language manual, covers statistics/analysis │
89 ├──────────────────────────┼──────────────────────────────────────────────────────┤
90 │Tapset Reference │ the tapset man pages, reformatted into a book │
91 ├──────────────────────────┼──────────────────────────────────────────────────────┤
92 │references │ │
93 ├──────────────────────────┼──────────────────────────────────────────────────────┤
94 │example scripts │ over a hundred directly usable sysadmin tools, toys, │
95 │ │ hacks to learn from │
96 └──────────────────────────┴──────────────────────────────────────────────────────┘

OPTIONS
The systemtap translator supports the following options. Any other
option prints a list of supported options. Options may be given on the
command line, as usual. If the file $SYSTEMTAP_DIR/rc exists, options
are also loaded from there and interpreted first. ($SYSTEMTAP_DIR
defaults to $HOME/.systemtap if unset.)
104
105
106 In some cases, the default value of an option depends on particular
107 system configuration and thus can't be mentioned here directly. In
108 some of those cases running "stap --help" might display the default.
109
110
111 - Use standard input instead of a given FILENAME as probe language
112 input, unless -e SCRIPT is given.
113
114 -h --help
115 Show help message.
116
117 -V --version
118 Show version message.
119
120 -p NUM Stop after pass NUM. The passes are numbered 1-5: parse, elabo‐
121 rate, translate, compile, run. See the PROCESSING section for
122 details.
123
-v Increase verbosity for all passes. Produce a larger volume of
informative output each time the option is repeated.
126
127 --vp ABCDE
128 Increase verbosity on a per-pass basis. For example, "--vp 002"
129 adds 2 units of verbosity to pass 3 only. The combination
130 "-v --vp 00004" adds 1 unit of verbosity for all passes, and 4
131 more for pass 5.
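
The arithmetic behind -v and --vp can be modeled in a few lines of
shell. This is only an illustrative sketch, not stap's own option
parser: the function name per_pass_verbosity is hypothetical, and it
assumes that each -v adds one unit to every pass and that digits
missing from the end of the --vp string count as 0, as the two
examples above suggest.

```shell
# Model only: effective verbosity per pass (1-5) for a given number of
# -v flags plus a --vp digit string.
per_pass_verbosity() {
    v_count=$1      # how many times -v was given
    vp=$2           # the --vp argument, e.g. 00004
    out=""
    pass=1
    while [ "$pass" -le 5 ]; do
        # Digit for this pass; a missing trailing digit counts as 0.
        digit=$(printf '%s' "$vp" | cut -c "$pass")
        out="$out$((v_count + ${digit:-0})) "
        pass=$((pass + 1))
    done
    printf '%s\n' "${out% }"
}

per_pass_verbosity 1 00004    # models "-v --vp 00004"; prints: 1 1 1 1 5
per_pass_verbosity 0 002      # models "--vp 002";      prints: 0 0 2 0 0
```

Each column of the output is the verbosity for passes 1 through 5.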
132
133 -k Keep the temporary directory after all processing. This may be
134 useful in order to examine the generated C code, or to reuse the
135 compiled kernel object.
136
137 -g Guru mode. Enable parsing of unsafe expert-level constructs
138 like embedded C.
139
140 -P Prologue-searching mode. This is equivalent to --pro‐
141 logue-searching=always. Activate heuristics to work around in‐
142 correct debugging information for function parameter $context
143 variables.
144
145 -u Unoptimized mode. Disable unused code elision and many other
146 optimizations during elaboration / translation.
147
148 -w Suppressed warnings mode. Disables all warning messages.
149
150 -W Treat all warnings as errors.
151
152 -b Use bulk mode (percpu files) for kernel-to-user data transfer.
153 Use the stap-merge program to multiplex them back together lat‐
154 er.
155
156 -i --interactive
157 Interactive mode. Enable an interface to build the systemtap
158 script incrementally and interactively.
159
-t Collect timing information on the number of times each probe
executes and the average amount of time spent in each probe point.
Also show the derivation of each probe point.
163
164 -s NUM Use NUM megabyte buffers for kernel-to-user data transfer. On a
165 multiprocessor in bulk mode, this is a per-processor amount.
166
167 -I DIR Add the given directory to the tapset search directory. See the
168 description of pass 2 for details.
169
170 -D NAME=VALUE
171 Add the given C preprocessor directive to the module Makefile.
172 These can be used to override limit parameters described below.
173
174 -B NAME=VALUE
175 In kernel-runtime mode, add the given make directive to the ker‐
176 nel module build's make invocation. These can be used to add or
177 override kconfig options. For example, use
178
179 -B CONFIG_DEBUG_INFO=y
180
181 to add debugging information.
182
183 -B FLAG
184 In dyninst-runtime mode, add the given parameter to the compiler
185 CFLAGS used for building the dyninst shared library. For exam‐
186 ple, use
187
188 -B -g
189
190 to add debugging information.
191
192 -a ARCH
193 Use a cross-compilation mode for the given target architecture.
194 This requires access to the cross-compiler and the kernel build
195 tree, and goes along with the
196
197 -B CROSS_COMPILE=arch-tool-prefix-
198 and
199 -r /build/tree
200
201 options.
202
203 --modinfo NAME=VALUE
204 Add the name/value pair as a MODULE_INFO macro call to the gen‐
205 erated module. This may be useful to inform or override various
206 module-related checks in the kernel.
207
208 -G NAME=VALUE
209 Sets the value of global variable NAME to VALUE when staprun is
210 invoked. This applies to scalar variables declared global in
211 the script/tapset.
212
213 -R DIR Look for the systemtap runtime sources in the given directory.
214 Your DIR default can be seen using "stap --help".
215
216 -r /DIR
217 Build for kernel in given build tree. Can also be set with the
218 SYSTEMTAP_RELEASE environment variable.
219
220 -r RELEASE
221 Build for kernel in build tree /lib/modules/RELEASE/build. Can
222 also be set with the SYSTEMTAP_RELEASE environment variable.
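
The two forms of -r differ only in how the build tree is located. The
sketch below models that mapping in shell; it is an illustration of
the rule stated above, not systemtap code. The function name
resolve_build_tree and the release string 5.14.0-example are
hypothetical, and any effect of --sysroot is ignored.

```shell
# Model only: which kernel build tree a given -r argument refers to.
resolve_build_tree() {
    case $1 in
        /*) printf '%s\n' "$1" ;;                     # -r /DIR: used as given
        *)  printf '/lib/modules/%s/build\n' "$1" ;;  # -r RELEASE
    esac
}

resolve_build_tree /build/tree        # prints: /build/tree
resolve_build_tree 5.14.0-example     # prints: /lib/modules/5.14.0-example/build
resolve_build_tree "$(uname -r)"      # build tree for the running kernel
```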
223
224 -m MODULE
225 Use the given name for the generated kernel object module, in‐
226 stead of a unique randomized name. The generated kernel object
227 module is copied to the current directory.
228
229 -d MODULE
230 Add symbol/unwind information for the given module into the ker‐
231 nel object module. This may enable symbolic tracebacks from
232 those modules/programs, even if they do not have an explicit
233 probe placed into them.
234
235 --ldd Add symbol/unwind information for all user-space shared li‐
236 braries suspected by ldd to be necessary for user-space binaries
237 being probed or listed with the -d option. Caution: this can
238 make the probe modules considerably larger. Note that this op‐
239 tion does not deal with kernel-space modules: see instead
240 --all-modules below.
241
242 --all-modules
243 Equivalent to specifying "-dkernel" and a "-d" for each kernel
244 module that is currently loaded. Caution: this can make the
245 probe modules considerably larger.
246
247 -o FILE
248 Send standard output to named file. In bulk mode, percpu files
249 will start with FILE_ (FILE_cpu with -F) followed by the cpu
250 number. This supports strftime(3) formats for FILE.
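
To get a feel for what a strftime(3) pattern in FILE expands to, the
same conversion specifications can be previewed with date(1). This is
only a preview under stated assumptions: the pattern below is a
hypothetical example, and the real expansion is done by systemtap at
run time, so the timestamp in the actual file name will differ.

```shell
# Hypothetical output file pattern using strftime(3) conversions.
pattern='trace-%Y%m%d-%H%M.log'

# Preview the expansion for the current time using date(1).
preview=$(date "+$pattern")
printf '%s\n' "$preview"

# The pattern itself is passed to stap unexpanded, for example:
#   stap -o "$pattern" FILENAME
```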
251
-c CMD Start the probes, run CMD, and exit when CMD finishes. This
also has the effect of setting target() to the pid of the command
that was run.
255
-x PID Set target() to PID. This allows scripts to be written that
filter on a specific process. Scripts run independently of the
PID's lifespan.
259
260 -e SCRIPT
261 Run the given SCRIPT specified on the command line.
262
263 -E SCRIPT
264 Run the given SCRIPT specified. This SCRIPT is run in addition
265 to the main script specified, through -e, or as a script file.
266 This option can be repeated to run multiple scripts, and can be
267 used in listing mode (-l/-L).
268
269 -l PROBE
270 Instead of running a probe script, just list all available probe
271 points matching the given single probe point. The pattern may
272 include wildcards and aliases, but not comma-separated multiple
273 probe points. The process result code will indicate failure if
274 there are no matches.
275
276 % stap -e 'probe syscall.* { }'
277 [...]
278 % stap -l 'syscall.*'
279 syscall.accept
280 [...]
281 syscall.writev
282
283
284 -L PROBE
285 Similar to "-l", but list matching probe points plus their
286 available context variables.
287
288 % stap -L 'process("/lib64/libpython*.so.*").mark("*")'
289 process("/usr/lib64/libpython2.7.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
290 process("/usr/lib64/libpython2.7.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
291 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
292 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
293 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__done") $arg1:long
294 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__start") $arg1:long
295 process("/usr/lib64/libpython3.6m.so.1.0").mark("line") $arg1:long $arg2:long $arg3:long
296
297
-F Without the -o option, load the module and start the probes, then
detach from the module, leaving the probes running. With the -o
option, run staprun in the background as a daemon and show its pid.
301
-S size[,N]
Set the maximum size of an output file and the maximum number of
output files. If an output file would exceed size, systemtap
switches to the next output file. If the number of output files
exceeds N, systemtap removes the oldest output file. The second
argument may be omitted.
308
309 -T TIMEOUT
310 Exit the script after TIMEOUT seconds.
311
312 --skip-badvars
313 Ignore unresolvable or run-time-inaccessible context variables
314 and substitute with 0, without errors.
315
316
317 --prologue-searching[=WHEN]
318 Prologue-searching mode. Activate heuristics to work around in‐
319 correct debugging information for function parameter $context
320 variables. WHEN can be either "never", "always", or "auto" (i.e.
321 enabled by heuristic). If WHEN is missing, then "always" is as‐
322 sumed. If the option is missing, then "auto" is assumed.
323
324
325 --suppress-handler-errors
326 Wrap all probe handlers into something like this
327
328 try { ... } catch { next }
329
330 block, which causes any runtime errors to be quietly suppressed.
331 Suppressed errors do not count against MAXERRORS limits. In
332 this mode, the MAXSKIPPED limits are also suppressed, so that
333 many errors and skipped probes may be accumulated during a
334 script's runtime. Any overall counts will still be reported at
335 shutdown.
336
337
338 --compatible VERSION
339 Suppress recent script language or tapset changes which are in‐
340 compatible with given older version of systemtap. This may be
341 useful if a much older systemtap script fails to run. See the
342 DEPRECATION section for more details.
343
344
345 --check-version
346 This option is used to check if the active script has any con‐
347 structs that may be systemtap version specific. See the DEPRE‐
348 CATION section for more details.
349
350
351 --clean-cache
352 This option prunes stale entries from the cache directory. This
353 is normally done automatically after successful runs, but this
354 option will trigger the cleanup manually and then exit. See the
355 CACHING section for more details about cache limits.
356
357
358 --color[=WHEN], --colour[=WHEN]
359 This option controls coloring of error messages. WHEN can be ei‐
360 ther "never", "always", or "auto" (i.e. enable only if at a ter‐
361 minal). If WHEN is missing, then "always" is assumed. If the op‐
362 tion is missing, then "auto" is assumed.
363
364 Colors can be modified using the SYSTEMTAP_COLORS environment
365 variable. The format must be of the form
366 key1=val1:key2=val2:key3=val3 ...etc. Valid keys are "error",
367 "warning", "source", "caret", and "token". Values constitute
368 Select Graphic Rendition (SGR) parameter(s). Consult the docu‐
369 mentation of your terminal for the SGRs it supports. As an exam‐
370 ple, the default colors would be expressed as
371 error=01;31:warning=00;33:source=00;34:caret=01:token=01. If
372 SYSTEMTAP_COLORS is absent, the default colors will be used. If
373 it is empty or invalid, coloring is turned off.
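
The key=val:key=val format can be made concrete with a short shell
sketch. It sets SYSTEMTAP_COLORS to the default value quoted above and
splits it into its pairs. The helper list_colors is hypothetical and
exists only to show the structure of the variable; it is not part of
systemtap.

```shell
# The default colors quoted above, set explicitly.
SYSTEMTAP_COLORS='error=01;31:warning=00;33:source=00;34:caret=01:token=01'
export SYSTEMTAP_COLORS

# Hypothetical helper: print one "key SGR-parameters" pair per line.
list_colors() {
    old_ifs=$IFS
    IFS=:
    set -f                      # no filename expansion while splitting
    for pair in $1; do
        printf '%s %s\n' "${pair%%=*}" "${pair#*=}"
    done
    set +f
    IFS=$old_ifs
}

list_colors "$SYSTEMTAP_COLORS"
# prints:
#   error 01;31
#   warning 00;33
#   source 00;34
#   caret 01
#   token 01
```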
374
375
376 --disable-cache
377 This option disables all use of the cache directory. No files
378 will be either read from or written to the cache.
379
380
381 --poison-cache
This option treats files in the cache directory as invalid. No
files will be read from the cache, but resulting files from this
run will still be written to the cache. This is meant as a
troubleshooting aid when stap's use of cached files seems to be
causing misbehavior. If it helps, there is probably a bug in
systemtap that the developers would like you to report.
388
389
390 --privilege[=stapusr | =stapsys | =stapdev]
This option instructs stap to examine the script looking for
constructs which are not allowed for the specified privilege
level (see UNPRIVILEGED USERS). Compilation fails if any such
constructs are used. If stapusr or stapsys is specified when
using a compile server (see --use-server), the server will
examine the script and, if compilation succeeds, the server will
cryptographically sign the resulting kernel module, certifying
that it is safe for use by users at the specified privilege
level.
400
401 If --privilege has not been specified, -pN has not been speci‐
402 fied with N < 5, and the invoking user is not root, and is not a
403 member of the group stapdev, then stap will automatically add
404 the appropriate --privilege option to the options already speci‐
405 fied.
406
407
408 --unprivileged
409 This option is equivalent to --privilege=stapusr.
410
411
412 --use-server[=HOSTNAME[:PORT] | =IP_ADDRESS[:PORT] | =CERT_SERIAL]
413 Specify compile-server(s) to be used for compilation and/or in
414 conjunction with --list-servers and --trust-servers (see below)
415 for listing. If no argument is supplied, then the default in un‐
416 privileged mode (see --privilege) is to select compatible
417 servers which are trusted as SSL peers and as module signers and
418 currently online. Otherwise the default is to select compatible
419 servers which are trusted as SSL peers and currently online.
--use-server may be specified more than once, in which case a
list of servers is accumulated in the order specified. Servers
may be specified by host name, by IP address, or by certificate
serial number (obtained using --list-servers). The latter is most
commonly used when adding or revoking trust in a server (see
--trust-servers below). If a server is specified by host name or
IP address, then an optional port number may be specified. This
is useful for accessing servers which are not on the local
network or for selecting a particular server.
429
430 IP addresses may be IPv4 or IPv6 addresses.
431
432 If a particular IPv6 address is link local and exists on more
433 than one interface, the intended interface may be specified by
434 appending the address with a percent sign (%) followed by the
435 intended interface name. For example,
436 "fe80::5eff:35ff:fe07:55ca%eth0".
437
438 In order to specify a port number with an IPv6 address, it is
439 necessary to enclose the IPv6 address in square brackets ([]) in
440 order to separate the port number from the rest of the address.
441 For example, "[fe80::5eff:35ff:fe07:55ca]:5000" or
442 "[fe80::5eff:35ff:fe07:55ca%eth0]:5000".
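
The reason for the square brackets is that an IPv6 address itself
contains colons, so a trailing ":PORT" would otherwise be ambiguous.
The following shell sketch models how such a specification can be
split. It is an illustration only, not stap's parser; the function
name split_server and the host name example.org are hypothetical.

```shell
# Model only: split a server specification into host and port.
split_server() {
    spec=$1
    case $spec in
        \[*\]:*) host=${spec#\[}; host=${host%%\]*}; port=${spec##*\]:} ;;
        \[*\])   host=${spec#\[}; host=${host%\]};   port= ;;
        *:*:*)   host=$spec; port= ;;   # bare IPv6: colons belong to the address
        *:*)     host=${spec%:*}; port=${spec##*:} ;;
        *)       host=$spec; port= ;;
    esac
    printf 'host=%s port=%s\n' "$host" "$port"
}

split_server '[fe80::5eff:35ff:fe07:55ca%eth0]:5000'
# prints: host=fe80::5eff:35ff:fe07:55ca%eth0 port=5000
split_server 'fe80::5eff:35ff:fe07:55ca%eth0'
# prints: host=fe80::5eff:35ff:fe07:55ca%eth0 port=
split_server 'example.org:5000'
# prints: host=example.org port=5000
```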
443
If --use-server has not been specified, -pN has not been
specified with N < 5, and the invoking user is not root and is not
a member of the group stapdev, but is a member of the group
stapusr, then stap will automatically add --use-server to the
options already specified.
449
450
451 --use-server-on-error[=yes|=no]
452 Instructs stap to retry compilation of a script using a compile
453 server if compilation on the local host fails in a manner which
454 suggests that it might succeed using a server. If this option
455 is not specified, the default is no. If no argument is provid‐
456 ed, then the default is yes. Compilation will be retried for
457 certain types of errors (e.g. insufficient data or resources)
458 which may not occur during re-compilation by a compile server.
459 Compile servers will be selected automatically for the re-compi‐
460 lation attempt as if --use-server was specified with no argu‐
461 ments.
462
463
464 --list-servers[=SERVERS]
465 Display the status of the requested SERVERS, where SERVERS is a
466 comma-separated list of server attributes. The list of at‐
467 tributes is combined to filter the list of servers displayed.
468 Supported attributes are:
469
470 all specifies all known servers (trusted SSL peers, trusted
471 module signers, online servers).
472
473 specified
474 specifies servers specified using --use-server.
475
476 online filters the output by retaining information about servers
477 which are currently online.
478
479 trusted
480 filters the output by retaining information about servers
481 which are trusted as SSL peers.
482
483 signer filters the output by retaining information about servers
484 which are trusted as module signers (see --privilege).
485
486 compatible
487 filters the output by retaining information about servers
488 which are compatible with the current kernel release and
489 architecture.
490
491 If no argument is provided, then the default is specified. If
492 no servers were specified using --use-server, then the default
493 servers for --use-server are listed.
494
495 Note that --list-servers uses the avahi-daemon service to detect
496 online servers. If this service is not available, then
497 --list-servers will fail to detect any online servers. In order
498 for --list-servers to detect servers listening on IPv6 address‐
499 es, the avahi-daemon configuration file /etc/avahi/avahi-dae‐
500 mon.conf must contain an active "use-ipv6=yes" line. The service
501 must be restarted after adding this line in order for IPv6 to be
502 enabled.
503
504
505 --trust-servers[=TRUST_SPEC]
506 Grant or revoke trust in compile-servers, specified using
507 --use-server as specified by TRUST_SPEC, where TRUST_SPEC is a
508 comma-separated list specifying the trust which is to be granted
509 or revoked. Supported elements are:
510
511 ssl trust the specified servers as SSL peers.
512
513 signer trust the specified servers as module signers (see
514 --privilege). Only root can specify signer.
515
516 all-users
517 grant trust as an ssl peer for all users on the local
518 host. The default is to grant trust as an ssl peer for
519 the current user only. Trust as a module signer is always
520 granted for all users. Only root can specify all-users.
521
522 revoke revoke the specified trust. The default is to grant it.
523
524 no-prompt
525 do not prompt the user for confirmation before carrying
526 out the requested action. The default is to prompt the
527 user for confirmation.
528
529 If no argument is provided, then the default is ssl. If no
530 servers were specified using --use-server, then no trust will be
531 granted or revoked.
532
533 Unless no-prompt has been specified, the user will be prompted
534 to confirm the trust to be granted or revoked before the opera‐
535 tion is performed.
536
537
538 --dump-probe-types
539 Dumps a list of supported probe types and exits. If --privi‐
540 lege=stapusr is also specified, the list will be limited to
541 probe types available to unprivileged users.
542
543
544 --dump-probe-aliases
545 Dumps a list of all probe aliases found in library files and ex‐
546 its.
547
548
549 --dump-functions
Dumps a list of all the public functions found in library files
and exits. Also includes their parameters and types. A function
of type 'unknown' indicates a function that does not return a
value. Note that not all function/parameter types may be
resolved (these are also shown by 'unknown'). This feature is
very memory-intensive and thus may not work properly with
--use-server if the target server imposes an rlimit on process
memory (i.e. through the ~stap-server/.systemtap/rc configuration
file, see stap-server(8)).
559
560
561 --remote URL
562 Set the execution target to the given host. This option may be
563 repeated to target multiple execution targets. Passes 1-4 are
564 completed locally as normal to build the script, and then pass 5
565 will copy the module to the target and run it. Acceptable URL
566 forms include:
567
568 [USER@]HOSTNAME, ssh://[USER@]HOSTNAME
569 This mode uses ssh, optionally using a username not
570 matching your own. If a custom ssh_config file is in use,
571 add SendEnv LANG to retain internationalization function‐
572 ality.
573
574 libvirt://DOMAIN, libvirt://DOMAIN/LIBVIRT_URI
575 This mode uses stapvirt to execute the script on a domain
576 managed by libvirt. Optionally, LIBVIRT_URI may be speci‐
577 fied to connect to a specific driver and/or a remote
578 host. For example, to connect to the local privileged QE‐
579 MU driver, use:
580
581 --remote libvirt://MyDomain/qemu:///system
582
583 See the page at <http://libvirt.org/uri.html> for sup‐
584 ported URIs. Also see stapvirt(1) for more information on
585 how to prepare the domain for stap probing.
586
587 unix:PATH
588 This mode connects to a UNIX socket. This can be used
589 with a QEMU virtio-serial port for executing scripts in‐
590 side a running virtual machine.
591
592 direct://
593 Special loopback mode to run on the local host.
594
595 --remote-prefix
596 Prefix each line of remote output with "N: ", where N is the in‐
597 dex of the remote execution target from which the given line
598 originated.
599
600
601 --download-debuginfo[=OPTION]
602 Enable, disable or set a timeout for the automatic debuginfo
603 downloading feature offered by abrt as specified by OPTION,
604 where OPTION is one of the following:
605
606 yes enable automatic downloading of debuginfo with no time‐
607 out. This is the same as not providing an OPTION value to
608 --download-debuginfo
609
610 no explicitly disable automatic downloading of debuginfo.
611 This is the same as not using the option at all.
612
613 ask show abrt output, and ask before continuing download. No
614 timeout will be set.
615
616 <timeout>
617 specify a timeout as a positive number to stop the down‐
618 load if it is taking longer than <timeout> seconds.
619
620 --rlimit-as=NUM
621 Specify the maximum size of the process's virtual memory (ad‐
622 dress space), in bytes.
623
624
625 --rlimit-cpu=NUM
626 Specify the CPU time limit, in seconds.
627
628
629 --rlimit-nproc=NUM
630 Specify the maximum number of processes that can be created.
631
632
633 --rlimit-stack=NUM
634 Specify the maximum size of the process stack, in bytes.
635
636
637 --rlimit-fsize=NUM
638 Specify the maximum size of files that the process may create,
639 in bytes.
640
641
642 --sysroot=DIR
643 Specify sysroot directory where target files (executables, li‐
644 braries, etc.) are located. With -r RELEASE, the sysroot will
645 be searched for the appropriate kernel build directory. With -r
646 /DIR, however, the sysroot will not be used to find the kernel
647 build.
648
649
650 --sysenv=VAR=VALUE
651 Provide an alternate value for an environment variable where the
652 value on a remote system differs. Path variables (e.g. PATH,
653 LD_LIBRARY_PATH) are assumed to be relative to the directory
654 provided by --sysroot, if provided.
655
656
657 --suppress-time-limits
658 Disable -DSTP_OVERLOAD related options as well as -DMAXACTION
659 and -DMAXTRYLOCK. This option requires guru mode.
660
661
662 --runtime=MODE
663 Set the pass-5 runtime mode. Valid options are kernel (de‐
664 fault), dyninst and bpf. See ALTERNATE RUNTIMES below for more
665 information.
666
667
668 --dyninst
669 Shorthand for --runtime=dyninst.
670
671
672 --bpf Shorthand for --runtime=bpf.
673
674
675 --save-uprobes
676 On machines that require SystemTap to build its own uprobes mod‐
677 ule (kernels prior to version 3.5), this option instructs Sys‐
678 temTap to also save a copy of the module in the current directo‐
679 ry (creating a new "uprobes" directory first).
680
681
682 --target-namespaces=PID
Allow for a set of target namespaces to be set based on the
namespaces the given PID is in. This is for namespace-aware
tapset functions. If the target namespaces are not set, the
target defaults to the stap process's namespaces.
687
688
689 --monitor=INTERVAL
Enables an interface to display status information about the
module (uptime, module name, invoker uid, memory sizes, global
variables, list of probes with their statistics). An optional
argument INTERVAL can be supplied to set the refresh rate in
seconds of the status window. The module can also be controlled
by a list of commands using the following keys:
696
697 c Resets all global variables to their initial values or
698 zeroes them if they did not have an initial value.
699
700 s Rotates the attribute used to sort the list of probes.
701
t Brings up a prompt to allow toggling (on/off) of probes by
index. Probe points are still affected by their conditions.
705
706 r Resumes the script by toggling on all probes.
707
708 p Pauses the script by toggling off all probes.
709
710 x Hides/shows the status window. This allows for more out‐
711 put to be seen.
712
713 navigation-keys
714 The navigation keys can be used to scroll up and down the
715 windows.
716
717 Tab Toggle scrolling between status and output windows.
718
719
720 --example
721 This option is used to run example scripts without having to en‐
722 ter the entire path to the script. Example scripts can be found
723 in the directory specified in the stappaths(7) manual page.

ARGUMENTS
727 Any additional arguments on the command line are passed to the script
728 parser for substitution. See below.

SCRIPT LANGUAGE
732 The systemtap script language resembles awk and C. There are two main
733 outermost constructs: probes and functions. Within these, statements
734 and expressions use C-like operator syntax and precedence.
735
736
737 GENERAL SYNTAX
738 Whitespace is ignored. Three forms of comments are supported:
739 # ... shell style, to the end of line, except for $# and @#
740 // ... C++ style, to the end of line
741 /* ... C style ... */
742 Literals are either strings enclosed in double-quotes (passing through
743 the usual C escape codes with backslashes, and with adjacent string
744 literals glued together, also as in C), or integers (in decimal, hexa‐
745 decimal, or octal, using the same notation as in C). All strings are
746 limited in length to some reasonable value (a few hundred bytes). In‐
747 tegers are 64-bit signed quantities, although the parser also accepts
748 (and wraps around) values above positive 2**63.
749
750 In addition, script arguments given at the end of the command line may
751 be inserted. Use $1 ... $<NN> for insertion unquoted, @1 ... @<NN> for
752 insertion as a string literal. The number of arguments may be accessed
753 through $# (as an unquoted number) or through @# (as a quoted number).
754 These may be used at any place a token may begin, including within the
755 preprocessing stage. Reference to an argument number beyond what was
756 actually given is an error.
757
758
759 PREPROCESSING
760 A simple conditional preprocessing stage is run as a part of parsing.
761 The general form is similar to the cond ? exp1 : exp2 ternary operator:
762
763 %( CONDITION %? TRUE-TOKENS %)
764 %( CONDITION %? TRUE-TOKENS %: FALSE-TOKENS %)
765
The CONDITION is either an expression whose format is determined by its
first keyword, or a comparison of string literals, or a comparison of
numeric literals. It can also be composed of several CONDITIONs (in the
sense of the previous sentence) joined by || (alternatives) and &&
(conjunctions). However, parentheses are not supported yet, so it is
important to remember that && takes precedence over ||.
772
773 If the first part is the identifier kernel_vr or kernel_v to refer to
774 the kernel version number, with ("2.6.13-1.322FC3smp") or without
775 ("2.6.13") the release code suffix, then the second part is one of the
776 six standard numeric comparison operators <, <=, ==, !=, >, and >=, and
777 the third part is a string literal that contains an RPM-style version-
778 release value. The condition is deemed satisfied if the version of the
779 target kernel (as optionally overridden by the -r option) compares to
780 the given version string as the operator requires. The comparison is
781 performed by the glibc function strverscmp. As a special case, if the
782 operator is simple equality (==) or inequality (!=), and the third part
783 contains any wildcard characters (* or ? or [), then the expression is
784 treated as a wildcard (mis)match as evaluated by fnmatch.
785
786 If, on the other hand, the first part is the identifier arch to refer
787 to the processor architecture (as named by the kernel build system
788 ARCH/SUBARCH), then the second part is one of the two string comparison
789 operators == or !=, and the third part is a string literal for matching
790 it. This comparison is a wildcard (mis)match.
791
792 Similarly, if the first part is an identifier like CONFIG_something to
793 refer to a kernel configuration option, then the second part is == or
794 !=, and the third part is a string literal for matching the value (com‐
795 monly "y" or "m"). Nonexistent or unset kernel configuration options
796 are represented by the empty string. This comparison is also a wild‐
797 card (mis)match.
798
799 If the first part is the identifier systemtap_v, the test refers to the
800 systemtap compatibility version, which may be overridden for old
801 scripts with the --compatible flag. The comparison operator is as is
802 for kernel_v and the right operand is a version string. See also the
803 DEPRECATION section below.
804
805 If the first part is the identifier systemtap_privilege, the test
806 refers to the privilege level that the systemtap script is compiled
807 with. Here the second part is == or !=, and the third part is a string
808 literal, either "stapusr" or "stapsys" or "stapdev".
809
810 If the first part is the identifier guru_mode, the test refers to
811 whether the systemtap script is compiled in guru mode. Here the second
812 part is == or !=, and the third part is a number, either 1 or 0.
813
814 If the first part is the identifier runtime, the test refers to the
815 systemtap runtime mode. See ALTERNATE RUNTIMES below for more informa‐
816 tion on runtimes. The second part is one of the two string comparison
817 operators == or !=, and the third part is a string literal for matching
818 it. This comparison is a wildcard (mis)match.
819
820 Otherwise, the CONDITION is expected to be a comparison between two
821 string literals or two numeric literals. In this case, the script ar‐
822 guments ($1, @1, $#, ...) are the only variables usable.
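
The condition forms above can be sketched together as follows. This is
an illustration only: the messages are arbitrary, and CONFIG_SMP is used
merely as an example of a kernel configuration option.

```systemtap
probe begin {
  # Numeric literal comparison on the argument count.
  %( $# >= 1 %? printf("first argument: %s\n", @1)
             %: println("no arguments given") %)

  # No parentheses are available; && binds tighter than ||, so this reads
  #   arch == "x86_64" || (arch == "i386" && CONFIG_SMP == "y")
  %( arch == "x86_64" || arch == "i386" && CONFIG_SMP == "y" %?
     println("matching configuration")
  %:
     println("other configuration")
  %)

  exit()
}
```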
823
824 The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser tokens
825 (possibly including nested preprocessor conditionals), and are passed
826 into the input stream if the condition is true or false. For example,
827 the following code induces a parse error unless the target kernel ver‐
828 sion is newer than 2.6.5:
829
830 %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence
831
832 The following code might adapt to hypothetical kernel version drift:
833
834 probe kernel.function (
835 %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
836 %( kernel_vr == "2.6.13*smp" %? "do_page_fault" %:
837 UNSUPPORTED %) %)
838 ) { /* ... */ }
839
840 %( arch == "ia64" %?
841 probe syscall.vliw = kernel.function("vliw_widget") {}
842 %)
843
844
845
846 PREPROCESSOR MACROS
847 The preprocessor also supports a simple macro facility, run as a sepa‐
848 rate pass before conditional preprocessing.
849
850 Macros are defined using the following construct:
851
852 @define NAME %( BODY %)
853 @define NAME(PARAM_1, PARAM_2, ...) %( BODY %)
854
855 Macros, and parameters inside a macro body, are both invoked by prefix‐
856 ing the macro name with an @ symbol:
857
858 @define foo %( x %)
859 @define add(a,b) %( ((@a)+(@b)) %)
860
861 @foo = @add(2,2)
862
863
864 Macro expansion is currently performed in a separate pass before condi‐
865 tional compilation. Therefore, both TRUE- and FALSE-tokens in condi‐
866 tional expressions will be macroexpanded regardless of how the condi‐
867 tion is evaluated. This can sometimes lead to errors:
868
869 // The following results in a conflict:
870 %( CONFIG_UTRACE == "y" %?
871 @define foo %( process.syscall %)
872 %:
873 @define foo %( **ERROR** %)
874 %)
875
876 // The following works properly as expected:
877 @define foo %(
878 %( CONFIG_UTRACE == "y" %? process.syscall %: **ERROR** %)
879 %)
880
881 The first example is incorrect because both @defines are evaluated in a
882 pass prior to the conditional being evaluated.
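
A complete script using a parameterized macro might look like the
following sketch. The macro name elapsed_ms and the variable t0 are
hypothetical; gettimeofday_ms() is a standard tapset function.

```systemtap
# Expands to an expression; the parameter is referenced as @start.
@define elapsed_ms(start) %( (gettimeofday_ms() - (@start)) %)

global t0

probe begin { t0 = gettimeofday_ms() }

probe timer.s(1) {
  printf("elapsed: %d ms\n", @elapsed_ms(t0))
  exit()
}
```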
883
884 Normally, a macro definition is local to the file it occurs in. Thus,
885 defining a macro in a tapset does not make it available to the user of
886 the tapset. Publicly available library macros can be defined by in‐
887 cluding .stpm files on the tapset search path. These files may only
888 contain @define constructs, which become visible across all tapsets and
889 user scripts. Optionally, within the .stpm files, a public macro defi‐
890 nition can be surrounded by a preprocessor conditional as described
891 above.
892
893
894 CONSTANTS
895 Tapsets or guru-mode user scripts can access header file constant to‐
896 kens, typically macros, using the built-in @const() operator. The re‐
897 spective header file can be included either via the tapset library, or
898 using a top-level guru mode embedded-C construct. This results in the
899 appropriate embedded-C pragma comments being set.
900
901 @const("STP_SKIP_BADVARS")
902
903
904
905 VARIABLES
906 Identifiers for variables and functions are an alphanumeric sequence,
907 and may include _ and $ characters. They may not start with a plain
908 digit, as in C. Each variable is by default local to the probe or
909 function statement block within which it is mentioned, and therefore
910 its scope and lifetime is limited to a particular probe or function in‐
911 vocation.
912
913 Scalar variables are implicitly typed as either string or integer. As‐
914 sociative arrays also have a string or integer value, and a tuple of
915 strings and/or integers serving as a key. Here are a few basic expres‐
916 sions.
917
918 var1 = 5
919 var2 = "bar"
920 array1 [pid()] = "name" # single numeric key
921 array2 ["foo",4,i++] += 5 # vector of string/num/num keys
922 if (["hello",5,4] in array2) println ("yes") # membership test
923
924
925 The translator performs type inference on all identifiers, including
926 array indexes and function parameters. Inconsistent type-related use
927 of identifiers signals an error.
928
929 Variables may be declared global, so that they are shared amongst all
930 probes and functions and live as long as the entire systemtap session.
931 There is one namespace for all global variables, regardless of which
932 script file they are found within. Concurrent access to global vari‐
933 ables is automatically protected with locks; see the SAFETY AND SECURI‐
934 TY section for more details. A global declaration may be written at
935 the outermost level anywhere, not within a block of code. Global vari‐
936 ables which are written but never read will be displayed automatically
937 at session shutdown. The translator will infer for each its value
938 type, and if it is used as an array, its key types. Optionally, scalar
939 globals may be initialized with a string or number literal. The fol‐
940 lowing declaration marks variables as global.
941
942 global var1, var2, var3=4
943
944
945 Global variables can also be set as module options. One can do this
946 either by using the -G option, or by first compiling the module using
947 stap -p4 and then setting the global variables on the command line when
948 calling staprun on the generated module. See staprun(8) for more in‐
949 formation.
950
951 The scope of a global variable may be limited to a tapset or user
952 script file using the private keyword. The global keyword is optional
953 when defining a private global variable. The following declaration
954 marks var1 and var2 as private globals.
955
956 private global var1=2
957 private var2
958
959
960 Arrays are limited in size by the MAXMAPENTRIES variable -- see the
961 SAFETY AND SECURITY section for details. Optionally, global arrays may
962 be declared with a maximum size in brackets, overriding MAXMAPENTRIES
963 for that array only. Note that this doesn't indicate the type of keys
964 for the array, just the size.
965
966 global tiny_array[10], normal_array, big_array[50000]
967
968
969 Arrays may be configured for wrapping using the '%' suffix. This caus‐
970 es older elements to be overwritten if more elements are inserted than
971 the array can hold. This works for both associative and statistics
972 typed arrays.
973
974 global wrapped_array1%[10], wrapped_array2%
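
The declarations above can be exercised with a small sketch such as the
following. The names recent and total are hypothetical, and no output is
shown because which entries survive depends on the wrapping behaviour.

```systemtap
global recent%[3]     # wrapping array holding at most 3 entries
global total = 0      # scalar global with a literal initializer

probe begin {
  for (i = 1; i <= 5; i++) {
    recent[i] = i * i   # once full, new entries overwrite older ones
    total += i
  }
  foreach (k in recent)
    printf("recent[%d] = %d\n", k, recent[k])
  printf("total = %d\n", total)
  exit()
}
```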
975
976
977
978 Many types of probe points provide context variables, which are run-
979 time values, safely extracted from the kernel or userspace program be‐
980 ing probed. These are prefixed with the $ character. The CONTEXT
981 VARIABLES section in stapprobes(3stap) lists what is available for each
982 type of probe point. These context variables become normal string or
983 numeric scalars once they are stored in normal script variables. See
984 the TYPECASTING section below on how to turn them back into typed
985 pointers for further processing as context variables.
986
987
988 STATEMENTS
989 Statements enable procedural control flow. They may occur within func‐
990 tions and probe handlers. The total number of statements executed in
991 response to any single probe event is limited to some number defined by
992 the MAXACTION macro in the translated C code, and is in the neighbour‐
993 hood of 1000.
994
995 EXP Execute the string- or integer-valued expression and throw away
996 the value.
997
998 { STMT1 STMT2 ... }
999 Execute each statement in sequence in this block. Note that
1000 separators or terminators are generally not necessary between
1001 statements.
1002
1003 ; Null statement, do nothing. It is useful as an optional separa‐
1004 tor between statements to improve syntax-error detection and to
1005 handle certain grammar ambiguities.
1006
1007 if (EXP) STMT1 [ else STMT2 ]
1008 Compare integer-valued EXP to zero. Execute the first (non-ze‐
1009 ro) or second STMT (zero).
1010
1011 while (EXP) STMT
1012 While integer-valued EXP evaluates to non-zero, execute STMT.
1013
1014 for (EXP1; EXP2; EXP3) STMT
1015 Execute EXP1 as initialization. While EXP2 is non-zero, execute
1016 STMT, then the iteration expression EXP3.
1017
1018 foreach (VAR in ARRAY [ limit EXP ]) STMT
1019 Loop over each element of the named global array, assigning the
1020 current key to VAR. The array may not be modified within the
1021 statement. By adding a single + or - operator after the VAR or
1022 the ARRAY identifier, the iteration will proceed in a sorted or‐
1023 der, by ascending or descending index or value. If the array
1024 contains statistics aggregates, adding the desired @operator be‐
1025 tween the ARRAY identifier and the + or - will specify the sort‐
1026 ing aggregate function. See the STATISTICS section below for
1027 the ones available. Default is @count. Using the optional lim‐
1028 it keyword limits the number of loop iterations to EXP times.
1029 EXP is evaluated once at the beginning of the loop.
1030
1031 foreach ([VAR1, VAR2, ...] in ARRAY [ limit EXP ]) STMT
1032 Same as above, used when the array is indexed with a tuple of
1033 keys. A sorting suffix may be used on at most one VAR or ARRAY
1034 identifier.
1035
1036 foreach ([VAR1, VAR2, ...] in ARRAY [INDEX1, INDEX2, ...] [ limit EXP
1037 ]) STMT
1038 Same as above, where iterations are limited to elements in the
1039 array where the keys match the index values specified. The sym‐
1040 bol * can be used to specify an index and will be treated as a
1041 wildcard.
1042
1043 foreach (VAR0 = VAR in ARRAY [ limit EXP ]) STMT
1044 This variant of foreach saves the current value into VAR0 on each
1045 iteration, so VAR0 is the same as ARRAY[VAR]. This also works
1046 with a tuple of keys. Sorting suffixes on VAR0 have the same
1047 effect as on ARRAY.
1048
1049 foreach (VAR0 = VAR in ARRAY [INDEX1, INDEX2, ...] [ limit EXP ]) STMT
1050 Same as above, where iterations are limited to elements in the
1051 array where the keys match the index values specified. The sym‐
1052 bol * can be used to specify an index and will be treated as a
1053 wildcard.
1054
1055 break, continue
1056 Exit or iterate the innermost nesting loop (while or for or
1057 foreach) statement.
1058
1059 return EXP
1060 Return EXP value from enclosing function. If the function's
1061 value is not taken anywhere, then a return statement is not
1062 needed, and the function will have a special "unknown" type with
1063 no return value.
1064
1065 next Return now from enclosing probe handler. This is especially
1066 useful in probe aliases that apply event filtering predicates.
1067 When used in functions, the execution will be immediately trans‐
1068 ferred to the next overloaded function.
1069
1070 try { STMT1 } catch { STMT2 }
1071 Run the statements in the first block. Upon any run-time er‐
1072 rors, abort STMT1 and start executing STMT2. Any errors in
1073 STMT2 will propagate to outer try/catch blocks, if any.
1074
1075 try { STMT1 } catch(VAR) { STMT2 }
1076 Same as above, plus assign the error message to the string
1077 scalar variable VAR.
1078
1079 delete ARRAY[INDEX1, INDEX2, ...]
1080 Remove from ARRAY the element specified by the index tuple. If
1081 the index tuple contains a * in place of an index, the * is
1082 treated as a wildcard and all elements with keys that match the
1083 index tuple will be removed from ARRAY. The value will no
1084 longer be available, and subsequent iterations will not report
1085 the element. It is not an error to delete an element that does
1086 not exist.
1087
1088 delete ARRAY
1089 Remove all elements from ARRAY.
1090
1091 delete SCALAR
1092 Removes the value of SCALAR. Integers and strings are cleared
1093 to 0 and "" respectively, while statistics are reset to the ini‐
1094 tial empty state.
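
Several of these statements can be combined in one handler, as in the
sketch below. The array name counts and its contents are hypothetical;
error() is the standard tapset function that raises a run-time error.

```systemtap
global counts

probe begin {
  counts["a"] = 3
  counts["b"] = 1
  counts["c"] = 2

  # Descending by value, at most two iterations: visits "a", then "c".
  foreach (name in counts- limit 2)
    printf("%s = %d\n", name, counts[name])

  # VAR0 = VAR form: val holds the same value as counts[name].
  foreach (val = name in counts)
    printf("%s = %d\n", name, val)

  # A run-time error inside try transfers control to the catch block.
  try {
    error("something went wrong")
  } catch (msg) {
    println("caught: ", msg)
  }

  delete counts["b"]   # remove a single element
  delete counts        # remove all remaining elements
  exit()
}
```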
1095
1096
1097 EXPRESSIONS
1098 Systemtap supports a number of operators that have the same general
1099 syntax, semantics, and precedence as in C and awk. Arithmetic is per‐
1100 formed as per typical C rules for signed integers. Division by zero or
1101 overflow is detected and results in an error.
1102
1103 binary numeric operators
1104 * / % + - >> << & ^ | && ||
1105
1106 binary string operators
1107 . (string concatenation)
1108
1109 numeric assignment operators
1110 = *= /= %= += -= >>= <<= &= ^= |=
1111
1112 string assignment operators
1113 = .=
1114
1115 unary numeric operators
1116 + - ! ~ ++ --
1117
1118 binary numeric, string comparison or regex matching operators
1119 < > <= >= == != =~ !~
1120
1121 ternary operator
1122 cond ? exp1 : exp2
1123
1124 grouping operator
1125 ( exp )
1126
1127 function call
1128 fn ([ arg1, arg2, ... ])
1129
1130 array membership check
1131 exp in array
1132 [exp1, exp2, ...] in array
1133 [*, *, ... ] in array
1134
1135
1136 REGULAR EXPRESSION MATCHING
1137 The scripting language supports regular expression matching. The basic
1138 syntax is as follows:
1139
1140 exp =~ regex
1141 exp !~ regex
1142
1143 (The first operand must be an expression evaluating to a string; the
1144 second operand must be a string literal containing a syntactically
1145 valid regular expression.)
1146
1147 The regular expression syntax supports most of the features of POSIX
1148 Extended Regular Expressions, except for subexpression reuse ("\1")
1149 functionality.
1150
1151 After a successful match, the contents of the matched string and subex‐
1152 pressions can be extracted using the matched() and ngroups() tapset
1153 functions as follows:
1154
1155 if ("an example string" =~ "str(ing)") {
1156 matched(0) // -> returns "string", the matched substring
1157 matched(1) // -> returns "ing", the 1st matched subexpression
1158 ngroups() // -> returns 2, the number of matched groups
1159 }
1160
1161
1162 PROBES
1163 The main construct in the scripting language identifies probes. Probes
1164 associate abstract events with a statement block ("probe handler") that
1165 is to be executed when any of those events occur. The general syntax
1166 is as follows:
1167
1168 probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
1169 probe PROBEPOINT [, PROBEPOINT] if (CONDITION) { [STMT ...] }
1170
1171
1172 Events are specified in a special syntax called "probe points". There
1173 are several varieties of probe points defined by the translator, and
1174 tapset scripts may define further ones using aliases. Probe points may
1175 be wildcarded, grouped, or listed in preference sequences, or declared
1176 optional. More details on probe point syntax and semantics are listed
1177 on the stapprobes(3stap) manual page.
1178
1179 The probe handler is interpreted relative to the context of each event.
1180 For events associated with kernel code, this context may include vari‐
1181 ables defined in the source code at that spot. These "context vari‐
1182 ables" are presented to the script as variables whose names are pre‐
1183 fixed with "$". They may be accessed only if the kernel's compiler
1184 preserved them despite optimization. This is the same constraint that
1185 a debugger user faces when working with optimized code. In addition,
1186 the objects must exist in paged-in memory at the moment of the system‐
1187 tap probe handler's execution, because systemtap must not cause (and so
1188 suppresses) any additional paging. Some probe types have very little
1189 context. See the stapprobes(3stap) man page for the kinds of context
1190 variables available at each kind of probe point.
1191
1192 Probes may be decorated with an arming condition, consisting of a sim‐
1193 ple boolean expression on read-only global script variables. While
1194 disarmed (inactive, condition evaluates to false), some probe types re‐
1195 duce or eliminate their run-time overheads. When an arming condition
1196 evaluates to true, probes will soon be re-armed, and their probe han‐
1197 dlers will start getting called as the events fire. (Some events may
1198 be lost during the arming interval. If this is unacceptable, do not
1199 use arming conditions for those probes.) Example of the syntax:
1200
1201 probe timer.us(TIMER) if (enabled) {
1202 }
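
A fuller sketch of an arming condition follows. The variable name
enabled and the timer intervals are arbitrary choices for illustration.

```systemtap
global enabled = 0

# Toggle the arming condition every five seconds.
probe timer.s(5) { enabled = !enabled }

# This handler runs only while enabled is non-zero; ticks that occur
# during the arming interval may be lost.
probe timer.ms(100) if (enabled) {
  println("tick")
}

probe timer.s(30) { exit() }
```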
1203
1204
1205 New probe points may be defined using "aliases". Probe point aliases
1206 look similar to probe definitions, but instead of activating a probe at
1207 the given point, an alias just defines a new probe point name as an
1208 alias to an existing one. There are two types of alias, the prologue
1209 style and the epilogue style, which are identified by "=" and "+=" re‐
1210 spectively.
1211
1212 For a prologue style alias, the statement block that follows the alias
1213 definition is implicitly added as a prologue to any probe that refers
1214 to the alias. For an epilogue style alias, the statement block that
1215 follows the alias definition is implicitly added as an epilogue to any
1216 probe that refers to the alias. For example:
1217
1218 probe syscall.read = kernel.function("sys_read") {
1219 fildes = $fd
1220 if (execname() == "init") next # skip rest of probe
1221 }
1222
1223 defines a new probe point syscall.read, which expands to
1224 kernel.function("sys_read"), with the given statement as a prologue,
1225 which is useful to predefine some variables for the alias user and/or
1226 to skip probe processing entirely based on some conditions. And
1227
1228 probe syscall.read += kernel.function("sys_read") {
1229 if (tracethis) println ($fd)
1230 }
1231
1232 defines a new probe point with the given statement as an epilogue,
1233 which is useful to take actions based upon variables set or left over
1234 by the alias user. Please note that in each case, the statements
1235 in the alias handler block are treated ordinarily, so that variables
1236 assigned there constitute mere initialization, not a macro substitu‐
1237 tion.
1238
1239 An alias is used just like a built-in probe type.
1240
1241 probe syscall.read {
1242 printf("reading fd=%d\n", fildes)
1243 if (fildes > 10) tracethis = 1
1244 }
1245
1246
1247
1248 FUNCTIONS
1249 Systemtap scripts may define subroutines to factor out common work.
1250 Functions take any number of scalar (integer or string) arguments, and
1251 must return a single scalar (integer or string). An example function
1252 declaration looks like this:
1253
1254 function thisfn (arg1, arg2) {
1255 return arg1 + arg2
1256 }
1257
1258 Note the general absence of type declarations, which are instead in‐
1259 ferred by the translator. However, if desired, a function definition
1260 may include explicit type declarations for its return value and/or its
1261 arguments. This is especially helpful for embedded-C functions. In
1262 the following example, the type inference engine need only infer the
1263 type of arg2 (a string).
1264
1265 function thatfn:string (arg1:long, arg2) {
1266 return sprint(arg1) . arg2
1267 }
1268
1269 Functions may call others or themselves recursively, up to a fixed
1270 nesting limit. This limit is defined by the MAXNESTING macro in the
1271 translated C code and is in the neighbourhood of 10.
1272
1273 Functions may be marked private using the private keyword to limit
1274 their scope to the tapset or user script file they are defined in. An
1275 example definition of a private function follows:
1276
1277 private function three:long () { return 3 }
1278
1279
1280 Functions terminating without reaching an explicit return statement
1281 will return an implicit 0 or "", determined by type inference.
1282
1283 Functions may be overloaded during both runtime and compile time.
1284
1285 Runtime overloading allows the executed function to be selected while
1286 the module is running based on runtime conditions and is achieved using
1287 the "next" statement in script functions and STAP_NEXT macro for embed‐
1288 ded-C functions. For example,
1289
1290
1291 function f() { if (condition) next; print("first function") }
1292 function f() %{ STAP_NEXT; print("second function") %}
1293 function f() { print("third function") }
1294
1295
1296 During a function call f(), the execution will transfer to the third
1297 function if condition evaluates to true and print "third function".
1298 Note that the second function unconditionally invokes STAP_NEXT.
1299
1300 Parameter overloading allows the function to be executed to be selected
1301 at compile time based on the number of arguments provided to the func‐
1302 tion call. For example,
1303
1304
1305 function g() { print("first function") }
1306 function g(x) { print("second function") }
1307 g() -> "first function"
1308 g(1) -> "second function"
1309
1310
1311 Note that runtime overloading does not occur in the above example, as
1312 exactly one function will be resolved for the function call. The use of
1313 a next statement inside a function while no more overloads remain will
1314 trigger a runtime exception. Runtime overloading will only occur if the
1315 functions have the same arity; functions with the same name but a dif‐
1316 ferent number of parameters are completely unrelated.
1317
1318 Execution order is determined by a priority value which may be speci‐
1319 fied. If no explicit priority is specified, user script functions are
1320 given a higher priority than library functions. User script functions
1321 and library functions are assigned a default priority value of 0 and 1
1322 respectively. Functions with the same priority are executed in decla‐
1323 ration order. For example,
1324
1325
1326 function f():3 { if (condition) next; print("first function") }
1327 function f():1 { if (condition) next; print("second function") }
1328 function f():2 { print("third function") }
1329
1330
1331 Since the second function has the highest priority (the lowest value),
1332 it is executed first. The first function is never executed, as there
1333 are no "next" statements in the third function to transfer execution.
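
As a minimal runnable sketch of runtime overloading with default
priorities, the hypothetical function describe below has two overloads
of the same arity, tried in declaration order.

```systemtap
# Tried first; gives up on negative input and passes control on.
function describe(n) {
  if (n < 0) next
  return "non-negative"
}

# Tried only when the first overload executes next.
function describe(n) {
  return "negative"
}

probe begin {
  println(describe(5))    # handled by the first overload
  println(describe(-1))   # falls through to the second overload
  exit()
}
```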
1334
1335
1336 PRINTING
1337 There is a set of function names that are specially treated by the
1338 translator. They format values for printing to the standard systemtap
1339 output stream in a more convenient way (note that data generated in the
1340 kernel module need to get transferred to user-space in order to get
1341 printed).
1342
1343 The sprint* variants return the formatted string instead of printing
1344 it.
1345
1346 print, sprint
1347 Print one or more values of any type, concatenated directly to‐
1348 gether.
1349
1350 println, sprintln
1351 Print values like print and sprint, but also append a newline.
1352
1353 printd, sprintd
1354 Take a string delimiter and two or more values of any type, and
1355 print the values with the delimiter interposed. The delimiter
1356 must be a literal string constant.
1357
1358 printdln, sprintdln
1359 Print values with a delimiter like printd and sprintd, but also
1360 append a newline.
1361
1362 printf, sprintf
1363 Take a formatting string and a number of values of corresponding
1364 types, and print them all. The format must be a literal string
1365 constant.
1366
1367 The printf formatting directives are similar to those of C, except that
1368 they are fully type-checked by the translator:
1369
1370 %b Writes a binary blob of the value given, instead of ASCII
1371 text. The width specifier determines the number of bytes
1372 to write; valid specifiers are %b %1b %2b %4b %8b. De‐
1373 fault (%b) is 8 bytes.
1374
1375 %c Character.
1376
1377 %d,%i Signed decimal.
1378
1379 %m Safely reads kernel (without #) or user (with #) memory
1380 at the given address, outputs its content. The optional
1381 precision specifier (not field width) determines the num‐
1382 ber of bytes to read - default is 1 byte. %10.4m prints
1383 4 bytes of the memory in a 10-character-wide field.
1384 Note, on some architectures user memory can still be read
1385 without #.
1386
1387 %M Same as %m, but outputs in hexadecimal. The minimal size
1388 of output is double the optional precision specifier -
1389 default is 1 byte (2 hex chars). %10.4M prints 4 bytes
1390 of the memory as 8 hexadecimal characters in a 10-charac‐
1391 ter-wide field. %.*M hex-dumps a given number of bytes
1392 from a given buffer.
1393
1394 %o Unsigned octal.
1395
1396 %p Unsigned pointer address.
1397
1398 %s String.
1399
1400 %u Unsigned decimal.
1401
1402 %x Unsigned hex value, in all lower-case.
1403
1404 %X Unsigned hex value, in all upper-case.
1405
1406 %% Writes a %.
1407
1408 The # flag selects the alternate forms. For octal, this prefixes a 0.
1409 For hex, this prefixes 0x or 0X, depending on case. For characters,
1410 this escapes non-printing values with either C-like escapes or raw oc‐
1411 tal. In the case of %#m/%#M, this safely accesses user space memory
1412 rather than kernel space memory.
1413
1414 Examples:
1415
1416 a = "alice", b = "bob", p = 0x1234abcd, i = 123, j = -1, id[a] = 1234, id[b] = 4567
1417 print("hello")
1418 Prints: hello
1419 println(b)
1420 Prints: bob\n
1421 println(a . " is " . sprint(16))
1422 Prints: alice is 16\n
1423 foreach (name in id) printdln("|", strlen(name), name, id[name])
1424 Prints: 5|alice|1234\n3|bob|4567\n
1425 printf("%c is %s; %x or %X or %p; %d or %u\n",97,a,p,p,p,j,j)
1426 Prints: a is alice; 1234abcd or 1234ABCD or 0x1234abcd; -1 or 18446744073709551615\n
1427 printf("2 bytes of kernel buffer at address %p: %.2m", p, p)
1428 Prints: 2 bytes of kernel buffer at address 0x1234abcd: <binary data>
1429 printf("%4b", p)
1430 Prints (these values as binary data): 0x1234abcd
1431 printf("%#o %#x %#X\n", 1, 2, 3)
1432 Prints: 01 0x2 0X3
1433 printf("%#c %#c %#c\n", 0, 9, 42)
1434 Prints: \000 \t *
1435
1436
1437
1438 STATISTICS
1439 It is often desirable to collect statistics in a way that avoids the
1440 penalties of repeatedly taking exclusive locks on the global variables
1441 those numbers are being put into. Systemtap provides a solution using
1442 a special operator to accumulate values, and several pseudo-functions
1443 to extract the statistical aggregates.
1444
1445 The aggregation operator is <<<, and resembles an assignment, or a C++
1446 output-streaming operation. The left operand specifies a scalar or ar‐
1447 ray-index lvalue, which must be declared global. The right operand is
1448 a numeric expression. The meaning is intuitive: add the given number
1449 to the pile of numbers to compute statistics of. (The specific list of
1450 statistics to gather is given separately, by the extraction functions.)
1451
1452 foo <<< 1
1453 stats[pid()] <<< memsize
1454
1455
1456 The extraction functions are also special. For each appearance of a
1457 distinct extraction function operating on a given identifier, the
1458 translator arranges to compute a set of statistics that satisfy it.
1459 The statistics system is thereby "on-demand". Each execution of an ex‐
1460 traction function causes the aggregation to be computed for that moment
1461 across all processors.
1462
1463 Here is the set of extractor functions. The first argument of each is
1464 the same style of lvalue used on the left hand side of the accumulate
1465 operation. The @count(v), @sum(v), @min(v), @max(v), @avg(v), @vari‐
1466 ance(v[, b]) extractor functions compute the number/total/minimum/maxi‐
1467 mum/average/variance of all accumulated values. The resulting values
1468 are all simple integers. Arrays containing aggregates may be sorted
1469 and iterated. See the foreach construct above.
1470
1471 Variance uses Welford's online algorithm. The calculations are based
1472 on integer arithmetic, and so may suffer from low precision and over‐
1473 flow. To improve this, @variance(v[, b]) accepts an optional parameter
1474 b, the bit-shift, ranging from 0 (default) to 62, for internal scaling.
1475 Only one value of bit-shift may be used with a given global variable. A
1476 larger bitshift value increases precision, but increases the likelihood
1477 of overflow.
1478
1479
1480 $ stap -e \
1481 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x)) }'
1482 12
1483 $ stap -e \
1484 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x,1)) }'
1485 2
1486 $ python3 -c 'import statistics; print(statistics.variance([1, 2, 3, 4, 5]))'
1487 2.5
1488 $
1489
1490
1491 Overflow (from internal multiplication of large numbers) may occur and
1492 may cause a negative variance result. Consider normalizing your input
1493 data. Adding or subtracting a fixed value from all variance inputs
1494 preserves the original variance. Dividing the variance inputs by a
1495 fixed value shrinks the original variance by that value squared.
1496
1497
1498
1499 Histograms are also available, but are more complicated because they
1500 have a vector rather than scalar value. @hist_linear(v,start,stop,in‐
1501 terval) represents a linear histogram from "start" to "stop" by incre‐
1502 ments of "interval". The interval must be positive. Similarly,
1503 @hist_log(v) represents a base-2 logarithmic histogram. Printing a his‐
1504 togram with the print family of functions renders a histogram object as
1505 a tabular "ASCII art" bar chart.
1506
1507 probe timer.profile {
1508 x[1] <<< pid()
1509 x[2] <<< uid()
1510 y <<< tid()
1511 }
1512 global x // an array containing aggregates
1513 global y // a scalar
1514 probe end {
1515 foreach ([i] in x @count+) {
1516 printf ("x[%d]: avg %d = sum %d / count %d\n",
1517 i, @avg(x[i]), @sum(x[i]), @count(x[i]))
1518 println (@hist_log(x[i]))
1519 }
1520 println ("y:")
1521 println (@hist_log(y))
1522 }
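
The example above uses only @hist_log. A linear histogram can be
sketched in the same way; the variable name sizes, the sample data, and
the bucket bounds below are arbitrary.

```systemtap
global sizes

probe begin {
  # Accumulate 100 sample values in the range 0..39.
  for (i = 0; i < 100; i++)
    sizes <<< i % 40

  # Buckets from 0 to 40 in steps of 10; the interval must be positive.
  println(@hist_linear(sizes, 0, 40, 10))
  exit()
}
```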
1523
1524
1525
1526 TYPECASTING
1527 Once a pointer (see the CONTEXT VARIABLES section of stapprobes(3stap))
1528 has been saved into a script integer variable, the translator loses the
1529 type information necessary to access members from that pointer. Using
1530 the @cast() operator tells the translator how to interpret the number
1531 as a typed pointer.
1532
1533 @cast(p, "type_name"[, "module"])->member
1534
1535
       This interprets p as a pointer to a struct/union named type_name and
       dereferences the member value. Further ->subfield expressions may be
       appended to dereference more levels. Note that to dereference a
       pointer directly, {kernel,user}_{char,int,...}($p) should be used
       instead. (Refer to stapfuncs(5) for more details.) NOTE: the same
       dereferencing operator -> is used for both direct containment and
       pointer indirection; systemtap automatically determines which. The
       optional module tells the translator where to look for information
       about that type. Multiple modules may be specified as a list with :
       separators. If the module is not specified, it defaults either to the
       probe module for dwarf probes, or to "kernel" for functions and all
       other probe types.
1548
1549 The translator can create its own module with type information from a
1550 header surrounded by angle brackets, in case normal debuginfo is not
1551 available. For kernel headers, prefix it with "kernel" to use the ap‐
1552 propriate build system. All other headers are built with default GCC
1553 parameters into a user module. Multiple headers may be specified in
1554 sequence to resolve a codependency.
1555
1556 @cast(tv, "timeval", "<sys/time.h>")->tv_sec
1557 @cast(task, "task_struct", "kernel<linux/sched.h>")->tgid
1558 @cast(task, "task_struct",
1559 "kernel<linux/sched.h><linux/fs_struct.h>")->fs->umask
1560
1561 Values acquired by @cast may be pretty-printed by the $ and $$ suffix
1562 operators, the same way as described in the CONTEXT VARIABLES section
1563 of the stapprobes(3stap) manual page.
1564
1565
1566 When in guru mode, the translator will also allow scripts to assign new
1567 values to members of typecasted pointers.
1568
1569 Typecasting is also useful in the case of void* members whose type may
1570 be determinable at runtime.
1571
1572 probe foo {
1573 if ($var->type == 1) {
1574 value = @cast($var->data, "type1")->bar
1575 } else {
1576 value = @cast($var->data, "type2")->baz
1577 }
1578 print(value)
1579 }
1580
1581
1582
1583 EMBEDDED C
1584 When in guru mode, the translator accepts embedded C code in the top
1585 level of the script. Such code is enclosed between %{ and %} markers,
1586 and is transcribed verbatim, without analysis, in some sequence, into
1587 the top level of the generated C code. At the outermost level, this
1588 may be useful to add #include instructions, and any auxiliary defini‐
1589 tions for use by other embedded code.
1590
1591 Another place where embedded code is permitted is as a function body.
1592 In this case, the script language body is replaced entirely by a piece
1593 of C code enclosed again between %{ and %} markers. This C code may do
1594 anything reasonable and safe. There are a number of undocumented but
1595 complex safety constraints on atomicity, concurrency, resource consump‐
1596 tion, and run time limits, so this is an advanced technique.
1597
1598 The memory locations set aside for input and output values are made
1599 available to it using macros STAP_ARG_* and STAP_RETVALUE. Errors may
1600 be signalled with STAP_ERROR. Output may be written with STAP_PRINTF.
1601 The function may return early with STAP_RETURN. Here are some exam‐
1602 ples:
1603
1604 function integer_ops (val) %{
1605 STAP_PRINTF("%d\n", STAP_ARG_val);
1606 STAP_RETVALUE = STAP_ARG_val + 1;
1607 if (STAP_RETVALUE == 4)
1608 STAP_ERROR("wrong guess: %d", (int) STAP_RETVALUE);
1609 if (STAP_RETVALUE == 3)
1610 STAP_RETURN(0);
1611 STAP_RETVALUE ++;
1612 %}
1613 function string_ops (val) %{
1614 strlcpy (STAP_RETVALUE, STAP_ARG_val, MAXSTRINGLEN);
1615 strlcat (STAP_RETVALUE, "one", MAXSTRINGLEN);
1616 if (strcmp (STAP_RETVALUE, "three-two-one"))
1617 STAP_RETURN("parameter should be three-two-");
1618 %}
1619 function no_ops () %{
1620 STAP_RETURN(); /* function inferred with no return value */
1621 %}
1622
1623 The function argument and return value types have to be inferred by the
1624 translator from the call sites in order for this to work. The user
1625 should examine C code generated for ordinary script-language functions
1626 in order to write compatible embedded-C ones.
1627
1628 The last place where embedded code is permitted is as an expression
1629 rvalue. In this case, the C code enclosed between %{ and %} markers is
1630 interpreted as an ordinary expression value. It is assumed to be a
1631 normal 64-bit signed number, unless the marker /* string */ is includ‐
1632 ed, in which case it's treated as a string.
1633
1634 function add_one (val) {
1635 return val + %{ 1 %}
1636 }
1637 function add_string_two (val) {
1638 return val . %{ /* string */ "two" %}
1639 }
1640
1641
1642 The embedded-C code may contain markers to assert optimization and
1643 safety properties.
1644
1645 /* pure */
1646 means that the C code has no side effects and may be elided en‐
1647 tirely if its value is not used by script code.
1648
1649 /* stable */
1650 means that the C code always has the same value (in any given
1651 probe handler invocation), so repeated calls may be automatical‐
1652 ly replaced by memoized values. Such functions must take no pa‐
1653 rameters, and also be pure.
1654
1655 /* unprivileged */
1656 means that the C code is so safe that even unprivileged users
1657 are permitted to use it.
1658
1659 /* myproc-unprivileged */
1660 means that the C code is so safe that even unprivileged users
1661 are permitted to use it, provided that the target of the current
1662 probe is within the user's own process.
1663
1664 /* guru */
1665 means that the C code is so unsafe that a systemtap user must
1666 specify -g (guru mode) to use this. (Tapsets are permitted and
1667 presumed to call them safely.)
1668
1669 /* unmangled */
1670 in an embedded-C function, means that the legacy (pre-1.8) argu‐
1671 ment access syntax should be made available inside the function.
1672 Hence, in addition to STAP_ARG_foo and STAP_RETVALUE one can use
1673 THIS->foo and THIS->__retvalue respectively inside the function.
1674 This is useful for quickly migrating code written for SystemTap
1675 version 1.7 and earlier.
1676
1677 /* unmodified-fnargs */
1678 in an embedded-C function, means that the function arguments are
1679 not modified inside the function body.
1680
1681 /* string */
1682 in embedded-C expressions only, means that the expression has
1683 const char * type and should be treated as a string value, in‐
1684 stead of the default long numeric.
1685
       Script-level global variables may be accessed in embedded-C functions
       and blocks. To read or write the global variable var, the
       /* pragma:read:var */ or /* pragma:write:var */ marker must first be
       placed in the embedded-C function or block. This provides the
       STAP_GLOBAL_GET_* and STAP_GLOBAL_SET_* macros for reading and
       writing, respectively. For example:
1692
1693 global var
1694 global var2[100]
1695 function increment() %{
1696 /* pragma:read:var */ /* pragma:write:var */
1697 /* pragma:read:var2 */ /* pragma:write:var2 */
1698 STAP_GLOBAL_SET_var(STAP_GLOBAL_GET_var()+1); //var++
1699 STAP_GLOBAL_SET_var2(1, 1, STAP_GLOBAL_GET_var2(1, 1)+1); //var2[1,1]++
1700 %}
1701
1702 Variables may be read and set in both embedded-C functions and expres‐
1703 sions. Strings returned from embedded-C code are decayed to pointers.
1704 Variables must also be assigned at script level to allow for type in‐
1705 ference. Map assignment does not return the value written, so chaining
1706 does not work.
1707
1708
1709 BUILT-INS
       A set of builtin probe point aliases is provided by the scripts
       installed in the directory specified in the stappaths(7) manual page.
       The functions are described in the stapprobes(3stap) manual page.
1713
1714
1715 DEREFERENCING
       Integers can be dereferenced from pointers saved as script integer
       variables using the @kderef() or @uderef() operators. @kderef() is
       used for kernel space addresses and @uderef() is used for user space
       addresses.
1720
1721 @kderef(SIZE, addr)
1722 @uderef(SIZE, addr)
1723
       This will interpret addr as a kernel/user address and read SIZE bytes
       starting at that address. SIZE should be 1, 2, 4, or 8 bytes.
1726
1727
1728 REGISTERS
1729 The value stored within a register can be accessed using the @kregis‐
1730 ter() or @uregister() operators. @kregister() is used for kernel space
1731 registers and @uregister() is used for user space registers. The regis‐
1732 ter of interest is specified using its DWARF number.
1733
1734 @kregister(0)
1735 @uregister(5)
1736
1737
1739 The translator begins pass 1 by parsing the given input script, and all
1740 scripts (files named *.stp) found in a tapset directory. The
1741 directories listed with -I are processed in sequence, each processed in
1742 "guru mode". For each directory, a number of subdirectories are also
1743 searched. These subdirectories are derived from the selected kernel
       version (the -r option), in order to allow more kernel-version-specific
1745 scripts to override less specific ones. For example, for a kernel
1746 version 2.6.12-23.FC3 the following patterns would be searched, in
1747 sequence: 2.6.12-23.FC3/*.stp, 2.6.12/*.stp, 2.6/*.stp, and finally
1748 *.stp. Stopping the translator after pass 1 causes it to print the
1749 parse trees.
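
       The search order in the example above can be sketched as follows. This
       Python function is only an illustration: the name
       tapset_search_patterns is hypothetical, and the derivation rule (drop
       the part after the first "-", then keep the first two dotted
       components) is inferred from that single example rather than taken
       from the translator's sources.

```python
def tapset_search_patterns(kernel_release):
    """Return glob patterns from most to least kernel-version-specific."""
    patterns = [kernel_release + "/*.stp"]

    # Assumed rule 1: drop the packaging suffix after the first "-".
    base = kernel_release.split("-", 1)[0]
    if base != kernel_release:
        patterns.append(base + "/*.stp")

    # Assumed rule 2: fall back to the first two dotted components.
    parts = base.split(".")
    if len(parts) > 2:
        patterns.append(".".join(parts[:2]) + "/*.stp")

    # Finally, the tapset directory itself.
    patterns.append("*.stp")
    return patterns

print(tapset_search_patterns("2.6.12-23.FC3"))
# ['2.6.12-23.FC3/*.stp', '2.6.12/*.stp', '2.6/*.stp', '*.stp']
```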
1750
1751
1752 In pass 2, the translator analyzes the input script to resolve symbols
1753 and types. References to variables, functions, and probe aliases that
1754 are unresolved internally are satisfied by searching through the parsed
1755 tapset script files. If any tapset script file is selected because it
1756 defines an unresolved symbol, then the entirety of that file is added
1757 to the translator's resolution queue. This process iterates until all
1758 symbols are resolved and a subset of tapset script files is selected.
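
       The iteration described above behaves like a worklist algorithm. The
       Python sketch below models it with made-up data structures (a mapping
       from file name to the symbols that file defines and references); it
       illustrates the idea and does not reproduce the translator's actual
       implementation.

```python
def select_tapset_files(script_defs, script_refs, tapset_files):
    """tapset_files maps file name -> (defined symbols, referenced symbols).

    Returns the tapset files pulled in, in selection order."""
    defined = set(script_defs)
    pending = [s for s in script_refs if s not in defined]
    selected = []
    while pending:
        symbol = pending.pop()
        if symbol in defined:
            continue
        for name, (defs, refs) in tapset_files.items():
            if symbol in defs:
                # The whole file is added, so all of its definitions become
                # available and all of its references must be resolved too.
                selected.append(name)
                defined |= set(defs)
                pending.extend(r for r in refs if r not in defined)
                break
        else:
            raise LookupError("unresolved symbol: " + symbol)
    return selected

tapsets = {
    "a.stp": (["helper"], ["lowlevel"]),
    "b.stp": (["lowlevel"], []),
    "c.stp": (["unused"], []),
}
print(select_tapset_files([], ["helper"], tapsets))  # ['a.stp', 'b.stp']
```

       Note that c.stp is never selected, because nothing reachable from the
       input script refers to a symbol it defines.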
1759
1760 Next, all probe point descriptions are validated against the wide
1761 variety supported by the translator. Probe points that refer to code
1762 locations ("synchronous probe points") require the appropriate kernel
1763 debugging information to be installed. In the associated probe
1764 handlers, target-side variables (whose names begin with "$") are found
1765 and have their run-time locations decoded.
1766
1767 Next, all probes and functions are analyzed for optimization
1768 opportunities, in order to remove variables, expressions, and functions
1769 that have no useful value and no side-effect. Embedded-C functions are
1770 assumed to have side-effects unless they include the magic string
1771 /* pure */. Since this optimization can hide latent code errors such
1772 as type mismatches or invalid $context variables, it sometimes may be
1773 useful to disable the optimizations with the -u option.
1774
1775 Finally, all variable, function, parameter, array, and index types are
1776 inferred from context (literals and operators). Stopping the
1777 translator after pass 2 causes it to list all the probes, functions,
1778 and variables, along with all inferred types. Any inconsistent or
1779 unresolved types cause an error.
1780
1781
1782 In pass 3, the translator writes C code that represents the actions of
1783 all selected script files, and creates a Makefile to build that into a
1784 kernel object. These files are placed into a temporary directory.
1785 Stopping the translator at this point causes it to print the contents
1786 of the C file.
1787
1788
1789 In pass 4, the translator invokes the Linux kernel build system to
1790 create the actual kernel object file. This involves running make in
1791 the temporary directory, and requires a kernel module build system
1792 (headers, config and Makefiles) to be installed in the usual spot
1793 /lib/modules/VERSION/build. Stopping the translator after pass 4 is
1794 the last chance before running the kernel object. This may be useful
1795 if you want to archive the file.
1796
1797
       In pass 5, the translator invokes the systemtap auxiliary program
       staprun for the given kernel object. This program arranges to load
       the module, then communicates with it, copying trace data from the
       kernel into temporary files, until the user sends an interrupt signal.
       Any run-time error encountered by the probe handlers, such as running
       out of memory, division by zero, or exceeding nesting or runtime
       limits, results in a soft error indication. Soft errors in excess of
       MAXERRORS block all subsequent probes (except error-handling probes)
       and terminate the session. Finally, staprun unloads the module and
       cleans up.
1808
1809
1810 ABNORMAL TERMINATION
1811 One should avoid killing the stap process forcibly, for example with
1812 SIGKILL, because the stapio process (a child process of the stap
1813 process) and the loaded module may be left running on the system. If
1814 this happens, send SIGTERM or SIGINT to any remaining stapio processes,
1815 then use rmmod to unload the systemtap module.
1816
1817
1818
1820 See the stapex(3stap) manual page for a brief collection of samples, or
1821 a large set of installed samples under the systemtap
1822 documentation/testsuite directories. See stappaths(7stap) for the
1823 likely location of these on the system.
1824
1825
1827 The systemtap translator caches the pass 3 output (the generated C
1828 code) and the pass 4 output (the compiled kernel module) if pass 4
1829 completes successfully. This cached output is reused if the same
1830 script is translated again assuming the same conditions exist (same
1831 kernel version, same systemtap version, etc.). Cached files are stored
1832 in the $SYSTEMTAP_DIR/cache directory. The cache can be limited by
1833 having the file cache_mb_limit placed in the cache directory (shown
1834 above) containing only an ASCII integer representing how many MiB the
1835 cache should not exceed. In the absence of this file, a default will be
1836 created with the limit set to 256MiB. This is a 'soft' limit in that
1837 the cache will be cleaned after a new entry is added if the cache clean
1838 interval is exceeded, so the total cache size may temporarily exceed
1839 this limit. This interval can be specified by having the file
1840 cache_clean_interval_s placed in the cache directory (shown above)
1841 containing only an ASCII integer representing the interval in seconds.
1842 In the absence of this file, a default will be created with the
1843 interval set to 300 s.
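
       As an illustration of the two control files described above, the
       following Python sketch writes them into a cache directory. The helper
       name write_cache_limits is hypothetical, and writing the integer
       without a trailing newline is an assumption based on the "only an
       ASCII integer" wording. The demonstration uses a temporary directory
       so that no real cache is touched; in practice the directory would be
       $SYSTEMTAP_DIR/cache.

```python
import os
import tempfile

def write_cache_limits(cache_dir, mb_limit, clean_interval_s):
    """Write cache_mb_limit and cache_clean_interval_s as ASCII integers."""
    os.makedirs(cache_dir, exist_ok=True)
    settings = {
        "cache_mb_limit": mb_limit,
        "cache_clean_interval_s": clean_interval_s,
    }
    for name, value in settings.items():
        path = os.path.join(cache_dir, name)
        with open(path, "w", encoding="ascii") as handle:
            handle.write(str(int(value)))

with tempfile.TemporaryDirectory() as demo_dir:
    write_cache_limits(demo_dir, 512, 600)
    with open(os.path.join(demo_dir, "cache_mb_limit")) as handle:
        print(handle.read())  # 512
```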
1844
1845
1847 Systemtap may be used as a powerful administrative tool. It can expose
1848 kernel internal data structures and potentially private user
1849 information. (In dyninst runtime mode, this is not the case, see the
1850 ALTERNATE RUNTIMES section below.)
1851
1852 The translator asserts many safety constraints during compilation and
1853 more during run-time. It aims to ensure that no handler routine can
1854 run for very long, allocate boundless memory, perform unsafe
       operations, or unintentionally interfere with the system. Uses of
1856 script global variables are automatically read/write locked as
1857 appropriate, to protect against manipulation by concurrent probe
1858 handlers. (Deadlocks are detected with timeouts. Use the -t flag to
1859 receive reports of excessive lock contention.) Experimenting with
1860 scripts is therefore generally safe. The guru-mode -g option allows
1861 administrators to bypass most safety measures, which permits invasive
1862 or state-changing operations, embedded-C code, and increases the risk
1863 of upset. By default, overload prevention is turned on for all
1864 modules. If you would like to disable overload processing, use the
1865 --suppress-time-limits option.
1866
1867 Errors that are caught at run time normally result in a clean script
1868 shutdown and a pass-5 error message. The --suppress-handler-errors
1869 option lets scripts tolerate soft errors without shutting down.
1870
1871
1872
1873 PERMISSIONS
1874 For the normal linux-kernel-module runtime, to run the kernel objects
1875 systemtap builds, a user must be one of the following:
1876
1877 · the root user;
1878
1879 · a member of the stapdev and stapusr groups;
1880
1881 · a member of the stapsys and stapusr groups; or
1882
1883 · a member of the stapusr group.
1884
1885 The root user or a user who is a member of both the stapdev and stapusr
1886 groups can build and run any systemtap script.
1887
1888 A user who is a member of both the stapsys and stapusr groups can only
1889 use pre-built modules under the following conditions:
1890
1891 · The module has been signed by a trusted signer. Trusted signers are
1892 normally systemtap compile-servers which sign modules when the
1893 --privilege option is specified by the client. See the
1894 stap-server(8) manual page for more information.
1895
1896 · The module was built using the --privilege=stapsys or the
1897 --privilege=stapusr options.
1898
1899 Members of only the stapusr group can only use pre-built modules under
1900 the following conditions:
1901
1902 · The module is located in the /lib/modules/VERSION/systemtap
1903 directory. This directory must be owned by root and not be world
1904 writable.
1905
1906 or
1907
1908 · The module has been signed by a trusted signer. Trusted signers are
1909 normally systemtap compile-servers which sign modules when the
1910 --privilege option is specified by the client. See the
1911 stap-server(8) manual page for more information.
1912
1913 · The module was built using the --privilege=stapusr option.
1914
       The kernel modules generated by the stap program are run by the staprun
1916 program. The latter is a part of the Systemtap package, dedicated to
1917 module loading and unloading (but only in the white zone), and kernel-
1918 to-user data transfer. Since staprun does not perform any additional
1919 security checks on the kernel objects it is given, it would be unwise
1920 for a system administrator to add untrusted users to the stapdev or
1921 stapusr groups.
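
       The membership rules in this section can be summarized as a small
       decision function. The Python sketch below only restates the text
       above; the function name and the returned strings are illustrative and
       are not part of systemtap.

```python
def allowed_actions(is_root, groups):
    """Summarize what a user may do under the linux-kernel-module runtime."""
    groups = set(groups)
    if is_root or {"stapdev", "stapusr"} <= groups:
        return "build and run any script"
    if {"stapsys", "stapusr"} <= groups:
        return ("run signed modules built with --privilege=stapsys "
                "or --privilege=stapusr")
    if "stapusr" in groups:
        return ("run modules from /lib/modules/VERSION/systemtap, or "
                "signed modules built with --privilege=stapusr")
    return "none"

print(allowed_actions(False, ["stapusr", "stapdev"]))
# build and run any script
print(allowed_actions(False, ["wheel"]))
# none
```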
1922
1923
1924 SECUREBOOT
1925 If the current system has SecureBoot turned on in the UEFI firmware,
1926 all kernel modules must be signed. (Some kernels may allow disabling
1927 SecureBoot long after booting with a key sequence such as SysRq-X,
1928 making it unnecessary to sign modules.) The systemtap compile server
1929 can sign modules with a MOK (Machine Owner Key) that it has in common
1930 with a client system. See the following wiki page for more details:
1931
1932 https://sourceware.org/systemtap/wiki/SecureBoot
1933
       Some kernels do not let systemtap guess whether module signing
1935 is in effect. On such machines, set the SYSTEMTAP_SIGN environment
1936 variable to any value while running stap.
1937
1938
1939 RESOURCE LIMITS
1940 Many resource use limits are set by macros in the generated C code.
1941 These may be overridden with -D flags. A selection of these is as fol‐
1942 lows:
1943
1944 MAXNESTING
1945 Maximum number of nested function calls. Default determined by
1946 script analysis, with a bonus 10 slots added for recursive
1947 scripts.
1948
1949 MAXSTRINGLEN
1950 Maximum length of strings, default 128.
1951
1952 MAXTRYLOCK
1953 Maximum number of iterations to wait for locks on global vari‐
1954 ables before declaring possible deadlock and skipping the probe,
1955 default 1000.
1956
1957 MAXACTION
1958 Maximum number of statements to execute during any single probe
1959 hit (with interrupts disabled), default 1000. Note that for
1960 straight-through probe handlers lacking loops or recursion, due
1961 to optimization, this parameter may be interpreted too conserva‐
1962 tively.
1963
1964 MAXACTION_INTERRUPTIBLE
1965 Maximum number of statements to execute during any single probe
1966 hit which is executed with interrupts enabled (such as begin/end
1967 probes), default (MAXACTION * 10).
1968
1969 MAXBACKTRACE
              Maximum number of stack frames that will be processed by the
1971 stap runtime unwinder as produced by the backtrace functions in
1972 the [u]context-unwind.stp tapsets, default 20.
1973
1974 MAXMAPENTRIES
1975 Maximum number of rows in any single global array, default 2048.
1976 Individual arrays may be declared with a larger or smaller limit
1977 instead:
1978
1979 global big[10000],little[5]
1980
1981 or denoted with % to make them wrap-around (replace old entries)
1982 automatically, as in
1983
1984 global big%
1985
1986 or both.
1987
1988 MAPHASHBIAS
1989 The number of powers-of-two to add or subtract from the natural
1990 size of the hash table backing each global associative array.
1991 Default is 0. Try small positive numbers to get extra perfor‐
1992 mance at the cost of more memory consumption, because that
1993 should reduce hash table collisions. Try small negative numbers
1994 for the opposite tradeoff.
1995
1996 MAXERRORS
1997 Maximum number of soft errors before an exit is triggered, de‐
1998 fault 0, which means that the first error will exit the script.
1999 Note that with the --suppress-handler-errors option, this limit
2000 is not enforced.
2001
2002 MAXSKIPPED
2003 Maximum number of skipped probes before an exit is triggered,
2004 default 100. Running systemtap with -t (timing) mode gives more
2005 details about skipped probes. With the default -DINTERRUPT‐
2006 IBLE=1 setting, probes skipped due to reentrancy are not accumu‐
2007 lated against this limit. Note that with the --suppress-han‐
2008 dler-errors option, this limit is not enforced.
2009
2010 MINSTACKSPACE
2011 Minimum number of free kernel stack bytes required in order to
2012 run a probe handler, default 1024. This number should be large
2013 enough for the probe handler's own needs, plus a safety margin.
2014
2015 MAXUPROBES
2016 Maximum number of concurrently armed user-space probes (up‐
2017 robes), default somewhat larger than the number of user-space
2018 probe points named in the script. This pool needs to be poten‐
2019 tially large because individual uprobe objects (about 64 bytes
2020 each) are allocated for each process for each matching script-
2021 level probe.
2022
2023 STP_MAXMEMORY
2024 Maximum amount of memory (in kilobytes) that the systemtap mod‐
2025 ule should use, default unlimited. The memory size includes the
2026 size of the module itself, plus any additional allocations.
2027 This only tracks direct allocations by the systemtap runtime.
2028 This does not track indirect allocations (as done by kprobes/up‐
2029 robes/etc. internals).
2030
2031 STP_OVERLOAD_THRESHOLD, STP_OVERLOAD_INTERVAL
2032 Maximum number of machine cycles spent in probes on any cpu per
2033 given interval, before an overload condition is declared and the
2034 script shut down. The defaults are 500 million and 1 billion,
2035 so as to limit stap script cpu consumption at around 50%.
2036
2037 STP_PROCFS_BUFSIZE
2038 Size of procfs probe read buffers (in bytes). Defaults to
2039 MAXSTRINGLEN. This value can be overridden on a per-procfs file
2040 basis using the procfs read probe .maxsize(MAXSIZE) parameter.
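
       The limits above are overridden with -D flags of the form
       -DNAME=VALUE. The following Python sketch builds such flags from a
       mapping and works through the overload arithmetic for the default
       STP_OVERLOAD_THRESHOLD and STP_OVERLOAD_INTERVAL values. The helper
       names and the override values are hypothetical and chosen only for
       illustration.

```python
def define_flags(overrides):
    """Turn {'MAXACTION': 2000} into ['-DMAXACTION=2000']."""
    return ["-D{}={}".format(name, value) for name, value in overrides.items()]

def overload_fraction(threshold_cycles, interval_cycles):
    """Share of each interval that probes may consume before overload."""
    return threshold_cycles / interval_cycles

print(define_flags({"MAXSTRINGLEN": 256, "MAXMAPENTRIES": 4096}))
# ['-DMAXSTRINGLEN=256', '-DMAXMAPENTRIES=4096']
print(overload_fraction(500_000_000, 1_000_000_000))
# 0.5
```

       The second result corresponds to the "around 50%" figure quoted for
       the default overload settings.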
2041
2042 With scripts that contain probes on any interrupt path, it is possible
2043 that those interrupts may occur in the middle of another probe handler.
2044 The probe in the interrupt handler would be skipped in this case to
2045 avoid reentrance. To work around this issue, execute stap with the op‐
2046 tion -DINTERRUPTIBLE=0 to mask interrupts throughout the probe handler.
2047 This does add some extra overhead to the probes, but it may prevent
2048 reentrance for common problem cases. However, probes in NMI handlers
2049 and in the callpath of the stap runtime may still be skipped due to
2050 reentrance.
2051
2052
2053 In case something goes wrong with stap or staprun after a probe has al‐
2054 ready started running, one may safely kill both user processes, and re‐
2055 move the active probe kernel module with rmmod. Any pending trace mes‐
2056 sages may be lost.
2057
2058
       Systemtap exposes kernel internal data structures and potentially
       private user information. Because of this, use of systemtap's full
       capabilities is restricted to root and to users who are members of
       the groups stapdev and stapusr.
2064
2065 However, a restricted set of systemtap's features can be made available
2066 to trusted, unprivileged users. These users are members of the group
2067 stapusr only, or members of the groups stapusr and stapsys. These
2068 users can load systemtap modules which have been compiled and certified
2069 by a trusted systemtap compile-server. See the descriptions of the op‐
2070 tions --privilege and --use-server. See README.unprivileged in the sys‐
2071 temtap source code for information about setting up a trusted compile
2072 server.
2073
2074 The restrictions enforced when --privilege=stapsys is specified are de‐
2075 signed to prevent unprivileged users from:
2076
2077 · harming the system maliciously.
2078
2079 The restrictions enforced when --privilege=stapusr is specified are de‐
2080 signed to prevent unprivileged users from:
2081
2082 · harming the system maliciously.
2083
2084 · gaining access to information which would not normally be
2085 available to an unprivileged user.
2086
2087 · disrupting the performance of processes owned by other users
2088 of the system. Some overhead to the system in general is
2089 unavoidable since the unprivileged user's probes will be
2090 triggered at the appropriate times. What we would like to
2091 avoid is targeted interruption of another user's processes
2092 which would not normally be possible by an unprivileged us‐
2093 er.
2094
2095
2096 PROBE RESTRICTIONS
2097 A member of the groups stapusr and stapsys may use all probe points.
2098
2099 A member of only the group stapusr may use only the following probes:
2100
2101 · begin, begin(n)
2102
2103 · end, end(n)
2104
2105 · error(n)
2106
2107 · never
2108
2109 · process.*, where the target process is owned by the user.
2110
2111 · timer.{jiffies,s,sec,ms,msec,us,usec,ns,nsec}(n)*
2112
2113 · timer.hz(n)
2114
2115
2116 SCRIPT LANGUAGE RESTRICTIONS
2117 The following scripting language features are unavailable to all un‐
2118 privileged users:
2119
2120
2121 · any feature enabled by the Guru Mode (-g) option.
2122
2123 · embedded C code.
2124
2125
2126 RUNTIME RESTRICTIONS
2127 The following runtime restrictions are placed upon all unprivileged
2128 users:
2129
2130 · Only the default runtime code (see -R) may be used.
2131
2132 Additional restrictions are placed on members of only the group sta‐
2133 pusr:
2134
2135 · Probing of processes owned by other users is not permitted.
2136
2137 · Access of kernel memory (read and write) is not permitted.
2138
2139
2140 COMMAND LINE OPTION RESTRICTIONS
2141 Some command line options provide access to features which must not be
2142 available to all unprivileged users:
2143
2144
2145 · -g may not be specified.
2146
2147 · The following options may not be used by the compile-server
2148 client:
2149
2150 -a, -B, -D, -I, -r, -R
2151
2152
2153
2154 ENVIRONMENT RESTRICTIONS
2155 The following environment variables must not be set for all unprivi‐
2156 leged users:
2157
2158 SYSTEMTAP_RUNTIME
2159 SYSTEMTAP_TAPSET
2160 SYSTEMTAP_DEBUGINFO_PATH
2161
2162
2163
2164 TAPSET RESTRICTIONS
2165 In general, tapset functions are only available for members of the
2166 group stapusr when they do not gather information that an ordinary pro‐
2167 gram running with that user's privileges would be denied access to.
2168
2169 There are two categories of unprivileged tapset functions. The first
2170 category consists of utility functions that are unconditionally avail‐
2171 able to all users; these include such things as:
2172
2173 cpu:long ()
2174 exit ()
2175 str_replace:string (prnt_str:string, srch_str:string, rplc_str:string)
2176
2177
2178 The second category consists of so-called myproc-unprivileged functions
2179 that can only gather information within their own processes. Scripts
2180 that wish to use these functions must test the result of the tapset
2181 function is_myproc and only call these functions if the result is 1.
2182 The script will exit immediately if any of these functions are called
2183 by an unprivileged user within a probe within a process which is not
2184 owned by that user. Examples of myproc-unprivileged functions include:
2185
2186 print_usyms (stk:string)
2187 user_int:long (addr:long)
2188 usymname:string (addr:long)
2189
2190
2191 A compile error is triggered when any function not in either of the
2192 above categories is used by members of only the group stapusr.
2193
2194 No other built-in tapset functions may be used by members of only the
2195 group stapusr.
2196
2197
2199 As described above, systemtap's default runtime mode involves building
2200 and loading kernel modules, with various security tradeoffs presented.
2201 Systemtap now includes two new prototype backends: --runtime=dyninst
2202 and --runtime=bpf.
2203
2204 --runtime=dyninst uses Dyninst to instrument a user's own processes at
2205 runtime. This backend does not use kernel modules, and does not require
2206 root privileges, but is restricted with respect to the kinds of probes
       and other constructs that a script may use. The dyninst runtime
       operates in target-attach mode, so it requires a -c COMMAND or -x PID
       target process.
2209 For example:
2210
2211 stap --runtime=dyninst -c 'stap -V' \
2212 -e 'probe process.function("main")
2213 { println("hi from dyninst!") }'
2214
2215
2216 It may be necessary to disable a conflicting selinux check with
2217
2218 # setsebool allow_execstack 1
2219
2220
2221 --runtime=bpf compiles the user script into extended Berkeley Packet
2222 Filter (eBPF) programs instead of a kernel module. eBPF programs are
2223 verified by the kernel for safety and are executed by an in-kernel vir‐
2224 tual machine. This runtime is in an early stage of development and
2225 currently lacks support for a number of features available in the de‐
2226 fault runtime. Please see the stapbpf(8) man page for more information.
2227
2228
2230 The systemtap translator generally returns with a success code of 0 if
2231 the requested script was processed and executed successfully through
2232 the requested pass. Otherwise, errors may be printed to stderr and a
2233 failure code is returned. Use -v or -vp N to increase (global or per-
2234 pass) verbosity to identify the source of the trouble.
2235
2236 In listings mode (-l and -L), error messages are normally suppressed.
2237 A success code of 0 is returned if at least one matching probe was
2238 found.
2239
2240 A script executing in pass 5 that is interrupted with ^C / SIGINT is
2241 considered to be successful.
2242
2243
2245 Over time, some features of the script language and the tapset library
2246 may undergo incompatible changes, so that a script written against an
2247 old version of systemtap may no longer run. In these cases, it may
2248 help to run systemtap with the --compatible VERSION flag, specifying
2249 the last known working version. Running systemtap with the
       --check-version flag will output a warning if any possibly
       incompatible elements have been parsed. Historical details of
       deprecations may be found in the NEWS file.
2253
       The purpose of the deprecation facility is to improve the experience
       of scripts written for newer versions of systemtap (by adding better
       alternatives and removing conflicting or messy older alternatives),
       while at the same time permitting scripts written for older versions
       of systemtap to continue running. Deprecation is thus intended as a
       service to users (and an inconvenience to systemtap's developers),
       rather than the other way around.
2261
       Please note that underscore-prefixed identifiers in the tapset
       sometimes undergo changes for which compatibility is difficult to
       preserve, even with the deprecation mechanisms. Avoid relying on these
       in your scripts; instead propose them for promotion to non-underscored
       status.
2267
2268
2269
       Important files and their corresponding paths can be located in the
       stappaths(7) manual page.
2273
2274
2276 stapprobes(3stap),
       function::*(3stap),
       probe::*(3stap),
       tapset::*(3stap),
2280 stappaths(7),
2281 staprun(8),
2282 stapdyn(8),
2283 systemtap(8),
2284 stapvars(3stap),
2285 stapex(3stap),
2286 stap-server(8),
2287 stap-prep(1),
2288 stapref(1),
2289 awk(1),
2290 gdb(1)
2291
2292
2294 Use the Bugzilla link of the project web page or our mailing list.
2295 http://sourceware.org/systemtap/, <systemtap@sourceware.org>.
2296
2297 error::reporting(7stap),
2298 https://sourceware.org/systemtap/wiki/HowToReportBugs
2299
2300
2301
2302 STAP(1)