1STAP(1) General Commands Manual STAP(1)
2
3
4
6 stap - systemtap script translator/driver
7
8
9
11 stap [ OPTIONS ] FILENAME [ ARGUMENTS ]
12 stap [ OPTIONS ] - [ ARGUMENTS ]
13 stap [ OPTIONS ] -e SCRIPT [ ARGUMENTS ]
14 stap [ OPTIONS ] -l PROBE [ ARGUMENTS ]
15 stap [ OPTIONS ] -L PROBE [ ARGUMENTS ]
16 stap [ OPTIONS ] --dump-probe-types
17 stap [ OPTIONS ] --dump-probe-aliases
18 stap [ OPTIONS ] --dump-functions
19
20
21
22
24 The stap program is the front-end to the Systemtap tool. It accepts
25 probing instructions written in a simple domain-specific language,
26 translates those instructions into C code, compiles this C code, and
27 loads the resulting module into a running Linux kernel or a DynInst
28 user-space mutator, to perform the requested system trace/probe func‐
29 tions. You can supply the script in a named file (FILENAME), from
30 standard input (use - instead of FILENAME), or from the command line
31 (using -e SCRIPT). The program runs until it is interrupted by the
32 user, the script voluntarily invokes the exit() function, or a
33 sufficient number of soft errors has occurred.
34
35 The language, which is described in the SCRIPT LANGUAGE section below, is
36 strictly typed, expressive, declaration free, procedural, prototyping-
37 friendly, and inspired by awk and C. It allows source code points or
38 events in the system to be associated with handlers, which are subrou‐
39 tines that are executed synchronously. It is somewhat similar concep‐
40 tually to "breakpoint command lists" in the gdb debugger.
41
42
44 systemtap comes with a variety of educational, documentation and refer‐
45 ence resources. They come online and/or packaged for offline use. For
46 online documentation, see the project web site,
47 https://sourceware.org/systemtap/
48
49
50 ┌──────────────────────────┬──────────────────────────────────────────────────────┐
51 │man pages │ │
52 ├──────────────────────────┼──────────────────────────────────────────────────────┤
53 │stap (this page) │ language syntax, concepts, operation, options │
54 ├──────────────────────────┼──────────────────────────────────────────────────────┤
55 │stapprobes │ probe points and their $context variables │
56 ├──────────────────────────┼──────────────────────────────────────────────────────┤
57 │stapref │ quick reference to language syntax │
58 ├──────────────────────────┼──────────────────────────────────────────────────────┤
59 │stappaths │ list of directories, including books & references │
60 ├──────────────────────────┼──────────────────────────────────────────────────────┤
61 │stap-prep │ program to install auxiliary dependencies like ker‐ │
62 │ │ nel debuginfo │
63 ├──────────────────────────┼──────────────────────────────────────────────────────┤
64 │tapset::* │ generated list of tapsets │
65 ├──────────────────────────┼──────────────────────────────────────────────────────┤
66 │probe::* │ generated list of tapset probe aliases │
67 ├──────────────────────────┼──────────────────────────────────────────────────────┤
68 │function::* │ generated list of tapset functions │
69 ├──────────────────────────┼──────────────────────────────────────────────────────┤
70 │macro::* │ generated list of tapset macros │
71 ├──────────────────────────┼──────────────────────────────────────────────────────┤
72 │stapvars │ some of the tapset global variables │
73 ├──────────────────────────┼──────────────────────────────────────────────────────┤
74 │staprun, stapdyn, stapbpf │ programs for executing compiled systemtap scripts │
75 ├──────────────────────────┼──────────────────────────────────────────────────────┤
76 │systemtap │ initscript, boot-time probing │
77 ├──────────────────────────┼──────────────────────────────────────────────────────┤
78 │stap-server │ compilation server │
79 ├──────────────────────────┼──────────────────────────────────────────────────────┤
80 │stapex │ a few very basic script examples │
81 ├──────────────────────────┼──────────────────────────────────────────────────────┤
82 │books │ │
83 ├──────────────────────────┼──────────────────────────────────────────────────────┤
84 │Beginner's Guide │ tutorial book, language essentials, examples │
85 ├──────────────────────────┼──────────────────────────────────────────────────────┤
86 │Tutorial │ shorter tutorial, exercises │
87 ├──────────────────────────┼──────────────────────────────────────────────────────┤
88 │Language Reference │ detailed language manual, covers statistics/analysis │
89 ├──────────────────────────┼──────────────────────────────────────────────────────┤
90 │Tapset Reference │ the tapset man pages, reformatted into a book │
91 ├──────────────────────────┼──────────────────────────────────────────────────────┤
92 │references │ │
93 ├──────────────────────────┼──────────────────────────────────────────────────────┤
94 │example scripts │ over a hundred directly usable sysadmin tools, toys, │
95 │ │ hacks to learn from │
96 └──────────────────────────┴──────────────────────────────────────────────────────┘
97
98 OPTIONS
99 The systemtap translator supports the following options. Any other op‐
100 tion prints a list of supported options. Options may be given on the
101 command line, as usual. If the file $SYSTEMTAP_DIR/rc exists, options
102 are also loaded from there and interpreted first. ($SYSTEMTAP_DIR de‐
103 faults to $HOME/.systemtap if unset.)
104
105
106 In some cases, the default value of an option depends on particular
107 system configuration and thus can't be mentioned here directly. In
108 some of those cases running "stap --help" might display the default.
109
110
111 - Use standard input instead of a given FILENAME as probe language
112 input, unless -e SCRIPT is given.
113
114 -h --help
115 Show help message.
116
117 -V --version
118 Show version message.
119
120 -p NUM Stop after pass NUM. The passes are numbered 1-5: parse, elabo‐
121 rate, translate, compile, run. See the PROCESSING section for
122 details.
123
124 -v Increase verbosity for all passes. Produce a larger volume of
125 informative (?) output each time the option is repeated.
126
127 --vp ABCDE
128 Increase verbosity on a per-pass basis. For example, "--vp 002"
129 adds 2 units of verbosity to pass 3 only. The combination
130 "-v --vp 00004" adds 1 unit of verbosity for all passes, and 4
131 more for pass 5.
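
The way -v and --vp combine can be modelled in a few lines. The following is only an illustrative sketch, not stap's own code: the function name pass_verbosity is hypothetical, and it assumes one digit per pass with missing trailing digits counting as zero, as the "002" example suggests.

```python
def pass_verbosity(v_count, vp=""):
    """Return the verbosity level for passes 1 to 5 as a list."""
    levels = [v_count] * 5            # each -v raises every pass by one unit
    for i, digit in enumerate(vp[:5]):
        levels[i] += int(digit)       # the i-th --vp digit applies to pass i+1 only
    return levels


print(pass_verbosity(0, "002"))       # [0, 0, 2, 0, 0]
print(pass_verbosity(1, "00004"))     # [1, 1, 1, 1, 5]
```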
132
133 -k Keep the temporary directory after all processing. This may be
134 useful in order to examine the generated C code, or to reuse the
135 compiled kernel object.
136
137 -g Guru mode. Enable parsing of unsafe expert-level constructs
138 like embedded C.
139
140 -P Prologue-searching mode. This is equivalent to --pro‐
141 logue-searching=always. Activate heuristics to work around in‐
142 correct debugging information for function parameter $context
143 variables.
144
145 -u Unoptimized mode. Disable unused code elision and many other
146 optimizations during elaboration / translation.
147
148 -w Suppressed warnings mode. Disables all warning messages.
149
150 -W Treat all warnings as errors.
151
152 -b Use bulk mode (percpu files) for kernel-to-user data transfer.
153 Use the stap-merge program to multiplex them back together lat‐
154 er.
155
156 -i --interactive
157 Interactive mode. Enable an interface to build the systemtap
158 script incrementally and interactively.
159
160 -t Collect timing information on the number of times each probe
161 executes and the average amount of time spent in each probe-point.
162 Also shows the derivation for each probe-point.
163
164 -s NUM Use NUM megabyte buffers for kernel-to-user data transfer. On a
165 multiprocessor in bulk mode, this is a per-processor amount.
166
167 -I DIR Add the given directory to the tapset search directory. See the
168 description of pass 2 for details.
169
170 -D NAME=VALUE
171 Add the given C preprocessor directive to the module Makefile.
172 These can be used to override limit parameters described below.
173
174 -B NAME=VALUE
175 In kernel-runtime mode, add the given make directive to the ker‐
176 nel module build's make invocation. These can be used to add or
177 override kconfig options. For example, use
178
179 -B CONFIG_DEBUG_INFO=y
180
181 to add debugging information.
182
183 -B FLAG
184 In dyninst-runtime mode, add the given parameter to the compiler
185 CFLAGS used for building the dyninst shared library. For exam‐
186 ple, use
187
188 -B -g
189
190 to add debugging information.
191
192 -a ARCH
193 Use a cross-compilation mode for the given target architecture.
194 This requires access to the cross-compiler and the kernel build
195 tree, and goes along with the
196
197 -B CROSS_COMPILE=arch-tool-prefix-
198 and
199 -r /build/tree
200
201 options.
202
203 --modinfo NAME=VALUE
204 Add the name/value pair as a MODULE_INFO macro call to the gen‐
205 erated module. This may be useful to inform or override various
206 module-related checks in the kernel.
207
208 -G NAME=VALUE
209 Sets the value of global variable NAME to VALUE when staprun is
210 invoked. This applies to scalar variables declared global in
211 the script/tapset.
212
213 -R DIR Look for the systemtap runtime sources in the given directory.
214 The default DIR can be seen using "stap --help".
215
216 -r /DIR
217 Build for kernel in given build tree. Can also be set with the
218 SYSTEMTAP_RELEASE environment variable.
219
220 -r RELEASE
221 Build for kernel in build tree /lib/modules/RELEASE/build. Can
222 also be set with the SYSTEMTAP_RELEASE environment variable.
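
The two forms of -r differ only in whether the argument starts with a slash. A minimal sketch of that rule follows; kernel_build_tree is a hypothetical helper name, the release string in the usage line is a made-up example, and the effect of --sysroot is deliberately ignored.

```python
def kernel_build_tree(r_arg):
    """Map the argument of -r to the kernel build tree it designates."""
    if r_arg.startswith("/"):
        return r_arg                              # -r /DIR: use the tree as given
    return "/lib/modules/" + r_arg + "/build"     # -r RELEASE


print(kernel_build_tree("/build/tree"))
print(kernel_build_tree("5.14.0-1.x86_64"))       # hypothetical release string
```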
223
224 -m MODULE
225 Use the given name for the generated kernel object module, in‐
226 stead of a unique randomized name. The generated kernel object
227 module is copied to the current directory.
228
229 -d MODULE
230 Add symbol/unwind information for the given module into the ker‐
231 nel object module. This may enable symbolic tracebacks from
232 those modules/programs, even if they do not have an explicit
233 probe placed into them.
234
235 --ldd Add symbol/unwind information for all user-space shared li‐
236 braries suspected by ldd to be necessary for user-space binaries
237 being probed or listed with the -d option. Caution: this can
238 make the probe modules considerably larger. Note that this op‐
239 tion does not deal with kernel-space modules: see instead
240 --all-modules below.
241
242 --all-modules
243 Equivalent to specifying "-dkernel" and a "-d" for each kernel
244 module that is currently loaded. Caution: this can make the
245 probe modules considerably larger.
246
247 -o FILE
248 Send standard output to named file. In bulk mode, percpu files
249 will start with FILE_ (FILE_cpu with -F) followed by the cpu
250 number. This supports strftime(3) formats for FILE.
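
The naming rules for -o can be sketched as below. This is an assumption-laden model rather than stap's implementation: output_name is a hypothetical helper, and it assumes the strftime(3) expansion happens before the per-cpu suffix is appended.

```python
import time


def output_name(pattern, when, cpu=None, detached=False):
    """Expand a -o FILE pattern; add the bulk-mode suffix when cpu is given."""
    name = time.strftime(pattern, when)     # strftime(3)-style expansion
    if cpu is None:
        return name                         # not bulk mode: a single file
    suffix = "_cpu" if detached else "_"    # FILE_cpuN with -F, otherwise FILE_N
    return name + suffix + str(cpu)


epoch = time.gmtime(0)
print(output_name("trace-%Y%m%d.log", epoch))      # trace-19700101.log
print(output_name("out", epoch, cpu=2))            # out_2
```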
251
252 -c CMD Start the probes, run CMD, and exit when CMD finishes. This also
253 has the effect of setting target() to the pid of the command that
254 was run.
255
256 -x PID Sets target() to PID. This allows scripts to be written that
257 filter on a specific process. Scripts run independently of the
258 PID's lifespan.
259
260 -e SCRIPT
261 Run the given SCRIPT specified on the command line.
262
263 -E SCRIPT
264 Run the given SCRIPT in addition to the main script, whether the
265 main script was specified through -e or as a script file.
266 This option can be repeated to run multiple scripts, and can be
267 used in listing mode (-l/-L).
268
269 -l PROBE
270 Instead of running a probe script, just list all available probe
271 points matching the given single probe point. The pattern may
272 include wildcards and aliases, but not comma-separated multiple
273 probe points. The process result code will indicate failure if
274 there are no matches.
275
276 % stap -e 'probe syscall.* { }'
277 [...]
278 % stap -l 'syscall.*'
279 syscall.accept
280 [...]
281 syscall.writev
282
283
284 -L PROBE
285 Similar to "-l", but list matching probe points plus their
286 available context variables.
287
288 % stap -L 'process("/lib64/libpython*.so.*").mark("*")'
289 process("/usr/lib64/libpython2.7.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
290 process("/usr/lib64/libpython2.7.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
291 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
292 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
293 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__done") $arg1:long
294 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__start") $arg1:long
295 process("/usr/lib64/libpython3.6m.so.1.0").mark("line") $arg1:long $arg2:long $arg3:long
296
297
298 -F Without -o option, load module and start probes, then detach
299 from the module leaving the probes running. With -o option, run
300 staprun in background as a daemon and show its pid.
301
302 -S size[,N]
303 Sets the maximum size of an output file and the maximum number of
304 output files. If the size of an output file would exceed size,
305 systemtap switches output to the next file. If the number of
306 output files exceeds N, systemtap removes the oldest output
307 file. You can omit the second argument.
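
The rotation behaviour can be pictured with a small model. The sketch below is illustrative only: parse_size_limit and rotate are hypothetical names, the file names are invented, and the size is kept as a plain number because this section does not state its unit.

```python
from collections import deque


def parse_size_limit(arg):
    """Split a -S argument 'size[,N]' into (size, N); N is None when omitted."""
    size, _, count = arg.partition(",")
    return int(size), (int(count) if count else None)


def rotate(files, new_file, max_files):
    """Start writing new_file; return the oldest files dropped to respect max_files."""
    files.append(new_file)
    removed = []
    while max_files is not None and len(files) > max_files:
        removed.append(files.popleft())     # the oldest output file goes first
    return removed


size, limit = parse_size_limit("10,3")
outputs = deque(["out.0", "out.1", "out.2"])    # hypothetical file names
print(rotate(outputs, "out.3", limit))          # ['out.0']
```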
308
309 -T TIMEOUT
310 Exit the script after TIMEOUT seconds.
311
312 --skip-badvars
313 Ignore unresolvable or run-time-inaccessible context variables
314 and substitute with 0, without errors.
315
316
317 --prologue-searching[=WHEN]
318 Prologue-searching mode. Activate heuristics to work around in‐
319 correct debugging information for function parameter $context
320 variables. WHEN can be either "never", "always", or "auto" (i.e.
321 enabled by heuristic). If WHEN is missing, then "always" is as‐
322 sumed. If the option is missing, then "auto" is assumed.
323
324
325 --suppress-handler-errors
326 Wrap all probe handlers into something like this
327
328 try { ... } catch { next }
329
330 block, which causes any runtime errors to be quietly suppressed.
331 Suppressed errors do not count against MAXERRORS limits. In
332 this mode, the MAXSKIPPED limits are also suppressed, so that
333 many errors and skipped probes may be accumulated during a
334 script's runtime. Any overall counts will still be reported at
335 shutdown.
336
337
338 --compatible VERSION
339 Suppress recent script language or tapset changes which are in‐
340 compatible with given older version of systemtap. This may be
341 useful if a much older systemtap script fails to run. See the
342 DEPRECATION section for more details.
343
344
345 --check-version
346 This option is used to check if the active script has any con‐
347 structs that may be systemtap version specific. See the DEPRE‐
348 CATION section for more details.
349
350
351 --clean-cache
352 This option prunes stale entries from the cache directory. This
353 is normally done automatically after successful runs, but this
354 option will trigger the cleanup manually and then exit. See the
355 CACHING section for more details about cache limits.
356
357
358 --color[=WHEN], --colour[=WHEN]
359 This option controls coloring of error messages. WHEN can be ei‐
360 ther "never", "always", or "auto" (i.e. enable only if at a ter‐
361 minal). If WHEN is missing, then "always" is assumed. If the op‐
362 tion is missing, then "auto" is assumed.
363
364 Colors can be modified using the SYSTEMTAP_COLORS environment
365 variable. The format must be of the form
366 key1=val1:key2=val2:key3=val3 ...etc. Valid keys are "error",
367 "warning", "source", "caret", and "token". Values constitute
368 Select Graphic Rendition (SGR) parameter(s). Consult the docu‐
369 mentation of your terminal for the SGRs it supports. As an exam‐
370 ple, the default colors would be expressed as
371 error=01;31:warning=00;33:source=00;34:caret=01:token=01. If
372 SYSTEMTAP_COLORS is absent, the default colors will be used. If
373 it is empty or invalid, coloring is turned off.
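
The three cases for SYSTEMTAP_COLORS (absent, empty or invalid, valid) can be sketched as follows. This is a hypothetical parser, not the one in stap: it does not validate the SGR values themselves, and it returns only the keys that were given, since this section does not say how a partial specification is merged with the defaults.

```python
DEFAULTS = "error=01;31:warning=00;33:source=00;34:caret=01:token=01"
VALID_KEYS = {"error", "warning", "source", "caret", "token"}


def parse_colors(value):
    """Return {key: SGR string}, or None when coloring is turned off."""
    if value is None:               # variable absent: use the default colors
        value = DEFAULTS
    if not value:                   # variable empty: coloring off
        return None
    colors = {}
    for item in value.split(":"):
        key, sep, sgr = item.partition("=")
        if not sep or key not in VALID_KEYS:
            return None             # malformed entry or unknown key: coloring off
        colors[key] = sgr
    return colors


print(parse_colors(None)["error"])          # 01;31
print(parse_colors("warning=00;35"))        # {'warning': '00;35'}
```

In a real program the argument would typically come from os.environ.get("SYSTEMTAP_COLORS").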
374
375
376 --disable-cache
377 This option disables all use of the cache directory. No files
378 will be either read from or written to the cache.
379
380
381 --poison-cache
382 This option treats files in the cache directory as invalid. No
383 files will be read from the cache, but resulting files from this
384 run will still be written to the cache. This is meant as a
385 troubleshooting aid when stap's caching seems to be misbehaving.
386 If it helps, there is probably a bug in systemtap that the
387 developers would like you to report.
388
389
390 --privilege[=stapusr | =stapsys | =stapdev]
391 This option instructs stap to examine the script looking for
392 constructs which are not allowed for the specified privilege
393 level (see UNPRIVILEGED USERS). Compilation fails if any such
394 constructs are used. If stapusr or stapsys is specified when
395 using a compile server (see --use-server), the server will examine
396 the script and, if compilation succeeds, the server will
397 cryptographically sign the resulting kernel module, certifying
398 that it is safe for use by users at the specified privilege
399 level.
400
401 If --privilege has not been specified, -pN has not been speci‐
402 fied with N < 5, and the invoking user is not root, and is not a
403 member of the group stapdev, then stap will automatically add
404 the appropriate --privilege option to the options already speci‐
405 fied.
406
407
408 --unprivileged
409 This option is equivalent to --privilege=stapusr.
410
411
412 --use-server[=HOSTNAME[:PORT] | =IP_ADDRESS[:PORT] | =CERT_SERIAL]
413 Specify compile-server(s) to be used for compilation and/or in
414 conjunction with --list-servers and --trust-servers (see below)
415 for listing. If no argument is supplied, then the default in un‐
416 privileged mode (see --privilege) is to select compatible
417 servers which are trusted as SSL peers and as module signers and
418 currently online. Otherwise the default is to select compatible
419 servers which are trusted as SSL peers and currently online.
420 --use-server may be specified more than once, in which case a
421 list of servers is accumulated in the order specified. Servers
422 may be specified by host name, ip address, or by certificate se‐
423 rial number (obtained using --list-servers). The latter is most
424 commonly used when adding or revoking trust in a server (see
425 --trust-servers below). If a server is specified by host name or
426 ip address, then an optional port number may be specified. This
427 is useful for accessing servers which are not on the local net‐
428 work or to specify a particular server.
429
430 IP addresses may be IPv4 or IPv6 addresses.
431
432 If a particular IPv6 address is link local and exists on more
433 than one interface, the intended interface may be specified by
434 appending the address with a percent sign (%) followed by the
435 intended interface name. For example,
436 "fe80::5eff:35ff:fe07:55ca%eth0".
437
438 In order to specify a port number with an IPv6 address, it is
439 necessary to enclose the IPv6 address in square brackets ([]) in
440 order to separate the port number from the rest of the address.
441 For example, "[fe80::5eff:35ff:fe07:55ca]:5000" or
442 "[fe80::5eff:35ff:fe07:55ca%eth0]:5000".
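
The address forms above can be told apart mechanically. The following sketch shows one way to do it; parse_server is a hypothetical helper, the host name in the test is invented, and the CERT_SERIAL form is not handled.

```python
def parse_server(spec):
    """Split a --use-server argument into (address, interface, port)."""
    if spec.startswith("["):                    # bracketed IPv6, optional :PORT
        addr, closing, rest = spec[1:].partition("]")
        if not closing:
            raise ValueError("missing ']' in " + spec)
        port = int(rest[1:]) if rest.startswith(":") else None
    elif spec.count(":") == 1:                  # HOSTNAME:PORT or IPv4:PORT
        addr, _, port_text = spec.partition(":")
        port = int(port_text)
    else:                                       # bare host name or unbracketed IPv6
        addr, port = spec, None
    addr, _, interface = addr.partition("%")    # link-local interface suffix
    return addr, interface or None, port


print(parse_server("[fe80::5eff:35ff:fe07:55ca%eth0]:5000"))
# ('fe80::5eff:35ff:fe07:55ca', 'eth0', 5000)
```

The brackets are what make the port unambiguous: without them, the final ":5000" could not be distinguished from one more group of the IPv6 address.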
443
444 If --use-server has not been specified, -pN has not been specified
445 with N < 5, and the invoking user is not root, is not a member
446 of the group stapdev, but is a member of the group stapusr, then
447 stap will automatically add --use-server to the options already
448 specified.
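
The conditions under which stap adds options on its own, here and under --privilege, can be written as a single predicate. The sketch below is a reading of those two paragraphs, not stap's code: auto_added_options is a hypothetical name, and because the text only says the "appropriate" --privilege option is added, the sketch reports the option name without a level.

```python
def auto_added_options(given, stop_pass, is_root, groups):
    """List the options stap would add automatically.

    given:     set of option names already on the command line
    stop_pass: N from -pN, or None when -p was not used
    groups:    set of group names the invoking user belongs to
    """
    added = []
    reaches_pass5 = stop_pass is None or stop_pass >= 5
    unprivileged = not is_root and "stapdev" not in groups
    if reaches_pass5 and unprivileged:
        if "--privilege" not in given:
            added.append("--privilege")
        if "--use-server" not in given and "stapusr" in groups:
            added.append("--use-server")
    return added


print(auto_added_options(set(), None, False, {"stapusr"}))
# ['--privilege', '--use-server']
```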
449
450
451 --use-server-on-error[=yes|=no]
452 Instructs stap to retry compilation of a script using a compile
453 server if compilation on the local host fails in a manner which
454 suggests that it might succeed using a server. If this option
455 is not specified, the default is no. If no argument is provid‐
456 ed, then the default is yes. Compilation will be retried for
457 certain types of errors (e.g. insufficient data or resources)
458 which may not occur during re-compilation by a compile server.
459 Compile servers will be selected automatically for the re-compi‐
460 lation attempt as if --use-server was specified with no argu‐
461 ments.
462
463
464 --list-servers[=SERVERS]
465 Display the status of the requested SERVERS, where SERVERS is a
466 comma-separated list of server attributes. The list of at‐
467 tributes is combined to filter the list of servers displayed.
468 Supported attributes are:
469
470 all specifies all known servers (trusted SSL peers, trusted
471 module signers, online servers).
472
473 specified
474 specifies servers specified using --use-server.
475
476 online filters the output by retaining information about servers
477 which are currently online.
478
479 trusted
480 filters the output by retaining information about servers
481 which are trusted as SSL peers.
482
483 signer filters the output by retaining information about servers
484 which are trusted as module signers (see --privilege).
485
486 compatible
487 filters the output by retaining information about servers
488 which are compatible with the current kernel release and
489 architecture.
490
491 If no argument is provided, then the default is specified. If
492 no servers were specified using --use-server, then the default
493 servers for --use-server are listed.
494
495 Note that --list-servers uses the avahi-daemon service to detect
496 online servers. If this service is not available, then
497 --list-servers will fail to detect any online servers. In order
498 for --list-servers to detect servers listening on IPv6 address‐
499 es, the avahi-daemon configuration file /etc/avahi/avahi-dae‐
500 mon.conf must contain an active "use-ipv6=yes" line. The service
501 must be restarted after adding this line in order for IPv6 to be
502 enabled.
503
504
505 --trust-servers[=TRUST_SPEC]
506 Grant or revoke trust in the compile-servers specified using
507 --use-server, as directed by TRUST_SPEC, where TRUST_SPEC is a
508 comma-separated list specifying the trust which is to be granted
509 or revoked. Supported elements are:
510
511 ssl trust the specified servers as SSL peers.
512
513 signer trust the specified servers as module signers (see
514 --privilege). Only root can specify signer.
515
516 all-users
517 grant trust as an ssl peer for all users on the local
518 host. The default is to grant trust as an ssl peer for
519 the current user only. Trust as a module signer is always
520 granted for all users. Only root can specify all-users.
521
522 revoke revoke the specified trust. The default is to grant it.
523
524 no-prompt
525 do not prompt the user for confirmation before carrying
526 out the requested action. The default is to prompt the
527 user for confirmation.
528
529 If no argument is provided, then the default is ssl. If no
530 servers were specified using --use-server, then no trust will be
531 granted or revoked.
532
533 Unless no-prompt has been specified, the user will be prompted
534 to confirm the trust to be granted or revoked before the opera‐
535 tion is performed.
536
537
538 --dump-probe-types
539 Dumps a list of supported probe types and exits. If --privi‐
540 lege=stapusr is also specified, the list will be limited to
541 probe types available to unprivileged users.
542
543
544 --dump-probe-aliases
545 Dumps a list of all probe aliases found in library files and ex‐
546 its.
547
548
549 --dump-functions
550 Dumps a list of all the public functions found in library files
551 and exits. Also includes their parameters and types. A function
552 of type 'unknown' indicates a function that does not return a
553 value. Note that not all function/parameter types may be resolved
554 (these are also shown as 'unknown'). This feature is very
555 memory-intensive and thus may not work properly with --use-server
556 if the target server imposes an rlimit on process memory
557 (i.e. through the ~stap-server/.systemtap/rc configuration file,
558 see stap-server(8)).
559
560
561 --remote URL
562 Set the execution target to the given host. This option may be
563 repeated to target multiple execution targets. Passes 1-4 are
564 completed locally as normal to build the script, and then pass 5
565 will copy the module to the target and run it. Acceptable URL
566 forms include:
567
568 [USER@]HOSTNAME, ssh://[USER@]HOSTNAME
569 This mode uses ssh, optionally using a username not
570 matching your own. If a custom ssh_config file is in use,
571 add SendEnv LANG to retain internationalization function‐
572 ality.
573
574 libvirt://DOMAIN, libvirt://DOMAIN/LIBVIRT_URI
575 This mode uses stapvirt to execute the script on a domain
576 managed by libvirt. Optionally, LIBVIRT_URI may be speci‐
577 fied to connect to a specific driver and/or a remote
578 host. For example, to connect to the local privileged QE‐
579 MU driver, use:
580
581 --remote libvirt://MyDomain/qemu:///system
582
583 See the page at <http://libvirt.org/uri.html> for sup‐
584 ported URIs. Also see stapvirt(1) for more information on
585 how to prepare the domain for stap probing.
586
587 unix:PATH
588 This mode connects to a UNIX socket. This can be used
589 with a QEMU virtio-serial port for executing scripts in‐
590 side a running virtual machine.
591
592 direct://
593 Special loopback mode to run on the local host.
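
The URL forms listed above can be distinguished by their prefix. The following is a rough sketch under that assumption; classify_remote is a hypothetical name and the user, host and socket path in the tests are invented.

```python
def classify_remote(url):
    """Return a tuple describing which --remote form a URL uses."""
    if url == "direct://":
        return ("direct",)
    if url.startswith("unix:"):
        return ("unix", url[len("unix:"):])
    if url.startswith("libvirt://"):
        domain, _, uri = url[len("libvirt://"):].partition("/")
        return ("libvirt", domain, uri or None)     # LIBVIRT_URI is optional
    if url.startswith("ssh://"):
        url = url[len("ssh://"):]
    user, at, host = url.rpartition("@")            # [USER@]HOSTNAME
    return ("ssh", user if at else None, host)


print(classify_remote("libvirt://MyDomain/qemu:///system"))
# ('libvirt', 'MyDomain', 'qemu:///system')
```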
594
595 --remote-prefix
596 Prefix each line of remote output with "N: ", where N is the in‐
597 dex of the remote execution target from which the given line
598 originated.
599
600
601 --download-debuginfo[=OPTION]
602 Enable, disable or set a timeout for the automatic debuginfo
603 downloading feature offered by abrt as specified by OPTION,
604 where OPTION is one of the following:
605
606 yes enable automatic downloading of debuginfo with no time‐
607 out. This is the same as not providing an OPTION value to
608 --download-debuginfo.
609
610 no explicitly disable automatic downloading of debuginfo.
611 This is the same as not using the option at all.
612
613 ask show abrt output, and ask before continuing download. No
614 timeout will be set.
615
616 <timeout>
617 specify a timeout as a positive number to stop the down‐
618 load if it is taking longer than <timeout> seconds.
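
The four OPTION forms reduce to a mode plus an optional timeout. A minimal sketch of that mapping, with a hypothetical function name and the assumption that a numeric value implies downloading is enabled:

```python
def parse_download_debuginfo(option=None, given=True):
    """Return (mode, timeout_seconds) for --download-debuginfo[=OPTION]."""
    if not given:
        return ("no", None)             # option absent: same as "no"
    if option is None or option == "yes":
        return ("yes", None)            # no OPTION value: same as "yes"
    if option in ("no", "ask"):
        return (option, None)
    timeout = int(option)               # anything else must be a timeout
    if timeout <= 0:
        raise ValueError("timeout must be a positive number")
    return ("timeout", timeout)


print(parse_download_debuginfo("30"))   # ('timeout', 30)
```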
619
620 --rlimit-as=NUM
621 Specify the maximum size of the process's virtual memory (ad‐
622 dress space), in bytes.
623
624
625 --rlimit-cpu=NUM
626 Specify the CPU time limit, in seconds.
627
628
629 --rlimit-nproc=NUM
630 Specify the maximum number of processes that can be created.
631
632
633 --rlimit-stack=NUM
634 Specify the maximum size of the process stack, in bytes.
635
636
637 --rlimit-fsize=NUM
638 Specify the maximum size of files that the process may create,
639 in bytes.
640
641
642 --sysroot=DIR
643 Specify sysroot directory where target files (executables, li‐
644 braries, etc.) are located. With -r RELEASE, the sysroot will
645 be searched for the appropriate kernel build directory. With -r
646 /DIR, however, the sysroot will not be used to find the kernel
647 build.
648
649
650 --sysenv=VAR=VALUE
651 Provide an alternate value for an environment variable where the
652 value on a remote system differs. Path variables (e.g. PATH,
653 LD_LIBRARY_PATH) are assumed to be relative to the directory
654 provided by --sysroot, if provided.
655
656
657 --suppress-time-limits
658 Disable -DSTP_OVERLOAD related options as well as -DMAXACTION
659 and -DMAXTRYLOCK. This option requires guru mode.
660
661
662 --runtime=MODE
663 Set the pass-5 runtime mode. Valid options are kernel (de‐
664 fault), dyninst and bpf. See ALTERNATE RUNTIMES below for more
665 information.
666
667
668 --dyninst
669 Shorthand for --runtime=dyninst.
670
671
672 --bpf Shorthand for --runtime=bpf.
673
674
675 --save-uprobes
676 On machines that require SystemTap to build its own uprobes mod‐
677 ule (kernels prior to version 3.5), this option instructs Sys‐
678 temTap to also save a copy of the module in the current directo‐
679 ry (creating a new "uprobes" directory first).
680
681
682 --target-namespaces=PID
683 Allow for a set of target namespaces to be set based on the
684 namespaces the given PID is in. This is for namespace-aware
685 tapset functions. If the target namespaces were not set, the
686 target defaults to the stap process' namespaces.
687
688
689 --monitor[=INTERVAL]
690 Enables an interface to display status information about the
691 module (uptime, module name, invoker uid, memory sizes, global
692 variables, list of probes with their statistics). An optional
693 argument INTERVAL can be supplied to set the refresh rate in
694 seconds of the status window. The module can also be controlled
695 by a list of commands using the following keys:
696
697 c Resets all global variables to their initial values or
698 zeroes them if they did not have an initial value.
699
700 s Rotates the attribute used to sort the list of probes.
701
702 t Brings up a prompt to allow toggling (on/off) of probes by
703 index. Probe points are still affected by their condi‐
704 tions.
705
706 r Resumes the script by toggling on all probes.
707
708 p Pauses the script by toggling off all probes.
709
710 x Hides/shows the status window. This allows for more out‐
711 put to be seen.
712
713 navigation-keys
714 The navigation keys can be used to scroll up and down the
715 windows.
716
717 Tab Toggle scrolling between status and output windows.
718
719
720 --example
721 This option is used to run example scripts without having to en‐
722 ter the entire path to the script. Example scripts can be found
723 in the directory specified in the stappaths(7) manual page.
724
725
726 --no-global-var-display
727 This option is used to disable the automatic logging of unused
728 global variables at the end of a stap session.
729
730
732 Any additional arguments on the command line are passed to the script
733 parser for substitution. See below.
734
735
736 SCRIPT LANGUAGE
737 The systemtap script language resembles awk and C. There are two main
738 outermost constructs: probes and functions. Within these, statements
739 and expressions use C-like operator syntax and precedence.
740
741
742 GENERAL SYNTAX
743 Whitespace is ignored. Three forms of comments are supported:
744 # ... shell style, to the end of line, except for $# and @#
745 // ... C++ style, to the end of line
746 /* ... C style ... */
747 Literals are either strings enclosed in double-quotes (passing through
748 the usual C escape codes with backslashes, and with adjacent string
749 literals glued together, also as in C), or integers (in decimal, hexa‐
750 decimal, or octal, using the same notation as in C). All strings are
751 limited in length to some reasonable value (a few hundred bytes). In‐
752 tegers are 64-bit signed quantities, although the parser also accepts
753 (and wraps around) values above positive 2**63.
754
755 In addition, script arguments given at the end of the command line may
756 be inserted. Use $1 ... $<NN> for insertion unquoted, @1 ... @<NN> for
757 insertion as a string literal. The number of arguments may be accessed
758 through $# (as an unquoted number) or through @# (as a quoted number).
759 These may be used at any place a token may begin, including within the
760 preprocessing stage. Reference to an argument number beyond what was
761 actually given is an error.
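
The substitution rules can be illustrated with a simplified model. The sketch below works on raw text with a regular expression, whereas the real parser substitutes at token boundaries (so it would not touch string literals or comments); substitute_args is a hypothetical name.

```python
import re


def substitute_args(script, args):
    """Expand $N, @N, $# and @# in script using the command-line args."""
    def quote(text):
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    def expand(match):
        sigil, ref = match.groups()
        if ref == "#":
            value = str(len(args))                  # number of arguments
        else:
            index = int(ref)
            if not 1 <= index <= len(args):         # beyond what was given
                raise ValueError("argument %s%s was not given" % (sigil, ref))
            value = args[index - 1]
        return value if sigil == "$" else quote(value)

    return re.sub(r"([$@])(\d+|#)", expand, script)


print(substitute_args('probe $1 { println(@2, $#) }', ["begin", "hi"]))
# probe begin { println("hi", 2) }
```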
762
763
764 PREPROCESSING
765 A simple conditional preprocessing stage is run as a part of parsing.
766 The general form is similar to the cond ? exp1 : exp2 ternary operator:
767
768 %( CONDITION %? TRUE-TOKENS %)
769 %( CONDITION %? TRUE-TOKENS %: FALSE-TOKENS %)
770
771 The CONDITION is either an expression whose format is determined by its
772 first keyword, or a comparison of string literals or of numeric
773 literals. It can also be composed of several such CONDITIONs joined
774 by || (alternative) and && (conjunction). However, parentheses are
775 not supported yet, so it is important to remember that conjunction
776 takes precedence over alternative.
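
Because there are no parentheses, a compound condition is always read as an OR of AND-groups. The sketch below evaluates an already-resolved condition that way; eval_condition is a hypothetical helper and the booleans stand for the results of the individual comparisons.

```python
def eval_condition(tokens):
    """Evaluate [bool, op, bool, op, ...] where op is '&&' or '||'.

    && binds tighter than ||, so the list is split into ||-separated
    groups and each group is the conjunction of its members.
    """
    groups, current = [], [tokens[0]]
    for op, value in zip(tokens[1::2], tokens[2::2]):
        if op == "&&":
            current.append(value)
        elif op == "||":
            groups.append(current)
            current = [value]
        else:
            raise ValueError("unknown operator " + repr(op))
    groups.append(current)
    return any(all(group) for group in groups)


# Read as True || (False && False); strict left-to-right reading would give False.
print(eval_condition([True, "||", False, "&&", False]))     # True
```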
777
778 If the first part is the identifier kernel_vr or kernel_v to refer to
779 the kernel version number, with ("2.6.13-1.322FC3smp") or without
780 ("2.6.13") the release code suffix, then the second part is one of the
781 six standard numeric comparison operators <, <=, ==, !=, >, and >=, and
782 the third part is a string literal that contains an RPM-style version-
783 release value. The condition is deemed satisfied if the version of the
784 target kernel (as optionally overridden by the -r option) satisfies the
785 operator against the given version string. The comparison uses the glibc
786 function strverscmp. As a special case, if the operator is for simple
787 equality (==), or inequality (!=), and the third part contains any
788 wildcard characters (* or ? or [), then the expression is treated as a
789 wildcard (mis)match as evaluated by fnmatch.
790
791 If, on the other hand, the first part is the identifier arch to refer
792 to the processor architecture (as named by the kernel build system
793 ARCH/SUBARCH), then the second part is one of the two string comparison
794 operators == or !=, and the third part is a string literal for matching
795 it. This comparison is a wildcard (mis)match.
796
797 Similarly, if the first part is an identifier like CONFIG_something to
798 refer to a kernel configuration option, then the second part is == or
799 !=, and the third part is a string literal for matching the value (com‐
800 monly "y" or "m"). Nonexistent or unset kernel configuration options
801 are represented by the empty string. This comparison is also a wild‐
802 card (mis)match.
803
804 If the first part is the identifier systemtap_v, the test refers to the
805 systemtap compatibility version, which may be overridden for old
806 scripts with the --compatible flag. The comparison operator is as is
807 for kernel_v and the right operand is a version string. See also the
808 DEPRECATION section below.
809
810 If the first part is the identifier systemtap_privilege, the test
811 refers to the privilege level that the systemtap script is compiled
812 with. Here the second part is == or !=, and the third part is a string
813 literal, either "stapusr" or "stapsys" or "stapdev".
814
815 If the first part is the identifier guru_mode, the test refers to
816 whether the systemtap script is compiled in guru mode. Here the second
817 part is == or !=, and the third part is a number, either 1 or 0.
818
819 If the first part is the identifier runtime, the test refers to the
820 systemtap runtime mode. See ALTERNATE RUNTIMES below for more informa‐
821 tion on runtimes. The second part is one of the two string comparison
822 operators == or !=, and the third part is a string literal for matching
823 it. This comparison is a wildcard (mis)match.
824
825 Otherwise, the CONDITION is expected to be a comparison between two
826 string literals or two numeric literals. In this case, the arguments
827 are the only variables usable.
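
A literal comparison of this kind could look like the following sketch, which selects one of two statements at parse time depending on the first script argument. The file name mode.stp and the "verbose" value are assumptions made for illustration.

```systemtap
// mode.stp (hypothetical file name)
// Possible invocation: stap mode.stp verbose
probe begin {
  // @1 is inserted as a string literal, so this is a comparison
  // between two string literals, decided during preprocessing.
  %( @1 == "verbose" %? println("verbose mode") %: println("quiet mode") %)
  exit()
}
```

Only the tokens of the selected branch reach the parser, so the unselected println call is not compiled into the module.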
828
829 The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser tokens
830 (possibly including nested preprocessor conditionals), and are passed
831 into the input stream if the condition is true or false. For example,
832 the following code induces a parse error unless the target kernel ver‐
833 sion is newer than 2.6.5:
834
835 %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence
836
837 The following code might adapt to hypothetical kernel version drift:
838
839 probe kernel.function (
840 %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
841 %( kernel_vr == "2.6.13*smp" %? "do_page_fault" %:
842 UNSUPPORTED %) %)
843 ) { /* ... */ }
844
845 %( arch == "ia64" %?
846 probe syscall.vliw = kernel.function("vliw_widget") {}
847 %)
848
849
850
851 PREPROCESSOR MACROS
852 The preprocessor also supports a simple macro facility, run as a sepa‐
853 rate pass before conditional preprocessing.
854
855 Macros are defined using the following construct:
856
857 @define NAME %( BODY %)
858 @define NAME(PARAM_1, PARAM_2, ...) %( BODY %)
859
860 Macros, and parameters inside a macro body, are both invoked by prefix‐
861 ing the macro name with an @ symbol:
862
863 @define foo %( x %)
864 @define add(a,b) %( ((@a)+(@b)) %)
865
866 @foo = @add(2,2)
867
868
869 Macro expansion is currently performed in a separate pass before condi‐
870 tional compilation. Therefore, both TRUE- and FALSE-tokens in condi‐
871 tional expressions will be macroexpanded regardless of how the condi‐
872 tion is evaluated. This can sometimes lead to errors:
873
874 // The following results in a conflict:
875 %( CONFIG_UTRACE == "y" %?
876 @define foo %( process.syscall %)
877 %:
878 @define foo %( **ERROR** %)
879 %)
880
881 // The following works properly as expected:
882 @define foo %(
883 %( CONFIG_UTRACE == "y" %? process.syscall %: **ERROR** %)
884 %)
885
886 The first example is incorrect because both @defines are evaluated in a
887 pass prior to the conditional being evaluated.
888
889 Normally, a macro definition is local to the file it occurs in. Thus,
890 defining a macro in a tapset does not make it available to the user of
891 the tapset. Publicly available library macros can be defined by
892 including .stpm files on the tapset search path. These files may only
893 contain @define constructs, which become visible across all tapsets and
894 user scripts. Optionally, within the .stpm files, a public macro defi‐
895 nition can be surrounded by a preprocessor conditional as described
896 above.
897
898
899 CONSTANTS
900 Tapsets or guru-mode user scripts can access header file constant
901 tokens, typically macros, using the built-in @const() operator. The
902 respective header file can be included either via the tapset library,
903 or using a top-level guru-mode embedded-C construct. This results in
904 the appropriate embedded-C pragma comments being set.
905
906 @const("STP_SKIP_BADVARS")
907
908
909
910 VARIABLES
911 Identifiers for variables and functions are an alphanumeric sequence,
912 and may include _ and $ characters. They may not start with a plain
913 digit, as in C. Each variable is by default local to the probe or
914 function statement block within which it is mentioned, and therefore
915 its scope and lifetime are limited to a particular probe or function
916 invocation.
917
918 Scalar variables are implicitly typed as either string or integer. As‐
919 sociative arrays also have a string or integer value, and a tuple of
920 strings and/or integers serving as a key. Here are a few basic expres‐
921 sions.
922
923 var1 = 5
924 var2 = "bar"
925 array1 [pid()] = "name" # single numeric key
926 array2 ["foo",4,i++] += 5 # vector of string/num/num keys
927 if (["hello",5,4] in array2) println ("yes") # membership test
928
929
930 The translator performs type inference on all identifiers, including
931 array indexes and function parameters. Inconsistent type-related use
932 of identifiers signals an error.
933
934 Variables may be declared global, so that they are shared amongst all
935 probes and functions and live as long as the entire systemtap session.
936 There is one namespace for all global variables, regardless of which
937 script file they are found within. Concurrent access to global vari‐
938 ables is automatically protected with locks, see the SAFETY AND SECURI‐
939 TY section for more details. A global declaration may be written at
940 the outermost level anywhere, not within a block of code. Global vari‐
941 ables which are written but never read will be displayed automatically
942 at session shutdown. The translator will infer for each its value
943 type, and if it is used as an array, its key types. Optionally, scalar
944 globals may be initialized with a string or number literal. The fol‐
945 lowing declaration marks variables as global.
946
947 global var1, var2, var3=4
948
949
950 Global variables can also be set as module options. One can do this
951 either by passing the -G option to stap, or by first compiling the
952 module with stap -p4 and then setting the globals on the command line
953 when calling staprun on the generated module. See staprun(8) for
954 more information.
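
The two approaches might be used as in the sketch below. The script name threshold.stp, the module name thresh, and the variable name threshold are all hypothetical; the invocations are shown as comments.

```systemtap
// threshold.stp (hypothetical file name)
global threshold = 10   // default, used when no module option is given

probe begin {
  printf("threshold=%d\n", threshold)
  exit()
}

// Possible invocations:
//   stap -G threshold=50 threshold.stp
//   stap -p4 -m thresh threshold.stp
//   staprun thresh.ko threshold=50
```

In both cases the script is expected to report the value supplied on the command line rather than the initializer in the declaration.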
955
956 The scope of a global variable may be limited to a tapset or user
957 script file using the private keyword. The global keyword is optional
958 when defining a private global variable. The following declaration
959 marks var1 and var2 as private globals.
960
961 private global var1=2
962 private var2
963
964
965 Arrays are limited in size by the MAXMAPENTRIES variable -- see the
966 SAFETY AND SECURITY section for details. Optionally, global arrays may
967 be declared with a maximum size in brackets, overriding MAXMAPENTRIES
968 for that array only. Note that this doesn't indicate the type of keys
969 for the array, just the size.
970
971 global tiny_array[10], normal_array, big_array[50000]
972
973
974 Arrays may be configured for wrapping using the '%' suffix. This caus‐
975 es older elements to be overwritten if more elements are inserted than
976 the array can hold. This works for both associative and statistics
977 typed arrays.
978
979 global wrapped_array1%[10], wrapped_array2%
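
A small sketch of wrapping behaviour, assuming an array named recent (a name chosen only for this example) that holds three entries while five are inserted:

```systemtap
global recent%[3]       // at most 3 entries; older ones are overwritten

probe begin {
  for (i = 1; i <= 5; i++)
    recent[i] = i * i   // inserts 5 entries into a 3-entry array
  // Without the % suffix, the 4th insertion would raise an overflow error.
  foreach (k in recent)
    printf("recent[%d] = %d\n", k, recent[k])
  exit()
}
```

Based on the description above, the loop should report only the entries that survived, that is, the most recently inserted ones.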
980
981
982
983 Many types of probe points provide context variables, which are run-
984 time values, safely extracted from the kernel or userspace program be‐
985 ing probed. These are prefixed with the $ character. The CONTEXT
986 VARIABLES section in stapprobes(3stap) lists what is available for each
987 type of probe point. These context variables become normal string or
988 numeric scalars once they are stored in normal script variables. See
989 the TYPECASTING section below on how to turn them back into typed
990 pointers for further processing as context variables.
991
992
993 STATEMENTS
994 Statements enable procedural control flow. They may occur within func‐
995 tions and probe handlers. The total number of statements executed in
996 response to any single probe event is limited to some number defined by
997 the MAXACTION macro in the translated C code, and is in the neighbour‐
998 hood of 1000.
999
1000 EXP Execute the string- or integer-valued expression and throw away
1001 the value.
1002
1003 { STMT1 STMT2 ... }
1004 Execute each statement in sequence in this block. Note that
1005 separators or terminators are generally not necessary between
1006 statements.
1007
1008 ; Null statement, do nothing. It is useful as an optional separa‐
1009 tor between statements to improve syntax-error detection and to
1010 handle certain grammar ambiguities.
1011
1012 if (EXP) STMT1 [ else STMT2 ]
1013 Compare integer-valued EXP to zero. Execute the first (non-ze‐
1014 ro) or second STMT (zero).
1015
1016 while (EXP) STMT
1017 While integer-valued EXP evaluates to non-zero, execute STMT.
1018
1019 for (EXP1; EXP2; EXP3) STMT
1020 Execute EXP1 as initialization. While EXP2 is non-zero, execute
1021 STMT, then the iteration expression EXP3.
1022
1023 foreach (VAR in ARRAY [ limit EXP ]) STMT
1024 Loop over each element of the named global array, assigning the
1025 current key to VAR. The array may not be modified within the
1026 statement. By adding a single + or - operator after the VAR or
1027 the ARRAY identifier, the iteration will proceed in a sorted or‐
1028 der, by ascending or descending index or value. If the array
1029 contains statistics aggregates, adding the desired @operator be‐
1030 tween the ARRAY identifier and the + or - will specify the sort‐
1031 ing aggregate function. See the STATISTICS section below for
1032 the ones available. Default is @count. Using the optional lim‐
1033 it keyword limits the number of loop iterations to EXP times.
1034 EXP is evaluated once at the beginning of the loop.
1035
1036 foreach ([VAR1, VAR2, ...] in ARRAY [ limit EXP ]) STMT
1037 Same as above, used when the array is indexed with a tuple of
1038 keys. A sorting suffix may be used on at most one VAR or ARRAY
1039 identifier.
1040
1041 foreach ([VAR1, VAR2, ...] in ARRAY [INDEX1, INDEX2, ...] [ limit EXP
1042 ]) STMT
1043 Same as above, where iterations are limited to elements in the
1044 array where the keys match the index values specified. The sym‐
1045 bol * can be used to specify an index and will be treated as a
1046 wildcard.
1047
1048 foreach (VAR0 = VAR in ARRAY [ limit EXP ]) STMT
1049 This variant of foreach saves the current value into VAR0 on each
1050 iteration, so it is the same as ARRAY[VAR]. This also works
1051 with a tuple of keys. Sorting suffixes on VAR0 have the same
1052 effect as on ARRAY.
1053
1054 foreach (VAR0 = VAR in ARRAY [INDEX1, INDEX2, ...] [ limit EXP ]) STMT
1055 Same as above, where iterations are limited to elements in the
1056 array where the keys match the index values specified. The sym‐
1057 bol * can be used to specify an index and will be treated as a
1058 wildcard.
1059
1060 break, continue
1061 Exit or iterate the innermost nesting loop (while or for or
1062 foreach) statement.
1063
1064 return EXP
1065 Return EXP value from enclosing function. If the function's
1066 value is not taken anywhere, then a return statement is not
1067 needed, and the function will have a special "unknown" type with
1068 no return value.
1069
1070 next Return now from enclosing probe handler. This is especially
1071 useful in probe aliases that apply event filtering predicates.
1072 When used in functions, the execution will be immediately trans‐
1073 ferred to the next overloaded function.
1074
1075 try { STMT1 } catch { STMT2 }
1076 Run the statements in the first block. Upon any run-time er‐
1077 rors, abort STMT1 and start executing STMT2. Any errors in
1078 STMT2 will propagate to outer try/catch blocks, if any.
1079
1080 try { STMT1 } catch(VAR) { STMT2 }
1081 Same as above, plus assign the error message to the string
1082 scalar variable VAR.
1083
1084 delete ARRAY[INDEX1, INDEX2, ...]
1085 Remove from ARRAY the element specified by the index tuple. If
1086 the index tuple contains a * in place of an index, the * is
1087 treated as a wildcard and all elements with keys that match the
1088 index tuple will be removed from ARRAY. The value will no
1089 longer be available, and subsequent iterations will not report
1090 the element. It is not an error to delete an element that does
1091 not exist.
1092
1093 delete ARRAY
1094 Remove all elements from ARRAY.
1095
1096 delete SCALAR
1097 Removes the value of SCALAR. Integers and strings are cleared
1098 to 0 and "" respectively, while statistics are reset to the ini‐
1099 tial empty state.
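
Several of the statements above can be combined as in the following sketch. The array name hits and its contents are invented for illustration, and error() is assumed to be the tapset function that raises a catchable run-time error.

```systemtap
global hits

probe begin {
  hits["a"] = 3
  hits["b"] = 7
  hits["c"] = 5

  // Sorted iteration: the - after the array name sorts by descending
  // value, and limit 2 stops after two iterations.
  foreach (key in hits- limit 2)
    printf("%s: %d\n", key, hits[key])

  // Run-time errors inside the try block transfer control to catch.
  try {
    error("something failed")
  } catch (msg) {
    printf("caught: %s\n", msg)
  }

  delete hits["a"]   // remove a single element
  delete hits        // remove all remaining elements
  exit()
}
```

The array is modified only outside the foreach loop, because an array may not be modified while it is being iterated.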
1100
1101
1102 EXPRESSIONS
1103 Systemtap supports a number of operators that have the same general
1104 syntax, semantics, and precedence as in C and awk. Arithmetic is per‐
1105 formed as per typical C rules for signed integers. Division by zero or
1106 overflow is detected and results in an error.
1107
1108 binary numeric operators
1109 * / % + - >> << & ^ | && ||
1110
1111 binary string operators
1112 . (string concatenation)
1113
1114 numeric assignment operators
1115 = *= /= %= += -= >>= <<= &= ^= |=
1116
1117 string assignment operators
1118 = .=
1119
1120 unary numeric operators
1121 + - ! ~ ++ --
1122
1123 binary numeric, string comparison or regex matching operators
1124 < > <= >= == != =~ !~
1125
1126 ternary operator
1127 cond ? exp1 : exp2
1128
1129 grouping operator
1130 ( exp )
1131
1132 function call
1133 fn ([ arg1, arg2, ... ])
1134
1135 array membership check
1136 exp in array
1137 [exp1, exp2, ...] in array
1138 [*, *, ... ] in array
1139
1140
1141 REGULAR EXPRESSION MATCHING
1142 The scripting language supports regular expression matching. The basic
1143 syntax is as follows:
1144
1145 exp =~ regex
1146 exp !~ regex
1147
1148 (The first operand must be an expression evaluating to a string; the
1149 second operand must be a string literal containing a syntactically
1150 valid regular expression.)
1151
1152 The regular expression syntax supports most of the features of POSIX
1153 Extended Regular Expressions, except for subexpression reuse ("\1")
1154 functionality.
1155
1156 After a successful match, the contents of the matched string and subex‐
1157 pressions can be extracted using the matched() and ngroups() tapset
1158 functions as follows:
1159
1160 if ("an example string" =~ "str(ing)") {
1161 matched(0) // -> returns "string", the matched substring
1162 matched(1) // -> returns "ing", the 1st matched subexpression
1163 ngroups() // -> returns 2, the number of matched groups
1164 }
1165
1166
1167 PROBES
1168 The main construct in the scripting language identifies probes. Probes
1169 associate abstract events with a statement block ("probe handler") that
1170 is to be executed when any of those events occur. The general syntax
1171 is as follows:
1172
1173 probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
1174 probe PROBEPOINT [, PROBEPOINT] if (CONDITION) { [STMT ...] }
1175
1176
1177 Events are specified in a special syntax called "probe points". There
1178 are several varieties of probe points defined by the translator, and
1179 tapset scripts may define further ones using aliases. Probe points may
1180 be wildcarded, grouped, or listed in preference sequences, or declared
1181 optional. More details on probe point syntax and semantics are listed
1182 on the stapprobes(3stap) manual page.
1183
1184 The probe handler is interpreted relative to the context of each event.
1185 For events associated with kernel code, this context may include vari‐
1186 ables defined in the source code at that spot. These "context vari‐
1187 ables" are presented to the script as variables whose names are pre‐
1188 fixed with "$". They may be accessed only if the kernel's compiler
1189 preserved them despite optimization. This is the same constraint that
1190 a debugger user faces when working with optimized code. In addition,
1191 the objects must exist in paged-in memory at the moment of the system‐
1192 tap probe handler's execution, because systemtap must not cause (sup‐
1193 presses) any additional paging. Some probe types have very little con‐
1194 text. See the stapprobes(3stap) man pages to see the kinds of context
1195 variables available at each kind of probe point.
1196
1197 Probes may be decorated with an arming condition, consisting of a sim‐
1198 ple boolean expression on read-only global script variables. While
1199 disarmed (inactive, condition evaluates to false), some probe types re‐
1200 duce or eliminate their run-time overheads. When an arming condition
1201 evaluates to true, probes will soon be re-armed, and their probe
1202 handlers will start getting called as the events fire. (Some events may
1203 be lost during the arming interval. If this is unacceptable, do not
1204 use arming conditions for those probes.) Example of the syntax:
1205
1206 probe timer.us(TIMER) if (enabled) {
1207 }
1208
1209
1210 New probe points may be defined using "aliases". A probe point alias
1211 looks similar to a probe definition, but instead of activating a probe at
1212 the given point, it just defines a new probe point name as an alias to
1213 an existing one. There are two types of alias, the prologue style and
1214 the epilogue style, which are identified by "=" and "+=" respectively.
1216
1217 For a prologue-style alias, the statement block that follows the alias
1218 definition is implicitly added as a prologue to any probe that refers
1219 to the alias. For an epilogue-style alias, the statement block that
1220 follows the alias definition is implicitly added as an epilogue to
1221 any probe that refers to the alias. For example:
1222
1223 probe syscall.read = kernel.function("sys_read") {
1224 fildes = $fd
1225 if (execname() == "init") next # skip rest of probe
1226 }
1227
1228 defines a new probe point syscall.read, which expands to
1229 kernel.function("sys_read"), with the given statement as a prologue,
1230 which is useful to predefine some variables for the alias user and/or
1231 to skip probe processing entirely based on some conditions. And
1232
1233 probe syscall.read += kernel.function("sys_read") {
1234 if (tracethis) println ($fd)
1235 }
1236
1237 defines a new probe point with the given statement as an epilogue,
1238 which is useful to take actions based upon variables set or left over
1239 by the alias user. Please note that in each case, the statements
1240 in the alias handler block are treated ordinarily, so that variables
1241 assigned there constitute mere initialization, not a macro substitu‐
1242 tion.
1243
1244 An alias is used just like a built-in probe type.
1245
1246 probe syscall.read {
1247 printf("reading fd=%d\n", fildes)
1248 if (fildes > 10) tracethis = 1
1249 }
1250
1251
1252
1253 FUNCTIONS
1254 Systemtap scripts may define subroutines to factor out common work.
1255 Functions take any number of scalar (integer or string) arguments, and
1256 must return a single scalar (integer or string). An example function
1257 declaration looks like this:
1258
1259 function thisfn (arg1, arg2) {
1260 return arg1 + arg2
1261 }
1262
1263 Note the general absence of type declarations, which are instead in‐
1264 ferred by the translator. However, if desired, a function definition
1265 may include explicit type declarations for its return value and/or its
1266 arguments. This is especially helpful for embedded-C functions. In
1267 the following example, the type inference engine need only infer the
1268 type of arg2 (a string).
1269
1270 function thatfn:string (arg1:long, arg2) {
1271 return sprint(arg1) . arg2
1272 }
1273
1274 Functions may call others or themselves recursively, up to a fixed
1275 nesting limit. This limit is defined by the MAXNESTING macro in the
1276 translated C code and is in the neighbourhood of 10.
1277
1278 Functions may be marked private using the private keyword to limit
1279 their scope to the tapset or user script file they are defined in. An
1280 example definition of a private function follows:
1281
1282 private function three:long () { return 3 }
1283
1284
1285 Functions terminating without reaching an explicit return statement
1286 will return an implicit 0 or "", determined by type inference.
1287
1288 Functions may be overloaded during both runtime and compile time.
1289
1290 Runtime overloading allows the executed function to be selected while
1291 the module is running based on runtime conditions and is achieved using
1292 the "next" statement in script functions and STAP_NEXT macro for embed‐
1293 ded-C functions. For example,
1294
1295
1296 function f() { if (condition) next; print("first function") }
1297 function f() %{ STAP_NEXT; print("second function") %}
1298 function f() { print("third function") }
1299
1300
1301 During a function call f(), the execution will transfer to the third
1302 function if condition evaluates to true and print "third function".
1303 Note that the second function is unconditionally nexted.
1304
1305 Parameter overloading allows the function to be executed to be selected
1306 at compile time based on the number of arguments provided to the
1307 function call. For example,
1308
1309
1310 function g() { print("first function") }
1311 function g(x) { print("second function") }
1312 g() -> "first function"
1313 g(1) -> "second function"
1314
1315
1316 Note that runtime overloading does not occur in the above example, as
1317 exactly one function will be resolved for the function call. The use of
1318 a next statement inside a function while no more overloads remain will
1319 trigger a runtime exception. Runtime overloading will only occur if the
1320 functions have the same arity; functions with the same name but a
1321 different number of parameters are completely unrelated.
1322
1323 Execution order is determined by a priority value which may be speci‐
1324 fied. If no explicit priority is specified, user script functions are
1325 given a higher priority than library functions. User script functions
1326 and library functions are assigned a default priority value of 0 and 1
1327 respectively. Functions with the same priority are executed in decla‐
1328 ration order. For example,
1329
1330
1331 function f():3 { if (condition) next; print("first function") }
1332 function f():1 { if (condition) next; print("second function") }
1333 function f():2 { print("third function") }
1334
1335
1336 Since the second function has highest priority, it is executed first.
1337 The first function is never executed as there are no "next" statements in
1338 the third function to transfer execution.
1339
1340
1341 PRINTING
1342 There is a set of function names that are specially treated by the
1343 translator. They format values for printing to the standard systemtap
1344 output stream in a more convenient way (note that data generated in the
1345 kernel module need to get transferred to user-space in order to get
1346 printed).
1347
1348 The sprint* variants return the formatted string instead of printing
1349 it.
1350
1351 print, sprint
1352 Print one or more values of any type, concatenated directly to‐
1353 gether.
1354
1355 println, sprintln
1356 Print values like print and sprint, but also append a newline.
1357
1358 printd, sprintd
1359 Take a string delimiter and two or more values of any type, and
1360 print the values with the delimiter interposed. The delimiter
1361 must be a literal string constant.
1362
1363 printdln, sprintdln
1364 Print values with a delimiter like printd and sprintd, but also
1365 append a newline.
1366
1367 printf, sprintf
1368 Take a formatting string and a number of values of corresponding
1369 types, and print them all. The format must be a literal string
1370 constant.
1371
1372 The printf formatting directives are similar to those of C, except that
1373 they are fully type-checked by the translator:
1374
1375 %b Writes a binary blob of the value given, instead of ASCII
1376 text. The width specifier determines the number of bytes
1377 to write; valid specifiers are %b %1b %2b %4b %8b. De‐
1378 fault (%b) is 8 bytes.
1379
1380 %c Character.
1381
1382 %d,%i Signed decimal.
1383
1384 %m Safely reads kernel (without #) or user (with #) memory
1385 at the given address, outputs its content. The optional
1386 precision specifier (not field width) determines the num‐
1387 ber of bytes to read - default is 1 byte. %10.4m prints
1388 4 bytes of the memory in a 10-character-wide field.
1389 Note, on some architectures user memory can still be read
1390 without #.
1391
1392 %M Same as %m, but outputs in hexadecimal. The minimal size
1393 of output is double the optional precision specifier -
1394 default is 1 byte (2 hex chars). %10.4M prints 4 bytes
1395 of the memory as 8 hexadecimal characters in a 10-charac‐
1396 ter-wide field. %.*M hex-dumps a given number of bytes
1397 from a given buffer.
1398
1399 %o Unsigned octal.
1400
1401 %p Unsigned pointer address.
1402
1403 %s String.
1404
1405 %u Unsigned decimal.
1406
1407 %x Unsigned hex value, in all lower-case.
1408
1409 %X Unsigned hex value, in all upper-case.
1410
1411 %% Writes a %.
1412
1413 The # flag selects the alternate forms. For octal, this prefixes a 0.
1414 For hex, this prefixes 0x or 0X, depending on case. For characters,
1415 this escapes non-printing values with either C-like escapes or raw oc‐
1416 tal. In the case of %#m/%#M, this safely accesses user space memory
1417 rather than kernel space memory.
1418
1419 Examples:
1420
1421 a = "alice", b = "bob", p = 0x1234abcd, i = 123, j = -1, id[a] = 1234, id[b] = 4567
1422 print("hello")
1423 Prints: hello
1424 println(b)
1425 Prints: bob\n
1426 println(a . " is " . sprint(16))
1427 Prints: alice is 16
1428 foreach (name in id) printdln("|", strlen(name), name, id[name])
1429 Prints: 5|alice|1234\n3|bob|4567
1430 printf("%c is %s; %x or %X or %p; %d or %u\n",97,a,p,p,p,j,j)
1431 Prints: a is alice; 1234abcd or 1234ABCD or 0x1234abcd; -1 or 18446744073709551615\n
1432 printf("2 bytes of kernel buffer at address %p: %2m", p, p)
1433 Prints: 2 bytes of kernel buffer at address 0x1234abcd: <binary data>
1434 printf("%4b", p)
1435 Prints (these values as binary data): 0x1234abcd
1436 printf("%#o %#x %#X\n", 1, 2, 3)
1437 Prints: 01 0x2 0X3
1438 printf("%#c %#c %#c\n", 0, 9, 42)
1439 Prints: \000 \t *
1440
1441
1442
1443 STATISTICS
1444 It is often desirable to collect statistics in a way that avoids the
1445 penalties of repeatedly taking exclusive locks on the global variables
1446 those numbers are being put into. Systemtap provides a solution using a
1447 special operator to accumulate values, and several pseudo-functions to
1448 extract the statistical aggregates.
1449
1450 The aggregation operator is <<<, and resembles an assignment, or a C++
1451 output-streaming operation. The left operand specifies a scalar or ar‐
1452 ray-index lvalue, which must be declared global. The right operand is
1453 a numeric expression. The meaning is intuitive: add the given number
1454 to the pile of numbers to compute statistics of. (The specific list of
1455 statistics to gather is given separately, by the extraction functions.)
1456
1457 foo <<< 1
1458 stats[pid()] <<< memsize
1459
1460
1461 The extraction functions are also special. For each appearance of a
1462 distinct extraction function operating on a given identifier, the
1463 translator arranges to compute a set of statistics that satisfy it.
1464 The statistics system is thereby "on-demand". Each execution of an ex‐
1465 traction function causes the aggregation to be computed for that moment
1466 across all processors.
1467
1468 Here is the set of extractor functions. The first argument of each is
1469 the same style of lvalue used on the left hand side of the accumulate
1470 operation. The @count(v), @sum(v), @min(v), @max(v), @avg(v), @vari‐
1471 ance(v[, b]) extractor functions compute the number/total/minimum/maxi‐
1472 mum/average/variance of all accumulated values. The resulting values
1473 are all simple integers. Arrays containing aggregates may be sorted
1474 and iterated. See the foreach construct above.
1475
1476 Variance uses Welford's online algorithm. The calculations are based
1477 on integer arithmetic, and so may suffer from low precision and over‐
1478 flow. To improve this, @variance(v[, b]) accepts an optional parameter
1479 b, the bit-shift, ranging from 0 (default) to 62, for internal scaling.
1480 Only one value of bit-shift may be used with a given global variable. A
1481 larger bitshift value increases precision, but increases the likelihood
1482 of overflow.
1483
1484
1485 $ stap -e \
1486 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x)) }'
1487 12
1488 $ stap -e \
1489 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x,1)) }'
1490 2
1491 $ python3 -c 'import statistics; print(statistics.variance([1, 2, 3, 4, 5]))'
1492 2.5
1493 $
1494
1495
1496 Overflow (from internal multiplication of large numbers) may occur and
1497 may cause a negative variance result. Consider normalizing your input
1498 data. Adding or subtracting a fixed value from all variance inputs
1499 preserves the original variance. Dividing the variance inputs by a
1500 fixed value shrinks the original variance by that value squared.
1501
1502
1503
1504 Histograms are also available, but are more complicated because they
1505 have a vector rather than scalar value. @hist_linear(v,start,stop,in‐
1506 terval) represents a linear histogram from "start" to "stop" (inclu‐
1507 sive) by increments of "interval". The interval must be positive. Sim‐
1508 ilarly, @hist_log(v) represents a base-2 logarithmic histogram. Print‐
1509 ing a histogram with the print family of functions renders a histogram
1510 object as a tabular "ASCII art" bar chart.
1511
1512
1513 probe timer.profile {
1514 x[1] <<< pid()
1515 x[2] <<< uid()
1516 y <<< tid()
1517 }
1518 global x // an array containing aggregates
1519 global y // a scalar
1520 probe end {
1521 foreach ([i] in x @count+) {
1522 printf ("x[%d]: avg %d = sum %d / count %d\n",
1523 i, @avg(x[i]), @sum(x[i]), @count(x[i]))
1524 println (@hist_log(x[i]))
1525 }
1526 println ("y:")
1527 println (@hist_log(y))
1528 }
1529
1530
1531 The counts of each histogram bucket may be individually accessed via
1532 the [index] operator. Each bucket is addressed from 1 through N (for
1533 each natural bucket). In addition bucket #0 counts all the samples be‐
1534 neath the start value, and bucket #N+1 counts all the samples above the
1535 stop value. Histogram buckets (including the two out-of-range buckets)
1536 may also be iterated with foreach.
1537
1538
    global x
    probe oneshot {
      x <<< -100
      x <<< 1
      x <<< 2
      x <<< 3
      x <<< 100
      foreach (bucket in @hist_linear(x,1,3,1))
        // expecting 1 out-of-range-low bucket
        //           3 payload buckets
        //           1 out-of-range-high bucket
        printf("bucket %d count %d\n",
               bucket, @hist_linear(x,1,3,1)[bucket])
    }
1553
1554
1555
1556 TYPECASTING
1557 Once a pointer (see the CONTEXT VARIABLES section of stapprobes(3stap))
1558 has been saved into a script integer variable, the translator loses the
1559 type information necessary to access members from that pointer. Using
1560 the @cast() operator tells the translator how to interpret the number
1561 as a typed pointer.
1562
    @cast(p, "type_name"[, "module"])->member
1564
1565
This will interpret p as a pointer to a struct/union named type_name
and dereference the member value. Further ->subfield expressions may
be appended to dereference more levels. Note that for direct
dereferencing of a pointer, {kernel,user}_{char,int,...}($p) should be
used. (Refer to stapfuncs(5) for more details.) NOTE: the same
dereferencing operator -> is used to refer to both direct containment
and pointer indirection. Systemtap automatically determines which. The
optional module tells the translator where to look for information about
that type. Multiple modules may be specified as a list with : separators.
If the module is not specified, it will default either to the probe
module for dwarf probes, or to "kernel" for functions and all other
probe types.
1578
1579 The translator can create its own module with type information from a
1580 header surrounded by angle brackets, in case normal debuginfo is not
1581 available. For kernel headers, prefix it with "kernel" to use the ap‐
1582 propriate build system. All other headers are built with default GCC
1583 parameters into a user module. Multiple headers may be specified in
1584 sequence to resolve a codependency.
1585
    @cast(tv, "timeval", "<sys/time.h>")->tv_sec
    @cast(task, "task_struct", "kernel<linux/sched.h>")->tgid
    @cast(task, "task_struct",
          "kernel<linux/sched.h><linux/fs_struct.h>")->fs->umask
1590
1591 Values acquired by @cast may be pretty-printed by the $ and $$ suffix
1592 operators, the same way as described in the CONTEXT VARIABLES section
1593 of the stapprobes(3stap) manual page.
1594
1595
1596 When in guru mode, the translator will also allow scripts to assign new
1597 values to members of typecasted pointers.
1598
1599 Typecasting is also useful in the case of void* members whose type may
1600 be determinable at runtime.
1601
    probe foo {
      if ($var->type == 1) {
        value = @cast($var->data, "type1")->bar
      } else {
        value = @cast($var->data, "type2")->baz
      }
      print(value)
    }
1610
1611
1612
1613 EMBEDDED C
1614 When in guru mode, the translator accepts embedded C code in the top
1615 level of the script. Such code is enclosed between %{ and %} markers,
1616 and is transcribed verbatim, without analysis, in some sequence, into
1617 the top level of the generated C code. At the outermost level, this
1618 may be useful to add #include instructions, and any auxiliary defini‐
1619 tions for use by other embedded code.
1620
1621 Another place where embedded code is permitted is as a function body.
1622 In this case, the script language body is replaced entirely by a piece
1623 of C code enclosed again between %{ and %} markers. This C code may do
1624 anything reasonable and safe. There are a number of undocumented but
1625 complex safety constraints on atomicity, concurrency, resource consump‐
1626 tion, and run time limits, so this is an advanced technique.
1627
1628 The memory locations set aside for input and output values are made
1629 available to it using macros STAP_ARG_* and STAP_RETVALUE. Errors may
1630 be signalled with STAP_ERROR. Output may be written with STAP_PRINTF.
1631 The function may return early with STAP_RETURN. Here are some exam‐
1632 ples:
1633
    function integer_ops (val) %{
      STAP_PRINTF("%d\n", STAP_ARG_val);
      STAP_RETVALUE = STAP_ARG_val + 1;
      if (STAP_RETVALUE == 4)
          STAP_ERROR("wrong guess: %d", (int) STAP_RETVALUE);
      if (STAP_RETVALUE == 3)
          STAP_RETURN(0);
      STAP_RETVALUE ++;
    %}
    function string_ops (val) %{
      strlcpy (STAP_RETVALUE, STAP_ARG_val, MAXSTRINGLEN);
      strlcat (STAP_RETVALUE, "one", MAXSTRINGLEN);
      if (strcmp (STAP_RETVALUE, "three-two-one"))
          STAP_RETURN("parameter should be three-two-");
    %}
    function no_ops () %{
        STAP_RETURN(); /* function inferred with no return value */
    %}
1652
1653 The function argument and return value types have to be inferred by the
1654 translator from the call sites in order for this to work. The user
1655 should examine C code generated for ordinary script-language functions
1656 in order to write compatible embedded-C ones.
1657
1658 The last place where embedded code is permitted is as an expression
1659 rvalue. In this case, the C code enclosed between %{ and %} markers is
1660 interpreted as an ordinary expression value. It is assumed to be a
1661 normal 64-bit signed number, unless the marker /* string */ is includ‐
1662 ed, in which case it's treated as a string.
1663
    function add_one (val) {
      return val + %{ 1 %}
    }
    function add_string_two (val) {
      return val . %{ /* string */ "two" %}
    }
1670
1671
1672 The embedded-C code may contain markers to assert optimization and
1673 safety properties.
1674
1675 /* pure */
1676 means that the C code has no side effects and may be elided en‐
1677 tirely if its value is not used by script code.
1678
1679 /* stable */
1680 means that the C code always has the same value (in any given
1681 probe handler invocation), so repeated calls may be automatical‐
1682 ly replaced by memoized values. Such functions must take no pa‐
1683 rameters, and also be pure.
1684
1685 /* unprivileged */
1686 means that the C code is so safe that even unprivileged users
1687 are permitted to use it.
1688
1689 /* myproc-unprivileged */
1690 means that the C code is so safe that even unprivileged users
1691 are permitted to use it, provided that the target of the current
1692 probe is within the user's own process.
1693
1694 /* guru */
1695 means that the C code is so unsafe that a systemtap user must
1696 specify -g (guru mode) to use this. (Tapsets are permitted and
1697 presumed to call them safely.)
1698
1699 /* unmangled */
1700 in an embedded-C function, means that the legacy (pre-1.8) argu‐
1701 ment access syntax should be made available inside the function.
1702 Hence, in addition to STAP_ARG_foo and STAP_RETVALUE one can use
1703 THIS->foo and THIS->__retvalue respectively inside the function.
1704 This is useful for quickly migrating code written for SystemTap
1705 version 1.7 and earlier.
1706
1707 /* unmodified-fnargs */
1708 in an embedded-C function, means that the function arguments are
1709 not modified inside the function body.
1710
1711 /* string */
1712 in embedded-C expressions only, means that the expression has
1713 const char * type and should be treated as a string value, in‐
1714 stead of the default long numeric.
1715
Script level global variables may be accessed in embedded-C functions
and blocks. To read or write the global variable var, the
/* pragma:read:var */ or /* pragma:write:var */ marker must first be
placed in the embedded-C function or block. This provides the
STAP_GLOBAL_GET_* and STAP_GLOBAL_SET_* macros to allow reading and
writing, respectively. For example:
1722
    global var
    global var2[100]
    function increment() %{
        /* pragma:read:var */ /* pragma:write:var */
        /* pragma:read:var2 */ /* pragma:write:var2 */
        STAP_GLOBAL_SET_var(STAP_GLOBAL_GET_var()+1); //var++
        STAP_GLOBAL_SET_var2(1, 1, STAP_GLOBAL_GET_var2(1, 1)+1); //var2[1,1]++
    %}
1731
1732 Variables may be read and set in both embedded-C functions and expres‐
1733 sions. Strings returned from embedded-C code are decayed to pointers.
1734 Variables must also be assigned at script level to allow for type in‐
1735 ference. Map assignment does not return the value written, so chaining
1736 does not work.
1737
1738
1739 BUILT-INS
A set of builtin probe point aliases is provided by the scripts
installed in the directory specified in the stappaths(7) manual page.
The functions are described in the stapprobes(3stap) manual page.
1743
1744
1745 DEREFERENCING
Integers can be dereferenced from pointers saved as script integer
variables using the @kderef() or @uderef() operators. @kderef() is
used for kernel space addresses and @uderef() is used for user space
addresses.

    @kderef(SIZE, addr)
    @uderef(SIZE, addr)

This will interpret addr as a kernel/user address and read SIZE bytes
starting at that address. SIZE should be either 1, 2, 4 or 8 bytes.
1756
1757
1758 REGISTERS
1759 The value stored within a register can be accessed using the @kregis‐
1760 ter() or @uregister() operators. @kregister() is used for kernel space
1761 registers and @uregister() is used for user space registers. The regis‐
1762 ter of interest is specified using its DWARF number.
1763
    @kregister(0)
    @uregister(5)
1766
1767
1769 The translator begins pass 1 by parsing the given input script, and all
1770 scripts (files named *.stp) found in a tapset directory. The
1771 directories listed with -I are processed in sequence, each processed in
1772 "guru mode". For each directory, a number of subdirectories are also
1773 searched. These subdirectories are derived from the selected kernel
1774 version (the -R option), in order to allow more kernel-version-specific
1775 scripts to override less specific ones. For example, for a kernel
1776 version 2.6.12-23.FC3 the following patterns would be searched, in
1777 sequence: 2.6.12-23.FC3/*.stp, 2.6.12/*.stp, 2.6/*.stp, and finally
1778 *.stp. Stopping the translator after pass 1 causes it to print the
1779 parse trees.
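
The search order in the example above can be reproduced with a few lines
of shell. This is only a sketch of the pattern list given in the text for
release 2.6.12-23.FC3, not the translator's actual algorithm, and the
function name tapset_patterns is hypothetical. It assumes a release string
of the form MAJOR.MINOR.PATCH-EXTRA.

```shell
# Print the tapset search patterns for a kernel release,
# most specific first, following the example in the text.
tapset_patterns() {
    release=$1
    base=${release%%-*}        # 2.6.12-23.FC3 -> 2.6.12
    major_minor=${base%.*}     # 2.6.12        -> 2.6
    printf '%s\n' "$release/*.stp" "$base/*.stp" "$major_minor/*.stp" "*.stp"
}

tapset_patterns 2.6.12-23.FC3
# prints:
#   2.6.12-23.FC3/*.stp
#   2.6.12/*.stp
#   2.6/*.stp
#   *.stp
```

Because earlier patterns are more specific, a script placed under
2.6.12/ is found before one with the same definitions under 2.6/.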
1780
1781
1782 In pass 2, the translator analyzes the input script to resolve symbols
1783 and types. References to variables, functions, and probe aliases that
1784 are unresolved internally are satisfied by searching through the parsed
1785 tapset script files. If any tapset script file is selected because it
1786 defines an unresolved symbol, then the entirety of that file is added
1787 to the translator's resolution queue. This process iterates until all
1788 symbols are resolved and a subset of tapset script files is selected.
1789
1790 Next, all probe point descriptions are validated against the wide
1791 variety supported by the translator. Probe points that refer to code
1792 locations ("synchronous probe points") require the appropriate kernel
1793 debugging information to be installed. In the associated probe
1794 handlers, target-side variables (whose names begin with "$") are found
1795 and have their run-time locations decoded.
1796
1797 Next, all probes and functions are analyzed for optimization
1798 opportunities, in order to remove variables, expressions, and functions
1799 that have no useful value and no side-effect. Embedded-C functions are
1800 assumed to have side-effects unless they include the magic string
1801 /* pure */. Since this optimization can hide latent code errors such
1802 as type mismatches or invalid $context variables, it sometimes may be
1803 useful to disable the optimizations with the -u option.
1804
1805 Finally, all variable, function, parameter, array, and index types are
1806 inferred from context (literals and operators). Stopping the
1807 translator after pass 2 causes it to list all the probes, functions,
1808 and variables, along with all inferred types. Any inconsistent or
1809 unresolved types cause an error.
1810
1811
1812 In pass 3, the translator writes C code that represents the actions of
1813 all selected script files, and creates a Makefile to build that into a
1814 kernel object. These files are placed into a temporary directory.
1815 Stopping the translator at this point causes it to print the contents
1816 of the C file.
1817
1818
1819 In pass 4, the translator invokes the Linux kernel build system to
1820 create the actual kernel object file. This involves running make in
1821 the temporary directory, and requires a kernel module build system
1822 (headers, config and Makefiles) to be installed in the usual spot
1823 /lib/modules/VERSION/build. Stopping the translator after pass 4 is
1824 the last chance before running the kernel object. This may be useful
1825 if you want to archive the file.
1826
1827
In pass 5, the translator invokes the systemtap auxiliary program
staprun for the given kernel object. This program arranges to load the
module, then communicates with it, copying trace data from the kernel
into temporary files, until the user sends an interrupt signal. Any
run-time error encountered by the probe handlers, such as running out
of memory, division by zero, or exceeding nesting or runtime limits,
results in a soft error indication. Soft errors in excess of MAXERRORS
block all subsequent probes (except error-handling probes) and
terminate the session. Finally, staprun unloads the module and cleans
up.
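
The intermediate results of the passes described above can be inspected
by stopping the translator early. The sketch below relies on the -p NUM
(stop after pass NUM) and -m NAME (module name) options of stap. The
wrapper stop_after_pass, the module name hello, and the output file
hello.c are hypothetical. Pass 4 additionally assumes that a kernel
module build environment is installed.

```shell
script='probe begin { println("hello") exit() }'

# Run the translator up to and including the given pass, then stop.
stop_after_pass() {
    pass=$1
    shift
    stap -p "$pass" "$@"
}

# Only attempt the runs when stap is actually installed.
if command -v stap >/dev/null 2>&1; then
    stop_after_pass 2 -e "$script"             # probes, functions, inferred types
    stop_after_pass 3 -e "$script" > hello.c   # generated C code
    stop_after_pass 4 -m hello -e "$script"    # build the module without loading it
fi
```

Stopping after pass 4 is the natural point to archive a module, since
nothing has been loaded into the kernel yet.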
1838
1839
1840 ABNORMAL TERMINATION
1841 One should avoid killing the stap process forcibly, for example with
1842 SIGKILL, because the stapio process (a child process of the stap
1843 process) and the loaded module may be left running on the system. If
1844 this happens, send SIGTERM or SIGINT to any remaining stapio processes,
1845 then use rmmod to unload the systemtap module.
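
The recovery steps above might be scripted roughly as follows. This is a
dry-run sketch that only prints the commands it would run. It assumes
that leftover modules carry the default stap_ name prefix (a module named
with -m would not match) and that pkill is available; the helper name
list_stap_modules is hypothetical.

```shell
# Read "lsmod" output on stdin and print the names of modules
# that look systemtap-generated (assumed "stap_" prefix).
list_stap_modules() {
    awk 'NR > 1 && $1 ~ /^stap_/ { print $1 }'
}

# Dry run: show the cleanup commands instead of executing them.
if command -v lsmod >/dev/null 2>&1; then
    echo "pkill -TERM -x stapio"
    lsmod | list_stap_modules | while read -r mod; do
        echo "rmmod $mod"
    done
fi
```

Review the printed commands before running them as root, since rmmod on
the wrong module can disturb an unrelated systemtap session.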
1846
1847
1848
1850 See the stapex(3stap) manual page for a brief collection of samples, or
1851 a large set of installed samples under the systemtap
1852 documentation/testsuite directories. See stappaths(7stap) for the
1853 likely location of these on the system.
1854
1855
1857 The systemtap translator caches the pass 3 output (the generated C
1858 code) and the pass 4 output (the compiled kernel module) if pass 4
1859 completes successfully. This cached output is reused if the same
1860 script is translated again assuming the same conditions exist (same
1861 kernel version, same systemtap version, etc.). Cached files are stored
1862 in the $SYSTEMTAP_DIR/cache directory. The cache can be limited by
1863 having the file cache_mb_limit placed in the cache directory (shown
1864 above) containing only an ASCII integer representing how many MiB the
1865 cache should not exceed. In the absence of this file, a default will be
1866 created with the limit set to 256MiB. This is a 'soft' limit in that
1867 the cache will be cleaned after a new entry is added if the cache clean
1868 interval is exceeded, so the total cache size may temporarily exceed
1869 this limit. This interval can be specified by having the file
1870 cache_clean_interval_s placed in the cache directory (shown above)
1871 containing only an ASCII integer representing the interval in seconds.
1872 In the absence of this file, a default will be created with the
1873 interval set to 300 s.
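
As an illustration, both limit files can be written with a small shell
helper. This is a sketch: the function name set_cache_limits and the
values 512 and 600 are hypothetical, and the fallback to $HOME/.systemtap
assumes the usual default when SYSTEMTAP_DIR is unset.

```shell
# Write the cache size limit (MiB) and the clean interval (seconds)
# into the given cache directory, each as a bare ASCII integer.
set_cache_limits() {
    cache_dir=$1
    mb=$2
    secs=$3
    mkdir -p "$cache_dir" || return 1
    printf '%s' "$mb" > "$cache_dir/cache_mb_limit"
    printf '%s' "$secs" > "$cache_dir/cache_clean_interval_s"
}

# Example: 512 MiB limit, cleaned at most once every 600 seconds.
set_cache_limits "${SYSTEMTAP_DIR:-$HOME/.systemtap}/cache" 512 600
```

Since the limit is soft, the cache may still exceed 512 MiB between two
cleaning runs.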
1874
1875
1877 Systemtap may be used as a powerful administrative tool. It can expose
1878 kernel internal data structures and potentially private user
1879 information. (In dyninst runtime mode, this is not the case, see the
1880 ALTERNATE RUNTIMES section below.)
1881
The translator asserts many safety constraints during compilation and
more during run-time. It aims to ensure that no handler routine can
run for very long, allocate boundless memory, perform unsafe
operations, or unintentionally interfere with the system. Uses of
script global variables are automatically read/write locked as
appropriate, to protect against manipulation by concurrent probe
handlers. (Deadlocks are detected with timeouts. Use the -t flag to
receive reports of excessive lock contention.) Experimenting with
scripts is therefore generally safe. The guru-mode -g option allows
administrators to bypass most safety measures, which permits invasive
or state-changing operations and embedded-C code, and increases the
risk of upset. By default, overload prevention is turned on for all
modules. If you would like to disable overload processing, use the
--suppress-time-limits option.
1896
1897 Errors that are caught at run time normally result in a clean script
1898 shutdown and a pass-5 error message. The --suppress-handler-errors
1899 option lets scripts tolerate soft errors without shutting down.
1900
1901
1902
1903 PERMISSIONS
1904 For the normal linux-kernel-module runtime, to run the kernel objects
1905 systemtap builds, a user must be one of the following:
1906
1907 · the root user;
1908
1909 · a member of the stapdev and stapusr groups;
1910
1911 · a member of the stapsys and stapusr groups; or
1912
1913 · a member of the stapusr group.
1914
1915 The root user or a user who is a member of both the stapdev and stapusr
1916 groups can build and run any systemtap script.
1917
1918 A user who is a member of both the stapsys and stapusr groups can only
1919 use pre-built modules under the following conditions:
1920
1921 · The module has been signed by a trusted signer. Trusted signers are
1922 normally systemtap compile-servers which sign modules when the
1923 --privilege option is specified by the client. See the
1924 stap-server(8) manual page for more information.
1925
1926 · The module was built using the --privilege=stapsys or the
1927 --privilege=stapusr options.
1928
1929 Members of only the stapusr group can only use pre-built modules under
1930 the following conditions:
1931
1932 · The module is located in the /lib/modules/VERSION/systemtap
1933 directory. This directory must be owned by root and not be world
1934 writable.
1935
1936 or
1937
1938 · The module has been signed by a trusted signer. Trusted signers are
1939 normally systemtap compile-servers which sign modules when the
1940 --privilege option is specified by the client. See the
1941 stap-server(8) manual page for more information.
1942
1943 · The module was built using the --privilege=stapusr option.
1944
The kernel modules generated by the stap program are run by the staprun
program. The latter is a part of the Systemtap package, dedicated to
module loading and unloading (but only in the white zone), and
kernel-to-user data transfer. Since staprun does not perform any
additional security checks on the kernel objects it is given, it would
be unwise for a system administrator to add untrusted users to the
stapdev or stapusr groups.
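
The group rules in this section can be summarized as a small shell
check. This is a sketch of the rules as stated above, not of the checks
staprun itself performs; the function name stap_privilege and its output
labels are hypothetical.

```shell
# Classify a user according to the group rules described above.
# $1 = numeric uid, $2 = space-separated group names (as from "id -Gn").
stap_privilege() {
    uid=$1
    grps=" $2 "
    if [ "$uid" -eq 0 ]; then
        echo root                      # may build and run any script
        return
    fi
    case $grps in
        *" stapusr "*) ;;              # stapusr is required in every case
        *) echo none; return ;;
    esac
    case $grps in
        *" stapdev "*) echo stapdev ;; # may build and run any script
        *" stapsys "*) echo stapsys ;; # signed pre-built modules only
        *) echo stapusr ;;             # most restricted pre-built modules
    esac
}

stap_privilege "$(id -u)" "$(id -Gn)"
```

Note that membership in stapdev or stapsys alone yields none, because
both rules also require membership in stapusr.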
1952
1953
1954 SECUREBOOT
1955 If the current system has SecureBoot turned on in the UEFI firmware,
1956 all kernel modules must be signed. (Some kernels may allow disabling
1957 SecureBoot long after booting with a key sequence such as SysRq-X,
1958 making it unnecessary to sign modules.) The systemtap compile server
1959 can sign modules with a MOK (Machine Owner Key) that it has in common
1960 with a client system. See the following wiki page for more details:
1961
1962 https://sourceware.org/systemtap/wiki/SecureBoot
1963
Some kernels do not let systemtap guess whether module signing is in
effect. On such machines, set the SYSTEMTAP_SIGN environment variable
to any value while running stap.
1967
1968
1969 RESOURCE LIMITS
1970 Many resource use limits are set by macros in the generated C code.
1971 These may be overridden with -D flags. A selection of these is as fol‐
1972 lows:
1973
1974 MAXNESTING
1975 Maximum number of nested function calls. Default determined by
1976 script analysis, with a bonus 10 slots added for recursive
1977 scripts.
1978
1979 MAXSTRINGLEN
1980 Maximum length of strings, default 128.
1981
1982 MAXTRYLOCK
1983 Maximum number of iterations to wait for locks on global vari‐
1984 ables before declaring possible deadlock and skipping the probe,
1985 default 1000.
1986
1987 MAXACTION
1988 Maximum number of statements to execute during any single probe
1989 hit (with interrupts disabled), default 1000. Note that for
1990 straight-through probe handlers lacking loops or recursion, due
1991 to optimization, this parameter may be interpreted too conserva‐
1992 tively.
1993
1994 MAXACTION_INTERRUPTIBLE
1995 Maximum number of statements to execute during any single probe
1996 hit which is executed with interrupts enabled (such as begin/end
1997 probes), default (MAXACTION * 10).
1998
1999 MAXBACKTRACE
Maximum number of stack frames that will be processed by the
stap runtime unwinder as produced by the backtrace functions in
the [u]context-unwind.stp tapsets, default 20.
2003
2004 MAXMAPENTRIES
2005 Maximum number of rows in any single global array, default 2048.
2006 Individual arrays may be declared with a larger or smaller limit
2007 instead:
2008
    global big[10000],little[5]
2010
2011 or denoted with % to make them wrap-around (replace old entries)
2012 automatically, as in
2013
    global big%
2015
2016 or both.
2017
2018 MAPHASHBIAS
2019 The number of powers-of-two to add or subtract from the natural
2020 size of the hash table backing each global associative array.
2021 Default is 0. Try small positive numbers to get extra perfor‐
2022 mance at the cost of more memory consumption, because that
2023 should reduce hash table collisions. Try small negative numbers
2024 for the opposite tradeoff.
2025
2026 MAXERRORS
2027 Maximum number of soft errors before an exit is triggered, de‐
2028 fault 0, which means that the first error will exit the script.
2029 Note that with the --suppress-handler-errors option, this limit
2030 is not enforced.
2031
2032 MAXSKIPPED
2033 Maximum number of skipped probes before an exit is triggered,
2034 default 100. Running systemtap with -t (timing) mode gives more
2035 details about skipped probes. With the default -DINTERRUPT‐
2036 IBLE=1 setting, probes skipped due to reentrancy are not accumu‐
2037 lated against this limit. Note that with the --suppress-han‐
2038 dler-errors option, this limit is not enforced.
2039
2040 MINSTACKSPACE
2041 Minimum number of free kernel stack bytes required in order to
2042 run a probe handler, default 1024. This number should be large
2043 enough for the probe handler's own needs, plus a safety margin.
2044
2045 MAXUPROBES
2046 Maximum number of concurrently armed user-space probes (up‐
2047 robes), default somewhat larger than the number of user-space
2048 probe points named in the script. This pool needs to be poten‐
2049 tially large because individual uprobe objects (about 64 bytes
2050 each) are allocated for each process for each matching script-
2051 level probe.
2052
2053 STP_MAXMEMORY
2054 Maximum amount of memory (in kilobytes) that the systemtap mod‐
2055 ule should use, default unlimited. The memory size includes the
2056 size of the module itself, plus any additional allocations.
2057 This only tracks direct allocations by the systemtap runtime.
2058 This does not track indirect allocations (as done by kprobes/up‐
2059 robes/etc. internals).
2060
2061 STP_OVERLOAD_THRESHOLD, STP_OVERLOAD_INTERVAL
2062 Maximum number of machine cycles spent in probes on any cpu per
2063 given interval, before an overload condition is declared and the
2064 script shut down. The defaults are 500 million and 1 billion,
2065 so as to limit stap script cpu consumption at around 50%.
2066
2067 STP_PROCFS_BUFSIZE
2068 Size of procfs probe read buffers (in bytes). Defaults to
2069 MAXSTRINGLEN. This value can be overridden on a per-procfs file
2070 basis using the procfs read probe .maxsize(MAXSIZE) parameter.
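
As a worked example of overriding the limits listed above with -D, the
overload ceiling is the ratio of STP_OVERLOAD_THRESHOLD to
STP_OVERLOAD_INTERVAL, which is 500 million / 1 billion = 50% by default.
The sketch below derives a threshold for a lower ceiling. The 25% target
and the file name script.stp are hypothetical.

```shell
interval=1000000000    # default STP_OVERLOAD_INTERVAL, in machine cycles
percent=25             # desired cpu ceiling (hypothetical target)

# threshold / interval = percent / 100
threshold=$(( interval / 100 * percent ))

echo "stap -DSTP_OVERLOAD_THRESHOLD=$threshold -DSTP_OVERLOAD_INTERVAL=$interval script.stp"
# prints: stap -DSTP_OVERLOAD_THRESHOLD=250000000 -DSTP_OVERLOAD_INTERVAL=1000000000 script.stp
```

The command is only printed, not executed, so the values can be checked
before use.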
2071
2072 With scripts that contain probes on any interrupt path, it is possible
2073 that those interrupts may occur in the middle of another probe handler.
2074 The probe in the interrupt handler would be skipped in this case to
2075 avoid reentrance. To work around this issue, execute stap with the op‐
2076 tion -DINTERRUPTIBLE=0 to mask interrupts throughout the probe handler.
2077 This does add some extra overhead to the probes, but it may prevent
2078 reentrance for common problem cases. However, probes in NMI handlers
2079 and in the callpath of the stap runtime may still be skipped due to
2080 reentrance.
2081
2082
2083 In case something goes wrong with stap or staprun after a probe has al‐
2084 ready started running, one may safely kill both user processes, and re‐
2085 move the active probe kernel module with rmmod. Any pending trace mes‐
2086 sages may be lost.
2087
2088
Systemtap exposes kernel internal data structures and potentially
private user information. Because of this, use of systemtap's full
capabilities is restricted to root and to users who are members of the
groups stapdev and stapusr.
2094
2095 However, a restricted set of systemtap's features can be made available
2096 to trusted, unprivileged users. These users are members of the group
2097 stapusr only, or members of the groups stapusr and stapsys. These
2098 users can load systemtap modules which have been compiled and certified
2099 by a trusted systemtap compile-server. See the descriptions of the op‐
2100 tions --privilege and --use-server. See README.unprivileged in the sys‐
2101 temtap source code for information about setting up a trusted compile
2102 server.
2103
2104 The restrictions enforced when --privilege=stapsys is specified are de‐
2105 signed to prevent unprivileged users from:
2106
2107 · harming the system maliciously.
2108
2109 The restrictions enforced when --privilege=stapusr is specified are de‐
2110 signed to prevent unprivileged users from:
2111
2112 · harming the system maliciously.
2113
2114 · gaining access to information which would not normally be
2115 available to an unprivileged user.
2116
2117 · disrupting the performance of processes owned by other users
2118 of the system. Some overhead to the system in general is
2119 unavoidable since the unprivileged user's probes will be
2120 triggered at the appropriate times. What we would like to
2121 avoid is targeted interruption of another user's processes
2122 which would not normally be possible by an unprivileged us‐
2123 er.
2124
2125
2126 PROBE RESTRICTIONS
2127 A member of the groups stapusr and stapsys may use all probe points.
2128
2129 A member of only the group stapusr may use only the following probes:
2130
2131 · begin, begin(n)
2132
2133 · end, end(n)
2134
2135 · error(n)
2136
2137 · never
2138
2139 · process.*, where the target process is owned by the user.
2140
2141 · timer.{jiffies,s,sec,ms,msec,us,usec,ns,nsec}(n)*
2142
2143 · timer.hz(n)
2144
2145
2146 SCRIPT LANGUAGE RESTRICTIONS
2147 The following scripting language features are unavailable to all un‐
2148 privileged users:
2149
2150
2151 · any feature enabled by the Guru Mode (-g) option.
2152
2153 · embedded C code.
2154
2155
2156 RUNTIME RESTRICTIONS
2157 The following runtime restrictions are placed upon all unprivileged
2158 users:
2159
2160 · Only the default runtime code (see -R) may be used.
2161
2162 Additional restrictions are placed on members of only the group sta‐
2163 pusr:
2164
2165 · Probing of processes owned by other users is not permitted.
2166
2167 · Access of kernel memory (read and write) is not permitted.
2168
2169
2170 COMMAND LINE OPTION RESTRICTIONS
2171 Some command line options provide access to features which must not be
2172 available to all unprivileged users:
2173
2174
2175 · -g may not be specified.
2176
2177 · The following options may not be used by the compile-server
2178 client:
2179
2180 -a, -B, -D, -I, -r, -R
2181
2182
2183
2184 ENVIRONMENT RESTRICTIONS
2185 The following environment variables must not be set for all unprivi‐
2186 leged users:
2187
2188 SYSTEMTAP_RUNTIME
2189 SYSTEMTAP_TAPSET
2190 SYSTEMTAP_DEBUGINFO_PATH
2191
2192
2193
2194 TAPSET RESTRICTIONS
2195 In general, tapset functions are only available for members of the
2196 group stapusr when they do not gather information that an ordinary pro‐
2197 gram running with that user's privileges would be denied access to.
2198
2199 There are two categories of unprivileged tapset functions. The first
2200 category consists of utility functions that are unconditionally avail‐
2201 able to all users; these include such things as:
2202
2203 cpu:long ()
2204 exit ()
2205 str_replace:string (prnt_str:string, srch_str:string, rplc_str:string)
2206
2207
2208 The second category consists of so-called myproc-unprivileged functions
2209 that can only gather information within their own processes. Scripts
2210 that wish to use these functions must test the result of the tapset
2211 function is_myproc and only call these functions if the result is 1.
2212 The script will exit immediately if any of these functions are called
2213 by an unprivileged user within a probe within a process which is not
2214 owned by that user. Examples of myproc-unprivileged functions include:
2215
2216 print_usyms (stk:string)
2217 user_int:long (addr:long)
2218 usymname:string (addr:long)
2219
2220
2221 A compile error is triggered when any function not in either of the
2222 above categories is used by members of only the group stapusr.
2223
2224 No other built-in tapset functions may be used by members of only the
2225 group stapusr.
2226
2227
2229 As described above, systemtap's default runtime mode involves building
2230 and loading kernel modules, with various security tradeoffs presented.
2231 Systemtap now includes two new prototype backends: --runtime=dyninst
2232 and --runtime=bpf.
2233
--runtime=dyninst uses Dyninst to instrument a user's own processes at
runtime. This backend does not use kernel modules, and does not require
root privileges, but is restricted with respect to the kinds of probes
and other constructs that a script may use. The dyninst runtime operates
in target-attach mode, so it does require a -c COMMAND or -x PID process.
For example:
2240
    stap --runtime=dyninst -c 'stap -V' \
         -e 'probe process.function("main")
             { println("hi from dyninst!") }'
2244
2245
2246 It may be necessary to disable a conflicting selinux check with
2247
    # setsebool allow_execstack 1
2249
2250
2251 --runtime=bpf compiles the user script into extended Berkeley Packet
2252 Filter (eBPF) programs instead of a kernel module. eBPF programs are
2253 verified by the kernel for safety and are executed by an in-kernel vir‐
2254 tual machine. This runtime is in an early stage of development and
2255 currently lacks support for a number of features available in the de‐
2256 fault runtime. Please see the stapbpf(8) man page for more information.
2257
2258
2260 The systemtap translator generally returns with a success code of 0 if
2261 the requested script was processed and executed successfully through
2262 the requested pass. Otherwise, errors may be printed to stderr and a
2263 failure code is returned. Use -v or -vp N to increase (global or per-
2264 pass) verbosity to identify the source of the trouble.
2265
2266 In listings mode (-l and -L), error messages are normally suppressed.
2267 A success code of 0 is returned if at least one matching probe was
2268 found.
2269
2270 A script executing in pass 5 that is interrupted with ^C / SIGINT is
2271 considered to be successful.
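
The exit status rules above make stap usable in shell conditionals. The
following sketch uses listing mode to test whether a probe point can be
resolved. The function name probe_exists is hypothetical, and the probe
pattern is only an example that assumes kernel debugging information is
installed.

```shell
# Succeeds when listing mode finds at least one matching probe point.
probe_exists() {
    stap -l "$1" >/dev/null 2>&1
}

# Only attempt the check when stap is actually installed.
if command -v stap >/dev/null 2>&1; then
    if probe_exists 'kernel.function("vfs_read")'; then
        echo "probe point available"
    else
        echo "no matching probe point"
    fi
fi
```

Because error messages are normally suppressed in listing mode, the exit
status is the only reliable signal here.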
2272
2273
Over time, some features of the script language and the tapset library
may undergo incompatible changes, so that a script written against an
old version of systemtap may no longer run. In these cases, it may
help to run systemtap with the --compatible VERSION flag, specifying
the last known working version. Running systemtap with the
--check-version flag will output a warning if any possibly incompatible
elements have been parsed. Historical details of deprecations may be
found in the NEWS file.
2283
The purpose of the deprecation facility is to improve the experience of
scripts written for newer versions of systemtap (by adding better
alternatives and removing conflicting or messy older alternatives), while
at the same time permitting scripts written for older versions of
systemtap to continue running. Deprecation is thus intended as a service
to users (and an inconvenience to systemtap's developers), rather than
the other way around.
2291
2292 Please note that underscore-prefixed identifiers in the tapset some‐
2293 times undergo such changes that are difficult to preserve compatibility
2294 for, even with the deprecation mechanisms. Avoid relying on these in
2295 your scripts; instead propose them for promotion to non-underscored
2296 status.
2297
2298
2299
Important files and their corresponding paths can be located in the
stappaths(7) manual page.
2303
2304
stapprobes(3stap),
function::*(3stap),
probe::*(3stap),
tapset::*(3stap),
stappaths(7),
staprun(8),
stapdyn(8),
systemtap(8),
stapvars(3stap),
stapex(3stap),
stap-server(8),
stap-prep(1),
stapref(1),
awk(1),
gdb(1)
2321
2322
2324 Use the Bugzilla link of the project web page or our mailing list.
2325 http://sourceware.org/systemtap/, <systemtap@sourceware.org>.
2326
2327 error::reporting(7stap),
2328 https://sourceware.org/systemtap/wiki/HowToReportBugs
2329
2330
2331
2332 STAP(1)