STAP(1) General Commands Manual STAP(1)

NAME
stap - systemtap script translator/driver

SYNOPSIS
stap [ OPTIONS ] FILENAME [ ARGUMENTS ]
stap [ OPTIONS ] - [ ARGUMENTS ]
stap [ OPTIONS ] -e SCRIPT [ ARGUMENTS ]
stap [ OPTIONS ] -l PROBE [ ARGUMENTS ]
stap [ OPTIONS ] -L PROBE [ ARGUMENTS ]
stap [ OPTIONS ] --dump-probe-types
stap [ OPTIONS ] --dump-probe-aliases
stap [ OPTIONS ] --dump-functions

DESCRIPTION
The stap program is the front-end to the Systemtap tool. It accepts
probing instructions written in a simple domain-specific language,
translates those instructions into C code, compiles this C code, and
loads the resulting module into a running Linux kernel or a Dyninst
user-space mutator, to perform the requested system trace/probe
functions. You can supply the script in a named file (FILENAME), from
standard input (use - instead of FILENAME), or from the command line
(using -e SCRIPT). The program runs until it is interrupted by the
user, the script voluntarily invokes the exit() function, or a
sufficient number of soft errors occurs.

The language, which is described in the SCRIPT LANGUAGE section below,
is strictly typed, expressive, declaration free, procedural,
prototyping-friendly, and inspired by awk and C. It allows source code
points or events in the system to be associated with handlers, which
are subroutines that are executed synchronously. It is somewhat similar
conceptually to "breakpoint command lists" in the gdb debugger.
41
42
DOCUMENTATION
systemtap comes with a variety of educational, documentation and
reference resources, available online and/or packaged for offline use.
Some systemtap diagnostic warning/error messages specifically suggest
reading a man page by including a string like [man error::pass5]. For
online documentation, see the project web site,
https://sourceware.org/systemtap/
50
51
52 ┌──────────────────────────┬──────────────────────────────────────────────────────┐
53 │man pages │ │
54 ├──────────────────────────┼──────────────────────────────────────────────────────┤
55 │stap (this page) │ language syntax, concepts, operation, options │
56 ├──────────────────────────┼──────────────────────────────────────────────────────┤
57 │error::* │ further explanation of error conditions │
58 ├──────────────────────────┼──────────────────────────────────────────────────────┤
59 │warning::* │ further explanation of warning conditions │
60 ├──────────────────────────┼──────────────────────────────────────────────────────┤
61 │stapprobes │ probe points and their $context variables │
62 ├──────────────────────────┼──────────────────────────────────────────────────────┤
63 │stapref │ quick reference to language syntax │
64 ├──────────────────────────┼──────────────────────────────────────────────────────┤
65 │stappaths │ list of directories, including books & references │
66 ├──────────────────────────┼──────────────────────────────────────────────────────┤
67 │stap-prep │ program to install auxiliary dependencies like ker‐ │
68 │ │ nel debuginfo │
69 ├──────────────────────────┼──────────────────────────────────────────────────────┤
70 │tapset::* │ generated list of tapsets │
71 ├──────────────────────────┼──────────────────────────────────────────────────────┤
72 │probe::* │ generated list of tapset probe aliases │
73 ├──────────────────────────┼──────────────────────────────────────────────────────┤
74 │function::* │ generated list of tapset functions │
75 ├──────────────────────────┼──────────────────────────────────────────────────────┤
76 │macro::* │ generated list of tapset macros │
77 ├──────────────────────────┼──────────────────────────────────────────────────────┤
78 │stapvars │ some of the tapset global variables │
79 ├──────────────────────────┼──────────────────────────────────────────────────────┤
80 │staprun, stapdyn, stapbpf │ programs for executing compiled systemtap scripts │
81 ├──────────────────────────┼──────────────────────────────────────────────────────┤
82 │systemtap │ initscript, boot-time probing │
83 ├──────────────────────────┼──────────────────────────────────────────────────────┤
84 │stap-server │ compilation server │
85 ├──────────────────────────┼──────────────────────────────────────────────────────┤
86 │stapex │ a few very basic script examples │
87 ├──────────────────────────┼──────────────────────────────────────────────────────┤
88 │books │ │
89 ├──────────────────────────┼──────────────────────────────────────────────────────┤
90 │Beginner's Guide │ tutorial book, language essentials, examples │
91 ├──────────────────────────┼──────────────────────────────────────────────────────┤
92 │Tutorial │ shorter tutorial, exercises │
93 ├──────────────────────────┼──────────────────────────────────────────────────────┤
94 │Language Reference │ detailed language manual, covers statistics/analysis │
95 ├──────────────────────────┼──────────────────────────────────────────────────────┤
96 │Tapset Reference │ the tapset man pages, reformatted into a book │
97 ├──────────────────────────┼──────────────────────────────────────────────────────┤
98 │references │ │
99 ├──────────────────────────┼──────────────────────────────────────────────────────┤
100 │example scripts │ over a hundred directly usable sysadmin tools, toys, │
101 │ │ hacks to learn from │
102 └──────────────────────────┴──────────────────────────────────────────────────────┘
103
OPTIONS
The systemtap translator supports the following options. Any other
option prints a list of supported options. Options may be given on the
command line, as usual. If the file $SYSTEMTAP_DIR/rc exists, options
are also loaded from there and interpreted first. ($SYSTEMTAP_DIR
defaults to $HOME/.systemtap if unset.)
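
To see which rc file would be consulted on a given account, the lookup rule above can be reproduced in a few lines of shell. This is only a sketch of the documented rule, not stap's own code; the function name stap_rc_path is hypothetical.

```shell
# Print the rc file path described above: $SYSTEMTAP_DIR/rc, where
# SYSTEMTAP_DIR falls back to $HOME/.systemtap only when it is unset.
stap_rc_path() {
    printf '%s\n' "${SYSTEMTAP_DIR-$HOME/.systemtap}/rc"
}

rc=$(stap_rc_path)
if [ -f "$rc" ]; then
    echo "stap would read extra options from $rc:"
    cat "$rc"
else
    echo "no rc file at $rc; only command-line options apply"
fi
```

Because rc options are interpreted first, checking this file is a reasonable first step when stap behaves as if an option had been given that is not on the command line.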
110
111
112 In some cases, the default value of an option depends on particular
113 system configuration and thus can't be mentioned here directly. In
114 some of those cases running "stap --help" might display the default.
115
116
117 - Use standard input instead of a given FILENAME as probe language
118 input, unless -e SCRIPT is given.
119
120 -h --help
121 Show help message.
122
123 -V --version
124 Show version message.
125
126 -p NUM Stop after pass NUM. The passes are numbered 1-5: parse, elabo‐
127 rate, translate, compile, run. See the PROCESSING section for
128 details.
129
-v Increase verbosity for all passes. Produce a larger volume of
informative (?) output each time the option is repeated.
132
133 --vp ABCDE
134 Increase verbosity on a per-pass basis. For example, "--vp 002"
135 adds 2 units of verbosity to pass 3 only. The combination
136 "-v --vp 00004" adds 1 unit of verbosity for all passes, and 4
137 more for pass 5.
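
The way -v and --vp add up can be checked with a small calculation. The sketch below assumes, as the examples above suggest, that each character of the --vp string is a digit added to the verbosity of the corresponding pass and that missing trailing digits count as zero; the helper name per_pass_verbosity is hypothetical.

```shell
# per_pass_verbosity V_COUNT VP_STRING
# V_COUNT is how many times -v was given; VP_STRING is the --vp argument.
# Prints the resulting verbosity of passes 1-5, separated by spaces.
per_pass_verbosity() {
    v_count=$1
    vp=$2
    out=""
    pass=1
    while [ "$pass" -le 5 ]; do
        # Pick the digit for this pass; an absent digit counts as 0.
        digit=$(printf '%s' "$vp" | cut -c "$pass")
        [ -n "$digit" ] || digit=0
        out="$out${out:+ }$((v_count + digit))"
        pass=$((pass + 1))
    done
    printf '%s\n' "$out"
}

per_pass_verbosity 0 002      # "--vp 002"       prints: 0 0 2 0 0
per_pass_verbosity 1 00004    # "-v --vp 00004"  prints: 1 1 1 1 5
```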
138
139 -k Keep the temporary directory after all processing. This may be
140 useful in order to examine the generated C code, or to reuse the
141 compiled kernel object.
142
143 -g Guru mode. Enable parsing of unsafe expert-level constructs
144 like embedded C.
145
146 -P Prologue-searching mode. This is equivalent to --pro‐
147 logue-searching=always. Activate heuristics to work around in‐
148 correct debugging information for function parameter $context
149 variables.
150
151 -u Unoptimized mode. Disable unused code elision and many other
152 optimizations during elaboration / translation.
153
154 -w Suppressed warnings mode. Disables all warning messages.
155
156 -W Treat all warnings as errors.
157
158 -b Use bulk mode (percpu files) for kernel-to-user data transfer.
159 Use the stap-merge program to multiplex them back together lat‐
160 er.
161
162 -i --interactive
163 Interactive mode. Enable an interface to build the systemtap
164 script incrementally and interactively.
165
-t Collect timing information on the number of times each probe
executes and the average amount of time spent in each probe-point.
Also shows the derivation for each probe-point.
169
170 -s NUM Use NUM megabyte buffers for kernel-to-user data transfer per
171 processor. The default is 16MB, or less on smaller memory ma‐
172 chines.
173
-I DIR Add the given directory to the list of tapset search directories.
See the description of pass 2 for details.
176
177 -D NAME=VALUE
178 Add the given C preprocessor directive to the module Makefile.
179 These can be used to override limit parameters described below.
180
181 -B NAME=VALUE
182 In kernel-runtime mode, add the given make directive to the ker‐
183 nel module build's make invocation. These can be used to add or
184 override kconfig options. For example, use
185
186 -B CONFIG_DEBUG_INFO=y
187
188 to add debugging information.
189
190 -B FLAG
191 In dyninst-runtime mode, add the given parameter to the compiler
192 CFLAGS used for building the dyninst shared library. For exam‐
193 ple, use
194
195 -B -g
196
197 to add debugging information.
198
199 -a ARCH
200 Use a cross-compilation mode for the given target architecture.
201 This requires access to the cross-compiler and the kernel build
202 tree, and goes along with the
203
204 -B CROSS_COMPILE=arch-tool-prefix-
205 and
206 -r /build/tree
207
208 options.
209
210 --modinfo NAME=VALUE
211 Add the name/value pair as a MODULE_INFO macro call to the gen‐
212 erated module. This may be useful to inform or override various
213 module-related checks in the kernel.
214
215 -G NAME=VALUE
216 Sets the value of global variable NAME to VALUE when staprun is
217 invoked. This applies to scalar variables declared global in
218 the script/tapset.
219
220 -R DIR Look for the systemtap runtime sources in the given directory.
221 Your DIR default can be seen using "stap --help".
222
223 -r /DIR
224 Build for kernel in given build tree. Can also be set with the
225 SYSTEMTAP_RELEASE environment variable.
226
227 -r RELEASE
228 Build for kernel in build tree /lib/modules/RELEASE/build. Can
229 also be set with the SYSTEMTAP_RELEASE environment variable.
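
The two forms of -r differ only in whether the argument starts with a slash. A minimal sketch of that rule follows; the helper name kernel_build_dir is hypothetical, and the code only mirrors the mapping described above rather than anything stap exports.

```shell
# kernel_build_dir ARG
# Map a -r argument to the build tree it refers to:
#   /DIR     is used as given
#   RELEASE  means /lib/modules/RELEASE/build
kernel_build_dir() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s\n' "/lib/modules/$1/build" ;;
    esac
}

kernel_build_dir "$(uname -r)"    # build tree for the running kernel
kernel_build_dir /build/tree      # an explicit tree is passed through
```

Checking that the printed directory exists before invoking stap can save a failed run when the matching kernel development files are not installed.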
230
231 -m MODULE
232 Use the given name for the generated kernel object module, in‐
233 stead of a unique randomized name. The generated kernel object
234 module is copied to the current directory.
235
236 -d MODULE
237 Add symbol/unwind information for the given module into the ker‐
238 nel object module. This may enable symbolic tracebacks from
239 those modules/programs, even if they do not have an explicit
240 probe placed into them.
241
242 --ldd Add symbol/unwind information for all user-space shared li‐
243 braries suspected by ldd to be necessary for user-space binaries
244 being probed or listed with the -d option. Caution: this can
245 make the probe modules considerably larger. Note that this op‐
246 tion does not deal with kernel-space modules: see instead
247 --all-modules below.
248
249 --all-modules
250 Equivalent to specifying "-dkernel" and a "-d" for each kernel
251 module that is currently loaded. Caution: this can make the
252 probe modules considerably larger.
253
254 -o FILE
255 Send standard output to named file. In bulk mode, percpu files
256 will start with FILE_ (FILE_cpu with -F) followed by the cpu
257 number. This supports strftime(3) formats for FILE.
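
Since date(1) accepts the same strftime(3) conversions, it offers a quick way to preview what a FILE pattern would expand to. The pattern below is only an illustration, and the exact moment at which stap evaluates the time is not covered by this sketch.

```shell
# Preview a strftime-style output name with date(1).
pattern='trace-%Y%m%d-%H%M%S.log'   # hypothetical FILE argument for -o
preview=$(date "+$pattern")
echo "stap -o '$pattern' ... would name its output like: $preview"
```

Quoting the pattern in single quotes keeps the shell from touching it, so the % conversions reach stap unchanged.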
258
259 -c CMD Start the probes, run CMD, and exit when CMD finishes. This al‐
260 so has the effect of setting target() to the pid of that
261 process. Note that many probe types trigger independently of
262 this setting. Consider including something like this to focus
263 your script.
264
265 probe FOO { if (pid() != target()) next; .... }
266
267
268 -x PID Sets target() to PID. The script runs independently of the
269 PID's lifespan.
270
271 -e SCRIPT
272 Run the given SCRIPT specified on the command line.
273
-E SCRIPT
Run the given SCRIPT in addition to the main script, however the
main script is specified (through -e or as a script file). This
option can be repeated to run multiple scripts, and can be used in
listing mode (-l/-L).
279
280 -l PROBE
281 Instead of running a probe script, just list all available probe
282 points matching the given single probe point. The pattern may
283 include wildcards and aliases, but not comma-separated multiple
284 probe points. The process result code will indicate failure if
285 there are no matches.
286
287 % stap -e 'probe syscall.* { }'
288 [...]
289 % stap -l 'syscall.*'
290 syscall.accept
291 [...]
292 syscall.writev
293
294
295 -L PROBE
296 Similar to "-l", but list matching probe points plus their
297 available context variables. When -v is set with -L, the output
298 includes duplicate probe points which are distinguished by their
299 PC address.
300
301 % stap -L 'process("/lib64/libpython*.so.*").mark("*")'
302 process("/usr/lib64/libpython2.7.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
303 process("/usr/lib64/libpython2.7.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
304 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
305 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
306 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__done") $arg1:long
307 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__start") $arg1:long
308 process("/usr/lib64/libpython3.6m.so.1.0").mark("line") $arg1:long $arg2:long $arg3:long
309
310
-F Without the -o option, load the module and start the probes, then
detach from the module, leaving the probes running. With the -o
option, run staprun in the background as a daemon and show its pid.
314
-S size[,N]
Sets the maximum size of an output file and the maximum number of
output files. When an output file would exceed size, systemtap
switches output to the next file. When the number of output files
exceeds N, systemtap removes the oldest output file. The second
argument may be omitted.
321
322 -T TIMEOUT
323 Exit the script after TIMEOUT seconds.
324
325 --skip-badvars
326 Ignore unresolvable or run-time-inaccessible context variables
327 and substitute with 0, without errors.
328
329
330 --prologue-searching[=WHEN]
331 Prologue-searching mode. Activate heuristics to work around in‐
332 correct debugging information for function parameter $context
333 variables. WHEN can be either "never", "always", or "auto" (i.e.
334 enabled by heuristic). If WHEN is missing, then "always" is as‐
335 sumed. If the option is missing, then "auto" is assumed.
336
337
338 --suppress-handler-errors
339 Wrap all probe handlers into something like this
340
341 try { ... } catch { next }
342
343 block, which causes any runtime errors to be quietly suppressed.
344 Suppressed errors do not count against MAXERRORS limits. In
345 this mode, the MAXSKIPPED limits are also suppressed, so that
346 many errors and skipped probes may be accumulated during a
347 script's runtime. Any overall counts will still be reported at
348 shutdown.
349
350
351 --compatible VERSION
352 Suppress recent script language or tapset changes which are in‐
353 compatible with given older version of systemtap. This may be
354 useful if a much older systemtap script fails to run. See the
355 DEPRECATION section for more details.
356
357
358 --check-version
359 This option is used to check if the active script has any con‐
360 structs that may be systemtap version specific. See the DEPRE‐
361 CATION section for more details.
362
363
364 --clean-cache
365 This option prunes stale entries from the cache directory. This
366 is normally done automatically after successful runs, but this
367 option will trigger the cleanup manually and then exit. See the
368 CACHING section for more details about cache limits.
369
370
371 --color[=WHEN], --colour[=WHEN]
372 This option controls coloring of error messages. WHEN can be ei‐
373 ther "never", "always", or "auto" (i.e. enable only if at a ter‐
374 minal). If WHEN is missing, then "always" is assumed. If the op‐
375 tion is missing, then "auto" is assumed.
376
377 Colors can be modified using the SYSTEMTAP_COLORS environment
378 variable. The format must be of the form
379 key1=val1:key2=val2:key3=val3 ...etc. Valid keys are "error",
380 "warning", "source", "caret", and "token". Values constitute
381 Select Graphic Rendition (SGR) parameter(s). Consult the docu‐
382 mentation of your terminal for the SGRs it supports. As an exam‐
383 ple, the default colors would be expressed as
384 error=01;31:warning=00;33:source=00;34:caret=01:token=01. If
385 SYSTEMTAP_COLORS is absent, the default colors will be used. If
386 it is empty or invalid, coloring is turned off.
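
Because an invalid value silently turns coloring off, it can help to check a SYSTEMTAP_COLORS string before exporting it. The sketch below only verifies the key1=val1:key2=val2 shape and the documented key names; it does not validate the SGR values, and the function name check_stap_colors is hypothetical. What stap itself treats as invalid may be stricter or looser than this.

```shell
# check_stap_colors SPEC
# Succeeds if SPEC is a non-empty, colon-separated list of key=value
# entries whose keys are all among the documented ones.
check_stap_colors() {
    rest=$1
    [ -n "$rest" ] || return 1      # an empty value turns coloring off
    while [ -n "$rest" ]; do
        entry=${rest%%:*}           # text before the first colon
        case "$rest" in
            *:*) rest=${rest#*:} ;; # drop that entry and its colon
            *)   rest= ;;           # that was the last entry
        esac
        case "$entry" in
            error=*|warning=*|source=*|caret=*|token=*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

# The default colors quoted above pass the check and can be exported.
colors='error=01;31:warning=00;33:source=00;34:caret=01:token=01'
if check_stap_colors "$colors"; then
    export SYSTEMTAP_COLORS="$colors"
fi
```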
387
388
389 --disable-cache
390 This option disables all use of the cache directory. No files
391 will be either read from or written to the cache.
392
393
--poison-cache
This option treats files in the cache directory as invalid. No
files will be read from the cache, but resulting files from this
run will still be written to the cache. This is meant as a
troubleshooting aid when stap's caching seems to be misbehaving.
If it helps, there is probably a bug in systemtap that the
developers would like you to report.
401
402
--privilege[=stapusr | =stapsys | =stapdev]
This option instructs stap to examine the script looking for
constructs which are not allowed for the specified privilege
level (see UNPRIVILEGED USERS). Compilation fails if any such
constructs are used. If stapusr or stapsys is specified when
using a compile server (see --use-server), the server will
examine the script and, if compilation succeeds, the server will
cryptographically sign the resulting kernel module, certifying
that it is safe for use by users at the specified privilege
level.
413
414 If --privilege has not been specified, -pN has not been speci‐
415 fied with N < 5, and the invoking user is not root, and is not a
416 member of the group stapdev, then stap will automatically add
417 the appropriate --privilege option to the options already speci‐
418 fied.
419
420
421 --unprivileged
422 This option is equivalent to --privilege=stapusr.
423
424
425 --use-server[=HOSTNAME[:PORT] | =IP_ADDRESS[:PORT] | =CERT_SERIAL]
426 Specify compile-server(s) to be used for compilation and/or in
427 conjunction with --list-servers and --trust-servers (see below)
428 for listing. If no argument is supplied, then the default in un‐
429 privileged mode (see --privilege) is to select compatible
430 servers which are trusted as SSL peers and as module signers and
431 currently online. Otherwise the default is to select compatible
432 servers which are trusted as SSL peers and currently online.
433 --use-server may be specified more than once, in which case a
434 list of servers is accumulated in the order specified. Servers
435 may be specified by host name, ip address, or by certificate se‐
436 rial number (obtained using --list-servers). The latter is most
437 commonly used when adding or revoking trust in a server (see
438 --trust-servers below). If a server is specified by host name or
439 ip address, then an optional port number may be specified. This
440 is useful for accessing servers which are not on the local net‐
441 work or to specify a particular server.
442
443 IP addresses may be IPv4 or IPv6 addresses.
444
445 If a particular IPv6 address is link local and exists on more
446 than one interface, the intended interface may be specified by
447 appending the address with a percent sign (%) followed by the
448 intended interface name. For example,
449 "fe80::5eff:35ff:fe07:55ca%eth0".
450
451 In order to specify a port number with an IPv6 address, it is
452 necessary to enclose the IPv6 address in square brackets ([]) in
453 order to separate the port number from the rest of the address.
454 For example, "[fe80::5eff:35ff:fe07:55ca]:5000" or
455 "[fe80::5eff:35ff:fe07:55ca%eth0]:5000".
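
The bracket rule for IPv6 addresses is easy to get wrong when server arguments are assembled in scripts. The sketch below builds the argument text for --use-server from an address and an optional port; the helper name server_spec is hypothetical, and it assumes any address containing a colon is IPv6.

```shell
# server_spec ADDRESS [PORT]
# Print ADDRESS, with PORT appended when given. An address containing
# a colon is assumed to be IPv6 and is bracketed before the port.
server_spec() {
    addr=$1
    port=${2-}
    if [ -z "$port" ]; then
        printf '%s\n' "$addr"
        return
    fi
    case "$addr" in
        *:*) printf '[%s]:%s\n' "$addr" "$port" ;;   # IPv6 address
        *)   printf '%s:%s\n' "$addr" "$port" ;;     # host name or IPv4
    esac
}

server_spec 'fe80::5eff:35ff:fe07:55ca%eth0' 5000
# prints: [fe80::5eff:35ff:fe07:55ca%eth0]:5000
```

The result can then be passed as --use-server="$(server_spec ADDRESS PORT)".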
456
If --use-server has not been specified, -pN has not been
specified with N < 5, and the invoking user is not root, is not a
member of the group stapdev, but is a member of the group stapusr,
then stap will automatically add --use-server to the options
already specified.
462
463
464 --use-server-on-error[=yes|=no]
465 Instructs stap to retry compilation of a script using a compile
466 server if compilation on the local host fails in a manner which
467 suggests that it might succeed using a server. If this option
468 is not specified, the default is no. If no argument is provid‐
469 ed, then the default is yes. Compilation will be retried for
470 certain types of errors (e.g. insufficient data or resources)
471 which may not occur during re-compilation by a compile server.
472 Compile servers will be selected automatically for the re-compi‐
473 lation attempt as if --use-server was specified with no argu‐
474 ments.
475
476
477 --list-servers[=SERVERS]
478 Display the status of the requested SERVERS, where SERVERS is a
479 comma-separated list of server attributes. The list of at‐
480 tributes is combined to filter the list of servers displayed.
481 Supported attributes are:
482
483 all specifies all known servers (trusted SSL peers, trusted
484 module signers, online servers).
485
486 specified
487 specifies servers specified using --use-server.
488
489 online filters the output by retaining information about servers
490 which are currently online.
491
492 trusted
493 filters the output by retaining information about servers
494 which are trusted as SSL peers.
495
496 signer filters the output by retaining information about servers
497 which are trusted as module signers (see --privilege).
498
499 compatible
500 filters the output by retaining information about servers
501 which are compatible with the current kernel release and
502 architecture.
503
504 If no argument is provided, then the default is specified. If
505 no servers were specified using --use-server, then the default
506 servers for --use-server are listed.
507
508 Note that --list-servers uses the avahi-daemon service to detect
509 online servers. If this service is not available, then
510 --list-servers will fail to detect any online servers. In order
511 for --list-servers to detect servers listening on IPv6 address‐
512 es, the avahi-daemon configuration file /etc/avahi/avahi-dae‐
513 mon.conf must contain an active "use-ipv6=yes" line. The service
514 must be restarted after adding this line in order for IPv6 to be
515 enabled.
516
517
--trust-servers[=TRUST_SPEC]
Grant or revoke trust in the compile-servers specified using
--use-server, as directed by TRUST_SPEC, a comma-separated list
specifying the trust which is to be granted or revoked. Supported
elements are:
523
524 ssl trust the specified servers as SSL peers.
525
526 signer trust the specified servers as module signers (see
527 --privilege). Only root can specify signer.
528
529 all-users
530 grant trust as an ssl peer for all users on the local
531 host. The default is to grant trust as an ssl peer for
532 the current user only. Trust as a module signer is always
533 granted for all users. Only root can specify all-users.
534
535 revoke revoke the specified trust. The default is to grant it.
536
537 no-prompt
538 do not prompt the user for confirmation before carrying
539 out the requested action. The default is to prompt the
540 user for confirmation.
541
542 If no argument is provided, then the default is ssl. If no
543 servers were specified using --use-server, then no trust will be
544 granted or revoked.
545
546 Unless no-prompt has been specified, the user will be prompted
547 to confirm the trust to be granted or revoked before the opera‐
548 tion is performed.
549
550
551 --sign-module
552 Sign the module with a MOK (Machine Owner Key) on UEFI/Secure‐
553 Boot systems. See the SECUREBOOT section for more details.
554
555
556 --dump-probe-types
557 Dumps a list of supported probe types and exits. If --privi‐
558 lege=stapusr is also specified, the list will be limited to
559 probe types available to unprivileged users.
560
561
562 --dump-probe-aliases
563 Dumps a list of all probe aliases found in library files and ex‐
564 its.
565
566
--dump-functions
Dumps a list of all the public functions found in library files
and exits. Also includes their parameters and types. A function
of type 'unknown' indicates a function that does not return a
value. Note that not all function/parameter types may be
resolved (these are also shown by 'unknown'). This feature is
very memory-intensive and thus may not work properly with
--use-server if the target server imposes an rlimit on process
memory (i.e. through the ~stap-server/.systemtap/rc configuration
file, see stap-server(8)).
577
578
579 --remote URL
580 Set the execution target to the given host. This option may be
581 repeated to target multiple execution targets. Passes 1-4 are
582 completed locally as normal to build the script, and then pass 5
583 will copy the module to the target and run it. Acceptable URL
584 forms include:
585
586 [USER@]HOSTNAME, ssh://[USER@]HOSTNAME
587 This mode uses ssh, optionally using a username not
588 matching your own. If a custom ssh_config file is in use,
589 add SendEnv LANG to retain internationalization function‐
590 ality.
591
592 libvirt://DOMAIN, libvirt://DOMAIN/LIBVIRT_URI
593 This mode uses stapvirt to execute the script on a domain
594 managed by libvirt. Optionally, LIBVIRT_URI may be speci‐
595 fied to connect to a specific driver and/or a remote
596 host. For example, to connect to the local privileged QE‐
597 MU driver, use:
598
599 --remote libvirt://MyDomain/qemu:///system
600
601 See the page at <http://libvirt.org/uri.html> for sup‐
602 ported URIs. Also see stapvirt(1) for more information on
603 how to prepare the domain for stap probing.
604
605 unix:PATH
606 This mode connects to a UNIX socket. This can be used
607 with a QEMU virtio-serial port for executing scripts in‐
608 side a running virtual machine.
609
610 direct://
611 Special loopback mode to run on the local host.
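
When a wrapper script accepts a list of targets, it can be useful to sort them by transport before handing them to --remote. The sketch below classifies a URL according to the forms listed above; the function name remote_kind is hypothetical, and since the list above is introduced as "forms include", stap may accept forms this sketch does not know about.

```shell
# remote_kind URL
# Print which of the documented --remote forms URL belongs to.
remote_kind() {
    case "$1" in
        ssh://*)     echo ssh ;;
        libvirt://*) echo libvirt ;;
        unix:*)      echo unix ;;
        direct://)   echo direct ;;
        *)           echo ssh ;;    # bare [USER@]HOSTNAME also uses ssh
    esac
}

remote_kind 'libvirt://MyDomain/qemu:///system'   # prints: libvirt
remote_kind 'user@host.example'                   # prints: ssh
```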
612
613 --remote-prefix
614 Prefix each line of remote output with "N: ", where N is the in‐
615 dex of the remote execution target from which the given line
616 originated.
617
618
619 --download-debuginfo[=OPTION]
620 Enable, disable or set a timeout for the automatic debuginfo
621 downloading feature offered by abrt as specified by OPTION,
622 where OPTION is one of the following:
623
624 yes enable automatic downloading of debuginfo with no time‐
625 out. This is the same as not providing an OPTION value to
626 --download-debuginfo
627
628 no explicitly disable automatic downloading of debuginfo.
629 This is the same as not using the option at all.
630
631 ask show abrt output, and ask before continuing download. No
632 timeout will be set.
633
634 <timeout>
635 specify a timeout as a positive number to stop the down‐
636 load if it is taking longer than <timeout> seconds.
637
638 --rlimit-as=NUM
639 Specify the maximum size of the process's virtual memory (ad‐
640 dress space), in bytes.
641
642
643 --rlimit-cpu=NUM
644 Specify the CPU time limit, in seconds.
645
646
647 --rlimit-nproc=NUM
648 Specify the maximum number of processes that can be created.
649
650
651 --rlimit-stack=NUM
652 Specify the maximum size of the process stack, in bytes.
653
654
655 --rlimit-fsize=NUM
656 Specify the maximum size of files that the process may create,
657 in bytes.
658
659
660 --sysroot=DIR
661 Specify sysroot directory where target files (executables, li‐
662 braries, etc.) are located. With -r RELEASE, the sysroot will
663 be searched for the appropriate kernel build directory. With -r
664 /DIR, however, the sysroot will not be used to find the kernel
665 build.
666
667
668 --sysenv=VAR=VALUE
669 Provide an alternate value for an environment variable where the
670 value on a remote system differs. Path variables (e.g. PATH,
671 LD_LIBRARY_PATH) are assumed to be relative to the directory
672 provided by --sysroot, if provided.
673
674
675 --suppress-time-limits
676 Disable -DSTP_OVERLOAD related options as well as -DMAXACTION
677 and -DMAXTRYLOCK. This option requires guru mode.
678
679
680 --runtime=MODE
681 Set the pass-5 runtime mode. Valid options are kernel (de‐
682 fault), dyninst and bpf. See ALTERNATE RUNTIMES below for more
683 information.
684
685
686 --dyninst
687 Shorthand for --runtime=dyninst.
688
689
690 --bpf Shorthand for --runtime=bpf.
691
692
693 --save-uprobes
694 On machines that require SystemTap to build its own uprobes mod‐
695 ule (kernels prior to version 3.5), this option instructs Sys‐
696 temTap to also save a copy of the module in the current directo‐
697 ry (creating a new "uprobes" directory first).
698
699
--target-namespaces=PID
Allow for a set of target namespaces to be set based on the
namespaces the given PID is in. This is for namespace-aware
tapset functions. If no target namespaces are set, they default
to the namespaces of the stap process.
705
706
--monitor[=INTERVAL]
Enables an interface to display status information about the
module (uptime, module name, invoker uid, memory sizes, global
variables, list of probes with their statistics). An optional
argument INTERVAL can be supplied to set the refresh rate in
seconds of the status window. The module can also be controlled
by a list of commands using the following keys:
714
715 c Resets all global variables to their initial values or
716 zeroes them if they did not have an initial value.
717
718 s Rotates the attribute used to sort the list of probes.
719
t Brings up a prompt to allow toggling (on/off) of probes by
index. Probe points are still affected by their conditions.
723
724 r Resumes the script by toggling on all probes.
725
726 p Pauses the script by toggling off all probes.
727
728 x Hides/shows the status window. This allows for more out‐
729 put to be seen.
730
731 navigation-keys
732 The navigation keys can be used to scroll up and down the
733 windows.
734
735 Tab Toggle scrolling between status and output windows.
736
737
738 --example
739 This option is used to run example scripts without having to en‐
740 ter the entire path to the script. Example scripts can be found
741 in the directory specified in the stappaths(7) manual page.
742
743
744 --no-global-var-display
745 This option is used to disable the automatic logging of unused
746 global variables at the end of a stap session.
747
748
ARGUMENTS
Any additional arguments on the command line are passed to the script
parser for substitution, as described under GENERAL SYNTAX below.
752
753
SCRIPT LANGUAGE
The systemtap script language resembles awk and C. There are two main
outermost constructs: probes and functions. Within these, statements
and expressions use C-like operator syntax and precedence.
758
759
760 GENERAL SYNTAX
761 Whitespace is ignored. Three forms of comments are supported:
762 # ... shell style, to the end of line, except for $# and @#
763 // ... C++ style, to the end of line
764 /* ... C style ... */
765 Literals are either strings enclosed in double-quotes (passing through
766 the usual C escape codes with backslashes, and with adjacent string
767 literals glued together, also as in C), or integers (in decimal, hexa‐
768 decimal, or octal, using the same notation as in C). All strings are
769 limited in length to some reasonable value (a few hundred bytes). In‐
770 tegers are 64-bit signed quantities, although the parser also accepts
771 (and wraps around) values above positive 2**63.
772
773 In addition, script arguments given at the end of the command line may
774 be inserted. Use $1 ... $<NN> for insertion unquoted, @1 ... @<NN> for
775 insertion as a string literal. The number of arguments may be accessed
776 through $# (as an unquoted number) or through @# (as a quoted number).
777 These may be used at any place a token may begin, including within the
778 preprocessing stage. Reference to an argument number beyond what was
779 actually given is an error.
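
As a minimal sketch of these substitutions, assume a script saved under
the hypothetical name args.stp and invoked as "stap args.stp 42 hello".
The file name and argument values are illustrative only.

```systemtap
// args.stp (hypothetical name); run as: stap args.stp 42 hello
probe begin {
  // The argument count and the first argument are pasted in as bare
  // numeric tokens; the second argument is pasted in as a string literal.
  printf("argc=%d first=%d second=%s\n", $#, $1, @2)
  exit()
}
```

With the invocation above, this is expected to print
"argc=2 first=42 second=hello". Referring to a third argument in this
script would be an error, since only two were given.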
780
781
782 PREPROCESSING
783 A simple conditional preprocessing stage is run as a part of parsing.
784 The general form is similar to the cond ? exp1 : exp2 ternary operator:
785
786 %( CONDITION %? TRUE-TOKENS %)
787 %( CONDITION %? TRUE-TOKENS %: FALSE-TOKENS %)
788
The CONDITION is either an expression whose format is determined by its
first keyword, or a comparison between two string literals or two
numeric literals. Several CONDITIONs may also be combined into
alternatives and conjunctions using || and && respectively. Parentheses
are not supported yet, so it is important to remember that conjunction
takes precedence over alternative.
795
If the first part is the identifier kernel_vr or kernel_v to refer to
the kernel version number, with ("2.6.13-1.322FC3smp") or without
("2.6.13") the release code suffix, then the second part is one of the
six standard numeric comparison operators <, <=, ==, !=, >, and >=, and
the third part is a string literal that contains an RPM-style
version-release value. The condition is deemed satisfied if the version
of the target kernel (as optionally overridden by the -r option)
compares to the given version string in the way the operator requires.
The comparison is performed by the glibc function strverscmp. As a
special case, if the operator is for simple equality (==) or inequality
(!=), and the third part contains any wildcard characters (* or ? or [),
then the expression is treated as a wildcard (mis)match as evaluated by
fnmatch.
808
809 If, on the other hand, the first part is the identifier arch to refer
810 to the processor architecture (as named by the kernel build system
811 ARCH/SUBARCH), then the second part is one of the two string comparison
812 operators == or !=, and the third part is a string literal for matching
813 it. This comparison is a wildcard (mis)match.
814
815 Similarly, if the first part is an identifier like CONFIG_something to
816 refer to a kernel configuration option, then the second part is == or
817 !=, and the third part is a string literal for matching the value (com‐
818 monly "y" or "m"). Nonexistent or unset kernel configuration options
819 are represented by the empty string. This comparison is also a wild‐
820 card (mis)match.
821
822 If the first part is the identifier systemtap_v, the test refers to the
823 systemtap compatibility version, which may be overridden for old
824 scripts with the --compatible flag. The comparison operator is as is
825 for kernel_v and the right operand is a version string. See also the
826 DEPRECATION section below.
827
828 If the first part is the identifier systemtap_privilege, the test
829 refers to the privilege level that the systemtap script is compiled
830 with. Here the second part is == or !=, and the third part is a string
831 literal, either "stapusr" or "stapsys" or "stapdev".
832
If the first part is the identifier guru_mode, the test refers to
whether the systemtap script is compiled in guru mode. Here the second
part is == or !=, and the third part is a number, either 1 or 0.
836
837 If the first part is the identifier runtime, the test refers to the
838 systemtap runtime mode. See ALTERNATE RUNTIMES below for more informa‐
839 tion on runtimes. The second part is one of the two string comparison
840 operators == or !=, and the third part is a string literal for matching
841 it. This comparison is a wildcard (mis)match.
842
843 Otherwise, the CONDITION is expected to be a comparison between two
844 string literals or two numeric literals. In this case, the arguments
845 are the only variables usable.
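
A minimal sketch of a literal comparison that uses a script argument
follows. It assumes the script may be run with or without arguments;
the messages are illustrative.

```systemtap
probe begin {
  // The argument count is substituted before the condition is
  // evaluated, so this becomes a comparison of two numeric literals.
  %( $# >= 1 %?
    printf("first argument: %s\n", @1)
  %:
    println("no arguments given")
  %)
  exit()
}
```

Only one of the two branches is passed on to the parser, so the
reference to the first argument is only seen when an argument was
actually supplied.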
846
847 The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser tokens
848 (possibly including nested preprocessor conditionals), and are passed
849 into the input stream if the condition is true or false. For example,
850 the following code induces a parse error unless the target kernel ver‐
851 sion is newer than 2.6.5:
852
853 %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence
854
855 The following code might adapt to hypothetical kernel version drift:
856
857 probe kernel.function (
858 %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
859 %( kernel_vr == "2.6.13*smp" %? "do_page_fault" %:
860 UNSUPPORTED %) %)
861 ) { /* ... */ }
862
863 %( arch == "ia64" %?
864 probe syscall.vliw = kernel.function("vliw_widget") {}
865 %)
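
Conditions can also be combined. The sketch below assumes the
architecture names "x86_64" and "arm64" and an arbitrary kernel version
threshold; both are illustrative choices, not requirements.

```systemtap
// Conjunction binds tighter than alternative, and parentheses are not
// available, so this condition reads as:
//   (arch is x86_64 AND kernel >= 3.10) OR arch is arm64
%( arch == "x86_64" && kernel_v >= "3.10" || arch == "arm64" %?
  probe begin { println("supported combination") exit() }
%:
  probe begin { println("other combination") exit() }
%)
```

To express "A and (B or C)", the condition has to be written out as
"A && B || A && C", or split into nested conditionals.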
866
867
868
869 PREPROCESSOR MACROS
870 The preprocessor also supports a simple macro facility, run as a sepa‐
871 rate pass before conditional preprocessing.
872
873 Macros are defined using the following construct:
874
875 @define NAME %( BODY %)
876 @define NAME(PARAM_1, PARAM_2, ...) %( BODY %)
877
878 Macros, and parameters inside a macro body, are both invoked by prefix‐
879 ing the macro name with an @ symbol:
880
881 @define foo %( x %)
882 @define add(a,b) %( ((@a)+(@b)) %)
883
884 @foo = @add(2,2)
885
886
887 Macro expansion is currently performed in a separate pass before condi‐
888 tional compilation. Therefore, both TRUE- and FALSE-tokens in condi‐
889 tional expressions will be macroexpanded regardless of how the condi‐
890 tion is evaluated. This can sometimes lead to errors:
891
892 // The following results in a conflict:
893 %( CONFIG_UTRACE == "y" %?
894 @define foo %( process.syscall %)
895 %:
896 @define foo %( **ERROR** %)
897 %)
898
899 // The following works properly as expected:
900 @define foo %(
901 %( CONFIG_UTRACE == "y" %? process.syscall %: **ERROR** %)
902 %)
903
904 The first example is incorrect because both @defines are evaluated in a
905 pass prior to the conditional being evaluated.
906
Normally, a macro definition is local to the file it occurs in. Thus,
defining a macro in a tapset does not make it available to the user of
the tapset. Publicly available library macros can be defined by
including .stpm files on the tapset search path. These files may only
contain @define constructs, which become visible across all tapsets and
user scripts. Optionally, within the .stpm files, a public macro
definition can be surrounded by a preprocessor conditional as described
above.
915
916
917 CONSTANTS
Tapsets or guru-mode user scripts can access header file constant
tokens, typically macros, using the built-in @const() operator. The
respective header file can be included either via the tapset library or
using a top-level guru-mode embedded-C construct. This results in the
appropriate embedded-C pragma comments being set.
923
924 @const("STP_SKIP_BADVARS")
925
926
927
928 VARIABLES
929 Identifiers for variables and functions are an alphanumeric sequence,
930 and may include _ and $ characters. They may not start with a plain
931 digit, as in C. Each variable is by default local to the probe or
932 function statement block within which it is mentioned, and therefore
933 its scope and lifetime is limited to a particular probe or function in‐
934 vocation.
935
936 Scalar variables are implicitly typed as either string or integer. As‐
937 sociative arrays also have a string or integer value, and a tuple of
938 strings and/or integers serving as a key. Here are a few basic expres‐
939 sions.
940
941 var1 = 5
942 var2 = "bar"
943 array1 [pid()] = "name" # single numeric key
944 array2 ["foo",4,i++] += 5 # vector of string/num/num keys
945 if (["hello",5,4] in array2) println ("yes") # membership test
946
947
948 The translator performs type inference on all identifiers, including
949 array indexes and function parameters. Inconsistent type-related use
950 of identifiers signals an error.
951
952 Variables may be declared global, so that they are shared amongst all
953 probes and functions and live as long as the entire systemtap session.
954 There is one namespace for all global variables, regardless of which
955 script file they are found within. Concurrent access to global vari‐
956 ables is automatically protected with locks, see the SAFETY AND SECURI‐
957 TY section for more details. A global declaration may be written at
958 the outermost level anywhere, not within a block of code. Global vari‐
959 ables which are written but never read will be displayed automatically
960 at session shutdown. The translator will infer for each its value
961 type, and if it is used as an array, its key types. Optionally, scalar
962 globals may be initialized with a string or number literal. The fol‐
963 lowing declaration marks variables as global.
964
965 global var1, var2, var3=4
966
967
Global variables can also be set as module options. One can do this
either by using the -G option, or by first compiling the module using
stap -p4 and then setting the global variables on the command line when
calling staprun on the generated module. See staprun(8) for more
information.
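
A minimal sketch of overriding a global from the command line follows.
The script name threshold.stp and the variable name threshold are
hypothetical.

```systemtap
// threshold.stp (hypothetical name)
// Default value, used when nothing is given on the command line.
global threshold = 10

probe begin {
  printf("threshold=%d\n", threshold)
  exit()
}

// Possible invocation that overrides the default:
//   stap -G threshold=25 threshold.stp
```

With the -G invocation shown in the comment, the script is expected to
report 25 rather than 10.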
973
The scope of a global variable may be limited to a tapset or user
script file using the private keyword. The global keyword is optional
when defining a private global variable. The following declaration
marks var1 and var2 as private globals.
978
979 private global var1=2
980 private var2
981
982
983 Arrays are limited in size by the MAXMAPENTRIES variable -- see the
984 SAFETY AND SECURITY section for details. Optionally, global arrays may
985 be declared with a maximum size in brackets, overriding MAXMAPENTRIES
986 for that array only. Note that this doesn't indicate the type of keys
987 for the array, just the size.
988
989 global tiny_array[10], normal_array, big_array[50000]
990
991
992 Arrays may be configured for wrapping using the '%' suffix. This caus‐
993 es older elements to be overwritten if more elements are inserted than
994 the array can hold. This works for both associative and statistics
995 typed arrays.
996
997 global wrapped_array1%[10], wrapped_array2%
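
A minimal sketch of a wrapping array in use follows. The array name and
sizes are illustrative.

```systemtap
// Room for three elements; the '%' suffix requests wrapping.
global recent%[3]

probe begin {
  // Insert five elements into an array that can hold three.
  for (i = 1; i <= 5; i++)
    recent[i] = i * i

  // List whatever survived, in ascending key order.
  foreach (k+ in recent)
    printf("%d -> %d\n", k, recent[k])
  exit()
}
```

Without the '%' suffix, the fourth insertion would be expected to fail
with an array overflow error instead of displacing an older element.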
998
999
1000
Many types of probe points provide context variables, which are
run-time values, safely extracted from the kernel or userspace program
being probed. These are prefixed with the $ character. The CONTEXT
VARIABLES section in stapprobes(3stap) lists what is available for each
type of probe point. These context variables become normal string or
numeric scalars once they are stored in normal script variables. See
the TYPECASTING section below on how to turn them back into typed
pointers for further processing as context variables. There is some
automation to help!
1010
1011
1012 STATEMENTS
1013 Statements enable procedural control flow. They may occur within func‐
1014 tions and probe handlers. The total number of statements executed in
1015 response to any single probe event is limited to some number defined by
1016 the MAXACTION macro in the translated C code, and is in the neighbour‐
1017 hood of 1000.
1018
1019 EXP Execute the string- or integer-valued expression and throw away
1020 the value.
1021
1022 { STMT1 STMT2 ... }
1023 Execute each statement in sequence in this block. Note that
1024 separators or terminators are generally not necessary between
1025 statements.
1026
1027 ; Null statement, do nothing. It is useful as an optional separa‐
1028 tor between statements to improve syntax-error detection and to
1029 handle certain grammar ambiguities.
1030
1031 if (EXP) STMT1 [ else STMT2 ]
1032 Compare integer-valued EXP to zero. Execute the first (non-ze‐
1033 ro) or second STMT (zero).
1034
1035 while (EXP) STMT
1036 While integer-valued EXP evaluates to non-zero, execute STMT.
1037
1038 for (EXP1; EXP2; EXP3) STMT
1039 Execute EXP1 as initialization. While EXP2 is non-zero, execute
1040 STMT, then the iteration expression EXP3.
1041
1042 foreach (VAR in ARRAY [ limit EXP ]) STMT
Loop over each element of the named global array, assigning the
current key to VAR. The array may not be modified within the
statement. By adding a single + or - operator after the VAR or the
ARRAY identifier, the iteration will proceed in sorted order, by
ascending or descending index or value. If the array contains
statistics aggregates, adding the desired @operator between the ARRAY
identifier and the + or - will specify the sorting aggregate function.
See the STATISTICS section below for the ones available. The default
is @count. Using the optional limit keyword limits the number of loop
iterations to EXP. EXP is evaluated once at the beginning of the loop.
1054
1055 foreach ([VAR1, VAR2, ...] in ARRAY [ limit EXP ]) STMT
1056 Same as above, used when the array is indexed with a tuple of
1057 keys. A sorting suffix may be used on at most one VAR or ARRAY
1058 identifier.
1059
1060 foreach ([VAR1, VAR2, ...] in ARRAY [INDEX1, INDEX2, ...] [ limit EXP
1061 ]) STMT
1062 Same as above, where iterations are limited to elements in the
1063 array where the keys match the index values specified. The sym‐
1064 bol * can be used to specify an index and will be treated as a
1065 wildcard.
1066
1067 foreach (VAR0 = VAR in ARRAY [ limit EXP ]) STMT
This variant of foreach saves the current value into VAR0 on each
iteration, so VAR0 is the same as ARRAY[VAR]. This also works with a
tuple of keys. Sorting suffixes on VAR0 have the same effect as on
ARRAY.
1072
1073 foreach (VAR0 = VAR in ARRAY [INDEX1, INDEX2, ...] [ limit EXP ]) STMT
1074 Same as above, where iterations are limited to elements in the
1075 array where the keys match the index values specified. The sym‐
1076 bol * can be used to specify an index and will be treated as a
1077 wildcard.
1078
1079 break, continue
1080 Exit or iterate the innermost nesting loop (while or for or
1081 foreach) statement.
1082
1083 return EXP
1084 Return EXP value from enclosing function. If the function's
1085 value is not taken anywhere, then a return statement is not
1086 needed, and the function will have a special "unknown" type with
1087 no return value.
1088
1089 next Return now from enclosing probe handler. This is especially
1090 useful in probe aliases that apply event filtering predicates.
1091 When used in functions, the execution will be immediately trans‐
1092 ferred to the next overloaded function.
1093
1094 try { STMT1 } catch { STMT2 }
1095 Run the statements in the first block. Upon any run-time er‐
1096 rors, abort STMT1 and start executing STMT2. Any errors in
1097 STMT2 will propagate to outer try/catch blocks, if any.
1098
1099 try { STMT1 } catch(VAR) { STMT2 }
1100 Same as above, plus assign the error message to the string
1101 scalar variable VAR.
1102
1103 delete ARRAY[INDEX1, INDEX2, ...]
1104 Remove from ARRAY the element specified by the index tuple. If
1105 the index tuple contains a * in place of an index, the * is
1106 treated as a wildcard and all elements with keys that match the
1107 index tuple will be removed from ARRAY. The value will no
1108 longer be available, and subsequent iterations will not report
1109 the element. It is not an error to delete an element that does
1110 not exist.
1111
1112 delete ARRAY
1113 Remove all elements from ARRAY.
1114
1115 delete SCALAR
1116 Removes the value of SCALAR. Integers and strings are cleared
1117 to 0 and "" respectively, while statistics are reset to the ini‐
1118 tial empty state.
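
Several of these statements can be combined as in the following sketch.
It assumes the syscall.read probe point and the execname(), pid() and
error() functions from the standard tapset library; the array name and
the five-second interval are illustrative.

```systemtap
global reads

probe syscall.read {
  // Count read calls per (process name, pid) pair.
  reads[execname(), pid()]++
}

probe timer.s(5) {
  // Visit at most five entries, sorted by descending value.
  foreach ([name, p] in reads- limit 5)
    printf("%-16s %6d %d\n", name, p, reads[name, p])

  try {
    // Raise a run-time error on purpose to show the catch block.
    error("demo failure")
  } catch (msg) {
    printf("caught: %s\n", msg)
  }

  delete reads   // remove all elements
  exit()
}
```

The foreach loop only reads the array; the delete is placed after the
loop because the array may not be modified while it is being iterated.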
1119
1120
1121 EXPRESSIONS
1122 Systemtap supports a number of operators that have the same general
1123 syntax, semantics, and precedence as in C and awk. Arithmetic is per‐
1124 formed as per typical C rules for signed integers. Division by zero or
1125 overflow is detected and results in an error.
1126
1127 binary numeric operators
1128 * / % + - >> << & ^ | && ||
1129
1130 binary string operators
1131 . (string concatenation)
1132
1133 numeric assignment operators
1134 = *= /= %= += -= >>= <<= &= ^= |=
1135
1136 string assignment operators
1137 = .=
1138
1139 unary numeric operators
1140 + - ! ~ ++ --
1141
1142 binary numeric, string comparison or regex matching operators
1143 < > <= >= == != =~ !~
1144
1145 ternary operator
1146 cond ? exp1 : exp2
1147
1148 grouping operator
1149 ( exp )
1150
1151 function call
1152 fn ([ arg1, arg2, ... ])
1153
1154 array membership check
1155 exp in array
1156 [exp1, exp2, ... ] in array
1157 [*, *, ... ] in array
1158
1159
1160 REGULAR EXPRESSION MATCHING
1161 The scripting language supports regular expression matching. The basic
1162 syntax is as follows:
1163
1164 exp =~ regex
1165 exp !~ regex
1166
1167 (The first operand must be an expression evaluating to a string; the
1168 second operand must be a string literal containing a syntactically
1169 valid regular expression.)
1170
1171 The regular expression syntax supports POSIX Extended Regular Expres‐
1172 sion features as documented in egrep(1) except for subexpression reuse
1173 ("\1") functionality.
1174
1175 After a successful match, the contents of the matched string and subex‐
1176 pressions can be extracted using the matched() and ngroups() tapset
1177 functions as follows:
1178
1179 if ("an example string" =~ "str(ing)") {
1180 matched(0) // -> returns "string", the matched substring
1181 matched(1) // -> returns "ing", the 1st matched subexpression
1182 ngroups() // -> returns 2, the number of matched groups
1183 }
1184
1185
1186 PROBES
1187 The main construct in the scripting language identifies probes. Probes
1188 associate abstract events with a statement block ("probe handler") that
1189 is to be executed when any of those events occur. The general syntax
1190 is as follows:
1191
1192 probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
1193 probe PROBEPOINT [, PROBEPOINT] if (CONDITION) { [STMT ...] }
1194
1195
1196 Events are specified in a special syntax called "probe points". There
1197 are several varieties of probe points defined by the translator, and
1198 tapset scripts may define further ones using aliases. Probe points may
1199 be wildcarded, grouped, or listed in preference sequences, or declared
1200 optional. More details on probe point syntax and semantics are listed
1201 on the stapprobes(3stap) manual page.
1202
The probe handler is interpreted relative to the context of each event.
For events associated with kernel code, this context may include
variables defined in the source code at that spot. These "context
variables" are presented to the script as variables whose names are
prefixed with "$". They may be accessed only if the kernel's compiler
preserved them despite optimization. This is the same constraint that
a debugger user faces when working with optimized code. In addition,
the objects must exist in paged-in memory at the moment of the
systemtap probe handler's execution, because systemtap must not cause
(and therefore suppresses) any additional paging. Some probe types have
very little context. See the stapprobes(3stap) man pages to see the
kinds of context variables available at each kind of probe point. As of
systemtap version 4.3, functions called from the handlers of some probe
point types may also refer to context variables. These are treated as
if a clone of that function were inlined into the calling probe handler
and the $variables evaluated in its context.
1219
1220 Probes may be decorated with an arming condition, consisting of a sim‐
1221 ple boolean expression on read-only global script variables. While
1222 disarmed (inactive, condition evaluates to false), some probe types re‐
1223 duce or eliminate their run-time overheads. When an arming condition
1224 evaluates to true, probes will be soon re-armed, and their probe han‐
1225 dlers will start getting called as the events fire. (Some events may
1226 be lost during the arming interval. If this is unacceptable, do not
1227 use arming conditions for those probes.) Example of the syntax:
1228
1229 probe timer.us(TIMER) if (enabled) {
1230 }
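
A slightly fuller sketch of an arming condition follows. The variable
name enabled and the timer intervals are illustrative.

```systemtap
global enabled = 0

// Flip the switch every two seconds.
probe timer.s(2) { enabled = !enabled }

// This probe is armed only while the global is non-zero.
probe timer.ms(100) if (enabled) { println("tick") }

// End the session after ten seconds.
probe timer.s(10) { exit() }
```

The conditional probe only reads the global; the variable is changed
from a separate, unconditional probe.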
1231
1232
New probe points may be defined using "aliases". A probe point alias
looks similar to a probe definition, but instead of activating a probe
at the given point, it just defines a new probe point name as an alias
to an existing one. There are two types of alias, the prologue style and
the epilogue style, which are identified by "=" and "+=" respectively.
1239
For a prologue-style alias, the statement block that follows the alias
definition is implicitly added as a prologue to any probe that refers
to the alias. For an epilogue-style alias, the statement block that
follows the alias definition is implicitly added as an epilogue to any
probe that refers to the alias. For example:
1245
1246 probe syscall.read = kernel.function("sys_read") {
1247 fildes = $fd
1248 if (execname() == "init") next # skip rest of probe
1249 }
1250
1251 defines a new probe point syscall.read, which expands to
1252 kernel.function("sys_read"), with the given statement as a prologue,
1253 which is useful to predefine some variables for the alias user and/or
1254 to skip probe processing entirely based on some conditions. And
1255
1256 probe syscall.read += kernel.function("sys_read") {
1257 if (tracethis) println ($fd)
1258 }
1259
defines a new probe point with the given statement as an epilogue,
which is useful to take actions based upon variables set or left over
by the alias user. Please note that in each case, the statements in the
alias handler block are treated ordinarily, so that variables assigned
there constitute mere initialization, not a macro substitution.
1266
1267 Aliases can also be defined to include both a prologue and an epilogue.
1268
1269 probe syscall.read = kernel.function("sys_read") {
1270 fildes = $fd
1271 if (execname() == "init") next
1272 },{
1273 if (tracethis) println ($fd)
1274 }
1275
1276
1277 An alias is used just like a built-in probe type.
1278
1279 probe syscall.read {
1280 printf("reading fd=%d\n", fildes)
1281 if (fildes > 10) tracethis = 1
1282 }
1283
1284
1285 Probes with an alias can make use of the @probewrite predicate. This
1286 check is used to detect whether a script variable or target variable
1287 has been written to in the probe handler body.
1288
1289 @probewrite(var)
1290 expands to 1 iff var has been written to in the probe handler
1291 body, otherwise it expands to 0.
1292
1293 In the following example, @probewrite(var) expands to 1 because var has
1294 been written to in the probe handler body and consequently, the condi‐
1295 tional statement will run.
1296
1297 probe foo = begin { var = 0 }, { if (@probewrite(var)) println(var) }
1298
1299 probe foo {
1300 var = 1
1301 }
1302
1303
1304
1305 FUNCTIONS
1306 Systemtap scripts may define subroutines to factor out common work.
1307 Functions take any number of scalar (integer or string) arguments, and
1308 must return a single scalar (integer or string). An example function
1309 declaration looks like this:
1310
1311 function thisfn (arg1, arg2) {
1312 return arg1 + arg2
1313 }
1314
Note the general absence of type declarations, which are instead
inferred by the translator. However, if desired, a function definition
may include explicit type declarations for its return value and/or its
arguments. This is especially helpful for embedded-C functions. In
the following example, the type inference engine need only infer the
type of arg2 (a string).
1321
1322 function thatfn:string (arg1:long, arg2) {
1323 return sprint(arg1) . arg2
1324 }
1325
1326 Functions may call others or themselves recursively, up to a fixed
1327 nesting limit. This limit is defined by the MAXNESTING macro in the
1328 translated C code and is in the neighbourhood of 10.
1329
1330 Functions may be marked private using the private keyword to limit
1331 their scope to the tapset or user script file they are defined in. An
1332 example definition of a private function follows:
1333
1334 private function three:long () { return 3 }
1335
1336
1337 Functions terminating without reaching an explicit return statement
1338 will return an implicit 0 or "", determined by type inference.
1339
1340 Functions may be overloaded during both runtime and compile time.
1341
1342 Runtime overloading allows the executed function to be selected while
1343 the module is running based on runtime conditions and is achieved using
1344 the "next" statement in script functions and STAP_NEXT macro for embed‐
1345 ded-C functions. For example,
1346
1347
1348 function f() { if (condition) next; print("first function") }
1349 function f() %{ STAP_NEXT; print("second function") %}
1350 function f() { print("third function") }
1351
1352
During a function call f(), if condition evaluates to true, execution
transfers to the third function, which prints "third function". Note
that the second function unconditionally passes control on with
STAP_NEXT.
1356
Parameter overloading allows the function to be executed to be selected
at compile time based on the number of arguments provided to the
function call. For example,
1360
1361
1362 function g() { print("first function") }
1363 function g(x) { print("second function") }
1364 g() -> "first function"
1365 g(1) -> "second function"
1366
1367
Note that runtime overloading does not occur in the above example, as
exactly one function will be resolved for each function call. The use of
a next statement inside a function while no more overloads remain will
trigger a runtime exception. Runtime overloading will only occur if the
functions have the same arity; functions with the same name but a
different number of parameters are completely unrelated.
1374
Execution order is determined by a priority value which may be
specified; a lower value means a higher priority. If no explicit
priority is specified, user script functions are given a higher
priority than library functions: user script functions and library
functions are assigned a default priority value of 0 and 1
respectively. Functions with the same priority are executed in
declaration order. For example,
1381
1382
1383 function f():3 { if (condition) next; print("first function") }
1384 function f():1 { if (condition) next; print("second function") }
1385 function f():2 { print("third function") }
1386
1387
Since the second function has the highest priority, it is executed
first. The first function is never executed, as there are no "next"
statements in the third function to transfer execution.
1391
1392
1393 PRINTING
There is a set of function names that are specially treated by the
translator. They format values for printing to the standard systemtap
output stream in a more convenient way (note that data generated in the
kernel module needs to be transferred to user-space in order to be
printed).
1399
1400 The sprint* variants return the formatted string instead of printing
1401 it.
1402
1403 print, sprint
1404 Print one or more values of any type, concatenated directly to‐
1405 gether.
1406
1407 println, sprintln
1408 Print values like print and sprint, but also append a newline.
1409
1410 printd, sprintd
1411 Take a string delimiter and two or more values of any type, and
1412 print the values with the delimiter interposed. The delimiter
1413 must be a literal string constant.
1414
1415 printdln, sprintdln
1416 Print values with a delimiter like printd and sprintd, but also
1417 append a newline.
1418
1419 printf, sprintf
1420 Take a formatting string and a number of values of corresponding
1421 types, and print them all. The format must be a literal string
1422 constant.
1423
The printf formatting directives are similar to those of C, except that
they are fully type-checked by the translator:
1426
1427 %b Writes a binary blob of the value given, instead of ASCII
1428 text. The width specifier determines the number of bytes
1429 to write; valid specifiers are %b %1b %2b %4b %8b. De‐
1430 fault (%b) is 8 bytes.
1431
1432 %c Character.
1433
1434 %d,%i Signed decimal.
1435
1436 %m Safely reads kernel (without #) or user (with #) memory
1437 at the given address, outputs its content. The optional
1438 precision specifier (not field width) determines the num‐
1439 ber of bytes to read - default is 1 byte. %10.4m prints
1440 4 bytes of the memory in a 10-character-wide field.
1441 Note, on some architectures user memory can still be read
1442 without #.
1443
1444 %M Same as %m, but outputs in hexadecimal. The minimal size
1445 of output is double the optional precision specifier -
1446 default is 1 byte (2 hex chars). %10.4M prints 4 bytes
1447 of the memory as 8 hexadecimal characters in a 10-charac‐
1448 ter-wide field. %.*M hex-dumps a given number of bytes
1449 from a given buffer.
1450
1451 %o Unsigned octal.
1452
1453 %p Unsigned pointer address.
1454
1455 %s String.
1456
1457 %u Unsigned decimal.
1458
1459 %x Unsigned hex value, in all lower-case.
1460
1461 %X Unsigned hex value, in all upper-case.
1462
1463 %% Writes a %.
1464
1465 The # flag selects the alternate forms. For octal, this prefixes a 0.
1466 For hex, this prefixes 0x or 0X, depending on case. For characters,
1467 this escapes non-printing values with either C-like escapes or raw oc‐
1468 tal. In the case of %#m/%#M, this safely accesses user space memory
1469 rather than kernel space memory.
1470
1471 Examples:
1472
1473 a = "alice", b = "bob", p = 0x1234abcd, i = 123, j = -1, id[a] = 1234, id[b] = 4567
1474 print("hello")
1475 Prints: hello
1476 println(b)
1477 Prints: bob\n
println(a . " is " . sprint(16))
Prints: alice is 16\n
foreach (name in id) printdln("|", strlen(name), name, id[name])
Prints: 5|alice|1234\n3|bob|4567\n
1482 printf("%c is %s; %x or %X or %p; %d or %u\n",97,a,p,p,p,j,j)
1483 Prints: a is alice; 1234abcd or 1234ABCD or 0x1234abcd; -1 or 18446744073709551615\n
printf("2 bytes of kernel buffer at address %p: %.2m", p, p)
Prints: 2 bytes of kernel buffer at address 0x1234abcd: <binary data>
1486 printf("%4b", p)
1487 Prints (these values as binary data): 0x1234abcd
1488 printf("%#o %#x %#X\n", 1, 2, 3)
1489 Prints: 01 0x2 0X3
1490 printf("%#c %#c %#c\n", 0, 9, 42)
1491 Prints: \000 \t *
1492
1493
1494
1495 STATISTICS
It is often desirable to collect statistics in a way that avoids the
penalties of repeatedly taking exclusive locks on the global variables
those numbers are being put into. Systemtap provides a solution using a
special operator to accumulate values, and several pseudo-functions to
extract the statistical aggregates.
1501
1502 The aggregation operator is <<<, and resembles an assignment, or a C++
1503 output-streaming operation. The left operand specifies a scalar or ar‐
1504 ray-index lvalue, which must be declared global. The right operand is
1505 a numeric expression. The meaning is intuitive: add the given number
1506 to the pile of numbers to compute statistics of. (The specific list of
1507 statistics to gather is given separately, by the extraction functions.)
1508
1509 foo <<< 1
1510 stats[pid()] <<< memsize
1511
1512
1513 The extraction functions are also special. For each appearance of a
1514 distinct extraction function operating on a given identifier, the
1515 translator arranges to compute a set of statistics that satisfy it.
1516 The statistics system is thereby "on-demand". Each execution of an ex‐
1517 traction function causes the aggregation to be computed for that moment
1518 across all processors.
1519
1520 Here is the set of extractor functions. The first argument of each is
1521 the same style of lvalue used on the left hand side of the accumulate
1522 operation. The @count(v), @sum(v), @min(v), @max(v), @avg(v), @vari‐
1523 ance(v[, b]) extractor functions compute the number/total/minimum/maxi‐
1524 mum/average/variance of all accumulated values. The resulting values
1525 are all simple integers. Arrays containing aggregates may be sorted
1526 and iterated. See the foreach construct above.
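
A minimal sketch of accumulation and extraction on a scalar follows.
The variable name latency and the sample values are illustrative.

```systemtap
global latency

probe begin {
  // Accumulate three samples.
  latency <<< 10
  latency <<< 20
  latency <<< 60

  // Extract the aggregates; all results are plain integers.
  printf("count=%d sum=%d min=%d max=%d avg=%d\n",
         @count(latency), @sum(latency),
         @min(latency), @max(latency), @avg(latency))
  exit()
}
```

For these three samples the script prints
"count=3 sum=90 min=10 max=60 avg=30".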
1527
1528 Variance uses Welford's online algorithm. The calculations are based
1529 on integer arithmetic, and so may suffer from low precision and over‐
1530 flow. To improve this, @variance(v[, b]) accepts an optional parameter
1531 b, the bit-shift, ranging from 0 (default) to 62, for internal scaling.
1532 Only one value of bit-shift may be used with a given global variable. A
1533 larger bitshift value increases precision, but increases the likelihood
1534 of overflow.
1535
1536
1537 $ stap -e \
1538 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x)) }'
1539 12
1540 $ stap -e \
1541 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x,1)) }'
1542 2
1543 $ python3 -c 'import statistics; print(statistics.variance([1, 2, 3, 4, 5]))'
1544 2.5
1545 $
1546
1547
1548 Overflow (from internal multiplication of large numbers) may occur and
1549 may cause a negative variance result. Consider normalizing your input
1550 data. Adding or subtracting a fixed value from all variance inputs
1551 preserves the original variance. Dividing the variance inputs by a
1552 fixed value shrinks the original variance by that value squared.
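
The two normalization rules above can be checked outside systemtap with the same Python statistics module used in the transcript. This is a sketch with made-up sample values; it uses Python's exact sample variance rather than systemtap's integer arithmetic.

```python
import statistics

# Large inputs of the kind that could overflow in integer arithmetic.
samples = [1_000_001, 1_000_002, 1_000_003, 1_000_004, 1_000_005]

# Subtracting a fixed offset from every input leaves the variance unchanged.
shifted = [s - 1_000_000 for s in samples]

# Dividing every input by a fixed value k shrinks the variance by k squared.
k = 10
scaled_inputs = [10, 20, 30, 40, 50]
divided = [s // k for s in scaled_inputs]

var_original = statistics.variance(samples)
var_shifted = statistics.variance(shifted)
var_scaled = statistics.variance(scaled_inputs)
var_divided = statistics.variance(divided)

print(var_original, var_shifted)          # equal to each other
print(var_scaled, var_divided * k ** 2)   # equal to each other
```

When inputs are divided before accumulation, multiplying the reported variance by k squared recovers the variance of the original data, subject to whatever precision the division discarded.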
1553
1554
1555
1556 Histograms are also available, but are more complicated because they
1557 have a vector rather than scalar value. @hist_linear(v,start,stop,in‐
1558 terval) represents a linear histogram from "start" to "stop" (inclu‐
1559 sive) by increments of "interval". The interval must be positive. Sim‐
1560 ilarly, @hist_log(v) represents a base-2 logarithmic histogram. Print‐
1561 ing a histogram with the print family of functions renders a histogram
1562 object as a tabular "ASCII art" bar chart.
1563
1564
1565 probe timer.profile {
1566 x[1] <<< pid()
1567 x[2] <<< uid()
1568 y <<< tid()
1569 }
1570 global x // an array containing aggregates
1571 global y // a scalar
1572 probe end {
1573 foreach ([i] in x @count+) {
1574 printf ("x[%d]: avg %d = sum %d / count %d\n",
1575 i, @avg(x[i]), @sum(x[i]), @count(x[i]))
1576 println (@hist_log(x[i]))
1577 }
1578 println ("y:")
1579 println (@hist_log(y))
1580 }
1581
1582
1583 The counts of each histogram bucket may be individually accessed via
1584 the [index] operator. Each bucket is addressed from 1 through N (for
1585 each natural bucket). In addition bucket #0 counts all the samples be‐
1586 neath the start value, and bucket #N+1 counts all the samples above the
1587 stop value. Histogram buckets (including the two out-of-range buckets)
1588 may also be iterated with foreach.
1589
1590
1591 global x
1592 probe oneshot {
1593 x <<< -100
1594 x <<< 1
1595 x <<< 2
1596 x <<< 3
1597 x <<< 100
1598 foreach (bucket in @hist_linear(x,1,3,1))
1599 // expecting 1 out-of-range-low bucket
1600 // 3 payload buckets
1601 // 1 out-of-range-high bucket
1602 printf("bucket %d count %d\n",
1603 bucket, @hist_linear(x,1,3,1)[bucket])
1604 }
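
The bucket numbering used by the script above can be modelled in Python. This is a sketch of the numbering as described in this section, not of systemtap's implementation; the function names are hypothetical, and the number of natural buckets is assumed to be (stop - start) / interval + 1 because the range is described as inclusive.

```python
def linear_bucket(value, start, stop, interval):
    """Return the bucket index for value: 0 is below start, N+1 is above stop."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    n = (stop - start) // interval + 1  # natural buckets 1..N (assumed)
    if value < start:
        return 0
    if value > stop:
        return n + 1
    return (value - start) // interval + 1


def linear_histogram(values, start, stop, interval):
    """Count values per bucket, including the two out-of-range buckets."""
    n = (stop - start) // interval + 1
    counts = [0] * (n + 2)
    for v in values:
        counts[linear_bucket(v, start, stop, interval)] += 1
    return counts


counts = linear_histogram([-100, 1, 2, 3, 100], 1, 3, 1)
for bucket, count in enumerate(counts):
    print(f"bucket {bucket} count {count}")
```

For the five samples in the script, this model yields five buckets (indexes 0 through 4), each with a count of 1.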
1605
1606
1607
1608 TYPECASTING
1609 Once a pointer (see the CONTEXT VARIABLES section of stapprobes(3stap))
1610 has been saved into a script integer variable, the translator attempts
1611 to keep the type information necessary to access members from that
1612 pointer.
1613
1614 The translator attempts to track DWARF typing associated with script
1615 variables assigned from addresses of context $variables, @cast or @var
1616 operators. Depending on the complexity of the script code, this asso‐
1617 ciation may pass to related variables, so that -> and [] operators may
1618 be used on them, just as on the original context variable. For exam‐
1619 ple:
1620
1621
1622 foo = $param->foo; printf("x:%d y:%d\n", foo->x, foo->y)
1623 printf("my value is %d\n", ($type == 42 ? $foo : $bar)->value)
1624 printf("my parent pid is %d\n", task_parent(task_current())->tgid)
1625
1626
1627 However, if this association heuristic doesn't work for a script, using
1628 the @cast() operator tells the translator how to interpret the number
1629 as a typed pointer.
1630
1631 @cast(p, "type_name"[, "module"])->member
1632
1633
1634 This will interpret p as a pointer to a struct/union named type_name
1635 and dereference the member value. Further ->subfield expressions may
1636 be appended to dereference more levels. Note that for direct derefer‐
1637 encing of a pointer {kernel,user}_{char,int,...}($p) should be used.
1638 (Refer to stapfuncs(5) for more details.) NOTE: the same dereferencing
1639 operator -> is used to refer to both direct containment and pointer
1640 indirection. Systemtap automatically determines which. The optional
1641 module tells the translator where to look for information about that
1642 type. Multiple modules may be specified as a list with : separators.
1643 If the module is not specified, it will default either to the probe
1644 module for dwarf probes, or to "kernel" for functions and all other
1645 probe types.
1646
1647 Previously, up to systemtap version 4.2, "kernel" was inferred if
1648 unspecified. Use --compatible=4.2 to restore that default.
1649
1650 The translator can create its own module with type information from a
1651 header surrounded by angle brackets, in case normal debuginfo is not
1652 available. For kernel headers, prefix it with "kernel" to use the ap‐
1653 propriate build system. All other headers are built with default GCC
1654 parameters into a user module. Multiple headers may be specified in
1655 sequence to resolve a codependency.
1656
1657 @cast(tv, "timeval", "<sys/time.h>")->tv_sec
1658 @cast(task, "task_struct", "kernel<linux/sched.h>")->tgid
1659 @cast(task, "task_struct",
1660 "kernel<linux/sched.h><linux/fs_struct.h>")->fs->umask
1661
1662 Values acquired by @cast may be pretty-printed by the $ and $$ suffix
1663 operators, the same way as described in the CONTEXT VARIABLES section
1664 of the stapprobes(3stap) manual page.
1665
1666
1667 When in guru mode, the translator will also allow scripts to assign new
1668 values to members of typecasted pointers.
1669
1670 Typecasting is also useful in the case of void* members whose type can
1671 be determined only at runtime.
1672
1673 probe foo {
1674 if ($var->type == 1) {
1675 value = @cast($var->data, "type1")->bar
1676 } else {
1677 value = @cast($var->data, "type2")->baz
1678 }
1679 print(value)
1680 }
1681
1682
1683
1684 EMBEDDED C
1685 When in guru mode, the translator accepts embedded C code in the top
1686 level of the script. Such code is enclosed between %{ and %} markers,
1687 and is transcribed verbatim, without analysis, in some sequence, into
1688 the top level of the generated C code. At the outermost level, this
1689 may be useful to add #include instructions, and any auxiliary defini‐
1690 tions for use by other embedded code.
1691
1692 Another place where embedded code is permitted is as a function body.
1693 In this case, the script language body is replaced entirely by a piece
1694 of C code enclosed again between %{ and %} markers. This C code may do
1695 anything reasonable and safe. There are a number of undocumented but
1696 complex safety constraints on atomicity, concurrency, resource consump‐
1697 tion, and run time limits, so this is an advanced technique.
1698
1699 The memory locations set aside for input and output values are made
1700 available to it using macros STAP_ARG_* and STAP_RETVALUE. Errors may
1701 be signalled with STAP_ERROR. Output may be written with STAP_PRINTF.
1702 The function may return early with STAP_RETURN. Here are some exam‐
1703 ples:
1704
1705 function integer_ops (val) %{
1706 STAP_PRINTF("%d\n", STAP_ARG_val);
1707 STAP_RETVALUE = STAP_ARG_val + 1;
1708 if (STAP_RETVALUE == 4)
1709 STAP_ERROR("wrong guess: %d", (int) STAP_RETVALUE);
1710 if (STAP_RETVALUE == 3)
1711 STAP_RETURN(0);
1712 STAP_RETVALUE ++;
1713 %}
1714 function string_ops (val) %{
1715 strlcpy (STAP_RETVALUE, STAP_ARG_val, MAXSTRINGLEN);
1716 strlcat (STAP_RETVALUE, "one", MAXSTRINGLEN);
1717 if (strcmp (STAP_RETVALUE, "three-two-one"))
1718 STAP_RETURN("parameter should be three-two-");
1719 %}
1720 function no_ops () %{
1721 STAP_RETURN(); /* function inferred with no return value */
1722 %}
1723
1724 The function argument and return value types have to be inferred by the
1725 translator from the call sites in order for this to work. The user
1726 should examine C code generated for ordinary script-language functions
1727 in order to write compatible embedded-C ones.
1728
1729 The last place where embedded code is permitted is as an expression
1730 rvalue. In this case, the C code enclosed between %{ and %} markers is
1731 interpreted as an ordinary expression value. It is assumed to be a
1732 normal 64-bit signed number, unless the marker /* string */ is includ‐
1733 ed, in which case it's treated as a string.
1734
1735 function add_one (val) {
1736 return val + %{ 1 %}
1737 }
1738 function add_string_two (val) {
1739 return val . %{ /* string */ "two" %}
1740 }
1741 @define SOME_STAP_MACRO %( %{ SOME_C_MACRO %} %)
1742 probe begin {
1743 printf("SOME_C_MACRO has value: %d\n", @SOME_STAP_MACRO);
1744 }
1745
1746
1747 The embedded-C code may contain markers to assert optimization and
1748 safety properties.
1749
1750 /* pure */
1751 means that the C code has no side effects and may be elided en‐
1752 tirely if its value is not used by script code.
1753
1754 /* stable */
1755 means that the C code always has the same value (in any given
1756 probe handler invocation), so repeated calls may be automatical‐
1757 ly replaced by memoized values. Such functions must take no pa‐
1758 rameters, and also be pure.
1759
1760 /* unprivileged */
1761 means that the C code is so safe that even unprivileged users
1762 are permitted to use it.
1763
1764 /* myproc-unprivileged */
1765 means that the C code is so safe that even unprivileged users
1766 are permitted to use it, provided that the target of the current
1767 probe is within the user's own process.
1768
1769 /* guru */
1770 means that the C code is so unsafe that a systemtap user must
1771 specify -g (guru mode) to use this. (Tapsets are permitted and
1772 presumed to call them safely.)
1773
1774 /* unmangled */
1775 in an embedded-C function, means that the legacy (pre-1.8) argu‐
1776 ment access syntax should be made available inside the function.
1777 Hence, in addition to STAP_ARG_foo and STAP_RETVALUE one can use
1778 THIS->foo and THIS->__retvalue respectively inside the function.
1779 This is useful for quickly migrating code written for SystemTap
1780 version 1.7 and earlier.
1781
1782 /* unmodified-fnargs */
1783 in an embedded-C function, means that the function arguments are
1784 not modified inside the function body.
1785
1786 /* string */
1787 in embedded-C expressions only, means that the expression has
1788 const char * type and should be treated as a string value, in‐
1789 stead of the default long numeric.
1790
1791 Script level global variables may be accessed in embedded-C functions
1792 and blocks. To read or write the global variable var, the
1793 /* pragma:read:var */ or /* pragma:write:var */ marker must first be placed in
1794 the embedded-C function or block. This provides the STAP_GLOBAL_GET_*
1795 and STAP_GLOBAL_SET_* macros to allow reading and writing,
1796 respectively. For example:
1797
1798 global var
1799 global var2[100]
1800 function increment() %{
1801 /* pragma:read:var */ /* pragma:write:var */
1802 /* pragma:read:var2 */ /* pragma:write:var2 */
1803 STAP_GLOBAL_SET_var(STAP_GLOBAL_GET_var()+1); //var++
1804 STAP_GLOBAL_SET_var2(1, 1, STAP_GLOBAL_GET_var2(1, 1)+1); //var2[1,1]++
1805 %}
1806
1807 Variables may be read and set in both embedded-C functions and expres‐
1808 sions. Strings returned from embedded-C code are decayed to pointers.
1809 Variables must also be assigned at script level to allow for type in‐
1810 ference. Map assignment does not return the value written, so chaining
1811 does not work.
1812
1813
1814 BUILT-INS
1815 A set of builtin probe point aliases is provided by the scripts
1816 installed in the directory specified in the stappaths(7) manual page.
1817 The functions are described in the stapprobes(3stap) manual page.
1818
1819
1820 DEREFERENCING
1821 Integers can be dereferenced from pointers saved as script integer
1822 variables using the @kderef() or @uderef() operators. @kderef() is
1823 used for kernel space addresses and @uderef() is used for user space
1824 addresses.
1825
1826 @kderef(SIZE, addr)
1827 @uderef(SIZE, addr)
1828
1829 This will interpret addr as a kernel/user address and read SIZE bytes
1830 starting at that address. SIZE should be either 1, 2, 4 or 8 bytes.
1831
1832
1833 REGISTERS
1834 The value stored within a register can be accessed using the @kregis‐
1835 ter() or @uregister() operators. @kregister() is used for kernel space
1836 registers and @uregister() is used for user space registers. The regis‐
1837 ter of interest is specified using its DWARF number.
1838
1839 @kregister(0)
1840 @uregister(5)
1841
1842
1844 The translator begins pass 1 by parsing the given input script, and all
1845 scripts (files named *.stp) found in a tapset directory. The
1846 directories listed with -I are processed in sequence, each processed in
1847 "guru mode". For each directory, a number of subdirectories are also
1848 searched. These subdirectories are derived from the selected kernel
1849 version (the -r option), in order to allow more kernel-version-specific
1850 scripts to override less specific ones. For example, for a kernel
1851 version 2.6.12-23.FC3 the following patterns would be searched, in
1852 sequence: 2.6.12-23.FC3/*.stp, 2.6.12/*.stp, 2.6/*.stp, and finally
1853 *.stp. Stopping the translator after pass 1 causes it to print the
1854 parse trees.
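
The search order in the example above can be reproduced with a short Python sketch. The derivation rule here is an assumption inferred from that single example (full release, then the part before the first "-", then the first two dotted components); the translator's actual rule is not spelled out in this page, and the function name is hypothetical.

```python
def tapset_patterns(release):
    """Return most-specific-first glob patterns for one tapset directory."""
    candidates = [release]
    base = release.split("-", 1)[0]             # "2.6.12-23.FC3" -> "2.6.12"
    candidates.append(base)
    parts = base.split(".")
    if len(parts) > 2:
        candidates.append(".".join(parts[:2]))  # "2.6.12" -> "2.6"
    unique = []
    for candidate in candidates:
        if candidate not in unique:             # drop repeats, keep order
            unique.append(candidate)
    return [f"{c}/*.stp" for c in unique] + ["*.stp"]


patterns = tapset_patterns("2.6.12-23.FC3")
for pattern in patterns:
    print(pattern)
```

For the release string used in the example, this yields the same four patterns in the same order.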
1855
1856
1857 In pass 2, the translator analyzes the input script to resolve symbols
1858 and types. References to variables, functions, and probe aliases that
1859 are unresolved internally are satisfied by searching through the parsed
1860 tapset script files. If any tapset script file is selected because it
1861 defines an unresolved symbol, then the entirety of that file is added
1862 to the translator's resolution queue. This process iterates until all
1863 symbols are resolved and a subset of tapset script files is selected.
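
The iterative selection described above behaves like a worklist algorithm. The following Python sketch illustrates the idea under simplifying assumptions: each tapset file is reduced to a pair of name sets (what it defines, what it references), and the function name and data layout are hypothetical rather than taken from the translator.

```python
def select_tapset_files(script_refs, script_defs, tapset_files):
    """
    script_refs:  names the user script references.
    script_defs:  names the user script defines itself.
    tapset_files: dict of file name -> (defined names, referenced names).
    Returns (selected files in selection order, names left unresolved).
    """
    defined = set(script_defs)
    queue = sorted(set(script_refs) - defined)
    selected = []
    unresolved = set()
    while queue:
        name = queue.pop(0)
        if name in defined:
            continue
        provider = next(
            (f for f, (defs, _) in tapset_files.items() if name in defs),
            None,
        )
        if provider is None:
            unresolved.add(name)
            continue
        # The whole file is pulled in: all of its definitions become
        # available, and all of its own references join the queue.
        defs, refs = tapset_files[provider]
        selected.append(provider)
        defined |= defs
        queue.extend(sorted(refs - defined))
    return selected, unresolved


tapsets = {
    "a.stp": ({"foo"}, {"bar"}),
    "b.stp": ({"bar", "baz"}, set()),
    "c.stp": ({"unused"}, set()),
}
selected, unresolved = select_tapset_files({"foo", "mine"}, {"mine"}, tapsets)
```

Here a.stp is selected because it defines foo, which in turn pulls in b.stp for bar; c.stp is never selected because nothing references it.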
1864
1865 Next, all probe point descriptions are validated against the wide
1866 variety supported by the translator. Probe points that refer to code
1867 locations ("synchronous probe points") require the appropriate kernel
1868 debugging information to be installed. In the associated probe
1869 handlers, target-side variables (whose names begin with "$") are found
1870 and have their run-time locations decoded.
1871
1872 Next, all probes and functions are analyzed for optimization
1873 opportunities, in order to remove variables, expressions, and functions
1874 that have no useful value and no side-effect. Embedded-C functions are
1875 assumed to have side-effects unless they include the magic string
1876 /* pure */. Since this optimization can hide latent code errors such
1877 as type mismatches or invalid $context variables, it sometimes may be
1878 useful to disable the optimizations with the -u option.
1879
1880 Finally, all variable, function, parameter, array, and index types are
1881 inferred from context (literals and operators). Stopping the
1882 translator after pass 2 causes it to list all the probes, functions,
1883 and variables, along with all inferred types. Any inconsistent or
1884 unresolved types cause an error.
1885
1886
1887 In pass 3, the translator writes C code that represents the actions of
1888 all selected script files, and creates a Makefile to build that into a
1889 kernel object. These files are placed into a temporary directory.
1890 Stopping the translator at this point causes it to print the contents
1891 of the C file.
1892
1893
1894 In pass 4, the translator invokes the Linux kernel build system to
1895 create the actual kernel object file. This involves running make in
1896 the temporary directory, and requires a kernel module build system
1897 (headers, config and Makefiles) to be installed in the usual spot
1898 /lib/modules/VERSION/build. Stopping the translator after pass 4 is
1899 the last chance before running the kernel object. This may be useful
1900 if you want to archive the file.
1901
1902
1903 In pass 5, the translator invokes the systemtap auxiliary program
1904 staprun for the given kernel object. This program arranges to
1905 load the module then communicates with it, copying trace data from the
1906 kernel into temporary files, until the user sends an interrupt signal.
1907 Any run-time error encountered by the probe handlers, such as running
1908 out of memory, division by zero, exceeding nesting or runtime limits,
1909 results in a soft error indication. Soft errors in excess of MAXERRORS
1910 block all subsequent probes (except error-handling probes), and
1911 terminate the session. Finally, staprun unloads the module, and cleans
1912 up.
1913
1914
1915 ABNORMAL TERMINATION
1916 One should avoid killing the stap process forcibly, for example with
1917 SIGKILL, because the stapio process (a child process of the stap
1918 process) and the loaded module may be left running on the system. If
1919 this happens, send SIGTERM or SIGINT to any remaining stapio processes,
1920 then use rmmod to unload the systemtap module.
1921
1922
1923
1925 See the stapex(3stap) manual page for a brief collection of samples, or
1926 a large set of installed samples under the systemtap
1927 documentation/testsuite directories. See stappaths(7stap) for the
1928 likely location of these on the system.
1929
1930
1932 The systemtap translator caches the pass 3 output (the generated C
1933 code) and the pass 4 output (the compiled kernel module) if pass 4
1934 completes successfully. This cached output is reused if the same
1935 script is translated again assuming the same conditions exist (same
1936 kernel version, same systemtap version, etc.). Cached files are stored
1937 in the $SYSTEMTAP_DIR/cache directory. The cache can be limited by
1938 having the file cache_mb_limit placed in the cache directory (shown
1939 above) containing only an ASCII integer representing how many MiB the
1940 cache should not exceed. In the absence of this file, a default will be
1941 created with the limit set to 256MiB. This is a 'soft' limit in that
1942 the cache will be cleaned after a new entry is added if the cache clean
1943 interval is exceeded, so the total cache size may temporarily exceed
1944 this limit. This interval can be specified by having the file
1945 cache_clean_interval_s placed in the cache directory (shown above)
1946 containing only an ASCII integer representing the interval in seconds.
1947 In the absence of this file, a default will be created with the
1948 interval set to 300 s.
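
Creating the two limit files can be scripted. The Python sketch below writes them into a cache directory passed by the caller; the helper name is hypothetical, the directory is assumed to exist already, and the files are written with the integer alone, matching the "only an ASCII integer" wording above.

```python
from pathlib import Path


def write_cache_limits(cache_dir, mb_limit=None, clean_interval_s=None):
    """Write cache_mb_limit and/or cache_clean_interval_s into cache_dir."""
    cache = Path(cache_dir)
    written = []
    for name, value in (
        ("cache_mb_limit", mb_limit),
        ("cache_clean_interval_s", clean_interval_s),
    ):
        if value is None:
            continue  # leave this file untouched
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
        path = cache / name
        path.write_text(str(value), encoding="ascii")
        written.append(path)
    return written
```

A call such as write_cache_limits(cache_dir, mb_limit=512, clean_interval_s=600), with cache_dir pointing at $SYSTEMTAP_DIR/cache, would raise the size limit to 512 MiB and the cleaning interval to 600 seconds.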
1949
1950
1952 Systemtap may be used as a powerful administrative tool. It can expose
1953 kernel internal data structures and potentially private user
1954 information. (In dyninst runtime mode, this is not the case; see the
1955 ALTERNATE RUNTIMES section below.)
1956
1957 The translator asserts many safety constraints during compilation and
1958 more during run-time. It aims to ensure that no handler routine can
1959 run for very long, allocate boundless memory, perform unsafe
1960 operations, or unintentionally interfere with the system. Uses of
1961 script global variables are automatically read/write locked as
1962 appropriate, to protect against manipulation by concurrent probe
1963 handlers. Locks are taken so as to run the global-variable
1964 manipulation portion of probe handlers atomically (locks are taken all-
1965 or-none). Deadlocks are detected with timeouts. Use the -t flag to
1966 receive reports of excessive lock contention. Experimenting with
1967 scripts is therefore generally safe. The guru-mode -g option allows
1968 administrators to bypass most safety measures, which permits invasive
1969 or state-changing operations and embedded-C code, and increases the risk
1970 of upset. By default, overload prevention is turned on for all
1971 modules. If you would like to disable overload processing, use the
1972 --suppress-time-limits option.
1973
1974 Errors that are caught at run time normally result in a clean script
1975 shutdown and a pass-5 error message. The --suppress-handler-errors
1976 option lets scripts tolerate soft errors without shutting down.
1977
1978
1979
1980 PERMISSIONS
1981 For the normal linux-kernel-module runtime, to run the kernel objects
1982 systemtap builds, a user must be one of the following:
1983
1984 • the root user;
1985
1986 • a member of the stapdev and stapusr groups;
1987
1988 • a member of the stapsys and stapusr groups; or
1989
1990 • a member of the stapusr group.
1991
1992 The root user or a user who is a member of both the stapdev and stapusr
1993 groups can build and run any systemtap script.
1994
1995 A user who is a member of both the stapsys and stapusr groups can only
1996 use pre-built modules under the following conditions:
1997
1998 • The module has been signed by a trusted signer. Trusted signers are
1999 normally systemtap compile-servers which sign modules when the
2000 --privilege option is specified by the client. See the
2001 stap-server(8) manual page for more information.
2002
2003 • The module was built using the --privilege=stapsys or the
2004 --privilege=stapusr options.
2005
2006 Members of only the stapusr group can only use pre-built modules under
2007 the following conditions:
2008
2009 • The module is located in the /lib/modules/VERSION/systemtap
2010 directory. This directory must be owned by root and not be world
2011 writable.
2012
2013 or
2014
2015 • The module has been signed by a trusted signer. Trusted signers are
2016 normally systemtap compile-servers which sign modules when the
2017 --privilege option is specified by the client. See the
2018 stap-server(8) manual page for more information.
2019
2020 • The module was built using the --privilege=stapusr option.
2021
2022 The kernel modules generated by the stap program are run by the staprun
2023 program. The latter is a part of the Systemtap package, dedicated to
2024 module loading and unloading (but only in the white zone), and kernel-
2025 to-user data transfer. Since staprun does not perform any additional
2026 security checks on the kernel objects it is given, it would be unwise
2027 for a system administrator to add untrusted users to the stapdev or
2028 stapusr groups.
2029
2030
2031 SECUREBOOT
2032 If the current system has SecureBoot turned on in the UEFI firmware,
2033 all kernel modules must be signed. (Some kernels may allow disabling
2034 SecureBoot long after booting with a key sequence such as SysRq-X,
2035 making it unnecessary to sign modules.) There are two ways to sign a
2036 systemtap module. The systemtap compile server can sign modules with a
2037 MOK (Machine Owner Key) that it has in common with a client system.
2038 For example:
2039
2040 stap --use-server=HOSTNAME:PORT -e 'SCRIPT'
2041 # If the server's systemtap mok key list and the client's mok database
2042 # have no mok key in common, then the user is directed by stap
2043 # to invoke:
2044 sudo mokutil --import signing_key.x509
2045 # then after rebooting the system:
2046 stap --use-server=HOSTNAME:PORT -e 'SCRIPT'
2047 # will use the server to build and sign the module and the module will run
2048 # on the client
2049
2050 Another way to sign modules is to use the stap --sign-module option,
2051 which uses a MOK on the client system without using a server. For ex‐
2052 ample:
2053
2054 stap --sign-module -e 'SCRIPT'
2055 # If there is no systemtap mok key in the system mok database
2056 # then the user is directed by stap to invoke:
2057 sudo mokutil --import /home/USER/.systemtap/ssl/server/moks/FINGERPRINT/signing_key.x509
2058 # then after rebooting the system:
2059 stap --sign-module -e 'SCRIPT'
2060 # will sign and run the module
2061
2062
2063 See the following wiki page for more details:
2064
2065 https://sourceware.org/systemtap/wiki/SecureBoot
2066
2067 Some kernels do not let systemtap guess whether module signing
2068 is in effect. On such machines, set the SYSTEMTAP_SIGN environment
2069 variable to any value while running stap.
2070
2071
2072 RESOURCE LIMITS
2073 Many resource use limits are set by macros in the generated C code.
2074 These may be overridden with -D flags. A selection of these is as fol‐
2075 lows:
2076
2077 MAXNESTING
2078 Maximum number of nested function calls. Default determined by
2079 script analysis, with a bonus 10 slots added for recursive
2080 scripts.
2081
2082 MAXSTRINGLEN
2083 Maximum length of strings, default 128.
2084
2085 MAXTRYLOCK
2086 Maximum number of iterations to wait for locks on global vari‐
2087 ables before declaring possible deadlock and skipping the probe,
2088 default 1000.
2089
2090 MAXACTION
2091 Maximum number of statements to execute during any single probe
2092 hit (with interrupts disabled), default 1000. Note that for
2093 straight-through probe handlers lacking loops or recursion, due
2094 to optimization, this parameter may be interpreted too conserva‐
2095 tively.
2096
2097 MAXACTION_INTERRUPTIBLE
2098 Maximum number of statements to execute during any single probe
2099 hit which is executed with interrupts enabled (such as begin/end
2100 probes), default (MAXACTION * 10).
2101
2102 MAXBACKTRACE
2103 Maximum number of stack frames that will be processed by the
2104 stap runtime unwinder as produced by the backtrace functions in
2105 the [u]context-unwind.stp tapsets, default 20.
2106
2107 MAXMAPENTRIES
2108 Maximum number of rows in any single global array, default 2048.
2109 Individual arrays may be declared with a larger or smaller limit
2110 instead:
2111
2112 global big[10000],little[5]
2113
2114 or denoted with % to make them wrap-around (replace old entries)
2115 automatically, as in
2116
2117 global big%
2118
2119 or both.
2120
2121 MAPHASHBIAS
2122 The number of powers-of-two to add or subtract from the natural
2123 size of the hash table backing each global associative array.
2124 Default is 0. Try small positive numbers to get extra perfor‐
2125 mance at the cost of more memory consumption, because that
2126 should reduce hash table collisions. Try small negative numbers
2127 for the opposite tradeoff.
2128
2129 MAXERRORS
2130 Maximum number of soft errors before an exit is triggered, de‐
2131 fault 0, which means that the first error will exit the script.
2132 Note that with the --suppress-handler-errors option, this limit
2133 is not enforced.
2134
2135 MAXSKIPPED
2136 Maximum number of skipped probes before an exit is triggered,
2137 default 100. Running systemtap with -t (timing) mode gives more
2138 details about skipped probes. With the default -DINTERRUPT‐
2139 IBLE=1 setting, probes skipped due to reentrancy are not accumu‐
2140 lated against this limit. Note that with the --suppress-han‐
2141 dler-errors option, this limit is not enforced.
2142
2143 MINSTACKSPACE
2144 Minimum number of free kernel stack bytes required in order to
2145 run a probe handler, default 1024. This number should be large
2146 enough for the probe handler's own needs, plus a safety margin.
2147
2148 MAXUPROBES
2149 Maximum number of concurrently armed user-space probes (up‐
2150 robes), default somewhat larger than the number of user-space
2151 probe points named in the script. This pool needs to be poten‐
2152 tially large because individual uprobe objects (about 64 bytes
2153 each) are allocated for each process for each matching script-
2154 level probe.
2155
2156 STP_MAXMEMORY
2157 Maximum amount of memory (in kilobytes) that the systemtap mod‐
2158 ule should use, default unlimited. The memory size includes the
2159 size of the module itself, plus any additional allocations.
2160 This only tracks direct allocations by the systemtap runtime.
2161 This does not track indirect allocations (as done by kprobes/up‐
2162 robes/etc. internals).
2163
2164 STP_OVERLOAD_THRESHOLD, STP_OVERLOAD_INTERVAL
2165 Maximum number of machine cycles spent in probes on any cpu per
2166 given interval, before an overload condition is declared and the
2167 script shut down. The defaults are 500 million and 1 billion,
2168 so as to limit stap script cpu consumption at around 50%.
2169
2170 STP_PROCFS_BUFSIZE
2171 Size of procfs probe read buffers (in bytes). Defaults to
2172 MAXSTRINGLEN. This value can be overridden on a per-procfs file
2173 basis using the procfs read probe .maxsize(MAXSIZE) parameter.
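
Overriding these macros means passing one -D flag per macro on the stap command line. The Python sketch below assembles such a command line and also reproduces the 50% figure quoted for the overload defaults; the helper names and the script name example.stp are hypothetical, and the values shown are arbitrary illustrations rather than recommendations.

```python
def stap_command(script, **limits):
    """Build an argv list for stap with one -D flag per overridden macro."""
    argv = ["stap"]
    for name, value in limits.items():
        argv.append(f"-D{name}={value}")
    argv.append(script)
    return argv


def overload_fraction(threshold=500_000_000, interval=1_000_000_000):
    """Fraction of each interval that probes may consume before overload."""
    return threshold / interval


argv = stap_command("example.stp", MAXSTRINGLEN=256, MAXACTION=5000)
print(" ".join(argv))
print(f"{overload_fraction():.0%}")
```

Raising STP_OVERLOAD_THRESHOLD relative to STP_OVERLOAD_INTERVAL raises the fraction computed by overload_fraction(), which corresponds to tolerating more probe time before the script is shut down.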
2174
2175 With scripts that contain probes on any interrupt path, it is possible
2176 that those interrupts may occur in the middle of another probe handler.
2177 The probe in the interrupt handler would be skipped in this case to
2178 avoid reentrance. To work around this issue, execute stap with the op‐
2179 tion -DINTERRUPTIBLE=0 to mask interrupts throughout the probe handler.
2180 This does add some extra overhead to the probes, but it may prevent
2181 reentrance for common problem cases. However, probes in NMI handlers
2182 and in the callpath of the stap runtime may still be skipped due to
2183 reentrance.
2184
2185
2186 In case something goes wrong with stap or staprun after a probe has al‐
2187 ready started running, one may safely kill both user processes, and re‐
2188 move the active probe kernel module with rmmod. Any pending trace mes‐
2189 sages may be lost.
2190
2191
2193 Systemtap exposes kernel internal data structures and potentially
2194 private user information. Because of this, use of systemtap's full
2195 capabilities is restricted to root and to users who are members of the
2196 groups stapdev and stapusr.
2197
2198 However, a restricted set of systemtap's features can be made available
2199 to trusted, unprivileged users. These users are members of the group
2200 stapusr only, or members of the groups stapusr and stapsys. These
2201 users can load systemtap modules which have been compiled and certified
2202 by a trusted systemtap compile-server. See the descriptions of the op‐
2203 tions --privilege and --use-server. See README.unprivileged in the sys‐
2204 temtap source code for information about setting up a trusted compile
2205 server.
2206
2207 The restrictions enforced when --privilege=stapsys is specified are de‐
2208 signed to prevent unprivileged users from:
2209
2210 • harming the system maliciously.
2211
2212 The restrictions enforced when --privilege=stapusr is specified are de‐
2213 signed to prevent unprivileged users from:
2214
2215 • harming the system maliciously.
2216
2217 • gaining access to information which would not normally be
2218 available to an unprivileged user.
2219
2220 • disrupting the performance of processes owned by other users
2221 of the system. Some overhead to the system in general is
2222 unavoidable since the unprivileged user's probes will be
2223 triggered at the appropriate times. What we would like to
2224 avoid is targeted interruption of another user's processes
2225 which would not normally be possible by an unprivileged us‐
2226 er.
2227
2228
2229 PROBE RESTRICTIONS
2230 A member of the groups stapusr and stapsys may use all probe points.
2231
2232 A member of only the group stapusr may use only the following probes:
2233
2234 • begin, begin(n)
2235
2236 • end, end(n)
2237
2238 • error(n)
2239
2240 • never
2241
2242 • process.*, where the target process is owned by the user.
2243
2244 • timer.{jiffies,s,sec,ms,msec,us,usec,ns,nsec}(n)*
2245
2246 • timer.hz(n)
2247
2248
2249 SCRIPT LANGUAGE RESTRICTIONS
2250 The following scripting language features are unavailable to all un‐
2251 privileged users:
2252
2253
2254 • any feature enabled by the Guru Mode (-g) option.
2255
2256 • embedded C code.
2257
2258
2259 RUNTIME RESTRICTIONS
2260 The following runtime restrictions are placed upon all unprivileged
2261 users:
2262
2263 • Only the default runtime code (see -R) may be used.
2264
2265 Additional restrictions are placed on members of only the group
2266 stapusr:
2267
2268 • Probing of processes owned by other users is not permitted.
2269
2270 • Access of kernel memory (read and write) is not permitted.
2271
2272
2273 COMMAND LINE OPTION RESTRICTIONS
2274 Some command line options provide access to features that must not be
2275 available to any unprivileged user:
2276
2277
2278 • -g may not be specified.
2279
2280 • The following options may not be used by the compile-server
2281 client:
2282
2283 -a, -B, -D, -I, -r, -R
2284
2285
2286
2287 ENVIRONMENT RESTRICTIONS
2288 The following environment variables must not be set by any
2289 unprivileged user:
2290
2291 SYSTEMTAP_RUNTIME
2292 SYSTEMTAP_TAPSET
2293 SYSTEMTAP_DEBUGINFO_PATH
2294
2295
2296
2297 TAPSET RESTRICTIONS
2298 In general, tapset functions are only available for members of the
2299 group stapusr when they do not gather information that an ordinary pro‐
2300 gram running with that user's privileges would be denied access to.
2301
2302 There are two categories of unprivileged tapset functions. The first
2303 category consists of utility functions that are unconditionally avail‐
2304 able to all users; these include such things as:
2305
2306 cpu:long ()
2307 exit ()
2308 str_replace:string (prnt_str:string, srch_str:string, rplc_str:string)
2309
2310
2311 The second category consists of so-called myproc-unprivileged functions
2312 that can only gather information within their own processes. Scripts
2313 that wish to use these functions must test the result of the tapset
2314 function is_myproc and only call these functions if the result is 1.
2315 The script will exit immediately if any of these functions are called
2316 by an unprivileged user within a probe within a process which is not
2317 owned by that user. Examples of myproc-unprivileged functions include:
2318
2319 print_usyms (stk:string)
2320 user_int:long (addr:long)
2321 usymname:string (addr:long)
2322
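The is_myproc test described above might look like the following sketch. It writes a script that calls the myproc-unprivileged function usymname only when is_myproc() returns 1. The file name myproc_guard.stp and the target ./myprog are placeholders, and uaddr() is used here on the assumption that it is available as a myproc-unprivileged tapset function that returns the current user-space address.

```shell
# Write a script that guards myproc-unprivileged calls with is_myproc().
cat > myproc_guard.stp <<'EOF'
probe process.function("main")
{
  # Only call myproc-unprivileged functions in the user's own processes.
  if (is_myproc())
    printf("probe hit in %s\n", usymname(uaddr()))
}
EOF

# To run it against a command of your own (requires systemtap; not
# executed by this sketch; ./myprog is a placeholder):
#   stap --privilege=stapusr -c ./myprog myproc_guard.stp
```

Without the guard, a probe that fires in another user's process would cause the script to exit, as noted above.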
2323
2324 A compile error is triggered when any function not in either of the
2325 above categories is used by members of only the group stapusr.
2326
2327 No other built-in tapset functions may be used by members of only the
2328 group stapusr.
2329
2330
2332 As described above, systemtap's default runtime mode involves building
2333 and loading kernel modules, with various security tradeoffs presented.
2334 Systemtap now includes two new prototype backends: --runtime=dyninst
2335 and --runtime=bpf.
2336
2337 --runtime=dyninst uses Dyninst to instrument a user's own processes at
2338 runtime. This backend does not use kernel modules, and does not require
2339 root privileges, but is restricted with respect to the kinds of probes
2340 and other constructs that a script may use. The dyninst runtime operates
2341 in target-attach mode, so it requires a -c COMMAND or -x PID process.
2342 For example:
2343
2344 stap --runtime=dyninst -c 'stap -V' \
2345 -e 'probe process.function("main")
2346 { println("hi from dyninst!") }'
2347
2348
2349 It may be necessary to disable a conflicting SELinux check with
2350
2351 # setsebool allow_execstack 1
2352
2353
2354 --runtime=bpf compiles the user script into extended Berkeley Packet
2355 Filter (eBPF) programs instead of a kernel module. eBPF programs are
2356 verified by the kernel for safety and are executed by an in-kernel vir‐
2357 tual machine. This runtime is in an early stage of development and
2358 currently lacks support for a number of features available in the de‐
2359 fault runtime. Please see the stapbpf(8) man page for more information.
2360
2361
2363 The systemtap translator generally returns with a success code of 0 if
2364 the requested script was processed and executed successfully through
2365 the requested pass. Otherwise, errors may be printed to stderr and a
2366 failure code is returned. Use -v or -vp N to increase (global or per-
2367 pass) verbosity to identify the source of the trouble.
2368
2369 In listings mode (-l and -L), error messages are normally suppressed.
2370 A success code of 0 is returned if at least one matching probe was
2371 found.
2372
2373 A script executing in pass 5 that is interrupted with ^C / SIGINT is
2374 considered to be successful.
2375
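A calling shell script can branch on the exit status described above. The sketch below defines a hypothetical helper, run_and_report, which is not part of systemtap. It uses the stand-in commands true and false so that it runs without systemtap installed.

```shell
# Hypothetical helper: run a command and report based on its exit status.
run_and_report() {
  if "$@"; then
    echo "success"
  else
    # In the else branch, $? still holds the failed command's status.
    echo "failed with status $?"
  fi
}

# Stand-in commands keep the sketch runnable without systemtap.
run_and_report true
run_and_report false

# With systemtap installed, the same helper could wrap real invocations:
#   run_and_report stap -p2 -e 'probe begin { exit() }'
#   run_and_report stap -l 'begin'
```

For the listing invocation, a status of 0 would indicate that at least one matching probe was found.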
2376
2378 Over time, some features of the script language and the tapset library
2379 may undergo incompatible changes, so that a script written against an
2380 old version of systemtap may no longer run. In these cases, it may
2381 help to run systemtap with the --compatible VERSION flag, specifying
2382 the last known working version. Running systemtap with the
2383 --check-version flag will output a warning if any possibly incompatible
2384 elements have been parsed. Historical details of deprecations may be
2385 found in the NEWS file.
2386
2387 The purpose of the deprecation facility is to improve the experience of
2388 scripts written for newer versions of systemtap (by adding better
2389 alternatives and removing conflicting or messy older alternatives),
2390 while at the same time permitting scripts written for older versions of
2391 systemtap to continue running. Deprecation is thus intended as a service
2392 to users (and an inconvenience to systemtap's developers), rather than
2393 the other way around.
2394
2395 Please note that underscore-prefixed identifiers in the tapset
2396 sometimes undergo changes for which compatibility is difficult to
2397 preserve, even with the deprecation mechanisms. Avoid relying on these
2398 in your scripts; instead propose them for promotion to non-underscored
2399 status.
2400
2401
2402
2404 Important files and their corresponding paths are listed in the
2405 stappaths(7) manual page.
2406
2407
2409 stapprobes(3stap),
2410 function::*(3stap),
2411 probe::*(3stap),
2412 tapset::*(3stap),
2413 stappaths(7),
2414 staprun(8),
2415 stapdyn(8),
2416 systemtap(8),
2417 stapvars(3stap),
2418 stapex(3stap),
2419 stap-server(8),
2420 stap-prep(1),
2421 stapref(1),
2422 awk(1),
2423 gdb(1)
2424
2425
2427 Use the Bugzilla link on the project web page or our mailing list.
2428 http://sourceware.org/systemtap/, <systemtap@sourceware.org>.
2429
2430 error::reporting(7stap),
2431 https://sourceware.org/systemtap/wiki/HowToReportBugs
2432
2433
2434
2435 STAP(1)