STAP(1) General Commands Manual STAP(1)

NAME
stap - systemtap script translator/driver

SYNOPSIS
11 stap [ OPTIONS ] FILENAME [ ARGUMENTS ]
12 stap [ OPTIONS ] - [ ARGUMENTS ]
13 stap [ OPTIONS ] -e SCRIPT [ ARGUMENTS ]
14 stap [ OPTIONS ] -l PROBE [ ARGUMENTS ]
15 stap [ OPTIONS ] -L PROBE [ ARGUMENTS ]
16 stap [ OPTIONS ] --dump-probe-types
17 stap [ OPTIONS ] --dump-probe-aliases
18 stap [ OPTIONS ] --dump-functions

DESCRIPTION
The stap program is the front-end to the Systemtap tool. It accepts
probing instructions written in a simple domain-specific language,
translates those instructions into C code, compiles this C code, and
loads the resulting module into a running Linux kernel or a Dyninst
user-space mutator, to perform the requested system trace/probe
functions. You can supply the script in a named file (FILENAME), from
standard input (use - instead of FILENAME), or from the command line
(using -e SCRIPT). The program runs until it is interrupted by the
user, until the script voluntarily invokes the exit() function, or
until a sufficient number of soft errors has occurred.
34
The language, which is described in the SCRIPT LANGUAGE section below,
is strictly typed, expressive, declaration free, procedural,
prototyping-friendly, and inspired by awk and C. It allows source code
points or events in the system to be associated with handlers, which
are subroutines that are executed synchronously. It is conceptually
somewhat similar to "breakpoint command lists" in the gdb debugger.

DOCUMENTATION
systemtap comes with a variety of educational, documentation and
reference resources. They are available online and/or packaged for
offline use. Some systemtap diagnostic warning/error messages
specifically suggest reading a man page by including a string like
[man error::pass5]. For online documentation, see the project web site,
https://sourceware.org/systemtap/
50
51
52 ┌──────────────────────────┬──────────────────────────────────────────────────────┐
53 │man pages │ │
54 ├──────────────────────────┼──────────────────────────────────────────────────────┤
55 │stap (this page) │ language syntax, concepts, operation, options │
56 ├──────────────────────────┼──────────────────────────────────────────────────────┤
57 │error::* │ further explanation of error conditions │
58 ├──────────────────────────┼──────────────────────────────────────────────────────┤
59 │warning::* │ further explanation of warning conditions │
60 ├──────────────────────────┼──────────────────────────────────────────────────────┤
61 │stapprobes │ probe points and their $context variables │
62 ├──────────────────────────┼──────────────────────────────────────────────────────┤
63 │stapref │ quick reference to language syntax │
64 ├──────────────────────────┼──────────────────────────────────────────────────────┤
65 │stappaths │ list of directories, including books & references │
66 ├──────────────────────────┼──────────────────────────────────────────────────────┤
67 │stap-prep │ program to install auxiliary dependencies like ker‐ │
68 │ │ nel debuginfo │
69 ├──────────────────────────┼──────────────────────────────────────────────────────┤
70 │tapset::* │ generated list of tapsets │
71 ├──────────────────────────┼──────────────────────────────────────────────────────┤
72 │probe::* │ generated list of tapset probe aliases │
73 ├──────────────────────────┼──────────────────────────────────────────────────────┤
74 │function::* │ generated list of tapset functions │
75 ├──────────────────────────┼──────────────────────────────────────────────────────┤
76 │macro::* │ generated list of tapset macros │
77 ├──────────────────────────┼──────────────────────────────────────────────────────┤
78 │stapvars │ some of the tapset global variables │
79 ├──────────────────────────┼──────────────────────────────────────────────────────┤
80 │staprun, stapdyn, stapbpf │ programs for executing compiled systemtap scripts │
81 ├──────────────────────────┼──────────────────────────────────────────────────────┤
82 │systemtap │ initscript, boot-time probing │
83 ├──────────────────────────┼──────────────────────────────────────────────────────┤
84 │stap-server │ compilation server │
85 ├──────────────────────────┼──────────────────────────────────────────────────────┤
86 │stapex │ a few very basic script examples │
87 ├──────────────────────────┼──────────────────────────────────────────────────────┤
88 │books │ │
89 ├──────────────────────────┼──────────────────────────────────────────────────────┤
90 │Beginner's Guide │ tutorial book, language essentials, examples │
91 ├──────────────────────────┼──────────────────────────────────────────────────────┤
92 │Tutorial │ shorter tutorial, exercises │
93 ├──────────────────────────┼──────────────────────────────────────────────────────┤
94 │Language Reference │ detailed language manual, covers statistics/analysis │
95 ├──────────────────────────┼──────────────────────────────────────────────────────┤
96 │Tapset Reference │ the tapset man pages, reformatted into a book │
97 ├──────────────────────────┼──────────────────────────────────────────────────────┤
98 │references │ │
99 ├──────────────────────────┼──────────────────────────────────────────────────────┤
100 │example scripts │ over a hundred directly usable sysadmin tools, toys, │
101 │ │ hacks to learn from │
102 └──────────────────────────┴──────────────────────────────────────────────────────┘

OPTIONS
The systemtap translator supports the following options. Any other
option prints a list of supported options. Options may be given on the
command line, as usual. If the file $SYSTEMTAP_DIR/rc exists, options
are also loaded from there and interpreted first. ($SYSTEMTAP_DIR
defaults to $HOME/.systemtap if unset.)
110
111
In some cases, the default value of an option depends on the particular
system configuration and thus can't be stated here directly. In some of
those cases, running "stap --help" might display the default.
115
116
117 - Use standard input instead of a given FILENAME as probe language
118 input, unless -e SCRIPT is given.
119
120 -h --help
121 Show help message.
122
123 -V --version
124 Show version message.
125
126 -p NUM Stop after pass NUM. The passes are numbered 1-5: parse, elabo‐
127 rate, translate, compile, run. See the PROCESSING section for
128 details.
129
-v Increase verbosity for all passes. Produce a larger volume of
informative (?) output each time the option is repeated.
132
133 --vp ABCDE
134 Increase verbosity on a per-pass basis. For example, "--vp 002"
135 adds 2 units of verbosity to pass 3 only. The combination
136 "-v --vp 00004" adds 1 unit of verbosity for all passes, and 4
137 more for pass 5.
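
As an illustration of how the ABCDE string is laid out, the following
sketch builds a --vp argument that raises the verbosity of one pass
only. The helper name vp_for_pass is hypothetical and not part of
systemtap, and the sketch assumes each position holds a single digit, as
in the examples above.

```shell
# vp_for_pass PASS UNITS: print a --vp argument that adds UNITS (0-9) of
# verbosity to pass PASS (1-5) only. Hypothetical helper, not part of stap.
vp_for_pass() {
    pass=$1 units=$2 out="" i=1
    # One "0" for every pass before the one being raised.
    while [ "$i" -lt "$pass" ]; do
        out="${out}0"
        i=$((i + 1))
    done
    printf '%s%s\n' "$out" "$units"
}

vp_for_pass 3 2    # prints 002, the string used in the first example
vp_for_pass 5 4    # prints 00004, the string used in the second example
```

The result can be passed on directly, for instance as
stap --vp "$(vp_for_pass 3 2)" FILENAME.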
138
139 -k Keep the temporary directory after all processing. This may be
140 useful in order to examine the generated C code, or to reuse the
141 compiled kernel object.
142
143 -g Guru mode. Enable parsing of unsafe expert-level constructs
144 like embedded C.
145
146 -P Prologue-searching mode. This is equivalent to --pro‐
147 logue-searching=always. Activate heuristics to work around in‐
148 correct debugging information for function parameter $context
149 variables.
150
151 -u Unoptimized mode. Disable unused code elision and many other
152 optimizations during elaboration / translation.
153
154 -w Suppressed warnings mode. Disables all warning messages.
155
156 -W Treat all warnings as errors.
157
158 -b Use bulk mode (percpu files) for kernel-to-user data transfer.
159 Use the stap-merge program to multiplex them back together lat‐
160 er.
161
162 -i --interactive
163 Interactive mode. Enable an interface to build the systemtap
164 script incrementally and interactively.
165
-t Collect timing information on the number of times each probe
executes and the average amount of time spent in each probe point.
Also show the derivation of each probe point.
169
170 -s NUM Use NUM megabyte buffers for kernel-to-user data transfer per
171 processor. The default is 16MB, or less on smaller memory ma‐
172 chines.
173
-I DIR Add the given directory to the list of tapset search
directories. See the description of pass 2 for details.
176
177 -D NAME=VALUE
178 Add the given C preprocessor directive to the module Makefile.
179 These can be used to override limit parameters described below.
180
181 -B NAME=VALUE
182 In kernel-runtime mode, add the given make directive to the ker‐
183 nel module build's make invocation. These can be used to add or
184 override kconfig options. For example, use
185
186 -B CONFIG_DEBUG_INFO=y
187
188 to add debugging information.
189
190 -B FLAG
191 In dyninst-runtime mode, add the given parameter to the compiler
192 CFLAGS used for building the dyninst shared library. For exam‐
193 ple, use
194
195 -B -g
196
197 to add debugging information.
198
199 -a ARCH
200 Use a cross-compilation mode for the given target architecture.
201 This requires access to the cross-compiler and the kernel build
202 tree, and goes along with the
203
204 -B CROSS_COMPILE=arch-tool-prefix-
205 and
206 -r /build/tree
207
208 options.
209
210 --modinfo NAME=VALUE
211 Add the name/value pair as a MODULE_INFO macro call to the gen‐
212 erated module. This may be useful to inform or override various
213 module-related checks in the kernel.
214
215 -G NAME=VALUE
216 Sets the value of global variable NAME to VALUE when staprun is
217 invoked. This applies to scalar variables declared global in
218 the script/tapset.
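
A minimal sketch of overriding a scalar global at startup follows. The
variable name threshold and the wrapper run_with_threshold are
illustrative assumptions; the wrapper is only defined, not invoked,
because running it needs SystemTap and suitable privileges.

```shell
# Script text: a scalar global with an initial value, printed at startup.
script='global threshold = 10
probe begin {
    printf("threshold is %d\n", threshold)
    exit()
}'

# Hypothetical wrapper: pass the replacement value with -G NAME=VALUE.
run_with_threshold() {
    stap -G threshold="$1" -e "$script"
}
# Usage: run_with_threshold 25
```

With -G given, the script should report the value from the command line
rather than the initial value written in the script.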
219
-R DIR Look for the systemtap runtime sources in the given directory.
The default DIR can be seen using "stap --help".
222
223 -r /DIR
224 Build for kernel in given build tree. Can also be set with the
225 SYSTEMTAP_RELEASE environment variable.
226
227 -r RELEASE
228 Build for kernel in build tree /lib/modules/RELEASE/build. Can
229 also be set with the SYSTEMTAP_RELEASE environment variable.
230
231 -m MODULE
232 Use the given name for the generated kernel object module, in‐
233 stead of a unique randomized name. The generated kernel object
234 module is copied to the current directory.
235
236 -d MODULE
237 Add symbol/unwind information for the given module into the ker‐
238 nel object module. This may enable symbolic tracebacks from
239 those modules/programs, even if they do not have an explicit
240 probe placed into them.
241
242 --ldd Add symbol/unwind information for all user-space shared li‐
243 braries suspected by ldd to be necessary for user-space binaries
244 being probed or listed with the -d option. Caution: this can
245 make the probe modules considerably larger. Note that this op‐
246 tion does not deal with kernel-space modules: see instead
247 --all-modules below.
248
249 --all-modules
250 Equivalent to specifying "-dkernel" and a "-d" for each kernel
251 module that is currently loaded. Caution: this can make the
252 probe modules considerably larger.
253
254 -o FILE
255 Send standard output to named file. In bulk mode, percpu files
256 will start with FILE_ (FILE_cpu with -F) followed by the cpu
257 number. This supports strftime(3) formats for FILE.
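
To preview how a strftime(3) pattern might expand, date(1) can be used,
since it accepts the same common conversion specifiers. The file name
pattern and the wrapper capture are hypothetical; the wrapper is defined
but not invoked here.

```shell
# Hypothetical output file pattern using strftime(3) conversions.
pattern='trace-%Y%m%d-%H%M.log'

# Preview the expansion for the current time with date(1).
date +"$pattern"

# Hypothetical wrapper: hand the same pattern to stap as the -o FILE value.
capture() {
    stap -o "$pattern" -e 'probe begin { printf("hello\n"); exit() }'
}
```

The preview only shows the shape of the name; the time stap actually
uses for the expansion is whatever it observes when it opens the file.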
258
259 -c CMD Start the probes, run CMD, and exit when CMD finishes. This al‐
260 so has the effect of setting target() to the pid of that
261 process. Note that many probe types trigger independently of
262 this setting. Consider including something like this to focus
263 your script.
264
265 probe FOO { if (pid() != target()) next; .... }
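
The filtering idiom above can be fleshed out as follows. This is a
sketch under assumptions: the choice of syscall.* as the probe point and
the wrapper name trace_cmd are illustrative, and the wrapper is defined
but not invoked because it needs SystemTap and suitable privileges.

```shell
# Script text: report only events raised by the -c target process.
script='probe syscall.* {
    if (pid() != target()) next;
    printf("%s[%d] made a system call\n", execname(), pid())
}'

# Hypothetical wrapper: start the probes, run CMD, exit when CMD finishes.
trace_cmd() {
    stap -e "$script" -c "$1"
}
# Usage: trace_cmd 'ls /'
```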
266
267
268 -x PID Sets target() to PID. The script runs independently of the
269 PID's lifespan.
270
271 -e SCRIPT
272 Run the given SCRIPT specified on the command line.
273
274 -E SCRIPT
Run the given SCRIPT in addition to the main script, whether that main
script is given through -e or as a script file. This option can be
repeated to run multiple scripts, and can be used in listing mode
(-l/-L).
279
280 -l PROBE
281 Instead of running a probe script, just list all available probe
282 points matching the given single probe point. The pattern may
283 include wildcards and aliases, but not comma-separated multiple
284 probe points. The process result code will indicate failure if
285 there are no matches.
286
287 % stap -e 'probe syscall.* { }'
288 [...]
289 % stap -l 'syscall.*'
290 syscall.accept
291 [...]
292 syscall.writev
293
294
295 -L PROBE
296 Similar to "-l", but list matching probe points plus their
297 available context variables. When -v is set with -L, the output
298 includes duplicate probe points which are distinguished by their
299 PC address.
300
301 % stap -L 'process("/lib64/libpython*.so.*").mark("*")'
302 process("/usr/lib64/libpython2.7.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
303 process("/usr/lib64/libpython2.7.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
304 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__entry") $arg1:long $arg2:long $arg3:long
305 process("/usr/lib64/libpython3.6m.so.1.0").mark("function__return") $arg1:long $arg2:long $arg3:long
306 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__done") $arg1:long
307 process("/usr/lib64/libpython3.6m.so.1.0").mark("gc__start") $arg1:long
308 process("/usr/lib64/libpython3.6m.so.1.0").mark("line") $arg1:long $arg2:long $arg3:long
309
310
-F Without the -o option, load the module and start the probes, then
detach from the module, leaving the probes running. With the -o option,
run staprun in the background as a daemon and show its pid.
314
315 -S size[,N]
Sets the maximum size of each output file and the maximum number of
output files. When an output file would exceed size megabytes,
systemtap switches to the next output file. When the number of output
files exceeds N, systemtap removes the oldest output file. The second
argument may be omitted.
321
322 -T TIMEOUT
323 Exit the script after TIMEOUT seconds.
324
325 --skip-badvars
326 Ignore unresolvable or run-time-inaccessible context variables
327 and substitute with 0, without errors.
328
329
330 --prologue-searching[=WHEN]
331 Prologue-searching mode. Activate heuristics to work around in‐
332 correct debugging information for function parameter $context
333 variables. WHEN can be either "never", "always", or "auto" (i.e.
334 enabled by heuristic). If WHEN is missing, then "always" is as‐
335 sumed. If the option is missing, then "auto" is assumed.
336
337
338 --suppress-handler-errors
339 Wrap all probe handlers into something like this
340
341 try { ... } catch { next }
342
343 block, which causes any runtime errors to be quietly suppressed.
344 Suppressed errors do not count against MAXERRORS limits. In
345 this mode, the MAXSKIPPED limits are also suppressed, so that
346 many errors and skipped probes may be accumulated during a
347 script's runtime. Any overall counts will still be reported at
348 shutdown.
349
350
351 --compatible VERSION
352 Suppress recent script language or tapset changes which are in‐
353 compatible with given older version of systemtap. This may be
354 useful if a much older systemtap script fails to run. See the
355 DEPRECATION section for more details.
356
357
358 --check-version
359 This option is used to check if the active script has any con‐
360 structs that may be systemtap version specific. See the DEPRE‐
361 CATION section for more details.
362
363
364 --clean-cache
365 This option prunes stale entries from the cache directory. This
366 is normally done automatically after successful runs, but this
367 option will trigger the cleanup manually and then exit. See the
368 CACHING section for more details about cache limits.
369
370
371 --color[=WHEN], --colour[=WHEN]
372 This option controls coloring of error messages. WHEN can be ei‐
373 ther "never", "always", or "auto" (i.e. enable only if at a ter‐
374 minal). If WHEN is missing, then "always" is assumed. If the op‐
375 tion is missing, then "auto" is assumed.
376
377 Colors can be modified using the SYSTEMTAP_COLORS environment
378 variable. The format must be of the form
379 key1=val1:key2=val2:key3=val3 ...etc. Valid keys are "error",
380 "warning", "source", "caret", and "token". Values constitute
381 Select Graphic Rendition (SGR) parameter(s). Consult the docu‐
382 mentation of your terminal for the SGRs it supports. As an exam‐
383 ple, the default colors would be expressed as
384 error=01;31:warning=00;33:source=00;34:caret=01:token=01. If
385 SYSTEMTAP_COLORS is absent, the default colors will be used. If
386 it is empty or invalid, coloring is turned off.
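
As a sketch, the following starts from the documented default string and
changes only the "error" entry. All five keys are listed explicitly,
since this page does not say how omitted keys are treated. The value
01;35 is the SGR pair for bold magenta on most terminals, which is an
assumption about the terminal in use.

```shell
# Documented defaults, with only the "error" colour changed to 01;35.
SYSTEMTAP_COLORS='error=01;35:warning=00;33:source=00;34:caret=01:token=01'
export SYSTEMTAP_COLORS
```

A later invocation such as stap --color=always FILENAME then picks the
setting up from the environment.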
387
388
389 --disable-cache
390 This option disables all use of the cache directory. No files
391 will be either read from or written to the cache.
392
393
394 --poison-cache
This option treats files in the cache directory as invalid. No files
will be read from the cache, but resulting files from this run will
still be written to the cache. This is meant as a troubleshooting aid
when stap's cached behavior seems to be misbehaving. If it helps, there
is probably a bug in systemtap that the developers would like you to
report.
401
402
403 --privilege[=stapusr | =stapsys | =stapdev]
This option instructs stap to examine the script for constructs which
are not allowed at the specified privilege level (see UNPRIVILEGED
USERS). Compilation fails if any such constructs are used. If stapusr
or stapsys is specified when using a compile server (see --use-server),
the server will examine the script and, if compilation succeeds,
cryptographically sign the resulting kernel module, certifying that it
is safe for use by users at the specified privilege level.
413
414 If --privilege has not been specified, -pN has not been speci‐
415 fied with N < 5, and the invoking user is not root, and is not a
416 member of the group stapdev, then stap will automatically add
417 the appropriate --privilege option to the options already speci‐
418 fied.
419
420
421 --unprivileged
422 This option is equivalent to --privilege=stapusr.
423
424
425 --use-server[=HOSTNAME[:PORT] | =IP_ADDRESS[:PORT] | =CERT_SERIAL]
426 Specify compile-server(s) to be used for compilation and/or in
427 conjunction with --list-servers and --trust-servers (see below)
428 for listing. If no argument is supplied, then the default in un‐
429 privileged mode (see --privilege) is to select compatible
430 servers which are trusted as SSL peers and as module signers and
431 currently online. Otherwise the default is to select compatible
432 servers which are trusted as SSL peers and currently online.
433 --use-server may be specified more than once, in which case a
434 list of servers is accumulated in the order specified. Servers
435 may be specified by host name, ip address, or by certificate se‐
436 rial number (obtained using --list-servers). The latter is most
437 commonly used when adding or revoking trust in a server (see
438 --trust-servers below). If a server is specified by host name or
439 ip address, then an optional port number may be specified. This
440 is useful for accessing servers which are not on the local net‐
441 work or to specify a particular server.
442
443 IP addresses may be IPv4 or IPv6 addresses.
444
445 If a particular IPv6 address is link local and exists on more
446 than one interface, the intended interface may be specified by
447 appending the address with a percent sign (%) followed by the
448 intended interface name. For example,
449 "fe80::5eff:35ff:fe07:55ca%eth0".
450
451 In order to specify a port number with an IPv6 address, it is
452 necessary to enclose the IPv6 address in square brackets ([]) in
453 order to separate the port number from the rest of the address.
454 For example, "[fe80::5eff:35ff:fe07:55ca]:5000" or
455 "[fe80::5eff:35ff:fe07:55ca%eth0]:5000".
456
If --use-server has not been specified, -pN has not been specified with
N < 5, and the invoking user is not root, is not a member of the group
stapdev, but is a member of the group stapusr, then stap will
automatically add --use-server to the options already specified.
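
The bracket rule for IPv6 addresses with a port can be sketched as a
small helper. The name server_spec and the host name build.example.com
are hypothetical; the helper only formats the argument and does not
contact any server.

```shell
# server_spec HOST [PORT]: print a --use-server argument, bracketing IPv6
# addresses when a port is appended. Hypothetical helper, not part of stap.
server_spec() {
    host=$1 port=${2:-}
    if [ -z "$port" ]; then
        printf '%s\n' "$host"
        return
    fi
    case $host in
        *:*) printf '[%s]:%s\n' "$host" "$port" ;;   # IPv6: brackets needed
        *)   printf '%s:%s\n' "$host" "$port" ;;     # host name or IPv4
    esac
}

server_spec fe80::5eff:35ff:fe07:55ca%eth0 5000
server_spec build.example.com 5000
```

The first call prints the bracketed form shown in the example above, and
the output can be used as stap --use-server="$(server_spec HOST PORT)".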
462
463
464 --use-server-on-error[=yes|=no]
465 Instructs stap to retry compilation of a script using a compile
466 server if compilation on the local host fails in a manner which
467 suggests that it might succeed using a server. If this option
468 is not specified, the default is no. If no argument is provid‐
469 ed, then the default is yes. Compilation will be retried for
470 certain types of errors (e.g. insufficient data or resources)
471 which may not occur during re-compilation by a compile server.
472 Compile servers will be selected automatically for the re-compi‐
473 lation attempt as if --use-server was specified with no argu‐
474 ments.
475
476
477 --list-servers[=SERVERS]
478 Display the status of the requested SERVERS, where SERVERS is a
479 comma-separated list of server attributes. The list of at‐
480 tributes is combined to filter the list of servers displayed.
481 Supported attributes are:
482
483 all specifies all known servers (trusted SSL peers, trusted
484 module signers, online servers).
485
486 specified
487 specifies servers specified using --use-server.
488
489 online filters the output by retaining information about servers
490 which are currently online.
491
492 trusted
493 filters the output by retaining information about servers
494 which are trusted as SSL peers.
495
496 signer filters the output by retaining information about servers
497 which are trusted as module signers (see --privilege).
498
499 compatible
500 filters the output by retaining information about servers
501 which are compatible with the current kernel release and
502 architecture.
503
504 If no argument is provided, then the default is specified. If
505 no servers were specified using --use-server, then the default
506 servers for --use-server are listed.
507
508 Note that --list-servers uses the avahi-daemon service to detect
509 online servers. If this service is not available, then
510 --list-servers will fail to detect any online servers. In order
511 for --list-servers to detect servers listening on IPv6 address‐
512 es, the avahi-daemon configuration file /etc/avahi/avahi-dae‐
513 mon.conf must contain an active "use-ipv6=yes" line. The service
514 must be restarted after adding this line in order for IPv6 to be
515 enabled.
516
517
518 --trust-servers[=TRUST_SPEC]
Grant or revoke trust in the compile-servers specified using
--use-server, as directed by TRUST_SPEC, where TRUST_SPEC is a
comma-separated list specifying the trust which is to be granted or
revoked. Supported elements are:
523
524 ssl trust the specified servers as SSL peers.
525
526 signer trust the specified servers as module signers (see
527 --privilege). Only root can specify signer.
528
529 all-users
530 grant trust as an ssl peer for all users on the local
531 host. The default is to grant trust as an ssl peer for
532 the current user only. Trust as a module signer is always
533 granted for all users. Only root can specify all-users.
534
535 revoke revoke the specified trust. The default is to grant it.
536
537 no-prompt
538 do not prompt the user for confirmation before carrying
539 out the requested action. The default is to prompt the
540 user for confirmation.
541
542 If no argument is provided, then the default is ssl. If no
543 servers were specified using --use-server, then no trust will be
544 granted or revoked.
545
546 Unless no-prompt has been specified, the user will be prompted
547 to confirm the trust to be granted or revoked before the opera‐
548 tion is performed.
549
550
551 --sign-module
552 Sign the module with a MOK (Machine Owner Key) on UEFI/Secure‐
553 Boot systems. See the SECUREBOOT section for more details.
554
555
556 --dump-probe-types
557 Dumps a list of supported probe types and exits. If --privi‐
558 lege=stapusr is also specified, the list will be limited to
559 probe types available to unprivileged users.
560
561
562 --dump-probe-aliases
563 Dumps a list of all probe aliases found in library files and ex‐
564 its.
565
566
567 --dump-functions
Dumps a list of all the public functions found in library files and
exits. Also includes their parameters and types. A function of type
'unknown' indicates a function that does not return a value. Note that
not all function/parameter types may be resolved (these are also shown
as 'unknown'). This feature is very memory-intensive and thus may not
work properly with --use-server if the target server imposes an rlimit
on process memory (i.e. through the ~stap-server/.systemtap/rc
configuration file, see stap-server(8)).
577
578
579 --remote URL
580 Set the execution target to the given host. This option may be
581 repeated to target multiple execution targets. Passes 1-4 are
582 completed locally as normal to build the script, and then pass 5
583 will copy the module to the target and run it. Acceptable URL
584 forms include:
585
586 [USER@]HOSTNAME, ssh://[USER@]HOSTNAME
587 This mode uses ssh, optionally using a username not
588 matching your own. If a custom ssh_config file is in use,
589 add SendEnv LANG to retain internationalization function‐
590 ality.
591
592 libvirt://DOMAIN, libvirt://DOMAIN/LIBVIRT_URI
593 This mode uses stapvirt to execute the script on a domain
594 managed by libvirt. Optionally, LIBVIRT_URI may be speci‐
595 fied to connect to a specific driver and/or a remote
596 host. For example, to connect to the local privileged QE‐
597 MU driver, use:
598
599 --remote libvirt://MyDomain/qemu:///system
600
601 See the page at <http://libvirt.org/uri.html> for sup‐
602 ported URIs. Also see stapvirt(1) for more information on
603 how to prepare the domain for stap probing.
604
605 unix:PATH
606 This mode connects to a UNIX socket. This can be used
607 with a QEMU virtio-serial port for executing scripts in‐
608 side a running virtual machine.
609
610 direct://
611 Special loopback mode to run on the local host.
612
613 --remote-prefix
614 Prefix each line of remote output with "N: ", where N is the in‐
615 dex of the remote execution target from which the given line
616 originated.
617
618
619 --download-debuginfo[=OPTION]
620 Enable, disable or set a timeout for the automatic debuginfo
621 downloading feature offered by abrt as specified by OPTION,
622 where OPTION is one of the following:
623
624 yes enable automatic downloading of debuginfo with no time‐
625 out. This is the same as not providing an OPTION value to
626 --download-debuginfo
627
628 no explicitly disable automatic downloading of debuginfo.
629 This is the same as not using the option at all.
630
631 ask show abrt output, and ask before continuing download. No
632 timeout will be set.
633
634 <timeout>
635 specify a timeout as a positive number to stop the down‐
636 load if it is taking longer than <timeout> seconds.
637
638 --rlimit-as=NUM
639 Specify the maximum size of the process's virtual memory (ad‐
640 dress space), in bytes.
641
642
643 --rlimit-cpu=NUM
644 Specify the CPU time limit, in seconds.
645
646
647 --rlimit-nproc=NUM
648 Specify the maximum number of processes that can be created.
649
650
651 --rlimit-stack=NUM
652 Specify the maximum size of the process stack, in bytes.
653
654
655 --rlimit-fsize=NUM
656 Specify the maximum size of files that the process may create,
657 in bytes.
658
659
660 --sysroot=DIR
661 Specify sysroot directory where target files (executables, li‐
662 braries, etc.) are located. With -r RELEASE, the sysroot will
663 be searched for the appropriate kernel build directory. With -r
664 /DIR, however, the sysroot will not be used to find the kernel
665 build.
666
667
668 --sysenv=VAR=VALUE
669 Provide an alternate value for an environment variable where the
670 value on a remote system differs. Path variables (e.g. PATH,
671 LD_LIBRARY_PATH) are assumed to be relative to the directory
672 provided by --sysroot, if provided.
673
674
675 --suppress-time-limits
676 Disable -DSTP_OVERLOAD related options as well as -DMAXACTION
677 and -DMAXTRYLOCK. This option requires guru mode.
678
679
680 --runtime=MODE
681 Set the pass-5 runtime mode. Valid options are kernel (de‐
682 fault), dyninst and bpf. See ALTERNATE RUNTIMES below for more
683 information.
684
685
686 --dyninst
687 Shorthand for --runtime=dyninst.
688
689
690 --bpf Shorthand for --runtime=bpf.
691
692
693 --save-uprobes
694 On machines that require SystemTap to build its own uprobes mod‐
695 ule (kernels prior to version 3.5), this option instructs Sys‐
696 temTap to also save a copy of the module in the current directo‐
697 ry (creating a new "uprobes" directory first).
698
699
700 --target-namespaces=PID
Set the target namespaces to the namespaces that the given PID is in.
This is for namespace-aware tapset functions. If the target namespaces
are not set, they default to the namespaces of the stap process.
705
706
707 --monitor=INTERVAL
Enables an interface to display status information about the module
(uptime, module name, invoker uid, memory sizes, global variables, list
of probes with their statistics). An optional argument INTERVAL can be
supplied to set the refresh rate of the status window in seconds. The
module can also be controlled with the following keys:
714
715 c Resets all global variables to their initial values or
716 zeroes them if they did not have an initial value.
717
718 s Rotates the attribute used to sort the list of probes.
719
t Brings up a prompt to allow toggling (on/off) of probes by index.
Probe points are still affected by their conditions.
723
724 r Resumes the script by toggling on all probes.
725
726 p Pauses the script by toggling off all probes.
727
728 x Hides/shows the status window. This allows for more out‐
729 put to be seen.
730
731 navigation-keys
732 The navigation keys can be used to scroll up and down the
733 windows.
734
735 Tab Toggle scrolling between status and output windows.
736
737
738 --example
739 This option is used to run example scripts without having to en‐
740 ter the entire path to the script. Example scripts can be found
741 in the directory specified in the stappaths(7) manual page.
742
743
744 --no-global-var-display
745 This option is used to disable the automatic logging of unused
746 global variables at the end of a stap session.
747
748
749 --language-server
750 Language server mode. Start a language server which will commu‐
751 nicate via stdio. The language server will respect stap verbosi‐
752 ty.

ARGUMENTS
756 Any additional arguments on the command line are passed to the script
757 parser for substitution. See below.

SCRIPT LANGUAGE
761 The systemtap script language resembles awk and C. There are two main
762 outermost constructs: probes and functions. Within these, statements
763 and expressions use C-like operator syntax and precedence.
764
765
766 GENERAL SYNTAX
767 Whitespace is ignored. Three forms of comments are supported:
768 # ... shell style, to the end of line, except for $# and @#
769 // ... C++ style, to the end of line
770 /* ... C style ... */
771 Literals are either strings enclosed in double-quotes (passing through
772 the usual C escape codes with backslashes, and with adjacent string
773 literals glued together, also as in C), or integers (in decimal, hexa‐
774 decimal, or octal, using the same notation as in C). All strings are
775 limited in length to some reasonable value (a few hundred bytes). In‐
776 tegers are 64-bit signed quantities, although the parser also accepts
777 (and wraps around) values above positive 2**63.
778
779 In addition, script arguments given at the end of the command line may
780 be inserted. Use $1 ... $<NN> for insertion unquoted, @1 ... @<NN> for
781 insertion as a string literal. The number of arguments may be accessed
782 through $# (as an unquoted number) or through @# (as a quoted number).
783 These may be used at any place a token may begin, including within the
784 preprocessing stage. Reference to an argument number beyond what was
785 actually given is an error.
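
As a minimal sketch of argument insertion, assume a hypothetical script
file args.stp invoked as "stap args.stp 5 hello". The file name and the
argument values are illustrative only.

    probe begin {
      # $# is the argument count, $1 is inserted unquoted (a number
      # here), and @2 is inserted as a string literal
      printf("%d argument(s): number=%d string=%s\n", $#, $1, @2)
      exit()
    }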
786
787
788 PREPROCESSING
789 A simple conditional preprocessing stage is run as a part of parsing.
790 The general form is similar to the cond ? exp1 : exp2 ternary operator:
791
792 %( CONDITION %? TRUE-TOKENS %)
793 %( CONDITION %? TRUE-TOKENS %: FALSE-TOKENS %)
794
The CONDITION is either an expression whose format is determined by its
first keyword, or a comparison between two string literals or two
numeric literals. Several such CONDITIONs may also be combined using ||
(alternative) and && (conjunction). However, parentheses are not yet
supported, so keep in mind that && takes precedence over ||.
801
802 If the first part is the identifier kernel_vr or kernel_v to refer to
803 the kernel version number, with ("2.6.13-1.322FC3smp") or without
804 ("2.6.13") the release code suffix, then the second part is one of the
805 six standard numeric comparison operators <, <=, ==, !=, >, and >=, and
806 the third part is a string literal that contains an RPM-style version-
807 release value. The condition is deemed satisfied if the version of the
808 target kernel (as optionally overridden by the -r option) compares to
809 the given version string. The comparison is performed by the glibc
810 function strverscmp. As a special case, if the operator is for simple
811 equality (==), or inequality (!=), and the third part contains any
812 wildcard characters (* or ? or [), then the expression is treated as a
813 wildcard (mis)match as evaluated by fnmatch.
814
815 If, on the other hand, the first part is the identifier arch to refer
816 to the processor architecture (as named by the kernel build system
817 ARCH/SUBARCH), then the second part is one of the two string comparison
818 operators == or !=, and the third part is a string literal for matching
819 it. This comparison is a wildcard (mis)match.
820
821 Similarly, if the first part is an identifier like CONFIG_something to
822 refer to a kernel configuration option, then the second part is == or
823 !=, and the third part is a string literal for matching the value (com‐
824 monly "y" or "m"). Nonexistent or unset kernel configuration options
825 are represented by the empty string. This comparison is also a wild‐
826 card (mis)match.
827
828 If the first part is the identifier systemtap_v, the test refers to the
829 systemtap compatibility version, which may be overridden for old
830 scripts with the --compatible flag. The comparison operator is as is
831 for kernel_v and the right operand is a version string. See also the
832 DEPRECATION section below.
833
834 If the first part is the identifier systemtap_privilege, the test
835 refers to the privilege level that the systemtap script is compiled
836 with. Here the second part is == or !=, and the third part is a string
837 literal, either "stapusr" or "stapsys" or "stapdev".
838
If the first part is the identifier guru_mode, the test refers to
whether the systemtap script is compiled in guru mode. Here the second
part is == or !=, and the third part is a number, either 1 or 0.
842
843 If the first part is the identifier runtime, the test refers to the
844 systemtap runtime mode. See ALTERNATE RUNTIMES below for more informa‐
845 tion on runtimes. The second part is one of the two string comparison
846 operators == or !=, and the third part is a string literal for matching
847 it. This comparison is a wildcard (mis)match.
848
849 Otherwise, the CONDITION is expected to be a comparison between two
850 string literals or two numeric literals. In this case, the arguments
851 are the only variables usable.
852
853 The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser tokens
854 (possibly including nested preprocessor conditionals), and are passed
855 into the input stream if the condition is true or false. For example,
856 the following code induces a parse error unless the target kernel ver‐
857 sion is newer than 2.6.5:
858
859 %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence
860
861 The following code might adapt to hypothetical kernel version drift:
862
863 probe kernel.function (
864 %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
865 %( kernel_vr == "2.6.13*smp" %? "do_page_fault" %:
866 UNSUPPORTED %) %)
867 ) { /* ... */ }
868
869 %( arch == "ia64" %?
870 probe syscall.vliw = kernel.function("vliw_widget") {}
871 %)
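
The literal-comparison and guru_mode forms described above can be
sketched as follows. This is an illustrative fragment, not taken from
the tapset library; the messages are arbitrary.

    probe begin {
      # numeric literal comparison on the argument count
      %( $# >= 1 %? printf("first argument: %s\n", @1)
                 %: println("no arguments given") %)
      # guru_mode is compared against the number 1 or 0
      %( guru_mode == 1 %? println("guru mode")
                        %: println("not guru mode") %)
      exit()
    }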
872
873
874
875 PREPROCESSOR MACROS
876 The preprocessor also supports a simple macro facility, run as a sepa‐
877 rate pass before conditional preprocessing.
878
879 Macros are defined using the following construct:
880
881 @define NAME %( BODY %)
882 @define NAME(PARAM_1, PARAM_2, ...) %( BODY %)
883
884 Macros, and parameters inside a macro body, are both invoked by prefix‐
885 ing the macro name with an @ symbol:
886
887 @define foo %( x %)
888 @define add(a,b) %( ((@a)+(@b)) %)
889
890 @foo = @add(2,2)
891
892
893 Macro expansion is currently performed in a separate pass before condi‐
894 tional compilation. Therefore, both TRUE- and FALSE-tokens in condi‐
895 tional expressions will be macroexpanded regardless of how the condi‐
896 tion is evaluated. This can sometimes lead to errors:
897
898 // The following results in a conflict:
899 %( CONFIG_UPROBE == "y" %?
900 @define foo %( process.syscall %)
901 %:
902 @define foo %( **ERROR** %)
903 %)
904
905 // The following works properly as expected:
906 @define foo %(
907 %( CONFIG_UPROBE == "y" %? process.syscall %: **ERROR** %)
908 %)
909
910 The first example is incorrect because both @defines are evaluated in a
911 pass prior to the conditional being evaluated.
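
A parameterized macro might be used in a complete script roughly as
follows. The macro name elapsed_ms and the variable t0 are hypothetical
names chosen for this sketch; gettimeofday_ms() is assumed to be
available from the standard tapset library.

    @define elapsed_ms(start) %( (gettimeofday_ms() - (@start)) %)

    global t0
    probe begin { t0 = gettimeofday_ms() }
    probe timer.s(1) {
      # expands to (gettimeofday_ms() - (t0))
      printf("elapsed: %d ms\n", @elapsed_ms(t0))
      exit()
    }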
912
913 Normally, a macro definition is local to the file it occurs in. Thus,
914 defining a macro in a tapset does not make it available to the user of
the tapset. Publicly available library macros can be defined by
including .stpm files on the tapset search path. These files may only
917 contain @define constructs, which become visible across all tapsets and
918 user scripts. Optionally, within the .stpm files, a public macro defi‐
919 nition can be surrounded by a preprocessor conditional as described
920 above.
921
922
923 CONSTANTS
Tapsets or guru-mode user scripts can access header file constant
tokens, typically macros, using the built-in @const() operator. The
respective header file can be included either via the tapset library,
or using a top-level guru-mode embedded-C construct. This results in
the appropriate embedded-C pragma comments being set.
929
930 @const("STP_SKIP_BADVARS")
931
932
933
934 VARIABLES
935 Identifiers for variables and functions are an alphanumeric sequence,
936 and may include _ and $ characters. They may not start with a plain
937 digit, as in C. Each variable is by default local to the probe or
938 function statement block within which it is mentioned, and therefore
939 its scope and lifetime is limited to a particular probe or function in‐
940 vocation.
941
942 Scalar variables are implicitly typed as either string or integer. As‐
943 sociative arrays also have a string or integer value, and a tuple of
944 strings and/or integers serving as a key. Here are a few basic expres‐
945 sions.
946
947 var1 = 5
948 var2 = "bar"
949 array1 [pid()] = "name" # single numeric key
950 array2 ["foo",4,i++] += 5 # vector of string/num/num keys
951 if (["hello",5,4] in array2) println ("yes") # membership test
952
953
954 The translator performs type inference on all identifiers, including
955 array indexes and function parameters. Inconsistent type-related use
956 of identifiers signals an error.
957
958 Variables may be declared global, so that they are shared amongst all
959 probes and functions and live as long as the entire systemtap session.
960 There is one namespace for all global variables, regardless of which
961 script file they are found within. Concurrent access to global vari‐
962 ables is automatically protected with locks, see the SAFETY AND SECURI‐
963 TY section for more details. A global declaration may be written at
964 the outermost level anywhere, not within a block of code. Global vari‐
965 ables which are written but never read will be displayed automatically
966 at session shutdown. The translator will infer for each its value
967 type, and if it is used as an array, its key types. Optionally, scalar
968 globals may be initialized with a string or number literal. The fol‐
969 lowing declaration marks variables as global.
970
971 global var1, var2, var3=4
972
973
Global variables can also be set as module options. One can do this
either by using the -G option, or by first compiling the module using
stap -p4 and then setting the global variables on the command line when
calling staprun on the generated module. See staprun(8) for more
information.
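
As a small sketch of the -G route, assume a hypothetical script file
threshold.stp containing the lines below and invoked as
"stap -G threshold=25 threshold.stp". The variable name and values are
illustrative.

    global threshold = 10    # default, used when -G is not given
    probe begin {
      printf("threshold=%d\n", threshold)
      exit()
    }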
979
The scope of a global variable may be limited to a tapset or user
script file using the private keyword. The global keyword is optional
when defining a private global variable. The following declaration
marks var1 and var2 as private globals.
984
985 private global var1=2
986 private var2
987
988
989 Arrays are limited in size by the MAXMAPENTRIES variable -- see the
990 SAFETY AND SECURITY section for details. Optionally, global arrays may
991 be declared with a maximum size in brackets, overriding MAXMAPENTRIES
992 for that array only. Note that this doesn't indicate the type of keys
993 for the array, just the size.
994
995 global tiny_array[10], normal_array, big_array[50000]
996
997
998 Arrays may be configured for wrapping using the '%' suffix. This caus‐
999 es older elements to be overwritten if more elements are inserted than
1000 the array can hold. This works for both associative and statistics
1001 typed arrays.
1002
1003 global wrapped_array1%[10], wrapped_array2%
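
A wrapping array can be exercised with a sketch like the following. The
array name recent is hypothetical. Given the description above, only
the most recently inserted entries are expected to remain once more
than three elements have been stored.

    global recent%[3]
    probe begin {
      for (i = 1; i <= 5; i++)
        recent[i] = i * i        # later inserts overwrite older ones
      foreach (k in recent)
        printf("%d -> %d\n", k, recent[k])
      exit()
    }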
1004
1005
1006
1007 Many types of probe points provide context variables, which are run-
1008 time values, safely extracted from the kernel or userspace program be‐
1009 ing probed. These are prefixed with the $ character. The CONTEXT
1010 VARIABLES section in stapprobes(3stap) lists what is available for each
1011 type of probe point. These context variables become normal string or
1012 numeric scalars once they are stored in normal script variables. See
the TYPECASTING section below on how to turn them back into typed
1014 pointers for further processing as context variables. There is some
1015 automation to help!
1016
1017
1018 STATEMENTS
1019 Statements enable procedural control flow. They may occur within func‐
1020 tions and probe handlers. The total number of statements executed in
1021 response to any single probe event is limited to some number defined by
1022 the MAXACTION macro in the translated C code, and is in the neighbour‐
1023 hood of 1000.
1024
1025 EXP Execute the string- or integer-valued expression and throw away
1026 the value.
1027
1028 { STMT1 STMT2 ... }
1029 Execute each statement in sequence in this block. Note that
1030 separators or terminators are generally not necessary between
1031 statements.
1032
1033 ; Null statement, do nothing. It is useful as an optional separa‐
1034 tor between statements to improve syntax-error detection and to
1035 handle certain grammar ambiguities.
1036
1037 if (EXP) STMT1 [ else STMT2 ]
1038 Compare integer-valued EXP to zero. Execute the first (non-ze‐
1039 ro) or second STMT (zero).
1040
1041 while (EXP) STMT
1042 While integer-valued EXP evaluates to non-zero, execute STMT.
1043
1044 for (EXP1; EXP2; EXP3) STMT
1045 Execute EXP1 as initialization. While EXP2 is non-zero, execute
1046 STMT, then the iteration expression EXP3.
1047
1048 foreach (VAR in ARRAY [ limit EXP ]) STMT
Loop over each element of the named global array, assigning the
current key to VAR. The array may not be modified within the
1051 statement. By adding a single + or - operator after the VAR or
1052 the ARRAY identifier, the iteration will proceed in a sorted or‐
1053 der, by ascending or descending index or value. If the array
1054 contains statistics aggregates, adding the desired @operator be‐
1055 tween the ARRAY identifier and the + or - will specify the sort‐
1056 ing aggregate function. See the STATISTICS section below for
1057 the ones available. Default is @count. Using the optional lim‐
1058 it keyword limits the number of loop iterations to EXP times.
1059 EXP is evaluated once at the beginning of the loop.
1060
1061 foreach ([VAR1, VAR2, ...] in ARRAY [ limit EXP ]) STMT
1062 Same as above, used when the array is indexed with a tuple of
1063 keys. A sorting suffix may be used on at most one VAR or ARRAY
1064 identifier.
1065
1066 foreach ([VAR1, VAR2, ...] in ARRAY [INDEX1, INDEX2, ...] [ limit EXP
1067 ]) STMT
1068 Same as above, where iterations are limited to elements in the
1069 array where the keys match the index values specified. The sym‐
1070 bol * can be used to specify an index and will be treated as a
1071 wildcard.
1072
1073 foreach (VAR0 = VAR in ARRAY [ limit EXP ]) STMT
This variant of foreach saves the current value into VAR0 on each
1075 iteration, so it is the same as ARRAY[VAR]. This also works
1076 with a tuple of keys. Sorting suffixes on VAR0 have the same
1077 effect as on ARRAY.
1078
1079 foreach (VAR0 = VAR in ARRAY [INDEX1, INDEX2, ...] [ limit EXP ]) STMT
1080 Same as above, where iterations are limited to elements in the
1081 array where the keys match the index values specified. The sym‐
1082 bol * can be used to specify an index and will be treated as a
1083 wildcard.
1084
1085 break, continue
1086 Exit or iterate the innermost nesting loop (while or for or
1087 foreach) statement.
1088
1089 return EXP
1090 Return EXP value from enclosing function. If the function's
1091 value is not taken anywhere, then a return statement is not
1092 needed, and the function will have a special "unknown" type with
1093 no return value.
1094
1095 next Return now from enclosing probe handler. This is especially
1096 useful in probe aliases that apply event filtering predicates.
1097 When used in functions, the execution will be immediately trans‐
1098 ferred to the next overloaded function.
1099
1100 try { STMT1 } catch { STMT2 }
1101 Run the statements in the first block. Upon any run-time er‐
1102 rors, abort STMT1 and start executing STMT2. Any errors in
1103 STMT2 will propagate to outer try/catch blocks, if any.
1104
1105 try { STMT1 } catch(VAR) { STMT2 }
1106 Same as above, plus assign the error message to the string
1107 scalar variable VAR.
1108
1109 delete ARRAY[INDEX1, INDEX2, ...]
1110 Remove from ARRAY the element specified by the index tuple. If
1111 the index tuple contains a * in place of an index, the * is
1112 treated as a wildcard and all elements with keys that match the
1113 index tuple will be removed from ARRAY. The value will no
1114 longer be available, and subsequent iterations will not report
1115 the element. It is not an error to delete an element that does
1116 not exist.
1117
1118 delete ARRAY
1119 Remove all elements from ARRAY.
1120
1121 delete SCALAR
1122 Removes the value of SCALAR. Integers and strings are cleared
1123 to 0 and "" respectively, while statistics are reset to the ini‐
1124 tial empty state.
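
Several of the statements above can be combined in one short sketch.
The array name hits and its contents are hypothetical. The division is
written so that it fails at run time, which lets the catch block show
the error message.

    global hits
    probe begin {
      hits["a"] = 3; hits["b"] = 7; hits["c"] = 5
      # sort by value, descending, and visit at most two elements
      foreach (name in hits- limit 2)
        printf("%s %d\n", name, hits[name])
      try {
        println(1 / (hits["a"] - 3))    # division by zero
      } catch (msg) {
        printf("caught: %s\n", msg)
      }
      delete hits["a"]
      if (!("a" in hits)) println("a was deleted")
      exit()
    }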
1125
1126
1127 EXPRESSIONS
1128 Systemtap supports a number of operators that have the same general
1129 syntax, semantics, and precedence as in C and awk. Arithmetic is per‐
1130 formed as per typical C rules for signed integers. Division by zero or
1131 overflow is detected and results in an error.
1132
1133 binary numeric operators
1134 * / % + - >> << & ^ | && ||
1135
1136 binary string operators
1137 . (string concatenation)
1138
1139 numeric assignment operators
1140 = *= /= %= += -= >>= <<= &= ^= |=
1141
1142 string assignment operators
1143 = .=
1144
1145 unary numeric operators
1146 + - ! ~ ++ --
1147
1148 binary numeric, string comparison or regex matching operators
1149 < > <= >= == != =~ !~
1150
1151 ternary operator
1152 cond ? exp1 : exp2
1153
1154 grouping operator
1155 ( exp )
1156
1157 function call
1158 fn ([ arg1, arg2, ... ])
1159
1160 array membership check
1161 exp in array
1162 [exp1, exp2, ... ] in array
1163 [*, *, ... ] in array
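
A few of these operators are shown together in the sketch below. The
names s and seen are hypothetical.

    global seen
    probe begin {
      s = "pid " . sprint(pid())              # string concatenation
      s .= (pid() % 2 ? " odd" : " even")     # ternary and .=
      seen[pid()] = 1
      if (pid() in seen) println(s)           # array membership check
      exit()
    }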
1164
1165
1166 REGULAR EXPRESSION MATCHING
1167 The scripting language supports regular expression matching. The basic
1168 syntax is as follows:
1169
1170 exp =~ regex
1171 exp !~ regex
1172
1173 (The first operand must be an expression evaluating to a string; the
1174 second operand must be a string literal containing a syntactically
1175 valid regular expression.)
1176
1177 The regular expression syntax supports POSIX Extended Regular Expres‐
1178 sion features as documented in egrep(1) except for subexpression reuse
1179 ("\1") functionality.
1180
1181 After a successful match, the contents of the matched string and subex‐
1182 pressions can be extracted using the matched() and ngroups() tapset
1183 functions as follows:
1184
1185 if ("an example string" =~ "str(ing)") {
1186 matched(0) // -> returns "string", the matched substring
1187 matched(1) // -> returns "ing", the 1st matched subexpression
1188 ngroups() // -> returns 2, the number of matched groups
1189 }
1190
1191
1192 PROBES
1193 The main construct in the scripting language identifies probes. Probes
1194 associate abstract events with a statement block ("probe handler") that
1195 is to be executed when any of those events occur. The general syntax
1196 is as follows:
1197
1198 probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
1199 probe PROBEPOINT [, PROBEPOINT] if (CONDITION) { [STMT ...] }
1200
1201
1202 Events are specified in a special syntax called "probe points". There
1203 are several varieties of probe points defined by the translator, and
1204 tapset scripts may define further ones using aliases. Probe points may
1205 be wildcarded, grouped, or listed in preference sequences, or declared
1206 optional. More details on probe point syntax and semantics are listed
1207 on the stapprobes(3stap) manual page.
1208
1209 The probe handler is interpreted relative to the context of each event.
1210 For events associated with kernel code, this context may include vari‐
1211 ables defined in the source code at that spot. These "context vari‐
1212 ables" are presented to the script as variables whose names are pre‐
1213 fixed with "$". They may be accessed only if the kernel's compiler
1214 preserved them despite optimization. This is the same constraint that
1215 a debugger user faces when working with optimized code. In addition,
1216 the objects must exist in paged-in memory at the moment of the system‐
1217 tap probe handler's execution, because systemtap must not cause (sup‐
1218 presses) any additional paging. Some probe types have very little con‐
1219 text. See the stapprobes(3stap) man pages to see the kinds of context
1220 variables available at each kind of probe point. As of systemtap ver‐
1221 sion 4.3, functions called from the handlers of some probe point types
1222 may also refer to context variables. These are treated as if a clone
1223 of that function was inlined into the calling probe handler and $vari‐
1224 ables evaluated in its context.
1225
1226 Probes may be decorated with an arming condition, consisting of a sim‐
1227 ple boolean expression on read-only global script variables. While
1228 disarmed (inactive, condition evaluates to false), some probe types re‐
1229 duce or eliminate their run-time overheads. When an arming condition
1230 evaluates to true, probes will be soon re-armed, and their probe han‐
1231 dlers will start getting called as the events fire. (Some events may
1232 be lost during the arming interval. If this is unacceptable, do not
1233 use arming conditions for those probes.) Example of the syntax:
1234
1235 probe timer.us(TIMER) if (enabled) {
1236 }
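
A fuller sketch of an arming condition follows. The timer intervals and
the names enabled and ticks are arbitrary choices for illustration. The
100 ms probe only counts ticks while enabled is non-zero.

    global enabled = 0, ticks = 0
    probe timer.s(2) { enabled = !enabled }        # toggle arming
    probe timer.ms(100) if (enabled) { ticks++ }   # runs only while armed
    probe timer.s(10) {
      printf("ticks=%d\n", ticks)
      exit()
    }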
1237
1238
New probe points may be defined using "aliases". A probe point alias
looks similar to a probe definition, but instead of activating a probe
at the given point, it just defines a new probe point name as an alias
to an existing one. There are two types of alias, the prologue style
and the epilogue style, which are identified by "=" and "+="
respectively.
1245
For a prologue-style alias, the statement block that follows the alias
definition is implicitly added as a prologue to any probe that refers
to the alias. For an epilogue-style alias, the statement block is
instead implicitly added as an epilogue to any probe that refers to the
alias. For example:
1251
1252 probe syscall.read = kernel.function("sys_read") {
1253 fildes = $fd
1254 if (execname() == "init") next # skip rest of probe
1255 }
1256
1257 defines a new probe point syscall.read, which expands to
1258 kernel.function("sys_read"), with the given statement as a prologue,
1259 which is useful to predefine some variables for the alias user and/or
1260 to skip probe processing entirely based on some conditions. And
1261
1262 probe syscall.read += kernel.function("sys_read") {
1263 if (tracethis) println ($fd)
1264 }
1265
defines a new probe point with the given statement as an epilogue,
which is useful to take actions based upon variables set or left over
by the alias user. Please note that in each case, the statements in the
alias handler block are treated ordinarily, so that variables assigned
there constitute mere initialization, not a macro substitution.
1272
1273 Aliases can also be defined to include both a prologue and an epilogue.
1274
1275 probe syscall.read = kernel.function("sys_read") {
1276 fildes = $fd
1277 if (execname() == "init") next
1278 },{
1279 if (tracethis) println ($fd)
1280 }
1281
1282
1283 An alias is used just like a built-in probe type.
1284
1285 probe syscall.read {
1286 printf("reading fd=%d\n", fildes)
1287 if (fildes > 10) tracethis = 1
1288 }
1289
1290
1291 Probes with an alias can make use of the @probewrite predicate. This
1292 check is used to detect whether a script variable or target variable
1293 has been written to in the probe handler body.
1294
1295 @probewrite(var)
1296 expands to 1 iff var has been written to in the probe handler
1297 body, otherwise it expands to 0.
1298
1299 In the following example, @probewrite(var) expands to 1 because var has
1300 been written to in the probe handler body and consequently, the condi‐
1301 tional statement will run.
1302
1303 probe foo = begin { var = 0 }, { if (@probewrite(var)) println(var) }
1304
1305 probe foo {
1306 var = 1
1307 }
1308
1309
1310
1311 FUNCTIONS
1312 Systemtap scripts may define subroutines to factor out common work.
1313 Functions take any number of scalar (integer or string) arguments, and
1314 must return a single scalar (integer or string). An example function
1315 declaration looks like this:
1316
1317 function thisfn (arg1, arg2) {
1318 return arg1 + arg2
1319 }
1320
1321 Note the general absence of type declarations, which are instead in‐
1322 ferred by the translator. However, if desired, a function definition
1323 may include explicit type declarations for its return value and/or its
1324 arguments. This is especially helpful for embedded-C functions. In
the following example, the type inference engine need only infer the
type of arg2 (a string).
1327
1328 function thatfn:string (arg1:long, arg2) {
1329 return sprint(arg1) . arg2
1330 }
1331
1332 Functions may call others or themselves recursively, up to a fixed
1333 nesting limit. This limit is defined by the MAXNESTING macro in the
1334 translated C code and is in the neighbourhood of 10.
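
As an illustration of recursion within the nesting limit, the following
sketch defines a hypothetical function fact. Computing fact(5) needs
only five nested calls, and the script should print 120.

    function fact:long (n:long) {
      if (n <= 1) return 1
      return n * fact(n - 1)
    }
    probe begin {
      println(fact(5))
      exit()
    }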
1335
1336 Functions may be marked private using the private keyword to limit
1337 their scope to the tapset or user script file they are defined in. An
1338 example definition of a private function follows:
1339
1340 private function three:long () { return 3 }
1341
1342
1343 Functions terminating without reaching an explicit return statement
1344 will return an implicit 0 or "", determined by type inference.
1345
1346 Functions may be overloaded during both runtime and compile time.
1347
1348 Runtime overloading allows the executed function to be selected while
1349 the module is running based on runtime conditions and is achieved using
1350 the "next" statement in script functions and STAP_NEXT macro for embed‐
1351 ded-C functions. For example,
1352
1353
1354 function f() { if (condition) next; print("first function") }
1355 function f() %{ STAP_NEXT; print("second function") %}
1356 function f() { print("third function") }
1357
1358
During a function call f(), execution transfers to the third function
if condition evaluates to true, and "third function" is printed. Note
that the second function unconditionally passes execution on via
STAP_NEXT.
1362
Parameter overloading allows the function to be executed to be selected
at compile time based on the number of arguments provided to the
function call. For example,
1366
1367
1368 function g() { print("first function") }
1369 function g(x) { print("second function") }
1370 g() -> "first function"
1371 g(1) -> "second function"
1372
1373
Note that runtime overloading does not occur in the above example, as
exactly one function will be resolved for the function call. The use of
a next statement inside a function while no more overloads remain will
trigger a runtime exception. Runtime overloading will only occur if the
functions have the same arity; functions with the same name but a
different number of parameters are completely unrelated.
1380
1381 Execution order is determined by a priority value which may be speci‐
1382 fied. If no explicit priority is specified, user script functions are
1383 given a higher priority than library functions. User script functions
1384 and library functions are assigned a default priority value of 0 and 1
1385 respectively. Functions with the same priority are executed in decla‐
1386 ration order. For example,
1387
1388
1389 function f():3 { if (condition) next; print("first function") }
1390 function f():1 { if (condition) next; print("second function") }
1391 function f():2 { print("third function") }
1392
1393
Since the second function has the highest priority, it is executed
first. The first function is never executed, as there are no "next"
statements in the third function to transfer execution.
1397
1398
1399 PRINTING
There is a set of function names that are specially treated by the
1401 translator. They format values for printing to the standard systemtap
1402 output stream in a more convenient way (note that data generated in the
1403 kernel module need to get transferred to user-space in order to get
1404 printed).
1405
1406 The sprint* variants return the formatted string instead of printing
1407 it.
1408
1409 print, sprint
1410 Print one or more values of any type, concatenated directly to‐
1411 gether.
1412
1413 println, sprintln
1414 Print values like print and sprint, but also append a newline.
1415
1416 printd, sprintd
1417 Take a string delimiter and two or more values of any type, and
1418 print the values with the delimiter interposed. The delimiter
1419 must be a literal string constant.
1420
1421 printdln, sprintdln
1422 Print values with a delimiter like printd and sprintd, but also
1423 append a newline.
1424
1425 printf, sprintf
1426 Take a formatting string and a number of values of corresponding
1427 types, and print them all. The format must be a literal string
1428 constant.
1429
The printf formatting directives are similar to those of C, except that
1431 they are fully type-checked by the translator:
1432
1433 %b Writes a binary blob of the value given, instead of ASCII
1434 text. The width specifier determines the number of bytes
1435 to write; valid specifiers are %b %1b %2b %4b %8b. De‐
1436 fault (%b) is 8 bytes.
1437
1438 %c Character.
1439
1440 %d,%i Signed decimal.
1441
1442 %m Safely reads kernel (without #) or user (with #) memory
1443 at the given address, outputs its content. The optional
1444 precision specifier (not field width) determines the num‐
1445 ber of bytes to read - default is 1 byte. %10.4m prints
1446 4 bytes of the memory in a 10-character-wide field.
1447 Note, on some architectures user memory can still be read
1448 without #.
1449
1450 %M Same as %m, but outputs in hexadecimal. The minimal size
1451 of output is double the optional precision specifier -
1452 default is 1 byte (2 hex chars). %10.4M prints 4 bytes
1453 of the memory as 8 hexadecimal characters in a 10-charac‐
1454 ter-wide field. %.*M hex-dumps a given number of bytes
1455 from a given buffer.
1456
1457 %o Unsigned octal.
1458
1459 %p Unsigned pointer address.
1460
1461 %s String.
1462
1463 %u Unsigned decimal.
1464
1465 %x Unsigned hex value, in all lower-case.
1466
1467 %X Unsigned hex value, in all upper-case.
1468
1469 %% Writes a %.
1470
1471 The # flag selects the alternate forms. For octal, this prefixes a 0.
1472 For hex, this prefixes 0x or 0X, depending on case. For characters,
1473 this escapes non-printing values with either C-like escapes or raw oc‐
1474 tal. In the case of %#m/%#M, this safely accesses user space memory
1475 rather than kernel space memory.
1476
1477 Examples:
1478
1479 a = "alice", b = "bob", p = 0x1234abcd, i = 123, j = -1, id[a] = 1234, id[b] = 4567
1480 print("hello")
1481 Prints: hello
1482 println(b)
1483 Prints: bob\n
1484 println(a . " is " . sprint(16))
1485 Prints: alice is 16
1486 foreach (name in id) printdln("|", strlen(name), name, id[name])
1487 Prints: 5|alice|1234\n3|bob|4567
1488 printf("%c is %s; %x or %X or %p; %d or %u\n",97,a,p,p,p,j,j)
1489 Prints: a is alice; 1234abcd or 1234ABCD or 0x1234abcd; -1 or 18446744073709551615\n
1490 printf("2 bytes of kernel buffer at address %p: %2m", p, p)
Prints: 2 bytes of kernel buffer at address 0x1234abcd: <binary data>
1492 printf("%4b", p)
1493 Prints (these values as binary data): 0x1234abcd
1494 printf("%#o %#x %#X\n", 1, 2, 3)
1495 Prints: 01 0x2 0X3
1496 printf("%#c %#c %#c\n", 0, 9, 42)
1497 Prints: \000 \t *
1498
1499
1500
1501 STATISTICS
It is often desirable to collect statistics in a way that avoids the
penalties of repeatedly taking exclusive locks on the global variables
those numbers are being put into. Systemtap provides a solution using a
special operator to accumulate values, and several pseudo-functions to
extract the statistical aggregates.
1507
1508 The aggregation operator is <<<, and resembles an assignment, or a C++
1509 output-streaming operation. The left operand specifies a scalar or ar‐
1510 ray-index lvalue, which must be declared global. The right operand is
1511 a numeric expression. The meaning is intuitive: add the given number
1512 to the pile of numbers to compute statistics of. (The specific list of
1513 statistics to gather is given separately, by the extraction functions.)
1514
1515 foo <<< 1
1516 stats[pid()] <<< memsize
1517
1518
1519 The extraction functions are also special. For each appearance of a
1520 distinct extraction function operating on a given identifier, the
1521 translator arranges to compute a set of statistics that satisfy it.
1522 The statistics system is thereby "on-demand". Each execution of an ex‐
1523 traction function causes the aggregation to be computed for that moment
1524 across all processors.
1525
1526 Here is the set of extractor functions. The first argument of each is
1527 the same style of lvalue used on the left hand side of the accumulate
1528 operation. The @count(v), @sum(v), @min(v), @max(v), @avg(v), @vari‐
1529 ance(v[, b]) extractor functions compute the number/total/minimum/maxi‐
1530 mum/average/variance of all accumulated values. The resulting values
1531 are all simple integers. Arrays containing aggregates may be sorted
1532 and iterated. See the foreach construct above.
1533
1534 Variance uses Welford's online algorithm. The calculations are based
1535 on integer arithmetic, and so may suffer from low precision and over‐
1536 flow. To improve this, @variance(v[, b]) accepts an optional parameter
1537 b, the bit-shift, ranging from 0 (default) to 62, for internal scaling.
1538 Only one bit-shift value may be used with a given global variable. A
1539 larger bit-shift value increases precision, but also increases the
1540 likelihood of overflow.
1541
1542
1543 $ stap -e \
1544 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x)) }'
1545 12
1546 $ stap -e \
1547 > 'global x probe oneshot { for(i=1;i<=5;i++) x<<<i println(@variance(x,1)) }'
1548 2
1549 $ python3 -c 'import statistics; print(statistics.variance([1, 2, 3, 4, 5]))'
1550 2.5
1551 $
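
For reference, the textbook form of Welford's online algorithm can be sketched in Python as follows. This version uses floating point and the sample (n - 1) denominator, both of which are assumptions made for illustration; systemtap's integer, bit-shifted implementation will not generally produce identical results.

```python
def welford_variance(values):
    """Textbook Welford update: one pass, no stored samples."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    # Sample variance; defined as 0.0 here when fewer than two values exist.
    return m2 / (count - 1) if count > 1 else 0.0


print(welford_variance([1, 2, 3, 4, 5]))  # 2.5
```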
1552
1553
1554 Overflow (from internal multiplication of large numbers) may occur and
1555 may cause a negative variance result. Consider normalizing your input
1556 data. Adding or subtracting a fixed value from all variance inputs
1557 preserves the original variance. Dividing the variance inputs by a
1558 fixed value shrinks the original variance by that value squared.
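
The two normalization properties can be checked with exact rational arithmetic. The sketch below is plain Python, independent of systemtap; the function name and the sample data are chosen only for illustration.

```python
from fractions import Fraction


def sample_variance(values):
    """Exact sample variance (n - 1 denominator) using rational arithmetic."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / (n - 1)


data = [1_000_001, 1_000_002, 1_000_003, 1_000_004, 1_000_005]
shifted = [v - 1_000_000 for v in data]    # subtract a fixed value
scaled = [Fraction(v, 10) for v in data]   # divide by a fixed value

print(sample_variance(data))     # 5/2
print(sample_variance(shifted))  # 5/2  (unchanged by the offset)
print(sample_variance(scaled))   # 1/40 (5/2 divided by 10 squared)
```

Subtracting an offset close to the expected mean keeps the intermediate products small, which is the point of normalizing before accumulating.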
1559
1560
1561
1562 Histograms are also available, but are more complicated because they
1563 have a vector rather than scalar value. @hist_linear(v,start,stop,in‐
1564 terval) represents a linear histogram from "start" to "stop" (inclu‐
1565 sive) by increments of "interval". The interval must be positive. Sim‐
1566 ilarly, @hist_log(v) represents a base-2 logarithmic histogram. Print‐
1567 ing a histogram with the print family of functions renders a histogram
1568 object as a tabular "ASCII art" bar chart.
1569
1570
1571 probe timer.profile {
1572 x[1] <<< pid()
1573 x[2] <<< uid()
1574 y <<< tid()
1575 }
1576 global x // an array containing aggregates
1577 global y // a scalar
1578 probe end {
1579 foreach ([i] in x @count+) {
1580 printf ("x[%d]: avg %d = sum %d / count %d\n",
1581 i, @avg(x[i]), @sum(x[i]), @count(x[i]))
1582 println (@hist_log(x[i]))
1583 }
1584 println ("y:")
1585 println (@hist_log(y))
1586 }
1587
1588
1589 The counts of each histogram bucket may be individually accessed via
1590 the [index] operator. Each bucket is addressed from 1 through N (for
1591 each natural bucket). In addition bucket #0 counts all the samples be‐
1592 neath the start value, and bucket #N+1 counts all the samples above the
1593 stop value. Histogram buckets (including the two out-of-range buckets)
1594 may also be iterated with foreach.
1595
1596
1597 global x
1598 probe oneshot {
1599 x <<< -100
1600 x <<< 1
1601 x <<< 2
1602 x <<< 3
1603 x <<< 100
1604 foreach (bucket in @hist_linear(x,1,3,1))
1605 // expecting 1 out-of-range-low bucket
1606 // 3 payload buckets
1607 // 1 out-of-range-high bucket
1608 printf("bucket %d count %d\n",
1609 bucket, @hist_linear(x,1,3,1)[bucket])
1610 }
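
The bucket numbering described above can be modelled in a few lines of Python. This is a sketch of the indexing rule as stated in the text (bucket 0 below start, buckets 1 through N in range, bucket N+1 above stop), not systemtap's implementation; the function name is hypothetical, and the handling of a range that is not a whole multiple of the interval is an assumption.

```python
def hist_linear_counts(samples, start, stop, interval):
    """Count samples into buckets 0..N+1 following the rule in the text."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    n = (stop - start) // interval + 1      # natural buckets, stop inclusive
    counts = [0] * (n + 2)
    for s in samples:
        if s < start:
            counts[0] += 1                  # out-of-range-low bucket
        elif s > stop:
            counts[n + 1] += 1              # out-of-range-high bucket
        else:
            counts[(s - start) // interval + 1] += 1
    return counts


# Same samples and parameters as the script above.
print(hist_linear_counts([-100, 1, 2, 3, 100], 1, 3, 1))  # [1, 1, 1, 1, 1]
```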
1611
1612
1613
1614 TYPECASTING
1615 Once a pointer (see the CONTEXT VARIABLES section of stapprobes(3stap))
1616 has been saved into a script integer variable, the translator attempts
1617 to keep the type information necessary to access members from that
1618 pointer.
1619
1620 The translator attempts to track DWARF typing associated with script
1621 variables assigned from addresses of context $variables, @cast or @var
1622 operators. Depending on the complexity of the script code, this asso‐
1623 ciation may pass to related variables, so that -> and [] operators may
1624 be used on them, just as on the original context variable. For exam‐
1625 ple:
1626
1627
1628 foo = $param->foo; printf("x:%d y:%d\n", foo->x, foo->y)
1629 printf("my value is %d\n", ($type == 42 ? $foo : $bar)->value)
1630 printf("my parent pid is %d\n", task_parent(task_current())->tgid)
1631
1632
1633 However, if this association heuristic doesn't work for a script, using
1634 the @cast() operator tells the translator how to interpret the number
1635 as a typed pointer.
1636
1637 @cast(p, "type_name"[, "module"])->member
1638
1639
1640 This will interpret p as a pointer to a struct/union named type_name
1641 and dereference the member value. Further ->subfield expressions may
1642 be appended to dereference more levels. Note that to dereference a
1643 pointer directly, {kernel,user}_{char,int,...}($p) should be used.
1644 (Refer to stapfuncs(5) for more details.) NOTE: the same dereferencing
1645 operator -> is used to refer to both direct containment and pointer
1646 indirection. Systemtap automatically determines which. The optional
1647 module tells the translator where to look for information about that
1648 type. Multiple modules may be specified as a list with : separators.
1649 If the module is not specified, it will default either to the probe
1650 module for dwarf probes, or to "kernel" for functions and all other
1651 probe types.
1652
1653 Previously, up to systemtap version 4.2, "kernel" was inferred if
1654 unspecified. Use --compatible=4.2 to restore that earlier default.
1655
1656 The translator can create its own module with type information from a
1657 header surrounded by angle brackets, in case normal debuginfo is not
1658 available. For kernel headers, prefix it with "kernel" to use the ap‐
1659 propriate build system. All other headers are built with default GCC
1660 parameters into a user module. Multiple headers may be specified in
1661 sequence to resolve a codependency.
1662
1663 @cast(tv, "timeval", "<sys/time.h>")->tv_sec
1664 @cast(task, "task_struct", "kernel<linux/sched.h>")->tgid
1665 @cast(task, "task_struct",
1666 "kernel<linux/sched.h><linux/fs_struct.h>")->fs->umask
1667
1668 Values acquired by @cast may be pretty-printed by the $ and $$ suffix
1669 operators, the same way as described in the CONTEXT VARIABLES section
1670 of the stapprobes(3stap) manual page.
1671
1672
1673 When in guru mode, the translator will also allow scripts to assign new
1674 values to members of typecasted pointers.
1675
1676 Typecasting is also useful in the case of void* members whose type may
1677 be determinable at runtime.
1678
1679 probe foo {
1680 if ($var->type == 1) {
1681 value = @cast($var->data, "type1")->bar
1682 } else {
1683 value = @cast($var->data, "type2")->baz
1684 }
1685 print(value)
1686 }
1687
1688
1689
1690 EMBEDDED C
1691 When in guru mode, the translator accepts embedded C code in the top
1692 level of the script. Such code is enclosed between %{ and %} markers,
1693 and is transcribed verbatim, without analysis, in some sequence, into
1694 the top level of the generated C code. At the outermost level, this
1695 may be useful to add #include instructions, and any auxiliary defini‐
1696 tions for use by other embedded code.
1697
1698 Another place where embedded code is permitted is as a function body.
1699 In this case, the script language body is replaced entirely by a piece
1700 of C code enclosed again between %{ and %} markers. This C code may do
1701 anything reasonable and safe. There are a number of undocumented but
1702 complex safety constraints on atomicity, concurrency, resource consump‐
1703 tion, and run time limits, so this is an advanced technique.
1704
1705 The memory locations set aside for input and output values are made
1706 available to it using macros STAP_ARG_* and STAP_RETVALUE. Errors may
1707 be signalled with STAP_ERROR. Output may be written with STAP_PRINTF.
1708 The function may return early with STAP_RETURN. Here are some exam‐
1709 ples:
1710
1711 function integer_ops (val) %{
1712 STAP_PRINTF("%d\n", STAP_ARG_val);
1713 STAP_RETVALUE = STAP_ARG_val + 1;
1714 if (STAP_RETVALUE == 4)
1715 STAP_ERROR("wrong guess: %d", (int) STAP_RETVALUE);
1716 if (STAP_RETVALUE == 3)
1717 STAP_RETURN(0);
1718 STAP_RETVALUE ++;
1719 %}
1720 function string_ops (val) %{
1721 strlcpy (STAP_RETVALUE, STAP_ARG_val, MAXSTRINGLEN);
1722 strlcat (STAP_RETVALUE, "one", MAXSTRINGLEN);
1723 if (strcmp (STAP_RETVALUE, "three-two-one"))
1724 STAP_RETURN("parameter should be three-two-");
1725 %}
1726 function no_ops () %{
1727 STAP_RETURN(); /* function inferred with no return value */
1728 %}
1729
1730 The function argument and return value types have to be inferred by the
1731 translator from the call sites in order for this to work. The user
1732 should examine C code generated for ordinary script-language functions
1733 in order to write compatible embedded-C ones.
1734
1735 The last place where embedded code is permitted is as an expression
1736 rvalue. In this case, the C code enclosed between %{ and %} markers is
1737 interpreted as an ordinary expression value. It is assumed to be a
1738 normal 64-bit signed number, unless the marker /* string */ is includ‐
1739 ed, in which case it's treated as a string.
1740
1741 function add_one (val) {
1742 return val + %{ 1 %}
1743 }
1744 function add_string_two (val) {
1745 return val . %{ /* string */ "two" %}
1746 }
1747 @define SOME_STAP_MACRO %( %{ SOME_C_MACRO %} %)
1748 probe begin {
1749 printf("SOME_C_MACRO has value: %d\n", @SOME_STAP_MACRO);
1750 }
1751
1752
1753 The embedded-C code may contain markers to assert optimization and
1754 safety properties.
1755
1756 /* pure */
1757 means that the C code has no side effects and may be elided en‐
1758 tirely if its value is not used by script code.
1759
1760 /* stable */
1761 means that the C code always has the same value (in any given
1762 probe handler invocation), so repeated calls may be automatical‐
1763 ly replaced by memoized values. Such functions must take no pa‐
1764 rameters, and also be pure.
1765
1766 /* unprivileged */
1767 means that the C code is so safe that even unprivileged users
1768 are permitted to use it.
1769
1770 /* myproc-unprivileged */
1771 means that the C code is so safe that even unprivileged users
1772 are permitted to use it, provided that the target of the current
1773 probe is within the user's own process.
1774
1775 /* guru */
1776 means that the C code is so unsafe that a systemtap user must
1777 specify -g (guru mode) to use this. (Tapsets are permitted and
1778 presumed to call them safely.)
1779
1780 /* unmangled */
1781 in an embedded-C function, means that the legacy (pre-1.8) argu‐
1782 ment access syntax should be made available inside the function.
1783 Hence, in addition to STAP_ARG_foo and STAP_RETVALUE one can use
1784 THIS->foo and THIS->__retvalue respectively inside the function.
1785 This is useful for quickly migrating code written for SystemTap
1786 version 1.7 and earlier.
1787
1788 /* unmodified-fnargs */
1789 in an embedded-C function, means that the function arguments are
1790 not modified inside the function body.
1791
1792 /* string */
1793 in embedded-C expressions only, means that the expression has
1794 const char * type and should be treated as a string value, in‐
1795 stead of the default long numeric.
1796
1797 Script level global variables may be accessed in embedded-C functions
1798 and blocks. To read or write the global variable var, the
1799 /* pragma:read:var */ or /* pragma:write:var */ marker must first be placed in
1800 the embedded-C function or block. This provides the STAP_GLOBAL_GET_*
1801 and STAP_GLOBAL_SET_* macros to allow reading and writing,
1802 respectively. For example:
1803
1804 global var
1805 global var2[100]
1806 function increment() %{
1807 /* pragma:read:var */ /* pragma:write:var */
1808 /* pragma:read:var2 */ /* pragma:write:var2 */
1809 STAP_GLOBAL_SET_var(STAP_GLOBAL_GET_var()+1); //var++
1810 STAP_GLOBAL_SET_var2(1, 1, STAP_GLOBAL_GET_var2(1, 1)+1); //var2[1,1]++
1811 %}
1812
1813 Variables may be read and set in both embedded-C functions and expres‐
1814 sions. Strings returned from embedded-C code are decayed to pointers.
1815 Variables must also be assigned at script level to allow for type in‐
1816 ference. Map assignment does not return the value written, so chaining
1817 does not work.
1818
1819
1820 BUILT-INS
1821 A set of built-in probe point aliases is provided by the scripts
1822 installed in the directory specified in the stappaths(7) manual page.
1823 The functions are described in the stapprobes(3stap) manual page.
1824
1825
1826 DEREFERENCING
1827 Integers can be dereferenced from pointers saved as script integer
1828 variables using the @kderef() or @uderef() operators. @kderef() is
1829 used for kernel space addresses and @uderef() is used for user space
1830 addresses.
1831
1832 @kderef(SIZE, addr)
1833 @uderef(SIZE, addr)
1834
1835 This will interpret addr as a kernel/user address and read SIZE bytes
1836 starting at that address. SIZE should be either 1, 2, 4 or 8 bytes.
1837
1838
1839 REGISTERS
1840 The value stored within a register can be accessed using the @kregis‐
1841 ter() or @uregister() operators. @kregister() is used for kernel space
1842 registers and @uregister() is used for user space registers. The regis‐
1843 ter of interest is specified using its DWARF number.
1844
1845 @kregister(0)
1846 @uregister(5)
1847
1848
1849 PROCESSING
1850 The translator begins pass 1 by parsing the given input script, and all
1851 scripts (files named *.stp) found in a tapset directory. The
1852 directories listed with -I are processed in sequence, each processed in
1853 "guru mode". For each directory, a number of subdirectories are also
1854 searched. These subdirectories are derived from the selected kernel
1855 version (the -r option), in order to allow more kernel-version-specific
1856 scripts to override less specific ones. For example, for a kernel
1857 version 2.6.12-23.FC3 the following patterns would be searched, in
1858 sequence: 2.6.12-23.FC3/*.stp, 2.6.12/*.stp, 2.6/*.stp, and finally
1859 *.stp. Stopping the translator after pass 1 causes it to print the
1860 parse trees.
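
The search order in the example above can be reproduced with a short Python sketch. The function name is hypothetical, and the rule used here (strip the part after the first "-", then drop one trailing dotted component at a time) is inferred from that single example; how the translator treats other release strings is an assumption.

```python
def tapset_search_patterns(kernel_release):
    """Reproduce the pattern order from the example (a sketch only)."""
    version = kernel_release.split("-", 1)[0]   # "2.6.12-23.FC3" -> "2.6.12"
    parts = version.split(".")
    prefixes = [kernel_release]
    # Most specific first: 2.6.12, then 2.6.
    for n in range(len(parts), 1, -1):
        prefix = ".".join(parts[:n])
        if prefix not in prefixes:
            prefixes.append(prefix)
    return [p + "/*.stp" for p in prefixes] + ["*.stp"]


for pattern in tapset_search_patterns("2.6.12-23.FC3"):
    print(pattern)
# 2.6.12-23.FC3/*.stp
# 2.6.12/*.stp
# 2.6/*.stp
# *.stp
```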
1861
1862
1863 In pass 2, the translator analyzes the input script to resolve symbols
1864 and types. References to variables, functions, and probe aliases that
1865 are unresolved internally are satisfied by searching through the parsed
1866 tapset script files. If any tapset script file is selected because it
1867 defines an unresolved symbol, then the entirety of that file is added
1868 to the translator's resolution queue. This process iterates until all
1869 symbols are resolved and a subset of tapset script files is selected.
1870
1871 Next, all probe point descriptions are validated against the wide
1872 variety supported by the translator. Probe points that refer to code
1873 locations ("synchronous probe points") require the appropriate kernel
1874 debugging information to be installed. In the associated probe
1875 handlers, target-side variables (whose names begin with "$") are found
1876 and have their run-time locations decoded.
1877
1878 Next, all probes and functions are analyzed for optimization
1879 opportunities, in order to remove variables, expressions, and functions
1880 that have no useful value and no side-effect. Embedded-C functions are
1881 assumed to have side-effects unless they include the magic string
1882 /* pure */. Since this optimization can hide latent code errors such
1883 as type mismatches or invalid $context variables, it sometimes may be
1884 useful to disable the optimizations with the -u option.
1885
1886 Finally, all variable, function, parameter, array, and index types are
1887 inferred from context (literals and operators). Stopping the
1888 translator after pass 2 causes it to list all the probes, functions,
1889 and variables, along with all inferred types. Any inconsistent or
1890 unresolved types cause an error.
1891
1892
1893 In pass 3, the translator writes C code that represents the actions of
1894 all selected script files, and creates a Makefile to build that into a
1895 kernel object. These files are placed into a temporary directory.
1896 Stopping the translator at this point causes it to print the contents
1897 of the C file.
1898
1899
1900 In pass 4, the translator invokes the Linux kernel build system to
1901 create the actual kernel object file. This involves running make in
1902 the temporary directory, and requires a kernel module build system
1903 (headers, config and Makefiles) to be installed in the usual spot
1904 /lib/modules/VERSION/build. Stopping the translator after pass 4 is
1905 the last chance before running the kernel object. This may be useful
1906 if you want to archive the file.
1907
1908
1909 In pass 5, the translator invokes the systemtap auxiliary program
1910 staprun for the given kernel object. This program arranges to
1911 load the module then communicates with it, copying trace data from the
1912 kernel into temporary files, until the user sends an interrupt signal.
1913 Any run-time error encountered by the probe handlers, such as running
1914 out of memory, division by zero, exceeding nesting or runtime limits,
1915 results in a soft error indication. Soft errors in excess of MAXERRORS
1916 block all subsequent probes (except error-handling probes), and
1917 terminate the session. Finally, staprun unloads the module, and cleans
1918 up.
1919
1920
1921 ABNORMAL TERMINATION
1922 One should avoid killing the stap process forcibly, for example with
1923 SIGKILL, because the stapio process (a child process of the stap
1924 process) and the loaded module may be left running on the system. If
1925 this happens, send SIGTERM or SIGINT to any remaining stapio processes,
1926 then use rmmod to unload the systemtap module.
1927
1928
1929
1930 EXAMPLES
1931 See the stapex(3stap) manual page for a brief collection of samples, or
1932 a large set of installed samples under the systemtap
1933 documentation/testsuite directories. See stappaths(7stap) for the
1934 likely location of these on the system.
1935
1936
1937 CACHING
1938 The systemtap translator caches the pass 3 output (the generated C
1939 code) and the pass 4 output (the compiled kernel module) if pass 4
1940 completes successfully. This cached output is reused if the same
1941 script is translated again assuming the same conditions exist (same
1942 kernel version, same systemtap version, etc.). Cached files are stored
1943 in the $SYSTEMTAP_DIR/cache directory. The cache can be limited by
1944 having the file cache_mb_limit placed in the cache directory (shown
1945 above) containing only an ASCII integer representing how many MiB the
1946 cache should not exceed. In the absence of this file, a default will be
1947 created with the limit set to 256MiB. This is a 'soft' limit in that
1948 the cache will be cleaned after a new entry is added if the cache clean
1949 interval is exceeded, so the total cache size may temporarily exceed
1950 this limit. This interval can be specified by having the file
1951 cache_clean_interval_s placed in the cache directory (shown above)
1952 containing only an ASCII integer representing the interval in seconds.
1953 In the absence of this file, a default will be created with the
1954 interval set to 300 s.
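
The two control files are plain text, so they can be written by any tool. The Python sketch below writes both into a scratch directory rather than a real cache; the helper name and the values 512 and 600 are illustrative assumptions. To apply it for real, the directory argument would be the cache directory under $SYSTEMTAP_DIR.

```python
import tempfile
from pathlib import Path


def write_cache_limits(cache_dir, mb_limit, clean_interval_s):
    """Write the two control files, each holding only an ASCII integer."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / "cache_mb_limit").write_text(
        str(int(mb_limit)), encoding="ascii")
    (cache_dir / "cache_clean_interval_s").write_text(
        str(int(clean_interval_s)), encoding="ascii")


# Demonstrate against a scratch directory, not a live systemtap cache.
demo_dir = Path(tempfile.mkdtemp()) / "cache"
write_cache_limits(demo_dir, 512, 600)
print((demo_dir / "cache_mb_limit").read_text())          # 512
print((demo_dir / "cache_clean_interval_s").read_text())  # 600
```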
1955
1956
1957 SAFETY AND SECURITY
1958 Systemtap may be used as a powerful administrative tool. It can expose
1959 kernel internal data structures and potentially private user
1960 information. (In dyninst runtime mode, this is not the case, see the
1961 ALTERNATE RUNTIMES section below.)
1962
1963 The translator asserts many safety constraints during compilation and
1964 more during run-time. It aims to ensure that no handler routine can
1965 run for very long, allocate boundless memory, perform unsafe
1966 operations, or unintentionally interfere with the system. Uses of
1967 script global variables are automatically read/write locked as
1968 appropriate, to protect against manipulation by concurrent probe
1969 handlers. Locks are taken so as to run the global-variable
1970 manipulation portion of probe handlers atomically (locks are taken all-
1971 or-none). Deadlocks are detected with timeouts. Use the -t flag to
1972 receive reports of excessive lock contention. Experimenting with
1973 scripts is therefore generally safe. The guru-mode -g option allows
1974 administrators to bypass most safety measures, which permits invasive
1975 or state-changing operations and embedded-C code, but increases the risk
1976 of upset. By default, overload prevention is turned on for all
1977 modules. If you would like to disable overload processing, use the
1978 --suppress-time-limits option.
1979
1980 Errors that are caught at run time normally result in a clean script
1981 shutdown and a pass-5 error message. The --suppress-handler-errors
1982 option lets scripts tolerate soft errors without shutting down.
1983
1984
1985
1986 PERMISSIONS
1987 For the normal linux-kernel-module runtime, to run the kernel objects
1988 systemtap builds, a user must be one of the following:
1989
1990 • the root user;
1991
1992 • a member of the stapdev and stapusr groups;
1993
1994 • a member of the stapsys and stapusr groups; or
1995
1996 • a member of the stapusr group.
1997
1998 The root user or a user who is a member of both the stapdev and stapusr
1999 groups can build and run any systemtap script.
2000
2001 A user who is a member of both the stapsys and stapusr groups can only
2002 use pre-built modules under the following conditions:
2003
2004 • The module has been signed by a trusted signer. Trusted signers are
2005 normally systemtap compile-servers which sign modules when the
2006 --privilege option is specified by the client. See the
2007 stap-server(8) manual page for more information.
2008
2009 • The module was built using the --privilege=stapsys or the
2010 --privilege=stapusr options.
2011
2012 Members of only the stapusr group can only use pre-built modules under
2013 the following conditions:
2014
2015 • The module is located in the /lib/modules/VERSION/systemtap
2016 directory. This directory must be owned by root and not be world
2017 writable.
2018
2019 or
2020
2021 • The module has been signed by a trusted signer. Trusted signers are
2022 normally systemtap compile-servers which sign modules when the
2023 --privilege option is specified by the client. See the
2024 stap-server(8) manual page for more information.
2025
2026 • The module was built using the --privilege=stapusr option.
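
The rules listed above can be summarized as a small decision function. The Python sketch below is one reading of this section, written for illustration; the function and parameter names are hypothetical, and it does not reproduce the checks that staprun itself performs.

```python
def may_run_module(groups, is_root=False, in_system_dir=False,
                   trusted_signature=False, built_privilege=None):
    """Return True if the rules listed above would allow running a module.

    built_privilege is the --privilege level the module was built with
    ("stapsys" or "stapusr"), or None for an ordinary build.
    in_system_dir means the module is in /lib/modules/VERSION/systemtap.
    """
    groups = set(groups)
    if is_root or {"stapdev", "stapusr"} <= groups:
        return True   # may build and run any script
    if {"stapsys", "stapusr"} <= groups:
        return trusted_signature and built_privilege in ("stapsys", "stapusr")
    if "stapusr" in groups:
        return in_system_dir or (
            trusted_signature and built_privilege == "stapusr")
    return False


print(may_run_module({"stapdev", "stapusr"}))                       # True
print(may_run_module({"stapusr"}, trusted_signature=True,
                     built_privilege="stapsys"))                    # False
print(may_run_module({"stapusr"}, in_system_dir=True))              # True
```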
2027
2028 The kernel modules generated by the stap program are run by the staprun
2029 program. The latter is a part of the Systemtap package, dedicated to
2030 module loading and unloading (but only in the white zone), and kernel-
2031 to-user data transfer. Since staprun does not perform any additional
2032 security checks on the kernel objects it is given, it would be unwise
2033 for a system administrator to add untrusted users to the stapdev or
2034 stapusr groups.
2035
2036
2037 SECUREBOOT
2038 If the current system has SecureBoot turned on in the UEFI firmware,
2039 all kernel modules must be signed. (Some kernels may allow disabling
2040 SecureBoot long after booting with a key sequence such as SysRq-X,
2041 making it unnecessary to sign modules.) There are two ways to sign a
2042 systemtap module. The systemtap compile server can sign modules with a
2043 MOK (Machine Owner Key) that it has in common with a client system.
2044 For example:
2045
2046 stap --use-server=HOSTNAME:PORT -e 'SCRIPT'
2047 # If there is no mok key in common with the server's systemtap mok key
2048 # list and the client's mok database then the user is directed by stap
2049 # to invoke:
2050 sudo mokutil --import signing_key.x509
2051 # then after rebooting the system:
2052 stap --use-server=HOSTNAME:PORT -e 'SCRIPT'
2053 # will use the server to build and sign the module and the module will run
2054 # on the client
2055
2056 Another way to sign modules is to use the stap --sign-module option,
2057 which uses a MOK on the client system without using a server. For ex‐
2058 ample:
2059
2060 stap --sign-module -e 'SCRIPT'
2061 # If there is no systemtap mok key in the system mok database
2062 # then the user is directed by stap to invoke:
2063 sudo mokutil --import /home/USER/.systemtap/ssl/server/moks/FINGERPRINT/signing_key.x509
2064 # then after rebooting the system:
2065 stap --sign-module -e 'SCRIPT'
2066 # will sign and run the module
2067
2068
2069 See the following wiki page for more details:
2070
2071 https://sourceware.org/systemtap/wiki/SecureBoot
2072
2073 Some kernels do not let systemtap guess whether module signing
2074 is in effect. On such machines, set the SYSTEMTAP_SIGN environment
2075 variable to any value while running stap.
2076
2077
2078 RESOURCE LIMITS
2079 Many resource use limits are set by macros in the generated C code.
2080 These may be overridden with -D flags. A selection of these is as fol‐
2081 lows:
2082
2083 MAXNESTING
2084 Maximum number of nested function calls. Default determined by
2085 script analysis, with a bonus 10 slots added for recursive
2086 scripts.
2087
2088 MAXSTRINGLEN
2089 Maximum length of strings, default 128.
2090
2091 MAXTRYLOCK
2092 Maximum number of iterations to wait for locks on global vari‐
2093 ables before declaring possible deadlock and skipping the probe,
2094 default 1000.
2095
2096 MAXACTION
2097 Maximum number of statements to execute during any single probe
2098 hit (with interrupts disabled), default 1000. Note that for
2099 straight-through probe handlers lacking loops or recursion, due
2100 to optimization, this parameter may be interpreted too conserva‐
2101 tively.
2102
2103 MAXACTION_INTERRUPTIBLE
2104 Maximum number of statements to execute during any single probe
2105 hit which is executed with interrupts enabled (such as begin/end
2106 probes), default (MAXACTION * 10).
2107
2108 MAXBACKTRACE
2109 Maximum number of stack frames that will be processed by the
2110 stap runtime unwinder as produced by the backtrace functions in
2111 the [u]context-unwind.stp tapsets, default 20.
2112
2113 MAXMAPENTRIES
2114 Maximum number of rows in any single global array, default 2048.
2115 Individual arrays may be declared with a larger or smaller limit
2116 instead:
2117
2118 global big[10000],little[5]
2119
2120 or denoted with % to make them wrap-around (replace old entries)
2121 automatically, as in
2122
2123 global big%
2124
2125 or both.
2126
2127 MAPHASHBIAS
2128 The number of powers-of-two to add or subtract from the natural
2129 size of the hash table backing each global associative array.
2130 Default is 0. Try small positive numbers to get extra perfor‐
2131 mance at the cost of more memory consumption, because that
2132 should reduce hash table collisions. Try small negative numbers
2133 for the opposite tradeoff.
2134
2135 MAXERRORS
2136 Maximum number of soft errors before an exit is triggered, de‐
2137 fault 0, which means that the first error will exit the script.
2138 Note that with the --suppress-handler-errors option, this limit
2139 is not enforced.
2140
2141 MAXSKIPPED
2142 Maximum number of skipped probes before an exit is triggered,
2143 default 100. Running systemtap with -t (timing) mode gives more
2144 details about skipped probes. With the default -DINTERRUPT‐
2145 IBLE=1 setting, probes skipped due to reentrancy are not accumu‐
2146 lated against this limit. Note that with the --suppress-han‐
2147 dler-errors option, this limit is not enforced.
2148
2149 MINSTACKSPACE
2150 Minimum number of free kernel stack bytes required in order to
2151 run a probe handler, default 1024. This number should be large
2152 enough for the probe handler's own needs, plus a safety margin.
2153
2154 MAXUPROBES
2155 Maximum number of concurrently armed user-space probes (up‐
2156 robes), default somewhat larger than the number of user-space
2157 probe points named in the script. This pool needs to be poten‐
2158 tially large because individual uprobe objects (about 64 bytes
2159 each) are allocated for each process for each matching script-
2160 level probe.
2161
2162 STP_MAXMEMORY
2163 Maximum amount of memory (in kilobytes) that the systemtap mod‐
2164 ule should use, default unlimited. The memory size includes the
2165 size of the module itself, plus any additional allocations.
2166 This only tracks direct allocations by the systemtap runtime.
2167 This does not track indirect allocations (as done by kprobes/up‐
2168 robes/etc. internals).
2169
2170 STP_OVERLOAD_THRESHOLD, STP_OVERLOAD_INTERVAL
2171 Maximum number of machine cycles spent in probes on any cpu per
2172 given interval, before an overload condition is declared and the
2173 script shut down. The defaults are 500 million and 1 billion,
2174 so as to limit stap script cpu consumption at around 50%.
2175
2176 STP_PROCFS_BUFSIZE
2177 Size of procfs probe read buffers (in bytes). Defaults to
2178 MAXSTRINGLEN. This value can be overridden on a per-procfs file
2179 basis using the procfs read probe .maxsize(MAXSIZE) parameter.
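
As a worked example of the list above, the following Python sketch computes the probe time budget implied by STP_OVERLOAD_THRESHOLD and STP_OVERLOAD_INTERVAL, and formats -D overrides for the command line. The helper names, the 25% budget, the value 4096, and the file name script.stp are all illustrative assumptions.

```python
def define_flags(**limits):
    """Format NAME=value pairs as -D options for the stap command line."""
    return [f"-D{name}={value}" for name, value in limits.items()]


def overload_fraction(threshold_cycles, interval_cycles):
    """Fraction of each interval that probes may consume before overload."""
    return threshold_cycles / interval_cycles


# Documented defaults: 500 million cycles per 1 billion cycle interval.
print(f"{overload_fraction(500_000_000, 1_000_000_000):.0%}")  # 50%

# Hypothetical override: a 25% budget and a larger default array size.
flags = define_flags(STP_OVERLOAD_THRESHOLD=250_000_000,
                     MAXMAPENTRIES=4096)
print(" ".join(["stap", *flags, "script.stp"]))
# stap -DSTP_OVERLOAD_THRESHOLD=250000000 -DMAXMAPENTRIES=4096 script.stp
```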
2180
2181 With scripts that contain probes on any interrupt path, it is possible
2182 that those interrupts may occur in the middle of another probe handler.
2183 The probe in the interrupt handler would be skipped in this case to
2184 avoid reentrance. To work around this issue, execute stap with the op‐
2185 tion -DINTERRUPTIBLE=0 to mask interrupts throughout the probe handler.
2186 This does add some extra overhead to the probes, but it may prevent
2187 reentrance for common problem cases. However, probes in NMI handlers
2188 and in the callpath of the stap runtime may still be skipped due to
2189 reentrance.
2190
2191
2192 In case something goes wrong with stap or staprun after a probe has al‐
2193 ready started running, one may safely kill both user processes, and re‐
2194 move the active probe kernel module with rmmod. Any pending trace mes‐
2195 sages may be lost.
2196
2197
2198 UNPRIVILEGED USERS
2199 Systemtap exposes kernel internal data structures and potentially
2200 private user information. Because of this, use of systemtap's full
2201 capabilities is restricted to root and to users who are members of the
2202 groups stapdev and stapusr.
2203
2204 However, a restricted set of systemtap's features can be made available
2205 to trusted, unprivileged users. These users are members of the group
2206 stapusr only, or members of the groups stapusr and stapsys. These
2207 users can load systemtap modules which have been compiled and certified
2208 by a trusted systemtap compile-server. See the descriptions of the op‐
2209 tions --privilege and --use-server. See README.unprivileged in the sys‐
2210 temtap source code for information about setting up a trusted compile
2211 server.
2212
2213 The restrictions enforced when --privilege=stapsys is specified are de‐
2214 signed to prevent unprivileged users from:
2215
2216 • harming the system maliciously.
2217
2218 The restrictions enforced when --privilege=stapusr is specified are de‐
2219 signed to prevent unprivileged users from:
2220
2221 • harming the system maliciously.
2222
2223 • gaining access to information which would not normally be
2224 available to an unprivileged user.
2225
2226 • disrupting the performance of processes owned by other users
2227 of the system. Some overhead to the system in general is
2228 unavoidable since the unprivileged user's probes will be
2229 triggered at the appropriate times. What we would like to
2230 avoid is targeted interruption of another user's processes
2231 which would not normally be possible by an unprivileged us‐
2232 er.
2233
2234
2235 PROBE RESTRICTIONS
2236 A member of the groups stapusr and stapsys may use all probe points.
2237
2238 A member of only the group stapusr may use only the following probes:
2239
2240 • begin, begin(n)
2241
2242 • end, end(n)
2243
2244 • error(n)
2245
2246 • never
2247
2248 • process.*, where the target process is owned by the user.
2249
2250 • timer.{jiffies,s,sec,ms,msec,us,usec,ns,nsec}(n)*
2251
2252 • timer.hz(n)
2253
2254
2255 SCRIPT LANGUAGE RESTRICTIONS
2256 The following scripting language features are unavailable to all un‐
2257 privileged users:
2258
2259
2260 • any feature enabled by the Guru Mode (-g) option.
2261
2262 • embedded C code.
2263
2264
2265 RUNTIME RESTRICTIONS
2266 The following runtime restrictions are placed upon all unprivileged
2267 users:
2268
2269 • Only the default runtime code (see -R) may be used.
2270
2271 Additional restrictions are placed on members of only the group sta‐
2272 pusr:
2273
2274 • Probing of processes owned by other users is not permitted.
2275
2276 • Access of kernel memory (read and write) is not permitted.
2277
2278
2279 COMMAND LINE OPTION RESTRICTIONS
2280 Some command line options provide access to features which must not be
2281 available to any unprivileged user:
2282
2283
2284 • -g may not be specified.
2285
2286 • The following options may not be used by the compile-server
2287 client:
2288
2289 -a, -B, -D, -I, -r, -R
2290
2291
2292
2293 ENVIRONMENT RESTRICTIONS
2294 The following environment variables must not be set by any
2295 unprivileged user:
2296
2297 SYSTEMTAP_RUNTIME
2298 SYSTEMTAP_TAPSET
2299 SYSTEMTAP_DEBUGINFO_PATH
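
Before invoking stap as an unprivileged user, it may help to check for these variables. The following is a small sketch; the function name stap_env_conflicts is hypothetical and is not something systemtap provides.

```shell
# Hypothetical helper: print the name of each restricted variable that
# is currently set. Returns non-zero if at least one of them is set.
stap_env_conflicts() {
    found=0
    for name in SYSTEMTAP_RUNTIME SYSTEMTAP_TAPSET SYSTEMTAP_DEBUGINFO_PATH; do
        # ${var+x} expands to "x" only when var is set (even if empty).
        if eval "[ -n \"\${$name+x}\" ]"; then
            echo "$name"
            found=1
        fi
    done
    return "$found"
}
```

Any names it prints can be cleared with unset before running stap.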
2300
2301
2302
2303 TAPSET RESTRICTIONS
2304 In general, tapset functions are only available for members of the
2305 group stapusr when they do not gather information that an ordinary pro‐
2306 gram running with that user's privileges would be denied access to.
2307
2308 There are two categories of unprivileged tapset functions. The first
2309 category consists of utility functions that are unconditionally avail‐
2310 able to all users; these include such things as:
2311
2312 cpu:long ()
2313 exit ()
2314 str_replace:string (prnt_str:string, srch_str:string, rplc_str:string)
2315
2316
2317 The second category consists of so-called myproc-unprivileged functions
2318 that can only gather information within their own processes. Scripts
2319 that wish to use these functions must test the result of the tapset
2320 function is_myproc and only call these functions if the result is 1.
2321 The script will exit immediately if any of these functions are called
2322 by an unprivileged user within a probe within a process which is not
2323 owned by that user. Examples of myproc-unprivileged functions include:
2324
2325 print_usyms (stk:string)
2326 user_int:long (addr:long)
2327 usymname:string (addr:long)
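
The is_myproc test described above can be sketched as follows. The shell function name myproc_guard_script is hypothetical; it prints a script that calls usymname only when is_myproc returns 1. The use of uaddr() to obtain an address is an assumption of this sketch, on the expectation that it is itself available to unprivileged users in this context.

```shell
# Hypothetical helper: print a systemtap script that guards a
# myproc-unprivileged function (usymname) with is_myproc.
myproc_guard_script() {
    cat <<'EOF'
probe process.function("main") {
    # Only call myproc-unprivileged functions in the user's own process.
    if (is_myproc() == 1) {
        println(usymname(uaddr()))
    }
}
EOF
}
```

Since the probe names no executable, a target would be supplied on the command line, for example myproc_guard_script | stap -c COMMAND - .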
2328
2329
2330 A compile error is triggered when any function not in either of the
2331 above categories is used by members of only the group stapusr.
2332
2333 No other built-in tapset functions may be used by members of only the
2334 group stapusr.
2335
2336
2338 As described above, systemtap's default runtime mode involves building
2339 and loading kernel modules, with various security tradeoffs presented.
2340 Systemtap now includes two new prototype backends: --runtime=dyninst
2341 and --runtime=bpf.
2342
2343 --runtime=dyninst uses Dyninst to instrument a user's own processes at
2344 runtime. This backend does not use kernel modules, and does not require
2345 root privileges, but is restricted with respect to the kinds of probes
2346 and other constructs that a script may use. The dyninst runtime operates
2347 in target-attach mode, so it requires a target process, given with
2348 -c COMMAND or -x PID. For example:
2349
2350 stap --runtime=dyninst -c 'stap -V' \
2351 -e 'probe process.function("main")
2352 { println("hi from dyninst!") }'
2353
2354
2355 It may be necessary to disable a conflicting SELinux check with
2356
2357 # setsebool allow_execstack 1
2358
2359
2360 --runtime=bpf compiles the user script into extended Berkeley Packet
2361 Filter (eBPF) programs instead of a kernel module. eBPF programs are
2362 verified by the kernel for safety and are executed by an in-kernel vir‐
2363 tual machine. This runtime is in an early stage of development and
2364 currently lacks support for a number of features available in the de‐
2365 fault runtime. Please see the stapbpf(8) man page for more information.
2366
2367
2369 The systemtap translator generally returns with a success code of 0 if
2370 the requested script was processed and executed successfully through
2371 the requested pass. Otherwise, errors may be printed to stderr and a
2372 failure code is returned. Use -v or -vp N to increase (global or per-
2373 pass) verbosity to identify the source of the trouble.
2374
2375 In listing mode (-l and -L), error messages are normally suppressed.
2376 A success code of 0 is returned if at least one matching probe was
2377 found.
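
Because listing mode returns 0 only when a probe matches, its exit status can be used as an availability test. The following is a sketch; the function name probe_available and the STAP override variable are assumptions of this example, and stap itself does not read STAP.

```shell
# STAP may be overridden to point at a different stap binary. This is a
# convention of this sketch only; stap does not read the variable.
STAP=${STAP:-stap}

# Hypothetical helper: succeed if at least one probe point matches the
# given pattern. Output is discarded; only the exit status is used.
probe_available() {
    "$STAP" -l "$1" >/dev/null 2>&1
}
```

A caller might write: if probe_available 'timer.s(1)'; then ... fi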
2378
2379 A script executing in pass 5 that is interrupted with ^C / SIGINT is
2380 considered to be successful.
2381
2382
2384 Over time, some features of the script language and the tapset library
2385 may undergo incompatible changes, so that a script written against an
2386 old version of systemtap may no longer run. In these cases, it may
2387 help to run systemtap with the --compatible VERSION flag, specifying
2388 the last known working version. Running systemtap with the
2389 --check-version flag will output a warning if any possibly incompatible
2390 elements have been parsed. Historical details of deprecations may be
2391 found in the NEWS file.
2392
2393 The purpose of the deprecation facility is to improve the experience of
2394 scripts written for newer versions of systemtap (by adding better
2395 alternatives and removing conflicting or messy older alternatives),
2396 while at the same time permitting scripts written for older versions of
2397 systemtap to continue running. Deprecation is thus intended as a service
2398 to users (and an inconvenience to systemtap's developers), rather than
2399 the other way around.
2400
2401 Please note that underscore-prefixed identifiers in the tapset
2402 sometimes undergo changes for which compatibility is difficult to
2403 preserve, even with the deprecation mechanisms. Avoid relying on these
2404 in your scripts; instead, propose them for promotion to non-underscored
2405 status.
2406
2407
2408
2410 Important files and their corresponding paths can be located in the
2411 stappaths(7) manual page.
2412
2413
2415 stapprobes(3stap),
2416 function::*(3stap),
2417 probe::*(3stap),
2418 tapset::*(3stap),
2419 stappaths(7),
2420 staprun(8),
2421 stapdyn(8),
2422 systemtap(8),
2423 stapvars(3stap),
2424 stapex(3stap),
2425 stap-server(8),
2426 stap-prep(1),
2427 stapref(1),
2428 awk(1),
2429 gdb(1)
2430
2431
2433 Use the Bugzilla link on the project web page or our mailing list.
2434 http://sourceware.org/systemtap/, <systemtap@sourceware.org>.
2435
2436 error::reporting(7stap),
2437 https://sourceware.org/systemtap/wiki/HowToReportBugs
2438
2439
2440
2441 STAP(1)