BPFTRACE(8)                                                        BPFTRACE(8)

NAME

       bpftrace - a high-level tracing language

SYNOPSIS

       bpftrace [OPTIONS] FILENAME
       bpftrace [OPTIONS] -e 'program code'

DESCRIPTION

       bpftrace is a high-level tracing language and runtime for Linux based
       on BPF. It supports static and dynamic tracing for both the kernel and
       user space.

       When FILENAME is "-", bpftrace reads the program from stdin.

EXAMPLES

       List all probes with "sleep" in their name

             # bpftrace -l '*sleep*'

       Trace processes calling sleep

             # bpftrace -e 'kprobe:do_nanosleep { printf("%d sleeping\n", pid); }'

       Trace processes calling sleep while spawning sleep 5 as a child process

             # bpftrace -e 'kprobe:do_nanosleep { printf("%d sleeping\n", pid); }' -c 'sleep 5'

SUPPORTED ARCHITECTURES

       x86_64, arm64 and s390x

OPTIONS

   Output format
       -B MODE
           Set the buffer mode for stdout. Valid values are:

           none  No buffering. Each I/O is written as soon as possible.
           line  Data is written on the first newline or when the buffer is
                 full. This is the default mode.
           full  Data is written once the buffer is full.

       -f FORMAT
           Set the output format. Valid values are:

           json
           text

       -o FILENAME
           Write bpftrace tracing output to FILENAME instead of stdout. This
           doesn’t include child process (-c option) output. Errors are still
           written to stderr.

       --no-warnings
           Suppress all warning messages created by bpftrace.

   Tracing
       -e PROGRAM
           Execute PROGRAM instead of reading the program from a file.

       -I DIR
           Add the directory DIR to the search path for C headers. This option
           can be used multiple times.

       --include FILENAME
           Add FILENAME as an include for the pre-processor. This is
           equivalent to adding '#include FILENAME' to the start of the
           bpftrace program. This option can be used multiple times.

       -l [SEARCH]
           List all probes that match the SEARCH pattern. If the pattern is
           omitted all probes will be listed. This pattern supports wildcards
           in the same way that probes do. E.g. '-l kprobe:*file*' to list all
           'kprobes' with 'file' in the name. For more details see the LISTING
           PROBES section.

       --unsafe
           Some calls, like 'system', are marked as unsafe as they can have
           dangerous side effects ('system("rm -rf")') and are disabled by
           default. This flag allows their use.

       -k
           Errors from bpf-helpers(7) are silently ignored by default, which
           can lead to strange results. This flag enables the detection of
           errors (except for errors from 'probe_read_*'). When an error
           occurs, bpftrace logs a message containing the source location and
           the error code:

           stdin:48-57: WARNING: Failed to probe_read_user_str: Bad address (-14)
           u:lib.so:"fn(char const*)" { printf("arg0:%s\n", str(arg0));}
                                                            ~~~~~~~~~

       -kk
           Same as '-k' but also includes the errors from 'probe_read_*'
           helpers.

   Process management
       -p PID
           Attach to the process with PID. If the process terminates, bpftrace
           will also terminate. When using USDT probes they will be attached
           to only this process.

       -c COMMAND
           Run COMMAND as a child process. When the child terminates bpftrace
           stops as well, as if 'exit()' had been called. If bpftrace
           terminates before the child process does, the child process will be
           terminated with a SIGTERM. If used, 'USDT' probes will only be
           attached to the child process. To avoid a race condition when
           using 'USDTs', the child is stopped after 'execve' using
           'ptrace(2)' and continued when all 'USDT' probes are attached.
           The child PID is available to programs as the 'cpid' builtin.
           The child process runs with the same privileges as bpftrace itself
           (usually root).
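
           As a sketch of how -c and the 'cpid' builtin combine, the
           following one-liner only reports openat(2) calls made by the child
           itself. The probe and the 'ls' command are arbitrary choices made
           for illustration:

               # bpftrace -e 'tracepoint:syscalls:sys_enter_openat /pid == cpid/ { printf("%s\n", str(args->filename)); }' -c 'ls'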

       --usdt-file-activation
           Activate USDT semaphores based on file path.

   Miscellaneous
       --info
           Print detailed information about features supported by the kernel
           and the bpftrace build.

       -h, --help
           Print the help summary.

       -V, --version
           Print bpftrace version information.

       -v
           Verbose messages.

       -d
           Debug mode.

       -dd
           Verbose debug mode.

ENVIRONMENT VARIABLES

       Some behavior can only be controlled through environment variables.
       This section lists all those variables.

   BPFTRACE_STRLEN
       Default: 64

       Number of bytes allocated on the BPF stack for the string returned by
       str().

       Make this larger if you wish to read bigger strings with str().

       Beware that the BPF stack is small (512 bytes).

       Support for even larger strings is being discussed at
       https://github.com/iovisor/bpftrace/issues/305.
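
       BPFTRACE_STRLEN, like any environment variable, can be set for a single
       run on the command line. A minimal sketch; the value 128 and the traced
       probe are illustrative choices only:

           # BPFTRACE_STRLEN=128 bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", str(args->filename)); }'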

   BPFTRACE_NO_CPP_DEMANGLE
       Default: 0

       C++ symbol demangling in user space stack traces is enabled by default.

       This feature can be turned off by setting the value of this environment
       variable to 1.

   BPFTRACE_MAP_KEYS_MAX
       Default: 4096

       This is the maximum number of keys that can be stored in a map.
       Increasing the value will consume more memory and increase startup
       times. Some use cases need a larger value, for example sampling stack
       traces or recording timestamps for each page.

   BPFTRACE_MAX_PROBES
       Default: 512

       This is the maximum number of probes that bpftrace can attach to.
       Increasing the value will consume more memory, increase startup times
       and can incur high performance overhead or even freeze or crash the
       system.

   BPFTRACE_CACHE_USER_SYMBOLS
       Default: 0 if ASLR is enabled on system and -c option is not given;
       otherwise 1

       By default, bpftrace caches the results of symbol resolution only
       when ASLR (Address Space Layout Randomization) is disabled. This is
       because with ASLR the symbol addresses change with each execution.
       However, disabling caching may incur some performance penalty. Set this
       environment variable to 1 to force bpftrace to cache.

   BPFTRACE_VMLINUX
       Default: None

       This specifies the vmlinux path used for kernel symbol resolution when
       attaching a kprobe to an offset. If this value is not given, bpftrace
       searches for vmlinux in predefined locations. See
       src/attached_probe.cpp:find_vmlinux() for details.

   BPFTRACE_BTF
       Default: None

       The path to a BTF file. By default, bpftrace searches several locations
       to find a BTF file. See src/btf.cpp for the details.

   BPFTRACE_PERF_RB_PAGES
       Default: 64

       Number of pages to allocate per CPU for the perf ring buffer. The value
       must be a power of 2.

       If you’re getting a lot of dropped events bpftrace may not be
       processing events in the ring buffer fast enough. It may be useful to
       bump the value higher so more events can be queued up. The tradeoff is
       that bpftrace will use more memory.

   BPFTRACE_MAX_BPF_PROGS
       Default: 512

       This is the maximum number of BPF programs (functions) that bpftrace
       can generate. The main purpose of this limit is to prevent bpftrace
       from hanging since generating a lot of probes takes a lot of resources
       (and it should not happen often).

BPFTRACE LANGUAGE

   Overview
       The bpftrace (bt) language is inspired by the D language used by dtrace
       and uses the same program structure. Each script consists of a
       preamble and one or more action blocks.

           preamble

           actionblock1
           actionblock2

       Preprocessor and type definitions take place in the preamble:

           #include <linux/socket.h>
           #define RED "\033[31m"

           struct S {
             int x;
           }

       Each action block consists of three parts:

           probe[,probe]
           /predicate/ {
             action
           }

       Probes
           A probe specifies the event and event type to attach to.

       Predicate
           The predicate is an optional condition that must be met for the
           action to be executed.

       Action
           Actions are the programs that run when an event fires (and the
           predicate is met). An action is a semicolon (;) separated list of
           statements and is always enclosed by braces {}.

       A basic script that traces the open(2) and openat(2) system calls can
       be written as follows:

           BEGIN
           {
                   printf("Tracing open syscalls... Hit Ctrl-C to end.\n");
           }

           tracepoint:syscalls:sys_enter_open,
           tracepoint:syscalls:sys_enter_openat
           {
                   printf("%-6d %-16s %s\n", pid, comm, str(args->filename));
           }

       This script has two action blocks and a total of 3 probes. The first
       action block uses the special BEGIN probe, which fires once during
       bpftrace startup. This probe is used to print a header, indicating that
       the tracing has started.

       The second action block uses two probes, one for open and one for
       openat, and defines an action that prints the file being opened as
       well as the pid and comm of the process that executes the syscall. See
       the PROBES section for details on the available probe types.
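
       The script above has no predicate. As a sketch of the optional
       predicate part, the following variant only prints openat(2) calls made
       by processes named "bash"; the process name is an arbitrary choice made
       for illustration:

           tracepoint:syscalls:sys_enter_openat
           /comm == "bash"/
           {
                   printf("%-6d %s\n", pid, str(args->filename));
           }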

   Identifiers
       Identifiers must match the following regular expression:
       [_a-zA-Z][_a-zA-Z0-9]*

   Comments
       Both single line and multi line comments are supported.

           // A single line comment
           i:s:1 { // can also be used to comment inline
           /*
            a multi line comment

           */
             print(/* inline comment block */ 1);
           }

   Data Types
       The following fundamental integer types are provided by the language.

       ┌───────┬─────────────────────────┐
       │       │                         │
       │Type   │ Description             │
       ├───────┼─────────────────────────┤
       │       │                         │
       │uint8  │ Unsigned 8 bit integer  │
       ├───────┼─────────────────────────┤
       │       │                         │
       │int8   │ Signed 8 bit integer    │
       ├───────┼─────────────────────────┤
       │       │                         │
       │uint16 │ Unsigned 16 bit integer │
       ├───────┼─────────────────────────┤
       │       │                         │
       │int16  │ Signed 16 bit integer   │
       ├───────┼─────────────────────────┤
       │       │                         │
       │uint32 │ Unsigned 32 bit integer │
       ├───────┼─────────────────────────┤
       │       │                         │
       │int32  │ Signed 32 bit integer   │
       ├───────┼─────────────────────────┤
       │       │                         │
       │uint64 │ Unsigned 64 bit integer │
       ├───────┼─────────────────────────┤
       │       │                         │
       │int64  │ Signed 64 bit integer   │
       └───────┴─────────────────────────┘

   Floating-point
       Floating-point numbers are not supported by BPF and therefore not by
       bpftrace.

   Constants
       Integer constants can be defined in the following formats:

       •   decimal (base 10)

       •   octal (base 8)

       •   hexadecimal (base 16)

       •   scientific (base 10)

       Octal constants have to be prefixed with a 0, e.g. 0123. Hexadecimal
       constants start with either 0x or 0X, e.g. 0x10. Scientific literals
       are written in the <m>e<n> format, which is a shorthand for m*10^n,
       e.g. $i = 2e3;. Note that scientific literals are integer only due to
       the lack of floating point support, so 1e-3 is not valid.

       To improve the readability of big literals an underscore _ can be used
       as a digit separator, e.g. 1_000_123_000.

       Integer suffixes as found in the C language are parsed by bpftrace to
       ensure compatibility with C headers/definitions but they’re not used as
       size specifiers. 123UL, 123U and 123LL all result in the same integer
       type with a value of 123.

       Character constants can be defined by enclosing the character in single
       quotes, e.g. $c = 'c';.

       String constants can be defined by enclosing the character string in
       double quotes, e.g. $str = "Hello world";.

       Characters and strings support the following escape sequences:

       ┌─────┬──────────────────────┐
       │     │                      │
       │\n   │ Newline              │
       ├─────┼──────────────────────┤
       │     │                      │
       │\t   │ Tab                  │
       ├─────┼──────────────────────┤
       │     │                      │
       │\0nn │ Octal value nn       │
       ├─────┼──────────────────────┤
       │     │                      │
       │\xnn │ Hexadecimal value nn │
       └─────┴──────────────────────┘
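
       The integer literal forms can be tried in a one-off program. This is a
       sketch for illustration only; following the rules described in this
       section, the four literals stand for the values 83, 16, 2000 and 1000:

           BEGIN {
             printf("%d %d %d %d\n", 0123, 0x10, 2e3, 1_000);
             exit();
           }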

   Type conversion
       Integer and pointer types can be converted using explicit type
       conversion with an expression like:

           $y = (uint32) $z;
           $py = (int16 *) $pz;

       Integer casts to a higher rank are sign extended. Conversion to a lower
       rank is done by zeroing leading bits.
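
       A sketch of the narrowing rule; the variable names and the value are
       illustrative only. Under the rule stated above, casting 0x1ff to uint8
       keeps only the low 8 bits, so $y is expected to hold 0xff:

           BEGIN {
             $z = 0x1ff;
             $y = (uint8) $z;
             printf("%x\n", $y);
             exit();
           }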

   Operators and Expressions
   Arithmetic Operators
       The following operators are available for integer arithmetic:

       ┌──┬────────────────────────┐
       │  │                        │
       │+ │ integer addition       │
       ├──┼────────────────────────┤
       │  │                        │
       │- │ integer subtraction    │
       ├──┼────────────────────────┤
       │  │                        │
       │* │ integer multiplication │
       ├──┼────────────────────────┤
       │  │                        │
       │/ │ integer division       │
       ├──┼────────────────────────┤
       │  │                        │
       │% │ integer modulo         │
       └──┴────────────────────────┘

   Logical Operators
       ┌───┬─────────────┐
       │   │             │
       │&& │ Logical AND │
       ├───┼─────────────┤
       │   │             │
       │|| │ Logical OR  │
       ├───┼─────────────┤
       │   │             │
       │!  │ Logical NOT │
       └───┴─────────────┘

   Bitwise Operators
       ┌───┬───────────────────────────┐
       │   │                           │
       │&  │ AND                       │
       ├───┼───────────────────────────┤
       │   │                           │
       │|  │ OR                        │
       ├───┼───────────────────────────┤
       │   │                           │
       │^  │ XOR                       │
       ├───┼───────────────────────────┤
       │   │                           │
       │<< │ Left shift the left-hand  │
       │   │ operand by the number of  │
       │   │ bits specified by the     │
       │   │ right-hand expression     │
       │   │ value                     │
       ├───┼───────────────────────────┤
       │   │                           │
       │>> │ Right shift the left-hand │
       │   │ operand by the number of  │
       │   │ bits specified by the     │
       │   │ right-hand expression     │
       │   │ value                     │
       └───┴───────────────────────────┘

   Relational Operators
       The following relational operators are defined for integers and
       pointers.

       ┌───┬────────────────────────────┐
       │   │                            │
       │<  │ left-hand expression is    │
       │   │ less than right-hand       │
       ├───┼────────────────────────────┤
       │   │                            │
       │<= │ left-hand expression is    │
       │   │ less than or equal to      │
       │   │ right-hand                 │
       ├───┼────────────────────────────┤
       │   │                            │
       │>  │ left-hand expression is    │
       │   │ greater than right-hand    │
       ├───┼────────────────────────────┤
       │   │                            │
       │>= │ left-hand expression is    │
       │   │ greater than or equal to   │
       │   │ right-hand                 │
       ├───┼────────────────────────────┤
       │   │                            │
       │== │ left-hand expression equal │
       │   │ to right-hand              │
       ├───┼────────────────────────────┤
       │   │                            │
       │!= │ left-hand expression not   │
       │   │ equal to right-hand        │
       └───┴────────────────────────────┘

       The following relational operators are available for comparing strings.

       ┌───┬────────────────────────────┐
       │   │                            │
       │== │ left-hand string equal to  │
       │   │ right-hand                 │
       ├───┼────────────────────────────┤
       │   │                            │
       │!= │ left-hand string not equal │
       │   │ to right-hand              │
       └───┴────────────────────────────┘

   Assignment Operators
       The following assignment operators can be used on both map and scratch
       variables:

       ┌────┬────────────────────────────┐
       │    │                            │
       │=   │ Assignment, assign the     │
       │    │ right-hand expression to   │
       │    │ the left-hand variable     │
       ├────┼────────────────────────────┤
       │    │                            │
       │<<= │ Update the variable with   │
       │    │ its value left shifted by  │
       │    │ the number of bits         │
       │    │ specified by the           │
       │    │ right-hand expression      │
       │    │ value                      │
       ├────┼────────────────────────────┤
       │    │                            │
       │>>= │ Update the variable with   │
       │    │ its value right shifted by │
       │    │ the number of bits         │
       │    │ specified by the           │
       │    │ right-hand expression      │
       │    │ value                      │
       ├────┼────────────────────────────┤
       │    │                            │
       │+=  │ Increment the variable by  │
       │    │ the right-hand expression  │
       │    │ value                      │
       ├────┼────────────────────────────┤
       │    │                            │
       │-=  │ Decrement the variable by  │
       │    │ the right-hand expression  │
       │    │ value                      │
       ├────┼────────────────────────────┤
       │    │                            │
       │*=  │ Multiply the variable by   │
       │    │ the right-hand expression  │
       │    │ value                      │
       ├────┼────────────────────────────┤
       │    │                            │
       │/=  │ Divide the variable by the │
       │    │ right-hand expression      │
       │    │ value                      │
       ├────┼────────────────────────────┤
       │    │                            │
       │%=  │ Modulo the variable by the │
       │    │ right-hand expression      │
       │    │ value                      │
       ├────┼────────────────────────────┤
       │    │                            │
       │&=  │ Bitwise AND the variable   │
       │    │ by the right-hand          │
       │    │ expression value           │
       ├────┼────────────────────────────┤
       │    │                            │
       │|=  │ Bitwise OR the variable by │
       │    │ the right-hand expression  │
       │    │ value                      │
       ├────┼────────────────────────────┤
       │    │                            │
       │^=  │ Bitwise XOR the variable   │
       │    │ by the right-hand          │
       │    │ expression value           │
       └────┴────────────────────────────┘

       All these operators are syntactic sugar for combining assignment with
       the specified operator. @ -= 5 is equal to @ = @ - 5.

   Increment and Decrement Operators
       The increment (++) and decrement (--) operators can be used on integer
       and pointer variables to increment or decrement their value by one.
       They can only be used on variables and can be applied either as a
       prefix or as a suffix. The difference is that the expression x++
       returns the original value of x, before it was incremented, while ++x
       returns the value of x after the increment. E.g.

           $x = 10;
           $y = $x--; // y = 10; x = 9
           $a = 10;
           $b = --$a; // a = 9; b = 9

       Note that maps will be implicitly declared and initialized to 0 if not
       already declared or defined. Scratch variables must be initialized
       before using these operators.

   Variables and Maps
       bpftrace knows two types of variables, scratch and map.

       'scratch' variables are kept on the BPF stack. They only exist during
       the execution of the action block and cannot be accessed outside of
       the program. Scratch variable names always start with a $, e.g. $myvar.

       'map' variables use BPF 'maps'. These exist for the lifetime of
       bpftrace itself and can be accessed from all action blocks and
       user space. Map names always start with a @, e.g. @mymap.

       Any valid identifier can be used as a name.

       The data type of a variable is automatically determined during first
       assignment and cannot be changed afterwards.

   Associative Arrays
       Associative arrays are a collection of elements indexed by a key,
       similar to the hash tables found in languages like C++ (std::map) and
       Python (dict). They’re a variant of 'map' variables.

           @name[key] = expression
           @name[key1,key2] = expression

       Just like with any variable the type is determined on first use and
       cannot be modified afterwards. This applies to both the key(s) and the
       value type.

       The following snippet creates a map with key signature [int64,
       string[16]] and a value type of int64:

           @[pid, comm]++
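
       A sketch that combines a map shared between two action blocks with a
       scratch variable local to one of them. The nanosleep tracepoints are an
       arbitrary choice, and the nsecs and tid builtins and the delete()
       function are assumed here without being introduced in this section:

           tracepoint:syscalls:sys_enter_nanosleep {
             @start[tid] = nsecs;
           }

           tracepoint:syscalls:sys_exit_nanosleep
           /@start[tid]/
           {
             $dur = nsecs - @start[tid];
             printf("%d slept for %d ns\n", pid, $dur);
             delete(@start[tid]);
           }

       The map entry written at syscall entry is still visible at syscall
       exit, while $dur only exists while the second action block runs.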

   Variable scoping
   Pointers
       Pointers in bpftrace are similar to those found in C.

   Tuples
       bpftrace has support for immutable N-tuples (n > 1). A tuple is a
       sequence type (like an array) where, unlike an array, every element can
       have a different type.

       Tuples are a comma separated list of expressions, enclosed in
       parentheses, e.g. (1,2). Individual fields can be accessed with the .
       operator. Tuples are zero indexed like arrays are.

           i:s:1 {
             $a = (1,2);
             $b = (3,4, $a);
             print($a);
             print($b);
             print($b.0);
           }

       Prints:

           (1, 2)
           (3, 4, (1, 2))
           3

   Arrays
       bpftrace supports accessing one-dimensional arrays like those found in
       C.

       Constructing arrays from scratch, like int a[] = {1,2,3} in C, is not
       supported. They can only be read into a variable from a pointer.

       The [] operator is used to access elements.

           struct MyStruct {
             int y[4];
           }

           kprobe:dummy {
             $s = (struct MyStruct *) arg0;
             print($s->y[0]);
           }

   Structs
       C-like structs are supported by bpftrace. Fields are accessed with the
       . operator. Fields of a pointer to a struct can be accessed with the ->
       operator.

       Custom structs can be defined in the preamble.

       Constructing structs from scratch, like struct X var = {.f1 = 1} in C,
       is not supported. They can only be read into a variable from a pointer.

           struct MyStruct {
             int a;
           }

           kprobe:dummy {
             $ptr = (struct MyStruct *) arg0;
             $st = *$ptr;
             print($st.a);
             print($ptr->a);
           }

   Conditionals
       Conditional expressions are supported in the form of if/else statements
       and the ternary operator.

       The ternary operator consists of three operands: a condition followed
       by a ?, the expression to execute when the condition is true followed
       by a : and the expression to execute if the condition is false.

           condition ? ifTrue : ifFalse

       Both the ifTrue and ifFalse expressions must be of the same type;
       mixing types is not allowed.

       The ternary operator can be used as part of an assignment.

           $a == 1 ? print("true") : print("false");
           $b = $a > 0 ? $a : -1;

       If/else statements, like the ones in C, are supported.

           if (condition) {
             ifblock
           } else if (condition) {
             if2block
           } else {
             elseblock
           }

   Loops
       Since kernel 5.3 BPF supports loops as long as the verifier can prove
       they’re bounded and fit within the instruction limit.

       In bpftrace loops are available through the while statement.

           while (condition) {
             block;
           }

       Within a while-loop the following control flow statements can be used:

       ┌─────────┬────────────────────────────┐
       │         │                            │
       │continue │ Skip processing of the     │
       │         │ rest of the block and jump │
       │         │ back to the evaluation of  │
       │         │ the conditional            │
       ├─────────┼────────────────────────────┤
       │         │                            │
       │break    │ Terminate the loop         │
       └─────────┴────────────────────────────┘

           i:s:1 {
             $i = 0;
             while ($i <= 100) {
               printf("%d ", $i);
               if ($i > 5) {
                 break;
               }
               $i++;
             }
             printf("\n");
           }

       Loop unrolling is also supported with the unroll statement.

           unroll(n) {
             block;
           }

       The compiler will evaluate the block n times and generate the BPF code
       for the block n times. As this happens at compile time n must be a
       constant greater than 0 (n > 0).

       The following two probes compile into the same code:

           i:s:1 {
             unroll(3) {
               print("Unrolled");
             }
           }

           i:s:1 {
             print("Unrolled");
             print("Unrolled");
             print("Unrolled");
           }

SYNC AND ASYNC

       While BPF in the kernel can do a lot, there are still things that can
       only be done from user space, like the outputting (printing) of data.
       bpftrace handles this by sending events from the BPF program, which
       user space picks up some time later (usually within milliseconds).
       Operations that happen in the kernel are 'synchronous' ('sync') and
       those that are handled in user space are 'asynchronous' ('async').

       The async behavior can lead to unexpected results, as updates can
       happen before user space has had time to process the event. One
       example is updating a map value in a tight loop:

           BEGIN {
               @=0;
               unroll(10) {
                 print(@);
                 @++;
               }
               exit();
           }

       Maps are printed by reference, not by value. As the value gets updated
       right after the print, user space will likely only see the final value
       once it processes the event:

           @: 10
           @: 10
           @: 10
           @: 10
           @: 10
           @: 10
           @: 10
           @: 10
           @: 10
           @: 10
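
       One way to capture each intermediate value is to format it with
       printf() instead of printing the map. This is a sketch that rests on
       the assumption that printf() arguments are read in the kernel when the
       event is generated and are sent to user space by value:

           BEGIN {
               @ = 0;
               unroll(10) {
                 printf("%d\n", @);
                 @++;
               }
               exit();
           }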

ADDRESS-SPACES

       Kernel and user pointers live in different address spaces which,
       depending on the CPU architecture, might overlap. Trying to read a
       pointer that is in the wrong address space results in a runtime error.
       This error is hidden by default but can be shown with the -kk flag:

           stdin:1:9-12: WARNING: Failed to probe_read_user: Bad address (-14)
           BEGIN { @=*uptr(kaddr("do_poweroff")) }
                   ~~~

       bpftrace tries to automatically set the correct address space for a
       pointer based on the probe type, but might fail in cases where it is
       unclear. The address space can be changed with the kptr() and uptr()
       functions.

BUILTINS

821       Builtins are special variables built into the language. Unlike the
822       scratch and map variable they don’t need a $ or @ as prefix (except for
823       the positional parameters).
824
825       ┌──────────────┬────────────┬────────────┬───────────────────────┬───────────────────┐
826       │              │            │            │                       │                   │
827       │Variable      │ Type       │ Kernel     │ BPF Helper            │ Description       │
828       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
829       │              │            │            │                       │                   │
830       │$1, $2, ...$n │ int64      │ n/a        │ n/a                   │ The nth           │
831       │              │            │            │                       │ positional        │
832       │              │            │            │                       │ parameter         │
833       │              │            │            │                       │ passed to the     │
834       │              │            │            │                       │ bpftrace          │
835       │              │            │            │                       │ program. If       │
836       │              │            │            │                       │ fewer than n      │
837       │              │            │            │                       │ parameters        │
838       │              │            │            │                       │ are passed        │
839       │              │            │            │                       │ this              │
840       │              │            │            │                       │ evaluates to      │
841       │              │            │            │                       │ 0. For string     │
842       │              │            │            │                       │ arguments use     │
843       │              │            │            │                       │ the str()         │
844       │              │            │            │                       │ call to           │
845       │              │            │            │                       │ retrieve the      │
846       │              │            │            │                       │ value.            │
847       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
848       │              │            │            │                       │                   │
849       │$#            │ int64      │ n/a        │ n/a                   │ Total number      │
850       │              │            │            │                       │ of positional     │
851       │              │            │            │                       │ parameters        │
852       │              │            │            │                       │ passed.           │
853       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
854       │              │            │            │                       │                   │
855       │arg0, arg1,   │ int64      │ n/a        │ n/a                   │ nth argument      │
856       │...argn       │            │            │                       │ passed to the     │
857       │              │            │            │                       │ function          │
858       │              │            │            │                       │ being traced.     │
859       │              │            │            │                       │ These are         │
860       │              │            │            │                       │ extracted         │
861       │              │            │            │                       │ from the CPU      │
862       │              │            │            │                       │ registers.        │
863       │              │            │            │                       │ The number of     │
864       │              │            │            │                       │ args passed       │
865       │              │            │            │                       │ in registers      │
866       │              │            │            │                       │ depends on        │
867       │              │            │            │                       │ the CPU           │
868       │              │            │            │                       │ architecture.     │
869       │              │            │            │                       │ (kprobes,         │
870       │              │            │            │                       │ uprobes,          │
871       │              │            │            │                       │ usdt).            │
872       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
873       │              │            │            │                       │                   │
874       │cgroup        │ uint64     │ 4.18       │ get_current_cgroup_id │ ID of the         │
875       │              │            │            │                       │ cgroup the        │
876       │              │            │            │                       │ current task      │
877       │              │            │            │                       │ is in. Only       │
878       │              │            │            │                       │ works with        │
879       │              │            │            │                       │ cgroupv2.         │
880       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
881       │              │            │            │                       │                   │
882       │comm          │ string[16] │ 4.2        │ get_current_comm      │ comm of the       │
883       │              │            │            │                       │ current task.     │
884       │              │            │            │                       │ Equal to the      │
885       │              │            │            │                       │ value in          │
886       │              │            │            │                       │ /proc/<pid>/comm  │
887       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
888       │              │            │            │                       │                   │
889       │cpid          │ uint32     │ n/a        │ n/a                   │ PID of the child  │
890       │              │            │            │                       │ process           │
891       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
892       │              │            │            │                       │                   │
893       │numaid        │ uint32     │ 5.8        │ numa_node_id          │ ID of the NUMA    │
894       │              │            │            │                       │ node executing    │
895       │              │            │            │                       │ the BPF program   │
896       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
897       │              │            │            │                       │                   │
898       │cpu           │ uint32     │ 4.1        │ raw_smp_processor_id  │ ID of the         │
899       │              │            │            │                       │ processor         │
900       │              │            │            │                       │ executing the     │
901       │              │            │            │                       │ BPF program       │
902       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
903       │              │            │            │                       │                   │
904       │curtask       │ uint64     │ 4.8        │ get_current_task      │ Pointer to        │
905       │              │            │            │                       │ struct            │
906       │              │            │            │                       │ task_struct of    │
907       │              │            │            │                       │ the current task  │
908       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
909       │              │            │            │                       │                   │
910       │elapsed       │ uint64     │ (see nsecs)│ ktime_get_ns /        │ Nanoseconds       │
911       │              │            │            │ ktime_get_boot_ns     │ elapsed since     │
912       │              │            │            │                       │ bpftrace          │
913       │              │            │            │                       │ initialization,   │
914       │              │            │            │                       │ based on nsecs    │
915       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
916       │              │            │            │                       │                   │
917       │func          │ string     │ n/a        │ n/a                   │ Name of the       │
918       │              │            │            │                       │ current function  │
919       │              │            │            │                       │ being traced      │
920       │              │            │            │                       │ (kprobes,uprobes) │
921       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
922       │              │            │            │                       │                   │
923       │gid           │ uint64     │ 4.2        │ get_current_uid_gid   │ GID of current    │
924       │              │            │            │                       │ task              │
925       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
926       │              │            │            │                       │                   │
927       │kstack        │ kstack     │ 4.6        │ get_stackid           │ Kernel stack      │
928       │              │            │            │                       │ trace             │
929       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
930       │              │            │            │                       │                   │
931       │nsecs         │ uint64     │ 4.1 / 5.7  │ ktime_get_ns /        │ Nanoseconds since │
932       │              │            │            │ ktime_get_boot_ns     │ kernel boot. On   │
933       │              │            │            │                       │ kernels that      │
934       │              │            │            │                       │ support           │
935       │              │            │            │                       │ ktime_get_boot_ns │
936       │              │            │            │                       │ this includes the │
937       │              │            │            │                       │ time spent        │
938       │              │            │            │                       │ suspended, on     │
939       │              │            │            │                       │ older kernels it  │
940       │              │            │            │                       │ does not.         │
941       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
942       │              │            │            │                       │                   │
943       │pid           │ uint64     │ 4.2        │ get_current_pid_tgid  │ Process ID (or    │
944       │              │            │            │                       │ thread group ID)  │
945       │              │            │            │                       │ of the current    │
946       │              │            │            │                       │ task.             │
947       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
948       │              │            │            │                       │                   │
949       │probe         │ string     │ n/a        │ n/a                   │ Name of the       │
950       │              │            │            │                       │ current probe     │
951       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
952       │              │            │            │                       │                   │
953       │rand          │ uint32     │ 4.1        │ get_prandom_u32       │ Random number     │
954       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
955       │              │            │            │                       │                   │
956       │retval        │ int64      │ n/a        │ n/a                   │ Value returned by │
957       │              │            │            │                       │ the function      │
958       │              │            │            │                       │ being traced      │
959       │              │            │            │                       │ (kretprobe,       │
960       │              │            │            │                       │ uretprobe,        │
961       │              │            │            │                       │ kretfunc)         │
962       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
963       │              │            │            │                       │                   │
964       │sarg0, sarg1, │ int64      │ n/a        │ n/a                   │ nth stack value   │
965       │...sargn      │            │            │                       │ of the function   │
966       │              │            │            │                       │ being traced.     │
967       │              │            │            │                       │ (kprobes,         │
968       │              │            │            │                       │ uprobes).         │
969       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
970       │              │            │            │                       │                   │
971       │tid           │ uint64     │ 4.2        │ get_current_pid_tgid  │ Thread ID of the  │
972       │              │            │            │                       │ current task.     │
973       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
974       │              │            │            │                       │                   │
975       │uid           │ uint64     │ 4.2        │ get_current_uid_gid   │ UID of current    │
976       │              │            │            │                       │ task              │
977       ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
978       │              │            │            │                       │                   │
979       │ustack        │ ustack     │ 4.6        │ get_stackid           │ Userspace stack   │
980       │              │            │            │                       │ trace             │
981       └──────────────┴────────────┴────────────┴───────────────────────┴───────────────────┘
982
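       The following sketch shows how several builtins from the table might be
       combined. The file name syscalls.bt and the PID 1234 are hypothetical;
       the raw_syscalls:sys_enter tracepoint is assumed to be available on the
       running kernel.

```bpftrace
// Hypothetical usage: bpftrace syscalls.bt 1234
// $1 evaluates to 0 when no positional parameter is passed.
tracepoint:raw_syscalls:sys_enter
/pid == $1/
{
  // Count system calls per command name, thread ID and CPU.
  @calls[comm, tid, cpu] = count();
}

END
{
  printf("positional parameters passed: %d\n", $#);
}
```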

MAP FUNCTIONS

984       Map functions are built-in functions whose return value can only be
985       assigned to maps. The data types associated with these functions are
986       only for internal use and are not compatible with the (integer)
987       operators.
988
989       Functions that are marked async are asynchronous, which can lead to
990       unexpected behavior. See the SYNC AND ASYNC section for more
991       information.
992
993   avg
994       variants
995
996       •   avg(int64 n)
997
998       Calculate the running average of n between consecutive calls.
999
1000           i:s:1 {
1001             @x++;
1002             @y = avg(@x);
1003             print(@x);
1004             print(@y);
1005           }
1006
1007       Internally this keeps two values in the map: value count and running
1008       total. The average is computed in user-space when printing by dividing
1009       the total by the count.
1010
1011   clear
1012       variants
1013
1014       •   clear(map m)
1015
1016       async
1017
1018       Clear all keys/values from map m.
1019
1020           i:ms:100 {
1021             @[rand % 10] = count();
1022           }
1023
1024           i:s:10 {
1025             print(@);
1026             clear(@);
1027           }
1028
1029   count
1030       variants
1031
1032       •   count()
1033
1034       Count how often this function is called.
1035
1036       Using @=count() is conceptually similar to @++. The difference is that
1037       the count() function uses a map type optimized for this (PER_CPU),
1038       increasing performance. Due to this the map cannot be accessed as a
1039       regular integer.
1040
1041           i:ms:100 {
1042             @ = count();
1043           }
1044
1045           i:s:10 {
1046             print(@);
1047             clear(@);
1048           }
1049
1050   delete
1051       variants
1052
1053       •   delete(mapkey k)
1054
1055       Delete a single key from a map. For a single value map this deletes the
1056       only element. For an associative array the key to delete has to be
1057       specified.
1058
1059           k:dummy {
1060             @scalar = 1;
1061             @associative[1,2] = 1;
1062             delete(@scalar);
1063             delete(@associative[1,2]);
1064
1065             delete(@associative); // error
1066           }
1067
1068   hist
1069       variants
1070
1071       •   hist(int64 n)
1072
1073       Create a log2 histogram of n.
1074
1075           kretprobe:vfs_read {
1076             @bytes = hist(retval);
1077           }
1078
1079       Results in:
1080
1081           @bytes:
1082           [1M, 2M)               3 |                                                    |
1083           [2M, 4M)               2 |                                                    |
1084           [4M, 8M)               2 |                                                    |
1085           [8M, 16M)              6 |                                                    |
1086           [16M, 32M)            16 |                                                    |
1087           [32M, 64M)            27 |                                                    |
1088           [64M, 128M)           48 |@                                                   |
1089           [128M, 256M)          98 |@@@                                                 |
1090           [256M, 512M)         191 |@@@@@@                                              |
1091           [512M, 1G)           394 |@@@@@@@@@@@@@                                       |
1092           [1G, 2G)             820 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
1093
1094   lhist
1095       variants
1096
1097       •   lhist(int64 n, int64 min, int64 max, int64 step)
1098
1099       Create a linear histogram of n. lhist creates M ((max - min) / step)
1100       buckets in the range [min,max) where each bucket is step in size.
1101       Values in the ranges (-inf, min) and [max, inf) each get their own
1102       bucket too, bringing the total number of buckets created to M+2.
1103
1104           i:ms:1 {
1105             @ = lhist(rand % 10, 0, 10, 1);
1106           }
1107
1108           i:s:5 {
1109             exit();
1110           }
1111
1112       Prints:
1113
1114           @:
1115           [0, 1)               306 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         |
1116           [1, 2)               284 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            |
1117           [2, 3)               294 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          |
1118           [3, 4)               318 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
1119           [4, 5)               311 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        |
1120           [5, 6)               362 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
1121           [6, 7)               336 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |
1122           [7, 8)               326 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      |
1123           [8, 9)               328 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     |
1124           [9, 10)              318 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
1125
1126   max
1127       variants
1128
1129       •   max(int64 n)
1130
1131       Update the map with n if n is bigger than the current value held.
1132
1133   min
1134       variants
1135
1136       •   min(int64 n)
1137
1138       Update the map with n if n is smaller than the current value held.
1139
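       A minimal sketch of max and min used together, assuming vfs_read can be
       probed on the running kernel. The cast to int64 is a precaution so that
       error returns are skipped regardless of how retval is typed.

```bpftrace
kretprobe:vfs_read
/(int64)retval > 0/
{
  // Track the largest and smallest successful read per command name.
  @largest[comm] = max(retval);
  @smallest[comm] = min(retval);
}
```
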
1140   stats
1141       variants
1142
1143       •   stats(int64 n)
1144
1145       stats combines the count, avg and sum calls into one.
1146
1147           kprobe:vfs_read {
1148             @bytes[comm] = stats(arg2);
1149           }
1150
1151           @bytes[bash]: count 7, average 1, total 7
1152           @bytes[sleep]: count 5, average 832, total 4160
1153           @bytes[ls]: count 7, average 886, total 6208
1155
1156   sum
1157       variants
1158
1159       •   sum(int64 n)
1160
1161       Calculate the sum of all n passed.
1162
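       As an illustration, the sketch below totals the bytes returned by
       successful reads per command name. It assumes vfs_read can be probed on
       the running kernel.

```bpftrace
kretprobe:vfs_read
/(int64)retval > 0/
{
  // Add each successful read size to the running total for this command.
  @bytes[comm] = sum(retval);
}
```
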
1163   zero
1164       variants
1165
1166       •   zero(map m)
1167
1168       async
1169
1170       Set all values for all keys to zero.
1171
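       A sketch of zero, modeled on the clear example earlier in this section.
       The expectation, based on the descriptions of the two functions, is
       that zero keeps the keys in the map and only resets their values,
       whereas clear removes the keys as well.

```bpftrace
i:ms:100 {
  @[rand % 10] = count();
}

i:s:5 {
  // Print the counts gathered so far, then reset every value to zero.
  print(@);
  zero(@);
}
```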

FUNCTIONS

1173       Functions that are marked async are asynchronous, which can lead to
1174       unexpected behavior. See the SYNC AND ASYNC section for more
1175       information.
1176
1177       compile time functions are evaluated at compile time; a static value
1178       will be compiled into the program.
1179
1180       unsafe functions can have dangerous side effects and should be used
1181       with care. The --unsafe flag is required to use them.
1182
1183   bswap
1184       variants
1185
1186       •   uint8 bswap(uint8 n)
1187
1188       •   uint16 bswap(uint16 n)
1189
1190       •   uint32 bswap(uint32 n)
1191
1192       •   uint64 bswap(uint64 n)
1193
1194       bswap reverses the order of the bytes in integer n. In case of 8 bit
1195       integers, n is returned without being modified. The return type is an
1196       unsigned integer of the same width as n.
1197
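       A small sketch of bswap on a 16 bit value. The explicit cast is used
       here so that the operand width is unambiguous; whether an uncast
       literal would pick the same width is not assumed.

```bpftrace
BEGIN {
  // 0x1234 with its two bytes reversed is 0x3412.
  printf("%x\n", bswap((uint16)0x1234));
  exit();
}
```
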
1198   buf
1199       variants
1200
1201       •   buf_t buf(void * data, [int64 length])
1202
1203       buf reads length bytes from address data. The maximum value of length
1204       is limited to the BPFTRACE_STRLEN variable. For arrays the length is
1205       optional; it is automatically inferred from the signature.
1206
1207       buf is address space aware and will call the correct helper based on
1208       the address space associated with data.
1209
1210       The buf_t object returned by buf can safely be printed as a hex encoded
1211       string with the %r format specifier.
1212
1213       Bytes with values >=32 and <=126 are printed using their ASCII
1214       character, other bytes are printed in hex form (e.g. \x00). The %rx
1215       format specifier can be used to print everything in hex form, including
1216       ASCII characters.
1217
1218           i:s:1 {
1219             printf("%r\n", buf(kaddr("avenrun"), 8));
1220           }
1221
1222           \x00\x03\x00\x00\x00\x00\x00\x00
1223           \xc2\x02\x00\x00\x00\x00\x00\x00
1224
1225   cat
1226       variants
1227
1228       •   void cat(string namefmt, [...args])
1229
1230       async
1231
1232       Dump the contents of the named file to stdout. cat supports the same
1233       format string and arguments that printf does. If the file cannot be
1234       opened or read an error is printed to stderr.
1235
1236           t:syscalls:sys_enter_execve {
1237             cat("/proc/%d/maps", pid);
1238           }
1239
1240           55f683ebd000-55f683ec1000 r--p 00000000 08:01 1843399                    /usr/bin/ls
1241           55f683ec1000-55f683ed6000 r-xp 00004000 08:01 1843399                    /usr/bin/ls
1242           55f683ed6000-55f683edf000 r--p 00019000 08:01 1843399                    /usr/bin/ls
1243           55f683edf000-55f683ee2000 rw-p 00021000 08:01 1843399                    /usr/bin/ls
1244           55f683ee2000-55f683ee3000 rw-p 00000000 00:00 0
1245
1246   cgroup_path
1247       variants
1248
1249       •   cgroup_path cgroup_path(int cgroupid, [string filter])
1250
1251       Convert cgroup id to cgroup path. This is done asynchronously in
1252       userspace when the cgroup_path value is printed, therefore it can
1253       resolve to a different value if the cgroup id gets reassigned. This
1254       also means that the returned value can only be used for printing.
1255
1256       A string literal may be passed as an optional second argument to filter
1257       cgroup hierarchies in which the cgroup id is looked up by a wildcard
1258       expression (cgroup2 is always represented by "unified", regardless of
1259       where it is mounted).
1260
1261       The currently mounted hierarchy at /sys/fs/cgroup is used to do the
1262       lookup. If the cgroup with the given id isn’t present here (e.g. when
1263       running in a Docker container), the cgroup path won’t be found (unlike
1264       when looking up the cgroup path of a process via /proc/.../cgroup).
1265
1266           BEGIN {
1267             $cgroup_path = cgroup_path(3436);
1268             print($cgroup_path);
1269             print($cgroup_path); /* This may print a different path */
1270             printf("%s %s", $cgroup_path, $cgroup_path); /* This may print two different paths */
1271           }
1272
1273   cgroupid
1274       variants
1275
1276       •   uint64 cgroupid(const string path)
1277
1278       compile time
1279
1280       cgroupid retrieves the cgroupv2 ID of the cgroup available at path.
1281
1282           BEGIN {
1283             print(cgroupid("/sys/fs/cgroup/system.slice"));
1284           }
1285
1286   exit
1287       variants
1288
1289       •   void exit()
1290
1291       async
1292
1293       Terminate bpftrace, as if a SIGTERM was received. The END probe will
1294       still trigger (if specified) and maps will be printed.
1295
1296   join
1297       variants
1298
1299       •   void join(char *arr[], [char * sep = ' '])
1300
1301       async
1302
1303       join joins all elements of the string array arr, with sep as separator,
1304       into one string. This string is printed to stdout directly; it cannot
1305       be used as a string value.
1306
1307       The concatenation of the array members is done in BPF and the printing
1308       happens in userspace.
1309
1310           tracepoint:syscalls:sys_enter_execve {
1311             join(args->argv);
1312           }
1313
1314   kaddr
1315       variants
1316
1317       •   uint64 kaddr(const string name)
1318
1319       compile time
1320
1321       Get the address of the kernel symbol name.
1322
1325   kptr
1326       variants
1327
1328       •   T * kptr(T * ptr)
1329
1330       Marks ptr as a kernel address space pointer. See the ADDRESS-SPACES
1331       section for more information on address spaces. The pointer type is
1332       left unchanged.
1333
1334   ksym
1335       variants
1336
1337       •   ksym_t ksym(uint64 addr)
1338
1339       async
1340
1341       Retrieve the name of the function that contains address addr. The
1342       address to name mapping happens in user-space.
1343
1344       The ksym_t type can be printed with the %s format specifier.
1345
1346           kprobe:do_nanosleep
1347           {
1348             printf("%s\n", ksym(reg("ip")));
1349           }
1350
1351       Prints:
1352
1353           do_nanosleep
1354
1355   macaddr
1356       variants
1357
1358       •   macaddr_t macaddr(char [6] mac)
1359
1360       Create a buffer that holds a MAC address as read from mac. This buffer
1361       can be printed in the canonical string format using the %s format
1362       specifier.
1363
1364           kprobe:arp_create {
1365             printf("SRC %s, DST %s\n", macaddr(sarg0), macaddr(sarg1));
1366           }
1367
1368       Prints:
1369
1370           SRC 18:C0:4D:08:2E:BB, DST 74:83:C2:7F:8C:FF
1371
1372   ntop
1373       variants
1374
1375       •   inet_t ntop([int64 af, ] int addr)
1376
1377       •   inet_t ntop([int64 af, ] char addr[4])
1378
1379       •   inet_t ntop([int64 af, ] char addr[16])
1380
1381       ntop returns the string representation of an IPv4 or IPv6 address. ntop
1382       will infer the address type (IPv4 or IPv6) based on the addr type and
1383       size. If an integer or char[4] is given, ntop assumes IPv4, if a
1384       char[16] is given, ntop assumes IPv6. You can also pass the address
1385       type (e.g. AF_INET) explicitly as the first parameter.
1386
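       A minimal sketch of the integer variant. The constant below is only an
       illustration and assumes a little-endian machine, where the integer's
       bytes in memory are 7f 00 00 01.

```bpftrace
BEGIN {
  // A plain integer argument is treated as an IPv4 address.
  // On little-endian hardware this value corresponds to 127.0.0.1.
  printf("%s\n", ntop(0x0100007f));
  exit();
}
```
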
1387   pton
1388       variants
1389
1390       •   char addr[4] pton(const string *addr_v4)
1391
1392       •   char addr[16] pton(const string *addr_v6)
1393
1394       compile time
1395
1396       pton converts a text representation of an IPv4 or IPv6 address to a
1397       byte array. pton infers the address family based on . or : in the
1398       given argument. pton comes in handy when we need to select packets with
1399       certain IP addresses.
1400
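       As a hedged illustration, the sketch below converts a textual address
       to a byte array with pton and formats it back with ntop. It relies on
       the char[4] variant of ntop listed above accepting the array that pton
       returns, which is an assumption based on the two signatures.

```bpftrace
BEGIN {
  // pton() is resolved at compile time into a 4-byte array;
  // ntop() then turns that array back into a printable address.
  printf("%s\n", ntop(pton("127.0.0.1")));
  exit();
}
```
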
1401   override
1402       variants
1403
1404       •   override(uint64 rc)
1405
1406       unsafe
1407
1408       Kernel 4.16
1409
1410       Helper bpf_override_return
1411
1412       Supported probes
1413
1414       •   kprobe
1415
1416       When using override the probed function will not be executed and
1417       instead rc will be returned.
1418
1419           k:__x64_sys_getuid
1420           /comm == "id"/ {
1421             override(2<<21);
1422           }
1423
1424           uid=4194304 gid=0(root) euid=0(root) groups=0(root)
1425
1426       This feature only works on kernels compiled with
1427       CONFIG_BPF_KPROBE_OVERRIDE and only works on functions tagged
1428       ALLOW_ERROR_INJECTION.
1429
1430       bpftrace does not test whether error injection is allowed for the
1431       probed function; instead it will fail to load the program into the
1432       kernel:
1433
1434           ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument
1435           Error attaching probe: 'kprobe:vfs_read'
1436
1437   reg
1438       variants
1439
1440       •   reg(const string name)
1441
1442       Supported probes
1443
1444       •   kprobe
1445
1446       •   uprobe
1447
1448       Get the contents of the register identified by name. Valid names depend
1449       on the CPU architecture.
1450
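       A short sketch of reg, reusing the "ip" register name from the ksym
       example. That name is assumed to be valid on x86_64 only; other
       architectures use different register names.

```bpftrace
kprobe:do_nanosleep
{
  // Print the raw instruction pointer at the time the probe fired.
  printf("%s called at 0x%lx\n", comm, reg("ip"));
}
```
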
1451   signal
1452       variants
1453
1454       •   signal(const string sig)
1455
1456       •   signal(uint32 signum)
1457
1458       unsafe
1459
1460       Kernel 5.3
1461
1462       Helper bpf_send_signal
1463
1464       Probe types: k(ret)probe, u(ret)probe, USDT, profile
1465
1466       Send a signal to the process being traced. The signal can either be
1467       identified by name, e.g. SIGSTOP or by ID, e.g. 19 as found in kill -l.
1468
1469           kprobe:__x64_sys_execve
1470           /comm == "bash"/ {
1471             signal(5);
1472           }
1473
1474           $ ls
1475           Trace/breakpoint trap (core dumped)
1476
1477   sizeof
1478       variants
1479
1480       •   sizeof(TYPE)
1481
1482       •   sizeof(EXPRESSION)
1483
1484       compile time
1485
1486       Returns size of the argument in bytes. Similar to C/C++ sizeof
1487       operator. Note that the expression does not get evaluated.
1488
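       The sketch below shows both variants. struct Foo is a hypothetical
       type defined only for this example; with common C ABIs its size is
       typically 8 bytes because of padding after the char member.

```bpftrace
struct Foo {
  int x;
  char c;
};

BEGIN {
  // sizeof(TYPE) on a user-defined struct and sizeof(EXPRESSION) on a
  // builtin. The expression itself is not evaluated.
  printf("%d %d\n", sizeof(struct Foo), sizeof(nsecs));
  exit();
}
```
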
1489   str
1490       variants
1491
1492       •   str(char * data [, uint32 length])
1493
1494       Helper probe_read_str, probe_read_{kernel,user}_str
1495
1496       str reads a NULL terminated (\0) string from data. The maximum string
1497       length is limited by the BPFTRACE_STRLEN env variable, unless length
1498       is specified and shorter than the maximum. In case the string is longer
1499       than the specified length only length - 1 bytes are copied and a NULL
1500       byte is appended at the end.
1501
1502       When available (starting from kernel 5.5, see the --info flag) bpftrace
1503       will automatically use the kernel or user variant of
1504       probe_read_{kernel,user}_str based on the address space of data, see
1505       ADDRESS-SPACES for more information.
1506
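       A minimal sketch of str with an explicit length, assuming the
       syscalls:sys_enter_openat tracepoint and its filename argument are
       available on the running kernel.

```bpftrace
tracepoint:syscalls:sys_enter_openat
{
  // Copy at most 32 bytes, including the terminating NULL byte, so
  // longer file names are truncated.
  printf("%s: %s\n", comm, str(args->filename, 32));
}
```
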
1507   strerror
1508       variants
1509
1510       •   strerror strerror(int error)
1511
1512       Convert errno code to string. This is done asynchronously in userspace
1513       when the strerror value is printed, hence the returned value can only
1514       be used for printing.
1515
1516           #include <errno.h>
1517           BEGIN {
1518             print(strerror(EPERM));
1519           }
1520
1521   strftime
1522       variants
1523
1524       •   timestr_t strftime(const string fmt, int64 timestamp_ns)
1525
1526       async
1527
1528       Format the nanoseconds since boot timestamp timestamp_ns according to
1529       the format specified by fmt. The time conversion and formatting happens
1530       in user space, therefore the timestr_t value returned can only be used
1531       for printing using the %s format specifier.
1532
1533       bpftrace uses the strftime(3) function for formatting time and supports
1534       the same format specifiers.
1535
1536           i:s:1 {
1537             printf("%s\n", strftime("%H:%M:%S", nsecs));
1538           }
1539
1540       bpftrace also supports the following format string extensions:
1541
1542       ┌──────────┬────────────────────────────┐
1543       │          │                            │
1544       │Specifier │ Description                │
1545       ├──────────┼────────────────────────────┤
1546       │          │                            │
1547       │%f        │ Microsecond as a decimal   │
1548       │          │ number, zero-padded on the │
1549       │          │ left                       │
1550       └──────────┴────────────────────────────┘
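
           For illustration, the %f extension can be combined with the
           standard specifiers to add a sub-second part to a timestamp. The
           format string is only an example:

               i:s:1 {
                 // %f is the bpftrace extension from the table above
                 printf("%s\n", strftime("%H:%M:%S.%f", nsecs));
               }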
1551
1552   strncmp
1553       variants
1554
1555       •   int64 strncmp(char * s1, char * s2, int64 n)
1556
1557       strncmp compares up to n characters of strings s1 and s2. If they’re
1558       equal 0 is returned, else a non-zero value is returned.
1559
1560       bpftrace doesn’t read past the length of the shortest string.
1561
1562       The use of the == and != operators is recommended over calling strncmp
1563       directly.
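
           A minimal sketch of both styles, assuming the sys_enter_openat
           tracepoint is available. The first filter matches any command name
           that starts with "bash", the second only an exact match:

               t:syscalls:sys_enter_openat
               /strncmp("bash", comm, 4) == 0/ {
                 @prefix[comm] = count();
               }

               t:syscalls:sys_enter_openat
               /comm == "bash"/ {
                 @exact[comm] = count();
               }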
1564
1565   system
1566       variants
1567
1568       •   void system(string namefmt [, ...args])
1569
1570       unsafe async
1571
1572       system lets bpftrace run the specified command (fork and exec) until it
1573       completes and prints its stdout. The command is run with the same
1574       privileges as bpftrace and it blocks execution of the processing
1575       threads which can lead to missed events and delays processing of async
1576       events.
1577
1578           i:s:1 {
1579             time("%H:%M:%S: ");
1580             printf("%d\n", @++);
1581           }
1582           i:s:10 {
1583             system("/bin/sleep 10");
1584           }
1585           i:s:30 {
1586             exit();
1587           }
1588
1589       Note how the async time and printf first print every second until the
1590       i:s:10 probe hits, then they print every 10 seconds due to bpftrace
1591       blocking on sleep.
1592
1593           Attaching 3 probes...
1594           08:50:37: 0
1595           08:50:38: 1
1596           08:50:39: 2
1597           08:50:40: 3
1598           08:50:41: 4
1599           08:50:42: 5
1600           08:50:43: 6
1601           08:50:44: 7
1602           08:50:45: 8
1603           08:50:46: 9
1604           08:50:56: 10
1605           08:50:56: 11
1606           08:50:56: 12
1607           08:50:56: 13
1608           08:50:56: 14
1609           08:50:56: 15
1610           08:50:56: 16
1611           08:50:56: 17
1612           08:50:56: 18
1613           08:50:56: 19
1614
1615       system supports the same format string and arguments that printf does.
1616
1617           t:syscalls:sys_enter_execve {
1618             system("/bin/grep %s /proc/%d/status", "vmswap", pid);
1619           }
1620
1621   time
1622       variants
1623
1624       •   void time(const string fmt)
1625
1626       async
1627
1628       Format the current wall time according to the format specifier fmt and
1629       print it to stdout. Unlike strftime(), time() doesn’t send a timestamp
1630       from the probe; instead it is the time at which user-space processes
1631       the event.
1632
1633       bpftrace uses the strftime(3) function for formatting time and supports
1634       the same format specifiers.
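
           As a small sketch, time() can prefix each line of output with the
           wall-clock time at which the event was processed:

               kprobe:do_nanosleep {
                 time("%H:%M:%S ");
                 printf("%s (%d) sleeping\n", comm, pid);
               }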
1635
1636   uaddr
1637       variants
1638
1639       •   T * uaddr(const string sym)
1640
1641       Supported probes
1642
1643       •   uprobes
1644
1645       •   uretprobes
1646
1647       •   USDT
1648
1649       Does not work with ASLR, see issue #75
1650       <https://github.com/iovisor/bpftrace/issues/75>
1651
1652       The uaddr function returns the address of the specified symbol. This
1653       lookup happens during program compilation and cannot be used
1654       dynamically.
1655
1656       The default return type is uint64*. If the ELF object size matches a
1657       known integer size (1, 2, 4 or 8 bytes) the return type is modified to
1658       match the width (uint8*, uint16*, uint32* or uint64* resp.). As ELF
1659       does not contain type info the type is always assumed to be unsigned.
1660
1661           uprobe:/bin/bash:readline {
1662             printf("PS1: %s\n", str(*uaddr("ps1_prompt")));
1663           }
1664
1665   uptr
1666       variants
1667
1668       •   T * uptr(T * ptr)
1669
1670       Marks ptr as a user address space pointer. See the address-spaces
1671       section for more information on address-spaces. The pointer type is
1672       left unchanged.
1673
1674   usym
1675       variants
1676
1677       •   usym_t usym(uint64 * addr)
1678
1679       async
1680
1681       Supported probes
1682
1683       •   uprobes
1684
1685       •   uretprobes
1686
1687       Equal to ksym but resolves user space symbols
1688
1689           uprobe:/bin/bash:readline
1690           {
1691             printf("%s\n", usym(reg("ip")));
1692           }
1693
1694       Prints:
1695
1696           readline
1697
1698   path
1699       variants
1700
1701       •   char * path(struct path * path)
1702
1703       Kernel 5.10
1704
1705       Helper bpf_d_path
1706
1707       Return full path referenced by struct path pointer in argument.
1708
1709       This helper can only be called from probes attached to functions in the
1710       kernel's btf_allowlist_d_path set.
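
           A sketch of typical usage, assuming that filp_close is part of the
           allowlist on the running kernel and that kfunc probes are
           supported. Both assumptions should be checked on the target
           system:

               kfunc:filp_close {
                 // args->filp is a struct file *, f_path is its struct path
                 printf("%s closed %s\n", comm, path(args->filp->f_path));
               }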
1711
1712   unwatch
1713       variants
1714
1715       •   void unwatch(void * addr)
1716
1717       async
1718
1719       Removes a watchpoint
1720
1721   skboutput
1722       variants
1723
1724       •   uint32 skboutput(const string path, struct sk_buff *skb, uint64
1725           length, const uint64 offset)
1726
1727       Kernel 5.5
1728
1729       Helper bpf_skb_output
1730
1731       Write the data section of sk_buff skb to a PCAP file at path, starting
1732       from offset to offset + length.
1733
1734       The PCAP file is encapsulated in RAW IP, so no ethernet header is
1735       included. The data section in the struct skb may contain an ethernet
1736       header in some kernel contexts; set offset to 14 bytes to exclude
1737       it.
1738
1739       Each packet’s timestamp is determined by adding nsecs and boot time;
1740       the accuracy varies between kernels, see nsecs.
1741
1742       This function returns 0 on success, or a negative error in case of
1743       failure.
1744
1745       Environment variable BPFTRACE_PERF_RB_PAGES should be increased in
1746       order to capture large packets, or else these packets will be dropped.
1747
1748       Usage
1749
1750           # cat dump.bt
1751           kfunc:napi_gro_receive {
1752             $ret = skboutput("receive.pcap", args->skb, args->skb->len, 0);
1753           }
1754
1755           kfunc:dev_queue_xmit {
1756             // setting offset to 14, to exclude ethernet header
1757             $ret = skboutput("output.pcap", args->skb, args->skb->len, 14);
1758             printf("skboutput returns %d\n", $ret);
1759           }
1760
1761           # export BPFTRACE_PERF_RB_PAGES=1024
1762           # bpftrace dump.bt
1763           ...
1764
1765           # tcpdump -n -r ./receive.pcap  | head -3
1766           reading from file ./receive.pcap, link-type RAW (Raw IP)
1767           dropped privs to tcpdump
1768           10:23:44.674087 IP 22.128.74.231.63175 > 192.168.0.23.22: Flags [.], ack 3513221061, win 14009, options [nop,nop,TS val 721277750 ecr 3115333619], length 0
1769           10:23:45.823194 IP 100.101.2.146.53 > 192.168.0.23.46619: 17273 0/1/0 (130)
1770           10:23:45.823229 IP 100.101.2.146.53 > 192.168.0.23.46158: 45799 1/0/0 A 100.100.45.106 (60)
1771

OUTPUT FORMATTING

1773   print
1778       async
1779
1780       variants
1781
1782       •   void print(T val)
1783
1784       •   void print(@map)
1785
1786       •   void print(@map, uint64 top)
1787
1788       •   void print(@map, uint64 top, uint64 div)
1789
1790       print prints the value, which can be a map or a scalar value, with
1791       the default formatting for the type.
1792
1793           i:ms:10 { @=hist(rand); }
1794           i:s:1 {
1795             print(@);
1796             print(123);
1797             print("abc");
1798             exit();
1799           }
1800
1801       Prints:
1802
1803           @:
1804           [16M, 32M)             3 |@@@                                                 |
1805           [32M, 64M)             2 |@@                                                  |
1806           [64M, 128M)            1 |@                                                   |
1807           [128M, 256M)           4 |@@@@                                                |
1808           [256M, 512M)           3 |@@@                                                 |
1809           [512M, 1G)            14 |@@@@@@@@@@@@@@                                      |
1810           [1G, 2G)              22 |@@@@@@@@@@@@@@@@@@@@@@                              |
1811           [2G, 4G)              51 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
1812
1813           123
1814           abc
1815
1816       Note that maps are printed by reference while scalar values are copied.
1817       This means that updating and printing maps in a fast loop will likely
1818       result in bogus map values as the map will be updated before userspace
1819       gets the time to dump and print it.
1820
1821       The printing of maps supports the optional top and div arguments. top
1822       limits the printing to the top N entries with the highest integer
1823       values:
1824
1825           BEGIN {
1826             $i = 11;
1827             while($i) {
1828               @[$i] = --$i;
1829             }
1830             print(@, 2);
1831             clear(@);
1832             exit()
1833           }
1834
1835           @[9]: 9
1836           @[10]: 10
1837
1838       The div argument scales the values prior to printing them. Scaling
1839       values before storing them can result in rounding errors. Consider the
1840       following program:
1841
1842           k:f {
1843             @[func] += arg0/10;
1844           }
1845
1846       With the following sequence as numbers for arg0: 134, 377, 111, 99. The
1847       total is 721 which rounds to 72 when scaled by 10 but the program would
1848       print 70 due to the rounding of individual values.
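
           One way to avoid this, sketched here with the same placeholder
           probe k:f, is to store the unscaled values and let print apply the
           divisor. The top value of 5 is an arbitrary choice:

               k:f {
                 @[func] += arg0;   // store the raw value
               }

               END {
                 print(@, 5, 10);   // top 5 entries, divided by 10
                 clear(@);          // avoid printing the map again at exit
               }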
1849
1850       Changing the print call in the BEGIN example above to print(@, 5, 2)
1851       takes the top 5 values and divides them by 2:
1852
1853           @[6]: 3
1854           @[7]: 3
1855           @[8]: 4
1856           @[9]: 4
1857           @[10]: 5
1858
1859   printf
1860       variants
1861
1862       •   void printf(const string fmt, args...)
1863
1864       async
1865
1866       printf() formats and prints data. It behaves similarly to the printf()
1867       found in C and many other languages.
1868
1869       The format string has to be a constant, it cannot be modified at
1870       runtime. The formatting of the string happens in user space. Values are
1871       copied and passed by value.
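
           As a brief sketch, the standard width and alignment modifiers can
           be used to produce column-style output. The column widths are
           arbitrary:

               tracepoint:syscalls:sys_enter_openat {
                 printf("%-16s %-6d %s\n", comm, pid, str(args->filename));
               }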
1872
1873       bpftrace supports all the typical format specifiers like %llx and %hhu.
1874       The non-standard ones can be found in the table below:
1875
1876       ┌──────────┬────────┬─────────────────────┐
1877       │          │        │                     │
1878       │Specifier │ Type   │ Description         │
1879       ├──────────┼────────┼─────────────────────┤
1880       │          │        │                     │
1881       │r         │ buffer │ Hex-formatted       │
1882       │          │        │ string to print     │
1883       │          │        │ arbitrary binary    │
1884       │          │        │ content returned by │
1885       │          │        │ the buf (buf)       │
1886       │          │        │ function.           │
1887       └──────────┴────────┴─────────────────────┘
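
           A minimal sketch of the r specifier, assuming the kernel symbol
           avenrun can be resolved by kaddr on the running system. It dumps
           the first 8 bytes at that address as a hex-formatted string:

               i:s:1 {
                 printf("%r\n", buf(kaddr("avenrun"), 8));
               }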
1888
1889       Supported escape sequences
1890
1891       Colors are supported too, using standard terminal escape sequences:
1892
1893           print("\033[31mRed\t\033[33mYellow\033[0m\n")
1894

PROBES

1896       bpftrace supports various probe types which allow the user to attach
1897       BPF programs to different types of events. Each probe starts with a
1898       provider (e.g. kprobe) followed by a colon (:) separated list of
1899       options. The number of options and their meaning depend on the provider
1900       and are detailed below. The valid values for options can depend on the
1901       system or binary being traced, e.g. for uprobes it depends on the
1902       binary. Also see LISTING PROBES.
1903
1904       It is possible to associate multiple probes with a single action as
1905       long as the action is valid for all specified probes. Multiple probes
1906       can be specified as a comma (,) separated list:
1907
1908           kprobe:tcp_reset,kprobe:tcp_v4_rcv {
1909             printf("Entered: %s\n", probe);
1910           }
1911
1912       Wildcards are supported too:
1913
1914           kprobe:tcp_* {
1915             printf("Entered: %s\n", probe);
1916           }
1917
1918       Both can be combined:
1919
1920           kprobe:tcp_reset,kprobe:*socket* {
1921             printf("Entered: %s\n", probe);
1922           }
1923
1924       Most providers also support a short name which can be used instead of
1925       the full name, e.g. kprobe:f and k:f are identical.
1926
1927   BEGIN and END
1928       These are special built-in events provided by the bpftrace runtime.
1929       BEGIN is triggered before all other probes are attached. END is
1930       triggered after all other probes are detached.
1931
1932       Note that specifying an END probe doesn’t override the printing of
1933       'non-empty' maps at exit. To prevent the printing, all used maps need to be
1934       cleared, which can be done in the END probe:
1935
1936           END {
1937               clear(@map1);
1938               clear(@map2);
1939           }
1940
1941   hardware
1942       variants
1943
1944       •   hardware:event_name:
1945
1946       •   hardware:event_name:count
1947
1948       shortname
1949
1950       •   h
1951
1952       The hardware probe attaches to pre-defined hardware events provided by
1953       the kernel.
1954
1955       They are implemented using performance monitoring counters (PMCs):
1956       hardware resources on the processor. There are about ten of these, and
1957       they are documented in the perf_event_open(2) man page. The event names
1958       are:
1959
1960       •   cpu-cycles or cycles
1961
1962       •   instructions
1963
1964       •   cache-references
1965
1966       •   cache-misses
1967
1968       •   branch-instructions or branches
1969
1970       •   branch-misses
1971
1972       •   bus-cycles
1973
1974       •   frontend-stalls
1975
1976       •   backend-stalls
1977
1978       •   ref-cycles
1979
1980       The count option specifies how many events must happen before the probe
1981       fires. If count is left unspecified a default value is used.
1982
1983           hardware:cache-misses:1e6 { @[pid] = count(); }
1984
1985   interval
1986       variants
1987
1988       •   interval:us:count
1989
1990       •   interval:ms:count
1991
1992       •   interval:s:count
1993
1994       •   interval:hz:rate
1995
1996       shortnames
1997
1998       •   i
1999
2000       The interval probe fires at a fixed interval as specified by its time
2001       spec. Interval probes fire on one CPU at a time, unlike profile probes.
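
           A common pattern, sketched here, is to use an interval probe to
           print and reset a map that other probes fill. The map name
           @syscalls is arbitrary:

               tracepoint:raw_syscalls:sys_enter {
                 @syscalls = count();
               }

               interval:s:1 {
                 // runs once per second on a single CPU
                 print(@syscalls);
                 clear(@syscalls);
               }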
2002
2003   iterator
2004       variants
2005
2006       •   iter:task
2007
2008       •   iter:task:pin
2009
2010       •   iter:task_file
2011
2012       •   iter:task_file:pin
2013
2014       shortnames
2015
2016       •   it
2017
2018       These are eBPF iterator probes that allow iteration over kernel
2019       objects.
2020
2021       An iterator probe can’t be mixed with any other probe, not even another
2022       iterator.
2023
2024       Each iterator probe provides a set of fields that can be accessed through
2025       the ctx pointer. The available fields for an iterator can be displayed
2026       with the -lv options (see LISTING PROBES).
2027
2028       Examples:
2029
2030           # bpftrace -e 'iter:task { printf("%s:%d\n", ctx->task->comm, ctx->task->pid); }'
2031           Attaching 1 probe...
2032           systemd:1
2033           kthreadd:2
2034           rcu_gp:3
2035           rcu_par_gp:4
2036           kworker/0:0H:6
2037           mm_percpu_wq:8
2038           ...
2039
2040           # bpftrace -e 'iter:task_file { printf("%s:%d %d:%s\n", ctx->task->comm, ctx->task->pid, ctx->fd, path(ctx->file->f_path)); }'
2041           Attaching 1 probe...
2042           systemd:1 1:/dev/null
2043           systemd:1 2:/dev/null
2044           systemd:1 3:/dev/kmsg
2045           ...
2046           su:1622 1:/dev/pts/1
2047           su:1622 2:/dev/pts/1
2048           su:1622 3:/var/lib/sss/mc/passwd
2049           ...
2050           bpftrace:1892 1:pipe:[35124]
2051           bpftrace:1892 2:/dev/pts/1
2052           bpftrace:1892 3:anon_inode:bpf-map
2053           bpftrace:1892 4:anon_inode:bpf-map
2054           bpftrace:1892 5:anon_inode:bpf_link
2055           bpftrace:1892 6:anon_inode:bpf-prog
2056           bpftrace:1892 7:anon_inode:bpf_iter
2057
2058       An iterator can be pinned by specifying the optional ':pin' part of the
2059       probe, which defines the pin file. It can be given as an absolute path
2060       or relative to /sys/fs/bpf.
2061
2062       relative pin
2063
2064           # bpftrace -e 'iter:task:list { printf("%s:%d\n", ctx->task->comm, ctx->task->pid); }'
2065           Program pinned to /sys/fs/bpf/list
2066
2067           # cat /sys/fs/bpf/list
2068           systemd:1
2069           kthreadd:2
2070           rcu_gp:3
2071           rcu_par_gp:4
2072           kworker/0:0H:6
2073           mm_percpu_wq:8
2074           rcu_tasks_kthre:9
2075           ...
2076
2077       Examples with absolute pin file:
2078
2079       absolute pin
2080
2081           # bpftrace -e '
2082           iter:task_file:/sys/fs/bpf/files {
2083             printf("%s:%d %s\n", ctx->task->comm, ctx->task->pid, path(ctx->file->f_path));
2084           }'
2085
2086           Program pinned to /sys/fs/bpf/files
2087
2088           # cat /sys/fs/bpf/files
2089           systemd:1 anon_inode:inotify
2090           systemd:1 anon_inode:[timerfd]
2091           ...
2092           systemd-journal:849 /dev/kmsg
2093           systemd-journal:849 anon_inode:[eventpoll]
2094           ...
2095           sssd:1146 /var/log/sssd/sssd.log
2096           sssd:1146 anon_inode:[eventpoll]
2097           ...
2098           NetworkManager:1155 anon_inode:[eventfd]
2099           NetworkManager:1155 /var/lib/sss/mc/passwd (deleted)
2100
2101   kfunc and kretfunc
2102       variants
2103
2104       •   kfunc:fn
2105
2106       •   kretfunc:fn
2107
2108       shortnames
2109
2110       •   f (kfunc)
2111
2112       •   fr (kretfunc)
2113
2114       requires (--info)
2115
2116       •   Kernel features:BTF
2117
2118       •   Probe types:kfunc
2119
2120       kfuncs attach to kernel functions, similar to kprobe and kretprobe. They
2121       make use of eBPF trampolines which allow kernel code to call into BPF
2122       programs with near zero overhead.
2123
2124       kfuncs make use of BTF type information to derive the type of function
2125       arguments at compile time. This removes the need for manual type
2126       casting and makes the code more resilient against small signature
2127       changes in the kernel. The function arguments are available in the args
2128       struct which can be inspected by doing verbose listing (see LISTING
2129       PROBES). These arguments are also available in the return probe
2130       (kretfunc).
2131
2132           # bpftrace -lv 'kfunc:tcp_reset'
2133           kfunc:tcp_reset
2134               struct sock * sk
2135               struct sk_buff * skb
2136
2137           kfunc:x86_pmu_stop {
2138             printf("pmu %s stop\n", str(args->event->pmu->name));
2139           }
2140
2141           kretfunc:fget {
2142             printf("fd %d name %s\n", args->fd, str(retval->f_path.dentry->d_name.name));
2143           }
2144
2145           fd 3 name ld.so.cache
2146           fd 3 name libselinux.so.1
2147           fd 3 name libselinux.so.1
2148           ...
2149
2150   kprobe and kretprobe
2151       variants
2152
2153       •   kprobe:fn
2154
2155       •   kprobe:fn+offset
2156
2157       •   kretprobe:fn
2158
2159       shortnames
2160
2161       •   k
2162
2163       •   kr
2164
2165       kprobes allow for dynamic instrumentation of kernel functions. Each
2166       time the specified kernel function is executed the attached BPF
2167       programs are run.
2168
2169           kprobe:tcp_reset {
2170             @tcp_resets = count()
2171           }
2172
2173       Function arguments are available through the argX and sargX builtins,
2174       for register args and stack args respectively. Whether arguments are
2175       passed on the stack or in registers depends on the architecture and the
2176       number of arguments used, e.g. on x86_64 the first 6 non-floating point
2177       arguments are passed in registers, all following arguments are passed
2178       on the stack. Note that floating point arguments are typically passed
2179       in special registers which don’t count as argX arguments, which can
2180       cause confusion. Consider a function with the following signature:
2181
2182           void func(int a, double d, int x)
2183
2184       Because d is a floating point argument, x is accessed through arg1 where
2185       one might expect arg2.
2186
2187       bpftrace does not detect the function signature so it is not aware of
2188       the argument count or their type. It is up to the user to perform type
2189       conversion when needed, e.g.
2190
2191           kprobe:tcp_connect
2192           {
2193             $sk = ((struct sock *) arg0);
2194             ...
2195           }
2196
2197       kprobes are not limited to function entry, they can be attached to any
2198       instruction in a function by specifying an offset from the start of the
2199       function.
2200
2201       kretprobes trigger on the return from a kernel function. Return probes
2202       do not have access to the function (input) arguments, only to the
2203       return value (through retval). A common pattern to work around this is
2204       by storing the arguments in a map on function entry and retrieving them
2205       in the return probe:
2206
2207           kprobe:d_lookup
2208           {
2209                   $name = (struct qstr *)arg1;
2210                   @fname[tid] = $name->name;
2211           }
2212
2213           kretprobe:d_lookup
2214           /@fname[tid]/
2215           {
2216                   printf("%-8d %-6d %-16s M %s\n", elapsed / 1e6, pid, comm,
2217                       str(@fname[tid]));
2218           }
2219
2220   profile
2221       variants
2222
2223       •   profile:us:count
2224
2225       •   profile:ms:count
2226
2227       •   profile:s:count
2228
2229       •   profile:hz:rate
2230
2231       shortnames
2232
2233       •   p
2234
2235       Profile probes fire on each CPU on the specified interval.
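
           As a minimal sketch, the following samples kernel stacks on every
           CPU. The rate of 99 Hz is an arbitrary example value:

               profile:hz:99 {
                 @[kstack] = count();
               }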
2236
2237   software
2238       variants
2239
2240       •   software:event:
2241
2242       •   software:event:count
2243
2244       shortnames
2245
2246       •   s
2247
2248       The software probe attaches to pre-defined software events provided by
2249       the kernel. Event details can be found in the perf_event_open(2) man
2250       page.
2251
2252       The event names are:
2253
2254       •   cpu-clock or cpu
2255
2256       •   task-clock
2257
2258       •   page-faults or faults
2259
2260       •   context-switches or cs
2261
2262       •   cpu-migrations
2263
2264       •   minor-faults
2265
2266       •   major-faults
2267
2268       •   alignment-faults
2269
2270       •   emulation-faults
2271
2272       •   dummy
2273
2274       •   bpf-output
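
           A short sketch using one of the event names above. The count of
           100 is an arbitrary example value, so the probe fires once for
           every 100 page faults:

               software:page-faults:100 {
                 @[comm] = count();
               }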
2275
2276   tracepoint
2277       variants
2278
2279       •   tracepoint:subsys:event
2280
2281       shortnames
2282
2283       •   t
2284
2285       Tracepoints are hooks into events in the kernel. Tracepoints are
2286       defined in the kernel source and compiled into the kernel binary which
2287       makes them a form of static tracing. Unlike kprobes, new tracepoints
2288       cannot be added without modifying the kernel.
2289
2290       The advantage of tracepoints is that they generally provide a more
2291       stable interface than kprobes do, as they do not depend on the existence
2292       of a kernel function.
2293
2294       Tracepoint arguments are available in the args struct which can be
2295       inspected with verbose listing, see the LISTING PROBES section for more
2296       details.
2297
2298           tracepoint:syscalls:sys_enter_openat {
2299             printf("%s %s\n", comm, str(args->filename));
2300           }
2301
2302           irqbalance /proc/interrupts
2303           irqbalance /proc/stat
2304           snmpd /proc/diskstats
2305           snmpd /proc/stat
2306           snmpd /proc/vmstat
2307           snmpd /proc/net/dev
2308           [...]
2309
2310       Additional information
2311
2312       https://www.kernel.org/doc/html/latest/trace/tracepoints.html
2313
2314   uprobe, uretprobe
2315       variants
2316
2317       •   uprobe:binary:func
2318
2319       •   uprobe:binary:func+offset
2320
2321       •   uretprobe:binary:func
2322
2323       shortnames
2324
2325       •   u
2326
2327       •   ur
2328
2329       uprobes, or user-space probes, are the user-space equivalent of
2330       kprobes. The same limitations that apply to kprobe and kretprobe also
2331       apply to uprobes and uretprobes.
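
           As with kretprobe, the return value of a user-space function is
           available through retval. A minimal sketch, assuming /bin/bash
           exports the readline symbol:

               uretprobe:/bin/bash:readline {
                 // readline returns a pointer to the line that was read
                 printf("read a line: %s\n", str(retval));
               }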
2332
2333       When tracing libraries, it is sufficient to specify the library name
2334       instead of a full path. The path will then be automatically resolved
2335       using /etc/ld.so.cache:
2336
2337           # bpftrace -e 'uprobe:libc:malloc { printf("Allocated %d bytes\n", arg0); }'
2338           Allocated 4 bytes
2339           ...
2340
2341       If the traced binary has DWARF included, function arguments are
2342       available in the args struct which can be inspected with verbose
2343       listing, see the LISTING PROBES section for more details.
2344
2345       It is important to note that for uretprobes to work the kernel runs a
2346       special helper on user-space function entry which overrides the return
2347       address on the stack. This can cause issues with languages that have
2348       their own runtime like Golang:
2349
2350       example.go
2351
2352           func myprint(s string) {
2353             fmt.Printf("Input: %s\n", s)
2354           }
2355
2356           func main() {
2357             ss := []string{"a", "b", "c"}
2358             for _, s := range ss {
2359               go myprint(s)
2360             }
2361             time.Sleep(1*time.Second)
2362           }
2363
2364       bpftrace
2365
2366           # bpftrace -e 'uretprobe:./test:main.myprint { @=count(); }' -c ./test
2367           runtime: unexpected return pc for main.myprint called from 0x7fffffffe000
2368           stack: frame={sp:0xc00008cf60, fp:0xc00008cfd0} stack=[0xc00008c000,0xc00008d000)
2369           fatal error: unknown caller pc
2370
2371   usdt
2372       variants
2373
2374       •   usdt:binary:name
2375
2376       shortnames
2377
2378       •   U
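
           A sketch of the form above. The binary path and probe name are
           hypothetical placeholders, and the example assumes that the first
           probe argument is a string pointer:

               usdt:/path/to/binary:probe_name {
                 // /path/to/binary and probe_name are placeholders
                 printf("%s\n", str(arg0));
               }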
2379
2380   watchpoint and asyncwatchpoint
2381       variants
2382
2383       •   watchpoint:absolute_address:length:mode
2384
2385       •   watchpoint:function+argN:length:mode
2386
2387       shortnames
2388
2389       •   w
2390
2391       •   aw
2392
2393       These are memory watchpoints provided by the kernel. Whenever a memory
2394       address is written to (w), read from (r), or executed (x), the kernel
2395       can generate an event.
2396
2397       In the first form, an absolute address is monitored. If a pid (-p) or a
2398       command (-c) is provided, bpftrace takes the address as a userspace
2399       address and monitors the appropriate process. If not, bpftrace takes
2400       the address as a kernel space address.
2401
2402       In the second form, the address present in argN when function is
2403       entered is monitored. A pid or command must be provided for this form.
2404       If synchronous (watchpoint), a SIGSTOP is sent to the tracee upon
2405       function entry. The tracee will be SIGCONTed after the watchpoint is
2406       attached. This is to ensure events are not missed. If you want to avoid
2407       the SIGSTOP + SIGCONT use asyncwatchpoint.
2408
2409       Note that on most architectures you may not monitor for execution while
2410       monitoring read or write.
2411
2412       Examples
2413
2414       Print hit when a read from or write to 0x10000000 happens:
2415
2416           # bpftrace -e 'watchpoint:0x10000000:8:rw { printf("hit!\n"); exit(); }' -c ./testprogs/watchpoint
2417
2418       Print the call stack every time the jiffies variable is updated:
2419
2420           # bpftrace -e "watchpoint:0x$(awk '$3 == "jiffies" {print $1}' /proc/kallsyms):8:w {
2421             @[kstack] = count();
2422           }
2423
2424           i:s:1 { exit(); }"
2425           ......
2426           @[
2427               do_timer+12
2428               tick_do_update_jiffies64.part.22+89
2429               tick_sched_do_timer+103
2430               tick_sched_timer+39
2431               __hrtimer_run_queues+256
2432               hrtimer_interrupt+256
2433               smp_apic_timer_interrupt+106
2434               apic_timer_interrupt+15
2435               cpuidle_enter_state+188
2436               cpuidle_enter+41
2437               do_idle+536
2438               cpu_startup_entry+25
2439               start_secondary+355
2440               secondary_startup_64+164
2441           ]: 319
2442
2443       Print hit and exit when the memory pointed to by arg1 of increment is
2444       written to:
2445
2446           # cat wpfunc.c
2447           #include <stdio.h>
2448           #include <stdlib.h>
2449           #include <unistd.h>
2450
2451           __attribute__((noinline))
2452           void increment(__attribute__((unused)) int _, int *i)
2453           {
2454             (*i)++;
2455           }
2456
2457           int main()
2458           {
2459             int *i = malloc(sizeof(int));
2460             while (1)
2461             {
2462               increment(0, i);
2463               (*i)++;
2464               usleep(1000);
2465             }
2466           }
2467
2468           # bpftrace -e 'watchpoint:increment+arg1:4:w { printf("hit!\n"); exit() }' -c ./wpfunc
2469

LISTING PROBES

2471       Probe listing is the method to discover which probes are supported by
2472       the current system. Listing supports the same syntax as normal
2473       attachment does:
2474
2475           # bpftrace -l 'kprobe:*'
2476           # bpftrace -l 't:syscalls:*openat*'
2477           # bpftrace -l 'kprobe:tcp*,trace
2478           # bpftrace -l 'k:*socket*,tracepoint:syscalls:*tcp*'
2479
2480       The verbose flag (-v) can be specified to inspect arguments (args) for
2481       providers that support it:
2482
2483           # bpftrace -l 'fr:tcp_reset,t:syscalls:sys_enter_openat' -v
2484           kretfunc:tcp_reset
2485               struct sock * sk
2486               struct sk_buff * skb
2487           tracepoint:syscalls:sys_enter_openat
2488               int __syscall_nr
2489               int dfd
2490               const char * filename
2491               int flags
2492               umode_t mode
2493           # bpftrace -l 'uprobe:/bin/bash:rl_set_prompt' -v    # works only if /bin/bash has DWARF
2494           uprobe:/bin/bash:rl_set_prompt
2495               const char *prompt
2496
2497
2498
2499                                  2022-09-26                       BPFTRACE(8)