1BPFTRACE(8)                                                        BPFTRACE(8)
2
3
4

NAME

6       bpftrace - a high-level tracing language
7

SYNOPSIS

9       bpftrace [OPTIONS] FILENAME
10       bpftrace [OPTIONS] -e 'program code'
11

DESCRIPTION

13       bpftrace is a high-level tracing language and runtime for Linux based
14       on BPF. It supports static and dynamic tracing for both the kernel and
15       user-space.
16
17       When FILENAME is "-", read from stdin.
18

EXAMPLES

20       List all probes with "sleep" in their name
21
22             # bpftrace -l '*sleep*'
23
24       Trace processes calling sleep
25
26             # bpftrace -e 'kprobe:do_nanosleep { printf("%d sleeping\n", pid); }'
27
28       Trace processes calling sleep while spawning sleep 5 as a child process
29
30             # bpftrace -e 'kprobe:do_nanosleep { printf("%d sleeping\n", pid); }' -c 'sleep 5'
31

SUPPORTED ARCHITECTURES

33       x86_64, arm64 and s390x
34

OPTIONS

36   Output format
37       -B MODE, Set the buffer mode for stdout. Valid values are
38           none No buffering. Each I/O is written as soon as possible
39           line Data is written on the first newline or when the buffer is
40           full. This is the default mode.
41           full Data is written once the buffer is full.
42
43       -f FORMAT, Set the output format. Valid values are
44           json
45           text
46
47       -o FILENAME
48           Write bpftrace tracing output to FILENAME instead of stdout. This
49           doesn’t include child process (-c option) output. Errors are still
50           written to stderr.
51
52       --no-warnings
53           Suppress all warning messages created by bpftrace.
54
55   Tracing
56       -e PROGRAM
57           Execute PROGRAM instead of reading the program from a file
58
59       -I DIR
60           Add the directory DIR to the search path for C headers. This option
61           can be used multiple times.
62
63       --include FILENAME
64           Add FILENAME as an include for the pre-processor. This is equal to
65           adding '#include FILENAME' to the start of the bpftrace program.
66           This option can be used multiple times.
67
68       -l [SEARCH]
69           List all probes that match the SEARCH pattern. If the pattern is
70           omitted all probes will be listed. This pattern supports wildcards
71           in the same way that probes do. E.g. '-l kprobe:*file*' to list all
72           'kprobes' with 'file' in the name. For more details see the LISTING
73           PROBES section.
74
75       --unsafe
76           Some calls, like 'system', are marked as unsafe as they can have
77           dangerous side effects ('system("rm -rf")') and are disabled by
78           default. This flag allows their use.
79
80       -k
81           Errors from bpf-helpers(7) are silently ignored by default which
82           can lead to strange results. This flag enables the detection of
83           errors (except for errors from 'probe_read_*'). When errors occur
84           bpftrace will log an error containing the source location and the
85           error code:
86
87           stdin:48-57: WARNING: Failed to probe_read_user_str: Bad address (-14)
88           u:lib.so:"fn(char const*)" { printf("arg0:%s\n", str(arg0));}
89                                                            ~~~~~~~~~
90
91       -kk
92           Same as '-k' but also includes the errors from 'probe_read_*'
93           helpers.
94
95   Process management
96       -p PID
97           Attach to the process with PID. If the process terminates, bpftrace
98           will also terminate. When using USDT probes they will be attached
99           to only this process.
100
101       -c COMMAND
102           Run COMMAND as a child process. When the child terminates bpftrace
103           stops as well, as if 'exit()' has been called. If bpftrace
104           terminates before the child process does the child process will be
105           terminated with a SIGTERM. If used, 'USDT' probes will only
106           be attached to the child process. To avoid a race condition when
107           using 'USDTs' the child is stopped after 'execve' using 'ptrace(2)'
108           and continued when all 'USDT' probes are attached.
109           The child PID is available to programs as the 'cpid' builtin.
110           The child process runs with the same privileges as bpftrace itself
111           (usually root).
112
113       --usdt-file-activation
114           Activate USDT semaphores based on file path.
115
116   Miscellaneous
117       --info
118           Print detailed information about features supported by the kernel
119           and the bpftrace build.
120
121       -h, --help
122           Print the help summary
123
124       -V, --version
125           Print bpftrace version information
126
127       -v
128           Verbose messages
129
130       -d
131           Debug mode
132
133       -dd
134           Verbose debug mode
135

ENVIRONMENT VARIABLES

137       Some behavior can only be controlled through environment variables.
138       This section lists all those variables.
139
140   BPFTRACE_STRLEN
141       Default: 64
142
143       Number of bytes allocated on the BPF stack for the string returned by
144       str().
145
146       Make this larger if you wish to read bigger strings with str().
147
148       Beware that the BPF stack is small (512 bytes).
149
150       Support for even larger strings is being discussed:
151       https://github.com/iovisor/bpftrace/issues/305
152
153   BPFTRACE_NO_CPP_DEMANGLE
154       Default: 0
155
156       C++ symbol demangling in user space stack traces is enabled by default.
157
158       This feature can be turned off by setting the value of this environment
159       variable to 1.
160
161   BPFTRACE_MAP_KEYS_MAX
162       Default: 4096
163
164       This is the maximum number of keys that can be stored in a map.
165       Increasing the value will consume more memory and increase startup
166       times. There are some cases where you will want to: for example,
167       sampling stack traces, recording timestamps for each page, etc.
168
169   BPFTRACE_MAX_PROBES
170       Default: 512
171
172       This is the maximum number of probes that bpftrace can attach to.
173       Increasing the value will consume more memory, increase startup times
174       and can incur high performance overhead or even freeze or crash the
175       system.
176
177   BPFTRACE_CACHE_USER_SYMBOLS
178       Default: PER_PROGRAM if ASLR disabled or -c option given, PER_PID
179       otherwise.
180
181       Caching strategy for user symbols. Valid values are:
182
183       •   PER_PROGRAM - each program has its own cache. If there are more
184           processes with enabled ASLR for a single program, this might
185           produce incorrect results.
186
187       •   PER_PID - each process has its own cache. This is accurate for
188           processes with ASLR enabled, and enables bpftrace to preload caches
189           for processes running at probe attachment time. If there are many
190           processes running, it will consume a lot of memory.
191
192       •   NONE - caching disabled. This saves the most memory, but at the
193           cost of speed.
194
195   BPFTRACE_VMLINUX
196       Default: None
197
198       This specifies the vmlinux path used for kernel symbol resolution when
199       attaching a kprobe to an offset. If this value is not given, bpftrace
200       searches for vmlinux in predefined locations. See
201       src/attached_probe.cpp:find_vmlinux() for details.
202
203   BPFTRACE_BTF
204       Default: None
205
206       The path to a BTF file. By default, bpftrace searches several locations
207       to find a BTF file. See src/btf.cpp for the details.
208
209   BPFTRACE_PERF_RB_PAGES
210       Default: 64
211
212       Number of pages to allocate per CPU for perf ring buffer. The value
213       must be a power of 2.
214
215       If you’re getting a lot of dropped events bpftrace may not be
216       processing events in the ring buffer fast enough. It may be useful to
217       bump the value higher so more events can be queued up. The tradeoff is
218       that bpftrace will use more memory.
219
220   BPFTRACE_MAX_BPF_PROGS
221       Default: 512
222
223       This is the maximum number of BPF programs (functions) that bpftrace
224       can generate. The main purpose of this limit is to prevent bpftrace
225       from hanging since generating a lot of probes takes a lot of resources
226       (and it should not happen often).
227
228   BPFTRACE_STR_TRUNC_TRAILER
229       Default: ..
230
231       Trailer to add to strings that were truncated. Set to an empty string
232       to disable truncation trailers.
233
234   BPFTRACE_STACK_MODE
235       Default: bpftrace
236
237       Output format for ustack and kstack builtins. Available modes/formats:
238       bpftrace, perf, and raw. This can be overridden at the call site.
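
       As a minimal sketch of a call-site override, the program below asks
       for the perf format for a single kstack call, whatever
       BPFTRACE_STACK_MODE is set to. The probe (do_nanosleep) is borrowed
       from the EXAMPLES section purely for illustration.

```bpftrace
kprobe:do_nanosleep
{
  // The mode argument applies to this call only; other kstack/ustack
  // calls keep using the mode selected by BPFTRACE_STACK_MODE.
  printf("%s\n", kstack(perf));
}
```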
239

BPFTRACE LANGUAGE

241   Overview
242       The bpftrace (bt) language is inspired by the D language used by dtrace
243       and uses the same program structure. Each script consists of a
244       preamble and one or more action blocks.
245
246           preamble
247
248           actionblock1
249           actionblock2
250
251       Preprocessor and type definitions take place in the preamble:
252
253           #include <linux/socket.h>
254           #define RED "\033[31m"
255
256           struct S {
257             int x;
258           }
259
260       Each action block consists of three parts:
261
262           probe[,probe]
263           /predicate/ {
264             action
265           }
266
267       Probes
268           A probe specifies the event and event type to attach to.
269
270       Predicate
271           The predicate is an optional condition that must be met for the
272           action to be executed.
273
274       Action
275           Actions are the programs that run when an event fires (and the
276           predicate is met). An action is a semicolon (;) separated list of
277           statements and is always enclosed by braces {}.
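
       To illustrate the three parts together, here is a small sketch of an
       action block with one probe, a predicate and an action. The filter
       value "bash" is an arbitrary choice made for this example.

```bpftrace
// Probe: the openat(2) syscall entry tracepoint.
tracepoint:syscalls:sys_enter_openat
// Predicate: only run the action for processes named "bash".
/comm == "bash"/
{
  // Action: print the pid and the file name passed to openat(2).
  printf("%-6d %s\n", pid, str(args.filename));
}
```

       Without the predicate line the action would run for every process
       that calls openat(2).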
278
279       A basic script that traces the open(2) and openat(2) system calls can
280       be written as follows:
281
282           BEGIN
283           {
284                   printf("Tracing open syscalls... Hit Ctrl-C to end.\n");
285           }
286
287           tracepoint:syscalls:sys_enter_open,
288           tracepoint:syscalls:sys_enter_openat
289           {
290                   printf("%-6d %-16s %s\n", pid, comm, str(args.filename));
291           }
292
293       This script has two action blocks and a total of 3 probes. The first
294       action block uses the special BEGIN probe, which fires once during
295       bpftrace startup. This probe is used to print a header, indicating that
296       the tracing has started.
297
298       The second action block uses two probes, one for open and one for
299       openat, and defines an action that prints the file being opened as
300       well as the pid and comm of the process that executes the syscall. See
301       the PROBES section for details on the available probe types.
302
303   Identifiers
304       Identifiers must match the following regular expression:
305       [_a-zA-Z][_a-zA-Z0-9]*
306
307   Comments
308       Both single line and multi line comments are supported.
309
310           // A single line comment
311           i:s:1 { // can also be used to comment inline
312           /*
313            a multi line comment
314
315           */
316             print(/* inline comment block */ 1);
317           }
318
319   Data Types
320       The following fundamental integer types are provided by the language.
321
322       ┌───────┬─────────────────────────┐
323       │       │                         │
324       │Type   │ Description             │
325       ├───────┼─────────────────────────┤
326       │       │                         │
327       │uint8  │ Unsigned 8 bit integer  │
328       ├───────┼─────────────────────────┤
329       │       │                         │
330       │int8   │ Signed 8 bit integer    │
331       ├───────┼─────────────────────────┤
332       │       │                         │
333       │uint16 │ Unsigned 16 bit integer │
334       ├───────┼─────────────────────────┤
335       │       │                         │
336       │int16  │ Signed 16 bit integer   │
337       ├───────┼─────────────────────────┤
338       │       │                         │
339       │uint32 │ Unsigned 32 bit integer │
340       ├───────┼─────────────────────────┤
341       │       │                         │
342       │int32  │ Signed 32 bit integer   │
343       ├───────┼─────────────────────────┤
344       │       │                         │
345       │uint64 │ Unsigned 64 bit integer │
346       ├───────┼─────────────────────────┤
347       │       │                         │
348       │int64  │ Signed 64 bit integer   │
349       └───────┴─────────────────────────┘
350
351   Floating-point
352       Floating-point numbers are not supported by BPF and therefore not by
353       bpftrace.
354
355   Constants
356       Integer constants can be defined in the following formats:
357
358       •   decimal (base 10)
359
360       •   octal (base 8)
361
362       •   hexadecimal (base 16)
363
364       •   scientific (base 10)
365
366       Octal constants have to be prefixed with a 0, e.g. 0123. Hexadecimal
367       constants start with either 0x or 0X, e.g. 0x10. Scientific constants
368       are written in the <m>e<n> format which is a shorthand for m*10^n, e.g.
369       $i = 2e3;. Note that scientific literals are integer only due to the
370       lack of floating point support, so 1e-3 is not valid.
371
372       To improve the readability of big literals an underscore _ can be used
373       as a field separator, e.g. 1_000_123_000.
374
375       Integer suffixes as found in the C language are parsed by bpftrace to
376       ensure compatibility with C headers/definitions but they’re not used as
377       size specifiers. 123UL, 123U and 123LL all result in the same integer
378       type with a value of 123.
379
380       Character constants can be defined by enclosing the character in single
381       quotes, e.g. $c = 'c';.
382
383       String constants can be defined by enclosing the character string in
384       double quotes, e.g. $str = "Hello world";.
385
386       Characters and strings support the following escape sequences:
387
388       ┌─────┬──────────────────────┐
389       │     │                      │
390       │\n   │ Newline              │
391       ├─────┼──────────────────────┤
392       │     │                      │
393       │\t   │ Tab                  │
394       ├─────┼──────────────────────┤
395       │     │                      │
396       │\0nn │ Octal value nn       │
397       ├─────┼──────────────────────┤
398       │     │                      │
399       │\xnn │ Hexadecimal value nn │
400       └─────┴──────────────────────┘
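
       The constant formats described above can be tried out with a short
       program like the following sketch. The variable names are arbitrary
       and only serve the example.

```bpftrace
BEGIN
{
  $dec = 1_000_000;  // decimal, with _ as a field separator
  $oct = 0123;       // octal, equal to 83 in decimal
  $hex = 0x10;       // hexadecimal, equal to 16 in decimal
  $sci = 2e3;        // scientific, shorthand for 2*10^3 = 2000
  $str = "Hello world";

  // \t and \n are the tab and newline escape sequences.
  printf("%d\t%d\t%d\t%d\n", $dec, $oct, $hex, $sci);
  printf("%s\n", $str);
  exit();
}
```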
401
402   Type conversion
403       Integer and pointer types can be converted using explicit type
404       conversion with an expression like:
405
406           $y = (uint32) $z;
407           $py = (int16 *) $pz;
408
409       Integer casts to a higher rank are sign extended. Conversion to a lower
410       rank is done by zeroing leading bits.
411
412       It is also possible to cast between integers and integer arrays using
413       the same syntax:
414
415           $a = (uint8[8]) 12345;
416           $x = (uint64) $a;
417
418       Both the cast and the destination type must have the same size. When
419       casting to an array, it is possible to omit the size which will be
420       determined automatically from the size of the cast value.
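
       As a rough sketch of a narrowing conversion, the program below casts
       a wider value to uint8. The value 0x1234 is an arbitrary choice for
       this example.

```bpftrace
BEGIN
{
  $z = 0x1234;
  // Conversion to a lower rank drops the leading bits, so only the
  // low 8 bits (0x34) of $z are kept.
  $low = (uint8) $z;
  printf("0x%x\n", $low);
  exit();
}
```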
421
422   Operators and Expressions
423   Arithmetic Operators
424       The following operators are available for integer arithmetic:
425
426       ┌──┬────────────────────────┐
427       │  │                        │
428       │+ │ integer addition       │
429       ├──┼────────────────────────┤
430       │  │                        │
431       │- │ integer subtraction    │
432       ├──┼────────────────────────┤
433       │  │                        │
434       │* │ integer multiplication │
435       ├──┼────────────────────────┤
436       │  │                        │
437       │/ │ integer division       │
438       ├──┼────────────────────────┤
439       │  │                        │
440       │% │ integer modulo         │
441       └──┴────────────────────────┘
442
443   Logical Operators
444       ┌───┬─────────────┐
445       │   │             │
446       │&& │ Logical AND │
447       ├───┼─────────────┤
448       │   │             │
449       │|| │ Logical OR  │
450       ├───┼─────────────┤
451       │   │             │
452       │!  │ Logical NOT │
453       └───┴─────────────┘
454
455   Bitwise Operators
456       ┌───┬───────────────────────────┐
457       │   │                           │
458       │&  │ AND                       │
459       ├───┼───────────────────────────┤
460       │   │                           │
461       │|  │ OR                        │
462       ├───┼───────────────────────────┤
463       │   │                           │
464       │^  │ XOR                       │
465       ├───┼───────────────────────────┤
466       │   │                           │
467       │<< │ Left shift the left-hand  │
468       │   │ operand by the number of  │
469       │   │ bits specified by the     │
470       │   │ right-hand expression     │
471       │   │ value                     │
472       ├───┼───────────────────────────┤
473       │   │                           │
474       │>> │ Right shift the left-hand │
475       │   │ operand by the number of  │
476       │   │ bits specified by the     │
477       │   │ right-hand expression     │
478       │   │ value                     │
479       └───┴───────────────────────────┘
480
481   Relational Operators
482       The following relational operators are defined for integers and
483       pointers.
484
485       ┌───┬────────────────────────────┐
486       │   │                            │
487       │<  │ left-hand expression is    │
488       │   │ less than right-hand       │
489       ├───┼────────────────────────────┤
490       │   │                            │
491       │<= │ left-hand expression is    │
492       │   │ less than or equal to      │
493       │   │ right-hand                 │
494       ├───┼────────────────────────────┤
495       │   │                            │
496       │>  │ left-hand expression is    │
497       │   │ bigger than right-hand     │
498       ├───┼────────────────────────────┤
499       │   │                            │
500       │>= │ left-hand expression is    │
501       │   │ bigger than or equal to    │
502       │   │ right-hand                 │
503       ├───┼────────────────────────────┤
504       │   │                            │
505       │== │ left-hand expression equal │
506       │   │ to right-hand              │
507       ├───┼────────────────────────────┤
508       │   │                            │
509       │!= │ left-hand expression not   │
510       │   │ equal to right-hand        │
511       └───┴────────────────────────────┘
512
513       The following relational operators are available for comparing strings
514       and integer arrays.
515
516       ┌───┬────────────────────────────┐
517       │   │                            │
518       │== │ left-hand string equal to  │
519       │   │ right-hand                 │
520       ├───┼────────────────────────────┤
521       │   │                            │
522       │!= │ left-hand string not equal │
523       │   │ to right-hand              │
524       └───┴────────────────────────────┘
525
526   Assignment Operators
527       The following assignment operators can be used on both map and scratch
528       variables:
529
530       ┌────┬────────────────────────────┐
531       │    │                            │
532       │=   │ Assignment, assign the     │
533       │    │ right-hand expression to   │
534       │    │ the left-hand variable     │
535       ├────┼────────────────────────────┤
536       │    │                            │
537       │<<= │ Update the variable with   │
538       │    │ its value left shifted by  │
539       │    │ the number of bits         │
540       │    │ specified by the           │
541       │    │ right-hand expression      │
542       │    │ value                      │
543       ├────┼────────────────────────────┤
544       │    │                            │
545       │>>= │ Update the variable with   │
546       │    │ its value right shifted by │
547       │    │ the number of bits         │
548       │    │ specified by the           │
549       │    │ right-hand expression      │
550       │    │ value                      │
551       ├────┼────────────────────────────┤
552       │    │                            │
553       │+=  │ Increment the variable by  │
554       │    │ the right-hand expression  │
555       │    │ value                      │
556       ├────┼────────────────────────────┤
557       │    │                            │
558       │-=  │ Decrement the variable by  │
559       │    │ the right-hand expression  │
560       │    │ value                      │
561       ├────┼────────────────────────────┤
562       │    │                            │
563       │*=  │ Multiply the variable by   │
564       │    │ the right-hand expression  │
565       │    │ value                      │
566       ├────┼────────────────────────────┤
567       │    │                            │
568       │/=  │ Divide the variable by the │
569       │    │ right-hand expression      │
570       │    │ value                      │
571       ├────┼────────────────────────────┤
572       │    │                            │
573       │%=  │ Modulo the variable by the │
574       │    │ right-hand expression      │
575       │    │ value                      │
576       ├────┼────────────────────────────┤
577       │    │                            │
578       │&=  │ Bitwise AND the variable   │
579       │    │ by the right-hand          │
580       │    │ expression value           │
581       ├────┼────────────────────────────┤
582       │    │                            │
583       │|=  │ Bitwise OR the variable by │
584       │    │ the right-hand expression  │
585       │    │ value                      │
586       ├────┼────────────────────────────┤
587       │    │                            │
588       │^=  │ Bitwise XOR the variable   │
589       │    │ by the right-hand          │
590       │    │ expression value           │
591       └────┴────────────────────────────┘
592
593       All these operators are syntactic sugar for combining assignment with
594       the specified operator. @ -= 5 is equal to @ = @ - 5.
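
       A short sketch of compound assignment on a map variable follows. The
       map name @total is made up for this example; the comments spell out
       the equivalent long form of each statement.

```bpftrace
BEGIN
{
  @total = 10;
  @total += 5;   // same as @total = @total + 5, giving 15
  @total <<= 1;  // same as @total = @total << 1, giving 30
  print(@total);
  exit();
}
```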
595
596   Increment and Decrement Operators
597       The increment (++) and decrement (--) operators can be used on
598       integer and pointer variables to increment or decrement their value by
599       one. They can only be used on variables and can be applied either as
600       prefix or suffix. The difference is that the expression x++ returns
601       the original value of x, before it got incremented, while ++x returns
602       the value of x post increment. E.g.
603
604           $x = 10;
605           $y = $x--; // y = 10; x = 9
606           $a = 10;
607           $b = --$a; // a = 9; b = 9
608
609       Note that maps will be implicitly declared and initialized to 0 if not
610       already declared or defined. Scratch variables must be initialized
611       before using these operators.
612
613   Variables and Maps
614       bpftrace knows two types of variables, scratch and map.
615
616       'scratch' variables are kept on the BPF stack and only exist during
617       the execution of the action block and cannot be accessed outside of the
618       program. Scratch variable names always start with a $, e.g. $myvar.
619
620       'map' variables use BPF 'maps'. These exist for the lifetime of
621       bpftrace itself and can be accessed from all action blocks and
622       user-space. Map names always start with a @, e.g. @mymap.
623
624       All valid identifiers can be used as a name.
625
626       The data type of a variable is automatically determined during first
627       assignment and cannot be changed afterwards.
628
629   Associative Arrays
630       Associative arrays are a collection of elements indexed by a key,
631       similar to the hash tables found in languages like C++ (std::map) and
632       Python (dict). They’re a variant of 'map' variables.
633
634           @name[key] = expression
635           @name[key1,key2] = expression
636
637       Just like with any variable the type is determined on first use and
638       cannot be modified afterwards. This applies to both the key(s) and the
639       value type.
640
641       The following snippet creates a map with key signature [int64,
642       string[16]] and a value type of int64:
643
644           @[pid, comm]++
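
       As a sketch of how scratch variables and associative arrays work
       together, the program below counts openat(2) calls per process. The
       names $name and @opens are invented for this example.

```bpftrace
tracepoint:syscalls:sys_enter_openat
{
  // Scratch variable: only exists while this action block runs.
  $name = comm;

  // Map with two keys; the key and value types are fixed on first use.
  @opens[pid, $name]++;
}
```

       When bpftrace exits, the contents of @opens are printed, one line
       per key.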
645
646   Variable scoping
647   Pointers
648       Pointers in bpftrace are similar to those found in C.
649
650   Tuples
651       bpftrace has support for immutable N-tuples (n > 1). A tuple is a
652       sequence type (like an array) where, unlike an array, every element can
653       have a different type.
654
655       Tuples are a comma separated list of expressions, enclosed in
656       parentheses, e.g. (1,2). Individual fields can be accessed with the .
657       operator. Tuples are zero indexed like arrays are.
658
659           i:s:1 {
660             $a = (1,2);
661             $b = (3,4, $a);
662             print($a);
663             print($b);
664             print($b.0);
665           }
666
667       Prints:
668
669           (1, 2)
670           (3, 4, (1, 2))
671           3
672
673   Arrays
674       bpftrace supports accessing one-dimensional arrays like those found in
675       C.
676
677       Constructing arrays from scratch, like int a[] = {1,2,3} in C, is not
678       supported. They can only be read into a variable from a pointer.
679
680       The [] operator is used to access elements.
681
682           struct MyStruct {
683             int y[4];
684           }
685
686           kprobe:dummy {
687             $s = (struct MyStruct *) arg0;
688             print($s->y[0]);
689           }
690
691   Structs
692       C like structs are supported by bpftrace. Fields are accessed with the
693       . operator. Fields of a pointer to a struct can be accessed with the ->
694       operator.
695
696       Custom structs can be defined in the preamble.
697
698       Constructing structs from scratch, like struct X var = {.f1 = 1} in C,
699       is not supported. They can only be read into a variable from a pointer.
700
701           struct MyStruct {
702             int a;
703           }
704
705           kprobe:dummy {
706             $ptr = (struct MyStruct *) arg0;
707             $st = *$ptr;
708             print($st.a);
709             print($ptr->a);
710           }
711
712   Conditionals
713       Conditional expressions are supported in the form of if/else statements
714       and the ternary operator.
715
716       The ternary operator consists of three operands: a condition followed
717       by a ?, the expression to execute when the condition is true followed
718       by a : and the expression to execute if the condition is false.
719
720           condition ? ifTrue : ifFalse
721
722       Both the ifTrue and ifFalse expressions must be of the same type,
723       mixing types is not allowed.
724
725       The ternary operator can be used as part of an assignment.
726
727           $a == 1 ? print("true") : print("false");
728           $b = $a > 0 ? $a : -1;
729
730       If/else statements, like the one in C, are supported.
731
732           if (condition) {
733             ifblock
734           } else if (condition) {
735             if2block
736           } else {
737             elseblock
738           }
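
       A concrete if/else might look like the sketch below, which separates
       failed read(2) calls from successful ones. The map names @errors and
       @bytes are chosen for this example only.

```bpftrace
tracepoint:syscalls:sys_exit_read
{
  if (args.ret < 0) {
    // A negative return value means the read failed.
    @errors[comm]++;
  } else {
    // Otherwise the return value is the number of bytes read.
    @bytes[comm] += args.ret;
  }
}
```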
739
740   Loops
741       Since kernel 5.3 BPF supports loops as long as the verifier can prove
742       they’re bounded and fit within the instruction limit.
743
744       In bpftrace loops are available through the while statement.
745
746           while (condition) {
747             block;
748           }
749
750       Within a while-loop the following control flow statements can be used:
751
752       ┌─────────┬────────────────────────────┐
753       │         │                            │
754       │continue │ skip processing of the     │
755       │         │ rest of the block and jump │
756       │         │ back to the evaluation of  │
757       │         │ the conditional            │
758       ├─────────┼────────────────────────────┤
759       │         │                            │
760       │break    │ Terminate the loop         │
761       └─────────┴────────────────────────────┘
762
763           i:s:1 {
764             $i = 0;
765             while ($i <= 100) {
766               printf("%d ", $i);
767               if ($i > 5) {
768                 break;
769               }
770               $i++
771             }
772             printf("\n");
773           }
774
775       Loop unrolling is also supported with the unroll statement.
776
777           unroll(n) {
778             block;
779           }
780
781       The compiler will evaluate the block n times and generate the BPF code
782       for the block n times. As this happens at compile time n must be a
783       constant greater than 0 (n > 0).
784
785       The following two probes compile into the same code:
786
787           i:s:1 {
788             unroll(3) {
789               print("Unrolled")
790             }
791           }
792
793           i:s:1 {
794             print("Unrolled");
795             print("Unrolled");
796             print("Unrolled");
797           }
798

INVOCATION MODE

800       There are three invocation modes for bpftrace built-in functions.
801
802       ┌─────────────┬─────────────────────┬────────────────────┐
803       │             │                     │                    │
804       │Mode         │ Description         │ Example functions  │
805       ├─────────────┼─────────────────────┼────────────────────┤
806       │             │                     │                    │
807       │Synchronous  │ The value/effect of │ reg(), str(),      │
808       │             │ the built-in        │ ntop()             │
809       │             │ function is         │                    │
810       │             │ determined/handled  │                    │
811       │             │ right away by the   │                    │
812       │             │ bpf program in the  │                    │
813       │             │ kernel space.       │                    │
814       ├─────────────┼─────────────────────┼────────────────────┤
815       │             │                     │                    │
816       │Asynchronous │ The value/effect of │ printf(), clear(), │
817       │             │ the built-in        │ exit()             │
818       │             │ function is         │                    │
819       │             │ determined/handled  │                    │
820       │             │ later by the        │                    │
821       │             │ bpftrace process in │                    │
822       │             │ the user space.     │                    │
823       ├─────────────┼─────────────────────┼────────────────────┤
824       │             │                     │                    │
825       │Compile-time │ The value of the    │ kaddr(),           │
826       │             │ built-in function   │ cgroupid(),        │
827       │             │ is determined       │ offsetof()         │
828       │             │ before bpf programs │                    │
829       │             │ are running.        │                    │
830       └─────────────┴─────────────────────┴────────────────────┘
831
       While BPF in the kernel can do a lot, there are still things that can
       only be done from user space, such as outputting (printing) data.
       bpftrace handles this by sending events from the BPF program, which
       user space picks up some time in the future (usually within
       milliseconds). Operations that happen in the kernel are 'synchronous'
       ('sync') and those that are handled in user space are 'asynchronous'
       ('async').
839
       The asynchronous behavior can lead to unexpected results, because
       updates can happen before user space has had time to process the
       event. The following situations may occur:

       •   event loss: when using printf(), the amount of data printed may be
           less than the actual number of events generated by the kernel
           during the BPF program’s execution.

       •   delayed exit: when using exit() to terminate the program, bpftrace
           needs to handle the exit signal asynchronously, so the BPF program
           may continue to run for some additional time.
851
852       One example is updating a map value in a tight loop:
853
854           BEGIN {
855               @=0;
856               unroll(10) {
857                 print(@);
858                 @++;
859               }
860               exit()
861           }
862
       Maps are printed by reference, not by value. Because the value is
       updated right after the print, user space will likely only see the
       final value once it processes the event:
866
867           @: 10
868           @: 10
869           @: 10
870           @: 10
871           @: 10
872           @: 10
873           @: 10
874           @: 10
875           @: 10
876           @: 10
877
878       Therefore, when you need precise event statistics, it is recommended to
879       use synchronous functions (e.g. count() and hist()) to ensure more
880       reliable and accurate results.
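
       As a minimal sketch of that advice, the following program counts
       events per process in a map instead of calling printf() for each
       event. The tracepoint name is only an illustrative assumption; any
       probe could be used:

           tracepoint:syscalls:sys_enter_openat {
             // Counted in the kernel, so no per-event message has to
             // reach user space.
             @opens[comm] = count();
           }

           interval:s:5 {
             // print() is async, but it only runs once per interval.
             print(@opens);
           }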
881

ADDRESS-SPACES

       Kernel and user pointers live in different address spaces which,
       depending on the CPU architecture, might overlap. Trying to read a
       pointer that is in the wrong address space results in a runtime error.
       This error is hidden by default, but the -kk flag makes bpftrace
       report it:
887
888           stdin:1:9-12: WARNING: Failed to probe_read_user: Bad address (-14)
889           BEGIN { @=*uptr(kaddr("do_poweroff")) }
890                   ~~~
891
892       bpftrace tries to automatically set the correct address space for a
893       pointer based on the probe type, but might fail in cases where it is
894       unclear. The address space can be changed with the kptr() and uptr()
895       functions.
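
       A hedged sketch of overriding the address space: bpftrace would
       normally pick the address space from the probe type, so in a kprobe
       a pointer that actually refers to user memory can be marked with
       uptr(). This assumes that the kernel exposes do_sys_openat2 and that
       its second argument (arg1) is the user-space file name pointer:

           kprobe:do_sys_openat2 {
             // arg1 is assumed to point to a user-space string.
             printf("%s\n", str(uptr(arg1)));
           }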
896

BUILTINS

       Builtins are special variables built into the language. Unlike scratch
       and map variables, they don’t need a $ or @ prefix (except for the
       positional parameters).
901
902       ┌──────────────┬─────────────┬────────────┬───────────────────────┬───────────────────┐
903       │              │             │            │                       │                   │
904       │Variable      │ Type        │ Kernel     │ BPF Helper            │ Description       │
905       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
906       │              │             │            │                       │                   │
907       │$1, $2, ...$n │ int64       │ n/a        │ n/a                   │ The nth           │
908       │              │             │            │                       │ positional        │
909       │              │             │            │                       │ parameter         │
910       │              │             │            │                       │ passed to the     │
911       │              │             │            │                       │ bpftrace          │
912       │              │             │            │                       │ program. If       │
913       │              │             │            │                       │ less than n       │
914       │              │             │            │                       │ parameters        │
915       │              │             │            │                       │ are passed        │
916       │              │             │            │                       │ this              │
917       │              │             │            │                       │ evaluates to      │
918       │              │             │            │                       │ 0. For string     │
919       │              │             │            │                       │ arguments use     │
920       │              │             │            │                       │ the str()         │
921       │              │             │            │                       │ call to           │
922       │              │             │            │                       │ retrieve the      │
923       │              │             │            │                       │ value.            │
924       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
925       │              │             │            │                       │                   │
926       │$#            │ int64       │ n/a        │ n/a                   │ Total amount      │
927       │              │             │            │                       │ of positional     │
928       │              │             │            │                       │ parameters        │
929       │              │             │            │                       │ passed.           │
930       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
931       │              │             │            │                       │                   │
932       │arg0, arg1,   │ int64       │ n/a        │ n/a                   │ nth argument      │
933       │...argn       │             │            │                       │ passed to the     │
934       │              │             │            │                       │ function          │
935       │              │             │            │                       │ being traced.     │
936       │              │             │            │                       │ These are         │
937       │              │             │            │                       │ extracted         │
938       │              │             │            │                       │ from the CPU      │
939       │              │             │            │                       │ registers.        │
940       │              │             │            │                       │ The amount of     │
941       │              │             │            │                       │ args passed       │
942       │              │             │            │                       │ in registers      │
943       │              │             │            │                       │ depends on        │
944       │              │             │            │                       │ the CPU           │
945       │              │             │            │                       │ architecture.     │
946       │              │             │            │                       │ (kprobes,         │
947       │              │             │            │                       │ uprobes,          │
948       │              │             │            │                       │ usdt).            │
949       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
950       │              │             │            │                       │                   │
951       │args          │ struct args │ n/a        │ n/a                   │ The struct of     │
952       │              │             │            │                       │ all arguments     │
953       │              │             │            │                       │ of the traced     │
954       │              │             │            │                       │ function.         │
955       │              │             │            │                       │ Available in      │
956       │              │             │            │                       │ tracepoint,       │
957       │              │             │            │                       │ kfunc, and        │
958       │              │             │            │                       │ uprobe (with      │
959       │              │             │            │                       │ DWARF)            │
960       │              │             │            │                       │ probes. Use       │
961       │              │             │            │                       │ args.x to         │
962       │              │             │            │                       │ access            │
963       │              │             │            │                       │ argument x or     │
964       │              │             │            │                       │ args to get a     │
965       │              │             │            │                       │ record with       │
966       │              │             │            │                       │ all               │
967       │              │             │            │                       │ arguments.        │
968       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
969       │              │             │            │                       │                   │
970       │cgroup        │ uint64      │ 4.18       │ get_current_cgroup_id │ ID of the         │
971       │              │             │            │                       │ cgroup the        │
972       │              │             │            │                       │ current task      │
973       │              │             │            │                       │ is in. Only       │
974       │              │             │            │                       │ works with        │
975       │              │             │            │                       │ cgroupv2.         │
976       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
977       │              │             │            │                       │                   │
978       │comm          │ string[16]  │ 4.2        │ get_current_comm      │ comm of the       │
979       │              │             │            │                       │ current task.     │
980       │              │             │            │                       │ Equal to the      │
981       │              │             │            │                       │ value in          │
982       │              │             │            │                       │ /proc/<pid>/comm  │
983       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
984       │              │             │            │                       │                   │
985       │cpid          │ uint32      │ n/a        │ n/a                   │ PID of the child  │
986       │              │             │            │                       │ process           │
987       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
988       │              │             │            │                       │                   │
989       │numaid        │ uint32      │ 5.8        │ numa_node_id          │ ID of the NUMA    │
990       │              │             │            │                       │ node executing    │
991       │              │             │            │                       │ the BPF program   │
992       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
993       │              │             │            │                       │                   │
994       │cpu           │ uint32      │ 4.1        │ raw_smp_processor_id  │ ID of the         │
995       │              │             │            │                       │ processor         │
996       │              │             │            │                       │ executing the     │
997       │              │             │            │                       │ BPF program       │
998       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
999       │              │             │            │                       │                   │
1000       │curtask       │ uint64      │ 4.8        │ get_current_task      │ Pointer to        │
1001       │              │             │            │                       │ struct            │
1002       │              │             │            │                       │ task_struct of    │
1003       │              │             │            │                       │ the current task  │
1004       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1005       │              │             │            │                       │                   │
1006       │elapsed       │ uint64      │ (see nsecs)│ ktime_get_ns /        │ Nanoseconds       │
1007       │              │             │            │ ktime_get_boot_ns     │ elapsed since     │
1008       │              │             │            │                       │ bpftrace          │
1009       │              │             │            │                       │ initialization,   │
1010       │              │             │            │                       │ based on nsecs    │
1011       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1012       │              │             │            │                       │                   │
1013       │func          │ string      │ n/a        │ n/a                   │ Name of the       │
1014       │              │             │            │                       │ current function  │
1015       │              │             │            │                       │ being traced      │
1016       │              │             │            │                       │ (kprobes,uprobes) │
1017       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1018       │              │             │            │                       │                   │
1019       │gid           │ uint64      │ 4.2        │ get_current_uid_gid   │ GID of current    │
1020       │              │             │            │                       │ task              │
1021       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1022       │              │             │            │                       │                   │
1023       │kstack        │ kstack      │            │ get_stackid           │ Kernel stack      │
1024       │              │             │            │                       │ trace             │
1025       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1026       │              │             │            │                       │                   │
1027       │nsecs         │ uint64      │ 4.1 / 5.7  │ ktime_get_ns /        │ nanoseconds since │
1028       │              │             │            │ ktime_get_boot_ns     │ kernel boot. On   │
1029       │              │             │            │                       │ kernels that      │
1030       │              │             │            │                       │ support           │
1031       │              │             │            │                       │ ktime_get_boot_ns │
1032       │              │             │            │                       │ this includes the │
1033       │              │             │            │                       │ time spent        │
1034       │              │             │            │                       │ suspended, on     │
1035       │              │             │            │                       │ older kernels it  │
1036       │              │             │            │                       │ does not.         │
1037       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1038       │              │             │            │                       │                   │
1039       │pid           │ uint64      │ 4.2        │ get_current_pid_tgid  │ Process ID (or    │
1040       │              │             │            │                       │ thread group ID)  │
1041       │              │             │            │                       │ of the current    │
1042       │              │             │            │                       │ task.             │
1043       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1044       │              │             │            │                       │                   │
1045       │probe         │ string      │ n/a        │ n/a                   │ Name of the       │
1046       │              │             │            │                       │ current probe     │
1047       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1048       │              │             │            │                       │                   │
1049       │rand          │ uint32      │ 4.1        │ get_prandom_u32       │ Random number     │
1050       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1051       │              │             │            │                       │                   │
1052       │retval        │ int64       │ n/a        │ n/a                   │ Value returned by │
1053       │              │             │            │                       │ the function      │
1054       │              │             │            │                       │ being traced      │
1055       │              │             │            │                       │ (kretprobe,       │
1056       │              │             │            │                       │ uretprobe,        │
1057       │              │             │            │                       │ kretfunc)         │
1058       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1059       │              │             │            │                       │                   │
1060       │sarg0, sarg1, │ int64       │ n/a        │ n/a                   │ nth stack value   │
1061       │...sargn      │             │            │                       │ of the function   │
1062       │              │             │            │                       │ being traced.     │
1063       │              │             │            │                       │ (kprobes,         │
1064       │              │             │            │                       │ uprobes).         │
1065       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1066       │              │             │            │                       │                   │
1067       │tid           │ uint64      │ 4.2        │ get_current_pid_tgid  │ Thread ID of the  │
1068       │              │             │            │                       │ current task.     │
1069       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1070       │              │             │            │                       │                   │
1071       │uid           │ uint64      │ 4.2        │ get_current_uid_gid   │ UID of current    │
1072       │              │             │            │                       │ task              │
1073       ├──────────────┼─────────────┼────────────┼───────────────────────┼───────────────────┤
1074       │              │             │            │                       │                   │
1075       │ustack        │ ustack      │ 4.6        │ get_stackid           │ Userspace stack   │
1076       │              │             │            │                       │ trace             │
1077       └──────────────┴─────────────┴────────────┴───────────────────────┴───────────────────┘
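
       As an illustration, several of the builtins above can be combined
       in a single probe. The probe vfs_read is only an example target:

           kprobe:vfs_read {
             printf("%s pid=%d tid=%d cpu=%d func=%s\n",
                 comm, pid, tid, cpu, func);
           }

       Positional parameters are passed after the program on the command
       line. In this sketch $1 is read as an integer and $2 as a string:

           # bpftrace -e 'BEGIN { printf("%d %s\n", $1, str($2)); exit(); }' 42 hello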
1078

MAP FUNCTIONS

       Map functions are built-in functions whose return value can only be
       assigned to maps. The data types associated with these functions are
       for internal use only and are not compatible with the (integer)
       operators.

       Functions that are marked async are asynchronous, which can lead to
       unexpected behavior; see the INVOCATION MODE section for more
       information.
1088
1089   avg
1090       variants
1091
1092       •   avg(int64 n)
1093
1094       Calculate the running average of n between consecutive calls.
1095
1096           i:s:1 {
1097             @x++;
1098             @y = avg(@x);
1099             print(@x);
1100             print(@y);
1101           }
1102
1103       Internally this keeps two values in the map: value count and running
1104       total. The average is computed in user-space when printing by dividing
1105       the total by the count.
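
       For instance, in the following sketch (the probe and argument are
       illustrative assumptions), three calls by the same process that
       request 100, 200 and 300 bytes leave a count of 3 and a total of 600
       in the map, which is printed as an average of 200:

           kprobe:vfs_read {
             // arg2 is assumed to be the requested byte count.
             @avg_req[comm] = avg(arg2);
           }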
1106
1107   clear
1108       variants
1109
1110       •   clear(map m)
1111
1112       async
1113
1114       Clear all keys/values from map m.
1115
1116           i:ms:100 {
1117             @[rand % 10] = count();
1118           }
1119
1120           i:s:10 {
1121             print(@);
1122             clear(@);
1123           }
1124
1125   count
1126       variants
1127
1128       •   count()
1129
1130       Count how often this function is called.
1131
1132       Using @=count() is conceptually similar to @++. The difference is that
1133       the count() function uses a map type optimized for this (PER_CPU),
1134       increasing performance. Due to this the map cannot be accessed as a
1135       regular integer.
1136
1137           i:ms:100 {
1138             @ = count();
1139           }
1140
1141           i:s:10 {
1142             print(@);
1143             clear(@);
1144           }
1145
1146   delete
1147       variants
1148
1149       •   delete(mapkey k)
1150
1151       Delete a single key from a map. For a single value map this deletes the
1152       only element. For an associative-array the key to delete has to be
1153       specified.
1154
1155           k:dummy {
1156             @scalar = 1;
1157             @associative[1,2] = 1;
1158             delete(@scalar);
1159             delete(@associative[1,2]);
1160
1161             delete(@associative); // error
1162           }
1163
1164   hist
1165       variants
1166
1167       •   hist(int64 n)
1168
1169       Create a log2 histogram of n.
1170
1171           kretprobe:vfs_read {
1172             @bytes = hist(retval);
1173           }
1174
1175       Results in:
1176
           @bytes:
           [1M, 2M)               3 |                                                    |
           [2M, 4M)               2 |                                                    |
           [4M, 8M)               2 |                                                    |
           [8M, 16M)              6 |                                                    |
           [16M, 32M)            16 |                                                    |
           [32M, 64M)            27 |                                                    |
           [64M, 128M)           48 |@                                                   |
           [128M, 256M)          98 |@@@                                                 |
           [256M, 512M)         191 |@@@@@@                                              |
           [512M, 1G)           394 |@@@@@@@@@@@@@                                       |
           [1G, 2G)             820 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
1189
1190   lhist
1191       variants
1192
1193       •   lhist(int64 n, int64 min, int64 max, int64 step)
1194
       Create a linear histogram of n. lhist creates M ((max - min) / step)
       buckets in the range [min,max) where each bucket is step in size.
       Values in the ranges (-inf, min) and [max, inf) get their own bucket
       too, bringing the total number of buckets created to M+2.
1199
           i:ms:1 {
             @ = lhist(rand % 10, 0, 10, 1);
           }

           i:s:5 {
             exit();
           }
1207
1208       Prints:
1209
1210           @:
1211           [0, 1)               306 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         |
1212           [1, 2)               284 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            |
1213           [2, 3)               294 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          |
1214           [3, 4)               318 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
1215           [4, 5)               311 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        |
1216           [5, 6)               362 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
1217           [6, 7)               336 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |
1218           [7, 8)               326 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      |
1219           [8, 9)               328 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     |
1220           [9, 10)              318 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
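
       The bucket count can be worked out from the arguments. In the
       following sketch (the probe and argument are illustrative
       assumptions), M = (4096 - 0) / 512 = 8, so together with the two
       out-of-range buckets 10 buckets are created:

           kprobe:vfs_read {
             // Linear histogram of the requested size in 512 byte steps.
             @size = lhist(arg2, 0, 4096, 512);
           }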
1221
1222   max
1223       variants
1224
1225       •   max(int64 n)
1226
1227       Update the map with n if n is bigger than the current value held.
1228
1229   min
1230       variants
1231
1232       •   min(int64 n)
1233
1234       Update the map with n if n is smaller than the current value held.
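
       A short sketch that uses max and min together, assuming that arg2
       of vfs_read is the requested byte count:

           kprobe:vfs_read {
             @largest[comm] = max(arg2);
             @smallest[comm] = min(arg2);
           }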
1235
1236   stats
1237       variants
1238
1239       •   stats(int64 n)
1240
1241       stats combines the count, avg and sum calls into one.
1242
1243           kprobe:vfs_read {
1244             @bytes[comm] = stats(arg2);
1245           }

       Prints:

           @bytes[bash]: count 7, average 1, total 7
           @bytes[sleep]: count 5, average 832, total 4160
           @bytes[ls]: count 7, average 886, total 6208
1251
1252   sum
1253       variants
1254
1255       •   sum(int64 n)
1256
1257       Calculate the sum of all n passed.
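
       For example, to total the bytes requested per process, assuming
       that arg2 of vfs_read is the requested byte count:

           kprobe:vfs_read {
             @requested[comm] = sum(arg2);
           }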
1258
1259   zero
1260       variants
1261
1262       •   zero(map m)
1263
1264       async
1265
1266       Set all values for all keys to zero.
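
       Unlike clear, zero keeps the keys in the map. A minimal sketch that
       resets per-process counters after each report (the probe is an
       illustrative assumption):

           kprobe:vfs_read {
             @reads[comm] = count();
           }

           interval:s:5 {
             print(@reads);
             zero(@reads);
           }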
1267

FUNCTIONS

       Functions that are marked async are asynchronous, which can lead to
       unexpected behavior; see the INVOCATION MODE section for more
       information.

       compile time functions are evaluated at compile time; a static value
       is compiled into the program.

       unsafe functions can have dangerous side effects and should be used
       with care; the --unsafe flag is required to use them.
1278
1279   bswap
1280       variants
1281
1282       •   uint8 bswap(uint8 n)
1283
1284       •   uint16 bswap(uint16 n)
1285
1286       •   uint32 bswap(uint32 n)
1287
1288       •   uint64 bswap(uint64 n)
1289
1290       bswap reverses the order of the bytes in integer n. In case of 8 bit
1291       integers, n is returned without being modified. The return type is an
1292       unsigned integer of the same width as n.
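
       For example, swapping the two bytes of the 16 bit value 0x1234
       gives 0x3412. A minimal sketch, in which the cast is assumed to
       select the 16 bit variant:

           BEGIN {
             printf("%x\n", bswap((uint16)0x1234));
             exit();
           }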
1293
1294   buf
1295       variants
1296
1297       •   buf_t buf(void * data, [int64 length])
1298
       buf reads length bytes from address data. The maximum value of length
       is limited by the BPFTRACE_STRLEN variable. For arrays the length is
       optional; it is automatically inferred from the signature.
1302
1303       buf is address space aware and will call the correct helper based on
1304       the address space associated with data.
1305
1306       The buf_t object returned by buf can safely be printed as a hex encoded
1307       string with the %r format specifier.
1308
1309       Bytes with values >=32 and <=126 are printed using their ASCII
1310       character, other bytes are printed in hex form (e.g. \x00). The %rx
1311       format specifier can be used to print everything in hex form, including
1312       ASCII characters. The similar %rh format specifier prints everything in
1313       hex form without \x and with spaces between bytes (e.g. 0a fe).
1314
1315           i:s:1 {
1316             printf("%r\n", buf(kaddr("avenrun"), 8));
1317           }
1318
1319           \x00\x03\x00\x00\x00\x00\x00\x00
1320           \xc2\x02\x00\x00\x00\x00\x00\x00
1321
1322   cat
1323       variants
1324
1325       •   void cat(string namefmt, [...args])
1326
1327       async
1328
1329       Dump the contents of the named file to stdout. cat supports the same
1330       format string and arguments that printf does. If the file cannot be
1331       opened or read an error is printed to stderr.
1332
1333           t:syscalls:sys_enter_execve {
1334             cat("/proc/%d/maps", pid);
1335           }
1336
1337           55f683ebd000-55f683ec1000 r--p 00000000 08:01 1843399                    /usr/bin/ls
1338           55f683ec1000-55f683ed6000 r-xp 00004000 08:01 1843399                    /usr/bin/ls
1339           55f683ed6000-55f683edf000 r--p 00019000 08:01 1843399                    /usr/bin/ls
1340           55f683edf000-55f683ee2000 rw-p 00021000 08:01 1843399                    /usr/bin/ls
1341           55f683ee2000-55f683ee3000 rw-p 00000000 00:00 0
1342
1343   cgroup_path
1344       variants
1345
       •   cgroup_path cgroup_path(int cgroupid, [string filter])
1347
1348       Convert cgroup id to cgroup path. This is done asynchronously in
1349       userspace when the cgroup_path value is printed, therefore it can
1350       resolve to a different value if the cgroup id gets reassigned. This
1351       also means that the returned value can only be used for printing.
1352
1353       A string literal may be passed as an optional second argument to filter
1354       cgroup hierarchies in which the cgroup id is looked up by a wildcard
1355       expression (cgroup2 is always represented by "unified", regardless of
1356       where it is mounted).
1357
1358       The currently mounted hierarchy at /sys/fs/cgroup is used to do the
1359       lookup. If the cgroup with the given id isn’t present here (e.g. when
1360       running in a Docker container), the cgroup path won’t be found (unlike
1361       when looking up the cgroup path of a process via /proc/.../cgroup).
1362
1363           BEGIN {
1364             $cgroup_path = cgroup_path(3436);
1365             print($cgroup_path);
1366             print($cgroup_path); /* This may print a different path */
1367             printf("%s %s", $cgroup_path, $cgroup_path); /* This may print two different paths */
1368           }
1369
1370   cgroupid
1371       variants
1372
1373       •   uint64 cgroupid(const string path)
1374
1375       compile time
1376
       cgroupid retrieves the cgroupv2 ID of the cgroup available at path.
1378
1379           BEGIN {
1380             print(cgroupid("/sys/fs/cgroup/system.slice"));
1381           }
1382
1383   exit
1384       variants
1385
1386       •   void exit()
1387
1388       async
1389
1390       Terminate bpftrace, as if a SIGTERM was received. The END probe will
1391       still trigger (if specified) and maps will be printed.
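
       A minimal sketch that stops tracing after five seconds and relies
       on the END probe for a final message (the probe vfs_read is an
       illustrative assumption):

           kprobe:vfs_read {
             @reads = count();
           }

           interval:s:5 {
             exit();
           }

           END {
             printf("tracing finished\n");
           }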
1392
1393   join
1394       variants
1395
1396       •   void join(char *arr[], [char * sep = ' '])
1397
1398       async
1399
       join joins all members of the string array arr with sep as separator
       into one string. This string is printed to stdout directly; it cannot
       be used as a string value.
1403
1404       The concatenation of the array members is done in BPF and the printing
1405       happens in userspace.
1406
1407           tracepoint:syscalls:sys_enter_execve {
1408             join(args.argv);
1409           }
1410
1411   kaddr
1412       variants
1413
1414       •   uint64 kaddr(const string name)
1415
1416       compile time
1417
1418       Get the address of the kernel symbol name.
1419
1420       The following script:
1421
1422   kptr
1423       variants
1424
1425       •   T * kptr(T * ptr)
1426
1427       Marks ptr as a kernel address space pointer. See the address-spaces
1428       section for more information on address-spaces. The pointer type is
1429       left unchanged.
1430
1431   ksym
1432       variants
1433
1434       •   ksym_t ksym(uint64 addr)
1435
1436       async
1437
1438       Retrieve the name of the function that contains address addr. The
1439       address to name mapping happens in user-space.
1440
1441       The ksym_t type can be printed with the %s format specifier.
1442
1443           kprobe:do_nanosleep
1444           {
1445             printf("%s\n", ksym(reg("ip")));
1446           }
1447
1448       Prints:
1449
1450           do_nanosleep
1451
1452   macaddr
1453       variants
1454
1455       •   macaddr_t macaddr(char [6] mac)
1456
       Create a buffer that holds a MAC address as read from mac. This buffer
       can be printed in the canonical string format using the %s format
       specifier.
1460
1461           kprobe:arp_create {
1462             printf("SRC %s, DST %s\n", macaddr(sarg0), macaddr(sarg1));
1463           }
1464
1465       Prints:
1466
1467           SRC 18:C0:4D:08:2E:BB, DST 74:83:C2:7F:8C:FF
1468
1469   ntop
1470       variants
1471
1472       •   inet_t ntop([int64 af, ] int addr)
1473
1474       •   inet_t ntop([int64 af, ] char addr[4])
1475
1476       •   inet_t ntop([int64 af, ] char addr[16])
1477
       ntop returns the string representation of an IPv4 or IPv6 address. ntop
       infers the address type (IPv4 or IPv6) from the type and size of addr:
       an integer or char[4] is treated as IPv4, and a char[16] is treated as
       IPv6. The address type (e.g. AF_INET) can also be passed explicitly as
       the first parameter.
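
       A small sketch, assuming a little-endian machine such as x86_64, where
       the integer 0x0100007f holds the bytes 127, 0, 0, 1 in memory order. On
       such a machine the output should be 127.0.0.1:

           BEGIN {
             /* integer argument, so ntop assumes IPv4 */
             printf("%s\n", ntop(0x0100007f));
             exit();
           }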
1483
1484   pton
1485       variants
1486
1487       •   char addr[4] pton(const string *addr_v4)
1488
1489       •   char addr[16] pton(const string *addr_v6)
1490
1491       compile time
1492
       pton converts a text representation of an IPv4 or IPv6 address to a
       byte array. pton infers the address family from the presence of . or :
       in the given argument. pton is useful for selecting packets with
       specific IP addresses.
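
       For instance, the following sketch reports TCP retransmits to the IPv6
       loopback address. It assumes the running kernel provides the
       tcp:tcp_retransmit_skb tracepoint with a daddr_v6 field, which can be
       checked with bpftrace -lv 'tracepoint:tcp:tcp_retransmit_skb'.

           tracepoint:tcp:tcp_retransmit_skb {
             /* pton("::1") is converted to a 16-byte array at compile time */
             if (args.daddr_v6 == pton("::1")) {
               printf("retransmit to %s\n", ntop(args.daddr_v6));
             }
           }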
1497
1498   override
1499       variants
1500
1501       •   override(uint64 rc)
1502
1503       unsafe
1504
1505       Kernel 4.16
1506
       Helper bpf_override_return
1508
1509       Supported probes
1510
1511       •   kprobe
1512
1513       When using override the probed function will not be executed and
1514       instead rc will be returned.
1515
1516           k:__x64_sys_getuid
1517           /comm == "id"/ {
1518             override(2<<21);
1519           }
1520
1521           uid=4194304 gid=0(root) euid=0(root) groups=0(root)
1522
1523       This feature only works on kernels compiled with
1524       CONFIG_BPF_KPROBE_OVERRIDE and only works on functions tagged
1525       ALLOW_ERROR_INJECTION.
1526
       bpftrace does not test whether error injection is allowed for the
       probed function; instead, it will fail to load the program into the
       kernel:
1530
1531           ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument
1532           Error attaching probe: 'kprobe:vfs_read'
1533
1534   reg
1535       variants
1536
1537       •   reg(const string name)
1538
1539       Supported probes
1540
1541       •   kprobe
1542
1543       •   uprobe
1544
1545       Get the contents of the register identified by name. Valid names depend
1546       on the CPU architecture.
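
       As an illustration, assuming x86_64, where the instruction pointer
       register is named ip, the following sketch counts hits per resolved
       instruction pointer. The choice of tcp_sendmsg as the probed function
       is arbitrary.

           kprobe:tcp_sendmsg {
             @[ksym(reg("ip"))] = count();
           }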
1547
1548   signal
1549       variants
1550
1551       •   signal(const string sig)
1552
1553       •   signal(uint32 signum)
1554
1555       unsafe
1556
1557       Kernel 5.3
1558
1559       Helper bpf_send_signal
1560
1561       Probe types: k(ret)probe, u(ret)probe, USDT, profile
1562
       Send a signal to the process being traced. The signal can be identified
       either by name, e.g. SIGSTOP, or by ID, e.g. 19, as found in kill -l.
1565
1566           kprobe:__x64_sys_execve
1567           /comm == "bash"/ {
1568             signal(5);
1569           }
1570
1571           $ ls
1572           Trace/breakpoint trap (core dumped)
1573
1574   sizeof
1575       variants
1576
1577       •   sizeof(TYPE)
1578
1579       •   sizeof(EXPRESSION)
1580
1581       compile time
1582
       Returns the size of the argument in bytes. Similar to the C/C++ sizeof
       operator. Note that the expression is not evaluated.
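
       A brief sketch of both variants. struct Foo is a hypothetical type
       defined only for this example; on x86_64 its size is expected to be 8
       bytes because of padding after the char member.

           struct Foo { int x; char c; }

           BEGIN {
             /* TYPE variant */
             printf("%d\n", sizeof(struct Foo));
             /* EXPRESSION variant; the expression is not evaluated */
             printf("%d\n", sizeof(((struct Foo)0).c));
             exit();
           }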
1585
1586   offsetof
1587       variants
1588
1589       •   offsetof(STRUCT, FIELD)
1590
1591       •   offsetof(EXPRESSION, FIELD)
1592
1593       compile time
1594
       Returns the offset, in bytes, of the field within the struct. Similar
       to the kernel offsetof macro. Note that subfields are not yet
       supported.
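
       A minimal sketch using a hypothetical struct Foo defined only for
       illustration. On x86_64, l is expected to sit at offset 8 because of
       alignment padding after x.

           struct Foo { int x; long l; char c; }

           BEGIN {
             printf("x: %ld\n", offsetof(struct Foo, x));
             printf("l: %ld\n", offsetof(struct Foo, l));
             exit();
           }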
1597
1598   str
1599       variants
1600
       •   str(char * data [, uint32 length])
1602
1603       Helper probe_read_str, probe_read_{kernel,user}_str
1604
1605       str reads a NULL terminated (\0) string from data. The maximum string
1606       length is limited by the BPFTRACE_STR_LEN env variable, unless length
1607       is specified and shorter than the maximum. In case the string is longer
1608       than the specified length only length - 1 bytes are copied and a NULL
1609       byte is appended at the end.
1610
1611       When available (starting from kernel 5.5, see the --info flag) bpftrace
1612       will automatically use the kernel or user variant of
1613       probe_read_{kernel,user}_str based on the address space of data, see
1614       ADDRESS-SPACES for more information.
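
       A short sketch of the optional length argument. The limit of 16 is an
       arbitrary choice for illustration; file names longer than that are
       truncated as described above.

           tracepoint:syscalls:sys_enter_openat {
             /* default limit (BPFTRACE_STR_LEN) */
             printf("full:  %s\n", str(args.filename));
             /* explicit, shorter limit */
             printf("short: %s\n", str(args.filename, 16));
           }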
1615
1616   strerror
1617       variants
1618
1619       •   strerror strerror(int error)
1620
1621       Convert errno code to string. This is done asynchronously in userspace
1622       when the strerror value is printed, hence the returned value can only
1623       be used for printing.
1624
1625           #include <errno.h>
1626           BEGIN {
1627             print(strerror(EPERM));
1628           }
1629
1630   strftime
1631       variants
1632
1633       •   strtime_t strftime(const string fmt, int64 timestamp_ns)
1634
1635       async
1636
       Format the nanoseconds-since-boot timestamp timestamp_ns according to
       the format specified by fmt. The time conversion and formatting happen
       in user space; therefore, the strtime_t value returned can only be used
       for printing with the %s format specifier.
1641
1642       bpftrace uses the strftime(3) function for formatting time and supports
1643       the same format specifiers.
1644
1645           i:s:1 {
1646             printf("%s\n", strftime("%H:%M:%S", nsecs));
1647           }
1648
1649       bpftrace also supports the following format string extensions:
1650
1651       ┌──────────┬────────────────────────────┐
1652       │          │                            │
1653       │Specifier │ Description                │
1654       ├──────────┼────────────────────────────┤
1655       │          │                            │
1656       │%f        │ Microsecond as a decimal   │
1657       │          │ number, zero-padded on the │
1658       │          │ left                       │
1659       └──────────┴────────────────────────────┘
1660
1661   strncmp
1662       variants
1663
1664       •   int64 strncmp(char * s1, char * s2, int64 n)
1665
       strncmp compares up to n characters of string s1 and string s2. If
       they’re equal, 0 is returned; otherwise a non-zero value is returned.
1668
1669       bpftrace doesn’t read past the length of the shortest string.
1670
1671       The use of the == and != operators is recommended over calling strncmp
1672       directly.
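
       Since == compares whole strings, a prefix match is one case where
       strncmp remains convenient. A sketch of such a filter, with the process
       name prefix "ssh" and the map name @opens chosen arbitrarily:

           tracepoint:syscalls:sys_enter_openat
           /strncmp(comm, "ssh", 3) == 0/ {
             @opens[comm] = count();
           }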
1673
1674   strcontains
1675       variants
1676
1677       •   int64 strcontains(const char *haystack, const char *needle)
1678
       strcontains checks whether the string haystack contains the string
       needle. If needle is contained, 1 is returned; otherwise 0 is returned.
1681
1682       bpftrace doesn’t read past the length of the shortest string.
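
       A sketch of a substring filter. The needle "python" and the choice of
       tracepoint are arbitrary examples:

           tracepoint:syscalls:sys_enter_execve {
             if (strcontains(str(args.filename), "python")) {
               printf("%s executed %s\n", comm, str(args.filename));
             }
           }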
1683
1684   system
1685       variants
1686
1687       •   void system(string namefmt [, ...args])
1688
1689       unsafe async
1690
       system runs the specified command (fork and exec), waits until it
       completes, and prints its stdout. The command runs with the same
       privileges as bpftrace. It blocks the processing threads, which can
       lead to missed events and delayed processing of async events.
1696
1697           i:s:1 {
1698             time("%H:%M:%S: ");
1699             printf("%d\n", @++);
1700           }
1701           i:s:10 {
1702             system("/bin/sleep 10");
1703           }
1704           i:s:30 {
1705             exit();
1706           }
1707
1708       Note how the async time and printf first print every second until the
1709       i:s:10 probe hits, then they print every 10 seconds due to bpftrace
1710       blocking on sleep.
1711
1712           Attaching 3 probes...
1713           08:50:37: 0
1714           08:50:38: 1
1715           08:50:39: 2
1716           08:50:40: 3
1717           08:50:41: 4
1718           08:50:42: 5
1719           08:50:43: 6
1720           08:50:44: 7
1721           08:50:45: 8
1722           08:50:46: 9
1723           08:50:56: 10
1724           08:50:56: 11
1725           08:50:56: 12
1726           08:50:56: 13
1727           08:50:56: 14
1728           08:50:56: 15
1729           08:50:56: 16
1730           08:50:56: 17
1731           08:50:56: 18
1732           08:50:56: 19
1733
1734       system supports the same format string and arguments that printf does.
1735
1736           t:syscalls:sys_enter_execve {
1737             system("/bin/grep %s /proc/%d/status", "vmswap", pid);
1738           }
1739
1740   time
1741       variants
1742
1743       •   void time(const string fmt)
1744
1745       async
1746
       Format the current wall time according to the format specifier fmt and
       print it to stdout. Unlike strftime(), time() doesn’t send a timestamp
       from the probe; instead, it uses the time at which user space processes
       the event.
1751
1752       bpftrace uses the strftime(3) function for formatting time and supports
1753       the same format specifiers.
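
       For example, the following sketch prints the wall-clock time once per
       second:

           interval:s:1 {
             time("%H:%M:%S\n");
           }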
1754
1755   uaddr
1756       variants
1757
1758       •   T * uaddr(const string sym)
1759
1760       Supported probes
1761
1762       •   uprobes
1763
1764       •   uretprobes
1765
1766       •   USDT
1767
1768       Does not work with ASLR, see issue #75
1769       <https://github.com/iovisor/bpftrace/issues/75>
1770
1771       The uaddr function returns the address of the specified symbol. This
1772       lookup happens during program compilation and cannot be used
1773       dynamically.
1774
1775       The default return type is uint64*. If the ELF object size matches a
1776       known integer size (1, 2, 4 or 8 bytes) the return type is modified to
1777       match the width (uint8*, uint16*, uint32* or uint64* resp.). As ELF
1778       does not contain type info the type is always assumed to be unsigned.
1779
1780           uprobe:/bin/bash:readline {
1781             printf("PS1: %s\n", str(*uaddr("ps1_prompt")));
1782           }
1783
1784   uptr
1785       variants
1786
1787       •   T * uptr(T * ptr)
1788
1789       Marks ptr as a user address space pointer. See the address-spaces
1790       section for more information on address-spaces. The pointer type is
1791       left unchanged.
1792
1793   usym
1794       variants
1795
1796       •   usym_t usym(uint64 * addr)
1797
1798       async
1799
1800       Supported probes
1801
1802       •   uprobes
1803
1804       •   uretprobes
1805
1806       Equal to ksym but resolves user space symbols.
1807
1808       If ASLR is enabled, user space symbolication only works when the
1809       process is running at either the time of the symbol resolution or the
1810       time of the probe attachment. The latter requires
1811       BPFTRACE_CACHE_USER_SYMBOLS to be set to PER_PID, and might not work
1812       with older versions of BCC. A similar limitation also applies to
1813       dynamically loaded symbols.
1814
1815           uprobe:/bin/bash:readline
1816           {
1817             printf("%s\n", usym(reg("ip")));
1818           }
1819
1820       Prints:
1821
1822           readline
1823
1824   path
1825       variants
1826
1827       •   char * path(struct path * path)
1828
1829       Kernel 5.10
1830
1831       Helper bpf_d_path
1832
1833       Return full path referenced by struct path pointer in argument.
1834
       This function can only be used by functions that are allowed to use it;
       these functions are contained in the btf_allowlist_d_path set in the
       kernel.
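
       A sketch of typical usage, assuming a kernel that meets the
       requirements above and that filp_close is in the allowed set on the
       running kernel:

           kfunc:filp_close {
             printf("%s\n", path(args.filp->f_path));
           }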
1837
1838   unwatch
1839       variants
1840
1841       •   void unwatch(void * addr)
1842
1843       async
1844
       Removes a watchpoint.
1846
1847   skboutput
1848       variants
1849
1850       •   uint32 skboutput(const string path, struct sk_buff *skb, uint64
1851           length, const uint64 offset)
1852
1853       Kernel 5.5
1854
1855       Helper bpf_skb_output
1856
       Write the data section of the sk_buff skb to a PCAP file at path,
       starting from offset and ending at offset + length.

       The PCAP file is encapsulated in RAW IP, so no ethernet header is
       included. The data section in the struct skb may contain an ethernet
       header in some kernel contexts; in that case, set offset to 14 bytes to
       exclude the ethernet header.

       Each packet’s timestamp is determined by adding nsecs and boot time.
       The accuracy varies between kernels; see nsecs.
1867
1868       This function returns 0 on success, or a negative error in case of
1869       failure.
1870
1871       Environment variable BPFTRACE_PERF_RB_PAGES should be increased in
1872       order to capture large packets, or else these packets will be dropped.
1873
1874       Usage
1875
1876           # cat dump.bt
1877           kfunc:napi_gro_receive {
1878             $ret = skboutput("receive.pcap", args.skb, args.skb->len, 0);
1879           }
1880
1881           kfunc:dev_queue_xmit {
1882             // setting offset to 14, to exclude ethernet header
1883             $ret = skboutput("output.pcap", args.skb, args.skb->len, 14);
1884             printf("skboutput returns %d\n", $ret);
1885           }
1886
1887           # export BPFTRACE_PERF_RB_PAGES=1024
1888           # bpftrace dump.bt
1889           ...
1890
1891           # tcpdump -n -r ./receive.pcap  | head -3
1892           reading from file ./receive.pcap, link-type RAW (Raw IP)
1893           dropped privs to tcpdump
1894           10:23:44.674087 IP 22.128.74.231.63175 > 192.168.0.23.22: Flags [.], ack 3513221061, win 14009, options [nop,nop,TS val 721277750 ecr 3115333619], length 0
1895           10:23:45.823194 IP 100.101.2.146.53 > 192.168.0.23.46619: 17273 0/1/0 (130)
1896           10:23:45.823229 IP 100.101.2.146.53 > 192.168.0.23.46158: 45799 1/0/0 A 100.100.45.106 (60)
1897

OUTPUT FORMATTING

1899   print
       variants

       •   void print(T val)

       •   void print(@map)

       •   void print(@map, uint64 top)

       •   void print(@map, uint64 top, uint64 div)

       async
1915
       print prints the value, which can be a map or a scalar value, with the
       default formatting for the type.
1918
1919           i:ms:10 { @=hist(rand); }
1920           i:s:1 {
1921             print(@);
1922             print(123);
1923             print("abc");
1924             exit();
1925           }
1926
1927       Prints:
1928
1929           @:
1930           [16M, 32M)             3 |@@@                                                 |
1931           [32M, 64M)             2 |@@                                                  |
1932           [64M, 128M)            1 |@                                                   |
1933           [128M, 256M)           4 |@@@@                                                |
1934           [256M, 512M)           3 |@@@                                                 |
1935           [512M, 1G)            14 |@@@@@@@@@@@@@@                                      |
1936           [1G, 2G)              22 |@@@@@@@@@@@@@@@@@@@@@@                              |
1937           [2G, 4G)              51 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
1938
1939           123
1940           abc
1941
1942       Note that maps are printed by reference while scalar values are copied.
1943       This means that updating and printing maps in a fast loop will likely
1944       result in bogus map values as the map will be updated before userspace
1945       gets the time to dump and print it.
1946
       The printing of maps supports the optional top and div arguments. top
       limits the printing to the top N entries with the highest integer
       values:
1950
1951           BEGIN {
1952             $i = 11;
1953             while($i) {
1954               @[$i] = --$i;
1955             }
1956             print(@, 2);
1957             clear(@);
             exit();
1959           }
1960
1961           @[9]: 9
1962           @[10]: 10
1963
1964       The div argument scales the values prior to printing them. Scaling
1965       values before storing them can result in rounding errors. Consider the
1966       following program:
1967
1968           k:f {
1969             @[func] += arg0/10;
1970           }
1971
       With the following sequence of values for arg0: 134, 377, 111, 99, the
       total is 721, which becomes 72 when divided by 10. The program above,
       however, would print 70, because each value is rounded down
       individually (13 + 37 + 11 + 9) before being summed.

       Changing the print call in the earlier BEGIN example to print(@, 5, 2)
       prints the top 5 values, each divided by 2:
1978
1979           @[6]: 3
1980           @[7]: 3
1981           @[8]: 4
1982           @[9]: 4
1983           @[10]: 5
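
       Applied to the rounding example above, a sketch that stores unscaled
       values and divides only when printing could look as follows. Here f is
       the placeholder function name from that example, and the top value of 5
       is an arbitrary choice. With the sequence above, the stored total is
       721 and the printed value is 72.

           k:f {
             @[func] += arg0;      /* keep full precision while summing */
           }
           END {
             print(@, 5, 10);      /* top 5 entries, each divided by 10 */
             clear(@);             /* avoid printing the map again at exit */
           }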
1984
1985   printf
1986       variants
1987
1988       •   void printf(const string fmt, args...)
1989
1990       async
1991
       printf() formats and prints data. It behaves similarly to the printf()
       found in C and many other languages.

       The format string has to be a constant; it cannot be modified at
       runtime. The formatting of the string happens in user space. Values are
       copied and passed by value.
1998
1999       bpftrace supports all the typical format specifiers like %llx and %hhu.
2000       The non-standard ones can be found in the table below:
2001
2002       ┌──────────┬────────┬─────────────────────┐
2003       │          │        │                     │
2004       │Specifier │ Type   │ Description         │
2005       ├──────────┼────────┼─────────────────────┤
2006       │          │        │                     │
2007       │r         │ buffer │ Hex-formatted       │
2008       │          │        │ string to print     │
2009       │          │        │ arbitrary binary    │
2010       │          │        │ content returned by │
2011       │          │        │ the buf (buf)       │
2012       │          │        │ function.           │
2013       └──────────┴────────┴─────────────────────┘
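
       A sketch combining standard specifiers with %r. The column widths and
       the 16-byte limit passed to buf are arbitrary choices for illustration:

           tracepoint:syscalls:sys_enter_write {
             printf("%-16s %-6d %r\n", comm, pid, buf(args.buf, 16));
           }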
2014
2015       Supported escape sequences
2016
2017       Colors are supported too, using standard terminal escape sequences:
2018
           printf("\033[31mRed\t\033[33mYellow\033[0m\n")
2020

PROBES

2022       bpftrace supports various probe types which allow the user to attach
2023       BPF programs to different types of events. Each probe starts with a
2024       provider (e.g. kprobe) followed by a colon (:) separated list of
       options. The number of options and their meaning depend on the provider
       and are detailed below. The valid values for options can depend on the
       system or binary being traced, e.g. for uprobes it depends on the
       binary. Also see LISTING PROBES.
2029
2030       It is possible to associate multiple probes with a single action as
2031       long as the action is valid for all specified probes. Multiple probes
2032       can be specified as a comma (,) separated list:
2033
2034           kprobe:tcp_reset,kprobe:tcp_v4_rcv {
2035             printf("Entered: %s\n", probe);
2036           }
2037
2038       Wildcards are supported too:
2039
2040           kprobe:tcp_* {
2041             printf("Entered: %s\n", probe);
2042           }
2043
2044       Both can be combined:
2045
2046           kprobe:tcp_reset,kprobe:*socket* {
2047             printf("Entered: %s\n", probe);
2048           }
2049
2050       Most providers also support a short name which can be used instead of
2051       the full name, e.g. kprobe:f and k:f are identical.
2052
2053   BEGIN and END
2054       These are special built-in events provided by the bpftrace runtime.
2055       BEGIN is triggered before all other probes are attached. END is
2056       triggered after all other probes are detached.
2057
       Note that specifying an END probe doesn’t override the printing of
       'non-empty' maps at exit. To prevent this printing, all used maps need
       to be cleared, which can be done in the END probe:
2061
2062           END {
2063               clear(@map1);
2064               clear(@map2);
2065           }
2066
2067   hardware
2068       variants
2069
2070       •   hardware:event_name:
2071
2072       •   hardware:event_name:count
2073
       shortnames
2075
2076       •   h
2077
2078       The hardware probe attaches to pre-defined hardware events provided by
2079       the kernel.
2080
2081       They are implemented using performance monitoring counters (PMCs):
2082       hardware resources on the processor. There are about ten of these, and
2083       they are documented in the perf_event_open(2) man page. The event names
2084       are:
2085
2086       •   cpu-cycles or cycles
2087
2088       •   instructions
2089
2090       •   cache-references
2091
2092       •   cache-misses
2093
2094       •   branch-instructions or branches
2095
2096       •   branch-misses
2097
2098       •   bus-cycles
2099
2100       •   frontend-stalls
2101
2102       •   backend-stalls
2103
2104       •   ref-cycles
2105
2106       The count option specifies how many events must happen before the probe
2107       fires. If count is left unspecified a default value is used.
2108
2109           hardware:cache-misses:1e6 { @[pid] = count(); }
2110
2111   interval
2112       variants
2113
2114       •   interval:us:count
2115
2116       •   interval:ms:count
2117
2118       •   interval:s:count
2119
2120       •   interval:hz:rate
2121
2122       shortnames
2123
2124       •   i
2125
       The interval probe fires at a fixed interval as specified by its time
       spec. Interval probes fire on one CPU at a time, unlike profile probes.
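
       For example, a sketch that prints and resets a per-second syscall count
       (the map name @syscalls is an arbitrary choice):

           tracepoint:raw_syscalls:sys_enter {
             @syscalls = count();
           }
           interval:s:1 {
             print(@syscalls);
             clear(@syscalls);
           }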
2128
2129   iterator
2130       variants
2131
2132       •   iter:task
2133
2134       •   iter:task:pin
2135
2136       •   iter:task_file
2137
2138       •   iter:task_file:pin
2139
2140       •   iter:task_vma
2141
2142       •   iter:task_vma:pin
2143
2144       shortnames
2145
2146       •   it
2147
       These are eBPF iterator probes that allow iteration over kernel
       objects.

       An iterator probe can’t be mixed with any other probe, not even
       another iterator.

       Each iterator probe provides a set of fields that can be accessed
       through the ctx pointer. The set of available fields for an iterator
       can be displayed with the -lv options, as described below.
2157
2158       Examples:
2159
2160           # bpftrace -e 'iter:task { printf("%s:%d\n", ctx->task->comm, ctx->task->pid); }'
2161           Attaching 1 probe...
2162           systemd:1
2163           kthreadd:2
2164           rcu_gp:3
2165           rcu_par_gp:4
2166           kworker/0:0H:6
2167           mm_percpu_wq:8
2168           ...
2169
2170           # bpftrace -e 'iter:task_file { printf("%s:%d %d:%s\n", ctx->task->comm, ctx->task->pid, ctx->fd, path(ctx->file->f_path)); }'
2171           Attaching 1 probe...
2172           systemd:1 1:/dev/null
2173           systemd:1 2:/dev/null
2174           systemd:1 3:/dev/kmsg
2175           ...
2176           su:1622 1:/dev/pts/1
2177           su:1622 2:/dev/pts/1
2178           su:1622 3:/var/lib/sss/mc/passwd
2179           ...
2180           bpftrace:1892 1:pipe:[35124]
2181           bpftrace:1892 2:/dev/pts/1
2182           bpftrace:1892 3:anon_inode:bpf-map
2183           bpftrace:1892 4:anon_inode:bpf-map
2184           bpftrace:1892 5:anon_inode:bpf_link
2185           bpftrace:1892 6:anon_inode:bpf-prog
2186           bpftrace:1892 7:anon_inode:bpf_iter
2187
2188           # bpftrace -e 'iter:task_vma {printf("%s %d %lx-%lx\n", comm, pid, ctx->vma->vm_start, ctx->vma->vm_end);}'
2189           Attaching 1 probe...
2190           bpftrace 119480 55b92c380000-55b92c386000
2191           bpftrace 119480 55b92c386000-55b92c391000
2192           bpftrace 119480 55b92c391000-55b92c397000
2193           bpftrace 119480 55b92c398000-55b92c399000
2194           bpftrace 119480 55b92c399000-55b92c39a000
2195           bpftrace 119480 55b92cce3000-55b92d010000
2196           ...
2197           bpftrace 119480 7ffd55dde000-7ffd55de2000
2198           bpftrace 119480 7ffd55de2000-7ffd55de4000
2199
       An iterator can be pinned by specifying the optional ':pin' part of the
       probe, which defines the pin file. It can be specified as an absolute
       path or as a path relative to /sys/fs/bpf.

       Example with a relative pin file:
2205
2206           # bpftrace -e 'iter:task:list { printf("%s:%d\n", ctx->task->comm, ctx->task->pid); }'
2207           Program pinned to /sys/fs/bpf/list
2208
2209           # cat /sys/fs/bpf/list
2210           systemd:1
2211           kthreadd:2
2212           rcu_gp:3
2213           rcu_par_gp:4
2214           kworker/0:0H:6
2215           mm_percpu_wq:8
2216           rcu_tasks_kthre:9
2217           ...
2218
       Example with an absolute pin file:
2222
2223           # bpftrace -e '
2224           iter:task_file:/sys/fs/bpf/files {
2225             printf("%s:%d %s\n", ctx->task->comm, ctx->task->pid, path(ctx->file->f_path));
2226           }'
2227
2228           Program pinned to /sys/fs/bpf/files
2229
2230           # cat /sys/fs/bpf/files
2231           systemd:1 anon_inode:inotify
2232           systemd:1 anon_inode:[timerfd]
2233           ...
2234           systemd-journal:849 /dev/kmsg
2235           systemd-journal:849 anon_inode:[eventpoll]
2236           ...
2237           sssd:1146 /var/log/sssd/sssd.log
2238           sssd:1146 anon_inode:[eventpoll]
2239           ...
2240           NetworkManager:1155 anon_inode:[eventfd]
2241           NetworkManager:1155 /var/lib/sss/mc/passwd (deleted)
2242
2243   kfunc and kretfunc
2244       variants
2245
2246       •   kfunc[:mod]:fn
2247
2248       •   kretfunc[:mod]:fn
2249
2250       shortnames
2251
2252       •   f (kfunc)
2253
2254       •   fr (kretfunc)
2255
2256       requires (--info)
2257
2258       •   Kernel features:BTF
2259
2260       •   Probe types:kfunc
2261
       kfuncs attach to kernel functions, similar to kprobe and kretprobe.
       They make use of eBPF trampolines, which allow kernel code to call into
       BPF programs with near zero overhead.

       kfuncs make use of BTF type information to derive the type of function
2267       arguments at compile time. This removes the need for manual type
2268       casting and makes the code more resilient against small signature
2269       changes in the kernel. The function arguments are available in the args
2270       struct which can be inspected by doing verbose listing (see LISTING
2271       PROBES). These arguments are also available in the return probe
2272       (kretfunc).
2273
2274           # bpftrace -lv 'kfunc:tcp_reset'
2275           kfunc:tcp_reset
2276               struct sock * sk
2277               struct sk_buff * skb
2278
2279           kfunc:x86_pmu_stop {
2280             printf("pmu %s stop\n", str(args.event->pmu->name));
2281           }
2282
2283           kretfunc:fget {
2284             printf("fd %d name %s\n", args.fd, str(retval->f_path.dentry->d_name.name));
2285           }
2286
2287           fd 3 name ld.so.cache
2288           fd 3 name libselinux.so.1
2289           fd 3 name libselinux.so.1
2290           ...
2291
2292           kfunc:kvm:x86_emulate_insn { @ = count(); }
2293
2294           @ = 347603
2295
2296   kprobe and kretprobe
2297       variants
2298
2299       •   kprobe:fn
2300
2301       •   kprobe:fn+offset
2302
2303       •   kretprobe:fn
2304
2305       shortnames
2306
2307       •   k
2308
2309       •   kr
2310
       kprobes allow for dynamic instrumentation of kernel functions. Each
       time the specified kernel function is executed, the attached BPF
       programs are run.
2314
2315           kprobe:tcp_reset {
2316             @tcp_resets = count()
2317           }
2318
       Function arguments are available through the argX and sargX builtins,
       for register args and stack args respectively. Whether arguments are
       passed on the stack or in a register depends on the architecture and
       the number of arguments in use, e.g. on x86_64 the first 6
       non-floating-point arguments are passed in registers and all following
       arguments are passed on the stack. Note that floating point arguments
       are typically passed in special registers which don’t count as argX
       arguments, which can cause confusion. Consider a function with the
       following signature:
2327
2328           void func(int a, double d, int x)
2329
       Because d is a floating point argument, x is accessed through arg1
       where one might expect arg2.

       bpftrace does not detect the function signature, so it is not aware of
       the argument count or their types. It is up to the user to perform type
       conversion when needed, e.g.
2336
2337           kprobe:tcp_connect
2338           {
2339             $sk = ((struct sock *) arg0);
2340             ...
2341           }
2342
       kprobes are not limited to function entry; they can be attached to any
       instruction in a function by specifying an offset from the start of the
       function.

       kretprobes trigger on the return from a kernel function. Return probes
       do not have access to the function (input) arguments, only to the
       return value (through retval). A common pattern to work around this is
       to store the arguments in a map on function entry and retrieve them in
       the return probe:
2352
2353           kprobe:d_lookup
2354           {
2355                   $name = (struct qstr *)arg1;
2356                   @fname[tid] = $name->name;
2357           }
2358
2359           kretprobe:d_lookup
2360           /@fname[tid]/
2361           {
2362                   printf("%-8d %-6d %-16s M %s\n", elapsed / 1e6, pid, comm,
2363                       str(@fname[tid]));
2364           }
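
       The same pattern is often used to time a function. The sketch below
       (vfs_read is chosen only for illustration) stores a timestamp on
       entry, builds a latency histogram on return, and deletes the map
       entry afterwards so that the map does not keep growing:

           kprobe:vfs_read
           {
             // Remember when this thread entered the function.
             @start[tid] = nsecs;
           }

           kretprobe:vfs_read
           /@start[tid]/
           {
             // Record the elapsed nanoseconds, then drop the per-thread entry.
             @ns = hist(nsecs - @start[tid]);
             delete(@start[tid]);
           }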
2365
2366   profile
2367       variants
2368
2369       •   profile:us:count
2370
2371       •   profile:ms:count
2372
2373       •   profile:s:count
2374
2375       •   profile:hz:rate
2376
2377       shortnames
2378
2379       •   p
2380
2381       Profile probes fire on each CPU on the specified interval.
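
       For example, a sketch that samples kernel stacks on every CPU at 99 Hz
       (the rate is an arbitrary choice) and counts how often each stack is
       seen:

           profile:hz:99
           {
             @[kstack] = count();
           }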
2382
2383   software
2384       variants
2385
2386       •   software:event:
2387
2388       •   software:event:count
2389
2390       shortnames
2391
2392       •   s
2393
2394       The software probe attaches to pre-defined software events provided by
2395       the kernel. Event details can be found in the perf_event_open(2) man
2396       page.
2397
2398       The event names are:
2399
2400       •   cpu-clock or cpu
2401
2402       •   task-clock
2403
2404       •   page-faults or faults
2405
2406       •   context-switches or cs
2407
2408       •   cpu-migrations
2409
2410       •   minor-faults
2411
2412       •   major-faults
2413
2414       •   alignment-faults
2415
2416       •   emulation-faults
2417
2418       •   dummy
2419
2420       •   bpf-output
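
       As an illustration, the following sketch counts page faults by process
       name, sampling one in every 100 events (the count of 100 is an
       arbitrary choice):

           software:page-faults:100
           {
             @[comm] = count();
           }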
2421
2422   tracepoint
2423       variants
2424
2425       •   tracepoint:subsys:event
2426
2427       shortnames
2428
2429       •   t
2430
2431       Tracepoints are hooks into events in the kernel. Tracepoints are
2432       defined in the kernel source and compiled into the kernel binary, which
2433       makes them a form of static tracing. Unlike kprobes, new tracepoints
2434       cannot be added without modifying the kernel.
2435
2436       The advantage of tracepoints is that they generally provide a more
2437       stable interface than kprobes do because they do not depend on the
2438       existence of a kernel function.
2439
2440       Tracepoint arguments are available in the args struct which can be
2441       inspected with verbose listing, see the LISTING PROBES section for more
2442       details.
2443
2444           tracepoint:syscalls:sys_enter_openat {
2445             printf("%s %s\n", comm, str(args.filename));
2446           }
2447
2448           irqbalance /proc/interrupts
2449           irqbalance /proc/stat
2450           snmpd /proc/diskstats
2451           snmpd /proc/stat
2452           snmpd /proc/vmstat
2453           snmpd /proc/net/dev
2454           [...]
2455
2456       Additional information
2457
2458       https://www.kernel.org/doc/html/latest/trace/tracepoints.html
2459
2460   rawtracepoint
2461       variants
2462
2463       •   rawtracepoint:event
2464
2465       shortnames
2466
2467       •   rt
2468
2469       The hook point triggered by tracepoint and rawtracepoint is the same.
2470       tracepoint and rawtracepoint are nearly identical in terms of
2471       functionality. The only difference is in the program context.
2472       rawtracepoint offers raw arguments to the tracepoint while tracepoint
2473       applies further processing to the raw arguments. The additional
2474       processing is defined inside the kernel.
2475
2476       Tracepoint arguments are available via the argN builtins. The available
2477       arguments can be found in the kernel source tree under
2478       include/trace/events/. Each arg is a 64-bit integer.
2479
2480           rawtracepoint:block_rq_insert {
2481             printf("%llx %llx\n", arg0, arg1);
2482           }
2483
2484           ffff88810977d6f8 ffff8881097e8e80
2485           [...]
2486
2487   uprobe, uretprobe
2488       variants
2489
2490       •   uprobe:binary:func
2491
2492       •   uprobe:binary:func+offset
2493
2494       •   uprobe:binary:offset
2495
2496       •   uretprobe:binary:func
2497
2498       shortnames
2499
2500       •   u
2501
2502       •   ur
2503
2504       uprobes or user-space probes are the user-space equivalent of
2505       kprobes. The same limitations that apply to kprobe and kretprobe also
2506       apply to uprobes and uretprobes.
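
       For example, a return probe can read the value a user-space function
       returns through retval. This sketch assumes that /bin/bash exports a
       readline symbol, which depends on how bash was built:

           uretprobe:/bin/bash:readline
           {
             printf("read a line: %s\n", str(retval));
           }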
2507
2508       When tracing libraries, it is sufficient to specify the library name
2509       instead of a full path. The path is then resolved automatically
2510       using /etc/ld.so.cache:
2511
2512           # bpftrace -e 'uprobe:libc:malloc { printf("Allocated %d bytes\n", arg0); }'
2513           Allocated 4 bytes
2514           ...
2515
2516       If the traced binary has DWARF included, function arguments are
2517       available in the args struct which can be inspected with verbose
2518       listing, see the LISTING PROBES section for more details.
2519
2520       When tracing C++ programs, it is possible to turn on automatic symbol
2521       demangling by using the :cpp prefix:
2522
2523           # bpftrace -e 'u:src/bpftrace:cpp:"bpftrace::BPFtrace::add_probe" { print("adding probe\n"); }'
2524           Attaching 1 probe...
2525           adding probe
2526
2527       It is important to note that for uretprobes to work the kernel runs a
2528       special helper on user-space function entry which overrides the return
2529       address on the stack. This can cause issues with languages that have
2530       their own runtime like Golang:
2531
2532       example.go
2533
2534           func myprint(s string) {
2535             fmt.Printf("Input: %s\n", s)
2536           }
2537
2538           func main() {
2539             ss := []string{"a", "b", "c"}
2540             for _, s := range ss {
2541               go myprint(s)
2542             }
2543             time.Sleep(1*time.Second)
2544           }
2545
2546       bpftrace
2547
2548           # bpftrace -e 'uretprobe:./test:main.myprint { @=count(); }' -c ./test
2549           runtime: unexpected return pc for main.myprint called from 0x7fffffffe000
2550           stack: frame={sp:0xc00008cf60, fp:0xc00008cfd0} stack=[0xc00008c000,0xc00008d000)
2551           fatal error: unknown caller pc
2552
2553   usdt
2554       variants
2555
2556       •   usdt:binary_path:probe_name
2557
2558       •   usdt:binary_path:[probe_namespace]:probe_name
2559
2560       •   usdt:library_path:probe_name
2561
2562       •   usdt:library_path:[probe_namespace]:probe_name
2563
2564       shortnames
2565
2566       •   U
2567
2568       You can target the entire host (or an entire process’s address space by
2569       using the -p arg) by using a single wildcard in place of the
2570       binary_path/library_path, e.g. bpftrace -e 'usdt:*:loop {
2571       printf("hi\n"); }'. Please note that if you use wildcards for the
2572       probe_name or probe_namespace and end up targeting multiple USDTs for
2573       the same probe, you might get errors if you also utilize the USDT
2574       argument builtins (e.g. arg0) as they could be of different types.
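
       A sketch of the namespaced form, assuming a hypothetical binary ./tick
       that defines a USDT probe named loop in a namespace also called tick:

           usdt:./tick:tick:loop
           {
             printf("hi\n");
           }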
2575
2576   watchpoint and asyncwatchpoint
2577       variants
2578
2579       •   watchpoint:absolute_address:length:mode
2580
2581       •   watchpoint:function+argN:length:mode
2582
2583       shortnames
2584
2585       •   w
2586
2587       •   aw
2588
2589       These are memory watchpoints provided by the kernel. Whenever a memory
2590       address is written to (w), read from (r), or executed (x), the kernel
2591       can generate an event.
2592
2593       In the first form, an absolute address is monitored. If a pid (-p) or a
2594       command (-c) is provided, bpftrace takes the address as a userspace
2595       address and monitors the appropriate process. If not, bpftrace takes
2596       the address as a kernel space address.
2597
2598       In the second form, the address present in argN when function is
2599       entered is monitored. A pid or command must be provided for this form.
2600       If synchronous (watchpoint), a SIGSTOP is sent to the tracee upon
2601       function entry. The tracee will be SIGCONTed after the watchpoint is
2602       attached. This is to ensure events are not missed. If you want to avoid
2603       the SIGSTOP + SIGCONT use asyncwatchpoint.
2604
2605       Note that on most architectures you may not monitor for execution while
2606       monitoring read or write.
2607
2608       Examples
2609
2610       Print hit when a read from or write to 0x10000000 happens:
2611
2612           # bpftrace -e 'watchpoint:0x10000000:8:rw { printf("hit!\n"); exit(); }' -c ./testprogs/watchpoint
2613
2614       Print the call stack every time the jiffies variable is updated:
2615
2616           # bpftrace -e "watchpoint:0x$(awk '$3 == "jiffies" {print $1}' /proc/kallsyms):8:w {
2617             @[kstack] = count();
2618           }
2619
2620           i:s:1 { exit(); }"
2621           ......
2622           @[
2623               do_timer+12
2624               tick_do_update_jiffies64.part.22+89
2625               tick_sched_do_timer+103
2626               tick_sched_timer+39
2627               __hrtimer_run_queues+256
2628               hrtimer_interrupt+256
2629               smp_apic_timer_interrupt+106
2630               apic_timer_interrupt+15
2631               cpuidle_enter_state+188
2632               cpuidle_enter+41
2633               do_idle+536
2634               cpu_startup_entry+25
2635               start_secondary+355
2636               secondary_startup_64+164
2637           ]: 319
2638
2639       "hit" and exit when the memory pointed to by arg1 of increment is
2640       written to.
2641
2642           # cat wpfunc.c
2643           #include <stdio.h>
2644           #include <stdlib.h>
2645           #include <unistd.h>
2646
2647           __attribute__((noinline))
2648           void increment(__attribute__((unused)) int _, int *i)
2649           {
2650             (*i)++;
2651           }
2652
2653           int main()
2654           {
2655             int *i = malloc(sizeof(int));
2656             while (1)
2657             {
2658               increment(0, i);
2659               (*i)++;
2660               usleep(1000);
2661             }
2662           }
2663
2664           # bpftrace -e 'watchpoint:increment+arg1:4:w { printf("hit!\n"); exit() }' -c ./wpfunc
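
       The asynchronous variant takes the same arguments. This sketch reuses
       the wpfunc program from above and avoids the SIGSTOP and SIGCONT, with
       the trade-off that writes occurring before the watchpoint is attached
       may be missed:

           # bpftrace -e 'asyncwatchpoint:increment+arg1:4:w { printf("hit!\n"); exit(); }' -c ./wpfunc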
2665

LISTING PROBES

2667       Probe listing is the method to discover which probes are supported by
2668       the current system. Listing supports the same syntax as normal
2669       attachment does:
2670
2671           # bpftrace -l 'kprobe:*'
2672           # bpftrace -l 't:syscalls:*openat*'
2673           # bpftrace -l 'kprobe:tcp*,trace
2674           # bpftrace -l 'k:*socket*,tracepoint:syscalls:*tcp*'
2675
2676       The verbose flag (-v) can be specified to inspect arguments (args) for
2677       providers that support it:
2678
2679           # bpftrace -l 'fr:tcp_reset,t:syscalls:sys_enter_openat' -v
2680           kretfunc:tcp_reset
2681               struct sock * sk
2682               struct sk_buff * skb
2683           tracepoint:syscalls:sys_enter_openat
2684               int __syscall_nr
2685               int dfd
2686               const char * filename
2687               int flags
2688               umode_t mode
2689           # bpftrace -l 'uprobe:/bin/bash:rl_set_prompt' -v    # works only if /bin/bash has DWARF
2690           uprobe:/bin/bash:rl_set_prompt
2691               const char *prompt
2692
2693
2694
2695                                  2023-10-04                       BPFTRACE(8)