1BPFTRACE(8) BPFTRACE(8)
2
3
4
5 NAME
6 bpftrace - a high-level tracing language
7
8 SYNOPSIS
9 bpftrace [OPTIONS] FILENAME
10 bpftrace [OPTIONS] -e 'program code'
11
12 DESCRIPTION
13 bpftrace is a high-level tracing language and runtime for Linux based
14 on BPF. It supports static and dynamic tracing for both the kernel and
15 user-space.
16
17 When FILENAME is "-", read from stdin.
18
19 EXAMPLES
20 List all probes with "sleep" in their name
21
22 # bpftrace -l '*sleep*'
23
24 Trace processes calling sleep
25
26 # bpftrace -e 'kprobe:do_nanosleep { printf("%d sleeping\n", pid); }'
27
28 Trace processes calling sleep while spawning sleep 5 as a child process
29
30 # bpftrace -e 'kprobe:do_nanosleep { printf("%d sleeping\n", pid); }' -c 'sleep 5'
31
32 SUPPORTED ARCHITECTURES
33 x86_64, arm64 and s390x
34
35 OPTIONS
36 Output format
37 -B MODE, Set the buffer mode for stdout. Valid values are
38 none No buffering. Each I/O is written as soon as possible
39 line Data is written on the first newline or when the buffer is
40 full. This is the default mode.
41 full Data is written once the buffer is full.
42
43 -f FORMAT, Set the output format. Valid values are
44 json
45 text
46
47 -o FILENAME
48 Write bpftrace tracing output to FILENAME instead of stdout. This
49 doesn’t include child process (-c option) output. Errors are still
50 written to stderr.
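
As an illustration of combining these options, the following invocation
should write its output as JSON to a file rather than to stdout. The path
/tmp/trace.json is only an example name, not a bpftrace default:

    # bpftrace -f json -o /tmp/trace.json -e 'BEGIN { printf("started\n"); exit(); }'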
51
52 --no-warnings
53 Suppress all warning messages created by bpftrace.
54
55 Tracing
56 -e PROGRAM
57 Execute PROGRAM instead of reading the program from a file
58
59 -I DIR
60 Add the directory DIR to the search path for C headers. This option
61 can be used multiple times.
62
63 --include FILENAME
64 Add FILENAME as an include for the pre-processor. This is equal to
65 adding '#include FILENAME' to the start of the bpftrace program. This
66 option can be used multiple times.
67
68 -l [SEARCH]
69 List all probes that match the SEARCH pattern. If the pattern is
70 omitted all probes will be listed. This pattern supports wildcards
71 in the same way that probes do. E.g. '-l kprobe:*file*' to list all
72 'kprobes' with 'file' in the name. For more details see the Listing
73 Probes section.
74
75 --unsafe
76 Some calls, like 'system', are marked as unsafe as they can have
77 dangerous side effects ('system("rm -rf")') and are disabled by
78 default. This flag allows their use.
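
A minimal sketch of an unsafe call. The echoed text is arbitrary, and the
program is expected to be rejected when --unsafe is omitted:

    # bpftrace --unsafe -e 'BEGIN { system("echo hello from bpftrace"); exit(); }'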
79
80 -k
81 Errors from bpf-helpers(7) are silently ignored by default, which
82 can lead to strange results. This flag enables the detection of
83 errors (except for errors from 'probe_read_*'). When an error occurs,
84 bpftrace will log a warning containing the source location and the
85 error code:
86
87 stdin:48-57: WARNING: Failed to probe_read_user_str: Bad address (-14)
88 u:lib.so:"fn(char const*)" { printf("arg0:%s\n", str(arg0));}
89 ~~~~~~~~~
90
91 -kk
92 Same as '-k' but also includes the errors from 'probe_read_*'
93 helpers.
94
95 Process management
96 -p PID
97 Attach to the process with PID. If the process terminates, bpftrace
98 will also terminate. When using USDT probes they will be attached
99 to only this process.
100
101 -c COMMAND
102 Run COMMAND as a child process. When the child terminates, bpftrace
103 stops as well, as if 'exit()' had been called. If bpftrace
104 terminates before the child process does, the child process will be
105 terminated with a SIGTERM. If used, 'USDT' probes will only be
106 attached to the child process. To avoid a race condition when
107 using 'USDTs' the child is stopped after 'execve' using 'ptrace(2)'
108 and continued when all 'USDT' probes are attached.
109 The child PID is available to programs as the 'cpid' builtin.
110 The child process runs with the same privileges as bpftrace itself
111 (usually root).
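
For example, the 'cpid' builtin can be used in a predicate to restrict
output to the child itself. This is a sketch that assumes the kernel
exposes do_nanosleep as a kprobe target, as in the earlier sleep examples:

    # bpftrace -c 'sleep 1' -e 'kprobe:do_nanosleep /pid == cpid/ { printf("child %d sleeping\n", pid); }'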
112
113 --usdt-file-activation
114 Activate USDT semaphores based on file path.
115
116 Miscellaneous
117 --info
118 Print detailed information about features supported by the kernel
119 and the bpftrace build.
120
121 -h, --help
122 Print the help summary
123
124 -V, --version
125 Print bpftrace version information
126
127 -v
128 Verbose messages
129
130 -d
131 Debug mode
132
133 -dd
134 Verbose debug mode
135
136 ENVIRONMENT
137 Some behavior can only be controlled through environment variables.
138 This section lists all those variables.
139
140 BPFTRACE_STRLEN
141 Default: 64
142
143 Number of bytes allocated on the BPF stack for the string returned by
144 str().
145
146 Make this larger if you wish to read bigger strings with str().
147
148 Beware that the BPF stack is small (512 bytes).
149
150 Support for even larger strings is being discussed at
151 https://github.com/iovisor/bpftrace/issues/305.
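
The variable is set in the environment of the bpftrace process. A sketch,
assuming 128 bytes is enough for the paths of interest and that the
sys_enter_openat tracepoint is available:

    # BPFTRACE_STRLEN=128 bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", str(args->filename)); }'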
152
153 BPFTRACE_NO_CPP_DEMANGLE
154 Default: 0
155
156 C++ symbol demangling in user space stack traces is enabled by default.
157
158 This feature can be turned off by setting the value of this environment
159 variable to 1.
160
161 BPFTRACE_MAP_KEYS_MAX
162 Default: 4096
163
164 This is the maximum number of keys that can be stored in a map.
165 Increasing the value will consume more memory and increase startup
166 times. There are some cases where you will want to: for example,
167 sampling stack traces, recording timestamps for each page, etc.
168
169 BPFTRACE_MAX_PROBES
170 Default: 512
171
172 This is the maximum number of probes that bpftrace can attach to.
173 Increasing the value will consume more memory, increase startup times
174 and can incur high performance overhead or even freeze or crash the
175 system.
176
177 BPFTRACE_CACHE_USER_SYMBOLS
178 Default: 0 if ASLR is enabled on system and -c option is not given;
179 otherwise 1
180
181 By default, bpftrace caches the results of symbol resolution only
182 when ASLR (Address Space Layout Randomization) is disabled or the -c
183 option is given. This is because symbol addresses change with each
184 execution under ASLR. However, disabling caching may incur a performance
185 penalty. Set this environment variable to 1 to force bpftrace to cache.
186
187 BPFTRACE_VMLINUX
188 Default: None
189
190 This specifies the vmlinux path used for kernel symbol resolution when
191 attaching a kprobe to an offset. If this value is not given, bpftrace
192 searches for vmlinux in predefined locations. See
193 src/attached_probe.cpp:find_vmlinux() for details.
194
195 BPFTRACE_BTF
196 Default: None
197
198 The path to a BTF file. By default, bpftrace searches several locations
199 to find a BTF file. See src/btf.cpp for the details.
200
201 BPFTRACE_PERF_RB_PAGES
202 Default: 64
203
204 Number of pages to allocate per CPU for the perf ring buffer. The value
205 must be a power of 2.
206
207 If you’re getting a lot of dropped events bpftrace may not be
208 processing events in the ring buffer fast enough. It may be useful to
209 bump the value higher so more events can be queued up. The tradeoff is
210 that bpftrace will use more memory.
211
213 Overview
214 The bpftrace (bt) language is inspired by the D language used by dtrace
215 and uses the same program structure. Each script consists of a
216 preamble and one or more action blocks.
217
218 preamble
219
220 actionblock1
221 actionblock2
222
223 Preprocessor and type definitions take place in the preamble:
224
225 #include <linux/socket.h>
226 #define RED "\033[31m"
227
228 struct S {
229 int x;
230 }
231
232 Each action block consists of three parts:
233
234 probe[,probe]
235 /predicate/ {
236 action
237 }
238
239 Probes
240 A probe specifies the event and event type to attach to.
241
242 Predicate
243 The predicate is an optional condition that must be met for the action
244 to be executed.
245
246 Action
247 Actions are the programs that run when an event fires (and the
248 predicate is met). An action is a semicolon (;) separated list of
249 statements and is always enclosed in braces {}.
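
Putting the three parts together, a sketch of a complete action block
that only fires for processes named "bash" (the process name is just an
example) could look like this:

    tracepoint:syscalls:sys_enter_openat
    /comm == "bash"/
    {
        printf("%s opened %s\n", comm, str(args->filename));
    }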
250
251 A basic script that traces the open(2) and openat(2) system calls can
252 be written as follows:
253
254 BEGIN
255 {
256 printf("Tracing open syscalls... Hit Ctrl-C to end.\n");
257 }
258
259 tracepoint:syscalls:sys_enter_open,
260 tracepoint:syscalls:sys_enter_openat
261 {
262 printf("%-6d %-16s %s\n", pid, comm, str(args->filename));
263 }
264
265 This script has two action blocks and a total of 3 probes. The first
266 action block uses the special BEGIN probe, which fires once during
267 bpftrace startup. This probe is used to print a header, indicating that
268 the tracing has started.
269
270 The second action block uses two probes, one for open and one for
271 openat, and defines an action that prints the file being opened as
272 well as the pid and comm of the process that executes the syscall. See
273 the Probes section for details on the available probe types.
274
275 Identifiers
276 Identifiers must match the following regular expression:
277 [_a-zA-Z][_a-zA-Z0-9]*
278
279 Comments
280 Both single line and multi line comments are supported.
281
282 // A single line comment
283 i:s:1 { // can also be used to comment inline
284 /*
285 a multi line comment
286
287 */
288 print(/* inline comment block */ 1);
289 }
290
291 Data Types
292 The following fundamental integer types are provided by the language.
293
294 ┌───────┬─────────────────────────┐
295 │ │ │
296 │Type │ Description │
297 ├───────┼─────────────────────────┤
298 │ │ │
299 │uint8 │ Unsigned 8 bit integer │
300 ├───────┼─────────────────────────┤
301 │ │ │
302 │int8 │ Signed 8 bit integer │
303 ├───────┼─────────────────────────┤
304 │ │ │
305 │uint16 │ Unsigned 16 bit integer │
306 ├───────┼─────────────────────────┤
307 │ │ │
308 │int16 │ Signed 16 bit integer │
309 ├───────┼─────────────────────────┤
310 │ │ │
311 │uint32 │ Unsigned 32 bit integer │
312 ├───────┼─────────────────────────┤
313 │ │ │
314 │int32 │ Signed 32 bit integer │
315 ├───────┼─────────────────────────┤
316 │ │ │
317 │uint64 │ Unsigned 64 bit integer │
318 ├───────┼─────────────────────────┤
319 │ │ │
320 │int64 │ Signed 64 bit integer │
321 └───────┴─────────────────────────┘
322
323 Floating-point
324 Floating-point numbers are not supported by BPF and therefore not by
325 bpftrace.
326
327 Constants
328 Integer constants can be defined in the following formats:
329
330 • decimal (base 10)
331
332 • octal (base 8)
333
334 • hexadecimal (base 16)
335
336 • scientific (base 10)
337
338 Octal constants have to be prefixed with a 0, e.g. 0123. Hexadecimal
339 constants start with either 0x or 0X, e.g. 0x10. Scientific literals are
340 written in the <m>e<n> format, which is shorthand for m*10^n, e.g. $i = 2e3;.
341 Note that scientific literals are integer only due to the lack of
342 floating point support, so 1e-3 is not valid.
343
344 To improve the readability of big literals an underscore _ can be used
345 as field separator, e.g. 1_000_123_000.
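
The literal formats can be compared side by side. In this sketch the
variable names are arbitrary and the comments give the decimal value of
each literal:

    BEGIN {
        $dec = 1000;            // decimal
        $oct = 0123;            // octal, 83 in decimal
        $hex = 0x10;            // hexadecimal, 16 in decimal
        $sci = 2e3;             // scientific, 2 * 10^3 = 2000
        $big = 1_000_123_000;   // underscores only separate digit groups
        printf("%d %d %d %d %d\n", $dec, $oct, $hex, $sci, $big);
        exit();
    }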
346
347 Integer suffixes as found in the C language are parsed by bpftrace to
348 ensure compatibility with C headers/definitions but they’re not used as
349 size specifiers. 123UL, 123U and 123LL all result in the same integer
350 type with a value of 123.
351
352 Character constants can be defined by enclosing the character in single
353 quotes, e.g. $c = 'c';.
354
355 String constants can be defined by enclosing the character string in
356 double quotes, e.g. $str = "Hello world";.
357
358 Characters and strings support the following escape sequences:
359
360 ┌─────┬──────────────────────┐
361 │ │ │
362 │\n │ Newline │
363 ├─────┼──────────────────────┤
364 │ │ │
365 │\t │ Tab │
366 ├─────┼──────────────────────┤
367 │ │ │
368 │\0nn │ Octal value nn │
369 ├─────┼──────────────────────┤
370 │ │ │
371 │\xnn │ Hexadecimal value nn │
372 └─────┴──────────────────────┘
373
374 Type conversion
375 Integer and pointer types can be converted using explicit type
376 conversion with an expression like:
377
378 $y = (uint32) $z;
379 $py = (int16 *) $pz;
380
381 Integer casts to a higher rank are sign extended. Conversion to a lower
382 rank is done by zeroing leading bits.
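
As a sketch of a narrowing conversion (the value is arbitrary), casting
a 64 bit value to uint32 should keep only the low 32 bits:

    BEGIN {
        $z = 0x1234567890;
        $y = (uint32) $z;       // expected to hold 0x34567890
        printf("0x%x\n", $y);
        exit();
    }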
383
384 Operators and Expressions
385 Arithmetic Operators
386 The following operators are available for integer arithmetic:
387
388 ┌──┬────────────────────────┐
389 │ │ │
390 │+ │ integer addition │
391 ├──┼────────────────────────┤
392 │ │ │
393 │- │ integer subtraction │
394 ├──┼────────────────────────┤
395 │ │ │
396 │* │ integer multiplication │
397 ├──┼────────────────────────┤
398 │ │ │
399 │/ │ integer division │
400 ├──┼────────────────────────┤
401 │ │ │
402 │% │ integer modulo │
403 └──┴────────────────────────┘
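
Because only integers are supported, division discards the remainder.
A small sketch with arbitrary operands:

    BEGIN {
        $a = 7;
        $b = 2;
        printf("%d %d\n", $a / $b, $a % $b);    // quotient 3, remainder 1
        exit();
    }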
404
405 Logical Operators
406 ┌───┬─────────────┐
407 │ │ │
408 │&& │ Logical AND │
409 ├───┼─────────────┤
410 │ │ │
411 │|| │ Logical OR │
412 ├───┼─────────────┤
413 │ │ │
414 │! │ Logical NOT │
415 └───┴─────────────┘
416
417 Bitwise Operators
418 ┌───┬───────────────────────────┐
419 │ │ │
420 │& │ AND │
421 ├───┼───────────────────────────┤
422 │ │ │
423 │| │ OR │
424 ├───┼───────────────────────────┤
425 │ │ │
426 │^ │ XOR │
427 ├───┼───────────────────────────┤
428 │ │ │
429 │<< │ Left shift the left-hand │
430 │ │ operand by the number of │
431 │ │ bits specified by the │
432 │ │ right-hand expression │
433 │ │ value │
434 ├───┼───────────────────────────┤
435 │ │ │
436 │>> │ Right shift the left-hand │
437 │ │ operand by the number of │
438 │ │ bits specified by the │
439 │ │ right-hand expression │
440 │ │ value │
441 └───┴───────────────────────────┘
442
443 Relational Operators
444 The following relational operators are defined for integers and
445 pointers.
446
447 ┌───┬────────────────────────────┐
448 │ │ │
449 │< │ left-hand expression is │
450 │ │ less than right-hand │
451 ├───┼────────────────────────────┤
452 │ │ │
453 │<= │ left-hand expression is │
454 │ │ less than or equal to │
455 │ │ right-hand │
456 ├───┼────────────────────────────┤
457 │ │ │
458 │> │ left-hand expression is │
459 │ │ bigger than right-hand │
460 ├───┼────────────────────────────┤
461 │ │ │
462 │>= │ left-hand expression is │
463 │ │ bigger than or equal to │
464 │ │ right-hand │
465 ├───┼────────────────────────────┤
466 │ │ │
467 │== │ left-hand expression equal │
468 │ │ to right-hand │
469 ├───┼────────────────────────────┤
470 │ │ │
471 │!= │ left-hand expression not │
472 │ │ equal to right-hand │
473 └───┴────────────────────────────┘
474
475 The following relational operators are available for comparing strings.
476
477 ┌───┬────────────────────────────┐
478 │ │ │
479 │== │ left-hand string equal to │
480 │ │ right-hand │
481 ├───┼────────────────────────────┤
482 │ │ │
483 │!= │ left-hand string not equal │
484 │ │ to right-hand │
485 └───┴────────────────────────────┘
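
String comparison is most often used in predicates. A sketch that counts
openat calls per process name while skipping bpftrace itself (the map
name @opens is arbitrary):

    tracepoint:syscalls:sys_enter_openat
    /comm != "bpftrace"/
    {
        @opens[comm] = count();
    }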
486
487 Assignment Operators
488 The following assignment operators can be used on both map and scratch
489 variables:
490
491 ┌────┬────────────────────────────┐
492 │ │ │
493 │= │ Assignment, assign the │
494 │ │ right-hand expression to │
495 │ │ the left-hand variable │
496 ├────┼────────────────────────────┤
497 │ │ │
498 │<<= │ Update the variable with │
499 │ │ its value left shifted by │
500 │ │ the number of bits │
501 │ │ specified by the │
502 │ │ right-hand expression │
503 │ │ value │
504 ├────┼────────────────────────────┤
505 │ │ │
506 │>>= │ Update the variable with │
507 │ │ its value right shifted by │
508 │ │ the number of bits │
509 │ │ specified by the │
510 │ │ right-hand expression │
511 │ │ value │
512 ├────┼────────────────────────────┤
513 │ │ │
514 │+= │ Increment the variable by │
515 │ │ the right-hand expression │
516 │ │ value │
517 ├────┼────────────────────────────┤
518 │ │ │
519 │-= │ Decrement the variable by │
520 │ │ the right-hand expression │
521 │ │ value │
522 ├────┼────────────────────────────┤
523 │ │ │
524 │*= │ Multiply the variable by │
525 │ │ the right-hand expression │
526 │ │ value │
527 ├────┼────────────────────────────┤
528 │ │ │
529 │/= │ Divide the variable by the │
530 │ │ right-hand expression │
531 │ │ value │
532 ├────┼────────────────────────────┤
533 │ │ │
534 │%= │ Modulo the variable by the │
535 │ │ right-hand expression │
536 │ │ value │
537 ├────┼────────────────────────────┤
538 │ │ │
539 │&= │ Bitwise AND the variable │
540 │ │ by the right-hand │
541 │ │ expression value │
542 ├────┼────────────────────────────┤
543 │ │ │
544 │|= │ Bitwise OR the variable by │
545 │ │ the right-hand expression │
546 │ │ value │
547 ├────┼────────────────────────────┤
548 │ │ │
549 │^= │ Bitwise XOR the variable │
550 │ │ by the right-hand │
551 │ │ expression value │
552 └────┴────────────────────────────┘
553
554 All these operators are syntactic sugar for combining assignment with
555 the specified operator. @ -= 5 is equal to @ = @ - 5.
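
A short sketch of compound assignment on a map and on a scratch
variable. The values are arbitrary:

    BEGIN {
        @ = 10;
        @ -= 5;         // same as @ = @ - 5, so @ is now 5
        $x = 1;
        $x <<= 4;       // same as $x = $x << 4, so $x is now 16
        exit();
    }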
556
557 Increment and Decrement Operators
558 The increment (++) and decrement (--) operators can be used on integer
559 and pointer variables to increment or decrement their value by one. They
560 can only be used on variables and can be applied as either prefix or
561 suffix. The difference is that the expression x++ returns the original
562 value of x, before it got incremented, while ++x returns the value of x
563 after the increment. E.g.
564
565 $x = 10;
566 $y = $x--; // y = 10; x = 9
567 $a = 10;
568 $b = --$a; // a = 9; b = 9
569
570 Note that maps will be implicitly declared and initialized to 0 if not
571 already declared or defined. Scratch variables must be initialized
572 before using these operators.
573
574 Variables and Maps
575 bpftrace knows two types of variables, scratch and map.
576
577 'scratch' variables are kept on the BPF stack and only exist during
578 the execution of the action block and cannot be accessed outside of the
579 program. Scratch variable names always start with a $, e.g. $myvar.
580
581 'map' variables use BPF 'maps'. These exist for the lifetime of
582 bpftrace itself and can be accessed from all action blocks and
583 user-space. Map names always start with a @, e.g. @mymap.
584
585 All valid identifiers can be used as a name.
586
587 The data type of a variable is automatically determined during first
588 assignment and cannot be changed afterwards.
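
The two kinds of variables are often combined: a map carries state from
one probe to another, while a scratch variable holds an intermediate
result inside a single action block. A sketch that measures vfs_read
latency, assuming vfs_read can be probed on the running kernel (the
names @start, @ns and $delta are arbitrary):

    kprobe:vfs_read {
        @start[tid] = nsecs;            // visible to other action blocks
    }

    kretprobe:vfs_read /@start[tid]/ {
        $delta = nsecs - @start[tid];   // only exists in this block
        @ns = hist($delta);
        delete(@start[tid]);
    }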
589
590 Associative Arrays
591 Associative arrays are a collection of elements indexed by a key,
592 similar to the hash tables found in languages like C++ (std::map) and
593 Python (dict). They’re a variant of 'map' variables.
594
595 @name[key] = expression
596 @name[key1,key2] = expression
597
598 Just like with any variable the type is determined on first use and
599 cannot be modified afterwards. This applies to both the key(s) and the
600 value type.
601
602 The following snippet creates a map with key signature [int64,
603 string[16]] and a value type of int64:
604
605 @[pid, comm]++
606
607 Variable scoping
608 Pointers
609 Pointers in bpftrace are similar to those found in C.
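
As in C, a pointer can be cast to a struct pointer and dereferenced with
->. The following sketch assumes the struct definitions are available
from kernel headers (or from BTF) and that vfs_open takes a struct path
pointer as its first argument on the running kernel:

    #include <linux/path.h>
    #include <linux/dcache.h>

    kprobe:vfs_open {
        $path = (struct path *) arg0;
        printf("open path: %s\n", str($path->dentry->d_name.name));
    }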
610
611 Tuples
612 bpftrace has support for immutable N-tuples (n > 1). A tuple is a
613 sequence type (like an array) where, unlike an array, every element can
614 have a different type.
615
616 Tuples are a comma separated list of expressions, enclosed in
617 parentheses, e.g. (1,2). Individual fields can be accessed with the .
618 operator. Tuples are zero indexed like arrays are.
619
620 i:s:1 {
621 $a = (1,2);
622 $b = (3,4, $a);
623 print($a);
624 print($b);
625 print($b.0);
626 }
627
628 Prints:
629
630 (1, 2)
631 (3, 4, (1, 2))
632 3
633
634 Arrays
635 bpftrace supports accessing one-dimensional arrays like those found in
636 C.
637
638 Constructing arrays from scratch, like int a[] = {1,2,3} in C, is not
639 supported. They can only be read into a variable from a pointer.
640
641 The [] operator is used to access elements.
642
643 struct MyStruct {
644 int y[4];
645 }
646
647 kprobe:dummy {
648 $s = (struct MyStruct *) arg0;
649 print($s->y[0]);
650 }
651
652 Structs
653 C-like structs are supported by bpftrace. Fields are accessed with the
654 . operator. Fields of a pointer to a struct can be accessed with the ->
655 operator.
656
657 Custom structs can be defined in the preamble.
658
659 Constructing structs from scratch, like struct X var = {.f1 = 1} in C,
660 is not supported. They can only be read into a variable from a pointer.
661
662 struct MyStruct {
663 int a;
664 }
665
666 kprobe:dummy {
667 $ptr = (struct MyStruct *) arg0;
668 $st = *$ptr;
669 print($st.a);
670 print($ptr->a);
671 }
672
673 Conditionals
674 Conditional expressions are supported in the form of if/else statements
675 and the ternary operator.
676
677 The ternary operator consists of three operands: a condition followed
678 by a ?, the expression to execute when the condition is true followed
679 by a : and the expression to execute if the condition is false.
680
681 condition ? ifTrue : ifFalse
682
683 Both the ifTrue and ifFalse expressions must be of the same type,
684 mixing types is not allowed.
685
686 The ternary operator can be used as part of an assignment.
687
688 $a == 1 ? print("true") : print("false");
689 $b = $a > 0 ? $a : -1;
690
691 If/else statements, like those in C, are supported.
692
693 if (condition) {
694 ifblock
695 } else if (condition) {
696 if2block
697 } else {
698 elseblock
699 }
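
A concrete sketch, classifying the return value of read(2). The map
names are arbitrary, and sum() is a map function that adds up its
argument:

    tracepoint:syscalls:sys_exit_read {
        if (args->ret < 0) {
            @errors = count();
        } else if (args->ret == 0) {
            @eof = count();
        } else {
            @bytes = sum(args->ret);
        }
    }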
700
701 Loops
702 Since kernel 5.3 BPF supports loops as long as the verifier can prove
703 they’re bounded and fit within the instruction limit.
704
705 In bpftrace loops are available through the while statement.
706
707 while (condition) {
708 block;
709 }
710
711 Within a while-loop the following control flow statements can be used:
712
713 ┌─────────┬────────────────────────────┐
714 │ │ │
715 │continue │ skip processing of the │
716 │ │ rest of the block and jump │
717 │ │ back to the evaluation of │
718 │ │ the conditional │
719 ├─────────┼────────────────────────────┤
720 │ │ │
721 │break │ Terminate the loop │
722 └─────────┴────────────────────────────┘
723
724 i:s:1 {
725 $i = 0;
726 while ($i <= 100) {
727 printf("%d ", $i);
728 if ($i > 5) {
729 break;
730 }
731 $i++
732 }
733 printf("\n");
734 }
735
736 Loop unrolling is also supported with the unroll statement.
737
738 unroll(n) {
739 block;
740 }
741
742 The compiler will evaluate the block n times and generate the BPF code
743 for the block n times. As this happens at compile time n must be a
744 constant greater than 0 (n > 0).
745
746 The following two probes compile into the same code:
747
748 i:s:1 {
749 unroll(3) {
750 print("Unrolled")
751 }
752 }
753
754 i:s:1 {
755 print("Unrolled");
756 print("Unrolled");
757 print("Unrolled");
758 }
759
760 SYNC AND ASYNC
761 While BPF in the kernel can do a lot, there are still things that can
762 only be done from user space, like the outputting (printing) of data.
763 The way bpftrace handles this is by sending events from the BPF program
764 which user-space will pick up some time in the future (usually within
765 milliseconds). Operations that happen in the kernel are 'synchronous'
766 ('sync') and those that are handled in user space are 'asynchronous'
767 ('async').
768
769 The asynchronous behavior can lead to unexpected results, as updates can
770 happen before user space has had time to process the event. One example is
771 updating a map value in a tight loop:
772
773 BEGIN {
774 @=0;
775 unroll(10) {
776 print(@);
777 @++;
778 }
779 exit()
780 }
781
782 Maps are printed by reference, not by value, and as the value gets
783 updated right after the print, user-space will likely only see the final
784 value once it processes the event:
785
786 @: 10
787 @: 10
788 @: 10
789 @: 10
790 @: 10
791 @: 10
792 @: 10
793 @: 10
794 @: 10
795 @: 10
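
One way to avoid this, sketched here, is to pass the value as an
argument to printf(). Arguments are evaluated in the kernel at the time
of the call, so each event should carry the value the map had at that
moment:

    BEGIN {
        @ = 0;
        unroll(10) {
            printf("%d\n", @);
            @++;
        }
        exit();
    }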
796
798 Kernel and user pointers live in different address spaces which,
799 depending on the CPU architecture, might overlap. Trying to read a
800 pointer that is in the wrong address space results in a runtime error.
801 This error is hidden by default but can be shown with the -kk flag:
802
803 stdin:1:9-12: WARNING: Failed to probe_read_user: Bad address (-14)
804 BEGIN { @=*uptr(kaddr("do_poweroff")) }
805 ~~~
806
807 bpftrace tries to automatically set the correct address space for a
808 pointer based on the probe type, but might fail in cases where it is
809 unclear. The address space can be changed with the kptr() and uptr()
810 functions.
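
A sketch of overriding the address space. It assumes a kernel where
do_sys_openat2 can be probed and where its second argument is the
user-space pointer to the file name:

    kprobe:do_sys_openat2 {
        printf("%s\n", str(uptr(arg1)));
    }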
811
812 BUILTINS
813 Builtins are special variables built into the language. Unlike
814 scratch and map variables, they don’t need a $ or @ as prefix (except for
815 the positional parameters).
816
817 ┌──────────────┬────────────┬────────────┬───────────────────────┬───────────────────┐
818 │ │ │ │ │ │
819 │Variable │ Type │ Kernel │ BPF Helper │ Description │
820 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
821 │ │ │ │ │ │
822 │$1, $2, ...$n │ int64 │ n/a │ n/a │ The nth │
823 │ │ │ │ │ positional │
824 │ │ │ │ │ parameter │
825 │ │ │ │ │ passed to the │
826 │ │ │ │ │ bpftrace │
827 │ │ │ │ │ program. If │
828 │ │ │ │ │ less than n │
829 │ │ │ │ │ parameters │
830 │ │ │ │ │ are passed │
831 │ │ │ │ │ this │
832 │ │ │ │ │ evaluates to │
833 │ │ │ │ │ 0. For string │
834 │ │ │ │ │ arguments use │
835 │ │ │ │ │ the str() │
836 │ │ │ │ │ call to │
837 │ │ │ │ │ retrieve the │
838 │ │ │ │ │ value. │
839 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
840 │ │ │ │ │ │
841 │$# │ int64 │ n/a │ n/a │ Total amount │
842 │ │ │ │ │ of positional │
843 │ │ │ │ │ parameters │
844 │ │ │ │ │ passed. │
845 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
846 │ │ │ │ │ │
847 │arg0, arg1, │ int64 │ n/a │ n/a │ nth argument │
848 │...argn │ │ │ │ passed to the │
849 │ │ │ │ │ function │
850 │ │ │ │ │ being traced. │
851 │ │ │ │ │ These are │
852 │ │ │ │ │ extracted │
853 │ │ │ │ │ from the CPU │
854 │ │ │ │ │ registers. │
855 │ │ │ │ │ The amount of │
856 │ │ │ │ │ args passed │
857 │ │ │ │ │ in registers │
858 │ │ │ │ │ depends on │
859 │ │ │ │ │ the CPU │
860 │ │ │ │ │ architecture. │
861 │ │ │ │ │ (kprobes, │
862 │ │ │ │ │ uprobes, │
863 │ │ │ │ │ usdt). │
864 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
865 │ │ │ │ │ │
866 │cgroup │ uint64 │ 4.18 │ get_current_cgroup_id │ ID of the │
867 │ │ │ │ │ cgroup the │
868 │ │ │ │ │ current task │
869 │ │ │ │ │ is in. Only │
870 │ │ │ │ │ works with │
871 │ │ │ │ │ cgroupv2. │
872 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
873 │ │ │ │ │ │
874 │comm │ string[16] │ 4.2 │ get_current_comm │ comm of the │
875 │ │ │ │ │ current task. │
876 │ │ │ │ │ Equal to the │
877 │ │ │ │ │ value in │
878 │ │ │ │ │ /proc/<pid>/comm │
879 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
880 │ │ │ │ │ │
881 │cpid │ uint32 │ n/a │ n/a │ PID of the child │
882 │ │ │ │ │ process │
883 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
884 │ │ │ │ │ │
885 │cpu │ uint32 │ 4.1 │ raw_smp_processor_id │ ID of the │
886 │ │ │ │ │ processor │
887 │ │ │ │ │ executing the │
888 │ │ │ │ │ BPF program │
889 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
890 │ │ │ │ │ │
891 │curtask │ uint64 │ 4.8 │ get_current_task │ Pointer to │
892 │ │ │ │ │ struct │
893 │ │ │ │ │ task_struct of │
894 │ │ │ │ │ the current task │
895 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
896 │ │ │ │ │ │
897 │elapsed │ uint64 │ (see nsecs) │ ktime_get_ns / │ Nanoseconds │
898 │ │ │ │ ktime_get_boot_ns │ elapsed since │
899 │ │ │ │ │ bpftrace │
900 │ │ │ │ │ initialization, │
901 │ │ │ │ │ based on nsecs │
902 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
903 │ │ │ │ │ │
904 │func │ string │ n/a │ n/a │ Name of the │
905 │ │ │ │ │ current function │
906 │ │ │ │ │ being traced │
907 │ │ │ │ │ (kprobes,uprobes) │
908 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
909 │ │ │ │ │ │
910 │gid │ uint64 │ 4.2 │ get_current_uid_gid │ GID of current │
911 │ │ │ │ │ task │
912 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
913 │ │ │ │ │ │
914 │kstack │ kstack │ │ get_stackid │ Kernel stack │
915 │ │ │ │ │ trace │
916 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
917 │ │ │ │ │ │
918 │nsecs │ uint64 │ 4.1 / 5.7 │ ktime_get_ns / │ nanoseconds since │
919 │ │ │ │ ktime_get_boot_ns │ kernel boot. On │
920 │ │ │ │ │ kernels that │
921 │ │ │ │ │ support │
922 │ │ │ │ │ ktime_get_boot_ns │
923 │ │ │ │ │ this includes the │
924 │ │ │ │ │ time spent │
925 │ │ │ │ │ suspended, on │
926 │ │ │ │ │ older kernels it │
927 │ │ │ │ │ does not. │
928 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
929 │ │ │ │ │ │
930 │pid │ uint64 │ 4.2 │ get_current_pid_tgid │ Process ID (or │
931 │ │ │ │ │ thread group ID) │
932 │ │ │ │ │ of the current │
933 │ │ │ │ │ task. │
934 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
935 │ │ │ │ │ │
936 │probe │ string │ n/a │ n/a │ Name of the │
937 │ │ │ │ │ current probe │
938 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
939 │ │ │ │ │ │
940 │rand │ uint32 │ 4.1 │ get_prandom_u32 │ Random number │
941 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
942 │ │ │ │ │ │
943 │retval │ int64 │ n/a │ n/a │ Value returned by │
944 │ │ │ │ │ the function │
945 │ │ │ │ │ being traced │
946 │ │ │ │ │ (kretprobe, │
947 │ │ │ │ │ uretprobe, │
948 │ │ │ │ │ kretfunc) │
949 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
950 │ │ │ │ │ │
951 │sarg0, sarg1, │ int64 │ n/a │ n/a │ nth stack value │
952 │...sargn │ │ │ │ of the function │
953 │ │ │ │ │ being traced. │
954 │ │ │ │ │ (kprobes, │
955 │ │ │ │ │ uprobes). │
956 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
957 │ │ │ │ │ │
958 │tid │ uint64 │ 4.2 │ get_current_pid_tgid │ Thread ID of the │
959 │ │ │ │ │ current task. │
960 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
961 │ │ │ │ │ │
962 │uid │ uint64 │ 4.2 │ get_current_uid_gid │ UID of current │
963 │ │ │ │ │ task │
964 ├──────────────┼────────────┼────────────┼───────────────────────┼───────────────────┤
965 │ │ │ │ │ │
966 │ustack │ ustack │ 4.6 │ get_stackid │ Userspace stack │
967 │ │ │ │ │ trace │
968 └──────────────┴────────────┴────────────┴───────────────────────┴───────────────────┘
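
Several builtins are typically combined in one action. In this sketch
the positional parameter $1 is assumed to be a PID given on the command
line (1234 is a placeholder):

    # bpftrace -e 'tracepoint:syscalls:sys_enter_openat /pid == $1/ { printf("%s[%d] cpu %d: %s\n", comm, tid, cpu, str(args->filename)); }' 1234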
969
970 MAP FUNCTIONS
971 Map functions are built-in functions whose return value can only be
972 assigned to maps. The data types associated with these functions are
973 only for internal use and are not compatible with the (integer)
974 operators.
975
976 Functions that are marked async are asynchronous, which can lead to
977 unexpected behavior; see the Sync and async section for more
978 information.
979
980 avg
981 variants
982
983 • avg(int64 n)
984
985 Calculate the running average of n between consecutive calls.
986
987 i:s:1 {
988 @x++;
989 @y = avg(@x);
990 print(@x);
991 print(@y);
992 }
993
994 Internally this keeps two values in the map: value count and running
995 total. The average is computed in user-space when printing by dividing
996 the total by the count.
997
998 clear
999 variants
1000
1001 • clear(map m)
1002
1003 async
1004
1005 Clear all keys/values from map m.
1006
1007 i:ms:100 {
1008 @[rand % 10] = count();
1009 }
1010
1011 i:s:10 {
1012 print(@);
1013 clear(@);
1014 }
1015
1016 count
1017 variants
1018
1019 • count()
1020
1021 Count how often this function is called.
1022
1023 Using @=count() is conceptually similar to @++. The difference is that
1024 the count() function uses a map type optimized for this (PER_CPU),
1025 increasing performance. Due to this the map cannot be accessed as a
1026 regular integer.
1027
1028 i:ms:100 {
1029 @ = count();
1030 }
1031
1032 i:s:10 {
1033 print(@);
1034 clear(@);
1035 }
1036
1037 delete
1038 variants
1039
1040 • delete(mapkey k)
1041
1042 Delete a single key from a map. For a single value map this deletes the
1043 only element. For an associative-array the key to delete has to be
1044 specified.
1045
1046 k:dummy {
1047 @scalar = 1;
1048 @associative[1,2] = 1;
1049 delete(@scalar);
1050 delete(@associative[1,2]);
1051
1052 delete(@associative); // error
1053 }
1054
1055 hist
1056 variants
1057
1058 • hist(int64 n)
1059
1060 Create a log2 histogram of n.
1061
1062 kretprobe:vfs_read {
1063 @bytes = hist(retval);
1064 }
1065
1066 Results in:
1067
@bytes:
[1M, 2M)               3 |                                                    |
[2M, 4M)               2 |                                                    |
[4M, 8M)               2 |                                                    |
[8M, 16M)              6 |                                                    |
[16M, 32M)            16 |                                                    |
[32M, 64M)            27 |                                                    |
[64M, 128M)           48 |@                                                   |
[128M, 256M)          98 |@@@                                                 |
[256M, 512M)         191 |@@@@@@                                              |
[512M, 1G)           394 |@@@@@@@@@@@@@                                       |
[1G, 2G)             820 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
1080
1081 lhist
1082 variants
1083
1084 • lhist(int64 n, int64 min, int64 max, int64 step)
1085
Create a linear histogram of n. lhist creates M ((max - min) / step)
buckets in the range [min,max) where each bucket is step in size.
Values in the ranges (-inf, min) and [max, inf) each get their own
bucket too, bringing the total number of buckets created to M+2.
1090
1091 i:ms:1 {
1092 @ = lhist(rand %10, 0, 10, 1);
1093 }
1094
1095 i:s:5 {
1096 exit();
1097 }
1098
1099 Prints:
1100
1101 @:
1102 [0, 1) 306 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1103 [1, 2) 284 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1104 [2, 3) 294 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1105 [3, 4) 318 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1106 [4, 5) 311 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1107 [5, 6) 362 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
1108 [6, 7) 336 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1109 [7, 8) 326 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1110 [8, 9) 328 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1111 [9, 10) 318 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1112
1113 max
1114 variants
1115
1116 • max(int64 n)
1117
1118 Update the map with n if n is bigger than the current value held.
1119
1120 min
1121 variants
1122
1123 • min(int64 n)
1124
1125 Update the map with n if n is smaller than the current value held.
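
As a minimal sketch of max and min used together (the probe choice and
the map names @largest and @smallest are illustrative assumptions, not
taken from this manual), the following records the largest and smallest
successful read size per process name:

```bpftrace
kretprobe:vfs_read
/(int64)retval > 0/
{
  // retval is the number of bytes read; failed reads are filtered out
  @largest[comm] = max(retval);
  @smallest[comm] = min(retval);
}
```

Both maps are printed when bpftrace exits, like any other non-empty
map.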
1126
1127 stats
1128 variants
1129
1130 • stats(int64 n)
1131
1132 stats combines the count, avg and sum calls into one.
1133
1134 kprobe:vfs_read {
1135 @bytes[comm] = stats(arg2);
1136 }
1137
1138 @bytes[bash]: count 7, average 1, total 7
1139 @bytes[sleep]: count 5, average 832, total 4160
1140 @bytes[ls]: count 7, average 886, total 6208
1142
1143 sum
1144 variants
1145
1146 • sum(int64 n)
1147
1148 Calculate the sum of all n passed.
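
A short sketch, assuming the syscalls:sys_exit_read tracepoint is
available on the system (the map name @bytes is an arbitrary choice).
It adds up the bytes returned by read(2) per process name:

```bpftrace
tracepoint:syscalls:sys_exit_read
/args->ret > 0/
{
  // args->ret holds the return value of read(2): bytes read on success
  @bytes[comm] = sum(args->ret);
}
```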
1149
1150 zero
1151 variants
1152
1153 • zero(map m)
1154
1155 async
1156
1157 Set all values for all keys to zero.
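
The sketch below mirrors the clear example, with zero swapped in.
Going by the descriptions above, the expected difference is that the
keys remain in the map with a value of zero instead of being removed:

```bpftrace
i:ms:100 {
  @[rand % 10] = count();
}

i:s:5 {
  print(@);
  // keep the keys, reset every value to zero
  zero(@);
}
```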
1158
1160 Functions that are marked async are asynchronous which can lead to
1161 unexpected behaviour, see the [sync and async] section for more
1162 information.
1163
compile time functions are evaluated at compile time; a static value
is compiled into the program.

unsafe functions can have dangerous side effects and should be used
with care. The --unsafe flag is required to use them.
1169
1170 buf
1171 variants
1172
1173 • buf_t buf(void * data, [int64 length])
1174
buf reads length bytes from address data. The maximum value of length
is limited to the BPFTRACE_STRLEN variable. For arrays the length is
optional; it is automatically inferred from the signature.
1178
1179 buf is address space aware and will call the correct helper based on
1180 the address space associated with data.
1181
1182 The buf_t object returned by buf can safely be printed as a hex encoded
1183 string with the %r format specifier.
1184
1185 Bytes with values >=32 and <=126 are printed using their ASCII
1186 character, other bytes are printed in hex form (e.g. \x00).
1187
1188 i:s:1 {
1189 printf("%r\n", buf(kaddr("avenrun"), 8));
1190 }
1191
1192 \x00\x03\x00\x00\x00\x00\x00\x00
1193 \xc2\x02\x00\x00\x00\x00\x00\x00
1194
1195 cat
1196 variants
1197
1198 • void cat(string namefmt, [...args])
1199
1200 async
1201
1202 Dump the contents of the named file to stdout. cat supports the same
1203 format string and arguments that printf does. If the file cannot be
1204 opened or read an error is printed to stderr.
1205
1206 t:syscalls:sys_enter_execve {
1207 cat("/proc/%d/maps", pid);
1208 }
1209
1210 55f683ebd000-55f683ec1000 r--p 00000000 08:01 1843399 /usr/bin/ls
1211 55f683ec1000-55f683ed6000 r-xp 00004000 08:01 1843399 /usr/bin/ls
1212 55f683ed6000-55f683edf000 r--p 00019000 08:01 1843399 /usr/bin/ls
1213 55f683edf000-55f683ee2000 rw-p 00021000 08:01 1843399 /usr/bin/ls
1214 55f683ee2000-55f683ee3000 rw-p 00000000 00:00 0
1215
1216 cgroupid
1217 variants
1218
1219 • uint64 cgroupid(const string path)
1220
1221 compile time
1222
1223 cgroupid retrieves the cgroupv2 ID of the cgroup available at path.
1224
1225 BEGIN {
1226 print(cgroupid("/sys/fs/cgroup/system.slice"));
1227 }
1228
1229 exit
1230 variants
1231
1232 • void exit()
1233
1234 async
1235
1236 Terminate bpftrace, as if a SIGTERM was received. The END probe will
1237 still trigger (if specified) and maps will be printed.
1238
1239 join
1240 variants
1241
1242 • void join(char *arr[], [char * sep = ' '])
1243
1244 async
1245
join joins all the strings in the array arr with sep as separator into
one string. This string is printed to stdout directly; it cannot be
used as a string value.
1249
1250 The concatenation of the array members is done in BPF and the printing
1251 happens in userspace.
1252
1253 tracepoint:syscalls:sys_enter_execve {
1254 join(args->argv);
1255 }
1256
1257 kaddr
1258 variants
1259
1260 • uint64 kaddr(const string name)
1261
1262 compile time
1263
1264 Get the address of the kernel symbol name.
1265
1266 The following script:
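
As a minimal sketch, assuming the kernel exposes the avenrun symbol
(the same symbol used in the buf example), this prints the address
that kaddr resolves:

```bpftrace
BEGIN {
  // kaddr is resolved at compile time; the value is a plain uint64
  printf("avenrun is at 0x%llx\n", kaddr("avenrun"));
  exit();
}
```

The printed address depends on the running kernel and typically
changes between boots when kernel address randomization is enabled.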
1267
1268 kptr
1269 variants
1270
1271 • T * kptr(T * ptr)
1272
1273 Marks ptr as a kernel address space pointer. See the address-spaces
1274 section for more information on address-spaces. The pointer type is
1275 left unchanged.
1276
1277 ksym
1278 variants
1279
1280 • ksym_t ksym(uint64 addr)
1281
1282 async
1283
1284 Retrieve the name of the function that contains address addr. The
1285 address to name mapping happens in user-space.
1286
1287 The ksym_t type can be printed with the %s format specifier.
1288
1289 kprobe:do_nanosleep
1290 {
1291 printf("%s\n", ksym(reg("ip")));
1292 }
1293
1294 Prints:
1295
1296 do_nanosleep
1297
1298 macaddr
1299 variants
1300
1301 • macaddr_t macaddr(char [6] mac)
1302
Create a buffer that holds a MAC address as read from mac. This buffer
can be printed in the canonical string format using the %s format
specifier.
1306
1307 kprobe:arp_create {
1308 printf("SRC %s, DST %s\n", macaddr(sarg0), macaddr(sarg1));
1309 }
1310
1311 Prints:
1312
1313 SRC 18:C0:4D:08:2E:BB, DST 74:83:C2:7F:8C:FF
1314
1315 ntop
1316 variants
1317
1318 • inet_t ntop([int64 af, ] int addr)
1319
1320 • inet_t ntop([int64 af, ] char addr[4])
1321
1322 • inet_t ntop([int64 af, ] char addr[16])
1323
1324 ntop returns the string representation of an IPv4 or IPv6 address. ntop
1325 will infer the address type (IPv4 or IPv6) based on the addr type and
1326 size. If an integer or char[4] is given, ntop assumes IPv4, if a
1327 char[16] is given, ntop assumes IPv6. You can also pass the address
1328 type (e.g. AF_INET) explicitly as the first parameter.
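
A minimal sketch of the integer variant. The constant is an
illustrative value; its bytes are interpreted in memory order, so on a
little-endian machine such as x86_64 it is expected to print 127.0.0.1:

```bpftrace
BEGIN {
  // integer argument: ntop assumes IPv4
  printf("%s\n", ntop(0x0100007f));
  exit();
}
```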
1329
1330 override
1331 variants
1332
1333 • override(uint64 rc)
1334
1335 unsafe
1336
1337 Kernel 4.16
1338
Helper bpf_override_return
1340
1341 Supported probes
1342
1343 • kprobe
1344
1345 When using override the probed function will not be executed and
1346 instead rc will be returned.
1347
1348 k:__x64_sys_getuid
1349 /comm == "id"/ {
1350 override(2<<21);
1351 }
1352
1353 uid=4194304 gid=0(root) euid=0(root) groups=0(root)
1354
1355 This feature only works on kernels compiled with
1356 CONFIG_BPF_KPROBE_OVERRIDE and only works on functions tagged
1357 ALLOW_ERROR_INJECTION.
1358
bpftrace does not test whether error injection is allowed for the
probed function; instead it will fail to load the program into the
kernel:
1362
1363 ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument
1364 Error attaching probe: 'kprobe:vfs_read'
1365
1366 reg
1367 variants
1368
1369 • reg(const string name)
1370
1371 Supported probes
1372
1373 • kprobe
1374
1375 • uprobe
1376
1377 Get the contents of the register identified by name. Valid names depend
1378 on the CPU architecture.
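
A short sketch, assuming an x86_64 system where the instruction
pointer register is named "ip" (as in the ksym example) and where the
kernel function tcp_sendmsg can be probed:

```bpftrace
kprobe:tcp_sendmsg {
  // count hits keyed by the symbol that contains the instruction pointer
  @[ksym(reg("ip"))] = count();
}
```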
1379
1380 signal
1381 variants
1382
1383 • signal(const string sig)
1384
1385 • signal(uint32 signum)
1386
1387 unsafe
1388
1389 Kernel 5.3
1390
1391 Helper bpf_send_signal
1392
1393 Probe types: k(ret)probe, u(ret)probe, USDT, profile
1394
1395 Send a signal to the process being traced. The signal can either be
1396 identified by name, e.g. SIGSTOP or by ID, e.g. 19 as found in kill -l.
1397
1398 kprobe:__x64_sys_execve
1399 /comm == "bash"/ {
1400 signal(5);
1401 }
1402
1403 $ ls
1404 Trace/breakpoint trap (core dumped)
1405
1406 sizeof
1407 variants
1408
1409 • sizeof(TYPE)
1410
1411 • sizeof(EXPRESSION)
1412
1413 compile time
1414
1415 Returns size of the argument in bytes. Similar to C/C++ sizeof
1416 operator. Note that the expression does not get evaluated.
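
A minimal sketch covering both variants. The struct Foo definition is
a made-up example type; with the usual C layout rules it is expected
to occupy 8 bytes (4 for the int, 1 for the char, 3 of padding):

```bpftrace
struct Foo { int x; char c; }

BEGIN {
  // sizeof(TYPE)
  printf("%d\n", sizeof(struct Foo));
  // sizeof(EXPRESSION): the expression itself is not evaluated
  $x = 3;
  printf("%d\n", sizeof($x));
  exit();
}
```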
1417
1418 str
1419 variants
1420
• str(char * data [, uint32 length])
1422
1423 Helper probe_read_str, probe_read_{kernel,user}_str
1424
str reads a NULL terminated (\0) string from data. The maximum string
length is limited by the BPFTRACE_STRLEN env variable, unless length
is specified and shorter than the maximum. In case the string is longer
than the specified length only length - 1 bytes are copied and a NULL
byte is appended at the end.
1430
1431 When available (starting from kernel 5.5, see the --info flag) bpftrace
1432 will automatically use the kernel or user variant of
1433 probe_read_{kernel,user}_str based on the address space of data, see
1434 [_address_spaces] for more information.
1435
1436 strftime
1437 variants
1438
1439 • strtime_t strftime(const string fmt, int64 timestamp_ns)
1440
1441 async
1442
Format the nanoseconds since boot timestamp timestamp_ns according to
the format specified by fmt. The time conversion and formatting happens
in user space, therefore the returned value can only be used for
printing with the %s format specifier.
1447
1448 bpftrace uses the strftime(3) function for formatting time and supports
1449 the same format specifiers.
1450
1451 i:s:1 {
1452 printf("%s\n", strftime("%H:%M:%S", nsecs));
1453 }
1454
1455 bpftrace also supports the following format string extensions:
1456
┌──────────┬────────────────────────────┐
│Specifier │ Description                │
├──────────┼────────────────────────────┤
│%f        │ Microsecond as a decimal   │
│          │ number, zero-padded on the │
│          │ left                       │
└──────────┴────────────────────────────┘
1466
1467 strncmp
1468 variants
1469
1470 • int64 strncmp(char * s1, char * s2, int64 n)
1471
strncmp compares up to n characters of string s1 and string s2. If
they’re equal 0 is returned, else a non-zero value is returned.
1474
1475 bpftrace doesn’t read past the length of the shortest string.
1476
1477 The use of the == and != operators is recommended over calling strncmp
1478 directly.
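
Where only a prefix should match, strncmp is still useful. A sketch,
assuming the syscalls:sys_enter_openat tracepoint is available (the
"ssh" prefix and the map name @opens are arbitrary choices):

```bpftrace
tracepoint:syscalls:sys_enter_openat
/strncmp(comm, "ssh", 3) == 0/
{
  // matches any process name starting with "ssh", e.g. ssh and sshd
  @opens[comm] = count();
}
```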
1479
1480 system
1481 variants
1482
1483 • void system(string namefmt [, ...args])
1484
1485 unsafe async
1486
1487 system lets bpftrace run the specified command (fork and exec) until it
1488 completes and print its stdout. The command is run with the same
1489 privileges as bpftrace and it blocks execution of the processing
1490 threads which can lead to missed events and delays processing of async
1491 events.
1492
1493 i:s:1 {
1494 time("%H:%M:%S: ");
1495 printf("%d\n", @++);
1496 }
1497 i:s:10 {
1498 system("/bin/sleep 10");
1499 }
1500 i:s:30 {
1501 exit();
1502 }
1503
1504 Note how the async time and printf first print every second until the
1505 i:s:10 probe hits, then they print every 10 seconds due to bpftrace
1506 blocking on sleep.
1507
1508 Attaching 3 probes...
1509 08:50:37: 0
1510 08:50:38: 1
1511 08:50:39: 2
1512 08:50:40: 3
1513 08:50:41: 4
1514 08:50:42: 5
1515 08:50:43: 6
1516 08:50:44: 7
1517 08:50:45: 8
1518 08:50:46: 9
1519 08:50:56: 10
1520 08:50:56: 11
1521 08:50:56: 12
1522 08:50:56: 13
1523 08:50:56: 14
1524 08:50:56: 15
1525 08:50:56: 16
1526 08:50:56: 17
1527 08:50:56: 18
1528 08:50:56: 19
1529
1530 system supports the same format string and arguments that printf does.
1531
1532 t:syscalls:sys_enter_execve {
1533 system("/bin/grep %s /proc/%d/status", "vmswap", pid);
1534 }
1535
1536 time
1537 variants
1538
1539 • void time(const string fmt)
1540
1541 async
1542
1543 Format the current wall time according to the format specifier fmt and
1544 print it to stdout. Unlike strftime() time() doesn’t send a timestamp
1545 from the probe, instead it is the time at which user-space processes
1546 the event.
1547
1548 bpftrace uses the strftime(3) function for formatting time and supports
1549 the same format specifiers.
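
A minimal sketch, reusing the do_nanosleep probe from the examples at
the top of this page, that prefixes each output line with the wall
time at which user-space handled the event:

```bpftrace
kprobe:do_nanosleep {
  // both calls are async and are processed in order by user-space
  time("%H:%M:%S ");
  printf("%s (pid %d) is sleeping\n", comm, pid);
}
```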
1550
1551 uaddr
1552 variants
1553
1554 • T * uaddr(const string sym)
1555
1556 Supported probes
1557
1558 • uprobes
1559
1560 • uretprobes
1561
1562 • USDT
1563
1564 Does not work with ASLR, see issue #75
1565 <https://github.com/iovisor/bpftrace/issues/75>
1566
1567 The uaddr function returns the address of the specified symbol. This
1568 lookup happens during program compilation and cannot be used
1569 dynamically.
1570
1571 The default return type is uint64*. If the ELF object size matches a
1572 known integer size (1, 2, 4 or 8 bytes) the return type is modified to
1573 match the width (uint8*, uint16*, uint32* or uint64* resp.). As ELF
1574 does not contain type info the type is always assumed to be unsigned.
1575
1576 uprobe:/bin/bash:readline {
1577 printf("PS1: %s\n", str(*uaddr("ps1_prompt")));
1578 }
1579
1580 uptr
1581 variants
1582
1583 • T * uptr(T * ptr)
1584
1585 Marks ptr as a user address space pointer. See the address-spaces
1586 section for more information on address-spaces. The pointer type is
1587 left unchanged.
1588
1589 usym
1590 variants
1591
1592 • usym_t usym(uint64 * addr)
1593
1594 async
1595
1596 Supported probes
1597
1598 • uprobes
1599
1600 • uretprobes
1601
Equal to ksym ([functions_ksym]) but resolves user-space symbols.
1603
1604 uprobe:/bin/bash:readline
1605 {
1606 printf("%s\n", usym(reg("ip")));
1607 }
1608
1609 Prints:
1610
1611 readline
1612
1613 path
1614 variants
1615
1616 • char * path(struct path * path)
1617
1618 Kernel 5.10
1619
1620 Helper bpf_d_path
1621
1622 Return full path referenced by struct path pointer in argument.
1623
This function can only be used in probes attached to kernel functions
that are allowed to call it. These functions are contained in the
btf_allowlist_d_path set in the kernel.
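
A sketch of typical use, assuming a kernel with BTF and kfunc support
and assuming that filp_close is part of the allowlist on the running
kernel (this should be verified for the kernel in use):

```bpftrace
kfunc:filp_close {
  // args->filp is a struct file *; f_path is its struct path member
  printf("%s closed %s\n", comm, path(args->filp->f_path));
}
```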
1626
1627 unwatch
1628 variants
1629
1630 • void unwatch(void * addr)
1631
1632 async
1633
1634 Removes a watchpoint
1635
1637 print
variants

• void print(T val)

• void print(@map)

• void print(@map, uint64 top)

• void print(@map, uint64 top, uint64 div)

async
1653
print prints a value, which can be a map or a scalar value, with the
default formatting for the type.
1656
1657 i:ms:10 { @=hist(rand); }
1658 i:s:1 {
1659 print(@);
1660 print(123);
1661 print("abc");
1662 exit();
1663 }
1664
1665 Prints:
1666
1667 @:
[16M, 32M)             3 |@@@                                                 |
[32M, 64M)             2 |@@                                                  |
[64M, 128M)            1 |@                                                   |
[128M, 256M)           4 |@@@@                                                |
[256M, 512M)           3 |@@@                                                 |
[512M, 1G)            14 |@@@@@@@@@@@@@@                                      |
[1G, 2G)              22 |@@@@@@@@@@@@@@@@@@@@@@                              |
[2G, 4G)              51 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
1676
1677 123
1678 abc
1679
1680 Note that maps are printed by reference while scalar values are copied.
1681 This means that updating and printing maps in a fast loop will likely
1682 result in bogus map values as the map will be updated before userspace
1683 gets the time to dump and print it.
1684
The printing of maps supports the optional top and div arguments. top
limits the printing to the top N entries with the highest integer
values:
1688
1689 BEGIN {
1690 $i = 11;
1691 while($i) {
1692 @[$i] = --$i;
1693 }
1694 print(@, 2);
1695 clear(@);
1696 exit()
1697 }
1698
1699 @[9]: 9
1700 @[10]: 10
1701
1702 The div argument scales the values prior to printing them. Scaling
1703 values before storing them can result in rounding errors. Consider the
1704 following program:
1705
1706 k:f {
1707 @[func] += arg0/10;
1708 }
1709
With the following sequence of values for arg0: 134, 377, 111, 99, the
total is 721, which becomes 72 when divided by 10. The program would
instead print 70 (13 + 37 + 11 + 9) because each individual value is
rounded down before it is added.
1713
Changing the print call in the top example above to print(@, 5, 2)
will take the top 5 values and divide them by 2:
1716
1717 @[6]: 3
1718 @[7]: 3
1719 @[8]: 4
1720 @[9]: 4
1721 @[10]: 5
1722
1723 printf
1724 variants
1725
1726 • void printf(const string fmt, args...)
1727
1728 async
1729
printf() formats and prints data. It behaves similarly to the printf()
found in C and many other languages.
1732
1733 The format string has to be a constant, it cannot be modified at
1734 runtime. The formatting of the string happens in user space. Values are
1735 copied and passed by value.
1736
1737 bpftrace supports all the typical format specifiers like %llx and %hhu.
1738 The non-standard ones can be found in the table below:
1739
┌──────────┬────────┬─────────────────────┐
│Specifier │ Type   │ Description         │
├──────────┼────────┼─────────────────────┤
│%r        │ buffer │ Hex-formatted       │
│          │        │ string to print     │
│          │        │ arbitrary binary    │
│          │        │ content returned by │
│          │        │ the buf             │
│          │        │ ([functions_buf])   │
│          │        │ function.           │
└──────────┴────────┴─────────────────────┘
1753
1754 Supported escape sequences
1755
1756 Colors are supported too, using standard terminal escape sequences:
1757
1758 print("\033[31mRed\t\033[33mYellow\033[0m\n")
1759
bpftrace supports various probe types which allow the user to attach
BPF programs to different types of events. Each probe starts with a
provider (e.g. kprobe) followed by a colon (:) separated list of
options. The number of options and their meaning depend on the provider
and are detailed below. The valid values for options can depend on the
system or binary being traced, e.g. for uprobes it depends on the
binary. Also see [_listing_probes].
1768
1769 It is possible to associate multiple probes with a single action as
1770 long as the action is valid for all specified probes. Multiple probes
1771 can be specified as a comma (,) separated list:
1772
1773 kprobe:tcp_reset,kprobe:tcp_v4_rcv {
1774 printf("Entered: %s\n", probe);
1775 }
1776
1777 Wildcards are supported too:
1778
1779 kprobe:tcp_* {
1780 printf("Entered: %s\n", probe);
1781 }
1782
1783 Both can be combined:
1784
1785 kprobe:tcp_reset,kprobe:*socket* {
1786 printf("Entered: %s\n", probe);
1787 }
1788
1789 Most providers also support a short name which can be used instead of
1790 the full name, e.g. kprobe:f and k:f are identical.
1791
1792 BEGIN and END
1793 These are special built-in events provided by the bpftrace runtime.
1794 BEGIN is triggered before all other probes are attached. END is
1795 triggered after all other probes are detached.
1796
Note that specifying an END probe doesn’t override the printing of
'non-empty' maps at exit. To prevent this printing, all used maps need
to be cleared, which can be done in the END probe:
1800
1801 END {
1802 clear(@map1);
1803 clear(@map2);
1804 }
1805
1806 hardware
1807 variants
1808
1809 • hardware:event_name:
1810
1811 • hardware:event_name:count
1812
shortnames
1814
1815 • h
1816
1817 The hardware probe attaches to pre-defined hardware events provided by
1818 the kernel.
1819
1820 They are implemented using performance monitoring counters (PMCs):
1821 hardware resources on the processor. There are about ten of these, and
1822 they are documented in the perf_event_open(2) man page. The event names
1823 are:
1824
1825 • cpu-cycles or cycles
1826
1827 • instructions
1828
1829 • cache-references
1830
1831 • cache-misses
1832
1833 • branch-instructions or branches
1834
1835 • branch-misses
1836
1837 • bus-cycles
1838
1839 • frontend-stalls
1840
1841 • backend-stalls
1842
1843 • ref-cycles
1844
1845 The count option specifies how many events must happen before the probe
1846 fires. If count is left unspecified a default value is used.
1847
1848 hardware:cache-misses:1e6 { @[pid] = count(); }
1849
1850 interval
1851 variants
1852
1853 • interval:us:count
1854
1855 • interval:ms:count
1856
1857 • interval:s:count
1858
1859 • interval:hz:rate
1860
1861 shortnames
1862
1863 • i
1864
The interval probe fires at a fixed interval as specified by its time
spec. Interval probes fire on one CPU at a time, unlike [profile]
probes.
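
A common use is periodic reporting. The sketch below, assuming the
raw_syscalls:sys_enter tracepoint is available (the map name @syscalls
is an arbitrary choice), prints a system-wide syscall count once per
second:

```bpftrace
tracepoint:raw_syscalls:sys_enter {
  @syscalls = count();
}

interval:s:1 {
  // fires on a single CPU, so the summary is printed once per second
  print(@syscalls);
  clear(@syscalls);
}
```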
1867
1868 iterator
1869 variants
1870
1871 • iter:task
1872
1873 • iter:task:pin
1874
1875 • iter:task_file
1876
1877 • iter:task_file:pin
1878
1879 shortnames
1880
1881 • it
1882
These are eBPF iterator probes that allow iteration over kernel
objects.

An iterator probe can’t be mixed with any other probe, not even another
iterator.

Each iterator probe provides a set of fields that can be accessed
through the ctx pointer. The set of available fields for an iterator
can be displayed with the -lv options as described below.
1892
1893 Examples:
1894
1895 # bpftrace -e 'iter:task { printf("%s:%d\n", ctx->task->comm, ctx->task->pid); }'
1896 Attaching 1 probe...
1897 systemd:1
1898 kthreadd:2
1899 rcu_gp:3
1900 rcu_par_gp:4
1901 kworker/0:0H:6
1902 mm_percpu_wq:8
1903 ...
1904
1905 # bpftrace -e 'iter:task_file { printf("%s:%d %d:%s\n", ctx->task->comm, ctx->task->pid, ctx->fd, path(ctx->file->f_path)); }'
1906 Attaching 1 probe...
1907 systemd:1 1:/dev/null
1908 systemd:1 2:/dev/null
1909 systemd:1 3:/dev/kmsg
1910 ...
1911 su:1622 1:/dev/pts/1
1912 su:1622 2:/dev/pts/1
1913 su:1622 3:/var/lib/sss/mc/passwd
1914 ...
1915 bpftrace:1892 1:pipe:[35124]
1916 bpftrace:1892 2:/dev/pts/1
1917 bpftrace:1892 3:anon_inode:bpf-map
1918 bpftrace:1892 4:anon_inode:bpf-map
1919 bpftrace:1892 5:anon_inode:bpf_link
1920 bpftrace:1892 6:anon_inode:bpf-prog
1921 bpftrace:1892 7:anon_inode:bpf_iter
1922
It’s possible to pin an iterator by specifying the optional ':pin'
part of the probe, which defines the pin file. It can be specified as
an absolute path or relative to /sys/fs/bpf.
1926
1927 relative pin
1928
1929 # bpftrace -e 'iter:task:list { printf("%s:%d\n", ctx->task->comm, ctx->task->pid); }'
1930 Program pinned to /sys/fs/bpf/list
1931
1932 # cat /sys/fs/bpf/list
1933 systemd:1
1934 kthreadd:2
1935 rcu_gp:3
1936 rcu_par_gp:4
1937 kworker/0:0H:6
1938 mm_percpu_wq:8
1939 rcu_tasks_kthre:9
1940 ...
1941
1942 Examples with absolute pin file:
1943
1944 absolute pin
1945
1946 # bpftrace -e '
1947 iter:task_file:/sys/fs/bpf/files {
1948 printf("%s:%d %s\n", ctx->task->comm, ctx->task->pid, path(ctx->file->f_path));
1949 }'
1950
1951 Program pinned to /sys/fs/bpf/files
1952
1953 # cat /sys/fs/bpf/files
1954 systemd:1 anon_inode:inotify
1955 systemd:1 anon_inode:[timerfd]
1956 ...
1957 systemd-journal:849 /dev/kmsg
1958 systemd-journal:849 anon_inode:[eventpoll]
1959 ...
1960 sssd:1146 /var/log/sssd/sssd.log
1961 sssd:1146 anon_inode:[eventpoll]
1962 ...
1963 NetworkManager:1155 anon_inode:[eventfd]
1964 NetworkManager:1155 /var/lib/sss/mc/passwd (deleted)
1965
1966 kfunc and kretfunc
1967 variants
1968
1969 • kfunc:fn
1970
1971 • kretfunc:fn
1972
1973 shortnames
1974
1975 • f (kfunc)
1976
1977 • fr (kretfunc)
1978
1979 requires (--info)
1980
1981 • Kernel features:BTF
1982
1983 • Probe types:kfunc
1984
kfuncs attach to kernel functions, similar to [probes-kprobe]. They
make use of eBPF trampolines which allow kernel code to call into BPF
programs with near zero overhead.

kfuncs make use of BTF type information to derive the type of function
arguments at compile time. This removes the need for manual type
casting and makes the code more resilient against small signature
changes in the kernel. The function arguments are available in the args
struct which can be inspected by doing verbose listing (see
[_listing_probes]). These arguments are also available in the return
probe (kretfunc).
1996
1997 # bpftrace -lv 'kfunc:tcp_reset'
1998 kfunc:tcp_reset
1999 struct sock * sk
2000 struct sk_buff * skb
2001
2002 kfunc:x86_pmu_stop {
2003 printf("pmu %s stop\n", str(args->event->pmu->name));
2004 }
2005
2006 kretfunc:fget {
2007 printf("fd %d name %s\n", args->fd, str(retval->f_path.dentry->d_name.name));
2008 }
2009
2010 fd 3 name ld.so.cache
2011 fd 3 name libselinux.so.1
2012 fd 3 name libselinux.so.1
2013 ...
2014
2015 kprobe and kretprobe
2016 variants
2017
2018 • kprobe:fn
2019
2020 • kprobe:fn+offset
2021
2022 • kretprobe:fn
2023
2024 shortnames
2025
2026 • k
2027
2028 • kr
2029
kprobes allow for dynamic instrumentation of kernel functions. Each
time the specified kernel function is executed the attached BPF
programs are run.
2033
2034 kprobe:tcp_reset {
2035 @tcp_resets = count()
2036 }
2037
Function arguments are available through the argX and sargX builtins,
for register args and stack args respectively. Whether arguments are
passed on the stack or in a register depends on the architecture and
the number of arguments in use, e.g. on x86_64 the first 6
non-floating point arguments are passed in registers and all following
arguments are passed on the stack. Note that floating point arguments
are typically passed in special registers which don’t count as argX
arguments, which can cause confusion. Consider a function with the
following signature:
2046
2047 void func(int a, double d, int x)
2048
Because d is a floating point value, x is accessed through arg1 where
one might expect arg2.
2051
2052 bpftrace does not detect the function signature so it is not aware of
2053 the argument count or their type. It is up to the user to perform
2054 [language_type_conversion] when needed, e.g.
2055
2056 kprobe:tcp_connect
2057 {
2058 $sk = ((struct sock *) arg0);
2059 ...
2060 }
2061
kprobes are not limited to function entry; they can be attached to any
instruction in a function by specifying an offset from the start of the
function.

kretprobes trigger on the return from a kernel function. Return probes
do not have access to the function (input) arguments, only to the
return value (through retval). A common pattern to work around this is
to store the arguments in a map on function entry and retrieve them in
the return probe:
2071
2072 kprobe:d_lookup
2073 {
2074 $name = (struct qstr *)arg1;
2075 @fname[tid] = $name->name;
2076 }
2077
2078 kretprobe:d_lookup
2079 /@fname[tid]/
2080 {
2081 printf("%-8d %-6d %-16s M %s\n", elapsed / 1e6, pid, comm,
2082 str(@fname[tid]));
2083 }
2084
2085 profile
2086 variants
2087
2088 • profile:us:count
2089
2090 • profile:ms:count
2091
2092 • profile:s:count
2093
2094 • profile:hz:rate
2095
2096 shortnames
2097
2098 • p
2099
2100 Profile probes fire on each CPU on the specified interval.
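
A minimal sampling sketch (the 99 Hz rate and the 10 second run time
are arbitrary choices). It counts which process name is on-CPU each
time the probe fires and exits after ten seconds:

```bpftrace
profile:hz:99 {
  // fires 99 times per second on every CPU
  @[comm] = count();
}

interval:s:10 {
  exit();
}
```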
2101
2102 software
2103 variants
2104
2105 • software:event:
2106
2107 • software:event:count
2108
2109 shortnames
2110
2111 • s
2112
2113 The software probe attaches to pre-defined software events provided by
2114 the kernel. Event details can be found in the perf_event_open(2) man
2115 page.
2116
2117 The event names are:
2118
2119 • cpu-clock or cpu
2120
2121 • task-clock
2122
2123 • page-faults or faults
2124
2125 • context-switches or cs
2126
2127 • cpu-migrations
2128
2129 • minor-faults
2130
2131 • major-faults
2132
2133 • alignment-faults
2134
2135 • emulation-faults
2136
2137 • dummy
2138
2139 • bpf-output
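
As a short sketch, assuming the count option has the same meaning as
for hardware probes (the value 100 is an arbitrary choice), this fires
once for every 100 page faults and counts them by process name:

```bpftrace
software:page-faults:100 {
  @[comm] = count();
}
```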
2140
2141 tracepoint
2142 variants
2143
2144 • tracepoint:subsys:event
2145
2146 shortnames
2147
2148 • t
2149
Tracepoints are hooks into events in the kernel. Tracepoints are
defined in the kernel source and compiled into the kernel binary, which
makes them a form of static tracing. This means that, unlike kprobes,
new tracepoints cannot be added without modifying the kernel.

The advantage of tracepoints is that they generally provide a more
stable interface than kprobes do, as they do not depend on the
existence of a kernel function.
2158
2159 Tracepoint arguments are available in the args struct which can be
2160 inspected with verbose listing, see the [_listing_probes] section for
2161 more details.
2162
2163 tracepoint:syscalls:sys_enter_openat {
2164 printf("%s %s\n", comm, str(args->filename));
2165 }
2166
2167 irqbalance /proc/interrupts
2168 irqbalance /proc/stat
2169 snmpd /proc/diskstats
2170 snmpd /proc/stat
2171 snmpd /proc/vmstat
2172 snmpd /proc/net/dev
2173 [...]
2174
2175 Additional information
2176
2177 • https://www.kernel.org/doc/html/latest/trace/tracepoints.html
2178
2179 uprobe, uretprobe
2180 variants
2181
2182 • uprobe:binary:func
2183
2184 • uprobe:binary:func+offset
2185
2186 • uretprobe:binary:func
2187
2188 shortnames
2189
2190 • u
2191
2192 • ur
2193
uprobes, or user-space probes, are the user-space equivalent of
kprobes. The same limitations that apply to [probes-kprobe] also apply
to uprobes and uretprobes.
2197
2198 When tracing libraries, it is sufficient to specify the library name
2199 instead of a full path. The path will be then automatically resolved
2200 using /etc/ld.so.cache:
2201
2202 # bpftrace -e 'uprobe:libc:malloc { printf("Allocated %d bytes\n", arg0); }'
2203 Allocated 4 bytes
2204 ...
2205
2206 If the traced binary has DWARF included, function arguments are
2207 available in the args struct which can be inspected with verbose
2208 listing, see the [_listing_probes] section for more details.
2209
It is important to note that for uretprobes to work the kernel runs a
special helper on user-space function entry which overrides the return
address on the stack. This can cause issues with languages that have
their own runtime like Golang:
2214
2215 example.go
2216
2217 func myprint(s string) {
2218 fmt.Printf("Input: %s\n", s)
2219 }
2220
2221 func main() {
2222 ss := []string{"a", "b", "c"}
2223 for _, s := range ss {
2224 go myprint(s)
2225 }
2226 time.Sleep(1*time.Second)
2227 }
2228
2229 bpftrace
2230
2231 # bpftrace -e 'uretprobe:./test:main.myprint { @=count(); }' -c ./test
2232 runtime: unexpected return pc for main.myprint called from 0x7fffffffe000
2233 stack: frame={sp:0xc00008cf60, fp:0xc00008cfd0} stack=[0xc00008c000,0xc00008d000)
2234 fatal error: unknown caller pc
2235
2236 usdt
2237 variants
2238
2239 • usdt:binary:name
2240
2241 shortnames
2242
2243 • U
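
As a minimal sketch of the variant above, assuming a hypothetical
binary ./myapp that defines a USDT probe named request_start (both
names are invented for illustration):

```bpftrace
usdt:./myapp:request_start {
  // ./myapp and request_start are placeholders, not real names
  printf("request_start hit by pid %d\n", pid);
}
```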

watchpoint and asyncwatchpoint
variants

• watchpoint:absolute_address:length:mode

• watchpoint:function+argN:length:mode

shortnames

• w

• aw

These are memory watchpoints provided by the kernel. Whenever a memory
address is written to (w), read from (r), or executed (x), the kernel
can generate an event.

In the first form, an absolute address is monitored. If a pid (-p) or a
command (-c) is provided, bpftrace takes the address as a userspace
address and monitors the appropriate process. If not, bpftrace takes
the address as a kernel space address.

In the second form, the address present in argN when function is
entered is monitored. A pid or command must be provided for this form.
If synchronous (watchpoint), a SIGSTOP is sent to the tracee upon
function entry. The tracee will be SIGCONTed after the watchpoint is
attached. This is to ensure events are not missed. If you want to avoid
the SIGSTOP + SIGCONT use asyncwatchpoint.

Note that on most architectures you may not monitor for execution while
monitoring read or write.
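The mode rule above can be checked before a probe is attached. The
following is a minimal sketch only: valid_watchpoint_mode is a
hypothetical shell helper, not part of bpftrace, and it assumes the
stricter behaviour described for most architectures, where x is not
combined with r or w.

```shell
# valid_watchpoint_mode MODE
# Hypothetical helper (not part of bpftrace): succeeds if MODE is a
# non-empty combination of r, w and x that does not mix x with r or w.
valid_watchpoint_mode() {
    case "$1" in
        ''|*[!rwx]*) return 1 ;;          # empty, or contains an unknown letter
    esac
    case "$1" in
        *x*[rw]*|*[rw]*x*) return 1 ;;    # execute combined with read or write
    esac
    return 0
}

valid_watchpoint_mode rw && echo "rw: ok"
valid_watchpoint_mode rx || echo "rx: rejected"
```

An architecture that does allow execution to be combined with read or
write would not need the second check.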

Examples

Print hit when a read from or write to 0x10000000 happens:

# bpftrace -e 'watchpoint:0x10000000:8:rw { printf("hit!\n"); exit(); }' -c ./testprogs/watchpoint

Print the call stack every time the jiffies variable is updated:

# bpftrace -e "watchpoint:0x$(awk '$3 == "jiffies" {print $1}' /proc/kallsyms):8:w {
    @[kstack] = count();
}

i:s:1 { exit(); }"
......
@[
    do_timer+12
    tick_do_update_jiffies64.part.22+89
    tick_sched_do_timer+103
    tick_sched_timer+39
    __hrtimer_run_queues+256
    hrtimer_interrupt+256
    smp_apic_timer_interrupt+106
    apic_timer_interrupt+15
    cpuidle_enter_state+188
    cpuidle_enter+41
    do_idle+536
    cpu_startup_entry+25
    start_secondary+355
    secondary_startup_64+164
]: 319
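The address lookup in the example above can be factored into a small
shell function. This is a sketch under stated assumptions:
kernel_watchpoint is a hypothetical helper, not part of bpftrace, and
its optional fourth argument exists only so the lookup can be tried
against a saved copy of /proc/kallsyms. It also refuses an all-zero
address, which is what /proc/kallsyms reports when the kernel hides
addresses from the reader (for example, when not running as root).

```shell
# kernel_watchpoint SYMBOL LENGTH MODE [SYMFILE]
# Hypothetical helper (not part of bpftrace): prints a watchpoint probe
# for the kernel symbol SYMBOL, or fails if no usable address is found.
# SYMFILE defaults to /proc/kallsyms.
kernel_watchpoint() {
    # The symbol name is the third column, the address the first.
    addr=$(awk -v sym="$1" '$3 == sym { print $1; exit }' "${4:-/proc/kallsyms}")
    case "$addr" in
        *[!0]*) ;;          # at least one non-zero digit: usable address
        *) return 1 ;;      # symbol not found, or address hidden (all zeros)
    esac
    printf 'watchpoint:0x%s:%s:%s\n' "$addr" "$2" "$3"
}
```

With this in place, the probe in the example could be written as
"$(kernel_watchpoint jiffies 8 w) { @[kstack] = count(); }".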

Print "hit!" and exit when the memory pointed to by arg1 of increment
is written to:

# cat wpfunc.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

__attribute__((noinline))
void increment(__attribute__((unused)) int _, int *i)
{
    (*i)++;
}

int main()
{
    int *i = malloc(sizeof(int));
    while (1)
    {
        increment(0, i);
        (*i)++;
        usleep(1000);
    }
}

# bpftrace -e 'watchpoint:increment+arg1:4:w { printf("hit!\n"); exit() }' -c ./wpfunc

Probe listing is the method to discover which probes are supported by
the current system. Listing supports the same syntax as normal
attachment does:

# bpftrace -l 'kprobe:*'
# bpftrace -l 't:syscalls:*openat*'
# bpftrace -l 'kprobe:tcp*,trace
# bpftrace -l 'k:*socket*,tracepoint:syscalls:*tcp*'

The verbose flag (-v) can be specified to inspect arguments (args) for
providers that support it:

# bpftrace -l 'fr:tcp_reset,t:syscalls:sys_enter_openat' -v
kretfunc:tcp_reset
    struct sock * sk
    struct sk_buff * skb
tracepoint:syscalls:sys_enter_openat
    int __syscall_nr
    int dfd
    const char * filename
    int flags
    umode_t mode
# bpftrace -l 'uprobe:/bin/bash:rl_set_prompt' -v # works only if /bin/bash has DWARF
uprobe:/bin/bash:rl_set_prompt
    const char *prompt


2022-05-02 BPFTRACE(8)