BPFTRACE(8)                                                        BPFTRACE(8)

NAME
bpftrace - a high-level tracing language

SYNOPSIS
bpftrace [OPTIONS] FILENAME
bpftrace [OPTIONS] -e 'program code'

DESCRIPTION
bpftrace is a high-level tracing language and runtime for Linux based
on BPF. It supports static and dynamic tracing for both the kernel and
user space.

When FILENAME is "-", the program is read from stdin.

EXAMPLES
List all probes with "sleep" in their name:

    # bpftrace -l '*sleep*'

Trace processes calling sleep:

    # bpftrace -e 'kprobe:do_nanosleep { printf("%d sleeping\n", pid); }'

Trace processes calling sleep while spawning sleep 5 as a child process:

    # bpftrace -e 'kprobe:do_nanosleep { printf("%d sleeping\n", pid); }' -c 'sleep 5'

SUPPORTED ARCHITECTURES
x86_64, arm64 and s390x

OPTIONS
Output format
-B MODE
    Set the buffer mode for stdout. Valid values are:
        none    No buffering. Each I/O is written as soon as possible.
        line    Data is written on the first newline or when the buffer
                is full. This is the default mode.
        full    Data is written once the buffer is full.

-f FORMAT
    Set the output format. Valid values are:
        json
        text

-o FILENAME
    Write bpftrace tracing output to FILENAME instead of stdout. This
    doesn’t include child process (-c option) output. Errors are still
    written to stderr.

--no-warnings
    Suppress all warning messages created by bpftrace.

Tracing
-e PROGRAM
    Execute PROGRAM instead of reading the program from a file.

-I DIR
    Add the directory DIR to the search path for C headers. This option
    can be used multiple times.

--include FILENAME
    Add FILENAME as an include for the preprocessor. This is equivalent
    to adding '#include FILENAME' to the start of the bpftrace program.
    This option can be used multiple times.

-l [SEARCH]
    List all probes that match the SEARCH pattern. If the pattern is
    omitted, all probes are listed. The pattern supports wildcards in
    the same way that probes do, e.g. '-l kprobe:*file*' lists all
    kprobes with 'file' in the name. For more details see the LISTING
    PROBES section.

--unsafe
    Some calls, like 'system', are marked as unsafe because they can have
    dangerous side effects ('system("rm -rf")') and are disabled by
    default. This flag allows their use.

-k
    Errors from bpf-helpers(7) are silently ignored by default, which
    can lead to strange results. This flag enables the detection of
    errors (except for errors from 'probe_read_*'). When an error
    occurs, bpftrace logs an error containing the source location and
    the error code:

        stdin:48-57: WARNING: Failed to probe_read_user_str: Bad address (-14)
        u:lib.so:"fn(char const*)" { printf("arg0:%s\n", str(arg0));}
                                                         ~~~~~~~~~

-kk
    Same as '-k' but also includes the errors from 'probe_read_*'
    helpers.

Process management
-p PID
    Attach to the process with PID. If the process terminates, bpftrace
    also terminates. When using USDT probes, they are attached only to
    this process.

-c COMMAND
    Run COMMAND as a child process. When the child terminates, bpftrace
    stops as well, as if 'exit()' had been called. If bpftrace
    terminates before the child process does, the child process is
    terminated with a SIGTERM. If used, USDT probes are attached only
    to the child process. To avoid a race condition when using USDTs,
    the child is stopped after 'execve' using 'ptrace(2)' and continued
    when all USDT probes are attached. The child PID is available to
    programs as the 'cpid' builtin. The child process runs with the
    same privileges as bpftrace itself (usually root).

--usdt-file-activation
    Activate USDT semaphores based on file path.

Miscellaneous
--info
    Print detailed information about features supported by the kernel
    and the bpftrace build.

-h, --help
    Print the help summary.

-V, --version
    Print bpftrace version information.

-v
    Enable verbose messages.

-d
    Enable debug mode.

-dd
    Enable verbose debug mode.

ENVIRONMENT
Some behavior can only be controlled through environment variables.
This section lists all those variables.

BPFTRACE_STRLEN
    Default: 64

    Number of bytes allocated on the BPF stack for the string returned
    by str().

    Make this larger if you wish to read bigger strings with str().

    Beware that the BPF stack is small (512 bytes).

    Support for even larger strings is being discussed in
    https://github.com/iovisor/bpftrace/issues/305.

BPFTRACE_NO_CPP_DEMANGLE
    Default: 0

    C++ symbol demangling in user space stack traces is enabled by
    default.

    This feature can be turned off by setting the value of this
    environment variable to 1.

BPFTRACE_MAP_KEYS_MAX
    Default: 4096

    This is the maximum number of keys that can be stored in a map.
    Increasing the value will consume more memory and increase startup
    times. There are some cases where you will want to, for example when
    sampling stack traces or recording timestamps for each page.

BPFTRACE_MAX_PROBES
    Default: 512

    This is the maximum number of probes that bpftrace can attach to.
    Increasing the value will consume more memory, increase startup
    times, and can incur high performance overhead or even freeze or
    crash the system.

BPFTRACE_CACHE_USER_SYMBOLS
    Default: 0 if ASLR is enabled on the system and the -c option is not
    given; otherwise 1

    By default, bpftrace caches the results of symbol resolutions only
    when ASLR (Address Space Layout Randomization) is disabled. This is
    because the symbol addresses change with each execution when ASLR is
    enabled. However, disabling caching may incur some performance
    penalty. Set this environment variable to 1 to force bpftrace to
    cache.

BPFTRACE_VMLINUX
    Default: None

    This specifies the vmlinux path used for kernel symbol resolution
    when attaching a kprobe to an offset. If this value is not given,
    bpftrace searches for vmlinux in predefined locations. See
    src/attached_probe.cpp:find_vmlinux() for details.

BPFTRACE_BTF
    Default: None

    The path to a BTF file. By default, bpftrace searches several
    locations to find a BTF file. See src/btf.cpp for the details.

BPFTRACE_PERF_RB_PAGES
    Default: 64

    Number of pages to allocate per CPU for the perf ring buffer. The
    value must be a power of 2.

    If you’re getting a lot of dropped events, bpftrace may not be
    processing events in the ring buffer fast enough. It may be useful
    to bump the value higher so more events can be queued up. The
    tradeoff is that bpftrace will use more memory.
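
    The memory tradeoff can be estimated with simple arithmetic. The
    Python sketch below is illustrative only. The helper names are
    hypothetical, it assumes 4 KiB pages, and it counts only the data
    pages (the perf mmap interface also uses one metadata page per
    buffer, which is left out here).

```python
# Illustrative helpers, not part of bpftrace: check a BPFTRACE_PERF_RB_PAGES
# value and estimate the data-page memory it implies across all CPUs.

def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; False for 0, negatives and other values."""
    return n > 0 and (n & (n - 1)) == 0


def perf_rb_bytes(pages: int, ncpus: int, page_size: int = 4096) -> int:
    """Bytes of ring-buffer data pages: `pages` pages on each of `ncpus` CPUs.

    page_size defaults to 4096 bytes, which is an assumption (typical on
    x86_64); the extra perf metadata page per buffer is not counted.
    """
    if not is_power_of_two(pages):
        raise ValueError("BPFTRACE_PERF_RB_PAGES must be a power of 2")
    return pages * page_size * ncpus


if __name__ == "__main__":
    # The default of 64 pages on a hypothetical 8-CPU machine with 4 KiB pages.
    print(perf_rb_bytes(64, ncpus=8))    # 2097152 (2 MiB)
    # Quadrupling the setting quadruples the memory.
    print(perf_rb_bytes(256, ncpus=8))   # 8388608 (8 MiB)
```

    Under these assumptions, raising the value from 64 to 256 on an
    8-CPU machine grows the data pages from 2 MiB to 8 MiB in total.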

BPFTRACE_MAX_BPF_PROGS
    Default: 512

    This is the maximum number of BPF programs (functions) that bpftrace
    can generate. The main purpose of this limit is to prevent bpftrace
    from hanging, since generating a lot of probes takes a lot of
    resources (and it should not happen often).

LANGUAGE
Overview
The bpftrace (bt) language is inspired by the D language used by DTrace
and uses the same program structure. Each script consists of a
preamble and one or more action blocks.

    preamble

    actionblock1
    actionblock2

Preprocessor and type definitions take place in the preamble:

    #include <linux/socket.h>
    #define RED "\033[31m"

    struct S {
        int x;
    }

Each action block consists of three parts:

    probe[,probe]
    /predicate/ {
        action
    }

Probes
    A probe specifies the event and event type to attach to.

Predicate
    The predicate is an optional condition that must be met for the
    action to be executed.

Action
    Actions are the programs that run when an event fires (and the
    predicate is met). An action is a semicolon (;) separated list of
    statements and is always enclosed by braces {}.

A basic script that traces the open(2) and openat(2) system calls can
be written as follows:

    BEGIN
    {
        printf("Tracing open syscalls... Hit Ctrl-C to end.\n");
    }

    tracepoint:syscalls:sys_enter_open,
    tracepoint:syscalls:sys_enter_openat
    {
        printf("%-6d %-16s %s\n", pid, comm, str(args->filename));
    }

This script has two action blocks and a total of 3 probes. The first
action block uses the special BEGIN probe, which fires once during
bpftrace startup. This probe is used to print a header, indicating that
the tracing has started.

The second action block uses two probes, one for open and one for
openat, and defines an action that prints the file being opened as
well as the pid and comm of the process that executes the syscall. See
the PROBES section for details on the available probe types.

Identifiers
Identifiers must match the following regular expression:

    [_a-zA-Z][_a-zA-Z0-9]*
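
To check a candidate name against this rule, the expression can be used
directly with Python's re module. This is an illustrative sketch only.
is_valid_identifier is a hypothetical helper, and names are checked
without any $ or @ prefix.

```python
import re

# The identifier rule quoted above as a Python regular expression.
# fullmatch() anchors the pattern at both ends, so the whole name must match.
IDENTIFIER = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")


def is_valid_identifier(name: str) -> bool:
    return IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["myvar", "_tmp", "x86_64", "9lives", "my-var", ""]:
        print(f"{name!r:10} {is_valid_identifier(name)}")
```

Names that start with a digit or contain characters such as '-' are
rejected. Underscores and digits are allowed after the first character.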

Comments
Both single-line and multi-line comments are supported.

    // A single line comment
    i:s:1 { // can also be used to comment inline
        /*
          a multi line comment

        */
        print(/* inline comment block */ 1);
    }

Data Types
The following fundamental integer types are provided by the language.

┌────────┬─────────────────────────┐
│ Type   │ Description             │
├────────┼─────────────────────────┤
│ uint8  │ Unsigned 8-bit integer  │
├────────┼─────────────────────────┤
│ int8   │ Signed 8-bit integer    │
├────────┼─────────────────────────┤
│ uint16 │ Unsigned 16-bit integer │
├────────┼─────────────────────────┤
│ int16  │ Signed 16-bit integer   │
├────────┼─────────────────────────┤
│ uint32 │ Unsigned 32-bit integer │
├────────┼─────────────────────────┤
│ int32  │ Signed 32-bit integer   │
├────────┼─────────────────────────┤
│ uint64 │ Unsigned 64-bit integer │
├────────┼─────────────────────────┤
│ int64  │ Signed 64-bit integer   │
└────────┴─────────────────────────┘

Floating-point
Floating-point numbers are not supported by BPF and therefore not by
bpftrace.

Constants
Integer constants can be defined in the following formats:

• decimal (base 10)

• octal (base 8)

• hexadecimal (base 16)

• scientific (base 10)

Octal constants have to be prefixed with a 0, e.g. 0123. Hexadecimal
constants start with either 0x or 0X, e.g. 0x10. Scientific constants
are written in the <m>e<n> format, which is shorthand for m*10^n, e.g.
$i = 2e3;. Note that scientific literals are integer-only due to the
lack of floating-point support; 1e-3 is not valid.

To improve the readability of big literals, an underscore _ can be used
as a digit separator, e.g. 1_000_123_000.

Integer suffixes as found in the C language are parsed by bpftrace to
ensure compatibility with C headers/definitions, but they’re not used
as size specifiers. 123UL, 123U and 123LL all result in the same
integer type with a value of 123.
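
The literal formats above can be summarized in a small Python model.
parse_int_literal is a hypothetical name, and bpftrace's own lexer is
not written this way. The sketch covers only the formats listed in this
section and does not try to reproduce bpftrace's error messages.

```python
# Hypothetical model of the integer literal rules described above; an
# illustration of the documented formats, not bpftrace's lexer.

def parse_int_literal(text: str) -> int:
    s = text.replace("_", "")          # '_' is only a digit separator
    s = s.rstrip("uUlL")               # C suffixes are accepted but ignored
    if s.lower().startswith("0x"):     # hexadecimal: 0x10 or 0X10
        return int(s[2:], 16)
    if "e" in s.lower():               # scientific: <m>e<n> means m * 10^n
        m, n = s.lower().split("e")
        if int(n) < 0:
            raise ValueError("negative exponents (e.g. 1e-3) are not valid")
        return int(m) * 10 ** int(n)
    if len(s) > 1 and s.startswith("0"):  # octal: leading 0, e.g. 0123
        return int(s[1:], 8)
    return int(s)                      # plain decimal


if __name__ == "__main__":
    for lit in ["0123", "0x10", "2e3", "1_000_123_000", "123UL"]:
        print(lit, parse_int_literal(lit))
```

Note that the suffix is stripped before the value is computed, which
mirrors the statement that suffixes are not used as size specifiers.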

Character constants can be defined by enclosing the character in single
quotes, e.g. $c = 'c';.

String constants can be defined by enclosing the character string in
double quotes, e.g. $str = "Hello world";.

Characters and strings support the following escape sequences:

┌──────┬──────────────────────┐
│ \n   │ Newline              │
├──────┼──────────────────────┤
│ \t   │ Tab                  │
├──────┼──────────────────────┤
│ \0nn │ Octal value nn       │
├──────┼──────────────────────┤
│ \xnn │ Hexadecimal value nn │
└──────┴──────────────────────┘

Type conversion
Integer and pointer types can be converted using explicit type
conversion with an expression like:

    $y = (uint32) $z;
    $py = (int16 *) $pz;

Integer casts to a higher rank are sign extended. Conversion to a lower
rank is done by zeroing leading bits, i.e. only the low-order bits that
fit into the target type are kept.
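
The effect of these rules on concrete values can be modeled in Python.
This is a sketch that assumes two's-complement integers. cast() is a
hypothetical helper, not part of bpftrace. The model sign-extends
signed source values, as C does. The text above does not say how
unsigned sources are widened, so verify that case against your bpftrace
version.

```python
# A Python model of the cast rules described above, assuming two's-complement
# integers. cast() is a hypothetical helper for illustration only.

def cast(value: int, bits: int, signed: bool) -> int:
    """Convert `value` to an integer type that is `bits` wide."""
    value &= (1 << bits) - 1            # narrowing: keep only the low-order bits
    if signed and value >> (bits - 1):  # top bit set: negative in two's complement
        value -= 1 << bits
    return value


if __name__ == "__main__":
    # Narrowing: 0x12345678 cast to uint16 keeps only the low 16 bits.
    print(hex(cast(0x12345678, 16, signed=False)))              # 0x5678
    # Widening the int8 value -1 sign-extends it: all 64 bits end up set.
    print(hex(cast(cast(-1, 8, signed=True), 64, signed=False)))  # 0xffffffffffffffff
    # Narrowing to a signed type can change the sign: 200 as int8 is -56.
    print(cast(200, 8, signed=True))                            # -56
```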

Operators and Expressions
Arithmetic Operators
The following operators are available for integer arithmetic:

┌───┬────────────────────────┐
│ + │ integer addition       │
├───┼────────────────────────┤
│ - │ integer subtraction    │
├───┼────────────────────────┤
│ * │ integer multiplication │
├───┼────────────────────────┤
│ / │ integer division       │
├───┼────────────────────────┤
│ % │ integer modulo         │
└───┴────────────────────────┘

Logical Operators
┌────┬─────────────┐
│ && │ Logical AND │
├────┼─────────────┤
│ || │ Logical OR  │
├────┼─────────────┤
│ !  │ Logical NOT │
└────┴─────────────┘

Bitwise Operators
┌────┬──────────────────────────────────────┐
│ &  │ AND                                  │
├────┼──────────────────────────────────────┤
│ |  │ OR                                   │
├────┼──────────────────────────────────────┤
│ ^  │ XOR                                  │
├────┼──────────────────────────────────────┤
│ << │ Left shift the left-hand operand by  │
│    │ the number of bits specified by the  │
│    │ right-hand expression value          │
├────┼──────────────────────────────────────┤
│ >> │ Right shift the left-hand operand by │
│    │ the number of bits specified by the  │
│    │ right-hand expression value          │
└────┴──────────────────────────────────────┘

Relational Operators
The following relational operators are defined for integers and
pointers.

┌────┬──────────────────────────────────────┐
│ <  │ left-hand expression is less than    │
│    │ right-hand                           │
├────┼──────────────────────────────────────┤
│ <= │ left-hand expression is less than or │
│    │ equal to right-hand                  │
├────┼──────────────────────────────────────┤
│ >  │ left-hand expression is greater than │
│    │ right-hand                           │
├────┼──────────────────────────────────────┤
│ >= │ left-hand expression is greater than │
│    │ or equal to right-hand               │
├────┼──────────────────────────────────────┤
│ == │ left-hand expression is equal to     │
│    │ right-hand                           │
├────┼──────────────────────────────────────┤
│ != │ left-hand expression is not equal to │
│    │ right-hand                           │
└────┴──────────────────────────────────────┘

The following relational operators are available for comparing strings.

┌────┬──────────────────────────────────────┐
│ == │ left-hand string is equal to         │
│    │ right-hand                           │
├────┼──────────────────────────────────────┤
│ != │ left-hand string is not equal to     │
│    │ right-hand                           │
└────┴──────────────────────────────────────┘

Assignment Operators
The following assignment operators can be used on both map and scratch
variables:

┌─────┬──────────────────────────────────────┐
│ =   │ Assign the right-hand expression to  │
│     │ the left-hand variable               │
├─────┼──────────────────────────────────────┤
│ <<= │ Update the variable with its value   │
│     │ left shifted by the number of bits   │
│     │ specified by the right-hand          │
│     │ expression value                     │
├─────┼──────────────────────────────────────┤
│ >>= │ Update the variable with its value   │
│     │ right shifted by the number of bits  │
│     │ specified by the right-hand          │
│     │ expression value                     │
├─────┼──────────────────────────────────────┤
│ +=  │ Increment the variable by the        │
│     │ right-hand expression value          │
├─────┼──────────────────────────────────────┤
│ -=  │ Decrement the variable by the        │
│     │ right-hand expression value          │
├─────┼──────────────────────────────────────┤
│ *=  │ Multiply the variable by the         │
│     │ right-hand expression value          │
├─────┼──────────────────────────────────────┤
│ /=  │ Divide the variable by the           │
│     │ right-hand expression value          │
├─────┼──────────────────────────────────────┤
│ %=  │ Modulo the variable by the           │
│     │ right-hand expression value          │
├─────┼──────────────────────────────────────┤
│ &=  │ Bitwise AND the variable with the    │
│     │ right-hand expression value          │
├─────┼──────────────────────────────────────┤
│ |=  │ Bitwise OR the variable with the     │
│     │ right-hand expression value          │
├─────┼──────────────────────────────────────┤
│ ^=  │ Bitwise XOR the variable with the    │
│     │ right-hand expression value          │
└─────┴──────────────────────────────────────┘

All these operators are syntactic sugar for combining assignment with
the specified operator: @ -= 5 is equivalent to @ = @ - 5.

Increment and Decrement Operators
The increment (++) and decrement (--) operators can be used on integer
and pointer variables to increment or decrement their value by one.
They can only be used on variables and can be applied either as a
prefix or as a suffix. The difference is that the suffix form x++
returns the original value of x, before it is incremented, while the
prefix form ++x returns the value of x after the increment. E.g.

    $x = 10;
    $y = $x--; // y = 10; x = 9
    $a = 10;
    $b = --$a; // a = 9; b = 9

Note that maps are implicitly declared and initialized to 0 if not
already declared or defined. Scratch variables must be initialized
before using these operators.

Variables and Maps
bpftrace knows two types of variables, scratch and map.

'scratch' variables are kept on the BPF stack and only exist during
the execution of the action block; they cannot be accessed outside of
the program. Scratch variable names always start with a $, e.g. $myvar.

'map' variables use BPF 'maps'. These exist for the lifetime of
bpftrace itself and can be accessed from all action blocks and from
user space. Map names always start with a @, e.g. @mymap.

All valid identifiers can be used as names.

The data type of a variable is automatically determined during the
first assignment and cannot be changed afterwards.

Associative Arrays
Associative arrays are a collection of elements indexed by a key,
similar to the hash tables found in languages like C++ (std::map) and
Python (dict). They’re a variant of 'map' variables.

    @name[key] = expression
    @name[key1,key2] = expression

Just like with any variable, the type is determined on first use and
cannot be modified afterwards. This applies to both the key(s) and the
value type.

The following snippet creates a map with key signature [int64,
string[16]] and a value type of int64:

    @[pid, comm]++

Variable scoping
Pointers
Pointers in bpftrace are similar to those found in C.

Tuples
bpftrace has support for immutable N-tuples (n > 1). A tuple is a
sequence type (like an array) where, unlike an array, every element can
have a different type.

Tuples are a comma-separated list of expressions enclosed in
parentheses, e.g. (1,2). Individual fields can be accessed with the .
operator. Tuples are zero-indexed, like arrays.

    i:s:1 {
        $a = (1,2);
        $b = (3,4, $a);
        print($a);
        print($b);
        print($b.0);
    }

Prints:

    (1, 2)
    (3, 4, (1, 2))
    3

Arrays
bpftrace supports accessing one-dimensional arrays like those found in
C.

Constructing arrays from scratch, like int a[] = {1,2,3} in C, is not
supported. They can only be read into a variable from a pointer.

The [] operator is used to access elements.

    struct MyStruct {
        int y[4];
    }

    kprobe:dummy {
        $s = (struct MyStruct *) arg0;
        print($s->y[0]);
    }

Structs
C-like structs are supported by bpftrace. Fields are accessed with the
. operator. Fields of a pointer to a struct can be accessed with the ->
operator.

Custom structs can be defined in the preamble.

Constructing structs from scratch, like struct X var = {.f1 = 1} in C,
is not supported. They can only be read into a variable from a pointer.

    struct MyStruct {
        int a;
    }

    kprobe:dummy {
        $ptr = (struct MyStruct *) arg0;
        $st = *$ptr;
        print($st.a);
        print($ptr->a);
    }

Conditionals
Conditional expressions are supported in the form of if/else statements
and the ternary operator.

The ternary operator consists of three operands: a condition followed
by a ?, the expression to execute when the condition is true followed
by a :, and the expression to execute if the condition is false.

    condition ? ifTrue : ifFalse

Both the ifTrue and ifFalse expressions must be of the same type;
mixing types is not allowed.

The ternary operator can be used as part of an assignment.

    $a == 1 ? print("true") : print("false");
    $b = $a > 0 ? $a : -1;

If/else statements, like those in C, are supported.

    if (condition) {
        ifblock
    } else if (condition) {
        if2block
    } else {
        elseblock
    }

Loops
Since kernel 5.3, BPF supports loops as long as the verifier can prove
they’re bounded and fit within the instruction limit.

In bpftrace, loops are available through the while statement.

    while (condition) {
        block;
    }

Within a while loop, the following control flow statements can be used:

┌──────────┬──────────────────────────────────────┐
│ continue │ Skip processing of the rest of the   │
│          │ block and jump back to the           │
│          │ evaluation of the condition          │
├──────────┼──────────────────────────────────────┤
│ break    │ Terminate the loop                   │
└──────────┴──────────────────────────────────────┘

    i:s:1 {
        $i = 0;
        while ($i <= 100) {
            printf("%d ", $i);
            if ($i > 5) {
                break;
            }
            $i++;
        }
        printf("\n");
    }

Loop unrolling is also supported with the unroll statement.

    unroll(n) {
        block;
    }

The compiler will evaluate the block n times and generate the BPF code
for the block n times. As this happens at compile time, n must be a
constant greater than 0 (n > 0).

The following two probes compile into the same code:

    i:s:1 {
        unroll(3) {
            print("Unrolled");
        }
    }

    i:s:1 {
        print("Unrolled");
        print("Unrolled");
        print("Unrolled");
    }

SYNC AND ASYNC
While BPF in the kernel can do a lot, there are still things that can
only be done from user space, like outputting (printing) data. The way
bpftrace handles this is by sending events from the BPF program, which
user space will pick up some time in the future (usually within
milliseconds). Operations that happen in the kernel are 'synchronous'
('sync') and those that are handled in user space are 'asynchronous'
('async').

The async behavior can lead to some unexpected results, as updates can
happen before user space has had time to process the event. One example
is updating a map value in a tight loop:

    BEGIN {
        @ = 0;
        unroll(10) {
            print(@);
            @++;
        }
        exit();
    }

Maps are printed by reference, not by value, and as the value gets
updated right after the print, user space will likely only see the
final value once it processes the event:

    @: 10
    @: 10
    @: 10
    @: 10
    @: 10
    @: 10
    @: 10
    @: 10
    @: 10
    @: 10
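
The reference semantics can be illustrated with a toy Python model. The
map dictionary and event queue are hypothetical stand-ins; bpftrace
really uses a perf ring buffer. Unlike the model, real user space could
occasionally handle an event early, which is why the text above says
"likely".

```python
# A toy model (not bpftrace code) of why the loop above prints "@: 10" ten
# times: print(@) only queues an event that refers to the map, and user
# space reads the map's current value when it later handles the event.
from collections import deque

maps = {"@": 0}
events = deque()  # hypothetical stand-in for the kernel-to-user-space buffer

# "Kernel side": the unrolled loop runs to completion before user space runs.
for _ in range(10):
    events.append("@")    # print(@): queue a reference to the map, not its value
    maps["@"] += 1        # @++ takes effect immediately

# "User space side": events are drained later and the map is read only now.
output = []
while events:
    name = events.popleft()
    output.append(f"{name}: {maps[name]}")

print("\n".join(output))  # ten lines of "@: 10"
```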

ADDRESS SPACES
Kernel and user pointers live in different address spaces which,
depending on the CPU architecture, might overlap. Trying to read a
pointer that is in the wrong address space results in a runtime error.
This error is hidden by default but can be enabled with the -kk flag:

    stdin:1:9-12: WARNING: Failed to probe_read_user: Bad address (-14)
    BEGIN { @=*uptr(kaddr("do_poweroff")) }
            ~~~

bpftrace tries to automatically set the correct address space for a
pointer based on the probe type, but might fail in cases where it is
unclear. The address space can be changed with the kptr() and uptr()
functions.

BUILTINS
Builtins are special variables built into the language. Unlike scratch
and map variables, they don’t need a $ or @ prefix (except for the
positional parameters).

┌───────────────┬────────────┬─────────────┬───────────────────────┬──────────────────────────────────────────┐
│ Variable      │ Type       │ Kernel      │ BPF Helper            │ Description                              │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ $1, $2, ...$n │ int64      │ n/a         │ n/a                   │ The nth positional parameter passed to   │
│               │            │             │                       │ the bpftrace program. If fewer than n    │
│               │            │             │                       │ parameters are passed, this evaluates to │
│               │            │             │                       │ 0. For string arguments, use the str()   │
│               │            │             │                       │ call to retrieve the value.              │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ $#            │ int64      │ n/a         │ n/a                   │ Total number of positional parameters    │
│               │            │             │                       │ passed.                                  │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ arg0, arg1,   │ int64      │ n/a         │ n/a                   │ nth argument passed to the function      │
│ ...argn       │            │             │                       │ being traced. These are extracted from   │
│               │            │             │                       │ the CPU registers. The number of args    │
│               │            │             │                       │ passed in registers depends on the CPU   │
│               │            │             │                       │ architecture. (kprobes, uprobes, usdt).  │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ cgroup        │ uint64     │ 4.18        │ get_current_cgroup_id │ ID of the cgroup the current task is in. │
│               │            │             │                       │ Only works with cgroupv2.                │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ comm          │ string[16] │ 4.2         │ get_current_comm      │ comm of the current task. Equal to the   │
│               │            │             │                       │ value in /proc/<pid>/comm                │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ cpid          │ uint32     │ n/a         │ n/a                   │ PID of the child process                 │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ numaid        │ uint32     │ 5.8         │ numa_node_id          │ ID of the NUMA node executing the BPF    │
│               │            │             │                       │ program                                  │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ cpu           │ uint32     │ 4.1         │ raw_smp_processor_id  │ ID of the processor executing the BPF    │
│               │            │             │                       │ program                                  │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ curtask       │ uint64     │ 4.8         │ get_current_task      │ Pointer to struct task_struct of the     │
│               │            │             │                       │ current task                             │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ elapsed       │ uint64     │ (see nsecs) │ ktime_get_ns /        │ Nanoseconds elapsed since bpftrace       │
│               │            │             │ ktime_get_boot_ns     │ initialization, based on nsecs           │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ func          │ string     │ n/a         │ n/a                   │ Name of the current function being       │
│               │            │             │                       │ traced (kprobes, uprobes)                │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ gid           │ uint64     │ 4.2         │ get_current_uid_gid   │ GID of the current task                  │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ kstack        │ kstack     │             │ get_stackid           │ Kernel stack trace                       │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ nsecs         │ uint64     │ 4.1 / 5.7   │ ktime_get_ns /        │ Nanoseconds since kernel boot. On        │
│               │            │             │ ktime_get_boot_ns     │ kernels that support ktime_get_boot_ns,  │
│               │            │             │                       │ this includes the time spent suspended;  │
│               │            │             │                       │ on older kernels it does not.            │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ pid           │ uint64     │ 4.2         │ get_current_pid_tgid  │ Process ID (or thread group ID) of the   │
│               │            │             │                       │ current task.                            │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ probe         │ string     │ n/a         │ n/a                   │ Name of the current probe                │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ rand          │ uint32     │ 4.1         │ get_prandom_u32       │ Random number                            │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ retval        │ int64      │ n/a         │ n/a                   │ Value returned by the function being     │
│               │            │             │                       │ traced (kretprobe, uretprobe,            │
│               │            │             │                       │ kretfunc)                                │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ sarg0, sarg1, │ int64      │ n/a         │ n/a                   │ nth stack value of the function being    │
│ ...sargn      │            │             │                       │ traced (kprobes, uprobes).               │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ tid           │ uint64     │ 4.2         │ get_current_pid_tgid  │ Thread ID of the current task.           │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ uid           │ uint64     │ 4.2         │ get_current_uid_gid   │ UID of the current task                  │
├───────────────┼────────────┼─────────────┼───────────────────────┼──────────────────────────────────────────┤
│ ustack        │ ustack     │ 4.6         │ get_stackid           │ User space stack trace                   │
└───────────────┴────────────┴─────────────┴───────────────────────┴──────────────────────────────────────────┘
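
Several builtins share a helper. The kernel's get_current_pid_tgid
helper returns the thread group ID in the upper 32 bits and the thread
ID in the lower 32 bits. get_current_uid_gid packs the GID above the UID
in the same way. The Python sketch below shows how pid/tid and uid/gid
can be derived from one such 64-bit value. The helper names are
hypothetical and the sample numbers are made up.

```python
# Sketch of how the pid/tid and uid/gid builtins relate to the 64-bit values
# returned by the get_current_pid_tgid and get_current_uid_gid helpers: the
# upper 32 bits hold the TGID (or GID), the lower 32 bits the thread ID (or UID).

def split_u64(value: int):
    """Split a u64 into (upper 32 bits, lower 32 bits)."""
    return value >> 32, value & 0xFFFFFFFF


def pid_tid(pid_tgid: int):
    tgid, thread_id = split_u64(pid_tgid)
    return {"pid": tgid, "tid": thread_id}   # the pid builtin is the TGID


def uid_gid(uid_gid_value: int):
    gid, uid = split_u64(uid_gid_value)
    return {"uid": uid, "gid": gid}


if __name__ == "__main__":
    raw = (4242 << 32) | 4250      # made-up value: thread 4250 in process 4242
    print(pid_tid(raw))            # {'pid': 4242, 'tid': 4250}
    raw = (100 << 32) | 1000       # made-up value: gid 100, uid 1000
    print(uid_gid(raw))            # {'uid': 1000, 'gid': 100}
```

For a single-threaded process, pid and tid are equal. They differ for
every additional thread of a multi-threaded process.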

MAP FUNCTIONS
Map functions are built-in functions whose return value can only be
assigned to maps. The data types associated with these functions are
only for internal use and are not compatible with the (integer)
operators.

Functions that are marked async are asynchronous, which can lead to
unexpected behavior; see the SYNC AND ASYNC section for more
information.

avg
variants

• avg(int64 n)

Calculate the running average of n between consecutive calls.

    i:s:1 {
        @x++;
        @y = avg(@x);
        print(@x);
        print(@y);
    }

Internally this keeps two values in the map: the value count and the
running total. The average is computed in user space when printing by
dividing the total by the count.
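
The bookkeeping described above can be modeled in a few lines of
Python. AvgValue is a hypothetical name. The sketch assumes the
division is integer division truncating toward zero, as in C, since
bpftrace has no floating-point support.

```python
# Hypothetical model of the avg() bookkeeping described above: the map value
# holds a count and a running total, and the average is computed only when
# the value is printed.

class AvgValue:
    def __init__(self) -> None:
        self.count = 0
        self.total = 0

    def add(self, n: int) -> None:   # what each @y = avg(n) call records
        self.count += 1
        self.total += n

    def result(self) -> int:         # computed in user space at print time
        if self.count == 0:
            return 0
        # Assumed C-style integer division, truncating toward zero.
        q = abs(self.total) // self.count
        return q if self.total >= 0 else -q


if __name__ == "__main__":
    y = AvgValue()
    for x in range(1, 5):            # @x takes the values 1, 2, 3, 4
        y.add(x)
        print(x, y.result())         # prints "1 1", "2 1", "3 2", "4 2"
```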

clear
variants

• clear(map m)

async

Clear all keys/values from map m.

    i:ms:100 {
        @[rand % 10] = count();
    }

    i:s:10 {
        print(@);
        clear(@);
    }

count
variants

• count()

Count how often this function is called.

Using @ = count() is conceptually similar to @++. The difference is
that the count() function uses a map type optimized for this (PER_CPU),
increasing performance. Due to this, the map cannot be accessed as a
regular integer.

    i:ms:100 {
        @ = count();
    }

    i:s:10 {
        print(@);
        clear(@);
    }
1049
1050 delete
1051 variants
1052
1053 • delete(mapkey k)
1054
Delete a single key from a map. For a single-value map this deletes the
only element. For an associative array, the key to delete has to be
specified.
1058
1059 k:dummy {
1060 @scalar = 1;
1061 @associative[1,2] = 1;
1062 delete(@scalar);
1063 delete(@associative[1,2]);
1064
1065 delete(@associative); // error
1066 }
1067
1068 hist
1069 variants
1070
1071 • hist(int64 n)
1072
1073 Create a log2 histogram of n.
1074
1075 kretprobe:vfs_read {
1076 @bytes = hist(retval);
1077 }
1078
1079 Results in:
1080
@bytes:
1082 [1M, 2M) 3 | |
1083 [2M, 4M) 2 | |
1084 [4M, 8M) 2 | |
1085 [8M, 16M) 6 | |
1086 [16M, 32M) 16 | |
1087 [32M, 64M) 27 | |
1088 [64M, 128M) 48 |@ |
1089 [128M, 256M) 98 |@@@ |
1090 [256M, 512M) 191 |@@@@@@ |
1091 [512M, 1G) 394 |@@@@@@@@@@@@@ |
1092 [1G, 2G) 820 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1093
1094 lhist
1095 variants
1096
1097 • lhist(int64 n, int64 min, int64 max, int64 step)
1098
Create a linear histogram of n. lhist creates M = (max - min) / step
buckets in the range [min, max), where each bucket is step in size.
Values in the ranges (-inf, min) and [max, inf) get their own bucket
too, bringing the total number of buckets to M + 2.
1103
1104 i:ms:1 {
1105 @ = lhist(rand %10, 0, 10, 1);
1106 }
1107
1108 i:s:5 {
1109 exit();
1110 }
1111
1112 Prints:
1113
1114 @:
1115 [0, 1) 306 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1116 [1, 2) 284 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1117 [2, 3) 294 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1118 [3, 4) 318 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1119 [4, 5) 311 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1120 [5, 6) 362 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
1121 [6, 7) 336 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1122 [7, 8) 326 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1123 [8, 9) 328 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1124 [9, 10) 318 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
1125
1126 max
1127 variants
1128
1129 • max(int64 n)
1130
1131 Update the map with n if n is bigger than the current value held.
1132
1133 min
1134 variants
1135
1136 • min(int64 n)
1137
1138 Update the map with n if n is smaller than the current value held.
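
The following is a minimal sketch that uses both max and min. It assumes
vfs_read can be probed with a kretprobe on the running kernel. vfs_read
returns a negative errno on failure, so any errors appear in @min_ret:

    // assumption: vfs_read is available as a kretprobe target
    kretprobe:vfs_read {
        @max_ret = max(retval);
        @min_ret = min(retval);
    }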
1139
1140 stats
1141 variants
1142
1143 • stats(int64 n)
1144
1145 stats combines the count, avg and sum calls into one.
1146
1147 kprobe:vfs_read {
1148 @bytes[comm] = stats(arg2);
1149 }
1150
1151 @bytes[bash]: count 7, average 1, total 7
1152 @bytes[sleep]: count 5, average 832, total 4160
1153 @bytes[ls]: count 7, average 886, total 6208
1154 @
1155
1156 sum
1157 variants
1158
1159 • sum(int64 n)
1160
1161 Calculate the sum of all n passed.
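
For example, the following sketch totals the bytes successfully read per
process name. It again assumes vfs_read is available as a kretprobe
target. The filter skips negative (error) return values:

    // assumption: vfs_read is available as a kretprobe target
    kretprobe:vfs_read
    /retval > 0/
    {
        @bytes_read[comm] = sum(retval);
    }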
1162
1163 zero
1164 variants
1165
1166 • zero(map m)
1167
1168 async
1169
1170 Set all values for all keys to zero.
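
The following sketch resets per-second counters with zero. It assumes
vfs_read can be probed with a kprobe. Unlike clear, zero keeps the keys
in the map and only resets their values:

    // assumption: vfs_read is available as a kprobe target
    kprobe:vfs_read {
        @reads[comm] = count();
    }

    i:s:1 {
        print(@reads);
        zero(@reads);   // keys stay, counts go back to 0
    }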
1171
Functions that are marked async are asynchronous, which can lead to
unexpected behavior; see the SYNC AND ASYNC section for more
information.
1176
Functions marked compile time are evaluated at compile time; the
resulting static value is compiled into the program.

Functions marked unsafe can have dangerous side effects and should be
used with care; the --unsafe flag is required to use them.
1182
1183 bswap
1184 variants
1185
1186 • uint8 bswap(uint8 n)
1187
1188 • uint16 bswap(uint16 n)
1189
1190 • uint32 bswap(uint32 n)
1191
1192 • uint64 bswap(uint64 n)
1193
bswap reverses the order of the bytes in integer n. For 8-bit
integers, n is returned unmodified. The return type is an unsigned
integer of the same width as n.
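
The following is a minimal sketch. The cast selects the uint32 variant
of bswap, so the four bytes 11 22 33 44 are reversed and the program
should print 44332211:

    BEGIN {
        // the (uint32) cast picks the 32-bit variant of bswap
        printf("%x\n", bswap((uint32)0x11223344));
        exit();
    }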
1197
1198 buf
1199 variants
1200
1201 • buf_t buf(void * data, [int64 length])
1202
buf reads length bytes from address data. The maximum value of length
is limited by the BPFTRACE_STRLEN variable. For arrays the length is
optional; it is automatically inferred from the signature.
1206
1207 buf is address space aware and will call the correct helper based on
1208 the address space associated with data.
1209
1210 The buf_t object returned by buf can safely be printed as a hex encoded
1211 string with the %r format specifier.
1212
1213 Bytes with values >=32 and <=126 are printed using their ASCII
1214 character, other bytes are printed in hex form (e.g. \x00). The %rx
1215 format specifier can be used to print everything in hex form, including
1216 ASCII characters.
1217
1218 i:s:1 {
1219 printf("%r\n", buf(kaddr("avenrun"), 8));
1220 }
1221
1222 \x00\x03\x00\x00\x00\x00\x00\x00
1223 \xc2\x02\x00\x00\x00\x00\x00\x00
1224
1225 cat
1226 variants
1227
1228 • void cat(string namefmt, [...args])
1229
1230 async
1231
Dump the contents of the named file to stdout. cat supports the same
format string and arguments that printf does. If the file cannot be
opened or read, an error is printed to stderr.
1235
1236 t:syscalls:sys_enter_execve {
1237 cat("/proc/%d/maps", pid);
1238 }
1239
1240 55f683ebd000-55f683ec1000 r--p 00000000 08:01 1843399 /usr/bin/ls
1241 55f683ec1000-55f683ed6000 r-xp 00004000 08:01 1843399 /usr/bin/ls
1242 55f683ed6000-55f683edf000 r--p 00019000 08:01 1843399 /usr/bin/ls
1243 55f683edf000-55f683ee2000 rw-p 00021000 08:01 1843399 /usr/bin/ls
1244 55f683ee2000-55f683ee3000 rw-p 00000000 00:00 0
1245
1246 cgroup_path
1247 variants
1248
• cgroup_path cgroup_path(int cgroupid, [string filter])
1250
1251 Convert cgroup id to cgroup path. This is done asynchronously in
1252 userspace when the cgroup_path value is printed, therefore it can
1253 resolve to a different value if the cgroup id gets reassigned. This
1254 also means that the returned value can only be used for printing.
1255
1256 A string literal may be passed as an optional second argument to filter
1257 cgroup hierarchies in which the cgroup id is looked up by a wildcard
1258 expression (cgroup2 is always represented by "unified", regardless of
1259 where it is mounted).
1260
1261 The currently mounted hierarchy at /sys/fs/cgroup is used to do the
1262 lookup. If the cgroup with the given id isn’t present here (e.g. when
1263 running in a Docker container), the cgroup path won’t be found (unlike
1264 when looking up the cgroup path of a process via /proc/.../cgroup).
1265
1266 BEGIN {
1267 $cgroup_path = cgroup_path(3436);
1268 print($cgroup_path);
1269 print($cgroup_path); /* This may print a different path */
1270 printf("%s %s", $cgroup_path, $cgroup_path); /* This may print two different paths */
1271 }
1272
1273 cgroupid
1274 variants
1275
1276 • uint64 cgroupid(const string path)
1277
1278 compile time
1279
1280 cgroupid retrieves the cgroupv2 ID of the cgroup available at path.
1281
1282 BEGIN {
1283 print(cgroupid("/sys/fs/cgroup/system.slice"));
1284 }
1285
1286 exit
1287 variants
1288
1289 • void exit()
1290
1291 async
1292
1293 Terminate bpftrace, as if a SIGTERM was received. The END probe will
1294 still trigger (if specified) and maps will be printed.
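
The following is a minimal sketch that stops tracing after five seconds.
It assumes vfs_read can be probed with a kprobe. The END probe still
runs, and the @reads map is printed on exit:

    // assumption: vfs_read is available as a kprobe target
    kprobe:vfs_read {
        @reads = count();
    }

    i:s:5 {
        exit();
    }

    END {
        printf("tracing stopped\n");
    }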
1295
1296 join
1297 variants
1298
1299 • void join(char *arr[], [char * sep = ' '])
1300
1301 async
1302
join concatenates all strings in the array arr, separated by sep, into
one string. This string is printed directly to stdout; it cannot be
used as a string value.
1306
1307 The concatenation of the array members is done in BPF and the printing
1308 happens in userspace.
1309
1310 tracepoint:syscalls:sys_enter_execve {
1311 join(args->argv);
1312 }
1313
1314 kaddr
1315 variants
1316
1317 • uint64 kaddr(const string name)
1318
1319 compile time
1320
1321 Get the address of the kernel symbol name.
1322
1323 The following script:
1324
1325 kptr
1326 variants
1327
1328 • T * kptr(T * ptr)
1329
1330 Marks ptr as a kernel address space pointer. See the address-spaces
1331 section for more information on address-spaces. The pointer type is
1332 left unchanged.
1333
1334 ksym
1335 variants
1336
1337 • ksym_t ksym(uint64 addr)
1338
1339 async
1340
1341 Retrieve the name of the function that contains address addr. The
1342 address to name mapping happens in user-space.
1343
1344 The ksym_t type can be printed with the %s format specifier.
1345
1346 kprobe:do_nanosleep
1347 {
1348 printf("%s\n", ksym(reg("ip")));
1349 }
1350
1351 Prints:
1352
1353 do_nanosleep
1354
1355 macaddr
1356 variants
1357
1358 • macaddr_t macaddr(char [6] mac)
1359
Create a buffer that holds a MAC address as read from mac. This buffer
can be printed in the canonical string format using the %s format
specifier.
1363
1364 kprobe:arp_create {
1365 printf("SRC %s, DST %s\n", macaddr(sarg0), macaddr(sarg1));
1366 }
1367
1368 Prints:
1369
1370 SRC 18:C0:4D:08:2E:BB, DST 74:83:C2:7F:8C:FF
1371
1372 ntop
1373 variants
1374
1375 • inet_t ntop([int64 af, ] int addr)
1376
1377 • inet_t ntop([int64 af, ] char addr[4])
1378
1379 • inet_t ntop([int64 af, ] char addr[16])
1380
ntop returns the string representation of an IPv4 or IPv6 address. ntop
infers the address type (IPv4 or IPv6) from the type and size of addr:
an integer or char[4] is treated as IPv4, and a char[16] is treated as
IPv6. The address family (e.g. AF_INET) can also be passed explicitly
as the first parameter.
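
The following sketch prints the destination of outgoing IPv4 TCP
connections. It assumes struct sock can be resolved through BTF or
kernel headers. It uses the literal 2 for AF_INET, which is that
constant's value on Linux:

    // assumption: struct sock is resolvable (BTF or kernel headers)
    kprobe:tcp_connect {
        $sk = (struct sock *)arg0;
        // 2 == AF_INET; IPv6 sockets are skipped in this sketch
        if ($sk->__sk_common.skc_family == 2) {
            // skc_daddr is a 32-bit integer, so ntop treats it as IPv4
            printf("%s -> %s\n", comm, ntop($sk->__sk_common.skc_daddr));
        }
    }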
1386
1387 pton
1388 variants
1389
1390 • char addr[4] pton(const string *addr_v4)
1391
1392 • char addr[16] pton(const string *addr_v6)
1393
1394 compile time
1395
pton converts the text representation of an IPv4 or IPv6 address to a
byte array. pton infers the address family from the presence of . or :
in the given argument. pton is useful when selecting packets with
specific IP addresses.
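
The following is a minimal sketch that round-trips addresses through
pton and ntop. Based on the signatures above, it assumes that the
char[4] and char[16] arrays returned by pton can be passed directly to
ntop:

    BEGIN {
        // assumption: ntop accepts pton's array result directly
        printf("%s\n", ntop(pton("127.0.0.1")));  // IPv4, char[4]
        printf("%s\n", ntop(pton("::1")));        // IPv6, char[16]
        exit();
    }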
1400
1401 override
1402 variants
1403
1404 • override(uint64 rc)
1405
1406 unsafe
1407
1408 Kernel 4.16
1409
Helper bpf_override_return
1411
1412 Supported probes
1413
1414 • kprobe
1415
When using override, the probed function is not executed and rc is
returned instead.
1418
1419 k:__x64_sys_getuid
1420 /comm == "id"/ {
1421 override(2<<21);
1422 }
1423
1424 uid=4194304 gid=0(root) euid=0(root) groups=0(root)
1425
1426 This feature only works on kernels compiled with
1427 CONFIG_BPF_KPROBE_OVERRIDE and only works on functions tagged
1428 ALLOW_ERROR_INJECTION.
1429
bpftrace does not test whether error injection is allowed for the
probed function; instead, loading the program into the kernel fails:
1433
1434 ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument
1435 Error attaching probe: 'kprobe:vfs_read'
1436
1437 reg
1438 variants
1439
1440 • reg(const string name)
1441
1442 Supported probes
1443
1444 • kprobe
1445
1446 • uprobe
1447
1448 Get the contents of the register identified by name. Valid names depend
1449 on the CPU architecture.
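
The following is a minimal sketch. It assumes an x86_64 machine, where
"ip" (instruction pointer) and "sp" (stack pointer) are valid register
names:

    // assumption: x86_64 register names
    kprobe:do_nanosleep {
        printf("ip=%lx sp=%lx\n", reg("ip"), reg("sp"));
    }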
1450
1451 signal
1452 variants
1453
1454 • signal(const string sig)
1455
1456 • signal(uint32 signum)
1457
1458 unsafe
1459
1460 Kernel 5.3
1461
1462 Helper bpf_send_signal
1463
1464 Probe types: k(ret)probe, u(ret)probe, USDT, profile
1465
Send a signal to the process being traced. The signal can be identified
either by name, e.g. SIGSTOP, or by number, e.g. 19, as listed by
kill -l.
1468
1469 kprobe:__x64_sys_execve
1470 /comm == "bash"/ {
1471 signal(5);
1472 }
1473
1474 $ ls
1475 Trace/breakpoint trap (core dumped)
1476
1477 sizeof
1478 variants
1479
1480 • sizeof(TYPE)
1481
1482 • sizeof(EXPRESSION)
1483
1484 compile time
1485
Returns the size of the argument in bytes, similar to the C/C++ sizeof
operator. Note that the expression is not evaluated.
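
The following is a minimal sketch of both variants. It assumes kernel
BTF is available, so that struct task_struct and the type of curtask are
known. In the expression form, curtask->pid is never dereferenced at
runtime:

    // assumption: kernel BTF provides struct task_struct
    BEGIN {
        printf("uint32: %d bytes\n", sizeof(uint32));
        printf("task_struct: %d bytes\n", sizeof(struct task_struct));
        printf("pid field: %d bytes\n", sizeof(curtask->pid));
        exit();
    }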
1488
1489 str
1490 variants
1491
• str(char * data [, uint32 length])
1493
1494 Helper probe_read_str, probe_read_{kernel,user}_str
1495
str reads a NUL-terminated (\0) string from data. The maximum string
length is limited by the BPFTRACE_STRLEN environment variable, unless
length is specified and shorter than the maximum. If the string is
longer than the specified length, only length - 1 bytes are copied and
a NUL byte is appended at the end.
1501
1502 When available (starting from kernel 5.5, see the --info flag) bpftrace
1503 will automatically use the kernel or user variant of
1504 probe_read_{kernel,user}_str based on the address space of data, see
1505 ADDRESS-SPACES for more information.
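
The following sketch caps the read explicitly at 16 bytes, i.e. at most
15 characters plus the terminating NUL byte:

    tracepoint:syscalls:sys_enter_openat {
        // longer file names are truncated to their first 15 characters
        printf("%s %s\n", comm, str(args->filename, 16));
    }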
1506
1507 strerror
1508 variants
1509
1510 • strerror strerror(int error)
1511
1512 Convert errno code to string. This is done asynchronously in userspace
1513 when the strerror value is printed, hence the returned value can only
1514 be used for printing.
1515
1516 #include <errno.h>
1517 BEGIN {
1518 print(strerror(EPERM));
1519 }
1520
1521 strftime
1522 variants
1523
1524 • strtime_t strftime(const string fmt, int64 timestamp_ns)
1525
1526 async
1527
1528 Format the nanoseconds since boot timestamp timestamp_ns according to
the format specified by fmt. The time conversion and formatting happen
in user space, therefore the returned value can only be used for
printing with the %s format specifier.
1532
1533 bpftrace uses the strftime(3) function for formatting time and supports
1534 the same format specifiers.
1535
1536 i:s:1 {
1537 printf("%s\n", strftime("%H:%M:%S", nsecs));
1538 }
1539
1540 bpftrace also supports the following format string extensions:
1541
1542 ┌──────────┬────────────────────────────┐
1543 │ │ │
1544 │Specifier │ Description │
1545 ├──────────┼────────────────────────────┤
1546 │ │ │
1547 │%f │ Microsecond as a decimal │
1548 │ │ number, zero-padded on the │
1549 │ │ left │
1550 └──────────┴────────────────────────────┘
1551
1552 strncmp
1553 variants
1554
1555 • int64 strncmp(char * s1, char * s2, int64 n)
1556
strncmp compares up to n characters of the strings s1 and s2. If they
are equal, 0 is returned; otherwise a non-zero value is returned.
1559
1560 bpftrace doesn’t read past the length of the shortest string.
1561
1562 The use of the == and != operators is recommended over calling strncmp
1563 directly.
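
The following sketch compares only a prefix, which == cannot do because
== compares whole strings. Only files opened under /etc/ are reported:

    tracepoint:syscalls:sys_enter_openat {
        // compare just the first 5 characters against "/etc/"
        if (strncmp(str(args->filename), "/etc/", 5) == 0) {
            printf("%s %s\n", comm, str(args->filename));
        }
    }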
1564
1565 system
1566 variants
1567
1568 • void system(string namefmt [, ...args])
1569
1570 unsafe async
1571
system runs the specified command (via fork and exec), waits for it to
complete, and prints its stdout. The command runs with the same
privileges as bpftrace, and it blocks the processing threads, which can
lead to missed events and delays the processing of async events.
1577
1578 i:s:1 {
1579 time("%H:%M:%S: ");
1580 printf("%d\n", @++);
1581 }
1582 i:s:10 {
1583 system("/bin/sleep 10");
1584 }
1585 i:s:30 {
1586 exit();
1587 }
1588
Note how the async time and printf calls print every second until the
i:s:10 probe fires. While bpftrace blocks on sleep, no output appears.
The events that queued up are then printed in a burst, all stamped with
the time at which user space processed them.
1592
1593 Attaching 3 probes...
1594 08:50:37: 0
1595 08:50:38: 1
1596 08:50:39: 2
1597 08:50:40: 3
1598 08:50:41: 4
1599 08:50:42: 5
1600 08:50:43: 6
1601 08:50:44: 7
1602 08:50:45: 8
1603 08:50:46: 9
1604 08:50:56: 10
1605 08:50:56: 11
1606 08:50:56: 12
1607 08:50:56: 13
1608 08:50:56: 14
1609 08:50:56: 15
1610 08:50:56: 16
1611 08:50:56: 17
1612 08:50:56: 18
1613 08:50:56: 19
1614
1615 system supports the same format string and arguments that printf does.
1616
1617 t:syscalls:sys_enter_execve {
    system("/bin/grep %s /proc/%d/status", "VmSwap", pid);
1619 }
1620
1621 time
1622 variants
1623
1624 • void time(const string fmt)
1625
1626 async
1627
Format the current wall time according to the format specifier fmt and
print it to stdout. Unlike strftime(), time() does not send a timestamp
from the probe; the time printed is the time at which user space
processes the event.
1632
1633 bpftrace uses the strftime(3) function for formatting time and supports
1634 the same format specifiers.
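
The following is a minimal sketch that prints a timestamp line once per
second:

    i:s:1 {
        // the time shown is when user space handled the event
        time("%H:%M:%S\n");
    }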
1635
1636 uaddr
1637 variants
1638
1639 • T * uaddr(const string sym)
1640
1641 Supported probes
1642
1643 • uprobes
1644
1645 • uretprobes
1646
1647 • USDT
1648
1649 Does not work with ASLR, see issue #75
1650 <https://github.com/iovisor/bpftrace/issues/75>
1651
1652 The uaddr function returns the address of the specified symbol. This
1653 lookup happens during program compilation and cannot be used
1654 dynamically.
1655
1656 The default return type is uint64*. If the ELF object size matches a
1657 known integer size (1, 2, 4 or 8 bytes) the return type is modified to
1658 match the width (uint8*, uint16*, uint32* or uint64* resp.). As ELF
1659 does not contain type info the type is always assumed to be unsigned.
1660
1661 uprobe:/bin/bash:readline {
1662 printf("PS1: %s\n", str(*uaddr("ps1_prompt")));
1663 }
1664
1665 uptr
1666 variants
1667
1668 • T * uptr(T * ptr)
1669
1670 Marks ptr as a user address space pointer. See the address-spaces
1671 section for more information on address-spaces. The pointer type is
1672 left unchanged.
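
The following sketch assumes a kernel (5.6 or later) in which
do_sys_openat2 can be probed and its second argument (arg1) is the
user-space file name pointer. uptr marks arg1 as a user address, so
str reads it with the user variant of the helper:

    // assumption: do_sys_openat2(int dfd, const char __user *filename, ...)
    kprobe:do_sys_openat2 {
        printf("%s %s\n", comm, str(uptr(arg1)));
    }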
1673
1674 usym
1675 variants
1676
1677 • usym_t usym(uint64 * addr)
1678
1679 async
1680
1681 Supported probes
1682
1683 • uprobes
1684
1685 • uretprobes
1686
Equivalent to ksym, but resolves user-space symbols.
1688
1689 uprobe:/bin/bash:readline
1690 {
1691 printf("%s\n", usym(reg("ip")));
1692 }
1693
1694 Prints:
1695
1696 readline
1697
1698 path
1699 variants
1700
1701 • char * path(struct path * path)
1702
1703 Kernel 5.10
1704
1705 Helper bpf_d_path
1706
1707 Return full path referenced by struct path pointer in argument.
1708
This function can only be used in probes on kernel functions that are
allowed to call it; these functions are listed in the
btf_allowlist_d_path set in the kernel.
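
The following sketch assumes that filp_close is in btf_allowlist_d_path
on the running kernel and that kfunc probes are supported (see --info):

    // assumption: filp_close(struct file *filp, ...) is on the allowlist
    kfunc:filp_close {
        printf("%s\n", path(args->filp->f_path));
    }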
1711
1712 unwatch
1713 variants
1714
1715 • void unwatch(void * addr)
1716
1717 async
1718
Remove the watchpoint at address addr.
1720
1721 skboutput
1722 variants
1723
1724 • uint32 skboutput(const string path, struct sk_buff *skb, uint64
1725 length, const uint64 offset)
1726
1727 Kernel 5.5
1728
1729 Helper bpf_skb_output
1730
Write the data section of sk_buff skb to a PCAP file at path, starting
from offset up to offset + length.

The PCAP file uses RAW IP encapsulation, so no Ethernet header is
included. In some kernel contexts the data section of the skb contains
an Ethernet header; set offset to 14 bytes to exclude it.

Each packet’s timestamp is determined by adding nsecs and the boot
time; its accuracy varies between kernels, see nsecs.

This function returns 0 on success, or a negative error in case of
failure.

The BPFTRACE_PERF_RB_PAGES environment variable should be increased in
order to capture large packets; otherwise these packets are dropped.
1747
1748 Usage
1749
1750 # cat dump.bt
1751 kfunc:napi_gro_receive {
1752 $ret = skboutput("receive.pcap", args->skb, args->skb->len, 0);
1753 }
1754
1755 kfunc:dev_queue_xmit {
1756 // setting offset to 14, to exclude ethernet header
1757 $ret = skboutput("output.pcap", args->skb, args->skb->len, 14);
1758 printf("skboutput returns %d\n", $ret);
1759 }
1760
1761 # export BPFTRACE_PERF_RB_PAGES=1024
1762 # bpftrace dump.bt
1763 ...
1764
1765 # tcpdump -n -r ./receive.pcap | head -3
1766 reading from file ./receive.pcap, link-type RAW (Raw IP)
1767 dropped privs to tcpdump
1768 10:23:44.674087 IP 22.128.74.231.63175 > 192.168.0.23.22: Flags [.], ack 3513221061, win 14009, options [nop,nop,TS val 721277750 ecr 3115333619], length 0
1769 10:23:45.823194 IP 100.101.2.146.53 > 192.168.0.23.46619: 17273 0/1/0 (130)
1770 10:23:45.823229 IP 100.101.2.146.53 > 192.168.0.23.46158: 45799 1/0/0 A 100.100.45.106 (60)
1771
1773 print
variants

• void print(T val)

• void print(@map)

• void print(@map, uint64 top)

• void print(@map, uint64 top, uint64 div)

async

print prints the value, which can be a map or a scalar value, using
the default formatting for its type.
1792
1793 i:ms:10 { @=hist(rand); }
1794 i:s:1 {
1795 print(@);
1796 print(123);
1797 print("abc");
1798 exit();
1799 }
1800
1801 Prints:
1802
1803 @:
1804 [16M, 32M) 3 |@@@ |
1805 [32M, 64M) 2 |@@ |
1806 [64M, 128M) 1 |@ |
1807 [128M, 256M) 4 |@@@@ |
1808 [256M, 512M) 3 |@@@ |
1809 [512M, 1G) 14 |@@@@@@@@@@@@@@ |
1810 [1G, 2G) 22 |@@@@@@@@@@@@@@@@@@@@@@ |
1811 [2G, 4G) 51 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
1812
1813 123
1814 abc
1815
1816 Note that maps are printed by reference while scalar values are copied.
1817 This means that updating and printing maps in a fast loop will likely
1818 result in bogus map values as the map will be updated before userspace
1819 gets the time to dump and print it.
1820
The printing of maps supports the optional top and div arguments. top
limits the printing to the top N entries with the highest integer
values:
1824
1825 BEGIN {
1826 $i = 11;
1827 while($i) {
1828 @[$i] = --$i;
1829 }
1830 print(@, 2);
1831 clear(@);
1832 exit()
1833 }
1834
1835 @[9]: 9
1836 @[10]: 10
1837
1838 The div argument scales the values prior to printing them. Scaling
1839 values before storing them can result in rounding errors. Consider the
1840 following program:
1841
1842 k:f {
1843 @[func] += arg0/10;
1844 }
1845
1846 With the following sequence as numbers for arg0: 134, 377, 111, 99. The
1847 total is 721 which rounds to 72 when scaled by 10 but the program would
1848 print 70 due to the rounding of individual values.
1849
Changing the print call in the earlier BEGIN example to print(@, 5, 2)
prints the top 5 values, each divided by 2:
1852
1853 @[6]: 3
1854 @[7]: 3
1855 @[8]: 4
1856 @[9]: 4
1857 @[10]: 5
1858
1859 printf
1860 variants
1861
1862 • void printf(const string fmt, args...)
1863
1864 async
1865
printf() formats and prints data. It behaves similarly to printf() found
in C and many other languages.

The format string has to be a constant; it cannot be modified at
runtime. The formatting of the string happens in user space. Values are
copied and passed by value.
1872
1873 bpftrace supports all the typical format specifiers like %llx and %hhu.
1874 The non-standard ones can be found in the table below:
1875
1876 ┌──────────┬────────┬─────────────────────┐
1877 │ │ │ │
1878 │Specifier │ Type │ Description │
1879 ├──────────┼────────┼─────────────────────┤
1880 │ │ │ │
1881 │r │ buffer │ Hex-formatted │
1882 │ │ │ string to print │
1883 │ │ │ arbitrary binary │
1884 │ │ │ content returned by │
│          │        │ the buf function.   │
1887 └──────────┴────────┴─────────────────────┘
1888
1889 Supported escape sequences
1890
1891 Colors are supported too, using standard terminal escape sequences:
1892
1893 print("\033[31mRed\t\033[33mYellow\033[0m\n")
1894
1896 bpftrace supports various probe types which allow the user to attach
1897 BPF programs to different types of events. Each probe starts with a
1898 provider (e.g. kprobe) followed by a colon (:) separated list of
options. The number of options and their meaning depend on the provider
and are detailed below. The valid values for options can depend on the
system or binary being traced, e.g. for uprobes they depend on the
binary. Also see LISTING PROBES.
1903
1904 It is possible to associate multiple probes with a single action as
1905 long as the action is valid for all specified probes. Multiple probes
1906 can be specified as a comma (,) separated list:
1907
1908 kprobe:tcp_reset,kprobe:tcp_v4_rcv {
1909 printf("Entered: %s\n", probe);
1910 }
1911
1912 Wildcards are supported too:
1913
1914 kprobe:tcp_* {
1915 printf("Entered: %s\n", probe);
1916 }
1917
1918 Both can be combined:
1919
1920 kprobe:tcp_reset,kprobe:*socket* {
1921 printf("Entered: %s\n", probe);
1922 }
1923
1924 Most providers also support a short name which can be used instead of
1925 the full name, e.g. kprobe:f and k:f are identical.
1926
1927 BEGIN and END
1928 These are special built-in events provided by the bpftrace runtime.
1929 BEGIN is triggered before all other probes are attached. END is
1930 triggered after all other probes are detached.
1931
Note that specifying an END probe doesn’t override the printing of
'non-empty' maps at exit. To prevent this printing, all used maps need
to be cleared, which can be done in the END probe:
1935
1936 END {
1937 clear(@map1);
1938 clear(@map2);
1939 }
1940
1941 hardware
1942 variants
1943
1944 • hardware:event_name:
1945
1946 • hardware:event_name:count
1947
1948 shortname
1949
1950 • h
1951
1952 The hardware probe attaches to pre-defined hardware events provided by
1953 the kernel.
1954
1955 They are implemented using performance monitoring counters (PMCs):
1956 hardware resources on the processor. There are about ten of these, and
1957 they are documented in the perf_event_open(2) man page. The event names
1958 are:
1959
1960 • cpu-cycles or cycles
1961
1962 • instructions
1963
1964 • cache-references
1965
1966 • cache-misses
1967
1968 • branch-instructions or branches
1969
1970 • branch-misses
1971
1972 • bus-cycles
1973
1974 • frontend-stalls
1975
1976 • backend-stalls
1977
1978 • ref-cycles
1979
1980 The count option specifies how many events must happen before the probe
1981 fires. If count is left unspecified a default value is used.
1982
1983 hardware:cache-misses:1e6 { @[pid] = count(); }
1984
1985 interval
1986 variants
1987
1988 • interval:us:count
1989
1990 • interval:ms:count
1991
1992 • interval:s:count
1993
1994 • interval:hz:rate
1995
1996 shortnames
1997
1998 • i
1999
The interval probe fires at a fixed interval as specified by its time
spec. Interval probes fire on one CPU at a time, unlike profile probes.
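
The following sketch reports a per-second count of reads. It assumes
vfs_read can be probed with a kprobe. Because interval fires on a single
CPU, the map is printed once per second rather than once per CPU:

    // assumption: vfs_read is available as a kprobe target
    kprobe:vfs_read {
        @reads = count();
    }

    interval:s:1 {
        print(@reads);
        clear(@reads);
    }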
2002
2003 iterator
2004 variants
2005
2006 • iter:task
2007
2008 • iter:task:pin
2009
2010 • iter:task_file
2011
2012 • iter:task_file:pin
2013
2014 shortnames
2015
2016 • it
2017
These are eBPF iterator probes that allow iteration over kernel
objects.

An iterator probe can’t be mixed with any other probe, not even another
iterator.

Each iterator probe provides a set of fields that can be accessed through
the ctx pointer. The set of available fields for an iterator can be
displayed with the -lv options (see LISTING PROBES).
2027
2028 Examples:
2029
2030 # bpftrace -e 'iter:task { printf("%s:%d\n", ctx->task->comm, ctx->task->pid); }'
2031 Attaching 1 probe...
2032 systemd:1
2033 kthreadd:2
2034 rcu_gp:3
2035 rcu_par_gp:4
2036 kworker/0:0H:6
2037 mm_percpu_wq:8
2038 ...
2039
2040 # bpftrace -e 'iter:task_file { printf("%s:%d %d:%s\n", ctx->task->comm, ctx->task->pid, ctx->fd, path(ctx->file->f_path)); }'
2041 Attaching 1 probe...
2042 systemd:1 1:/dev/null
2043 systemd:1 2:/dev/null
2044 systemd:1 3:/dev/kmsg
2045 ...
2046 su:1622 1:/dev/pts/1
2047 su:1622 2:/dev/pts/1
2048 su:1622 3:/var/lib/sss/mc/passwd
2049 ...
2050 bpftrace:1892 1:pipe:[35124]
2051 bpftrace:1892 2:/dev/pts/1
2052 bpftrace:1892 3:anon_inode:bpf-map
2053 bpftrace:1892 4:anon_inode:bpf-map
2054 bpftrace:1892 5:anon_inode:bpf_link
2055 bpftrace:1892 6:anon_inode:bpf-prog
2056 bpftrace:1892 7:anon_inode:bpf_iter
2057
An iterator can be pinned by specifying the optional ':pin' part of the
probe, which defines the pin file. It can be specified as an absolute
path or relative to /sys/fs/bpf.
2061
2062 relative pin
2063
2064 # bpftrace -e 'iter:task:list { printf("%s:%d\n", ctx->task->comm, ctx->task->pid); }'
2065 Program pinned to /sys/fs/bpf/list
2066
2067 # cat /sys/fs/bpf/list
2068 systemd:1
2069 kthreadd:2
2070 rcu_gp:3
2071 rcu_par_gp:4
2072 kworker/0:0H:6
2073 mm_percpu_wq:8
2074 rcu_tasks_kthre:9
2075 ...
2076
2077 Examples with absolute pin file:
2078
2079 absolute pin
2080
2081 # bpftrace -e '
2082 iter:task_file:/sys/fs/bpf/files {
2083 printf("%s:%d %s\n", ctx->task->comm, ctx->task->pid, path(ctx->file->f_path));
2084 }'
2085
2086 Program pinned to /sys/fs/bpf/files
2087
2088 # cat /sys/fs/bpf/files
2089 systemd:1 anon_inode:inotify
2090 systemd:1 anon_inode:[timerfd]
2091 ...
2092 systemd-journal:849 /dev/kmsg
2093 systemd-journal:849 anon_inode:[eventpoll]
2094 ...
2095 sssd:1146 /var/log/sssd/sssd.log
2096 sssd:1146 anon_inode:[eventpoll]
2097 ...
2098 NetworkManager:1155 anon_inode:[eventfd]
2099 NetworkManager:1155 /var/lib/sss/mc/passwd (deleted)
2100
2101 kfunc and kretfunc
2102 variants
2103
2104 • kfunc:fn
2105
2106 • kretfunc:fn
2107
2108 shortnames
2109
2110 • f (kfunc)
2111
2112 • fr (kretfunc)
2113
2114 requires (--info)
2115
2116 • Kernel features:BTF
2117
2118 • Probe types:kfunc
2119
kfuncs attach to kernel functions, similar to kprobe and kretprobe. They
make use of eBPF trampolines, which allow kernel code to call into BPF
programs with near zero overhead.

kfuncs make use of BTF type information to derive the type of function
arguments at compile time. This removes the need for manual type
casting and makes the code more resilient against small signature
changes in the kernel. The function arguments are available in the args
struct, which can be inspected by doing verbose listing (see LISTING
PROBES). These arguments are also available in the return probe
(kretfunc).
2131
2132 # bpftrace -lv 'kfunc:tcp_reset'
2133 kfunc:tcp_reset
2134 struct sock * sk
2135 struct sk_buff * skb
2136
2137 kfunc:x86_pmu_stop {
2138 printf("pmu %s stop\n", str(args->event->pmu->name));
2139 }
2140
2141 kretfunc:fget {
2142 printf("fd %d name %s\n", args->fd, str(retval->f_path.dentry->d_name.name));
2143 }
2144
2145 fd 3 name ld.so.cache
2146 fd 3 name libselinux.so.1
2147 fd 3 name libselinux.so.1
2148 ...
2149
2150 kprobe and kretprobe
2151 variants
2152
2153 • kprobe:fn
2154
2155 • kprobe:fn+offset
2156
2157 • kretprobe:fn
2158
2159 shortnames
2160
2161 • k
2162
2163 • kr
2164
kprobes allow for dynamic instrumentation of kernel functions. Each
time the specified kernel function is executed, the attached BPF
programs are run.
2168
2169 kprobe:tcp_reset {
2170 @tcp_resets = count()
2171 }
2172
Function arguments are available through the argX and sargX builtins,
for register args and stack args respectively. Whether an argument is
passed on the stack or in a register depends on the architecture and
the number of arguments used, e.g. on x86_64 the first 6
non-floating-point arguments are passed in registers and all following
arguments are passed on the stack. Note that floating point arguments
are typically passed in special registers that don’t count as argX
arguments, which can cause confusion. Consider a function with the
following signature:
2181
2182 void func(int a, double d, int x)
2183
Because d is a floating point argument, x is accessed through arg1
where one might expect arg2.
2186
bpftrace does not detect the function signature, so it is not aware of
the argument count or their types. It is up to the user to perform type
conversion when needed, e.g.
2190
2191 kprobe:tcp_connect
2192 {
2193 $sk = ((struct sock *) arg0);
2194 ...
2195 }
2196
kprobes are not limited to function entry; they can be attached to any
instruction in a function by specifying an offset from the start of the
function.
2200
kretprobes trigger on the return from a kernel function. Return probes
do not have access to the function (input) arguments, only to the
return value (through retval). A common pattern to work around this is
to store the arguments in a map on function entry and retrieve them in
the return probe:
2206
2207 kprobe:d_lookup
2208 {
2209 $name = (struct qstr *)arg1;
2210 @fname[tid] = $name->name;
2211 }
2212
kretprobe:d_lookup
/@fname[tid]/
{
    printf("%-8d %-6d %-16s M %s\n", elapsed / 1e6, pid, comm,
        str(@fname[tid]));
    // remove the entry so the map does not grow with stale thread IDs
    delete(@fname[tid]);
}

profile
variants

• profile:us:count

• profile:ms:count

• profile:s:count

• profile:hz:rate

shortnames

• p

Profile probes fire on each CPU at the specified interval: every count
microseconds (us), milliseconds (ms) or seconds (s), or rate times per
second (hz).

software
variants

• software:event:

• software:event:count

shortnames

• s

The software probe attaches to pre-defined software events provided by
the kernel. Event details can be found in the perf_event_open(2) man
page. If count is given, the probe fires once every count events;
otherwise a default count is used.

The event names are:

• cpu-clock or cpu

• task-clock

• page-faults or faults

• context-switches or cs

• cpu-migrations

• minor-faults

• major-faults

• alignment-faults

• emulation-faults

• dummy

• bpf-output

tracepoint
variants

• tracepoint:subsys:event

shortnames

• t

Tracepoints are hooks into events in the kernel. They are defined in
the kernel source and compiled into the kernel binary, which makes them
a form of static tracing. This means that, unlike with kprobes, new
tracepoints cannot be added without modifying the kernel.

The advantage of tracepoints is that they generally provide a more
stable interface than kprobes do, because they do not depend on the
existence of a particular kernel function.

Tracepoint arguments are available in the args struct, which can be
inspected with verbose listing; see the LISTING PROBES section for more
details.

tracepoint:syscalls:sys_enter_openat {
    printf("%s %s\n", comm, str(args->filename));
}

irqbalance /proc/interrupts
irqbalance /proc/stat
snmpd /proc/diskstats
snmpd /proc/stat
snmpd /proc/vmstat
snmpd /proc/net/dev
[...]

Additional information

• https://www.kernel.org/doc/html/latest/trace/tracepoints.html

uprobe, uretprobe
variants

• uprobe:binary:func

• uprobe:binary:func+offset

• uretprobe:binary:func

shortnames

• u

• ur

uprobes, or user-space probes, are the user-space equivalent of
kprobes. The same limitations that apply to kprobes and kretprobes
also apply to uprobes and uretprobes.

When tracing libraries, it is sufficient to specify the library name
instead of a full path. The path will then be resolved automatically
using /etc/ld.so.cache:

# bpftrace -e 'uprobe:libc:malloc { printf("Allocated %d bytes\n", arg0); }'
Allocated 4 bytes
...

If the traced binary includes DWARF debug information, function
arguments are available in the args struct, which can be inspected
with verbose listing; see the LISTING PROBES section for more details.

It is important to note that, for uretprobes to work, the kernel runs
a special helper on user-space function entry which overrides the
return address on the stack. This can cause issues with languages that
have their own runtime, such as Go:

example.go

package main

import (
    "fmt"
    "time"
)

func myprint(s string) {
    fmt.Printf("Input: %s\n", s)
}

func main() {
    ss := []string{"a", "b", "c"}
    for _, s := range ss {
        go myprint(s)
    }
    time.Sleep(1 * time.Second)
}

bpftrace

# go build -o test example.go
# bpftrace -e 'uretprobe:./test:main.myprint { @=count(); }' -c ./test
runtime: unexpected return pc for main.myprint called from 0x7fffffffe000
stack: frame={sp:0xc00008cf60, fp:0xc00008cfd0} stack=[0xc00008c000,0xc00008d000)
fatal error: unknown caller pc

usdt
variants

• usdt:binary:name

shortnames

• U

watchpoint and asyncwatchpoint
variants

• watchpoint:absolute_address:length:mode

• watchpoint:function+argN:length:mode

shortnames

• w

• aw

These are memory watchpoints provided by the kernel. Whenever a memory
address is written to (w), read from (r), or executed (x), the kernel
can generate an event.

In the first form, an absolute address is monitored. If a pid (-p) or
a command (-c) is provided, bpftrace takes the address as a user-space
address and monitors the appropriate process. Otherwise, bpftrace
takes the address as a kernel-space address.

In the second form, the address held in argN when function is entered
is monitored. A pid or command must be provided for this form. If the
watchpoint is synchronous (watchpoint), a SIGSTOP is sent to the
tracee on function entry, and the tracee is resumed with SIGCONT after
the watchpoint is attached. This ensures that no events are missed. To
avoid the SIGSTOP and SIGCONT, use asyncwatchpoint.

Note that on most architectures, execution (x) cannot be monitored
together with read or write (r, w).

Examples

Print "hit!" when a read from or write to 0x10000000 happens:

# bpftrace -e 'watchpoint:0x10000000:8:rw { printf("hit!\n"); exit(); }' -c ./testprogs/watchpoint

Print the call stack every time the jiffies variable is updated:

# bpftrace -e "watchpoint:0x$(awk '$3 == "jiffies" {print $1}' /proc/kallsyms):8:w {
    @[kstack] = count();
}

i:s:1 { exit(); }"
......
@[
    do_timer+12
    tick_do_update_jiffies64.part.22+89
    tick_sched_do_timer+103
    tick_sched_timer+39
    __hrtimer_run_queues+256
    hrtimer_interrupt+256
    smp_apic_timer_interrupt+106
    apic_timer_interrupt+15
    cpuidle_enter_state+188
    cpuidle_enter+41
    do_idle+536
    cpu_startup_entry+25
    start_secondary+355
    secondary_startup_64+164
]: 319

Print "hit!" and exit when the memory pointed to by arg1 of increment
is written to:

# cat wpfunc.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* noinline keeps increment() a real function that can be probed */
__attribute__((noinline))
void increment(__attribute__((unused)) int _, int *i)
{
    (*i)++;
}

int main(void)
{
    int *i = malloc(sizeof(int));   /* the watched heap address */
    *i = 0;
    while (1)
    {
        increment(0, i);            /* the address is passed as arg1 */
        (*i)++;
        usleep(1000);
    }
}

# bpftrace -e 'watchpoint:increment+arg1:4:w { printf("hit!\n"); exit() }' -c ./wpfunc

Probe listing is the method to discover which probes are supported by
the current system. Listing supports the same syntax as normal
attachment does:

# bpftrace -l 'kprobe:*'
# bpftrace -l 't:syscalls:*openat*'
# bpftrace -l 'kprobe:tcp*,trace'
# bpftrace -l 'k:*socket*,tracepoint:syscalls:*tcp*'
2480 The verbose flag (-v) can be specified to inspect arguments (args) for
2481 providers that support it:
2482
2483 # bpftrace -l 'fr:tcp_reset,t:syscalls:sys_enter_openat' -v
2484 kretfunc:tcp_reset
2485 struct sock * sk
2486 struct sk_buff * skb
2487 tracepoint:syscalls:sys_enter_openat
2488 int __syscall_nr
2489 int dfd
2490 const char * filename
2491 int flags
2492 umode_t mode
2493 # bpftrace -l 'uprobe:/bin/bash:rl_set_prompt' -v # works only if /bin/bash has DWARF
2494 uprobe:/bin/bash:rl_set_prompt
2495 const char *prompt
2496
2497
2498
2499 2022-09-26 BPFTRACE(8)