STAPPROBES(5)                File Formats Manual                STAPPROBES(5)

NAME
stapprobes - systemtap probe points

DESCRIPTION
The following sections enumerate the variety of probe points supported
by the systemtap translator, and additional aliases defined by standard
tapset scripts.

The general probe point syntax is a dotted-symbol sequence. This
allows a breakdown of the event namespace into parts, somewhat like the
Domain Name System does on the Internet. Each component identifier may
be parametrized by a string or number literal, with a syntax like a
function call. A component may include a "*" character, to expand to a
set of matching probe points. Probe aliases likewise expand to other
probe points. Each and every resulting probe point is normally
resolved to some low-level system instrumentation facility (e.g., a
kprobe address, marker, or a timer configuration), otherwise the
elaboration phase will fail.

However, a probe point may be followed by a "?" character, to indicate
that it is optional, and that no error should result if it fails to
resolve. Optionalness passes down through all levels of alias/wildcard
expansion. Alternately, a probe point may be followed by a "!"
character, to indicate that it is both optional and sufficient. (Think
vaguely of the Prolog cut operator.) If it does resolve, then no
further probe points in the same comma-separated list will be resolved.
Therefore, the "!" sufficiency mark only makes sense in a list of
probe point alternatives.
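
As a rough sketch of how the two suffixes might be combined, a script
can list a preferred probe point followed by a fallback. The function
names below are placeholders invented for illustration, not real kernel
symbols:

```systemtap
# Hypothetical function names, chosen only for illustration.
# If the first point resolves, "!" stops resolution of the rest of the
# list; otherwise the optional second point is tried, and no error is
# raised if it is missing too.
probe kernel.function("preferred_name") !,
      kernel.function("fallback_name") ?
{
  printf("probe hit in %s\n", execname())
}

# An unconditional probe, so that the script still contains a resolved
# probe when neither point above exists.
probe begin
{
  printf("session started\n")
}
```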

Additionally, a probe point may be followed by an "if (expr)"
statement, in order to enable or disable the probe point on the fly.
With the "if" statement, if "expr" is false when the probe point is
hit, the whole probe body, including the bodies of any aliases, is
skipped. The condition is stacked up through all levels of
alias/wildcard expansion, so the final condition becomes the logical
AND of the conditions of all expanded aliases and wildcards.
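
A minimal sketch of a conditional probe follows. The global variable
name "tracing" and the choice of the syscall.read alias are assumptions
made for the example; the "name" and "argstr" variables are those
normally provided by the syscall tapset aliases:

```systemtap
global tracing = 0   # assumed flag name; any global variable would do

# Toggle the flag every five seconds.
probe timer.s(5)
{
  tracing = !tracing
}

# The handler body is skipped whenever the condition is false.
probe syscall.read if (tracing)
{
  printf("%s: %s(%s)\n", execname(), name, argstr)
}
```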

These are all syntactically valid probe points:

    kernel.function("foo").return
    syscall(22)
    user.inode("/bin/vi").statement(0x2222)
    end
    syscall.*
    kernel.function("no_such_function") ?
    module("awol").function("no_such_function") !
    signal.*? if (switch)

Probes may be broadly classified into "synchronous" and "asynchronous".
A "synchronous" event is deemed to occur when any processor executes an
instruction matched by the specification. This gives these probes a
reference point (instruction address) from which more contextual data
may be available. Other families of probe points refer to
"asynchronous" events such as timers/counters rolling over, where there
is no fixed reference point that is related. Each probe point
specification may match multiple locations (for example, using
wildcards or aliases), and all of them are then probed. A probe
declaration may also contain several comma-separated specifications,
all of which are probed.


BEGIN/END/ERROR
The probe points begin and end are defined by the translator to refer
to the time of session startup and shutdown. All "begin" probe
handlers are run, in some sequence, during the startup of the session.
All global variables will have been initialized prior to this point.
All "end" probes are run, in some sequence, during the normal shutdown
of a session, such as in the aftermath of an exit() function call, or
an interruption from the user. In the case of an error-triggered
shutdown, "end" probes are not run. There are no target variables
available in either context.

If the order of execution among "begin" or "end" probes is significant,
then an optional sequence number may be provided:

    begin(N)
    end(N)

The number N may be positive or negative. The probe handlers are run
in increasing order, and the order between handlers with the same
sequence number is unspecified. When "begin" or "end" are given without
a sequence, they are effectively sequence zero.
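
For illustration, the following sketch uses sequence numbers to order
three "begin" handlers and one "end" handler. The messages are
arbitrary; only the relative order of the sequence numbers matters:

```systemtap
# Runs before the unnumbered handler, because -1 < 0.
probe begin(-1)
{
  printf("begin(-1): earliest setup\n")
}

# No sequence number, so effectively begin(0).
probe begin
{
  printf("begin: default sequence\n")
}

# Runs after both handlers above, then requests shutdown.
probe begin(1)
{
  printf("begin(1): setup finished\n")
  exit()
}

# Among "end" handlers, higher numbers run later.
probe end(10)
{
  printf("end(10): late cleanup\n")
}
```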

The error probe point is similar to the end probe, except that each
such probe handler is run when the session ends after errors have
occurred. In such cases, "end" probes are skipped, but each "error"
probe is still attempted. This kind of probe can be used to clean up
or emit a "final gasp". It may also be numerically parametrized to set
a sequence.
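
The difference between "end" and "error" can be sketched as follows.
The example assumes the tapset function error(), which raises a
run-time error, and the default setting in which a single error ends
the session:

```systemtap
probe begin
{
  # Raise a run-time error to force an error-triggered shutdown.
  error("simulated failure")
}

probe end
{
  # Skipped, because the session is ending after an error.
  printf("normal shutdown\n")
}

probe error
{
  # Still attempted, so it can clean up or report.
  printf("shutting down after an error\n")
}
```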


NEVER
The probe point never is specially defined by the translator to mean
"never". Its probe handler is never run, though its statements are
analyzed for symbol / type correctness as usual. This probe point may
be useful in conjunction with optional probes.


TIMERS
Intervals defined by the standard kernel "jiffies" timer may be used to
trigger probe handlers asynchronously. Two probe point variants are
supported by the translator:

    timer.jiffies(N)
    timer.jiffies(N).randomize(M)

The probe handler is run every N jiffies (a kernel-defined unit of
time, typically between 1 and 60 ms). If the "randomize" component is
given, a linearly distributed random value in the range [-M..+M] is
added to N every time the handler is run. N is restricted to a
reasonable range (1 to around a million), and M is restricted to be
smaller than N. There are no target variables provided in either
context. It is possible for such probes to be run concurrently on a
multi-processor computer.

Alternatively, intervals may be specified in units of time. There are
two probe point variants similar to the jiffies timer:

    timer.ms(N)
    timer.ms(N).randomize(M)

Here, N and M are specified in milliseconds, but the full options for
units are seconds (s/sec), milliseconds (ms/msec), microseconds
(us/usec), nanoseconds (ns/nsec), and hertz (hz). Randomization is not
supported for hertz timers.
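
As a small sketch, the script below combines a randomized millisecond
timer with a plain seconds timer. The interval values and the variable
name "ticks" are arbitrary choices for the example:

```systemtap
global ticks

# Fire roughly every 100 ms, with a random offset of up to 20 ms
# either way (M = 20 is smaller than N = 100, as required).
probe timer.ms(100).randomize(20)
{
  ticks++
}

# Report the count and stop after five seconds.
probe timer.s(5)
{
  printf("%d randomized ticks in 5 seconds\n", ticks)
  exit()
}
```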

The actual resolution of the timers depends on the target kernel. For
kernels prior to 2.6.17, timers are limited to jiffies resolution, so
intervals are rounded up to the nearest jiffies interval. After
2.6.17, the implementation uses hrtimers for tighter precision, though
the actual resolution will be arch-dependent. In either case, if the
"randomize" component is given, then the random value will be added to
the interval before any rounding occurs.

Profiling timers are also available to provide probes that execute on
all CPUs at the rate of the system tick. This probe takes no
parameters.

    timer.profile

Full context information of the interrupted process is available,
making this probe suitable for a time-based sampling profiler.
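
A very small sampling profiler along these lines might look like the
sketch below. It only counts samples per process name; the ten-second
run time and the array name "samples" are assumptions for the example:

```systemtap
global samples

# On every profiling tick, charge one sample to the interrupted process.
probe timer.profile
{
  samples[execname()]++
}

# Stop sampling after ten seconds.
probe timer.s(10)
{
  exit()
}

# Print process names, sorted by descending sample count.
probe end
{
  foreach (name in samples-)
    printf("%-20s %d\n", name, samples[name])
}
```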


DWARF
This family of probe points uses symbolic debugging information for the
target kernel/module/program, as may be found in unstripped
executables, or the separate debuginfo packages. They allow placement
of probes logically into the execution path of the target program, by
specifying a set of points in the source or object code. When a
matching statement executes on any processor, the probe handler is run
in that context.

Points in a kernel are identified by module, source file, line number,
function name, or some combination of these.

Here is a list of probe point families currently supported. The
.function variant places a probe near the beginning of the named
function, so that parameters are available as context variables. The
.return variant places a probe at the moment of return from the named
function, so the return value is available as the "$return" context
variable. The .inline modifier for .function filters the results to
include only instances of inlined functions. The .call modifier
selects the opposite subset. Inline functions do not have an
identifiable return point, so .return is not supported on .inline
probes. The .statement variant places a probe at the exact spot,
exposing those local variables that are visible there.

    kernel.function(PATTERN)
    kernel.function(PATTERN).call
    kernel.function(PATTERN).return
    kernel.function(PATTERN).inline
    module(MPATTERN).function(PATTERN)
    module(MPATTERN).function(PATTERN).call
    module(MPATTERN).function(PATTERN).return
    module(MPATTERN).function(PATTERN).inline
    kernel.statement(PATTERN)
    kernel.statement(ADDRESS).absolute
    module(MPATTERN).statement(PATTERN)

In the above list, MPATTERN stands for a string literal that aims to
identify the loaded kernel module of interest. It may include "*",
"[]", and "?" wildcards. PATTERN stands for a string literal that aims
to identify a point in the program. It is made up of three parts:

·   The first part is the name of a function, as would appear in the nm
    program's output. This part may use the "*" and "?" wildcarding
    operators to match multiple names.

·   The second part is optional and begins with the "@" character. It
    is followed by the path to the source file containing the function,
    which may include a wildcard pattern, such as mm/slab*. In most
    cases, the path should be relative to the top of the Linux source
    directory, although an absolute path may be necessary for some
    kernels. If a relative pathname doesn't work, try absolute.

·   Finally, the third part is optional and may be given only if the
    file name part was given. It identifies the line number in the
    source file, preceded by a ":".

As an alternative, PATTERN may be a numeric constant, indicating an
(module-relative or kernel-_stext-relative) address. In guru mode
only, absolute kernel addresses may be specified with the ".absolute"
suffix.

Some of the source-level variables, such as function parameters,
locals, and globals visible in the compilation unit, may be visible to
probe handlers. Handlers may refer to these variables by prefixing
their names with "$" within the scripts. In addition, a special syntax
allows limited traversal of structures, pointers, and arrays.

$var
    refers to an in-scope variable "var". If it's an integer-like
    type, it will be cast to a 64-bit int for systemtap script use.
    String-like pointers (char *) may be copied to systemtap string
    values using the kernel_string or user_string functions.

$var->field
    traversal to a structure's field. The indirection operator may
    be repeated to follow more levels of pointers.

$var[N]
    indexes into an array. The index is given with a literal number.
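
The sketch below shows how such context variables might be read at a
function entry and at its return. The choice of vfs_read, its parameter
names ($file, $count), and the f_flags field are assumptions about the
target kernel's source; they vary between kernel versions, and matching
debuginfo must be installed for the probes to resolve:

```systemtap
# vfs_read is used only as an example; $file, $count, and f_flags are
# assumed to exist in the target kernel's version of this function.
probe kernel.function("vfs_read")
{
  printf("%s: count=%d flags=0x%x\n",
         execname(), $count, $file->f_flags)
}

# At the return point, the function's return value is in $return.
probe kernel.function("vfs_read").return
{
  printf("%s: returned %d\n", execname(), $return)
}
```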


USER-SPACE
Early prototype support for user-space probing is available in the form
of a non-symbolic probe point:

    process(PID).statement(ADDRESS).absolute

This is analogous to kernel.statement(ADDRESS).absolute in that both
use raw (unverified) virtual addresses and provide no $variables. The
target PID parameter must identify a running process, and ADDRESS
should identify a valid instruction address. All threads of that
process will be probed.


PROCFS
These probe points allow procfs "files" in /proc/systemtap/MODNAME to
be created, read and written (MODNAME is the name of the systemtap
module). The proc filesystem is a pseudo-filesystem which is used as an
interface to kernel data structures. There are four probe point
variants supported by the translator:

    procfs("PATH").read
    procfs("PATH").write
    procfs.read
    procfs.write

PATH is the file name (relative to /proc/systemtap/MODNAME) to be
created. If no PATH is specified (as in the last two variants above),
PATH defaults to "command".

When a user reads /proc/systemtap/MODNAME/PATH, the corresponding
procfs read probe is triggered. The string data to be read should be
assigned to a variable named $value, like this:

    procfs("PATH").read { $value = "100\n" }

When a user writes into /proc/systemtap/MODNAME/PATH, the corresponding
procfs write probe is triggered. The data the user wrote is available
in the string variable named $value, like this:

    procfs("PATH").write { printf("user wrote: %s", $value) }
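
Putting the two variants together, a script might expose a single
read/write setting through procfs. This is only a sketch; the file name
"level" and the global variable of the same name are invented for the
example:

```systemtap
global level = "1\n"   # "level" is an assumed name for illustration

# Reading /proc/systemtap/MODNAME/level returns the stored string.
probe procfs("level").read
{
  $value = level
}

# Writing to /proc/systemtap/MODNAME/level replaces the stored string.
probe procfs("level").write
{
  level = $value
}
```

While such a script is running, reading the file with cat would trigger
the read probe, and redirecting echo output into it would trigger the
write probe.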


MARKERS
This family of probe points hooks up to static probing markers inserted
into the kernel or modules. These markers are special macro calls
inserted by kernel developers to make probing faster and more reliable
than with DWARF-based probes. Further, DWARF debugging information is
not required to probe markers.

Marker probe points begin with kernel. The next part names the marker
itself: mark("name"). The marker name string, which may contain the
usual wildcard characters, is matched against the names given to the
marker macros when the kernel and/or module was compiled. Optionally,
you can specify format("format"). Specifying the marker format string
allows differentiation between two markers with the same name but
different marker format strings.

The handler associated with a marker-based probe may read the optional
parameters specified at the macro call site. These are named $arg1
through $argNN, where NN is the number of parameters supplied by the
macro. Number and string parameters are passed in a type-safe manner.

The marker format string associated with a marker is available in
$format.


PERFORMANCE MONITORING HARDWARE
The perfmon family of probe points is used to access the performance
monitoring hardware available in modern processors. This family of
probe points needs the perfmon2 support in the kernel to access the
performance monitoring hardware.

Performance monitor hardware points begin with perfmon. The next part
names the event being counted: counter("event"). The event names are
processor implementation specific, with the exception of the generic
cycles and instructions events, which are available on all processors.
This sets up a counter on the processor to count the number of events
occurring on the processor. For more details on the performance
monitoring events available on a specific processor, use the perfmon2
command:

    pfmon -l

$counter
    is a handle used in the body of the probe for operations
    involving the counter associated with the probe.

read_counter
    is a function that is passed the handle for the perfmon probe
    and returns the current count for the event.


EXAMPLES
Here are some example probe points, defining the associated events.

begin, end, end
    refers to the startup and normal shutdown of the session. In
    this case, the handler would run once during startup and twice
    during shutdown.

timer.jiffies(1000).randomize(200)
    refers to a periodic interrupt, every 1000 +/- 200 jiffies.

kernel.function("*init*"), kernel.function("*exit*")
    refers to all kernel functions with "init" or "exit" in the
    name.

kernel.function("*@kernel/sched.c:240")
    refers to any functions within the "kernel/sched.c" file that
    span line 240.

kernel.mark("getuid")
    refers to an STAP_MARK(getuid, ...) macro call in the kernel.

module("usb*").function("*sync*").return
    refers to the moment of return from all functions with "sync" in
    the name in any of the USB drivers.

kernel.statement(0xc0044852)
    refers to the first byte of the statement whose compiled
    instructions include the given address in the kernel.

kernel.statement("*@kernel/sched.c:2917")
    refers to the statement at line 2917 within "kernel/sched.c".

syscall.*.return
    refers to the group of probe aliases with any name in the second
    position.


SEE ALSO
stap(1), stapprobes.iosched(5), stapprobes.netdev(5),
stapprobes.nfs(5), stapprobes.nfsd(5), stapprobes.pagefault(5),
stapprobes.process(5), stapprobes.rpc(5), stapprobes.scsi(5),
stapprobes.signal(5), stapprobes.socket(5), stapprobes.tcp(5),
stapprobes.udp(5), proc(5)



Red Hat                          2008-03-27                     STAPPROBES(5)