lockstat(1M)            System Administration Commands            lockstat(1M)



NAME
       lockstat - report kernel lock and profiling statistics

SYNOPSIS
       lockstat [-ACEHI] [-e event_list] [-i rate]
            [-b | -t | -h | -s depth] [-n nrecords]
            [-l lock[,size]] [-d duration]
            [-f function[,size]] [-T] [-ckgwWRpP] [-D count]
            [-o filename] [-x opt[=val]] command [args]

DESCRIPTION
       The lockstat utility gathers and displays kernel locking and profiling
       statistics. lockstat allows you to specify which events to watch (for
       example, spin on adaptive mutex, block on read access to rwlock due to
       waiting writers, and so forth), how much data to gather for each
       event, and how to display the data. By default, lockstat monitors all
       lock contention events, gathers frequency and timing data about those
       events, and displays the data in decreasing frequency order, so that
       the most common events appear first.

       lockstat gathers data until the specified command completes. For
       example, to gather statistics for a fixed-time interval, use sleep(1)
       as the command, as follows:

         example# lockstat sleep 5

       When the -I option is specified, lockstat establishes a per-processor
       high-level periodic interrupt source to gather profiling data. The
       interrupt handler simply generates a lockstat event whose caller is
       the interrupted PC (program counter). The profiling event is just like
       any other lockstat event, so all of the normal lockstat options are
       applicable.

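       For example, to profile the entire system for ten seconds at the
       default sampling rate (an illustrative invocation using only the
       options documented below):

         example# lockstat -I sleep 10
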
       lockstat relies on DTrace to modify the running kernel's text to
       intercept events of interest. This imposes a small but measurable
       overhead on all system activity, so access to lockstat is restricted
       to superuser by default. The system administrator can permit other
       users to use lockstat by granting them additional DTrace privileges.
       Refer to the Solaris Dynamic Tracing Guide for more information about
       DTrace security features.

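       For example, on a system using Solaris privileges, an administrator
       might grant a user the dtrace_kernel privilege (a sketch; the user
       name user1 is hypothetical, and the privileges appropriate for a
       given site can differ):

         example# usermod -K defaultpriv=basic,dtrace_kernel user1
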
OPTIONS
       The following options are supported:

   Event Selection
       If no event selection options are specified, the default is -C.

       -A

           Watch all lock events. -A is equivalent to -CH.

       -C

           Watch contention events.

       -E

           Watch error events.

       -e event_list

           Only watch the specified events. event_list is a comma-separated
           list of events or ranges of events, such as 1,4-7,35. Run lockstat
           with no arguments to get a brief description of all events.

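           For example, to watch only the events in the list cited above (an
           illustrative invocation; event numbers come from lockstat's own
           event listing):

             example# lockstat -e 1,4-7,35 sleep 5
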
       -H

           Watch hold events.

       -I

           Watch profiling interrupt events.

       -i rate

           Interrupt rate (per second) for -I. The default is 97 Hz, so that
           profiling doesn't run in lockstep with the clock interrupt (which
           runs at 100 Hz).

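           For example, to sample roughly ten times faster than the default
           while staying out of step with the 100 Hz clock (illustrative;
           997 is simply a convenient prime):

             example# lockstat -I -i 997 sleep 10
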
   Data Gathering
       -x opt[=val]

           Enable or modify a DTrace runtime option or D compiler option.
           The list of options is found in the Solaris Dynamic Tracing
           Guide. Boolean options are enabled by specifying their name.
           Options with values are set by separating the option name and
           value with an equals sign (=).

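           For example, to raise the principal DTrace buffer size to 8
           Mbytes (a sketch; bufsize is a standard DTrace runtime option):

             example# lockstat -x bufsize=8m sleep 5
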
   Data Gathering (Mutually Exclusive)
       -b

           Basic statistics: lock, caller, number of events.

       -h

           Histogram: Timing plus time-distribution histograms.

       -s depth

           Stack trace: Histogram plus stack traces up to depth frames deep.

       -t

           Timing: Basic plus timing for all events [default].

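       For example, to collect contention statistics with stack traces up to
       8 frames deep (illustrative):

         example# lockstat -s 8 sleep 5
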
   Data Filtering
       -d duration

           Only watch events longer than duration.

       -f func[,size]

           Only watch events generated by func, which can be specified as a
           symbolic name or hex address. size defaults to the ELF symbol
           size if available, or 1 if not.

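           For example, to watch only events generated by the kernel
           function putnext() (illustrative; any symbol or hex address
           works):

             example# lockstat -f putnext sleep 5
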
       -l lock[,size]

           Only watch lock, which can be specified as a symbolic name or hex
           address. size defaults to the ELF symbol size or 1 if the symbol
           size is not available.

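           For example, to watch only the service_queue lock that appears in
           Example 1 below (illustrative):

             example# lockstat -l service_queue sleep 5
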
       -n nrecords

           Maximum number of data records.

       -T

           Trace (rather than sample) events [off by default].

   Data Reporting
       -c

           Coalesce lock data for lock arrays (for example, pse_mutex[]).

       -D count

           Only display the top count events of each type.

       -g

           Show total events generated by function. For example, if foo()
           calls bar() in a loop, the work done by bar() counts as work
           generated by foo() (along with any work done by foo() itself).
           The -g option works by counting the total number of stack frames
           in which each function appears. This implies two things: (1) the
           data reported by -g can be misleading if the stack traces are not
           deep enough, and (2) functions that are called recursively might
           show greater than 100% activity. In light of issue (1), the
           default data gathering mode when using -g is -s 50.

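           For example, to report generated load with PCs coalesced within
           functions (-k) and events distinguished only by caller (-W), as a
           sketch; Example 5 below shows this combination in full:

             example# lockstat -kgIW -D 20 sleep 5
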
       -k

           Coalesce PCs within functions.

       -o filename

           Direct output to filename.

       -P

           Sort data by (count * time) product.

       -p

           Parsable output format.

       -R

           Display rates (events per second) rather than counts.

       -W

           Whichever: distinguish events only by caller, not by lock.

       -w

           Wherever: distinguish events only by lock, not by caller.

DISPLAY FORMATS
       The following headers appear over various columns of data.

       Count or ops/s

           Number of times this event occurred, or the rate (times per
           second) if -R was specified.

       indv

           Percentage of all events represented by this individual event.

       genr

           Percentage of all events generated by this function.

       cuml

           Cumulative percentage; a running total of the individuals.

       rcnt

           Average reference count. This will always be 1 for exclusive
           locks (mutexes, spin locks, rwlocks held as writer) but can be
           greater than 1 for shared locks (rwlocks held as reader).

       nsec

           Average duration of the events in nanoseconds, as appropriate for
           the event. For the profiling event, duration means interrupt
           latency.

       Lock

           Address of the lock; displayed symbolically if possible.

       CPU+PIL

           CPU plus processor interrupt level (PIL). For example, if CPU 4
           is interrupted while at PIL 6, this will be reported as cpu[4]+6.

       Caller

           Address of the caller; displayed symbolically if possible.

EXAMPLES
       Example 1 Measuring Kernel Lock Contention

         example# lockstat sleep 5
         Adaptive mutex spin: 2210 events in 5.055 seconds (437 events/sec)

         Count indv cuml rcnt     nsec Lock           Caller
         ------------------------------------------------------------------------
           269  12%  12% 1.00     2160 service_queue  background+0xdc
           249  11%  23% 1.00       86 service_queue  qenable_locked+0x64
           228  10%  34% 1.00      131 service_queue  background+0x15c
            68   3%  37% 1.00       79 0x30000024070  untimeout+0x1c
            59   3%  40% 1.00      384 0x300066fa8e0  background+0xb0
            43   2%  41% 1.00       30 rqcred_lock    svc_getreq+0x3c
            42   2%  43% 1.00      341 0x30006834eb8  background+0xb0
            41   2%  45% 1.00      135 0x30000021058  untimeout+0x1c
            40   2%  47% 1.00       39 rqcred_lock    svc_getreq+0x260
            37   2%  49% 1.00     2372 0x300068e83d0  hmestart+0x1c4
            36   2%  50% 1.00       77 0x30000021058  timeout_common+0x4
            36   2%  52% 1.00      354 0x300066fa120  background+0xb0
            32   1%  53% 1.00       97 0x30000024070  timeout_common+0x4
            31   1%  55% 1.00     2923 0x300069883d0  hmestart+0x1c4
            29   1%  56% 1.00      366 0x300066fb290  background+0xb0
            28   1%  57% 1.00      117 0x3000001e040  untimeout+0x1c
            25   1%  59% 1.00       93 0x3000001e040  timeout_common+0x4
            22   1%  60% 1.00       25 0x30005161110  sync_stream_buf+0xdc
            21   1%  60% 1.00      291 0x30006834eb8  putq+0xa4
            19   1%  61% 1.00       43 0x3000515dcb0  mdf_alloc+0xc
            18   1%  62% 1.00      456 0x30006834eb8  qenable+0x8
            18   1%  63% 1.00       61 service_queue  queuerun+0x168
            17   1%  64% 1.00      268 0x30005418ee8  vmem_free+0x3c
         [...]

         R/W reader blocked by writer: 76 events in 5.055 seconds (15 events/sec)

         Count indv cuml rcnt     nsec Lock           Caller
         ------------------------------------------------------------------------
            23  30%  30% 1.00 22590137 0x300098ba358  ufs_dirlook+0xd0
            17  22%  53% 1.00  5820995 0x3000ad815e8  find_bp+0x10
            13  17%  70% 1.00  2639918 0x300098ba360  ufs_iget+0x198
             4   5%  75% 1.00  3193015 0x300098ba360  ufs_getattr+0x54
             3   4%  79% 1.00  7953418 0x3000ad817c0  find_bp+0x10
             3   4%  83% 1.00   935211 0x3000ad815e8  find_read_lof+0x14
             2   3%  86% 1.00 16357310 0x300073a4720  find_bp+0x10
             2   3%  88% 1.00  2072433 0x300073a4720  find_read_lof+0x14
             2   3%  91% 1.00  1606153 0x300073a4370  find_bp+0x10
             1   1%  92% 1.00  2656909 0x300107e7400  ufs_iget+0x198
         [...]

       Example 2 Measuring Hold Times

         example# lockstat -H -D 10 sleep 1
         Adaptive mutex spin: 513 events

         Count indv cuml rcnt     nsec Lock              Caller
         -------------------------------------------------------------------------
           480   5%   5% 1.00     1136 0x300007718e8     putnext+0x40
           286   3%   9% 1.00      666 0x3000077b430     getf+0xd8
           271   3%  12% 1.00      537 0x3000077b430     msgio32+0x2fc
           270   3%  15% 1.00     3670 0x300007718e8     strgetmsg+0x3d4
           270   3%  18% 1.00     1016 0x300007c38b0     getq_noenab+0x200
           264   3%  20% 1.00     1649 0x300007718e8     strgetmsg+0xa70
           216   2%  23% 1.00     6251 tcp_mi_lock       tcp_snmp_get+0xfc
           206   2%  25% 1.00      602 thread_free_lock  clock+0x250
           138   2%  27% 1.00      485 0x300007c3998     putnext+0xb8
           138   2%  28% 1.00     3706 0x300007718e8     strrput+0x5b8
         -------------------------------------------------------------------------
         [...]

       Example 3 Measuring Hold Times for Stack Traces Containing a Specific
       Function

         example# lockstat -H -f tcp_rput_data -s 50 -D 10 sleep 1
         Adaptive mutex spin: 11 events in 1.023 seconds (11 events/sec)

         -------------------------------------------------------------------------
         Count indv cuml rcnt     nsec Lock                   Caller
             9  82%  82% 1.00     2540 0x30000031380          tcp_rput_data+0x2b90

           nsec ------ Time Distribution ------ count     Stack
            256 |@@@@@@@@@@@@@@@@                   5     tcp_rput_data+0x2b90
            512 |@@@@@@                             2     putnext+0x78
           1024 |@@@                                1     ip_rput+0xec4
           2048 |                                   0     _c_putnext+0x148
           4096 |                                   0     hmeread+0x31c
           8192 |                                   0     hmeintr+0x36c
          16384 |@@@                                1     sbus_intr_wrapper+0x30
         [...]

         Count indv cuml rcnt     nsec Lock                   Caller
             1   9%  91% 1.00     1036 0x30000055380          freemsg+0x44

           nsec ------ Time Distribution ------ count     Stack
           1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     1     freemsg+0x44
                                                          tcp_rput_data+0x2fd0
                                                          putnext+0x78
                                                          ip_rput+0xec4
                                                          _c_putnext+0x148
                                                          hmeread+0x31c
                                                          hmeintr+0x36c
                                                          sbus_intr_wrapper+0x30
         -------------------------------------------------------------------------
         [...]

       Example 4 Basic Kernel Profiling

       For basic profiling, we don't care whether the profiling interrupt
       sampled foo()+0x4c or foo()+0x78; we care only that it sampled
       somewhere in foo(), so we use -k. The CPU and PIL aren't relevant to
       basic profiling because we are measuring the system as a whole, not a
       particular CPU or interrupt level, so we use -W.

         example# lockstat -kIW -D 20 ./polltest
         Profiling interrupt: 82 events in 0.424 seconds (194 events/sec)

         Count indv cuml rcnt     nsec Hottest CPU+PIL Caller
         -----------------------------------------------------------------------
             8  10%  10% 1.00      698 cpu[1]          utl0
             6   7%  17% 1.00      299 cpu[0]          read
             5   6%  23% 1.00      124 cpu[1]          getf
             4   5%  28% 1.00      327 cpu[0]          fifo_read
             4   5%  33% 1.00      112 cpu[1]          poll
             4   5%  38% 1.00      212 cpu[1]          uiomove
             4   5%  43% 1.00      361 cpu[1]          mutex_tryenter
             3   4%  46% 1.00      682 cpu[0]          write
             3   4%  50% 1.00       89 cpu[0]          pcache_poll
             3   4%  54% 1.00      118 cpu[1]          set_active_fd
             3   4%  57% 1.00      105 cpu[0]          syscall_trap32
             3   4%  61% 1.00      640 cpu[1]          (usermode)
             2   2%  63% 1.00      127 cpu[1]          fifo_poll
             2   2%  66% 1.00      300 cpu[1]          fifo_write
             2   2%  68% 1.00      669 cpu[0]          releasef
             2   2%  71% 1.00      112 cpu[1]          bt_getlowbit
             2   2%  73% 1.00      247 cpu[1]          splx
             2   2%  76% 1.00      503 cpu[0]          mutex_enter
             2   2%  78% 1.00      467 cpu[0]+10       disp_lock_enter
             2   2%  80% 1.00      139 cpu[1]          default_copyin
         -----------------------------------------------------------------------
         [...]

       Example 5 Generated-load Profiling

       In the example above, 5% of the samples were in poll(). This tells us
       how much time was spent inside poll() itself, but tells us nothing
       about how much work was generated by poll(); that is, how much time
       we spent in functions called by poll(). To determine that, we use the
       -g option. The example below shows that although polltest spends only
       5% of its time in poll() itself, poll()-induced work accounts for 34%
       of the load.

       Note that the functions that generate the profiling interrupt
       (lockstat_intr(), cyclic_fire(), and so forth) appear in every stack
       trace, and therefore are considered to have generated 100% of the
       load. This illustrates an important point: the generated load
       percentages do not add up to 100% because they are not independent.
       If 72% of all stack traces contain both foo() and bar(), then both
       foo() and bar() are 72% load generators.

         example# lockstat -kgIW -D 20 ./polltest
         Profiling interrupt: 80 events in 0.412 seconds (194 events/sec)

         Count genr cuml rcnt     nsec Hottest CPU+PIL Caller
         -------------------------------------------------------------------------
            80 100% ---- 1.00      310 cpu[1]          lockstat_intr
            80 100% ---- 1.00      310 cpu[1]          cyclic_fire
            80 100% ---- 1.00      310 cpu[1]          cbe_level14
            80 100% ---- 1.00      310 cpu[1]          current_thread
            27  34% ---- 1.00      176 cpu[1]          poll
            20  25% ---- 1.00      221 cpu[0]          write
            19  24% ---- 1.00      249 cpu[1]          read
            17  21% ---- 1.00      232 cpu[0]          write32
            17  21% ---- 1.00      207 cpu[1]          pcache_poll
            14  18% ---- 1.00      319 cpu[0]          fifo_write
            13  16% ---- 1.00      214 cpu[1]          read32
            10  12% ---- 1.00      208 cpu[1]          fifo_read
            10  12% ---- 1.00      787 cpu[1]          utl0
             9  11% ---- 1.00      178 cpu[0]          pcacheset_resolve
             9  11% ---- 1.00      262 cpu[0]          uiomove
             7   9% ---- 1.00      506 cpu[1]          (usermode)
             5   6% ---- 1.00      195 cpu[1]          fifo_poll
             5   6% ---- 1.00      136 cpu[1]          syscall_trap32
             4   5% ---- 1.00      139 cpu[0]          releasef
             3   4% ---- 1.00      277 cpu[1]          polllock
         -------------------------------------------------------------------------
         [...]

       Example 6 Gathering Lock Contention and Profiling Data for a Specific
       Module

       In this example we use the -f option not to specify a single
       function, but rather to specify the entire text space of the sbus
       module. We gather both lock contention and profiling statistics so
       that contention can be correlated with overall load on the module.

         example# modinfo | grep sbus
          24 102a8b6f   b8b4  59   1  sbus (SBus (sysio) nexus driver)

         example# lockstat -kICE -f 0x102a8b6f,0xb8b4 sleep 10
         Adaptive mutex spin: 39 events in 10.042 seconds (4 events/sec)

         Count indv cuml rcnt     nsec Lock           Caller
         -------------------------------------------------------------------------
            15  38%  38% 1.00      206 0x30005160528  sync_stream_buf
             7  18%  56% 1.00       14 0x30005160d18  sync_stream_buf
             6  15%  72% 1.00       27 0x300060c3118  sync_stream_buf
             5  13%  85% 1.00       24 0x300060c3510  sync_stream_buf
             2   5%  90% 1.00       29 0x300060c2d20  sync_stream_buf
             2   5%  95% 1.00       24 0x30005161cf8  sync_stream_buf
             1   3%  97% 1.00       21 0x30005161110  sync_stream_buf
             1   3% 100% 1.00       23 0x30005160130  sync_stream_buf
         [...]

         Adaptive mutex block: 9 events in 10.042 seconds (1 events/sec)

         Count indv cuml rcnt     nsec Lock           Caller
         -------------------------------------------------------------------------
             4  44%  44% 1.00   156539 0x30005160528  sync_stream_buf
             2  22%  67% 1.00   763516 0x30005160d18  sync_stream_buf
             1  11%  78% 1.00   462130 0x300060c3510  sync_stream_buf
             1  11%  89% 1.00   288749 0x30005161110  sync_stream_buf
             1  11% 100% 1.00  1015374 0x30005160130  sync_stream_buf
         [...]

         Profiling interrupt: 229 events in 10.042 seconds (23 events/sec)

         Count indv cuml rcnt     nsec Hottest CPU+PIL Caller
         -------------------------------------------------------------------------
            89  39%  39% 1.00      426 cpu[0]+6        sync_stream_buf
            64  28%  67% 1.00      398 cpu[0]+6        sbus_intr_wrapper
            23  10%  77% 1.00      324 cpu[0]+6        iommu_dvma_kaddr_load
            21   9%  86% 1.00      512 cpu[0]+6        iommu_tlb_flush
            14   6%  92% 1.00      342 cpu[0]+6        iommu_dvma_unload
            13   6%  98% 1.00      306 cpu[1]          iommu_dvma_sync
             5   2% 100% 1.00      389 cpu[1]          iommu_dma_bindhdl
         -------------------------------------------------------------------------
         [...]

       Example 7 Determining the Average PIL (processor interrupt level) for
       a CPU

         example# lockstat -Iw -l cpu[3] ./testprog

         Profiling interrupt: 14791 events in 152.463 seconds (97 events/sec)

         Count indv cuml rcnt     nsec CPU+PIL     Hottest Caller
         -----------------------------------------------------------------------
         13641  92%  92% 1.00      253 cpu[3]      (usermode)
           579   4%  96% 1.00      325 cpu[3]+6    ip_ocsum+0xe8
           375   3%  99% 1.00      411 cpu[3]+10   splx
           154   1% 100% 1.00      527 cpu[3]+4    fas_intr_svc+0x80
            41   0% 100% 1.00      293 cpu[3]+13   send_mondo+0x18
             1   0% 100% 1.00      266 cpu[3]+12   zsa_rxint+0x400
         -----------------------------------------------------------------------
         [...]

       Example 8 Determining which Subsystem is Causing the System to be
       Busy

         example# lockstat -s 10 -I sleep 20

         Profiling interrupt: 4863 events in 47.375 seconds (103 events/sec)

         Count indv cuml rcnt     nsec CPU+PIL Caller
         -----------------------------------------------------------------------
          1929  40%  40% 0.00     3215 cpu[0]  usec_delay+0x78
           nsec ------ Time Distribution ------ count     Stack
           4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   1872     ata_wait+0x90
           8192 |                                  27     acersb_get_intr_status+0x34
          16384 |                                  29     ata_set_feature+0x124
          32768 |                                   1     ata_disk_start+0x15c
                                                          ata_hba_start+0xbc
                                                          ghd_waitq_process_and_mutex_hold+0x70
                                                          ghd_waitq_process_and_mutex_exit+0x4
                                                          ghd_transport+0x12c
                                                          ata_disk_tran_start+0x108
         -----------------------------------------------------------------------
         [...]

ATTRIBUTES
       See attributes(5) for descriptions of the following attributes:

       ┌─────────────────────────────┬─────────────────────────────┐
       │       ATTRIBUTE TYPE        │       ATTRIBUTE VALUE       │
       ├─────────────────────────────┼─────────────────────────────┤
       │Availability                 │SUNWdtrc                     │
       └─────────────────────────────┴─────────────────────────────┘

SEE ALSO
       dtrace(1M), plockstat(1M), attributes(5), lockstat(7D), mutex(9F),
       rwlock(9F)

       Solaris Dynamic Tracing Guide

NOTES
       The profiling support provided by lockstat -I replaces the old (and
       undocumented) /usr/bin/kgmon and /dev/profile.

       Tail-call elimination can affect call sites. For example, if
       foo()+0x50 calls bar() and the last thing bar() does is call
       mutex_exit(), the compiler can arrange for bar() to branch to
       mutex_exit() with a return address of foo()+0x58. Thus, the
       mutex_exit() in bar() will appear as though it occurred at
       foo()+0x58.

       The PC in the stack frame in which an interrupt occurs can be bogus
       because, between function calls, the compiler is free to use the
       return address register for local storage.

       When using the -I and -s options together, the interrupted PC will
       usually not appear anywhere in the stack, since the interrupt
       handler is entered asynchronously, not by a function call from that
       PC.

       The lockstat technology is provided on an as-is basis. The format
       and content of lockstat output reflect the current Solaris kernel
       implementation and are therefore subject to change in future
       releases.

SunOS 5.11                        28 Feb 2008                      lockstat(1M)