cpc_bind_curlwp(3CPC)  CPU Performance Counters Library Functions  cpc_bind_curlwp(3CPC)

NAME
       cpc_bind_curlwp, cpc_bind_pctx, cpc_bind_cpu, cpc_unbind,
       cpc_request_preset, cpc_set_restart - bind request sets to hardware
       counters

SYNOPSIS
       cc [ flag... ] file... -lcpc [ library... ]
       #include <libcpc.h>

       int cpc_bind_curlwp(cpc_t *cpc, cpc_set_t *set, uint_t flags);

       int cpc_bind_pctx(cpc_t *cpc, pctx_t *pctx, id_t id, cpc_set_t *set,
            uint_t flags);

       int cpc_bind_cpu(cpc_t *cpc, processorid_t id, cpc_set_t *set,
            uint_t flags);

       int cpc_unbind(cpc_t *cpc, cpc_set_t *set);

       int cpc_request_preset(cpc_t *cpc, int index, uint64_t preset);

       int cpc_set_restart(cpc_t *cpc, cpc_set_t *set);

DESCRIPTION
       These functions program the processor's hardware counters according to
       the requests contained in the set argument. If these functions are
       successful, then upon return the physical counters will have been
       assigned to count events on behalf of each request in the set, and
       each counter will be enabled as configured.

       The cpc_bind_curlwp() function binds the set to the calling LWP. If
       successful, a performance counter context is associated with the LWP
       that allows the system to virtualize the hardware counters to that
       specific LWP.

       By default, the system binds the set to the current LWP only. If the
       CPC_BIND_LWP_INHERIT flag is present in the flags argument, however,
       any subsequent LWPs created by the current LWP will inherit a copy of
       the request set. The newly created LWP will have its virtualized
       64-bit counters initialized to the preset values specified in set, and
       the counters will be enabled and begin counting events on behalf of
       the new LWP. This automatic inheritance behavior can be useful when
       dealing with multithreaded programs to determine aggregate statistics
       for the program as a whole.

       If the CPC_BIND_LWP_INHERIT flag is specified and any of the requests
       in the set have the CPC_OVF_NOTIFY_EMT flag set, the process will
       immediately dispatch a SIGEMT signal to the freshly created LWP so
       that it can preset its counters appropriately on the new LWP. This
       initialization condition can be detected using cpc_set_sample(3CPC)
       and looking at the counter value for any requests with
       CPC_OVF_NOTIFY_EMT set. The value of any such counters will be
       UINT64_MAX.

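       The initialization check described above can be sketched as a small
       helper. This is only an illustration and not part of libcpc: the
       function name is hypothetical, and it assumes the SIGEMT handler has
       already obtained the counter value, for example with
       cpc_set_sample(3CPC) followed by cpc_buf_get(3CPC).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libcpc function). Returns true when the
 * sampled value of a CPC_OVF_NOTIFY_EMT request equals UINT64_MAX, the
 * value this page gives for the initial SIGEMT delivered to an LWP that
 * inherited the set through CPC_BIND_LWP_INHERIT.
 */
static bool
is_inherit_init_signal(uint64_t sampled_value)
{
	return (sampled_value == UINT64_MAX);
}
```

       A handler could call this helper first and, when it returns true,
       apply its presets and restart the counters instead of recording an
       overflow sample.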

       The cpc_bind_pctx() function binds the set to the LWP specified by the
       pctx-id pair, where pctx refers to a handle returned from libpctx and
       id is the ID of the desired LWP in the target process. If successful,
       a performance counter context is associated with the specified LWP and
       the system virtualizes the hardware counters to that specific LWP. The
       flags argument is reserved for future use and must always be 0.

       The cpc_bind_cpu() function binds the set to the specified CPU and
       measures events occurring on that CPU regardless of which LWP is
       running. Only one such binding can be active on the specified CPU at a
       time. As long as any application has bound a set to a CPU, per-LWP
       counters are unavailable and any attempt to use either
       cpc_bind_curlwp() or cpc_bind_pctx() returns EAGAIN. The first
       invocation of cpc_bind_cpu() invalidates all currently bound per-LWP
       counter sets, and any attempt to sample an invalidated set returns
       EAGAIN. To bind to a CPU, the library binds the calling LWP to the
       measured CPU with processor_bind(2). The application must not change
       its processor binding until after it has unbound the set with
       cpc_unbind(). The flags argument is reserved for future use and must
       always be 0.

       The cpc_request_preset() function updates the preset and current value
       stored in the indexed request within the currently bound set. This
       changes the starting value for the specified request for the calling
       LWP only. The change takes effect at the next call to
       cpc_set_restart().

       When a performance counter counting on behalf of a request with the
       CPC_OVF_NOTIFY_EMT flag set overflows, the performance counters are
       frozen and the LWP to which the set is bound receives a SIGEMT signal.
       The cpc_set_restart() function can be called from a SIGEMT signal
       handler function to quickly restart the hardware counters. Counting
       begins from each request's original preset (see
       cpc_set_add_request(3CPC)), or from the preset specified in a prior
       call to cpc_request_preset(). Applications performing performance
       counter overflow profiling should use the cpc_set_restart() function
       to quickly restart counting after receiving a SIGEMT overflow signal
       and recording any relevant program state.

       The cpc_unbind() function unbinds the set from the resource to which
       it is bound. All hardware resources associated with the bound set are
       freed. If the set was bound to a CPU, the calling LWP is also unbound
       from the corresponding CPU. See processor_bind(2).

RETURN VALUES
       Upon successful completion, these functions return 0. Otherwise, -1 is
       returned and errno is set to indicate the error.

ERRORS
       Applications wanting to get detailed error values should register an
       error handler with cpc_seterrhndlr(3CPC). Otherwise, the library will
       output a specific error description to stderr.

       These functions will fail if:

       EACCES     For cpc_bind_curlwp(), the system has Pentium 4 processors
                  with HyperThreading and at least one physical processor has
                  more than one hardware thread online. See NOTES.

                  For cpc_bind_cpu(), the process does not have the cpc_cpu
                  privilege to access the CPU's counters.

                  For cpc_bind_curlwp(), cpc_bind_cpu(), and cpc_bind_pctx(),
                  access to the requested hypervisor event was denied.

       EAGAIN     For cpc_bind_curlwp() and cpc_bind_pctx(), the performance
                  counters are not available for use by the application.

                  For cpc_bind_cpu(), another process has already bound to
                  this CPU. Only one process is allowed to bind to a CPU at a
                  time and only one set can be bound to a CPU at a time.

       EINVAL     The set does not contain any requests or
                  cpc_set_add_request() was not called.

                  The value given for an attribute of a request is out of
                  range.

                  The system could not assign a physical counter to each
                  request in the set. See NOTES.

                  One or more requests in the set conflict and might not be
                  programmed simultaneously.

                  The set was not created with the same cpc handle.

                  For cpc_bind_cpu(), the specified processor does not exist.

                  For cpc_unbind(), the set is not bound.

                  For cpc_request_preset() and cpc_set_restart(), the calling
                  LWP does not have a bound set.

       ENOSYS     For cpc_bind_cpu(), the specified processor is not online.

       ENOTSUP    The cpc_bind_curlwp() function was called with the
                  CPC_OVF_NOTIFY_EMT flag, but the underlying processor is
                  not capable of detecting counter overflow.

       ESRCH      For cpc_bind_pctx(), the specified LWP in the target
                  process does not exist.

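       An application that does not register an error handler might still
       want to turn the errno values listed above into a short hint. The
       following is a minimal sketch under that assumption. The function name
       and the message strings are hypothetical and are not part of libcpc;
       only the errno constants come from this page.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libcpc): maps the errno values listed
 * in ERRORS to a one-line hint. Anything else falls back to strerror().
 */
static const char *
cpc_bind_hint(int err)
{
	switch (err) {
	case EACCES:
		return ("access denied: check privileges or HyperThreading");
	case EAGAIN:
		return ("counters unavailable or CPU already bound");
	case EINVAL:
		return ("invalid set, request, attribute, or processor");
	case ENOSYS:
		return ("processor is not online");
	case ENOTSUP:
		return ("processor cannot detect counter overflow");
	case ESRCH:
		return ("target LWP does not exist");
	default:
		return (strerror(err));
	}
}
```

       The hint is meant to be printed next to the failing call, for example
       after a bind function returns -1, using the errno value saved
       immediately after the call.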

EXAMPLES
       Example 1 Use hardware performance counters to measure events in a
       process.

       The following example demonstrates how a standalone application can be
       instrumented with the libcpc(3LIB) functions to use hardware
       performance counters to measure events in a process. The application
       performs 20 iterations of a computation, measuring the counter values
       for each iteration. By default, the example makes use of two counters
       to measure external cache references and external cache hits. These
       options are only appropriate for UltraSPARC processors. By setting the
       EVENT0 and EVENT1 environment variables to other strings (a list of
       which can be obtained from the -h option of the cpustat(1M) or
       cputrack(1) utilities), other events can be counted. The error()
       routine is assumed to be a user-provided routine analogous to the
       familiar printf(3C) function from the C library that also performs an
       exit(2) after printing the message.

       #include <inttypes.h>
       #include <stdlib.h>
       #include <stdio.h>
       #include <string.h>
       #include <unistd.h>
       #include <libcpc.h>
       #include <errno.h>

       int
       main(int argc, char *argv[])
       {
           int iter;
           char *event0 = NULL, *event1 = NULL;
           cpc_t *cpc;
           cpc_set_t *set;
           cpc_buf_t *diff, *after, *before;
           int ind0, ind1;
           uint64_t val0, val1;

           if ((cpc = cpc_open(CPC_VER_CURRENT)) == NULL)
               error("perf counters unavailable: %s", strerror(errno));

           /* Default events are UltraSPARC names; override via environment. */
           if ((event0 = getenv("EVENT0")) == NULL)
               event0 = "EC_ref";
           if ((event1 = getenv("EVENT1")) == NULL)
               event1 = "EC_hit";

           if ((set = cpc_set_create(cpc)) == NULL)
               error("could not create set: %s", strerror(errno));

           if ((ind0 = cpc_set_add_request(cpc, set, event0, 0,
               CPC_COUNT_USER, 0, NULL)) == -1)
               error("could not add first request: %s", strerror(errno));

           if ((ind1 = cpc_set_add_request(cpc, set, event1, 0,
               CPC_COUNT_USER, 0, NULL)) == -1)
               error("could not add second request: %s", strerror(errno));

           if ((diff = cpc_buf_create(cpc, set)) == NULL)
               error("could not create buffer: %s", strerror(errno));
           if ((after = cpc_buf_create(cpc, set)) == NULL)
               error("could not create buffer: %s", strerror(errno));
           if ((before = cpc_buf_create(cpc, set)) == NULL)
               error("could not create buffer: %s", strerror(errno));

           if (cpc_bind_curlwp(cpc, set, 0) == -1)
               error("cannot bind lwp%d: %s", _lwp_self(), strerror(errno));

           for (iter = 1; iter <= 20; iter++) {

               if (cpc_set_sample(cpc, set, before) == -1)
                   break;

               /* ==> Computation to be measured goes here <== */

               if (cpc_set_sample(cpc, set, after) == -1)
                   break;

               /* diff = after - before, then read both request values. */
               cpc_buf_sub(cpc, diff, after, before);
               cpc_buf_get(cpc, diff, ind0, &val0);
               cpc_buf_get(cpc, diff, ind1, &val1);

               (void) printf("%3d: %" PRIu64 " %" PRIu64 "\n", iter,
                   val0, val1);
           }

           /* The loop exits early only when sampling failed. */
           if (iter != 21)
               error("cannot sample set: %s", strerror(errno));

           cpc_close(cpc);

           return (0);
       }

       Example 2 Write a signal handler to catch overflow signals.

       The following example builds on Example 1 and demonstrates how to
       write the signal handler to catch overflow signals. A counter is
       preset so that it is 1000 counts short of overflowing. After 1000
       counts the signal handler is invoked.

       The signal handler:

       cpc_t *cpc;
       cpc_set_t *set;
       cpc_buf_t *buf;
       int index;

       void
       emt_handler(int sig, siginfo_t *sip, void *arg)
       {
           ucontext_t *uap = arg;
           uint64_t val;

           /* Ignore anything that is not a CPC overflow notification. */
           if (sig != SIGEMT || sip->si_code != EMT_CPCOVF) {
               psignal(sig, "example");
               psiginfo(sip, "example");
               return;
           }

           (void) printf("lwp%d - si_addr %p ucontext: %%pc %p %%sp %p\n",
               _lwp_self(), (void *)sip->si_addr,
               (void *)uap->uc_mcontext.gregs[PC],
               (void *)uap->uc_mcontext.gregs[SP]);

           if (cpc_set_sample(cpc, set, buf) != 0)
               error("cannot sample: %s", strerror(errno));

           cpc_buf_get(cpc, buf, index, &val);

           (void) printf("0x%" PRIx64 "\n", val);
           (void) fflush(stdout);

           /*
            * Update a request's preset and restart the counters. Counters
            * which have not been preset with cpc_request_preset() will
            * resume counting from their current value.
            *
            * ind1 and val1 are not declared in this fragment; they are
            * assumed to be file-scope variables holding the index of the
            * request to update and its new preset.
            */
           if (cpc_request_preset(cpc, ind1, val1) != 0)
               error("cannot set preset for request %d: %s", ind1,
                   strerror(errno));
           if (cpc_set_restart(cpc, set) != 0)
               error("cannot restart lwp%d: %s", _lwp_self(),
                   strerror(errno));
       }

       The setup code, which can be positioned after the code that opens the
       CPC library and creates a set:

       #define PRESET (UINT64_MAX - 999ull)

           struct sigaction act;
           ...
           act.sa_sigaction = emt_handler;
           bzero(&act.sa_mask, sizeof (act.sa_mask));
           act.sa_flags = SA_RESTART|SA_SIGINFO;
           if (sigaction(SIGEMT, &act, NULL) == -1)
               error("sigaction: %s", strerror(errno));

           /* cpc_set_add_request() returns the request index, or -1. */
           if ((index = cpc_set_add_request(cpc, set, event, PRESET,
               CPC_COUNT_USER | CPC_OVF_NOTIFY_EMT, 0, NULL)) == -1)
               error("cannot add request to set: %s", strerror(errno));

           if ((buf = cpc_buf_create(cpc, set)) == NULL)
               error("cannot create buffer: %s", strerror(errno));

           if (cpc_bind_curlwp(cpc, set, 0) == -1)
               error("cannot bind lwp%d: %s", _lwp_self(), strerror(errno));

           for (iter = 1; iter <= 20; iter++) {
               /* ==> Computation to be measured goes here <== */
           }

           cpc_unbind(cpc, set);      /* done */

ATTRIBUTES
       See attributes(5) for descriptions of the following attributes:

       ┌─────────────────────────────┬─────────────────────────────┐
       │       ATTRIBUTE TYPE        │       ATTRIBUTE VALUE       │
       ├─────────────────────────────┼─────────────────────────────┤
       │Interface Stability          │Evolving                     │
       ├─────────────────────────────┼─────────────────────────────┤
       │MT-Level                     │Safe                         │
       └─────────────────────────────┴─────────────────────────────┘

SEE ALSO
       cpustat(1M), cputrack(1), psrinfo(1M), processor_bind(2),
       cpc_seterrhndlr(3CPC), cpc_set_sample(3CPC), libcpc(3LIB),
       attributes(5)

NOTES
       When a set is bound, the system assigns a physical hardware counter to
       count on behalf of each request in the set. If such an assignment is
       not possible for all requests in the set, the bind function returns -1
       and sets errno to EINVAL. The assignment of requests to counters
       depends on the capabilities of the available counters. Some processors
       (such as Pentium 4) have a complicated counter control mechanism that
       requires the reservation of limited hardware resources beyond the
       actual counters. Because of these limited resources, two requests for
       different events might be impossible to count at the same time. See
       the processor manual as referenced by cpc_cpuref(3CPC) for details
       about the underlying processor's capabilities and limitations.

       Some processors can be configured to dispatch an interrupt when a
       physical counter overflows. The most obvious use for this facility is
       to ensure that the full 64-bit counter values are maintained without
       repeated sampling. Certain hardware, such as the UltraSPARC processor,
       does not record which counter overflowed. A more subtle use for this
       facility is to preset the counter to a value slightly less than the
       maximum value, then use the resulting interrupt to catch the counter
       overflow associated with that event. The overflow can then be used as
       an indication of the frequency of the occurrence of that event.

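       The preset arithmetic behind this technique can be sketched in a few
       lines. The helper names are hypothetical and not part of libcpc. The
       first function generalizes the PRESET macro from Example 2
       (UINT64_MAX - 999 overflows after 1000 events). The second is only a
       rough estimate that assumes the same interval was re-armed after every
       overflow.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: preset that makes a 64-bit virtualized counter
 * overflow after exactly `interval` events (interval must be >= 1).
 */
static uint64_t
preset_for_interval(uint64_t interval)
{
	assert(interval >= 1);
	return (UINT64_MAX - (interval - 1));
}

/*
 * Hypothetical helper: approximate number of events implied by
 * `overflows` SIGEMT deliveries when each one was armed with the
 * same interval.
 */
static uint64_t
estimated_events(uint64_t overflows, uint64_t interval)
{
	return (overflows * interval);
}
```

       For example, preset_for_interval(1000) yields the same value as the
       PRESET macro in Example 2.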

       The interrupt generated by the processor might not be particularly
       precise. That is, the particular instruction that caused the counter
       overflow might be earlier in the instruction stream than is indicated
       by the program counter value in the ucontext.

       When a request is added to a set with the CPC_OVF_NOTIFY_EMT flag set,
       then as before, the control registers and counter are preset from the
       64-bit preset value given. When the flag is set, however, the kernel
       arranges to send the calling process a SIGEMT signal when the overflow
       occurs. The si_code member of the corresponding siginfo structure is
       set to EMT_CPCOVF and the si_addr member takes the program counter
       value at the time the overflow interrupt was delivered. Counting is
       disabled until the set is bound again.

       If the CPC_CAP_OVERFLOW_PRECISE bit is set in the value returned by
       cpc_caps(3CPC), the processor is able to determine precisely which
       counter has overflowed after receiving the overflow interrupt. On such
       processors, the SIGEMT signal is sent only if a counter overflows and
       the request that the counter is counting has the CPC_OVF_NOTIFY_EMT
       flag set. If the capability is not present on the processor, the
       system sends a SIGEMT signal to the process if any of its requests
       have the CPC_OVF_NOTIFY_EMT flag set and any counter in its set
       overflows.

       Different processors have different counter ranges available, though
       all processors supported by Solaris allow at least 31 bits to be
       specified as a counter preset value. Portable preset values lie in the
       range UINT64_MAX to UINT64_MAX-INT32_MAX.

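       The portable range stated above can be checked before a request is
       added to a set. The following is a minimal sketch. The function name
       is hypothetical and not part of libcpc, and the bounds are taken
       directly from the range given in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when `preset` lies in the portable
 * range UINT64_MAX-INT32_MAX .. UINT64_MAX described in NOTES.
 */
static bool
preset_is_portable(uint64_t preset)
{
	return (preset >= UINT64_MAX - (uint64_t)INT32_MAX);
}
```

       The preset used in Example 2, UINT64_MAX - 999, falls inside this
       range.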

       The appropriate preset value will often need to be determined
       experimentally. Typically, this value will depend on the event being
       measured as well as the desire to minimize the impact of the act of
       measurement on the event being measured. Less frequent interrupts and
       samples lead to less perturbation of the system.

       If the processor cannot detect counter overflow, the bind fails with
       errno set to ENOTSUP. Only user events can be measured using this
       technique. See Example 2.

   Pentium 4
       Most Pentium 4 events require the specification of an event mask for
       counting. The event mask is specified with the emask attribute.

       Pentium 4 processors with HyperThreading Technology have only one set
       of hardware counters per physical processor. To use cpc_bind_curlwp()
       or cpc_bind_pctx() to measure per-LWP events on a system with Pentium
       4 HT processors, a system administrator must first take processors in
       the system offline until each physical processor has only one hardware
       thread online (see the -p option to psrinfo(1M)). If a second hardware
       thread is brought online, all per-LWP bound contexts will be
       invalidated and any attempt to sample or bind a CPC set will return
       EAGAIN.

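       The precondition above amounts to a simple check over the physical
       processors. The sketch below is illustrative only. The function name
       and the input array are hypothetical. On a real system the per-chip
       thread counts would have to be gathered separately, for example from
       the output of psrinfo -p, and libcpc performs its own check when
       binding.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: threads_online[i] is the number of hardware
 * threads currently online on physical processor i. Per-LWP counters on
 * Pentium 4 HT systems are usable only when no physical processor has
 * more than one hardware thread online.
 */
static bool
per_lwp_counters_usable(const unsigned *threads_online, size_t nchips)
{
	for (size_t i = 0; i < nchips; i++) {
		if (threads_online[i] > 1)
			return (false);
	}
	return (true);
}
```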

       Only one CPC set at a time can be bound to a physical processor with
       cpc_bind_cpu(). Any call to cpc_bind_cpu() that attempts to bind a set
       to a processor that shares a physical processor with a processor that
       already has a CPU-bound set returns an error.

       To measure the shared state on a Pentium 4 processor with
       HyperThreading, the count_sibling_usr and count_sibling_sys attributes
       are provided for use with cpc_bind_cpu(). These attributes behave
       exactly as the CPC_COUNT_USER and CPC_COUNT_SYSTEM request flags,
       except that they act on the sibling hardware thread sharing the
       physical processor with the CPU measured by cpc_bind_cpu(). Some CPC
       sets will fail to bind due to resource constraints. The most common
       type of resource constraint is an ESCR conflict between requests in
       the set. For example, the branch_retired event cannot be measured on
       counters 12 and 13 simultaneously because both counters require the
       CRU_ESCR2 ESCR to measure this event. To measure branch_retired events
       simultaneously on more than one counter, use counters such that one
       counter uses CRU_ESCR2 and the other counter uses CRU_ESCR3. See the
       processor documentation for details.

SunOS 5.11                        05 Mar 2007            cpc_bind_curlwp(3CPC)