cpc_bind_curlwp(3CPC)   CPU Performance Counters Library Functions   cpc_bind_curlwp(3CPC)

NAME

       cpc_bind_curlwp, cpc_bind_pctx, cpc_bind_cpu, cpc_unbind,
       cpc_request_preset, cpc_set_restart - bind request sets to hardware
       counters

SYNOPSIS

       cc [ flag... ] file... -lcpc [ library... ]
       #include <libcpc.h>

       int cpc_bind_curlwp(cpc_t *cpc, cpc_set_t *set, uint_t flags);

       int cpc_bind_pctx(cpc_t *cpc, pctx_t *pctx, id_t id, cpc_set_t *set,
            uint_t flags);

       int cpc_bind_cpu(cpc_t *cpc, processorid_t id, cpc_set_t *set,
            uint_t flags);

       int cpc_unbind(cpc_t *cpc, cpc_set_t *set);

       int cpc_request_preset(cpc_t *cpc, int index, uint64_t preset);

       int cpc_set_restart(cpc_t *cpc, cpc_set_t *set);

DESCRIPTION

       These functions program the processor's hardware counters according to
       the requests contained in the set argument. If these functions are
       successful, then upon return the physical counters will have been
       assigned to count events on behalf of each request in the set, and each
       counter will be enabled as configured.

       The cpc_bind_curlwp() function binds the set to the calling LWP. If
       successful, a performance counter context is associated with the LWP
       that allows the system to virtualize the hardware counters to that
       specific LWP.

       By default, the system binds the set to the current LWP only. If the
       CPC_BIND_LWP_INHERIT flag is present in the flags argument, however,
       any subsequent LWPs created by the current LWP will inherit a copy of
       the request set. The newly created LWP will have its virtualized 64-bit
       counters initialized to the preset values specified in set, and the
       counters will be enabled and begin counting events on behalf of the new
       LWP. This automatic inheritance behavior can be useful when dealing
       with multithreaded programs to determine aggregate statistics for the
       program as a whole.

       If the CPC_BIND_LWP_INHERIT flag is specified and any of the requests
       in the set have the CPC_OVF_NOTIFY_EMT flag set, the process will
       immediately dispatch a SIGEMT signal to the freshly created LWP so that
       it can preset its counters appropriately on the new LWP. This
       initialization condition can be detected using cpc_set_sample(3CPC) and
       looking at the counter value for any requests with CPC_OVF_NOTIFY_EMT
       set. The value of any such counters will be UINT64_MAX.

       The cpc_bind_pctx() function binds the set to the LWP specified by the
       pctx-id pair, where pctx refers to a handle returned from libpctx and
       id is the ID of the desired LWP in the target process. If successful, a
       performance counter context is associated with the specified LWP and
       the system virtualizes the hardware counters to that specific LWP. The
       flags argument is reserved for future use and must always be 0.

       The cpc_bind_cpu() function binds the set to the specified CPU and
       measures events occurring on that CPU regardless of which LWP is
       running. Only one such binding can be active on the specified CPU at a
       time. As long as any application has bound a set to a CPU, per-LWP
       counters are unavailable and any attempt to use either
       cpc_bind_curlwp() or cpc_bind_pctx() returns EAGAIN. The first
       invocation of cpc_bind_cpu() invalidates all currently bound per-LWP
       counter sets, and any attempt to sample an invalidated set returns
       EAGAIN. To bind to a CPU, the library binds the calling LWP to the
       measured CPU with processor_bind(2). The application must not change
       its processor binding until after it has unbound the set with
       cpc_unbind(). The flags argument is reserved for future use and must
       always be 0.

       The cpc_request_preset() function updates the preset and current value
       stored in the indexed request within the currently bound set, thereby
       changing the starting value for the specified request for the calling
       LWP only. The new value takes effect at the next call to
       cpc_set_restart().

       When a performance counter counting on behalf of a request with the
       CPC_OVF_NOTIFY_EMT flag set overflows, the performance counters are
       frozen and the LWP to which the set is bound receives a SIGEMT signal.
       The cpc_set_restart() function can be called from a SIGEMT signal
       handler function to quickly restart the hardware counters. Counting
       begins from each request's original preset (see
       cpc_set_add_request(3CPC)), or from the preset specified in a prior
       call to cpc_request_preset(). Applications performing performance
       counter overflow profiling should use the cpc_set_restart() function to
       quickly restart counting after receiving a SIGEMT overflow signal and
       recording any relevant program state.

       The cpc_unbind() function unbinds the set from the resource to which it
       is bound. All hardware resources associated with the bound set are
       freed, and if the set was bound to a CPU, the calling LWP is unbound
       from the corresponding CPU. See processor_bind(2).

RETURN VALUES

       Upon successful completion these functions return 0. Otherwise, -1 is
       returned and errno is set to indicate the error.

ERRORS

       Applications wanting to get detailed error values should register an
       error handler with cpc_seterrhndlr(3CPC). Otherwise, the library will
       output a specific error description to stderr.

       These functions will fail if:

       EACCES     For cpc_bind_curlwp(), the system has Pentium 4 processors
                  with HyperThreading and at least one physical processor has
                  more than one hardware thread online. See NOTES.

                  For cpc_bind_cpu(), the process does not have the cpc_cpu
                  privilege to access the CPU's counters.

                  For cpc_bind_curlwp(), cpc_bind_cpu(), and cpc_bind_pctx(),
                  access to the requested hypervisor event was denied.

       EAGAIN     For cpc_bind_curlwp() and cpc_bind_pctx(), the performance
                  counters are not available for use by the application.

                  For cpc_bind_cpu(), another process has already bound to
                  this CPU. Only one process is allowed to bind to a CPU at a
                  time and only one set can be bound to a CPU at a time.

       EINVAL     The set does not contain any requests or
                  cpc_set_add_request() was not called.

                  The value given for an attribute of a request is out of
                  range.

                  The system could not assign a physical counter to each
                  request in the set. See NOTES.

                  One or more requests in the set conflict and cannot be
                  programmed simultaneously.

                  The set was not created with the same cpc handle.

                  For cpc_bind_cpu(), the specified processor does not exist.

                  For cpc_unbind(), the set is not bound.

                  For cpc_request_preset() and cpc_set_restart(), the calling
                  LWP does not have a bound set.

       ENOSYS     For cpc_bind_cpu(), the specified processor is not online.

       ENOTSUP    The cpc_bind_curlwp() function was called with a set in
                  which a request has the CPC_OVF_NOTIFY_EMT flag, but the
                  underlying processor is not capable of detecting counter
                  overflow.

       ESRCH      For cpc_bind_pctx(), the specified LWP in the target process
                  does not exist.

EXAMPLES

       Example 1 Use hardware performance counters to measure events in a
       process.

       The following example demonstrates how a standalone application can be
       instrumented with the libcpc(3LIB) functions to use hardware
       performance counters to measure events in a process. The application
       performs 20 iterations of a computation, measuring the counter values
       for each iteration. By default, the example makes use of two counters
       to measure external cache references and external cache hits. These
       options are only appropriate for UltraSPARC processors. By setting the
       EVENT0 and EVENT1 environment variables to other strings (a list of
       which can be obtained from the -h option of the cpustat(1M) or
       cputrack(1) utilities), other events can be counted. The error()
       routine is assumed to be a user-provided routine analogous to the
       familiar printf(3C) function from the C library that also performs an
       exit(2) after printing the message.

         #include <inttypes.h>
         #include <stdlib.h>
         #include <stdio.h>
         #include <string.h>
         #include <unistd.h>
         #include <libcpc.h>
         #include <errno.h>

         int
         main(int argc, char *argv[])
         {
             int iter;
             char *event0 = NULL, *event1 = NULL;
             cpc_t *cpc;
             cpc_set_t *set;
             cpc_buf_t *diff, *after, *before;
             int ind0, ind1;
             uint64_t val0, val1;

             if ((cpc = cpc_open(CPC_VER_CURRENT)) == NULL)
                 error("perf counters unavailable: %s", strerror(errno));

             /* Fall back to the UltraSPARC default events. */
             if ((event0 = getenv("EVENT0")) == NULL)
                 event0 = "EC_ref";
             if ((event1 = getenv("EVENT1")) == NULL)
                 event1 = "EC_hit";

             if ((set = cpc_set_create(cpc)) == NULL)
                 error("could not create set: %s", strerror(errno));

             if ((ind0 = cpc_set_add_request(cpc, set, event0, 0,
                 CPC_COUNT_USER, 0, NULL)) == -1)
                 error("could not add first request: %s", strerror(errno));

             if ((ind1 = cpc_set_add_request(cpc, set, event1, 0,
                 CPC_COUNT_USER, 0, NULL)) == -1)
                 error("could not add second request: %s", strerror(errno));

             if ((diff = cpc_buf_create(cpc, set)) == NULL)
                 error("could not create buffer: %s", strerror(errno));
             if ((after = cpc_buf_create(cpc, set)) == NULL)
                 error("could not create buffer: %s", strerror(errno));
             if ((before = cpc_buf_create(cpc, set)) == NULL)
                 error("could not create buffer: %s", strerror(errno));

             if (cpc_bind_curlwp(cpc, set, 0) == -1)
                 error("cannot bind lwp%d: %s", _lwp_self(), strerror(errno));

             for (iter = 1; iter <= 20; iter++) {

                 if (cpc_set_sample(cpc, set, before) == -1)
                     break;

                 /* ==> Computation to be measured goes here <== */

                 if (cpc_set_sample(cpc, set, after) == -1)
                     break;

                 /* diff = after - before, then read both requests. */
                 cpc_buf_sub(cpc, diff, after, before);
                 cpc_buf_get(cpc, diff, ind0, &val0);
                 cpc_buf_get(cpc, diff, ind1, &val1);

                 (void) printf("%3d: %" PRIu64 " %" PRIu64 "\n", iter,
                     val0, val1);
             }

             /* The loop ends early only if a sample failed. */
             if (iter != 21)
                 error("cannot sample set: %s", strerror(errno));

             cpc_close(cpc);

             return (0);
         }

       Example 2 Write a signal handler to catch overflow signals.

       The following example builds on Example 1 and demonstrates how to write
       the signal handler to catch overflow signals. A counter is preset so
       that it is 1000 counts short of overflowing. After 1000 counts the
       signal handler is invoked.

       The signal handler:

         cpc_t     *cpc;
         cpc_set_t *set;
         cpc_buf_t *buf;
         int       index;

         void
         emt_handler(int sig, siginfo_t *sip, void *arg)
         {
              ucontext_t *uap = arg;
              uint64_t val;

              if (sig != SIGEMT || sip->si_code != EMT_CPCOVF) {
                  psignal(sig, "example");
                  psiginfo(sip, "example");
                  return;
              }

              (void) printf("lwp%d - si_addr %p ucontext: %%pc %p %%sp %p\n",
                  _lwp_self(), (void *)sip->si_addr,
                  (void *)uap->uc_mcontext.gregs[PC],
                  (void *)uap->uc_mcontext.gregs[SP]);

              if (cpc_set_sample(cpc, set, buf) != 0)
                  error("cannot sample: %s", strerror(errno));

              cpc_buf_get(cpc, buf, index, &val);

              (void) printf("0x%" PRIx64 "\n", val);
              (void) fflush(stdout);

              /*
               * Update a request's preset and restart the counters. Counters
               * which have not been preset with cpc_request_preset() will
               * resume counting from their current value. PRESET is the
               * macro defined with the setup code and must be visible here.
               */
              if (cpc_request_preset(cpc, index, PRESET) != 0)
                  error("cannot set preset for request %d: %s", index,
                      strerror(errno));
              if (cpc_set_restart(cpc, set) != 0)
                  error("cannot restart lwp%d: %s", _lwp_self(),
                      strerror(errno));
         }

       The setup code, which can be positioned after the code that opens the
       CPC library and creates a set:

         #define PRESET (UINT64_MAX - 999ull)

              struct sigaction act;
              ...
              act.sa_sigaction = emt_handler;
              bzero(&act.sa_mask, sizeof (act.sa_mask));
              act.sa_flags = SA_RESTART|SA_SIGINFO;
              if (sigaction(SIGEMT, &act, NULL) == -1)
                  error("sigaction: %s", strerror(errno));

              /* cpc_set_add_request() returns the request index, or -1. */
              if ((index = cpc_set_add_request(cpc, set, event, PRESET,
                  CPC_COUNT_USER | CPC_OVF_NOTIFY_EMT, 0, NULL)) == -1)
                  error("cannot add request to set: %s", strerror(errno));

              if ((buf = cpc_buf_create(cpc, set)) == NULL)
                  error("cannot create buffer: %s", strerror(errno));

              if (cpc_bind_curlwp(cpc, set, 0) == -1)
                  error("cannot bind lwp%d: %s", _lwp_self(), strerror(errno));

              for (iter = 1; iter <= 20; iter++) {
                  /* ==> Computation to be measured goes here <== */
              }

              cpc_unbind(cpc, set);      /* done */

ATTRIBUTES

       See attributes(5) for descriptions of the following attributes:

       ┌─────────────────────────────┬─────────────────────────────┐
       │      ATTRIBUTE TYPE         │      ATTRIBUTE VALUE        │
       ├─────────────────────────────┼─────────────────────────────┤
       │Interface Stability          │Evolving                     │
       ├─────────────────────────────┼─────────────────────────────┤
       │MT-Level                     │Safe                         │
       └─────────────────────────────┴─────────────────────────────┘

SEE ALSO

       cpustat(1M), cputrack(1), psrinfo(1M), processor_bind(2),
       cpc_seterrhndlr(3CPC), cpc_set_sample(3CPC), libcpc(3LIB), attributes(5)

NOTES

       When a set is bound, the system assigns a physical hardware counter to
       count on behalf of each request in the set. If such an assignment is
       not possible for all requests in the set, the bind function returns -1
       and sets errno to EINVAL. The assignment of requests to counters
       depends on the capabilities of the available counters. Some processors
       (such as Pentium 4) have a complicated counter control mechanism that
       requires the reservation of limited hardware resources beyond the
       actual counters. It could occur that two requests for different events
       might be impossible to count at the same time due to these limited
       hardware resources. See the processor manual as referenced by
       cpc_cpuref(3CPC) for details about the underlying processor's
       capabilities and limitations.

       Some processors can be configured to dispatch an interrupt when a
       physical counter overflows. The most obvious use for this facility is
       to ensure that the full 64-bit counter values are maintained without
       repeated sampling. Certain hardware, such as the UltraSPARC processor,
       does not record which counter overflowed. A more subtle use for this
       facility is to preset the counter to a value slightly less than the
       maximum value, then use the resulting interrupt to catch the counter
       overflow associated with that event. The overflow can then be used as
       an indication of the frequency of the occurrence of that event.

       The interrupt generated by the processor might not be particularly
       precise. That is, the particular instruction that caused the counter
       overflow might be earlier in the instruction stream than is indicated
       by the program counter value in the ucontext.

       When a request is added to a set with the CPC_OVF_NOTIFY_EMT flag set,
       then as before, the control registers and counter are preset from the
       64-bit preset value given. When the flag is set, however, the kernel
       arranges to send the calling process a SIGEMT signal when the overflow
       occurs. The si_code member of the corresponding siginfo structure is
       set to EMT_CPCOVF and the si_addr member takes the program counter
       value at the time the overflow interrupt was delivered. Counting is
       disabled until the set is bound again.

       If the CPC_CAP_OVERFLOW_PRECISE bit is set in the value returned by
       cpc_caps(3CPC), the processor is able to determine precisely which
       counter has overflowed after receiving the overflow interrupt. On such
       processors, the SIGEMT signal is sent only if a counter overflows and
       the request that the counter is counting has the CPC_OVF_NOTIFY_EMT
       flag set. If the capability is not present on the processor, the system
       sends a SIGEMT signal to the process if any of its requests have the
       CPC_OVF_NOTIFY_EMT flag set and any counter in its set overflows.

       Different processors have different counter ranges available, though
       all processors supported by Solaris allow at least 31 bits to be
       specified as a counter preset value. Portable preset values lie in the
       range UINT64_MAX to UINT64_MAX-INT32_MAX.

       The appropriate preset value will often need to be determined
       experimentally. Typically, this value will depend on the event being
       measured as well as the desire to minimize the impact of the act of
       measurement on the event being measured. Less frequent interrupts and
       samples lead to less perturbation of the system.

       If the processor cannot detect counter overflow, bind will fail and
       return ENOTSUP. Only user events can be measured using this technique.
       See Example 2.

   Pentium 4
       Most Pentium 4 events require the specification of an event mask for
       counting. The event mask is specified with the emask attribute.

       Pentium 4 processors with HyperThreading Technology have only one set
       of hardware counters per physical processor. To use cpc_bind_curlwp()
       or cpc_bind_pctx() to measure per-LWP events on a system with Pentium 4
       HT processors, a system administrator must first take processors in the
       system offline until each physical processor has only one hardware
       thread online (see the -p option to psrinfo(1M)). If a second hardware
       thread is brought online, all per-LWP bound contexts will be
       invalidated and any attempt to sample or bind a CPC set will return
       EAGAIN.

       Only one CPC set at a time can be bound to a physical processor with
       cpc_bind_cpu(). Any call to cpc_bind_cpu() that attempts to bind a set
       to a processor that shares a physical processor with a processor that
       already has a CPU-bound set returns an error.

       To measure the shared state on a Pentium 4 processor with
       HyperThreading, the count_sibling_usr and count_sibling_sys attributes
       are provided for use with cpc_bind_cpu(). These attributes behave
       exactly as the CPC_COUNT_USER and CPC_COUNT_SYSTEM request flags,
       except that they act on the sibling hardware thread sharing the
       physical processor with the CPU measured by cpc_bind_cpu().

       Some CPC sets will fail to bind due to resource constraints. The most
       common type of resource constraint is an ESCR conflict among two or
       more requests in the set. For example, the branch_retired event cannot
       be measured on counters 12 and 13 simultaneously because both counters
       require the CRU_ESCR2 ESCR to measure this event. To measure
       branch_retired events simultaneously on more than one counter, use
       counters such that one counter uses CRU_ESCR2 and the other counter
       uses CRU_ESCR3. See the processor documentation for details.

SunOS 5.11                        05 Mar 2007            cpc_bind_curlwp(3CPC)