cpc_bind_event(3CPC)    CPU Performance Counters Library Functions    cpc_bind_event(3CPC)

NAME

       cpc_bind_event, cpc_take_sample, cpc_rele - use CPU performance
       counters on lwps

SYNOPSIS

       cc [ flag... ] file... -lcpc [ library... ]
       #include <libcpc.h>

       int cpc_bind_event(cpc_event_t *event, int flags);

       int cpc_take_sample(cpc_event_t *event);

       int cpc_rele(void);

DESCRIPTION

       Once the events to be sampled have been selected using, for example,
       cpc_strtoevent(3CPC), the event selections can be bound to the calling
       LWP using cpc_bind_event(). If cpc_bind_event() returns successfully,
       the system has associated a performance counter context with the
       calling LWP. The context allows the system to virtualize the hardware
       counters to that specific LWP, and the counters are enabled.

       Two flags, CPC_BIND_EMT_OVF and CPC_BIND_LWP_INHERIT, can be passed to
       cpc_bind_event() to modify its behavior. Both are described under
       NOTES.

       Counter values can be sampled at any time by calling cpc_take_sample()
       and reading the ce_pic[] array of the event structure it fills in. The
       ce_hrt field contains the timestamp at which the kernel last sampled
       the counters.

       To remove the performance counter context from an LWP immediately, use
       cpc_rele(). Otherwise, the context is destroyed when the LWP or
       process exits.

       The caller should ensure that the counters are sampled often enough to
       avoid the 32-bit counters wrapping. The events most prone to wrap are
       those that count processor clock cycles. If such an event is of
       interest, sample frequently enough that fewer than 4 billion (2^32)
       clock cycles elapse between samples. In practice, this is only likely
       to be a problem on otherwise idle systems, or when processes are bound
       to processors, since normal context switching behavior will otherwise
       hide this problem.
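       The wrap concern above can be illustrated with a minimal sketch. It
       assumes the raw hardware counter is 32 bits wide and wraps at most
       once between samples. The helper names delta32() and wrap_seconds()
       are hypothetical and are not part of libcpc.

```c
#include <stdint.h>

/*
 * Events counted between two samples of a 32-bit hardware counter.
 * Unsigned subtraction is performed modulo 2^32, so the result is
 * still correct if the counter wrapped once between the samples.
 * (Hypothetical helper, not a libcpc function.)
 */
uint32_t
delta32(uint64_t before, uint64_t after)
{
    return ((uint32_t)((uint32_t)after - (uint32_t)before));
}

/*
 * Seconds until a 32-bit cycle counter wraps at the given clock rate.
 * Sampling more often than this keeps at most one wrap per interval.
 */
double
wrap_seconds(double clock_hz)
{
    return (4294967296.0 / clock_hz);    /* 2^32 cycles */
}
```

       At a 1 GHz clock, for example, wrap_seconds(1e9) is roughly 4.3
       seconds, which gives an upper bound on the sampling interval for a
       cycle-counting event on a busy, processor-bound LWP.
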

RETURN VALUES

       Upon successful completion, cpc_bind_event() and cpc_take_sample()
       return 0. Otherwise, these functions return -1 and set errno to
       indicate the error.

ERRORS

       The cpc_bind_event() and cpc_take_sample() functions will fail if:

       EACCES     For cpc_bind_event(), access to the requested hypervisor
                  event was denied.

       EAGAIN     Another process may be sampling system-wide CPU statistics.
                  For cpc_bind_event(), this implies that no new contexts can
                  be created. For cpc_take_sample(), this implies that the
                  performance counter context has been invalidated and must be
                  released with cpc_rele(). Robust programs should expect this
                  behavior and recover from it by calling cpc_rele() to
                  release the now-invalid context, sleeping for a while, and
                  then attempting to bind and sample the event once more.

       EINVAL     The cpc_take_sample() function has been invoked before the
                  context is bound.

       ENOTSUP    The caller has attempted an operation that is illegal or not
                  supported on the current platform, such as attempting to
                  specify signal delivery on counter overflow on a CPU that
                  doesn't generate an interrupt on counter overflow.
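       The EAGAIN recovery sequence (release, sleep, bind and sample again)
       can be sketched as a generic retry loop. This is only an illustration
       under stated assumptions: the step_fn callback type, the function
       sample_with_retry(), and the fake_* stubs are all hypothetical
       stand-ins, so that the control flow can be exercised without libcpc.
       In a real program the callbacks would wrap cpc_bind_event(),
       cpc_take_sample(), cpc_rele(), and a sleep.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success, -1 with errno set. */
typedef int (*step_fn)(void *arg);

/*
 * Bind and sample, retrying up to max_tries times while the failure
 * is EAGAIN. An invalidated context is released before the next try.
 */
int
sample_with_retry(step_fn bind, step_fn sample, step_fn release,
    step_fn backoff, void *arg, int max_tries)
{
    int i;

    for (i = 0; i < max_tries; i++) {
        if (bind(arg) == 0) {
            if (sample(arg) == 0)
                return (0);
            if (errno != EAGAIN)
                return (-1);
            (void) release(arg);    /* context was invalidated */
        } else if (errno != EAGAIN) {
            return (-1);
        }
        (void) backoff(arg);        /* for example, sleep a while */
    }
    errno = EAGAIN;
    return (-1);
}

/* Demonstration stubs: sampling fails twice with EAGAIN, then succeeds. */
static int fake_failures_left = 2;
static int fake_release_calls = 0;

static int fake_bind(void *arg) { (void) arg; return (0); }
static int fake_backoff(void *arg) { (void) arg; return (0); }
static int fake_release(void *arg)
{
    (void) arg;
    fake_release_calls++;
    return (0);
}
static int fake_sample(void *arg)
{
    (void) arg;
    if (fake_failures_left > 0) {
        fake_failures_left--;
        errno = EAGAIN;
        return (-1);
    }
    return (0);
}
```

       Retrying a failed bind after a pause is a design choice of this
       sketch; the text above only requires the release, sleep, and rebind
       sequence after a failed sample.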

USAGE

       Prior to calling cpc_bind_event(), applications should call
       cpc_access(3CPC) to determine if the counters are accessible on the
       system.

EXAMPLES

       Example 1 Use hardware performance counters to measure events in a
       process.

       The example below shows how a standalone program can be instrumented
       with the libcpc routines to use hardware performance counters to
       measure events in a process. The program performs 20 iterations of a
       computation, measuring the counter values for each iteration. By
       default, the example makes the counters measure external cache
       references and external cache hits; these options are only appropriate
       for UltraSPARC processors. By setting the PERFEVENTS environment
       variable to other strings (a list of which can be gleaned from the -h
       flag of the cpustat or cputrack utilities), other events can be
       counted. The error() routine below is assumed to be a user-provided
       routine analogous to the familiar printf(3C) routine from the C
       library but which also performs an exit(2) after printing the message.

         #include <errno.h>
         #include <inttypes.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <unistd.h>
         #include <sys/lwp.h>
         #include <libcpc.h>

         int
         main(int argc, char *argv[])
         {
             int cpuver, iter;
             char *setting = NULL;
             cpc_event_t event;

             if (cpc_version(CPC_VER_CURRENT) != CPC_VER_CURRENT)
                 error("application:library cpc version mismatch!");

             if ((cpuver = cpc_getcpuver()) == -1)
                 error("no performance counter hardware!");

             /* Default events are appropriate for UltraSPARC only. */
             if ((setting = getenv("PERFEVENTS")) == NULL)
                 setting = "pic0=EC_ref,pic1=EC_hit";

             if (cpc_strtoevent(cpuver, setting, &event) != 0)
                 error("can't measure '%s' on this processor", setting);
             /* String form of the parsed event; freed before returning. */
             setting = cpc_eventtostr(&event);

             if (cpc_access() == -1)
                 error("can't access perf counters: %s", strerror(errno));

             if (cpc_bind_event(&event, 0) == -1)
                 error("can't bind lwp%d: %s", _lwp_self(), strerror(errno));

             for (iter = 1; iter <= 20; iter++) {
                 cpc_event_t before, after;

                 if (cpc_take_sample(&before) == -1)
                     break;

                 /* ==> Computation to be measured goes here <== */

                 if (cpc_take_sample(&after) == -1)
                     break;
                 (void) printf("%3d: %" PRId64 " %" PRId64 "\n", iter,
                     after.ce_pic[0] - before.ce_pic[0],
                     after.ce_pic[1] - before.ce_pic[1]);
             }

             /* iter is 21 after a complete run; less means a sample failed. */
             if (iter <= 20)
                 error("can't sample '%s': %s", setting, strerror(errno));

             free(setting);
             return (0);
         }

       Example 2 Write a signal handler to catch overflow signals.

       This example builds on Example 1, but demonstrates how to write the
       signal handler to catch overflow signals. The counters are preset so
       that counter zero is 1000 counts short of overflowing, while counter
       one is set to zero. After 1000 counts on counter zero, the signal
       handler will be invoked.

       First the signal handler:

         #include <signal.h>
         #include <siginfo.h>
         #include <ucontext.h>

         #define PRESET0        (UINT64_MAX - UINT64_C(999))
         #define PRESET1        0

         void
         emt_handler(int sig, siginfo_t *sip, void *arg)
         {
             ucontext_t *uap = arg;
             cpc_event_t sample;

             /* Report and ignore anything that is not a counter overflow. */
             if (sig != SIGEMT || sip->si_code != EMT_CPCOVF) {
                 psignal(sig, "example");
                 psiginfo(sip, "example");
                 return;
             }

             (void) printf("lwp%d - si_addr %p ucontext: %%pc %p %%sp %p\n",
                 _lwp_self(), (void *)sip->si_addr,
                 (void *)uap->uc_mcontext.gregs[PC],
                 (void *)uap->uc_mcontext.gregs[USP]);

             if (cpc_take_sample(&sample) == -1)
                 error("can't sample: %s", strerror(errno));

             (void) printf("0x%" PRIx64 " 0x%" PRIx64 "\n",
                 sample.ce_pic[0], sample.ce_pic[1]);
             (void) fflush(stdout);

             /* Counting stays disabled until the event is bound again. */
             sample.ce_pic[0] = PRESET0;
             sample.ce_pic[1] = PRESET1;
             if (cpc_bind_event(&sample, CPC_BIND_EMT_OVF) == -1)
                 error("cannot bind lwp%d: %s", _lwp_self(), strerror(errno));
         }

       and second the setup code (this can be placed after the code that
       selects the event to be measured):

         struct sigaction act;
         cpc_event_t event;
         ...
         /* Install the overflow handler before binding the event. */
         act.sa_sigaction = emt_handler;
         (void) sigemptyset(&act.sa_mask);
         act.sa_flags = SA_RESTART|SA_SIGINFO;
         if (sigaction(SIGEMT, &act, NULL) == -1)
             error("sigaction: %s", strerror(errno));
         event.ce_pic[0] = PRESET0;
         event.ce_pic[1] = PRESET1;
         if (cpc_bind_event(&event, CPC_BIND_EMT_OVF) == -1)
             error("cannot bind lwp%d: %s", _lwp_self(), strerror(errno));

         for (iter = 1; iter <= 20; iter++) {
             /* ==> Computation to be measured goes here <== */
         }

         /* A null event unbinds the context and disables signal delivery. */
         (void) cpc_bind_event(NULL, 0);

       Note that a more general version of the signal handler would use
       write(2) directly instead of depending on the signal-unsafe semantics
       of stderr and stdout. Most real signal handlers will probably do more
       with the samples than just print them out.
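       As a minimal sketch of the write(2) approach mentioned above, the
       following helpers format a 64-bit counter value as hexadecimal
       without using stdio. The names fmt_hex64() and write_hex64() are
       hypothetical and are not part of libcpc or the C library.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Store "0x" followed by the lowercase hex digits of v in buf, with
 * no terminating NUL. Returns the number of bytes stored (at most 18).
 * Uses no stdio, so it is suitable for use in a signal handler.
 */
size_t
fmt_hex64(uint64_t v, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[16];
    size_t n = 0, len = 0;

    /* Collect digits least significant first. */
    do {
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)
        buf[len++] = tmp[--n];
    return (len);
}

/* Write one counter value and a newline to fd using only write(2). */
ssize_t
write_hex64(int fd, uint64_t v)
{
    char buf[19];    /* 18 bytes of digits plus the newline */
    size_t len = fmt_hex64(v, buf);

    buf[len++] = '\n';
    return (write(fd, buf, len));
}
```

       A handler could then call write_hex64(STDOUT_FILENO,
       sample.ce_pic[0]) in place of the printf() and fflush() calls.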

ATTRIBUTES

       See attributes(5) for descriptions of the following attributes:

       ┌─────────────────────────────┬─────────────────────────────┐
       │      ATTRIBUTE TYPE         │      ATTRIBUTE VALUE        │
       ├─────────────────────────────┼─────────────────────────────┤
       │MT-Level                     │MT-Safe                      │
       ├─────────────────────────────┼─────────────────────────────┤
       │Interface Stability          │Obsolete                     │
       └─────────────────────────────┴─────────────────────────────┘

SEE ALSO

       cpustat(1M), cputrack(1), write(2), cpc(3CPC), cpc_access(3CPC),
       cpc_bind_curlwp(3CPC), cpc_set_sample(3CPC), cpc_strtoevent(3CPC),
       cpc_unbind(3CPC), libcpc(3LIB), attributes(5)

NOTES

       The cpc_bind_event(), cpc_take_sample(), and cpc_rele() functions exist
       for binary compatibility only. Source containing these functions will
       not compile. These functions are obsolete and might be removed in a
       future release. Applications should use cpc_bind_curlwp(3CPC),
       cpc_set_sample(3CPC), and cpc_unbind(3CPC) instead.

       Sometimes, even the overhead of performing a system call will be too
       disruptive to the events being measured. Once a call to
       cpc_bind_event() has been issued, it is possible to directly access the
       performance hardware registers from within the application. If the
       performance counter context is active, then the counters will count on
       behalf of the current LWP.

   SPARC
         rd %pic, %rN        ! All UltraSPARC
         wr %rN, %pic        ! (ditto, but see text)

   x86
         rdpmc               ! Pentium II only

       If the counter context is not active or has been invalidated, the %pic
       register (SPARC), and the rdpmc instruction (Pentium) will become
       unavailable.

       Note that the two 32-bit UltraSPARC performance counters are kept in
       the single 64-bit %pic register so a couple of additional instructions
       are required to separate the values. Also note that when the %pcr
       register bit has been set that configures the %pic register as readable
       by an application, it is also writable. Any values written will be
       preserved by the context switching mechanism.

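       Separating the two counters from a %pic value amounts to a mask and a
       shift. The sketch below assumes that counter 0 occupies bits 31:0 and
       counter 1 occupies bits 63:32; that layout is an assumption here and
       should be checked against the processor manual. The helper names are
       hypothetical.

```c
#include <stdint.h>

/* Assumed layout: counter 0 in bits 31:0 of the 64-bit %pic value. */
uint32_t
pic0_of(uint64_t pic)
{
    return ((uint32_t)(pic & UINT64_C(0xffffffff)));
}

/* Assumed layout: counter 1 in bits 63:32 of the 64-bit %pic value. */
uint32_t
pic1_of(uint64_t pic)
{
    return ((uint32_t)(pic >> 32));
}
```
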
       Pentium II processors support the non-privileged rdpmc instruction
       which requires [5] that the counter of interest be specified in %ecx,
       and returns a 40-bit value in the %edx:%eax register pair. There is no
       non-privileged access mechanism for Pentium I processors.

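       Once the two register halves have been read, the 40-bit count can be
       assembled in C. This is a sketch only: the helper names and the
       PMC_MASK40 macro are hypothetical, and the inline assembly needed to
       issue rdpmc itself is deliberately left out.

```c
#include <stdint.h>

/* Mask selecting the low 40 bits of a counter value. */
#define PMC_MASK40    ((UINT64_C(1) << 40) - 1)

/* Combine the %edx (high) and %eax (low) halves into a 40-bit count. */
uint64_t
pmc_value(uint32_t edx, uint32_t eax)
{
    return ((((uint64_t)edx << 32) | eax) & PMC_MASK40);
}

/* Events between two 40-bit readings, tolerating a single wrap. */
uint64_t
pmc_delta(uint64_t before, uint64_t after)
{
    return ((after - before) & PMC_MASK40);
}
```
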
   Handling counter overflow
       As described above, when counting events, some processors allow their
       counter registers to silently overflow. More recent CPUs such as
       UltraSPARC III and Pentium II, however, are capable of generating an
       interrupt when the hardware counter overflows. Some processors offer
       more control over when interrupts will actually be generated. For
       example, they might allow the interrupt to be programmed to occur when
       only one of the counters overflows. See cpc_strtoevent(3CPC) for the
       syntax.

       The most obvious use for this facility is to ensure that the full
       64-bit counter values are maintained without repeated sampling.
       However, current hardware does not record which counter overflowed. A
       more subtle use for this facility is to preset the counter to a value
       a little less than the maximum value, then use the resulting interrupt
       to catch the counter overflow associated with that event. The overflow
       can then be used as an indication of the frequency of the occurrence
       of that event.

       Note that the interrupt generated by the processor may not be
       particularly precise. That is, the particular instruction that caused
       the counter overflow may be earlier in the instruction stream than is
       indicated by the program counter value in the ucontext.

       When cpc_bind_event() is called with the CPC_BIND_EMT_OVF flag set,
       then as before, the control registers and counters are preset from the
       64-bit values contained in event. However, when the flag is set, the
       kernel arranges to send the calling process a SIGEMT signal when the
       overflow occurs, with the si_code field of the corresponding siginfo
       structure set to EMT_CPCOVF and the si_addr field set to the program
       counter value at the time the overflow interrupt was delivered.
       Counting is disabled until the next call to cpc_bind_event(). Even in
       a multithreaded process, during execution of the signal handler, the
       thread behaves as if it is temporarily bound to the running LWP.

       Different processors have different counter ranges available, though
       all processors supported by Solaris allow at least 31 bits to be
       specified as a counter preset value; thus portable preset values lie
       in the range UINT64_MAX to UINT64_MAX-INT32_MAX.

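       The preset arithmetic can be made explicit with a short sketch. It
       follows the convention of Example 2, where UINT64_MAX - 999 leads to
       an overflow after 1000 counts. Both helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Preset value that overflows after `counts` further events, using
 * the same convention as PRESET0 in Example 2. counts must be >= 1.
 */
uint64_t
preset_for(uint64_t counts)
{
    return (UINT64_MAX - (counts - 1));
}

/* Nonzero if preset lies in the range UINT64_MAX-INT32_MAX .. UINT64_MAX. */
int
preset_is_portable(uint64_t preset)
{
    return (preset >= UINT64_MAX - (uint64_t)INT32_MAX);
}
```

       For instance, preset_for(1000) yields the PRESET0 value used in
       Example 2, and preset_is_portable() accepts it.
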
       The appropriate preset value will often need to be determined
       experimentally. Typically, it will depend on the event being measured,
       as well as the desire to minimize the impact of the act of measurement
       on the event being measured; less frequent interrupts and samples lead
       to less perturbation of the system.

       If the processor cannot detect counter overflow, this call will fail
       (ENOTSUP). Specifying a null event unbinds the context from the
       underlying LWP and disables signal delivery. Currently, only user
       events can be measured using this technique. See Example 2, above.

   Inheriting events onto multiple LWPs
       By default, the library binds the performance counter context to the
       current LWP only. If the CPC_BIND_LWP_INHERIT flag is set, then any
       subsequent LWPs created by that LWP will automatically inherit the same
       performance counter context. The counters will be initialized to 0 as
       if a cpc_bind_event() had just been issued. This automatic inheritance
       behavior can be useful when dealing with multithreaded programs to
       determine aggregate statistics for the program as a whole.

       If the CPC_BIND_EMT_OVF flag is also set, the process will immediately
       dispatch a SIGEMT signal to the freshly created LWP so that the new
       LWP can preset its counters appropriately. This initialization
       condition can be detected using cpc_take_sample() to check that both
       ce_pic[] values are set to UINT64_MAX.

SunOS 5.11                        02 Mar 2007             cpc_bind_event(3CPC)