LIBPFM(3)                  Linux Programmer's Manual                 LIBPFM(3)

NAME

       libpfm_itanium - support for Itanium specific PMU features

SYNOPSIS

       #include <perfmon/pfmlib.h>
       #include <perfmon/pfmlib_itanium.h>

       int pfm_ita_is_ear(unsigned int i);
       int pfm_ita_is_dear(unsigned int i);
       int pfm_ita_is_dear_tlb(unsigned int i);
       int pfm_ita_is_dear_cache(unsigned int i);
       int pfm_ita_is_iear(unsigned int i);
       int pfm_ita_is_iear_tlb(unsigned int i);
       int pfm_ita_is_iear_cache(unsigned int i);
       int pfm_ita_is_btb(unsigned int i);
       int pfm_ita_support_opcm(unsigned int i);
       int pfm_ita_support_iarr(unsigned int i);
       int pfm_ita_support_darr(unsigned int i);
       int pfm_ita_get_event_maxincr(unsigned int i, unsigned int *maxincr);
       int pfm_ita_get_event_umask(unsigned int i, unsigned long *umask);

DESCRIPTION

       The libpfm library provides full support for all the Itanium-specific
       features of the PMU. The interface is defined in pfmlib_itanium.h. It
       consists of a set of functions and structures which describe and allow
       access to the Itanium-specific PMU features.

       The Itanium-specific functions presented here are mostly used to
       retrieve the characteristics of an event. Given an opaque event
       descriptor, obtained by pfm_find_event or its derivatives, they return
       a boolean value indicating whether the event supports a given feature
       or is of a particular kind.

       The pfm_ita_is_ear() function returns 1 if the event designated by i
       is an EAR event, i.e., an Event Address Register type of event.
       Otherwise 0 is returned. For instance, DATA_EAR_CACHE_LAT4 is an EAR
       event, but CPU_CYCLES is not. An EAR event can be a data or
       instruction EAR event.

       The pfm_ita_is_dear() function returns 1 if the event designated by i
       corresponds to a Data EAR event. Otherwise 0 is returned. It can be a
       cache or TLB EAR event.

       The pfm_ita_is_dear_tlb() function returns 1 if the event designated by
       i corresponds to a Data EAR TLB event. Otherwise 0 is returned.

       The pfm_ita_is_dear_cache() function returns 1 if the event designated
       by i corresponds to a Data EAR cache event. Otherwise 0 is returned.

       The pfm_ita_is_iear() function returns 1 if the event designated by i
       corresponds to an instruction EAR event. Otherwise 0 is returned. It
       can be a cache or TLB instruction EAR event.

       The pfm_ita_is_iear_tlb() function returns 1 if the event designated by
       i corresponds to an instruction EAR TLB event. Otherwise 0 is returned.

       The pfm_ita_is_iear_cache() function returns 1 if the event designated
       by i corresponds to an instruction EAR cache event. Otherwise 0 is
       returned.

       The pfm_ita_support_opcm() function returns 1 if the event designated
       by i supports opcode matching, i.e., if the event can be measured
       accurately when opcode matching via PMC8/PMC9 is active. Otherwise 0
       is returned. Not all events support this feature.

       The pfm_ita_support_iarr() function returns 1 if the event designated
       by i supports code address range restrictions, i.e., if the event can
       be measured accurately when code range restriction is active.
       Otherwise 0 is returned. Not all events support this feature.

       The pfm_ita_support_darr() function returns 1 if the event designated
       by i supports data address range restrictions, i.e., if the event can
       be measured accurately when data range restriction is active.
       Otherwise 0 is returned. Not all events support this feature.

       The pfm_ita_get_event_maxincr() function returns in maxincr the maximum
       number of occurrences per cycle for the event designated by i. Certain
       Itanium events can occur more than once per cycle. When an event occurs
       more than once per cycle, the PMD counter will be incremented
       accordingly. It is possible to restrict counting to the cycles in
       which the event occurs more than a given number of times (the
       threshold). For instance, NOPS_RETIRED can happen up to 6 times/cycle,
       which means that the threshold can be adjusted between 0 and 5, where
       5 would mean that the PMD counter is incremented by 1 only when more
       than 5 nop instructions are executed in a cycle. This function
       returns the maximum number of occurrences of the event per cycle,
       which is the non-inclusive upper bound for the threshold to program in
       the PMC register.
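
       As an illustration of how the value returned in maxincr bounds the
       threshold, the following minimal sketch checks a candidate threshold
       before it would be stored in the thres field. The helper names
       ita_thres_is_valid and ita_thres_example are hypothetical and not
       part of libpfm; in a real program maxincr would be obtained from
       pfm_ita_get_event_maxincr().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of libpfm): returns true if thres can be
 * programmed for an event whose maximum per-cycle increment is maxincr.
 * maxincr is a non-inclusive upper bound, so valid thresholds are
 * 0 .. maxincr - 1.
 */
static bool ita_thres_is_valid(unsigned int thres, unsigned int maxincr)
{
    return thres < maxincr;
}

/* Worked example from the text: NOPS_RETIRED can occur up to 6 times/cycle. */
static void ita_thres_example(void)
{
    const unsigned int nops_maxincr = 6; /* value quoted in the text */

    assert(ita_thres_is_valid(0, nops_maxincr));
    assert(ita_thres_is_valid(5, nops_maxincr));
    assert(!ita_thres_is_valid(6, nops_maxincr));
}
```

       Checking the bound in the application gives an early, specific
       diagnostic instead of relying on the error returned later by
       pfm_dispatch_events().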

       The pfm_ita_get_event_umask() function returns in umask the umask for
       the event designated by i.

       When the Itanium-specific features are needed to support a measurement,
       their descriptions must be passed as model-specific input arguments to
       the pfm_dispatch_events() call. The Itanium-specific input arguments
       are described in the pfmlib_ita_input_param_t structure and the output
       parameters in pfmlib_ita_output_param_t. They are defined as follows:

       typedef enum {
           PFMLIB_ITA_ISM_BOTH=0,
           PFMLIB_ITA_ISM_IA32=1,
           PFMLIB_ITA_ISM_IA64=2
       } pfmlib_ita_ism_t;

       typedef struct {
           unsigned int     flags;
           unsigned int     thres;
           pfmlib_ita_ism_t ism;
       } pfmlib_ita_counter_t;

       typedef struct {
           unsigned char   opcm_used;
           unsigned long   pmc_val;
       } pfmlib_ita_opcm_t;

       typedef struct {
           unsigned char   btb_used;

           unsigned char   btb_tar;
           unsigned char   btb_tac;
           unsigned char   btb_bac;
           unsigned char   btb_tm;
           unsigned char   btb_ptm;
           unsigned char   btb_ppm;
           unsigned int    btb_plm;
       } pfmlib_ita_btb_t;

       typedef enum {
           PFMLIB_ITA_EAR_CACHE_MODE= 0,
           PFMLIB_ITA_EAR_TLB_MODE  = 1,
       } pfmlib_ita_ear_mode_t;

       typedef struct {
           unsigned char          ear_used;

           pfmlib_ita_ear_mode_t  ear_mode;
           pfmlib_ita_ism_t       ear_ism;
           unsigned int           ear_plm;
           unsigned long          ear_umask;
       } pfmlib_ita_ear_t;

       typedef struct {
           unsigned int  rr_plm;
           unsigned long rr_start;
           unsigned long rr_end;
       } pfmlib_ita_input_rr_desc_t;

       typedef struct {
           unsigned long rr_soff;
           unsigned long rr_eoff;
       } pfmlib_ita_output_rr_desc_t;

       typedef struct {
           unsigned int               rr_flags;
           pfmlib_ita_input_rr_desc_t rr_limits[4];
           unsigned char              rr_used;
       } pfmlib_ita_input_rr_t;

       typedef struct {
           unsigned int                 rr_nbr_used;
           pfmlib_ita_output_rr_desc_t  rr_infos[4];
           pfmlib_reg_t                 rr_br[8];
       } pfmlib_ita_output_rr_t;

       typedef struct {
           pfmlib_ita_counter_t    pfp_ita_counters[PMU_ITA_NUM_COUNTERS];

           unsigned long           pfp_ita_flags;

           pfmlib_ita_opcm_t       pfp_ita_pmc8;
           pfmlib_ita_opcm_t       pfp_ita_pmc9;
           pfmlib_ita_ear_t        pfp_ita_iear;
           pfmlib_ita_ear_t        pfp_ita_dear;
           pfmlib_ita_btb_t        pfp_ita_btb;
           pfmlib_ita_input_rr_t   pfp_ita_drange;
           pfmlib_ita_input_rr_t   pfp_ita_irange;
       } pfmlib_ita_input_param_t;

       typedef struct {
           pfmlib_ita_output_rr_t pfp_ita_drange;
           pfmlib_ita_output_rr_t pfp_ita_irange;
       } pfmlib_ita_output_param_t;

INSTRUCTION SET

       The Itanium processor provides two additional per-event features for
       counters: thresholding and instruction set selection. They can be set
       using the pfp_ita_counters data structure for each event. The ism
       field can be initialized as follows:

       PFMLIB_ITA_ISM_BOTH
              The event will be monitored during IA-64 and IA-32 execution.

       PFMLIB_ITA_ISM_IA32
              The event will only be monitored during IA-32 execution.

       PFMLIB_ITA_ISM_IA64
              The event will only be monitored during IA-64 execution.

       Because PFMLIB_ITA_ISM_BOTH is defined as zero, an ism field left at
       zero selects both instruction sets.

       The thres field indicates the threshold for the event. A threshold of
       n means that the counter will be incremented by one only when the
       event occurs more than n times per cycle.
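
       The thresholding rule can be made concrete with a small model. The
       sketch below is not libpfm code and does not touch the PMU: it takes
       a hypothetical array holding the number of occurrences of an event in
       each cycle and applies the rule exactly as stated above for whatever
       thres is passed. The function name is an assumption made for this
       example, and how the hardware treats a thres of 0 should be checked
       against the Itanium documentation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative model only: returns the value a counter would reach if it
 * were incremented by one in every cycle where the event occurs more than
 * thres times. per_cycle[c] is the number of occurrences in cycle c.
 */
static unsigned long ita_model_thresholded_count(const unsigned int *per_cycle,
                                                 size_t ncycles,
                                                 unsigned int thres)
{
    unsigned long count = 0;

    assert(per_cycle != NULL || ncycles == 0);

    for (size_t c = 0; c < ncycles; c++) {
        if (per_cycle[c] > thres)
            count++;    /* one increment per qualifying cycle */
    }
    return count;
}
```

       For per-cycle occurrences of 0, 2, 6, 3 and 6, a threshold of 5
       counts the two cycles with 6 occurrences, while a threshold of 2
       counts three cycles.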

       The flags field contains event-specific flags. The currently defined
       flags are:

       PFMLIB_ITA_FL_EVT_NO_QUALCHECK
              When this flag is set, the library ignores the qualifier
              constraints for this event. Qualifiers include opcode
              matching and code and data range restrictions. When an event
              is marked as not supporting a particular qualifier, it usually
              means that the qualifier is ignored for that event, i.e., the
              extra level of filtering is not applied. For instance, the
              CPU_CYCLES event does not support code range restrictions, and
              by default the library will refuse to program it if range
              restriction is also requested. Using the flag overrides the
              check, and the call to pfm_dispatch_events() will succeed. In
              this case, CPU_CYCLES will be measured for the entire program
              and not just for the requested code range. For certain
              measurements this is perfectly acceptable, as the range
              restriction will only be applied to the events which support
              it. Make sure you understand which events do not support
              certain qualifiers before using this flag.

OPCODE MATCHING

       The pfp_ita_pmc8 and pfp_ita_pmc9 fields of type pfmlib_ita_opcm_t
       contain the description of what to do with the opcode matchers.
       Itanium supports opcode matching via PMC8 and PMC9. When this feature
       is used, the opcm_used field must be set to 1; otherwise the setting
       is ignored by the library. The pmc_val field simply contains the raw
       value to store in PMC8 or PMC9. The library does not modify the
       values for PMC8 and PMC9; they will be stored in the pfp_pmcs table
       of the generic output parameters.

EVENT ADDRESS REGISTERS

       The pfp_ita_iear field of type pfmlib_ita_ear_t describes what to do
       with instruction Event Address Registers (I-EARs). Again, if this
       feature is used, ear_used must be set to 1; otherwise it will be
       ignored by the library. The ear_mode field must be set to either
       PFMLIB_ITA_EAR_TLB_MODE or PFMLIB_ITA_EAR_CACHE_MODE to indicate the
       type of EAR to program. The umask to store into PMC10 must be in
       ear_umask. The privilege level mask at which the I-EAR will be
       monitored must be set in ear_plm, which can be any combination of
       PFM_PLM0, PFM_PLM1, PFM_PLM2, PFM_PLM3. If ear_plm is 0, then the
       default privilege level mask in pfp_dfl_plm is used. Finally, the
       instruction set for which to monitor is in ear_ism and can be any one
       of PFMLIB_ITA_ISM_BOTH, PFMLIB_ITA_ISM_IA32, or PFMLIB_ITA_ISM_IA64.

       The pfp_ita_dear field of type pfmlib_ita_ear_t describes what to do
       with data Event Address Registers (D-EARs). The description is
       identical to the I-EARs except that it applies to PMC11.

       In general, there are four different methods to program the EAR (data
       or instruction):

       Method 1
              There is an EAR event in the list of events to monitor and
              ear_used is cleared. In this case the EAR will be programmed
              (PMC10 or PMC11) based on the information encoded in the event.
              A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to
              count DATA_EAR_EVENT or INSTRUCTION_EAR_EVENTS depending on the
              type of EAR.

       Method 2
              There is an EAR event in the list of events to monitor and
              ear_used is set. In this case the EAR will be programmed (PMC10
              or PMC11) using the information in the pfp_ita_iear or
              pfp_ita_dear structure because it contains more detailed
              information, such as privilege level and instruction set. A
              counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to
              count DATA_EAR_EVENT or INSTRUCTION_EAR_EVENTS depending on the
              type of EAR.

       Method 3
              There is no EAR event in the list of events to monitor and
              ear_used is cleared. In this case no EAR is programmed.

       Method 4
              There is no EAR event in the list of events to monitor and
              ear_used is set. In this case the EAR will be programmed
              (PMC10 or PMC11) using the information in the pfp_ita_iear or
              pfp_ita_dear structure. This is the free running mode for the
              EAR.
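
       The four methods reduce to a decision on two inputs: whether an EAR
       event is in the list of events, and whether ear_used is set. The
       sketch below only restates that table; the enum labels and function
       names are hypothetical and do not exist in libpfm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the four cases; not libpfm identifiers. */
typedef enum {
    EAR_FROM_EVENT,      /* Method 1: EAR event listed, ear_used cleared */
    EAR_FROM_STRUCT,     /* Method 2: EAR event listed, ear_used set     */
    EAR_NOT_PROGRAMMED,  /* Method 3: no EAR event, ear_used cleared     */
    EAR_FREE_RUNNING     /* Method 4: no EAR event, ear_used set         */
} ear_method_t;

/* Maps the two inputs described in the text to one of the four methods. */
static ear_method_t ear_method(bool ear_event_listed, bool ear_used)
{
    if (ear_event_listed)
        return ear_used ? EAR_FROM_STRUCT : EAR_FROM_EVENT;
    return ear_used ? EAR_FREE_RUNNING : EAR_NOT_PROGRAMMED;
}

/* A counting monitor is programmed only when an EAR event is listed. */
static bool ear_uses_counter(ear_method_t m)
{
    switch (m) {
    case EAR_FROM_EVENT:
    case EAR_FROM_STRUCT:
        return true;
    case EAR_NOT_PROGRAMMED:
    case EAR_FREE_RUNNING:
        return false;
    }
    assert(!"unknown ear_method_t value");
    return false;
}
```

       In the free running case no counting monitor is involved, so
       ear_uses_counter() returns false for it.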

BRANCH TRACE BUFFER

       The pfp_ita_btb field of type pfmlib_ita_btb_t is used to configure
       the Branch Trace Buffer (BTB). If btb_used is set, then the library
       will take the configuration into account; otherwise any BTB
       configuration will be ignored. The various fields in this structure
       provide means to filter the kinds of branches that get recorded in
       the BTB. Each one represents an element of the branch architecture of
       the Itanium processor. Refer to the Itanium-specific documentation
       for more details on the branch architecture. The fields are as
       follows:

       btb_tar
              If this field is 1, then branches predicted by the Target
              Address Register (TAR) are captured. If 0, no branch predicted
              by the TAR is included.

       btb_tac
              If this field is 1, then branches predicted by the Target
              Address Cache (TAC) are captured. If 0, no branch predicted by
              the TAC is included.

       btb_bac
              If this field is 1, then branches predicted by the Branch
              Address Corrector (BAC) are captured. If 0, no branch predicted
              by the BAC is included.

       btb_tm If this field is 0, then no branch is captured. If this field is
              1, then non-taken branches are captured. If this field is 2,
              then taken branches are captured. Finally, if this field is 3,
              then all branches are captured.

       btb_ptm
              If this field is 0, then no branch is captured. If this field is
              1, then branches with a mispredicted target address are
              captured. If this field is 2, then branches with a correctly
              predicted target address are captured. Finally, if this field
              is 3, then all branches are captured regardless of target
              address prediction.

       btb_ppm
              If this field is 0, then no branch is captured. If this field is
              1, then branches with a mispredicted path (taken/non-taken) are
              captured. If this field is 2, then branches with a correctly
              predicted path are captured. Finally, if this field is 3, then
              all branches are captured regardless of their path prediction.

       btb_plm
              This is the privilege level mask at which the BTB captures
              branches. It can be any combination of PFM_PLM0, PFM_PLM1,
              PFM_PLM2, PFM_PLM3. If btb_plm is 0, then the default privilege
              level mask in pfp_dfl_plm is used.
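
       As an example of combining these fields, the following minimal
       sketch fills a BTB description that records only taken branches
       whose path was mispredicted, from all three predictors and
       regardless of target address prediction. The structure is a local
       copy of the definition shown in DESCRIPTION so that the sketch
       compiles on its own; a real program would include
       <perfmon/pfmlib_itanium.h> instead. The function name is
       hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Local copy of the structure shown in DESCRIPTION (for illustration). */
typedef struct {
    unsigned char   btb_used;

    unsigned char   btb_tar;
    unsigned char   btb_tac;
    unsigned char   btb_bac;
    unsigned char   btb_tm;
    unsigned char   btb_ptm;
    unsigned char   btb_ppm;
    unsigned int    btb_plm;
} pfmlib_ita_btb_t;

/* Hypothetical helper: taken branches with a mispredicted path. */
static void btb_setup_taken_mispredicted_path(pfmlib_ita_btb_t *btb)
{
    assert(btb != NULL);

    memset(btb, 0, sizeof(*btb));
    btb->btb_used = 1;  /* tell the library to use this description   */
    btb->btb_tar  = 1;  /* include branches predicted by the TAR      */
    btb->btb_tac  = 1;  /* include branches predicted by the TAC      */
    btb->btb_bac  = 1;  /* include branches predicted by the BAC      */
    btb->btb_tm   = 2;  /* taken branches only                        */
    btb->btb_ptm  = 3;  /* any target address prediction outcome      */
    btb->btb_ppm  = 1;  /* mispredicted path only                     */
    btb->btb_plm  = 0;  /* 0: fall back to the default in pfp_dfl_plm */
}
```

       Zeroing the structure first avoids leaving a filter field at an
       unintended value; note that leaving btb_tm, btb_ptm or btb_ppm at 0
       means that no branch is captured.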

       There are four methods to program the BTB:

       Method 1
              The BRANCH_EVENT is in the list of events to monitor and
              btb_used is cleared. In this case, the BTB will be configured
              (PMC12) to record ALL branches. A counting monitor
              (PMC4/PMD4-PMC7/PMD7) will be programmed to count BRANCH_EVENT.

       Method 2
              The BRANCH_EVENT is in the list of events to monitor and
              btb_used is set. In this case, the BTB will be configured
              (PMC12) using the information in the pfp_ita_btb structure. A
              counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to
              count BRANCH_EVENT.

       Method 3
              The BRANCH_EVENT is not in the list of events to monitor and
              btb_used is set. In this case, the BTB will be configured
              (PMC12) using the information in the pfp_ita_btb structure. This
              is the free running mode for the BTB.

       Method 4
              The BRANCH_EVENT is not in the list of events to monitor and
              btb_used is cleared. In this case, the BTB is not programmed.

DATA AND CODE RANGE RESTRICTIONS

       The pfp_ita_drange and pfp_ita_irange fields control the range
       restrictions for data and code respectively. The idea is that the
       application passes a set of ranges, each designated by a start and
       end address. Upon return from pfm_dispatch_events(), the application
       gets back the set of registers and their values that need to be
       programmed via a kernel interface.

       Range restriction is implemented using the debug registers. There is
       a limited number of debug registers and they come in pairs. With 8
       data debug registers, a maximum of 4 distinct ranges can be
       specified. The same applies to code range restrictions. Moreover,
       there are some severe constraints on the alignment and size of the
       range. Given that the size of the range is specified using a bitmask,
       there can be situations where the actual range is larger than the
       requested range. The library will make its best effort to cover only
       what is requested. It will never cover less than what is requested.
       The algorithm uses more than one pair of debug registers to get a
       more precise range if necessary. Hence, up to all 4 pairs can be used
       to describe a single range. The library returns the start and end
       offsets of the actual range compared to the requested range.
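
       To see why the covered range can be larger than the requested one,
       consider the simplified case of a single debug register pair that
       can only describe a naturally aligned block whose size is a power of
       two. That constraint, the half-open [start, end) convention, and the
       names below are assumptions made for this illustration; this is not
       the algorithm used by the library, which may combine several pairs
       to obtain a tighter fit.

```c
#include <assert.h>

/* Result of covering a requested range with one aligned block. */
typedef struct {
    unsigned long base;   /* start of the aligned block               */
    unsigned long size;   /* block size in bytes, a power of two      */
    unsigned long soff;   /* bytes covered before the requested start */
    unsigned long eoff;   /* bytes covered after the requested end    */
} single_block_cover_t;

/*
 * Finds the smallest naturally aligned power-of-two block that contains
 * the half-open range [start, end).
 */
static single_block_cover_t cover_with_one_block(unsigned long start,
                                                 unsigned long end)
{
    single_block_cover_t c;
    unsigned long size = 1;

    assert(start < end);

    /* Grow the block until the aligned block holding start reaches end. */
    while (end - (start & ~(size - 1)) > size) {
        size <<= 1;
        assert(size != 0);  /* range too wide for this simple model */
    }

    c.base = start & ~(size - 1);
    c.size = size;
    c.soff = start - c.base;
    c.eoff = c.base + size - end;
    return c;
}
```

       For a requested range of 0x1010 to 0x1030, this model yields a
       64-byte block at 0x1000, so 16 bytes are covered before the start
       and 16 bytes after the end. A range that is already aligned, such as
       0x2000 to 0x2040, is matched exactly.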

       If range restriction is to be used, the rr_used field must be set to
       one; otherwise the settings will be ignored. The ranges are described
       by the pfmlib_ita_input_rr_t structure. Up to 4 ranges can be
       defined. Each range is described by an entry in rr_limits.

       The fields of the pfmlib_ita_input_rr_desc_t structure are as follows:

       rr_plm The privilege level mask at which the range is active. It can
              be any combination of PFM_PLM0, PFM_PLM1, PFM_PLM2, PFM_PLM3.
              If rr_plm is 0, then the default privilege level mask in
              pfp_dfl_plm is used. The privilege level is only relevant for
              code ranges; data ranges ignore the setting.

       rr_start
              This is the start address of the range. Any address is supported
              but for code ranges it must be bundle aligned, i.e., 16-byte
              aligned.

       rr_end This is the end address of the range. Any address is supported
              but for code ranges it must be bundle aligned, i.e., 16-byte
              aligned.

       The library will provide the values for the debug registers as well as
       some information about the actual ranges in the output parameters,
       more precisely in the pfmlib_ita_output_rr_t structure. The fields of
       this structure are as follows:

       rr_nbr_used
              Contains the number of debug registers used to cover the range.
              This is necessarily an even number as debug registers always
              come in pairs. The value of this field is between 0 and 8.

       rr_br  This table contains the list of debug registers necessary to
              cover the ranges. Each element is of type pfmlib_reg_t. The
              reg_num field contains the debug register index while reg_value
              contains the debug register value. Both the index and value must
              be copied into the kernel specific argument to program the debug
              registers. The library never programs them.

       rr_infos
              Contains information about the ranges defined. Because of
              alignment restrictions, the actual range covered by the debug
              registers may be larger than the requested range. This table
              describes the differences between the requested and actual
              ranges expressed as offsets:

       rr_soff
              Contains the start offset of the actual range described by the
              debug registers. If zero, it means the library was able to match
              exactly the beginning of the range. Otherwise it represents the
              number of bytes by which the actual range precedes the
              requested range.

       rr_eoff
              Contains the end offset of the actual range described by the
              debug registers. If zero, it means the library was able to match
              exactly the end of the range. Otherwise it represents the number
              of bytes by which the actual range exceeds the requested range.
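
       Given these definitions, the range actually covered can be
       recovered from the requested range and the returned offsets. The
       sketch below uses local copies of the two descriptor structures
       shown in DESCRIPTION so that it compiles without the library
       headers; the helper names are hypothetical, and it assumes that
       entry n of rr_infos describes entry n of rr_limits.

```c
#include <assert.h>

/* Local copies of the structures shown in DESCRIPTION (for illustration). */
typedef struct {
    unsigned int  rr_plm;
    unsigned long rr_start;
    unsigned long rr_end;
} pfmlib_ita_input_rr_desc_t;

typedef struct {
    unsigned long rr_soff;
    unsigned long rr_eoff;
} pfmlib_ita_output_rr_desc_t;

/* Hypothetical helper: first address actually covered. */
static unsigned long rr_actual_start(const pfmlib_ita_input_rr_desc_t *in,
                                     const pfmlib_ita_output_rr_desc_t *out)
{
    /* rr_soff is the number of bytes covered before rr_start. */
    assert(out->rr_soff <= in->rr_start);
    return in->rr_start - out->rr_soff;
}

/* Hypothetical helper: end address actually covered. */
static unsigned long rr_actual_end(const pfmlib_ita_input_rr_desc_t *in,
                                   const pfmlib_ita_output_rr_desc_t *out)
{
    /* rr_eoff is the number of bytes covered past rr_end. */
    return in->rr_end + out->rr_eoff;
}
```

       An application can compare both offsets against zero to decide
       whether samples outside the requested range need to be filtered
       out afterwards.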

ERRORS

       Refer to the description of pfm_dispatch_events() for errors when using
       the Itanium-specific input and output arguments.

SEE ALSO

       pfm_dispatch_events(3) and the set of examples shipped with the library

AUTHOR

       Stephane Eranian <eranian@hpl.hp.com>

                                November, 2003                       LIBPFM(3)