LIBPFM(3)              Linux Programmer's Manual              LIBPFM(3)

libpfm_itanium2 - support for Itanium2 specific PMU features

#include <perfmon/pfmlib.h>
#include <perfmon/pfmlib_itanium2.h>

int pfm_ita2_is_ear(unsigned int i);
int pfm_ita2_is_dear(unsigned int i);
int pfm_ita2_is_dear_tlb(unsigned int i);
int pfm_ita2_is_dear_cache(unsigned int i);
int pfm_ita2_is_dear_alat(unsigned int i);
int pfm_ita2_is_iear(unsigned int i);
int pfm_ita2_is_iear_tlb(unsigned int i);
int pfm_ita2_is_iear_cache(unsigned int i);
int pfm_ita2_is_btb(unsigned int i);
int pfm_ita2_support_opcm(unsigned int i);
int pfm_ita2_support_iarr(unsigned int i);
int pfm_ita2_support_darr(unsigned int i);
int pfm_ita2_get_event_maxincr(unsigned int i, unsigned int *maxincr);
int pfm_ita2_get_event_umask(unsigned int i, unsigned long *umask);
int pfm_ita2_get_event_group(unsigned int i, int *grp);
int pfm_ita2_get_event_set(unsigned int i, int *set);
int pfm_ita2_get_ear_mode(unsigned int i, pfmlib_ita2_ear_mode_t *mode);
int pfm_ita2_irange_is_fine(pfmlib_output_param_t *outp, pfmlib_ita2_output_param_t *mod_out);

The libpfm library provides full support for all the Itanium 2 specific
features of the PMU. The interface is defined in pfmlib_itanium2.h. It
consists of a set of functions and structures which describe and allow
access to the Itanium 2 specific PMU features.

The Itanium 2 specific functions presented here are mostly used to
retrieve the characteristics of an event. Given an opaque event
descriptor, obtained by pfm_find_event or its derivatives, they return
a boolean value indicating whether this event supports a given feature
or is of a particular kind.

The pfm_ita2_is_ear() function returns 1 if the event designated by i
corresponds to an EAR event, i.e., an Event Address Register type of
event. Otherwise 0 is returned. For instance, DATA_EAR_CACHE_LAT4 is
an EAR event, but CPU_CYCLES is not. It can be a data or instruction
EAR event.

The pfm_ita2_is_dear() function returns 1 if the event designated by i
corresponds to a Data EAR event. Otherwise 0 is returned. It can be a
cache, TLB, or ALAT EAR event.

The pfm_ita2_is_dear_tlb() function returns 1 if the event designated
by i corresponds to a Data EAR TLB event. Otherwise 0 is returned.

The pfm_ita2_is_dear_cache() function returns 1 if the event designated
by i corresponds to a Data EAR cache event. Otherwise 0 is returned.

The pfm_ita2_is_dear_alat() function returns 1 if the event designated
by i corresponds to a Data EAR ALAT event. Otherwise 0 is returned.

The pfm_ita2_is_iear() function returns 1 if the event designated by i
corresponds to an instruction EAR event. Otherwise 0 is returned. It
can be a cache or TLB instruction EAR event.

The pfm_ita2_is_iear_tlb() function returns 1 if the event designated
by i corresponds to an instruction EAR TLB event. Otherwise 0 is
returned.

The pfm_ita2_is_iear_cache() function returns 1 if the event designated
by i corresponds to an instruction EAR cache event. Otherwise 0 is
returned.

The pfm_ita2_support_opcm() function returns 1 if the event designated
by i supports opcode matching, i.e., if this event can be measured
accurately when opcode matching via PMC8/PMC9 is active. Otherwise 0
is returned. Not all events support this feature.

The pfm_ita2_support_iarr() function returns 1 if the event designated
by i supports code address range restrictions, i.e., if this event can
be measured accurately when code range restriction is active. Otherwise
0 is returned. Not all events support this feature.

The pfm_ita2_support_darr() function returns 1 if the event designated
by i supports data address range restrictions, i.e., if this event can
be measured accurately when data range restriction is active. Otherwise
0 is returned. Not all events support this feature.

The pfm_ita2_get_event_maxincr() function returns in maxincr the
maximum number of occurrences per cycle for the event designated by i.
Certain Itanium 2 events can occur more than once per cycle. When an
event occurs more than once per cycle, the PMD counter is incremented
accordingly. It is possible to restrict the measurement to cycles in
which the event occurs more than a given number of times by programming
a threshold. For instance, NOPS_RETIRED can happen up to 6 times/cycle,
which means that the threshold can be adjusted between 0 and 5, where 5
would mean that the PMD counter is incremented by 1 only when the nop
instruction is executed more than 5 times/cycle. The value returned in
maxincr is therefore the non-inclusive upper bound for the threshold to
program in the PMC register.

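As an illustration of the bound described above, the following minimal
sketch checks a candidate threshold against the value reported in
maxincr. The helper name ex_threshold_is_valid() is hypothetical and is
not part of libpfm; in a real program maxincr would be filled in by
pfm_ita2_get_event_maxincr().

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of libpfm.
 * maxincr is the value obtained from pfm_ita2_get_event_maxincr().
 * It is a non-inclusive upper bound, so the usable thresholds are
 * 0 .. maxincr - 1.
 */
bool ex_threshold_is_valid(unsigned int thres, unsigned int maxincr)
{
    return thres < maxincr;
}
```

With maxincr equal to 6, as for NOPS_RETIRED, thresholds 0 through 5
are accepted and 6 is rejected.
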
The pfm_ita2_get_event_umask() function returns in umask the umask for
the event designated by i.

The pfm_ita2_get_event_group() function returns in grp the group to
which the event designated by i belongs. The notion of group is used
for L1 and L2 cache events only. For all other events, a group is
irrelevant and can be ignored. If the event is an L2 cache event, then
the value of grp will be PFMLIB_ITA2_EVT_L2_CACHE_GRP. Similarly, if
the event is an L1 cache event, the value of grp will be
PFMLIB_ITA2_EVT_L1_CACHE_GRP. In all other cases, the value of grp will
be PFMLIB_ITA2_EVT_NO_GRP.

The pfm_ita2_get_event_set() function returns in set the set to which
the event designated by i belongs. A set is a subdivision of a group
and is therefore only relevant for L1 and L2 cache events. An event can
only belong to one group and one set. This partitioning of the cache
events is due to hardware limitations which impose restrictions on
events. For a given group, events from different sets cannot be
measured at the same time. If the event does not belong to a group,
then the value of set is PFMLIB_ITA2_EVT_NO_SET.

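The group and set rule can be sketched as a small predicate. This is
only an illustration: the enum below is a stand-in for the
PFMLIB_ITA2_EVT_*_GRP constants with arbitrary numeric values, and
ex_events_conflict() is a hypothetical helper, not a libpfm function.
In a real program the arguments would come from
pfm_ita2_get_event_group() and pfm_ita2_get_event_set().

```c
#include <stdbool.h>

/* Stand-ins for the PFMLIB_ITA2_EVT_*_GRP constants. The numeric
 * values are arbitrary and chosen only for this sketch. */
enum ex_group {
    EX_NO_GRP       = 0,
    EX_L1_CACHE_GRP = 1,
    EX_L2_CACHE_GRP = 2
};

/*
 * Hypothetical helper: two events conflict under the set rule when
 * they belong to the same cache group but to different sets. Events
 * without a group are not subject to this rule.
 */
bool ex_events_conflict(int grp_a, int set_a, int grp_b, int set_b)
{
    if (grp_a == EX_NO_GRP || grp_b == EX_NO_GRP)
        return false;
    return grp_a == grp_b && set_a != set_b;
}
```

Two L1 cache events from sets 0 and 1 conflict, while an L1 event and
an L2 event do not, whatever their sets.
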
The pfm_ita2_irange_is_fine() function returns 1 if the configuration
description passed in outp, the generic output parameters, and mod_out,
the Itanium 2 specific output parameters, uses code range restriction
in fine mode. Otherwise the function returns 0. This function can only
be called after a call to pfm_dispatch_events() which returned
successfully and had the data structures pointed to by outp and mod_out
as output parameters.

The pfm_ita2_get_ear_mode() function returns in mode the EAR mode of
the event designated by i. If the event is not an EAR event, then
PFMLIB_ERR_INVAL is returned and mode is not updated. Otherwise mode
can have the following values:

PFMLIB_ITA2_EAR_TLB_MODE
       The event is a TLB EAR. It can be either a data or an
       instruction TLB EAR.

PFMLIB_ITA2_EAR_CACHE_MODE
       The event is a cache EAR. It can be either a data or an
       instruction cache EAR.

PFMLIB_ITA2_EAR_ALAT_MODE
       The event is an ALAT EAR. It can only be a data EAR event.


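A caller will typically branch on the returned mode. The sketch below
maps each mode to a printable name. The enum is repeated from the
definition given in this page so that the fragment compiles without the
library headers; ex_ear_mode_name() is a hypothetical helper and not
part of libpfm.

```c
/* Repeated from the definition given in this page so that the sketch
 * compiles on its own. Drop it when including pfmlib_itanium2.h. */
typedef enum {
    PFMLIB_ITA2_EAR_CACHE_MODE = 0,
    PFMLIB_ITA2_EAR_TLB_MODE   = 1,
    PFMLIB_ITA2_EAR_ALAT_MODE  = 2
} pfmlib_ita2_ear_mode_t;

/* Hypothetical helper: map an EAR mode to a printable name. */
const char *ex_ear_mode_name(pfmlib_ita2_ear_mode_t mode)
{
    switch (mode) {
    case PFMLIB_ITA2_EAR_CACHE_MODE: return "cache";
    case PFMLIB_ITA2_EAR_TLB_MODE:   return "TLB";
    case PFMLIB_ITA2_EAR_ALAT_MODE:  return "ALAT";
    }
    return "unknown";   /* value outside the documented set */
}
```
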
When the Itanium 2 specific features are needed to support a
measurement, their descriptions must be passed as model-specific input
arguments to the pfm_dispatch_events() call. The Itanium 2 specific
input arguments are described in the pfmlib_ita2_input_param_t
structure and the output parameters in pfmlib_ita2_output_param_t.
They are defined as follows:

typedef enum {
        PFMLIB_ITA2_ISM_BOTH=0,
        PFMLIB_ITA2_ISM_IA32=1,
        PFMLIB_ITA2_ISM_IA64=2
} pfmlib_ita2_ism_t;

typedef struct {
        unsigned int flags;
        unsigned int thres;
        pfmlib_ita2_ism_t ism;
} pfmlib_ita2_counter_t;

typedef struct {
        unsigned char opcm_used;
        unsigned long pmc_val;
} pfmlib_ita2_opcm_t;

typedef struct {
        unsigned char btb_used;

        unsigned char btb_ds;
        unsigned char btb_tm;
        unsigned char btb_ptm;
        unsigned char btb_ppm;
        unsigned char btb_brt;
        unsigned int btb_plm;
} pfmlib_ita2_btb_t;

typedef enum {
        PFMLIB_ITA2_EAR_CACHE_MODE = 0,
        PFMLIB_ITA2_EAR_TLB_MODE   = 1,
        PFMLIB_ITA2_EAR_ALAT_MODE  = 2
} pfmlib_ita2_ear_mode_t;

typedef struct {
        unsigned char ear_used;

        pfmlib_ita2_ear_mode_t ear_mode;
        pfmlib_ita2_ism_t ear_ism;
        unsigned int ear_plm;
        unsigned long ear_umask;
} pfmlib_ita2_ear_t;

typedef struct {
        unsigned int rr_plm;
        unsigned long rr_start;
        unsigned long rr_end;
} pfmlib_ita2_input_rr_desc_t;

typedef struct {
        unsigned long rr_soff;
        unsigned long rr_eoff;
} pfmlib_ita2_output_rr_desc_t;

typedef struct {
        unsigned int rr_flags;
        pfmlib_ita2_input_rr_desc_t rr_limits[4];
        unsigned char rr_used;
} pfmlib_ita2_input_rr_t;

typedef struct {
        unsigned int rr_nbr_used;
        pfmlib_ita2_output_rr_desc_t rr_infos[4];
        pfmlib_reg_t rr_br[8];
} pfmlib_ita2_output_rr_t;

typedef struct {
        pfmlib_ita2_counter_t pfp_ita2_counters[PMU_ITA2_NUM_COUNTERS];

        unsigned long pfp_ita2_flags;

        pfmlib_ita2_opcm_t pfp_ita2_pmc8;
        pfmlib_ita2_opcm_t pfp_ita2_pmc9;
        pfmlib_ita2_ear_t pfp_ita2_iear;
        pfmlib_ita2_ear_t pfp_ita2_dear;
        pfmlib_ita2_btb_t pfp_ita2_btb;
        pfmlib_ita2_input_rr_t pfp_ita2_drange;
        pfmlib_ita2_input_rr_t pfp_ita2_irange;
} pfmlib_ita2_input_param_t;

typedef struct {
        pfmlib_ita2_output_rr_t pfp_ita2_drange;
        pfmlib_ita2_output_rr_t pfp_ita2_irange;
} pfmlib_ita2_output_param_t;

The Itanium 2 processor provides two additional per-event features for
counters: thresholding and instruction set selection. They can be set
using the pfp_ita2_counters data structure for each event. The ism
field can be initialized as follows:

PFMLIB_ITA2_ISM_BOTH
       The event will be monitored during IA-64 and IA-32 execution.

PFMLIB_ITA2_ISM_IA32
       The event will only be monitored during IA-32 execution.

PFMLIB_ITA2_ISM_IA64
       The event will only be monitored during IA-64 execution.

If ism has a value of zero, it will default to PFMLIB_ITA2_ISM_BOTH.

The thres field indicates the threshold for the event. A threshold of
n means that the counter will be incremented by one only when the event
occurs more than n times per cycle.

The flags field contains event-specific flags. The currently defined
flags are:

PFMLIB_ITA2_FL_EVT_NO_QUALCHECK
       When this flag is set, the library ignores the qualifier
       constraints for this event. Qualifiers include opcode matching
       and code and data range restrictions. When an event is marked as
       not supporting a particular qualifier, it usually means that the
       qualifier is ignored for that event, i.e., the extra level of
       filtering is not applied. For instance, the CPU_CYCLES event
       does not support code range restrictions, and by default the
       library will refuse to program it if range restriction is also
       requested. Using the flag overrides the check, and the call to
       pfm_dispatch_events() will succeed. In this case, CPU_CYCLES
       will be measured for the entire program and not just for the
       code range requested. For certain measurements this is perfectly
       acceptable, as the range restriction will only be applied to the
       events which support it. Make sure you understand which events
       do not support certain qualifiers before using this flag.

The pfp_ita2_pmc8 and pfp_ita2_pmc9 fields of type pfmlib_ita2_opcm_t
describe what to do with the opcode matchers. Itanium 2 supports opcode
matching via PMC8 and PMC9. When this feature is used, the opcm_used
field must be set to 1, otherwise the structure is ignored by the
library. The pmc_val field contains the raw value to store in PMC8 or
PMC9. The library may adjust the value to enable or disable some
options depending on the set of features being used. The final values
for PMC8 and PMC9 will be stored in the pfp_pmcs table of the generic
output parameters.

The pfp_ita2_iear field of type pfmlib_ita2_ear_t describes what to do
with the instruction Event Address Registers (I-EARs). Again, if this
feature is used, the ear_used field must be set to 1, otherwise the
structure will be ignored by the library. The ear_mode field must be
set to either PFMLIB_ITA2_EAR_TLB_MODE or PFMLIB_ITA2_EAR_CACHE_MODE to
indicate the type of EAR to program. The umask to store into PMC10 must
be in ear_umask. The privilege level mask at which the I-EAR will be
monitored must be set in ear_plm, which can be any combination of
PFM_PLM0, PFM_PLM1, PFM_PLM2, PFM_PLM3. If ear_plm is 0, then the
default privilege level mask in pfp_dfl_plm is used. Finally, the
instruction set for which to monitor is in ear_ism and can be any one
of PFMLIB_ITA2_ISM_BOTH, PFMLIB_ITA2_ISM_IA32, or PFMLIB_ITA2_ISM_IA64.

The pfp_ita2_dear field of type pfmlib_ita2_ear_t describes what to do
with the data Event Address Registers (D-EARs). The description is
identical to the I-EARs except that it applies to PMC11 and that an
ear_mode of PFMLIB_ITA2_EAR_ALAT_MODE is possible.

In general, there are four different methods to program the EAR (data
or instruction):

Method 1
       There is an EAR event in the list of events to monitor and
       ear_used is cleared. In this case the EAR will be programmed
       (PMC10 or PMC11) based on the information encoded in the event.
       A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to
       count DATA_EAR_EVENT or L1I_EAR_EVENTS depending on the type of
       EAR.

Method 2
       There is an EAR event in the list of events to monitor and
       ear_used is set. In this case the EAR will be programmed (PMC10
       or PMC11) using the information in the pfp_ita2_iear or
       pfp_ita2_dear structure because it contains more detailed
       information, such as privilege level and instruction set. A
       counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to
       count DATA_EAR_EVENT or L1I_EAR_EVENTS depending on the type of
       EAR.

Method 3
       There is no EAR event in the list of events to monitor and
       ear_used is cleared. In this case no EAR is programmed.

Method 4
       There is no EAR event in the list of events to monitor and
       ear_used is set. In this case the EAR will be programmed (PMC10
       or PMC11) using the information in the pfp_ita2_iear or
       pfp_ita2_dear structure. This is the free running mode for the
       EAR.


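The four EAR programming methods depend on only two conditions, which
the following sketch makes explicit. The type ex_ear_method_t and the
function ex_ear_method() are hypothetical names used for illustration.
They model the decision described above and are not how the library is
necessarily implemented.

```c
/* Hypothetical enumeration of the four methods described above. */
typedef enum {
    EX_EAR_FROM_EVENT,    /* Method 1: EAR set up from the event, counter programmed */
    EX_EAR_FROM_STRUCT,   /* Method 2: EAR set up from the structure, counter programmed */
    EX_EAR_NONE,          /* Method 3: no EAR programmed */
    EX_EAR_FREE_RUNNING   /* Method 4: EAR set up from the structure, no counter */
} ex_ear_method_t;

/*
 * has_ear_event: non-zero if an EAR event is in the list of events.
 * ear_used:      value of the ear_used field of pfp_ita2_iear or
 *                pfp_ita2_dear.
 */
ex_ear_method_t ex_ear_method(int has_ear_event, int ear_used)
{
    if (has_ear_event)
        return ear_used ? EX_EAR_FROM_STRUCT : EX_EAR_FROM_EVENT;
    return ear_used ? EX_EAR_FREE_RUNNING : EX_EAR_NONE;
}
```
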
The pfp_ita2_btb field of type pfmlib_ita2_btb_t is used to configure
the Branch Trace Buffer (BTB). If the btb_used field is set, then the
library will take the configuration into account, otherwise any BTB
configuration will be ignored. The various fields in this structure
provide means to filter the kind of branches that get recorded in the
BTB. Each one represents an element of the branch architecture of the
Itanium 2 processor. Refer to the Itanium 2 specific documentation for
more details on the branch architecture. The fields are as follows:

btb_ds If the value of this field is 1, then detailed information about
       the branch prediction is recorded in place of information about
       the target address. If the value is 0, then information about
       the target address of the branch is recorded instead.

btb_tm If this field is 0, then no branch is captured. If this field is
       1, then non-taken branches are captured. If this field is 2,
       then taken branches are captured. Finally, if this field is 3,
       then all branches are captured.

btb_ptm
       If this field is 0, then no branch is captured. If this field is
       1, then branches with a mispredicted target address are
       captured. If this field is 2, then branches with a correctly
       predicted target address are captured. Finally, if this field is
       3, then all branches are captured regardless of target address
       prediction.

btb_ppm
       If this field is 0, then no branch is captured. If this field is
       1, then branches with a mispredicted path (taken/non-taken) are
       captured. If this field is 2, then branches with a correctly
       predicted path are captured. Finally, if this field is 3, then
       all branches are captured regardless of their path prediction.

btb_brt
       If this field is 0, then no branch is captured. If this field is
       1, then only IP-relative branches are captured. If this field is
       2, then only return branches are captured. Finally, if this
       field is 3, then only non-return indirect branches are captured.

btb_plm
       This is the privilege level mask at which the BTB captures
       branches. It can be any combination of PFM_PLM0, PFM_PLM1,
       PFM_PLM2, PFM_PLM3. If btb_plm is 0, then the default privilege
       level mask in pfp_dfl_plm is used.

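The encodings of btb_tm, btb_ptm and btb_ppm listed above can be read
as two independent enable bits. The sketch below illustrates that
reading. Both function names are hypothetical and not part of libpfm,
and btb_brt is left out because its values are an enumeration rather
than a pair of bits.

```c
#include <stdbool.h>

/*
 * Hypothetical helper. For btb_tm, btb_ptm and btb_ppm:
 *   bit 0 (value 1) enables the first kind of branch
 *         (non-taken, mispredicted target, mispredicted path),
 *   bit 1 (value 2) enables the second kind
 *         (taken, correctly predicted target, correctly predicted path).
 * second_kind tells which of the two kinds the branch belongs to.
 */
bool ex_btb_filter_accepts(unsigned char field, bool second_kind)
{
    unsigned char bit = second_kind ? 2 : 1;
    return (field & bit) != 0;
}

/* Convenience wrapper for btb_tm. */
bool ex_btb_tm_captures(unsigned char btb_tm, bool taken)
{
    return ex_btb_filter_accepts(btb_tm, taken);
}
```

With btb_tm set to 2, a taken branch passes the filter and a non-taken
branch does not. With btb_tm set to 0, nothing passes.
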
There are 4 methods to program the BTB and they are as follows:

Method 1
       The BRANCH_EVENT is in the list of events to monitor and
       btb_used is cleared. In this case, the BTB will be configured
       (PMC12) to record ALL branches. A counting monitor
       (PMC4/PMD4-PMC7/PMD7) will be programmed to count BRANCH_EVENT.

Method 2
       The BRANCH_EVENT is in the list of events to monitor and
       btb_used is set. In this case, the BTB will be configured
       (PMC12) using the information in the pfp_ita2_btb structure. A
       counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to
       count BRANCH_EVENT.

Method 3
       The BRANCH_EVENT is not in the list of events to monitor and
       btb_used is set. In this case, the BTB will be configured
       (PMC12) using the information in the pfp_ita2_btb structure.
       This is the free running mode for the BTB.

Method 4
       The BRANCH_EVENT is not in the list of events to monitor and
       btb_used is cleared. In this case, the BTB is not programmed.


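The outcome of the four BTB methods can be summarized by three yes/no
answers. The sketch below computes them from the two inputs. The
structure ex_btb_plan_t and the function ex_btb_plan() are hypothetical
and only restate the methods above; they do not describe the library's
internals.

```c
#include <stdbool.h>

/* Hypothetical summary of what gets programmed for the BTB. */
typedef struct {
    bool program_pmc12;       /* the BTB configuration (PMC12) is programmed */
    bool record_all;          /* the BTB records all branches, no filtering */
    bool count_branch_event;  /* a counting monitor counts BRANCH_EVENT */
} ex_btb_plan_t;

/*
 * has_branch_event: BRANCH_EVENT is in the list of events to monitor.
 * btb_used:         value of the btb_used field of pfp_ita2_btb.
 */
ex_btb_plan_t ex_btb_plan(bool has_branch_event, bool btb_used)
{
    ex_btb_plan_t plan;

    plan.program_pmc12      = has_branch_event || btb_used;  /* methods 1, 2, 3 */
    plan.record_all         = has_branch_event && !btb_used; /* method 1 only */
    plan.count_branch_event = has_branch_event;              /* methods 1, 2 */
    return plan;
}
```
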
The pfp_ita2_drange and pfp_ita2_irange fields control the range
restrictions for data and code respectively. The idea is that the
application passes a set of ranges, each designated by a start and end
address. Upon return from pfm_dispatch_events(), the application gets
back the set of registers and their values that need to be programmed
via a kernel interface.

Range restriction is implemented using the debug registers. There is a
limited number of debug registers and they come in pairs. With 8 data
debug registers, a maximum of 4 distinct ranges can be specified. The
same applies to code range restrictions. Moreover, there are some
severe constraints on the alignment and size of the ranges. Given that
the size of a range is specified using a bitmask, there can be
situations where the actual range is larger than the requested range.

For code ranges, the Itanium 2 processor can use what is called fine
mode, where a range is designated using two pairs of code debug
registers. In this mode, the bitmask is not used and the start and end
addresses are directly specified. Not all code ranges qualify for fine
mode: the size of the range must be 4KB or less and the range cannot
cross a 4KB page boundary. The library will make a best effort in
choosing the right mode for each range. For code ranges, it will try
fine mode first and will fall back to the bitmask mode otherwise. Fine
mode applies to all code debug registers or none, i.e., you cannot have
one range using fine mode and another using the bitmask.

The Itanium 2 processor limits the use of multiple pairs to accurately
cover a code range. This can only be done for IA64_INST_RETIRED, and
even then you need several events to collect the counts. For all other
events, only one pair can be used, which leads to more inaccuracy due
to approximation. Data ranges can use multiple debug register pairs to
gain more accuracy. The library will never cover less than what is
requested. The algorithm will use more than one pair of debug registers
whenever possible to get a more precise range. Hence, up to 4 pairs can
be used to describe a single range.

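The fine mode conditions stated above (at most 4KB, no 4KB page
boundary crossed) can be written as a short check. This is a minimal
sketch under stated assumptions, not the library's own test: it assumes
the end address is exclusive, it also applies the 16-byte bundle
alignment required of code range addresses, and the name
ex_range_fits_fine_mode() is hypothetical.

```c
#include <stdbool.h>

#define EX_PAGE_SHIFT  12      /* 4KB pages */
#define EX_BUNDLE_MASK 0xfUL   /* 16-byte bundle alignment */

/*
 * Hypothetical check, not the library's code.
 * Assumption: the range is [start, end), i.e. end is exclusive.
 */
bool ex_range_fits_fine_mode(unsigned long start, unsigned long end)
{
    if (end <= start)
        return false;                       /* empty or inverted range */
    if ((start & EX_BUNDLE_MASK) || (end & EX_BUNDLE_MASK))
        return false;                       /* not bundle aligned */
    /* First and last byte in the same 4KB page implies size <= 4KB. */
    return (start >> EX_PAGE_SHIFT) == ((end - 1) >> EX_PAGE_SHIFT);
}
```

Under these assumptions the range 0x4000 to 0x5000 qualifies, while
0x4ff0 to 0x5010 does not because it crosses a page boundary.
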
If range restriction is to be used, the rr_used field must be set to
one, otherwise the settings will be ignored. The ranges are described
by the pfmlib_ita2_input_rr_t structure. Up to 4 ranges can be defined.
Each range is described by an entry in rr_limits. Some flags applying
to all ranges can be defined in rr_flags. Currently defined flags are:

PFMLIB_ITA2_RR_INV
       Invert the code ranges. The qualifying events will be measured
       when executing outside the specified ranges.

PFMLIB_ITA2_RR_NO_FINE_MODE
       Force non-fine mode for all code ranges (mostly for debugging).

The fields of the pfmlib_ita2_input_rr_desc_t structure are as follows:

rr_plm The privilege level at which the range is active. It can be any
       combination of PFM_PLM0, PFM_PLM1, PFM_PLM2, PFM_PLM3. If rr_plm
       is 0, then the default privilege level mask in pfp_dfl_plm is
       used. The privilege level is only relevant for code ranges; data
       ranges ignore the setting.

rr_start
       This is the start address of the range. Any address is supported
       but for code ranges it must be bundle aligned, i.e., 16-byte
       aligned.

rr_end This is the end address of the range. Any address is supported
       but for code ranges it must be bundle aligned, i.e., 16-byte
       aligned.

The library will provide the values for the debug registers as well as
some information about the actual ranges in the output parameters, and
more precisely in the pfmlib_ita2_output_rr_t structure for each range.
The fields of the structure are as follows:

rr_nbr_used
       Contains the number of debug registers used to cover the range.
       This is necessarily an even number as debug registers always
       come in pairs. The value of this field is between 0 and 7.

rr_br  This table contains the list of debug registers necessary to
       cover the ranges. Each element is of type pfmlib_reg_t. The
       reg_num field contains the debug register index while reg_value
       contains the debug register value. Both the index and value must
       be copied into the kernel specific argument to program the debug
       registers. The library never programs them.

rr_infos
       Contains information about the ranges defined. Because of
       alignment restrictions, the actual range covered by the debug
       registers may be larger than the requested range. This table
       describes the differences between the requested and actual
       ranges expressed as offsets:

rr_soff
       Contains the start offset of the actual range described by the
       debug registers. If zero, it means the library was able to match
       exactly the beginning of the range. Otherwise it represents the
       number of bytes by which the actual range precedes the requested
       range.

rr_eoff
       Contains the end offset of the actual range described by the
       debug registers. If zero, it means the library was able to match
       exactly the end of the range. Otherwise it represents the number
       of bytes by which the actual range exceeds the requested range.


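Given the definitions of rr_soff and rr_eoff above, the bounds of the
range actually covered by the debug registers can be recovered from the
requested bounds. The two helpers below are hypothetical, are not part
of libpfm, and only restate that arithmetic.

```c
/*
 * Hypothetical helpers, not part of libpfm. They recover the bounds of
 * the range actually covered by the debug registers from the requested
 * bounds (rr_start, rr_end) and the reported offsets (rr_soff, rr_eoff).
 */
unsigned long ex_actual_start(unsigned long rr_start, unsigned long rr_soff)
{
    return rr_start - rr_soff;   /* actual range begins rr_soff bytes earlier */
}

unsigned long ex_actual_end(unsigned long rr_end, unsigned long rr_eoff)
{
    return rr_end + rr_eoff;     /* actual range ends rr_eoff bytes later */
}
```

For a request of 0x1010 to 0x1ff0 with rr_soff and rr_eoff both equal
to 0x10, the covered range is 0x1000 to 0x2000.
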
Refer to the description of pfm_dispatch_events() for errors when using
the Itanium 2 specific input and output arguments.

pfm_dispatch_events(3) and the set of examples shipped with the library

Stephane Eranian <eranian@hpl.hp.com>

November, 2003                                               LIBPFM(3)