LIBPFM(3) Linux Programmer's Manual LIBPFM(3)

NAME
libpfm_itanium - support for Itanium specific PMU features

SYNOPSIS
#include <perfmon/pfmlib.h>
#include <perfmon/pfmlib_itanium.h>

int pfm_ita_is_ear(unsigned int i);
int pfm_ita_is_dear(unsigned int i);
int pfm_ita_is_dear_tlb(unsigned int i);
int pfm_ita_is_dear_cache(unsigned int i);
int pfm_ita_is_iear(unsigned int i);
int pfm_ita_is_iear_tlb(unsigned int i);
int pfm_ita_is_iear_cache(unsigned int i);
int pfm_ita_is_btb(unsigned int i);
int pfm_ita_support_opcm(unsigned int i);
int pfm_ita_support_iarr(unsigned int i);
int pfm_ita_support_darr(unsigned int i);
int pfm_ita_get_event_maxincr(unsigned int i, unsigned int *maxincr);
int pfm_ita_get_event_umask(unsigned int i, unsigned long *umask);

DESCRIPTION
The libpfm library provides full support for all the Itanium specific
features of the PMU. The interface is defined in pfmlib_itanium.h. It
consists of a set of functions and structures that describe and allow
access to the Itanium specific PMU features.

The Itanium specific functions presented here are mostly used to
retrieve the characteristics of an event. Each one takes an opaque
event descriptor i, obtained by pfm_find_event() or its derivatives.
The pfm_ita_is_*() and pfm_ita_support_*() functions return a boolean
value indicating whether the event supports a given feature or is of a
particular kind. The pfm_ita_get_*() functions return a property of the
event through their second argument.

The pfm_ita_is_ear() function returns 1 if the event designated by i
corresponds to an EAR event, i.e., an Event Address Register type of
event. Otherwise 0 is returned. It can be a data or an instruction EAR
event. For instance, DATA_EAR_CACHE_LAT4 is an EAR event, but CPU_CYCLES
is not.

The pfm_ita_is_dear() function returns 1 if the event designated by i
corresponds to a Data EAR event. Otherwise 0 is returned. It can be a
cache or a TLB EAR event.

The pfm_ita_is_dear_tlb() function returns 1 if the event designated by
i corresponds to a Data EAR TLB event. Otherwise 0 is returned.

The pfm_ita_is_dear_cache() function returns 1 if the event designated
by i corresponds to a Data EAR cache event. Otherwise 0 is returned.

The pfm_ita_is_iear() function returns 1 if the event designated by i
corresponds to an instruction EAR event. Otherwise 0 is returned. It
can be a cache or a TLB instruction EAR event.

The pfm_ita_is_iear_tlb() function returns 1 if the event designated by
i corresponds to an instruction EAR TLB event. Otherwise 0 is returned.

The pfm_ita_is_iear_cache() function returns 1 if the event designated
by i corresponds to an instruction EAR cache event. Otherwise 0 is
returned.
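
The EAR predicates nest. An event for which pfm_ita_is_dear() or
pfm_ita_is_iear() returns 1 is also an EAR event, and each D-EAR or
I-EAR event is either a cache or a TLB event. The following sketch models
that relationship in plain C. The attribute bits EVT_DEAR, EVT_IEAR and
EVT_TLB are hypothetical and exist only for this illustration. They do
not reflect how libpfm encodes its event tables.

```c
#include <stdbool.h>

/* Hypothetical per-event attribute bits, for illustration only. */
enum {
    EVT_DEAR = 1u << 0, /* data EAR event */
    EVT_IEAR = 1u << 1, /* instruction EAR event */
    EVT_TLB  = 1u << 2  /* EAR event in TLB mode (cache mode otherwise) */
};

static bool is_dear(unsigned attr)       { return (attr & EVT_DEAR) != 0; }
static bool is_iear(unsigned attr)       { return (attr & EVT_IEAR) != 0; }

/* An EAR event is either a data or an instruction EAR event. */
static bool is_ear(unsigned attr)        { return is_dear(attr) || is_iear(attr); }

/* Each kind of EAR event is either a TLB or a cache event. */
static bool is_dear_tlb(unsigned attr)   { return is_dear(attr) && (attr & EVT_TLB); }
static bool is_dear_cache(unsigned attr) { return is_dear(attr) && !(attr & EVT_TLB); }
static bool is_iear_tlb(unsigned attr)   { return is_iear(attr) && (attr & EVT_TLB); }
static bool is_iear_cache(unsigned attr) { return is_iear(attr) && !(attr & EVT_TLB); }
```

In this model, an event such as DATA_EAR_CACHE_LAT4 would carry only
EVT_DEAR, and CPU_CYCLES would carry no bits at all.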

The pfm_ita_support_opcm() function returns 1 if the event designated
by i supports opcode matching, i.e., if the event can be measured
accurately when opcode matching via PMC8/PMC9 is active. Otherwise 0 is
returned. Not all events support this feature.

The pfm_ita_support_iarr() function returns 1 if the event designated
by i supports code address range restrictions, i.e., if the event can
be measured accurately when code range restriction is active. Otherwise
0 is returned. Not all events support this feature.

The pfm_ita_support_darr() function returns 1 if the event designated
by i supports data address range restrictions, i.e., if the event can
be measured accurately when data range restriction is active. Otherwise
0 is returned. Not all events support this feature.

The pfm_ita_get_event_maxincr() function returns in maxincr the maximum
number of occurrences per cycle for the event designated by i. Certain
Itanium events can occur more than once per cycle. When an event occurs
more than once per cycle, the PMD counter is incremented accordingly.
A threshold can restrict the measurement to cycles in which the event
occurs more than a given number of times. For instance, NOPS_RETIRED can
happen up to 6 times per cycle. The threshold can therefore be set
between 0 and 5. A threshold of 5 means that the PMD counter is
incremented by 1 only in cycles where the nop instruction is executed
more than 5 times. The value returned in maxincr is therefore the
non-inclusive upper bound for the threshold to program in the PMC
register.
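
The relationship between maxincr and the threshold can be sketched as
follows. This is a software model that follows the wording of this page
and does not touch the hardware. Treating a threshold of 0 as
"unrestricted counting" is an interpretation of the paragraph above. The
helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* A threshold is valid only if it is strictly below maxincr, the value
 * returned by pfm_ita_get_event_maxincr(). */
static bool thres_is_valid(unsigned int thres, unsigned int maxincr)
{
    return thres < maxincr;
}

/* Simplified model of the PMD update. per_cycle[c] is the number of
 * occurrences of the event in cycle c. With thres == 0 every occurrence
 * is counted. With thres > 0 the counter gains 1 for each cycle in
 * which the event occurred more than thres times. */
static unsigned long count_with_threshold(const unsigned int *per_cycle,
                                          size_t ncycles, unsigned int thres)
{
    unsigned long pmd = 0;

    for (size_t c = 0; c < ncycles; c++) {
        if (thres == 0)
            pmd += per_cycle[c];   /* no restriction: add all occurrences */
        else if (per_cycle[c] > thres)
            pmd += 1;              /* one increment per qualifying cycle */
    }
    return pmd;
}
```

Consider NOPS_RETIRED, with maxincr equal to 6, over four cycles that
execute 6, 3, 6 and 0 nops. The model yields 15 with a threshold of 0
and 2 with a threshold of 5. A threshold of 6 would be rejected by
thres_is_valid().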

The pfm_ita_get_event_umask() function returns in umask the umask for
the event designated by i.

When Itanium specific features are needed to support a measurement,
their descriptions must be passed as model-specific input arguments to
the pfm_dispatch_events() call. The Itanium specific input arguments are
described in the pfmlib_ita_input_param_t structure and the output
parameters in pfmlib_ita_output_param_t. They are defined as follows:

typedef enum {
    PFMLIB_ITA_ISM_BOTH=0,
    PFMLIB_ITA_ISM_IA32=1,
    PFMLIB_ITA_ISM_IA64=2
} pfmlib_ita_ism_t;

typedef struct {
    unsigned int      flags;
    unsigned int      thres;
    pfmlib_ita_ism_t  ism;
} pfmlib_ita_counter_t;

typedef struct {
    unsigned char     opcm_used;
    unsigned long     pmc_val;
} pfmlib_ita_opcm_t;

typedef struct {
    unsigned char     btb_used;

    unsigned char     btb_tar;
    unsigned char     btb_tac;
    unsigned char     btb_bac;
    unsigned char     btb_tm;
    unsigned char     btb_ptm;
    unsigned char     btb_ppm;
    unsigned int      btb_plm;
} pfmlib_ita_btb_t;

typedef enum {
    PFMLIB_ITA_EAR_CACHE_MODE = 0,
    PFMLIB_ITA_EAR_TLB_MODE   = 1,
} pfmlib_ita_ear_mode_t;

typedef struct {
    unsigned char          ear_used;

    pfmlib_ita_ear_mode_t  ear_mode;
    pfmlib_ita_ism_t       ear_ism;
    unsigned int           ear_plm;
    unsigned long          ear_umask;
} pfmlib_ita_ear_t;

typedef struct {
    unsigned int   rr_plm;
    unsigned long  rr_start;
    unsigned long  rr_end;
} pfmlib_ita_input_rr_desc_t;

typedef struct {
    unsigned long  rr_soff;
    unsigned long  rr_eoff;
} pfmlib_ita_output_rr_desc_t;

typedef struct {
    unsigned int                rr_flags;
    pfmlib_ita_input_rr_desc_t  rr_limits[4];
    unsigned char               rr_used;
} pfmlib_ita_input_rr_t;

typedef struct {
    unsigned int                 rr_nbr_used;
    pfmlib_ita_output_rr_desc_t  rr_infos[4];
    pfmlib_reg_t                 rr_br[8];
} pfmlib_ita_output_rr_t;

typedef struct {
    pfmlib_ita_counter_t   pfp_ita_counters[PMU_ITA_NUM_COUNTERS];

    unsigned long          pfp_ita_flags;

    pfmlib_ita_opcm_t      pfp_ita_pmc8;
    pfmlib_ita_opcm_t      pfp_ita_pmc9;
    pfmlib_ita_ear_t       pfp_ita_iear;
    pfmlib_ita_ear_t       pfp_ita_dear;
    pfmlib_ita_btb_t       pfp_ita_btb;
    pfmlib_ita_input_rr_t  pfp_ita_drange;
    pfmlib_ita_input_rr_t  pfp_ita_irange;
} pfmlib_ita_input_param_t;

typedef struct {
    pfmlib_ita_output_rr_t  pfp_ita_drange;
    pfmlib_ita_output_rr_t  pfp_ita_irange;
} pfmlib_ita_output_param_t;

The Itanium processor provides two additional per-event features for
counters: thresholding and instruction set selection. They can be set
for each event using the pfp_ita_counters data structure. The ism field
can be initialized as follows:

PFMLIB_ITA_ISM_BOTH
    The event will be monitored during both IA-64 and IA-32 execution.

PFMLIB_ITA_ISM_IA32
    The event will only be monitored during IA-32 execution.

PFMLIB_ITA_ISM_IA64
    The event will only be monitored during IA-64 execution.

If ism has a value of zero, it defaults to PFMLIB_ITA_ISM_BOTH.
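
The effect of the ism setting can be modeled as a simple filter. The
enum below is copied from the definition above so that the sketch
compiles without the libpfm headers. The exec_mode_t type and the
ism_monitors() helper are hypothetical. The sketch also shows why a
zero-initialized counter descriptor monitors both instruction sets.

```c
#include <stdbool.h>

/* Copied from the pfmlib_ita_ism_t definition in this page. */
typedef enum {
    PFMLIB_ITA_ISM_BOTH = 0,
    PFMLIB_ITA_ISM_IA32 = 1,
    PFMLIB_ITA_ISM_IA64 = 2
} pfmlib_ita_ism_t;

/* Hypothetical tag for the instruction set currently executing. */
typedef enum { EXEC_IA32, EXEC_IA64 } exec_mode_t;

/* Returns true if an event with this ism setting is monitored while the
 * CPU executes in the given mode. */
static bool ism_monitors(pfmlib_ita_ism_t ism, exec_mode_t mode)
{
    switch (ism) {
    case PFMLIB_ITA_ISM_BOTH: return true;   /* also the zero value */
    case PFMLIB_ITA_ISM_IA32: return mode == EXEC_IA32;
    case PFMLIB_ITA_ISM_IA64: return mode == EXEC_IA64;
    }
    return false;  /* out-of-range value: treated as not monitored */
}
```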

The thres field indicates the threshold for the event. A threshold of n
means that the counter will be incremented by one only when the event
occurs more than n times per cycle.

The flags field contains event-specific flags. The currently defined
flags are:

PFMLIB_ITA_FL_EVT_NO_QUALCHECK
    When this flag is set, the library ignores the qualifier constraints
    for this event. Qualifiers include opcode matching and code and data
    range restrictions. When an event is marked as not supporting a
    particular qualifier, it usually means that the qualifier is ignored
    for that event, i.e., the extra level of filtering has no effect on
    it. For instance, the CPU_CYCLES event does not support code range
    restrictions. By default, the library refuses to program it if range
    restriction is also requested. Setting the flag overrides this
    check, and the call to pfm_dispatch_events() succeeds. In this case,
    CPU_CYCLES is measured for the entire program and not just for the
    requested code range. For certain measurements this is perfectly
    acceptable, because the range restriction is still applied to the
    events that support it. Make sure you understand which events do not
    support certain qualifiers before using this flag.
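
The check that this flag disables can be sketched as a simplified model
of the documented behavior. It is not libpfm's implementation.
SKETCH_FL_NO_QUALCHECK is a hypothetical stand-in with an arbitrary
value. Real code would use PFMLIB_ITA_FL_EVT_NO_QUALCHECK from
pfmlib_itanium.h. The evt_quals fields stand for the answers of
pfm_ita_support_opcm(), pfm_ita_support_iarr() and
pfm_ita_support_darr().

```c
#include <stdbool.h>

/* Hypothetical stand-in for PFMLIB_ITA_FL_EVT_NO_QUALCHECK. */
#define SKETCH_FL_NO_QUALCHECK 0x1u

/* Qualifiers the event supports. */
struct evt_quals { bool opcm, iarr, darr; };

/* Qualifiers the measurement requests. */
struct req_quals { bool opcm, irange, drange; };

/* Returns true if the event may be programmed: either the check is
 * disabled by the flag, or every requested qualifier is supported. */
static bool qualifiers_ok(struct evt_quals ev, struct req_quals rq,
                          unsigned int flags)
{
    if (flags & SKETCH_FL_NO_QUALCHECK)
        return true;               /* caller accepts unfiltered counts */
    if (rq.opcm && !ev.opcm)
        return false;
    if (rq.irange && !ev.iarr)
        return false;
    if (rq.drange && !ev.darr)
        return false;
    return true;
}
```

For a CPU_CYCLES-like event that lacks code range support, a request
with a code range is refused unless the flag is passed.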

The pfp_ita_pmc8 and pfp_ita_pmc9 fields of type pfmlib_ita_opcm_t
contain the description of what to do with the opcode matchers. Itanium
supports opcode matching via PMC8 and PMC9. When this feature is used,
the opcm_used field must be set to 1. Otherwise the structure is ignored
by the library. The pmc_val field contains the raw value to store in
PMC8 or PMC9. The library does not modify these values. They are stored
as-is in the pfp_pmcs table of the generic output parameters.

The pfp_ita_iear field of type pfmlib_ita_ear_t describes what to do
with the instruction Event Address Registers (I-EARs). Again, if this
feature is used, ear_used must be set to 1. Otherwise the structure is
ignored by the library. The ear_mode field must be set to either
PFMLIB_ITA_EAR_TLB_MODE or PFMLIB_ITA_EAR_CACHE_MODE to indicate the
type of EAR to program. The umask to store into PMC10 must be in
ear_umask. The privilege level mask at which the I-EAR will be monitored
must be set in ear_plm, which can be any combination of PFM_PLM0,
PFM_PLM1, PFM_PLM2 and PFM_PLM3. If ear_plm is 0, the default privilege
level mask in pfp_dfl_plm is used. Finally, the instruction set to
monitor is set in ear_ism and can be any one of PFMLIB_ITA_ISM_BOTH,
PFMLIB_ITA_ISM_IA32 or PFMLIB_ITA_ISM_IA64.
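
Filling an I-EAR request can be sketched as follows. The enums and
pfmlib_ita_ear_t are copied from the definitions in this page so that
the example compiles without libpfm. The PFM_PLM* values (0x1, 0x2, 0x4,
0x8) are assumed to match the perfmon headers. The helpers
user_iear_tlb() and effective_plm() are hypothetical. The umask argument
would presumably come from pfm_ita_get_event_umask().

```c
/* Assumed privilege-level bit values, redefined for self-containment. */
#define PFM_PLM0 0x1u
#define PFM_PLM1 0x2u
#define PFM_PLM2 0x4u
#define PFM_PLM3 0x8u

/* Copied from the definitions in this page. */
typedef enum { PFMLIB_ITA_ISM_BOTH = 0, PFMLIB_ITA_ISM_IA32 = 1,
               PFMLIB_ITA_ISM_IA64 = 2 } pfmlib_ita_ism_t;
typedef enum { PFMLIB_ITA_EAR_CACHE_MODE = 0,
               PFMLIB_ITA_EAR_TLB_MODE = 1 } pfmlib_ita_ear_mode_t;
typedef struct {
    unsigned char          ear_used;
    pfmlib_ita_ear_mode_t  ear_mode;
    pfmlib_ita_ism_t       ear_ism;
    unsigned int           ear_plm;
    unsigned long          ear_umask;
} pfmlib_ita_ear_t;

/* Hypothetical helper: a user-level (PLM3), IA-64-only I-EAR in TLB mode. */
static pfmlib_ita_ear_t user_iear_tlb(unsigned long umask)
{
    pfmlib_ita_ear_t ear = {
        .ear_used  = 1,                        /* otherwise ignored */
        .ear_mode  = PFMLIB_ITA_EAR_TLB_MODE,
        .ear_ism   = PFMLIB_ITA_ISM_IA64,
        .ear_plm   = PFM_PLM3,
        .ear_umask = umask                     /* value destined for PMC10 */
    };
    return ear;
}

/* The "0 selects pfp_dfl_plm" rule shared by ear_plm, btb_plm and rr_plm. */
static unsigned int effective_plm(unsigned int plm, unsigned int dfl_plm)
{
    return plm ? plm : dfl_plm;
}
```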

The pfp_ita_dear field of type pfmlib_ita_ear_t describes what to do
with the data Event Address Registers (D-EARs). The description is
identical to that of the I-EARs, except that it applies to PMC11.

In general, there are four different methods to program the EAR (data
or instruction):

Method 1
    There is an EAR event in the list of events to monitor and ear_used
    is cleared. In this case the EAR (PMC10 or PMC11) is programmed based
    on the information encoded in the event. A counting monitor
    (PMC4/PMD4-PMC7/PMD7) is programmed to count DATA_EAR_EVENT or
    INSTRUCTION_EAR_EVENTS, depending on the type of EAR.

Method 2
    There is an EAR event in the list of events to monitor and ear_used
    is set. In this case the EAR (PMC10 or PMC11) is programmed using the
    information in the pfp_ita_iear or pfp_ita_dear structure, because it
    contains more detailed information, such as the privilege level and
    instruction set. A counting monitor (PMC4/PMD4-PMC7/PMD7) is
    programmed to count DATA_EAR_EVENT or INSTRUCTION_EAR_EVENTS,
    depending on the type of EAR.

Method 3
    There is no EAR event in the list of events to monitor and ear_used
    is cleared. In this case no EAR is programmed.

Method 4
    There is no EAR event in the list of events to monitor and ear_used
    is set. In this case the EAR (PMC10 or PMC11) is programmed using the
    information in the pfp_ita_iear or pfp_ita_dear structure. This is
    the free running mode for the EAR.
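
The four methods depend on only two inputs: whether an EAR event is
listed and whether ear_used is set. The sketch below is a hypothetical
decision helper that restates the list above. It is not part of the
libpfm API.

```c
#include <stdbool.h>

/* Outcome of the four EAR programming methods (hypothetical names). */
typedef enum {
    EAR_NOT_PROGRAMMED, /* method 3 */
    EAR_FROM_EVENT,     /* method 1: settings encoded in the event */
    EAR_FROM_STRUCT,    /* method 2: settings from pfp_ita_iear/dear */
    EAR_FREE_RUNNING    /* method 4: settings from the structure, no counter */
} ear_setup_t;

static ear_setup_t ear_method(bool ear_event_listed, bool ear_used)
{
    if (ear_event_listed)
        return ear_used ? EAR_FROM_STRUCT : EAR_FROM_EVENT;
    return ear_used ? EAR_FREE_RUNNING : EAR_NOT_PROGRAMMED;
}

/* Only methods 1 and 2 also program a counting monitor (PMC4-PMC7). */
static bool ear_needs_counter(ear_setup_t s)
{
    return s == EAR_FROM_EVENT || s == EAR_FROM_STRUCT;
}
```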

The pfp_ita_btb field of type pfmlib_ita_btb_t is used to configure the
Branch Trace Buffer (BTB). If btb_used is set, the library takes the
configuration into account. Otherwise any BTB configuration is ignored.
The fields in this structure filter the kinds of branches that get
recorded in the BTB. Each one represents an element of the branch
architecture of the Itanium processor. Refer to the Itanium specific
documentation for more details on the branch architecture. The fields
are as follows:

btb_tar
    If this field is 1, then branches predicted by the Target Address
    Register (TAR) are captured. If 0, no branch predicted by the TAR is
    included.

btb_tac
    If this field is 1, then branches predicted by the Target Address
    Cache (TAC) are captured. If 0, no branch predicted by the TAC is
    included.

btb_bac
    If this field is 1, then branches predicted by the Branch Address
    Corrector (BAC) are captured. If 0, no branch predicted by the BAC
    is included.

btb_tm
    If this field is 0, then no branch is captured. If this field is 1,
    then not-taken branches are captured. If this field is 2, then taken
    branches are captured. Finally, if this field is 3, then all
    branches are captured.

btb_ptm
    If this field is 0, then no branch is captured. If this field is 1,
    then branches with a mispredicted target address are captured. If
    this field is 2, then branches with a correctly predicted target
    address are captured. Finally, if this field is 3, then all branches
    are captured regardless of target address prediction.

btb_ppm
    If this field is 0, then no branch is captured. If this field is 1,
    then branches with a mispredicted path (taken/not taken) are
    captured. If this field is 2, then branches with a correctly
    predicted path are captured. Finally, if this field is 3, then all
    branches are captured regardless of their path prediction.

btb_plm
    This is the privilege level mask at which the BTB captures branches.
    It can be any combination of PFM_PLM0, PFM_PLM1, PFM_PLM2 and
    PFM_PLM3. If btb_plm is 0, the default privilege level mask in
    pfp_dfl_plm is used.
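
The encodings of btb_tm, btb_ptm and btb_ppm behave like 2-bit masks.
Bit 0 selects the "negative" case (not taken, mispredicted) and bit 1
selects the "positive" case (taken, correctly predicted). A value of 3
therefore accepts both cases and 0 accepts neither. The sketch below
models this filtering. Requiring all three filters to accept a branch is
an assumption about how the fields combine. TAR/TAC/BAC selection and
privilege-level filtering are left out. The type and helper names are
hypothetical.

```c
#include <stdbool.h>

/* Subset of pfmlib_ita_btb_t relevant to the outcome filters. */
typedef struct {
    unsigned char btb_tm;   /* 1: not taken, 2: taken, 3: all, 0: none */
    unsigned char btb_ptm;  /* 1: target mispredicted, 2: correct, 3: all */
    unsigned char btb_ppm;  /* 1: path mispredicted, 2: correct, 3: all */
} btb_filter_t;

/* Bit 1 accepts the positive case, bit 0 the negative case. */
static bool field_accepts(unsigned char field, bool positive)
{
    return (field & (positive ? 2u : 1u)) != 0;
}

/* Assumed model: a branch is captured only if every filter accepts it. */
static bool btb_captures(btb_filter_t f, bool taken,
                         bool target_ok, bool path_ok)
{
    return field_accepts(f.btb_tm, taken) &&
           field_accepts(f.btb_ptm, target_ok) &&
           field_accepts(f.btb_ppm, path_ok);
}
```

For example, {2, 3, 1} would keep only taken branches whose path was
mispredicted, whatever the accuracy of their target prediction.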

There are four methods to program the BTB, as follows:

Method 1
    BRANCH_EVENT is in the list of events to monitor and btb_used is
    cleared. In this case, the BTB (PMC12) is configured to record all
    branches. A counting monitor (PMC4/PMD4-PMC7/PMD7) is programmed to
    count BRANCH_EVENT.

Method 2
    BRANCH_EVENT is in the list of events to monitor and btb_used is
    set. In this case, the BTB (PMC12) is configured using the
    information in the pfp_ita_btb structure. A counting monitor
    (PMC4/PMD4-PMC7/PMD7) is programmed to count BRANCH_EVENT.

Method 3
    BRANCH_EVENT is not in the list of events to monitor and btb_used is
    set. In this case, the BTB (PMC12) is configured using the
    information in the pfp_ita_btb structure. This is the free running
    mode for the BTB.

Method 4
    BRANCH_EVENT is not in the list of events to monitor and btb_used is
    cleared. In this case, the BTB is not programmed.
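
The BTB methods can be restated as a plan computed from the same two
inputs. The btb_plan_t type and the btb_method() helper are
hypothetical, and the sketch only summarizes the list above.

```c
#include <stdbool.h>

/* Hypothetical summary of what the library sets up for the BTB. */
typedef struct {
    bool program_pmc12;      /* is the BTB configured at all? */
    bool record_all;         /* method 1: PMC12 records all branches */
    bool count_branch_event; /* a PMC4-PMC7 monitor counts BRANCH_EVENT */
} btb_plan_t;

static btb_plan_t btb_method(bool branch_event_listed, bool btb_used)
{
    btb_plan_t p = {
        .program_pmc12      = branch_event_listed || btb_used,
        .record_all         = branch_event_listed && !btb_used,
        .count_branch_event = branch_event_listed
    };
    return p;
}
```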

The pfp_ita_drange and pfp_ita_irange fields control the range
restrictions for data and code, respectively. The application passes a
set of ranges, each designated by a start and an end address. Upon
return from pfm_dispatch_events(), the application gets back the set of
registers, and their values, that need to be programmed via a kernel
interface.

Range restriction is implemented using the debug registers. The number
of debug registers is limited, and they are used in pairs. With 8 data
debug registers, a maximum of 4 distinct ranges can be specified. The
same applies to code range restrictions. Moreover, there are some
severe constraints on the alignment and size of a range. Because the
range size is specified using a bitmask, the actual range can be larger
than the requested range. The library makes a best effort to cover only
what is requested, and it never covers less than what is requested. If
necessary, the algorithm uses more than one pair of debug registers to
get a more precise range. Hence, up to 4 pairs can be used to describe
a single range. The library returns the start and end offsets of the
actual range relative to the requested range.

If range restriction is to be used, the rr_used field must be set to 1.
Otherwise the settings are ignored. The ranges are described by the
pfmlib_ita_input_rr_t structure. Up to 4 ranges can be defined. Each
range is described by an entry in rr_limits.

The pfmlib_ita_input_rr_desc_t structure is defined as follows:

rr_plm
    The privilege level at which the range is active. It can be any
    combination of PFM_PLM0, PFM_PLM1, PFM_PLM2 and PFM_PLM3. If rr_plm
    is 0, the default privilege level mask in pfp_dfl_plm is used. The
    privilege level is only relevant for code ranges. Data ranges ignore
    the setting.

rr_start
    This is the start address of the range. Any address is supported,
    but for a code range it must be bundle aligned, i.e., 16-byte
    aligned.

rr_end
    This is the end address of the range. Any address is supported, but
    for a code range it must be bundle aligned, i.e., 16-byte aligned.
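
A caller that starts from arbitrary code addresses could widen them to
bundle boundaries before filling rr_start and rr_end. Widening keeps the
"never cover less than requested" property. The helper names below are
hypothetical. The sketch assumes unsigned long addresses and ignores
overflow at the very top of the address space.

```c
#include <stdbool.h>

/* IA-64 instruction bundles are 16 bytes. */
#define BUNDLE_SIZE 16UL

static bool is_bundle_aligned(unsigned long addr)
{
    return (addr & (BUNDLE_SIZE - 1)) == 0;
}

/* Round a start address down to the enclosing bundle. */
static unsigned long bundle_align_down(unsigned long addr)
{
    return addr & ~(BUNDLE_SIZE - 1);
}

/* Round an end address up to the next bundle boundary (no-op if aligned). */
static unsigned long bundle_align_up(unsigned long addr)
{
    return (addr + BUNDLE_SIZE - 1) & ~(BUNDLE_SIZE - 1);
}
```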

The library provides the values for the debug registers, along with
some information about the actual ranges, in the output parameters.
More precisely, they are returned in the pfp_ita_drange and
pfp_ita_irange fields, of type pfmlib_ita_output_rr_t, of
pfmlib_ita_output_param_t. The structure is defined as follows:

rr_nbr_used
    Contains the number of debug registers used to cover the ranges.
    This is necessarily an even number, because debug registers always
    go in pairs. The value of this field is between 0 and 8.

rr_br
    This table contains the list of debug registers necessary to cover
    the ranges. Each element is of type pfmlib_reg_t. The reg_num field
    contains the debug register index, while reg_value contains the
    debug register value. Both the index and the value must be copied
    into the kernel-specific argument to program the debug registers.
    The library never programs them.

rr_infos
    Contains information about the ranges defined. Because of alignment
    restrictions, the actual range covered by the debug registers may be
    larger than the requested range. This table describes the
    differences between the requested and actual ranges, expressed as
    offsets:

    rr_soff
        Contains the start offset of the actual range described by the
        debug registers. If zero, the library was able to match the
        beginning of the range exactly. Otherwise it is the number of
        bytes by which the actual range precedes the requested range.

    rr_eoff
        Contains the end offset of the actual range described by the
        debug registers. If zero, the library was able to match the end
        of the range exactly. Otherwise it is the number of bytes by
        which the actual range exceeds the requested range.
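
The simplest case shows why rr_soff and rr_eoff can be non-zero: a
requested range covered with a single debug register pair. The sketch
assumes that one pair matches a naturally aligned power-of-two block,
given as an address plus a mask of low bits to ignore. It also treats
the end address as exclusive. The function finds the smallest such
block that contains the whole range and reports the resulting offsets.
This is a single-pair simplification with hypothetical names. It is not
libpfm's algorithm, which may use up to four pairs to tighten the fit.

```c
/* Result of covering [start, end) with a single aligned block. */
struct cover {
    unsigned long base;  /* start of the aligned block */
    unsigned long size;  /* power-of-two block size in bytes */
    unsigned long soff;  /* bytes the block starts before start (cf. rr_soff) */
    unsigned long eoff;  /* bytes the block extends past end (cf. rr_eoff) */
};

/* Assumes start < end and that the block does not wrap past the top of
 * the address space. */
static struct cover cover_one_block(unsigned long start, unsigned long end)
{
    unsigned long size = 1;

    /* Grow the block until start and the last byte (end - 1) fall in the
     * same naturally aligned block of that size. */
    while ((start & ~(size - 1)) != ((end - 1) & ~(size - 1)))
        size <<= 1;

    struct cover c;
    c.base = start & ~(size - 1);
    c.size = size;
    c.soff = start - c.base;
    c.eoff = (c.base + size) - end;
    return c;
}
```

The range [0x1000, 0x2000) is already an aligned 4 KB block, so both
offsets are 0. The 32-byte range [0x1010, 0x1030) straddles a 32-byte
boundary. It needs the 64-byte block at 0x1000, so both offsets are
0x10.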

ERRORS
Refer to the description of pfm_dispatch_events() for errors when using
the Itanium specific input and output arguments.

SEE ALSO
pfm_dispatch_events(3) and the set of examples shipped with the library

AUTHOR
Stephane Eranian <eranian@hpl.hp.com>

November, 2003 LIBPFM(3)