LIBPFM(3)                  Linux Programmer's Manual                 LIBPFM(3)

NAME

       libpfm_nehalem - support for Intel Nehalem processor family

SYNOPSIS

       #include <perfmon/pfmlib.h>
       #include <perfmon/pfmlib_intel_nhm.h>

DESCRIPTION

       The libpfm library provides full support for the Intel Nehalem
       processor family, such as Intel Core i7. The interface is defined in
       pfmlib_intel_nhm.h. It consists of a set of functions and structures
       describing the Intel Nehalem processor-specific PMU features. The
       Intel Nehalem processor is a quad-core, dual-thread processor. It
       includes two types of PMU: core and uncore. The latter measures events
       at the socket level and is therefore disconnected from any of the four
       cores. The core PMU implements Intel architectural perfmon version 3
       with four generic counters and three fixed counters. The uncore has
       eight generic counters and one fixed counter. Each Intel Nehalem core
       also implements a 16-deep branch trace buffer, called Last Branch
       Record (LBR), which can be used in combination with the core PMU.
       Intel Nehalem implements a newer version of the Precise Event-Based
       Sampling (PEBS) mechanism which has the ability to capture where cache
       misses occur.

       When Intel Nehalem processor-specific features are needed to support a
       measurement, their descriptions must be passed as model-specific input
       arguments to the pfm_dispatch_events() function. The Intel Nehalem
       processor-specific input arguments are described in the
       pfmlib_nhm_input_param_t structure. No output parameters are currently
       defined. The input parameters are defined as follows:

       typedef struct {
            unsigned long  cnt_mask;
            unsigned int   flags;
       } pfmlib_nhm_counter_t;

       typedef struct {
            unsigned int lbr_used;
            unsigned int lbr_plm;
            unsigned int lbr_filter;
       } pfmlib_nhm_lbr_t;

       typedef struct {
            unsigned int pebs_used;
            unsigned int ld_lat_thres;
       } pfmlib_nhm_pebs_t;

       typedef struct {
            pfmlib_nhm_counter_t pfp_nhm_counters[PMU_NHM_NUM_COUNTERS];
            pfmlib_nhm_pebs_t    pfp_nhm_pebs;
            pfmlib_nhm_lbr_t     pfm_nhm_lbr;
            uint64_t             reserved[4];
       } pfmlib_nhm_input_param_t;

       The Intel Nehalem processor provides a few additional per-event
       features for counters: thresholding, inversion, edge detection,
       monitoring of both threads, and occupancy. They can be set using the
       pfp_nhm_counters data structure for each event. The flags field can be
       initialized with the following values, depending on the event:

       PFMLIB_NHM_SEL_INV
              Inverts the result of the cnt_mask comparison when set. This
              flag is supported for core and uncore PMU events.

       PFMLIB_NHM_SEL_EDGE
              Enables edge detection of events. This flag is supported for
              core and uncore PMU events.

       PFMLIB_NHM_SEL_ANYTHR
              Enables measuring the event in either of the two processor
              threads, assuming hyper-threading is enabled. By default, only
              the current thread is measured. This flag is restricted to core
              PMU events.

       PFMLIB_NHM_SEL_OCC_RST
              When set, the queue occupancy counter associated with the event
              is cleared. This flag is only available to uncore PMU events.

       The cnt_mask field is used to set the event threshold. The counter is
       incremented for each cycle in which the number of occurrences of the
       event is greater than or equal to the value of the field. Thus, the
       event is modified to actually measure the number of qualifying cycles.
       When zero, all occurrences are counted (this is the default). This
       field is supported for core and uncore PMU events.
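       The per-event settings described above can be sketched as follows.
       This is a minimal illustration, not library code: the structures are
       local copies of the listing on this page, the numeric values of
       PMU_NHM_NUM_COUNTERS and of the PFMLIB_NHM_SEL_* flags are placeholders
       (the real definitions come from pfmlib_intel_nhm.h), and the helper
       nhm_set_counter() with its is_uncore argument is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for illustration only; a real program takes these
 * definitions from <perfmon/pfmlib_intel_nhm.h>. */
#define PMU_NHM_NUM_COUNTERS   16
#define PFMLIB_NHM_SEL_INV     0x1
#define PFMLIB_NHM_SEL_EDGE    0x2
#define PFMLIB_NHM_SEL_ANYTHR  0x4
#define PFMLIB_NHM_SEL_OCC_RST 0x8

/* Local copies of the structures listed on this page. */
typedef struct {
    unsigned long cnt_mask;
    unsigned int  flags;
} pfmlib_nhm_counter_t;

typedef struct {
    unsigned int lbr_used;
    unsigned int lbr_plm;
    unsigned int lbr_filter;
} pfmlib_nhm_lbr_t;

typedef struct {
    unsigned int pebs_used;
    unsigned int ld_lat_thres;
} pfmlib_nhm_pebs_t;

typedef struct {
    pfmlib_nhm_counter_t pfp_nhm_counters[PMU_NHM_NUM_COUNTERS];
    pfmlib_nhm_pebs_t    pfp_nhm_pebs;
    pfmlib_nhm_lbr_t     pfm_nhm_lbr;
    uint64_t             reserved[4];
} pfmlib_nhm_input_param_t;

/* Hypothetical helper: record the threshold and flags for the event in
 * slot idx. Returns 0 on success, -1 if the slot is out of range or the
 * flags do not match the PMU type as described on this page. */
int nhm_set_counter(pfmlib_nhm_input_param_t *p, unsigned int idx,
                    unsigned long cnt_mask, unsigned int flags,
                    int is_uncore)
{
    assert(p != NULL);

    if (idx >= PMU_NHM_NUM_COUNTERS)
        return -1;
    if (is_uncore && (flags & PFMLIB_NHM_SEL_ANYTHR))
        return -1; /* ANYTHR is restricted to core PMU events */
    if (!is_uncore && (flags & PFMLIB_NHM_SEL_OCC_RST))
        return -1; /* OCC_RST is only available to uncore PMU events */

    p->pfp_nhm_counters[idx].cnt_mask = cnt_mask;
    p->pfp_nhm_counters[idx].flags    = flags;
    return 0;
}
```

       For instance, after zeroing the structure, a call such as
       nhm_set_counter(&p, 0, 2, PFMLIB_NHM_SEL_INV, 0) would ask the first
       core event to count cycles with fewer than two occurrences. The
       filled structure would then be passed to pfm_dispatch_events() as the
       model-specific input argument.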

Support for Precise Event-Based Sampling (PEBS)

       The library can be used to set up the PMC registers associated with
       PEBS. In this case, the pfp_nhm_pebs field (of type pfmlib_nhm_pebs_t)
       must be used and the pebs_used field must be set to 1.

       To enable the PEBS load latency filtering capability, it is necessary
       to program the MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD event into one
       generic counter. The latency threshold must be passed to the library in
       the ld_lat_thres field. It is expressed in core cycles and must be
       greater than 3. Note that pebs_used must be set as well.
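       The load latency constraints above can be sketched as a small
       validation step. The structure is a local copy of the listing on this
       page and the helper nhm_pebs_enable_load_latency() is hypothetical.
       Programming the MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD event into a
       generic counter is still required and is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the structure listed on this page. */
typedef struct {
    unsigned int pebs_used;
    unsigned int ld_lat_thres;
} pfmlib_nhm_pebs_t;

/* Hypothetical helper: enable PEBS with a load latency threshold given
 * in core cycles. Returns 0 on success, -1 if the threshold is not
 * greater than 3. */
int nhm_pebs_enable_load_latency(pfmlib_nhm_pebs_t *pebs,
                                 unsigned int thres_cycles)
{
    assert(pebs != NULL);

    if (thres_cycles <= 3)
        return -1;

    pebs->pebs_used    = 1;            /* must be set whenever PEBS is used */
    pebs->ld_lat_thres = thres_cycles; /* expressed in core cycles */
    return 0;
}
```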

Support for Last Branch Record (LBR)

       The library can be used to set up LBR registers. On Intel Nehalem
       processors, the LBR is 16 entries deep and it is possible to filter
       branches based on privilege level or type. To configure the LBR, the
       pfm_nhm_lbr field (of type pfmlib_nhm_lbr_t) must be used.

       Like core PMU counters, LBR only distinguishes two privilege levels, 0
       and the rest (1, 2, 3). When running Linux natively, the kernel is at
       privilege level 0 and applications are at level 3. It is possible to
       specify the privilege level of LBR using the lbr_plm field. Any attempt
       to pass PFM_PLM1 or PFM_PLM2 will be rejected. If lbr_plm is 0, then
       the global value in the pfp_dfl_plm field of pfmlib_input_param_t is
       used.

       By default, LBR captures all branches. It is possible to filter out
       branches by passing a set of flags in lbr_filter. The flags are as
       follows:

       PFMLIB_NHM_LBR_JCC
              When set, LBR does not capture conditional branches. Default:
              off.

       PFMLIB_NHM_LBR_NEAR_REL_CALL
              When set, LBR does not capture near calls. Default: off.

       PFMLIB_NHM_LBR_NEAR_IND_CALL
              When set, LBR does not capture indirect calls. Default: off.

       PFMLIB_NHM_LBR_NEAR_RET
              When set, LBR does not capture return branches. Default: off.

       PFMLIB_NHM_LBR_NEAR_IND_JMP
              When set, LBR does not capture indirect branches. Default: off.

       PFMLIB_NHM_LBR_NEAR_REL_JMP
              When set, LBR does not capture relative branches. Default: off.

       PFMLIB_NHM_LBR_FAR_BRANCH
              When set, LBR does not capture far branches. Default: off.
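       Because each filter flag removes a branch type, selecting a subset of
       branches means setting the flags for everything else. The sketch below
       builds a filter that leaves only calls and returns in the LBR. It
       assumes that all filter flags share the PFMLIB_NHM_LBR_ prefix; the
       numeric values of the flags and of the PFM_PLM* constants are
       placeholders (the real ones come from the libpfm headers), and both
       helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for illustration only; a real program takes these
 * definitions from the libpfm headers. */
#define PFM_PLM0 0x1
#define PFM_PLM1 0x2
#define PFM_PLM2 0x4
#define PFM_PLM3 0x8

#define PFMLIB_NHM_LBR_JCC           0x004
#define PFMLIB_NHM_LBR_NEAR_REL_CALL 0x008
#define PFMLIB_NHM_LBR_NEAR_IND_CALL 0x010
#define PFMLIB_NHM_LBR_NEAR_RET      0x020
#define PFMLIB_NHM_LBR_NEAR_IND_JMP  0x040
#define PFMLIB_NHM_LBR_NEAR_REL_JMP  0x080
#define PFMLIB_NHM_LBR_FAR_BRANCH    0x100

/* Local copy of the structure listed on this page. */
typedef struct {
    unsigned int lbr_used;
    unsigned int lbr_plm;
    unsigned int lbr_filter;
} pfmlib_nhm_lbr_t;

/* Hypothetical helper: filter that drops every branch type except
 * near calls (relative and indirect) and returns. */
unsigned int nhm_lbr_calls_and_returns_only(void)
{
    return PFMLIB_NHM_LBR_JCC
         | PFMLIB_NHM_LBR_NEAR_IND_JMP
         | PFMLIB_NHM_LBR_NEAR_REL_JMP
         | PFMLIB_NHM_LBR_FAR_BRANCH;
}

/* Hypothetical helper: fill in the LBR settings. Returns -1 if the
 * privilege mask contains PFM_PLM1 or PFM_PLM2, which are rejected.
 * A plm of 0 leaves the choice to the global default privilege mask. */
int nhm_lbr_setup(pfmlib_nhm_lbr_t *lbr, unsigned int plm,
                  unsigned int filter)
{
    assert(lbr != NULL);

    if (plm & (PFM_PLM1 | PFM_PLM2))
        return -1;

    lbr->lbr_used   = 1;
    lbr->lbr_plm    = plm;
    lbr->lbr_filter = filter;
    return 0;
}
```

       A user-level call and return trace would then be requested with
       nhm_lbr_setup(&lbr, PFM_PLM3, nhm_lbr_calls_and_returns_only()).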

Support for uncore PMU

       By nature, the uncore PMU does not distinguish privilege levels;
       therefore it captures events at all privilege levels. To avoid any
       misinterpretation, the library enforces that uncore events be measured
       with both PFM_PLM0 and PFM_PLM3 set.

       Tools and operating system kernel interfaces may impose further
       restrictions on how the uncore PMU can be accessed.
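       The privilege level rule for uncore events amounts to a simple mask
       test, sketched below. The PFM_PLM0 and PFM_PLM3 values are
       placeholders (the real ones come from pfmlib.h) and both helper
       functions are hypothetical; the library performs its own check.

```c
#include <assert.h>

/* Placeholder values for illustration only; a real program takes these
 * definitions from <perfmon/pfmlib.h>. */
#define PFM_PLM0 0x1
#define PFM_PLM3 0x8

/* Hypothetical helper: returns 1 if the privilege mask has both
 * PFM_PLM0 and PFM_PLM3 set, as required for uncore events, else 0. */
int nhm_uncore_plm_ok(unsigned int plm)
{
    const unsigned int both = PFM_PLM0 | PFM_PLM3;

    return (plm & both) == both;
}

/* Hypothetical helper: privilege mask to request for any uncore event. */
unsigned int nhm_uncore_plm(void)
{
    unsigned int plm = PFM_PLM0 | PFM_PLM3;

    assert(nhm_uncore_plm_ok(plm));
    return plm;
}
```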

SEE ALSO

       pfm_dispatch_events(3) and the set of examples shipped with the library

AUTHOR

       Stephane Eranian <eranian@gmail.com>

                                 January, 2009                       LIBPFM(3)