libpfm_intel_knl(3)

LIBPFM(3)                  Linux Programmer's Manual                 LIBPFM(3)

NAME

       libpfm_intel_knl - support for Intel Knights Landing core PMU

SYNOPSIS

       #include <perfmon/pfmlib.h>

       PMU name: knl
       PMU desc: Intel Knights Landing

DESCRIPTION

       The library supports the Intel Knights Landing core PMU. This PMU
       model covers only each core's PMU, not the socket-level PMU.

       On Knights Landing, each core has 4 generic counters and supports
       4-way HyperThreading. The pfm_get_pmu_info() function returns the
       maximum number of generic counters in num_cntrs.

MODIFIERS

       The following modifiers are supported on Intel Knights Landing
       processors:

       u      Measure at user level, which includes privilege levels 1, 2,
              and 3. This corresponds to PFM_PLM3. This is a boolean
              modifier.

       k      Measure at kernel level, which includes privilege level 0.
              This corresponds to PFM_PLM0. This is a boolean modifier.

       i      Invert the meaning of the event. The counter will now count
              cycles in which the event is not occurring. This is a boolean
              modifier.

       e      Enable edge detection, i.e., count only when there is a state
              transition from no occurrence of the event to at least one
              occurrence. This modifier must be combined with a counter mask
              modifier (c) with a value greater than or equal to one. This
              is a boolean modifier.

       c      Set the counter mask value. The mask acts as a threshold. The
              counter will count the number of cycles in which the number of
              occurrences of the event is greater than or equal to the
              threshold. This is an integer modifier with values in the
              range [0:255].

       t      Measure on any of the 4 hyper-threads at the same time,
              assuming hyper-threading is enabled. This is a boolean
              modifier. It is only available on fixed counters
              (unhalted_reference_cycles, instructions_retired,
              unhalted_core_cycles). Depending on the underlying kernel
              interface, the event may be programmed on a fixed counter or a
              generic counter, except for unhalted_reference_cycles, in
              which case this modifier may be ignored or rejected.
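       The constraints on the e and c modifiers can be checked before an
       event string is handed to the library. The following is a minimal
       sketch, not part of libpfm: struct knl_mods and knl_format_event()
       are hypothetical names, and the name=value spelling of the
       modifiers is assumed to be accepted by the event string parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type, not part of libpfm: modifier settings. */
struct knl_mods {
    int user;    /* u: count at user level */
    int kernel;  /* k: count at kernel level */
    int invert;  /* i: invert the meaning of the event */
    int edge;    /* e: edge detection, requires cmask >= 1 */
    int cmask;   /* c: counter mask, must be in [0:255] */
};

/*
 * Format "event:u=..:k=..:i=..:e=..:c=.." into buf.
 * Returns 0 on success, or -1 if the event name is empty, the modifier
 * combination violates the rules described above, or buf is too small.
 */
static int knl_format_event(char *buf, size_t len, const char *event,
                            const struct knl_mods *m)
{
    assert(buf != NULL && event != NULL && m != NULL);

    if (strlen(event) == 0)
        return -1;                  /* nothing to attach modifiers to */
    if (m->cmask < 0 || m->cmask > 255)
        return -1;                  /* c is limited to [0:255] */
    if (m->edge && m->cmask < 1)
        return -1;                  /* e needs a counter mask >= 1 */

    int n = snprintf(buf, len, "%s:u=%d:k=%d:i=%d:e=%d:c=%d", event,
                     !!m->user, !!m->kernel, !!m->invert, !!m->edge,
                     m->cmask);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

       With user=1, edge=1 and cmask=2, the helper produces a string such
       as UNHALTED_CORE_CYCLES:u=1:k=0:i=0:e=1:c=2. With edge=1 and
       cmask=0 it returns -1.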

OFFCORE_RESPONSE events

       Intel Knights Landing provides two offcore_response events. They are
       called OFFCORE_RESPONSE_0 and OFFCORE_RESPONSE_1.

       These events need special treatment in the performance monitoring
       infrastructure because each event uses an extra register to store
       some settings. Thus, when multiple offcore_response events are
       monitored simultaneously, the kernel needs to manage the sharing of
       that extra register.

       The offcore_response events are exposed as normal events by the
       library. The extra settings are exposed as regular umasks. The
       library takes care of encoding the events according to the
       underlying kernel interface.

       On Intel Knights Landing, the umasks are divided into 4 categories:
       request, supplier, snoop, and average latency. The offcore_response
       events have two modes of operation: normal and average latency. In
       the first mode, the two offcore_response events operate
       independently of each other. The user must provide at least one
       umask for each of the first 3 categories: request, supplier, snoop.
       In the second mode, the two offcore_response events are combined to
       compute an average latency per request type.

       For the normal mode, there is a special supplier (response) umask
       called ANY_RESPONSE. When this umask is used, it overrides any
       supplier and snoop umasks. In other words, users can specify either
       ANY_RESPONSE or any combination of supplier and snoop umasks. If no
       supplier or snoop umask is specified, the library defaults to using
       ANY_RESPONSE.

       For instance, the following are valid event selections:

       OFFCORE_RESPONSE_0:DMND_DATA_RD:ANY_RESPONSE

       OFFCORE_RESPONSE_0:ANY_REQUEST

       OFFCORE_RESPONSE_0:ANY_RFO:DDR_NEAR

       But the following is illegal:

       OFFCORE_RESPONSE_0:ANY_RFO:DDR_NEAR:ANY_RESPONSE

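       The normal-mode rule illustrated above can be expressed as a small
       check. This is only a sketch under stated assumptions: the function
       names are hypothetical and not part of libpfm, and the table of
       supplier and snoop umasks contains only DDR_NEAR, the single such
       name used in the examples above. A real check would need the full
       umask table of the PMU.

```c
#include <assert.h>
#include <string.h>

/*
 * Illustrative subset only: DDR_NEAR is the one supplier umask named in
 * the examples above. The real PMU defines more supplier and snoop umasks.
 */
static int is_supplier_or_snoop(const char *umask)
{
    static const char *const known[] = { "DDR_NEAR" };

    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
        if (strcmp(umask, known[i]) == 0)
            return 1;
    return 0;
}

/*
 * Hypothetical helper: return 1 if the umask list obeys the normal-mode
 * rule (ANY_RESPONSE is not combined with an explicit supplier or snoop
 * umask), and 0 otherwise.
 */
static int knl_offcore_umasks_ok(const char *const *umasks, size_t n)
{
    int any_response = 0;
    int explicit_response = 0;

    assert(umasks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(umasks[i], "ANY_RESPONSE") == 0)
            any_response = 1;
        else if (is_supplier_or_snoop(umasks[i]))
            explicit_response = 1;
    }
    return !(any_response && explicit_response);
}
```

       Applied to the selections above, the three valid umask lists are
       accepted and the list ANY_RFO, DDR_NEAR, ANY_RESPONSE is rejected.
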
       In average latency mode, OFFCORE_RESPONSE_0 must be programmed to
       select the request types of interest, for instance, DMND_DATA_RD,
       together with the OUTSTANDING umask and no other umasks. The library
       enforces that restriction as soon as the OUTSTANDING umask is used.
       Then OFFCORE_RESPONSE_1 must be set with the same request types and
       the ANY_RESPONSE umask. Note that the library encodes events
       independently of each other and therefore cannot verify that the
       request types match between the two events. Examples of average
       latency settings:

       OFFCORE_RESPONSE_0:DMND_DATA_RD:OUTSTANDING+OFFCORE_RESPONSE_1:DMND_DATA_RD:ANY_RESPONSE

       OFFCORE_RESPONSE_0:ANY_REQUEST:OUTSTANDING+OFFCORE_RESPONSE_1:ANY_REQUEST:ANY_RESPONSE

       The average latency for the request(s) is obtained by dividing the
       count of OFFCORE_RESPONSE_0 by the count of OFFCORE_RESPONSE_1. The
       ratio is expressed in core cycles.
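       The division described above can be sketched as follows. The
       function name knl_avg_latency() is hypothetical and not part of
       libpfm. The two arguments are assumed to be the raw counts read
       back for OFFCORE_RESPONSE_0 (with OUTSTANDING) and
       OFFCORE_RESPONSE_1 (with ANY_RESPONSE) over the same interval.
       Returning 0.0 when no responses were counted is a choice made for
       this sketch.

```c
#include <stdint.h>

/*
 * Average latency in core cycles: the OFFCORE_RESPONSE_0 count (with the
 * OUTSTANDING umask) divided by the OFFCORE_RESPONSE_1 count (with the
 * ANY_RESPONSE umask). Returns 0.0 if no responses were counted, to
 * avoid a division by zero.
 */
static double knl_avg_latency(uint64_t outstanding_count,
                              uint64_t response_count)
{
    if (response_count == 0)
        return 0.0;
    return (double)outstanding_count / (double)response_count;
}
```

       For example, an OFFCORE_RESPONSE_0 count of 1000 and an
       OFFCORE_RESPONSE_1 count of 10 give an average latency of 100 core
       cycles.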

AUTHORS

       Stephane Eranian <eranian@gmail.com>

                                  July, 2016                         LIBPFM(3)