LIBPFM(3)                  Linux Programmer's Manual                 LIBPFM(3)

NAME

       libpfm_intel_knm - support for Intel Knights Mill core PMU

SYNOPSIS

       #include <perfmon/pfmlib.h>

       PMU name: knm
       PMU desc: Intel Knights Mill

DESCRIPTION

       The library supports the Intel Knights Mill core PMU. Note that this
       PMU model covers only each core's PMU and not the socket-level PMU.

       On Knights Mill, the number of generic counters is 4. There is 4-way
       HyperThreading support. The pfm_get_pmu_info() function returns the
       maximum number of generic counters in num_cntrs.

MODIFIERS

       The following modifiers are supported on Intel Knights Mill processors:

       u      Measure at user level, which includes privilege levels 1, 2, 3.
              This corresponds to PFM_PLM3. This is a boolean modifier.

       k      Measure at kernel level, which includes privilege level 0. This
              corresponds to PFM_PLM0. This is a boolean modifier.

       i      Invert the meaning of the event. The counter will now count
              cycles in which the event is not occurring. This is a boolean
              modifier.

       e      Enable edge detection, i.e., count only when there is a state
              transition from no occurrence of the event to at least one
              occurrence. This modifier must be combined with a counter mask
              modifier (c) with a value greater than or equal to one. This is
              a boolean modifier.

       c      Set the counter mask value. The mask acts as a threshold. The
              counter will count the number of cycles in which the number of
              occurrences of the event is greater than or equal to the
              threshold. This is an integer modifier with values in the range
              [0:255].

       t      Measure on any of the 4 hyper-threads at the same time, assuming
              hyper-threading is enabled. This is a boolean modifier. This
              modifier is only available on fixed counters
              (unhalted_reference_cycles, instructions_retired,
              unhalted_core_cycles). Depending on the underlying kernel
              interface, the event may be programmed on a fixed counter or a
              generic counter, except for unhalted_reference_cycles, in which
              case, this modifier may be ignored or rejected.

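       The constraints stated above for the i, e and c modifiers can be
       sketched as a small encoder. This is only an illustration, not the
       library's code: struct knm_mods and knm_encode_mods() are hypothetical
       names, and the bit positions assume that generic counters follow the
       architectural IA32_PERFEVTSELx layout (USR=16, OS=17, E=18, INV=23,
       CMASK=24..31). The t modifier is left out because it applies to fixed
       counters only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed architectural IA32_PERFEVTSELx bit positions. */
#define EVTSEL_USR          (1ULL << 16)   /* u modifier */
#define EVTSEL_OS           (1ULL << 17)   /* k modifier */
#define EVTSEL_EDGE         (1ULL << 18)   /* e modifier */
#define EVTSEL_INV          (1ULL << 23)   /* i modifier */
#define EVTSEL_CMASK_SHIFT  24             /* c modifier, 8 bits wide */

/* Hypothetical container for parsed modifiers (not a libpfm type). */
struct knm_mods {
    bool u, k, i, e;
    unsigned c;     /* counter mask, valid range [0:255] */
};

/*
 * Store the modifier bits in *out and return 0, or return -1 when the
 * combination breaks a rule from the MODIFIERS section:
 *   - c must be in [0:255]
 *   - e requires c >= 1
 */
int knm_encode_mods(const struct knm_mods *m, uint64_t *out)
{
    uint64_t v = 0;

    if (m->c > 255)
        return -1;
    if (m->e && m->c < 1)
        return -1;

    if (m->u) v |= EVTSEL_USR;
    if (m->k) v |= EVTSEL_OS;
    if (m->e) v |= EVTSEL_EDGE;
    if (m->i) v |= EVTSEL_INV;
    v |= (uint64_t)m->c << EVTSEL_CMASK_SHIFT;

    *out = v;
    return 0;
}
```

       For example, u together with e and c=2 yields the USR and E bits plus
       a counter mask of 2, whereas e on its own is rejected.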

OFFCORE_RESPONSE events

       Intel Knights Mill provides two offcore_response events, called
       OFFCORE_RESPONSE_0 and OFFCORE_RESPONSE_1.

       These events need special treatment in the performance monitoring
       infrastructure because each event uses an extra register to store some
       settings. Thus, when multiple offcore_response events are monitored
       simultaneously, the kernel needs to manage the sharing of that extra
       register.

       The offcore_response events are exposed as normal events by the
       library. The extra settings are exposed as regular umasks. The library
       takes care of encoding the events according to the underlying kernel
       interface.

       On Intel Knights Mill, the umasks are divided into 4 categories:
       request, supplier, snoop, and average latency. Offcore_response events
       have two modes of operation: normal and average latency. In the first
       mode, the two offcore_response events operate independently of each
       other. The user must provide at least one umask for each of the first 3
       categories: request, supplier, snoop. In the second mode, the two
       offcore_response events are combined to compute an average latency per
       request type.

       For the normal mode, there is a special supplier (response) umask
       called ANY_RESPONSE. When this umask is used, it overrides any
       supplier and snoop umasks. In other words, users can specify either
       ANY_RESPONSE or any combination of supplier + snoop umasks. If no
       supplier or snoop is specified, the library defaults to using
       ANY_RESPONSE.

       For instance, the following are valid event selections:

       OFFCORE_RESPONSE_0:DMND_DATA_RD:ANY_RESPONSE

       OFFCORE_RESPONSE_0:ANY_REQUEST

       OFFCORE_RESPONSE_0:ANY_RFO:DDR_NEAR

       But the following is illegal:

       OFFCORE_RESPONSE_0:ANY_RFO:DDR_NEAR:ANY_RESPONSE

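       The normal-mode rules above can be summarized in a short checker.
       This is a sketch under stated assumptions, not the library's
       validation code: struct offcore_sel, enum offcore_check and
       offcore_check_normal() are hypothetical, the caller is assumed to have
       already sorted each umask into its category, and DDR_NEAR is assumed
       to count as a supplier umask.

```c
#include <stdbool.h>

/* Hypothetical per-category summary of the umasks given for one event. */
struct offcore_sel {
    int  n_request;     /* number of request umasks */
    int  n_supplier;    /* number of supplier umasks, ANY_RESPONSE excluded */
    int  n_snoop;       /* number of snoop umasks */
    bool any_response;  /* ANY_RESPONSE was given */
};

enum offcore_check {
    OFFCORE_OK,         /* selection is valid as given */
    OFFCORE_DEFAULTED,  /* valid after defaulting to ANY_RESPONSE */
    OFFCORE_INVALID     /* selection is rejected */
};

enum offcore_check offcore_check_normal(struct offcore_sel *s)
{
    /* At least one request umask is always required. */
    if (s->n_request < 1)
        return OFFCORE_INVALID;

    /* ANY_RESPONSE cannot be mixed with supplier or snoop umasks. */
    if (s->any_response && (s->n_supplier > 0 || s->n_snoop > 0))
        return OFFCORE_INVALID;

    /* With no supplier and no snoop, fall back to ANY_RESPONSE. */
    if (!s->any_response && s->n_supplier == 0 && s->n_snoop == 0) {
        s->any_response = true;
        return OFFCORE_DEFAULTED;
    }

    return OFFCORE_OK;
}
```

       Under these assumptions, OFFCORE_RESPONSE_0:ANY_REQUEST is accepted
       with ANY_RESPONSE filled in, while
       OFFCORE_RESPONSE_0:ANY_RFO:DDR_NEAR:ANY_RESPONSE is rejected.
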
       In average latency mode, OFFCORE_RESPONSE_0 must be programmed to
       select the request types of interest, for instance, DMND_DATA_RD,
       together with the OUTSTANDING umask and no other umasks. The library
       enforces that restriction as soon as the OUTSTANDING umask is used.
       Then OFFCORE_RESPONSE_1 must be set with the same request types and the
       ANY_RESPONSE umask. Note that the library encodes events independently
       of each other and therefore cannot verify that the requests match
       between the two events. Examples of average latency settings:

       OFFCORE_RESPONSE_0:DMND_DATA_RD:OUTSTANDING+OFFCORE_RESPONSE_1:DMND_DATA_RD:ANY_RESPONSE

       OFFCORE_RESPONSE_0:ANY_REQUEST:OUTSTANDING+OFFCORE_RESPONSE_1:ANY_REQUEST:ANY_RESPONSE

       The average latency for the request(s) is obtained by dividing the
       count of OFFCORE_RESPONSE_0 by the count of OFFCORE_RESPONSE_1. The
       ratio is expressed in core cycles.

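       The division described above can be sketched as follows. The function
       name offcore_avg_latency() is hypothetical, and the two arguments are
       assumed to be raw counts already read back from the two events by
       whatever kernel interface is in use. The zero check is an addition for
       safety and is not described in this page.

```c
#include <stdint.h>

/*
 * Compute the average latency in core cycles.
 *   outstanding: count of OFFCORE_RESPONSE_0 (programmed with OUTSTANDING)
 *   responses:   count of OFFCORE_RESPONSE_1 (programmed with ANY_RESPONSE)
 * Returns 0 on success, or -1 when no response was counted.
 */
int offcore_avg_latency(uint64_t outstanding, uint64_t responses,
                        double *cycles)
{
    if (responses == 0)
        return -1;

    *cycles = (double)outstanding / (double)responses;
    return 0;
}
```

       For instance, counts of 1200 and 10 give an average latency of 120
       core cycles per request.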

AUTHORS

       Stephane Eranian <eranian@gmail.com>

                                  March, 2018                        LIBPFM(3)