PAPI_profil(3)                      PAPI                      PAPI_profil(3)

NAME
PAPI_profil - generate a histogram of hardware counter overflows vs. PC
addresses

SYNOPSIS
C Interface
#include <papi.h>
int PAPI_profil(void *buf, unsigned bufsiz, unsigned long offset,
                unsigned scale, int EventSet, int EventCode, int threshold,
                int flags);

Fortran Interface
The profiling routines have no Fortran interface.

DESCRIPTION
PAPI_profil() provides hardware event statistics by profiling the
occurrence of specified hardware counter events. It is designed to mimic
the UNIX SVR4 profil call. The statistics are generated by creating a
histogram of hardware counter event overflows vs. program counter
addresses for the current process. The histogram is defined for a
specific region of program code to be profiled, and the identified region
is logically broken up into a set of equal-size subdivisions, each of
which corresponds to a count in the histogram. With each hardware event
overflow, the current subdivision is identified and its corresponding
histogram count is incremented. These counts establish a relative
measure of how many hardware counter events are occurring in each code
subdivision. The resulting histogram counts for a profiled region can be
used to identify those program addresses that generate a
disproportionately high percentage of the event of interest.
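
As an illustration of that last step, the sketch below scans a filled
histogram of 16-bit buckets and reports which bucket holds the largest
share of the samples. The helper name hottest_bucket() is hypothetical and
is not part of the PAPI API; it assumes the default unsigned short bucket
type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PAPI call): return the index of the bucket
 * with the highest count and store its fraction of all samples
 * (0.0 to 1.0) in *share. Assumes 16-bit buckets. */
size_t hottest_bucket(const unsigned short *buf, size_t nbuckets,
                      double *share)
{
    unsigned long long total = 0;
    size_t best = 0;
    size_t i;

    assert(buf != NULL && nbuckets > 0 && share != NULL);

    for (i = 0; i < nbuckets; i++) {
        total += buf[i];
        if (buf[i] > buf[best])
            best = i;          /* first bucket wins on ties */
    }

    /* An empty histogram has no meaningful share. */
    *share = total ? (double)buf[best] / (double)total : 0.0;
    return best;
}
```

A caller would pass the buffer given to PAPI_profil() once counting has
stopped, with nbuckets equal to bufsiz divided by the bucket size.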

Events to be profiled are specified with the EventSet and EventCode
parameters. More than one event can be simultaneously profiled by
calling PAPI_profil() several times with different EventCode values.
Profiling can be turned off for a given event by calling PAPI_profil()
with a threshold value of 0.

ARGUMENTS
*buf -- pointer to a buffer of bufsiz bytes in which the histogram
counts are stored in an array of unsigned short, unsigned int, or
unsigned long long values, or 'buckets'. The size of the buckets is
determined by values in the flags argument.

bufsiz -- the size of the histogram buffer in bytes. It is computed
from the length of the code region to be profiled, the size of the
buckets, and the scale factor as discussed below.

offset -- the start address of the region to be profiled.

scale -- broadly and historically speaking, a contraction factor that
indicates how much smaller the histogram buffer is than the region to
be profiled. More precisely, scale is interpreted as an unsigned 16-bit
fixed-point fraction with the radix point implied on the left. Its
value is the reciprocal of the number of addresses in a subdivision,
per counter of histogram buffer. Below is a table of representative
values for scale:

┌────────────────────────────────────────────────────────────────────────────────────────────┐
│ Representative values for the scale variable                                               │
├────────┬─────────┬─────────────────────────────────────────────────────────────────────────┤
│HEX     │ DECIMAL │ DEFINITION                                                              │
├────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│0x20000 │ 131072  │ Maps precisely one instruction address to a unique bucket in buf.       │
├────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│0x10000 │ 65536   │ Maps precisely two instruction addresses to a unique bucket in buf.     │
├────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ 0xFFFF │ 65535   │ Maps approximately two instruction addresses to a unique bucket in buf. │
├────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ 0x8000 │ 32768   │ Maps every four instruction addresses to a bucket in buf.               │
├────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ 0x4000 │ 16384   │ Maps every eight instruction addresses to a bucket in buf.              │
├────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ 0x0002 │ 2       │ Maps all instruction addresses to the same bucket in buf.               │
├────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ 0x0001 │ 1       │ Undefined.                                                              │
├────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ 0x0000 │ 0       │ Undefined.                                                              │
└────────┴─────────┴─────────────────────────────────────────────────────────────────────────┘
Historically, the scale factor was introduced to allow the allocation
of buffers smaller than the code size to be profiled. Data and
instruction sizes were assumed to be multiples of 16 bits. These
assumptions are no longer necessarily true. PAPI_profil has preserved the
traditional definition of scale where appropriate, but has deprecated the
definitions for 0 and 1 (disable scaling) and extended the range of scale
to include 65536 and 131072, which allow for exactly two addresses and
exactly one address per profiling bucket, respectively.
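
The table rows are consistent with a mapping in which the byte distance
from offset is multiplied by scale and divided by 131072 (0x20000). The
sketch below illustrates that reading of the table only; bucket_index() is
a hypothetical helper, and the arithmetic is an assumption inferred from
the table rather than a statement of how PAPI computes the index
internally.

```c
#include <assert.h>

/* Hypothetical helper (not a PAPI call): map a program counter value to
 * a histogram bucket index, assuming index = (pc - offset) * scale / 2^17.
 * This matches the table: 0x20000 -> one address per bucket,
 * 0x10000 -> two, 0x8000 -> four, 0x4000 -> eight.
 * Assumes the profiled region is small enough that the product does not
 * overflow 64 bits. */
unsigned long long bucket_index(unsigned long pc, unsigned long offset,
                                unsigned scale)
{
    assert(pc >= offset);                    /* pc must lie in the region */
    assert(scale >= 2 && scale <= 0x20000);  /* 0 and 1 are undefined */

    return ((unsigned long long)(pc - offset) * scale) >> 17;
}
```

For example, with offset 0x1000 and scale 0x10000, addresses 0x1004 and
0x1005 both fall into bucket 2.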

The value of bufsiz is computed as follows:

    bufsiz = (end - start) * (bucket_size / 2) * (scale / 65536)

where

    bufsiz - the size of the buffer in bytes

    end, start - the ending and starting addresses of the profiled region

    bucket_size - the size of each bucket in bytes; 2, 4, or 8 as defined
    in flags

    scale - as defined above
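
The formula can be evaluated in integer arithmetic as in the sketch below.
profil_bufsiz() is a hypothetical helper, not a PAPI call; it multiplies
before dividing so that the fractional factor scale/65536 is not truncated
to zero, and it rounds down, so a caller may prefer to round the result up
to a whole number of buckets.

```c
#include <assert.h>

/* Hypothetical helper (not a PAPI call): evaluate
 *   bufsiz = (end - start) * (bucket_size / 2) * (scale / 65536)
 * where length = end - start in bytes. The result is in bytes. */
unsigned long long profil_bufsiz(unsigned long long length,
                                 unsigned bucket_size, unsigned scale)
{
    assert(bucket_size == 2 || bucket_size == 4 || bucket_size == 8);
    assert(scale >= 2 && scale <= 0x20000);

    /* Multiply first, then divide, to keep the fraction scale/65536. */
    return length * (bucket_size / 2) * scale / 65536;
}
```

With a 1000-byte region, 16-bit buckets, and scale 65536, this gives 1000
bytes: 500 buckets of 2 bytes, each covering two addresses.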

EventSet -- The PAPI EventSet to profile. This EventSet is marked as
profiling-ready, but profiling doesn't actually start until a
PAPI_start() call is issued.

EventCode -- Code of the Event in the EventSet to profile. This event
must already be a member of the EventSet.

threshold -- minimum number of events that must occur before the PC is
sampled. If hardware overflow is supported for your substrate, this
threshold will trigger an interrupt when reached. Otherwise, the
counters will be sampled periodically and the PC will be recorded for the
first sample that exceeds the threshold. If the value of threshold is
0, profiling will be disabled for this event.

flags -- bit pattern to control profiling behavior. Defined values are
shown in the table below:

┌───────────────────────────────────────────────────┐
│Defined bits for the flags variable                │
├──────────────────────┬────────────────────────────┤
│PAPI_PROFIL_POSIX     │ Default type of profiling, │
│                      │ similar to profil(3).      │
├──────────────────────┼────────────────────────────┤
│PAPI_PROFIL_RANDOM    │ Drop a random 25% of the   │
│                      │ samples.                   │
├──────────────────────┼────────────────────────────┤
│PAPI_PROFIL_WEIGHTED  │ Weight the samples by      │
│                      │ their value.               │
├──────────────────────┼────────────────────────────┤
│PAPI_PROFIL_COMPRESS  │ Ignore samples as values   │
│                      │ in the hash buckets get    │
│                      │ big.                       │
├──────────────────────┼────────────────────────────┤
│PAPI_PROFIL_BUCKET_16 │ Use unsigned short (16     │
│                      │ bit) buckets. This is the  │
│                      │ default bucket size.       │
├──────────────────────┼────────────────────────────┤
│PAPI_PROFIL_BUCKET_32 │ Use unsigned int (32 bit)  │
│                      │ buckets.                   │
├──────────────────────┼────────────────────────────┤
│PAPI_PROFIL_BUCKET_64 │ Use unsigned long long (64 │
│                      │ bit) buckets.              │
├──────────────────────┼────────────────────────────┤
│PAPI_PROFIL_FORCE_SW  │ Force software overflow in │
│                      │ profiling.                 │
└──────────────────────┴────────────────────────────┘

RETURN VALUES
On success, this function returns PAPI_OK.
On error, a non-zero error code is returned.

ERRORS
PAPI_EINVAL
    One or more of the arguments is invalid.

PAPI_ENOMEM
    Insufficient memory to complete the operation.

PAPI_ENOEVST
    The EventSet specified does not exist.

PAPI_EISRUN
    The EventSet is currently counting events.

PAPI_ECNFLCT
    The underlying counter hardware cannot count this event and
    other events in the EventSet simultaneously.

PAPI_ENOEVNT
    The PAPI preset is not available on the underlying hardware.

EXAMPLES
/* Fragment; requires <papi.h>, <stdlib.h>, and <string.h>. */
int retval;
unsigned long length;
PAPI_exe_info_t *prginfo;
unsigned short *profbuf;

/* Look up the address range of the program text. */
if ((prginfo = PAPI_get_executable_info()) == NULL)
    handle_error(1);

length = (unsigned long)(prginfo->text_end - prginfo->text_start);

/* With scale 65536 and 16-bit buckets, bufsiz equals length. */
profbuf = (unsigned short *)malloc(length);
if (profbuf == NULL)
    handle_error(1);
memset(profbuf, 0x00, length);
.
.
.
/* Profile PAPI_FP_INS from the start of the text region,
   sampling the PC once per 1000000 events. */
if ((retval = PAPI_profil(profbuf, length,
        (unsigned long)prginfo->text_start, 65536, EventSet,
        PAPI_FP_INS, 1000000,
        PAPI_PROFIL_POSIX | PAPI_PROFIL_BUCKET_16)) != PAPI_OK)
    handle_error(retval);

BUGS
If you call PAPI_profil, PAPI allocates buffer space that will not be
freed if you call PAPI_shutdown or PAPI_cleanup_eventset. To release this
memory, you must call PAPI_profil again for each profiled event with a
threshold of 0.

SEE ALSO
PAPI_sprofil(3), PAPI_overflow(3), PAPI_get_executable_info(3)

PAPI Programmer's Reference      September, 2004      PAPI_profil(3)