PFMON(1)                        User's command                        PFMON(1)

NAME

       pfmon - a hardware-based performance monitoring tool

SYNOPSIS

       pfmon [OPTION] [PROGNAME]

DESCRIPTION

       The pfmon tool is a command-line performance monitoring tool that uses
       the perfmon interface to access the hardware performance counters of
       certain processors. This version supports the following processors:

       Itanium processors
              Itanium, Itanium 2 (McKinley, Madison and variants), Dual-Core
              Itanium 2 (Montecito). Pfmon runs with any 2.6.x kernel for
              Itanium processors.

       AMD X86-64 processors
              You need a kernel with perfmon v2.2 or higher for pfmon to
              work.

       Intel Pentium M and P6 processors
              You need a kernel with perfmon v2.2 or higher for pfmon to
              work.

       With pfmon, it is possible to monitor a single thread or the entire
       system. It is also possible to monitor multi-process and
       multi-threaded programs. For each, it is possible to collect simple
       counts or profiles.

       The set of events that can be measured depends on the underlying
       processor. Similarly, certain options are specific to a processor
       model. In general, pfmon gives access to all processor-specific
       monitoring features.

generic options

       Pfmon provides the following options on all processors:

       -h or --help
              Display the list of available options and exit.

       -V or --version
              Print pfmon version information and exit.

       -l[regex] or --show-events[=regex]
              If regex is not provided, pfmon lists the names of all available
              events for the current processor. Otherwise, only the events
              matching the regular expression are printed.

       --long-show-events[=regex]
              If regex is not provided, pfmon lists all available events for
              the current processor with an abbreviated list of attributes,
              all on one line. Otherwise, only the events matching the regular
              expression are printed.

       -i event or --event-info=event
              Display detailed information about an event. The event parameter
              can be the event code, the event name, or a regular expression.
              If multiple events match the expression, they are all printed.

       -u, -3, or --user-level
              Monitor at the user level for all events. By default, this
              option is turned on.

       -k, -0, or --kernel-level
              Monitor at the kernel level for all events. By default, this
              option is turned off.

       -1     Monitor execution at privilege level 1. By default, this option
              is turned off.

       -2     Monitor execution at privilege level 2. By default, this option
              is turned off.

       -e ev1,ev2,... or --events=ev1,ev2,...
              Select events to monitor. The events are specified by name or
              event code. Multiple events must be passed as a comma-separated
              list without spaces. The maximum number of events depends on the
              underlying processor. Events requiring a unit mask can be
              specified using the notation:
              event_name:unit_mask1:unit_mask2....

              Each -e option forms a set of events; multiple sets can be
              defined by specifying the -e option multiple times.
              Event-related options always apply to the last defined set. All
              events from a set are measured together. Pfmon uses the perfmon
              interface to multiplex the sets on the actual processors. When
              multiple sets are used, pfmon scales the final counts to provide
              estimates of what the actual counts would have been had all the
              events been measured throughout the entire duration of the run.
              Pfmon does not rearrange events between sets if they cannot be
              measured together.

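              As a rough illustration of the scaling described for -e, the
              sketch below assumes plain time-based scaling: a raw count is
              multiplied by the ratio of the total run time to the time its
              set was actually active. The exact formula pfmon applies is not
              stated here, and scale_count is a hypothetical helper, not part
              of pfmon.

```python
# Hypothetical helper, not part of pfmon: time-based scaling of the count
# of an event that belongs to a multiplexed set.
def scale_count(raw_count, active_time, total_time):
    """Estimate the full-run count of an event whose set was only active
    for active_time out of total_time (both in the same time unit)."""
    if active_time <= 0:
        raise ValueError("the set was never active; no estimate is possible")
    return raw_count * total_time / active_time


# Assumed scenario: two sets share a 10-second run equally, so each set is
# active for 5 seconds and every raw count is doubled.
estimate = scale_count(raw_count=1_000_000, active_time=5.0, total_time=10.0)
print(estimate)
```

              The result is an estimate only: it is accurate when the event
              rate is roughly uniform over the run.
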
       -I or --info
              Print information related to the pfmon version, the supported
              processor models, and the built-in sampling modules.

       -t secs or --session-timeout=secs
              Duration of the monitoring session, expressed in seconds. Once
              the timeout expires, pfmon stops monitoring and prints the final
              counts or profiles.

       -S format or --smpl-module-info=format
              Display information about a sampling module.

       --debug
              Enable debug output (for experts).

       --verbose
              Print more information about the execution of pfmon.

       --outfile=filename
              Print final counts to the file called filename. By default, all
              results (counts or profiles) are printed on the terminal.

       --append
              Append results (counts or profiles) to the current output file.
              If --outfile or --smpl-outfile is not provided, results are
              printed on the screen.

       --overflow-block
              Block the monitored thread when the sampling buffer becomes
              full. This option is only available in per-thread mode. By
              default, this option is turned off, meaning that the monitored
              thread keeps running, with monitoring disabled, while pfmon
              processes the sampling buffer. In other words, there may be
              blind spots.

       --system-wide
              Create a system-wide monitoring session in which pfmon measures
              all threads running on a set of processors. By default, this
              option is turned off, i.e., pfmon operates in per-thread mode.
              By default, system-wide mode measures the same events on all
              available processors. It is possible to restrict monitoring to a
              subset of processors using the --cpu-list option.

       --smpl-outfile=filename
              Save profiles into the file called filename. By default,
              profiles are printed on the terminal.

       --long-smpl-periods=val1,val2,...
              Set the sampling period to reload into the overflowed counter(s)
              after the last sample is recorded into the sampling buffer,
              i.e., when the buffer becomes full. The values must be passed in
              the same order as the events they refer to. For instance, if the
              events are passed as -eev1,ev2, then the sampling period for ev1
              must come first and the one for ev2 second. It is possible to
              skip a period by providing an empty element in the list, e.g.,
              --long-smpl-periods=,val2. Sampling periods are expressed in the
              same unit as the event they refer to. If an event counts the
              number of instructions retired, then the sampling period uses
              the same unit, i.e., instructions retired. To sample every
              100,000 instructions, you can pass --long-smpl-periods=100000.

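              The positional pairing of periods with events, including
              skipped elements, can be pictured with the following sketch.
              It only models the list syntax described above; parse_periods
              is a hypothetical helper and ev1/ev2 are placeholder event
              names.

```python
# Hypothetical model of the val1,val2,... syntax: the n-th period applies
# to the n-th event, and an empty element means "no period for this event".
def parse_periods(arg, events):
    fields = arg.split(",")
    if len(fields) > len(events):
        raise ValueError("more sampling periods than events")
    periods = {}
    for event, field in zip(events, fields):
        if field:  # an empty element skips this event
            periods[event] = int(field)
    return periods


# Events passed as -eev1,ev2 (placeholder names).
events = ["ev1", "ev2"]
print(parse_periods("100000", events))   # period for ev1 only
print(parse_periods(",100000", events))  # ev1 skipped, period for ev2
```
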
       --short-smpl-periods=val1,val2,...
              Set the sampling period to reload into the overflowed counter(s)
              after a sample is recorded into the buffer and that sample is
              not the last, i.e., when the buffer still has space remaining.
              Other than that, this option works exactly like
              --long-smpl-periods.

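              The difference between the long and short periods can be
              sketched as a small simulation. It assumes one sample is
              recorded per counter overflow and that the buffer is emptied
              once it fills up; simulate_reloads is a hypothetical helper,
              not pfmon code.

```python
# Hypothetical simulation: which period is reloaded after each overflow.
def simulate_reloads(entries, overflows, long_period, short_period):
    """Return the reload value chosen after each counter overflow for a
    sampling buffer that holds `entries` samples."""
    used = 0
    reloads = []
    for _ in range(overflows):
        used += 1                 # one sample recorded per overflow (assumed)
        buffer_full = used == entries
        # Long period after the sample that fills the buffer, short otherwise.
        reloads.append(long_period if buffer_full else short_period)
        if buffer_full:
            used = 0              # buffer processed and emptied (assumed)
    return reloads


print(simulate_reloads(entries=3, overflows=6,
                       long_period=100000, short_period=50000))
```
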
       --smpl-entries=n
              Select the number of samples that the kernel sampling buffer
              can hold. The default size is determined dynamically by pfmon
              based on the size of a sample and system resource limits such as
              the amount of locked memory allowed for a user process (as
              reported by ulimit).

       --with-header
              Generate a header before printing counts or profiles. The
              header contains information about the configuration of the host
              system and about the measurement being made.

       --cpu-list=num,num1-num2,...
              For system-wide mode, this option specifies the list of
              processors to monitor. Without this option, all available
              processors are monitored. Processors can be specified
              individually by index, or by range.

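              The num,num1-num2,... syntax can be read as follows. This is
              an illustrative parser only: parse_cpu_list is a hypothetical
              helper, and treating ranges as inclusive is an assumption.

```python
# Hypothetical parser for a CPU list such as "0,2-4".
def parse_cpu_list(arg):
    """Expand a comma-separated list of indexes and ranges into a sorted
    list of CPU indexes. Ranges are assumed to be inclusive."""
    cpus = set()
    for item in arg.split(","):
        if "-" in item:
            first, last = (int(part) for part in item.split("-", 1))
            if first > last:
                raise ValueError(f"invalid range: {item}")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(item))
    return sorted(cpus)


print(parse_cpu_list("0,2-4"))
```
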
       --aggregate-results
              Aggregate counts and profiles in the output. By default, this
              option is off, meaning that results are per-thread or per-CPU.

       --trigger-code-start-address=addr
              Start monitoring the first time code executes at address addr.
              The address can be specified in hexadecimal or with a symbol.

       --trigger-code-stop-address=addr
              Stop monitoring the first time code executes at address addr.
              The address can be specified in hexadecimal or with a symbol.

       --trigger-data-start-address=addr
              Start monitoring when the data at address addr is accessed. By
              default, this is for any read or write access.

       --trigger-data-stop-address=addr
              Stop monitoring when the data at address addr is accessed. By
              default, this is for any read or write access.

       --trigger-code-repeat
              By default, the start and stop code triggers are activated only
              the first time they are reached. With this option, it is
              possible to repeat the start/stop behavior each time execution
              crosses the trigger address.

       --trigger-code-follow
              Apply the start/stop code triggers to all monitored threads. By
              default, triggers are only applied to the first thread. This
              option has no effect on system-wide measurements.

       --trigger-data-repeat
              By default, the start and stop data triggers are activated only
              the first time they are reached. With this option, it is
              possible to repeat the start/stop behavior each time the data
              address is accessed.

       --trigger-data-follow
              Apply the start/stop data triggers to all monitored threads. By
              default, triggers are only applied to the first thread. This
              option has no effect on system-wide measurements.

       --trigger-data-ro
              Data triggers are activated on read access only. By default,
              they are activated on read or write access.

       --trigger-data-wo
              Data triggers are activated on write access only. By default,
              they are activated on read or write access.

       --trigger-start-delay=secs
              Number of seconds before activating monitoring. By default,
              monitoring is activated immediately, except when code/data
              triggers are used.

       Set privilege level per event. The levels apply to the current set,
       i.e., the last -e option. The levels are specified in the same order
       as the events. Accepted values for privileges are: u, k, 0, 1, 2, 3,
       or any combination thereof.

       --us-counter-format
              Print counts using commas, e.g., 1,024.

       --eu-counter-format
              Print counts using points, e.g., 1.024.

       --hex-counter-format
              Print counts using hexadecimal, e.g., 0x400.

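              The three counter formats differ only in how the same value is
              rendered. The sketch below reproduces the examples given above
              for the value 1024; format_count is a hypothetical helper and
              says nothing about how pfmon implements the formatting.

```python
# Hypothetical helper mirroring the three counter formats described above.
def format_count(value, style):
    if style == "us":    # thousands separated by commas
        return f"{value:,}"
    if style == "eu":    # thousands separated by points
        return f"{value:,}".replace(",", ".")
    if style == "hex":   # hexadecimal with a 0x prefix
        return hex(value)
    raise ValueError(f"unknown style: {style}")


for style in ("us", "eu", "hex"):
    print(style, format_count(1024, style))
```
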
       --smpl-module=name
              Select the sampling module. By default, the first module that
              matches the PMU model is used. This is typically the detailed-*
              module. To find out which modules are supported, use the -I
              option.

       --show-time
              Show real, user, and system time for the command executed in
              per-thread mode.

       --symbol-file=filename
              ELF image containing the symbol table for the command being
              monitored. By default, pfmon uses the binary image on disk.

       --sysmap-file=filename
              System.map format file containing the kernel symbol table.

       --check-events-only
              Verify the combination of events and exit. No measurement is
              performed.

       --smpl-periods-random=mask1:seed1,...
              Apply randomization to long and short periods. For each period,
              a seed and a mask value must be passed. The mask is a bitmask
              representing the range of variation for randomization. As of
              perfmon v2.3, the seed value is ignored.

       --smpl-print-counts
              When sampling, the final counts for the counters are not printed
              by default. This option forces counts to be printed at the end
              of a sampling measurement.

       --attach-task pid
              Attach to the thread identified by pid that is already running.
              The user must have permission to attach to the thread.

       --reset-non-smpl-periods
              At the end of a sampling period, reset all counters.

       --follow-fork
              Monitoring continues across fork(). By default, monitoring is
              not propagated to child processes. This option has no effect in
              system-wide mode.

       --follow-vfork
              Monitoring continues across vfork(). By default, monitoring is
              not propagated to child processes. This option has no effect in
              system-wide mode.

       --follow-pthread
              Monitoring continues across pthread_create(). By default,
              monitoring is not propagated to new threads. This option has no
              effect in system-wide mode.

       --follow-exec[=pattern]
              Monitoring follows through the exec*() system call. By default,
              monitoring stops at exec*(). It is possible to specify a regular
              expression pattern to select which commands get monitored.
              Without the pattern, all commands are monitored.

       --follow-exec-exclude=pattern
              Monitoring follows through the exec*() system call. By default,
              monitoring stops at exec*(). This option is the counterpart of
              --follow-exec in that the pattern specifies the commands which
              must be excluded from monitoring. Depending on the monitored
              workload, it may be easier to specify the commands to exclude
              rather than the commands to include.

       --follow-all
              This option is equivalent to specifying all of --follow-fork,
              --follow-vfork, --follow-pthread, and --follow-exec.

       --no-cmd-output
              Redirect all output of executed commands to /dev/null.

       --exec-split-results
              Generate separate results output for execution before and after
              exec*().

       --resolve-addresses
              Resolve all code/data addresses in profiles using symbol table
              information. If the symbol information is not present, the raw
              address is printed. By default, only raw addresses are printed.

       --extra-smpl-pmds=num,num1-num2,...
              Specify a list of extra PMD registers to include in samples.
              Those PMD registers are typically virtual PMD registers not tied
              to counters.

       --demangle-cpp
              C++ symbol demangling. By default, no symbol demangling is
              performed.

       --demangle-java
              Java symbol demangling. By default, no symbol demangling is
              performed.

       --saturate-smpl-buffer
              Stop collecting samples the first time the sampling buffer
              becomes full. In other words, simply collect the first N entries
              when --smpl-entries=N. By default, this option is off.

       --pin-command
              Pin the executed command on the CPUs specified by --cpu-list.
              This option is only relevant in system-wide mode.

       --switch-timeout=milliseconds
              The number of milliseconds before switching from one event set
              to the next. Depending on the granularity of the underlying
              operating system timer tick, the timeout may be rounded up. If
              the difference with the user-provided timeout exceeds 2%, pfmon
              prints a warning message.

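              The rounding and the 2% warning threshold can be illustrated
              with a short sketch. The 4 ms timer tick is an assumption
              chosen for the example, the difference is assumed to be
              measured relative to the requested timeout, and both functions
              are hypothetical helpers rather than pfmon code.

```python
# Hypothetical model of rounding a set-switch timeout up to the timer tick.
def round_up_to_tick(timeout_ms, tick_ms):
    """Round timeout_ms up to the next multiple of tick_ms (integers)."""
    ticks = -(-timeout_ms // tick_ms)  # ceiling division on integers
    return ticks * tick_ms


def exceeds_threshold(requested_ms, effective_ms, threshold=0.02):
    """True if the rounded timeout differs from the request by more than
    the threshold (2% by default), relative to the requested value."""
    return (effective_ms - requested_ms) / requested_ms > threshold


TICK_MS = 4  # assumed timer tick granularity, for illustration only
for requested in (10, 100, 1001):
    effective = round_up_to_tick(requested, TICK_MS)
    print(requested, effective, exceeds_threshold(requested, effective))
```
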
       --dont-start
              Do not activate monitoring. This option is useful on
              architectures where it is possible to start/stop counters
              directly from the user level.

       --excl-idle
              Exclude idle threads from system-wide measurement.

       --cpu-set-relative
              With this option, CPU identifications for --cpu-list are
              relative to the cpu_set affinity. By default, they are relative
              to actual CPU0.

       --print-interval=sec
              With this option, intermediate results can be generated when
              counting in a system-wide session. Pfmon prints the delta for
              each event since the last print. The interval sec is expressed
              in seconds. This option is not supported in per-thread mode.

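              The per-interval deltas can be pictured as differences between
              successive cumulative readings of an event. The sketch assumes
              the counter starts at zero and is read once per interval;
              interval_deltas is a hypothetical helper.

```python
# Hypothetical helper: turn cumulative readings into per-interval deltas.
def interval_deltas(cumulative_counts):
    """Return the increase of a counter between successive readings,
    assuming the counter starts at zero."""
    previous = 0
    deltas = []
    for total in cumulative_counts:
        deltas.append(total - previous)
        previous = total
    return deltas


# Assumed cumulative readings taken at three consecutive intervals.
print(interval_deltas([100, 250, 400]))
```
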

SEE ALSO

       Visit http://perfmon2.sf.net for more detailed documentation, including
       processor-specific options.

AUTHOR

       Stephane Eranian <eranian@hpl.hp.com>

pfmon 3.2                         April 2006                          PFMON(1)