PERF-STAT(1) perf Manual PERF-STAT(1)

NAME
perf-stat - Run a command and gather performance counter statistics

SYNOPSIS
perf stat [-e <EVENT> | --event=EVENT] [-a] <command>
perf stat [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
perf stat [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>]
perf stat report [-i file]

DESCRIPTION
This command runs a command and gathers performance counter statistics
from it.

OPTIONS
<command>...
Any command you can specify in a shell.

record
See STAT RECORD.

report
See STAT REPORT.

-e, --event=
Select the PMU event. Selection can be:

• a symbolic event name (use perf list to list all events)

• a raw PMU event in the form of rN where N is a hexadecimal
  value that represents the raw register encoding with the layout
  of the event control registers as described by entries in
  /sys/bus/event_source/devices/cpu/format/*.

• a symbolic or raw PMU event followed by an optional colon and a
  list of event modifiers, e.g., cpu-cycles:p. See the
  perf-list(1) man page for details on event modifiers.

• a symbolically formed event like pmu/param1=0x3,param2/ where
  param1 and param2 are defined as formats for the PMU in
  /sys/bus/event_source/devices/<pmu>/format/*

'percore' is an event qualifier that sums up the event counts for both
hardware threads in a core. For example:

    perf stat -A -a -e cpu/event,percore=1/,otherevent ...

• a symbolically formed event like
  pmu/config=M,config1=N,config2=K/ where M, N, K are numbers (in
  decimal, hex, octal format). Acceptable values for each of
  config, config1 and config2 parameters are defined by
  corresponding entries in
  /sys/bus/event_source/devices/<pmu>/format/*

Note that the last two syntaxes support prefix and glob matching in
the PMU name to simplify creation of events across multiple instances
of the same type of PMU in large systems (e.g. memory controller PMUs).
Multiple PMU instances are typical for uncore PMUs, so the prefix
'uncore_' is also ignored when performing this match.
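
As an illustration of the rN form, the sketch below builds a raw event string from an event select and a unit mask. It assumes a layout in which the event select occupies config bits 0-7 and the unit mask bits 8-15, which is common on x86 but is not stated in this manual; the helper name raw_event is hypothetical. Check the entries under /sys/bus/event_source/devices/cpu/format/ before relying on it.

```shell
#!/bin/bash
# Compose a raw event string "rN" from an event select and a unit mask.
# ASSUMPTION: event = config bits 0-7, umask = config bits 8-15. Verify this
# against /sys/bus/event_source/devices/cpu/format/{event,umask} on the
# target machine; other PMUs use different layouts.
raw_event() {
    local event=$1 umask=$2
    printf 'r%x\n' $(( (umask << 8) | event ))
}

raw_event 0x3c 0x00   # prints r3c
raw_event 0xc0 0x01   # prints r1c0
```

The resulting string could then be passed to -e like any other event name.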

-i, --no-inherit
child tasks do not inherit counters

-p, --pid=<pid>
stat events on existing process id (comma separated list)

-t, --tid=<tid>
stat events on existing thread id (comma separated list)

-b, --bpf-prog
stat events on existing bpf program id (comma separated list),
requiring root rights. bpftool-prog can be used to find the program
ids of all bpf programs in the system. For example:

    # bpftool prog | head -n 1
    17247: tracepoint name sys_enter tag 192d548b9d754067 gpl

    # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000

    Performance counter stats for 'BPF program(s) 17247':

            85,967      cycles
            28,982      instructions      #    0.34  insn per cycle

       1.102235068 seconds time elapsed

--bpf-counters
Use BPF programs to aggregate readings from perf_events. This
allows multiple perf-stat sessions that are counting the same
metric (cycles, instructions, etc.) to share hardware counters. To
use BPF programs on common events by default, use "perf config
stat.bpf-counter-events=<list_of_events>".

--bpf-attr-map
With option "--bpf-counters", different perf-stat sessions share
information about shared BPF programs and maps via a pinned
hashmap. Use "--bpf-attr-map" to specify the path of this pinned
hashmap. The default path is /sys/fs/bpf/perf_attr_map.

-a, --all-cpus
system-wide collection from all CPUs (default if no target is
specified)

--no-scale
Don’t scale/normalize counter values

-d, --detailed
print more detailed statistics, can be specified up to 3 times

    -d:          detailed events, L1 and LLC data cache
    -d -d:       more detailed events, dTLB and iTLB events
    -d -d -d:    very detailed events, adding prefetch events

-r, --repeat=<n>
repeat command and print average + stddev (max: 100). 0 means
forever.

-B, --big-num
print large numbers with thousands' separators according to locale.
Enabled by default. Use "--no-big-num" to disable. Default setting
can be changed with "perf config stat.big-num=false".

-C, --cpu=
Count only on the list of CPUs provided. Multiple CPUs can be
provided as a comma-separated list with no space: 0,1. Ranges of
CPUs are specified with -: 0-2. In per-thread mode, this option is
ignored. The -a option is still necessary to activate system-wide
monitoring. Default is to count on all CPUs.
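
The list syntax above (comma-separated entries, ranges written with -) can be expanded in a few lines of bash when a script needs the individual CPU numbers. This is only a sketch of the syntax as described here; expand_cpu_list is a hypothetical helper, not part of perf, and it does no input validation.

```shell
#!/bin/bash
# Expand a CPU list such as "0,2-4" into the individual CPU numbers.
# Hypothetical helper for illustration; perf parses the list itself.
expand_cpu_list() {
    local list=$1 part lo hi cpu out=()
    local IFS=,
    for part in $list; do
        lo=${part%-*}          # text before the dash (or the whole entry)
        hi=${part#*-}          # text after the dash (or the whole entry)
        for (( cpu = lo; cpu <= hi; cpu++ )); do
            out+=("$cpu")
        done
    done
    IFS=' '
    echo "${out[*]}"
}

expand_cpu_list 0,2-4   # prints: 0 2 3 4
```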

-A, --no-aggr
Do not aggregate counts across all monitored CPUs.

-n, --null
null run - Don’t start any counters.

This can be useful to measure just elapsed wall-clock time - or to
assess the raw overhead of perf stat itself, without running any
counters.

-v, --verbose
be more verbose (show counter open errors, etc)

-x SEP, --field-separator SEP
print counts using a CSV-style output to make it easy to import
directly into spreadsheets. Columns are separated by the string
specified in SEP.

--table
Display time for each run (-r option), in a table format, e.g.:

    $ perf stat --null -r 5 --table perf bench sched pipe

    Performance counter stats for 'perf bench sched pipe' (5 runs):

    # Table of individual measurements:
    5.189 (-0.293) #
    5.189 (-0.294) #
    5.186 (-0.296) #
    5.663 (+0.181) ##
    6.186 (+0.703) ####

    # Final result:
    5.483 +- 0.198 seconds time elapsed ( +- 3.62% )
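
The final line of the example can be checked by hand. The figures shown are consistent with the mean of the five runs and the standard error of the mean (sample standard deviation divided by the square root of the number of runs); that interpretation is an assumption inferred from the numbers, not a statement about how perf computes them. A small sketch, with summarize_runs as a hypothetical helper name:

```shell
#!/bin/bash
# Recompute the "Final result" line from the individual run times.
# ASSUMPTION: "+-" is the standard error of the mean, i.e.
# sample stddev / sqrt(n). This matches the example but is inferred.
summarize_runs() {
    printf '%s\n' "$@" | awk '
        { v[NR] = $1; sum += $1 }
        END {
            mean = sum / NR
            for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
            sem = sqrt(ss / (NR - 1)) / sqrt(NR)
            printf "%.3f +- %.3f ( +- %.2f%% )\n", mean, sem, 100 * sem / mean
        }'
}

summarize_runs 5.189 5.189 5.186 5.663 6.186
```

The per-run deltas in the table can differ in the last digit from a recomputation on the rounded values, since perf works on the unrounded times.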

-G name, --cgroup name
monitor only in the container (cgroup) called "name". This option
is available only in per-cpu mode. The cgroup filesystem must be
mounted. All threads belonging to container "name" are monitored
when they run on the monitored CPUs. Multiple cgroups can be
provided. Each cgroup is applied to the corresponding event, i.e.,
first cgroup to first event, second cgroup to second event and so
on. It is possible to provide an empty cgroup (monitor all the
time) using, e.g., -G foo,,bar. Cgroups must have corresponding
events, i.e., they always refer to events defined earlier on the
command line. If the user wants to track multiple events for a
specific cgroup, the user can use -e e1 -e e2 -G foo,foo or just
use -e e1 -e e2 -G foo.

To monitor, say, cycles for a cgroup and also system-wide, this
command line can be used:

    perf stat -e cycles -G cgroup_name -a -e cycles

--for-each-cgroup name
Expand event list for each cgroup in "name" (allows multiple cgroups
separated by comma). It also supports regex patterns to match
multiple groups. This has the same effect as repeating the -e option
and the -G option for each event x name. This option cannot be used
with the -G/--cgroup option.

-o file, --output file
Print the output into the designated file.

--append
Append to the output file designated with the -o option. Ignored if
-o is not specified.

--log-fd
Log output to fd, instead of stderr. Complementary to --output, and
mutually exclusive with it. --append may be used here. Examples:

    3>results perf stat --log-fd 3 -- $cmd
    3>>results perf stat --log-fd 3 --append -- $cmd

--control=fifo:ctl-fifo[,ack-fifo], --control=fd:ctl-fd[,ack-fd]
ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as
follows. Listen on ctl-fd descriptor for command to control
measurement (enable: enable events, disable: disable events).
Measurements can be started with events disabled using --delay=-1
option. Optionally send control command completion (ack\n) to
ack-fd descriptor to synchronize with the controlling process.
Example of a bash shell script to enable and disable events during
measurements:

    #!/bin/bash

    ctl_dir=/tmp/

    ctl_fifo=${ctl_dir}perf_ctl.fifo
    test -p ${ctl_fifo} && unlink ${ctl_fifo}
    mkfifo ${ctl_fifo}
    exec {ctl_fd}<>${ctl_fifo}

    ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo
    test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
    mkfifo ${ctl_ack_fifo}
    exec {ctl_fd_ack}<>${ctl_ack_fifo}

    perf stat -D -1 -e cpu-cycles -a -I 1000 \
              --control fd:${ctl_fd},${ctl_fd_ack} \
              -- sleep 30 &
    perf_pid=$!

    sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
    sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"

    exec {ctl_fd_ack}>&-
    unlink ${ctl_ack_fifo}

    exec {ctl_fd}>&-
    unlink ${ctl_fifo}

    wait -n ${perf_pid}
    exit $?

--pre, --post
Pre and post measurement hooks, e.g.:

    perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defconfig-build/ bzImage

-I msecs, --interval-print msecs
Print count deltas every N milliseconds (minimum: 1ms). The overhead
percentage could be high in some cases, for instance with small,
sub 100ms intervals. Use with caution. Example:

    perf stat -I 1000 -e cycles -a sleep 5

If the metric exists, it is calculated from the counts generated in this
interval and the metric is printed after #.

--interval-count times
Print count deltas for a fixed number of times. This option should be
used together with the "-I" option. Example:

    perf stat -I 1000 --interval-count 2 -e cycles -a

--interval-clear
Clear the screen before next interval.

--timeout msecs
Stop the perf stat session and print count deltas after N
milliseconds (minimum: 10 ms). This option is not supported with
the "-I" option. Example:

    perf stat --timeout 2000 -e cycles -a

--metric-only
Only print computed metrics. Print them in a single line. Don’t
show any raw values. Not supported with --per-thread.

--per-socket
Aggregate counts per processor socket for system-wide mode
measurements. This is a useful mode to detect imbalance between
sockets. To enable this mode, use --per-socket in addition to -a
(system-wide). The output includes the socket number and the number
of online processors on that socket. This is useful to gauge the
amount of aggregation.

--per-die
Aggregate counts per processor die for system-wide mode
measurements. This is a useful mode to detect imbalance between
dies. To enable this mode, use --per-die in addition to -a
(system-wide). The output includes the die number and the number of
online processors on that die. This is useful to gauge the amount
of aggregation.

--per-core
Aggregate counts per physical processor for system-wide mode
measurements. This is a useful mode to detect imbalance between
physical cores. To enable this mode, use --per-core in addition to
-a (system-wide). The output includes the core number and the
number of online logical processors on that physical processor.

--per-thread
Aggregate counts per monitored thread, when monitoring threads (-t
option) or processes (-p option).

--per-node
Aggregate counts per NUMA node for system-wide mode measurements.
This is a useful mode to detect imbalance between NUMA nodes. To
enable this mode, use --per-node in addition to -a (system-wide).

-D msecs, --delay msecs
After starting the program, wait msecs before measuring (-1: start
with events disabled). This is useful to filter out the startup
phase of the program, which is often very different.

-T, --transaction
Print statistics of transactional execution if supported.

--metric-no-group
By default, events to compute a metric are placed in weak groups.
The group tries to enforce scheduling all or none of the events.
The --metric-no-group option places events outside of groups and
may increase the chance of the event being scheduled, leading to
more accuracy. However, as events may not be scheduled together,
accuracy for metrics like instructions per cycle can be lower, as
both events may no longer be measured at the same time.

--metric-no-merge
By default metric events in different weak groups can be shared if
one group contains all the events needed by another. In such cases
one group will be eliminated, reducing event multiplexing and making
it so that certain groups of metrics sum to 100%. A downside to
sharing a group is that the group may require multiplexing and so
accuracy for a small group that need not have multiplexing is
lowered. This option forbids the event merging logic from sharing
events between groups and may be used to increase accuracy in this
case.

--quiet
Don’t print output, warnings or messages. This is useful with perf
stat record below to only write data to the perf.data file.

STAT RECORD
Stores stat data into perf data file.

-o file, --output file
Output file name.

STAT REPORT
Reads and reports stat data from perf data file.

-i file, --input file
Input file name.

--per-socket
Aggregate counts per processor socket for system-wide mode
measurements.

--per-die
Aggregate counts per processor die for system-wide mode
measurements.

--per-core
Aggregate counts per physical processor for system-wide mode
measurements.

-M, --metrics
Print metrics or metricgroups specified in a comma separated list.
For a group all metrics from the group are added. The events from
the metrics are automatically measured. See perf list output for
the possible metrics and metricgroups.

-A, --no-aggr
Do not aggregate counts across all monitored CPUs.

--topdown
Print complete top-down metrics supported by the CPU. This helps
determine bottlenecks in the CPU pipeline for CPU-bound
workloads, by breaking the cycles consumed down into frontend
bound, backend bound, bad speculation and retiring.

Frontend bound means that the CPU cannot fetch and decode instructions
fast enough. Backend bound means that computation or memory access is
the bottleneck. Bad speculation means that the CPU wasted cycles due
to branch mispredictions and similar issues. Retiring means that the
CPU computed without an apparent bottleneck. The bottleneck is only
the real bottleneck if the workload is actually bound by the CPU and
not by something else.

For best results it is usually a good idea to use it with interval mode
like -I 1000, as the bottleneck of workloads can change often.

This enables --metric-only, unless overridden with --no-metric-only.

The following restrictions apply only to older Intel CPUs and Atom; on
newer CPUs (IceLake and later) TopDown can be collected for any thread:

The top-down metrics are collected per core instead of per CPU thread.
Per-core mode is automatically enabled and -a (global monitoring) is
needed, requiring root rights or kernel.perf_event_paranoid=-1.

Topdown uses the full Performance Monitoring Unit, and needs disabling
of the NMI watchdog (as root): echo 0 > /proc/sys/kernel/nmi_watchdog
for best results. Otherwise the bottlenecks may be inconsistent on
workloads with changing phases.

To interpret the results it is usually necessary to know which CPUs the
workload runs on. If needed, the CPUs can be forced using taskset.

--td-level
Print the top-down statistics that are equal to or lower than the
input level. It allows users to print the top-down metrics level of
interest instead of the complete top-down metrics.

The availability of the top-down metrics level depends on the hardware.
For example, Ice Lake only supports L1 top-down metrics. Sapphire
Rapids supports both L1 and L2 top-down metrics.

Default: 0 means the max level that the current hardware supports. Error
out if the input is higher than the supported max level.

--no-merge
Do not merge results from same PMUs.

When multiple events are created from a single event specification,
stat will, by default, aggregate the event counts and show the result
in a single row. This option disables that behavior and shows the
individual events and counts.

Multiple events are created from a single event specification when:

1. Prefix or glob matching is used for the PMU name.
2. Aliases, which are listed immediately after the Kernel PMU events
   by perf list, are used.

--hybrid-merge
Merge the hybrid event counts from all PMUs.

For hybrid events, by default, stat aggregates and reports the
event counts per PMU. But sometimes, it’s also useful to aggregate
event counts from all PMUs. This option enables that behavior and
reports the counts without PMUs.

For non-hybrid events, it should have no effect.

--smi-cost
Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.

During the measurement, /sys/devices/cpu/freeze_on_smi will be set
to freeze core counters on SMI. The aperf counter will not be affected
by the setting. The cost of SMI can be measured by (aperf - unhalted
core cycles).

In practice, the percentage of SMI cycles is very useful for
performance-oriented analysis. --metric-only will be applied by
default. The output is SMI cycles%, which equals (aperf - unhalted core
cycles) / aperf.

Users who want to get the actual value can apply --no-metric-only.
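
The SMI cycles% formula above is simple enough to reproduce from raw counts, for example when post-processing output gathered with --no-metric-only. The sketch below only restates the formula; the helper name smi_cycles_pct and the sample counts are hypothetical.

```shell
#!/bin/bash
# SMI cycles% = (aperf - unhalted core cycles) / aperf, as a percentage.
# Hypothetical helper; arguments are the aperf count and the cycles count.
smi_cycles_pct() {
    awk -v aperf="$1" -v cycles="$2" \
        'BEGIN { printf "%.1f%%\n", 100 * (aperf - cycles) / aperf }'
}

# Made-up counts for illustration, not from a real measurement.
smi_cycles_pct 1000000 990000   # prints 1.0%
```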

--all-kernel
Configure all used events to run in kernel space.

--all-user
Configure all used events to run in user space.

--percore-show-thread
The event modifier "percore" sums up the event counts for all
hardware threads in a core and shows the counts per core.

With the event modifier "percore" enabled, this option also sums up the
event counts for all hardware threads in a core but shows the sum
per hardware thread. This is essentially a replacement for the any bit
and is convenient for post processing.

--summary
Print summary for interval mode (-I).

--no-csv-summary
Don’t print summary at the first column for CSV summary output.
This option must be used with -x and --summary.

This option can be enabled in perf config by setting the variable
stat.no-csv-summary.

    $ perf config stat.no-csv-summary=true

--cputype
Only enable events on CPUs of this type on a hybrid platform
(e.g. core or atom).

EXAMPLES
    $ perf stat -- make

    Performance counter stats for 'make':

          83723.452481      task-clock:u (msec)    #    1.004 CPUs utilized
                     0      context-switches:u     #    0.000 K/sec
                     0      cpu-migrations:u       #    0.000 K/sec
             3,228,188      page-faults:u          #    0.039 M/sec
       229,570,665,834      cycles:u               #    2.742 GHz
       313,163,853,778      instructions:u         #    1.36  insn per cycle
        69,704,684,856      branches:u             #  832.559 M/sec
         2,078,861,393      branch-misses:u        #    2.98% of all branches

          83.409183620 seconds time elapsed

          74.684747000 seconds user
           8.739217000 seconds sys

TIMINGS
As displayed in the example above we can display 3 types of timings. We
always display the time the counters were enabled/alive:

    83.409183620 seconds time elapsed

For workload sessions we also display the time the workloads spent in
user/system land:

    74.684747000 seconds user
     8.739217000 seconds sys

Those times are the very same as displayed by the time tool.

CSV FORMAT
With -x, perf stat is able to output a not-quite-CSV format output.
Commas in the output are not put into "". To make it easy to parse, it
is recommended to use a different character like -x \;

The fields are in this order:

• optional usec time stamp in fractions of second (with -I xxx)

• optional CPU, core, or socket identifier

• optional number of logical CPUs aggregated

• counter value

• unit of the counter value or empty

• event name

• run time of counter

• percentage of measurement time the counter was running

• optional variance if multiple values are collected with -r

• optional metric value

• optional unit of metric

Additional metrics may be printed with all earlier fields being empty.
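
A sketch of how a script might split one such line follows. It assumes output produced with -x \; and without -I, per-CPU/core/socket aggregation or -r, so that the optional leading fields and the variance field are absent; with those options the field positions shift. The helper name parse_stat_line and the sample line are made up for illustration and were not captured from a real run.

```shell
#!/bin/bash
# Split one line of `perf stat -x \;` output into named fields.
# ASSUMPTION: no -I, no per-CPU/core/socket aggregation and no -r, so the
# line starts with the counter value and has no variance field.
parse_stat_line() {
    local value unit event runtime pct metric metric_unit
    IFS=';' read -r value unit event runtime pct metric metric_unit <<< "$1"
    echo "event=$event value=$value unit=${unit:-none} running=${pct}%"
}

# Hypothetical sample line, not taken from a real measurement.
parse_stat_line '1200000;;cycles;800000;100.00;;'
```

Using ; as the separator avoids clashing with the commas that perf may print inside values.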

INTEL HYBRID SUPPORT
Support for Intel hybrid events within perf tools.

Some Intel platforms, such as AlderLake, are hybrid platforms that
consist of atom cpus and core cpus. Each cpu type has a dedicated event
list. Some events are available only on core cpus, some only on atom
cpus, and some on both.

The kernel exports two new cpu pmus via sysfs:

    /sys/devices/cpu_core
    /sys/devices/cpu_atom

The cpus files are created under these directories. For example:

    cat /sys/devices/cpu_core/cpus
    0-15

    cat /sys/devices/cpu_atom/cpus
    16-23

This indicates that cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.

As before, use perf-list to list the symbolic events.

    perf list

    inst_retired.any
        [Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
    inst_retired.any
        [Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]

The "Unit: xxx" is added to the brief description to indicate which pmu
the event belongs to. The same event name can be supported on different
pmus.

Enable hybrid event with a specific pmu

To enable a core-only event or an atom-only event, the following syntax
is supported:

    cpu_core/<event name>/
    or
    cpu_atom/<event name>/

For example, count the cycles event on core cpus:

    perf stat -e cpu_core/cycles/

608 PERF_TYPE_HARDWARE can’t carry pmu information. So now this type is
609 extended to be PMU aware type. The PMU type ID is stored at
610 attr.config[63:32].
611
612 PMU type ID is retrieved from sysfs. /sys/devices/cpu_atom/type
613 /sys/devices/cpu_core/type
614
615 The new attr.config layout for PERF_TYPE_HARDWARE:
616
617 PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA AA: hardware event ID EEEEEEEE:
618 PMU type ID
619
620 Cache event is similar. The type PERF_TYPE_HW_CACHE is extended to be
621 PMU aware type. The PMU type ID is stored at attr.config[63:32].
622
623 The new attr.config layout for PERF_TYPE_HW_CACHE:
624
625 PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB BB: hardware cache ID CC:
626 hardware cache op ID DD: hardware cache op result ID EEEEEEEE: PMU type
627 ID
628
When enabling a hardware event without a specified pmu, such as perf
stat -e cycles -a (use system-wide in this example), two events are
created automatically.

    ------------------------------------------------------------
    perf_event_attr:
      size                             120
      config                           0x400000000
      sample_type                      IDENTIFIER
      read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
      disabled                         1
      inherit                          1
      exclude_guest                    1
    ------------------------------------------------------------

and

    ------------------------------------------------------------
    perf_event_attr:
      size                             120
      config                           0x800000000
      sample_type                      IDENTIFIER
      read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
      disabled                         1
      inherit                          1
      exclude_guest                    1
    ------------------------------------------------------------

type 0 is PERF_TYPE_HARDWARE. 0x4 in 0x400000000 indicates it’s
cpu_core pmu. 0x8 in 0x800000000 indicates it’s cpu_atom pmu (atom pmu
type id is random).
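
The two attr.config layouts described for PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE can be decoded with shell arithmetic, which is handy when reading verbose dumps like the ones above. This is a minimal sketch that follows those layouts; the function names are hypothetical and the last sample value is made up.

```shell
#!/bin/bash
# Decode a PMU-aware attr.config value for PERF_TYPE_HARDWARE:
# bits 63:32 hold the PMU type ID, bits 7:0 the hardware event ID.
decode_hw_config() {
    local config=$(( $1 ))
    printf 'pmu_type=%d hw_event_id=%d\n' \
        $(( (config >> 32) & 0xffffffff )) $(( config & 0xff ))
}

# Decode a PMU-aware attr.config value for PERF_TYPE_HW_CACHE:
# BB (bits 7:0) cache ID, CC (15:8) op ID, DD (23:16) op result ID.
decode_hw_cache_config() {
    local config=$(( $1 ))
    printf 'pmu_type=%d cache_id=%d op_id=%d result_id=%d\n' \
        $(( (config >> 32) & 0xffffffff )) $(( config & 0xff )) \
        $(( (config >> 8) & 0xff )) $(( (config >> 16) & 0xff ))
}

decode_hw_config 0x400000000         # pmu_type=4 hw_event_id=0
decode_hw_config 0x800000000         # pmu_type=8 hw_event_id=0
decode_hw_cache_config 0x800010203   # made-up value for illustration
```

The PMU type IDs printed here can be compared with the contents of the type files under /sys/devices/cpu_core and /sys/devices/cpu_atom.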

The kernel creates cycles (0x400000000) on cpu0-cpu15 (core cpus), and
creates cycles (0x800000000) on cpu16-cpu23 (atom cpus).

In the perf-stat result, two events are displayed:

    Performance counter stats for 'system wide':

           6,744,979      cpu_core/cycles/
           1,965,552      cpu_atom/cycles/

The first cycles is the core event, the second cycles is the atom event.

Thread mode example:

perf-stat reports the scaled counts for hybrid events, with a
percentage displayed. The percentage is the event’s running
time/enabling time.

In this example, triad_loop runs on cpu16 (atom core), and we can see
that the scaled value for core cycles is 233,066,666 and the percentage
is 0.43%.

    perf stat -e cycles -- taskset -c 16 ./triad_loop

As before, two events are created.

    ------------------------------------------------------------
    perf_event_attr:
      size                             120
      config                           0x400000000
      sample_type                      IDENTIFIER
      read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
      disabled                         1
      inherit                          1
      enable_on_exec                   1
      exclude_guest                    1
    ------------------------------------------------------------

and

    ------------------------------------------------------------
    perf_event_attr:
      size                             120
      config                           0x800000000
      sample_type                      IDENTIFIER
      read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
      disabled                         1
      inherit                          1
      enable_on_exec                   1
      exclude_guest                    1
    ------------------------------------------------------------

    Performance counter stats for 'taskset -c 16 ./triad_loop':

         233,066,666      cpu_core/cycles/      (0.43%)
         604,097,080      cpu_atom/cycles/      (99.57%)
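
To make the relationship between a raw count, its scaled value and the displayed percentage concrete, here is a small sketch. It assumes the usual perf convention that a count is scaled by time enabled divided by time running, which this manual does not spell out; the helper name scale_count and the input values are hypothetical.

```shell
#!/bin/bash
# Scale a raw count by time_enabled / time_running and report the
# percentage of enabled time during which the event was running.
# ASSUMPTION: scaled = raw * enabled / running (usual perf convention).
scale_count() {
    awk -v raw="$1" -v enabled="$2" -v running="$3" 'BEGIN {
        if (running == 0) { print "not counted"; exit }
        printf "%.0f (%.2f%%)\n", raw * enabled / running, 100 * running / enabled
    }'
}

# Made-up values: the event ran for a quarter of the time it was enabled.
scale_count 1000000 2000000 500000   # prints 4000000 (25.00%)
```

A very small percentage, as for cpu_core/cycles/ in the example above, means the scaled value is extrapolated from a short running window and should be read with care.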

perf-record:

If no -e is specified in perf record on a hybrid platform, it
creates two default cycles events and adds them to the event list. One
is for core, the other is for atom.

perf-stat:

If no -e is specified in perf stat on a hybrid platform, then besides
the software events, the following events are created and added to the
event list in order:

    cpu_core/cycles/,
    cpu_atom/cycles/,
    cpu_core/instructions/,
    cpu_atom/instructions/,
    cpu_core/branches/,
    cpu_atom/branches/,
    cpu_core/branch-misses/,
    cpu_atom/branch-misses/

Of course, both perf-stat and perf-record support enabling a hybrid
event with a specific pmu, e.g.:

    perf stat -e cpu_core/cycles/
    perf stat -e cpu_atom/cycles/
    perf stat -e cpu_core/r1a/
    perf stat -e cpu_atom/L1-icache-loads/
    perf stat -e cpu_core/cycles/,cpu_atom/instructions/
    perf stat -e {cpu_core/cycles/,cpu_core/instructions/}

But {cpu_core/cycles/,cpu_atom/instructions/} will return a warning and
disable grouping, because the pmus in the group do not match (cpu_core
vs. cpu_atom).

JSON FORMAT
With -j, perf stat is able to print out a JSON format output that can
be used for parsing.

• timestamp : optional usec time stamp in fractions of second (with
  -I)

• optional aggregate options:

    • core : core identifier (with --per-core)

    • die : die identifier (with --per-die)

    • socket : socket identifier (with --per-socket)

    • node : node identifier (with --per-node)

    • thread : thread identifier (with --per-thread)

• counter-value : counter value

• unit : unit of the counter value or empty

• event : event name

• variance : optional variance if multiple values are collected (with
  -r)

• runtime : run time of counter

• metric-value : optional metric value

• metric-unit : optional unit of metric

SEE ALSO
perf-top(1), perf-list(1)

perf 01/12/2023 PERF-STAT(1)