PERF-STAT(1) perf Manual PERF-STAT(1)
2
3
4
6 perf-stat - Run a command and gather performance counter statistics
7
9 perf stat [-e <EVENT> | --event=EVENT] [-a] <command>
10 perf stat [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
11 perf stat [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>]
12 perf stat report [-i file]
13
15 This command runs a command and gathers performance counter statistics
16 from it.
17
19 <command>...
20 Any command you can specify in a shell.
21
22 record
23 See STAT RECORD.
24
25 report
26 See STAT REPORT.
27
28 -e, --event=
29 Select the PMU event. Selection can be:
30
31 • a symbolic event name (use perf list to list all events)
32
33 • a raw PMU event (eventsel+umask) in the form of rNNN where NNN
34 is a hexadecimal event descriptor.
35
36 • a symbolic or raw PMU event followed by an optional colon and a
37 list of event modifiers, e.g., cpu-cycles:p. See the perf-
38 list(1) man page for details on event modifiers.
39
40 • a symbolically formed event like pmu/param1=0x3,param2/ where
41 param1 and param2 are defined as formats for the PMU in
42 /sys/bus/event_source/devices/<pmu>/format/*
43
'percore' is an event qualifier that sums up the event counts for all
hardware threads in a core. For example:
46 perf stat -A -a -e cpu/event,percore=1/,otherevent ...
47
48 • a symbolically formed event like
49 pmu/config=M,config1=N,config2=K/ where M, N, K are numbers (in
50 decimal, hex, octal format). Acceptable values for each of
51 config, config1 and config2 parameters are defined by
52 corresponding entries in
53 /sys/bus/event_source/devices/<pmu>/format/*
54
55 Note that the last two syntaxes support prefix and glob matching in
56 the PMU name to simplify creation of events across multiple instances
57 of the same type of PMU in large systems (e.g. memory controller PMUs).
58 Multiple PMU instances are typical for uncore PMUs, so the prefix
59 'uncore_' is also ignored when performing this match.
60
61 -i, --no-inherit
62 child tasks do not inherit counters
63
64 -p, --pid=<pid>
65 stat events on existing process id (comma separated list)
66
67 -t, --tid=<tid>
68 stat events on existing thread id (comma separated list)
69
70 -b, --bpf-prog
stat events on existing bpf program id (comma separated list),
requiring root rights. bpftool-prog can be used to find the program
ids of all bpf programs in the system. For example:
74
75 # bpftool prog | head -n 1
76 17247: tracepoint name sys_enter tag 192d548b9d754067 gpl
77
78 # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000
79
80 Performance counter stats for 'BPF program(s) 17247':
81
82 85,967 cycles
83 28,982 instructions # 0.34 insn per cycle
84
85 1.102235068 seconds time elapsed
86
87 --bpf-counters
88 Use BPF programs to aggregate readings from perf_events. This
89 allows multiple perf-stat sessions that are counting the same
90 metric (cycles, instructions, etc.) to share hardware counters. To
91 use BPF programs on common events by default, use "perf config
92 stat.bpf-counter-events=<list_of_events>".
93
94 --bpf-attr-map
95 With option "--bpf-counters", different perf-stat sessions share
96 information about shared BPF programs and maps via a pinned
97 hashmap. Use "--bpf-attr-map" to specify the path of this pinned
98 hashmap. The default path is /sys/fs/bpf/perf_attr_map.
99
100 -a, --all-cpus
101 system-wide collection from all CPUs (default if no target is
102 specified)
103
104 --no-scale
105 Don’t scale/normalize counter values
106
107 -d, --detailed
108 print more detailed statistics, can be specified up to 3 times
109
110 -d: detailed events, L1 and LLC data cache
111 -d -d: more detailed events, dTLB and iTLB events
112 -d -d -d: very detailed events, adding prefetch events
113
114 -r, --repeat=<n>
115 repeat command and print average + stddev (max: 100). 0 means
116 forever.
117
118 -B, --big-num
119 print large numbers with thousands' separators according to locale.
120 Enabled by default. Use "--no-big-num" to disable. Default setting
121 can be changed with "perf config stat.big-num=false".
122
123 -C, --cpu=
124 Count only on the list of CPUs provided. Multiple CPUs can be
125 provided as a comma-separated list with no space: 0,1. Ranges of
126 CPUs are specified with -: 0-2. In per-thread mode, this option is
127 ignored. The -a option is still necessary to activate system-wide
128 monitoring. Default is to count on all CPUs.
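The list syntax accepted by -C (comma-separated CPUs, with - for ranges) can be illustrated with a small sketch. The expand_cpus helper below is hypothetical and not part of perf; it only shows how a list such as 0-2,5 reads when written out in full.

```shell
#!/bin/sh
# expand_cpus LIST: hypothetical helper (not part of perf) that expands
# a CPU list such as "0,1" or "0-2,5" into individual CPU numbers.
expand_cpus() {
    list=$1
    out=
    old_ifs=$IFS
    IFS=,                       # split the list on commas
    for item in $list; do
        case $item in
            *-*)                # a range such as 0-2
                first=${item%-*}
                last=${item#*-}
                i=$first
                while [ "$i" -le "$last" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)                  # a single CPU number
                out="$out $item"
                ;;
        esac
    done
    IFS=$old_ifs
    echo "${out# }"
}

expand_cpus 0-2,5    # prints: 0 1 2 5
```

The same syntax appears in sysfs CPU lists, so the helper can also be used to read those.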
129
130 -A, --no-aggr
131 Do not aggregate counts across all monitored CPUs.
132
133 -n, --null
134 null run - Don’t start any counters.
135
136 This can be useful to measure just elapsed wall-clock time - or to
137 assess the raw overhead of perf stat itself, without running any
138 counters.
139
140 -v, --verbose
141 be more verbose (show counter open errors, etc)
142
143 -x SEP, --field-separator SEP
144 print counts using a CSV-style output to make it easy to import
145 directly into spreadsheets. Columns are separated by the string
146 specified in SEP.
147
148 --table
149 Display time for each run (-r option), in a table format, e.g.:
150
151 $ perf stat --null -r 5 --table perf bench sched pipe
152
153 Performance counter stats for 'perf bench sched pipe' (5 runs):
154
155 # Table of individual measurements:
156 5.189 (-0.293) #
157 5.189 (-0.294) #
158 5.186 (-0.296) #
159 5.663 (+0.181) ##
160 6.186 (+0.703) ####
161
162 # Final result:
163 5.483 +- 0.198 seconds time elapsed ( +- 3.62% )
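The final line of the table can be reproduced from the five rounded measurements. The sketch below assumes the "+-" figure is the standard error of the mean (sample standard deviation divided by the square root of the number of runs), which is consistent with the numbers shown; summarize_runs is a hypothetical helper, not a perf command.

```shell
#!/bin/sh
# summarize_runs: hypothetical helper. Reads one elapsed time per line
# and prints mean, standard error of the mean, and relative error.
summarize_runs() {
    awk '{ v[NR] = $1; sum += $1 }
    END {
        mean = sum / NR
        for (i = 1; i <= NR; i++) { d = v[i] - mean; ss += d * d }
        sem = sqrt(ss / (NR - 1) / NR)   # sample stddev / sqrt(n)
        printf "%.3f +- %.3f seconds time elapsed ( +- %.2f%% )\n",
               mean, sem, 100 * sem / mean
    }'
}

# The five measurements from the table above.
printf '%s\n' 5.189 5.189 5.186 5.663 6.186 | summarize_runs
# prints: 5.483 +- 0.198 seconds time elapsed ( +- 3.62% )
```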
164
165 -G name, --cgroup name
166 monitor only in the container (cgroup) called "name". This option
167 is available only in per-cpu mode. The cgroup filesystem must be
168 mounted. All threads belonging to container "name" are monitored
169 when they run on the monitored CPUs. Multiple cgroups can be
170 provided. Each cgroup is applied to the corresponding event, i.e.,
171 first cgroup to first event, second cgroup to second event and so
172 on. It is possible to provide an empty cgroup (monitor all the
173 time) using, e.g., -G foo,,bar. Cgroups must have corresponding
174 events, i.e., they always refer to events defined earlier on the
175 command line. If the user wants to track multiple events for a
176 specific cgroup, the user can use -e e1 -e e2 -G foo,foo or just
177 use -e e1 -e e2 -G foo.
178
179 If wanting to monitor, say, cycles for a cgroup and also for system
180 wide, this command line can be used: perf stat -e cycles -G cgroup_name
181 -a -e cycles.
182
183 --for-each-cgroup name
184 Expand event list for each cgroup in "name" (allow multiple cgroups
separated by comma). It also supports regex patterns to match
multiple groups. This has the same effect as repeating the -e option
and the -G option for each event x name. This option cannot be used
with the -G/--cgroup option.
189
190 -o file, --output file
191 Print the output into the designated file.
192
193 --append
194 Append to the output file designated with the -o option. Ignored if
195 -o is not specified.
196
197 --log-fd
Log output to fd, instead of stderr. Complementary to --output, and
mutually exclusive with it. --append may be used here. Examples:

3>results perf stat --log-fd 3 -- $cmd
3>>results perf stat --log-fd 3 --append -- $cmd
202
203 --control=fifo:ctl-fifo[,ack-fifo], --control=fd:ctl-fd[,ack-fd]
204 ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as
205 follows. Listen on ctl-fd descriptor for command to control
206 measurement (enable: enable events, disable: disable events).
207 Measurements can be started with events disabled using --delay=-1
208 option. Optionally send control command completion (ack\n) to
209 ack-fd descriptor to synchronize with the controlling process.
210 Example of bash shell script to enable and disable events during
211 measurements:
212
213 #!/bin/bash
214
215 ctl_dir=/tmp/
216
217 ctl_fifo=${ctl_dir}perf_ctl.fifo
218 test -p ${ctl_fifo} && unlink ${ctl_fifo}
219 mkfifo ${ctl_fifo}
220 exec {ctl_fd}<>${ctl_fifo}
221
222 ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo
223 test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
224 mkfifo ${ctl_ack_fifo}
225 exec {ctl_fd_ack}<>${ctl_ack_fifo}
226
227 perf stat -D -1 -e cpu-cycles -a -I 1000 \
228 --control fd:${ctl_fd},${ctl_fd_ack} \
-- sleep 30 &
230 perf_pid=$!
231
232 sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
233 sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
234
235 exec {ctl_fd_ack}>&-
236 unlink ${ctl_ack_fifo}
237
238 exec {ctl_fd}>&-
239 unlink ${ctl_fifo}
240
241 wait -n ${perf_pid}
242 exit $?
243
244 --pre, --post
245 Pre and post measurement hooks, e.g.:
246
perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' \
-- make -s -j64 O=defconfig-build/ bzImage
249
250 -I msecs, --interval-print msecs
Print count deltas every N milliseconds (minimum: 1ms). The overhead
percentage could be high in some cases, for instance with small,
sub-100ms intervals. Use with caution. Example: perf stat -I 1000
-e cycles -a sleep 5
255
256 If the metric exists, it is calculated by the counts generated in this
257 interval and the metric is printed after #.
258
259 --interval-count times
Print count deltas a fixed number of times. This option should be
used together with the "-I" option. Example: perf stat -I 1000
--interval-count 2 -e cycles -a
263
264 --interval-clear
265 Clear the screen before next interval.
266
267 --timeout msecs
Stop the perf stat session and print count deltas after N
milliseconds (minimum: 10 ms). This option is not supported with
the "-I" option. Example: perf stat --timeout 2000 -e cycles -a
271
272 --metric-only
273 Only print computed metrics. Print them in a single line. Don’t
274 show any raw values. Not supported with --per-thread.
275
276 --per-socket
277 Aggregate counts per processor socket for system-wide mode
278 measurements. This is a useful mode to detect imbalance between
sockets. To enable this mode, use --per-socket in addition to -a
(system-wide). The output includes the socket number and the number
281 of online processors on that socket. This is useful to gauge the
282 amount of aggregation.
283
284 --per-die
285 Aggregate counts per processor die for system-wide mode
286 measurements. This is a useful mode to detect imbalance between
dies. To enable this mode, use --per-die in addition to -a
(system-wide). The output includes the die number and the number of
289 online processors on that die. This is useful to gauge the amount
290 of aggregation.
291
292 --per-core
293 Aggregate counts per physical processor for system-wide mode
294 measurements. This is a useful mode to detect imbalance between
physical cores. To enable this mode, use --per-core in addition to
-a (system-wide). The output includes the core number and the
297 number of online logical processors on that physical processor.
298
299 --per-thread
300 Aggregate counts per monitored threads, when monitoring threads (-t
301 option) or processes (-p option).
302
303 --per-node
304 Aggregate counts per NUMA nodes for system-wide mode measurements.
This is a useful mode to detect imbalance between NUMA nodes. To
enable this mode, use --per-node in addition to -a (system-wide).
307
308 -D msecs, --delay msecs
309 After starting the program, wait msecs before measuring (-1: start
310 with events disabled). This is useful to filter out the startup
311 phase of the program, which is often very different.
312
313 -T, --transaction
314 Print statistics of transactional execution if supported.
315
316 --metric-no-group
317 By default, events to compute a metric are placed in weak groups.
318 The group tries to enforce scheduling all or none of the events.
The --metric-no-group option places events outside of groups and
may increase the chance of the event being scheduled - leading to
more accuracy. However, as events may not be scheduled together,
accuracy for metrics like instructions per cycle can be lower - as
both events may no longer be measured at the same time.
324
325 --metric-no-merge
326 By default metric events in different weak groups can be shared if
327 one group contains all the events needed by another. In such cases
328 one group will be eliminated reducing event multiplexing and making
329 it so that certain groups of metrics sum to 100%. A downside to
330 sharing a group is that the group may require multiplexing and so
331 accuracy for a small group that need not have multiplexing is
332 lowered. This option forbids the event merging logic from sharing
333 events between groups and may be used to increase accuracy in this
334 case.
335
336 --quiet
337 Don’t print output. This is useful with perf stat record below to
338 only write data to the perf.data file.
339
STAT RECORD
341 Stores stat data into perf data file.
342
343 -o file, --output file
344 Output file name.
345
STAT REPORT
347 Reads and reports stat data from perf data file.
348
349 -i file, --input file
350 Input file name.
351
352 --per-socket
353 Aggregate counts per processor socket for system-wide mode
354 measurements.
355
356 --per-die
357 Aggregate counts per processor die for system-wide mode
358 measurements.
359
360 --per-core
361 Aggregate counts per physical processor for system-wide mode
362 measurements.
363
364 -M, --metrics
365 Print metrics or metricgroups specified in a comma separated list.
366 For a group all metrics from the group are added. The events from
367 the metrics are automatically measured. See perf list output for
368 the possible metrics and metricgroups.
369
370 -A, --no-aggr
371 Do not aggregate counts across all monitored CPUs.
372
373 --topdown
Print complete top-down metrics supported by the CPU. This allows
determining bottlenecks in the CPU pipeline for CPU bound
workloads, by breaking the cycles consumed down into frontend
bound, backend bound, bad speculation and retiring.
378
Frontend bound means that the CPU cannot fetch and decode instructions
fast enough. Backend bound means that computation or memory access is
the bottleneck. Bad Speculation means that the CPU wasted cycles due
to branch mispredictions and similar issues. Retiring means that the
CPU computed without an apparent bottleneck. The bottleneck is only
the real bottleneck if the workload is actually bound by the CPU and
not by something else.
386
387 For best results it is usually a good idea to use it with interval mode
388 like -I 1000, as the bottleneck of workloads can change often.
389
390 This enables --metric-only, unless overridden with --no-metric-only.
391
The following restrictions only apply to older Intel CPUs and Atom; on
newer CPUs (IceLake and later) TopDown can be collected for any thread:

The top down metrics are collected per core instead of per CPU thread.
Per core mode is automatically enabled and -a (global monitoring) is
needed, requiring root rights or kernel.perf_event_paranoid=-1.
398
Topdown uses the full Performance Monitoring Unit, and needs disabling
of the NMI watchdog (as root): echo 0 > /proc/sys/kernel/nmi_watchdog
for best results. Otherwise the bottlenecks may be inconsistent on
workloads with changing phases.

To interpret the results it is usually necessary to know which CPUs the
workload runs on. If needed, the CPUs can be forced using taskset.
406
407 --td-level
Print the top-down statistics that are equal to or lower than the input
level. This lets users print only the top-down metrics levels of
interest instead of the complete top-down metrics.

The availability of the top-down metrics level depends on the hardware.
For example, Ice Lake only supports L1 top-down metrics. Sapphire
Rapids supports both L1 and L2 top-down metrics.

Default: 0 means the max level that the current hardware supports. Error
out if the input is higher than the supported max level.
418
419 --no-merge
420 Do not merge results from same PMUs.
421
422 When multiple events are created from a single event specification,
423 stat will, by default, aggregate the event counts and show the result
424 in a single row. This option disables that behavior and shows the
425 individual events and counts.
426
427 Multiple events are created from a single event specification when: 1.
428 Prefix or glob matching is used for the PMU name. 2. Aliases, which are
429 listed immediately after the Kernel PMU events by perf list, are used.
430
431 --smi-cost
432 Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.
433
During the measurement, /sys/devices/cpu/freeze_on_smi will be set to
freeze core counters on SMI. The aperf counter will not be affected by
the setting. The cost of SMI can be measured by (aperf - unhalted core
cycles).

In practice, the percentage of SMI cycles is very useful for
performance-oriented analysis. --metric-only will be applied by
default. The output is SMI cycles%, which equals (aperf - unhalted core
cycles) / aperf.

Users who want to get the actual value can apply --no-metric-only.
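The SMI cycles% formula can be applied by hand to raw counter values obtained with --no-metric-only. The smi_cycles_pct helper below is a hypothetical sketch, and the two counter values in the example call are invented for illustration.

```shell
#!/bin/sh
# smi_cycles_pct APERF UNHALTED_CORE_CYCLES
# Hypothetical helper: prints (aperf - unhalted core cycles) / aperf
# as a percentage.
smi_cycles_pct() {
    awk -v aperf="$1" -v unhalted="$2" \
        'BEGIN { printf "%.1f%%\n", 100 * (aperf - unhalted) / aperf }'
}

# Invented counter values, for illustration only.
smi_cycles_pct 2000000 1990000    # prints: 0.5%
```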
445
446 --all-kernel
447 Configure all used events to run in kernel space.
448
449 --all-user
450 Configure all used events to run in user space.
451
452 --percore-show-thread
The event modifier "percore" sums up the event counts for all
hardware threads in a core and shows the counts per core.

With the "percore" event modifier enabled, this option also sums up the
event counts for all hardware threads in a core but shows the summed
counts per hardware thread. This is essentially a replacement for the
any bit and is convenient for post processing.
461
462 --summary
463 Print summary for interval mode (-I).
464
465 --no-csv-summary
Don’t print summary at the first column for CSV summary output.
This option must be used with -x and --summary.
468
469 This option can be enabled in perf config by setting the variable
470 stat.no-csv-summary.
471
472 $ perf config stat.no-csv-summary=true
473
475 $ perf stat -- make
476
477 Performance counter stats for 'make':
478
479 83723.452481 task-clock:u (msec) # 1.004 CPUs utilized
480 0 context-switches:u # 0.000 K/sec
481 0 cpu-migrations:u # 0.000 K/sec
482 3,228,188 page-faults:u # 0.039 M/sec
483 229,570,665,834 cycles:u # 2.742 GHz
484 313,163,853,778 instructions:u # 1.36 insn per cycle
485 69,704,684,856 branches:u # 832.559 M/sec
486 2,078,861,393 branch-misses:u # 2.98% of all branches
487
488 83.409183620 seconds time elapsed
489
490 74.684747000 seconds user
491 8.739217000 seconds sys
492
494 As displayed in the example above we can display 3 types of timings. We
495 always display the time the counters were enabled/alive:
496
497 83.409183620 seconds time elapsed
498
499 For workload sessions we also display time the workloads spent in
500 user/system lands:
501
502 74.684747000 seconds user
503 8.739217000 seconds sys
504
505 Those times are the very same as displayed by the time tool.
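The figures after the # in the example output can be recomputed from the raw counts and timings. The sketch below uses only numbers from that example; the formulas (task-clock over elapsed time, cycles per nanosecond of task-clock, and so on) are the usual interpretation of these columns and are stated here as an assumption. derived_metrics is a hypothetical helper name.

```shell
#!/bin/sh
# derived_metrics: hypothetical helper that recomputes the derived
# column of the 'make' example from its raw values.
derived_metrics() {
    awk 'BEGIN {
        task_clock_ms = 83723.452481
        elapsed_s     = 83.409183620
        cycles        = 229570665834
        instructions  = 313163853778
        branches      = 69704684856
        branch_misses = 2078861393

        printf "%.3f CPUs utilized\n", task_clock_ms / 1000 / elapsed_s
        # cycles per nanosecond of task-clock is the frequency in GHz
        printf "%.3f GHz\n", cycles / (task_clock_ms * 1000000)
        printf "%.2f insn per cycle\n", instructions / cycles
        printf "%.2f%% of all branches\n", 100 * branch_misses / branches
    }'
}

derived_metrics
# prints:
# 1.004 CPUs utilized
# 2.742 GHz
# 1.36 insn per cycle
# 2.98% of all branches
```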
506
With -x, perf stat is able to output a not-quite-CSV format. Commas in
the output are not put into "". To make it easy to parse, it is
recommended to use a different separator character, like -x \;
511
512 The fields are in this order:
513
514 • optional usec time stamp in fractions of second (with -I xxx)
515
516 • optional CPU, core, or socket identifier
517
518 • optional number of logical CPUs aggregated
519
520 • counter value
521
522 • unit of the counter value or empty
523
524 • event name
525
526 • run time of counter
527
528 • percentage of measurement time the counter was running
529
530 • optional variance if multiple values are collected with -r
531
532 • optional metric value
533
534 • optional unit of metric
535
536 Additional metrics may be printed with all earlier fields being empty.
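The field order above can be read with a plain shell read when the separator is ";". This is a minimal sketch that assumes none of the optional leading fields are present (no -I, no per-CPU or per-socket identifier, no -r variance). parse_stat_line is a hypothetical helper, and the sample line is constructed for illustration rather than captured from a real run.

```shell
#!/bin/sh
# parse_stat_line LINE: hypothetical helper that splits one line of
# "perf stat -x \;" output into named fields. Assumes the optional
# timestamp, CPU identifier, CPU count and variance fields are absent.
parse_stat_line() {
    IFS=';' read -r value unit event run_time pct_running metric metric_unit <<EOF
$1
EOF
    printf 'event=%s value=%s running=%s%% metric=%s %s\n' \
        "$event" "$value" "$pct_running" "$metric" "$metric_unit"
}

# Illustrative line only; the unit field is empty for this event.
parse_stat_line '229570665834;;cycles:u;83723452481;100.00;2.742;GHz'
# prints: event=cycles:u value=229570665834 running=100.00% metric=2.742 GHz
```

A script that also uses -I, -A or -r would need to account for the extra fields before or after these.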
537
539 Support for Intel hybrid events within perf tools.
540
Some Intel platforms, such as AlderLake, are hybrid platforms that
combine atom cpus and core cpus. Each cpu type has a dedicated event
list. Some events are available only on core cpus, some only on atom
cpus, and some on both.
545
The kernel exports two new cpu pmus via sysfs:

/sys/devices/cpu_core
/sys/devices/cpu_atom

The cpus files are created under these directories. For example:

cat /sys/devices/cpu_core/cpus
0-15

cat /sys/devices/cpu_atom/cpus
16-23
554
555 It indicates cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.
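A script can check for these two sysfs directories to find out which CPUs belong to which pmu. The list_hybrid_cpus helper below is a hypothetical sketch; its optional argument (an alternative root directory) exists only so that the function can be tried on a machine without hybrid CPUs.

```shell
#!/bin/sh
# list_hybrid_cpus [SYSROOT]: hypothetical helper that prints the CPU
# list of each hybrid pmu found under SYSROOT (default /sys/devices).
# It prints nothing on a non-hybrid system.
list_hybrid_cpus() {
    sysroot=${1:-/sys/devices}
    for pmu in cpu_core cpu_atom; do
        if [ -r "$sysroot/$pmu/cpus" ]; then
            printf '%s: %s\n' "$pmu" "$(cat "$sysroot/$pmu/cpus")"
        fi
    done
}

list_hybrid_cpus
```

On the system described above this would be expected to print "cpu_core: 0-15" and "cpu_atom: 16-23".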
556
557 Quickstart
558
As before, use perf-list to list the symbolic events.

perf list

inst_retired.any
[Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
inst_retired.any
[Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]

The "Unit: xxx" tag is added to the brief description to indicate which
pmu the event belongs to. The same event name can be supported on
different pmus.
571
To enable a core-only or atom-only event, the following syntax is
supported:
575
576 cpu_core/<event name>/
577 or
578 cpu_atom/<event name>/
579
For example, to count the cycles event on core cpus:
581
582 perf stat -e cpu_core/cycles/
583
When an event is created that is available on both atom and core, two
events are created automatically: one for atom and one for core. Most
hardware events and cache events are available on both cpu_core and
cpu_atom.
589
Hardware events have pre-defined configs (e.g. 0 for cycles). On a
hybrid platform, however, the kernel needs to know where the event comes
from (atom or core). The original perf event type PERF_TYPE_HARDWARE
can’t carry pmu information, so this type is extended to be PMU aware.
The PMU type ID is stored in attr.config[63:32].

The PMU type ID is retrieved from sysfs:

/sys/devices/cpu_atom/type
/sys/devices/cpu_core/type
599
The new attr.config layout for PERF_TYPE_HARDWARE:

PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA
AA: hardware event ID
EEEEEEEE: PMU type ID

Cache events are similar. The type PERF_TYPE_HW_CACHE is extended to be
PMU aware. The PMU type ID is stored in attr.config[63:32].

The new attr.config layout for PERF_TYPE_HW_CACHE:

PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB
BB: hardware cache ID
CC: hardware cache op ID
DD: hardware cache op result ID
EEEEEEEE: PMU type ID
613
When a hardware event is enabled without a specified pmu, for example
perf stat -e cycles -a (system-wide in this example), two events are
created automatically.
617
618 ------------------------------------------------------------
619 perf_event_attr:
620 size 120
621 config 0x400000000
622 sample_type IDENTIFIER
623 read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
624 disabled 1
625 inherit 1
626 exclude_guest 1
627 ------------------------------------------------------------
628
629 and
630
631 ------------------------------------------------------------
632 perf_event_attr:
633 size 120
634 config 0x800000000
635 sample_type IDENTIFIER
636 read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
637 disabled 1
638 inherit 1
639 exclude_guest 1
640 ------------------------------------------------------------
641
642 type 0 is PERF_TYPE_HARDWARE. 0x4 in 0x400000000 indicates it’s
643 cpu_core pmu. 0x8 in 0x800000000 indicates it’s cpu_atom pmu (atom pmu
644 type id is random).
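The attr.config layouts described in this section can be checked with shell arithmetic. The helpers below are hypothetical (none of them is a perf command) and assume a shell with 64-bit arithmetic. The PMU type IDs 4 and 8 are the ones from the example above; on a real system they would be read from the sysfs type files.

```shell
#!/bin/sh
# hw_config PMU_TYPE_ID HW_EVENT_ID
# Builds a PERF_TYPE_HARDWARE config: 0xEEEEEEEE000000AA
hw_config() {
    printf '0x%x\n' $(( ($1 << 32) | ($2 & 0xff) ))
}

# hw_cache_config PMU_TYPE_ID CACHE_ID OP_ID RESULT_ID
# Builds a PERF_TYPE_HW_CACHE config: 0xEEEEEEEE00DDCCBB
hw_cache_config() {
    printf '0x%x\n' $(( ($1 << 32) | ($4 << 16) | ($3 << 8) | $2 ))
}

# pmu_type_of CONFIG
# Extracts the PMU type ID stored in config[63:32].
pmu_type_of() {
    printf '%d\n' $(( $1 >> 32 ))
}

hw_config 4 0              # prints: 0x400000000 (cycles on cpu_core)
hw_config 8 0              # prints: 0x800000000 (cycles on cpu_atom)
pmu_type_of 0x800000000    # prints: 8
```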
645
The kernel creates cycles (0x400000000) on cpu0-cpu15 (core cpus), and
creates cycles (0x800000000) on cpu16-cpu23 (atom cpus).
648
The perf-stat result displays two events:
650
651 Performance counter stats for 'system wide':
652
653 6,744,979 cpu_core/cycles/
654 1,965,552 cpu_atom/cycles/
655
The first cycles event is the core event, the second is the atom event.
657
perf-stat reports the scaled counts for hybrid events, with a
percentage displayed. The percentage is the event’s running time
divided by its enabled time.

In the following example, triad_loop runs on cpu16 (an atom core), and
the scaled value for core cycles is 233,066,666 with a percentage of
0.43%.

perf stat -e cycles -- taskset -c 16 ./triad_loop

As before, two events are created.
670
671
673 perf_event_attr:
674 size 120
675 config 0x400000000
676 sample_type IDENTIFIER
677 read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
678 disabled 1
679 inherit 1
680 enable_on_exec 1
681 exclude_guest 1
683
684
685 and
686
687
689 perf_event_attr:
690 size 120
691 config 0x800000000
692 sample_type IDENTIFIER
693 read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
694 disabled 1
695 inherit 1
696 enable_on_exec 1
697 exclude_guest 1
699
700
701 Performance counter stats for 'taskset -c 16 ./triad_loop':
702
703 233,066,666 cpu_core/cycles/ (0.43%)
704 604,097,080 cpu_atom/cycles/ (99.57%)
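The relationship between a raw count, the scaled count and the displayed percentage can be sketched as follows. It assumes the usual scaling rule, scaled = raw * enabled time / running time, with the percentage being running time / enabled time. scale_count is a hypothetical helper and the input values are invented for illustration.

```shell
#!/bin/sh
# scale_count RAW TIME_ENABLED TIME_RUNNING
# Hypothetical helper: prints the scaled count and the percentage of
# the enabled time during which the event was actually running.
scale_count() {
    awk -v raw="$1" -v ena="$2" -v run="$3" 'BEGIN {
        if (run == 0) {            # event never ran, nothing to scale
            print "not counted"
            exit
        }
        printf "%.0f (%.2f%%)\n", raw * ena / run, 100 * run / ena
    }'
}

# Invented values: the event ran for a quarter of the enabled time.
scale_count 1000 200000 50000    # prints: 4000 (25.00%)
```

A very small percentage, as for cpu_core/cycles/ above, means the scaled value is extrapolated from a short running time and should be read with care.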
705
If no -e is specified in perf record, on a hybrid platform it creates
two default cycles events and adds them to the event list. One is for
core, the other is for atom.
710
If no -e is specified in perf stat, on a hybrid platform, besides the
software events, the following events are created and added to the
event list in order.
715
716 cpu_core/cycles/, cpu_atom/cycles/, cpu_core/instructions/,
717 cpu_atom/instructions/, cpu_core/branches/, cpu_atom/branches/,
718 cpu_core/branch-misses/, cpu_atom/branch-misses/
719
Both perf-stat and perf-record also support enabling a hybrid event on
a specific pmu, e.g.:

perf stat -e cpu_core/cycles/
perf stat -e cpu_atom/cycles/
perf stat -e cpu_core/r1a/
perf stat -e cpu_atom/L1-icache-loads/
perf stat -e cpu_core/cycles/,cpu_atom/instructions/
perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'

But '{cpu_core/cycles/,cpu_atom/instructions/}' produces a warning and
disables grouping, because the pmus in the group do not match (cpu_core
vs. cpu_atom).
731
733 perf-top(1), perf-list(1)
734
735
736
perf 11/22/2021 PERF-STAT(1)