PERF-STAT(1)                      perf Manual                     PERF-STAT(1)


NAME

       perf-stat - Run a command and gather performance counter statistics

SYNOPSIS

       perf stat [-e <EVENT> | --event=EVENT] [-a] <command>
       perf stat [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
       perf stat [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>]
       perf stat report [-i file]

DESCRIPTION

       This command runs a command and gathers performance counter statistics
       from it.

OPTIONS

       <command>...
           Any command you can specify in a shell.

       record
           See STAT RECORD.

       report
           See STAT REPORT.
       -e, --event=
           Select the PMU event. Selection can be:

           •   a symbolic event name (use perf list to list all events)

           •   a raw PMU event in the form of rN where N is a hexadecimal
               value that represents the raw register encoding with the layout
               of the event control registers as described by entries in
               /sys/bus/event_source/devices/cpu/format/*.

           •   a symbolic or raw PMU event followed by an optional colon and a
               list of event modifiers, e.g., cpu-cycles:p. See the
               perf-list(1) man page for details on event modifiers.

           •   a symbolically formed event like pmu/param1=0x3,param2/ where
               param1 and param2 are defined as formats for the PMU in
               /sys/bus/event_source/devices/<pmu>/format/*

                   'percore' is an event qualifier that sums up the event counts for both
                   hardware threads in a core. For example:
                   perf stat -A -a -e cpu/event,percore=1/,otherevent ...

           •   a symbolically formed event like
               pmu/config=M,config1=N,config2=K/ where M, N, K are numbers (in
               decimal, hex or octal format). Acceptable values for the
               config, config1 and config2 parameters are defined by the
               corresponding entries in
               /sys/bus/event_source/devices/<pmu>/format/*

                   Note that the last two syntaxes support prefix and glob matching in
                   the PMU name to simplify creation of events across multiple instances
                   of the same type of PMU in large systems (e.g. memory controller PMUs).
                   Multiple PMU instances are typical for uncore PMUs, so the prefix
                   'uncore_' is also ignored when performing this match.

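           For example (the raw encoding r01ad and the config value 0x3c
           below are illustrative placeholders; valid values depend on the
           CPU and on the format entries described above):

               perf stat -e cpu-cycles:u -- sleep 1
               perf stat -e r01ad -e cpu/config=0x3c/ -a -- sleep 1
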
       -i, --no-inherit
           child tasks do not inherit counters

       -p, --pid=<pid>
           stat events on existing process id (comma separated list)

       -t, --tid=<tid>
           stat events on existing thread id (comma separated list)

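           For example (1234 and 5678 are placeholder process ids), to count
           cycles in two existing processes over a 10-second window:

               perf stat -e cycles -p 1234,5678 -- sleep 10
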
       -b, --bpf-prog
           stat events on existing bpf program id (comma separated list),
           requires root rights. bpftool prog can be used to find the program
           ids of all bpf programs in the system. For example:

               # bpftool prog | head -n 1
               17247: tracepoint  name sys_enter  tag 192d548b9d754067  gpl

               # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000

               Performance counter stats for 'BPF program(s) 17247':

               85,967      cycles
               28,982      instructions              #    0.34  insn per cycle

               1.102235068 seconds time elapsed

       --bpf-counters
           Use BPF programs to aggregate readings from perf_events. This
           allows multiple perf-stat sessions that are counting the same
           metric (cycles, instructions, etc.) to share hardware counters. To
           use BPF programs on common events by default, use "perf config
           stat.bpf-counter-events=<list_of_events>".

       --bpf-attr-map
           With option "--bpf-counters", different perf-stat sessions share
           information about shared BPF programs and maps via a pinned
           hashmap. Use "--bpf-attr-map" to specify the path of this pinned
           hashmap. The default path is /sys/fs/bpf/perf_attr_map.

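           For example, two concurrent sessions can then share the hardware
           counter for cycles (a sketch; start each command in a separate
           shell):

               perf stat --bpf-counters -e cycles -a -- sleep 10
               perf stat --bpf-counters -e cycles -a -- sleep 10
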
       -a, --all-cpus
           system-wide collection from all CPUs (default if no target is
           specified)

       --no-scale
           Don’t scale/normalize counter values

       -d, --detailed
           print more detailed statistics, can be specified up to 3 times

                  -d:        detailed events, L1 and LLC data cache
               -d -d:        more detailed events, dTLB and iTLB events
            -d -d -d:        very detailed events, adding prefetch events

       -r, --repeat=<n>
           repeat command and print average + stddev (max: 100). 0 means
           forever.

       -B, --big-num
           print large numbers with thousands' separators according to locale.
           Enabled by default. Use "--no-big-num" to disable. Default setting
           can be changed with "perf config stat.big-num=false".

       -C, --cpu=
           Count only on the list of CPUs provided. Multiple CPUs can be
           provided as a comma-separated list with no space: 0,1. Ranges of
           CPUs are specified with -: 0-2. In per-thread mode, this option is
           ignored. The -a option is still necessary to activate system-wide
           monitoring. Default is to count on all CPUs.

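           For example, to count cycles only on CPUs 0-2 while monitoring
           system-wide:

               perf stat -a -C 0-2 -e cycles -- sleep 5
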
       -A, --no-aggr
           Do not aggregate counts across all monitored CPUs.

       -n, --null
           null run - Don’t start any counters.

       This can be useful to measure just elapsed wall-clock time - or to
       assess the raw overhead of perf stat itself, without running any
       counters.

       -v, --verbose
           be more verbose (show counter open errors, etc)

       -x SEP, --field-separator SEP
           print counts using a CSV-style output to make it easy to import
           directly into spreadsheets. Columns are separated by the string
           specified in SEP.

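           For example, using ';' as the separator avoids clashes with the
           locale's thousands separator in counter values:

               perf stat -x ';' -e cycles,instructions -- sleep 1
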
       --table
           Display time for each run (-r option), in a table format, e.g.:

               $ perf stat --null -r 5 --table perf bench sched pipe

               Performance counter stats for 'perf bench sched pipe' (5 runs):

               # Table of individual measurements:
               5.189 (-0.293) #
               5.189 (-0.294) #
               5.186 (-0.296) #
               5.663 (+0.181) ##
               6.186 (+0.703) ####

               # Final result:
               5.483 +- 0.198 seconds time elapsed  ( +-  3.62% )

       -G name, --cgroup name
           monitor only in the container (cgroup) called "name". This option
           is available only in per-cpu mode. The cgroup filesystem must be
           mounted. All threads belonging to container "name" are monitored
           when they run on the monitored CPUs. Multiple cgroups can be
           provided. Each cgroup is applied to the corresponding event, i.e.,
           first cgroup to first event, second cgroup to second event and so
           on. It is possible to provide an empty cgroup (monitor all the
           time) using, e.g., -G foo,,bar. Cgroups must have corresponding
           events, i.e., they always refer to events defined earlier on the
           command line. If the user wants to track multiple events for a
           specific cgroup, the user can use -e e1 -e e2 -G foo,foo or just
           use -e e1 -e e2 -G foo.

       To monitor, say, cycles both for a cgroup and system-wide, this
       command line can be used: perf stat -e cycles -G cgroup_name -a -e
       cycles.

       --for-each-cgroup name
           Expand the event list for each cgroup in "name" (allow multiple
           cgroups separated by comma). It also supports regex patterns to
           match multiple cgroups. This has the same effect as repeating the
           -e option and -G option for each event x name. This option cannot
           be used with the -G/--cgroup option.

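           For example (foo and bar are placeholder cgroup names), the
           following is equivalent to -e cycles -e cycles -G foo,bar:

               perf stat -e cycles --for-each-cgroup foo,bar -a -- sleep 5
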
       -o file, --output file
           Print the output into the designated file.

       --append
           Append to the output file designated with the -o option. Ignored if
           -o is not specified.

       --log-fd
           Log output to fd, instead of stderr. Complementary to --output, and
           mutually exclusive with it. --append may be used here. Examples:

               3>results  perf stat --log-fd 3          -- $cmd
               3>>results perf stat --log-fd 3 --append -- $cmd

       --control=fifo:ctl-fifo[,ack-fifo], --control=fd:ctl-fd[,ack-fd]
           ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as
           follows. Listen on the ctl-fd descriptor for commands to control
           measurement (enable: enable events, disable: disable events).
           Measurements can be started with events disabled using the
           --delay=-1 option. Optionally send control command completion
           (ack\n) to the ack-fd descriptor to synchronize with the
           controlling process. Example of a bash shell script to enable and
           disable events during measurements:

               #!/bin/bash

               ctl_dir=/tmp/

               ctl_fifo=${ctl_dir}perf_ctl.fifo
               test -p ${ctl_fifo} && unlink ${ctl_fifo}
               mkfifo ${ctl_fifo}
               exec {ctl_fd}<>${ctl_fifo}

               ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo
               test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
               mkfifo ${ctl_ack_fifo}
               exec {ctl_fd_ack}<>${ctl_ack_fifo}

               perf stat -D -1 -e cpu-cycles -a -I 1000       \
                         --control fd:${ctl_fd},${ctl_fd_ack} \
                         -- sleep 30 &
               perf_pid=$!

               sleep 5  && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
               sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"

               exec {ctl_fd_ack}>&-
               unlink ${ctl_ack_fifo}

               exec {ctl_fd}>&-
               unlink ${ctl_fifo}

               wait -n ${perf_pid}
               exit $?

       --pre, --post
           Pre and post measurement hooks, e.g.:

               perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defconfig-build/ bzImage

       -I msecs, --interval-print msecs
           Print count deltas every N milliseconds (minimum: 1ms). The
           overhead percentage could be high in some cases, for instance with
           small, sub-100ms intervals. Use with caution. Example:

               perf stat -I 1000 -e cycles -a sleep 5

       If the metric exists, it is calculated by the counts generated in this
       interval and the metric is printed after #.

       --interval-count times
           Print count deltas for a fixed number of times. This option should
           be used together with the "-I" option. Example:

               perf stat -I 1000 --interval-count 2 -e cycles -a

       --interval-clear
           Clear the screen before next interval.

       --timeout msecs
           Stop the perf stat session and print count deltas after N
           milliseconds (minimum: 10 ms). This option is not supported with
           the "-I" option. Example:

               perf stat --timeout 2000 -e cycles -a

       --metric-only
           Only print computed metrics. Print them in a single line. Don’t
           show any raw values. Not supported with --per-thread.

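           For example:

               perf stat --metric-only -a -- sleep 1
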
       --per-socket
           Aggregate counts per processor socket for system-wide mode
           measurements. This is a useful mode to detect imbalance between
           sockets. To enable this mode, use --per-socket in addition to -a
           (system-wide). The output includes the socket number and the number
           of online processors on that socket. This is useful to gauge the
           amount of aggregation.

       --per-die
           Aggregate counts per processor die for system-wide mode
           measurements. This is a useful mode to detect imbalance between
           dies. To enable this mode, use --per-die in addition to -a
           (system-wide). The output includes the die number and the number of
           online processors on that die. This is useful to gauge the amount
           of aggregation.

       --per-cache
           Aggregate counts per cache instance for system-wide mode
           measurements. By default, the aggregation happens for the cache
           level at the highest index in the system. To specify a particular
           level, mention the cache level alongside the option in the format
           [Ll][1-9][0-9]*. For example: Using option "--per-cache=l3" or
           "--per-cache=L3" will aggregate the information at the boundary of
           the level 3 cache in the system.

       --per-core
           Aggregate counts per physical processor for system-wide mode
           measurements. This is a useful mode to detect imbalance between
           physical cores. To enable this mode, use --per-core in addition to
           -a (system-wide). The output includes the core number and the
           number of online logical processors on that physical processor.

       --per-thread
           Aggregate counts per monitored thread, when monitoring threads (-t
           option) or processes (-p option).

       --per-node
           Aggregate counts per NUMA node for system-wide mode measurements.
           This is a useful mode to detect imbalance between NUMA nodes. To
           enable this mode, use --per-node in addition to -a (system-wide).

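       For example, to aggregate system-wide cycle counts per NUMA node:

           perf stat -a --per-node -e cycles -- sleep 5
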
       -D msecs, --delay msecs
           After starting the program, wait msecs before measuring (-1: start
           with events disabled). This is useful to filter out the startup
           phase of the program, which is often very different.

       -T, --transaction
           Print statistics of transactional execution if supported.

       --metric-no-group
           By default, events to compute a metric are placed in weak groups.
           The group tries to enforce scheduling all or none of the events.
           The --metric-no-group option places events outside of groups and
           may increase the chance of the event being scheduled - leading to
           more accuracy. However, as events may not be scheduled together,
           accuracy for metrics like instructions per cycle can be lower - as
           the constituent events may no longer be measured at the same time.

       --metric-no-merge
           By default, metric events in different weak groups can be shared if
           one group contains all the events needed by another. In such cases
           one group will be eliminated, reducing event multiplexing and
           making it so that certain groups of metrics sum to 100%. A downside
           to sharing a group is that the group may require multiplexing, so
           accuracy for a small group that would not otherwise need
           multiplexing is lowered. This option forbids the event merging
           logic from sharing events between groups and may be used to
           increase accuracy in this case.

       --metric-no-threshold
           Metric thresholds may increase the number of events necessary to
           compute whether a metric has exceeded its threshold expression.
           This may not be desirable, for example, as the events can introduce
           multiplexing. This option disables the adding of threshold
           expression events for a metric. However, if there are sufficient
           events to compute the threshold then the threshold is still
           computed and used to color the metric’s computed value.

       --quiet
           Don’t print output, warnings or messages. This is useful with perf
           stat record below to only write data to the perf.data file.


STAT RECORD

       Stores stat data into a perf data file.

       -o file, --output file
           Output file name.


STAT REPORT

       Reads and reports stat data from a perf data file.

       -i file, --input file
           Input file name.

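       For example, counts can be recorded into a file and reported later
       (stat.data is an arbitrary file name):

           perf stat record -o stat.data -- sleep 1
           perf stat report -i stat.data
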
       --per-socket
           Aggregate counts per processor socket for system-wide mode
           measurements.

       --per-die
           Aggregate counts per processor die for system-wide mode
           measurements.

       --per-cache
           Aggregate counts per cache instance for system-wide mode
           measurements. By default, the aggregation happens for the cache
           level at the highest index in the system. To specify a particular
           level, mention the cache level alongside the option in the format
           [Ll][1-9][0-9]*. For example: Using option "--per-cache=l3" or
           "--per-cache=L3" will aggregate the information at the boundary of
           the level 3 cache in the system.

       --per-core
           Aggregate counts per physical processor for system-wide mode
           measurements.

       -M, --metrics
           Print metrics or metricgroups specified in a comma separated list.
           For a group all metrics from the group are added. The events from
           the metrics are automatically measured. See perf list output for
           the possible metrics and metricgroups.

               When threshold information is available for a metric, the
               color red is used to signify a metric has exceeded a threshold
               while green shows it hasn't. The default color means that
               no threshold information was available or the threshold
               couldn't be computed.

       -A, --no-aggr
           Do not aggregate counts across all monitored CPUs.

       --topdown
           Print top-down metrics supported by the CPU. This allows
           determining bottlenecks in the CPU pipeline for CPU-bound
           workloads, by breaking down the cycles consumed into frontend
           bound, backend bound, bad speculation and retiring.

       Frontend bound means that the CPU cannot fetch and decode instructions
       fast enough. Backend bound means that computation or memory access is
       the bottleneck. Bad speculation means that the CPU wasted cycles due
       to branch mispredictions and similar issues. Retiring means that the
       CPU computed without an apparent bottleneck. The bottleneck is only
       the real bottleneck if the workload is actually bound by the CPU and
       not by something else.

       For best results it is usually a good idea to use it with interval
       mode like -I 1000, as the bottleneck of workloads can change often.

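       For example, to print the top-down breakdown once per second over a
       system-wide measurement:

           perf stat --topdown -a -I 1000 -- sleep 10
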
       This enables --metric-only, unless overridden with --no-metric-only.

       The following restrictions only apply to older Intel CPUs and Atom; on
       newer CPUs (IceLake and later) TopDown can be collected for any thread:

       The top-down metrics are collected per core instead of per CPU thread.
       Per-core mode is automatically enabled and -a (global monitoring) is
       needed, requiring root rights or kernel.perf_event_paranoid=-1.

       TopDown uses the full Performance Monitoring Unit and, for best
       results, needs the NMI watchdog disabled (as root):

           echo 0 > /proc/sys/kernel/nmi_watchdog

       Otherwise the bottlenecks may be inconsistent on workloads with
       changing phases.

       To interpret the results, it is usually necessary to know which CPUs
       the workload runs on. If needed, the CPUs can be forced using taskset.

       --td-level
           Print the top-down statistics that equal the input level. It allows
           users to print the top-down metrics at the level of interest
           instead of the level 1 top-down metrics.

       As the higher levels gather more metrics and use more counters, they
       will be less accurate. By convention a metric can be examined by
       appending _group to it and this will increase accuracy compared to
       gathering all metrics for a level. For example, level 1 analysis may
       highlight tma_frontend_bound. This metric may be drilled into with
       tma_frontend_bound_group, as in perf stat -M tma_frontend_bound_group...

       perf stat errors out if the input is higher than the supported max
       level.

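       For example, to print only the level 2 top-down metrics system-wide
       (assuming the CPU supports that level):

           perf stat --td-level 2 -a -- sleep 5
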
       --no-merge
           Do not merge results from the same PMUs.

       When multiple events are created from a single event specification,
       stat will, by default, aggregate the event counts and show the result
       in a single row. This option disables that behavior and shows the
       individual events and counts.

       Multiple events are created from a single event specification when:

       1. Prefix or glob matching is used for the PMU name.

       2. Aliases, which are listed immediately after the Kernel PMU events
          by perf list, are used.

       --hybrid-merge
           Merge the hybrid event counts from all PMUs.

       For hybrid events, by default, perf stat aggregates and reports the
       event counts per PMU. But sometimes it is also useful to aggregate
       event counts from all PMUs. This option enables that behavior and
       reports the counts without PMUs.

       For non-hybrid events, this option has no effect.

       --smi-cost
           Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.

       During the measurement, /sys/devices/cpu/freeze_on_smi will be set to
       freeze core counters on SMI. The aperf counter will not be affected by
       the setting. The cost of SMI can be measured as (aperf - unhalted core
       cycles).

       In practice, the percentage of SMI cycles is very useful for
       performance-oriented analysis. --metric-only will be applied by
       default. The output is SMI cycles%, which equals (aperf - unhalted
       core cycles) / aperf.

       Users who want to get the actual value can apply --no-metric-only.

       --all-kernel
           Configure all used events to run in kernel space.

       --all-user
           Configure all used events to run in user space.

       --percore-show-thread
           The event qualifier "percore" sums up the event counts for all
           hardware threads in a core and shows the counts per core.

       This option, with the event qualifier "percore" enabled, also sums up
       the event counts for all hardware threads in a core, but shows the
       summed counts per hardware thread. This is essentially a replacement
       for the any bit and is convenient for post processing.

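       As a sketch (cpu/event,percore=1/ reuses the placeholder event syntax
       from the -e option description above; substitute a real event):

           perf stat -A -a --percore-show-thread -e cpu/event,percore=1/ -- sleep 1
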
       --summary
           Print summary for interval mode (-I).

       --no-csv-summary
           Don’t print the summary in the first column for CSV summary
           output. This option must be used with -x and --summary.

       This option can be enabled in perf config by setting the variable
       stat.no-csv-summary.

           $ perf config stat.no-csv-summary=true

       --cputype
           Only enable events on CPUs of this type on hybrid platforms (e.g.
           core or atom).


EXAMPLES

       $ perf stat -- make

           Performance counter stats for 'make':

              83723.452481      task-clock:u (msec)       #    1.004 CPUs utilized
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                 3,228,188      page-faults:u             #    0.039 M/sec
           229,570,665,834      cycles:u                  #    2.742 GHz
           313,163,853,778      instructions:u            #    1.36  insn per cycle
            69,704,684,856      branches:u                #  832.559 M/sec
             2,078,861,393      branch-misses:u           #    2.98% of all branches

           83.409183620 seconds time elapsed

           74.684747000 seconds user
            8.739217000 seconds sys


TIMINGS

       As displayed in the example above, we can display 3 types of timings.
       We always display the time the counters were enabled/alive:

           83.409183620 seconds time elapsed

       For workload sessions we also display the time the workloads spent in
       user and system land:

           74.684747000 seconds user
            8.739217000 seconds sys

       Those times are the very same as displayed by the time tool.


CSV FORMAT

       With -x, perf stat emits output in a not-quite-CSV format: commas in
       the output are not quoted with "". To make parsing easy, it is
       recommended to use a different character, like -x \;

       The fields are in this order:

       •   optional usec time stamp in fractions of second (with -I xxx)

       •   optional CPU, core, or socket identifier

       •   optional number of logical CPUs aggregated

       •   counter value

       •   unit of the counter value or empty

       •   event name

       •   run time of counter

       •   percentage of measurement time the counter was running

       •   optional variance if multiple values are collected with -r

       •   optional metric value

       •   optional unit of metric

       Additional metrics may be printed with all earlier fields being empty.


INTEL HYBRID SUPPORT

       Support for Intel hybrid events within perf tools.

       Some Intel platforms, such as AlderLake, are hybrid platforms
       consisting of atom cpus and core cpus. Each cpu type has a dedicated
       event list. Some events are available on the core cpus, some on the
       atom cpus, and some on both.

       The kernel exports two new cpu pmus via sysfs:

           /sys/devices/cpu_core
           /sys/devices/cpu_atom

       The cpus files are created under those directories. For example,

           cat /sys/devices/cpu_core/cpus
           0-15

           cat /sys/devices/cpu_atom/cpus
           16-23

       This indicates that cpu0-cpu15 are core cpus and cpu16-cpu23 are atom
       cpus.

       As before, use perf-list to list the symbolic events.

           perf list

           inst_retired.any
               [Fixed Counter: Counts the number of instructions retired.
                Unit: cpu_atom]
           inst_retired.any
               [Number of instructions retired. Fixed Counter - architectural
                event. Unit: cpu_core]

       The "Unit: xxx" is added to the brief description to indicate which
       pmu the event belongs to. The same event name can be supported on
       different pmus.

       Enable a hybrid event with a specific pmu

       To enable a core-only event or an atom-only event, the following
       syntax is supported:

                   cpu_core/<event name>/
           or
                   cpu_atom/<event name>/

       For example, to count the cycles event on core cpus:

           perf stat -e cpu_core/cycles/

       Create two events for one hardware event automatically

       When creating one event that is available on both atom and core, two
       events are created automatically: one for atom, the other for core.
       Most hardware events and cache events are available on both cpu_core
       and cpu_atom.

       Hardware events have pre-defined configs (e.g. 0 for cycles), but on a
       hybrid platform the kernel needs to know where the event comes from
       (from atom or from core). The original perf event type
       PERF_TYPE_HARDWARE can’t carry pmu information, so this type is
       extended to be a PMU-aware type. The PMU type ID is stored at
       attr.config[63:32].

       The PMU type ID is retrieved from sysfs:

           /sys/devices/cpu_atom/type
           /sys/devices/cpu_core/type

       The new attr.config layout for PERF_TYPE_HARDWARE:

           PERF_TYPE_HARDWARE:     0xEEEEEEEE000000AA
                                   AA:       hardware event ID
                                   EEEEEEEE: PMU type ID

       Cache events are similar. The type PERF_TYPE_HW_CACHE is extended to
       be a PMU-aware type as well; the PMU type ID is stored at
       attr.config[63:32].

       The new attr.config layout for PERF_TYPE_HW_CACHE:

           PERF_TYPE_HW_CACHE:     0xEEEEEEEE00DDCCBB
                                   BB:       hardware cache ID
                                   CC:       hardware cache op ID
                                   DD:       hardware cache op result ID
                                   EEEEEEEE: PMU type ID

       When enabling a hardware event without a specified pmu, such as perf
       stat -e cycles -a (using system-wide in this example), two events are
       created automatically.

           ------------------------------------------------------------
           perf_event_attr:
             size                             120
             config                           0x400000000
             sample_type                      IDENTIFIER
             read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
             disabled                         1
             inherit                          1
             exclude_guest                    1
           ------------------------------------------------------------

       and

           ------------------------------------------------------------
           perf_event_attr:
             size                             120
             config                           0x800000000
             sample_type                      IDENTIFIER
             read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
             disabled                         1
             inherit                          1
             exclude_guest                    1
           ------------------------------------------------------------

       Type 0 is PERF_TYPE_HARDWARE. The 0x4 in 0x400000000 indicates it’s
       the cpu_core pmu; the 0x8 in 0x800000000 indicates it’s the cpu_atom
       pmu (the atom pmu type id is not a fixed value).

       The kernel creates cycles (0x400000000) on cpu0-cpu15 (core cpus), and
       creates cycles (0x800000000) on cpu16-cpu23 (atom cpus).

       In the perf-stat result, two events are displayed:

           Performance counter stats for 'system wide':

           6,744,979      cpu_core/cycles/
           1,965,552      cpu_atom/cycles/

       The first cycles count is the core event; the second is the atom
       event.

       Thread mode example:

       perf-stat reports the scaled counts for hybrid events, with a
       percentage displayed. The percentage is the event’s running time /
       enabling time.

       In this example, triad_loop runs on cpu16 (an atom cpu), and we can
       see the scaled value for core cycles is 233,066,666 with a percentage
       of 0.43%.

           perf stat -e cycles -- taskset -c 16 ./triad_loop

       As before, two events are created.

           perf_event_attr:
             size                             120
             config                           0x400000000
             sample_type                      IDENTIFIER
             read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
             disabled                         1
             inherit                          1
             enable_on_exec                   1
             exclude_guest                    1

       and

           perf_event_attr:
             size                             120
             config                           0x800000000
             sample_type                      IDENTIFIER
             read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
             disabled                         1
             inherit                          1
             enable_on_exec                   1
             exclude_guest                    1

           Performance counter stats for 'taskset -c 16 ./triad_loop':

           233,066,666      cpu_core/cycles/                              (0.43%)
           604,097,080      cpu_atom/cycles/                              (99.57%)

       perf-record:

       If there is no -e specified in perf record on a hybrid platform, it
       creates two default cycles events and adds them to the event list: one
       for core, the other for atom.

       perf-stat:

       If there is no -e specified in perf stat on a hybrid platform, then
       besides the software events, the following events are created and
       added to the event list in order:

           cpu_core/cycles/, cpu_atom/cycles/, cpu_core/instructions/,
           cpu_atom/instructions/, cpu_core/branches/, cpu_atom/branches/,
           cpu_core/branch-misses/, cpu_atom/branch-misses/

       Of course, both perf-stat and perf-record support enabling a hybrid
       event with a specific pmu, e.g.:

           perf stat -e cpu_core/cycles/
           perf stat -e cpu_atom/cycles/
           perf stat -e cpu_core/r1a/
           perf stat -e cpu_atom/L1-icache-loads/
           perf stat -e cpu_core/cycles/,cpu_atom/instructions/
           perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'

       But '{cpu_core/cycles/,cpu_atom/instructions/}' will trigger a warning
       and disable grouping, because the pmus in the group are not matched
       (cpu_core vs. cpu_atom).


JSON FORMAT

       With -j, perf stat is able to print out a JSON format output that can
       be used for parsing.

       •   timestamp : optional usec time stamp in fractions of second (with
           -I)

       •   optional aggregate options:

           •   core : core identifier (with --per-core)

           •   die : die identifier (with --per-die)

           •   socket : socket identifier (with --per-socket)

           •   node : node identifier (with --per-node)

           •   thread : thread identifier (with --per-thread)

       •   counter-value : counter value

       •   unit : unit of the counter value or empty

       •   event : event name

       •   variance : optional variance if multiple values are collected
           (with -r)

       •   runtime : run time of counter

       •   metric-value : optional metric value

       •   metric-unit : optional unit of metric

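       For example:

           perf stat -j -e cycles -- sleep 1
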

SEE ALSO

       perf-top(1), perf-list(1)


perf                              11/28/2023                      PERF-STAT(1)