perf-c2c(1)

PERF-C2C(1)                       perf Manual                      PERF-C2C(1)

NAME

       perf-c2c - Shared Data C2C/HITM Analyzer.

SYNOPSIS

       perf c2c record [<options>] <command>
       perf c2c record [<options>] -- [<record command options>] <command>
       perf c2c report [<options>]

DESCRIPTION

       C2C stands for Cache To Cache.

       The perf c2c tool provides means for Shared Data C2C/HITM analysis. It
       lets you track down cacheline contention.

       On x86, the tool is based on the load latency and precise store
       facility events provided by Intel CPUs. On PowerPC, the tool uses
       random instruction sampling with the thresholding feature.

       These events provide:
       - memory address of the access
       - type of the access (load and store details)
       - latency (in cycles) of the load access

       The c2c tool provides means to record this data and report back access
       details for the cachelines with the highest contention, that is, the
       highest number of HITM accesses (loads that hit a cacheline modified
       in another cache).

       The basic workflow with this tool follows the standard record/report
       phases. Use the record command to record event data and the report
       command to display it.

RECORD OPTIONS

       -e, --event=
           Select the PMU event. Use perf mem record -e list to list available
           events.

       -v, --verbose
           Be more verbose (show counter open errors, etc).

       -l, --ldlat
           Configure mem-loads latency. (x86 only)

       -k, --all-kernel
           Configure all used events to run in kernel space.

       -u, --all-user
           Configure all used events to run in user space.

REPORT OPTIONS

       -k, --vmlinux=<file>
           vmlinux pathname

       -v, --verbose
           Be more verbose (show counter open errors, etc).

       -i, --input
           Specify the input file to process.

       -N, --node-info
           Show extra node info in report (see NODE INFO section)

       -c, --coalesce
           Specify sorting fields for single cacheline display. The following
           fields are available: tid,pid,iaddr,dso (see COALESCE)

       -g, --call-graph
           Set up callchain parameters. Please refer to the perf-report man
           page for details.

       --stdio
           Force the stdio output (see STDIO OUTPUT)

       --stats
           Display only statistic tables and force stdio mode.

       --full-symbols
           Display full length of symbols.

       --no-source
           Do not display Source:Line column.

       --show-all
           Show all captured HITM lines, regardless of the 0.0005 HITM %
           limit.

       -f, --force
           Don't do ownership validation.

       -d, --display
           Select the HITM type (rmt, lcl) to display and sort on. Total
           HITMs are used by default.

C2C RECORD

       The perf c2c record command sets up options related to HITM cacheline
       analysis and calls the standard perf record command.

       The following perf record options are configured by default (check the
       perf record man page for details):

           -W,-d,--phys-data,--sample-cpu

       Unless specified otherwise with the -e option, the following events
       are monitored by default on x86:

           cpu/mem-loads,ldlat=30/P
           cpu/mem-stores/P

       and the following on PowerPC:

           cpu/mem-loads/
           cpu/mem-stores/

       Any perf record option can be passed after the -- mark, for example to
       enable callchains and system-wide monitoring:

           $ perf c2c record -- -g -a

       Please check the RECORD OPTIONS section for specific c2c record
       options.
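
       As a rough illustration of how the defaults above combine with
       options passed after the -- mark, the following sketch assembles a
       perf record argument list. It is built only from the options and x86
       events listed in this section; the helper name build_record_argv and
       the constants are hypothetical, and the real tool may construct its
       command line differently. Nothing is executed, the command is only
       printed.

```python
import shlex

# Defaults as listed in this section. This mapping is an illustration,
# not the actual perf source code.
DEFAULT_RECORD_OPTS = ["-W", "-d", "--phys-data", "--sample-cpu"]
DEFAULT_X86_EVENTS = ["cpu/mem-loads,ldlat=30/P", "cpu/mem-stores/P"]


def build_record_argv(extra_record_opts, command, events=None):
    """Return a perf record argument list resembling perf c2c record.

    extra_record_opts: options given after the -- mark, e.g. ["-g", "-a"]
    command: the workload to profile, as a list (may be empty with -a)
    events: explicit -e events; defaults to the x86 events listed above
    """
    argv = ["perf", "record"] + DEFAULT_RECORD_OPTS
    for event in (events or DEFAULT_X86_EVENTS):
        argv += ["-e", event]
    return argv + list(extra_record_opts) + list(command)


# Mirrors the example "perf c2c record -- -g -a" (hypothetical expansion).
argv = build_record_argv(["-g", "-a"], [])
print(shlex.join(argv))
```

       Passing an explicit events list to the helper corresponds to
       overriding the default events with -e.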

C2C REPORT

       The perf c2c report command displays shared data analysis. It comes in
       two display modes: stdio and tui (default).

       The report command workflow is as follows:
       - sort all the data based on the cacheline address
       - store access details for each cacheline
       - sort all cachelines based on user settings
       - display data

       In general, perf c2c report output consists of 2 basic views:
       1) most expensive cachelines list
       2) offsets details for each cacheline

       For each cacheline in the 1) list we display the following data (both
       stdio and TUI modes follow the same fields output):

           Index
           - zero based index to identify the cacheline

           Cacheline
           - cacheline address (hex number)

           Total records
           - sum of all cacheline accesses

           Rmt/Lcl Hitm
           - cacheline percentage of all Remote/Local HITM accesses

           LLC Load Hitm - Total, Lcl, Rmt
           - count of Total/Local/Remote load HITMs

           Store Reference - Total, L1Hit, L1Miss
             Total  - all store accesses
             L1Hit  - store accesses that hit L1
             L1Miss - store accesses that missed L1

           Load Dram
           - count of local and remote DRAM accesses

           LLC Ld Miss
           - count of all accesses that missed LLC

           Total Loads
           - sum of all load accesses

           Core Load Hit - FB, L1, L2
           - count of load hits in FB (Fill Buffer), L1 and L2 cache

           LLC Load Hit - Llc, Rmt
           - count of LLC and Remote load hits

       For each offset in the 2) list we display the following data:

           HITM - Rmt, Lcl
           - % of Remote/Local HITM accesses for given offset within cacheline

           Store Refs - L1 Hit, L1 Miss
           - % of store accesses that hit/missed L1 for given offset within cacheline

           Data address - Offset
           - offset address

           Pid
           - pid of the process responsible for the accesses

           Tid
           - tid of the thread responsible for the accesses

           Code address
           - code address responsible for the accesses

           cycles - rmt hitm, lcl hitm, load
           - sum of cycles for given accesses - Remote/Local HITM and generic load

           cpu cnt
           - number of cpus that participated in the access

           Symbol
           - code symbol related to the 'Code address' value

           Shared Object
           - shared object name related to the 'Code address' value

           Source:Line
           - source information related to the 'Code address' value

           Node
           - nodes participating in the access (see NODE INFO section)
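
       The report workflow described in this section (group accesses by
       cacheline address, keep per-cacheline details, sort, display) can be
       sketched as follows. This is a minimal illustration, not the perf
       implementation: the 64-byte cacheline size, the (address, kind)
       sample tuples and the function name summarize are assumptions made
       for the example. It also shows how a per-cacheline Rmt/Lcl Hitm
       percentage can be derived as a share of all remote or local HITMs.

```python
from collections import defaultdict

CACHELINE_SIZE = 64  # assumption: 64-byte cachelines


def summarize(samples):
    """Group samples by cacheline and sort by total HITM count.

    Each sample is a hypothetical (data_addr, kind) tuple, where kind is
    one of "rmt_hitm", "lcl_hitm", "load" or "store".
    """
    lines = defaultdict(lambda: {"records": 0, "rmt_hitm": 0, "lcl_hitm": 0,
                                 "offsets": defaultdict(int)})
    for addr, kind in samples:
        base = addr & ~(CACHELINE_SIZE - 1)   # cacheline address
        entry = lines[base]
        entry["records"] += 1
        entry["offsets"][addr - base] += 1    # offset within the cacheline
        if kind in ("rmt_hitm", "lcl_hitm"):
            entry[kind] += 1

    total_rmt = sum(e["rmt_hitm"] for e in lines.values())
    total_lcl = sum(e["lcl_hitm"] for e in lines.values())

    rows = []
    for base, e in lines.items():
        rows.append({
            "cacheline": base,
            "records": e["records"],
            # share of all remote/local HITMs that fall on this cacheline
            "rmt_pct": 100.0 * e["rmt_hitm"] / total_rmt if total_rmt else 0.0,
            "lcl_pct": 100.0 * e["lcl_hitm"] / total_lcl if total_lcl else 0.0,
            "hitm": e["rmt_hitm"] + e["lcl_hitm"],
            "offsets": dict(e["offsets"]),
        })
    # most contended cachelines first (total HITMs, the default sort)
    rows.sort(key=lambda r: r["hitm"], reverse=True)
    return rows


samples = [
    (0x1000, "rmt_hitm"), (0x1008, "store"), (0x1008, "rmt_hitm"),
    (0x2010, "lcl_hitm"), (0x2010, "load"), (0x1030, "rmt_hitm"),
]
rows = summarize(samples)
for index, row in enumerate(rows):
    print(index, hex(row["cacheline"]), row["records"], row["hitm"])
```

       Sorting on "rmt_pct" or "lcl_pct" instead of "hitm" would correspond
       to choosing a different HITM type with the -d option.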

NODE INFO

       The Node field displays nodes that access the given cacheline offset.
       Its output comes in 3 flavors:
       - node IDs separated by ,
       - node IDs with stats for each ID, in the following format:
         Node{cpus %hitms %stores}
       - node IDs with list of affected CPUs, in the following format:
         Node{cpu list}

       The user can switch between the above flavors with the -N option or
       use the n key to switch interactively in TUI mode.
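
       To make the three flavors concrete, the sketch below renders the same
       node data in each of them. The input structure, the flavor names
       ("ids", "stats", "cpus") and the function format_node_field are
       hypothetical, and the exact spacing and separators of real perf
       output may differ; only the overall Node{...} shapes are taken from
       the description above.

```python
def format_node_field(nodes, flavor):
    """Render a Node field in one of three illustrative flavors.

    nodes maps a node ID to a dict with hypothetical keys:
    "cpus" (list of CPU numbers), "hitm_pct" and "store_pct" (floats).
    flavor is "ids", "stats" or "cpus".
    """
    parts = []
    for node_id in sorted(nodes):
        info = nodes[node_id]
        if flavor == "ids":
            parts.append(str(node_id))
        elif flavor == "stats":
            # Node{cpus %hitms %stores}
            parts.append("%d{%d %.1f%% %.1f%%}" % (
                node_id, len(info["cpus"]),
                info["hitm_pct"], info["store_pct"]))
        elif flavor == "cpus":
            # Node{cpu list}
            cpu_list = ",".join(str(cpu) for cpu in info["cpus"])
            parts.append("%d{%s}" % (node_id, cpu_list))
        else:
            raise ValueError("unknown flavor: %r" % flavor)
    separator = "," if flavor == "ids" else " "
    return separator.join(parts)


nodes = {
    0: {"cpus": [0, 2], "hitm_pct": 75.0, "store_pct": 40.0},
    1: {"cpus": [5], "hitm_pct": 25.0, "store_pct": 60.0},
}
for flavor in ("ids", "stats", "cpus"):
    print(format_node_field(nodes, flavor))
```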

COALESCE

       The user can specify how to sort offsets for a cacheline.

       The following fields are available and govern the final set of output
       fields for cacheline offsets:

           tid   - coalesced by process TIDs
           pid   - coalesced by process PIDs
           iaddr - coalesced by code address, following fields are displayed:
                      Code address, Code symbol, Shared Object, Source line
           dso   - coalesced by shared object

       By default the coalescing is set up with pid,iaddr.
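
       Coalescing can be thought of as merging offset entries that share the
       same values for the selected fields. The sketch below illustrates
       that idea under stated assumptions: the entry dictionaries, the
       "hitm" counter and the function name coalesce are hypothetical and
       do not reflect perf internals.

```python
from collections import defaultdict


def coalesce(entries, fields=("pid", "iaddr")):
    """Merge offset entries that share the same values for `fields`.

    entries: list of dicts with hypothetical keys tid, pid, iaddr, dso
    and hitm. Returns a dict mapping each key tuple to its summed HITM
    count. The default fields mirror the documented default pid,iaddr.
    """
    merged = defaultdict(int)
    for entry in entries:
        key = tuple(entry[field] for field in fields)
        merged[key] += entry["hitm"]
    return dict(merged)


entries = [
    {"tid": 101, "pid": 100, "iaddr": 0x400A10, "dso": "app", "hitm": 3},
    {"tid": 102, "pid": 100, "iaddr": 0x400A10, "dso": "app", "hitm": 2},
    {"tid": 102, "pid": 100, "iaddr": 0x400B20, "dso": "app", "hitm": 1},
]
print(coalesce(entries))                   # default: pid,iaddr
print(coalesce(entries, fields=("tid",)))  # one row per thread
print(coalesce(entries, fields=("dso",)))  # one row per shared object
```

       Fewer or coarser fields produce fewer, more aggregated rows; adding
       fields such as tid splits the rows further.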

STDIO OUTPUT

       The stdio output displays data on standard output.

       The following tables are displayed:

           Trace Event Information
           - overall statistics of memory accesses

           Global Shared Cache Line Event Information
           - overall statistics on shared cachelines

           Shared Data Cache Line Table
           - list of most expensive cachelines

           Shared Cache Line Distribution Pareto
           - list of all accessed offsets for each cacheline

TUI OUTPUT

       The TUI output provides an interactive interface to navigate through
       the cachelines list and to display offset details.

       For details please refer to the help window by pressing the ? key.

CREDITS

       Although Don Zickus, Dick Fowles and Joe Mario worked together to get
       this implemented, we got lots of early help from Arnaldo Carvalho de
       Melo, Stephane Eranian, Jiri Olsa and Andi Kleen.

C2C BLOG

       Check Joe's blog on the c2c tool for a detailed use case explanation:
       https://joemario.github.io/blog/2016/09/01/c2c-blog/