1mprof-report(1) General Commands Manual mprof-report(1)
2
3
4
6 The Mono log profiler can be used to collect a lot of information about
7 a program running in the Mono runtime. This data can be used (both
8 while the process is running and later) to do analyses of the program
9 behaviour, determine resource usage, performance issues or even look
10 for particular execution patterns.
11
12 This is accomplished by logging the events provided by the Mono runtime
13 through the profiling interface and periodically writing them to a file
14 which can be later inspected with the command line mprof-report program
15 or with a GUI (not developed yet).
16
17 The events collected include (among others):
18
19 · method enter and leave
20
21 · object allocation
22
23 · garbage collection
24
25 · JIT compilation
26
27 · metadata loading
28
29 · lock contention
30
31 · exceptions
32
33 In addition, the profiler can periodically collect info about all the
34 objects present in the heap at the end of a garbage collection (this is
35 called heap shot and currently implemented only for the sgen garbage
36 collector). Another available profiler mode is the sampling or statis‐
37 tical mode: periodically the program is sampled and the information
38 about what the program was busy with is saved. This allows to get
39 information about the program behaviour without degrading its perfor‐
40 mance too much (usually less than 10%).
41
42 Basic profiler usage
43 The simpler way to use the profiler is the following:
44
45 mono --profile=log program.exe
46
47 At the end of the execution the file output.mlpd will be found in the
48 current directory. A summary report of the data can be printed by run‐
49 ning:
50
51 mprof-report output.mlpd
52
53 With this invocation a huge amount of data is collected about the pro‐
54 gram execution and collecting and saving this data can significantly
55 slow down program execution. If saving the profiling data is not
56 needed, a report can be generated directly with:
57
58 mono --profile=log:report program.exe
59
60 If the information about allocations is not of interest, it can be
61 excluded:
62
63 mono --profile=log:noalloc program.exe
64
65 On the other hand, if method call timing is not important, while allo‐
66 cations are, the needed info can be gathered with:
67
68 mono --profile=log:nocalls program.exe
69
70 You will still be able to inspect information about the sequence of
71 calls that lead to each allocation because at each object allocation a
72 stack trace is collected if full enter/leave information is not avail‐
73 able.
74
75 To periodically collect heap shots (and exclude method and allocation
76 events) use the following options (making sure you run with the sgen
77 garbage collector):
78
79 mono --gc=sgen --profile=log:heapshot program.exe
80
81 To perform a sampling profiler run, use the sample option:
82
83 mono --profile=log:sample program.exe
84
85 Profiler option documentation
86 By default the log profiler will gather all the events provided by the
87 Mono runtime and write them to a file named output.mlpd. When no
88 option is specified, it is equivalent to using:
89
90 --profile=log:calls,alloc,output=output.mlpd,maxframes=32,calldepth=100
91
92 The following options can be used to modify this default behaviour.
93 Each option is separated from the next by a , character, with no spaces
94 and all the options are included after the log: profile module speci‐
95 fier.
96
97 · help: display concise help info about each available option
98
99 · [no]alloc: noalloc disables collecting object allocation info, alloc
100 enables it if it was disabled by another option like heapshot.
101
102 · [no]calls: nocalls disables collecting method enter and leave events.
103 When this option is used at each object allocation and at some other
104 events (like lock contentions and exception throws) a stack trace is
105 collected by default. See the maxframes option to control this be‐
106 haviour. calls enables method enter/leave events if they were dis‐
107 abled by another option like heapshot.
108
109 · heapshot[=MODE]: collect heap shot data at each major collection.
110 The frequency of the heap shots can be changed with the MODE parame‐
111 ter. When this option is used allocation events and method
112 enter/leave events are not recorded by default: if they are needed,
113 they need to be enabled explicitly. The optional parameter MODE can
114 modify the default heap shot frequency. heapshot can be used multi‐
115 ple times with different modes: in that case a heap shot is taken if
116 either of the conditions are met. MODE can be one of:
117
118 · NUMms: perform a heap shot if at least NUM milliseconds passed
119 since the last one.
120
121 · NUMgc: perform a heap shot every NUM major garbage collections
122
123 · ondemand: perform a heap shot when such a command is sent to the
124 control port
125
126 · sample[=FREQ]: collect statistical samples of the program behaviour.
127 The default is to collect a 100 times per second (100 Hz) the
128 instruction pointer. This is equivalent to the value “100”. A value
129 of zero for FREQ effectively disables sampling.
130
131 · maxframes=NUM: when a stack trace needs to be performed, collect NUM
132 frames at the most. The default is 32.
133
134 · maxsamples=NUM: stop allocating reusable sample events once NUM
135 events have been allocated (a value of zero for all intents and pur‐
136 poses means unlimited). By default, the value of this setting is the
137 number of CPU cores multiplied by 1000. This is usually a good enough
138 value for typical desktop and mobile apps. If you're losing too many
139 samples due to this default (which is possible in apps with an unusu‐
140 ally high amount of threads), you may want to tinker with this value
141 to find a good balance between sample hit rate and performance impact
142 on the app. The way it works is that sample events are enqueued for
143 reuse after they're flushed to the output file; if a thread gets a
144 sampling signal but there are no sample events in the reuse queue and
145 the profiler has reached the maximum number of sample allocations,
146 the sample gets dropped. So a higher number for this setting will
147 increase the chance that a thread is able to collect a sample, but
148 also necessarily means that there will be more work done by the pro‐
149 filer. You can run Mono with the --stats option to see statistics
150 about sample events.
151
152 · calldepth=NUM: ignore method enter/leave events when the call chain
153 depth is bigger than NUM.
154
155 · zip: automatically compress the output data in gzip format.
156
157 · output=OUTSPEC: instead of writing the profiling data to the out‐
158 put.mlpd file, substitute %p in OUTSPEC with the current process id
159 and %t with the current date and time, then do according to OUTSPEC:
160
161 · if OUTSPEC begins with a | character, execute the rest as a program
162 and feed the data to its standard input
163
164 · if OUTSPEC begins with a - character, use the rest of OUTSPEC as
165 the filename, but force overwrite any existing file by that name
166
167 · otherwise write the data the the named file: note that is a file by
168 that name already exists, a warning is issued and profiling is dis‐
169 abled.
170
171 · report: the profiling data is sent to mprof-report, which will print
172 a summary report. This is equivalent to the option: output=mprof-
173 report -. If the output option is specified as well, the report will
174 be written to the output file instead of the console.
175
176 · port=PORT: specify the tcp/ip port to use for the listening command
177 server. Currently not available for windows. This server is started
178 for example when heapshot=ondemand is used: it will read commands
179 line by line. The following commands are available:
180
181 · heapshot: perform a heapshot as soon as possible
182
183 · nocounters: disables sampling of runtime and performance counters,
184 which is normally done every 1 second.
185
186 · coverage: collect code coverage data. This implies enabling the calls
187 option.
188
189 · onlycoverage: can only be used with coverage. This disables most
190 other events so that the profiler mostly only collects coverage data.
191
192 Analyzing the profile data
193 Currently there is a command line program (mprof-report) to analyze the
194 data produced by the profiler. This is ran automatically when the
195 report profiler option is used. Simply run:
196
197 mprof-report output.mlpd
198
199 to see a summary report of the data included in the file.
200
201 Trace information for events
202 Often it is important for some events, like allocations, lock con‐
203 tention and exception throws to know where they happened. Or we may
204 want to see what sequence of calls leads to a particular method invoca‐
205 tion. To see this info invoke mprof-report as follows:
206
207 mprof-report --traces output.mlpd
208
209 The maximum number of methods in each stack trace can be specified with
210 the --maxframes=NUM option:
211
212 mprof-report --traces --maxframes=4 output.mlpd
213
214 The stack trace info will be available if method enter/leave events
215 have been recorded or if stack trace collection wasn't explicitly dis‐
216 abled with the maxframes=0 profiler option.
217
218 The --traces option also controls the reverse reference feature in the
219 heapshot report: for each class it reports how many references to
220 objects of that class come from other classes.
221
222 Sort order for methods and allocations
223 When a list of methods is printed the default sort order is based on
224 the total time spent in the method. This time is wall clock time (that
225 is, it includes the time spent, for example, in a sleep call, even if
226 actual cpu time would be basically 0). Also, if the method has been
227 ran on different threads, the time will be a sum of the time used in
228 each thread.
229
230 To change the sort order, use the option:
231
232 --method-sort=MODE
233
234 where MODE can be:
235
236 · self: amount of time spent in the method itself and not in its
237 callees
238
239 · calls: the number of method invocations
240
241 · total: the total time spent in the method.
242
243 Object allocation lists are sorted by default depending on the total
244 amount of bytes used by each type.
245
246 To change the sort order of object allocations, use the option:
247
248 --alloc-sort=MODE
249
250 where MODE can be:
251
252 · count: the number of allocated objects of the given type
253
254 · bytes: the total number of bytes used by objects of the given type
255
256 To change the sort order of counters, use the option:
257
258 --counters-sort=MODE
259
260 where MODE can be:
261
262 · time: sort values by time then category
263
264 · category: sort values by category then time
265
266 Selecting what data to report
267 The profiler by default collects data about many runtime subsystems and
268 mprof-report prints a summary of all the subsystems that are found in
269 the data file. It is possible to tell mprof-report to only show infor‐
270 mation about some of them with the following option:
271
272 --reports=R1[,R2...]
273
274 where the report names R1, R2 etc. can be:
275
276 · header: information about program startup and profiler version
277
278 · jit: JIT compiler information
279
280 · sample: statistical sampling information
281
282 · gc: garbage collection information
283
284 · alloc: object allocation information
285
286 · call: method profiling information
287
288 · metadata: metadata events like image loads
289
290 · exception: exception throw and handling information
291
292 · monitor: lock contention information
293
294 · thread: thread information
295
296 · domain: app domain information
297
298 · context: remoting context information
299
300 · heapshot: live heap usage at heap shots
301
302 · counters: counters samples
303
304 · coverage: code coverage data
305
306 · stats: event statistics
307
308 It is possible to limit some of the data displayed to a timeframe of
309 the program execution with the option:
310
311 --time=FROM-TO
312
313 where FROM and TO are seconds since application startup (they can be
314 floating point numbers).
315
316 Another interesting option is to consider only events happening on a
317 particular thread with the following option:
318
319 --thread=THREADID
320
321 where THREADID is one of the numbers listed in the thread summary
322 report (or a thread name when present).
323
324 By default long lists of methods or other information like object allo‐
325 cations are limited to the most important data. To increase the amount
326 of information printed you can use the option:
327
328 --verbose
329
330 Track individual objects
331 Instead of printing the usual reports from the profiler data, it is
332 possible to track some interesting information about some specific
333 object addresses. The objects are selected based on their address with
334 the --track option as follows:
335
336 --track=0xaddr1[,0xaddr2,...]
337
338 The reported info (if available in the data file), will be class name,
339 size, creation time, stack trace of creation (with the --traces
340 option), etc. If heapshot data is available it will be possible to
341 also track what other objects reference one of the listed addresses.
342
343 The object addresses can be gathered either from the profiler report in
344 some cases (like in the monitor lock report), from the live application
345 or they can be selected with the --find=FINDSPEC option. FINDSPEC can
346 be one of the following:
347
348 · S:SIZE: where the object is selected if its size is at least SIZE
349
350 · T:NAME: where the object is selected if NAME partially matches its
351 class name
352
353 This option can be specified multiple times with one of the different
354 kinds of FINDSPEC. For example, the following:
355
356 --find=S:10000 --find=T:Byte[]
357
358 will find all the byte arrays that are at least 10000 bytes in size.
359
360 Note that with a moving garbage collector the object address can
361 change, so you may need to track the changed address manually. It can
362 also happen that multiple objects are allocated at the same address, so
363 the output from this option can become large.
364
365 Saving a profiler report
366 By default mprof-report will print the summary data to the console. To
367 print it to a file, instead, use the option:
368
369 --out=FILENAME
370
371 Processing code coverage data
372 If you ran the profiler with the coverage option, you can process the
373 collected coverage data into an XML file by running mprof-report like
374 this:
375
376 mprof-report --coverage-out=coverage.xml output.mlpd
377
378 Dealing with profiler slowness
379 If the profiler needs to collect lots of data, the execution of the
380 program will slow down significantly, usually 10 to 20 times slower.
381 There are several ways to reduce the impact of the profiler on the pro‐
382 gram execution.
383
384 Use the statistical sampling mode
385
386 Statistical sampling allows executing a program under the profiler
387 with minimal performance overhead (usually less than 10%). This
388 mode allows checking where the program is spending most of its exe‐
389 cution time without significantly perturbing its behaviour.
390
391 Collect less data
392
393 Collecting method enter/leave events can be very expensive, espe‐
394 cially in programs that perform many millions of tiny calls. The
395 profiler option nocalls can be used to avoid collecting this data
396 or it can be limited to only a few call levels with the calldepth
397 option.
398
399 Object allocation information is expensive as well, though much
400 less than method enter/leave events. If it's not needed, it can be
401 skipped with the noalloc profiler option. Note that when method
402 enter/leave events are discarded, by default stack traces are col‐
403 lected at each allocation and this can be expensive as well. The
404 impact of stack trace information can be reduced by setting a low
405 value with the maxframes option or by eliminating them completely,
406 by setting it to 0.
407
408 The other major source of data is the heapshot profiler option:
409 especially if the managed heap is big, since every object needs to
410 be inspected. The MODE parameter of the heapshot option can be
411 used to reduce the frequency of the heap shots.
412
413 Dealing with the size of the data files
414 When collecting a lot of information about a profiled program, huge
415 data files can be generated. There are a few ways to minimize the
416 amount of data, for example by not collecting some of the more space-
417 consuming information or by compressing the information on the fly or
418 by just generating a summary report.
419
420 Reducing the amount of data
421
422 Method enter/leave events can be excluded completely with the
423 nocalls option or they can be limited to just a few levels of calls
424 with the calldepth option. For example, the option:
425
426 calldepth=10
427
428 will ignore the method events when there are more than 10 managed
429 stack frames. This is very useful for programs that have deep
430 recursion or for programs that perform many millions of tiny calls
431 deep enough in the call stack. The optimal number for the
432 calldepth option depends on the program and it needs to be balanced
433 between providing enough profiling information and allowing fast
434 execution speed.
435
436 Note that by default, if method events are not recorded at all, the
437 profiler will collect stack trace information at events like allo‐
438 cations. To avoid gathering this data, use the maxframes=0 pro‐
439 filer option.
440
441 Allocation events can be eliminated with the noalloc option.
442
443 Heap shot data can also be huge: by default it is collected at each
444 major collection. To reduce the frequency, you can specify a heap‐
445 shot mode: for example to collect every 5 collections (including
446 major and minor):
447
448 heapshot=5gc
449
450 or when at least 5 seconds passed since the last heap shot:
451
452 heapshot=5000ms
453
454 Compressing the data
455
456 To reduce the amout of disk space used by the data, the data can be
457 compressed either after it has been generated with the gzip com‐
458 mand:
459
460 gzip -9 output.mlpd
461
462 or it can be compressed automatically by using the zip profiler
463 option. Note that in this case there could be a significant slow‐
464 down of the profiled program.
465
466 The mprof-report program will tranparently deal with either com‐
467 pressed or uncompressed data files.
468
469 Generating only a summary report
470
471 Often it's enough to look at the profiler summary report to diag‐
472 nose an issue and in this case it's possible to avoid saving the
473 profiler data file to disk. This can be accomplished with the
474 report profiler option, which will basically send the data to the
475 mprof-report program for display.
476
477 To have more control of what summary information is reported (or to
478 use a completely different program to decode the profiler data),
479 the output profiler option can be used, with | as the first charac‐
480 ter: the rest of the output name will be executed as a program with
481 the data fed in on the standard input.
482
483 For example, to print only the Monitor summary with stack trace
484 information, you could use it like this:
485
486 output=|mprof-report --reports=monitor --traces -
487
489 http://www.mono-project.com/docs/debug+profile/profile/profiler/
490
492 mono(1)
493
495 Paolo Molaro, Alex Rønne Petersen
496
497
498
499 mprof-report(1)