MK-QUERY-DIGEST(1)    User Contributed Perl Documentation   MK-QUERY-DIGEST(1)



NAME
       mk-query-digest - Analyze query execution logs and generate a query
       report, filter, replay, or transform queries for MySQL, PostgreSQL,
       memcached, and more.
SYNOPSIS
       Usage: mk-query-digest [OPTION...] [FILE]

       mk-query-digest parses and analyzes MySQL log files.  With no FILE,
       or when FILE is -, it reads standard input.

       Analyze, aggregate, and report on a slow query log:

         mk-query-digest /path/to/slow.log

       Review a slow log, saving results to the test.query_review table in a
       MySQL server running on host1.  See "--review" for more on reviewing
       queries:

         mk-query-digest --review h=host1,D=test,t=query_review /path/to/slow.log

       Filter out everything but SELECT queries, replay the queries against
       another server, then use the timings from replaying them to analyze
       their performance:

         mk-query-digest /path/to/slow.log --execute h=another_server \
           --filter '$event->{fingerprint} =~ m/^select/'

       Print the structure of events so you can construct a complex
       "--filter":

         mk-query-digest /path/to/slow.log --no-report \
           --filter 'print Dumper($event)'

       Watch SHOW FULL PROCESSLIST and output a log in slow query log format:

         mk-query-digest --processlist h=host1 --print --no-report

       The default aggregation and analysis is CPU and memory intensive.
       Disable it if you don't need the default report:

         mk-query-digest <arguments> --no-report

RISKS
       The following section is included to inform users about the potential
       risks, whether known or unknown, of using this tool.  The two main
       categories of risks are those created by the nature of the tool (e.g.
       read-only tools vs. read-write tools) and those created by bugs.

       By default mk-query-digest merely collects and aggregates data from
       the files specified.  It is designed to be as efficient as possible,
       but depending on the input you give it, it can use a lot of CPU and
       memory.  Practically speaking, it is safe to run even on production
       systems, but you might want to monitor it until you are satisfied
       that the input you give it does not cause undue load.

       Various options will cause mk-query-digest to insert data into
       tables, execute SQL queries, and so on.  These include the
       "--execute" option and "--review".

       At the time of this release, we know of no bugs that could cause
       serious harm to users.

       The authoritative source for updated information is always the online
       issue tracking system.  Issues that affect this tool will be marked
       as such.  You can see a list of such issues at the following URL:
       <http://www.maatkit.org/bugs/mk-query-digest>.

       See also "BUGS" for more information on filing bugs and getting help.

DESCRIPTION
       This tool was formerly known as mk-log-parser.

       "mk-query-digest" is a framework for doing things with events from a
       query source such as the slow query log or PROCESSLIST.  By default
       it acts as a very sophisticated log analysis tool.  You can group and
       sort queries in many different ways simultaneously and find the most
       expensive queries, or create a timeline of queries in the log, for
       example.  It can also do a "query review," which means to save a
       sample of each type of query into a MySQL table so you can easily see
       whether you've reviewed and analyzed a query before.  The benefit of
       this is that you can keep track of changes to your server's queries
       and avoid repeated work.  You can also save other information with
       the queries, such as comments, issue numbers in your ticketing
       system, and so on.

       Note that this is a work in *very* active progress and you should
       expect incompatible changes in the future.

ATTRIBUTES
       mk-query-digest works on events, which are a collection of key/value
       pairs called attributes.  You'll recognize most of the attributes
       right away: Query_time, Lock_time, and so on.  You can just look at a
       slow log and see them.  However, there are some that don't exist in
       the slow log, and slow logs may actually include different kinds of
       attributes (for example, you may have a server with the Percona
       patches).

       For a full list of attributes, see
       <http://code.google.com/p/maatkit/wiki/EventAttributes>.

       With creative use of "--filter", you can create new attributes
       derived from existing attributes.  For example, to create an
       attribute called "Row_ratio" for examining the ratio of "Rows_sent"
       to "Rows_examined", specify a filter like:

         --filter '($event->{Row_ratio} = $event->{Rows_sent} / ($event->{Rows_examined})) && 1'

       The "&& 1" trick is needed to create a valid one-line syntax that is
       always true, even if the assignment happens to evaluate false.  The
       new attribute will automatically appear in the output:

         # Row ratio            1.00    0.00       1    0.50       1    0.71    0.50

       Attributes created this way can be specified for "--order-by" or any
       option that requires an attribute.

   memcached
       memcached events have additional attributes related to the memcached
       protocol: cmd, key, res (result) and val.  Also, boolean attributes
       are created for the various commands, misses and errors: Memc_CMD
       where CMD is a memcached command (get, set, delete, etc.), Memc_error
       and Memc_miss.

       These attributes are no different from slow log attributes, so you
       can use them with "--[no]report", "--group-by", in a "--filter", etc.

       These attributes and more are documented at
       <http://code.google.com/p/maatkit/wiki/EventAttributes>.

OUTPUT
       The default output is a query analysis report.  The "--[no]report"
       option controls whether or not this report is printed.  Sometimes you
       may wish to parse all the queries but suppress the report, for
       example when using "--print", "--review" or "--save-results".

       There is one paragraph for each class of query analyzed.  A "class"
       of queries all have the same value for the "--group-by" attribute,
       which is "fingerprint" by default.  (See "ATTRIBUTES".)  A
       fingerprint is an abstracted version of the query text with literals
       removed, whitespace collapsed, and so forth.  The report is formatted
       so it's easy to paste into emails without wrapping, and all non-query
       lines begin with a comment, so you can save it to a .sql file and
       open it in your favorite syntax-highlighting text editor.  There is a
       response-time profile at the beginning.

       The output described here is controlled by "--report-format".  That
       option allows you to specify what to print and in what order.  The
       default output in the default order is described here.

       The report, by default, begins with a paragraph about the entire
       analysis run.  The information is very similar to what you'll see for
       each class of queries in the log, but it doesn't have some
       information that would be too expensive to keep globally for the
       analysis.  It also has some statistics about the code's execution
       itself, such as the CPU and memory usage, the local date and time of
       the run, and a list of input files read/parsed.

       Following this is the response-time profile over the events.  This is
       a highly summarized view of the unique events in the detailed query
       report that follows.  It contains the following columns:

         Column         Meaning
         =============  =========================================================
         Rank           The query's rank within the entire set of queries analyzed
         Query ID       The query's fingerprint
         Response time  The total response time, and percentage of overall total
         Calls          The number of times this query was executed
         R/Call         The mean response time per execution
         Apdx           The Apdex score; see --apdex-threshold for details
         V/M            The Variance-to-mean ratio of response time
         EXPLAIN        If --explain was specified, a sparkline; see --explain
         Item           The distilled query

       A final line whose rank is shown as MISC contains aggregate
       statistics on the queries that were not included in the report, due
       to options such as "--limit" and "--outliers".  For details on the
       variance-to-mean ratio, please see
       <http://en.wikipedia.org/wiki/Index_of_dispersion>.

       Next, the detailed query report is printed.  Each query appears in a
       paragraph.  Here is a sample, slightly reformatted so 'perldoc' will
       not wrap lines in a terminal.  The following will all be one
       paragraph, but we'll break it up for commentary.

         # Query 2: 0.01 QPS, 0.02x conc, ID 0xFDEA8D2993C9CAF3 at byte 160665

       This line identifies the sequential number of the query in the sort
       order specified by "--order-by".  Then there's the queries per
       second, and the approximate concurrency for this query (calculated as
       a function of the timespan and total Query_time).  Next there's a
       query ID.  This ID is a hex version of the query's checksum in the
       database, if you're using "--review".  You can select the reviewed
       query's details from the database with a query like "SELECT ....
       WHERE checksum=0xFDEA8D2993C9CAF3".

       If you are investigating the report and want to print out every
       sample of a particular query, then the following "--filter" may be
       helpful:

         mk-query-digest slow-log.log --no-report --print \
           --filter '$event->{fingerprint} && make_checksum($event->{fingerprint}) eq "FDEA8D2993C9CAF3"'

       Notice that you must remove the 0x prefix from the checksum in order
       for this to work.

       Finally, in case you want to find a sample of the query in the log
       file, there's the byte offset where you can look.  (This is not
       always accurate, due to some silly anomalies in the slow-log format,
       but it's usually right.)  The position refers to the worst sample,
       which we'll see more about below.

       Next is the table of metrics about this class of queries.

         #           pct   total    min    max    avg    95%  stddev  median
         # Count       0       2
         # Exec time  13   1105s   552s   554s   553s   554s      2s    553s
         # Lock time   0   216us   99us  117us  108us  117us    12us   108us
         # Rows sent  20   6.26M  3.13M  3.13M  3.13M  3.13M   12.73   3.13M
         # Rows exam   0   6.26M  3.13M  3.13M  3.13M  3.13M   12.73   3.13M

       The first line is column headers for the table.  The percentage is
       the percent of the total for the whole analysis run, and the total is
       the actual value of the specified metric.  For example, in this case
       we can see that the query executed 2 times, which is 13% of the total
       number of queries in the file.  The min, max and avg columns are
       self-explanatory.  The 95% column shows the 95th percentile; 95% of
       the values are less than or equal to this value.  The standard
       deviation shows you how tightly grouped the values are.  The standard
       deviation and median are both calculated from the 95th percentile,
       discarding the extremely large values.

       The stddev, median and 95th percentile statistics are approximate.
       Exact statistics require keeping every value seen, sorting, and doing
       some calculations on them.  This uses a lot of memory.  To avoid
       this, we keep 1000 buckets, each of them 5% bigger than the one
       before, ranging from .000001 up to a very big number.  When we see a
       value we increment the bucket into which it falls.  Thus we have
       fixed memory per class of queries.  The drawback is the imprecision,
       which typically falls in the 5 percent range.
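
       As an illustration only (this is not the tool's actual Perl code, and
       the names BASE, MIN_VAL and bucket_index are invented for this
       sketch), the bucketing scheme can be expressed in Python:

```python
import math

MIN_VAL = 0.000001   # bottom of the first bucket
BASE = 1.05          # each bucket is 5% bigger than the one before
NUM_BUCKETS = 1000

def bucket_index(value):
    """Map a value to the bucket it falls into; bucket i covers
    [MIN_VAL * BASE**(i-1), MIN_VAL * BASE**i), with bucket 0 for
    anything below MIN_VAL."""
    if value < MIN_VAL:
        return 0
    idx = int(math.log(value / MIN_VAL, BASE)) + 1
    return min(idx, NUM_BUCKETS - 1)

# Fixed memory per query class: just increment a counter per value seen.
buckets = [0] * NUM_BUCKETS
for query_time in (0.002, 0.0021, 0.5, 0.51, 3.0):
    buckets[bucket_index(query_time)] += 1
```

       Statistics such as the median are then estimated from bucket
       boundaries rather than raw values, which is where the roughly 5%
       imprecision comes from.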

       Next we have statistics on the users, databases and time range for
       the query.

         # Users       1   user1
         # Databases   2   db1(1), db2(1)
         # Time range 2008-11-26 04:55:18 to 2008-11-27 00:15:15

       The users and databases are shown as a count of distinct values,
       followed by the values.  If there's only one, it's shown alone; if
       there are many, we show each of the most frequent ones, followed by
       the number of times it appears.

         # Query_time distribution
         #   1us
         #  10us
         # 100us
         #   1ms
         #  10ms
         # 100ms
         #    1s
         #  10s+  #############################################################

       The execution times show a logarithmic chart of time clustering.
       Each query goes into one of the "buckets" and is counted up.  The
       buckets are powers of ten.  The first bucket is all values in the
       "single microsecond range" -- that is, less than 10us.  The second is
       "tens of microseconds," which is from 10us up to (but not including)
       100us; and so on.  The charted attribute can be changed by specifying
       "--report-histogram" but is limited to time-based attributes.
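
       The decade bucketing amounts to taking the power of ten of each time.
       A hypothetical Python sketch (the labels follow the chart above; this
       is not the tool's code):

```python
import math
from collections import Counter

LABELS = ["1us", "10us", "100us", "1ms", "10ms", "100ms", "1s", "10s+"]

def decade_bucket(seconds):
    """'1us' means < 10us, '10us' means 10us up to (not including)
    100us, and so on; everything from 10s up lands in '10s+'."""
    seconds = max(seconds, 1e-6)
    idx = int(math.floor(math.log10(seconds))) + 6   # 1e-6 maps to index 0
    return LABELS[min(max(idx, 0), len(LABELS) - 1)]

counts = Counter(decade_bucket(t) for t in (0.000002, 0.00005, 0.25, 12.0))
for label in LABELS:
    print("# {:>5}  {}".format(label, "#" * counts[label]))
```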

         # Tables
         #    SHOW TABLE STATUS LIKE 'table1'\G
         #    SHOW CREATE TABLE `table1`\G
         # EXPLAIN
         SELECT * FROM table1\G

       This section is a convenience: if you're trying to optimize the
       queries you see in the slow log, you probably want to examine the
       table structure and size.  These are copy-and-paste-ready commands to
       do that.

       Finally, we see a sample of the queries in this class of query.  This
       is not a random sample.  It is the query that performed the worst,
       according to the sort order given by "--order-by".  You will normally
       see a commented "# EXPLAIN" line just before it, so you can
       copy-paste the query to examine its EXPLAIN plan.  But for non-SELECT
       queries that isn't possible to do, so the tool tries to transform the
       query into a roughly equivalent SELECT query, and adds that below.

       If you want to find this sample event in the log, use the offset
       mentioned above, and something like the following:

         tail -c +<offset> /path/to/file | head

       See also "--report-format".

   SPARKLINES
       The output also contains sparklines.  Sparklines are "data-intense,
       design-simple, word-sized graphics"
       (<http://en.wikipedia.org/wiki/Sparkline>).  There is a sparkline for
       "--report-histogram" and for "--explain".  See each of those options
       for details about interpreting their sparklines.

QUERY REVIEW
       A "query review" is the process of storing all the query fingerprints
       analyzed.  This has several benefits:

       •   You can add meta-data to classes of queries, such as marking them
           for follow-up, adding notes to queries, or marking them with an
           issue ID for your issue tracking system.

       •   You can refer to the stored values on subsequent runs so you'll
           know whether you've seen a query before.  This can help you cut
           down on duplicated work.

       •   You can store historical data such as the row count, query times,
           and generally anything you can see in the report.

       To use this feature, you run mk-query-digest with the "--review"
       option.  It will store the fingerprints and other information into
       the table you specify.  Next time you run it with the same option, it
       will do the following:

       •   It won't show you queries you've already reviewed.  A query is
           considered to be already reviewed if you've set a value for the
           "reviewed_by" column.  (If you want to see queries you've already
           reviewed, use the "--report-all" option.)

       •   Queries that you've reviewed, and don't appear in the output,
           will cause gaps in the query number sequence in the first line of
           each paragraph.  And the value you've specified for "--limit"
           will still be honored.  So if you've reviewed all queries in the
           top 10 and you ask for the top 10, you won't see anything in the
           output.

       •   If you want to see the queries you've already reviewed, you can
           specify "--report-all".  Then you'll see the normal analysis
           output, but you'll also see the information from the review
           table, just below the execution time graph.  For example,

             # Review information
             #      comments: really bad IN() subquery, fix soon!
             #    first_seen: 2008-12-01 11:48:57
             #   jira_ticket: 1933
             #     last_seen: 2008-12-18 11:49:07
             #      priority: high
             #   reviewed_by: xaprb
             #   reviewed_on: 2008-12-18 15:03:11

           You can see how useful this meta-data is -- as you analyze your
           queries, you get your comments integrated right into the report.

           If you add the "--review-history" option, it will also store
           information into a separate database table, so you can keep
           historical trending information on classes of queries.

FINGERPRINTS
       A query fingerprint is the abstracted form of a query, which makes it
       possible to group similar queries together.  Abstracting a query
       removes literal values, normalizes whitespace, and so on.  For
       example, consider these two queries:

         SELECT name, password FROM user WHERE id='12823';
         select name,   password from user
            where id=5;

       Both of those queries will fingerprint to

         select name, password from user where id=?

       Once the query's fingerprint is known, we can then talk about a query
       as though it represents all similar queries.

       What "mk-query-digest" does is analogous to a GROUP BY statement in
       SQL.  (But note that "multiple columns" doesn't define a multi-column
       grouping; it defines multiple reports!)  If your command-line looks
       like this,

         mk-query-digest /path/to/slow.log --select Rows_read,Rows_sent \
           --group-by fingerprint --order-by Query_time:sum --limit 10

       The corresponding pseudo-SQL looks like this:

         SELECT WORST(query BY Query_time), SUM(Query_time), ...
         FROM /path/to/slow.log
         GROUP BY FINGERPRINT(query)
         ORDER BY SUM(Query_time) DESC
         LIMIT 10

       You can also use the value "distill", which is a kind of super-
       fingerprint.  See "--group-by" for more.

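       Continuing the GROUP BY analogy, the aggregation can be sketched in
       Python over a list of event dicts (a hypothetical illustration; the
       tool itself is written in Perl):

```python
from collections import defaultdict

def digest(events, limit=10):
    """GROUP BY fingerprint, ORDER BY SUM(Query_time) DESC, LIMIT n,
    keeping the worst (slowest) sample per class."""
    classes = defaultdict(lambda: {"sum": 0.0, "count": 0, "worst": None})
    for event in events:
        cls = classes[event["fingerprint"]]
        cls["sum"] += event["Query_time"]
        cls["count"] += 1
        if cls["worst"] is None or event["Query_time"] > cls["worst"]["Query_time"]:
            cls["worst"] = event
    ranked = sorted(classes.items(), key=lambda kv: kv[1]["sum"], reverse=True)
    return ranked[:limit]

events = [
    {"fingerprint": "select * from t where id=?", "Query_time": 1.2},
    {"fingerprint": "select * from t where id=?", "Query_time": 0.3},
    {"fingerprint": "insert into t values(?)",    "Query_time": 0.4},
]
top = digest(events)
```
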
       When parsing memcached input ("--type" memcached), the fingerprint is
       an abstracted version of the command and key, with placeholders
       removed.  For example, "get user_123_preferences" fingerprints to
       "get user_?_preferences".  There is also a "key_print", which is a
       fingerprinted version of the key.  This example's key_print is
       "user_?_preferences".

       Query fingerprinting accommodates a great many special cases, which
       have proven necessary in the real world.  For example, an IN list
       with 5 literals is really equivalent to one with 4 literals, so lists
       of literals are collapsed to a single one.  If you want to understand
       more about how and why all of these cases are handled, please review
       the test cases in the Subversion repository.  If you find something
       that is not fingerprinted properly, please submit a bug report with a
       reproducible test case.  Here is a list of transformations during
       fingerprinting, which might not be exhaustive:

       •   Group all SELECT queries from mysqldump together, even if they
           are against different tables.  Ditto for all of
           mk-table-checksum's checksum queries.

       •   Shorten multi-value INSERT statements to a single VALUES() list.

       •   Strip comments.

       •   Abstract the databases in USE statements, so all USE statements
           are grouped together.

       •   Replace all literals, such as quoted strings.  For efficiency,
           the code that replaces literal numbers is somewhat non-selective,
           and might replace some things as numbers when they really are
           not.  Hexadecimal literals are also replaced.  NULL is treated as
           a literal.  Numbers embedded in identifiers are also replaced, so
           tables named similarly will be fingerprinted to the same values
           (e.g. users_2009 and users_2010 will fingerprint identically).

       •   Collapse all whitespace into a single space.

       •   Lowercase the entire query.

       •   Replace all literals inside of IN() and VALUES() lists with a
           single placeholder, regardless of cardinality.

       •   Collapse multiple identical UNION queries into a single one.
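
       A toy Python sketch of a few of the transformations above
       (lowercasing, literal replacement, whitespace and IN-list
       collapsing); the real fingerprinting code handles many more cases
       than this:

```python
import re

def toy_fingerprint(query):
    """Apply a small subset of the transformations listed above."""
    q = query.lower()                                         # lowercase
    q = re.sub(r"--[^\n]*|/\*.*?\*/", "", q, flags=re.S)      # strip comments
    q = re.sub(r"'[^']*'|\"[^\"]*\"", "?", q)                 # quoted strings
    q = re.sub(r"\b0x[0-9a-f]+\b|\b\d+\b|\bnull\b", "?", q)   # numbers, hex, NULL
    q = re.sub(r"\bin\s*\(\s*(\?\s*,?\s*)+\)", "in(?+)", q)   # collapse IN lists
    q = re.sub(r"\s+", " ", q).strip()                        # collapse whitespace
    return q
```

       Note this toy version does not replace numbers embedded in
       identifiers, unlike the real tool.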

OPTIONS
       DSN values in "--review-history" default to values in "--review" if
       COPY is yes.

       This tool accepts additional command-line arguments.  Refer to the
       "SYNOPSIS" and usage information for details.

       --apdex-threshold
           type: float; default: 1.0

           Set Apdex target threshold (T) for query response time.  The
           Application Performance Index (Apdex) Technical Specification
           V1.1 defines T as "a positive decimal value in seconds, having no
           more than two significant digits of granularity."  This value
           only applies to query response time (Query_time).

           Options can be abbreviated, so specifying "--apdex-t" also works.

           See <http://www.apdex.org/>.
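
           For reference, the Apdex score itself is computed as
           (satisfied + tolerating/2) / total, where a response is
           satisfying at or below T and tolerable at or below 4T.  A small
           illustrative sketch (not part of the tool):

```python
def apdex_score(response_times, T=1.0):
    """Apdex = (satisfied + tolerating / 2) / total; 'satisfied' means
    t <= T, 'tolerating' means T < t <= 4T, anything slower is
    frustrating."""
    total = len(response_times)
    satisfied = sum(1 for t in response_times if t <= T)
    tolerating = sum(1 for t in response_times if T < t <= 4 * T)
    return (satisfied + tolerating / 2.0) / total

# 2 satisfied, 1 tolerating, 1 frustrated out of 4
score = apdex_score([0.1, 0.5, 2.0, 5.0], T=1.0)
```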

       --ask-pass
           Prompt for a password when connecting to MySQL.

       --attribute-aliases
           type: array; default: db|Schema

           List of attribute|alias,etc.

           Certain attributes have multiple names, like db and Schema.  If
           an event does not have the primary attribute, mk-query-digest
           looks for an alias attribute.  If it finds an alias, it creates
           the primary attribute with the alias attribute's value and
           removes the alias attribute.

           If the event has the primary attribute, all alias attributes are
           deleted.

           This helps simplify event attributes so that, for example, there
           will not be report lines for both db and Schema.
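
           The alias handling described above can be sketched like this
           (hypothetical Python with the db|Schema default; not the tool's
           code):

```python
ALIASES = {"db": ["Schema"]}  # primary attribute -> its aliases

def resolve_aliases(event, aliases=ALIASES):
    """Promote an alias's value when the primary attribute is missing;
    in every case, remove the alias attributes from the event."""
    for primary, names in aliases.items():
        for alias in names:
            if alias in event:
                if primary not in event:
                    event[primary] = event[alias]
                del event[alias]
    return event
```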

       --attribute-value-limit
           type: int; default: 4294967296

           A sanity limit for attribute values.

           This option deals with bugs in slow-logging functionality that
           cause large values for attributes.  If the attribute's value is
           bigger than this, the last-seen value for that class of query is
           used instead.
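
           The rule amounts to: if a value exceeds the limit, substitute
           the last sane value seen for that query class.  A hypothetical
           sketch (the fallback of 0 for a never-seen class is an
           assumption of this sketch):

```python
ATTRIBUTE_VALUE_LIMIT = 4294967296

last_seen = {}  # (query class, attribute name) -> last sane value

def sane_value(query_class, attribute, value, limit=ATTRIBUTE_VALUE_LIMIT):
    """Replace absurdly large values (a symptom of slow-log bugs) with
    the last-seen value for the same class of query."""
    key = (query_class, attribute)
    if value > limit:
        return last_seen.get(key, 0)  # fallback to 0 is an assumption here
    last_seen[key] = value
    return value
```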

       --aux-dsn
           type: DSN

           Auxiliary DSN used for special options.

           The following options may require a DSN even when only parsing a
           slow log file:

             * --since
             * --until

           See each option for why it might require a DSN.

       --charset
           short form: -A; type: string

           Default character set.  If the value is utf8, sets Perl's binmode
           on STDOUT to utf8, passes the mysql_enable_utf8 option to
           DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL.
           Any other value sets binmode on STDOUT without the utf8 layer,
           and runs SET NAMES after connecting to MySQL.

       --check-attributes-limit
           type: int; default: 1000

           Stop checking for new attributes after this many events.

           For better speed, mk-query-digest stops checking events for new
           attributes after a certain number of events.  Any new attributes
           after this number will be ignored and will not be reported.

           One special case is new attributes for pre-existing query classes
           (see "--group-by" about query classes).  New attributes will not
           be added to pre-existing query classes even if the attributes are
           detected before the "--check-attributes-limit" limit.

       --config
           type: Array

           Read this comma-separated list of config files; if specified,
           this must be the first option on the command line.

       --[no]continue-on-error
           default: yes

           Continue parsing even if there is an error.

       --create-review-history-table
           Create the "--review-history" table if it does not exist.

           This option causes the table specified by "--review-history" to
           be created with the default structure shown in the documentation
           for that option.

       --create-review-table
           Create the "--review" table if it does not exist.

           This option causes the table specified by "--review" to be
           created with the default structure shown in the documentation for
           that option.

       --daemonize
           Fork to the background and detach from the shell.  POSIX
           operating systems only.

       --defaults-file
           short form: -F; type: string

           Only read mysql options from the given file.  You must give an
           absolute pathname.

       --embedded-attributes
           type: array

           Two Perl regex patterns to capture pseudo-attributes embedded in
           queries.

           Embedded attributes might be special attribute-value pairs that
           you've hidden in comments.  The first regex should match the
           entire set of attributes (in case there are multiple).  The
           second regex should match and capture attribute-value pairs from
           the first regex.

           For example, suppose your query looks like the following:

             SELECT * from users -- file: /login.php, line: 493;

           You might run mk-query-digest with the following option:

             mk-query-digest --embedded-attributes ' -- .*','(\w+): ([^\,]+)'

           The first regular expression captures the whole comment:

             " -- file: /login.php, line: 493;"

           The second one splits it into attribute-value pairs and adds them
           to the event:

             ATTRIBUTE  VALUE
             =========  ==========
             file       /login.php
             line       493

           NOTE: All commas in the regex patterns must be escaped with \,
           otherwise the pattern will break.
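
           The two-regex mechanism can be mimicked in Python to see what
           gets captured.  The patterns below mirror the Perl example above;
           note that the sample second regex leaves trailing punctuation
           such as ";" in the captured value:

```python
import re

query = "SELECT * from users -- file: /login.php, line: 493;"

# First pattern: match the entire set of embedded attributes.
whole = re.search(r" -- .*", query).group(0)

# Second pattern: capture attribute-value pairs out of that match.
attrs = dict(re.findall(r"(\w+): ([^,]+)", whole))
```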

       --execute
           type: DSN

           Execute queries on this DSN.

           Adds a callback into the chain, after filters but before the
           reports.  Events are executed on this DSN.  If they are
           successful, the time they take to execute overwrites the event's
           Query_time attribute and the original Query_time value (from the
           log) is saved as the Exec_orig_time attribute.  If unsuccessful,
           the callback returns false and terminates the chain.

           If the connection fails, mk-query-digest tries to reconnect once
           per second.

           See also "--mirror" and "--execute-throttle".

       --execute-throttle
           type: array

           Throttle values for "--execute".

           By default "--execute" runs without any limitations or concerns
           for the amount of time that it takes to execute the events.  The
           "--execute-throttle" option allows you to limit the amount of
           time spent doing "--execute" relative to the other processes
           that handle events.  This works by marking some events with a
           "Skip_exec" attribute when "--execute" begins to take too much
           time.  "--execute" will not execute an event if this attribute
           is true.  This indirectly decreases the time spent doing
           "--execute".

           The "--execute-throttle" option takes at least two
           comma-separated values: the maximum allowed "--execute" time as
           a percentage, and a check interval time.  An optional third
           value is a percentage step for increasing and decreasing the
           probability that an event will be marked "Skip_exec" true.  5
           (percent) is the default step.

           For example: "--execute-throttle 70,60,10".  This will limit
           "--execute" to 70% of total event processing time, checked every
           minute (60 seconds), with the probability stepped up and down by
           10%.  When "--execute" exceeds 70%, the probability that events
           will be marked "Skip_exec" true increases by 10%.  "--execute"
           time is checked again after another minute.  If it's still above
           70%, then the probability will increase another 10%.  Or, if
           it's dropped below 70%, then the probability will decrease by
           10%.
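
           The probability stepping can be sketched as follows (a
           hypothetical illustration; the tool's actual bookkeeping
           differs):

```python
def adjust_skip_probability(probability, observed_pct, max_pct=70, step=5):
    """Raise the chance of marking events Skip_exec when --execute time
    exceeds its allowed share; lower it (never below zero) otherwise."""
    if observed_pct > max_pct:
        return min(probability + step, 100)
    return max(probability - step, 0)
```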

       --expected-range
           type: array; default: 5,10

           Explain items when there are more or fewer than expected.

           Defines the number of items expected to be seen in the report
           given by "--[no]report", as controlled by "--limit" and
           "--outliers".  If there are more or fewer items in the report,
           each one will explain why it was included.

       --explain
           type: DSN

           Run EXPLAIN for the sample query with this DSN and print results.

           This works only when "--group-by" includes fingerprint.  It
           causes mk-query-digest to run EXPLAIN and include the output
           into the report.  For safety, queries that appear to have a
           subquery that EXPLAIN will execute won't be EXPLAINed.  Those
           are typically "derived table" queries of the form

             select ... from ( select .... ) der;

           The EXPLAIN results are printed in three places: a sparkline in
           the event header, a full vertical format in the event report,
           and a sparkline in the profile.

           The full format appears at the end of each event report in
           vertical style ("\G") just like MySQL prints it.

           The sparklines (see "SPARKLINES") are compact representations of
           the access type for each table and whether or not "Using
           temporary" or "Using filesort" appear in EXPLAIN.  The
           sparklines look like:

             nr>TF

           That sparkline means that there are two tables, the first uses a
           range (n) access, the second uses a ref access, and both "Using
           temporary" (T) and "Using filesort" (F) appear.  The
           greater-than character just separates table access codes from T
           and/or F.

           The abbreviated table access codes are:

             a  ALL
             c  const
             e  eq_ref
             f  fulltext
             i  index
             m  index_merge
             n  range
             o  ref_or_null
             r  ref
             s  system
             u  unique_subquery

           A capitalized access code means that "Using index" appears in
           EXPLAIN for that table.
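
           Using the codes above, building such a sparkline could look like
           this (an illustrative sketch, not the tool's code):

```python
ACCESS_CODES = {
    "ALL": "a", "const": "c", "eq_ref": "e", "fulltext": "f",
    "index": "i", "index_merge": "m", "range": "n", "ref_or_null": "o",
    "ref": "r", "system": "s", "unique_subquery": "u",
}

def explain_sparkline(tables, using_temporary=False, using_filesort=False):
    """tables: (access_type, using_index) per table, in join order.
    A capitalized code means 'Using index' appeared for that table."""
    codes = ""
    for access_type, using_index in tables:
        code = ACCESS_CODES[access_type]
        codes += code.upper() if using_index else code
    suffix = ("T" if using_temporary else "") + ("F" if using_filesort else "")
    return codes + (">" + suffix if suffix else "")
```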

       --filter
           type: string

           Discard events for which this Perl code doesn't return true.

           This option is a string of Perl code or a file containing Perl
           code that gets compiled into a subroutine with one argument:
           $event.  This is a hashref.  If the given value is a readable
           file, then mk-query-digest reads the entire file and uses its
           contents as the code.  The file should not contain a shebang
           (#!/usr/bin/perl) line.

           If the code returns true, the chain of callbacks continues;
           otherwise it ends.  The code is the last statement in the
           subroutine other than "return $event".  The subroutine template
           is:

             sub { $event = shift; filter && return $event; }

           Filters given on the command line are wrapped inside parentheses
           like "( filter )".  For complex, multi-line filters, you must
           put the code inside a file so it will not be wrapped inside
           parentheses.  Either way, the filter must produce syntactically
           valid code given the template.  For example, an if-else branch
           given on the command line would not be valid:

             --filter 'if () { } else { }'  # WRONG

           Since it's given on the command line, the if-else branch would
           be wrapped inside parentheses, which is not syntactically valid.
           So to accomplish something more complex like this would require
           putting the code in a file, for example filter.txt:

             my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok

           Then specify "--filter filter.txt" to read the code from
           filter.txt.

           If the filter code won't compile, mk-query-digest will die with
           an error.  If the filter code does compile, an error may still
           occur at runtime if the code tries to do something wrong (like
           pattern match an undefined value).  mk-query-digest does not
           provide any safeguards, so code carefully!

           An example filter that discards everything but SELECT
           statements:

             --filter '$event->{arg} =~ m/^select/i'

           This is compiled into a subroutine like the following:

             sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }

           It is permissible for the code to have side effects (to alter
           $event).

           You can find an explanation of the structure of $event at
           <http://code.google.com/p/maatkit/wiki/EventAttributes>.

           Here are more examples of filter code:

           Host/IP matches domain.com
               --filter '($event->{host} || $event->{ip} || "") =~
               m/domain.com/'

               Sometimes MySQL logs the host where the IP is expected.
               Therefore, we check both.

           User matches john
               --filter '($event->{user} || "") =~ m/john/'

           More than 1 warning
               --filter '($event->{Warning_count} || 0) > 1'

           Query does full table scan or full join
               --filter '(($event->{Full_scan} || "") eq "Yes") ||
               (($event->{Full_join} || "") eq "Yes")'

           Query was not served from query cache
               --filter '($event->{QC_Hit} || "") eq "No"'

           Query is 1 MB or larger
               --filter '$event->{bytes} >= 1_048_576'

           Since "--filter" allows you to alter $event, you can use it to
           do other things, like create new attributes.  See "ATTRIBUTES"
           for an example.

       --fingerprints
           Add query fingerprints to the standard query analysis report.
           This is mostly useful for debugging purposes.

       --[no]for-explain
           default: yes

           Print extra information to make analysis easy.

           This option adds code snippets to make it easy to run SHOW
           CREATE TABLE and SHOW TABLE STATUS for the query's tables.  It
           also rewrites non-SELECT queries into a SELECT that might be
           helpful for determining the non-SELECT statement's index usage.

798 --group-by
799 type: Array; default: fingerprint
800
801 Which attribute of the events to group by.
802
803 In general, you can group queries into classes based on any
804 attribute of the query, such as "user" or "db", which will by
805 default show you which users and which databases get the most
806 "Query_time". The default attribute, "fingerprint", groups
807 similar, abstracted queries into classes; see below and see also
808 "FINGERPRINTS".
809
810 A report is printed for each "--group-by" value (unless
811 "--no-report" is given). Therefore, "--group-by user,db" means
812 "report on queries with the same user and report on queries with
813 the same db"--it does not mean "report on queries with the same
814 user and db." See also "OUTPUT".
815
816 Every value must have a corresponding value in the same position in
817 "--order-by". However, adding values to "--group-by" will
818 automatically add values to "--order-by", for your convenience.
819
820 There are several magical values that cause some extra data mining
821 to happen before the grouping takes place:
822
823 fingerprint
824 This causes events to be fingerprinted to abstract queries into
825 a canonical form, which is then used to group events together
826 into a class. See "FINGERPRINTS" for more about
827 fingerprinting.
828
829 tables
830 This causes events to be inspected for what appear to be
831 tables, and then aggregated by that. Note that a query that
832 contains two or more tables will be counted as many times as
833 there are tables; so a join against two tables will count the
834 Query_time against both tables.
835
836 distill
837 This is a sort of super-fingerprint that collapses queries down
838 into a suggestion of what they do, such as "INSERT SELECT
839 table1 table2".
840
841 If parsing memcached input ("--type" memcached), there are other
842 attributes which you can group by: key_print (see memcached section
843 in "FINGERPRINTS"), cmd, key, res and val (see memcached section in
844 "ATTRIBUTES").
845
846 --[no]gzip
847 default: yes
848
849 Gzip "--save-results" files; requires IO::Compress::Gzip.
850
851 --help
852 Show help and exit.
853
854 --host
855 short form: -h; type: string
856
857 Connect to host.
858
859 --ignore-attributes
860 type: array; default: arg, cmd, insert_id, ip, port, Thread_id,
861 timestamp, exptime, flags, key, res, val, server_id, offset,
862 end_log_pos, Xid
863
864 Do not aggregate these attributes when auto-detecting "--select".
865
866 If you do not specify "--select" then mk-query-digest auto-detects
867 and aggregates every attribute that it finds in the slow log. Some
868 attributes, however, should not be aggregated. This option allows
869 you to specify a list of attributes to ignore. This only works
870 when no explicit "--select" is given.
871
872 --inherit-attributes
873 type: array; default: db,ts
874
875 If missing, inherit these attributes from the last event that had
876 them.
877
878 This option sets which attributes are inherited or carried forward
879 to events which do not have them. For example, if one event has
880 the db attribute equal to "foo", but the next event doesn't have
881 the db attribute, then it inherits "foo" for its db attribute.
882
883 Inheritance is usually desirable, but in some cases it might
884 confuse things. If a query inherits a database that it doesn't
885 actually use, then this could confuse "--execute".
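Inheritance can be sketched as a single pass that carries the last seen value forward (events as dicts are an assumption for illustration):

```python
def inherit(events, attrs=("db", "ts")):
    # carry each attribute forward from the last event that had it
    last = {}
    for event in events:
        for attr in attrs:
            if attr in event:
                last[attr] = event[attr]
            elif attr in last:
                event[attr] = last[attr]
        yield event
```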
886
887 --interval
888 type: float; default: .1
889
890 How frequently to poll the processlist, in seconds.
891
892 --iterations
893 type: int; default: 1
894
895 How many times to iterate through the collect-and-report cycle. If
896 0, iterate to infinity. Each iteration runs for "--run-time"
897 amount of time. An iteration is usually determined by an amount of
898 time and a report is printed when that amount of time elapses.
899 With "--run-time-mode" "interval", an interval is instead
900 determined by the interval time you specify with "--run-time". See
901 "--run-time" and "--run-time-mode" for more information.
902
903 --limit
904 type: Array; default: 95%:20
905
906 Limit output to the given percentage or count.
907
908 If the argument is an integer, report only the top N worst queries.
909 If the argument is an integer followed by the "%" sign, report that
910 percentage of the worst queries. If the percentage is followed by
911 a colon and another integer, report the top percentage or the
912 number specified by that integer, whichever comes first.
913
914 The value is actually a comma-separated array of values, one for
915 each item in "--group-by". If you don't specify a value for any of
916 those items, the default is the top 95%.
917
918 See also "--outliers".
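The percent-or-count cutoff can be sketched like this (a simplified reading of the default `95%:20`, assuming classes are already sorted worst-first):

```python
def apply_limit(classes, pct=95, cap=20):
    # classes: [(name, metric), ...] sorted worst-first; keep classes
    # until pct% of the total metric is covered or cap classes are
    # kept, whichever comes first
    total = sum(metric for _, metric in classes) or 1
    kept, covered = [], 0.0
    for name, metric in classes:
        if covered / total * 100 >= pct or len(kept) >= cap:
            break
        kept.append(name)
        covered += metric
    return kept
```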
919
920 --log
921 type: string
922
923 Print all output to this file when daemonized.
924
925 --mirror
926 type: float
927
928 How often to check whether connections should be moved, depending
929 on "read_only". Requires "--processlist" and "--execute".
930
931 This option causes mk-query-digest to check every N seconds whether
932 it is reading from a read-write server and executing against a
933 read-only server, which is a sensible way to set up two servers if
934 you're doing something like master-master replication. The
935 <http://code.google.com/p/mysql-master-master/> master-master
936 toolkit does this. The aim is to keep the passive server ready for
937 failover, which is impossible without putting it under a realistic
938 workload.
939
940 --order-by
941 type: Array; default: Query_time:sum
942
943 Sort events by this attribute and aggregate function.
944
945 This is a comma-separated list of order-by expressions, one for
946 each "--group-by" attribute. The default "Query_time:sum" is used
947 for "--group-by" attributes without explicitly given "--order-by"
948 attributes (that is, if you specify more "--group-by" attributes
949 than corresponding "--order-by" attributes). The syntax is
950 "attribute:aggregate". See "ATTRIBUTES" for valid attributes.
951 Valid aggregates are:
952
953 Aggregate Meaning
954 ========= ============================
955 sum Sum/total attribute value
956 min Minimum attribute value
957 max Maximum attribute value
958 cnt Frequency/count of the query
959
960 For example, the default "Query_time:sum" means that queries in the
961 query analysis report will be ordered (sorted) by their total query
962 execution time ("Exec time"). "Query_time:max" orders the queries
963 by their maximum query execution time, so the query with the single
964              largest "Query_time" will be listed first.  "cnt" refers more to the
965 frequency of the query as a whole, how often it appears; "Count" is
966 its corresponding line in the query analysis report. So any
967 attribute and "cnt" should yield the same report wherein queries
968 are sorted by the number of times they appear.
969
970 When parsing general logs ("--type" "genlog"), the default
971 "--order-by" becomes "Query_time:cnt". General logs do not report
972 query times so only the "cnt" aggregate makes sense because all
973 query times are zero.
974
975 If you specify an attribute that doesn't exist in the events, then
976 mk-query-digest falls back to the default "Query_time:sum" and
977 prints a notice at the beginning of the report for each query
978 class. You can create attributes with "--filter" and order by
979 them; see "ATTRIBUTES" for an example.
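The four aggregates can be sketched as follows (events as dicts of numeric attributes are an illustrative assumption):

```python
def aggregate(events, attr="Query_time", func="sum"):
    # cnt is the number of events in the class; the other
    # aggregates operate on the attribute's values
    if func == "cnt":
        return len(events)
    values = [e[attr] for e in events if attr in e]
    return {"sum": sum, "min": min, "max": max}[func](values)
```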
980
981 --outliers
982 type: array; default: Query_time:1:10
983
984 Report outliers by attribute:percentile:count.
985
986 The syntax of this option is a comma-separated list of colon-
987 delimited strings. The first field is the attribute by which an
988 outlier is defined. The second is a number that is compared to the
989 attribute's 95th percentile. The third is optional, and is
990 compared to the attribute's cnt aggregate. Queries that pass this
991 specification are added to the report, regardless of any limits you
992 specified in "--limit".
993
994 For example, to report queries whose 95th percentile Query_time is
995 at least 60 seconds and which are seen at least 5 times, use the
996 following argument:
997
998 --outliers Query_time:60:5
999
1000 You can specify an --outliers option for each value in
1001 "--group-by".
1002
1003 --password
1004 short form: -p; type: string
1005
1006 Password to use when connecting.
1007
1008 --pid
1009 type: string
1010
1011 Create the given PID file when daemonized. The file contains the
1012 process ID of the daemonized instance. The PID file is removed
1013 when the daemonized instance exits. The program checks for the
1014 existence of the PID file when starting; if it exists and the
1015 process with the matching PID exists, the program exits.
1016
1017 --pipeline-profile
1018 Print a profile of the pipeline processes.
1019
1020 --port
1021 short form: -P; type: int
1022
1023 Port number to use for connection.
1024
1025 --print
1026 Print log events to STDOUT in standard slow-query-log format.
1027
1028 --print-iterations
1029 Print the start time for each "--iterations".
1030
1031 This option causes a line like the following to be printed at the
1032 start of each "--iterations" report:
1033
1034 # Iteration 2 started at 2009-11-24T14:39:48.345780
1035
1036 This line will print even if "--no-report" is specified. If
1037 "--iterations 0" is specified, each iteration number will be 0.
1038
1039 --processlist
1040 type: DSN
1041
1042 Poll this DSN's processlist for queries, with "--interval" sleep
1043 between.
1044
1045 If the connection fails, mk-query-digest tries to reopen it once
1046 per second. See also "--mirror".
1047
1048 --progress
1049 type: array; default: time,30
1050
1051 Print progress reports to STDERR. The value is a comma-separated
1052 list with two parts. The first part can be percentage, time, or
1053 iterations; the second part specifies how often an update should be
1054 printed, in percentage, seconds, or number of iterations.
1055
1056 --read-timeout
1057 type: time; default: 0
1058
1059 Wait this long for an event from the input; 0 to wait forever.
1060
1061 This option sets the maximum time to wait for an event from the
1062 input. It applies to all types of input except "--processlist".
1063 If an event is not received after the specified time, the script
1064 stops reading the input and prints its reports. If "--iterations"
1065 is 0 or greater than 1, the next iteration will begin, else the
1066 script will exit.
1067
1068 This option requires the Perl POSIX module.
1069
1070 --[no]report
1071 default: yes
1072
1073 Print out reports on the aggregate results from "--group-by".
1074
1075 This is the standard slow-log analysis functionality. See "OUTPUT"
1076 for the description of what this does and what the results look
1077 like.
1078
1079 --report-all
1080 Include all queries, even if they have already been reviewed.
1081
1082 --report-format
1083 type: Array; default:
1084 rusage,date,hostname,files,header,profile,query_report,prepared
1085
1086 Print these sections of the query analysis report.
1087
1088 SECTION PRINTS
1089 ============ ======================================================
1090 rusage CPU times and memory usage reported by ps
1091 date Current local date and time
1092 hostname Hostname of machine on which mk-query-digest was run
1093            files         Input files read/parsed
1094 header Summary of the entire analysis run
1095 profile Compact table of queries for an overview of the report
1096 query_report Detailed information about each unique query
1097 prepared Prepared statements
1098
1099 The sections are printed in the order specified. The rusage, date,
1100 files and header sections are grouped together if specified
1101 together; other sections are separated by blank lines.
1102
1103 See "OUTPUT" for more information on the various parts of the query
1104 report.
1105
1106 --report-histogram
1107 type: string; default: Query_time
1108
1109 Chart the distribution of this attribute's values.
1110
1111 The distribution chart is limited to time-based attributes, so
1112 charting "Rows_examined", for example, will produce a useless
1113 chart. Charts look like:
1114
1115 # Query_time distribution
1116 # 1us
1117 # 10us
1118 # 100us
1119 # 1ms
1120 # 10ms ################################
1121 # 100ms ################################################################
1122 # 1s ########
1123 # 10s+
1124
1125 A sparkline (see "SPARKLINES") of the full chart is also printed in
1126 the header for each query event. The sparkline of that full chart
1127 is:
1128
1129 # Query_time sparkline: | .^_ |
1130
1131 The sparkline itself is the 8 characters between the pipes ("|"),
1132 one character for each of the 8 buckets (1us, 10us, etc.) Four
1133 character codes are used to represent the approximate relation
1134            among the buckets' values:
1135
1136 _ . - ^
1137
1138 The caret "^" represents peaks (buckets with the most values), and
1139            the underscore "_" represents lows (buckets with the fewest
1140            values, but at least one).  The period "." and the hyphen "-" represent
1141 buckets with values between these two extremes. If a bucket has no
1142 values, a space is printed. So in the example above, the period
1143 represents the 10ms bucket, the caret the 100ms bucket, and the
1144 underscore the 1s bucket.
1145
1146 See "OUTPUT" for more information.
1147
1148 --review
1149 type: DSN
1150
1151 Store a sample of each class of query in this DSN.
1152
1153 The argument specifies a table to store all unique query
1154 fingerprints in. The table must have at least the following
1155 columns. You can add more columns for your own special purposes,
1156 but they won't be used by mk-query-digest. The following CREATE
1157 TABLE definition is also used for "--create-review-table".
1158 MAGIC_create_review:
1159
1160 CREATE TABLE query_review (
1161 checksum BIGINT UNSIGNED NOT NULL PRIMARY KEY,
1162 fingerprint TEXT NOT NULL,
1163 sample TEXT NOT NULL,
1164 first_seen DATETIME,
1165 last_seen DATETIME,
1166 reviewed_by VARCHAR(20),
1167 reviewed_on DATETIME,
1168 comments TEXT
1169 )
1170
1171 The columns are as follows:
1172
1173 COLUMN MEANING
1174 =========== ===============
1175 checksum A 64-bit checksum of the query fingerprint
1176 fingerprint The abstracted version of the query; its primary key
1177 sample The query text of a sample of the class of queries
1178 first_seen The smallest timestamp of this class of queries
1179 last_seen The largest timestamp of this class of queries
1180 reviewed_by Initially NULL; if set, query is skipped thereafter
1181 reviewed_on Initially NULL; not assigned any special meaning
1182 comments Initially NULL; not assigned any special meaning
1183
1184 Note that the "fingerprint" column is the true primary key for a
1185 class of queries. The "checksum" is just a cryptographic hash of
1186 this value, which provides a shorter value that is very likely to
1187 also be unique.
1188
1189 After parsing and aggregating events, your table should contain a
1190 row for each fingerprint. This option depends on "--group-by
1191 fingerprint" (which is the default). It will not work otherwise.
1192
1193 --review-history
1194 type: DSN
1195
1196 The table in which to store historical values for review trend
1197 analysis.
1198
1199 Each time you review queries with "--review", mk-query-digest will
1200 save information into this table so you can see how classes of
1201 queries have changed over time.
1202
1203 This DSN inherits unspecified values from "--review". It should
1204 mention a table in which to store statistics about each class of
1205 queries. mk-query-digest verifies the existence of the table, and
1206 your privileges to insert, delete and update on that table.
1207
1208 mk-query-digest then inspects the columns in the table. The table
1209 must have at least the following columns:
1210
1211 CREATE TABLE query_review_history (
1212 checksum BIGINT UNSIGNED NOT NULL,
1213 sample TEXT NOT NULL
1214 );
1215
1216 Any columns not mentioned above are inspected to see if they follow
1217 a certain naming convention. The column is special if the name
1218 ends with an underscore followed by any of these MAGIC_history_cols
1219 values:
1220
1221 pct|avt|cnt|sum|min|max|pct_95|stddev|median|rank
1222
1223 If the column ends with one of those values, then the prefix is
1224 interpreted as the event attribute to store in that column, and the
1225 suffix is interpreted as the metric to be stored. For example, a
1226 column named Query_time_min will be used to store the minimum
1227 Query_time for the class of events. The presence of this column
1228 will also add Query_time to the "--select" list.
1229
1230 The table should also have a primary key, but that is up to you,
1231 depending on how you want to store the historical data. We suggest
1232 adding ts_min and ts_max columns and making them part of the
1233 primary key along with the checksum. But you could also just add a
1234 ts_min column and make it a DATE type, so you'd get one row per
1235 class of queries per day.
1236
1237 The default table structure follows. The following
1238 MAGIC_create_review_history table definition is used for
1239 "--create-review-history-table":
1240
1241 CREATE TABLE query_review_history (
1242 checksum BIGINT UNSIGNED NOT NULL,
1243 sample TEXT NOT NULL,
1244 ts_min DATETIME,
1245 ts_max DATETIME,
1246 ts_cnt FLOAT,
1247 Query_time_sum FLOAT,
1248 Query_time_min FLOAT,
1249 Query_time_max FLOAT,
1250 Query_time_pct_95 FLOAT,
1251 Query_time_stddev FLOAT,
1252 Query_time_median FLOAT,
1253 Lock_time_sum FLOAT,
1254 Lock_time_min FLOAT,
1255 Lock_time_max FLOAT,
1256 Lock_time_pct_95 FLOAT,
1257 Lock_time_stddev FLOAT,
1258 Lock_time_median FLOAT,
1259 Rows_sent_sum FLOAT,
1260 Rows_sent_min FLOAT,
1261 Rows_sent_max FLOAT,
1262 Rows_sent_pct_95 FLOAT,
1263 Rows_sent_stddev FLOAT,
1264 Rows_sent_median FLOAT,
1265 Rows_examined_sum FLOAT,
1266 Rows_examined_min FLOAT,
1267 Rows_examined_max FLOAT,
1268 Rows_examined_pct_95 FLOAT,
1269 Rows_examined_stddev FLOAT,
1270 Rows_examined_median FLOAT,
1271 -- Percona extended slowlog attributes
1272 -- http://www.percona.com/docs/wiki/patches:slow_extended
1273 Rows_affected_sum FLOAT,
1274 Rows_affected_min FLOAT,
1275 Rows_affected_max FLOAT,
1276 Rows_affected_pct_95 FLOAT,
1277 Rows_affected_stddev FLOAT,
1278 Rows_affected_median FLOAT,
1279 Rows_read_sum FLOAT,
1280 Rows_read_min FLOAT,
1281 Rows_read_max FLOAT,
1282 Rows_read_pct_95 FLOAT,
1283 Rows_read_stddev FLOAT,
1284 Rows_read_median FLOAT,
1285 Merge_passes_sum FLOAT,
1286 Merge_passes_min FLOAT,
1287 Merge_passes_max FLOAT,
1288 Merge_passes_pct_95 FLOAT,
1289 Merge_passes_stddev FLOAT,
1290 Merge_passes_median FLOAT,
1291 InnoDB_IO_r_ops_min FLOAT,
1292 InnoDB_IO_r_ops_max FLOAT,
1293 InnoDB_IO_r_ops_pct_95 FLOAT,
1294 InnoDB_IO_r_ops_stddev FLOAT,
1295 InnoDB_IO_r_ops_median FLOAT,
1296 InnoDB_IO_r_bytes_min FLOAT,
1297 InnoDB_IO_r_bytes_max FLOAT,
1298 InnoDB_IO_r_bytes_pct_95 FLOAT,
1299 InnoDB_IO_r_bytes_stddev FLOAT,
1300 InnoDB_IO_r_bytes_median FLOAT,
1301 InnoDB_IO_r_wait_min FLOAT,
1302 InnoDB_IO_r_wait_max FLOAT,
1303 InnoDB_IO_r_wait_pct_95 FLOAT,
1304 InnoDB_IO_r_wait_stddev FLOAT,
1305 InnoDB_IO_r_wait_median FLOAT,
1306 InnoDB_rec_lock_wait_min FLOAT,
1307 InnoDB_rec_lock_wait_max FLOAT,
1308 InnoDB_rec_lock_wait_pct_95 FLOAT,
1309 InnoDB_rec_lock_wait_stddev FLOAT,
1310 InnoDB_rec_lock_wait_median FLOAT,
1311 InnoDB_queue_wait_min FLOAT,
1312 InnoDB_queue_wait_max FLOAT,
1313 InnoDB_queue_wait_pct_95 FLOAT,
1314 InnoDB_queue_wait_stddev FLOAT,
1315 InnoDB_queue_wait_median FLOAT,
1316 InnoDB_pages_distinct_min FLOAT,
1317 InnoDB_pages_distinct_max FLOAT,
1318 InnoDB_pages_distinct_pct_95 FLOAT,
1319 InnoDB_pages_distinct_stddev FLOAT,
1320 InnoDB_pages_distinct_median FLOAT,
1321 -- Boolean (Yes/No) attributes. Only the cnt and sum are needed for these.
1322            -- cnt is how many times the attribute was recorded and sum is how many of
1323 -- those times the value was Yes. Therefore sum/cnt * 100 = % of recorded
1324 -- times that the value was Yes.
1325 QC_Hit_cnt FLOAT,
1326 QC_Hit_sum FLOAT,
1327 Full_scan_cnt FLOAT,
1328 Full_scan_sum FLOAT,
1329 Full_join_cnt FLOAT,
1330 Full_join_sum FLOAT,
1331 Tmp_table_cnt FLOAT,
1332 Tmp_table_sum FLOAT,
1333 Disk_tmp_table_cnt FLOAT,
1334 Disk_tmp_table_sum FLOAT,
1335 Filesort_cnt FLOAT,
1336 Filesort_sum FLOAT,
1337 Disk_filesort_cnt FLOAT,
1338 Disk_filesort_sum FLOAT,
1339 PRIMARY KEY(checksum, ts_min, ts_max)
1340 );
1341
1342 Note that we store the count (cnt) for the ts attribute only; it
1343            would be redundant to store it for other attributes.
1344
1345 --run-time
1346 type: time
1347
1348 How long to run for each "--iterations". The default is to run
1349 forever (you can interrupt with CTRL-C). Because "--iterations"
1350 defaults to 1, if you only specify "--run-time", mk-query-digest
1351 runs for that amount of time and then exits. The two options are
1352 specified together to do collect-and-report cycles. For example,
1353 specifying "--iterations" 4 "--run-time" "15m" with a continuous
1354 input (like STDIN or "--processlist") will cause mk-query-digest to
1355 run for 1 hour (15 minutes x 4), reporting four times, once at each
1356 15 minute interval.
1357
1358 --run-time-mode
1359 type: string; default: clock
1360
1361 Set what the value of "--run-time" operates on. Following are the
1362 possible values for this option:
1363
1364 clock
1365 "--run-time" specifies an amount of real clock time during
1366 which the tool should run for each "--iterations".
1367
1368 event
1369 "--run-time" specifies an amount of log time. Log time is
1370 determined by timestamps in the log. The first timestamp seen
1371 is remembered, and each timestamp after that is compared to the
1372 first to determine how much log time has passed. For example,
1373 if the first timestamp seen is "12:00:00" and the next is
1374 "12:01:30", that is 1 minute and 30 seconds of log time. The
1375 tool will read events until the log time is greater than or
1376 equal to the specified "--run-time" value.
1377
1378 Since timestamps in logs are not always printed, or not always
1379 printed frequently, this mode varies in accuracy.
1380
1381 interval
1382 "--run-time" specifies interval boundaries of log time into
1383 which events are divided and reports are generated. This mode
1384 is different from the others because it doesn't specify how
1385 long to run. The value of "--run-time" must be an interval
1386 that divides evenly into minutes, hours or days. For example,
1387 "5m" divides evenly into hours (60/5=12, so 12 5 minutes
1388 intervals per hour) but "7m" does not (60/7=8.6).
1389
1390 Specifying "--run-time-mode interval --run-time 30m
1391 --iterations 0" is similar to specifying "--run-time-mode clock
1392 --run-time 30m --iterations 0". In the latter case, mk-query-
1393 digest will run forever, producing reports every 30 minutes,
1394 but this only works effectively with continuous inputs like
1395 STDIN and the processlist. For fixed inputs, like log files,
1396 the former example produces multiple reports by dividing the
1397                log into 30-minute intervals based on timestamps.
1398
1399 Intervals are calculated from the zeroth second/minute/hour in
1400 which a timestamp occurs, not from whatever time it specifies.
1401 For example, with 30 minute intervals and a timestamp of
1402 "12:10:30", the interval is not "12:10:30" to "12:40:30", it is
1403 "12:00:00" to "12:29:59". Or, with 1 hour intervals, it is
1404 "12:00:00" to "12:59:59". When a new timestamp exceeds the
1405 interval, a report is printed, and the next interval is
1406 recalculated based on the new timestamp.
1407
1408 Since "--iterations" is 1 by default, you probably want to
1409                specify a new value, else mk-query-digest will only get and
1410 report on the first interval from the log since 1 interval = 1
1411 iteration. If you want to get and report every interval in a
1412 log, specify "--iterations" 0.
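The interval alignment described above can be sketched as simple modular arithmetic on the timestamp (Unix seconds are an illustrative choice):

```python
def interval_bounds(ts, minutes=30):
    # align to the zeroth second of the containing interval, e.g. a
    # timestamp of 12:10:30 with 30-minute intervals aligns to
    # 12:00:00 .. 12:29:59
    width = minutes * 60
    start = ts - ts % width
    return start, start + width - 1
```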
1413
1414 --sample
1415 type: int
1416
1417 Filter out all but the first N occurrences of each query. The
1418 queries are filtered on the first value in "--group-by", so by
1419 default, this will filter by query fingerprint. For example,
1420 "--sample 2" will permit two sample queries for each fingerprint.
1421 Useful in conjunction with "--print" to print out the queries. You
1422 probably want to set "--no-report" to avoid the overhead of
1423 aggregating and reporting if you're just using this to print out
1424 samples of queries. A complete example:
1425
1426 mk-query-digest --sample 2 --no-report --print slow.log
1427
1428 --save-results
1429 type: string
1430
1431 Save results to the specified file.
1432
1433 If "--[no]gzip" is true (by default it is) then .gz is appended to
1434 the file name.
1435
1436 --select
1437 type: Array
1438
1439 Compute aggregate statistics for these attributes.
1440
1441 By default mk-query-digest auto-detects, aggregates and prints
1442 metrics for every query attribute that it finds in the slow query
1443 log. This option specifies a list of only the attributes that you
1444 want. You can specify an alternative attribute with a colon. For
1445 example, "db:Schema" uses db if it's available, and Schema if it's
1446 not.
1447
1448 Previously, mk-query-digest only aggregated these attributes:
1449
1450 Query_time,Lock_time,Rows_sent,Rows_examined,user,db:Schema,ts
1451
1452 Attributes specified in the "--review-history" table will always be
1453 selected even if you do not specify "--select".
1454
1455 See also "--ignore-attributes" and "ATTRIBUTES".
1456
1457 --set-vars
1458 type: string; default: wait_timeout=10000
1459
1460 Set these MySQL variables. Immediately after connecting to MySQL,
1461 this string will be appended to SET and executed.
1462
1463 --shorten
1464 type: int; default: 1024
1465
1466 Shorten long statements in reports.
1467
1468 Shortens long statements, replacing the omitted portion with a
1469 "/*... omitted ...*/" comment. This applies only to the output in
1470 reports, not to information stored for "--review" or other places.
1471 It prevents a large statement from causing difficulty in a report.
1472 The argument is the preferred length of the shortened statement.
1473 Not all statements can be shortened, but very large INSERT and
1474 similar statements often can; and so can IN() lists, although only
1475 the first such list in the statement will be shortened.
1476
1477 If it shortens something beyond recognition, you can find the
1478 original statement in the log, at the offset shown in the report
1479 header (see "OUTPUT").
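A naive version of this truncation (the real tool is smarter about INSERT bodies and IN() lists) might look like:

```python
def shorten(stmt, limit=1024, marker="/*... omitted ...*/"):
    # keep the head and tail, replacing the middle with a comment
    if len(stmt) <= limit:
        return stmt
    keep = max(limit - len(marker), 0)
    head, tail = keep // 2, keep - keep // 2
    return stmt[:head] + marker + stmt[len(stmt) - tail:]
```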
1480
1481 --show-all
1482 type: Hash
1483
1484 Show all values for these attributes.
1485
1486            By default mk-query-digest only shows as many of an attribute's
1487            values as fit on a single line.  This option allows you to specify
1488 attributes for which all values will be shown (line width is
1489 ignored). This only works for attributes with string values like
1490 user, host, db, etc. Multiple attributes can be specified, comma-
1491 separated.
1492
1493 --since
1494 type: string
1495
1496 Parse only queries newer than this value (parse queries since this
1497 date).
1498
1499 This option allows you to ignore queries older than a certain value
1500 and parse only those queries which are more recent than the value.
1501 The value can be several types:
1502
1503 * Simple time value N with optional suffix: N[shmd], where
1504 s=seconds, h=hours, m=minutes, d=days (default s if no suffix
1505 given); this is like saying "since N[shmd] ago"
1506 * Full date with optional hours:minutes:seconds:
1507            YYYY-MM-DD [HH:MM:SS]
1508 * Short, MySQL-style date:
1509 YYMMDD [HH:MM:SS]
1510 * Any time expression evaluated by MySQL:
1511 CURRENT_DATE - INTERVAL 7 DAY
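The suffix form can be sketched as follows (date forms would need a real date parser; this handles only N[shmd]):

```python
import re
import time

def since_to_epoch(value, now=None):
    # "since N[shmd] ago" -> absolute Unix timestamp; no suffix means
    # seconds, matching the documented default
    now = time.time() if now is None else now
    m = re.fullmatch(r"(\d+)([shmd]?)", value)
    if not m:
        raise ValueError("not a simple N[shmd] value: " + value)
    n, unit = int(m.group(1)), m.group(2) or "s"
    return now - n * {"s": 1, "m": 60, "h": 3600, "d": 86400}[unit]
```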
1512
1513 If you give a MySQL time expression, then you must also specify a
1514 DSN so that mk-query-digest can connect to MySQL to evaluate the
1515 expression. If you specify "--execute", "--explain",
1516 "--processlist", "--review" or "--review-history", then one of
1517 these DSNs will be used automatically. Otherwise, you must specify
1518 an "--aux-dsn" or mk-query-digest will die saying that the value is
1519 invalid.
1520
1521 The MySQL time expression is wrapped inside a query like "SELECT
1522 UNIX_TIMESTAMP(<expression>)", so be sure that the expression is
1523 valid inside this query. For example, do not use UNIX_TIMESTAMP()
1524 because UNIX_TIMESTAMP(UNIX_TIMESTAMP()) returns 0.
1525
1526            Events are assumed to be in chronological order--older events at the
1527 beginning of the log and newer events at the end of the log.
1528 "--since" is strict: it ignores all queries until one is found that
1529 is new enough. Therefore, if the query events are not consistently
1530 timestamped, some may be ignored which are actually new enough.
1531
1532 See also "--until".
1533
1534 --socket
1535 short form: -S; type: string
1536
1537 Socket file to use for connection.
1538
1539 --statistics
1540 Print statistics about internal counters. This option is mostly
1541 for development and debugging. The statistics report is printed
1542 for each iteration after all other reports, even if no events are
1543 processed or "--no-report" is specified. The statistics report
1544 looks like:
1545
1546 # No events processed.
1547
1548 # Statistic Count %/Events
1549 # ================================================ ====== ========
1550 # events_read 142030 100.00
1551 # events_parsed 50430 35.51
1552 # events_aggregated 0 0.00
1553 # ignored_midstream_server_response 18111 12.75
1554 # no_tcp_data 91600 64.49
1555 # pipeline_restarted_after_MemcachedProtocolParser 142030 100.00
1556 # pipeline_restarted_after_TcpdumpParser 1 0.00
1557 # unknown_client_command 1 0.00
1558 # unknown_client_data 32318 22.75
1559
1560 The first column is the internal counter name; the second column is
1561 counter's count; and the third column is the count as a percentage
1562 of "events_read".
1563
1564 In this case, it shows why no events were processed/aggregated:
1565 100% of events were rejected by the "MemcachedProtocolParser". Of
1566            those, 35.51% were data packets; of these, 12.75% were ignored
1567            mid-stream server responses, one was an unknown client command, and
1568 22.75% were unknown client data. The other 64.49% were TCP control
1569 packets (probably most ACKs).
1570
1571 Since mk-query-digest is complex, you will probably need someone
1572 familiar with its code to decipher the statistics report.
1573
1574 --table-access
1575 Print a table access report.
1576
1577 The table access report shows which tables are accessed by all the
1578 queries and if the access is a read or write. The report looks
1579 like:
1580
1581 write `baz`.`tbl`
1582 read `baz`.`new_tbl`
1583 write `baz`.`tbl3`
1584 write `db6`.`tbl6`
1585
1586 If you pipe the output to sort, the read and write tables will be
1587 grouped together and sorted alphabetically:
1588
1589 read `baz`.`new_tbl`
1590 write `baz`.`tbl`
1591 write `baz`.`tbl3`
1592 write `db6`.`tbl6`
1593
       --tcpdump-errors
           type: string

           Write the tcpdump data to this file on error.  If
           mk-query-digest doesn't parse the stream correctly for some
           reason, the session's packets since the last query event will be
           written out to create a usable test case.  If this happens,
           mk-query-digest will not raise an error; it will just discard
           the session's saved state and permit the tool to continue
           working.  See "tcpdump" for more information about parsing
           tcpdump output.

       --timeline
           Show a timeline of events.

           This option makes mk-query-digest print another kind of report:
           a timeline of the events.  Each query is still grouped and
           aggregated into classes according to "--group-by", but the
           classes are then printed in chronological order.  The timeline
           report prints the timestamp, interval, count, and value of each
           class.

           If all you want is the timeline report, then specify
           "--no-report" to suppress the default query analysis report.
           Otherwise, the timeline report will be printed at the end,
           before the response-time profile (see "--report-format" and
           "OUTPUT").

           For example, this:

             mk-query-digest /path/to/log --group-by distill --timeline

           will print something like:

             # ########################################################
             # distill report
             # ########################################################
             # 2009-07-25 11:19:27 1+00:00:01 2 SELECT foo
             # 2009-07-27 11:19:30      00:01 2 SELECT bar
             # 2009-07-27 11:30:00 1+06:30:00 2 SELECT foo

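           The collapsing behind the timeline report can be pictured as
           follows.  This is an illustrative Python sketch, not the tool's
           implementation: consecutive events in the same class are merged
           into one line carrying the first timestamp, the interval between
           first and last event, and the count.  The event tuples and
           output format are assumptions.

```python
from datetime import datetime

def timeline(events):
    """Collapse runs of consecutive same-class events into
    (first_timestamp, interval, count, class) tuples."""
    fmt = "%Y-%m-%d %H:%M:%S"
    rows, i = [], 0
    while i < len(events):
        j = i
        # Extend the run while the next event belongs to the same class.
        while j + 1 < len(events) and events[j + 1][1] == events[i][1]:
            j += 1
        first = datetime.strptime(events[i][0], fmt)
        last = datetime.strptime(events[j][0], fmt)
        rows.append((events[i][0], str(last - first), j - i + 1, events[i][1]))
        i = j + 1
    return rows

events = [
    ("2009-07-25 11:19:27", "SELECT foo"),
    ("2009-07-26 11:19:28", "SELECT foo"),
    ("2009-07-27 11:19:30", "SELECT bar"),
]
for ts, interval, count, cls in timeline(events):
    print(f"# {ts} {interval:>15} {count} {cls}")
```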
       --type
           type: Array

           The type of input to parse (default slowlog).  The permitted
           types are:

           binlog
               Parse a binary log file.

           genlog
               Parse a MySQL general log file.  General logs lack a lot of
               "ATTRIBUTES", notably "Query_time".  The default
               "--order-by" for general logs changes to "Query_time:cnt".

           http
               Parse HTTP traffic from tcpdump.

           pglog
               Parse a log file in PostgreSQL format.  The parser will
               automatically recognize logs sent to syslog and
               transparently parse the syslog format, too.  The recommended
               configuration for logging in your postgresql.conf is as
               follows.

               The log_destination setting can be set to either syslog or
               stderr.  Logging to syslog has the added benefit that log
               messages from concurrent sessions are not interleaved
               (something the parser cannot handle), so it may be a better
               choice than stderr.  CSV-formatted logs are not supported at
               this time.

               The log_min_duration_statement setting should be set to 0 to
               capture all statements with their durations.  Alternatively,
               the parser will also recognize and handle various
               combinations of log_duration and log_statement.

               You may enable log_connections and log_disconnections, but
               this is optional.

               It is highly recommended to set your log_line_prefix to the
               following:

                 log_line_prefix = '%m c=%c,u=%u,D=%d '

               This lets the parser find timestamps with milliseconds,
               session IDs, users, and databases in the log.  If these
               items are missing, you'll simply get less information to
               analyze.  For compatibility with other log analysis tools
               such as PQA and pgfouine, various log line prefix formats
               are supported.  The general format is as follows: a
               timestamp can be detected and extracted (the syslog
               timestamp is NOT parsed), and a name=value list of
               properties can also be extracted.  Although the suggested
               format is shown above, any name=value list will be captured
               and interpreted by using the first letter of the 'name'
               part, lowercased, to determine the meaning of the item.  The
               lowercased first letter is interpreted to mean the same
               thing as PostgreSQL's built-in %-codes for the
               log_line_prefix format string.  For example, u means user,
               so unicorn=fred will be interpreted as user=fred; d means
               database, so D=john will be interpreted as database=john.
               The pgfouine-suggested formatting is user=%u and db=%d, so
               it should Just Work regardless of which format you choose.
               The main thing is to add as much information as possible
               into the log_line_prefix to permit richer analysis.

               Currently, only English locale messages are supported, so if
               your server's locale is set to something else, the log won't
               be parsed properly.  (Log messages with "duration:" and
               "statement:" won't be recognized.)

           slowlog
               Parse a log file in any variation of MySQL slow-log format.

           tcpdump
               Inspect network packets and decode the MySQL client
               protocol, extracting queries and responses from it.

               mk-query-digest does not actually watch the network (i.e. it
               does NOT "sniff packets").  Instead, it just parses the
               output of tcpdump.  You are responsible for generating this
               output; mk-query-digest does not do it for you.  Then you
               send this to mk-query-digest as you would any log file: as
               files on the command line or on STDIN.

               The parser expects the input to be formatted with the
               following options: "-x -n -q -tttt".  For example, if you
               want to capture output from your local machine, you can do
               something like the following (the port must come last on
               FreeBSD):

                 tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 \
                   > mysql.tcp.txt
                 mk-query-digest --type tcpdump mysql.tcp.txt

               The other tcpdump parameters, such as -s, -c, and -i, are up
               to you.  Just make sure the output looks like this (there is
               a line break in the first line to avoid man-page problems):

                 2009-04-12 09:50:16.804849 IP 127.0.0.1.42167
                   > 127.0.0.1.3306: tcp 37
                 0x0000:  4508 0059 6eb2 4000 4006 cde2 7f00 0001
                 0x0010:  ....

               Remember that tcpdump has a handy -c option to stop after it
               captures some number of packets!  That's very useful for
               testing your tcpdump command.  Note that tcpdump can't
               capture traffic on a Unix socket.  Read
               <http://bugs.mysql.com/bug.php?id=31577> if you're confused
               about this.

               Devananda Van Der Veen explained on the MySQL Performance
               Blog how to capture traffic without dropping packets on busy
               servers.  Dropped packets cause mk-query-digest to miss the
               response to a request, then see the response to a later
               request and assign the wrong execution time to the query.
               You can change the filter to something like the following to
               help capture a subset of the queries.  (See
               <http://www.mysqlperformanceblog.com/?p=6092> for details.)

                 tcpdump -i any -s 65535 -x -n -q -tttt \
                   'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2'

               All MySQL servers running on port 3306 are automatically
               detected in the tcpdump output.  Therefore, if the tcpdump
               output contains packets from multiple servers on port 3306
               (for example, 10.0.0.1:3306, 10.0.0.2:3306, etc.), all
               packets/queries from all these servers will be analyzed
               together as if they were one server.

               If you're analyzing traffic for a MySQL server that is not
               running on port 3306, see "--watch-server".

               Also note that mk-query-digest may fail to report the
               database for queries when parsing tcpdump output.  The
               database is discovered only in the initial connect events
               for a new client or when "USE db" is executed.  If the
               tcpdump output contains neither of these, then
               mk-query-digest cannot discover the database.

               Server-side prepared statements are supported.
               SSL-encrypted traffic cannot be inspected and decoded.

           memcached
               Similar to tcpdump, but the expected input is memcached
               packets instead of MySQL packets.  For example:

                 tcpdump -i any port 11211 -s 65535 -x -nn -q -tttt \
                   > memcached.tcp.txt
                 mk-query-digest --type memcached memcached.tcp.txt

               memcached uses port 11211 by default.

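           The name=value rule described under "pglog" above can be
           sketched in a few lines of Python.  This is illustrative only,
           not mk-query-digest's code; the mapping is a small, assumed
           subset of PostgreSQL's %-codes.

```python
import re

# The lowercased first letter of each name selects the meaning, as with
# PostgreSQL's log_line_prefix %-codes (u=user, d=database, c=session).
# This mapping is a partial, illustrative subset.
MEANING = {"u": "user", "d": "database", "c": "session_id"}

def parse_prefix(prefix):
    props = {}
    for name, value in re.findall(r"(\w+)=([\w.]+)", prefix):
        key = MEANING.get(name[0].lower())
        if key is not None:
            props[key] = value
    return props

# Both the suggested format and pgfouine-style names parse the same way:
print(parse_prefix("c=4b7074b4.985,u=fred,D=john"))
print(parse_prefix("user=fred,db=john"))
```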
       --until
           type: string

           Parse only queries older than this value (parse queries until
           this date).

           This option allows you to ignore queries newer than a certain
           value and parse only those queries which are older than the
           value.  The value can be one of the same types listed for
           "--since".

           Unlike "--since", "--until" is not strict: all queries are
           parsed until one has a timestamp that is equal to or greater
           than "--until".  Then all subsequent queries are ignored.

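           The non-strict behavior can be illustrated with a short Python
           sketch (the event tuples are hypothetical): parsing stops at the
           first event whose timestamp reaches the cutoff, even if later
           events are older.

```python
def parse_until(events, until):
    """Keep events until the first one whose timestamp is >= until;
    everything after that point is ignored, even if it is older."""
    kept = []
    for ts, query in events:
        if ts >= until:
            break  # first too-new event stops parsing entirely
        kept.append(query)
    return kept

events = [(1, "q1"), (5, "q2"), (3, "q3"), (7, "q4")]
print(parse_until(events, 5))  # ['q1'] -- "q3" is older than 5 but ignored
```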
       --user
           short form: -u; type: string

           User for login if not current user.

       --variations
           type: Array

           Report the number of variations in these attributes' values.

           Variations show how many distinct values an attribute had within
           a class.  The usual value for this option is "arg", which shows
           how many distinct queries were in the class.  This can be useful
           for determining a query's cacheability.

           Distinct values are determined by CRC32 checksums of the
           attributes' values.  These checksums are reported in the query
           report for attributes specified by this option, like:

             # arg crc      109 (1/25%), 144 (1/25%)... 2 more

           In that class there were 4 distinct queries.  The checksums of
           the first two variations are shown, and each one occurred once
           (or 25% of the time).

           The count of distinct variations is approximate because only
           1,000 variations are saved.  The mod (%) 1000 of the full CRC32
           checksum is saved, so some distinct checksums are treated as
           equal.

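           The mod-1000 bucketing can be sketched as follows; this is a
           minimal sketch assuming plain strings as attribute values, not
           the tool's code:

```python
import zlib

def count_variations(values):
    """Count distinct values by CRC32 checksum mod 1000, as described
    above: at most 1,000 buckets exist, so rare collisions undercount."""
    return len({zlib.crc32(v.encode()) % 1000 for v in values})

queries = ["select 1", "select 2", "select 1", "select 3"]
print(count_variations(queries))  # distinct count; approximate by design
```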
       --version
           Show version and exit.

       --watch-server
           type: string

           This option tells mk-query-digest which server IP address and
           port (like "10.0.0.1:3306") to watch when parsing tcpdump (for
           "--type" tcpdump and memcached); all other servers are ignored.
           If you don't specify it, mk-query-digest watches all servers by
           looking for any IP address using port 3306 or the "mysql" port.
           If you're watching a server with a non-standard port, this won't
           work, so you must specify the IP address and port to watch.

           If you want to watch a mix of servers, some running on the
           standard port 3306 and some running on non-standard ports, you
           need to create separate tcpdump outputs for the non-standard
           port servers and then specify this option for each.  At present
           mk-query-digest cannot auto-detect servers on port 3306 and also
           be told to watch a server on a non-standard port.

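           The effect of "--watch-server" can be pictured as a simple
           endpoint filter.  This Python sketch is illustrative only; the
           packet tuples are assumptions, not the tool's internal format.

```python
def watch(packets, server):
    """Keep only packets that involve the watched (ip, port) endpoint,
    as either source or destination; all other servers are ignored."""
    return [p for p in packets if server in p]

packets = [
    (("10.0.0.1", 3306), ("10.0.0.9", 42167)),   # watched server
    (("10.0.0.2", 12345), ("10.0.0.9", 42168)),  # non-standard port, ignored
]
print(watch(packets, ("10.0.0.1", 3306)))
```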
       --[no]zero-admin
           default: yes

           Zero out the Rows_XXX properties for administrator command
           events.

       --[no]zero-bool
           default: yes

           Print 0% boolean values in the report.

DSN OPTIONS
       These DSN options are used to create a DSN.  Each option is given
       like "option=value".  The options are case-sensitive, so P and p are
       not the same option.  There cannot be whitespace before or after the
       "=", and if the value contains whitespace it must be quoted.  DSN
       options are comma-separated.  See the maatkit manpage for full
       details.

       • A

         dsn: charset; copy: yes

         Default character set.

       • D

         dsn: database; copy: yes

         Database that contains the query review table.

       • F

         dsn: mysql_read_default_file; copy: yes

         Only read default options from the given file.

       • h

         dsn: host; copy: yes

         Connect to host.

       • p

         dsn: password; copy: yes

         Password to use when connecting.

       • P

         dsn: port; copy: yes

         Port number to use for connection.

       • S

         dsn: mysql_socket; copy: yes

         Socket file to use for connection.

       • t

         Table to use as the query review table.

       • u

         dsn: user; copy: yes

         User for login if not current user.

DOWNLOADING
       You can download Maatkit from Google Code at
       <http://code.google.com/p/maatkit/>, or you can get any of the tools
       easily with a command like the following:

         wget http://www.maatkit.org/get/toolname
         or
         wget http://www.maatkit.org/trunk/toolname

       Where "toolname" can be replaced with the name (or fragment of a
       name) of any of the Maatkit tools.  Once downloaded, they're ready
       to run; no installation is needed.  The first URL gets the latest
       released version of the tool, and the second gets the latest trunk
       code from Subversion.

ENVIRONMENT
       The environment variable "MKDEBUG" enables verbose debugging output
       in all of the Maatkit tools:

         MKDEBUG=1 mk-....

SYSTEM REQUIREMENTS
       You need Perl and some core packages that ought to be installed in
       any reasonably new version of Perl.

BUGS
       For a list of known bugs see
       <http://www.maatkit.org/bugs/mk-query-digest>.

       Please use Google Code Issues and Groups to report bugs or request
       support: <http://code.google.com/p/maatkit/>.  You can also join
       #maatkit on Freenode to discuss Maatkit.

       Please include the complete command-line used to reproduce the
       problem you are seeing, the version of all MySQL servers involved,
       the complete output of the tool when run with "--version", and if
       possible, debugging output produced by running with the "MKDEBUG=1"
       environment variable.

COPYRIGHT, LICENSE AND WARRANTY
       This program is copyright 2007-2011 Baron Schwartz.  Feedback and
       improvements are welcome.

       THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
       WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
       MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

       This program is free software; you can redistribute it and/or modify
       it under the terms of the GNU General Public License as published by
       the Free Software Foundation, version 2; OR the Perl Artistic
       License.  On UNIX and similar systems, you can issue `man perlgpl'
       or `man perlartistic' to read these licenses.

       You should have received a copy of the GNU General Public License
       along with this program; if not, write to the Free Software
       Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
       USA.

AUTHORS
       Baron Schwartz, Daniel Nichter

ABOUT MAATKIT
       This tool is part of Maatkit, a toolkit for power users of MySQL.
       Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are
       the primary code contributors.  Both are employed by Percona.
       Financial support for Maatkit development is primarily provided by
       Percona and its clients.

VERSION
       This manual page documents Ver 0.9.29 Distrib 7540 $Revision: 7531
       $.



perl v5.38.0                      2023-07-20               MK-QUERY-DIGEST(1)