1 MK-QUERY-DIGEST(1)      User Contributed Perl Documentation     MK-QUERY-DIGEST(1)
2
3
4
5 NAME
6 mk-query-digest - Parses logs and more. Analyze, transform, filter,
7 review and report on queries.
8
9 SYNOPSIS
10 Analyze, aggregate, and report on a slow query log:
11
12 mk-query-digest /path/to/slow.log
13
14 Review a slow log, saving results to the test.query_review table in a
15 MySQL server running on host1. See "--review" for more on reviewing
16 queries:
17
18 mk-query-digest --review h=host1,D=test,t=query_review /path/to/slow.log
19
20 Filter out everything but SELECT queries, replay the queries against
21 another server, then use the timings from replaying them to analyze
22 their performance:
23
24 mk-query-digest /path/to/slow.log --execute h=another_server \
25 --filter '$event->{fingerprint} =~ m/^select/'
26
27 Print the structure of events so you can construct a complex
28 "--filter":
29
30 mk-query-digest /path/to/slow.log --no-report \
31 --filter 'print Dumper($event)'
32
33 Watch SHOW FULL PROCESSLIST and output a log in slow query log format:
34
35 mk-query-digest --processlist h=host1 --print --no-report
36
37 The default aggregation and analysis is CPU and memory intensive.
38 Disable it if you don't need the default report:
39
40 mk-query-digest <arguments> --no-report
41
42 RISKS
43 The following section is included to inform users about the potential
44 risks, whether known or unknown, of using this tool. The two main
45 categories of risks are those created by the nature of the tool (e.g.
46 read-only tools vs. read-write tools) and those created by bugs.
47
48 By default mk-query-digest merely collects and aggregates data from the
49 files specified. It is designed to be as efficient as possible, but
50 depending on the input you give it, it can use a lot of CPU and memory.
51 Practically speaking, it is safe to run even on production systems, but
52 you might want to monitor it until you are satisfied that the input you
53 give it does not cause undue load.
54
55 Various options will cause mk-query-digest to insert data into tables,
56 execute SQL queries, and so on. These include the "--execute" option
57 and "--review".
58
59 At the time of this release, we know of no bugs that could cause
60 serious harm to users.
61
62 The authoritative source for updated information is always the online
63 issue tracking system. Issues that affect this tool will be marked as
64 such. You can see a list of such issues at the following URL:
65 http://www.maatkit.org/bugs/mk-query-digest.
67
68 See also "BUGS" for more information on filing bugs and getting help.
69
70 DESCRIPTION
71 This tool was formerly known as mk-log-parser.
72
73 "mk-query-digest" is a framework for doing things with events from a
74 query source such as the slow query log or PROCESSLIST. By default it
75 acts as a very sophisticated log analysis tool. You can group and sort
76 queries in many different ways simultaneously and find the most
77 expensive queries, or create a timeline of queries in the log, for
78 example. It can also do a "query review," which means to save a sample
79 of each type of query into a MySQL table so you can easily see whether
80 you've reviewed and analyzed a query before. The benefit of this is
81 that you can keep track of changes to your server's queries and avoid
82 repeated work. You can also save other information with the queries,
83 such as comments, issue numbers in your ticketing system, and so on.
84
85 Note that this is a work in *very* active progress and you should
86 expect incompatible changes in the future.
87
88 ATTRIBUTES
89 mk-query-digest works on events, which are a collection of key/value
90 pairs called attributes. You'll recognize most of the attributes right
91 away: Query_time, Lock_time, and so on. You can just look at a slow
92 log and see them. However, there are some that don't exist in the slow
93 log, and slow logs may actually include different kinds of attributes
94 (for example, you may have a server with the Percona patches).
95
96 For a full list of attributes, see
97 <http://code.google.com/p/maatkit/wiki/EventAttributes>.
98
99 memcached
100 memcached events have additional attributes related to the memcached
101 protocol: cmd, key, res (result) and val. Also, boolean attributes are
102 created for the various commands, misses and errors: Memc_CMD where CMD
103 is a memcached command (get, set, delete, etc.), Memc_error and
104 Memc_miss.
105
106 These attributes are no different from slow log attributes, so you can
107 use them with "--[no]report", "--group-by", in a "--filter", etc.
108
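For example (a hedged sketch that reuses the capture file name shown
under "--type" memcached below), you can group memcached events by
their abstracted key:

    mk-query-digest --type memcached memcached.tcp.txt --group-by key_print
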
109 These attributes and more are documented at
110 <http://code.google.com/p/maatkit/wiki/EventAttributes>.
111
112 OUTPUT
113 The default output is a query analysis report. The "--[no]report"
114 option controls whether or not this report is printed. Sometimes you
115 may wish to parse all the queries but suppress the report, for example
116 when using "--print", "--review" or "--save-results".
117
118 There is one paragraph for each class of query analyzed. A "class" of
119 queries all have the same value for the "--group-by" attribute which is
120 "fingerprint" by default. (See "ATTRIBUTES".) A fingerprint is an
121 abstracted version of the query text with literals removed, whitespace
122 collapsed, and so forth. The report is formatted so it's easy to paste
123 into emails without wrapping, and all non-query lines begin with a
124 comment, so you can save it to a .sql file and open it in your favorite
125 syntax-highlighting text editor. There is a response-time profile at
126 the beginning.
127
128 The output described here is controlled by "--report-format". That
129 option allows you to specify what to print and in what order. The
130 default output in the default order is described here.
131
132 The report, by default, begins with a paragraph about the entire
133 analysis run. The information is very similar to what you'll see for
134 each class of queries in the log, but it doesn't have some information
135 that would be too expensive to keep globally for the analysis. It also
136 has some statistics about the code's execution itself, such as the CPU
137 and memory usage, the local date and time of the run, and a list of
138 input files read/parsed.
139
140 Following this is the response-time profile over the events. This is a
141 highly summarized view of the unique events in the detailed query
142 report that follows. Here is a sample:
143
144 # Rank Query ID Response time Calls R/Call Item
145 # ==== ================== ================ ======= ========== ====
146 # 1 0x31DA25F95494CA95 0.1494 99.9% 1 0.149435 SHOW
147 # 2 0x3AEAAD0E15D725B5 0.0001 0.1% 2 0.000041 SET
148 # 3 0x813031B8BBC3B329 0.0000 0.0% 1 0.000015 COMMIT
149
150 The columns should be fairly self-explanatory: rank, query ID, response
151 time sum and percentage of total; number of calls and response time per
152 call; and the distilled query (see "distill" for more detail on this).
153
154 Next, the detailed query report is printed. Each query appears in a
155 paragraph. Here is a sample, slightly reformatted so 'perldoc' will
156 not wrap lines in a terminal. The following will all be one paragraph,
157 but we'll break it up for commentary.
158
159 # Query 2: 0.01 QPS, 0.02x conc, ID 0xFDEA8D2993C9CAF3 at byte 160665
160
161 This line identifies the sequential number of the query in the sort
162 order specified by "--order-by". Then there's the queries per second,
163 and the approximate concurrency for this query (calculated as a
164 function of the timespan and total Query_time). Next there's a query
165 ID. This ID is a hex version of the query's checksum in the database,
166 if you're using "--review". You can select the reviewed query's
167 details from the database with a query like "SELECT .... WHERE
168 checksum=0xFDEA8D2993C9CAF3".
169
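If you store the review in the test.query_review table from the
SYNOPSIS example (the table name is only an assumption here; use
whatever you passed to "--review"), such a lookup could be run from
the shell like this:

    mysql -e "SELECT * FROM test.query_review WHERE checksum=0xFDEA8D2993C9CAF3\G"
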
170 Finally, in case you want to find a sample of the query in the log
171 file, there's the byte offset where you can look. (This is not always
172 accurate, due to some silly anomalies in the slow-log format, but it's
173 usually right.) The position refers to the worst sample, which we'll
174 see more about below.
175
176 Next is the table of metrics about this class of queries.
177
178 # pct total min max avg 95% stddev median
179 # Count 0 2
180 # Exec time 13 1105s 552s 554s 553s 554s 2s 553s
181 # Lock time 0 216us 99us 117us 108us 117us 12us 108us
182 # Rows sent 20 6.26M 3.13M 3.13M 3.13M 3.13M 12.73 3.13M
183 # Rows exam 0 6.26M 3.13M 3.13M 3.13M 3.13M 12.73 3.13M
184
185 The first line is column headers for the table. The percentage is the
186 percent of the total for the whole analysis run, and the total is the
187 actual value of the specified metric. For example, in this case we can
188 see that the query executed 2 times, and that its execution time is
189 13% of the total in the file. The min, max and avg columns are self-
190 explanatory. The 95% column shows the 95th percentile; 95% of the
191 values are less than or equal to this value. The standard deviation
192 shows you how tightly grouped the values are. The standard deviation
193 and median are both calculated from the 95th percentile, discarding the
194 extremely large values.
195
196 The stddev, median and 95th percentile statistics are approximate.
197 Exact statistics require keeping every value seen, sorting, and doing
198 some calculations on them. This uses a lot of memory. To avoid this,
199 we keep 1000 buckets, each of them 5% bigger than the one before,
200 ranging from .000001 up to a very big number. When we see a value we
201 increment the bucket into which it falls. Thus we have fixed memory
202 per class of queries. The drawback is the imprecision, which typically
203 falls in the 5 percent range.
204
205 Next we have statistics on the users, databases and time range for the
206 query.
207
208 # Users 1 user1
209 # Databases 2 db1(1), db2(1)
210 # Time range 2008-11-26 04:55:18 to 2008-11-27 00:15:15
211
212 The users and databases are shown as a count of distinct values,
213 followed by the values. If there's only one, it's shown alone; if
214 there are many, we show each of the most frequent ones, followed by the
215 number of times it appears.
216
217 # Query_time distribution
218 # 1us
219 # 10us
220 # 100us
221 # 1ms
222 # 10ms
223 # 100ms
224 # 1s
225 # 10s+ #############################################################
226
227 The execution times show a logarithmic chart of time clustering. Each
228 query goes into one of the "buckets" and is counted up. The buckets
229 are powers of ten. The first bucket is all values in the "single
230 microsecond range" -- that is, less than 10us. The second is "tens of
231 microseconds," which is from 10us up to (but not including) 100us; and
232 so on. The charted attribute can be changed by specifying
233 "--report-histogram" but is limited to time-based attributes.
234
235 # Tables
236 # SHOW TABLE STATUS LIKE 'table1'\G
237 # SHOW CREATE TABLE `table1`\G
238 # EXPLAIN
239 SELECT * FROM table1\G
240
241 This section is a convenience: if you're trying to optimize the queries
242 you see in the slow log, you probably want to examine the table
243 structure and size. These are copy-and-paste-ready commands to do
244 that.
245
246 Finally, we see a sample of the queries in this class of query. This
247 is not a random sample. It is the query that performed the worst,
248 according to the sort order given by "--order-by". You will normally
249 see a commented "# EXPLAIN" line just before it, so you can copy-paste
250 the query to examine its EXPLAIN plan. But for non-SELECT queries that
251 isn't possible to do, so the tool tries to transform the query into a
252 roughly equivalent SELECT query, and adds that below.
253
254 If you want to find this sample event in the log, use the offset
255 mentioned above, and something like the following:
256
257 tail -c +<offset> /path/to/file | head
258
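For instance, with the byte offset 160665 shown in the sample query
header above:

    tail -c +160665 /path/to/slow.log | head
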
259 See also "--report-format".
260
261 QUERY REVIEWS
262 A "query review" is the process of storing all the query fingerprints
263 analyzed. This has several benefits:
264
265 · You can add meta-data to classes of queries, such as marking them
266 for follow-up, adding notes to queries, or marking them with an
267 issue ID for your issue tracking system.
268
269 · You can refer to the stored values on subsequent runs so you'll
270 know whether you've seen a query before. This can help you cut
271 down on duplicated work.
272
273 · You can store historical data such as the row count, query times,
274 and generally anything you can see in the report.
275
276 To use this feature, you run mk-query-digest with the "--review"
277 option. It will store the fingerprints and other information into the
278 table you specify. Next time you run it with the same option, it will
279 do the following:
280
281 · It won't show you queries you've already reviewed. A query is
282 considered to be already reviewed if you've set a value for the
283 "reviewed_by" column. (If you want to see queries you've already
284 reviewed, use the "--report-all" option.)
285
286 · Queries that you've reviewed, and don't appear in the output, will
287 cause gaps in the query number sequence in the first line of each
288 paragraph. And the value you've specified for "--limit" will still
289 be honored. So if you've reviewed all queries in the top 10 and
290 you ask for the top 10, you won't see anything in the output.
291
292 · If you want to see the queries you've already reviewed, you can
293 specify "--report-all". Then you'll see the normal analysis
294 output, but you'll also see the information from the review table,
295 just below the execution time graph. For example,
296
297 # Review information
298 # comments: really bad IN() subquery, fix soon!
299 # first_seen: 2008-12-01 11:48:57
300 # jira_ticket: 1933
301 # last_seen: 2008-12-18 11:49:07
302 # priority: high
303 # reviewed_by: xaprb
304 # reviewed_on: 2008-12-18 15:03:11
305
306 You can see how useful this meta-data is -- as you analyze your
307 queries, you get your comments integrated right into the report.
308
309 If you add the "--review-history" option, it will also store
310 information into a separate database table, so you can keep
311 historical trending information on classes of queries.
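
Here is a hedged end-to-end sketch; the table names simply follow the
default CREATE TABLE definitions shown under "--review" and
"--review-history":

    mk-query-digest /path/to/slow.log \
      --review h=host1,D=test,t=query_review --create-review-table \
      --review-history t=query_review_history --create-review-history-table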
312
313 FINGERPRINTS
314 A query fingerprint is the abstracted form of a query, which makes it
315 possible to group similar queries together. Abstracting a query
316 removes literal values, normalizes whitespace, and so on. For example,
317 consider these two queries:
318
319 SELECT name, password FROM user WHERE id='12823';
320 select name, password from user
321 where id=5;
322
323 Both of those queries will fingerprint to
324
325 select name, password from user where id=?
326
327 Once the query's fingerprint is known, we can then talk about a query
328 as though it represents all similar queries.
329
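For example, here is a sketch of a "--filter" that keeps only events
matching that exact fingerprint (the fingerprint attribute is the same
one used by the default "--group-by"):

    mk-query-digest /path/to/slow.log \
      --filter '$event->{fingerprint} eq "select name, password from user where id=?"'
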
330 What "mk-query-digest" does is analogous to a GROUP BY statement in
331 SQL. (But note that giving multiple "--group-by" attributes doesn't
332 define a multi-column grouping; it defines multiple reports!) If your
333 command-line looks like this,
334
335 mk-query-digest /path/to/slow.log --select Rows_read,Rows_sent \
336 --group-by fingerprint --order-by Query_time:sum --limit 10
337
338 The corresponding pseudo-SQL looks like this:
339
340 SELECT WORST(query BY Query_time), SUM(Query_time), ...
341 FROM /path/to/slow.log
342 GROUP BY FINGERPRINT(query)
343 ORDER BY SUM(Query_time) DESC
344 LIMIT 10
345
346 You can also use the value "distill", which is a kind of super-
347 fingerprint. See "--group-by" for more.
348
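For instance, to aggregate by this super-fingerprint instead of the
default fingerprint:

    mk-query-digest /path/to/slow.log --group-by distill
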
349 When parsing memcached input ("--type" memcached), the fingerprint is
350 an abstracted version of the command and key, with values replaced by
351 placeholders. For example, "get user_123_preferences" fingerprints to
352 "get user_?_preferences". There is also a "key_print", which is a fingerprinted
353 version of the key. This example's key_print is "user_?_preferences".
354
355 Query fingerprinting accommodates a great many special cases, which
356 have proven necessary in the real world. For example, an IN list with
357 5 literals is really equivalent to one with 4 literals, so lists of
358 literals are collapsed to a single one. If you want to understand more
359 about how and why all of these cases are handled, please review the
360 test cases in the Subversion repository. If you find something that is
361 not fingerprinted properly, please submit a bug report with a
362 reproducible test case. Here is a list of transformations during
363 fingerprinting, which might not be exhaustive:
364
365 · Group all SELECT queries from mysqldump together, even if they are
366 against different tables. Ditto for all of mk-table-checksum's
367 checksum queries.
368
369 · Shorten multi-value INSERT statements to a single VALUES() list.
370
371 · Strip comments.
372
373 · Abstract the databases in USE statements, so all USE statements are
374 grouped together.
375
376 · Replace all literals, such as quoted strings. For efficiency, the
377 code that replaces literal numbers is somewhat non-selective, and
378 might replace some things as numbers when they really are not.
379 Hexadecimal literals are also replaced. NULL is treated as a
380 literal. Numbers embedded in identifiers are also replaced, so
381 tables named similarly will be fingerprinted to the same values
382 (e.g. users_2009 and users_2010 will fingerprint identically).
383
384 · Collapse all whitespace into a single space.
385
386 · Lowercase the entire query.
387
388 · Replace all literals inside of IN() and VALUES() lists with a
389 single placeholder, regardless of cardinality.
390
391 · Collapse multiple identical UNION queries into a single one.
392
393 OPTIONS
394 DSN values in "--review-history" default to values in "--review" if
395 COPY is yes.
396
397 --ask-pass
398 Prompt for a password when connecting to MySQL.
399
400 --attribute-aliases
401 type: array; default: db|Schema
402
403 List of attribute|alias,etc.
404
405 Certain attributes have multiple names, like db and Schema. If an
406 event does not have the primary attribute, mk-query-digest looks
407 for an alias attribute. If it finds an alias, it creates the
408 primary attribute with the alias attribute's value and removes the
409 alias attribute.
410
411 If the event has the primary attribute, all alias attributes are
412 deleted.
413
414 This helps to simplify event attributes so that, for example, there
415 will not be report lines for both db and Schema.
416
417 --attribute-value-limit
418 type: int; default: 4294967296
419
420 A sanity limit for attribute values.
421
422 This option deals with bugs in slow-logging functionality that
423 cause large values for attributes. If the attribute's value is
424 bigger than this, the last-seen value for that class of query is
425 used instead.
426
427 --aux-dsn
428 type: DSN
429
430 Auxiliary DSN used for special options.
431
432 The following options may require a DSN even when only parsing a
433 slow log file:
434
435 * "--since"
436 * "--until"
437
438 See each option for why it might require a DSN.
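
As a hedged sketch (the localhost DSN is only an example), a MySQL
time expression for "--since" can be paired with this option while
reading a log offline:

    mk-query-digest /path/to/slow.log --aux-dsn h=localhost \
      --since 'CURRENT_DATE - INTERVAL 7 DAY'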
439
440 --charset
441 short form: -A; type: string
442
443 Default character set. If the value is utf8, sets Perl's binmode
444 on STDOUT to utf8, passes the mysql_enable_utf8 option to
445 DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any
446 other value sets binmode on STDOUT without the utf8 layer, and runs
447 SET NAMES after connecting to MySQL.
448
449 --check-attributes-limit
450 type: int; default: 1000
451
452 Stop checking for new attributes after this many events.
453
454 For better speed, mk-query-digest stops checking events for new
455 attributes after a certain number of events. Any new attributes
456 after this number will be ignored and will not be reported.
457
458 One special case is new attributes for pre-existing query classes
459 (see "--group-by" about query classes). New attributes will not be
460 added to pre-existing query classes even if the attributes are
461 detected before the "--check-attributes-limit" limit.
462
463 --config
464 type: Array
465
466 Read this comma-separated list of config files; if specified, this
467 must be the first option on the command line.
468
469 --[no]continue-on-error
470 default: yes
471
472 Continue parsing even if there is an error.
473
474 --create-review-history-table
475 Create the "--review-history" table if it does not exist.
476
477 This option causes the table specified by "--review-history" to be
478 created with the default structure shown in the documentation for
479 that option.
480
481 --create-review-table
482 Create the "--review" table if it does not exist.
483
484 This option causes the table specified by "--review" to be created
485 with the default structure shown in the documentation for that
486 option.
487
488 --daemonize
489 Fork to the background and detach from the shell. POSIX operating
490 systems only.
491
492 --defaults-file
493 short form: -F; type: string
494
495 Only read mysql options from the given file. You must give an
496 absolute pathname.
497
498 --embedded-attributes
499 type: array
500
501 Two Perl regex patterns to capture pseudo-attributes embedded in
502 queries.
503
504 Embedded attributes might be special attribute-value pairs that
505 you've hidden in comments. The first regex should match the entire
506 set of attributes (in case there are multiple). The second regex
507 should match and capture attribute-value pairs from the first
508 regex.
509
510 For example, suppose your query looks like the following:
511
512 SELECT * from users -- file: /login.php, line: 493;
513
514 You might run mk-query-digest with the following option:
515
516 mk-query-digest --embedded-attributes ' -- .*','(\w+): ([^\,]+)'
517
518 The first regular expression captures the whole comment:
519
520 " -- file: /login.php, line: 493;"
521
522 The second one splits it into attribute-value pairs and adds them
523 to the event:
524
525 ATTRIBUTE VALUE
526 ========= ==========
527 file /login.php
528 line 493
529
530 NOTE: All commas in the regex patterns must be escaped with a
531 backslash; otherwise the pattern will break.
532
533 --execute
534 type: DSN
535
536 Execute queries on this DSN.
537
538 Adds a callback into the chain, after filters but before the
539 reports. Events are executed on this DSN. If they are successful,
540 the time they take to execute overwrites the event's Query_time
541 attribute. If unsuccessful, the callback returns false and
542 terminates the chain.
543
544 If the connection fails, mk-query-digest tries to reconnect once
545 per second.
546
547 See also "--mirror" and "--execute-throttle".
548
549 --execute-throttle
550 type: array
551
552 Throttle values for "--execute".
553
554 By default "--execute" runs without any limitations or concerns for
555 the amount of time that it takes to execute the events. The
556 "--execute-throttle" allows you to limit the amount of time spent
557 doing "--execute" relative to the other processes that handle
558 events. This works by marking some events with a "Skip_exec"
559 attribute when "--execute" begins to take too much time.
560 "--execute" will not execute an event if this attribute is true.
561 This indirectly decreases the time spent doing "--execute".
562
563 The "--execute-throttle" option takes at least two comma-separated
564 values: max allowed "--execute" time as a percentage and a check
565 interval time. An optional third value is a percentage step for
566 increasing and decreasing the probability that an event will be
567 marked "Skip_exec" true. 5 (percent) is the default step.
568
569 For example: "--execute-throttle" "70,60,10". This will limit
570 "--execute" to 70% of total event processing time, checked every
571 minute (60 seconds) and probability stepped up and down by 10%.
572 When "--execute" exceeds 70%, the probability that events will be
573 marked "Skip_exec" true increases by 10%. "--execute" time is
574 checked again after another minute. If it's still above 70%, then
575 the probability will increase another 10%. Or, if it's dropped
576 below 70%, then the probability will decrease by 10%.
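
Putting it together with "--execute" (the server DSN here is taken
from the SYNOPSIS example and is only illustrative):

    mk-query-digest /path/to/slow.log --execute h=another_server \
      --execute-throttle 70,60,10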
577
578 --expected-range
579 type: array; default: 5,10
580
581 Explain items when there are more or fewer than expected.
582
583 Defines the number of items expected to be seen in the report given
584 by "--[no]report", as controlled by "--limit" and "--outliers". If
585 there are more or fewer items in the report, each one will explain
586 why it was included.
587
588 --explain
589 type: DSN
590
591 Run EXPLAIN for the sample query with this DSN and print results.
592
593 This works only when "--group-by" includes fingerprint. It causes
594 mk-query-digest to run EXPLAIN and include the output into the
595 report. For safety, queries that appear to have a subquery that
596 EXPLAIN will execute won't be EXPLAINed. Those are typically
597 "derived table" queries of the form
598
599 select ... from ( select .... ) der;
600
601 --filter
602 type: string
603
604 Discard events for which this Perl code doesn't return true.
605
606 This option is a string of Perl code or a file containing Perl code
607 that gets compiled into a subroutine with one argument: $event.
608 This is a hashref. If the given value is a readable file, then mk-
609 query-digest reads the entire file and uses its contents as the
610 code. The file should not contain a shebang (#!/usr/bin/perl)
611 line.
612
613 If the code returns true, the chain of callbacks continues;
614 otherwise it ends. The code is the last statement in the
615 subroutine other than "return $event". The subroutine template is:
616
617 sub { $event = shift; filter && return $event; }
618
619 Filters given on the command line are wrapped inside parentheses
620 like "( filter )". For complex, multi-line filters, you must
621 put the code inside a file so it will not be wrapped inside
622 parentheses. Either way, the filter must produce syntactically
623 valid code given the template. For example, an if-else branch
624 given on the command line would not be valid:
625
626 --filter 'if () { } else { }' # WRONG
627
628 Since it's given on the command line, the if-else branch would be
629 wrapped inside parentheses which is not syntactically valid. So to
630 accomplish something more complex like this would require putting
631 the code in a file, for example filter.txt:
632
633 my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok
634
635 Then specify "--filter filter.txt" to read the code from
636 filter.txt.
637
638 If the filter code won't compile, mk-query-digest will die with an
639 error. If the filter code does compile, an error may still occur
640 at runtime if the code tries to do something wrong (like pattern
641 match an undefined value). mk-query-digest does not provide any
642 safeguards so code carefully!
643
644 An example filter that discards everything but SELECT statements:
645
646 --filter '$event->{arg} =~ m/^select/i'
647
648 This is compiled into a subroutine like the following:
649
650 sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }
651
652 It is permissible for the code to have side effects (to alter
653 $event).
654
655 You can find an explanation of the structure of $event at
656 <http://code.google.com/p/maatkit/wiki/EventAttributes>.
657
658 Here are more examples of filter code:
659
660 Host/IP matches domain.com
661 --filter '($event->{host} || $event->{ip} || "") =~
662 m/domain.com/'
663
664 Sometimes MySQL logs the host where the IP is expected.
665 Therefore, we check both.
666
667 User matches john
668 --filter '($event->{user} || "") =~ m/john/'
669
670 More than 1 warning
671 --filter '($event->{Warning_count} || 0) > 1'
672
673 Query does full table scan or full join
674 --filter '(($event->{Full_scan} || "") eq "Yes") ||
675 (($event->{Full_join} || "") eq "Yes")'
676
677 Query was not served from query cache
678 --filter '($event->{QC_Hit} || "") eq "No"'
679
680 Query is 1 MB or larger
681 --filter '$event->{bytes} >= 1_048_576'
682
683 --fingerprints
684 Add query fingerprints to the standard query analysis report. This
685 is mostly useful for debugging purposes.
686
687 --[no]for-explain
688 default: yes
689
690 Print extra information to make analysis easy.
691
692 This option adds code snippets to make it easy to run SHOW CREATE
693 TABLE and SHOW TABLE STATUS for the query's tables. It also
694 rewrites non-SELECT queries into a SELECT that might be helpful for
695 determining the non-SELECT statement's index usage.
696
697 --group-by
698 type: Array; default: fingerprint
699
700 Which attribute of the events to group by.
701
702 In general, you can group queries into classes based on any
703 attribute of the query, such as "user" or "db", which will by
704 default show you which users and which databases get the most
705 "Query_time". The default attribute, "fingerprint", groups
706 similar, abstracted queries into classes; see below and see also
707 "FINGERPRINTS".
708
709 A report is printed for each "--group-by" value (unless
710 "--no-report" is given). Therefore, "--group-by user,db" means
711 "report on queries with the same user and report on queries with
712 the same db"--it does not mean "report on queries with the same
713 user and db." See also "OUTPUT".
714
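For example, to get one report aggregated by user and a second report
aggregated by database:

    mk-query-digest /path/to/slow.log --group-by user,db
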
715 Every value must have a corresponding value in the same position in
716 "--order-by". However, adding values to "--group-by" will
717 automatically add values to "--order-by", for your convenience.
718
719 There are several magical values that cause some extra data mining
720 to happen before the grouping takes place:
721
722 fingerprint
723 This causes events to be fingerprinted to abstract queries into
724 a canonical form, which is then used to group events together
725 into a class. See "FINGERPRINTS" for more about
726 fingerprinting.
727
728 tables
729 This causes events to be inspected for what appear to be
730 tables, and then aggregated by that. Note that a query that
731 contains two or more tables will be counted as many times as
732 there are tables; so a join against two tables will count the
733 Query_time against both tables.
734
735 distill
736 This is a sort of super-fingerprint that collapses queries down
737 into a suggestion of what they do, such as "INSERT SELECT
738 table1 table2".
739
740 If parsing memcached input ("--type" memcached), there are other
741 attributes which you can group by: key_print (see memcached section
742 in "FINGERPRINTS"), cmd, key, res and val (see memcached section in
743 "ATTRIBUTES").
744
745 --[no]gzip
746 default: yes
747
748 Gzip "--save-results" files; requires IO::Compress::Gzip.
749
750 --help
751 Show help and exit.
752
753 --host
754 short form: -h; type: string
755
756 Connect to host.
757
758 --ignore-attributes
759 type: array; default: arg, cmd, insert_id, ip, port, Thread_id,
760 timestamp, exptime, flags, key, res, val, server_id, offset,
761 end_log_pos, Xid
762
763 Do not aggregate these attributes when auto-detecting "--select".
764
765 If you do not specify "--select" then mk-query-digest auto-detects
766 and aggregates every attribute that it finds in the slow log. Some
767 attributes, however, should not be aggregated. This option allows
768 you to specify a list of attributes to ignore. This only works
769 when no explicit "--select" is given.
770
771 --inherit-attributes
772 type: array; default: db,ts
773
774 If missing, inherit these attributes from the last event that had
775 them.
776
777 This option sets which attributes are inherited or carried forward
778 to events which do not have them. For example, if one event has
779 the db attribute equal to "foo", but the next event doesn't have
780 the db attribute, then it inherits "foo" for its db attribute.
781
782 Inheritance is usually desirable, but in some cases it might
783 confuse things. If a query inherits a database that it doesn't
784 actually use, then this could confuse "--execute".
785
786 --interval
787 type: float; default: .1
788
789 How frequently to poll the processlist, in seconds.
790
791 --iterations
792 type: int; default: 1
793
794 How many times to iterate through the collect-and-report cycle. If
795 0, iterate to infinity. See also "--run-time".
796
797 --limit
798 type: Array; default: 95%:20
799
800 Limit output to the given percentage or count.
801
802 If the argument is an integer, report only the top N worst queries.
803 If the argument is an integer followed by the "%" sign, report that
804 percentage of the worst queries. If the percentage is followed by
805 a colon and another integer, report the top percentage or the
806 number specified by that integer, whichever comes first.
807
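For instance, to report only the ten worst query classes:

    mk-query-digest /path/to/slow.log --limit 10
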
808 The value is actually a comma-separated array of values, one for
809 each item in "--group-by". If you don't specify a value for any of
810 those items, the default is the top 95%.
811
812 See also "--outliers".
813
814 --log
815 type: string
816
817 Print all output to this file when daemonized.
818
819 --mirror
820 type: float
821
822 How often to check whether connections should be moved, depending
823 on "read_only". Requires "--processlist" and "--execute".
824
825 This option causes mk-query-digest to check every N seconds whether
826 it is reading from a read-write server and executing against a
827 read-only server, which is a sensible way to set up two servers if
828 you're doing something like master-master replication. The
829 http://code.google.com/p/mysql-master-master/ master-master
831 toolkit does this. The aim is to keep the passive server ready for
832 failover, which is impossible without putting it under a realistic
833 workload.
834
835 --order-by
836 type: Array; default: Query_time:sum
837
838 Sort events by this attribute and aggregate function.
839
840 This is a comma-separated list of order-by expressions, one for
841 each "--group-by" attribute. The default "Query_time:sum" is used
842 for "--group-by" attributes without explicitly given "--order-by"
843 attributes (that is, if you specify more "--group-by" attributes
844 than corresponding "--order-by" attributes). The syntax is
845 "attribute:aggregate". See "ATTRIBUTES" for valid attributes.
846 Valid aggregates are:
847
848 Aggregate Meaning
849 ========= ============================
850 sum Sum/total attribute value
851 min Minimum attribute value
852 max Maximum attribute value
853 cnt Frequency/count of the query
854
855 For example, the default "Query_time:sum" means that queries in the
856 query analysis report will be ordered (sorted) by their total query
857 execution time ("Exec time"). "Query_time:max" orders the queries
858 by their maximum query execution time, so the query with the single
859 largest "Query_time" will be listed first. "cnt" refers more to the
860 frequency of the query as a whole, how often it appears; "Count" is
861 its corresponding line in the query analysis report. So any
862 attribute and "cnt" should yield the same report wherein queries
863 are sorted by the number of times they appear.
864
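For example, to sort the report by the single worst execution of each
query class rather than by the total:

    mk-query-digest /path/to/slow.log --order-by Query_time:max
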
865 When parsing general logs ("--type" "genlog"), the default
866 "--order-by" becomes "Query_time:cnt". General logs do not report
867 query times so only the "cnt" aggregate makes sense because all
868 query times are zero.
869
870 If you specify an attribute that doesn't exist in the events, then
871 mk-query-digest falls back to the default "Query_time:sum" and
872 prints a notice at the beginning of the report for each query
873 class.
874
875 --outliers
876 type: array; default: Query_time:1:10
877
878 Report outliers by attribute:percentile:count.
879
880 The syntax of this option is a comma-separated list of colon-
881 delimited strings. The first field is the attribute by which an
882 outlier is defined. The second is a number that is compared to the
883 attribute's 95th percentile. The third is optional, and is
884 compared to the attribute's cnt aggregate. Queries that pass this
885 specification are added to the report, regardless of any limits you
886 specified in "--limit".
887
888 For example, to report queries whose 95th percentile Query_time is
889 at least 60 seconds and which are seen at least 5 times, use the
890 following argument:
891
892 --outliers Query_time:60:5
893
894 You can specify an --outliers option for each value in
895 "--group-by".
896
897 --password
898 short form: -p; type: string
899
900 Password to use when connecting.
901
902 --pid
903 type: string
904
905 Create the given PID file when daemonized. The file contains the
906 process ID of the daemonized instance. The PID file is removed
907 when the daemonized instance exits. The program checks for the
908 existence of the PID file when starting; if it exists and the
909 process with the matching PID exists, the program exits.
910
911 --pipeline-profile
912 Print a profile of the pipeline processes.
913
914 --port
915 short form: -P; type: int
916
917 Port number to use for connection.
918
919 --print
920 Print log events to STDOUT in standard slow-query-log format.
921
922 --print-iterations
923 Print the start time for each "--iterations".
924
925 This option causes a line like the following to be printed at the
926 start of each "--iterations" report:
927
928 # Iteration 2 started at 2009-11-24T14:39:48.345780
929
930 This line will print even if "--no-report" is specified. If
931 "--iterations 0" is specified, each iteration number will be 0.
932
933 --processlist
934 type: DSN
935
936 Poll this DSN's processlist for queries, with "--interval" sleep
937 between.
938
939 If the connection fails, mk-query-digest tries to reopen it once
940 per second. See also "--mirror".
941
942 --progress
943 type: array; default: time,30
944
945 Print progress reports to STDERR. The value is a comma-separated
946 list with two parts. The first part can be percentage, time, or
947 iterations; the second part specifies how often an update should be
948 printed, in percentage, seconds, or number of iterations.
949
950 --read-timeout
951 type: time; default: 0
952
953 Wait this long for an event from the input; 0 to wait forever.
954
955 This option sets the maximum time to wait for an event from the
956 input. It applies to all types of input except "--processlist".
957 If an event is not received after the specified time, the script
958 stops reading the input and prints its reports. If "--iterations"
959 is 0 or greater than 1, the next iteration will begin, else the
960 script will exit.
961
962 This option requires the Perl POSIX module.
963
964 --[no]report
965 default: yes
966
967 Print out reports on the aggregate results from "--group-by".
968
969 This is the standard slow-log analysis functionality. See "OUTPUT"
970 for the description of what this does and what the results look
971 like.
972
973 --report-all
974 Include all queries, even if they have already been reviewed.
975
976 --report-format
977 type: Array; default:
978 rusage,date,files,header,profile,query_report,prepared
979
980 Print these sections of the query analysis report.
981
982 SECTION PRINTS
983 ============ ==============================================================
984 rusage       CPU times and memory usage reported by ps
985 date Current local date and time
986 files        Input files read/parsed
987 header Summary of the entire analysis run
988 profile Compact table of queries for a quick view of the report
989 query_report Detailed information about each unique query
990 prepared Prepared statements
991
992 The sections are printed in the order specified. The rusage, date,
993 files and header sections are grouped together if specified
994 together; other sections are separated by blank lines.
995
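For example, to print only the response-time profile and the detailed
query report:

    mk-query-digest /path/to/slow.log --report-format profile,query_report
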
996 See "OUTPUT" for more information on the various parts of the query
997 report.
998
999 --report-histogram
1000 type: string; default: Query_time
1001
1002 Chart the distribution of this attribute's values.
1003
1004 The distribution chart is limited to time-based attributes, so
1005 charting "Rows_examined", for example, will produce a useless
1006 chart.
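
For example, to chart lock times instead of query times:

    mk-query-digest /path/to/slow.log --report-histogram Lock_time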
1007
1008 --review
1009 type: DSN
1010
1011 Store a sample of each class of query in this DSN.
1012
1013 The argument specifies a table to store all unique query
1014 fingerprints in. The table must have at least the following
1015 columns. You can add more columns for your own special purposes,
1016 but they won't be used by mk-query-digest. The following CREATE
1017 TABLE definition is also used for "--create-review-table".
1018 MAGIC_create_review:
1019
1020 CREATE TABLE query_review (
1021 checksum BIGINT UNSIGNED NOT NULL PRIMARY KEY,
1022 fingerprint TEXT NOT NULL,
1023 sample TEXT NOT NULL,
1024 first_seen DATETIME,
1025 last_seen DATETIME,
1026 reviewed_by VARCHAR(20),
1027 reviewed_on DATETIME,
1028 comments TEXT
1029 )
1030
1031 The columns are as follows:
1032
1033 COLUMN MEANING
1034 =========== ===============
1035 checksum A 64-bit checksum of the query fingerprint
1036 fingerprint The abstracted version of the query; its primary key
1037 sample The query text of a sample of the class of queries
1038 first_seen The smallest timestamp of this class of queries
1039 last_seen The largest timestamp of this class of queries
1040 reviewed_by Initially NULL; if set, query is skipped thereafter
1041 reviewed_on Initially NULL; not assigned any special meaning
1042 comments Initially NULL; not assigned any special meaning
1043
1044 Note that the "fingerprint" column is the true primary key for a
1045 class of queries. The "checksum" is just a cryptographic hash of
1046 this value, which provides a shorter value that is very likely to
1047 also be unique.
1048
1049 After parsing and aggregating events, your table should contain a
1050 row for each fingerprint. This option depends on "--group-by
1051 fingerprint" (which is the default). It will not work otherwise.
1052
1053 --review-history
1054 type: DSN
1055
1056 The table in which to store historical values for review trend
1057 analysis.
1058
1059 Each time you review queries with "--review", mk-query-digest will
1060 save information into this table so you can see how classes of
1061 queries have changed over time.
1062
1063 This DSN inherits unspecified values from "--review". It should
1064 mention a table in which to store statistics about each class of
1065 queries. mk-query-digest verifies the existence of the table, and
1066 your privileges to insert, delete and update on that table.
1067
1068 mk-query-digest then inspects the columns in the table. The table
1069 must have at least the following columns:
1070
1071 CREATE TABLE query_review_history (
1072 checksum BIGINT UNSIGNED NOT NULL,
1073 sample TEXT NOT NULL
1074 );
1075
1076 Any columns not mentioned above are inspected to see if they follow
1077 a certain naming convention. The column is special if the name
1078 ends with an underscore followed by any of these MAGIC_history_cols
1079 values:
1080
1081 pct|avt|cnt|sum|min|max|pct_95|stddev|median|rank
1082
1083 If the column ends with one of those values, then the prefix is
1084 interpreted as the event attribute to store in that column, and the
1085 suffix is interpreted as the metric to be stored. For example, a
1086 column named Query_time_min will be used to store the minimum
1087 Query_time for the class of events. The presence of this column
1088 will also add Query_time to the "--select" list.
1089
1090 The table should also have a primary key, but that is up to you,
1091 depending on how you want to store the historical data. We suggest
1092 adding ts_min and ts_max columns and making them part of the
1093 primary key along with the checksum. But you could also just add a
1094 ts_min column and make it a DATE type, so you'd get one row per
1095 class of queries per day.
1096
1097 The default table structure follows. The following
1098 MAGIC_create_review_history table definition is used for
1099 "--create-review-history-table":
1100
1101 CREATE TABLE query_review_history (
1102 checksum BIGINT UNSIGNED NOT NULL,
1103 sample TEXT NOT NULL,
1104 ts_min DATETIME,
1105 ts_max DATETIME,
1106 ts_cnt FLOAT,
1107 Query_time_sum FLOAT,
1108 Query_time_min FLOAT,
1109 Query_time_max FLOAT,
1110 Query_time_pct_95 FLOAT,
1111 Query_time_stddev FLOAT,
1112 Query_time_median FLOAT,
1113 Lock_time_sum FLOAT,
1114 Lock_time_min FLOAT,
1115 Lock_time_max FLOAT,
1116 Lock_time_pct_95 FLOAT,
1117 Lock_time_stddev FLOAT,
1118 Lock_time_median FLOAT,
1119 Rows_sent_sum FLOAT,
1120 Rows_sent_min FLOAT,
1121 Rows_sent_max FLOAT,
1122 Rows_sent_pct_95 FLOAT,
1123 Rows_sent_stddev FLOAT,
1124 Rows_sent_median FLOAT,
1125 Rows_examined_sum FLOAT,
1126 Rows_examined_min FLOAT,
1127 Rows_examined_max FLOAT,
1128 Rows_examined_pct_95 FLOAT,
1129 Rows_examined_stddev FLOAT,
1130 Rows_examined_median FLOAT,
1131 PRIMARY KEY(checksum, ts_min, ts_max)
1132 );
1133
1134 Note that we store the count (cnt) for the ts attribute only; it
1135 will be redundant to store this for other attributes.
1136
1137 --run-time
1138 type: time
1139
1140 How long to run before exiting. The default is to run forever (you
1141 can interrupt with CTRL-C). See also "--iterations".
1142
1143 --sample
1144 type: int
1145
1146 Filter out all but the first N occurrences of each query. The
1147 queries are filtered on the first value in "--group-by", so by
1148 default, this will filter by query fingerprint. For example,
1149 "--sample 2" will permit two sample queries for each fingerprint.
1150 Useful in conjunction with "--print" to print out the queries. You
1151 probably want to set "--no-report" to avoid the overhead of
1152 aggregating and reporting if you're just using this to print out
1153 samples of queries. A complete example:
1154
1155 mk-query-digest --sample 2 --no-report --print slow.log
1156
1157 --save-results
1158 type: string
1159
1160 Save results to the specified file.
1161
1162 If "--[no]gzip" is true (by default it is) then .gz is appended to
1163 the file name.
1164
1165 --select
1166 type: Array
1167
1168 Compute aggregate statistics for these attributes.
1169
1170 By default mk-query-digest auto-detects, aggregates and prints
1171 metrics for every query attribute that it finds in the slow query
1172 log. This option specifies a list of only the attributes that you
1173 want. You can specify an alternative attribute with a colon. For
1174 example, "db:Schema" uses db if it's available, and Schema if it's
1175 not.
1176
1177 Previously, mk-query-digest only aggregated these attributes:
1178
1179 Query_time,Lock_time,Rows_sent,Rows_examined,user,db:Schema,ts
1180
1181 Attributes specified in the "--review-history" table will always be
1182 selected even if you do not specify "--select".
1183
1184 See also "--ignore-attributes" and "ATTRIBUTES".
1185
1186 --set-vars
1187 type: string; default: wait_timeout=10000
1188
1189 Set these MySQL variables. Immediately after connecting to MySQL,
1190 this string will be appended to SET and executed.
1191
1192 --shorten
1193 type: int; default: 1024
1194
1195 Shorten long statements in reports.
1196
1197 Shortens long statements, replacing the omitted portion with a
1198 "/*... omitted ...*/" comment. This applies only to the output in
1199 reports, not to information stored for "--review" or other places.
1200 It prevents a large statement from causing difficulty in a report.
1201 The argument is the preferred length of the shortened statement.
1202 Not all statements can be shortened, but very large INSERT and
1203 similar statements often can; and so can IN() lists, although only
1204 the first such list in the statement will be shortened.
1205
1206 If it shortens something beyond recognition, you can find the
1207 original statement in the log, at the offset shown in the report
1208 header (see "OUTPUT").
1209
1210 --show-all
1211 type: Hash
1212
1213 Show all values for these attributes.
1214
1215 By default mk-query-digest only shows as many of an attribute's
1216 values as fit on a single line. This option allows you to specify
1217 attributes for which all values will be shown (line width is
1218 ignored). This only works for attributes with string values like
1219 user, host, db, etc. Multiple attributes can be specified, comma-
1220 separated.
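
For example, to list every client host and user seen for each query
class:

    mk-query-digest /path/to/slow.log --show-all host,user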
1221
1222 --since
1223 type: string
1224
1225 Parse only queries newer than this value (parse queries since this
1226 date).
1227
1228 This option allows you to ignore queries older than a certain value
1229 and parse only those queries which are more recent than the value.
1230 The value can be several types:
1231
1232 * Simple time value N with optional suffix: N[shmd], where
1233 s=seconds, h=hours, m=minutes, d=days (default s if no suffix
1234 given); this is like saying "since N[shmd] ago"
1235 * Full date with optional hours:minutes:seconds:
1236 YYYY-MM-DD [HH:MM:SS]
1237 * Short, MySQL-style date:
1238 YYMMDD [HH:MM:SS]
1239 * Any time expression evaluated by MySQL:
1240 CURRENT_DATE - INTERVAL 7 DAY
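
For example, using the simple-suffix form to parse only queries from
the last twelve hours:

    mk-query-digest /path/to/slow.log --since 12h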
1241
1242 If you give a MySQL time expression, then you must also specify a
1243 DSN so that mk-query-digest can connect to MySQL to evaluate the
1244 expression. If you specify "--execute", "--explain",
1245 "--processlist", "--review" or "--review-history", then one of
1246 these DSNs will be used automatically. Otherwise, you must specify
1247 an "--aux-dsn" or mk-query-digest will die saying that the value is
1248 invalid.
1249
1250 The MySQL time expression is wrapped inside a query like "SELECT
1251 UNIX_TIMESTAMP(<expression>)", so be sure that the expression is
1252 valid inside this query. For example, do not use UNIX_TIMESTAMP()
1253 because UNIX_TIMESTAMP(UNIX_TIMESTAMP()) returns 0.
1254
1255 Events are assumed to be in chronological order--older events at the
1256 beginning of the log and newer events at the end of the log.
1257 "--since" is strict: it ignores all queries until one is found that
1258 is new enough. Therefore, if the query events are not consistently
1259 timestamped, some may be ignored which are actually new enough.
1260
1261 See also "--until".
1262
1263 --socket
1264 short form: -S; type: string
1265
1266 Socket file to use for connection.
1267
1268 --statistics
1269 Print statistics.
1270
1271 --table-access
1272 Print a table access report.
1273
1274 The table access report shows which tables are accessed by all the
1275 queries and if the access is a read or write. The report looks
1276 like:
1277
1278 write `baz`.`tbl`
1279 read `baz`.`new_tbl`
1280 write `baz`.`tbl3`
1281 write `db6`.`tbl6`
1282
1283 If you pipe the output to sort, the read and write tables will be
1284 grouped together and sorted alphabetically:
1285
1286 read `baz`.`new_tbl`
1287 write `baz`.`tbl`
1288 write `baz`.`tbl3`
1289 write `db6`.`tbl6`
1290
1291 --tcpdump-errors
1292 type: string
1293
1294 Write the tcpdump data to this file on error. If mk-query-digest
1295 doesn't parse the stream correctly for some reason, the session's
1296 packets since the last query event will be written out to create a
1297 usable test case. If this happens, mk-query-digest will not raise
1298 an error; it will just discard the session's saved state and permit
1299 the tool to continue working. See "tcpdump" for more information
1300 about parsing tcpdump output.
1301
1302 --timeline
1303 Show a timeline of events.
1304
1305 This option makes mk-query-digest print another kind of report: a
1306 timeline of the events. Each query is still grouped and aggregated
1307 into classes according to "--group-by", but then they are printed
1308 in chronological order. The timeline report prints out the
1309 timestamp, interval, count and value of each class.
1310
1311 If all you want is the timeline report, then specify "--no-report"
1312 to suppress the default query analysis report. Otherwise, the
1313 timeline report will be printed at the end before the response-time
1314 profile (see "--report-format" and "OUTPUT").
1315
1316 For example, this:
1317
1318 mk-query-digest /path/to/log --group-by distill --timeline
1319
1320 will print something like:
1321
1322 # ########################################################
1323 # distill report
1324 # ########################################################
1325 # 2009-07-25 11:19:27 1+00:00:01 2 SELECT foo
1326 # 2009-07-27 11:19:30 00:01 2 SELECT bar
1327 # 2009-07-27 11:30:00 1+06:30:00 2 SELECT foo
1328
1329 --type
1330 type: Array
1331
1332 The type of input to parse (default slowlog). The permitted types
1333 are
1334
1335 binlog
1336 Parse a binary log file.
1337
1338 genlog
1339 Parse a MySQL general log file. General logs lack a lot of
1340 "ATTRIBUTES", notably "Query_time". The default "--order-by"
1341 for general logs changes to "Query_time:cnt".
1342
1343 http
1344 Parse HTTP traffic from tcpdump.
1345
1346 pglog
1347 Parse a log file in PostgreSQL format. The parser will
1348 automatically recognize logs sent to syslog and transparently
1349 parse the syslog format, too. The recommended configuration
1350 for logging in your postgresql.conf is as follows.
1351
1352 The log_destination setting can be set to either syslog or
1353 stderr. Syslog has the added benefit of not interleaving log
1354 messages from several sessions concurrently, which the parser
1355 cannot handle, so this might be better than stderr. CSV-
1356 formatted logs are not supported at this time.
1357
1358 The log_min_duration_statement setting should be set to 0 to
1359 capture all statements with their durations. Alternatively,
1360 the parser will also recognize and handle various combinations
1361 of log_duration and log_statement.
1362
1363 You may enable log_connections and log_disconnections, but this
1364 is optional.
1365
1366 It is highly recommended to set your log_line_prefix to the
1367 following:
1368
1369 log_line_prefix = '%m c=%c,u=%u,D=%d '
1370
1371 This lets the parser find timestamps with milliseconds, session
1372 IDs, users, and databases from the log. If these items are
1373 missing, you'll simply get less information to analyze. For
1374 compatibility with other log analysis tools such as PQA and
1375 pgfouine, various log line prefix formats are supported. The
1376 general format is as follows: a timestamp can be detected and
1377 extracted (the syslog timestamp is NOT parsed), and a
1378 name=value list of properties can also be extracted. Although the suggested
1379 format is as shown above, any name=value list will be captured
1380 and interpreted by using the first letter of the 'name' part,
1381 lowercased, to determine the meaning of the item. The
1382 lowercased first letter is interpreted to mean the same thing
1383 as PostgreSQL's built-in %-codes for the log_line_prefix format
1384 string. For example, u means user, so unicorn=fred will be
1385 interpreted as user=fred; d means database, so D=john will be
1386 interpreted as database=john. The pgfouine-suggested
1387 formatting is user=%u and db=%d, so it should Just Work
1388 regardless of which format you choose. The main thing is to
1389 add as much information as possible into the log_line_prefix to
1390 permit richer analysis.
1391
1392 Currently, only English locale messages are supported, so if
1393 your server's locale is set to something else, the log won't be
1394 parsed properly. (Log messages with "duration:" and
1395 "statement:" won't be recognized.)
1396
1397 slowlog
1398 Parse a log file in any variation of MySQL slow-log format.
1399
1400 tcpdump
1401 Inspect network packets and decode the MySQL client protocol,
1402 extracting queries and responses from it.
1403
1404 mk-query-digest does not actually watch the network (i.e. it
1405 does NOT "sniff packets"). Instead, it's just parsing the
1406 output of tcpdump. You are responsible for generating this
1407 output; mk-query-digest does not do it for you. Then you send
1408 this to mk-query-digest as you would any log file: as files on
1409 the command line or to STDIN.
1410
1411 The parser expects the input to be formatted with the following
1412 options: "-x -n -q -tttt". For example, if you want to capture
1413 output from your local machine, you can do something like the
1414 following (the port must come last on FreeBSD):
1415
1416 tcpdump -s 65535 -x -nn -q -tttt -i eth0 -c 1000 port 3306 > mysql.tcp.txt
1417 mk-query-digest --type tcpdump mysql.tcp.txt
1418
1419 The other tcpdump parameters, such as -s, -c, and -i, are up to
1420 you. Just make sure the output looks like this:
1421
1422 2009-04-12 09:50:16.804849 IP 127.0.0.1.42167 > 127.0.0.1.3306: tcp 37
1423 0x0000: 4508 0059 6eb2 4000 4006 cde2 7f00 0001
1424 0x0010: ....
1425
1426 Remember tcpdump has a handy -c option to stop after it
1427 captures some number of packets! That's very useful for
1428 testing your tcpdump command. Note that tcpdump can't capture
1429 traffic on a Unix socket. Read
1430 <http://bugs.mysql.com/bug.php?id=31577> if you're confused
1431 about this.
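
If you prefer not to go through an intermediate file, the same output can be
piped straight to the tool on STDIN; a sketch (a packet limit such as -c is
useful here, because the report is only printed once tcpdump exits):

  tcpdump -s 65535 -x -nn -q -tttt -i eth0 -c 1000 port 3306 \
    | mk-query-digest --type tcpdump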
1432
1433 All MySQL servers running on port 3306 are automatically
1434 detected in the tcpdump output. Therefore, if the tcpdump output
1435 contains packets from multiple servers on port 3306 (for
1436 example, 10.0.0.1:3306, 10.0.0.2:3306, etc.), all
1437 packets/queries from all these servers will be analyzed
1438 together as if they were one server.
1439
1440 If you're analyzing traffic for a MySQL server that is not
1441 running on port 3306, see "--watch-server".
1442
1443 Also note that mk-query-digest may fail to report the database
1444 for queries when parsing tcpdump output. The database is
1445 discovered only in the initial connect events for a new client
1446 or when "USE db" is executed. If the tcpdump output contains
1447 neither of these, then mk-query-digest cannot discover the
1448 database.
1449
1450 The MySQL client protocol implementation is not complete;
1451 server-side prepared statements are not yet supported, and
1452 naturally, SSL-encrypted traffic cannot be inspected and
1453 decoded.
1454
1455 memcached
1456 Similar to tcpdump, but the expected input is memcached packets
1457 instead of MySQL packets. For example:
1458
1459 tcpdump -i eth0 port 11211 -s 65535 -x -nn -q -tttt > memcached.tcp.txt
1460 mk-query-digest --type memcached memcached.tcp.txt
1461
1462 memcached uses port 11211 by default.
1463
1464 --until
1465 type: string
1466
1467 Parse only queries older than this value (parse queries until this
1468 date).
1469
1470 This option allows you to ignore queries newer than a certain value
1471 and parse only those queries which are older than the value. The
1472 value can be one of the same types listed for "--since".
1473
1474 Unlike "--since", "--until" is not strict: all queries are parsed
1475 until one has a timestamp that is equal to or greater than
1476 "--until". Then all subsequent queries are ignored.
1477
1478 --user
1479 short form: -u; type: string
1480
1481 User for login if not current user.
1482
1483 --version
1484 Show version and exit.
1485
1486 --watch-server
1487 type: string
1488
1489 This option tells mk-query-digest which server IP address and port
1490 (like "10.0.0.1:3306") to watch when parsing tcpdump (for "--type"
1491 tcpdump and memcached); all other servers are ignored. If you
1492 don't specify it, mk-query-digest watches all servers by looking
1493 for any IP address using port 3306 or "mysql". If you're watching
1494 a server with a non-standard port, this won't work, so you must
1495 specify the IP address and port to watch.
1496
1497 If you want to watch a mix of servers, some running on standard
1498 port 3306 and some running on non-standard ports, you need to
1499 create separate tcpdump outputs for the non-standard port servers
1500 and then specify this option for each. At present mk-query-digest
1501 cannot auto-detect servers on port 3306 and also be told to watch a
1502 server on a non-standard port.
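
For example, if a server runs on 10.0.0.1 port 3307 (the host, port, and
file name here are only illustrative):

  tcpdump -s 65535 -x -nn -q -tttt -i eth0 -c 1000 port 3307 > mysql-3307.tcp.txt
  mk-query-digest --type tcpdump --watch-server 10.0.0.1:3307 mysql-3307.tcp.txt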
1503
1504 --[no]zero-admin
1505 default: yes
1506
1507 Zero out the Rows_XXX properties for administrator command events.
1508
1509 --[no]zero-bool
1510 default: yes
1511
1512 Print 0% boolean values in report.
1513
1514DSN OPTIONS
1515 These DSN options are used to create a DSN. Each option is given like
1516 "option=value". The options are case-sensitive, so P and p are not the
1517 same option. There cannot be whitespace before or after the "=" and if
1518 the value contains whitespace it must be quoted. DSN options are
1519 comma-separated (an example appears after this list). See the maatkit manpage for full details.
1520
1521 · A
1522
1523 dsn: charset; copy: yes
1524
1525 Default character set.
1526
1527 · D
1528
1529 dsn: database; copy: yes
1530
1531 Database that contains the query review table.
1532
1533 · F
1534
1535 dsn: mysql_read_default_file; copy: yes
1536
1537 Only read default options from the given file.
1538
1539 · h
1540
1541 dsn: host; copy: yes
1542
1543 Connect to host.
1544
1545 · p
1546
1547 dsn: password; copy: yes
1548
1549 Password to use when connecting.
1550
1551 · P
1552
1553 dsn: port; copy: yes
1554
1555 Port number to use for connection.
1556
1557 · S
1558
1559 dsn: mysql_socket; copy: yes
1560
1561 Socket file to use for connection.
1562
1563 · t
1564
1565 Table to use as the query review table.
1566
1567 · u
1568
1569 dsn: user; copy: yes
1570
1571 User for login if not current user.
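
For example (the host, credentials, and names below are only illustrative),
note the comma-separated options and that upper-case P is the port while
lower-case p is the password:

  mk-query-digest /path/to/slow.log \
    --review u=bob,p=secret,h=host2,P=3307,D=percona,t=query_review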
1572
1573DOWNLOADING
1574 You can download Maatkit from Google Code at
1575 <http://code.google.com/p/maatkit/>, or you can get any of the tools
1576 easily with a command like the following:
1577
1578 wget http://www.maatkit.org/get/toolname
1579 or
1580 wget http://www.maatkit.org/trunk/toolname
1581
1582 Where "toolname" can be replaced with the name (or fragment of a name)
1583 of any of the Maatkit tools. Once downloaded, they're ready to run; no
1584 installation is needed. The first URL gets the latest released version
1585 of the tool, and the second gets the latest trunk code from Subversion.
1586
1587ENVIRONMENT
1588 The environment variable "MKDEBUG" enables verbose debugging output in
1589 all of the Maatkit tools:
1590
1591 MKDEBUG=1 mk-....
1592
1593SYSTEM REQUIREMENTS
1594 You need Perl and some core packages that ought to be installed in any
1595 reasonably new version of Perl.
1596
1597BUGS
1598 For a list of known bugs, see http://www.maatkit.org/bugs/mk-query-digest
1599 <http://www.maatkit.org/bugs/mk-query-digest>.
1600
1601 Please use Google Code Issues and Groups to report bugs or request
1602 support: <http://code.google.com/p/maatkit/>. You can also join
1603 #maatkit on Freenode to discuss Maatkit.
1604
1605 Please include the complete command-line used to reproduce the problem
1606 you are seeing, the version of all MySQL servers involved, the complete
1607 output of the tool when run with "--version", and if possible,
1608 debugging output produced by running with the "MKDEBUG=1" environment
1609 variable.
1610
1611COPYRIGHT, LICENSE AND WARRANTY
1612 This program is copyright 2007-2010 Baron Schwartz. Feedback and
1613 improvements are welcome.
1614
1615 THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
1616 WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
1617 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1618
1619 This program is free software; you can redistribute it and/or modify it
1620 under the terms of the GNU General Public License as published by the
1621 Free Software Foundation, version 2; OR the Perl Artistic License. On
1622 UNIX and similar systems, you can issue `man perlgpl' or `man
1623 perlartistic' to read these licenses.
1624
1625 You should have received a copy of the GNU General Public License along
1626 with this program; if not, write to the Free Software Foundation, Inc.,
1627 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
1628
1629AUTHOR
1630 Baron Schwartz, Daniel Nichter
1631
1632ABOUT MAATKIT
1633 This tool is part of Maatkit, a toolkit for power users of MySQL.
1634 Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
1635 primary code contributors. Both are employed by Percona. Financial
1636 support for Maatkit development is primarily provided by Percona and
1637 its clients.
1638
1639VERSION
1640 This manual page documents Ver 0.9.20 Distrib 6839 $Revision: 6831 $.
1641
1642
1643
1644perl v5.12.1 2010-08-01 MK-QUERY-DIGEST(1)