1 MK-QUERY-DIGEST(1)      User Contributed Perl Documentation     MK-QUERY-DIGEST(1)
2
3
4
5 NAME
6 mk-query-digest - Parses logs and more. Analyze, transform, filter,
7 review and report on queries.
8
9 SYNOPSIS
10 Analyze, aggregate, and report on a slow query log:
11
12 mk-query-digest /path/to/slow.log
13
14 Review a slow log, saving results to the test.query_review table in a
15 MySQL server running on host1. See "--review" for more on reviewing
16 queries:
17
18 mk-query-digest --review h=host1,D=test,t=query_review /path/to/slow.log
19
20 Filter out everything but SELECT queries, replay the queries against
21 another server, then use the timings from replaying them to analyze
22 their performance:
23
24 mk-query-digest /path/to/slow.log --execute h=another_server \
25 --filter '$event->{fingerprint} =~ m/^select/'
26
27 Print the structure of events so you can construct a complex
28 "--filter":
29
30 mk-query-digest /path/to/slow.log --no-report \
31 --filter 'print Dumper($event)'
32
33 Watch SHOW FULL PROCESSLIST and output a log in slow query log format:
34
35 mk-query-digest --processlist h=host1 --print --no-report
36
37 The default aggregation and analysis is CPU and memory intensive.
38 Disable it if you don't need the default report:
39
40 mk-query-digest <arguments> --no-report
41
42 RISKS
43 The following section is included to inform users about the potential
44 risks, whether known or unknown, of using this tool. The two main
45 categories of risks are those created by the nature of the tool (e.g.
46 read-only tools vs. read-write tools) and those created by bugs.
47
48 By default mk-query-digest merely collects and aggregates data from the
49 files specified. It is designed to be as efficient as possible, but
50 depending on the input you give it, it can use a lot of CPU and memory.
51 Practically speaking, it is safe to run even on production systems, but
52 you might want to monitor it until you are satisfied that the input you
53 give it does not cause undue load.
54
55 Various options will cause mk-query-digest to insert data into tables,
56 execute SQL queries, and so on. These include the "--execute" option
57 and "--review".
58
59 At the time of this release, we know of no bugs that could cause
60 serious harm to users.
61
62 The authoritative source for updated information is always the online
63 issue tracking system. Issues that affect this tool will be marked as
64 such. You can see a list of such issues at the following URL:
65 http://www.maatkit.org/bugs/mk-query-digest.
67
68 See also "BUGS" for more information on filing bugs and getting help.
69
70 DESCRIPTION
71 This tool was formerly known as mk-log-parser.
72
73 "mk-query-digest" is a framework for doing things with events from a
74 query source such as the slow query log or PROCESSLIST. By default it
75 acts as a very sophisticated log analysis tool. You can group and sort
76 queries in many different ways simultaneously and find the most
77 expensive queries, or create a timeline of queries in the log, for
78 example. It can also do a "query review," which means to save a sample
79 of each type of query into a MySQL table so you can easily see whether
80 you've reviewed and analyzed a query before. The benefit of this is
81 that you can keep track of changes to your server's queries and avoid
82 repeated work. You can also save other information with the queries,
83 such as comments, issue numbers in your ticketing system, and so on.
84
85 Note that this is a work in *very* active progress and you should
86 expect incompatible changes in the future.
87
88 ATTRIBUTES
89 mk-query-digest works on events, which are a collection of key/value
90 pairs called attributes. You'll recognize most of the attributes right
91 away: Query_time, Lock_time, and so on. You can just look at a slow
92 log and see them. However, there are some that don't exist in the slow
93 log, and slow logs may actually include different kinds of attributes
94 (for example, you may have a server with the Percona patches).
95
96 For a full list of attributes, see
97 <http://code.google.com/p/maatkit/wiki/EventAttributes>.
98
99 memcached
100 memcached events have additional attributes related to the memcached
101 protocol: cmd, key, res (result) and val. Also, boolean attributes are
102 created for the various commands, misses and errors: Memc_CMD where CMD
103 is a memcached command (get, set, delete, etc.), Memc_error and
104 Memc_miss.
105
106 These attributes are no different from slow log attributes, so you can
107 use them with "--[no]report", "--group-by", in a "--filter", etc.
108
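For example (a hedged sketch that reuses the capture file name shown
under "--type" memcached below), you can group memcached events by
their abstracted key:

    mk-query-digest --type memcached memcached.tcp.txt --group-by key_print
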
109 These attributes and more are documented at
110 <http://code.google.com/p/maatkit/wiki/EventAttributes>.
111
112 OUTPUT
113 The default output is a query analysis report. The "--[no]report"
114 option controls whether or not this report is printed. Sometimes you
115 may wish to parse all the queries but suppress the report, for example
116 when using "--print", "--review" or "--save-results".
117
118 There is one paragraph for each class of query analyzed. A "class" of
119 queries all have the same value for the "--group-by" attribute which is
120 "fingerprint" by default. (See "ATTRIBUTES".) A fingerprint is an
121 abstracted version of the query text with literals removed, whitespace
122 collapsed, and so forth. The report is formatted so it's easy to paste
123 into emails without wrapping, and all non-query lines begin with a
124 comment, so you can save it to a .sql file and open it in your favorite
125 syntax-highlighting text editor. There is a response-time profile at
126 the beginning.
127
128 The output described here is controlled by "--report-format". That
129 option allows you to specify what to print and in what order. The
130 default output in the default order is described here.
131
132 The report, by default, begins with a paragraph about the entire
133 analysis run. The information is very similar to what you'll see for
134 each class of queries in the log, but it doesn't have some information
135 that would be too expensive to keep globally for the analysis. It also
136 has some statistics about the code's execution itself, such as the CPU
137 and memory usage, the local date and time of the run, and a list of
138 input files read/parsed.
139
140 Following this is the response-time profile over the events. This is a
141 highly summarized view of the unique events in the detailed query
142 report that follows. Here is a sample:
143
144 # Rank Query ID Response time Calls R/Call Item
145 # ==== ================== ================ ======= ========== ====
146 # 1 0x31DA25F95494CA95 0.1494 99.9% 1 0.149435 SHOW
147 # 2 0x3AEAAD0E15D725B5 0.0001 0.1% 2 0.000041 SET
148 # 3 0x813031B8BBC3B329 0.0000 0.0% 1 0.000015 COMMIT
149
150 The columns should be fairly self-explanatory: rank, query ID, response
151 time sum and percentage of total; number of calls and response time per
152 call; and the distilled query (see "distill" for more detail on this).
153
154 Next, the detailed query report is printed. Each query appears in a
155 paragraph. Here is a sample, slightly reformatted so 'perldoc' will
156 not wrap lines in a terminal. The following will all be one paragraph,
157 but we'll break it up for commentary.
158
159 # Query 2: 0.01 QPS, 0.02x conc, ID 0xFDEA8D2993C9CAF3 at byte 160665
160
161 This line identifies the sequential number of the query in the sort
162 order specified by "--order-by". Then there's the queries per second,
163 and the approximate concurrency for this query (calculated as a
164 function of the timespan and total Query_time). Next there's a query
165 ID. This ID is a hex version of the query's checksum in the database,
166 if you're using "--review". You can select the reviewed query's
167 details from the database with a query like "SELECT .... WHERE
168 checksum=0xFDEA8D2993C9CAF3".
169
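If you store the review in the test.query_review table from the
SYNOPSIS example (the table name is only an assumption here; use
whatever you passed to "--review"), such a lookup could be run from
the shell like this:

    mysql -e "SELECT * FROM test.query_review WHERE checksum=0xFDEA8D2993C9CAF3\G"
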
170 Finally, in case you want to find a sample of the query in the log
171 file, there's the byte offset where you can look. (This is not always
172 accurate, due to some silly anomalies in the slow-log format, but it's
173 usually right.) The position refers to the worst sample, which we'll
174 see more about below.
175
176 Next is the table of metrics about this class of queries.
177
178 # pct total min max avg 95% stddev median
179 # Count 0 2
180 # Exec time 13 1105s 552s 554s 553s 554s 2s 553s
181 # Lock time 0 216us 99us 117us 108us 117us 12us 108us
182 # Rows sent 20 6.26M 3.13M 3.13M 3.13M 3.13M 12.73 3.13M
183 # Rows exam 0 6.26M 3.13M 3.13M 3.13M 3.13M 12.73 3.13M
184
185 The first line is column headers for the table. The percentage is the
186 percent of the total for the whole analysis run, and the total is the
187 actual value of the specified metric. For example, in this case we can
188 see that the query executed 2 times, and that its execution time is
189 13% of the total in the file. The min, max and avg columns are self-
190 explanatory. The 95% column shows the 95th percentile; 95% of the
191 values are less than or equal to this value. The standard deviation
192 shows you how tightly grouped the values are. The standard deviation
193 and median are both calculated from the 95th percentile, discarding the
194 extremely large values.
195
196 The stddev, median and 95th percentile statistics are approximate.
197 Exact statistics require keeping every value seen, sorting, and doing
198 some calculations on them. This uses a lot of memory. To avoid this,
199 we keep 1000 buckets, each of them 5% bigger than the one before,
200 ranging from .000001 up to a very big number. When we see a value we
201 increment the bucket into which it falls. Thus we have fixed memory
202 per class of queries. The drawback is the imprecision, which typically
203 falls in the 5 percent range.
204
205 Next we have statistics on the users, databases and time range for the
206 query.
207
208 # Users 1 user1
209 # Databases 2 db1(1), db2(1)
210 # Time range 2008-11-26 04:55:18 to 2008-11-27 00:15:15
211
212 The users and databases are shown as a count of distinct values,
213 followed by the values. If there's only one, it's shown alone; if
214 there are many, we show each of the most frequent ones, followed by the
215 number of times it appears.
216
217 # Query_time distribution
218 # 1us
219 # 10us
220 # 100us
221 # 1ms
222 # 10ms
223 # 100ms
224 # 1s
225 # 10s+ #############################################################
226
227 The execution times show a logarithmic chart of time clustering. Each
228 query goes into one of the "buckets" and is counted up. The buckets
229 are powers of ten. The first bucket is all values in the "single
230 microsecond range" -- that is, less than 10us. The second is "tens of
231 microseconds," which is from 10us up to (but not including) 100us; and
232 so on. The charted attribute can be changed by specifying
233 "--report-histogram" but is limited to time-based attributes.
234
235 # Tables
236 # SHOW TABLE STATUS LIKE 'table1'\G
237 # SHOW CREATE TABLE `table1`\G
238 # EXPLAIN
239 SELECT * FROM table1\G
240
241 This section is a convenience: if you're trying to optimize the queries
242 you see in the slow log, you probably want to examine the table
243 structure and size. These are copy-and-paste-ready commands to do
244 that.
245
246 Finally, we see a sample of the queries in this class of query. This
247 is not a random sample. It is the query that performed the worst,
248 according to the sort order given by "--order-by". You will normally
249 see a commented "# EXPLAIN" line just before it, so you can copy-paste
250 the query to examine its EXPLAIN plan. But for non-SELECT queries that
251 isn't possible to do, so the tool tries to transform the query into a
252 roughly equivalent SELECT query, and adds that below.
253
254 If you want to find this sample event in the log, use the offset
255 mentioned above, and something like the following:
256
257 tail -c +<offset> /path/to/file | head
258
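For instance, with the byte offset 160665 shown in the sample query
header above:

    tail -c +160665 /path/to/slow.log | head
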
259 See also "--report-format".
260
261 QUERY REVIEWS
262 A "query review" is the process of storing all the query fingerprints
263 analyzed. This has several benefits:
264
265 · You can add meta-data to classes of queries, such as marking them
266 for follow-up, adding notes to queries, or marking them with an
267 issue ID for your issue tracking system.
268
269 · You can refer to the stored values on subsequent runs so you'll
270 know whether you've seen a query before. This can help you cut
271 down on duplicated work.
272
273 · You can store historical data such as the row count, query times,
274 and generally anything you can see in the report.
275
276 To use this feature, you run mk-query-digest with the "--review"
277 option. It will store the fingerprints and other information into the
278 table you specify. Next time you run it with the same option, it will
279 do the following:
280
281 · It won't show you queries you've already reviewed. A query is
282 considered to be already reviewed if you've set a value for the
283 "reviewed_by" column. (If you want to see queries you've already
284 reviewed, use the "--report-all" option.)
285
286 · Queries that you've reviewed, and don't appear in the output, will
287 cause gaps in the query number sequence in the first line of each
288 paragraph. And the value you've specified for "--limit" will still
289 be honored. So if you've reviewed all queries in the top 10 and
290 you ask for the top 10, you won't see anything in the output.
291
292 · If you want to see the queries you've already reviewed, you can
293 specify "--report-all". Then you'll see the normal analysis
294 output, but you'll also see the information from the review table,
295 just below the execution time graph. For example,
296
297 # Review information
298 # comments: really bad IN() subquery, fix soon!
299 # first_seen: 2008-12-01 11:48:57
300 # jira_ticket: 1933
301 # last_seen: 2008-12-18 11:49:07
302 # priority: high
303 # reviewed_by: xaprb
304 # reviewed_on: 2008-12-18 15:03:11
305
306 You can see how useful this meta-data is -- as you analyze your
307 queries, you get your comments integrated right into the report.
308
309 If you add the "--review-history" option, it will also store
310 information into a separate database table, so you can keep
311 historical trending information on classes of queries.
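
Here is a hedged end-to-end sketch; the table names simply follow the
default CREATE TABLE definitions shown under "--review" and
"--review-history":

    mk-query-digest /path/to/slow.log \
      --review h=host1,D=test,t=query_review --create-review-table \
      --review-history t=query_review_history --create-review-history-table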
312
313 FINGERPRINTS
314 A query fingerprint is the abstracted form of a query, which makes it
315 possible to group similar queries together. Abstracting a query
316 removes literal values, normalizes whitespace, and so on. For example,
317 consider these two queries:
318
319 SELECT name, password FROM user WHERE id='12823';
320 select name, password from user
321 where id=5;
322
323 Both of those queries will fingerprint to
324
325 select name, password from user where id=?
326
327 Once the query's fingerprint is known, we can then talk about a query
328 as though it represents all similar queries.
329
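For example, here is a sketch of a "--filter" that keeps only events
matching that exact fingerprint (the fingerprint attribute is the same
one used by the default "--group-by"):

    mk-query-digest /path/to/slow.log \
      --filter '$event->{fingerprint} eq "select name, password from user where id=?"'
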
330 What "mk-query-digest" does is analogous to a GROUP BY statement in
331 SQL. (But note that giving multiple "--group-by" attributes doesn't
332 define a multi-column grouping; it defines multiple reports!) If your
333 command-line looks like this,
334
335 mk-query-digest /path/to/slow.log --select Rows_read,Rows_sent \
336 --group-by fingerprint --order-by Query_time:sum --limit 10
337
338 The corresponding pseudo-SQL looks like this:
339
340 SELECT WORST(query BY Query_time), SUM(Query_time), ...
341 FROM /path/to/slow.log
342 GROUP BY FINGERPRINT(query)
343 ORDER BY SUM(Query_time) DESC
344 LIMIT 10
345
346 You can also use the value "distill", which is a kind of super-
347 fingerprint. See "--group-by" for more.
348
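For instance, to aggregate by this super-fingerprint instead of the
default fingerprint:

    mk-query-digest /path/to/slow.log --group-by distill
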
349 When parsing memcached input ("--type" memcached), the fingerprint is
350 an abstracted version of the command and key, with values replaced by
351 placeholders. For example, "get user_123_preferences" fingerprints to
352 "get user_?_preferences". There is also a "key_print", which is a fingerprinted
353 version of the key. This example's key_print is "user_?_preferences".
354
355 Query fingerprinting accommodates a great many special cases, which
356 have proven necessary in the real world. For example, an IN list with
357 5 literals is really equivalent to one with 4 literals, so lists of
358 literals are collapsed to a single one. If you want to understand more
359 about how and why all of these cases are handled, please review the
360 test cases in the Subversion repository. If you find something that is
361 not fingerprinted properly, please submit a bug report with a
362 reproducible test case. Here is a list of transformations during
363 fingerprinting, which might not be exhaustive:
364
365 · Group all SELECT queries from mysqldump together, even if they are
366 against different tables. Ditto for all of mk-table-checksum's
367 checksum queries.
368
369 · Shorten multi-value INSERT statements to a single VALUES() list.
370
371 · Strip comments.
372
373 · Abstract the databases in USE statements, so all USE statements are
374 grouped together.
375
376 · Replace all literals, such as quoted strings. For efficiency, the
377 code that replaces literal numbers is somewhat non-selective, and
378 might replace some things as numbers when they really are not.
379 Hexadecimal literals are also replaced. NULL is treated as a
380 literal. Numbers embedded in identifiers are also replaced, so
381 tables named similarly will be fingerprinted to the same values
382 (e.g. users_2009 and users_2010 will fingerprint identically).
383
384 · Collapse all whitespace into a single space.
385
386 · Lowercase the entire query.
387
388 · Replace all literals inside of IN() and VALUES() lists with a
389 single placeholder, regardless of cardinality.
390
391 · Collapse multiple identical UNION queries into a single one.
392
393 OPTIONS
394 DSN values in "--review-history" default to values in "--review" if
395 COPY is yes.
396
397 --ask-pass
398 Prompt for a password when connecting to MySQL.
399
400 --attribute-aliases
401 type: array; default: db|Schema
402
403 List of attribute|alias,etc.
404
405 Certain attributes have multiple names, like db and Schema. If an
406 event does not have the primary attribute, mk-query-digest looks
407 for an alias attribute. If it finds an alias, it creates the
408 primary attribute with the alias attribute's value and removes the
409 alias attribute.
410
411 If the event has the primary attribute, all alias attributes are
412 deleted.
413
414 This helps to simplify event attributes so that, for example, there
415 will not be report lines for both db and Schema.
416
417 --attribute-value-limit
418 type: int; default: 4294967296
419
420 A sanity limit for attribute values.
421
422 This option deals with bugs in slow-logging functionality that
423 cause large values for attributes. If the attribute's value is
424 bigger than this, the last-seen value for that class of query is
425 used instead.
426
427 --aux-dsn
428 type: DSN
429
430 Auxiliary DSN used for special options.
431
432 The following options may require a DSN even when only parsing a
433 slow log file:
434
435 * "--since"
436 * "--until"
437
438 See each option for why it might require a DSN.
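
As a hedged sketch (the localhost DSN is only an example), a MySQL
time expression for "--since" can be paired with this option while
reading a log offline:

    mk-query-digest /path/to/slow.log --aux-dsn h=localhost \
      --since 'CURRENT_DATE - INTERVAL 7 DAY'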
439
440 --charset
441 short form: -A; type: string
442
443 Default character set. If the value is utf8, sets Perl's binmode
444 on STDOUT to utf8, passes the mysql_enable_utf8 option to
445 DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any
446 other value sets binmode on STDOUT without the utf8 layer, and runs
447 SET NAMES after connecting to MySQL.
448
449 --check-attributes-limit
450 type: int; default: 1000
451
452 Stop checking for new attributes after this many events.
453
454 For better speed, mk-query-digest stops checking events for new
455 attributes after a certain number of events. Any new attributes
456 after this number will be ignored and will not be reported.
457
458 One special case is new attributes for pre-existing query classes
459 (see "--group-by" about query classes). New attributes will not be
460 added to pre-existing query classes even if the attributes are
461 detected before the "--check-attributes-limit" limit.
462
463 --config
464 type: Array
465
466 Read this comma-separated list of config files; if specified, this
467 must be the first option on the command line.
468
469 --[no]continue-on-error
470 default: yes
471
472 Continue parsing even if there is an error.
473
474 --create-review-history-table
475 Create the "--review-history" table if it does not exist.
476
477 This option causes the table specified by "--review-history" to be
478 created with the default structure shown in the documentation for
479 that option.
480
481 --create-review-table
482 Create the "--review" table if it does not exist.
483
484 This option causes the table specified by "--review" to be created
485 with the default structure shown in the documentation for that
486 option.
487
488 --daemonize
489 Fork to the background and detach from the shell. POSIX operating
490 systems only.
491
492 --defaults-file
493 short form: -F; type: string
494
495 Only read mysql options from the given file. You must give an
496 absolute pathname.
497
498 --embedded-attributes
499 type: array
500
501 Two Perl regex patterns to capture pseudo-attributes embedded in
502 queries.
503
504 Embedded attributes might be special attribute-value pairs that
505 you've hidden in comments. The first regex should match the entire
506 set of attributes (in case there are multiple). The second regex
507 should match and capture attribute-value pairs from the first
508 regex.
509
510 For example, suppose your query looks like the following:
511
512 SELECT * from users -- file: /login.php, line: 493;
513
514 You might run mk-query-digest with the following option:
515
516 mk-query-digest --embedded-attributes ' -- .*','(\w+): ([^\,]+)'
517
518 The first regular expression captures the whole comment:
519
520 " -- file: /login.php, line: 493;"
521
522 The second one splits it into attribute-value pairs and adds them
523 to the event:
524
525 ATTRIBUTE VALUE
526 ========= ==========
527 file /login.php
528 line 493
529
530 NOTE: All commas in the regex patterns must be escaped with a
531 backslash; otherwise the pattern will break.
532
533 --execute
534 type: DSN
535
536 Execute queries on this DSN.
537
538 Adds a callback into the chain, after filters but before the
539 reports. Events are executed on this DSN. If they are successful,
540 the time they take to execute overwrites the event's Query_time
541 attribute. If unsuccessful, the callback returns false and
542 terminates the chain.
543
544 If the connection fails, mk-query-digest tries to reconnect once
545 per second.
546
547 See also "--mirror" and "--execute-throttle".
548
549 --execute-throttle
550 type: array
551
552 Throttle values for "--execute".
553
554 By default "--execute" runs without any limitations or concerns for
555 the amount of time that it takes to execute the events. The
556 "--execute-throttle" allows you to limit the amount of time spent
557 doing "--execute" relative to the other processes that handle
558 events. This works by marking some events with a "Skip_exec"
559 attribute when "--execute" begins to take too much time.
560 "--execute" will not execute an event if this attribute is true.
561 This indirectly decreases the time spent doing "--execute".
562
563 The "--execute-throttle" option takes at least two comma-separated
564 values: max allowed "--execute" time as a percentage and a check
565 interval time. An optional third value is a percentage step for
566 increasing and decreasing the probability that an event will be
567 marked "Skip_exec" true. 5 (percent) is the default step.
568
569 For example: "--execute-throttle" "70,60,10". This will limit
570 "--execute" to 70% of total event processing time, checked every
571 minute (60 seconds) and probability stepped up and down by 10%.
572 When "--execute" exceeds 70%, the probability that events will be
573 marked "Skip_exec" true increases by 10%. "--execute" time is
574 checked again after another minute. If it's still above 70%, then
575 the probability will increase another 10%. Or, if it's dropped
576 below 70%, then the probability will decrease by 10%.
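
Putting it together with "--execute" (the server DSN here is taken
from the SYNOPSIS example and is only illustrative):

    mk-query-digest /path/to/slow.log --execute h=another_server \
      --execute-throttle 70,60,10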
577
578 --expected-range
579 type: array; default: 5,10
580
581 Explain items when there are more or fewer than expected.
582
583 Defines the number of items expected to be seen in the report given
584 by "--[no]report", as controlled by "--limit" and "--outliers". If
585 there are more or fewer items in the report, each one will explain
586 why it was included.
587
588 --explain
589 type: DSN
590
591 Run EXPLAIN for the sample query with this DSN and print results.
592
593 This works only when "--group-by" includes fingerprint. It causes
594 mk-query-digest to run EXPLAIN and include the output into the
595 report. For safety, queries that appear to have a subquery that
596 EXPLAIN will execute won't be EXPLAINed. Those are typically
597 "derived table" queries of the form
598
599 select ... from ( select .... ) der;
600
601 --filter
602 type: string
603
604 Discard events for which this Perl code doesn't return true.
605
606 This option is a string of Perl code or a file containing Perl code
607 that gets compiled into a subroutine with one argument: $event.
608 This is a hashref. If the given value is a readable file, then mk-
609 query-digest reads the entire file and uses its contents as the
610 code. The file should not contain a shebang (#!/usr/bin/perl)
611 line.
612
613 If the code returns true, the chain of callbacks continues;
614 otherwise it ends. The code is the last statement in the
615 subroutine other than "return $event". The subroutine template is:
616
617 sub { $event = shift; filter && return $event; }
618
619 Filters given on the command line are wrapped inside parentheses
620 like "( filter )". For complex, multi-line filters, you must
621 put the code inside a file so it will not be wrapped inside
622 parentheses. Either way, the filter must produce syntactically
623 valid code given the template. For example, an if-else branch
624 given on the command line would not be valid:
625
626 --filter 'if () { } else { }' # WRONG
627
628 Since it's given on the command line, the if-else branch would be
629 wrapped inside parentheses which is not syntactically valid. So to
630 accomplish something more complex like this would require putting
631 the code in a file, for example filter.txt:
632
633 my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok
634
635 Then specify "--filter filter.txt" to read the code from
636 filter.txt.
637
638 If the filter code won't compile, mk-query-digest will die with an
639 error. If the filter code does compile, an error may still occur
640 at runtime if the code tries to do something wrong (like pattern
641 match an undefined value). mk-query-digest does not provide any
642 safeguards so code carefully!
643
644 An example filter that discards everything but SELECT statements:
645
646 --filter '$event->{arg} =~ m/^select/i'
647
648 This is compiled into a subroutine like the following:
649
650 sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }
651
652 It is permissible for the code to have side effects (to alter
653 $event).
654
655 You can find an explanation of the structure of $event at
656 <http://code.google.com/p/maatkit/wiki/EventAttributes>.
657
658 Here are more examples of filter code:
659
660 Host/IP matches domain.com
661 --filter '($event->{host} || $event->{ip} || "") =~
662 m/domain.com/'
663
664 Sometimes MySQL logs the host where the IP is expected.
665 Therefore, we check both.
666
667 User matches john
668 --filter '($event->{user} || "") =~ m/john/'
669
670 More than 1 warning
671 --filter '($event->{Warning_count} || 0) > 1'
672
673 Query does full table scan or full join
674 --filter '(($event->{Full_scan} || "") eq "Yes") ||
675 (($event->{Full_join} || "") eq "Yes")'
676
677 Query was not served from query cache
678 --filter '($event->{QC_Hit} || "") eq "No"'
679
680 Query is 1 MB or larger
681 --filter '$event->{bytes} >= 1_048_576'
682
683 --fingerprints
684 Add query fingerprints to the standard query analysis report. This
685 is mostly useful for debugging purposes.
686
687 --[no]for-explain
688 default: yes
689
690 Print extra information to make analysis easy.
691
692 This option adds code snippets to make it easy to run SHOW CREATE
693 TABLE and SHOW TABLE STATUS for the query's tables. It also
694 rewrites non-SELECT queries into a SELECT that might be helpful for
695 determining the non-SELECT statement's index usage.
696
697 --group-by
698 type: Array; default: fingerprint
699
700 Which attribute of the events to group by.
701
702 In general, you can group queries into classes based on any
703 attribute of the query, such as "user" or "db", which will by
704 default show you which users and which databases get the most
705 "Query_time". The default attribute, "fingerprint", groups
706 similar, abstracted queries into classes; see below and see also
707 "FINGERPRINTS".
708
709 A report is printed for each "--group-by" value (unless
710 "--no-report" is given). Therefore, "--group-by user,db" means
711 "report on queries with the same user and report on queries with
712 the same db"--it does not mean "report on queries with the same
713 user and db." See also "OUTPUT".
714
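For example, to get one report aggregated by user and a second report
aggregated by database:

    mk-query-digest /path/to/slow.log --group-by user,db
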
715 Every value must have a corresponding value in the same position in
716 "--order-by". However, adding values to "--group-by" will
717 automatically add values to "--order-by", for your convenience.
718
719 There are several magical values that cause some extra data mining
720 to happen before the grouping takes place:
721
722 fingerprint
723 This causes events to be fingerprinted to abstract queries into
724 a canonical form, which is then used to group events together
725 into a class. See "FINGERPRINTS" for more about
726 fingerprinting.
727
728 tables
729 This causes events to be inspected for what appear to be
730 tables, and then aggregated by that. Note that a query that
731 contains two or more tables will be counted as many times as
732 there are tables; so a join against two tables will count the
733 Query_time against both tables.
734
735 distill
736 This is a sort of super-fingerprint that collapses queries down
737 into a suggestion of what they do, such as "INSERT SELECT
738 table1 table2".
739
740 If parsing memcached input ("--type" memcached), there are other
741 attributes which you can group by: key_print (see memcached section
742 in "FINGERPRINTS"), cmd, key, res and val (see memcached section in
743 "ATTRIBUTES").
744
745 --[no]gzip
746 default: yes
747
748 Gzip "--save-results" files; requires IO::Compress::Gzip.
749
750 --help
751 Show help and exit.
752
753 --host
754 short form: -h; type: string
755
756 Connect to host.
757
758 --ignore-attributes
759 type: array; default: arg, cmd, insert_id, ip, port, Thread_id,
760 timestamp, exptime, flags, key, res, val, server_id, offset,
761 end_log_pos, Xid
762
763 Do not aggregate these attributes when auto-detecting "--select".
764
765 If you do not specify "--select" then mk-query-digest auto-detects
766 and aggregates every attribute that it finds in the slow log. Some
767 attributes, however, should not be aggregated. This option allows
768 you to specify a list of attributes to ignore. This only works
769 when no explicit "--select" is given.
770
771 --inherit-attributes
772 type: array; default: db,ts
773
774 If missing, inherit these attributes from the last event that had
775 them.
776
777 This option sets which attributes are inherited or carried forward
778 to events which do not have them. For example, if one event has
779 the db attribute equal to "foo", but the next event doesn't have
780 the db attribute, then it inherits "foo" for its db attribute.
781
782 Inheritance is usually desirable, but in some cases it might
783 confuse things. If a query inherits a database that it doesn't
784 actually use, then this could confuse "--execute".
785
786 --interval
787 type: float; default: .1
788
789 How frequently to poll the processlist, in seconds.
790
791 --iterations
792 type: int; default: 1
793
794 How many times to iterate through the collect-and-report cycle. If
795 0, iterate to infinity. See also "--run-time".
796
797 --limit
798 type: Array; default: 95%:20
799
800 Limit output to the given percentage or count.
801
802 If the argument is an integer, report only the top N worst queries.
803 If the argument is an integer followed by the "%" sign, report that
804 percentage of the worst queries. If the percentage is followed by
805 a colon and another integer, report the top percentage or the
806 number specified by that integer, whichever comes first.
807
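For instance, to report only the ten worst query classes:

    mk-query-digest /path/to/slow.log --limit 10
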
808 The value is actually a comma-separated array of values, one for
809 each item in "--group-by". If you don't specify a value for any of
810 those items, the default is the top 95%.
811
812 See also "--outliers".
813
814 --log
815 type: string
816
817 Print all output to this file when daemonized.
818
819 --mirror
820 type: float
821
822 How often to check whether connections should be moved, depending
823 on "read_only". Requires "--processlist" and "--execute".
824
825 This option causes mk-query-digest to check every N seconds whether
826 it is reading from a read-write server and executing against a
827 read-only server, which is a sensible way to set up two servers if
828 you're doing something like master-master replication. The
829 http://code.google.com/p/mysql-master-master/ master-master
831 toolkit does this. The aim is to keep the passive server ready for
832 failover, which is impossible without putting it under a realistic
833 workload.
834
835 --order-by
836 type: Array; default: Query_time:sum
837
838 Sort events by this attribute and aggregate function.
839
840 This is a comma-separated list of order-by expressions, one for
841 each "--group-by" attribute. The default "Query_time:sum" is used
842 for "--group-by" attributes without explicitly given "--order-by"
843 attributes (that is, if you specify more "--group-by" attributes
844 than corresponding "--order-by" attributes). The syntax is
845 "attribute:aggregate". See "ATTRIBUTES" for valid attributes.
846 Valid aggregates are:
847
848 Aggregate Meaning
849 ========= ============================
850 sum Sum/total attribute value
851 min Minimum attribute value
852 max Maximum attribute value
853 cnt Frequency/count of the query
854
855 For example, the default "Query_time:sum" means that queries in the
856 query analysis report will be ordered (sorted) by their total query
857 execution time ("Exec time"). "Query_time:max" orders the queries
858 by their maximum query execution time, so the query with the single
859 largest "Query_time" will be listed first. "cnt" refers more to the
860 frequency of the query as a whole, how often it appears; "Count" is
861 its corresponding line in the query analysis report. So any
862 attribute and "cnt" should yield the same report wherein queries
863 are sorted by the number of times they appear.
864
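For example, to sort the report by the single worst execution of each
query class rather than by the total:

    mk-query-digest /path/to/slow.log --order-by Query_time:max
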
865 When parsing general logs ("--type" "genlog"), the default
866 "--order-by" becomes "Query_time:cnt". General logs do not report
867 query times so only the "cnt" aggregate makes sense because all
868 query times are zero.
869
870 If you specify an attribute that doesn't exist in the events, then
871 mk-query-digest falls back to the default "Query_time:sum" and
872 prints a notice at the beginning of the report for each query
873 class.
874
875 --outliers
876 type: array; default: Query_time:1:10
877
878 Report outliers by attribute:percentile:count.
879
880 The syntax of this option is a comma-separated list of colon-
881 delimited strings. The first field is the attribute by which an
882 outlier is defined. The second is a number that is compared to the
883 attribute's 95th percentile. The third is optional, and is
884 compared to the attribute's cnt aggregate. Queries that pass this
885 specification are added to the report, regardless of any limits you
886 specified in "--limit".
887
888 For example, to report queries whose 95th percentile Query_time is
889 at least 60 seconds and which are seen at least 5 times, use the
890 following argument:
891
892 --outliers Query_time:60:5
893
894 You can specify an --outliers option for each value in
895 "--group-by".
896
897 --password
898 short form: -p; type: string
899
900 Password to use when connecting.
901
902 --pid
903 type: string
904
905 Create the given PID file when daemonized. The file contains the
906 process ID of the daemonized instance. The PID file is removed
907 when the daemonized instance exits. The program checks for the
908 existence of the PID file when starting; if it exists and the
909 process with the matching PID exists, the program exits.
910
911 --pipeline-profile
912 Print a profile of the pipeline processes.
913
914 --port
915 short form: -P; type: int
916
917 Port number to use for connection.
918
919 --print
920 Print log events to STDOUT in standard slow-query-log format.
921
922 --print-iterations
923 Print the start time for each "--iterations".
924
925 This option causes a line like the following to be printed at the
926 start of each "--iterations" report:
927
928 # Iteration 2 started at 2009-11-24T14:39:48.345780
929
930 This line will print even if "--no-report" is specified. If
931 "--iterations 0" is specified, each iteration number will be 0.
932
933 --processlist
934 type: DSN
935
936 Poll this DSN's processlist for queries, with "--interval" sleep
937 between.
938
939 If the connection fails, mk-query-digest tries to reopen it once
940 per second. See also "--mirror".
941
942 --progress
943 type: array; default: time,30
944
945 Print progress reports to STDERR. The value is a comma-separated
946 list with two parts. The first part can be percentage, time, or
947 iterations; the second part specifies how often an update should be
948 printed, in percentage, seconds, or number of iterations.
949
950 --read-timeout
951 type: time; default: 0
952
953 Wait this long for an event from the input; 0 to wait forever.
954
955 This option sets the maximum time to wait for an event from the
956 input. It applies to all types of input except "--processlist".
957 If an event is not received after the specified time, the script
958 stops reading the input and prints its reports. If "--iterations"
959 is 0 or greater than 1, the next iteration will begin, else the
960 script will exit.
961
962 This option requires the Perl POSIX module.
963
964 --[no]report
965 default: yes
966
967 Print out reports on the aggregate results from "--group-by".
968
969 This is the standard slow-log analysis functionality. See "OUTPUT"
970 for the description of what this does and what the results look
971 like.
972
973 --report-all
974 Include all queries, even if they have already been reviewed.
975
976 --report-format
977 type: Array; default:
978 rusage,date,files,header,profile,query_report,prepared
979
980 Print these sections of the query analysis report.
981
982 SECTION PRINTS
983 ============ ==============================================================
984 rusage       CPU times and memory usage reported by ps
985 date Current local date and time
986 files        Input files read/parsed
987 header Summary of the entire analysis run
988 profile Compact table of queries for a quick view of the report
989 query_report Detailed information about each unique query
990 prepared Prepared statements
991
992 The sections are printed in the order specified. The rusage, date,
993 files and header sections are grouped together if specified
994 together; other sections are separated by blank lines.
995
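For example, to print only the response-time profile and the detailed
query report:

    mk-query-digest /path/to/slow.log --report-format profile,query_report
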
996 See "OUTPUT" for more information on the various parts of the query
997 report.
998
999 --report-histogram
1000 type: string; default: Query_time
1001
1002 Chart the distribution of this attribute's values.
1003
1004 The distribution chart is limited to time-based attributes, so
1005 charting "Rows_examined", for example, will produce a useless
1006 chart.
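
For example, to chart lock times instead of query times:

    mk-query-digest /path/to/slow.log --report-histogram Lock_time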
1007
1008 --review
1009 type: DSN
1010
1011 Store a sample of each class of query in this DSN.
1012
1013 The argument specifies a table to store all unique query
1014 fingerprints in. The table must have at least the following
1015 columns. You can add more columns for your own special purposes,
1016 but they won't be used by mk-query-digest. The following CREATE
1017 TABLE definition is also used for "--create-review-table".
1018 MAGIC_create_review:
1019
1020 CREATE TABLE query_review (
1021 checksum BIGINT UNSIGNED NOT NULL PRIMARY KEY,
1022 fingerprint TEXT NOT NULL,
1023 sample TEXT NOT NULL,
1024 first_seen DATETIME,
1025 last_seen DATETIME,
1026 reviewed_by VARCHAR(20),
1027 reviewed_on DATETIME,
1028 comments TEXT
1029 )
1030
1031 The columns are as follows:
1032
1033 COLUMN MEANING
1034 =========== ===============
1035 checksum A 64-bit checksum of the query fingerprint
1036 fingerprint The abstracted version of the query; its primary key
1037 sample The query text of a sample of the class of queries
1038 first_seen The smallest timestamp of this class of queries
1039 last_seen The largest timestamp of this class of queries
1040 reviewed_by Initially NULL; if set, query is skipped thereafter
1041 reviewed_on Initially NULL; not assigned any special meaning
1042 comments Initially NULL; not assigned any special meaning
1043
1044 Note that the "fingerprint" column is the true primary key for a
1045 class of queries. The "checksum" is just a cryptographic hash of
1046 this value, which provides a shorter value that is very likely to
1047 also be unique.
1048
1049 After parsing and aggregating events, your table should contain a
1050 row for each fingerprint. This option depends on "--group-by
1051 fingerprint" (which is the default). It will not work otherwise.
1052
1053 --review-history
1054 type: DSN
1055
1056 The table in which to store historical values for review trend
1057 analysis.
1058
1059 Each time you review queries with "--review", mk-query-digest will
1060 save information into this table so you can see how classes of
1061 queries have changed over time.
1062
1063 This DSN inherits unspecified values from "--review". It should
1064 mention a table in which to store statistics about each class of
1065 queries. mk-query-digest verifies the existence of the table, and
1066 your privileges to insert, delete and update on that table.
1067
1068 mk-query-digest then inspects the columns in the table. The table
1069 must have at least the following columns:
1070
1071 CREATE TABLE query_review_history (
1072 checksum BIGINT UNSIGNED NOT NULL,
1073 sample TEXT NOT NULL
1074 );
1075
1076 Any columns not mentioned above are inspected to see if they follow
1077 a certain naming convention. The column is special if the name
1078 ends with an underscore followed by any of these MAGIC_history_cols
1079 values:
1080
1081 pct|avt|cnt|sum|min|max|pct_95|stddev|median|rank
1082
1083 If the column ends with one of those values, then the prefix is
1084 interpreted as the event attribute to store in that column, and the
1085 suffix is interpreted as the metric to be stored. For example, a
1086 column named Query_time_min will be used to store the minimum
1087 Query_time for the class of events. The presence of this column
1088 will also add Query_time to the "--select" list.
1089
1090 The table should also have a primary key, but that is up to you,
1091 depending on how you want to store the historical data. We suggest
1092 adding ts_min and ts_max columns and making them part of the
1093 primary key along with the checksum. But you could also just add a
1094 ts_min column and make it a DATE type, so you'd get one row per
1095 class of queries per day.
1096
1097 The default table structure follows. The following
1098 MAGIC_create_review_history table definition is used for
1099 "--create-review-history-table":
1100
1101 CREATE TABLE query_review_history (
1102 checksum BIGINT UNSIGNED NOT NULL,
1103 sample TEXT NOT NULL,
1104 ts_min DATETIME,
1105 ts_max DATETIME,
1106 ts_cnt FLOAT,
1107 Query_time_sum FLOAT,
1108 Query_time_min FLOAT,
1109 Query_time_max FLOAT,
1110 Query_time_pct_95 FLOAT,
1111 Query_time_stddev FLOAT,
1112 Query_time_median FLOAT,
1113 Lock_time_sum FLOAT,
1114 Lock_time_min FLOAT,
1115 Lock_time_max FLOAT,
1116 Lock_time_pct_95 FLOAT,
1117 Lock_time_stddev FLOAT,
1118 Lock_time_median FLOAT,
1119 Rows_sent_sum FLOAT,
1120 Rows_sent_min FLOAT,
1121 Rows_sent_max FLOAT,
1122 Rows_sent_pct_95 FLOAT,
1123 Rows_sent_stddev FLOAT,
1124 Rows_sent_median FLOAT,
1125 Rows_examined_sum FLOAT,
1126 Rows_examined_min FLOAT,
1127 Rows_examined_max FLOAT,
1128 Rows_examined_pct_95 FLOAT,
1129 Rows_examined_stddev FLOAT,
1130 Rows_examined_median FLOAT,
1131 PRIMARY KEY(checksum, ts_min, ts_max)
1132 );
1133
1134 Note that we store the count (cnt) for the ts attribute only; it
1135 will be redundant to store this for other attributes.
1136
1137 --run-time
1138 type: time
1139
1140 How long to run before exiting. The default is to run forever (you
1141 can interrupt with CTRL-C). See also "--iterations".
1142
1143 --sample
1144 type: int
1145
1146 Filter out all but the first N occurrences of each query. The
1147 queries are filtered on the first value in "--group-by", so by
1148 default, this will filter by query fingerprint. For example,
1149 "--sample 2" will permit two sample queries for each fingerprint.
1150 Useful in conjunction with "--print" to print out the queries. You
1151 probably want to set "--no-report" to avoid the overhead of
1152 aggregating and reporting if you're just using this to print out
1153 samples of queries. A complete example:
1154
1155 mk-query-digest --sample 2 --no-report --print slow.log
1156
1157 --save-results
1158 type: string
1159
1160 Save results to the specified file.
1161
1162 If "--[no]gzip" is true (by default it is) then .gz is appended to
1163 the file name.
1164
1165 --select
1166 type: Array
1167
1168 Compute aggregate statistics for these attributes.
1169
1170 By default mk-query-digest auto-detects, aggregates and prints
1171 metrics for every query attribute that it finds in the slow query
1172 log. This option specifies a list of only the attributes that you
1173 want. You can specify an alternative attribute with a colon. For
1174 example, "db:Schema" uses db if it's available, and Schema if it's
1175 not.
1176
1177 Previously, mk-query-digest only aggregated these attributes:
1178
1179 Query_time,Lock_time,Rows_sent,Rows_examined,user,db:Schema,ts
1180
1181 Attributes specified in the "--review-history" table will always be
1182 selected even if you do not specify "--select".
1183
1184 See also "--ignore-attributes" and "ATTRIBUTES".
1185
1186 --set-vars
1187 type: string; default: wait_timeout=10000
1188
1189 Set these MySQL variables. Immediately after connecting to MySQL,
1190 this string will be appended to SET and executed.
1191
1192 --shorten
1193 type: int; default: 1024
1194
1195 Shorten long statements in reports.
1196
1197 Shortens long statements, replacing the omitted portion with a
1198 "/*... omitted ...*/" comment. This applies only to the output in
1199 reports, not to information stored for "--review" or other places.
1200 It prevents a large statement from causing difficulty in a report.
1201 The argument is the preferred length of the shortened statement.
1202 Not all statements can be shortened, but very large INSERT and
1203 similar statements often can; and so can IN() lists, although only
1204 the first such list in the statement will be shortened.
1205
1206 If it shortens something beyond recognition, you can find the
1207 original statement in the log, at the offset shown in the report
1208 header (see "OUTPUT").
1209
1210 --show-all
1211 type: Hash
1212
1213 Show all values for these attributes.
1214
1215 By default mk-query-digest only shows as many of an attribute's
1216 values as fit on a single line. This option allows you to specify
1217 attributes for which all values will be shown (line width is
1218 ignored). This only works for attributes with string values like
1219 user, host, db, etc. Multiple attributes can be specified, comma-
1220 separated.
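
For example, to list every client host and user seen for each query
class:

    mk-query-digest /path/to/slow.log --show-all host,user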
1221
1222 --since
1223 type: string
1224
1225 Parse only queries newer than this value (parse queries since this
1226 date).
1227
1228 This option allows you to ignore queries older than a certain value
1229 and parse only those queries which are more recent than the value.
1230 The value can be several types:
1231
1232 * Simple time value N with optional suffix: N[shmd], where
1233 s=seconds, h=hours, m=minutes, d=days (default s if no suffix
1234 given); this is like saying "since N[shmd] ago"
1235 * Full date with optional hours:minutes:seconds:
1236 YYYY-MM-DD [HH:MM:SS]
1237 * Short, MySQL-style date:
1238 YYMMDD [HH:MM:SS]
1239 * Any time expression evaluated by MySQL:
1240 CURRENT_DATE - INTERVAL 7 DAY
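
For example, using the simple-suffix form to parse only queries from
the last twelve hours:

    mk-query-digest /path/to/slow.log --since 12h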
1241
1242 If you give a MySQL time expression, then you must also specify a
1243 DSN so that mk-query-digest can connect to MySQL to evaluate the
1244 expression. If you specify "--execute", "--explain",
1245 "--processlist", "--review" or "--review-history", then one of
1246 these DSNs will be used automatically. Otherwise, you must specify
1247 an "--aux-dsn" or mk-query-digest will die saying that the value is
1248 invalid.
1249
1250 The MySQL time expression is wrapped inside a query like "SELECT
1251 UNIX_TIMESTAMP(<expression>)", so be sure that the expression is
1252 valid inside this query. For example, do not use UNIX_TIMESTAMP()
1253 because UNIX_TIMESTAMP(UNIX_TIMESTAMP()) returns 0.
1254
1255 Events are assumed to be in chronological order--older events at the
1256 beginning of the log and newer events at the end of the log.
1257 "--since" is strict: it ignores all queries until one is found that
1258 is new enough. Therefore, if the query events are not consistently
1259 timestamped, some may be ignored which are actually new enough.
1260
1261 See also "--until".
1262
1263 --socket
1264 short form: -S; type: string
1265
1266 Socket file to use for connection.
1267
1268 --statistics
1269 Print statistics.
1270
1271 --table-access
1272 Print a table access report.
1273
1274 The table access report shows which tables are accessed by all the
1275 queries and if the access is a read or write. The report looks
1276 like:
1277
1278 write `baz`.`tbl`
1279 read `baz`.`new_tbl`
1280 write `baz`.`tbl3`
1281 write `db6`.`tbl6`
1282
1283 If you pipe the output to sort, the read and write tables will be
1284 grouped together and sorted alphabetically:
1285
1286 read `baz`.`new_tbl`
1287 write `baz`.`tbl`
1288 write `baz`.`tbl3`
1289 write `db6`.`tbl6`
1290
1291 --tcpdump-errors
1292 type: string
1293
1294 Write the tcpdump data to this file on error. If mk-query-digest
1295 doesn't parse the stream correctly for some reason, the session's
1296 packets since the last query event will be written out to create a
1297 usable test case. If this happens, mk-query-digest will not raise
1298 an error; it will just discard the session's saved state and permit
1299 the tool to continue working. See "tcpdump" for more information
1300 about parsing tcpdump output.
1301
1302 --timeline
1303 Show a timeline of events.
1304
1305 This option makes mk-query-digest print another kind of report: a
1306 timeline of the events. Each query is still grouped and aggregated
1307 into classes according to "--group-by", but then they are printed
1308 in chronological order. The timeline report prints out the
1309 timestamp, interval, count and value of each class.
1310
1311 If all you want is the timeline report, then specify "--no-report"
1312 to suppress the default query analysis report. Otherwise, the
1313 timeline report will be printed at the end before the response-time
1314 profile (see "--report-format" and "OUTPUT").
1315
1316 For example, this:
1317
1318 mk-query-digest /path/to/log --group-by distill --timeline
1319
1320 will print something like:
1321
1322 # ########################################################
1323 # distill report
1324 # ########################################################
1325 # 2009-07-25 11:19:27 1+00:00:01 2 SELECT foo
1326 # 2009-07-27 11:19:30 00:01 2 SELECT bar
1327 # 2009-07-27 11:30:00 1+06:30:00 2 SELECT foo
1328
1329 --type
1330 type: Array
1331
1332 The type of input to parse (default slowlog). The permitted types
1333 are
1334
1335 binlog
1336 Parse a binary log file.
1337
1338 genlog
1339 Parse a MySQL general log file. General logs lack a lot of
1340 "ATTRIBUTES", notably "Query_time". The default "--order-by"
1341 for general logs changes to "Query_time:cnt".
1342
1343 http
1344 Parse HTTP traffic from tcpdump.
1345
1346 pglog
1347 Parse a log file in PostgreSQL format. The parser will
1348 automatically recognize logs sent to syslog and transparently
1349 parse the syslog format, too. The recommended configuration
1350 for logging in your postgresql.conf is as follows.
1351
1352 The log_destination setting can be set to either syslog or
1353 stderr. Syslog has the added benefit of not interleaving log
1354 messages from several sessions concurrently, which the parser
1355 cannot handle, so this might be better than stderr. CSV-
1356 formatted logs are not supported at this time.
1357
1358 The log_min_duration_statement setting should be set to 0 to
1359 capture all statements with their durations. Alternatively,
1360 the parser will also recognize and handle various combinations
1361 of log_duration and log_statement.
1362
1363 You may enable log_connections and log_disconnections, but this
1364 is optional.
1365
1366 It is highly recommended to set your log_line_prefix to the
1367 following:
1368
1369 log_line_prefix = '%m c=%c,u=%u,D=%d '
1370
1371 This lets the parser find timestamps with milliseconds, session
1372 IDs, users, and databases from the log. If these items are
1373 missing, you'll simply get less information to analyze. For
1374 compatibility with other log analysis tools such as PQA and
1375 pgfouine, various log line prefix formats are supported. The
1376 general format is as follows: a timestamp can be detected and
1377 extracted (the syslog timestamp is NOT parsed), and a
1378 name=value list of properties can also be extracted. Although the suggested
1379 format is as shown above, any name=value list will be captured
1380 and interpreted by using the first letter of the 'name' part,
1381 lowercased, to determine the meaning of the item. The
1382 lowercased first letter is interpreted to mean the same thing
1383 as PostgreSQL's built-in %-codes for the log_line_prefix format
1384 string. For example, u means user, so unicorn=fred will be
1385 interpreted as user=fred; d means database, so D=john will be
1386 interpreted as database=john. The pgfouine-suggested
1387 formatting is user=%u and db=%d, so it should Just Work
1388 regardless of which format you choose. The main thing is to
1389 add as much information as possible into the log_line_prefix to
1390 permit richer analysis.
1391
1392 Currently, only English locale messages are supported, so if
1393 your server's locale is set to something else, the log won't be
1394 parsed properly. (Log messages with "duration:" and
1395 "statement:" won't be recognized.)
1396
1397 slowlog
1398 Parse a log file in any variation of MySQL slow-log format.
1399
1400 tcpdump
1401 Inspect network packets and decode the MySQL client protocol,
1402 extracting queries and responses from it.
1403
1404 mk-query-digest does not actually watch the network (i.e. it
1405 does NOT "sniff packets"). Instead, it's just parsing the
1406 output of tcpdump. You are responsible for generating this
1407 output; mk-query-digest does not do it for you. Then you send
1408 this to mk-query-digest as you would any log file: as files on
1409 the command line or to STDIN.
1410
1411 The parser expects the input to be formatted with the following
1412 options: "-x -n -q -tttt". For example, if you want to capture
1413 output from your local machine, you can do something like the
1414 following (the port must come last on FreeBSD):
1415
1416 tcpdump -s 65535 -x -nn -q -tttt -i eth0 -c 1000 port 3306 > mysql.tcp.txt
1417 mk-query-digest --type tcpdump mysql.tcp.txt
1418
1419 The other tcpdump parameters, such as -s, -c, and -i, are up to
1420 you. Just make sure the output looks like this:
1421
1422 2009-04-12 09:50:16.804849 IP 127.0.0.1.42167 > 127.0.0.1.3306: tcp 37
1423 0x0000: 4508 0059 6eb2 4000 4006 cde2 7f00 0001
1424 0x0010: ....
1425
1426 Remember tcpdump has a handy -c option to stop after it
1427 captures some number of packets! That's very useful for
1428 testing your tcpdump command. Note that tcpdump can't capture
1429 traffic on a Unix socket. Read
1430 <http://bugs.mysql.com/bug.php?id=31577> if you're confused
1431 about this.
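
If you prefer not to go through an intermediate file, the same output can be
piped straight to the tool on STDIN; a sketch (a packet limit such as -c is
useful here, because the report is only printed once tcpdump exits):

  tcpdump -s 65535 -x -nn -q -tttt -i eth0 -c 1000 port 3306 \
    | mk-query-digest --type tcpdump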
1432
1433 All MySQL servers running on port 3306 are automatically
1434 detected in the tcpdump output. Therefore, if the tcpdump output
1435 contains packets from multiple servers on port 3306 (for
1436 example, 10.0.0.1:3306, 10.0.0.2:3306, etc.), all
1437 packets/queries from all these servers will be analyzed
1438 together as if they were one server.
1439
1440 If you're analyzing traffic for a MySQL server that is not
1441 running on port 3306, see "--watch-server".
1442
1443 Also note that mk-query-digest may fail to report the database
1444 for queries when parsing tcpdump output. The database is
1445 discovered only in the initial connect events for a new client
1446 or when "USE db" is executed. If the tcpdump output contains
1447 neither of these, then mk-query-digest cannot discover the
1448 database.
1449
1450 The MySQL client protocol implementation is not complete;
1451 server-side prepared statements are not yet supported, and
1452 naturally, SSL-encrypted traffic cannot be inspected and
1453 decoded.
1454
1455 memcached
1456 Similar to tcpdump, but the expected input is memcached packets
1457 instead of MySQL packets. For example:
1458
1459 tcpdump -i eth0 port 11211 -s 65535 -x -nn -q -tttt > memcached.tcp.txt
1460 mk-query-digest --type memcached memcached.tcp.txt
1461
1462 memcached uses port 11211 by default.
1463
1464 --until
1465 type: string
1466
1467 Parse only queries older than this value (parse queries until this
1468 date).
1469
1470 This option allows you to ignore queries newer than a certain value
1471 and parse only those queries which are older than the value. The
1472 value can be one of the same types listed for "--since".
1473
1474 Unlike "--since", "--until" is not strict: all queries are parsed
1475 until one has a timestamp that is equal to or greater than
1476 "--until". Then all subsequent queries are ignored.
1477
1478 --user
1479 short form: -u; type: string
1480
1481 User for login if not current user.
1482
1483 --version
1484 Show version and exit.
1485
1486 --watch-server
1487 type: string
1488
1489 This option tells mk-query-digest which server IP address and port
1490 (like "10.0.0.1:3306") to watch when parsing tcpdump (for "--type"
1491 tcpdump and memcached); all other servers are ignored. If you
1492 don't specify it, mk-query-digest watches all servers by looking
1493 for any IP address using port 3306 or "mysql". If you're watching
1494 a server with a non-standard port, this won't work, so you must
1495 specify the IP address and port to watch.
1496
1497 If you want to watch a mix of servers, some running on standard
1498 port 3306 and some running on non-standard ports, you need to
1499 create separate tcpdump outputs for the non-standard port servers
1500 and then specify this option for each. At present mk-query-digest
1501 cannot auto-detect servers on port 3306 and also be told to watch a
1502 server on a non-standard port.
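
For example, if a server runs on 10.0.0.1 port 3307 (the host, port, and
file name here are only illustrative):

  tcpdump -s 65535 -x -nn -q -tttt -i eth0 -c 1000 port 3307 > mysql-3307.tcp.txt
  mk-query-digest --type tcpdump --watch-server 10.0.0.1:3307 mysql-3307.tcp.txt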
1503
1504 --[no]zero-admin
1505 default: yes
1506
1507 Zero out the Rows_XXX properties for administrator command events.
1508
1509 --[no]zero-bool
1510 default: yes
1511
1512 Print 0% boolean values in report.
1513
1514DSN OPTIONS
1515 These DSN options are used to create a DSN. Each option is given like
1516 "option=value". The options are case-sensitive, so P and p are not the
1517 same option. There cannot be whitespace before or after the "=" and if
1518 the value contains whitespace it must be quoted. DSN options are
1519 comma-separated (an example appears after this list). See the maatkit manpage for full details.
1520
1521 · A
1522
1523 dsn: charset; copy: yes
1524
1525 Default character set.
1526
1527 · D
1528
1529 dsn: database; copy: yes
1530
1531 Database that contains the query review table.
1532
1533 · F
1534
1535 dsn: mysql_read_default_file; copy: yes
1536
1537 Only read default options from the given file.
1538
1539 · h
1540
1541 dsn: host; copy: yes
1542
1543 Connect to host.
1544
1545 · p
1546
1547 dsn: password; copy: yes
1548
1549 Password to use when connecting.
1550
1551 · P
1552
1553 dsn: port; copy: yes
1554
1555 Port number to use for connection.
1556
1557 · S
1558
1559 dsn: mysql_socket; copy: yes
1560
1561 Socket file to use for connection.
1562
1563 · t
1564
1565 Table to use as the query review table.
1566
1567 · u
1568
1569 dsn: user; copy: yes
1570
1571 User for login if not current user.
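
For example (the host, credentials, and names below are only illustrative),
note the comma-separated options and that upper-case P is the port while
lower-case p is the password:

  mk-query-digest /path/to/slow.log \
    --review u=bob,p=secret,h=host2,P=3307,D=percona,t=query_review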
1572
1573DOWNLOADING
1574 You can download Maatkit from Google Code at
1575 <http://code.google.com/p/maatkit/>, or you can get any of the tools
1576 easily with a command like the following:
1577
1578 wget http://www.maatkit.org/get/toolname
1579 or
1580 wget http://www.maatkit.org/trunk/toolname
1581
1582 Where "toolname" can be replaced with the name (or fragment of a name)
1583 of any of the Maatkit tools. Once downloaded, they're ready to run; no
1584 installation is needed. The first URL gets the latest released version
1585 of the tool, and the second gets the latest trunk code from Subversion.
1586
1587ENVIRONMENT
1588 The environment variable "MKDEBUG" enables verbose debugging output in
1589 all of the Maatkit tools:
1590
1591 MKDEBUG=1 mk-....
1592
1593SYSTEM REQUIREMENTS
1594 You need Perl and some core packages that ought to be installed in any
1595 reasonably new version of Perl.
1596
1597BUGS
1598 For a list of known bugs, see http://www.maatkit.org/bugs/mk-query-digest
1599 <http://www.maatkit.org/bugs/mk-query-digest>.
1600
1601 Please use Google Code Issues and Groups to report bugs or request
1602 support: <http://code.google.com/p/maatkit/>. You can also join
1603 #maatkit on Freenode to discuss Maatkit.
1604
1605 Please include the complete command-line used to reproduce the problem
1606 you are seeing, the version of all MySQL servers involved, the complete
1607 output of the tool when run with "--version", and if possible,
1608 debugging output produced by running with the "MKDEBUG=1" environment
1609 variable.
1610
1611COPYRIGHT, LICENSE AND WARRANTY
1612 This program is copyright 2007-2010 Baron Schwartz. Feedback and
1613 improvements are welcome.
1614
1615 THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
1616 WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
1617 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1618
1619 This program is free software; you can redistribute it and/or modify it
1620 under the terms of the GNU General Public License as published by the
1621 Free Software Foundation, version 2; OR the Perl Artistic License. On
1622 UNIX and similar systems, you can issue `man perlgpl' or `man
1623 perlartistic' to read these licenses.
1624
1625 You should have received a copy of the GNU General Public License along
1626 with this program; if not, write to the Free Software Foundation, Inc.,
1627 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
1628
1629AUTHOR
1630 Baron Schwartz, Daniel Nichter
1631
1632ABOUT MAATKIT
1633 This tool is part of Maatkit, a toolkit for power users of MySQL.
1634 Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
1635 primary code contributors. Both are employed by Percona. Financial
1636 support for Maatkit development is primarily provided by Percona and
1637 its clients.
1638
1639VERSION
1640 This manual page documents Ver 0.9.20 Distrib 6839 $Revision: 6831 $.
1641
1642
1643
1644perl v5.12.1 2010-08-01 MK-QUERY-DIGEST(1)