mk-table-usage(1p)

1MK-TABLE-USAGE(1)     User Contributed Perl Documentation    MK-TABLE-USAGE(1)
2
3
4

NAME

6       mk-table-usage - Read queries from a log and analyze how they use
7       tables.
8

SYNOPSIS

10       Usage: mk-table-usage [OPTION...] [FILE...]
11
12       mk-table-usage reads queries from slow query logs and analyzes how they
13       use tables.  If no FILE is specified, STDIN is read.  Table usage for
14       every query is printed to STDOUT.
15

RISKS

17       mk-table-use is very low risk because it only reads and examines
18       queries from a log and executes "EXPLAIN EXTENDED" if the
19       "--explain-extended" option is specified.
20
21       At the time of this release, there are no known bugs that could cause
22       serious harm to users.
23
24       The authoritative source for updated information is always the online
25       issue tracking system.  Issues that affect this tool will be marked as
26       such.  You can see a list of such issues at the following URL:
27       <http://www.maatkit.org/bugs/mk-table-usage>.
28
29       See also "BUGS" for more information on filing bugs and getting help.
30

DESCRIPTION

32       mk-table-usage reads queries from slow query logs and analyzes how they
33       use tables.  Table usage indicates more than just which tables are read
34       from or written to by the query, it also indicates data flow: data in
35       and data out.  Data flow is determined by the contexts in which tables
36       are used by the query.  A single table can be used in several different
37       contexts in the same query.  The reported table usage for each query
38       lists every context for every table.  This CONTEXT-TABLE list tells how
39       and where data flows, i.e. the query's table usage.  The "OUTPUT"
40       section lists the possible contexts and describes how to read a table
41       usage report.
42
43       Since this tool analyzes table usage, it's important that queries use
44       table-qualified columns.  If a query uses only one table, then all
45       columns must be from that table and there's no problem.  But if a query
46       uses multiple tables and the columns are not table-qualified, then that
47       creates a problem that can only be solved by knowing the query's
48       database and specifying "--explain-extended".  If the slow log does not
49       specify the database used by the query, then you can specify a default
50       database with "--database".  There is no other way to know or guess the
51       database, so the query will be skipped.  Secondly, if the database is
52       known, then specifying "--explain-extended" causes mk-table-usage to do
53       "EXPLAIN EXTENDED ..."  "SHOW WARNINGS" to get the fully qualified
54       query as reported by MySQL (i.e. all identifiers are fully database-
55       and/or table-qualified).  For best results, you should specify
56       "--explain-extended" and "--database" if you know that all queries use
57       the same database.
58
59       Each query is identified in the output by either an MD5 hex checksum of
60       the query's fingerprint or the query's value for the specified
61       "--id-attribute".  The query ID is for parsing and storing the table
62       usage reports in a table that is keyed on the query ID.  See "OUTPUT"
63       for more information.
64

OUTPUT

66       The table usage report that is printed for each query looks similar to
67       the following:
68
69         Query_id: 0x1CD27577D202A339.1
70         UPDATE t1
71         SELECT DUAL
72         JOIN t1
73         JOIN t2
74         WHERE t1
75
76         Query_id: 0x1CD27577D202A339.2
77         UPDATE t2
78         SELECT DUAL
79         JOIN t1
80         JOIN t2
81         WHERE t1
82
83       Usage reports are separated by blank lines.  The first line is always
84       the query ID: a unique ID that can be used to parse the output and
85       store the usage reports in a table keyed on this ID.  The query ID has
86       two parts separated by a period: the query ID and the target table
87       number.
88
89       If "--id-attribute" is not specified, then query IDs are automatically
90       created by making an MD5 hex checksum of the query's fingerprint (as
91       shown above, e.g. 0x1CD27577D202A339); otherwise, the query ID is the
92       query's value for the given attribute.
93
94       The target table number starts at 1 and increments by 1 for each table
95       that the query affects.  Only multi-table UPDATE queries can affect
96       multiple tables with a single query, so this number is 1 for all other
97       types of queries.  (Multi-table DELETE queries are not supported.)  The
98       example output above is from this query:
99
100         UPDATE t1 AS a JOIN t2 AS b USING (id)
101         SET a.foo="bar", b.foo="bat"
102         WHERE a.id=1;
103
104       The "SET" clause indicates that two tables are updated: "a" aliased as
105       "t1", and "b" aliased as "t2".  So two usage reports are printed, one
106       for each table, and this is indicated in the output by their common
107       query ID but incrementing target table number.
108
109       After the first line is a variable number of CONTEXT-TABLE lines.
110       Possible contexts are:
111
112       ·   SELECT
113
114           SELECT means that data is taken out of the table for one of two
115           reasons: to be returned to the user as part of a result set, or to
116           be put into another table as part of an INSERT or UPDATE.  In the
117           first case, since only SELECT queries return result sets, a SELECT
118           context is always listed for SELECT queries.  In the second case,
119           data from one table is used to insert or update rows in another
120           table.  For example, the UPDATE query in the example above has the
121           usage:
122
123             SELECT DUAL
124
125           This refers to:
126
127             SET a.foo="bar", b.foo="bat"
128
129           DUAL is used for any values that does not originate in a table, in
130           this case the literal values "bar" and "bat".  If that "SET" clause
131           were "SET a.foo=b.foo" instead, then the complete usage would be:
132
133             Query_id: 0x1CD27577D202A339.1
134             UPDATE t1
135             SELECT t2
136             JOIN t1
137             JOIN t2
138             WHERE t1
139
140           The presence of a SELECT context after another context, such as
141           UPDATE or INSERT, indicates where the UPDATE or INSERT retrieves
142           its data.  The example immediately above reflects an UPDATE query
143           that updates rows in table "t1" with data from table "t2".
144
145       ·   Any other query type
146
147           Any other query type, such as INSERT, UPDATE, DELETE, etc. may be a
148           context.  All these types indicate that the table is written or
149           altered in some way.  If a SELECT context follows one of these
150           types, then data is read from the SELECT table and written to this
151           table.  This happens, for example, with INSERT..SELECT or UPDATE
152           queries that set column values using values from tables instead of
153           constant values.
154
155           These query types are not supported:
156
157             SET
158             LOAD
159             multi-table DELETE
160
161       ·   JOIN
162
163           The JOIN context lists tables that are joined, either with an
164           explicit JOIN in the FROM clause, or implicitly in the WHERE
165           clause, such as "t1.id = t2.id".
166
167       ·   WHERE
168
169           The WHERE context lists tables that are used in the WHERE clause to
170           filter results.  This does not include tables that are implicitly
171           joined in the WHERE clause; those are listed as JOIN contexts.  For
172           example:
173
174             WHERE t1.id > 100 AND t1.id < 200 AND t2.foo IS NOT NULL
175
176           Results in:
177
178             WHERE t1
179             WHERE t2
180
181           Only unique tables are listed; that is why table "t1" is listed
182           only once.
183
184       ·   TLIST
185
186           The TLIST context lists tables that are accessed by the query but
187           do not appear in any other context.  These tables are usually an
188           implicit full cartesian join, so they should be avoided.  For
189           example, the query "SELECT * FROM t1, t2" results in:
190
191             Query_id: 0xBDDEB6EDA41897A8.1
192             SELECT t1
193             SELECT t2
194             TLIST t1
195             TLIST t2
196
197           First of all, there are two SELECT contexts, because "SELECT *"
198           selects rows from all tables; "t1" and "t2" in this case.
199           Secondly, the tables are implicitly joined, but without any kind of
200           join condition, which results in a full cartesian join as indicated
201           by the TLIST context for each.
202

EXIT STATUS

204       mk-table-usage exits 1 on any kind of error, or 0 if no errors.
205

OPTIONS

207       This tool accepts additional command-line arguments.  Refer to the
208       "SYNOPSIS" and usage information for details.
209
210       --ask-pass
211           Prompt for a password when connecting to MySQL.
212
213       --charset
214           short form: -A; type: string
215
216           Default character set.  If the value is utf8, sets Perl's binmode
217           on STDOUT to utf8, passes the mysql_enable_utf8 option to
218           DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL.  Any
219           other value sets binmode on STDOUT without the utf8 layer, and runs
220           SET NAMES after connecting to MySQL.
221
222       --config
223           type: Array
224
225           Read this comma-separated list of config files; if specified, this
226           must be the first option on the command line.
227
228       --constant-data-value
229           type: string; default: DUAL
230
231           Value to print for constant data.  Constant data means all data not
232           from tables (or subqueries since subqueries are not supported).
233           For example, real constant values like strings ("foo") and numbers
234           (42), and data from functions like "NOW()".  For example, in the
235           query "INSERT INTO t (c) VALUES ('a')", the string 'a' is constant
236           data, so the table usage report is:
237
238             INSERT t
239             SELECT DUAL
240
241           The first line indicates that data is inserted into table "t" and
242           the second line indicates that that data comes from some constant
243           value.
244
245       --[no]continue-on-error
246           default: yes
247
248           Continue parsing even if there is an error.
249
250       --create-table-definitions
251           type: array
252
253           Read "CREATE TABLE" definitions from this list of comma-separated
254           files.  If you cannot use "--explain-extended" to fully qualify
255           table and column names, you can save the output of "mysqldump
256           --no-data" to one or more files and specify those files with this
257           option.  The tool will parse all "CREATE TABLE" definitions from
258           the files and use this information to qualify table and column
259           names.  If a column name is used in multiple tables, or table name
260           is used in multiple databases, these duplicates cannot be
261           qualified.
262
263       --daemonize
264           Fork to the background and detach from the shell.  POSIX operating
265           systems only.
266
267       --database
268           short form: -D; type: string
269
270           Default database.
271
272       --defaults-file
273           short form: -F; type: string
274
275           Only read mysql options from the given file.  You must give an
276           absolute pathname.
277
278       --explain-extended
279           type: DSN
280
281           EXPLAIN EXTENDED queries on this host to fully qualify table and
282           column names.
283
284       --filter
285           type: string
286
287           Discard events for which this Perl code doesn't return true.
288
289           This option is a string of Perl code or a file containing Perl code
290           that gets compiled into a subroutine with one argument: $event.
291           This is a hashref.  If the given value is a readable file, then mk-
292           query-digest reads the entire file and uses its contents as the
293           code.  The file should not contain a shebang (#!/usr/bin/perl)
294           line.
295
296           If the code returns true, the chain of callbacks continues;
297           otherwise it ends.  The code is the last statement in the
298           subroutine other than "return $event".  The subroutine template is:
299
300             sub { $event = shift; filter && return $event; }
301
302           Filters given on the command line are wrapped inside parentheses
303           like like "( filter )".  For complex, multi-line filters, you must
304           put the code inside a file so it will not be wrapped inside
305           parentheses.  Either way, the filter must produce syntactically
306           valid code given the template.  For example, an if-else branch
307           given on the command line would not be valid:
308
309             --filter 'if () { } else { }'  # WRONG
310
311           Since it's given on the command line, the if-else branch would be
312           wrapped inside parentheses which is not syntactically valid.  So to
313           accomplish something more complex like this would require putting
314           the code in a file, for example filter.txt:
315
316             my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok
317
318           Then specify "--filter filter.txt" to read the code from
319           filter.txt.
320
321           If the filter code won't compile, mk-query-digest will die with an
322           error.  If the filter code does compile, an error may still occur
323           at runtime if the code tries to do something wrong (like pattern
324           match an undefined value).  mk-query-digest does not provide any
325           safeguards so code carefully!
326
327           An example filter that discards everything but SELECT statements:
328
329             --filter '$event->{arg} =~ m/^select/i'
330
331           This is compiled into a subroutine like the following:
332
333             sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }
334
335           It is permissible for the code to have side effects (to alter
336           $event).
337
338           You can find an explanation of the structure of $event at
339           <http://code.google.com/p/maatkit/wiki/EventAttributes>.
340
341           Here are more examples of filter code:
342
343           Host/IP matches domain.com
344               --filter '($event->{host} || $event->{ip} || "") =~
345               m/domain.com/'
346
347               Sometimes MySQL logs the host where the IP is expected.
348               Therefore, we check both.
349
350           User matches john
351               --filter '($event->{user} || "") =~ m/john/'
352
353           More than 1 warning
354               --filter '($event->{Warning_count} || 0) > 1'
355
356           Query does full table scan or full join
357               --filter '(($event->{Full_scan} || "") eq "Yes") ||
358               (($event->{Full_join} || "") eq "Yes")'
359
360           Query was not served from query cache
361               --filter '($event->{QC_Hit} || "") eq "No"'
362
363           Query is 1 MB or larger
364               --filter '$event->{bytes} >= 1_048_576'
365
366           Since "--filter" allows you to alter $event, you can use it to do
367           other things, like create new attributes.
368
369       --help
370           Show help and exit.
371
372       --host
373           short form: -h; type: string
374
375           Connect to host.
376
377       --id-attribute
378           type: string
379
380           Identify each event using this attribute.  If not ID attribute is
381           given, then events are identified with the query's checksum: an MD5
382           hex checksum of the query's fingerprint.
383
384       --log
385           type: string
386
387           Print all output to this file when daemonized.
388
389       --password
390           short form: -p; type: string
391
392           Password to use when connecting.
393
394       --pid
395           type: string
396
397           Create the given PID file when running.  The file contains the
398           process ID of the daemonized instance.  The PID file is removed
399           when the daemonized instance exits.  The program checks for the
400           existence of the PID file when starting; if it exists and the
401           process with the matching PID exists, the program exits.
402
403       --port
404           short form: -P; type: int
405
406           Port number to use for connection.
407
408       --progress
409           type: array; default: time,30
410
411           Print progress reports to STDERR.  The value is a comma-separated
412           list with two parts.  The first part can be percentage, time, or
413           iterations; the second part specifies how often an update should be
414           printed, in percentage, seconds, or number of iterations.
415
416       --query
417           type: string
418
419           Analyze only this given query.  If you want to analyze the table
420           usage of one simple query by providing on the command line instead
421           of reading it from a slow log file, then specify that query with
422           this option.  The default "--id-attribute" will be used which is
423           the query's checksum.
424
425       --read-timeout
426           type: time; default: 0
427
428           Wait this long for an event from the input; 0 to wait forever.
429
430           This option sets the maximum time to wait for an event from the
431           input.  If an event is not received after the specified time, the
432           script stops reading the input and prints its reports.
433
434           This option requires the Perl POSIX module.
435
436       --run-time
437           type: time
438
439           How long to run before exiting.  The default is to run forever (you
440           can interrupt with CTRL-C).
441
442       --set-vars
443           type: string; default: wait_timeout=10000
444
445           Set these MySQL variables.  Immediately after connecting to MySQL,
446           this string will be appended to SET and executed.
447
448       --socket
449           short form: -S; type: string
450
451           Socket file to use for connection.
452
453       --user
454           short form: -u; type: string
455
456           User for login if not current user.
457
458       --version
459           Show version and exit.
460

DSN OPTIONS

462       These DSN options are used to create a DSN.  Each option is given like
463       "option=value".  The options are case-sensitive, so P and p are not the
464       same option.  There cannot be whitespace before or after the "=" and if
465       the value contains whitespace it must be quoted.  DSN options are
466       comma-separated.  See the maatkit manpage for full details.
467
468       ·   A
469
470           dsn: charset; copy: yes
471
472           Default character set.
473
474       ·   D
475
476           dsn: database; copy: yes
477
478           Database that contains the query review table.
479
480       ·   F
481
482           dsn: mysql_read_default_file; copy: yes
483
484           Only read default options from the given file
485
486       ·   h
487
488           dsn: host; copy: yes
489
490           Connect to host.
491
492       ·   p
493
494           dsn: password; copy: yes
495
496           Password to use when connecting.
497
498       ·   P
499
500           dsn: port; copy: yes
501
502           Port number to use for connection.
503
504       ·   S
505
506           dsn: mysql_socket; copy: yes
507
508           Socket file to use for connection.
509
510       ·   u
511
512           dsn: user; copy: yes
513
514           User for login if not current user.
515

DOWNLOADING

517       You can download Maatkit from Google Code at
518       <http://code.google.com/p/maatkit/>, or you can get any of the tools
519       easily with a command like the following:
520
521          wget http://www.maatkit.org/get/toolname
522          or
523          wget http://www.maatkit.org/trunk/toolname
524
525       Where "toolname" can be replaced with the name (or fragment of a name)
526       of any of the Maatkit tools.  Once downloaded, they're ready to run; no
527       installation is needed.  The first URL gets the latest released version
528       of the tool, and the second gets the latest trunk code from Subversion.
529

ENVIRONMENT

531       The environment variable "MKDEBUG" enables verbose debugging output in
532       all of the Maatkit tools:
533
534          MKDEBUG=1 mk-....
535

SYSTEM REQUIREMENTS

537       You need Perl and some core packages that ought to be installed in any
538       reasonably new version of Perl.
539

BUGS

541       For a list of known bugs see
542       <http://www.maatkit.org/bugs/mk-table-usage>.
543
544       Please use Google Code Issues and Groups to report bugs or request
545       support: <http://code.google.com/p/maatkit/>.  You can also join
546       #maatkit on Freenode to discuss Maatkit.
547
548       Please include the complete command-line used to reproduce the problem
549       you are seeing, the version of all MySQL servers involved, the complete
550       output of the tool when run with "--version", and if possible,
551       debugging output produced by running with the "MKDEBUG=1" environment
552       variable.
553

COPYRIGHT, LICENSE AND WARRANTY

555       This program is copyright 2009-2011 Percona Inc.  Feedback and
556       improvements are welcome.
557
558       THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
559       WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
560       MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
561
562       This program is free software; you can redistribute it and/or modify it
563       under the terms of the GNU General Public License as published by the
564       Free Software Foundation, version 2; OR the Perl Artistic License.  On
565       UNIX and similar systems, you can issue `man perlgpl' or `man
566       perlartistic' to read these licenses.
567
568       You should have received a copy of the GNU General Public License along
569       with this program; if not, write to the Free Software Foundation, Inc.,
570       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.
571

AUTHOR

573       Daniel Nichter
574

VERSION

576       This manual page documents Ver 1.0.1 Distrib 7540 $Revision: 7531 $.
577
578
579
580perl v5.28.1                      2011-06-08                 MK-TABLE-USAGE(1)