1MK-TABLE-USAGE(1) User Contributed Perl Documentation MK-TABLE-USAGE(1)
2
3
4
6 mk-table-usage - Read queries from a log and analyze how they use
7 tables.
8
10 Usage: mk-table-usage [OPTION...] [FILE...]
11
12 mk-table-usage reads queries from slow query logs and analyzes how they
13 use tables. If no FILE is specified, STDIN is read. Table usage for
14 every query is printed to STDOUT.
15
17 mk-table-use is very low risk because it only reads and examines
18 queries from a log and executes "EXPLAIN EXTENDED" if the
19 "--explain-extended" option is specified.
20
21 At the time of this release, there are no known bugs that could cause
22 serious harm to users.
23
24 The authoritative source for updated information is always the online
25 issue tracking system. Issues that affect this tool will be marked as
26 such. You can see a list of such issues at the following URL:
27 <http://www.maatkit.org/bugs/mk-table-usage>.
28
29 See also "BUGS" for more information on filing bugs and getting help.
30
32 mk-table-usage reads queries from slow query logs and analyzes how they
33 use tables. Table usage indicates more than just which tables are read
34 from or written to by the query, it also indicates data flow: data in
35 and data out. Data flow is determined by the contexts in which tables
36 are used by the query. A single table can be used in several different
37 contexts in the same query. The reported table usage for each query
38 lists every context for every table. This CONTEXT-TABLE list tells how
39 and where data flows, i.e. the query's table usage. The "OUTPUT"
40 section lists the possible contexts and describes how to read a table
41 usage report.
42
43 Since this tool analyzes table usage, it's important that queries use
44 table-qualified columns. If a query uses only one table, then all
45 columns must be from that table and there's no problem. But if a query
46 uses multiple tables and the columns are not table-qualified, then that
47 creates a problem that can only be solved by knowing the query's
48 database and specifying "--explain-extended". If the slow log does not
49 specify the database used by the query, then you can specify a default
50 database with "--database". There is no other way to know or guess the
51 database, so the query will be skipped. Secondly, if the database is
52 known, then specifying "--explain-extended" causes mk-table-usage to do
53 "EXPLAIN EXTENDED ..." "SHOW WARNINGS" to get the fully qualified
54 query as reported by MySQL (i.e. all identifiers are fully database-
55 and/or table-qualified). For best results, you should specify
56 "--explain-extended" and "--database" if you know that all queries use
57 the same database.
58
59 Each query is identified in the output by either an MD5 hex checksum of
60 the query's fingerprint or the query's value for the specified
61 "--id-attribute". The query ID is for parsing and storing the table
62 usage reports in a table that is keyed on the query ID. See "OUTPUT"
63 for more information.
64
66 The table usage report that is printed for each query looks similar to
67 the following:
68
69 Query_id: 0x1CD27577D202A339.1
70 UPDATE t1
71 SELECT DUAL
72 JOIN t1
73 JOIN t2
74 WHERE t1
75
76 Query_id: 0x1CD27577D202A339.2
77 UPDATE t2
78 SELECT DUAL
79 JOIN t1
80 JOIN t2
81 WHERE t1
82
83 Usage reports are separated by blank lines. The first line is always
84 the query ID: a unique ID that can be used to parse the output and
85 store the usage reports in a table keyed on this ID. The query ID has
86 two parts separated by a period: the query ID and the target table
87 number.
88
89 If "--id-attribute" is not specified, then query IDs are automatically
90 created by making an MD5 hex checksum of the query's fingerprint (as
91 shown above, e.g. 0x1CD27577D202A339); otherwise, the query ID is the
92 query's value for the given attribute.
93
94 The target table number starts at 1 and increments by 1 for each table
95 that the query affects. Only multi-table UPDATE queries can affect
96 multiple tables with a single query, so this number is 1 for all other
97 types of queries. (Multi-table DELETE queries are not supported.) The
98 example output above is from this query:
99
100 UPDATE t1 AS a JOIN t2 AS b USING (id)
101 SET a.foo="bar", b.foo="bat"
102 WHERE a.id=1;
103
104 The "SET" clause indicates that two tables are updated: "a" aliased as
105 "t1", and "b" aliased as "t2". So two usage reports are printed, one
106 for each table, and this is indicated in the output by their common
107 query ID but incrementing target table number.
108
109 After the first line is a variable number of CONTEXT-TABLE lines.
110 Possible contexts are:
111
112 • SELECT
113
114 SELECT means that data is taken out of the table for one of two
115 reasons: to be returned to the user as part of a result set, or to
116 be put into another table as part of an INSERT or UPDATE. In the
117 first case, since only SELECT queries return result sets, a SELECT
118 context is always listed for SELECT queries. In the second case,
119 data from one table is used to insert or update rows in another
120 table. For example, the UPDATE query in the example above has the
121 usage:
122
123 SELECT DUAL
124
125 This refers to:
126
127 SET a.foo="bar", b.foo="bat"
128
129 DUAL is used for any values that does not originate in a table, in
130 this case the literal values "bar" and "bat". If that "SET" clause
131 were "SET a.foo=b.foo" instead, then the complete usage would be:
132
133 Query_id: 0x1CD27577D202A339.1
134 UPDATE t1
135 SELECT t2
136 JOIN t1
137 JOIN t2
138 WHERE t1
139
140 The presence of a SELECT context after another context, such as
141 UPDATE or INSERT, indicates where the UPDATE or INSERT retrieves
142 its data. The example immediately above reflects an UPDATE query
143 that updates rows in table "t1" with data from table "t2".
144
145 • Any other query type
146
147 Any other query type, such as INSERT, UPDATE, DELETE, etc. may be a
148 context. All these types indicate that the table is written or
149 altered in some way. If a SELECT context follows one of these
150 types, then data is read from the SELECT table and written to this
151 table. This happens, for example, with INSERT..SELECT or UPDATE
152 queries that set column values using values from tables instead of
153 constant values.
154
155 These query types are not supported:
156
157 SET
158 LOAD
159 multi-table DELETE
160
161 • JOIN
162
163 The JOIN context lists tables that are joined, either with an
164 explicit JOIN in the FROM clause, or implicitly in the WHERE
165 clause, such as "t1.id = t2.id".
166
167 • WHERE
168
169 The WHERE context lists tables that are used in the WHERE clause to
170 filter results. This does not include tables that are implicitly
171 joined in the WHERE clause; those are listed as JOIN contexts. For
172 example:
173
174 WHERE t1.id > 100 AND t1.id < 200 AND t2.foo IS NOT NULL
175
176 Results in:
177
178 WHERE t1
179 WHERE t2
180
181 Only unique tables are listed; that is why table "t1" is listed
182 only once.
183
184 • TLIST
185
186 The TLIST context lists tables that are accessed by the query but
187 do not appear in any other context. These tables are usually an
188 implicit full cartesian join, so they should be avoided. For
189 example, the query "SELECT * FROM t1, t2" results in:
190
191 Query_id: 0xBDDEB6EDA41897A8.1
192 SELECT t1
193 SELECT t2
194 TLIST t1
195 TLIST t2
196
197 First of all, there are two SELECT contexts, because "SELECT *"
198 selects rows from all tables; "t1" and "t2" in this case.
199 Secondly, the tables are implicitly joined, but without any kind of
200 join condition, which results in a full cartesian join as indicated
201 by the TLIST context for each.
202
204 mk-table-usage exits 1 on any kind of error, or 0 if no errors.
205
207 This tool accepts additional command-line arguments. Refer to the
208 "SYNOPSIS" and usage information for details.
209
210 --ask-pass
211 Prompt for a password when connecting to MySQL.
212
213 --charset
214 short form: -A; type: string
215
216 Default character set. If the value is utf8, sets Perl's binmode
217 on STDOUT to utf8, passes the mysql_enable_utf8 option to
218 DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any
219 other value sets binmode on STDOUT without the utf8 layer, and runs
220 SET NAMES after connecting to MySQL.
221
222 --config
223 type: Array
224
225 Read this comma-separated list of config files; if specified, this
226 must be the first option on the command line.
227
228 --constant-data-value
229 type: string; default: DUAL
230
231 Value to print for constant data. Constant data means all data not
232 from tables (or subqueries since subqueries are not supported).
233 For example, real constant values like strings ("foo") and numbers
234 (42), and data from functions like "NOW()". For example, in the
235 query "INSERT INTO t (c) VALUES ('a')", the string 'a' is constant
236 data, so the table usage report is:
237
238 INSERT t
239 SELECT DUAL
240
241 The first line indicates that data is inserted into table "t" and
242 the second line indicates that that data comes from some constant
243 value.
244
245 --[no]continue-on-error
246 default: yes
247
248 Continue parsing even if there is an error.
249
250 --create-table-definitions
251 type: array
252
253 Read "CREATE TABLE" definitions from this list of comma-separated
254 files. If you cannot use "--explain-extended" to fully qualify
255 table and column names, you can save the output of "mysqldump
256 --no-data" to one or more files and specify those files with this
257 option. The tool will parse all "CREATE TABLE" definitions from
258 the files and use this information to qualify table and column
259 names. If a column name is used in multiple tables, or table name
260 is used in multiple databases, these duplicates cannot be
261 qualified.
262
263 --daemonize
264 Fork to the background and detach from the shell. POSIX operating
265 systems only.
266
267 --database
268 short form: -D; type: string
269
270 Default database.
271
272 --defaults-file
273 short form: -F; type: string
274
275 Only read mysql options from the given file. You must give an
276 absolute pathname.
277
278 --explain-extended
279 type: DSN
280
281 EXPLAIN EXTENDED queries on this host to fully qualify table and
282 column names.
283
284 --filter
285 type: string
286
287 Discard events for which this Perl code doesn't return true.
288
289 This option is a string of Perl code or a file containing Perl code
290 that gets compiled into a subroutine with one argument: $event.
291 This is a hashref. If the given value is a readable file, then mk-
292 query-digest reads the entire file and uses its contents as the
293 code. The file should not contain a shebang (#!/usr/bin/perl)
294 line.
295
296 If the code returns true, the chain of callbacks continues;
297 otherwise it ends. The code is the last statement in the
298 subroutine other than "return $event". The subroutine template is:
299
300 sub { $event = shift; filter && return $event; }
301
302 Filters given on the command line are wrapped inside parentheses
303 like like "( filter )". For complex, multi-line filters, you must
304 put the code inside a file so it will not be wrapped inside
305 parentheses. Either way, the filter must produce syntactically
306 valid code given the template. For example, an if-else branch
307 given on the command line would not be valid:
308
309 --filter 'if () { } else { }' # WRONG
310
311 Since it's given on the command line, the if-else branch would be
312 wrapped inside parentheses which is not syntactically valid. So to
313 accomplish something more complex like this would require putting
314 the code in a file, for example filter.txt:
315
316 my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok
317
318 Then specify "--filter filter.txt" to read the code from
319 filter.txt.
320
321 If the filter code won't compile, mk-query-digest will die with an
322 error. If the filter code does compile, an error may still occur
323 at runtime if the code tries to do something wrong (like pattern
324 match an undefined value). mk-query-digest does not provide any
325 safeguards so code carefully!
326
327 An example filter that discards everything but SELECT statements:
328
329 --filter '$event->{arg} =~ m/^select/i'
330
331 This is compiled into a subroutine like the following:
332
333 sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }
334
335 It is permissible for the code to have side effects (to alter
336 $event).
337
338 You can find an explanation of the structure of $event at
339 <http://code.google.com/p/maatkit/wiki/EventAttributes>.
340
341 Here are more examples of filter code:
342
343 Host/IP matches domain.com
344 --filter '($event->{host} || $event->{ip} || "") =~
345 m/domain.com/'
346
347 Sometimes MySQL logs the host where the IP is expected.
348 Therefore, we check both.
349
350 User matches john
351 --filter '($event->{user} || "") =~ m/john/'
352
353 More than 1 warning
354 --filter '($event->{Warning_count} || 0) > 1'
355
356 Query does full table scan or full join
357 --filter '(($event->{Full_scan} || "") eq "Yes") ||
358 (($event->{Full_join} || "") eq "Yes")'
359
360 Query was not served from query cache
361 --filter '($event->{QC_Hit} || "") eq "No"'
362
363 Query is 1 MB or larger
364 --filter '$event->{bytes} >= 1_048_576'
365
366 Since "--filter" allows you to alter $event, you can use it to do
367 other things, like create new attributes.
368
369 --help
370 Show help and exit.
371
372 --host
373 short form: -h; type: string
374
375 Connect to host.
376
377 --id-attribute
378 type: string
379
380 Identify each event using this attribute. If not ID attribute is
381 given, then events are identified with the query's checksum: an MD5
382 hex checksum of the query's fingerprint.
383
384 --log
385 type: string
386
387 Print all output to this file when daemonized.
388
389 --password
390 short form: -p; type: string
391
392 Password to use when connecting.
393
394 --pid
395 type: string
396
397 Create the given PID file when running. The file contains the
398 process ID of the daemonized instance. The PID file is removed
399 when the daemonized instance exits. The program checks for the
400 existence of the PID file when starting; if it exists and the
401 process with the matching PID exists, the program exits.
402
403 --port
404 short form: -P; type: int
405
406 Port number to use for connection.
407
408 --progress
409 type: array; default: time,30
410
411 Print progress reports to STDERR. The value is a comma-separated
412 list with two parts. The first part can be percentage, time, or
413 iterations; the second part specifies how often an update should be
414 printed, in percentage, seconds, or number of iterations.
415
416 --query
417 type: string
418
419 Analyze only this given query. If you want to analyze the table
420 usage of one simple query by providing on the command line instead
421 of reading it from a slow log file, then specify that query with
422 this option. The default "--id-attribute" will be used which is
423 the query's checksum.
424
425 --read-timeout
426 type: time; default: 0
427
428 Wait this long for an event from the input; 0 to wait forever.
429
430 This option sets the maximum time to wait for an event from the
431 input. If an event is not received after the specified time, the
432 script stops reading the input and prints its reports.
433
434 This option requires the Perl POSIX module.
435
436 --run-time
437 type: time
438
439 How long to run before exiting. The default is to run forever (you
440 can interrupt with CTRL-C).
441
442 --set-vars
443 type: string; default: wait_timeout=10000
444
445 Set these MySQL variables. Immediately after connecting to MySQL,
446 this string will be appended to SET and executed.
447
448 --socket
449 short form: -S; type: string
450
451 Socket file to use for connection.
452
453 --user
454 short form: -u; type: string
455
456 User for login if not current user.
457
458 --version
459 Show version and exit.
460
462 These DSN options are used to create a DSN. Each option is given like
463 "option=value". The options are case-sensitive, so P and p are not the
464 same option. There cannot be whitespace before or after the "=" and if
465 the value contains whitespace it must be quoted. DSN options are
466 comma-separated. See the maatkit manpage for full details.
467
468 • A
469
470 dsn: charset; copy: yes
471
472 Default character set.
473
474 • D
475
476 dsn: database; copy: yes
477
478 Database that contains the query review table.
479
480 • F
481
482 dsn: mysql_read_default_file; copy: yes
483
484 Only read default options from the given file
485
486 • h
487
488 dsn: host; copy: yes
489
490 Connect to host.
491
492 • p
493
494 dsn: password; copy: yes
495
496 Password to use when connecting.
497
498 • P
499
500 dsn: port; copy: yes
501
502 Port number to use for connection.
503
504 • S
505
506 dsn: mysql_socket; copy: yes
507
508 Socket file to use for connection.
509
510 • u
511
512 dsn: user; copy: yes
513
514 User for login if not current user.
515
517 You can download Maatkit from Google Code at
518 <http://code.google.com/p/maatkit/>, or you can get any of the tools
519 easily with a command like the following:
520
521 wget http://www.maatkit.org/get/toolname
522 or
523 wget http://www.maatkit.org/trunk/toolname
524
525 Where "toolname" can be replaced with the name (or fragment of a name)
526 of any of the Maatkit tools. Once downloaded, they're ready to run; no
527 installation is needed. The first URL gets the latest released version
528 of the tool, and the second gets the latest trunk code from Subversion.
529
531 The environment variable "MKDEBUG" enables verbose debugging output in
532 all of the Maatkit tools:
533
534 MKDEBUG=1 mk-....
535
537 You need Perl and some core packages that ought to be installed in any
538 reasonably new version of Perl.
539
541 For a list of known bugs see
542 <http://www.maatkit.org/bugs/mk-table-usage>.
543
544 Please use Google Code Issues and Groups to report bugs or request
545 support: <http://code.google.com/p/maatkit/>. You can also join
546 #maatkit on Freenode to discuss Maatkit.
547
548 Please include the complete command-line used to reproduce the problem
549 you are seeing, the version of all MySQL servers involved, the complete
550 output of the tool when run with "--version", and if possible,
551 debugging output produced by running with the "MKDEBUG=1" environment
552 variable.
553
555 This program is copyright 2009-2011 Percona Inc. Feedback and
556 improvements are welcome.
557
558 THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
559 WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
560 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
561
562 This program is free software; you can redistribute it and/or modify it
563 under the terms of the GNU General Public License as published by the
564 Free Software Foundation, version 2; OR the Perl Artistic License. On
565 UNIX and similar systems, you can issue `man perlgpl' or `man
566 perlartistic' to read these licenses.
567
568 You should have received a copy of the GNU General Public License along
569 with this program; if not, write to the Free Software Foundation, Inc.,
570 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
571
573 Daniel Nichter
574
576 This manual page documents Ver 1.0.1 Distrib 7540 $Revision: 7531 $.
577
578
579
580perl v5.34.0 2022-01-20 MK-TABLE-USAGE(1)