1MK-ARCHIVER(1) User Contributed Perl Documentation MK-ARCHIVER(1)
2
3
4
6 mk-archiver - Archive rows from a MySQL table into another table or a
7 file.
8
10 Archive all rows from oltp_server to olap_server and to a file:
11
12 mk-archiver --source h=oltp_server,D=test,t=tbl --dest h=olap_server \
13 --file '/var/log/archive/%Y-%m-%d-%D.%t' \
14 --where "1=1" --limit 1000 --commit-each
15
16 Purge (delete) orphan rows from child table:
17
18 mk-archiver --source h=host,D=db,t=child \
19 --purge --where 'NOT EXISTS(SELECT * FROM parent WHERE col=child.col)'
20
22 The following section is included to inform users about the potential
23 risks, whether known or unknown, of using this tool. The two main
24 categories of risks are those created by the nature of the tool (e.g.
25 read-only tools vs. read-write tools) and those created by bugs.
26
27 mk-achiver is a read-write tool. It deletes data from the source by
28 default, so you should test your archiving jobs with the "--dry-run"
29 option if you're not sure about them. It is designed to have as little
30 impact on production systems as possible, but tuning with "--limit",
31 "--txn-size" and similar options might be a good idea too.
32
33 If you write or use "--plugin" modules, you should ensure they are good
34 quality and well-tested.
35
36 At the time of this release, we know of no bugs that could cause
37 serious harm to users.
38
39 The authoritative source for updated information is always the online
40 issue tracking system. Issues that affect this tool will be marked as
41 such. You can see a list of such issues at the following URL:
42 http://www.maatkit.org/bugs/mk-archiver
43 <http://www.maatkit.org/bugs/mk-archiver>.
44
45 See also "BUGS" for more information on filing bugs and getting help.
46
48 mk-archiver is the tool I use to archive tables as described in
49 http://tinyurl.com/mysql-archiving <http://tinyurl.com/mysql-
50 archiving>. The goal is a low-impact, forward-only job to nibble old
51 data out of the table without impacting OLTP queries much. You can
52 insert the data into another table, which need not be on the same
53 server. You can also write it to a file in a format suitable for LOAD
54 DATA INFILE. Or you can do neither, in which case it's just an
55 incremental DELETE.
56
57 mk-archiver is extensible via a plugin mechanism. You can inject your
58 own code to add advanced archiving logic that could be useful for
59 archiving dependent data, applying complex business rules, or building
60 a data warehouse during the archiving process.
61
62 You need to choose values carefully for some options. The most
63 important are "--limit", "--retries", and "--txn-size".
64
65 The strategy is to find the first row(s), then scan some index forward-
66 only to find more rows efficiently. Each subsequent query should not
67 scan the entire table; it should seek into the index, then scan until
68 it finds more archivable rows. Specifying the index with the 'i' part
69 of the "--source" argument can be crucial for this; use "--dry-run" to
70 examine the generated queries and be sure to EXPLAIN them to see if
71 they are efficient (most of the time you probably want to scan the
72 PRIMARY key, which is the default). Even better, profile mk-archiver
73 with mk-query-profiler and make sure it is not scanning the whole table
74 every query.
75
76 You can disable the seek-then-scan optimizations partially or wholly
77 with "--no-ascend" and "--ascend-first". Sometimes this may be more
78 efficient for multi-column keys. Be aware that mk-archiver is built to
79 start at the beginning of the index it chooses and scan it forward-
80 only. This might result in long table scans if you're trying to nibble
81 from the end of the table by an index other than the one it prefers.
82 See "--source" and read the documentation on the "i" part if this
83 applies to you.
84
86 If you specify "--progress", the output is a header row, plus status
87 output at intervals. Each row in the status output lists the current
88 date and time, how many seconds mk-archiver has been running, and how
89 many rows it has archived.
90
91 If you specify "--statistics", "mk-archiver" outputs timing and other
92 information to help you identify which part of your archiving process
93 takes the most time.
94
96 mk-archiver tries to catch signals and exit gracefully; for example, if
97 you send it SIGTERM (Ctrl-C on UNIX-ish systems), it will catch the
98 signal, print a message about the signal, and exit fairly normally. It
99 will not execute "--analyze" or "--optimize", because these may take a
100 long time to finish. It will run all other code normally, including
101 calling after_finish() on any plugins (see "EXTENDING").
102
103 In other words, a signal, if caught, will break out of the main
104 archiving loop and skip optimize/analyze.
105
107 Specify at least one of "--dest", "--file", or "--purge".
108
109 "--ignore" and "--replace" are mutually exclusive.
110
111 "--txn-size" and "--commit-each" are mutually exclusive.
112
113 "--low-priority-insert" and "--delayed-insert" are mutually exclusive.
114
115 "--share-lock" and "--for-update" are mutually exclusive.
116
117 "--analyze" and "--optimize" are mutually exclusive.
118
119 "--no-ascend" and "--no-delete" are mutually exclusive.
120
121 DSN values in "--dest" default to values from "--source" if COPY is
122 yes.
123
124 --analyze
125 type: string
126
127 Run ANALYZE TABLE afterwards on "--source" and/or "--dest".
128
129 Runs ANALYZE TABLE after finishing. The argument is an arbitrary
130 string. If it contains the letter 's', the source will be
131 analyzed. If it contains 'd', the destination will be analyzed.
132 You can specify either or both. For example, the following will
133 analyze both:
134
135 --analyze=ds
136
137 See http://dev.mysql.com/doc/en/analyze-table.html
138 <http://dev.mysql.com/doc/en/analyze-table.html> for details on
139 ANALYZE TABLE.
140
141 --ascend-first
142 Ascend only first column of index.
143
144 If you do want to use the ascending index optimization (see
145 "--no-ascend"), but do not want to incur the overhead of ascending
146 a large multi-column index, you can use this option to tell mk-
147 archiver to ascend only the leftmost column of the index. This can
148 provide a significant performance boost over not ascending the
149 index at all, while avoiding the cost of ascending the whole index.
150
151 See "EXTENDING" for a discussion of how this interacts with
152 plugins.
153
154 --ask-pass
155 Prompt for a password when connecting to MySQL.
156
157 --buffer
158 Buffer output to "--file" and flush at commit.
159
160 Disables autoflushing to "--file" and flushes "--file" to disk only
161 when a transaction commits. This typically means the file is
162 block-flushed by the operating system, so there may be some
163 implicit flushes to disk between commits as well. The default is
164 to flush "--file" to disk after every row.
165
166 The danger is that a crash might cause lost data.
167
168 The performance increase I have seen from using "--buffer" is
169 around 5 to 15 percent. Your mileage may vary.
170
171 --bulk-delete
172 Delete each chunk with a single statement (implies
173 "--commit-each").
174
175 Delete each chunk of rows in bulk with a single "DELETE" statement.
176 The statement deletes every row between the first and last row of
177 the chunk, inclusive. It implies "--commit-each", since it would
178 be a bad idea to "INSERT" rows one at a time and commit them before
179 the bulk "DELETE".
180
181 The normal method is to delete every row by its primary key. Bulk
182 deletes might be a lot faster. They also might not be faster if
183 you have a complex "WHERE" clause.
184
185 This option completely defers all "DELETE" processing until the
186 chunk of rows is finished. If you have a plugin on the source, its
187 "before_delete" method will not be called. Instead, its
188 "before_bulk_delete" method is called later.
189
190 WARNING: if you have a plugin on the source that sometimes doesn't
191 return true from "is_archivable()", you should use this option only
192 if you understand what it does. If the plugin instructs
193 "mk-archiver" not to archive a row, it will still be deleted by the
194 bulk delete!
195
196 --bulk-insert
197 Insert each chunk with LOAD DATA INFILE (implies "--bulk-delete"
198 "--commit-each").
199
200 Insert each chunk of rows with "LOAD DATA LOCAL INFILE". This may
201 be much faster than inserting a row at a time with "INSERT"
202 statements. It is implemented by creating a temporary file for
203 each chunk of rows, and writing the rows to this file instead of
204 inserting them. When the chunk is finished, it uploads the rows.
205
206 To protect the safety of your data, this option forces bulk deletes
207 to be used. It would be unsafe to delete each row as it is found,
208 before inserting the rows into the destination first. Forcing bulk
209 deletes guarantees that the deletion waits until the insertion is
210 successful.
211
212 The "--low-priority-insert", "--replace", and "--ignore" options
213 work with this option, but "--delayed-insert" does not.
214
215 --charset
216 short form: -A; type: string
217
218 Default character set. If the value is utf8, sets Perl's binmode
219 on STDOUT to utf8, passes the mysql_enable_utf8 option to
220 DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any
221 other value sets binmode on STDOUT without the utf8 layer, and runs
222 SET NAMES after connecting to MySQL.
223
224 --[no]check-columns
225 default: yes
226
227 Ensure "--source" and "--dest" have same columns.
228
229 Enabled by default; causes mk-archiver to check that the source and
230 destination tables have the same columns. It does not check column
231 order, data type, etc. It just checks that all columns in the
232 source exist in the destination and vice versa. If there are any
233 differences, mk-archiver will exit with an error.
234
235 To disable this check, specify --no-check-columns.
236
237 --check-interval
238 type: time; default: 1s
239
240 How often to check for slave lag if "--check-slave-lag" is given.
241
242 --check-slave-lag
243 type: string
244
245 Pause archiving until the specified DSN's slave lag is less than
246 "--max-lag".
247
248 --columns
249 short form: -c; type: array
250
251 Comma-separated list of columns to archive.
252
253 Specify a comma-separated list of columns to fetch, write to the
254 file, and insert into the destination table. If specified, mk-
255 archiver ignores other columns unless it needs to add them to the
256 "SELECT" statement for ascending an index or deleting rows. It
257 fetches and uses these extra columns internally, but does not write
258 them to the file or to the destination table. It does pass them to
259 plugins.
260
261 See also "--primary-key-only".
262
263 --commit-each
264 Commit each set of fetched and archived rows (disables
265 "--txn-size").
266
267 Commits transactions and flushes "--file" after each set of rows
268 has been archived, before fetching the next set of rows, and before
269 sleeping if "--sleep" is specified. Disables "--txn-size"; use
270 "--limit" to control the transaction size with "--commit-each".
271
272 This option is useful as a shortcut to make "--limit" and
273 "--txn-size" the same value, but more importantly it avoids
274 transactions being held open while searching for more rows. For
275 example, imagine you are archiving old rows from the beginning of a
276 very large table, with "--limit" 1000 and "--txn-size" 1000. After
277 some period of finding and archiving 1000 rows at a time, mk-
278 archiver finds the last 999 rows and archives them, then executes
279 the next SELECT to find more rows. This scans the rest of the
280 table, but never finds any more rows. It has held open a
281 transaction for a very long time, only to determine it is finished
282 anyway. You can use "--commit-each" to avoid this.
283
284 --config
285 type: Array
286
287 Read this comma-separated list of config files; if specified, this
288 must be the first option on the command line.
289
290 --delayed-insert
291 Add the DELAYED modifier to INSERT statements.
292
293 Adds the DELAYED modifier to INSERT or REPLACE statements. See
294 <http://dev.mysql.com/doc/en/insert.html> for details.
295
296 --dest
297 type: DSN
298
299 DSN specifying the table to archive to.
300
301 This item specifies a table into which mk-archiver will insert rows
302 archived from "--source". It uses the same key=val argument format
303 as "--source". Most missing values default to the same values as
304 "--source", so you don't have to repeat options that are the same
305 in "--source" and "--dest". Use the "--help" option to see which
306 values are copied from "--source".
307
308 WARNING: Using a default options file (F) DSN option that defines a
309 socket for "--source" causes mk-archiver to connect to "--dest"
310 using that socket unless another socket for "--dest" is specified.
311 This means that mk-archiver may incorrectly connect to "--source"
312 when it connects to "--dest". For example:
313
314 --source F=host1.cnf,D=db,t=tbl --dest h=host2
315
316 When mk-archiver connects to "--dest", host2, it will connect via
317 the "--source", host1, socket defined in host1.cnf.
318
319 --dry-run
320 Print queries and exit without doing anything.
321
322 Causes mk-archiver to exit after printing the filename and SQL
323 statements it will use.
324
325 --file
326 type: string
327
328 File to archive to, with DATE_FORMAT()-like formatting.
329
330 Filename to write archived rows to. A subset of MySQL's
331 DATE_FORMAT() formatting codes are allowed in the filename, as
332 follows:
333
334 %d Day of the month, numeric (01..31)
335 %H Hour (00..23)
336 %i Minutes, numeric (00..59)
337 %m Month, numeric (01..12)
338 %s Seconds (00..59)
339 %Y Year, numeric, four digits
340
341 You can use the following extra format codes too:
342
343 %D Database name
344 %t Table name
345
346 Example:
347
348 --file '/var/log/archive/%Y-%m-%d-%D.%t'
349
350 The file's contents are in the same format used by SELECT INTO
351 OUTFILE, as documented in the MySQL manual: rows terminated by
352 newlines, columns terminated by tabs, NULL characters are
353 represented by \N, and special characters are escaped by \. This
354 lets you reload a file with LOAD DATA INFILE's default settings.
355
356 If you want a column header at the top of the file, see "--header".
357 The file is auto-flushed by default; see "--buffer".
358
359 --for-update
360 Adds the FOR UPDATE modifier to SELECT statements.
361
362 For details, see
363 http://dev.mysql.com/doc/en/innodb-locking-reads.html
364 <http://dev.mysql.com/doc/en/innodb-locking-reads.html>.
365
366 --header
367 Print column header at top of "--file".
368
369 Writes column names as the first line in the file given by
370 "--file". If the file exists, does not write headers; this keeps
371 the file loadable with LOAD DATA INFILE in case you append more
372 output to it.
373
374 --help
375 Show help and exit.
376
377 --high-priority-select
378 Adds the HIGH_PRIORITY modifier to SELECT statements.
379
380 See <http://dev.mysql.com/doc/en/select.html> for details.
381
382 --host
383 short form: -h; type: string
384
385 Connect to host.
386
387 --ignore
388 Use IGNORE for INSERT statements.
389
390 Causes INSERTs into "--dest" to be INSERT IGNORE.
391
392 --limit
393 type: int; default: 1
394
395 Number of rows to fetch and archive per statement.
396
397 Limits the number of rows returned by the SELECT statements that
398 retrieve rows to archive. Default is one row. It may be more
399 efficient to increase the limit, but be careful if you are
400 archiving sparsely, skipping over many rows; this can potentially
401 cause more contention with other queries, depending on the storage
402 engine, transaction isolation level, and options such as
403 "--for-update".
404
405 --local
406 Do not write OPTIMIZE or ANALYZE queries to binlog.
407
408 Adds the NO_WRITE_TO_BINLOG modifier to ANALYZE and OPTIMIZE
409 queries. See "--analyze" for details.
410
411 --low-priority-delete
412 Adds the LOW_PRIORITY modifier to DELETE statements.
413
414 See <http://dev.mysql.com/doc/en/delete.html> for details.
415
416 --low-priority-insert
417 Adds the LOW_PRIORITY modifier to INSERT or REPLACE statements.
418
419 See <http://dev.mysql.com/doc/en/insert.html> for details.
420
421 --max-lag
422 type: time; default: 1s
423
424 Pause archiving if the slave given by "--check-slave-lag" lags.
425
426 This option causes mk-archiver to look at the slave every time it's
427 about to fetch another row. If the slave's lag is greater than the
428 option's value, or if the slave isn't running (so its lag is NULL),
429 mk-table-checksum sleeps for "--check-interval" seconds and then
430 looks at the lag again. It repeats until the slave is caught up,
431 then proceeds to fetch and archive the row.
432
433 This option may eliminate the need for "--sleep" or "--sleep-coef".
434
435 --no-ascend
436 Do not use ascending index optimization.
437
438 The default ascending-index optimization causes "mk-archiver" to
439 optimize repeated "SELECT" queries so they seek into the index
440 where the previous query ended, then scan along it, rather than
441 scanning from the beginning of the table every time. This is
442 enabled by default because it is generally a good strategy for
443 repeated accesses.
444
445 Large, multiple-column indexes may cause the WHERE clause to be
446 complex enough that this could actually be less efficient.
447 Consider for example a four-column PRIMARY KEY on (a, b, c, d).
448 The WHERE clause to start where the last query ended is as follows:
449
450 WHERE (a > ?)
451 OR (a = ? AND b > ?)
452 OR (a = ? AND b = ? AND c > ?)
453 OR (a = ? AND b = ? AND c = ? AND d >= ?)
454
455 Populating the placeholders with values uses memory and CPU, adds
456 network traffic and parsing overhead, and may make the query harder
457 for MySQL to optimize. A four-column key isn't a big deal, but a
458 ten-column key in which every column allows "NULL" might be.
459
460 Ascending the index might not be necessary if you know you are
461 simply removing rows from the beginning of the table in chunks, but
462 not leaving any holes, so starting at the beginning of the table is
463 actually the most efficient thing to do.
464
465 See also "--ascend-first". See "EXTENDING" for a discussion of how
466 this interacts with plugins.
467
468 --no-delete
469 Do not delete archived rows.
470
471 Causes "mk-archiver" not to delete rows after processing them.
472 This disallows "--no-ascend", because enabling them both would
473 cause an infinite loop.
474
475 If there is a plugin on the source DSN, its "before_delete" method
476 is called anyway, even though "mk-archiver" will not execute the
477 delete. See "EXTENDING" for more on plugins.
478
479 --optimize
480 type: string
481
482 Run OPTIMIZE TABLE afterwards on "--source" and/or "--dest".
483
484 Runs OPTIMIZE TABLE after finishing. See "--analyze" for the
485 option syntax and http://dev.mysql.com/doc/en/optimize-table.html
486 <http://dev.mysql.com/doc/en/optimize-table.html> for details on
487 OPTIMIZE TABLE.
488
489 --password
490 short form: -p; type: string
491
492 Password to use when connecting.
493
494 --pid
495 type: string
496
497 Create the given PID file when daemonized. The file contains the
498 process ID of the daemonized instance. The PID file is removed
499 when the daemonized instance exits. The program checks for the
500 existence of the PID file when starting; if it exists and the
501 process with the matching PID exists, the program exits.
502
503 --plugin
504 type: string
505
506 Perl module name to use as a generic plugin.
507
508 Specify the Perl module name of a general-purpose plugin. It is
509 currently used only for statistics (see "--statistics") and must
510 have "new()" and a "statistics()" method.
511
512 The "new( src =" $src, dst => $dst, opts => $o )> method gets the
513 source and destination DSNs, and their database connections, just
514 like the connection-specific plugins do. It also gets an
515 OptionParser object ($o) for accessing command-line options
516 (example: "$o-"get('purge');>).
517
518 The "statistics(\%stats, $time)" method gets a hashref of the
519 statistics collected by the archiving job, and the time the whole
520 job started.
521
522 --port
523 short form: -P; type: int
524
525 Port number to use for connection.
526
527 --primary-key-only
528 Primary key columns only.
529
530 A shortcut for specifying "--columns" with the primary key columns.
531 This is an efficiency if you just want to purge rows; it avoids
532 fetching the entire row, when only the primary key columns are
533 needed for "DELETE" statements. See also "--purge".
534
535 --progress
536 type: int
537
538 Print progress information every X rows.
539
540 Prints current time, elapsed time, and rows archived every X rows.
541
542 --purge
543 Purge instead of archiving; allows omitting "--file" and "--dest".
544
545 Allows archiving without a "--file" or "--dest" argument, which is
546 effectively a purge since the rows are just deleted.
547
548 If you just want to purge rows, consider specifying the table's
549 primary key columns with "--primary-key-only". This will prevent
550 fetching all columns from the server for no reason.
551
552 --quick-delete
553 Adds the QUICK modifier to DELETE statements.
554
555 See <http://dev.mysql.com/doc/en/delete.html> for details. As
556 stated in the documentation, in some cases it may be faster to use
557 DELETE QUICK followed by OPTIMIZE TABLE. You can use "--optimize"
558 for this.
559
560 --quiet
561 Do not print any output, such as for "--statistics".
562
563 Suppresses normal output, including the output of "--statistics",
564 but doesn't suppress the output from "--why-quit".
565
566 --replace
567 Causes INSERTs into "--dest" to be written as REPLACE.
568
569 --retries
570 type: int; default: 1
571
572 Number of retries per timeout or deadlock.
573
574 Specifies the number of times mk-archiver should retry when there
575 is an InnoDB lock wait timeout or deadlock. When retries are
576 exhausted, mk-archiver will exit with an error.
577
578 Consider carefully what you want to happen when you are archiving
579 between a mixture of transactional and non-transactional storage
580 engines. The INSERT to "--dest" and DELETE from "--source" are on
581 separate connections, so they do not actually participate in the
582 same transaction even if they're on the same server. However, mk-
583 archiver implements simple distributed transactions in code, so
584 commits and rollbacks should happen as desired across the two
585 connections.
586
587 At this time I have not written any code to handle errors with
588 transactional storage engines other than InnoDB. Request that
589 feature if you need it.
590
591 --run-time
592 type: time
593
594 Time to run before exiting.
595
596 Optional suffix s=seconds, m=minutes, h=hours, d=days; if no
597 suffix, s is used.
598
599 --[no]safe-auto-increment
600 default: yes
601
602 Do not archive row with max AUTO_INCREMENT.
603
604 Adds an extra WHERE clause to prevent mk-archiver from removing the
605 newest row when ascending a single-column AUTO_INCREMENT key. This
606 guards against re-using AUTO_INCREMENT values if the server
607 restarts, and is enabled by default.
608
609 The extra WHERE clause contains the maximum value of the auto-
610 increment column as of the beginning of the archive or purge job.
611 If new rows are inserted while mk-archiver is running, it will not
612 see them.
613
614 --sentinel
615 type: string; default: /tmp/mk-archiver-sentinel
616
617 Exit if this file exists.
618
619 The presence of the file specified by "--sentinel" will cause mk-
620 archiver to stop archiving and exit. The default is
621 /tmp/mk-archiver-sentinel. You might find this handy to stop cron
622 jobs gracefully if necessary. See also "--stop".
623
624 --set-vars
625 type: string; default: wait_timeout=10000
626
627 Set these MySQL variables.
628
629 Specify any variables you want to be set immediately after
630 connecting to MySQL. These will be included in a "SET" command.
631
632 --share-lock
633 Adds the LOCK IN SHARE MODE modifier to SELECT statements.
634
635 See http://dev.mysql.com/doc/en/innodb-locking-reads.html
636 <http://dev.mysql.com/doc/en/innodb-locking-reads.html>.
637
638 --skip-foreign-key-checks
639 Disables foreign key checks with SET FOREIGN_KEY_CHECKS=0.
640
641 --sleep
642 type: int
643
644 Sleep time between fetches.
645
646 Specifies how long to sleep between SELECT statements. Default is
647 not to sleep at all. Transactions are NOT committed, and the
648 "--file" file is NOT flushed, before sleeping. See "--txn-size" to
649 control that.
650
651 If "--commit-each" is specified, committing and flushing happens
652 before sleeping.
653
654 --sleep-coef
655 type: float
656
657 Calculate "--sleep" as a multiple of the last SELECT time.
658
659 If this option is specified, mk-archiver will sleep for the query
660 time of the last SELECT multiplied by the specified coefficient.
661 This option is ignored if "--sleep" is specified.
662
663 This is a slightly more sophisticated way to throttle the SELECTs:
664 sleep a varying amount of time between each SELECT, depending on
665 how long the SELECTs are taking.
666
667 --socket
668 short form: -S; type: string
669
670 Socket file to use for connection.
671
672 --source
673 type: DSN
674
675 DSN specifying the table to archive from (required).
676
677 This argument is specially formatted as a key=value,key=value
678 string. Keys are a single letter. Most options control how mk-
679 archiver connects to MySQL:
680
681 KEY MEANING
682 === =======
683 h Connect to host
684 P Port number to use for connection
685 S Socket file to use for connection
686 u User for login if not current user
687 p Password to use when connecting
688 F Only read default options from the given file
689
690 The following options select a table to archive:
691
692 KEY MEANING
693 === =======
694 D Database to archive
695 t Table to archive
696 i Index to use
697
698 The following options specify pluggable actions, which an external
699 Perl module can provide:
700
701 KEY MEANING
702 === =======
703 m Package name of an external Perl module (see EXTENDING).
704
705 The following actions set other options:
706
707 KEY MEANING
708 === =======
709 a Database to set as the connection's default with USE
710 b Disable binary logging with SET SQL_LOG_BIN=0
711
712 The only required part is the table; other parts may be read from
713 various places in the environment (such as options files). Here is
714 an example:
715
716 --source h=my_server,D=my_database,t=my_tbl
717
718 The 'i' part deserves special mention. This tells mk-archiver
719 which index it should scan to archive. This appears in a FORCE
720 INDEX or USE INDEX hint in the SELECT statements used to fetch
721 archivable rows. If you don't specify anything, mk-archiver will
722 auto-discover a good index, preferring a "PRIMARY KEY" if one
723 exists. In my experience this usually works well, so most of the
724 time you can probably just omit the 'i' part.
725
726 The index is used to optimize repeated accesses to the table; mk-
727 archiver remembers the last row it retrieves from each SELECT
728 statement, and uses it to construct a WHERE clause, using the
729 columns in the specified index, that should allow MySQL to start
730 the next SELECT where the last one ended, rather than potentially
731 scanning from the beginning of the table with each successive
732 SELECT. If you are using external plugins, please see "EXTENDING"
733 for a discussion of how they interact with ascending indexes.
734
735 The 'a' and 'b' options allow you to control how statements flow
736 through the binary log. If you specify the 'b' option, binary
737 logging will be disabled on the specified connection. If you
738 specify the 'a' option, the connection will "USE" the specified
739 database, which you can use to prevent slaves from executing the
740 binary log events with "--replicate-ignore-db" options. These two
741 options can be used as different methods to achieve the same goal:
742 archive data off the master, but leave it on the slave. For
743 example, you can run a purge job on the master and prevent it from
744 happening on the slave using your method of choice.
745
746 WARNING: Using a default options file (F) DSN option that defines a
747 socket for "--source" causes mk-archiver to connect to "--dest"
748 using that socket unless another socket for "--dest" is specified.
749 This means that mk-archiver may incorrectly connect to "--source"
750 when it connects to "--dest". For example:
751
752 --source F=host1.cnf,D=db,t=tbl --dest h=host2
753
754 When mk-archiver connects to "--dest", host2, it will connect via
755 the "--source", host1, socket defined in host1.cnf.
756
757 --statistics
758 Collect and print timing statistics.
759
760 Causes mk-archiver to collect timing statistics about what it does.
761 These statistics are available to the plugin specified by
762 "--plugin"
763
764 Unless you specify "--quiet", "mk-archiver" prints the statistics
765 when it exits. The statistics look like this:
766
767 Started at 2008-07-18T07:18:53, ended at 2008-07-18T07:18:53
768 Source: D=db,t=table
769 SELECT 4
770 INSERT 4
771 DELETE 4
772 Action Count Time Pct
773 commit 10 0.1079 88.27
774 select 5 0.0047 3.87
775 deleting 4 0.0028 2.29
776 inserting 4 0.0028 2.28
777 other 0 0.0040 3.29
778
779 The first two (or three) lines show times and the source and
780 destination tables. The next three lines show how many rows were
781 fetched, inserted, and deleted.
782
783 The remaining lines show counts and timing. The columns are the
784 action, the total number of times that action was timed, the total
785 time it took, and the percent of the program's total runtime. The
786 rows are sorted in order of descending total time. The last row is
787 the rest of the time not explicitly attributed to anything.
788 Actions will vary depending on command-line options.
789
790 If "--why-quit" is given, its behavior is changed slightly. This
791 option causes it to print the reason for exiting even when it's
792 just because there are no more rows.
793
794 This option requires the standard Time::HiRes module, which is part
795 of core Perl on reasonably new Perl releases.
796
797 --stop
798 Stop running instances by creating the sentinel file.
799
800 Causes mk-archiver to create the sentinel file specified by
801 "--sentinel" and exit. This should have the effect of stopping all
802 running instances which are watching the same sentinel file.
803
804 --txn-size
805 type: int; default: 1
806
807 Number of rows per transaction.
808
809 Specifies the size, in number of rows, of each transaction. Zero
810 disables transactions altogether. After mk-archiver processes this
811 many rows, it commits both the "--source" and the "--dest" if
812 given, and flushes the file given by "--file".
813
814 This parameter is critical to performance. If you are archiving
815 from a live server, which for example is doing heavy OLTP work, you
816 need to choose a good balance between transaction size and commit
817 overhead. Larger transactions create the possibility of more lock
818 contention and deadlocks, but smaller transactions cause more
819 frequent commit overhead, which can be significant. To give an
820 idea, on a small test set I worked with while writing mk-archiver,
821 a value of 500 caused archiving to take about 2 seconds per 1000
822 rows on an otherwise quiet MySQL instance on my desktop machine,
823 archiving to disk and to another table. Disabling transactions
824 with a value of zero, which turns on autocommit, dropped
825 performance to 38 seconds per thousand rows.
826
827 If you are not archiving from or to a transactional storage engine,
828 you may want to disable transactions so mk-archiver doesn't try to
829 commit.
830
831 --user
832 short form: -u; type: string
833
834 User for login if not current user.
835
836 --version
837 Show version and exit.
838
839 --where
840 type: string
841
842 WHERE clause to limit which rows to archive (required).
843
844 Specifies a WHERE clause to limit which rows are archived. Do not
845 include the word WHERE. You may need to quote the argument to
846 prevent your shell from interpreting it. For example:
847
848 --where 'ts < current_date - interval 90 day'
849
850 For safety, "--where" is required. If you do not require a WHERE
851 clause, use "--where" 1=1.
852
853 --why-quit
854 Print reason for exiting unless rows exhausted.
855
856 Causes mk-archiver to print a message if it exits for any reason
857 other than running out of rows to archive. This can be useful if
858 you have a cron job with "--run-time" specified, for example, and
859 you want to be sure mk-archiver is finishing before running out of
860 time.
861
862 If "--statistics" is given, the behavior is changed slightly. It
863 will print the reason for exiting even when it's just because there
864 are no more rows.
865
866 This output prints even if "--quiet" is given. That's so you can
867 put "mk-archiver" in a "cron" job and get an email if there's an
868 abnormal exit.
869
871 These DSN options are used to create a DSN. Each option is given like
872 "option=value". The options are case-sensitive, so P and p are not the
873 same option. There cannot be whitespace before or after the "=" and if
874 the value contains whitespace it must be quoted. DSN options are
875 comma-separated. See the maatkit manpage for full details.
876
877 · a
878
879 copy: no
880
881 Database to USE when executing queries.
882
883 · A
884
885 dsn: charset; copy: yes
886
887 Default character set.
888
889 · b
890
891 copy: no
892
893 Disable binlog with SET SQL_LOG_BIN=0.
894
895 · D
896
897 dsn: database; copy: yes
898
899 Database that contains the table.
900
901 · F
902
903 dsn: mysql_read_default_file; copy: yes
904
905 Only read default options from the given file
906
907 · h
908
909 dsn: host; copy: yes
910
911 Connect to host.
912
913 · i
914
915 copy: yes
916
917 Index to use.
918
919 · m
920
921 copy: no
922
923 Plugin module name.
924
925 · p
926
927 dsn: password; copy: yes
928
929 Password to use when connecting.
930
931 · P
932
933 dsn: port; copy: yes
934
935 Port number to use for connection.
936
937 · S
938
939 dsn: mysql_socket; copy: yes
940
941 Socket file to use for connection.
942
943 · t
944
945 copy: yes
946
947 Table to archive from/to.
948
949 · u
950
951 dsn: user; copy: yes
952
953 User for login if not current user.
954
956 mk-archiver is extensible by plugging in external Perl modules to
957 handle some logic and/or actions. You can specify a module for both
958 the "--source" and the "--dest", with the 'm' part of the
959 specification. For example:
960
961 --source D=test,t=test1,m=My::Module1 --dest m=My::Module2,t=test2
962
963 This will cause mk-archiver to load the My::Module1 and My::Module2
964 packages, create instances of them, and then make calls to them during
965 the archiving process.
966
967 You can also specify a plugin with "--plugin".
968
969 The module must provide this interface:
970
971 new(dbh => $dbh, db => $db_name, tbl => $tbl_name)
972 The plugin's constructor is passed a reference to the database
973 handle, the database name, and table name. The plugin is created
974 just after mk-archiver opens the connection, and before it examines
975 the table given in the arguments. This gives the plugin a chance
976 to create and populate temporary tables, or do other setup work.
977
978 before_begin(cols => \@cols, allcols => \@allcols)
979 This method is called just before mk-archiver begins iterating
980 through rows and archiving them, but after it does all other setup
981 work (examining table structures, designing SQL queries, and so
982 on). This is the only time mk-archiver tells the plugin column
983 names for the rows it will pass the plugin while archiving.
984
985 The "cols" argument is the column names the user requested to be
986 archived, either by default or by the "--columns" option. The
987 "allcols" argument is the list of column names for every row mk-
988 archiver will fetch from the source table. It may fetch more
989 columns than the user requested, because it needs some columns for
990 its own use. When subsequent plugin functions receive a row, it is
991 the full row containing all the extra columns, if any, added to the
992 end.
993
994 is_archivable(row => \@row)
995 This method is called for each row to determine whether it is
996 archivable. This applies only to "--source". The argument is the
997 row itself, as an arrayref. If the method returns true, the row
998 will be archived; otherwise it will be skipped.
999
1000 Skipping a row adds complications for non-unique indexes. Normally
1001 mk-archiver uses a WHERE clause designed to target the last
1002 processed row as the place to start the scan for the next SELECT
1003 statement. If you have skipped the row by returning false from
1004 is_archivable(), mk-archiver could get into an infinite loop
1005 because the row still exists. Therefore, when you specify a plugin
1006 for the "--source" argument, mk-archiver will change its WHERE
1007 clause slightly. Instead of starting at "greater than or equal to"
1008 the last processed row, it will start "strictly greater than."
1009 This will work fine on unique indexes such as primary keys, but it
1010 may skip rows (leave holes) on non-unique indexes or when ascending
1011 only the first column of an index.
1012
1013 "mk-archiver" will change the clause in the same way if you specify
1014 "--no-delete", because again an infinite loop is possible.
1015
1016 If you specify the "--bulk-delete" option and return false from
1017 this method, "mk-archiver" may not do what you want. The row won't
1018 be archived, but it will be deleted, since bulk deletes operate on
1019 ranges of rows and don't know which rows the plugin selected to
1020 keep.
1021
1022 If you specify the "--bulk-insert" option, this method's return
1023 value will influence whether the row is written to the temporary
1024 file for the bulk insert, so bulk inserts will work as expected.
1025 However, bulk inserts require bulk deletes.
1026
1027 before_delete(row => \@row)
1028 This method is called for each row just before it is deleted. This
1029 applies only to "--source". This is a good place for you to handle
1030 dependencies, such as deleting things that are foreign-keyed to the
1031 row you are about to delete. You could also use this to
1032 recursively archive all dependent tables.
1033
1034 This plugin method is called even if "--no-delete" is given, but
1035 not if "--bulk-delete" is given.
1036
1037 before_bulk_delete(first_row => \@row, last_row => \@row)
1038 This method is called just before a bulk delete is executed. It is
1039 similar to the "before_delete" method, except its arguments are the
1040 first and last row of the range to be deleted. It is called even
1041 if "--no-delete" is given.
1042
1043 before_insert(row => \@row)
1044 This method is called for each row just before it is inserted.
1045 This applies only to "--dest". You could use this to insert the
1046 row into multiple tables, perhaps with an ON DUPLICATE KEY UPDATE
1047 clause to build summary tables in a data warehouse.
1048
1049 This method is not called if "--bulk-insert" is given.
1050
1051 before_bulk_insert(first_row => \@row, last_row => \@row)
1052 This method is called just before a bulk insert is executed. It is
1053 similar to the "before_insert" method, except its arguments are the
1054 first and last row of the range to be deleted.
1055
1056 custom_sth(row => \@row, sql => $sql)
1057 This method is called just before inserting the row, but after
1058 "before_insert()". It allows the plugin to specify different
1059 "INSERT" statement if desired. The return value (if any) should be
1060 a DBI statement handle. The "sql" parameter is the SQL text used
1061 to prepare the default "INSERT" statement. This method is not
1062 called if you specify "--bulk-insert".
1063
1064 If no value is returned, the default "INSERT" statement handle is
1065 used.
1066
1067 This method applies only to the plugin specified for "--dest", so
1068 if your plugin isn't doing what you expect, check that you've
1069 specified it for the destination and not the source.
1070
1071 custom_sth_bulk(first_row => \@row, last_row => \@row, sql => $sql)
1072 If you've specified "--bulk-insert", this method is called just
1073 before the bulk insert, but after "before_bulk_insert()", and the
1074 arguments are different.
1075
1076 This method's return value etc is similar to the "custom_sth()"
1077 method.
1078
1079 after_finish()
1080 This method is called after mk-archiver exits the archiving loop,
1081 commits all database handles, closes "--file", and prints the final
1082 statistics, but before mk-archiver runs ANALYZE or OPTIMIZE (see
1083 "--analyze" and "--optimize").
1084
1085 If you specify a plugin for both "--source" and "--dest", mk-archiver
1086 constructs, calls before_begin(), and calls after_finish() on the two
1087 plugins in the order "--source", "--dest".
1088
1089 mk-archiver assumes it controls transactions, and that the plugin will
1090 NOT commit or roll back the database handle. The database handle
1091 passed to the plugin's constructor is the same handle mk-archiver uses
1092 itself. Remember that "--source" and "--dest" are separate handles.
1093
1094 A sample module might look like this:
1095
1096 package My::Module;
1097
1098 sub new {
1099 my ( $class, %args ) = @_;
1100 return bless(\%args, $class);
1101 }
1102
1103 sub before_begin {
1104 my ( $self, %args ) = @_;
1105 # Save column names for later
1106 $self->{cols} = $args{cols};
1107 }
1108
1109 sub is_archivable {
1110 my ( $self, %args ) = @_;
1111 # Do some advanced logic with $args{row}
1112 return 1;
1113 }
1114
1115 sub before_delete {} # Take no action
1116 sub before_insert {} # Take no action
1117 sub custom_sth {} # Take no action
1118 sub after_finish {} # Take no action
1119
1120 1;
1121
1123 You can download Maatkit from Google Code at
1124 <http://code.google.com/p/maatkit/>, or you can get any of the tools
1125 easily with a command like the following:
1126
1127 wget http://www.maatkit.org/get/toolname
1128 or
1129 wget http://www.maatkit.org/trunk/toolname
1130
1131 Where "toolname" can be replaced with the name (or fragment of a name)
1132 of any of the Maatkit tools. Once downloaded, they're ready to run; no
1133 installation is needed. The first URL gets the latest released version
1134 of the tool, and the second gets the latest trunk code from Subversion.
1135
1137 The environment variable "MKDEBUG" enables verbose debugging output in
1138 all of the Maatkit tools:
1139
1140 MKDEBUG=1 mk-....
1141
1143 You need Perl, DBI, DBD::mysql, and some core packages that ought to be
1144 installed in any reasonably new version of Perl.
1145
1147 For a list of known bugs see http://www.maatkit.org/bugs/mk-archiver
1148 <http://www.maatkit.org/bugs/mk-archiver>.
1149
1150 Please use Google Code Issues and Groups to report bugs or request
1151 support: <http://code.google.com/p/maatkit/>. You can also join
1152 #maatkit on Freenode to discuss Maatkit.
1153
1154 Please include the complete command-line used to reproduce the problem
1155 you are seeing, the version of all MySQL servers involved, the complete
1156 output of the tool when run with "--version", and if possible,
1157 debugging output produced by running with the "MKDEBUG=1" environment
1158 variable.
1159
1161 Thanks to the following people, and apologies to anyone I've omitted:
1162
1163 Andrew O'Brien,
1164
1166 This program is copyright 2007-2010 Baron Schwartz. Feedback and
1167 improvements are welcome.
1168
1169 THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
1170 WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
1171 MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1172
1173 This program is free software; you can redistribute it and/or modify it
1174 under the terms of the GNU General Public License as published by the
1175 Free Software Foundation, version 2; OR the Perl Artistic License. On
1176 UNIX and similar systems, you can issue `man perlgpl' or `man
1177 perlartistic' to read these licenses.
1178
1179 You should have received a copy of the GNU General Public License along
1180 with this program; if not, write to the Free Software Foundation, Inc.,
1181 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
1182
1184 Baron Schwartz
1185
1187 This tool is part of Maatkit, a toolkit for power users of MySQL.
1188 Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
1189 primary code contributors. Both are employed by Percona. Financial
1190 support for Maatkit development is primarily provided by Percona and
1191 its clients.
1192
1194 This manual page documents Ver 1.0.23 Distrib 6839 $Revision: 6831 $.
1195
1196
1197
1198perl v5.12.1 2010-08-01 MK-ARCHIVER(1)