1MK-ARCHIVER(1)        User Contributed Perl Documentation       MK-ARCHIVER(1)
2
3
4

NAME

6       mk-archiver - Archive rows from a MySQL table into another table or a
7       file.
8

SYNOPSIS

10       Archive all rows from oltp_server to olap_server and to a file:
11
12         mk-archiver --source h=oltp_server,D=test,t=tbl --dest h=olap_server \
13           --file '/var/log/archive/%Y-%m-%d-%D.%t'                           \
14           --where "1=1" --limit 1000 --commit-each
15
16       Purge (delete) orphan rows from child table:
17
18         mk-archiver --source h=host,D=db,t=child \
19           --purge --where 'NOT EXISTS(SELECT * FROM parent WHERE col=child.col)'
20

RISKS

22       The following section is included to inform users about the potential
23       risks, whether known or unknown, of using this tool.  The two main
24       categories of risks are those created by the nature of the tool (e.g.
25       read-only tools vs. read-write tools) and those created by bugs.
26
27       mk-achiver is a read-write tool.  It deletes data from the source by
28       default, so you should test your archiving jobs with the "--dry-run"
29       option if you're not sure about them.  It is designed to have as little
30       impact on production systems as possible, but tuning with "--limit",
31       "--txn-size" and similar options might be a good idea too.
32
33       If you write or use "--plugin" modules, you should ensure they are good
34       quality and well-tested.
35
36       At the time of this release, we know of no bugs that could cause
37       serious harm to users.
38
39       The authoritative source for updated information is always the online
40       issue tracking system.  Issues that affect this tool will be marked as
41       such.  You can see a list of such issues at the following URL:
42       http://www.maatkit.org/bugs/mk-archiver
43       <http://www.maatkit.org/bugs/mk-archiver>.
44
45       See also "BUGS" for more information on filing bugs and getting help.
46

DESCRIPTION

48       mk-archiver is the tool I use to archive tables as described in
49       http://tinyurl.com/mysql-archiving <http://tinyurl.com/mysql-
50       archiving>.  The goal is a low-impact, forward-only job to nibble old
51       data out of the table without impacting OLTP queries much.  You can
52       insert the data into another table, which need not be on the same
53       server.  You can also write it to a file in a format suitable for LOAD
54       DATA INFILE.  Or you can do neither, in which case it's just an
55       incremental DELETE.
56
57       mk-archiver is extensible via a plugin mechanism.  You can inject your
58       own code to add advanced archiving logic that could be useful for
59       archiving dependent data, applying complex business rules, or building
60       a data warehouse during the archiving process.
61
62       You need to choose values carefully for some options.  The most
63       important are "--limit", "--retries", and "--txn-size".
64
65       The strategy is to find the first row(s), then scan some index forward-
66       only to find more rows efficiently.  Each subsequent query should not
67       scan the entire table; it should seek into the index, then scan until
68       it finds more archivable rows.  Specifying the index with the 'i' part
69       of the "--source" argument can be crucial for this; use "--dry-run" to
70       examine the generated queries and be sure to EXPLAIN them to see if
71       they are efficient (most of the time you probably want to scan the
72       PRIMARY key, which is the default).  Even better, profile mk-archiver
73       with mk-query-profiler and make sure it is not scanning the whole table
74       every query.
75
76       You can disable the seek-then-scan optimizations partially or wholly
77       with "--no-ascend" and "--ascend-first".  Sometimes this may be more
78       efficient for multi-column keys.  Be aware that mk-archiver is built to
79       start at the beginning of the index it chooses and scan it forward-
80       only.  This might result in long table scans if you're trying to nibble
81       from the end of the table by an index other than the one it prefers.
82       See "--source" and read the documentation on the "i" part if this
83       applies to you.
84

OUTPUT

86       If you specify "--progress", the output is a header row, plus status
87       output at intervals.  Each row in the status output lists the current
88       date and time, how many seconds mk-archiver has been running, and how
89       many rows it has archived.
90
91       If you specify "--statistics", "mk-archiver" outputs timing and other
92       information to help you identify which part of your archiving process
93       takes the most time.
94

ERROR-HANDLING

96       mk-archiver tries to catch signals and exit gracefully; for example, if
97       you send it SIGTERM (Ctrl-C on UNIX-ish systems), it will catch the
98       signal, print a message about the signal, and exit fairly normally.  It
99       will not execute "--analyze" or "--optimize", because these may take a
100       long time to finish.  It will run all other code normally, including
101       calling after_finish() on any plugins (see "EXTENDING").
102
103       In other words, a signal, if caught, will break out of the main
104       archiving loop and skip optimize/analyze.
105

OPTIONS

107       Specify at least one of "--dest", "--file", or "--purge".
108
109       "--ignore" and "--replace" are mutually exclusive.
110
111       "--txn-size" and "--commit-each" are mutually exclusive.
112
113       "--low-priority-insert" and "--delayed-insert" are mutually exclusive.
114
115       "--share-lock" and "--for-update" are mutually exclusive.
116
117       "--analyze" and "--optimize" are mutually exclusive.
118
119       "--no-ascend" and "--no-delete" are mutually exclusive.
120
121       DSN values in "--dest" default to values from "--source" if COPY is
122       yes.
123
124       --analyze
125           type: string
126
127           Run ANALYZE TABLE afterwards on "--source" and/or "--dest".
128
129           Runs ANALYZE TABLE after finishing.  The argument is an arbitrary
130           string.  If it contains the letter 's', the source will be
131           analyzed.  If it contains 'd', the destination will be analyzed.
132           You can specify either or both.  For example, the following will
133           analyze both:
134
135             --analyze=ds
136
137           See http://dev.mysql.com/doc/en/analyze-table.html
138           <http://dev.mysql.com/doc/en/analyze-table.html> for details on
139           ANALYZE TABLE.
140
141       --ascend-first
142           Ascend only first column of index.
143
144           If you do want to use the ascending index optimization (see
145           "--no-ascend"), but do not want to incur the overhead of ascending
146           a large multi-column index, you can use this option to tell mk-
147           archiver to ascend only the leftmost column of the index.  This can
148           provide a significant performance boost over not ascending the
149           index at all, while avoiding the cost of ascending the whole index.
150
151           See "EXTENDING" for a discussion of how this interacts with
152           plugins.
153
154       --ask-pass
155           Prompt for a password when connecting to MySQL.
156
157       --buffer
158           Buffer output to "--file" and flush at commit.
159
160           Disables autoflushing to "--file" and flushes "--file" to disk only
161           when a transaction commits.  This typically means the file is
162           block-flushed by the operating system, so there may be some
163           implicit flushes to disk between commits as well.  The default is
164           to flush "--file" to disk after every row.
165
166           The danger is that a crash might cause lost data.
167
168           The performance increase I have seen from using "--buffer" is
169           around 5 to 15 percent.  Your mileage may vary.
170
171       --bulk-delete
172           Delete each chunk with a single statement (implies
173           "--commit-each").
174
175           Delete each chunk of rows in bulk with a single "DELETE" statement.
176           The statement deletes every row between the first and last row of
177           the chunk, inclusive.  It implies "--commit-each", since it would
178           be a bad idea to "INSERT" rows one at a time and commit them before
179           the bulk "DELETE".
180
181           The normal method is to delete every row by its primary key.  Bulk
182           deletes might be a lot faster.  They also might not be faster if
183           you have a complex "WHERE" clause.
184
185           This option completely defers all "DELETE" processing until the
186           chunk of rows is finished.  If you have a plugin on the source, its
187           "before_delete" method will not be called.  Instead, its
188           "before_bulk_delete" method is called later.
189
190           WARNING: if you have a plugin on the source that sometimes doesn't
191           return true from "is_archivable()", you should use this option only
192           if you understand what it does.  If the plugin instructs
193           "mk-archiver" not to archive a row, it will still be deleted by the
194           bulk delete!
195
196       --bulk-insert
197           Insert each chunk with LOAD DATA INFILE (implies "--bulk-delete"
198           "--commit-each").
199
200           Insert each chunk of rows with "LOAD DATA LOCAL INFILE".  This may
201           be much faster than inserting a row at a time with "INSERT"
202           statements.  It is implemented by creating a temporary file for
203           each chunk of rows, and writing the rows to this file instead of
204           inserting them.  When the chunk is finished, it uploads the rows.
205
206           To protect the safety of your data, this option forces bulk deletes
207           to be used.  It would be unsafe to delete each row as it is found,
208           before inserting the rows into the destination first.  Forcing bulk
209           deletes guarantees that the deletion waits until the insertion is
210           successful.
211
212           The "--low-priority-insert", "--replace", and "--ignore" options
213           work with this option, but "--delayed-insert" does not.
214
215       --charset
216           short form: -A; type: string
217
218           Default character set.  If the value is utf8, sets Perl's binmode
219           on STDOUT to utf8, passes the mysql_enable_utf8 option to
220           DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL.  Any
221           other value sets binmode on STDOUT without the utf8 layer, and runs
222           SET NAMES after connecting to MySQL.
223
224       --[no]check-columns
225           default: yes
226
227           Ensure "--source" and "--dest" have same columns.
228
229           Enabled by default; causes mk-archiver to check that the source and
230           destination tables have the same columns.  It does not check column
231           order, data type, etc.  It just checks that all columns in the
232           source exist in the destination and vice versa.  If there are any
233           differences, mk-archiver will exit with an error.
234
235           To disable this check, specify --no-check-columns.
236
237       --check-interval
238           type: time; default: 1s
239
240           How often to check for slave lag if "--check-slave-lag" is given.
241
242       --check-slave-lag
243           type: string
244
245           Pause archiving until the specified DSN's slave lag is less than
246           "--max-lag".
247
248       --columns
249           short form: -c; type: array
250
251           Comma-separated list of columns to archive.
252
253           Specify a comma-separated list of columns to fetch, write to the
254           file, and insert into the destination table.  If specified, mk-
255           archiver ignores other columns unless it needs to add them to the
256           "SELECT" statement for ascending an index or deleting rows.  It
257           fetches and uses these extra columns internally, but does not write
258           them to the file or to the destination table.  It does pass them to
259           plugins.
260
261           See also "--primary-key-only".
262
263       --commit-each
264           Commit each set of fetched and archived rows (disables
265           "--txn-size").
266
267           Commits transactions and flushes "--file" after each set of rows
268           has been archived, before fetching the next set of rows, and before
269           sleeping if "--sleep" is specified.  Disables "--txn-size"; use
270           "--limit" to control the transaction size with "--commit-each".
271
272           This option is useful as a shortcut to make "--limit" and
273           "--txn-size" the same value, but more importantly it avoids
274           transactions being held open while searching for more rows.  For
275           example, imagine you are archiving old rows from the beginning of a
276           very large table, with "--limit" 1000 and "--txn-size" 1000.  After
277           some period of finding and archiving 1000 rows at a time, mk-
278           archiver finds the last 999 rows and archives them, then executes
279           the next SELECT to find more rows.  This scans the rest of the
280           table, but never finds any more rows.  It has held open a
281           transaction for a very long time, only to determine it is finished
282           anyway.  You can use "--commit-each" to avoid this.
283
284       --config
285           type: Array
286
287           Read this comma-separated list of config files; if specified, this
288           must be the first option on the command line.
289
290       --delayed-insert
291           Add the DELAYED modifier to INSERT statements.
292
293           Adds the DELAYED modifier to INSERT or REPLACE statements.  See
294           <http://dev.mysql.com/doc/en/insert.html> for details.
295
296       --dest
297           type: DSN
298
299           DSN specifying the table to archive to.
300
301           This item specifies a table into which mk-archiver will insert rows
302           archived from "--source".  It uses the same key=val argument format
303           as "--source".  Most missing values default to the same values as
304           "--source", so you don't have to repeat options that are the same
305           in "--source" and "--dest".  Use the "--help" option to see which
306           values are copied from "--source".
307
308           WARNING: Using a default options file (F) DSN option that defines a
309           socket for "--source" causes mk-archiver to connect to "--dest"
310           using that socket unless another socket for "--dest" is specified.
311           This means that mk-archiver may incorrectly connect to "--source"
312           when it connects to "--dest".  For example:
313
314             --source F=host1.cnf,D=db,t=tbl --dest h=host2
315
316           When mk-archiver connects to "--dest", host2, it will connect via
317           the "--source", host1, socket defined in host1.cnf.
318
319       --dry-run
320           Print queries and exit without doing anything.
321
322           Causes mk-archiver to exit after printing the filename and SQL
323           statements it will use.
324
325       --file
326           type: string
327
328           File to archive to, with DATE_FORMAT()-like formatting.
329
330           Filename to write archived rows to.  A subset of MySQL's
331           DATE_FORMAT() formatting codes are allowed in the filename, as
332           follows:
333
334              %d    Day of the month, numeric (01..31)
335              %H    Hour (00..23)
336              %i    Minutes, numeric (00..59)
337              %m    Month, numeric (01..12)
338              %s    Seconds (00..59)
339              %Y    Year, numeric, four digits
340
341           You can use the following extra format codes too:
342
343              %D    Database name
344              %t    Table name
345
346           Example:
347
348              --file '/var/log/archive/%Y-%m-%d-%D.%t'
349
350           The file's contents are in the same format used by SELECT INTO
351           OUTFILE, as documented in the MySQL manual: rows terminated by
352           newlines, columns terminated by tabs, NULL characters are
353           represented by \N, and special characters are escaped by \.  This
354           lets you reload a file with LOAD DATA INFILE's default settings.
355
356           If you want a column header at the top of the file, see "--header".
357           The file is auto-flushed by default; see "--buffer".
358
359       --for-update
360           Adds the FOR UPDATE modifier to SELECT statements.
361
362           For details, see
363           http://dev.mysql.com/doc/en/innodb-locking-reads.html
364           <http://dev.mysql.com/doc/en/innodb-locking-reads.html>.
365
366       --header
367           Print column header at top of "--file".
368
369           Writes column names as the first line in the file given by
370           "--file".  If the file exists, does not write headers; this keeps
371           the file loadable with LOAD DATA INFILE in case you append more
372           output to it.
373
374       --help
375           Show help and exit.
376
377       --high-priority-select
378           Adds the HIGH_PRIORITY modifier to SELECT statements.
379
380           See <http://dev.mysql.com/doc/en/select.html> for details.
381
382       --host
383           short form: -h; type: string
384
385           Connect to host.
386
387       --ignore
388           Use IGNORE for INSERT statements.
389
390           Causes INSERTs into "--dest" to be INSERT IGNORE.
391
392       --limit
393           type: int; default: 1
394
395           Number of rows to fetch and archive per statement.
396
397           Limits the number of rows returned by the SELECT statements that
398           retrieve rows to archive.  Default is one row.  It may be more
399           efficient to increase the limit, but be careful if you are
400           archiving sparsely, skipping over many rows; this can potentially
401           cause more contention with other queries, depending on the storage
402           engine, transaction isolation level, and options such as
403           "--for-update".
404
405       --local
406           Do not write OPTIMIZE or ANALYZE queries to binlog.
407
408           Adds the NO_WRITE_TO_BINLOG modifier to ANALYZE and OPTIMIZE
409           queries.  See "--analyze" for details.
410
411       --low-priority-delete
412           Adds the LOW_PRIORITY modifier to DELETE statements.
413
414           See <http://dev.mysql.com/doc/en/delete.html> for details.
415
416       --low-priority-insert
417           Adds the LOW_PRIORITY modifier to INSERT or REPLACE statements.
418
419           See <http://dev.mysql.com/doc/en/insert.html> for details.
420
421       --max-lag
422           type: time; default: 1s
423
424           Pause archiving if the slave given by "--check-slave-lag" lags.
425
426           This option causes mk-archiver to look at the slave every time it's
427           about to fetch another row.  If the slave's lag is greater than the
428           option's value, or if the slave isn't running (so its lag is NULL),
429           mk-table-checksum sleeps for "--check-interval" seconds and then
430           looks at the lag again.  It repeats until the slave is caught up,
431           then proceeds to fetch and archive the row.
432
433           This option may eliminate the need for "--sleep" or "--sleep-coef".
434
435       --no-ascend
436           Do not use ascending index optimization.
437
438           The default ascending-index optimization causes "mk-archiver" to
439           optimize repeated "SELECT" queries so they seek into the index
440           where the previous query ended, then scan along it, rather than
441           scanning from the beginning of the table every time.  This is
442           enabled by default because it is generally a good strategy for
443           repeated accesses.
444
445           Large, multiple-column indexes may cause the WHERE clause to be
446           complex enough that this could actually be less efficient.
447           Consider for example a four-column PRIMARY KEY on (a, b, c, d).
448           The WHERE clause to start where the last query ended is as follows:
449
450              WHERE (a > ?)
451                 OR (a = ? AND b > ?)
452                 OR (a = ? AND b = ? AND c > ?)
453                 OR (a = ? AND b = ? AND c = ? AND d >= ?)
454
455           Populating the placeholders with values uses memory and CPU, adds
456           network traffic and parsing overhead, and may make the query harder
457           for MySQL to optimize.  A four-column key isn't a big deal, but a
458           ten-column key in which every column allows "NULL" might be.
459
460           Ascending the index might not be necessary if you know you are
461           simply removing rows from the beginning of the table in chunks, but
462           not leaving any holes, so starting at the beginning of the table is
463           actually the most efficient thing to do.
464
465           See also "--ascend-first".  See "EXTENDING" for a discussion of how
466           this interacts with plugins.
467
468       --no-delete
469           Do not delete archived rows.
470
471           Causes "mk-archiver" not to delete rows after processing them.
472           This disallows "--no-ascend", because enabling them both would
473           cause an infinite loop.
474
475           If there is a plugin on the source DSN, its "before_delete" method
476           is called anyway, even though "mk-archiver" will not execute the
477           delete.  See "EXTENDING" for more on plugins.
478
479       --optimize
480           type: string
481
482           Run OPTIMIZE TABLE afterwards on "--source" and/or "--dest".
483
484           Runs OPTIMIZE TABLE after finishing.  See "--analyze" for the
485           option syntax and http://dev.mysql.com/doc/en/optimize-table.html
486           <http://dev.mysql.com/doc/en/optimize-table.html> for details on
487           OPTIMIZE TABLE.
488
489       --password
490           short form: -p; type: string
491
492           Password to use when connecting.
493
494       --pid
495           type: string
496
497           Create the given PID file when daemonized.  The file contains the
498           process ID of the daemonized instance.  The PID file is removed
499           when the daemonized instance exits.  The program checks for the
500           existence of the PID file when starting; if it exists and the
501           process with the matching PID exists, the program exits.
502
503       --plugin
504           type: string
505
506           Perl module name to use as a generic plugin.
507
508           Specify the Perl module name of a general-purpose plugin.  It is
509           currently used only for statistics (see "--statistics") and must
510           have "new()" and a "statistics()" method.
511
512           The "new( src =" $src, dst => $dst, opts => $o )> method gets the
513           source and destination DSNs, and their database connections, just
514           like the connection-specific plugins do.  It also gets an
515           OptionParser object ($o) for accessing command-line options
516           (example: "$o-"get('purge');>).
517
518           The "statistics(\%stats, $time)" method gets a hashref of the
519           statistics collected by the archiving job, and the time the whole
520           job started.
521
522       --port
523           short form: -P; type: int
524
525           Port number to use for connection.
526
527       --primary-key-only
528           Primary key columns only.
529
530           A shortcut for specifying "--columns" with the primary key columns.
531           This is an efficiency if you just want to purge rows; it avoids
532           fetching the entire row, when only the primary key columns are
533           needed for "DELETE" statements.  See also "--purge".
534
535       --progress
536           type: int
537
538           Print progress information every X rows.
539
540           Prints current time, elapsed time, and rows archived every X rows.
541
542       --purge
543           Purge instead of archiving; allows omitting "--file" and "--dest".
544
545           Allows archiving without a "--file" or "--dest" argument, which is
546           effectively a purge since the rows are just deleted.
547
548           If you just want to purge rows, consider specifying the table's
549           primary key columns with "--primary-key-only".  This will prevent
550           fetching all columns from the server for no reason.
551
552       --quick-delete
553           Adds the QUICK modifier to DELETE statements.
554
555           See <http://dev.mysql.com/doc/en/delete.html> for details.  As
556           stated in the documentation, in some cases it may be faster to use
557           DELETE QUICK followed by OPTIMIZE TABLE.  You can use "--optimize"
558           for this.
559
560       --quiet
561           Do not print any output, such as for "--statistics".
562
563           Suppresses normal output, including the output of "--statistics",
564           but doesn't suppress the output from "--why-quit".
565
566       --replace
567           Causes INSERTs into "--dest" to be written as REPLACE.
568
569       --retries
570           type: int; default: 1
571
572           Number of retries per timeout or deadlock.
573
574           Specifies the number of times mk-archiver should retry when there
575           is an InnoDB lock wait timeout or deadlock.  When retries are
576           exhausted, mk-archiver will exit with an error.
577
578           Consider carefully what you want to happen when you are archiving
579           between a mixture of transactional and non-transactional storage
580           engines.  The INSERT to "--dest" and DELETE from "--source" are on
581           separate connections, so they do not actually participate in the
582           same transaction even if they're on the same server.  However, mk-
583           archiver implements simple distributed transactions in code, so
584           commits and rollbacks should happen as desired across the two
585           connections.
586
587           At this time I have not written any code to handle errors with
588           transactional storage engines other than InnoDB.  Request that
589           feature if you need it.
590
591       --run-time
592           type: time
593
594           Time to run before exiting.
595
596           Optional suffix s=seconds, m=minutes, h=hours, d=days; if no
597           suffix, s is used.
598
599       --[no]safe-auto-increment
600           default: yes
601
602           Do not archive row with max AUTO_INCREMENT.
603
604           Adds an extra WHERE clause to prevent mk-archiver from removing the
605           newest row when ascending a single-column AUTO_INCREMENT key.  This
606           guards against re-using AUTO_INCREMENT values if the server
607           restarts, and is enabled by default.
608
609           The extra WHERE clause contains the maximum value of the auto-
610           increment column as of the beginning of the archive or purge job.
611           If new rows are inserted while mk-archiver is running, it will not
612           see them.
613
614       --sentinel
615           type: string; default: /tmp/mk-archiver-sentinel
616
617           Exit if this file exists.
618
619           The presence of the file specified by "--sentinel" will cause mk-
620           archiver to stop archiving and exit.  The default is
621           /tmp/mk-archiver-sentinel.  You might find this handy to stop cron
622           jobs gracefully if necessary.  See also "--stop".
623
624       --set-vars
625           type: string; default: wait_timeout=10000
626
627           Set these MySQL variables.
628
629           Specify any variables you want to be set immediately after
630           connecting to MySQL.  These will be included in a "SET" command.
631
632       --share-lock
633           Adds the LOCK IN SHARE MODE modifier to SELECT statements.
634
635           See http://dev.mysql.com/doc/en/innodb-locking-reads.html
636           <http://dev.mysql.com/doc/en/innodb-locking-reads.html>.
637
638       --skip-foreign-key-checks
639           Disables foreign key checks with SET FOREIGN_KEY_CHECKS=0.
640
641       --sleep
642           type: int
643
644           Sleep time between fetches.
645
646           Specifies how long to sleep between SELECT statements.  Default is
647           not to sleep at all.  Transactions are NOT committed, and the
648           "--file" file is NOT flushed, before sleeping.  See "--txn-size" to
649           control that.
650
651           If "--commit-each" is specified, committing and flushing happens
652           before sleeping.
653
654       --sleep-coef
655           type: float
656
657           Calculate "--sleep" as a multiple of the last SELECT time.
658
659           If this option is specified, mk-archiver will sleep for the query
660           time of the last SELECT multiplied by the specified coefficient.
661           This option is ignored if "--sleep" is specified.
662
663           This is a slightly more sophisticated way to throttle the SELECTs:
664           sleep a varying amount of time between each SELECT, depending on
665           how long the SELECTs are taking.
666
667       --socket
668           short form: -S; type: string
669
670           Socket file to use for connection.
671
672       --source
673           type: DSN
674
675           DSN specifying the table to archive from (required).
676
677           This argument is specially formatted as a key=value,key=value
678           string.  Keys are a single letter.  Most options control how mk-
679           archiver connects to MySQL:
680
681              KEY MEANING
682              === =======
683              h   Connect to host
684              P   Port number to use for connection
685              S   Socket file to use for connection
686              u   User for login if not current user
687              p   Password to use when connecting
688              F   Only read default options from the given file
689
690           The following options select a table to archive:
691
692              KEY MEANING
693              === =======
694              D   Database to archive
695              t   Table to archive
696              i   Index to use
697
698           The following options specify pluggable actions, which an external
699           Perl module can provide:
700
701              KEY MEANING
702              === =======
703              m   Package name of an external Perl module (see EXTENDING).
704
705           The following actions set other options:
706
707              KEY MEANING
708              === =======
709              a   Database to set as the connection's default with USE
710              b   Disable binary logging with SET SQL_LOG_BIN=0
711
712           The only required part is the table; other parts may be read from
713           various places in the environment (such as options files).  Here is
714           an example:
715
716              --source h=my_server,D=my_database,t=my_tbl
717
718           The 'i' part deserves special mention.  This tells mk-archiver
719           which index it should scan to archive.  This appears in a FORCE
720           INDEX or USE INDEX hint in the SELECT statements used to fetch
721           archivable rows.  If you don't specify anything, mk-archiver will
722           auto-discover a good index, preferring a "PRIMARY KEY" if one
723           exists.  In my experience this usually works well, so most of the
724           time you can probably just omit the 'i' part.
725
726           The index is used to optimize repeated accesses to the table; mk-
727           archiver remembers the last row it retrieves from each SELECT
728           statement, and uses it to construct a WHERE clause, using the
729           columns in the specified index, that should allow MySQL to start
730           the next SELECT where the last one ended, rather than potentially
731           scanning from the beginning of the table with each successive
732           SELECT.  If you are using external plugins, please see "EXTENDING"
733           for a discussion of how they interact with ascending indexes.
734
735           The 'a' and 'b' options allow you to control how statements flow
736           through the binary log.  If you specify the 'b' option, binary
737           logging will be disabled on the specified connection.  If you
738           specify the 'a' option, the connection will "USE" the specified
739           database, which you can use to prevent slaves from executing the
740           binary log events with "--replicate-ignore-db" options.  These two
741           options can be used as different methods to achieve the same goal:
742           archive data off the master, but leave it on the slave.  For
743           example, you can run a purge job on the master and prevent it from
744           happening on the slave using your method of choice.
745
746           WARNING: Using a default options file (F) DSN option that defines a
747           socket for "--source" causes mk-archiver to connect to "--dest"
748           using that socket unless another socket for "--dest" is specified.
749           This means that mk-archiver may incorrectly connect to "--source"
750           when it connects to "--dest".  For example:
751
752             --source F=host1.cnf,D=db,t=tbl --dest h=host2
753
754           When mk-archiver connects to "--dest", host2, it will connect via
755           the "--source", host1, socket defined in host1.cnf.
756
757       --statistics
758           Collect and print timing statistics.
759
760           Causes mk-archiver to collect timing statistics about what it does.
761           These statistics are available to the plugin specified by
762           "--plugin"
763
764           Unless you specify "--quiet", "mk-archiver" prints the statistics
765           when it exits.  The statistics look like this:
766
767            Started at 2008-07-18T07:18:53, ended at 2008-07-18T07:18:53
768            Source: D=db,t=table
769            SELECT 4
770            INSERT 4
771            DELETE 4
772            Action         Count       Time        Pct
773            commit            10     0.1079      88.27
774            select             5     0.0047       3.87
775            deleting           4     0.0028       2.29
776            inserting          4     0.0028       2.28
777            other              0     0.0040       3.29
778
779           The first two (or three) lines show times and the source and
780           destination tables.  The next three lines show how many rows were
781           fetched, inserted, and deleted.
782
783           The remaining lines show counts and timing.  The columns are the
784           action, the total number of times that action was timed, the total
785           time it took, and the percent of the program's total runtime.  The
786           rows are sorted in order of descending total time.  The last row is
787           the rest of the time not explicitly attributed to anything.
788           Actions will vary depending on command-line options.
789
790           If "--why-quit" is given, its behavior is changed slightly.  This
791           option causes it to print the reason for exiting even when it's
792           just because there are no more rows.
793
794           This option requires the standard Time::HiRes module, which is part
795           of core Perl on reasonably new Perl releases.
796
797       --stop
798           Stop running instances by creating the sentinel file.
799
800           Causes mk-archiver to create the sentinel file specified by
801           "--sentinel" and exit.  This should have the effect of stopping all
802           running instances which are watching the same sentinel file.
803
804       --txn-size
805           type: int; default: 1
806
807           Number of rows per transaction.
808
809           Specifies the size, in number of rows, of each transaction. Zero
810           disables transactions altogether.  After mk-archiver processes this
811           many rows, it commits both the "--source" and the "--dest" if
812           given, and flushes the file given by "--file".
813
814           This parameter is critical to performance.  If you are archiving
815           from a live server, which for example is doing heavy OLTP work, you
816           need to choose a good balance between transaction size and commit
817           overhead.  Larger transactions create the possibility of more lock
818           contention and deadlocks, but smaller transactions cause more
819           frequent commit overhead, which can be significant.  To give an
820           idea, on a small test set I worked with while writing mk-archiver,
821           a value of 500 caused archiving to take about 2 seconds per 1000
822           rows on an otherwise quiet MySQL instance on my desktop machine,
823           archiving to disk and to another table.  Disabling transactions
824           with a value of zero, which turns on autocommit, dropped
825           performance to 38 seconds per thousand rows.
826
827           If you are not archiving from or to a transactional storage engine,
828           you may want to disable transactions so mk-archiver doesn't try to
829           commit.
830
831       --user
832           short form: -u; type: string
833
834           User for login if not current user.
835
836       --version
837           Show version and exit.
838
839       --where
840           type: string
841
842           WHERE clause to limit which rows to archive (required).
843
844           Specifies a WHERE clause to limit which rows are archived.  Do not
845           include the word WHERE.  You may need to quote the argument to
846           prevent your shell from interpreting it.  For example:
847
848              --where 'ts < current_date - interval 90 day'
849
850           For safety, "--where" is required.  If you do not require a WHERE
851           clause, use "--where" 1=1.
852
853       --why-quit
854           Print reason for exiting unless rows exhausted.
855
856           Causes mk-archiver to print a message if it exits for any reason
857           other than running out of rows to archive.  This can be useful if
858           you have a cron job with "--run-time" specified, for example, and
859           you want to be sure mk-archiver is finishing before running out of
860           time.
861
862           If "--statistics" is given, the behavior is changed slightly.  It
863           will print the reason for exiting even when it's just because there
864           are no more rows.
865
866           This output prints even if "--quiet" is given.  That's so you can
867           put "mk-archiver" in a "cron" job and get an email if there's an
868           abnormal exit.
869

DSN OPTIONS

871       These DSN options are used to create a DSN.  Each option is given like
872       "option=value".  The options are case-sensitive, so P and p are not the
873       same option.  There cannot be whitespace before or after the "=" and if
874       the value contains whitespace it must be quoted.  DSN options are
875       comma-separated.  See the maatkit manpage for full details.
876
877       ·   a
878
879           copy: no
880
881           Database to USE when executing queries.
882
883       ·   A
884
885           dsn: charset; copy: yes
886
887           Default character set.
888
889       ·   b
890
891           copy: no
892
893           Disable binlog with SET SQL_LOG_BIN=0.
894
895       ·   D
896
897           dsn: database; copy: yes
898
899           Database that contains the table.
900
901       ·   F
902
903           dsn: mysql_read_default_file; copy: yes
904
905           Only read default options from the given file
906
907       ·   h
908
909           dsn: host; copy: yes
910
911           Connect to host.
912
913       ·   i
914
915           copy: yes
916
917           Index to use.
918
919       ·   m
920
921           copy: no
922
923           Plugin module name.
924
925       ·   p
926
927           dsn: password; copy: yes
928
929           Password to use when connecting.
930
931       ·   P
932
933           dsn: port; copy: yes
934
935           Port number to use for connection.
936
937       ·   S
938
939           dsn: mysql_socket; copy: yes
940
941           Socket file to use for connection.
942
943       ·   t
944
945           copy: yes
946
947           Table to archive from/to.
948
949       ·   u
950
951           dsn: user; copy: yes
952
953           User for login if not current user.
954

EXTENDING

956       mk-archiver is extensible by plugging in external Perl modules to
957       handle some logic and/or actions.  You can specify a module for both
958       the "--source" and the "--dest", with the 'm' part of the
959       specification.  For example:
960
961          --source D=test,t=test1,m=My::Module1 --dest m=My::Module2,t=test2
962
963       This will cause mk-archiver to load the My::Module1 and My::Module2
964       packages, create instances of them, and then make calls to them during
965       the archiving process.
966
967       You can also specify a plugin with "--plugin".
968
969       The module must provide this interface:
970
971       new(dbh => $dbh, db => $db_name, tbl => $tbl_name)
972           The plugin's constructor is passed a reference to the database
973           handle, the database name, and table name.  The plugin is created
974           just after mk-archiver opens the connection, and before it examines
975           the table given in the arguments.  This gives the plugin a chance
976           to create and populate temporary tables, or do other setup work.
977
978       before_begin(cols => \@cols, allcols => \@allcols)
979           This method is called just before mk-archiver begins iterating
980           through rows and archiving them, but after it does all other setup
981           work (examining table structures, designing SQL queries, and so
982           on).  This is the only time mk-archiver tells the plugin column
983           names for the rows it will pass the plugin while archiving.
984
985           The "cols" argument is the column names the user requested to be
986           archived, either by default or by the "--columns" option.  The
987           "allcols" argument is the list of column names for every row mk-
988           archiver will fetch from the source table.  It may fetch more
989           columns than the user requested, because it needs some columns for
990           its own use.  When subsequent plugin functions receive a row, it is
991           the full row containing all the extra columns, if any, added to the
992           end.
993
994       is_archivable(row => \@row)
995           This method is called for each row to determine whether it is
996           archivable.  This applies only to "--source".  The argument is the
997           row itself, as an arrayref.  If the method returns true, the row
998           will be archived; otherwise it will be skipped.
999
1000           Skipping a row adds complications for non-unique indexes.  Normally
1001           mk-archiver uses a WHERE clause designed to target the last
1002           processed row as the place to start the scan for the next SELECT
1003           statement.  If you have skipped the row by returning false from
1004           is_archivable(), mk-archiver could get into an infinite loop
1005           because the row still exists.  Therefore, when you specify a plugin
1006           for the "--source" argument, mk-archiver will change its WHERE
1007           clause slightly.  Instead of starting at "greater than or equal to"
1008           the last processed row, it will start "strictly greater than."
1009           This will work fine on unique indexes such as primary keys, but it
1010           may skip rows (leave holes) on non-unique indexes or when ascending
1011           only the first column of an index.
1012
1013           "mk-archiver" will change the clause in the same way if you specify
1014           "--no-delete", because again an infinite loop is possible.
1015
1016           If you specify the "--bulk-delete" option and return false from
1017           this method, "mk-archiver" may not do what you want.  The row won't
1018           be archived, but it will be deleted, since bulk deletes operate on
1019           ranges of rows and don't know which rows the plugin selected to
1020           keep.
1021
1022           If you specify the "--bulk-insert" option, this method's return
1023           value will influence whether the row is written to the temporary
1024           file for the bulk insert, so bulk inserts will work as expected.
1025           However, bulk inserts require bulk deletes.
1026
1027       before_delete(row => \@row)
1028           This method is called for each row just before it is deleted.  This
1029           applies only to "--source".  This is a good place for you to handle
1030           dependencies, such as deleting things that are foreign-keyed to the
1031           row you are about to delete.  You could also use this to
1032           recursively archive all dependent tables.
1033
1034           This plugin method is called even if "--no-delete" is given, but
1035           not if "--bulk-delete" is given.
1036
1037       before_bulk_delete(first_row => \@row, last_row => \@row)
1038           This method is called just before a bulk delete is executed.  It is
1039           similar to the "before_delete" method, except its arguments are the
1040           first and last row of the range to be deleted.  It is called even
1041           if "--no-delete" is given.
1042
1043       before_insert(row => \@row)
1044           This method is called for each row just before it is inserted.
1045           This applies only to "--dest".  You could use this to insert the
1046           row into multiple tables, perhaps with an ON DUPLICATE KEY UPDATE
1047           clause to build summary tables in a data warehouse.
1048
1049           This method is not called if "--bulk-insert" is given.
1050
1051       before_bulk_insert(first_row => \@row, last_row => \@row)
1052           This method is called just before a bulk insert is executed.  It is
1053           similar to the "before_insert" method, except its arguments are the
1054           first and last row of the range to be deleted.
1055
1056       custom_sth(row => \@row, sql => $sql)
1057           This method is called just before inserting the row, but after
1058           "before_insert()".  It allows the plugin to specify different
1059           "INSERT" statement if desired.  The return value (if any) should be
1060           a DBI statement handle.  The "sql" parameter is the SQL text used
1061           to prepare the default "INSERT" statement.  This method is not
1062           called if you specify "--bulk-insert".
1063
1064           If no value is returned, the default "INSERT" statement handle is
1065           used.
1066
1067           This method applies only to the plugin specified for "--dest", so
1068           if your plugin isn't doing what you expect, check that you've
1069           specified it for the destination and not the source.
1070
1071       custom_sth_bulk(first_row => \@row, last_row => \@row, sql => $sql)
1072           If you've specified "--bulk-insert", this method is called just
1073           before the bulk insert, but after "before_bulk_insert()", and the
1074           arguments are different.
1075
1076           This method's return value etc is similar to the "custom_sth()"
1077           method.
1078
1079       after_finish()
1080           This method is called after mk-archiver exits the archiving loop,
1081           commits all database handles, closes "--file", and prints the final
1082           statistics, but before mk-archiver runs ANALYZE or OPTIMIZE (see
1083           "--analyze" and "--optimize").
1084
1085       If you specify a plugin for both "--source" and "--dest", mk-archiver
1086       constructs, calls before_begin(), and calls after_finish() on the two
1087       plugins in the order "--source", "--dest".
1088
1089       mk-archiver assumes it controls transactions, and that the plugin will
1090       NOT commit or roll back the database handle.  The database handle
1091       passed to the plugin's constructor is the same handle mk-archiver uses
1092       itself.  Remember that "--source" and "--dest" are separate handles.
1093
1094       A sample module might look like this:
1095
1096          package My::Module;
1097
1098          sub new {
1099             my ( $class, %args ) = @_;
1100             return bless(\%args, $class);
1101          }
1102
1103          sub before_begin {
1104             my ( $self, %args ) = @_;
1105             # Save column names for later
1106             $self->{cols} = $args{cols};
1107          }
1108
1109          sub is_archivable {
1110             my ( $self, %args ) = @_;
1111             # Do some advanced logic with $args{row}
1112             return 1;
1113          }
1114
1115          sub before_delete {} # Take no action
1116          sub before_insert {} # Take no action
1117          sub custom_sth    {} # Take no action
1118          sub after_finish  {} # Take no action
1119
1120          1;
1121

DOWNLOADING

1123       You can download Maatkit from Google Code at
1124       <http://code.google.com/p/maatkit/>, or you can get any of the tools
1125       easily with a command like the following:
1126
1127          wget http://www.maatkit.org/get/toolname
1128          or
1129          wget http://www.maatkit.org/trunk/toolname
1130
1131       Where "toolname" can be replaced with the name (or fragment of a name)
1132       of any of the Maatkit tools.  Once downloaded, they're ready to run; no
1133       installation is needed.  The first URL gets the latest released version
1134       of the tool, and the second gets the latest trunk code from Subversion.
1135

ENVIRONMENT

1137       The environment variable "MKDEBUG" enables verbose debugging output in
1138       all of the Maatkit tools:
1139
1140          MKDEBUG=1 mk-....
1141

SYSTEM REQUIREMENTS

1143       You need Perl, DBI, DBD::mysql, and some core packages that ought to be
1144       installed in any reasonably new version of Perl.
1145

BUGS

1147       For a list of known bugs see http://www.maatkit.org/bugs/mk-archiver
1148       <http://www.maatkit.org/bugs/mk-archiver>.
1149
1150       Please use Google Code Issues and Groups to report bugs or request
1151       support: <http://code.google.com/p/maatkit/>.  You can also join
1152       #maatkit on Freenode to discuss Maatkit.
1153
1154       Please include the complete command-line used to reproduce the problem
1155       you are seeing, the version of all MySQL servers involved, the complete
1156       output of the tool when run with "--version", and if possible,
1157       debugging output produced by running with the "MKDEBUG=1" environment
1158       variable.
1159

ACKNOWLEDGEMENTS

1161       Thanks to the following people, and apologies to anyone I've omitted:
1162
1163       Andrew O'Brien,
1164

COPYRIGHT, LICENSE AND WARRANTY

1166       This program is copyright 2007-2010 Baron Schwartz.  Feedback and
1167       improvements are welcome.
1168
1169       THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
1170       WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
1171       MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1172
1173       This program is free software; you can redistribute it and/or modify it
1174       under the terms of the GNU General Public License as published by the
1175       Free Software Foundation, version 2; OR the Perl Artistic License.  On
1176       UNIX and similar systems, you can issue `man perlgpl' or `man
1177       perlartistic' to read these licenses.
1178
1179       You should have received a copy of the GNU General Public License along
1180       with this program; if not, write to the Free Software Foundation, Inc.,
1181       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.
1182

AUTHOR

1184       Baron Schwartz
1185

ABOUT MAATKIT

1187       This tool is part of Maatkit, a toolkit for power users of MySQL.
1188       Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
1189       primary code contributors.  Both are employed by Percona.  Financial
1190       support for Maatkit development is primarily provided by Percona and
1191       its clients.
1192

VERSION

1194       This manual page documents Ver 1.0.23 Distrib 6839 $Revision: 6831 $.
1195
1196
1197
1198perl v5.12.1                      2010-08-01                    MK-ARCHIVER(1)
Impressum