mk-archiver(1p)

1MK-ARCHIVER(1)        User Contributed Perl Documentation       MK-ARCHIVER(1)
2
3
4

NAME

6       mk-archiver - Archive rows from a MySQL table into another table or a
7       file.
8

SYNOPSIS

10       Usage: mk-archiver [OPTION...] --source DSN --where WHERE
11
12       mk-archiver nibbles records from a MySQL table.  The --source and
13       --dest arguments use DSN syntax; if COPY is yes, --dest defaults to the
14       key's value from --source.
15
16       Examples:
17
18       Archive all rows from oltp_server to olap_server and to a file:
19
20         mk-archiver --source h=oltp_server,D=test,t=tbl --dest h=olap_server \
21           --file '/var/log/archive/%Y-%m-%d-%D.%t'                           \
22           --where "1=1" --limit 1000 --commit-each
23
24       Purge (delete) orphan rows from child table:
25
26         mk-archiver --source h=host,D=db,t=child --purge \
27           --where 'NOT EXISTS(SELECT * FROM parent WHERE col=child.col)'
28

RISKS

30       The following section is included to inform users about the potential
31       risks, whether known or unknown, of using this tool.  The two main
32       categories of risks are those created by the nature of the tool (e.g.
33       read-only tools vs. read-write tools) and those created by bugs.
34
35       mk-achiver is a read-write tool.  It deletes data from the source by
36       default, so you should test your archiving jobs with the "--dry-run"
37       option if you're not sure about them.  It is designed to have as little
38       impact on production systems as possible, but tuning with "--limit",
39       "--txn-size" and similar options might be a good idea too.
40
41       If you write or use "--plugin" modules, you should ensure they are good
42       quality and well-tested.
43
44       At the time of this release there is an unverified bug with
45       "--bulk-insert" that may cause data loss.
46
47       The authoritative source for updated information is always the online
48       issue tracking system.  Issues that affect this tool will be marked as
49       such.  You can see a list of such issues at the following URL:
50       <http://www.maatkit.org/bugs/mk-archiver>.
51
52       See also "BUGS" for more information on filing bugs and getting help.
53

DESCRIPTION

55       mk-archiver is the tool I use to archive tables as described in
56       <http://tinyurl.com/mysql-archiving>.  The goal is a low-impact,
57       forward-only job to nibble old data out of the table without impacting
58       OLTP queries much.  You can insert the data into another table, which
59       need not be on the same server.  You can also write it to a file in a
60       format suitable for LOAD DATA INFILE.  Or you can do neither, in which
61       case it's just an incremental DELETE.
62
63       mk-archiver is extensible via a plugin mechanism.  You can inject your
64       own code to add advanced archiving logic that could be useful for
65       archiving dependent data, applying complex business rules, or building
66       a data warehouse during the archiving process.
67
68       You need to choose values carefully for some options.  The most
69       important are "--limit", "--retries", and "--txn-size".
70
71       The strategy is to find the first row(s), then scan some index forward-
72       only to find more rows efficiently.  Each subsequent query should not
73       scan the entire table; it should seek into the index, then scan until
74       it finds more archivable rows.  Specifying the index with the 'i' part
75       of the "--source" argument can be crucial for this; use "--dry-run" to
76       examine the generated queries and be sure to EXPLAIN them to see if
77       they are efficient (most of the time you probably want to scan the
78       PRIMARY key, which is the default).  Even better, profile mk-archiver
79       with mk-query-profiler and make sure it is not scanning the whole table
80       every query.
81
82       You can disable the seek-then-scan optimizations partially or wholly
83       with "--no-ascend" and "--ascend-first".  Sometimes this may be more
84       efficient for multi-column keys.  Be aware that mk-archiver is built to
85       start at the beginning of the index it chooses and scan it forward-
86       only.  This might result in long table scans if you're trying to nibble
87       from the end of the table by an index other than the one it prefers.
88       See "--source" and read the documentation on the "i" part if this
89       applies to you.
90

OUTPUT

92       If you specify "--progress", the output is a header row, plus status
93       output at intervals.  Each row in the status output lists the current
94       date and time, how many seconds mk-archiver has been running, and how
95       many rows it has archived.
96
97       If you specify "--statistics", "mk-archiver" outputs timing and other
98       information to help you identify which part of your archiving process
99       takes the most time.
100

ERROR-HANDLING

102       mk-archiver tries to catch signals and exit gracefully; for example, if
103       you send it SIGTERM (Ctrl-C on UNIX-ish systems), it will catch the
104       signal, print a message about the signal, and exit fairly normally.  It
105       will not execute "--analyze" or "--optimize", because these may take a
106       long time to finish.  It will run all other code normally, including
107       calling after_finish() on any plugins (see "EXTENDING").
108
109       In other words, a signal, if caught, will break out of the main
110       archiving loop and skip optimize/analyze.
111

OPTIONS

113       Specify at least one of "--dest", "--file", or "--purge".
114
115       "--ignore" and "--replace" are mutually exclusive.
116
117       "--txn-size" and "--commit-each" are mutually exclusive.
118
119       "--low-priority-insert" and "--delayed-insert" are mutually exclusive.
120
121       "--share-lock" and "--for-update" are mutually exclusive.
122
123       "--analyze" and "--optimize" are mutually exclusive.
124
125       "--no-ascend" and "--no-delete" are mutually exclusive.
126
127       DSN values in "--dest" default to values from "--source" if COPY is
128       yes.
129
130       --analyze
131           type: string
132
133           Run ANALYZE TABLE afterwards on "--source" and/or "--dest".
134
135           Runs ANALYZE TABLE after finishing.  The argument is an arbitrary
136           string.  If it contains the letter 's', the source will be
137           analyzed.  If it contains 'd', the destination will be analyzed.
138           You can specify either or both.  For example, the following will
139           analyze both:
140
141             --analyze=ds
142
143           See <http://dev.mysql.com/doc/en/analyze-table.html> for details on
144           ANALYZE TABLE.
145
146       --ascend-first
147           Ascend only first column of index.
148
149           If you do want to use the ascending index optimization (see
150           "--no-ascend"), but do not want to incur the overhead of ascending
151           a large multi-column index, you can use this option to tell mk-
152           archiver to ascend only the leftmost column of the index.  This can
153           provide a significant performance boost over not ascending the
154           index at all, while avoiding the cost of ascending the whole index.
155
156           See "EXTENDING" for a discussion of how this interacts with
157           plugins.
158
159       --ask-pass
160           Prompt for a password when connecting to MySQL.
161
162       --buffer
163           Buffer output to "--file" and flush at commit.
164
165           Disables autoflushing to "--file" and flushes "--file" to disk only
166           when a transaction commits.  This typically means the file is
167           block-flushed by the operating system, so there may be some
168           implicit flushes to disk between commits as well.  The default is
169           to flush "--file" to disk after every row.
170
171           The danger is that a crash might cause lost data.
172
173           The performance increase I have seen from using "--buffer" is
174           around 5 to 15 percent.  Your mileage may vary.
175
176       --bulk-delete
177           Delete each chunk with a single statement (implies
178           "--commit-each").
179
180           Delete each chunk of rows in bulk with a single "DELETE" statement.
181           The statement deletes every row between the first and last row of
182           the chunk, inclusive.  It implies "--commit-each", since it would
183           be a bad idea to "INSERT" rows one at a time and commit them before
184           the bulk "DELETE".
185
186           The normal method is to delete every row by its primary key.  Bulk
187           deletes might be a lot faster.  They also might not be faster if
188           you have a complex "WHERE" clause.
189
190           This option completely defers all "DELETE" processing until the
191           chunk of rows is finished.  If you have a plugin on the source, its
192           "before_delete" method will not be called.  Instead, its
193           "before_bulk_delete" method is called later.
194
195           WARNING: if you have a plugin on the source that sometimes doesn't
196           return true from "is_archivable()", you should use this option only
197           if you understand what it does.  If the plugin instructs
198           "mk-archiver" not to archive a row, it will still be deleted by the
199           bulk delete!
200
201       --[no]bulk-delete-limit
202           default: yes
203
204           Add "--limit" to "--bulk-delete" statement.
205
206           This is an advanced option and you should not disable it unless you
207           know what you are doing and why!  By default, "--bulk-delete"
208           appends a "--limit" clause to the bulk delete SQL statement.  In
209           certain cases, this clause can be omitted by specifying
210           "--no-bulk-delete-limit".  "--limit" must still be specified.
211
212       --bulk-insert
213           Insert each chunk with LOAD DATA INFILE (implies "--bulk-delete"
214           "--commit-each").
215
216           Insert each chunk of rows with "LOAD DATA LOCAL INFILE".  This may
217           be much faster than inserting a row at a time with "INSERT"
218           statements.  It is implemented by creating a temporary file for
219           each chunk of rows, and writing the rows to this file instead of
220           inserting them.  When the chunk is finished, it uploads the rows.
221
222           To protect the safety of your data, this option forces bulk deletes
223           to be used.  It would be unsafe to delete each row as it is found,
224           before inserting the rows into the destination first.  Forcing bulk
225           deletes guarantees that the deletion waits until the insertion is
226           successful.
227
228           The "--low-priority-insert", "--replace", and "--ignore" options
229           work with this option, but "--delayed-insert" does not.
230
231       --charset
232           short form: -A; type: string
233
234           Default character set.  If the value is utf8, sets Perl's binmode
235           on STDOUT to utf8, passes the mysql_enable_utf8 option to
236           DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL.  Any
237           other value sets binmode on STDOUT without the utf8 layer, and runs
238           SET NAMES after connecting to MySQL.
239
240           See also "--[no]check-charset".
241
242       --[no]check-charset
243           default: yes
244
245           Ensure connection and table character sets are the same.  Disabling
246           this check may cause text to be erroneously converted from one
247           character set to another (usually from utf8 to latin1) which may
248           cause data loss or mojibake.  Disabling this check may be useful or
249           necessary when character set conversions are intended.
250
251       --[no]check-columns
252           default: yes
253
254           Ensure "--source" and "--dest" have same columns.
255
256           Enabled by default; causes mk-archiver to check that the source and
257           destination tables have the same columns.  It does not check column
258           order, data type, etc.  It just checks that all columns in the
259           source exist in the destination and vice versa.  If there are any
260           differences, mk-archiver will exit with an error.
261
262           To disable this check, specify --no-check-columns.
263
264       --check-interval
265           type: time; default: 1s
266
267           How often to check for slave lag if "--check-slave-lag" is given.
268
269       --check-slave-lag
270           type: string
271
272           Pause archiving until the specified DSN's slave lag is less than
273           "--max-lag".
274
275       --columns
276           short form: -c; type: array
277
278           Comma-separated list of columns to archive.
279
280           Specify a comma-separated list of columns to fetch, write to the
281           file, and insert into the destination table.  If specified, mk-
282           archiver ignores other columns unless it needs to add them to the
283           "SELECT" statement for ascending an index or deleting rows.  It
284           fetches and uses these extra columns internally, but does not write
285           them to the file or to the destination table.  It does pass them to
286           plugins.
287
288           See also "--primary-key-only".
289
290       --commit-each
291           Commit each set of fetched and archived rows (disables
292           "--txn-size").
293
294           Commits transactions and flushes "--file" after each set of rows
295           has been archived, before fetching the next set of rows, and before
296           sleeping if "--sleep" is specified.  Disables "--txn-size"; use
297           "--limit" to control the transaction size with "--commit-each".
298
299           This option is useful as a shortcut to make "--limit" and
300           "--txn-size" the same value, but more importantly it avoids
301           transactions being held open while searching for more rows.  For
302           example, imagine you are archiving old rows from the beginning of a
303           very large table, with "--limit" 1000 and "--txn-size" 1000.  After
304           some period of finding and archiving 1000 rows at a time, mk-
305           archiver finds the last 999 rows and archives them, then executes
306           the next SELECT to find more rows.  This scans the rest of the
307           table, but never finds any more rows.  It has held open a
308           transaction for a very long time, only to determine it is finished
309           anyway.  You can use "--commit-each" to avoid this.
310
311       --config
312           type: Array
313
314           Read this comma-separated list of config files; if specified, this
315           must be the first option on the command line.
316
317       --delayed-insert
318           Add the DELAYED modifier to INSERT statements.
319
320           Adds the DELAYED modifier to INSERT or REPLACE statements.  See
321           <http://dev.mysql.com/doc/en/insert.html> for details.
322
323       --dest
324           type: DSN
325
326           DSN specifying the table to archive to.
327
328           This item specifies a table into which mk-archiver will insert rows
329           archived from "--source".  It uses the same key=val argument format
330           as "--source".  Most missing values default to the same values as
331           "--source", so you don't have to repeat options that are the same
332           in "--source" and "--dest".  Use the "--help" option to see which
333           values are copied from "--source".
334
335           WARNING: Using a default options file (F) DSN option that defines a
336           socket for "--source" causes mk-archiver to connect to "--dest"
337           using that socket unless another socket for "--dest" is specified.
338           This means that mk-archiver may incorrectly connect to "--source"
339           when it connects to "--dest".  For example:
340
341             --source F=host1.cnf,D=db,t=tbl --dest h=host2
342
343           When mk-archiver connects to "--dest", host2, it will connect via
344           the "--source", host1, socket defined in host1.cnf.
345
346       --dry-run
347           Print queries and exit without doing anything.
348
349           Causes mk-archiver to exit after printing the filename and SQL
350           statements it will use.
351
352       --file
353           type: string
354
355           File to archive to, with DATE_FORMAT()-like formatting.
356
357           Filename to write archived rows to.  A subset of MySQL's
358           DATE_FORMAT() formatting codes are allowed in the filename, as
359           follows:
360
361              %d    Day of the month, numeric (01..31)
362              %H    Hour (00..23)
363              %i    Minutes, numeric (00..59)
364              %m    Month, numeric (01..12)
365              %s    Seconds (00..59)
366              %Y    Year, numeric, four digits
367
368           You can use the following extra format codes too:
369
370              %D    Database name
371              %t    Table name
372
373           Example:
374
375              --file '/var/log/archive/%Y-%m-%d-%D.%t'
376
377           The file's contents are in the same format used by SELECT INTO
378           OUTFILE, as documented in the MySQL manual: rows terminated by
379           newlines, columns terminated by tabs, NULL characters are
380           represented by \N, and special characters are escaped by \.  This
381           lets you reload a file with LOAD DATA INFILE's default settings.
382
383           If you want a column header at the top of the file, see "--header".
384           The file is auto-flushed by default; see "--buffer".
385
386       --for-update
387           Adds the FOR UPDATE modifier to SELECT statements.
388
389           For details, see
390           <http://dev.mysql.com/doc/en/innodb-locking-reads.html>.
391
392       --header
393           Print column header at top of "--file".
394
395           Writes column names as the first line in the file given by
396           "--file".  If the file exists, does not write headers; this keeps
397           the file loadable with LOAD DATA INFILE in case you append more
398           output to it.
399
400       --help
401           Show help and exit.
402
403       --high-priority-select
404           Adds the HIGH_PRIORITY modifier to SELECT statements.
405
406           See <http://dev.mysql.com/doc/en/select.html> for details.
407
408       --host
409           short form: -h; type: string
410
411           Connect to host.
412
413       --ignore
414           Use IGNORE for INSERT statements.
415
416           Causes INSERTs into "--dest" to be INSERT IGNORE.
417
418       --limit
419           type: int; default: 1
420
421           Number of rows to fetch and archive per statement.
422
423           Limits the number of rows returned by the SELECT statements that
424           retrieve rows to archive.  Default is one row.  It may be more
425           efficient to increase the limit, but be careful if you are
426           archiving sparsely, skipping over many rows; this can potentially
427           cause more contention with other queries, depending on the storage
428           engine, transaction isolation level, and options such as
429           "--for-update".
430
431       --local
432           Do not write OPTIMIZE or ANALYZE queries to binlog.
433
434           Adds the NO_WRITE_TO_BINLOG modifier to ANALYZE and OPTIMIZE
435           queries.  See "--analyze" for details.
436
437       --low-priority-delete
438           Adds the LOW_PRIORITY modifier to DELETE statements.
439
440           See <http://dev.mysql.com/doc/en/delete.html> for details.
441
442       --low-priority-insert
443           Adds the LOW_PRIORITY modifier to INSERT or REPLACE statements.
444
445           See <http://dev.mysql.com/doc/en/insert.html> for details.
446
447       --max-lag
448           type: time; default: 1s
449
450           Pause archiving if the slave given by "--check-slave-lag" lags.
451
452           This option causes mk-archiver to look at the slave every time it's
453           about to fetch another row.  If the slave's lag is greater than the
454           option's value, or if the slave isn't running (so its lag is NULL),
455           mk-table-checksum sleeps for "--check-interval" seconds and then
456           looks at the lag again.  It repeats until the slave is caught up,
457           then proceeds to fetch and archive the row.
458
459           This option may eliminate the need for "--sleep" or "--sleep-coef".
460
461       --no-ascend
462           Do not use ascending index optimization.
463
464           The default ascending-index optimization causes "mk-archiver" to
465           optimize repeated "SELECT" queries so they seek into the index
466           where the previous query ended, then scan along it, rather than
467           scanning from the beginning of the table every time.  This is
468           enabled by default because it is generally a good strategy for
469           repeated accesses.
470
471           Large, multiple-column indexes may cause the WHERE clause to be
472           complex enough that this could actually be less efficient.
473           Consider for example a four-column PRIMARY KEY on (a, b, c, d).
474           The WHERE clause to start where the last query ended is as follows:
475
476              WHERE (a > ?)
477                 OR (a = ? AND b > ?)
478                 OR (a = ? AND b = ? AND c > ?)
479                 OR (a = ? AND b = ? AND c = ? AND d >= ?)
480
481           Populating the placeholders with values uses memory and CPU, adds
482           network traffic and parsing overhead, and may make the query harder
483           for MySQL to optimize.  A four-column key isn't a big deal, but a
484           ten-column key in which every column allows "NULL" might be.
485
486           Ascending the index might not be necessary if you know you are
487           simply removing rows from the beginning of the table in chunks, but
488           not leaving any holes, so starting at the beginning of the table is
489           actually the most efficient thing to do.
490
491           See also "--ascend-first".  See "EXTENDING" for a discussion of how
492           this interacts with plugins.
493
494       --no-delete
495           Do not delete archived rows.
496
497           Causes "mk-archiver" not to delete rows after processing them.
498           This disallows "--no-ascend", because enabling them both would
499           cause an infinite loop.
500
501           If there is a plugin on the source DSN, its "before_delete" method
502           is called anyway, even though "mk-archiver" will not execute the
503           delete.  See "EXTENDING" for more on plugins.
504
505       --optimize
506           type: string
507
508           Run OPTIMIZE TABLE afterwards on "--source" and/or "--dest".
509
510           Runs OPTIMIZE TABLE after finishing.  See "--analyze" for the
511           option syntax and <http://dev.mysql.com/doc/en/optimize-table.html>
512           for details on OPTIMIZE TABLE.
513
514       --password
515           short form: -p; type: string
516
517           Password to use when connecting.
518
519       --pid
520           type: string
521
522           Create the given PID file when daemonized.  The file contains the
523           process ID of the daemonized instance.  The PID file is removed
524           when the daemonized instance exits.  The program checks for the
525           existence of the PID file when starting; if it exists and the
526           process with the matching PID exists, the program exits.
527
528       --plugin
529           type: string
530
531           Perl module name to use as a generic plugin.
532
533           Specify the Perl module name of a general-purpose plugin.  It is
534           currently used only for statistics (see "--statistics") and must
535           have "new()" and a "statistics()" method.
536
537           The "new( src =" $src, dst => $dst, opts => $o )> method gets the
538           source and destination DSNs, and their database connections, just
539           like the connection-specific plugins do.  It also gets an
540           OptionParser object ($o) for accessing command-line options
541           (example: "$o-"get('purge');>).
542
543           The "statistics(\%stats, $time)" method gets a hashref of the
544           statistics collected by the archiving job, and the time the whole
545           job started.
546
547       --port
548           short form: -P; type: int
549
550           Port number to use for connection.
551
552       --primary-key-only
553           Primary key columns only.
554
555           A shortcut for specifying "--columns" with the primary key columns.
556           This is an efficiency if you just want to purge rows; it avoids
557           fetching the entire row, when only the primary key columns are
558           needed for "DELETE" statements.  See also "--purge".
559
560       --progress
561           type: int
562
563           Print progress information every X rows.
564
565           Prints current time, elapsed time, and rows archived every X rows.
566
567       --purge
568           Purge instead of archiving; allows omitting "--file" and "--dest".
569
570           Allows archiving without a "--file" or "--dest" argument, which is
571           effectively a purge since the rows are just deleted.
572
573           If you just want to purge rows, consider specifying the table's
574           primary key columns with "--primary-key-only".  This will prevent
575           fetching all columns from the server for no reason.
576
577       --quick-delete
578           Adds the QUICK modifier to DELETE statements.
579
580           See <http://dev.mysql.com/doc/en/delete.html> for details.  As
581           stated in the documentation, in some cases it may be faster to use
582           DELETE QUICK followed by OPTIMIZE TABLE.  You can use "--optimize"
583           for this.
584
585       --quiet
586           short form: -q
587
588           Do not print any output, such as for "--statistics".
589
590           Suppresses normal output, including the output of "--statistics",
591           but doesn't suppress the output from "--why-quit".
592
593       --replace
594           Causes INSERTs into "--dest" to be written as REPLACE.
595
596       --retries
597           type: int; default: 1
598
599           Number of retries per timeout or deadlock.
600
601           Specifies the number of times mk-archiver should retry when there
602           is an InnoDB lock wait timeout or deadlock.  When retries are
603           exhausted, mk-archiver will exit with an error.
604
605           Consider carefully what you want to happen when you are archiving
606           between a mixture of transactional and non-transactional storage
607           engines.  The INSERT to "--dest" and DELETE from "--source" are on
608           separate connections, so they do not actually participate in the
609           same transaction even if they're on the same server.  However, mk-
610           archiver implements simple distributed transactions in code, so
611           commits and rollbacks should happen as desired across the two
612           connections.
613
614           At this time I have not written any code to handle errors with
615           transactional storage engines other than InnoDB.  Request that
616           feature if you need it.
617
618       --run-time
619           type: time
620
621           Time to run before exiting.
622
623           Optional suffix s=seconds, m=minutes, h=hours, d=days; if no
624           suffix, s is used.
625
626       --[no]safe-auto-increment
627           default: yes
628
629           Do not archive row with max AUTO_INCREMENT.
630
631           Adds an extra WHERE clause to prevent mk-archiver from removing the
632           newest row when ascending a single-column AUTO_INCREMENT key.  This
633           guards against re-using AUTO_INCREMENT values if the server
634           restarts, and is enabled by default.
635
636           The extra WHERE clause contains the maximum value of the auto-
637           increment column as of the beginning of the archive or purge job.
638           If new rows are inserted while mk-archiver is running, it will not
639           see them.
640
641       --sentinel
642           type: string; default: /tmp/mk-archiver-sentinel
643
644           Exit if this file exists.
645
646           The presence of the file specified by "--sentinel" will cause mk-
647           archiver to stop archiving and exit.  The default is
648           /tmp/mk-archiver-sentinel.  You might find this handy to stop cron
649           jobs gracefully if necessary.  See also "--stop".
650
651       --set-vars
652           type: string; default: wait_timeout=10000
653
654           Set these MySQL variables.
655
656           Specify any variables you want to be set immediately after
657           connecting to MySQL.  These will be included in a "SET" command.
658
659       --share-lock
660           Adds the LOCK IN SHARE MODE modifier to SELECT statements.
661
662           See <http://dev.mysql.com/doc/en/innodb-locking-reads.html>.
663
664       --skip-foreign-key-checks
665           Disables foreign key checks with SET FOREIGN_KEY_CHECKS=0.
666
667       --sleep
668           type: int
669
670           Sleep time between fetches.
671
672           Specifies how long to sleep between SELECT statements.  Default is
673           not to sleep at all.  Transactions are NOT committed, and the
674           "--file" file is NOT flushed, before sleeping.  See "--txn-size" to
675           control that.
676
677           If "--commit-each" is specified, committing and flushing happens
678           before sleeping.
679
680       --sleep-coef
681           type: float
682
683           Calculate "--sleep" as a multiple of the last SELECT time.
684
685           If this option is specified, mk-archiver will sleep for the query
686           time of the last SELECT multiplied by the specified coefficient.
687
688           This is a slightly more sophisticated way to throttle the SELECTs:
689           sleep a varying amount of time between each SELECT, depending on
690           how long the SELECTs are taking.
691
692       --socket
693           short form: -S; type: string
694
695           Socket file to use for connection.
696
697       --source
698           type: DSN
699
700           DSN specifying the table to archive from (required).  This argument
701           is a DSN.  See "DSN OPTIONS" for the syntax.  Most options control
702           how mk-archiver connects to MySQL, but there are some extended DSN
703           options in this tool's syntax.  The D, t, and i options select a
704           table to archive:
705
706             --source h=my_server,D=my_database,t=my_tbl
707
708           The a option specifies the database to set as the connection's
709           default with USE.  If the b option is true, it disables binary
710           logging with SQL_LOG_BIN.  The m option specifies pluggable
711           actions, which an external Perl module can provide.  The only
712           required part is the table; other parts may be read from various
713           places in the environment (such as options files).
714
715           The 'i' part deserves special mention.  This tells mk-archiver
716           which index it should scan to archive.  This appears in a FORCE
717           INDEX or USE INDEX hint in the SELECT statements used to fetch
718           archivable rows.  If you don't specify anything, mk-archiver will
719           auto-discover a good index, preferring a "PRIMARY KEY" if one
720           exists.  In my experience this usually works well, so most of the
721           time you can probably just omit the 'i' part.
722
723           The index is used to optimize repeated accesses to the table; mk-
724           archiver remembers the last row it retrieves from each SELECT
725           statement, and uses it to construct a WHERE clause, using the
726           columns in the specified index, that should allow MySQL to start
727           the next SELECT where the last one ended, rather than potentially
728           scanning from the beginning of the table with each successive
729           SELECT.  If you are using external plugins, please see "EXTENDING"
730           for a discussion of how they interact with ascending indexes.
731
732           The 'a' and 'b' options allow you to control how statements flow
733           through the binary log.  If you specify the 'b' option, binary
734           logging will be disabled on the specified connection.  If you
735           specify the 'a' option, the connection will "USE" the specified
736           database, which you can use to prevent slaves from executing the
737           binary log events with "--replicate-ignore-db" options.  These two
738           options can be used as different methods to achieve the same goal:
739           archive data off the master, but leave it on the slave.  For
740           example, you can run a purge job on the master and prevent it from
741           happening on the slave using your method of choice.
742
743           WARNING: Using a default options file (F) DSN option that defines a
744           socket for "--source" causes mk-archiver to connect to "--dest"
745           using that socket unless another socket for "--dest" is specified.
746           This means that mk-archiver may incorrectly connect to "--source"
747           when it is meant to connect to "--dest".  For example:
748
749             --source F=host1.cnf,D=db,t=tbl --dest h=host2
750
751           When mk-archiver connects to "--dest", host2, it will connect via
752           the "--source", host1, socket defined in host1.cnf.
753
754       --statistics
755           Collect and print timing statistics.
756
757           Causes mk-archiver to collect timing statistics about what it does.
758           These statistics are available to the plugin specified by
759           "--plugin"
760
761           Unless you specify "--quiet", "mk-archiver" prints the statistics
762           when it exits.  The statistics look like this:
763
764            Started at 2008-07-18T07:18:53, ended at 2008-07-18T07:18:53
765            Source: D=db,t=table
766            SELECT 4
767            INSERT 4
768            DELETE 4
769            Action         Count       Time        Pct
770            commit            10     0.1079      88.27
771            select             5     0.0047       3.87
772            deleting           4     0.0028       2.29
773            inserting          4     0.0028       2.28
774            other              0     0.0040       3.29
775
776           The first two (or three) lines show times and the source and
777           destination tables.  The next three lines show how many rows were
778           fetched, inserted, and deleted.
779
780           The remaining lines show counts and timing.  The columns are the
781           action, the total number of times that action was timed, the total
782           time it took, and the percent of the program's total runtime.  The
783           rows are sorted in order of descending total time.  The last row is
784           the rest of the time not explicitly attributed to anything.
785           Actions will vary depending on command-line options.
786
787           If "--why-quit" is given, its behavior is changed slightly.  This
788           option causes it to print the reason for exiting even when it's
789           just because there are no more rows.
790
791           This option requires the standard Time::HiRes module, which is part
792           of core Perl on reasonably new Perl releases.
793
794       --stop
795           Stop running instances by creating the sentinel file.
796
797           Causes mk-archiver to create the sentinel file specified by
798           "--sentinel" and exit.  This should have the effect of stopping all
799           running instances which are watching the same sentinel file.
800
801       --txn-size
802           type: int; default: 1
803
804           Number of rows per transaction.
805
806           Specifies the size, in number of rows, of each transaction. Zero
807           disables transactions altogether.  After mk-archiver processes this
808           many rows, it commits both the "--source" and the "--dest" if
809           given, and flushes the file given by "--file".
810
811           This parameter is critical to performance.  If you are archiving
812           from a live server, which for example is doing heavy OLTP work, you
813           need to choose a good balance between transaction size and commit
814           overhead.  Larger transactions create the possibility of more lock
815           contention and deadlocks, but smaller transactions cause more
816           frequent commit overhead, which can be significant.  To give an
817           idea, on a small test set I worked with while writing mk-archiver,
818           a value of 500 caused archiving to take about 2 seconds per 1000
819           rows on an otherwise quiet MySQL instance on my desktop machine,
820           archiving to disk and to another table.  Disabling transactions
821           with a value of zero, which turns on autocommit, dropped
822           performance to 38 seconds per thousand rows.
823
824           If you are not archiving from or to a transactional storage engine,
825           you may want to disable transactions so mk-archiver doesn't try to
826           commit.
827
828       --user
829           short form: -u; type: string
830
831           User for login if not current user.
832
833       --version
834           Show version and exit.
835
836       --where
837           type: string
838
839           WHERE clause to limit which rows to archive (required).
840
841           Specifies a WHERE clause to limit which rows are archived.  Do not
842           include the word WHERE.  You may need to quote the argument to
843           prevent your shell from interpreting it.  For example:
844
845              --where 'ts < current_date - interval 90 day'
846
847           For safety, "--where" is required.  If you do not require a WHERE
848           clause, use "--where" 1=1.
849
850       --why-quit
851           Print reason for exiting unless rows exhausted.
852
853           Causes mk-archiver to print a message if it exits for any reason
854           other than running out of rows to archive.  This can be useful if
855           you have a cron job with "--run-time" specified, for example, and
856           you want to be sure mk-archiver is finishing before running out of
857           time.
858
859           If "--statistics" is given, the behavior is changed slightly.  It
860           will print the reason for exiting even when it's just because there
861           are no more rows.
862
863           This output prints even if "--quiet" is given.  That's so you can
864           put "mk-archiver" in a "cron" job and get an email if there's an
865           abnormal exit.
866

DSN OPTIONS

868       These DSN options are used to create a DSN.  Each option is given like
869       "option=value".  The options are case-sensitive, so P and p are not the
870       same option.  There cannot be whitespace before or after the "=" and if
871       the value contains whitespace it must be quoted.  DSN options are
872       comma-separated.  See the maatkit manpage for full details.
873
874       •   a
875
876           copy: no
877
878           Database to USE when executing queries.
879
880       •   A
881
882           dsn: charset; copy: yes
883
884           Default character set.
885
886       •   b
887
888           copy: no
889
890           If true, disable binlog with SQL_LOG_BIN.
891
892       •   D
893
894           dsn: database; copy: yes
895
896           Database that contains the table.
897
898       •   F
899
900           dsn: mysql_read_default_file; copy: yes
901
902           Only read default options from the given file
903
904       •   h
905
906           dsn: host; copy: yes
907
908           Connect to host.
909
910       •   i
911
912           copy: yes
913
914           Index to use.
915
916       •   m
917
918           copy: no
919
920           Plugin module name.
921
922       •   p
923
924           dsn: password; copy: yes
925
926           Password to use when connecting.
927
928       •   P
929
930           dsn: port; copy: yes
931
932           Port number to use for connection.
933
934       •   S
935
936           dsn: mysql_socket; copy: yes
937
938           Socket file to use for connection.
939
940       •   t
941
942           copy: yes
943
944           Table to archive from/to.
945
946       •   u
947
948           dsn: user; copy: yes
949
950           User for login if not current user.
951

EXTENDING

953       mk-archiver is extensible by plugging in external Perl modules to
954       handle some logic and/or actions.  You can specify a module for both
955       the "--source" and the "--dest", with the 'm' part of the
956       specification.  For example:
957
958          --source D=test,t=test1,m=My::Module1 --dest m=My::Module2,t=test2
959
960       This will cause mk-archiver to load the My::Module1 and My::Module2
961       packages, create instances of them, and then make calls to them during
962       the archiving process.
963
964       You can also specify a plugin with "--plugin".
965
966       The module must provide this interface:
967
968       new(dbh => $dbh, db => $db_name, tbl => $tbl_name)
969           The plugin's constructor is passed a reference to the database
970           handle, the database name, and table name.  The plugin is created
971           just after mk-archiver opens the connection, and before it examines
972           the table given in the arguments.  This gives the plugin a chance
973           to create and populate temporary tables, or do other setup work.
974
975       before_begin(cols => \@cols, allcols => \@allcols)
976           This method is called just before mk-archiver begins iterating
977           through rows and archiving them, but after it does all other setup
978           work (examining table structures, designing SQL queries, and so
979           on).  This is the only time mk-archiver tells the plugin column
980           names for the rows it will pass the plugin while archiving.
981
982           The "cols" argument is the column names the user requested to be
983           archived, either by default or by the "--columns" option.  The
984           "allcols" argument is the list of column names for every row mk-
985           archiver will fetch from the source table.  It may fetch more
986           columns than the user requested, because it needs some columns for
987           its own use.  When subsequent plugin functions receive a row, it is
988           the full row containing all the extra columns, if any, added to the
989           end.
990
991       is_archivable(row => \@row)
992           This method is called for each row to determine whether it is
993           archivable.  This applies only to "--source".  The argument is the
994           row itself, as an arrayref.  If the method returns true, the row
995           will be archived; otherwise it will be skipped.
996
997           Skipping a row adds complications for non-unique indexes.  Normally
998           mk-archiver uses a WHERE clause designed to target the last
999           processed row as the place to start the scan for the next SELECT
1000           statement.  If you have skipped the row by returning false from
1001           is_archivable(), mk-archiver could get into an infinite loop
1002           because the row still exists.  Therefore, when you specify a plugin
1003           for the "--source" argument, mk-archiver will change its WHERE
1004           clause slightly.  Instead of starting at "greater than or equal to"
1005           the last processed row, it will start "strictly greater than."
1006           This will work fine on unique indexes such as primary keys, but it
1007           may skip rows (leave holes) on non-unique indexes or when ascending
1008           only the first column of an index.
1009
1010           "mk-archiver" will change the clause in the same way if you specify
1011           "--no-delete", because again an infinite loop is possible.
1012
1013           If you specify the "--bulk-delete" option and return false from
1014           this method, "mk-archiver" may not do what you want.  The row won't
1015           be archived, but it will be deleted, since bulk deletes operate on
1016           ranges of rows and don't know which rows the plugin selected to
1017           keep.
1018
1019           If you specify the "--bulk-insert" option, this method's return
1020           value will influence whether the row is written to the temporary
1021           file for the bulk insert, so bulk inserts will work as expected.
1022           However, bulk inserts require bulk deletes.
1023
1024       before_delete(row => \@row)
1025           This method is called for each row just before it is deleted.  This
1026           applies only to "--source".  This is a good place for you to handle
1027           dependencies, such as deleting things that are foreign-keyed to the
1028           row you are about to delete.  You could also use this to
1029           recursively archive all dependent tables.
1030
1031           This plugin method is called even if "--no-delete" is given, but
1032           not if "--bulk-delete" is given.
1033
1034       before_bulk_delete(first_row => \@row, last_row => \@row)
1035           This method is called just before a bulk delete is executed.  It is
1036           similar to the "before_delete" method, except its arguments are the
1037           first and last row of the range to be deleted.  It is called even
1038           if "--no-delete" is given.
1039
1040       before_insert(row => \@row)
1041           This method is called for each row just before it is inserted.
1042           This applies only to "--dest".  You could use this to insert the
1043           row into multiple tables, perhaps with an ON DUPLICATE KEY UPDATE
1044           clause to build summary tables in a data warehouse.
1045
1046           This method is not called if "--bulk-insert" is given.
1047
1048       before_bulk_insert(first_row => \@row, last_row => \@row, filename =>
1049       bulk_insert_filename)
1050           This method is called just before a bulk insert is executed.  It is
1051           similar to the "before_insert" method, except its arguments are the
1052           first and last row of the range to be deleted.
1053
1054       custom_sth(row => \@row, sql => $sql)
1055           This method is called just before inserting the row, but after
1056           "before_insert()".  It allows the plugin to specify different
1057           "INSERT" statement if desired.  The return value (if any) should be
1058           a DBI statement handle.  The "sql" parameter is the SQL text used
1059           to prepare the default "INSERT" statement.  This method is not
1060           called if you specify "--bulk-insert".
1061
1062           If no value is returned, the default "INSERT" statement handle is
1063           used.
1064
1065           This method applies only to the plugin specified for "--dest", so
1066           if your plugin isn't doing what you expect, check that you've
1067           specified it for the destination and not the source.
1068
1069       custom_sth_bulk(first_row => \@row, last_row => \@row, sql => $sql,
1070       filename => $bulk_insert_filename)
1071           If you've specified "--bulk-insert", this method is called just
1072           before the bulk insert, but after "before_bulk_insert()", and the
1073           arguments are different.
1074
1075           This method's return value etc is similar to the "custom_sth()"
1076           method.
1077
1078       after_finish()
1079           This method is called after mk-archiver exits the archiving loop,
1080           commits all database handles, closes "--file", and prints the final
1081           statistics, but before mk-archiver runs ANALYZE or OPTIMIZE (see
1082           "--analyze" and "--optimize").
1083
1084       If you specify a plugin for both "--source" and "--dest", mk-archiver
1085       constructs, calls before_begin(), and calls after_finish() on the two
1086       plugins in the order "--source", "--dest".
1087
1088       mk-archiver assumes it controls transactions, and that the plugin will
1089       NOT commit or roll back the database handle.  The database handle
1090       passed to the plugin's constructor is the same handle mk-archiver uses
1091       itself.  Remember that "--source" and "--dest" are separate handles.
1092
1093       A sample module might look like this:
1094
1095          package My::Module;
1096
1097          sub new {
1098             my ( $class, %args ) = @_;
1099             return bless(\%args, $class);
1100          }
1101
1102          sub before_begin {
1103             my ( $self, %args ) = @_;
1104             # Save column names for later
1105             $self->{cols} = $args{cols};
1106          }
1107
1108          sub is_archivable {
1109             my ( $self, %args ) = @_;
1110             # Do some advanced logic with $args{row}
1111             return 1;
1112          }
1113
1114          sub before_delete {} # Take no action
1115          sub before_insert {} # Take no action
1116          sub custom_sth    {} # Take no action
1117          sub after_finish  {} # Take no action
1118
1119          1;
1120

DOWNLOADING

1122       You can download Maatkit from Google Code at
1123       <http://code.google.com/p/maatkit/>, or you can get any of the tools
1124       easily with a command like the following:
1125
1126          wget http://www.maatkit.org/get/toolname
1127          or
1128          wget http://www.maatkit.org/trunk/toolname
1129
1130       Where "toolname" can be replaced with the name (or fragment of a name)
1131       of any of the Maatkit tools.  Once downloaded, they're ready to run; no
1132       installation is needed.  The first URL gets the latest released version
1133       of the tool, and the second gets the latest trunk code from Subversion.
1134

ENVIRONMENT

1136       The environment variable "MKDEBUG" enables verbose debugging output in
1137       all of the Maatkit tools:
1138
1139          MKDEBUG=1 mk-....
1140

SYSTEM REQUIREMENTS

1142       You need Perl, DBI, DBD::mysql, and some core packages that ought to be
1143       installed in any reasonably new version of Perl.
1144

BUGS

1146       For a list of known bugs see <http://www.maatkit.org/bugs/mk-archiver>.
1147
1148       Please use Google Code Issues and Groups to report bugs or request
1149       support: <http://code.google.com/p/maatkit/>.  You can also join
1150       #maatkit on Freenode to discuss Maatkit.
1151
1152       Please include the complete command-line used to reproduce the problem
1153       you are seeing, the version of all MySQL servers involved, the complete
1154       output of the tool when run with "--version", and if possible,
1155       debugging output produced by running with the "MKDEBUG=1" environment
1156       variable.
1157

ACKNOWLEDGMENTS

1159       Thanks to the following people, and apologies to anyone I've omitted:
1160
1161       Andrew O'Brien,
1162

COPYRIGHT, LICENSE AND WARRANTY

1164       This program is copyright 2007-2011 Baron Schwartz.  Feedback and
1165       improvements are welcome.
1166
1167       THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
1168       WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
1169       MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1170
1171       This program is free software; you can redistribute it and/or modify it
1172       under the terms of the GNU General Public License as published by the
1173       Free Software Foundation, version 2; OR the Perl Artistic License.  On
1174       UNIX and similar systems, you can issue `man perlgpl' or `man
1175       perlartistic' to read these licenses.
1176
1177       You should have received a copy of the GNU General Public License along
1178       with this program; if not, write to the Free Software Foundation, Inc.,
1179       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.
1180

AUTHOR

1182       Baron Schwartz
1183

ABOUT MAATKIT

1185       This tool is part of Maatkit, a toolkit for power users of MySQL.
1186       Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
1187       primary code contributors.  Both are employed by Percona.  Financial
1188       support for Maatkit development is primarily provided by Percona and
1189       its clients.
1190

VERSION

1192       This manual page documents Ver 1.0.27 Distrib 7540 $Revision: 7531 $.
1193
1194
1195
1196perl v5.32.1                      2021-01-26                    MK-ARCHIVER(1)