1MK-ARCHIVER(1) User Contributed Perl Documentation MK-ARCHIVER(1)
2
3
4
6 mk-archiver - Archive rows from a MySQL table into another table or a
7 file.
8
10 Usage: mk-archiver [OPTION...] --source DSN --where WHERE
11
12 mk-archiver nibbles records from a MySQL table. The --source and
13 --dest arguments use DSN syntax; if COPY is yes, --dest defaults to the
14 key's value from --source.
15
16 Examples:
17
18 Archive all rows from oltp_server to olap_server and to a file:
19
20 mk-archiver --source h=oltp_server,D=test,t=tbl --dest h=olap_server \
21 --file '/var/log/archive/%Y-%m-%d-%D.%t' \
22 --where "1=1" --limit 1000 --commit-each
23
24 Purge (delete) orphan rows from child table:
25
26 mk-archiver --source h=host,D=db,t=child --purge \
27 --where 'NOT EXISTS(SELECT * FROM parent WHERE col=child.col)'
28
30 The following section is included to inform users about the potential
31 risks, whether known or unknown, of using this tool. The two main
32 categories of risks are those created by the nature of the tool (e.g.
33 read-only tools vs. read-write tools) and those created by bugs.
34
35 mk-achiver is a read-write tool. It deletes data from the source by
36 default, so you should test your archiving jobs with the "--dry-run"
37 option if you're not sure about them. It is designed to have as little
38 impact on production systems as possible, but tuning with "--limit",
39 "--txn-size" and similar options might be a good idea too.
40
41 If you write or use "--plugin" modules, you should ensure they are good
42 quality and well-tested.
43
44 At the time of this release there is an unverified bug with
45 "--bulk-insert" that may cause data loss.
46
47 The authoritative source for updated information is always the online
48 issue tracking system. Issues that affect this tool will be marked as
49 such. You can see a list of such issues at the following URL:
50 <http://www.maatkit.org/bugs/mk-archiver>.
51
52 See also "BUGS" for more information on filing bugs and getting help.
53
55 mk-archiver is the tool I use to archive tables as described in
56 <http://tinyurl.com/mysql-archiving>. The goal is a low-impact,
57 forward-only job to nibble old data out of the table without impacting
58 OLTP queries much. You can insert the data into another table, which
59 need not be on the same server. You can also write it to a file in a
60 format suitable for LOAD DATA INFILE. Or you can do neither, in which
61 case it's just an incremental DELETE.
62
63 mk-archiver is extensible via a plugin mechanism. You can inject your
64 own code to add advanced archiving logic that could be useful for
65 archiving dependent data, applying complex business rules, or building
66 a data warehouse during the archiving process.
67
68 You need to choose values carefully for some options. The most
69 important are "--limit", "--retries", and "--txn-size".
70
71 The strategy is to find the first row(s), then scan some index forward-
72 only to find more rows efficiently. Each subsequent query should not
73 scan the entire table; it should seek into the index, then scan until
74 it finds more archivable rows. Specifying the index with the 'i' part
75 of the "--source" argument can be crucial for this; use "--dry-run" to
76 examine the generated queries and be sure to EXPLAIN them to see if
77 they are efficient (most of the time you probably want to scan the
78 PRIMARY key, which is the default). Even better, profile mk-archiver
79 with mk-query-profiler and make sure it is not scanning the whole table
80 every query.
81
82 You can disable the seek-then-scan optimizations partially or wholly
83 with "--no-ascend" and "--ascend-first". Sometimes this may be more
84 efficient for multi-column keys. Be aware that mk-archiver is built to
85 start at the beginning of the index it chooses and scan it forward-
86 only. This might result in long table scans if you're trying to nibble
87 from the end of the table by an index other than the one it prefers.
88 See "--source" and read the documentation on the "i" part if this
89 applies to you.
90
92 If you specify "--progress", the output is a header row, plus status
93 output at intervals. Each row in the status output lists the current
94 date and time, how many seconds mk-archiver has been running, and how
95 many rows it has archived.
96
97 If you specify "--statistics", "mk-archiver" outputs timing and other
98 information to help you identify which part of your archiving process
99 takes the most time.
100
102 mk-archiver tries to catch signals and exit gracefully; for example, if
103 you send it SIGTERM (Ctrl-C on UNIX-ish systems), it will catch the
104 signal, print a message about the signal, and exit fairly normally. It
105 will not execute "--analyze" or "--optimize", because these may take a
106 long time to finish. It will run all other code normally, including
107 calling after_finish() on any plugins (see "EXTENDING").
108
109 In other words, a signal, if caught, will break out of the main
110 archiving loop and skip optimize/analyze.
111
113 Specify at least one of "--dest", "--file", or "--purge".
114
115 "--ignore" and "--replace" are mutually exclusive.
116
117 "--txn-size" and "--commit-each" are mutually exclusive.
118
119 "--low-priority-insert" and "--delayed-insert" are mutually exclusive.
120
121 "--share-lock" and "--for-update" are mutually exclusive.
122
123 "--analyze" and "--optimize" are mutually exclusive.
124
125 "--no-ascend" and "--no-delete" are mutually exclusive.
126
127 DSN values in "--dest" default to values from "--source" if COPY is
128 yes.
129
130 --analyze
131 type: string
132
133 Run ANALYZE TABLE afterwards on "--source" and/or "--dest".
134
135 Runs ANALYZE TABLE after finishing. The argument is an arbitrary
136 string. If it contains the letter 's', the source will be
137 analyzed. If it contains 'd', the destination will be analyzed.
138 You can specify either or both. For example, the following will
139 analyze both:
140
141 --analyze=ds
142
143 See <http://dev.mysql.com/doc/en/analyze-table.html> for details on
144 ANALYZE TABLE.
145
146 --ascend-first
147 Ascend only first column of index.
148
149 If you do want to use the ascending index optimization (see
150 "--no-ascend"), but do not want to incur the overhead of ascending
151 a large multi-column index, you can use this option to tell mk-
152 archiver to ascend only the leftmost column of the index. This can
153 provide a significant performance boost over not ascending the
154 index at all, while avoiding the cost of ascending the whole index.
155
156 See "EXTENDING" for a discussion of how this interacts with
157 plugins.
158
159 --ask-pass
160 Prompt for a password when connecting to MySQL.
161
162 --buffer
163 Buffer output to "--file" and flush at commit.
164
165 Disables autoflushing to "--file" and flushes "--file" to disk only
166 when a transaction commits. This typically means the file is
167 block-flushed by the operating system, so there may be some
168 implicit flushes to disk between commits as well. The default is
169 to flush "--file" to disk after every row.
170
171 The danger is that a crash might cause lost data.
172
173 The performance increase I have seen from using "--buffer" is
174 around 5 to 15 percent. Your mileage may vary.
175
176 --bulk-delete
177 Delete each chunk with a single statement (implies
178 "--commit-each").
179
180 Delete each chunk of rows in bulk with a single "DELETE" statement.
181 The statement deletes every row between the first and last row of
182 the chunk, inclusive. It implies "--commit-each", since it would
183 be a bad idea to "INSERT" rows one at a time and commit them before
184 the bulk "DELETE".
185
186 The normal method is to delete every row by its primary key. Bulk
187 deletes might be a lot faster. They also might not be faster if
188 you have a complex "WHERE" clause.
189
190 This option completely defers all "DELETE" processing until the
191 chunk of rows is finished. If you have a plugin on the source, its
192 "before_delete" method will not be called. Instead, its
193 "before_bulk_delete" method is called later.
194
195 WARNING: if you have a plugin on the source that sometimes doesn't
196 return true from "is_archivable()", you should use this option only
197 if you understand what it does. If the plugin instructs
198 "mk-archiver" not to archive a row, it will still be deleted by the
199 bulk delete!
200
201 --[no]bulk-delete-limit
202 default: yes
203
204 Add "--limit" to "--bulk-delete" statement.
205
206 This is an advanced option and you should not disable it unless you
207 know what you are doing and why! By default, "--bulk-delete"
208 appends a "--limit" clause to the bulk delete SQL statement. In
209 certain cases, this clause can be omitted by specifying
210 "--no-bulk-delete-limit". "--limit" must still be specified.
211
212 --bulk-insert
213 Insert each chunk with LOAD DATA INFILE (implies "--bulk-delete"
214 "--commit-each").
215
216 Insert each chunk of rows with "LOAD DATA LOCAL INFILE". This may
217 be much faster than inserting a row at a time with "INSERT"
218 statements. It is implemented by creating a temporary file for
219 each chunk of rows, and writing the rows to this file instead of
220 inserting them. When the chunk is finished, it uploads the rows.
221
222 To protect the safety of your data, this option forces bulk deletes
223 to be used. It would be unsafe to delete each row as it is found,
224 before inserting the rows into the destination first. Forcing bulk
225 deletes guarantees that the deletion waits until the insertion is
226 successful.
227
228 The "--low-priority-insert", "--replace", and "--ignore" options
229 work with this option, but "--delayed-insert" does not.
230
231 --charset
232 short form: -A; type: string
233
234 Default character set. If the value is utf8, sets Perl's binmode
235 on STDOUT to utf8, passes the mysql_enable_utf8 option to
236 DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any
237 other value sets binmode on STDOUT without the utf8 layer, and runs
238 SET NAMES after connecting to MySQL.
239
240 See also "--[no]check-charset".
241
242 --[no]check-charset
243 default: yes
244
245 Ensure connection and table character sets are the same. Disabling
246 this check may cause text to be erroneously converted from one
247 character set to another (usually from utf8 to latin1) which may
248 cause data loss or mojibake. Disabling this check may be useful or
249 necessary when character set conversions are intended.
250
251 --[no]check-columns
252 default: yes
253
254 Ensure "--source" and "--dest" have same columns.
255
256 Enabled by default; causes mk-archiver to check that the source and
257 destination tables have the same columns. It does not check column
258 order, data type, etc. It just checks that all columns in the
259 source exist in the destination and vice versa. If there are any
260 differences, mk-archiver will exit with an error.
261
262 To disable this check, specify --no-check-columns.
263
264 --check-interval
265 type: time; default: 1s
266
267 How often to check for slave lag if "--check-slave-lag" is given.
268
269 --check-slave-lag
270 type: string
271
272 Pause archiving until the specified DSN's slave lag is less than
273 "--max-lag".
274
275 --columns
276 short form: -c; type: array
277
278 Comma-separated list of columns to archive.
279
280 Specify a comma-separated list of columns to fetch, write to the
281 file, and insert into the destination table. If specified, mk-
282 archiver ignores other columns unless it needs to add them to the
283 "SELECT" statement for ascending an index or deleting rows. It
284 fetches and uses these extra columns internally, but does not write
285 them to the file or to the destination table. It does pass them to
286 plugins.
287
288 See also "--primary-key-only".
289
290 --commit-each
291 Commit each set of fetched and archived rows (disables
292 "--txn-size").
293
294 Commits transactions and flushes "--file" after each set of rows
295 has been archived, before fetching the next set of rows, and before
296 sleeping if "--sleep" is specified. Disables "--txn-size"; use
297 "--limit" to control the transaction size with "--commit-each".
298
299 This option is useful as a shortcut to make "--limit" and
300 "--txn-size" the same value, but more importantly it avoids
301 transactions being held open while searching for more rows. For
302 example, imagine you are archiving old rows from the beginning of a
303 very large table, with "--limit" 1000 and "--txn-size" 1000. After
304 some period of finding and archiving 1000 rows at a time, mk-
305 archiver finds the last 999 rows and archives them, then executes
306 the next SELECT to find more rows. This scans the rest of the
307 table, but never finds any more rows. It has held open a
308 transaction for a very long time, only to determine it is finished
309 anyway. You can use "--commit-each" to avoid this.
310
311 --config
312 type: Array
313
314 Read this comma-separated list of config files; if specified, this
315 must be the first option on the command line.
316
317 --delayed-insert
318 Add the DELAYED modifier to INSERT statements.
319
320 Adds the DELAYED modifier to INSERT or REPLACE statements. See
321 <http://dev.mysql.com/doc/en/insert.html> for details.
322
323 --dest
324 type: DSN
325
326 DSN specifying the table to archive to.
327
328 This item specifies a table into which mk-archiver will insert rows
329 archived from "--source". It uses the same key=val argument format
330 as "--source". Most missing values default to the same values as
331 "--source", so you don't have to repeat options that are the same
332 in "--source" and "--dest". Use the "--help" option to see which
333 values are copied from "--source".
334
335 WARNING: Using a default options file (F) DSN option that defines a
336 socket for "--source" causes mk-archiver to connect to "--dest"
337 using that socket unless another socket for "--dest" is specified.
338 This means that mk-archiver may incorrectly connect to "--source"
339 when it connects to "--dest". For example:
340
341 --source F=host1.cnf,D=db,t=tbl --dest h=host2
342
343 When mk-archiver connects to "--dest", host2, it will connect via
344 the "--source", host1, socket defined in host1.cnf.
345
346 --dry-run
347 Print queries and exit without doing anything.
348
349 Causes mk-archiver to exit after printing the filename and SQL
350 statements it will use.
351
352 --file
353 type: string
354
355 File to archive to, with DATE_FORMAT()-like formatting.
356
357 Filename to write archived rows to. A subset of MySQL's
358 DATE_FORMAT() formatting codes are allowed in the filename, as
359 follows:
360
361 %d Day of the month, numeric (01..31)
362 %H Hour (00..23)
363 %i Minutes, numeric (00..59)
364 %m Month, numeric (01..12)
365 %s Seconds (00..59)
366 %Y Year, numeric, four digits
367
368 You can use the following extra format codes too:
369
370 %D Database name
371 %t Table name
372
373 Example:
374
375 --file '/var/log/archive/%Y-%m-%d-%D.%t'
376
377 The file's contents are in the same format used by SELECT INTO
378 OUTFILE, as documented in the MySQL manual: rows terminated by
379 newlines, columns terminated by tabs, NULL characters are
380 represented by \N, and special characters are escaped by \. This
381 lets you reload a file with LOAD DATA INFILE's default settings.
382
383 If you want a column header at the top of the file, see "--header".
384 The file is auto-flushed by default; see "--buffer".
385
386 --for-update
387 Adds the FOR UPDATE modifier to SELECT statements.
388
389 For details, see
390 <http://dev.mysql.com/doc/en/innodb-locking-reads.html>.
391
392 --header
393 Print column header at top of "--file".
394
395 Writes column names as the first line in the file given by
396 "--file". If the file exists, does not write headers; this keeps
397 the file loadable with LOAD DATA INFILE in case you append more
398 output to it.
399
400 --help
401 Show help and exit.
402
403 --high-priority-select
404 Adds the HIGH_PRIORITY modifier to SELECT statements.
405
406 See <http://dev.mysql.com/doc/en/select.html> for details.
407
408 --host
409 short form: -h; type: string
410
411 Connect to host.
412
413 --ignore
414 Use IGNORE for INSERT statements.
415
416 Causes INSERTs into "--dest" to be INSERT IGNORE.
417
418 --limit
419 type: int; default: 1
420
421 Number of rows to fetch and archive per statement.
422
423 Limits the number of rows returned by the SELECT statements that
424 retrieve rows to archive. Default is one row. It may be more
425 efficient to increase the limit, but be careful if you are
426 archiving sparsely, skipping over many rows; this can potentially
427 cause more contention with other queries, depending on the storage
428 engine, transaction isolation level, and options such as
429 "--for-update".
430
431 --local
432 Do not write OPTIMIZE or ANALYZE queries to binlog.
433
434 Adds the NO_WRITE_TO_BINLOG modifier to ANALYZE and OPTIMIZE
435 queries. See "--analyze" for details.
436
437 --low-priority-delete
438 Adds the LOW_PRIORITY modifier to DELETE statements.
439
440 See <http://dev.mysql.com/doc/en/delete.html> for details.
441
442 --low-priority-insert
443 Adds the LOW_PRIORITY modifier to INSERT or REPLACE statements.
444
445 See <http://dev.mysql.com/doc/en/insert.html> for details.
446
447 --max-lag
448 type: time; default: 1s
449
450 Pause archiving if the slave given by "--check-slave-lag" lags.
451
452 This option causes mk-archiver to look at the slave every time it's
453 about to fetch another row. If the slave's lag is greater than the
454 option's value, or if the slave isn't running (so its lag is NULL),
455 mk-table-checksum sleeps for "--check-interval" seconds and then
456 looks at the lag again. It repeats until the slave is caught up,
457 then proceeds to fetch and archive the row.
458
459 This option may eliminate the need for "--sleep" or "--sleep-coef".
460
461 --no-ascend
462 Do not use ascending index optimization.
463
464 The default ascending-index optimization causes "mk-archiver" to
465 optimize repeated "SELECT" queries so they seek into the index
466 where the previous query ended, then scan along it, rather than
467 scanning from the beginning of the table every time. This is
468 enabled by default because it is generally a good strategy for
469 repeated accesses.
470
471 Large, multiple-column indexes may cause the WHERE clause to be
472 complex enough that this could actually be less efficient.
473 Consider for example a four-column PRIMARY KEY on (a, b, c, d).
474 The WHERE clause to start where the last query ended is as follows:
475
476 WHERE (a > ?)
477 OR (a = ? AND b > ?)
478 OR (a = ? AND b = ? AND c > ?)
479 OR (a = ? AND b = ? AND c = ? AND d >= ?)
480
481 Populating the placeholders with values uses memory and CPU, adds
482 network traffic and parsing overhead, and may make the query harder
483 for MySQL to optimize. A four-column key isn't a big deal, but a
484 ten-column key in which every column allows "NULL" might be.
485
486 Ascending the index might not be necessary if you know you are
487 simply removing rows from the beginning of the table in chunks, but
488 not leaving any holes, so starting at the beginning of the table is
489 actually the most efficient thing to do.
490
491 See also "--ascend-first". See "EXTENDING" for a discussion of how
492 this interacts with plugins.
493
494 --no-delete
495 Do not delete archived rows.
496
497 Causes "mk-archiver" not to delete rows after processing them.
498 This disallows "--no-ascend", because enabling them both would
499 cause an infinite loop.
500
501 If there is a plugin on the source DSN, its "before_delete" method
502 is called anyway, even though "mk-archiver" will not execute the
503 delete. See "EXTENDING" for more on plugins.
504
505 --optimize
506 type: string
507
508 Run OPTIMIZE TABLE afterwards on "--source" and/or "--dest".
509
510 Runs OPTIMIZE TABLE after finishing. See "--analyze" for the
511 option syntax and <http://dev.mysql.com/doc/en/optimize-table.html>
512 for details on OPTIMIZE TABLE.
513
514 --password
515 short form: -p; type: string
516
517 Password to use when connecting.
518
519 --pid
520 type: string
521
522 Create the given PID file when daemonized. The file contains the
523 process ID of the daemonized instance. The PID file is removed
524 when the daemonized instance exits. The program checks for the
525 existence of the PID file when starting; if it exists and the
526 process with the matching PID exists, the program exits.
527
528 --plugin
529 type: string
530
531 Perl module name to use as a generic plugin.
532
533 Specify the Perl module name of a general-purpose plugin. It is
534 currently used only for statistics (see "--statistics") and must
535 have "new()" and a "statistics()" method.
536
537 The "new( src =" $src, dst => $dst, opts => $o )> method gets the
538 source and destination DSNs, and their database connections, just
539 like the connection-specific plugins do. It also gets an
540 OptionParser object ($o) for accessing command-line options
541 (example: "$o-"get('purge');>).
542
543 The "statistics(\%stats, $time)" method gets a hashref of the
544 statistics collected by the archiving job, and the time the whole
545 job started.
546
547 --port
548 short form: -P; type: int
549
550 Port number to use for connection.
551
552 --primary-key-only
553 Primary key columns only.
554
555 A shortcut for specifying "--columns" with the primary key columns.
556 This is an efficiency if you just want to purge rows; it avoids
557 fetching the entire row, when only the primary key columns are
558 needed for "DELETE" statements. See also "--purge".
559
560 --progress
561 type: int
562
563 Print progress information every X rows.
564
565 Prints current time, elapsed time, and rows archived every X rows.
566
567 --purge
568 Purge instead of archiving; allows omitting "--file" and "--dest".
569
570 Allows archiving without a "--file" or "--dest" argument, which is
571 effectively a purge since the rows are just deleted.
572
573 If you just want to purge rows, consider specifying the table's
574 primary key columns with "--primary-key-only". This will prevent
575 fetching all columns from the server for no reason.
576
577 --quick-delete
578 Adds the QUICK modifier to DELETE statements.
579
580 See <http://dev.mysql.com/doc/en/delete.html> for details. As
581 stated in the documentation, in some cases it may be faster to use
582 DELETE QUICK followed by OPTIMIZE TABLE. You can use "--optimize"
583 for this.
584
585 --quiet
586 short form: -q
587
588 Do not print any output, such as for "--statistics".
589
590 Suppresses normal output, including the output of "--statistics",
591 but doesn't suppress the output from "--why-quit".
592
593 --replace
594 Causes INSERTs into "--dest" to be written as REPLACE.
595
596 --retries
597 type: int; default: 1
598
599 Number of retries per timeout or deadlock.
600
601 Specifies the number of times mk-archiver should retry when there
602 is an InnoDB lock wait timeout or deadlock. When retries are
603 exhausted, mk-archiver will exit with an error.
604
605 Consider carefully what you want to happen when you are archiving
606 between a mixture of transactional and non-transactional storage
607 engines. The INSERT to "--dest" and DELETE from "--source" are on
608 separate connections, so they do not actually participate in the
609 same transaction even if they're on the same server. However, mk-
610 archiver implements simple distributed transactions in code, so
611 commits and rollbacks should happen as desired across the two
612 connections.
613
614 At this time I have not written any code to handle errors with
615 transactional storage engines other than InnoDB. Request that
616 feature if you need it.
617
618 --run-time
619 type: time
620
621 Time to run before exiting.
622
623 Optional suffix s=seconds, m=minutes, h=hours, d=days; if no
624 suffix, s is used.
625
626 --[no]safe-auto-increment
627 default: yes
628
629 Do not archive row with max AUTO_INCREMENT.
630
631 Adds an extra WHERE clause to prevent mk-archiver from removing the
632 newest row when ascending a single-column AUTO_INCREMENT key. This
633 guards against re-using AUTO_INCREMENT values if the server
634 restarts, and is enabled by default.
635
636 The extra WHERE clause contains the maximum value of the auto-
637 increment column as of the beginning of the archive or purge job.
638 If new rows are inserted while mk-archiver is running, it will not
639 see them.
640
641 --sentinel
642 type: string; default: /tmp/mk-archiver-sentinel
643
644 Exit if this file exists.
645
646 The presence of the file specified by "--sentinel" will cause mk-
647 archiver to stop archiving and exit. The default is
648 /tmp/mk-archiver-sentinel. You might find this handy to stop cron
649 jobs gracefully if necessary. See also "--stop".
650
651 --set-vars
652 type: string; default: wait_timeout=10000
653
654 Set these MySQL variables.
655
656 Specify any variables you want to be set immediately after
657 connecting to MySQL. These will be included in a "SET" command.
658
659 --share-lock
660 Adds the LOCK IN SHARE MODE modifier to SELECT statements.
661
662 See <http://dev.mysql.com/doc/en/innodb-locking-reads.html>.
663
664 --skip-foreign-key-checks
665 Disables foreign key checks with SET FOREIGN_KEY_CHECKS=0.
666
667 --sleep
668 type: int
669
670 Sleep time between fetches.
671
672 Specifies how long to sleep between SELECT statements. Default is
673 not to sleep at all. Transactions are NOT committed, and the
674 "--file" file is NOT flushed, before sleeping. See "--txn-size" to
675 control that.
676
677 If "--commit-each" is specified, committing and flushing happens
678 before sleeping.
679
680 --sleep-coef
681 type: float
682
683 Calculate "--sleep" as a multiple of the last SELECT time.
684
685 If this option is specified, mk-archiver will sleep for the query
686 time of the last SELECT multiplied by the specified coefficient.
687
688 This is a slightly more sophisticated way to throttle the SELECTs:
689 sleep a varying amount of time between each SELECT, depending on
690 how long the SELECTs are taking.
691
692 --socket
693 short form: -S; type: string
694
695 Socket file to use for connection.
696
697 --source
698 type: DSN
699
700 DSN specifying the table to archive from (required). This argument
701 is a DSN. See "DSN OPTIONS" for the syntax. Most options control
702 how mk-archiver connects to MySQL, but there are some extended DSN
703 options in this tool's syntax. The D, t, and i options select a
704 table to archive:
705
706 --source h=my_server,D=my_database,t=my_tbl
707
708 The a option specifies the database to set as the connection's
709 default with USE. If the b option is true, it disables binary
710 logging with SQL_LOG_BIN. The m option specifies pluggable
711 actions, which an external Perl module can provide. The only
712 required part is the table; other parts may be read from various
713 places in the environment (such as options files).
714
715 The 'i' part deserves special mention. This tells mk-archiver
716 which index it should scan to archive. This appears in a FORCE
717 INDEX or USE INDEX hint in the SELECT statements used to fetch
718 archivable rows. If you don't specify anything, mk-archiver will
719 auto-discover a good index, preferring a "PRIMARY KEY" if one
720 exists. In my experience this usually works well, so most of the
721 time you can probably just omit the 'i' part.
722
723 The index is used to optimize repeated accesses to the table; mk-
724 archiver remembers the last row it retrieves from each SELECT
725 statement, and uses it to construct a WHERE clause, using the
726 columns in the specified index, that should allow MySQL to start
727 the next SELECT where the last one ended, rather than potentially
728 scanning from the beginning of the table with each successive
729 SELECT. If you are using external plugins, please see "EXTENDING"
730 for a discussion of how they interact with ascending indexes.
731
732 The 'a' and 'b' options allow you to control how statements flow
733 through the binary log. If you specify the 'b' option, binary
734 logging will be disabled on the specified connection. If you
735 specify the 'a' option, the connection will "USE" the specified
736 database, which you can use to prevent slaves from executing the
737 binary log events with "--replicate-ignore-db" options. These two
738 options can be used as different methods to achieve the same goal:
739 archive data off the master, but leave it on the slave. For
740 example, you can run a purge job on the master and prevent it from
741 happening on the slave using your method of choice.
742
743 WARNING: Using a default options file (F) DSN option that defines a
744 socket for "--source" causes mk-archiver to connect to "--dest"
745 using that socket unless another socket for "--dest" is specified.
746 This means that mk-archiver may incorrectly connect to "--source"
747 when it is meant to connect to "--dest". For example:
748
749 --source F=host1.cnf,D=db,t=tbl --dest h=host2
750
751 When mk-archiver connects to "--dest", host2, it will connect via
752 the "--source", host1, socket defined in host1.cnf.
753
754 --statistics
755 Collect and print timing statistics.
756
757 Causes mk-archiver to collect timing statistics about what it does.
758 These statistics are available to the plugin specified by
759 "--plugin"
760
761 Unless you specify "--quiet", "mk-archiver" prints the statistics
762 when it exits. The statistics look like this:
763
764 Started at 2008-07-18T07:18:53, ended at 2008-07-18T07:18:53
765 Source: D=db,t=table
766 SELECT 4
767 INSERT 4
768 DELETE 4
769 Action Count Time Pct
770 commit 10 0.1079 88.27
771 select 5 0.0047 3.87
772 deleting 4 0.0028 2.29
773 inserting 4 0.0028 2.28
774 other 0 0.0040 3.29
775
776 The first two (or three) lines show times and the source and
777 destination tables. The next three lines show how many rows were
778 fetched, inserted, and deleted.
779
780 The remaining lines show counts and timing. The columns are the
781 action, the total number of times that action was timed, the total
782 time it took, and the percent of the program's total runtime. The
783 rows are sorted in order of descending total time. The last row is
784 the rest of the time not explicitly attributed to anything.
785 Actions will vary depending on command-line options.
786
787 If "--why-quit" is given, its behavior is changed slightly. This
788 option causes it to print the reason for exiting even when it's
789 just because there are no more rows.
790
791 This option requires the standard Time::HiRes module, which is part
792 of core Perl on reasonably new Perl releases.
793
794 --stop
795 Stop running instances by creating the sentinel file.
796
797 Causes mk-archiver to create the sentinel file specified by
798 "--sentinel" and exit. This should have the effect of stopping all
799 running instances which are watching the same sentinel file.
800
801 --txn-size
802 type: int; default: 1
803
804 Number of rows per transaction.
805
806 Specifies the size, in number of rows, of each transaction. Zero
807 disables transactions altogether. After mk-archiver processes this
808 many rows, it commits both the "--source" and the "--dest" if
809 given, and flushes the file given by "--file".
810
811 This parameter is critical to performance. If you are archiving
812 from a live server, which for example is doing heavy OLTP work, you
813 need to choose a good balance between transaction size and commit
814 overhead. Larger transactions create the possibility of more lock
815 contention and deadlocks, but smaller transactions cause more
816 frequent commit overhead, which can be significant. To give an
817 idea, on a small test set I worked with while writing mk-archiver,
818 a value of 500 caused archiving to take about 2 seconds per 1000
819 rows on an otherwise quiet MySQL instance on my desktop machine,
820 archiving to disk and to another table. Disabling transactions
821 with a value of zero, which turns on autocommit, dropped
822 performance to 38 seconds per thousand rows.
823
824 If you are not archiving from or to a transactional storage engine,
825 you may want to disable transactions so mk-archiver doesn't try to
826 commit.
827
828 --user
829 short form: -u; type: string
830
831 User for login if not current user.
832
833 --version
834 Show version and exit.
835
836 --where
837 type: string
838
839 WHERE clause to limit which rows to archive (required).
840
841 Specifies a WHERE clause to limit which rows are archived. Do not
842 include the word WHERE. You may need to quote the argument to
843 prevent your shell from interpreting it. For example:
844
845 --where 'ts < current_date - interval 90 day'
846
847 For safety, "--where" is required. If you do not require a WHERE
848 clause, use "--where" 1=1.
849
850 --why-quit
851 Print reason for exiting unless rows exhausted.
852
853 Causes mk-archiver to print a message if it exits for any reason
854 other than running out of rows to archive. This can be useful if
855 you have a cron job with "--run-time" specified, for example, and
856 you want to be sure mk-archiver is finishing before running out of
857 time.
858
859 If "--statistics" is given, the behavior is changed slightly. It
860 will print the reason for exiting even when it's just because there
861 are no more rows.
862
863 This output prints even if "--quiet" is given. That's so you can
864 put "mk-archiver" in a "cron" job and get an email if there's an
865 abnormal exit.
866
868 These DSN options are used to create a DSN. Each option is given like
869 "option=value". The options are case-sensitive, so P and p are not the
870 same option. There cannot be whitespace before or after the "=" and if
871 the value contains whitespace it must be quoted. DSN options are
872 comma-separated. See the maatkit manpage for full details.
873
874 • a
875
876 copy: no
877
878 Database to USE when executing queries.
879
880 • A
881
882 dsn: charset; copy: yes
883
884 Default character set.
885
886 • b
887
888 copy: no
889
890 If true, disable binlog with SQL_LOG_BIN.
891
892 • D
893
894 dsn: database; copy: yes
895
896 Database that contains the table.
897
898 • F
899
900 dsn: mysql_read_default_file; copy: yes
901
902 Only read default options from the given file
903
904 • h
905
906 dsn: host; copy: yes
907
908 Connect to host.
909
910 • i
911
912 copy: yes
913
914 Index to use.
915
916 • m
917
918 copy: no
919
920 Plugin module name.
921
922 • p
923
924 dsn: password; copy: yes
925
926 Password to use when connecting.
927
928 • P
929
930 dsn: port; copy: yes
931
932 Port number to use for connection.
933
934 • S
935
936 dsn: mysql_socket; copy: yes
937
938 Socket file to use for connection.
939
940 • t
941
942 copy: yes
943
944 Table to archive from/to.
945
946 • u
947
948 dsn: user; copy: yes
949
950 User for login if not current user.
951
953 mk-archiver is extensible by plugging in external Perl modules to
954 handle some logic and/or actions. You can specify a module for both
955 the "--source" and the "--dest", with the 'm' part of the
956 specification. For example:
957
958 --source D=test,t=test1,m=My::Module1 --dest m=My::Module2,t=test2
959
960 This will cause mk-archiver to load the My::Module1 and My::Module2
961 packages, create instances of them, and then make calls to them during
962 the archiving process.
963
964 You can also specify a plugin with "--plugin".
965
966 The module must provide this interface:
967
968 new(dbh => $dbh, db => $db_name, tbl => $tbl_name)
969 The plugin's constructor is passed a reference to the database
970 handle, the database name, and table name. The plugin is created
971 just after mk-archiver opens the connection, and before it examines
972 the table given in the arguments. This gives the plugin a chance
973 to create and populate temporary tables, or do other setup work.
974
975 before_begin(cols => \@cols, allcols => \@allcols)
976 This method is called just before mk-archiver begins iterating
977 through rows and archiving them, but after it does all other setup
978 work (examining table structures, designing SQL queries, and so
979 on). This is the only time mk-archiver tells the plugin column
980 names for the rows it will pass the plugin while archiving.
981
982 The "cols" argument is the column names the user requested to be
983 archived, either by default or by the "--columns" option. The
984 "allcols" argument is the list of column names for every row mk-
985 archiver will fetch from the source table. It may fetch more
986 columns than the user requested, because it needs some columns for
987 its own use. When subsequent plugin functions receive a row, it is
988 the full row containing all the extra columns, if any, added to the
989 end.
990
991 is_archivable(row => \@row)
992 This method is called for each row to determine whether it is
993 archivable. This applies only to "--source". The argument is the
994 row itself, as an arrayref. If the method returns true, the row
995 will be archived; otherwise it will be skipped.
996
997 Skipping a row adds complications for non-unique indexes. Normally
998 mk-archiver uses a WHERE clause designed to target the last
999 processed row as the place to start the scan for the next SELECT
1000 statement. If you have skipped the row by returning false from
1001 is_archivable(), mk-archiver could get into an infinite loop
1002 because the row still exists. Therefore, when you specify a plugin
1003 for the "--source" argument, mk-archiver will change its WHERE
1004 clause slightly. Instead of starting at "greater than or equal to"
1005 the last processed row, it will start "strictly greater than."
1006 This will work fine on unique indexes such as primary keys, but it
1007 may skip rows (leave holes) on non-unique indexes or when ascending
1008 only the first column of an index.
1009
1010 "mk-archiver" will change the clause in the same way if you specify
1011 "--no-delete", because again an infinite loop is possible.
1012
1013 If you specify the "--bulk-delete" option and return false from
1014 this method, "mk-archiver" may not do what you want. The row won't
1015 be archived, but it will be deleted, since bulk deletes operate on
1016 ranges of rows and don't know which rows the plugin selected to
1017 keep.
1018
1019 If you specify the "--bulk-insert" option, this method's return
1020 value will influence whether the row is written to the temporary
1021 file for the bulk insert, so bulk inserts will work as expected.
1022 However, bulk inserts require bulk deletes.
1023
1024 before_delete(row => \@row)
1025 This method is called for each row just before it is deleted. This
1026 applies only to "--source". This is a good place for you to handle
1027 dependencies, such as deleting things that are foreign-keyed to the
1028 row you are about to delete. You could also use this to
1029 recursively archive all dependent tables.
1030
1031 This plugin method is called even if "--no-delete" is given, but
1032 not if "--bulk-delete" is given.
1033
1034 before_bulk_delete(first_row => \@row, last_row => \@row)
1035 This method is called just before a bulk delete is executed. It is
1036 similar to the "before_delete" method, except its arguments are the
1037 first and last row of the range to be deleted. It is called even
1038 if "--no-delete" is given.
1039
1040 before_insert(row => \@row)
1041 This method is called for each row just before it is inserted.
1042 This applies only to "--dest". You could use this to insert the
1043 row into multiple tables, perhaps with an ON DUPLICATE KEY UPDATE
1044 clause to build summary tables in a data warehouse.
1045
1046 This method is not called if "--bulk-insert" is given.
1047
1048 before_bulk_insert(first_row => \@row, last_row => \@row, filename =>
1049 bulk_insert_filename)
1050 This method is called just before a bulk insert is executed. It is
1051 similar to the "before_insert" method, except its arguments are the
1052 first and last row of the range to be deleted.
1053
1054 custom_sth(row => \@row, sql => $sql)
1055 This method is called just before inserting the row, but after
1056 "before_insert()". It allows the plugin to specify different
1057 "INSERT" statement if desired. The return value (if any) should be
1058 a DBI statement handle. The "sql" parameter is the SQL text used
1059 to prepare the default "INSERT" statement. This method is not
1060 called if you specify "--bulk-insert".
1061
1062 If no value is returned, the default "INSERT" statement handle is
1063 used.
1064
1065 This method applies only to the plugin specified for "--dest", so
1066 if your plugin isn't doing what you expect, check that you've
1067 specified it for the destination and not the source.
1068
1069 custom_sth_bulk(first_row => \@row, last_row => \@row, sql => $sql,
1070 filename => $bulk_insert_filename)
1071 If you've specified "--bulk-insert", this method is called just
1072 before the bulk insert, but after "before_bulk_insert()", and the
1073 arguments are different.
1074
1075 This method's return value etc is similar to the "custom_sth()"
1076 method.
1077
1078 after_finish()
1079 This method is called after mk-archiver exits the archiving loop,
1080 commits all database handles, closes "--file", and prints the final
1081 statistics, but before mk-archiver runs ANALYZE or OPTIMIZE (see
1082 "--analyze" and "--optimize").
1083
1084 If you specify a plugin for both "--source" and "--dest", mk-archiver
1085 constructs, calls before_begin(), and calls after_finish() on the two
1086 plugins in the order "--source", "--dest".
1087
1088 mk-archiver assumes it controls transactions, and that the plugin will
1089 NOT commit or roll back the database handle. The database handle
1090 passed to the plugin's constructor is the same handle mk-archiver uses
1091 itself. Remember that "--source" and "--dest" are separate handles.
1092
1093 A sample module might look like this:
1094
1095 package My::Module;
1096
1097 sub new {
1098 my ( $class, %args ) = @_;
1099 return bless(\%args, $class);
1100 }
1101
1102 sub before_begin {
1103 my ( $self, %args ) = @_;
1104 # Save column names for later
1105 $self->{cols} = $args{cols};
1106 }
1107
1108 sub is_archivable {
1109 my ( $self, %args ) = @_;
1110 # Do some advanced logic with $args{row}
1111 return 1;
1112 }
1113
1114 sub before_delete {} # Take no action
1115 sub before_insert {} # Take no action
1116 sub custom_sth {} # Take no action
1117 sub after_finish {} # Take no action
1118
1119 1;
1120
1122 You can download Maatkit from Google Code at
1123 <http://code.google.com/p/maatkit/>, or you can get any of the tools
1124 easily with a command like the following:
1125
1126 wget http://www.maatkit.org/get/toolname
1127 or
1128 wget http://www.maatkit.org/trunk/toolname
1129
1130 Where "toolname" can be replaced with the name (or fragment of a name)
1131 of any of the Maatkit tools. Once downloaded, they're ready to run; no
1132 installation is needed. The first URL gets the latest released version
1133 of the tool, and the second gets the latest trunk code from Subversion.
1134
1136 The environment variable "MKDEBUG" enables verbose debugging output in
1137 all of the Maatkit tools:
1138
1139 MKDEBUG=1 mk-....
1140
1142 You need Perl, DBI, DBD::mysql, and some core packages that ought to be
1143 installed in any reasonably new version of Perl.
1144
1146 For a list of known bugs see <http://www.maatkit.org/bugs/mk-archiver>.
1147
1148 Please use Google Code Issues and Groups to report bugs or request
1149 support: <http://code.google.com/p/maatkit/>. You can also join
1150 #maatkit on Freenode to discuss Maatkit.
1151
1152 Please include the complete command-line used to reproduce the problem
1153 you are seeing, the version of all MySQL servers involved, the complete
1154 output of the tool when run with "--version", and if possible,
1155 debugging output produced by running with the "MKDEBUG=1" environment
1156 variable.
1157
1159 Thanks to the following people, and apologies to anyone I've omitted:
1160
1161 Andrew O'Brien,
1162
1164 This program is copyright 2007-2011 Baron Schwartz. Feedback and
1165 improvements are welcome.
1166
1167 THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
1168 WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
1169 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1170
1171 This program is free software; you can redistribute it and/or modify it
1172 under the terms of the GNU General Public License as published by the
1173 Free Software Foundation, version 2; OR the Perl Artistic License. On
1174 UNIX and similar systems, you can issue `man perlgpl' or `man
1175 perlartistic' to read these licenses.
1176
1177 You should have received a copy of the GNU General Public License along
1178 with this program; if not, write to the Free Software Foundation, Inc.,
1179 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
1180
1182 Baron Schwartz
1183
1185 This tool is part of Maatkit, a toolkit for power users of MySQL.
1186 Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
1187 primary code contributors. Both are employed by Percona. Financial
1188 support for Maatkit development is primarily provided by Percona and
1189 its clients.
1190
1192 This manual page documents Ver 1.0.27 Distrib 7540 $Revision: 7531 $.
1193
1194
1195
1196perl v5.34.0 2022-01-20 MK-ARCHIVER(1)