mk-table-sync(1p)

1MK-TABLE-SYNC(1)      User Contributed Perl Documentation     MK-TABLE-SYNC(1)
2
3
4

NAME

6       mk-table-sync - Synchronize MySQL tables efficiently.
7

SYNOPSIS

9       This tool changes data, so for maximum safety, you should back up your
10       data before you use it.
11
12       To sync db.tbl1 from host1 to host2:
13
14        mk-table-sync --execute u=user,p=pass,h=host1,D=db,t=tbl host2
15
16       Sync all tables in host1 to host2 and host3:
17
18        mk-table-sync --execute host1 host2 host3
19
20       Resolve differences mk-table-checksum found on this master's slaves:
21
22        mk-table-sync --execute --replicate test.checksum master1
23
24       Make slave1 have the same data as its replication master:
25
26        mk-table-sync --execute --sync-to-master slave1
27
28       Ditto, resolving differences mk-table-checksum found:
29
30        mk-table-sync --execute --sync-to-master \
31          --replicate test.checksum slave1
32
33       Sync server2 in a master-master replication configuration, where
34       server2's copy of db1.tbl1 is known or suspected to be incorrect:
35
36        mk-table-sync --execute --sync-to-master h=server2,D=db1,t=tbl1
37
38       Note that in the master-master configuration, the following will NOT do
39       what you want, because it will make changes directly on server2, which
40       will then flow through replication and change server1's data:
41
42        # Don't do this in a master-master setup!
43        mk-table-sync --execute h=server1,D=db1,t=tbl1 h=server2
44

RISKS

46       The following section is included to inform users about the potential
47       risks, whether known or unknown, of using this tool.  The two main
48       categories of risks are those created by the nature of the tool (e.g.
49       read-only tools vs. read-write tools) and those created by bugs.
50
51       With great power comes great responsibility!  This tool changes data,
52       so it is a good idea to back up your data.  It is also very powerful,
53       which means it is very complex, so you should run it with the
54       "--dry-run" option to see what it will do, until you're familiar with
55       its operation.  If you want to see which rows are different, without
56       changing any data, use "--print" instead of "--execute".
57
58       Be careful when using mk-table-sync in any master-master setup.
59       Master-master replication is inherently tricky, and it's easy to make
60       mistakes.  You need to be sure you're using the tool correctly for
61       master-master replication.  See the "SYNOPSIS" for the overview of the
62       correct usage.
63
64       Also be careful with tables that have foreign key constraints with "ON
65       DELETE" or "ON UPDATE" definitions because these might cause unintended
66       changes on the child tables.
67
68       In general, this tool is best suited when your tables have a primary
69       key or unique index.  Although it can synchronize data in tables
70       lacking a primary key or unique index, it might be best to sychronize
71       that data by another means.
72
73       At the time of this release, there is a potential bug using
74       "--lock-and-rename" with MySQL 5.1, a bug detecting certain
75       differences, a bug using ROUND() across different platforms, and a bug
76       mixing collations.
77
78       The authoritative source for updated information is always the online
79       issue tracking system.  Issues that affect this tool will be marked as
80       such.  You can see a list of such issues at the following URL:
81       http://www.maatkit.org/bugs/mk-table-sync
82       <http://www.maatkit.org/bugs/mk-table-sync>.
83
84       See also "BUGS" for more information on filing bugs and getting help.
85

DESCRIPTION

87       WARNING this tool is unfinished and could perform slowly.  The Chunk
88       algorithm is great when it can be used, and so is Nibble, but otherwise
89       GroupBy is the default choice and it may not perform very well.  Please
90       run with "--dry-run" before subjecting your servers to this tool, and
91       make backups of your data!
92
93       This tool is designed to do one-way and bidirectional synchronization
94       of data.  It finds differences efficiently with one of several
95       algorithms (see "ALGORITHMS").  It makes changes on the destination
96       table(s) so it matches the source.
97
98       It does not synchronize table structures, indexes, or any other schema
99       changes.  It synchronizes only data.
100
101       It can operate through replication by comparing a slave with its master
102       and making changes on the master.  These changes will flow through
103       replication and correct any differences found on the slave.
104
105       It accepts a list of DSNs (see the "--help" output) to tell it where
106       and how to connect.
107
108       There are many ways to invoke it.  The following is the abbreviated
109       logic:
110
111          if DSN has a t part, sync only that table:
112             if 1 DSN:
113                if --sync-to-master:
114                   The DSN is a slave.  Connect to its master and sync.
115             if more than 1 DSN:
116                The first DSN is the source.  Sync each DSN in turn.
117          else if --replicate:
118             if --sync-to-master:
119                The DSN is a slave.  Connect to its master, find records
120                of differences, and fix.
121             else:
122                The DSN is the master.  Find slaves and connect to each,
123                find records of differences, and fix.
124          else:
125             if only 1 DSN and --sync-to-master:
126                The DSN is a slave.  Connect to its master, find tables and
127                filter with --databases etc, and sync each table to the master.
128             else:
129                find tables, filtering with --databases etc, and sync each
130                DSN to the first.
131
132       If you're confused about how it the DSNs are interpreted, use the
133       "--explain-hosts" option and it will tell you.
134

OUTPUT

136       If you specify the "--verbose" option, you'll see information about the
137       differences between the tables.  There is one row per table.  Each
138       server is printed separately.  For example,
139
140          # Syncing h=host1,D=test,t=test1
141          # DELETE REPLACE INSERT UPDATE ALGORITHM EXIT DATABASE.TABLE
142          #      0       0      3      0 Chunk     2    test.test1
143
144       Table test.test1 on host1 required 3 "INSERT" statements to synchronize
145       and it used the Chunk algorithm (see "ALGORITHMS").  Because
146       differences were found, its "EXIT STATUS" was 2.
147
148       If you specify the "--print" option, you'll see the actual SQL
149       statements that the script uses to synchronize the table if "--execute"
150       is also specified.
151
152       If you want to see the SQL statements that mk-table-sync is using to
153       select chunks, nibbles, rows, etc., then specify "--print" once and
154       "--verbose" twice.  Be careful though: this can print a lot of SQL
155       statements.
156
157       There are cases where no combination of "INSERT", "UPDATE" or "DELETE"
158       statements can resolve differences without violating some unique key.
159       For example, suppose there's a primary key on column a and a unique key
160       on column b.  Then there is no way to sync these two tables with
161       straightforward UPDATE statements:
162
163        +---+---+  +---+---+
164        | a | b |  | a | b |
165        +---+---+  +---+---+
166        | 1 | 2 |  | 1 | 1 |
167        | 2 | 1 |  | 2 | 2 |
168        +---+---+  +---+---+
169
170       The tool rewrites queries to "DELETE" and "REPLACE" in this case.  This
171       is automatically handled after the first index violation, so you don't
172       have to worry about it.
173

REPLICATION SAFETY

175       Synchronizing a replication master and slave safely is a non-trivial
176       problem, in general.  There are all sorts of issues to think about,
177       such as other processes changing data, trying to change data on the
178       slave, whether the destination and source are a master-master pair, and
179       much more.
180
181       In general, the safe way to do it is to change the data on the master,
182       and let the changes flow through replication to the slave like any
183       other changes.  However, this works only if it's possible to REPLACE
184       into the table on the master.  REPLACE works only if there's a unique
185       index on the table (otherwise it just acts like an ordinary INSERT).
186
187       If your table has unique keys, you should use the "--sync-to-master"
188       and/or "--replicate" options to sync a slave to its master.  This will
189       generally do the right thing.  When there is no unique key on the
190       table, there is no choice but to change the data on the slave, and mk-
191       table-sync will detect that you're trying to do so.  It will complain
192       and die unless you specify "--no-check-slave" (see
193       "--[no]check-slave").
194
195       If you're syncing a table without a primary or unique key on a master-
196       master pair, you must change the data on the destination server.
197       Therefore, you need to specify "--no-bin-log" for safety (see
198       "--[no]bin-log").  If you don't, the changes you make on the
199       destination server will replicate back to the source server and change
200       the data there!
201
202       The generally safe thing to do on a master-master pair is to use the
203       "--sync-to-master" option so you don't change the data on the
204       destination server.  You will also need to specify "--no-check-slave"
205       to keep mk-table-sync from complaining that it is changing data on a
206       slave.
207

ALGORITHMS

209       This tool has a generic data-syncing framework, within which it is
210       possible to use any number of different algorithms to actually find
211       differences.  It chooses the best algorithm automatically.  While I
212       plan to add more algorithms in the future, the following are
213       implemented now:
214
215       Chunk
216           Finds an index whose first column is numeric (including date and
217           time types), and divides the column's range of values into chunks
218           of approximately "--chunk-size" rows.  Syncs a chunk at a time by
219           checksumming the entire chunk.  If the chunk differs on the source
220           and destination, checksums each chunk's rows individually to find
221           the rows that differ.
222
223           It is efficient when the column has sufficient cardinality to make
224           the chunks end up about the right size.
225
226           The initial per-chunk checksum is quite small and results in
227           minimal network traffic and memory consumption.  If a chunk's rows
228           must be examined, only the primary key columns and a checksum are
229           sent over the network, not the entire row.  If a row is found to be
230           different, the entire row will be fetched, but not before.
231
232       Nibble
233           Finds an index and ascends the index in fixed-size nibbles of
234           "--chunk-size" rows, using a non-backtracking algorithm (see mk-
235           archiver for more on this algorithm).  It is very similar to
236           "Chunk", but instead of pre-calculating the boundaries of each
237           piece of the table based on index cardinality, it uses "LIMIT" to
238           define each nibble's upper limit, and the previous nibble's upper
239           limit to define the lower limit.
240
241           It works in steps: one query finds the row that will define the
242           next nibble's upper boundary, and the next query checksums the
243           entire nibble.  If the nibble differs between the source and
244           destination, it examines the nibble row-by-row, just as "Chunk"
245           does.
246
247       GroupBy
248           Selects the entire table grouped by all columns, with a COUNT(*)
249           column added.  Compares all columns, and if they're the same,
250           compares the COUNT(*) column's value to determine how many rows to
251           insert or delete into the destination.  Works on tables with no
252           primary key or unique index.
253
254       Stream
255           Selects the entire table in one big stream and compares all
256           columns.  Selects all columns.  Much less efficient than the other
257           algorithms, but works when there is no suitable index for them to
258           use.
259
260       Future Plans
261           Possibilities for future algorithms are TempTable (what I
262           originally called bottom-up in earlier versions of this tool),
263           DrillDown (what I originallly called top-down), and GroupByPrefix
264           (similar to how SqlYOG Job Agent works).  Each algorithm has
265           strengths and weaknesses.  If you'd like to implement your favorite
266           technique for finding differences between two sources of data on
267           possibly different servers, I'm willing to help.  The algorithms
268           adhere to a simple interface that makes it pretty easy to write
269           your own.
270

BIDIRECTIONAL SYNCING

272       Bidirectional syncing is a new, experimental feature.  To make it work
273       reliably there are a number of strict limitations:
274
275         * only works when syncing one server to other independent servers
276         * does not work in any way with replication
277         * requires that the table(s) are chunkable with the Chunk algorithm
278         * is not N-way, only bidirectional between two servers at a time
279         * does not handle DELETE changes
280
281       For example, suppose we have three servers: c1, r1, r2.  c1 is the
282       central server, a pseudo-master to the other servers (viz. r1 and r2
283       are not slaves to c1).  r1 and r2 are remote servers.  Rows in table
284       foo are updated and inserted on all three servers and we want to
285       synchronize all the changes between all the servers.  Table foo has
286       columns:
287
288         id    int PRIMARY KEY
289         ts    timestamp auto updated
290         name  varchar
291
292       Auto-increment offsets are used so that new rows from any server do not
293       create conflicting primary key (id) values.  In general, newer rows, as
294       determined by the ts column, take precedence when a same but differing
295       row is found during the bidirectional sync.  "Same but differing" means
296       that two rows have the same primary key (id) value but different values
297       for some other column, like the name column in this example.  Same but
298       differing conflicts are resolved by a "conflict".  A conflict compares
299       some column of the competing rows to determine a "winner".  The
300       winnning row becomes the source and its values are used to update the
301       other row.
302
303       There are subtle differences between three columns used to achieve
304       bidirectional syncing that you should be familiar with: chunk column
305       ("--chunk-column"), comparison column(s) ("--columns"), and conflict
306       column ("--conflict-column").  The chunk column is only used to chunk
307       the table; e.g. "WHERE id >= 5 AND id < 10".  Chunks are checksummed
308       and when chunk checksums reveal a difference, the tool selects the rows
309       in that chunk and checksums the "--columns" for each row.  If a column
310       checksum differs, the rows have one or more conflicting column values.
311       In a traditional unidirectional sync, the conflict is a moot point
312       because it can be resolved simply by updating the entire destination
313       row with the source row's values.  In a bidirectional sync, however,
314       the "--conflict-column" (in accordance with other "--conflict-*"
315       options list below) is compared to determine which row is "correct" or
316       "authoritative"; this row becomes the "source".
317
318       To sync all three servers completely, two runs of mk-table-sync are
319       required.  The first run syncs c1 and r1, then syncs c1 and r2
320       including any changes from r1.  At this point c1 and r2 are completely
321       in sync, but r1 is missing any changes from r2 because c1 didn't have
322       these changes when it and r1 were synced.  So a second run is needed
323       which syncs the servers in the same order, but this time when c1 and r1
324       are synced r1 gets r2's changes.
325
326       The tool does not sync N-ways, only bidirectionally between the first
327       DSN given on the command line and each subsequent DSN in turn.  So the
328       tool in this example would be ran twice like:
329
330         mk-table-sync --bidirectional h=c1 h=r1 h=r2
331
332       The "--bidirectional" option enables this feature and causes various
333       sanity checks to be performed.  You must specify other options that
334       tell mk-table-sync how to resolve conflicts for same but differing
335       rows.  These options are:
336
337         * L<"--conflict-column">
338         * L<"--conflict-comparison">
339         * L<"--conflict-value">
340         * L<"--conflict-threshold">
341         * L<"--conflict-error">  (optional)
342
343       Use "--print" to test this option before "--execute".  The printed SQL
344       statements will have comments saying on which host the statment would
345       be executed if you used "--execute".
346
347       Technical side note: the first DSN is always the "left" server and the
348       other DSNs are always the "right" server.  Since either server can
349       become the source or destination it's confusing to think of them as
350       "src" and "dst".  Therefore, they're generically referred to as left
351       and right.  It's easy to remember this because the first DSN is always
352       to the left of the other server DSNs on the command line.
353

EXIT STATUS

355       Exit status is as follows:
356
357          STATUS  MEANING
358          ======  =======================================================
359          0       Success.
360          1       Internal error.
361          2       At least one table differed on the destination.
362          3       Combination of 1 and 2.
363

OPTIONS

365       Specify at least one of "--print", "--execute", or "--dry-run".
366
367       "--where" and "--replicate" are mutually exclusive.
368
369       --algorithms
370           type: string; default: Chunk,Nibble,GroupBy,Stream
371
372           Algorithm to use when comparing the tables, in order of preference.
373
374           For each table, mk-table-sync will check if the table can be synced
375           with the given algorithms in the order that they're given.  The
376           first algorithm that can sync the table is used.  See "ALGORITHMS".
377
378       --ask-pass
379           Prompt for a password when connecting to MySQL.
380
381       --bidirectional
382           Enable bidirectional sync between first and subsequent hosts.
383
384           See "BIDIRECTIONAL SYNCING" for more information.
385
386       --[no]bin-log
387           default: yes
388
389           Log to the binary log ("SET SQL_LOG_BIN=1").
390
391           Specifying "--no-bin-log" will "SET SQL_LOG_BIN=0".
392
393       --buffer-in-mysql
394           Instruct MySQL to buffer queries in its memory.
395
396           This option adds the "SQL_BUFFER_RESULT" option to the comparison
397           queries.  This causes MySQL to execute the queries and place them
398           in a temporary table internally before sending the results back to
399           mk-table-sync.  The advantage of this strategy is that mk-table-
400           sync can fetch rows as desired without using a lot of memory inside
401           the Perl process, while releasing locks on the MySQL table (to
402           reduce contention with other queries).  The disadvantage is that it
403           uses more memory on the MySQL server instead.
404
405           You probably want to leave "--[no]buffer-to-client" enabled too,
406           because buffering into a temp table and then fetching it all into
407           Perl's memory is probably a silly thing to do.  This option is most
408           useful for the GroupBy and Stream algorithms, which may fetch a lot
409           of data from the server.
410
411       --[no]buffer-to-client
412           default: yes
413
414           Fetch rows one-by-one from MySQL while comparing.
415
416           This option enables "mysql_use_result" which causes MySQL to hold
417           the selected rows on the server until the tool fetches them.  This
418           allows the tool to use less memory but may keep the rows locked on
419           the server longer.
420
421           If this option is disabled by specifying "--no-buffer-to-client"
422           then "mysql_store_result" is used which causes MySQL to send all
423           selected rows to the tool at once.  This may result in the results
424           "cursor" being held open for a shorter time on the server, but if
425           the tables are large, it could take a long time anyway, and use all
426           your memory.
427
428           For most non-trivial data sizes, you want to leave this option
429           enabled.
430
431           This option is disabled when "--bidirectional" is used.
432
433       --charset
434           short form: -A; type: string
435
436           Default character set.  If the value is utf8, sets Perl's binmode
437           on STDOUT to utf8, passes the mysql_enable_utf8 option to
438           DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL.  Any
439           other value sets binmode on STDOUT without the utf8 layer, and runs
440           SET NAMES after connecting to MySQL.
441
442       --[no]check-master
443           default: yes
444
445           With "--sync-to-master", try to verify that the detected master is
446           the real master.
447
448       --[no]check-privileges
449           default: yes
450
451           Check that user has all necessary privileges on source and
452           destination table.
453
454       --[no]check-slave
455           default: yes
456
457           Check whether the destination server is a slave.
458
459           If the destination server is a slave, it's generally unsafe to make
460           changes on it.  However, sometimes you have to; "--replace" won't
461           work unless there's a unique index, for example, so you can't make
462           changes on the master in that scenario.  By default mk-table-sync
463           will complain if you try to change data on a slave.  Specify
464           "--no-slave-check" to disable this check.  Use it at your own risk.
465
466       --[no]check-triggers
467           default: yes
468
469           Check that no triggers are defined on the destination table.
470
471           Triggers were introduced in MySQL v5.0.2, so for older versions
472           this option has no effect because triggers will not be checked.
473
474       --chunk-column
475           type: string
476
477           Chunk the table on this column.
478
479       --chunk-index
480           type: string
481
482           Chunk the table using this index.
483
484       --chunk-size
485           type: string; default: 1000
486
487           Number of rows or data size per chunk.
488
489           The size of each chunk of rows for the "Chunk" and "Nibble"
490           algorithms.  The size can be either a number of rows, or a data
491           size.  Data sizes are specified with a suffix of k=kibibytes,
492           M=mebibytes, G=gibibytes.  Data sizes are converted to a number of
493           rows by dividing by the average row length.
494
495       --columns
496           short form: -c; type: array
497
498           Compare this comma-separated list of columns.
499
500       --config
501           type: Array
502
503           Read this comma-separated list of config files; if specified, this
504           must be the first option on the command line.
505
506       --conflict-column
507           type: string
508
509           Compare this column when rows conflict during a "--bidirectional"
510           sync.
511
512           When a same but differing row is found the value of this column
513           from each row is compared according to "--conflict-comparison",
514           "--conflict-value" and "--conflict-threshold" to determine which
515           row has the correct data and becomes the source.  The column can be
516           any type for which there is an appropriate "--conflict-comparison"
517           (this is almost all types except, for example, blobs).
518
519           This option only works with "--bidirectional".  See "BIDIRECTIONAL
520           SYNCING" for more information.
521
522       --conflict-comparison
523           type: string
524
525           Choose the "--conflict-column" with this property as the source.
526
527           The option affects how the "--conflict-column" values from the
528           conflicting rows are compared.  Possible comparisons are one of
529           these MAGIC_comparisons:
530
531             newest|oldest|greatest|least|equals|matches
532
533             COMPARISON  CHOOSES ROW WITH
534             ==========  =========================================================
535             newest      Newest temporal L<"--conflict-column"> value
536             oldest      Oldest temporal L<"--conflict-column"> value
537             greatest    Greatest numerical L<"--conflict-column"> value
538             least       Least numerical L<"--conflict-column"> value
539             equals      L<"--conflict-column"> value equal to L<"--conflict-value">
540             matches     L<"--conflict-column"> value matching Perl regex pattern
541                         L<"--conflict-value">
542
543           This option only works with "--bidirectional".  See "BIDIRECTIONAL
544           SYNCING" for more information.
545
546       --conflict-error
547           type: string; default: warn
548
549           How to report unresolvable conflicts and conflict errors
550
551           This option changes how the user is notified when a conflict cannot
552           be resolved or causes some kind of error.  Possible values are:
553
554             * warn: Print a warning to STDERR about the unresolvable conflict
555             * die:  Die, stop syncing, and print a warning to STDERR
556
557           This option only works with "--bidirectional".  See "BIDIRECTIONAL
558           SYNCING" for more information.
559
560       --conflict-threshold
561           type: string
562
563           Amount by which one "--conflict-column" must exceed the other.
564
565           The "--conflict-threshold" prevents a conflict from being resolved
566           if the absolute difference between the two "--conflict-column"
567           values is less than this amount.  For example, if two
568           "--conflict-column" have timestamp values "2009-12-01 12:00:00" and
569           "2009-12-01 12:05:00" the difference is 5 minutes.  If
570           "--conflict-threshold" is set to "5m" the conflict will be
571           resolved, but if "--conflict-threshold" is set to "6m" the conflict
572           will fail to resolve because the difference is not greater than or
573           equal to 6 minutes.  In this latter case, "--conflict-error" will
574           report the failure.
575
576           This option only works with "--bidirectional".  See "BIDIRECTIONAL
577           SYNCING" for more information.
578
579       --conflict-value
580           type: string
581
582           Use this value for certain "--conflict-comparison".
583
584           This option gives the value for "equals" and "matches"
585           "--conflict-comparison".
586
587           This option only works with "--bidirectional".  See "BIDIRECTIONAL
588           SYNCING" for more information.
589
590       --databases
591           short form: -d; type: hash
592
593           Sync only this comma-separated list of databases.
594
595           A common request is to sync tables from one database with tables
596           from another database on the same or different server.  This is not
597           yet possible.  "--databases" will not do it, and you can't do it
598           with the D part of the DSN either because in the absence of a table
599           name it assumes the whole server should be synced and the D part
600           controls only the connection's default database.
601
602       --defaults-file
603           short form: -F; type: string
604
605           Only read mysql options from the given file.  You must give an
606           absolute pathname.
607
608       --dry-run
609           Analyze, decide the sync algorithm to use, print and exit.
610
611           Implies "--verbose" so you can see the results.  The results are in
612           the same output format that you'll see from actually running the
613           tool, but there will be zeros for rows affected.  This is because
614           the tool actually executes, but stops before it compares any data
615           and just returns zeros.  The zeros do not mean there are no changes
616           to be made.
617
618       --engines
619           short form: -e; type: hash
620
621           Sync only this comma-separated list of storage engines.
622
623       --execute
624           Execute queries to make the tables have identical data.
625
626           This option makes mk-table-sync actually sync table data by
627           executing all the queries that it created to resolve table
628           differences.  Therefore, the tables will be changed!  And unless
629           you also specify "--verbose", the changes will be made silently.
630           If this is not what you want, see "--print" or "--dry-run".
631
632       --explain-hosts
633           Print connection information and exit.
634
635           Print out a list of hosts to which mk-table-sync will connect, with
636           all the various connection options, and exit.
637
638       --float-precision
639           type: int
640
641           Precision for "FLOAT" and "DOUBLE" column comparisons.
642
643           If you specify this option, FLOAT and DOUBLE columns will be
644           rounded to the specified number of digits after the decimal point
645           for the checksum.  This can avoid mismatches due to different
646           floating-point representations of the same values on different
647           MySQL versions and hardware.
648
649       --[no]foreign-key-checks
650           default: yes
651
652           Enable foreign key checks ("SET FOREIGN_KEY_CHECKS=1").
653
654           Specifying "--no-foreign-key-checks" will "SET
655           FOREIGN_KEY_CHECKS=0".
656
657       --function
658           type: string
659
660           Which hash function you'd like to use for checksums.
661
662           The default is "CRC32".  Other good choices include "MD5" and
663           "SHA1".  If you have installed the "FNV_64" user-defined function,
664           "mk-table-sync" will detect it and prefer to use it, because it is
665           much faster than the built-ins.  You can also use MURMUR_HASH if
666           you've installed that user-defined function.  Both of these are
667           distributed with Maatkit.  See mk-table-checksum for more
668           information and benchmarks.
669
670       --help
671           Show help and exit.
672
673       --[no]hex-blob
674           default: yes
675
676           "HEX()" "BLOB", "TEXT" and "BINARY" columns.
677
678           When row data from the source is fetched to create queries to sync
679           the data (i.e. the queries seen with "--print" and exeucted by
680           "--execute"), binary columns are wrapped in HEX() so the binary
681           data does not produce an invalid SQL statement.  You can disable
682           this option but you probably shouldn't.
683
684       --host
685           short form: -h; type: string
686
687           Connect to host.
688
689       --ignore-columns
690           type: Hash
691
692           Ignore this comma-separated list of column names in comparisons.
693
694           This option causes columns not to be compared.  However, if a row
695           is determined to differ between tables, all columns in that row
696           will be synced, regardless.  (It is not currently possible to
697           exclude columns from the sync process itself, only from the
698           comparison.)
699
700       --ignore-databases
701           type: Hash
702
703           Ignore this comma-separated list of databases.
704
705       --ignore-engines
706           type: Hash; default: FEDERATED,MRG_MyISAM
707
708           Ignore this comma-separated list of storage engines.
709
710       --ignore-tables
711           type: Hash
712
713           Ignore this comma-separated list of tables.
714
715           Table names may be qualified with the database name.
716
717       --[no]index-hint
718           default: yes
719
720           Add FORCE/USE INDEX hints to the chunk and row queries.
721
722           By default "mk-table-sync" adds a FORCE/USE INDEX hint to each SQL
723           statement to coerce MySQL into using the index chosen by the sync
724           algorithm or specified by "--chunk-index".  This is usually a good
725           thing, but in rare cases the index may not be the best for the
726           query so you can suppress the index hint by specifying
727           "--no-index-hint" and let MySQL choose the index.
728
729           This does not affect the queries printed by "--print"; it only
730           affects the chunk and row queries that "mk-table-sync" uses to
731           select and compare rows.
732
733       --lock
734           type: int
735
736           Lock tables: 0=none, 1=per sync cycle, 2=per table, or 3=globally.
737
738           This uses "LOCK TABLES".  This can help prevent tables being
739           changed while you're examining them.  The possible values are as
740           follows:
741
742             VALUE  MEANING
743             =====  =======================================================
744             0      Never lock tables.
745             1      Lock and unlock one time per sync cycle (as implemented
746                    by the syncing algorithm).  This is the most granular
747                    level of locking available.  For example, the Chunk
748                    algorithm will lock each chunk of C<N> rows, and then
749                    unlock them if they are the same on the source and the
750                    destination, before moving on to the next chunk.
751             2      Lock and unlock before and after each table.
752             3      Lock and unlock once for every server (DSN) synced, with
753                    C<FLUSH TABLES WITH READ LOCK>.
754
755           A replication slave is never locked if "--replicate" or
756           "--sync-to-master" is specified, since in theory locking the table
757           on the master should prevent any changes from taking place.  (You
758           are not changing data on your slave, right?)  If "--wait" is given,
759           the master (source) is locked and then the tool waits for the slave
760           to catch up to the master before continuing.
761
762           If "--transaction" is specified, "LOCK TABLES" is not used.
763           Instead, lock and unlock are implemented by beginning and
764           committing transactions.  The exception is if "--lock" is 3.
765
766           If "--no-transaction" is specified, then "LOCK TABLES" is used for
767           any value of "--lock". See "--[no]transaction".
768
769       --lock-and-rename
770           Lock the source and destination table, sync, then swap names.  This
771           is useful as a less-blocking ALTER TABLE, once the tables are
772           reasonably in sync with each other (which you may choose to
773           accomplish via any number of means, including dump and reload or
774           even something like mk-archiver).  It requires exactly two DSNs and
775           assumes they are on the same server, so it does no waiting for
776           replication or the like.  Tables are locked with LOCK TABLES.
777
778       --password
779           short form: -p; type: string
780
781           Password to use when connecting.
782
783       --pid
784           type: string
785
786           Create the given PID file.  The file contains the process ID of the
787           script.  The PID file is removed when the script exits.  Before
788           starting, the script checks if the PID file already exists.  If it
789           does not, then the script creates and writes its own PID to it.  If
790           it does, then the script checks the following: if the file contains
791           a PID and a process is running with that PID, then the script dies;
792           or, if there is no process running with that PID, then the script
793           overwrites the file with its own PID and starts; else, if the file
794           contains no PID, then the script dies.
795
796       --port
797           short form: -P; type: int
798
799           Port number to use for connection.
800
801       --print
802           Print queries that will resolve differences.
803
804           If you don't trust "mk-table-sync", or just want to see what it
805           will do, this is a good way to be safe.  These queries are valid
806           SQL and you can run them yourself if you want to sync the tables
807           manually.
808
809       --recursion-method
810           type: string
811
812           Preferred recursion method used to find slaves.
813
814           Possible methods are:
815
816             METHOD       USES
817             ===========  ================
818             processlist  SHOW PROCESSLIST
819             hosts        SHOW SLAVE HOSTS
820
821           The processlist method is preferred because SHOW SLAVE HOSTS is not
822           reliable.  However, the hosts method is required if the server uses
823           a non-standard port (not 3306).  Usually mk-table-sync does the
824           right thing and finds the slaves, but you may give a preferred
825           method and it will be used first.  If it doesn't find any slaves,
826           the other methods will be tried.
827
828       --replace
829           Write all "INSERT" and "UPDATE" statements as "REPLACE".
830
831           This is automatically switched on as needed when there are unique
832           index violations.
833
834       --replicate
835           type: string
836
837           Sync tables listed as different in this table.
838
839           Specifies that "mk-table-sync" should examine the specified table
840           to find data that differs.  The table is exactly the same as the
841           argument of the same name to mk-table-checksum.  That is, it
842           contains records of which tables (and ranges of values) differ
843           between the master and slave.
844
845           For each table and range of values that shows differences between
846           the master and slave, "mk-table-checksum" will sync that table,
847           with the appropriate "WHERE" clause, to its master.
848
849           This automatically sets "--wait" to 60 and causes changes to be
850           made on the master instead of the slave.
851
852           If "--sync-to-master" is specified, the tool will assume the server
853           you specified is the slave, and connect to the master as usual to
854           sync.
855
856           Otherwise, it will try to use "SHOW PROCESSLIST" to find slaves of
857           the server you specified.  If it is unable to find any slaves via
858           "SHOW PROCESSLIST", it will inspect "SHOW SLAVE HOSTS" instead.
859           You must configure each slave's "report-host", "report-port" and
860           other options for this to work right.  After finding slaves, it
861           will inspect the specified table on each slave to find data that
862           needs to be synced, and sync it.
863
864           The tool examines the master's copy of the table first, assuming
865           that the master is potentially a slave as well.  Any table that
866           shows differences there will NOT be synced on the slave(s).  For
867           example, suppose your replication is set up as A->B, B->C, B->D.
868           Suppose you use this argument and specify server B.  The tool will
869           examine server B's copy of the table.  If it looks like server B's
870           data in table "test.tbl1" is different from server A's copy, the
871           tool will not sync that table on servers C and D.
872
873       --set-vars
874           type: string; default: wait_timeout=10000
875
876           Set these MySQL variables.  Immediately after connecting to MySQL,
877           this string will be appended to SET and executed.
878
879       --socket
880           short form: -S; type: string
881
882           Socket file to use for connection.
883
884       --sync-to-master
885           Treat the DSN as a slave and sync it to its master.
886
887           Treat the server you specified as a slave.  Inspect "SHOW SLAVE
888           STATUS", connect to the server's master, and treat the master as
889           the source and the slave as the destination.  Causes changes to be
890           made on the master.  Sets "--wait" to 60 by default, sets "--lock"
891           to 1 by default, and disables "--[no]transaction" by default.  See
892           also "--replicate", which changes this option's behavior.
893
894       --tables
895           short form: -t; type: hash
896
897           Sync only this comma-separated list of tables.
898
899           Table names may be qualified with the database name.
900
901       --timeout-ok
902           Keep going if "--wait" fails.
903
904           If you specify "--wait" and the slave doesn't catch up to the
905           master's position before the wait times out, the default behavior
906           is to abort.  This option makes the tool keep going anyway.
907           Warning: if you are trying to get a consistent comparision between
908           the two servers, you probably don't want to keep going after a
909           timeout.
910
911       --[no]transaction
912           Use transactions instead of "LOCK TABLES".
913
914           The granularity of beginning and committing transactions is
915           controlled by "--lock".  This is enabled by default, but since
916           "--lock" is disabled by default, it has no effect.
917
918           Most options that enable locking also disable transactions by
919           default, so if you want to use transactional locking (via "LOCK IN
920           SHARE MODE" and "FOR UPDATE", you must specify "--transaction"
921           explicitly.
922
923           If you don't specify "--transaction" explicitly "mk-table-sync"
924           will decide on a per-table basis whether to use transactions or
925           table locks.  It currently uses transactions on InnoDB tables, and
926           table locks on all others.
927
928           If "--no-transaction" is specified, then "mk-table-sync" will not
929           use transactions at all (not even for InnoDB tables) and locking is
930           controlled by "--lock".
931
932           When enabled, either explicitly or implicitly, the transaction
933           isolation level is set "REPEATABLE READ" and transactions are
934           started "WITH CONSISTENT SNAPSHOT".
935
936       --trim
937           "TRIM()" "VARCHAR" columns in "BIT_XOR" and "ACCUM" modes.  Helps
938           when comparing MySQL 4.1 to >= 5.0.
939
940           This is useful when you don't care about the trailing space
941           differences between MySQL versions which vary in their handling of
942           trailing spaces. MySQL 5.0 and later all retain trailing spaces in
943           "VARCHAR", while previous versions would remove them.
944
945       --[no]unique-checks
946           default: yes
947
948           Enable unique key checks ("SET UNIQUE_CHECKS=1").
949
950           Specifying "--no-unique-checks" will "SET UNIQUE_CHECKS=0".
951
952       --user
953           short form: -u; type: string
954
955           User for login if not current user.
956
957       --verbose
958           short form: -v; cumulative: yes
959
960           Print results of sync operations.
961
962           See "OUTPUT" for more details about the output.
963
964       --version
965           Show version and exit.
966
967       --wait
968           short form: -w; type: time
969
970           How long to wait for slaves to catch up to their master.
971
972           Make the master wait for the slave to catch up in replication
973           before comparing the tables.  The value is the number of seconds to
974           wait before timing out (see also "--timeout-ok").  Sets "--lock" to
975           1 and "--[no]transaction" to 0 by default.  If you see an error
976           such as the following,
977
978             MASTER_POS_WAIT returned -1
979
980           It means the timeout was exceeded and you need to increase it.
981
982           The default value of this option is influenced by other options.
983           To see what value is in effect, run with "--help".
984
985           To disable waiting entirely (except for locks), specify "--wait" 0.
986           This helps when the slave is lagging on tables that are not being
987           synced.
988
989       --where
990           type: string
991
992           "WHERE" clause to restrict syncing to part of the table.
993
994       --[no]zero-chunk
995           default: yes
996
997           Add a chunk for rows with zero or zero-equivalent values.  The only
998           has an effect when "--chunk-size" is specified.  The purpose of the
999           zero chunk is to capture a potentially large number of zero values
1000           that would imbalance the size of the first chunk.  For example, if
1001           a lot of negative numbers were inserted into an unsigned integer
1002           column causing them to be stored as zeros, then these zero values
1003           are captured by the zero chunk instead of the first chunk and all
1004           its non-zero values.
1005

DSN OPTIONS

1007       These DSN options are used to create a DSN.  Each option is given like
1008       "option=value".  The options are case-sensitive, so P and p are not the
1009       same option.  There cannot be whitespace before or after the "=" and if
1010       the value contains whitespace it must be quoted.  DSN options are
1011       comma-separated.  See the maatkit manpage for full details.
1012
1013       ·   A
1014
1015           dsn: charset; copy: yes
1016
1017           Default character set.
1018
1019       ·   D
1020
1021           dsn: database; copy: yes
1022
1023           Database containing the table to be synced.
1024
1025       ·   F
1026
1027           dsn: mysql_read_default_file; copy: yes
1028
1029           Only read default options from the given file
1030
1031       ·   h
1032
1033           dsn: host; copy: yes
1034
1035           Connect to host.
1036
1037       ·   p
1038
1039           dsn: password; copy: yes
1040
1041           Password to use when connecting.
1042
1043       ·   P
1044
1045           dsn: port; copy: yes
1046
1047           Port number to use for connection.
1048
1049       ·   S
1050
1051           dsn: mysql_socket; copy: yes
1052
1053           Socket file to use for connection.
1054
1055       ·   t
1056
1057           copy: yes
1058
1059           Table to be synced.
1060
1061       ·   u
1062
1063           dsn: user; copy: yes
1064
1065           User for login if not current user.
1066

DOWNLOADING

1068       You can download Maatkit from Google Code at
1069       <http://code.google.com/p/maatkit/>, or you can get any of the tools
1070       easily with a command like the following:
1071
1072          wget http://www.maatkit.org/get/toolname
1073          or
1074          wget http://www.maatkit.org/trunk/toolname
1075
1076       Where "toolname" can be replaced with the name (or fragment of a name)
1077       of any of the Maatkit tools.  Once downloaded, they're ready to run; no
1078       installation is needed.  The first URL gets the latest released version
1079       of the tool, and the second gets the latest trunk code from Subversion.
1080

ENVIRONMENT

1082       The environment variable MKDEBUG enables verbose debugging output in
1083       all of the Maatkit tools:
1084
1085          MKDEBUG=1 mk-....
1086

SYSTEM REQUIREMENTS

1088       You need Perl, DBI, DBD::mysql, and some core packages that ought to be
1089       installed in any reasonably new version of Perl.
1090

BUGS

1092       For a list of known bugs see: http://www.maatkit.org/bugs/mk-table-sync
1093       <http://www.maatkit.org/bugs/mk-table-sync>.
1094
1095       Please use Google Code Issues and Groups to report bugs or request
1096       support: <http://code.google.com/p/maatkit/>.  You can also join
1097       #maatkit on Freenode to discuss Maatkit.
1098
1099       Please include the complete command-line used to reproduce the problem
1100       you are seeing, the version of all MySQL servers involved, the complete
1101       output of the tool when run with "--version", and if possible,
1102       debugging output produced by running with the "MKDEBUG=1" environment
1103       variable.
1104

COPYRIGHT, LICENSE AND WARRANTY

1106       This program is copyright 2007-2010 Baron Schwartz.  Feedback and
1107       improvements are welcome.
1108
1109       THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
1110       WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
1111       MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1112
1113       This program is free software; you can redistribute it and/or modify it
1114       under the terms of the GNU General Public License as published by the
1115       Free Software Foundation, version 2; OR the Perl Artistic License.  On
1116       UNIX and similar systems, you can issue `man perlgpl' or `man
1117       perlartistic' to read these licenses.
1118
1119       You should have received a copy of the GNU General Public License along
1120       with this program; if not, write to the Free Software Foundation, Inc.,
1121       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.
1122

AUTHOR

1124       Baron Schwartz
1125

ABOUT MAATKIT

1127       This tool is part of Maatkit, a toolkit for power users of MySQL.
1128       Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
1129       primary code contributors.  Both are employed by Percona.  Financial
1130       support for Maatkit development is primarily provided by Percona and
1131       its clients.
1132

HISTORY AND ACKNOWLEDGEMENTS

1134       My work is based in part on Giuseppe Maxia's work on distributed
1135       databases, <http://www.sysadminmag.com/articles/2004/0408/> and code
1136       derived from that article.  There is more explanation, and a link to
1137       the code, at <http://www.perlmonks.org/?node_id=381053>.
1138
1139       Another programmer extended Maxia's work even further.  Fabien Coelho
1140       changed and generalized Maxia's technique, introducing symmetry and
1141       avoiding some problems that might have caused too-frequent checksum
1142       collisions.  This work grew into pg_comparator,
1143       <http://www.coelho.net/pg_comparator/>.  Coelho also explained the
1144       technique further in a paper titled "Remote Comparison of Database
1145       Tables" (http://cri.ensmp.fr/classement/doc/A-375.pdf
1146       <http://cri.ensmp.fr/classement/doc/A-375.pdf>).
1147
1148       This existing literature mostly addressed how to find the differences
1149       between the tables, not how to resolve them once found.  I needed a
1150       tool that would not only find them efficiently, but would then resolve
1151       them.  I first began thinking about how to improve the technique
1152       further with my article http://tinyurl.com/mysql-data-diff-algorithm
1153       <http://tinyurl.com/mysql-data-diff-algorithm>, where I discussed a
1154       number of problems with the Maxia/Coelho "bottom-up" algorithm.  After
1155       writing that article, I began to write this tool.  I wanted to actually
1156       implement their algorithm with some improvements so I was sure I
1157       understood it completely.  I discovered it is not what I thought it
1158       was, and is considerably more complex than it appeared to me at first.
1159       Fabien Coelho was kind enough to address some questions over email.
1160
1161       The first versions of this tool implemented a version of the
1162       Coelho/Maxia algorithm, which I called "bottom-up", and my own, which I
1163       called "top-down."  Those algorithms are considerably more complex than
1164       the current algorithms and I have removed them from this tool, and may
1165       add them back later.  The improvements to the bottom-up algorithm are
1166       my original work, as is the top-down algorithm.  The techniques to
1167       actually resolve the differences are also my own work.
1168
1169       Another tool that can synchronize tables is the SQLyog Job Agent from
1170       webyog.  Thanks to Rohit Nadhani, SJA's author, for the conversations
1171       about the general techniques.  There is a comparison of mk-table-sync
1172       and SJA at http://tinyurl.com/maatkit-vs-sqlyog
1173       <http://tinyurl.com/maatkit-vs-sqlyog>
1174
1175       Thanks to the following people and organizations for helping in many
1176       ways:
1177
1178       The Rimm-Kaufman Group <http://www.rimmkaufman.com/>, MySQL AB
1179       <http://www.mysql.com/>, Blue Ridge InternetWorks
1180       <http://www.briworks.com/>, Percona <http://www.percona.com/>, Fabien
1181       Coelho, Giuseppe Maxia and others at MySQL AB, Kristian Koehntopp
1182       (MySQL AB), Rohit Nadhani (WebYog), The helpful monks at Perlmonks, And
1183       others too numerous to mention.
1184

VERSION

1186       This manual page documents Ver 1.0.30 Distrib 6839 $Revision: 6832 $.
1187
1188
1189
1190perl v5.12.1                      2010-08-01                  MK-TABLE-SYNC(1)