1MK-TABLE-SYNC(1)      User Contributed Perl Documentation     MK-TABLE-SYNC(1)
2
3
4

NAME

6       mk-table-sync - Synchronize MySQL table data efficiently.
7

SYNOPSIS

9       Usage: mk-table-sync [OPTION...] DSN [DSN...]
10
11       mk-table-sync synchronizes data efficiently between MySQL tables.
12
13       This tool changes data, so for maximum safety, you should back up your
14       data before you use it.  When synchronizing a server that is a
15       replication slave with the --replicate or --sync-to-master methods, it
16       always makes the changes on the replication master, never the
17       replication slave directly.  This is in general the only safe way to
18       bring a replica back in sync with its master; changes to the replica
19       are usually the source of the problems in the first place.  However,
20       the changes it makes on the master should be no-op changes that set the
21       data to their current values, and actually affect only the replica.
22       Please read the detailed documentation that follows to learn more about
23       this.
24
25       Sync db.tbl on host1 to host2:
26
27         mk-table-sync --execute h=host1,D=db,t=tbl h=host2
28
29       Sync all tables on host1 to host2 and host3:
30
31         mk-table-sync --execute host1 host2 host3
32
33       Make slave1 have the same data as its replication master:
34
35         mk-table-sync --execute --sync-to-master slave1
36
37       Resolve differences that mk-table-checksum found on all slaves of
38       master1:
39
40         mk-table-sync --execute --replicate test.checksum master1
41
42       Same as above but only resolve differences on slave1:
43
44         mk-table-sync --execute --replicate test.checksum \
45           --sync-to-master slave1
46
47       Sync master2 in a master-master replication configuration, where
48       master2's copy of db.tbl is known or suspected to be incorrect:
49
50         mk-table-sync --execute --sync-to-master h=master2,D=db,t=tbl
51
52       Note that in the master-master configuration, the following will NOT do
53       what you want, because it will make changes directly on master2, which
54       will then flow through replication and change master1's data:
55
56         # Don't do this in a master-master setup!
57         mk-table-sync --execute h=master1,D=db,t=tbl master2
58

RISKS

60       The following section is included to inform users about the potential
61       risks, whether known or unknown, of using this tool.  The two main
62       categories of risks are those created by the nature of the tool (e.g.
63       read-only tools vs. read-write tools) and those created by bugs.
64
65       With great power comes great responsibility!  This tool changes data,
66       so it is a good idea to back up your data.  It is also very powerful,
67       which means it is very complex, so you should run it with the
68       "--dry-run" option to see what it will do, until you're familiar with
69       its operation.  If you want to see which rows are different, without
70       changing any data, use "--print" instead of "--execute".
71
72       Be careful when using mk-table-sync in any master-master setup.
73       Master-master replication is inherently tricky, and it's easy to make
74       mistakes.  You need to be sure you're using the tool correctly for
75       master-master replication.  See the "SYNOPSIS" for the overview of the
76       correct usage.
77
78       Also be careful with tables that have foreign key constraints with "ON
79       DELETE" or "ON UPDATE" definitions because these might cause unintended
80       changes on the child tables.
81
82       In general, this tool is best suited when your tables have a primary
83       key or unique index.  Although it can synchronize data in tables
84       lacking a primary key or unique index, it might be best to synchronize
85       that data by another means.
86
87       At the time of this release, there is a potential bug using
88       "--lock-and-rename" with MySQL 5.1, a bug detecting certain
89       differences, a bug using ROUND() across different platforms, and a bug
90       mixing collations.
91
92       The authoritative source for updated information is always the online
93       issue tracking system.  Issues that affect this tool will be marked as
94       such.  You can see a list of such issues at the following URL:
95       <http://www.maatkit.org/bugs/mk-table-sync>.
96
97       See also "BUGS" for more information on filing bugs and getting help.
98

DESCRIPTION

100       mk-table-sync does one-way and bidirectional synchronization of table
101       data.  It does not synchronize table structures, indexes, or any other
102       schema objects.  The following describes one-way synchronization.
103       "BIDIRECTIONAL SYNCING" is described later.
104
105       This tool is complex and functions in several different ways.  To use
106       it safely and effectively, you should understand three things: the
107       purpose of "--replicate", finding differences, and specifying hosts.
108       These three concepts are closely related and determine how the tool
109       will run.  The following is the abbreviated logic:
110
111          if DSN has a t part, sync only that table:
112             if 1 DSN:
113                if --sync-to-master:
114                   The DSN is a slave.  Connect to its master and sync.
115             if more than 1 DSN:
116                The first DSN is the source.  Sync each DSN in turn.
117          else if --replicate:
118             if --sync-to-master:
119                The DSN is a slave.  Connect to its master, find records
120                of differences, and fix.
121             else:
122                The DSN is the master.  Find slaves and connect to each,
123                find records of differences, and fix.
124          else:
125             if only 1 DSN and --sync-to-master:
126                The DSN is a slave.  Connect to its master, find tables and
127                filter with --databases etc, and sync each table to the master.
128             else:
129                find tables, filtering with --databases etc, and sync each
130                DSN to the first.
131
132       mk-table-sync can run in one of two ways: with "--replicate" or
133       without.  The default is to run without "--replicate" which causes mk-
134       table-sync to automatically find differences efficiently with one of
135       several algorithms (see "ALGORITHMS").  Alternatively, the value of
136       "--replicate", if specified, causes mk-table-sync to use the
137       differences already found by having previously ran mk-table-checksum
138       with its own "--replicate" option.  Strictly speaking, you don't need
139       to use "--replicate" because mk-table-sync can find differences, but
140       many people use "--replicate" if, for example, they checksum regularly
141       using mk-table-checksum then fix differences as needed with mk-table-
142       sync.  If you're unsure, read each tool's documentation carefully and
143       decide for yourself, or consult with an expert.
144
145       Regardless of whether "--replicate" is used or not, you need to specify
146       which hosts to sync.  There are two ways: with "--sync-to-master" or
147       without.  Specifying "--sync-to-master" makes mk-table-sync expect one
148       and only slave DSN on the command line.  The tool will automatically
149       discover the slave's master and sync it so that its data is the same as
150       its master.  This is accomplished by making changes on the master which
151       then flow through replication and update the slave to resolve its
152       differences.  Be careful though: although this option specifies and
153       syncs a single slave, if there are other slaves on the same master,
154       they will receive via replication the changes intended for the slave
155       that you're trying to sync.
156
157       Alternatively, if you do not specify "--sync-to-master", the first DSN
158       given on the command line is the source host.  There is only ever one
159       source host.  If you do not also specify "--replicate", then you must
160       specify at least one other DSN as the destination host.  There can be
161       one or more destination hosts.  Source and destination hosts must be
162       independent; they cannot be in the same replication topology.  mk-
163       table-sync will die with an error if it detects that a destination host
164       is a slave because changes are written directly to destination hosts
165       (and it's not safe to write directly to slaves).  Or, if you specify
166       "--replicate" (but not "--sync-to-master") then mk-table-sync expects
167       one and only one master DSN on the command line.  The tool will
168       automatically discover all the master's slaves and sync them to the
169       master.  This is the only way to sync several (all) slaves at once
170       (because "--sync-to-master" only specifies one slave).
171
172       Each host on the command line is specified as a DSN.  The first DSN (or
173       only DSN for cases like "--sync-to-master") provides default values for
174       other DSNs, whether those other DSNs are specified on the command line
175       or auto-discovered by the tool.  So in this example,
176
177         mk-table-sync --execute h=host1,u=msandbox,p=msandbox h=host2
178
179       the host2 DSN inherits the "u" and "p" DSN parts from the host1 DSN.
180       Use the "--explain-hosts" option to see how mk-table-sync will
181       interpret the DSNs given on the command line.
182

OUTPUT

184       If you specify the "--verbose" option, you'll see information about the
185       differences between the tables.  There is one row per table.  Each
186       server is printed separately.  For example,
187
188         # Syncing h=host1,D=test,t=test1
189         # DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
190         #      0       0      3      0 Chunk     13:00:00 13:00:17 2    test.test1
191
192       Table test.test1 on host1 required 3 "INSERT" statements to synchronize
193       and it used the Chunk algorithm (see "ALGORITHMS").  The sync operation
194       for this table started at 13:00:00 and ended 17 seconds later (times
195       taken from "NOW()" on the source host).  Because differences were
196       found, its "EXIT STATUS" was 2.
197
198       If you specify the "--print" option, you'll see the actual SQL
199       statements that the script uses to synchronize the table if "--execute"
200       is also specified.
201
202       If you want to see the SQL statements that mk-table-sync is using to
203       select chunks, nibbles, rows, etc., then specify "--print" once and
204       "--verbose" twice.  Be careful though: this can print a lot of SQL
205       statements.
206
207       There are cases where no combination of "INSERT", "UPDATE" or "DELETE"
208       statements can resolve differences without violating some unique key.
209       For example, suppose there's a primary key on column a and a unique key
210       on column b.  Then there is no way to sync these two tables with
211       straightforward UPDATE statements:
212
213        +---+---+  +---+---+
214        | a | b |  | a | b |
215        +---+---+  +---+---+
216        | 1 | 2 |  | 1 | 1 |
217        | 2 | 1 |  | 2 | 2 |
218        +---+---+  +---+---+
219
220       The tool rewrites queries to "DELETE" and "REPLACE" in this case.  This
221       is automatically handled after the first index violation, so you don't
222       have to worry about it.
223

REPLICATION SAFETY

225       Synchronizing a replication master and slave safely is a non-trivial
226       problem, in general.  There are all sorts of issues to think about,
227       such as other processes changing data, trying to change data on the
228       slave, whether the destination and source are a master-master pair, and
229       much more.
230
231       In general, the safe way to do it is to change the data on the master,
232       and let the changes flow through replication to the slave like any
233       other changes.  However, this works only if it's possible to REPLACE
234       into the table on the master.  REPLACE works only if there's a unique
235       index on the table (otherwise it just acts like an ordinary INSERT).
236
237       If your table has unique keys, you should use the "--sync-to-master"
238       and/or "--replicate" options to sync a slave to its master.  This will
239       generally do the right thing.  When there is no unique key on the
240       table, there is no choice but to change the data on the slave, and mk-
241       table-sync will detect that you're trying to do so.  It will complain
242       and die unless you specify "--no-check-slave" (see
243       "--[no]check-slave").
244
245       If you're syncing a table without a primary or unique key on a master-
246       master pair, you must change the data on the destination server.
247       Therefore, you need to specify "--no-bin-log" for safety (see
248       "--[no]bin-log").  If you don't, the changes you make on the
249       destination server will replicate back to the source server and change
250       the data there!
251
252       The generally safe thing to do on a master-master pair is to use the
253       "--sync-to-master" option so you don't change the data on the
254       destination server.  You will also need to specify "--no-check-slave"
255       to keep mk-table-sync from complaining that it is changing data on a
256       slave.
257

ALGORITHMS

259       mk-table-sync has a generic data-syncing framework which uses different
260       algorithms to find differences.  The tool automatically chooses the
261       best algorithm for each table based on indexes, column types, and the
262       algorithm preferences specified by "--algorithms".  The following
263       algorithms are available, listed in their default order of preference:
264
265       Chunk
266           Finds an index whose first column is numeric (including date and
267           time types), and divides the column's range of values into chunks
268           of approximately "--chunk-size" rows.  Syncs a chunk at a time by
269           checksumming the entire chunk.  If the chunk differs on the source
270           and destination, checksums each chunk's rows individually to find
271           the rows that differ.
272
273           It is efficient when the column has sufficient cardinality to make
274           the chunks end up about the right size.
275
276           The initial per-chunk checksum is quite small and results in
277           minimal network traffic and memory consumption.  If a chunk's rows
278           must be examined, only the primary key columns and a checksum are
279           sent over the network, not the entire row.  If a row is found to be
280           different, the entire row will be fetched, but not before.
281
282       Nibble
283           Finds an index and ascends the index in fixed-size nibbles of
284           "--chunk-size" rows, using a non-backtracking algorithm (see mk-
285           archiver for more on this algorithm).  It is very similar to
286           "Chunk", but instead of pre-calculating the boundaries of each
287           piece of the table based on index cardinality, it uses "LIMIT" to
288           define each nibble's upper limit, and the previous nibble's upper
289           limit to define the lower limit.
290
291           It works in steps: one query finds the row that will define the
292           next nibble's upper boundary, and the next query checksums the
293           entire nibble.  If the nibble differs between the source and
294           destination, it examines the nibble row-by-row, just as "Chunk"
295           does.
296
297       GroupBy
298           Selects the entire table grouped by all columns, with a COUNT(*)
299           column added.  Compares all columns, and if they're the same,
300           compares the COUNT(*) column's value to determine how many rows to
301           insert or delete into the destination.  Works on tables with no
302           primary key or unique index.
303
304       Stream
305           Selects the entire table in one big stream and compares all
306           columns.  Selects all columns.  Much less efficient than the other
307           algorithms, but works when there is no suitable index for them to
308           use.
309
310       Future Plans
311           Possibilities for future algorithms are TempTable (what I
312           originally called bottom-up in earlier versions of this tool),
313           DrillDown (what I originally called top-down), and GroupByPrefix
314           (similar to how SqlYOG Job Agent works).  Each algorithm has
315           strengths and weaknesses.  If you'd like to implement your favorite
316           technique for finding differences between two sources of data on
317           possibly different servers, I'm willing to help.  The algorithms
318           adhere to a simple interface that makes it pretty easy to write
319           your own.
320

BIDIRECTIONAL SYNCING

322       Bidirectional syncing is a new, experimental feature.  To make it work
323       reliably there are a number of strict limitations:
324
325         * only works when syncing one server to other independent servers
326         * does not work in any way with replication
327         * requires that the table(s) are chunkable with the Chunk algorithm
328         * is not N-way, only bidirectional between two servers at a time
329         * does not handle DELETE changes
330
331       For example, suppose we have three servers: c1, r1, r2.  c1 is the
332       central server, a pseudo-master to the other servers (viz. r1 and r2
333       are not slaves to c1).  r1 and r2 are remote servers.  Rows in table
334       foo are updated and inserted on all three servers and we want to
335       synchronize all the changes between all the servers.  Table foo has
336       columns:
337
338         id    int PRIMARY KEY
339         ts    timestamp auto updated
340         name  varchar
341
342       Auto-increment offsets are used so that new rows from any server do not
343       create conflicting primary key (id) values.  In general, newer rows, as
344       determined by the ts column, take precedence when a same but differing
345       row is found during the bidirectional sync.  "Same but differing" means
346       that two rows have the same primary key (id) value but different values
347       for some other column, like the name column in this example.  Same but
348       differing conflicts are resolved by a "conflict".  A conflict compares
349       some column of the competing rows to determine a "winner".  The winning
350       row becomes the source and its values are used to update the other row.
351
352       There are subtle differences between three columns used to achieve
353       bidirectional syncing that you should be familiar with: chunk column
354       ("--chunk-column"), comparison column(s) ("--columns"), and conflict
355       column ("--conflict-column").  The chunk column is only used to chunk
356       the table; e.g. "WHERE id >= 5 AND id < 10".  Chunks are checksummed
357       and when chunk checksums reveal a difference, the tool selects the rows
358       in that chunk and checksums the "--columns" for each row.  If a column
359       checksum differs, the rows have one or more conflicting column values.
360       In a traditional unidirectional sync, the conflict is a moot point
361       because it can be resolved simply by updating the entire destination
362       row with the source row's values.  In a bidirectional sync, however,
363       the "--conflict-column" (in accordance with other "--conflict-*"
364       options list below) is compared to determine which row is "correct" or
365       "authoritative"; this row becomes the "source".
366
367       To sync all three servers completely, two runs of mk-table-sync are
368       required.  The first run syncs c1 and r1, then syncs c1 and r2
369       including any changes from r1.  At this point c1 and r2 are completely
370       in sync, but r1 is missing any changes from r2 because c1 didn't have
371       these changes when it and r1 were synced.  So a second run is needed
372       which syncs the servers in the same order, but this time when c1 and r1
373       are synced r1 gets r2's changes.
374
375       The tool does not sync N-ways, only bidirectionally between the first
376       DSN given on the command line and each subsequent DSN in turn.  So the
377       tool in this example would be ran twice like:
378
379         mk-table-sync --bidirectional h=c1 h=r1 h=r2
380
381       The "--bidirectional" option enables this feature and causes various
382       sanity checks to be performed.  You must specify other options that
383       tell mk-table-sync how to resolve conflicts for same but differing
384       rows.  These options are:
385
386         * L<"--conflict-column">
387         * L<"--conflict-comparison">
388         * L<"--conflict-value">
389         * L<"--conflict-threshold">
390         * L<"--conflict-error">  (optional)
391
392       Use "--print" to test this option before "--execute".  The printed SQL
393       statements will have comments saying on which host the statement would
394       be executed if you used "--execute".
395
396       Technical side note: the first DSN is always the "left" server and the
397       other DSNs are always the "right" server.  Since either server can
398       become the source or destination it's confusing to think of them as
399       "src" and "dst".  Therefore, they're generically referred to as left
400       and right.  It's easy to remember this because the first DSN is always
401       to the left of the other server DSNs on the command line.
402

EXIT STATUS

404       The following are the exit statuses (also called return values, or
405       return codes) when mk-table-sync finishes and exits.
406
407          STATUS  MEANING
408          ======  =======================================================
409          0       Success.
410          1       Internal error.
411          2       At least one table differed on the destination.
412          3       Combination of 1 and 2.
413

OPTIONS

415       Specify at least one of "--print", "--execute", or "--dry-run".
416
417       "--where" and "--replicate" are mutually exclusive.
418
419       This tool accepts additional command-line arguments.  Refer to the
420       "SYNOPSIS" and usage information for details.
421
422       --algorithms
423           type: string; default: Chunk,Nibble,GroupBy,Stream
424
425           Algorithm to use when comparing the tables, in order of preference.
426
427           For each table, mk-table-sync will check if the table can be synced
428           with the given algorithms in the order that they're given.  The
429           first algorithm that can sync the table is used.  See "ALGORITHMS".
430
431       --ask-pass
432           Prompt for a password when connecting to MySQL.
433
434       --bidirectional
435           Enable bidirectional sync between first and subsequent hosts.
436
437           See "BIDIRECTIONAL SYNCING" for more information.
438
439       --[no]bin-log
440           default: yes
441
442           Log to the binary log ("SET SQL_LOG_BIN=1").
443
444           Specifying "--no-bin-log" will "SET SQL_LOG_BIN=0".
445
446       --buffer-in-mysql
447           Instruct MySQL to buffer queries in its memory.
448
449           This option adds the "SQL_BUFFER_RESULT" option to the comparison
450           queries.  This causes MySQL to execute the queries and place them
451           in a temporary table internally before sending the results back to
452           mk-table-sync.  The advantage of this strategy is that mk-table-
453           sync can fetch rows as desired without using a lot of memory inside
454           the Perl process, while releasing locks on the MySQL table (to
455           reduce contention with other queries).  The disadvantage is that it
456           uses more memory on the MySQL server instead.
457
458           You probably want to leave "--[no]buffer-to-client" enabled too,
459           because buffering into a temp table and then fetching it all into
460           Perl's memory is probably a silly thing to do.  This option is most
461           useful for the GroupBy and Stream algorithms, which may fetch a lot
462           of data from the server.
463
464       --[no]buffer-to-client
465           default: yes
466
467           Fetch rows one-by-one from MySQL while comparing.
468
469           This option enables "mysql_use_result" which causes MySQL to hold
470           the selected rows on the server until the tool fetches them.  This
471           allows the tool to use less memory but may keep the rows locked on
472           the server longer.
473
474           If this option is disabled by specifying "--no-buffer-to-client"
475           then "mysql_store_result" is used which causes MySQL to send all
476           selected rows to the tool at once.  This may result in the results
477           "cursor" being held open for a shorter time on the server, but if
478           the tables are large, it could take a long time anyway, and use all
479           your memory.
480
481           For most non-trivial data sizes, you want to leave this option
482           enabled.
483
484           This option is disabled when "--bidirectional" is used.
485
486       --charset
487           short form: -A; type: string
488
489           Default character set.  If the value is utf8, sets Perl's binmode
490           on STDOUT to utf8, passes the mysql_enable_utf8 option to
491           DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL.  Any
492           other value sets binmode on STDOUT without the utf8 layer, and runs
493           SET NAMES after connecting to MySQL.
494
495       --[no]check-master
496           default: yes
497
498           With "--sync-to-master", try to verify that the detected master is
499           the real master.
500
501       --[no]check-privileges
502           default: yes
503
504           Check that user has all necessary privileges on source and
505           destination table.
506
507       --[no]check-slave
508           default: yes
509
510           Check whether the destination server is a slave.
511
512           If the destination server is a slave, it's generally unsafe to make
513           changes on it.  However, sometimes you have to; "--replace" won't
514           work unless there's a unique index, for example, so you can't make
515           changes on the master in that scenario.  By default mk-table-sync
516           will complain if you try to change data on a slave.  Specify
517           "--no-check-slave" to disable this check.  Use it at your own risk.
518
519       --[no]check-triggers
520           default: yes
521
522           Check that no triggers are defined on the destination table.
523
524           Triggers were introduced in MySQL v5.0.2, so for older versions
525           this option has no effect because triggers will not be checked.
526
527       --chunk-column
528           type: string
529
530           Chunk the table on this column.
531
532       --chunk-index
533           type: string
534
535           Chunk the table using this index.
536
537       --chunk-size
538           type: string; default: 1000
539
540           Number of rows or data size per chunk.
541
542           The size of each chunk of rows for the "Chunk" and "Nibble"
543           algorithms.  The size can be either a number of rows, or a data
544           size.  Data sizes are specified with a suffix of k=kibibytes,
545           M=mebibytes, G=gibibytes.  Data sizes are converted to a number of
546           rows by dividing by the average row length.
547
548       --columns
549           short form: -c; type: array
550
551           Compare this comma-separated list of columns.
552
553       --config
554           type: Array
555
556           Read this comma-separated list of config files; if specified, this
557           must be the first option on the command line.
558
559       --conflict-column
560           type: string
561
562           Compare this column when rows conflict during a "--bidirectional"
563           sync.
564
565           When a same but differing row is found the value of this column
566           from each row is compared according to "--conflict-comparison",
567           "--conflict-value" and "--conflict-threshold" to determine which
568           row has the correct data and becomes the source.  The column can be
569           any type for which there is an appropriate "--conflict-comparison"
570           (this is almost all types except, for example, blobs).
571
572           This option only works with "--bidirectional".  See "BIDIRECTIONAL
573           SYNCING" for more information.
574
575       --conflict-comparison
576           type: string
577
578           Choose the "--conflict-column" with this property as the source.
579
580           The option affects how the "--conflict-column" values from the
581           conflicting rows are compared.  Possible comparisons are one of
582           these MAGIC_comparisons:
583
584             newest|oldest|greatest|least|equals|matches
585
586             COMPARISON  CHOOSES ROW WITH
587             ==========  =========================================================
588             newest      Newest temporal L<"--conflict-column"> value
589             oldest      Oldest temporal L<"--conflict-column"> value
590             greatest    Greatest numerical L<"--conflict-column"> value
591             least       Least numerical L<"--conflict-column"> value
592             equals      L<"--conflict-column"> value equal to L<"--conflict-value">
593             matches     L<"--conflict-column"> value matching Perl regex pattern
594                         L<"--conflict-value">
595
596           This option only works with "--bidirectional".  See "BIDIRECTIONAL
597           SYNCING" for more information.
598
599       --conflict-error
600           type: string; default: warn
601
602           How to report unresolvable conflicts and conflict errors
603
604           This option changes how the user is notified when a conflict cannot
605           be resolved or causes some kind of error.  Possible values are:
606
607             * warn: Print a warning to STDERR about the unresolvable conflict
608             * die:  Die, stop syncing, and print a warning to STDERR
609
610           This option only works with "--bidirectional".  See "BIDIRECTIONAL
611           SYNCING" for more information.
612
613       --conflict-threshold
614           type: string
615
616           Amount by which one "--conflict-column" must exceed the other.
617
618           The "--conflict-threshold" prevents a conflict from being resolved
619           if the absolute difference between the two "--conflict-column"
620           values is less than this amount.  For example, if two
621           "--conflict-column" have timestamp values "2009-12-01 12:00:00" and
622           "2009-12-01 12:05:00" the difference is 5 minutes.  If
623           "--conflict-threshold" is set to "5m" the conflict will be
624           resolved, but if "--conflict-threshold" is set to "6m" the conflict
625           will fail to resolve because the difference is not greater than or
626           equal to 6 minutes.  In this latter case, "--conflict-error" will
627           report the failure.
628
629           This option only works with "--bidirectional".  See "BIDIRECTIONAL
630           SYNCING" for more information.
631
632       --conflict-value
633           type: string
634
635           Use this value for certain "--conflict-comparison".
636
637           This option gives the value for "equals" and "matches"
638           "--conflict-comparison".
639
640           This option only works with "--bidirectional".  See "BIDIRECTIONAL
641           SYNCING" for more information.
642
643       --databases
644           short form: -d; type: hash
645
646           Sync only this comma-separated list of databases.
647
648           A common request is to sync tables from one database with tables
649           from another database on the same or different server.  This is not
650           yet possible.  "--databases" will not do it, and you can't do it
651           with the D part of the DSN either because in the absence of a table
652           name it assumes the whole server should be synced and the D part
653           controls only the connection's default database.
654
655       --defaults-file
656           short form: -F; type: string
657
658           Only read mysql options from the given file.  You must give an
659           absolute pathname.
660
661       --dry-run
662           Analyze, decide the sync algorithm to use, print and exit.
663
664           Implies "--verbose" so you can see the results.  The results are in
665           the same output format that you'll see from actually running the
666           tool, but there will be zeros for rows affected.  This is because
667           the tool actually executes, but stops before it compares any data
668           and just returns zeros.  The zeros do not mean there are no changes
669           to be made.
670
671       --engines
672           short form: -e; type: hash
673
674           Sync only this comma-separated list of storage engines.
675
676       --execute
677           Execute queries to make the tables have identical data.
678
679           This option makes mk-table-sync actually sync table data by
680           executing all the queries that it created to resolve table
681           differences.  Therefore, the tables will be changed!  And unless
682           you also specify "--verbose", the changes will be made silently.
683           If this is not what you want, see "--print" or "--dry-run".
684
685       --explain-hosts
686           Print connection information and exit.
687
688           Print out a list of hosts to which mk-table-sync will connect, with
689           all the various connection options, and exit.
690
691       --float-precision
692           type: int
693
694           Precision for "FLOAT" and "DOUBLE" number-to-string conversion.
695           Causes FLOAT and DOUBLE values to be rounded to the specified
696           number of digits after the decimal point, with the ROUND() function
697           in MySQL.  This can help avoid checksum mismatches due to different
698           floating-point representations of the same values on different
699           MySQL versions and hardware.  The default is no rounding; the
700           values are converted to strings by the CONCAT() function, and MySQL
701           chooses the string representation.  If you specify a value of 2,
702           for example, then the values 1.008 and 1.009 will be rounded to
703           1.01, and will checksum as equal.
704
705       --[no]foreign-key-checks
706           default: yes
707
708           Enable foreign key checks ("SET FOREIGN_KEY_CHECKS=1").
709
710           Specifying "--no-foreign-key-checks" will "SET
711           FOREIGN_KEY_CHECKS=0".
712
713       --function
714           type: string
715
716           Which hash function you'd like to use for checksums.
717
718           The default is "CRC32".  Other good choices include "MD5" and
719           "SHA1".  If you have installed the "FNV_64" user-defined function,
720           "mk-table-sync" will detect it and prefer to use it, because it is
721           much faster than the built-ins.  You can also use MURMUR_HASH if
722           you've installed that user-defined function.  Both of these are
723           distributed with Maatkit.  See mk-table-checksum for more
724           information and benchmarks.
725
726       --help
727           Show help and exit.
728
729       --[no]hex-blob
730           default: yes
731
732           "HEX()" "BLOB", "TEXT" and "BINARY" columns.
733
734           When row data from the source is fetched to create queries to sync
735           the data (i.e. the queries seen with "--print" and executed by
736           "--execute"), binary columns are wrapped in HEX() so the binary
737           data does not produce an invalid SQL statement.  You can disable
738           this option but you probably shouldn't.
739
740       --host
741           short form: -h; type: string
742
743           Connect to host.
744
745       --ignore-columns
746           type: Hash
747
748           Ignore this comma-separated list of column names in comparisons.
749
750           This option causes columns not to be compared.  However, if a row
751           is determined to differ between tables, all columns in that row
752           will be synced, regardless.  (It is not currently possible to
753           exclude columns from the sync process itself, only from the
754           comparison.)
755
756       --ignore-databases
757           type: Hash
758
759           Ignore this comma-separated list of databases.
760
761       --ignore-engines
762           type: Hash; default: FEDERATED,MRG_MyISAM
763
764           Ignore this comma-separated list of storage engines.
765
766       --ignore-tables
767           type: Hash
768
769           Ignore this comma-separated list of tables.
770
771           Table names may be qualified with the database name.
772
773       --[no]index-hint
774           default: yes
775
776           Add FORCE/USE INDEX hints to the chunk and row queries.
777
778           By default "mk-table-sync" adds a FORCE/USE INDEX hint to each SQL
779           statement to coerce MySQL into using the index chosen by the sync
780           algorithm or specified by "--chunk-index".  This is usually a good
781           thing, but in rare cases the index may not be the best for the
782           query so you can suppress the index hint by specifying
783           "--no-index-hint" and let MySQL choose the index.
784
785           This does not affect the queries printed by "--print"; it only
786           affects the chunk and row queries that "mk-table-sync" uses to
787           select and compare rows.
788
789       --lock
790           type: int
791
792           Lock tables: 0=none, 1=per sync cycle, 2=per table, or 3=globally.
793
794           This uses "LOCK TABLES".  This can help prevent tables being
795           changed while you're examining them.  The possible values are as
796           follows:
797
798             VALUE  MEANING
799             =====  =======================================================
800             0      Never lock tables.
801             1      Lock and unlock one time per sync cycle (as implemented
802                    by the syncing algorithm).  This is the most granular
803                    level of locking available.  For example, the Chunk
804                    algorithm will lock each chunk of C<N> rows, and then
805                    unlock them if they are the same on the source and the
806                    destination, before moving on to the next chunk.
807             2      Lock and unlock before and after each table.
808             3      Lock and unlock once for every server (DSN) synced, with
809                    C<FLUSH TABLES WITH READ LOCK>.
810
811           A replication slave is never locked if "--replicate" or
812           "--sync-to-master" is specified, since in theory locking the table
813           on the master should prevent any changes from taking place.  (You
814           are not changing data on your slave, right?)  If "--wait" is given,
815           the master (source) is locked and then the tool waits for the slave
816           to catch up to the master before continuing.
817
818           If "--transaction" is specified, "LOCK TABLES" is not used.
819           Instead, lock and unlock are implemented by beginning and
820           committing transactions.  The exception is if "--lock" is 3.
821
822           If "--no-transaction" is specified, then "LOCK TABLES" is used for
823           any value of "--lock". See "--[no]transaction".
824
825       --lock-and-rename
826           Lock the source and destination table, sync, then swap names.  This
827           is useful as a less-blocking ALTER TABLE, once the tables are
828           reasonably in sync with each other (which you may choose to
829           accomplish via any number of means, including dump and reload or
830           even something like mk-archiver).  It requires exactly two DSNs and
831           assumes they are on the same server, so it does no waiting for
832           replication or the like.  Tables are locked with LOCK TABLES.
833
834       --password
835           short form: -p; type: string
836
837           Password to use when connecting.
838
839       --pid
840           type: string
841
842           Create the given PID file.  The file contains the process ID of the
843           script.  The PID file is removed when the script exits.  Before
844           starting, the script checks if the PID file already exists.  If it
845           does not, then the script creates and writes its own PID to it.  If
846           it does, then the script checks the following: if the file contains
847           a PID and a process is running with that PID, then the script dies;
848           or, if there is no process running with that PID, then the script
849           overwrites the file with its own PID and starts; else, if the file
850           contains no PID, then the script dies.
851
852       --port
853           short form: -P; type: int
854
855           Port number to use for connection.
856
857       --print
858           Print queries that will resolve differences.
859
860           If you don't trust "mk-table-sync", or just want to see what it
861           will do, this is a good way to be safe.  These queries are valid
862           SQL and you can run them yourself if you want to sync the tables
863           manually.
864
865       --recursion-method
866           type: string
867
868           Preferred recursion method used to find slaves.
869
870           Possible methods are:
871
872             METHOD       USES
873             ===========  ================
874             processlist  SHOW PROCESSLIST
875             hosts        SHOW SLAVE HOSTS
876
877           The processlist method is preferred because SHOW SLAVE HOSTS is not
878           reliable.  However, the hosts method is required if the server uses
879           a non-standard port (not 3306).  Usually mk-table-sync does the
880           right thing and finds the slaves, but you may give a preferred
881           method and it will be used first.  If it doesn't find any slaves,
882           the other methods will be tried.
883
884       --replace
885           Write all "INSERT" and "UPDATE" statements as "REPLACE".
886
887           This is automatically switched on as needed when there are unique
888           index violations.
889
890       --replicate
891           type: string
892
893           Sync tables listed as different in this table.
894
895           Specifies that "mk-table-sync" should examine the specified table
896           to find data that differs.  The table is exactly the same as the
897           argument of the same name to mk-table-checksum.  That is, it
898           contains records of which tables (and ranges of values) differ
899           between the master and slave.
900
901           For each table and range of values that shows differences between
902           the master and slave, "mk-table-checksum" will sync that table,
903           with the appropriate "WHERE" clause, to its master.
904
905           This automatically sets "--wait" to 60 and causes changes to be
906           made on the master instead of the slave.
907
908           If "--sync-to-master" is specified, the tool will assume the server
909           you specified is the slave, and connect to the master as usual to
910           sync.
911
912           Otherwise, it will try to use "SHOW PROCESSLIST" to find slaves of
913           the server you specified.  If it is unable to find any slaves via
914           "SHOW PROCESSLIST", it will inspect "SHOW SLAVE HOSTS" instead.
915           You must configure each slave's "report-host", "report-port" and
916           other options for this to work right.  After finding slaves, it
917           will inspect the specified table on each slave to find data that
918           needs to be synced, and sync it.
919
920           The tool examines the master's copy of the table first, assuming
921           that the master is potentially a slave as well.  Any table that
922           shows differences there will NOT be synced on the slave(s).  For
923           example, suppose your replication is set up as A->B, B->C, B->D.
924           Suppose you use this argument and specify server B.  The tool will
925           examine server B's copy of the table.  If it looks like server B's
926           data in table "test.tbl1" is different from server A's copy, the
927           tool will not sync that table on servers C and D.
928
929       --set-vars
930           type: string; default: wait_timeout=10000
931
932           Set these MySQL variables.  Immediately after connecting to MySQL,
933           this string will be appended to SET and executed.
934
935       --socket
936           short form: -S; type: string
937
938           Socket file to use for connection.
939
940       --sync-to-master
941           Treat the DSN as a slave and sync it to its master.
942
943           Treat the server you specified as a slave.  Inspect "SHOW SLAVE
944           STATUS", connect to the server's master, and treat the master as
945           the source and the slave as the destination.  Causes changes to be
946           made on the master.  Sets "--wait" to 60 by default, sets "--lock"
947           to 1 by default, and disables "--[no]transaction" by default.  See
948           also "--replicate", which changes this option's behavior.
949
950       --tables
951           short form: -t; type: hash
952
953           Sync only this comma-separated list of tables.
954
955           Table names may be qualified with the database name.
956
957       --timeout-ok
958           Keep going if "--wait" fails.
959
960           If you specify "--wait" and the slave doesn't catch up to the
961           master's position before the wait times out, the default behavior
962           is to abort.  This option makes the tool keep going anyway.
963           Warning: if you are trying to get a consistent comparison between
964           the two servers, you probably don't want to keep going after a
965           timeout.
966
967       --[no]transaction
968           Use transactions instead of "LOCK TABLES".
969
970           The granularity of beginning and committing transactions is
971           controlled by "--lock".  This is enabled by default, but since
972           "--lock" is disabled by default, it has no effect.
973
974           Most options that enable locking also disable transactions by
975           default, so if you want to use transactional locking (via "LOCK IN
976           SHARE MODE" and "FOR UPDATE", you must specify "--transaction"
977           explicitly.
978
979           If you don't specify "--transaction" explicitly "mk-table-sync"
980           will decide on a per-table basis whether to use transactions or
981           table locks.  It currently uses transactions on InnoDB tables, and
982           table locks on all others.
983
984           If "--no-transaction" is specified, then "mk-table-sync" will not
985           use transactions at all (not even for InnoDB tables) and locking is
986           controlled by "--lock".
987
988           When enabled, either explicitly or implicitly, the transaction
989           isolation level is set "REPEATABLE READ" and transactions are
990           started "WITH CONSISTENT SNAPSHOT".
991
992       --trim
993           "TRIM()" "VARCHAR" columns in "BIT_XOR" and "ACCUM" modes.  Helps
994           when comparing MySQL 4.1 to >= 5.0.
995
996           This is useful when you don't care about the trailing space
997           differences between MySQL versions which vary in their handling of
998           trailing spaces. MySQL 5.0 and later all retain trailing spaces in
999           "VARCHAR", while previous versions would remove them.
1000
1001       --[no]unique-checks
1002           default: yes
1003
1004           Enable unique key checks ("SET UNIQUE_CHECKS=1").
1005
1006           Specifying "--no-unique-checks" will "SET UNIQUE_CHECKS=0".
1007
1008       --user
1009           short form: -u; type: string
1010
1011           User for login if not current user.
1012
1013       --verbose
1014           short form: -v; cumulative: yes
1015
1016           Print results of sync operations.
1017
1018           See "OUTPUT" for more details about the output.
1019
1020       --version
1021           Show version and exit.
1022
1023       --wait
1024           short form: -w; type: time
1025
1026           How long to wait for slaves to catch up to their master.
1027
1028           Make the master wait for the slave to catch up in replication
1029           before comparing the tables.  The value is the number of seconds to
1030           wait before timing out (see also "--timeout-ok").  Sets "--lock" to
1031           1 and "--[no]transaction" to 0 by default.  If you see an error
1032           such as the following,
1033
1034             MASTER_POS_WAIT returned -1
1035
1036           It means the timeout was exceeded and you need to increase it.
1037
1038           The default value of this option is influenced by other options.
1039           To see what value is in effect, run with "--help".
1040
1041           To disable waiting entirely (except for locks), specify "--wait" 0.
1042           This helps when the slave is lagging on tables that are not being
1043           synced.
1044
1045       --where
1046           type: string
1047
1048           "WHERE" clause to restrict syncing to part of the table.
1049
1050       --[no]zero-chunk
1051           default: yes
1052
1053           Add a chunk for rows with zero or zero-equivalent values.  The only
1054           has an effect when "--chunk-size" is specified.  The purpose of the
1055           zero chunk is to capture a potentially large number of zero values
1056           that would imbalance the size of the first chunk.  For example, if
1057           a lot of negative numbers were inserted into an unsigned integer
1058           column causing them to be stored as zeros, then these zero values
1059           are captured by the zero chunk instead of the first chunk and all
1060           its non-zero values.
1061

DSN OPTIONS

1063       These DSN options are used to create a DSN.  Each option is given like
1064       "option=value".  The options are case-sensitive, so P and p are not the
1065       same option.  There cannot be whitespace before or after the "=" and if
1066       the value contains whitespace it must be quoted.  DSN options are
1067       comma-separated.  See the maatkit manpage for full details.
1068
1069       ·   A
1070
1071           dsn: charset; copy: yes
1072
1073           Default character set.
1074
1075       ·   D
1076
1077           dsn: database; copy: yes
1078
1079           Database containing the table to be synced.
1080
1081       ·   F
1082
1083           dsn: mysql_read_default_file; copy: yes
1084
1085           Only read default options from the given file
1086
1087       ·   h
1088
1089           dsn: host; copy: yes
1090
1091           Connect to host.
1092
1093       ·   p
1094
1095           dsn: password; copy: yes
1096
1097           Password to use when connecting.
1098
1099       ·   P
1100
1101           dsn: port; copy: yes
1102
1103           Port number to use for connection.
1104
1105       ·   S
1106
1107           dsn: mysql_socket; copy: yes
1108
1109           Socket file to use for connection.
1110
1111       ·   t
1112
1113           copy: yes
1114
1115           Table to be synced.
1116
1117       ·   u
1118
1119           dsn: user; copy: yes
1120
1121           User for login if not current user.
1122

DOWNLOADING

1124       You can download Maatkit from Google Code at
1125       <http://code.google.com/p/maatkit/>, or you can get any of the tools
1126       easily with a command like the following:
1127
1128          wget http://www.maatkit.org/get/toolname
1129          or
1130          wget http://www.maatkit.org/trunk/toolname
1131
1132       Where "toolname" can be replaced with the name (or fragment of a name)
1133       of any of the Maatkit tools.  Once downloaded, they're ready to run; no
1134       installation is needed.  The first URL gets the latest released version
1135       of the tool, and the second gets the latest trunk code from Subversion.
1136

ENVIRONMENT

1138       The environment variable MKDEBUG enables verbose debugging output in
1139       all of the Maatkit tools:
1140
1141          MKDEBUG=1 mk-....
1142

SYSTEM REQUIREMENTS

1144       You need Perl, DBI, DBD::mysql, and some core packages that ought to be
1145       installed in any reasonably new version of Perl.
1146

BUGS

1148       For a list of known bugs see:
1149       <http://www.maatkit.org/bugs/mk-table-sync>.
1150
1151       Please use Google Code Issues and Groups to report bugs or request
1152       support: <http://code.google.com/p/maatkit/>.  You can also join
1153       #maatkit on Freenode to discuss Maatkit.
1154
1155       Please include the complete command-line used to reproduce the problem
1156       you are seeing, the version of all MySQL servers involved, the complete
1157       output of the tool when run with "--version", and if possible,
1158       debugging output produced by running with the "MKDEBUG=1" environment
1159       variable.
1160

COPYRIGHT, LICENSE AND WARRANTY

1162       This program is copyright 2007-2011 Baron Schwartz.  Feedback and
1163       improvements are welcome.
1164
1165       THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
1166       WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
1167       MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1168
1169       This program is free software; you can redistribute it and/or modify it
1170       under the terms of the GNU General Public License as published by the
1171       Free Software Foundation, version 2; OR the Perl Artistic License.  On
1172       UNIX and similar systems, you can issue `man perlgpl' or `man
1173       perlartistic' to read these licenses.
1174
1175       You should have received a copy of the GNU General Public License along
1176       with this program; if not, write to the Free Software Foundation, Inc.,
1177       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.
1178

AUTHOR

1180       Baron Schwartz
1181

ABOUT MAATKIT

1183       This tool is part of Maatkit, a toolkit for power users of MySQL.
1184       Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
1185       primary code contributors.  Both are employed by Percona.  Financial
1186       support for Maatkit development is primarily provided by Percona and
1187       its clients.
1188

HISTORY AND ACKNOWLEDGMENTS

1190       My work is based in part on Giuseppe Maxia's work on distributed
1191       databases, <http://www.sysadminmag.com/articles/2004/0408/> and code
1192       derived from that article.  There is more explanation, and a link to
1193       the code, at <http://www.perlmonks.org/?node_id=381053>.
1194
1195       Another programmer extended Maxia's work even further.  Fabien Coelho
1196       changed and generalized Maxia's technique, introducing symmetry and
1197       avoiding some problems that might have caused too-frequent checksum
1198       collisions.  This work grew into pg_comparator,
1199       <http://www.coelho.net/pg_comparator/>.  Coelho also explained the
1200       technique further in a paper titled "Remote Comparison of Database
1201       Tables" (<http://cri.ensmp.fr/classement/doc/A-375.pdf>).
1202
1203       This existing literature mostly addressed how to find the differences
1204       between the tables, not how to resolve them once found.  I needed a
1205       tool that would not only find them efficiently, but would then resolve
1206       them.  I first began thinking about how to improve the technique
1207       further with my article <http://tinyurl.com/mysql-data-diff-algorithm>,
1208       where I discussed a number of problems with the Maxia/Coelho "bottom-
1209       up" algorithm.  After writing that article, I began to write this tool.
1210       I wanted to actually implement their algorithm with some improvements
1211       so I was sure I understood it completely.  I discovered it is not what
1212       I thought it was, and is considerably more complex than it appeared to
1213       me at first.  Fabien Coelho was kind enough to address some questions
1214       over email.
1215
1216       The first versions of this tool implemented a version of the
1217       Coelho/Maxia algorithm, which I called "bottom-up", and my own, which I
1218       called "top-down."  Those algorithms are considerably more complex than
1219       the current algorithms and I have removed them from this tool, and may
1220       add them back later.  The improvements to the bottom-up algorithm are
1221       my original work, as is the top-down algorithm.  The techniques to
1222       actually resolve the differences are also my own work.
1223
1224       Another tool that can synchronize tables is the SQLyog Job Agent from
1225       webyog.  Thanks to Rohit Nadhani, SJA's author, for the conversations
1226       about the general techniques.  There is a comparison of mk-table-sync
1227       and SJA at <http://tinyurl.com/maatkit-vs-sqlyog>
1228
1229       Thanks to the following people and organizations for helping in many
1230       ways:
1231
1232       The Rimm-Kaufman Group <http://www.rimmkaufman.com/>, MySQL AB
1233       <http://www.mysql.com/>, Blue Ridge InternetWorks
1234       <http://www.briworks.com/>, Percona <http://www.percona.com/>, Fabien
1235       Coelho, Giuseppe Maxia and others at MySQL AB, Kristian Koehntopp
1236       (MySQL AB), Rohit Nadhani (WebYog), The helpful monks at Perlmonks, And
1237       others too numerous to mention.
1238

VERSION

1240       This manual page documents Ver 1.0.31 Distrib 7540 $Revision: 7476 $.
1241
1242
1243
1244perl v5.28.1                      2011-06-08                  MK-TABLE-SYNC(1)
Impressum