1MK-TABLE-SYNC(1)      User Contributed Perl Documentation     MK-TABLE-SYNC(1)


6       mk-table-sync - Synchronize MySQL table data efficiently.


9       Usage: mk-table-sync [OPTION...] DSN [DSN...]
11       mk-table-sync synchronizes data efficiently between MySQL tables.
13       This tool changes data, so for maximum safety, you should back up your
14       data before you use it.  When synchronizing a server that is a
15       replication slave with the --replicate or --sync-to-master methods, it
16       always makes the changes on the replication master, never the
17       replication slave directly.  This is in general the only safe way to
18       bring a replica back in sync with its master; changes to the replica
19       are usually the source of the problems in the first place.  However,
20       the changes it makes on the master should be no-op changes that set the
21       data to their current values, and actually affect only the replica.
22       Please read the detailed documentation that follows to learn more about
23       this.
25       Sync db.tbl on host1 to host2:
27         mk-table-sync --execute h=host1,D=db,t=tbl h=host2
29       Sync all tables on host1 to host2 and host3:
31         mk-table-sync --execute host1 host2 host3
33       Make slave1 have the same data as its replication master:
35         mk-table-sync --execute --sync-to-master slave1
37       Resolve differences that mk-table-checksum found on all slaves of
38       master1:
40         mk-table-sync --execute --replicate test.checksum master1
42       Same as above but only resolve differences on slave1:
44         mk-table-sync --execute --replicate test.checksum \
45           --sync-to-master slave1
47       Sync master2 in a master-master replication configuration, where
48       master2's copy of db.tbl is known or suspected to be incorrect:
50         mk-table-sync --execute --sync-to-master h=master2,D=db,t=tbl
52       Note that in the master-master configuration, the following will NOT do
53       what you want, because it will make changes directly on master2, which
54       will then flow through replication and change master1's data:
56         # Don't do this in a master-master setup!
57         mk-table-sync --execute h=master1,D=db,t=tbl master2


60       The following section is included to inform users about the potential
61       risks, whether known or unknown, of using this tool.  The two main
62       categories of risks are those created by the nature of the tool (e.g.
63       read-only tools vs. read-write tools) and those created by bugs.
65       With great power comes great responsibility!  This tool changes data,
66       so it is a good idea to back up your data.  It is also very powerful,
67       which means it is very complex, so you should run it with the
68       "--dry-run" option to see what it will do, until you're familiar with
69       its operation.  If you want to see which rows are different, without
70       changing any data, use "--print" instead of "--execute".
72       Be careful when using mk-table-sync in any master-master setup.
73       Master-master replication is inherently tricky, and it's easy to make
74       mistakes.  You need to be sure you're using the tool correctly for
75       master-master replication.  See the "SYNOPSIS" for the overview of the
76       correct usage.
78       Also be careful with tables that have foreign key constraints with "ON
79       DELETE" or "ON UPDATE" definitions because these might cause unintended
80       changes on the child tables.
82       In general, this tool is best suited when your tables have a primary
83       key or unique index.  Although it can synchronize data in tables
84       lacking a primary key or unique index, it might be best to synchronize
85       that data by another means.
87       At the time of this release, there is a potential bug using
88       "--lock-and-rename" with MySQL 5.1, a bug detecting certain
89       differences, a bug using ROUND() across different platforms, and a bug
90       mixing collations.
92       The authoritative source for updated information is always the online
93       issue tracking system.  Issues that affect this tool will be marked as
94       such.  You can see a list of such issues at the following URL:
95       <http://www.maatkit.org/bugs/mk-table-sync>.
97       See also "BUGS" for more information on filing bugs and getting help.


100       mk-table-sync does one-way and bidirectional synchronization of table
101       data.  It does not synchronize table structures, indexes, or any other
102       schema objects.  The following describes one-way synchronization.
103       "BIDIRECTIONAL SYNCING" is described later.
105       This tool is complex and functions in several different ways.  To use
106       it safely and effectively, you should understand three things: the
107       purpose of "--replicate", finding differences, and specifying hosts.
108       These three concepts are closely related and determine how the tool
109       will run.  The following is the abbreviated logic:
111          if DSN has a t part, sync only that table:
112             if 1 DSN:
113                if --sync-to-master:
114                   The DSN is a slave.  Connect to its master and sync.
115             if more than 1 DSN:
116                The first DSN is the source.  Sync each DSN in turn.
117          else if --replicate:
118             if --sync-to-master:
119                The DSN is a slave.  Connect to its master, find records
120                of differences, and fix.
121             else:
122                The DSN is the master.  Find slaves and connect to each,
123                find records of differences, and fix.
124          else:
125             if only 1 DSN and --sync-to-master:
126                The DSN is a slave.  Connect to its master, find tables and
127                filter with --databases etc, and sync each table to the master.
128             else:
129                find tables, filtering with --databases etc, and sync each
130                DSN to the first.
132       mk-table-sync can run in one of two ways: with "--replicate" or
133       without.  The default is to run without "--replicate" which causes mk-
134       table-sync to automatically find differences efficiently with one of
135       several algorithms (see "ALGORITHMS").  Alternatively, the value of
136       "--replicate", if specified, causes mk-table-sync to use the
137       differences already found by having previously ran mk-table-checksum
138       with its own "--replicate" option.  Strictly speaking, you don't need
139       to use "--replicate" because mk-table-sync can find differences, but
140       many people use "--replicate" if, for example, they checksum regularly
141       using mk-table-checksum then fix differences as needed with mk-table-
142       sync.  If you're unsure, read each tool's documentation carefully and
143       decide for yourself, or consult with an expert.
145       Regardless of whether "--replicate" is used or not, you need to specify
146       which hosts to sync.  There are two ways: with "--sync-to-master" or
147       without.  Specifying "--sync-to-master" makes mk-table-sync expect one
148       and only slave DSN on the command line.  The tool will automatically
149       discover the slave's master and sync it so that its data is the same as
150       its master.  This is accomplished by making changes on the master which
151       then flow through replication and update the slave to resolve its
152       differences.  Be careful though: although this option specifies and
153       syncs a single slave, if there are other slaves on the same master,
154       they will receive via replication the changes intended for the slave
155       that you're trying to sync.
157       Alternatively, if you do not specify "--sync-to-master", the first DSN
158       given on the command line is the source host.  There is only ever one
159       source host.  If you do not also specify "--replicate", then you must
160       specify at least one other DSN as the destination host.  There can be
161       one or more destination hosts.  Source and destination hosts must be
162       independent; they cannot be in the same replication topology.  mk-
163       table-sync will die with an error if it detects that a destination host
164       is a slave because changes are written directly to destination hosts
165       (and it's not safe to write directly to slaves).  Or, if you specify
166       "--replicate" (but not "--sync-to-master") then mk-table-sync expects
167       one and only one master DSN on the command line.  The tool will
168       automatically discover all the master's slaves and sync them to the
169       master.  This is the only way to sync several (all) slaves at once
170       (because "--sync-to-master" only specifies one slave).
172       Each host on the command line is specified as a DSN.  The first DSN (or
173       only DSN for cases like "--sync-to-master") provides default values for
174       other DSNs, whether those other DSNs are specified on the command line
175       or auto-discovered by the tool.  So in this example,
177         mk-table-sync --execute h=host1,u=msandbox,p=msandbox h=host2
179       the host2 DSN inherits the "u" and "p" DSN parts from the host1 DSN.
180       Use the "--explain-hosts" option to see how mk-table-sync will
181       interpret the DSNs given on the command line.


184       If you specify the "--verbose" option, you'll see information about the
185       differences between the tables.  There is one row per table.  Each
186       server is printed separately.  For example,
188         # Syncing h=host1,D=test,t=test1
190         #      0       0      3      0 Chunk     13:00:00 13:00:17 2    test.test1
192       Table test.test1 on host1 required 3 "INSERT" statements to synchronize
193       and it used the Chunk algorithm (see "ALGORITHMS").  The sync operation
194       for this table started at 13:00:00 and ended 17 seconds later (times
195       taken from NOW() on the source host).  Because differences were found,
196       its "EXIT STATUS" was 2.
198       If you specify the "--print" option, you'll see the actual SQL
199       statements that the script uses to synchronize the table if "--execute"
200       is also specified.
202       If you want to see the SQL statements that mk-table-sync is using to
203       select chunks, nibbles, rows, etc., then specify "--print" once and
204       "--verbose" twice.  Be careful though: this can print a lot of SQL
205       statements.
207       There are cases where no combination of "INSERT", "UPDATE" or "DELETE"
208       statements can resolve differences without violating some unique key.
209       For example, suppose there's a primary key on column a and a unique key
210       on column b.  Then there is no way to sync these two tables with
211       straightforward UPDATE statements:
213        +---+---+  +---+---+
214        | a | b |  | a | b |
215        +---+---+  +---+---+
216        | 1 | 2 |  | 1 | 1 |
217        | 2 | 1 |  | 2 | 2 |
218        +---+---+  +---+---+
220       The tool rewrites queries to "DELETE" and "REPLACE" in this case.  This
221       is automatically handled after the first index violation, so you don't
222       have to worry about it.


225       Synchronizing a replication master and slave safely is a non-trivial
226       problem, in general.  There are all sorts of issues to think about,
227       such as other processes changing data, trying to change data on the
228       slave, whether the destination and source are a master-master pair, and
229       much more.
231       In general, the safe way to do it is to change the data on the master,
232       and let the changes flow through replication to the slave like any
233       other changes.  However, this works only if it's possible to REPLACE
234       into the table on the master.  REPLACE works only if there's a unique
235       index on the table (otherwise it just acts like an ordinary INSERT).
237       If your table has unique keys, you should use the "--sync-to-master"
238       and/or "--replicate" options to sync a slave to its master.  This will
239       generally do the right thing.  When there is no unique key on the
240       table, there is no choice but to change the data on the slave, and mk-
241       table-sync will detect that you're trying to do so.  It will complain
242       and die unless you specify "--no-check-slave" (see
243       "--[no]check-slave").
245       If you're syncing a table without a primary or unique key on a master-
246       master pair, you must change the data on the destination server.
247       Therefore, you need to specify "--no-bin-log" for safety (see
248       "--[no]bin-log").  If you don't, the changes you make on the
249       destination server will replicate back to the source server and change
250       the data there!
252       The generally safe thing to do on a master-master pair is to use the
253       "--sync-to-master" option so you don't change the data on the
254       destination server.  You will also need to specify "--no-check-slave"
255       to keep mk-table-sync from complaining that it is changing data on a
256       slave.


259       mk-table-sync has a generic data-syncing framework which uses different
260       algorithms to find differences.  The tool automatically chooses the
261       best algorithm for each table based on indexes, column types, and the
262       algorithm preferences specified by "--algorithms".  The following
263       algorithms are available, listed in their default order of preference:
265       Chunk
266           Finds an index whose first column is numeric (including date and
267           time types), and divides the column's range of values into chunks
268           of approximately "--chunk-size" rows.  Syncs a chunk at a time by
269           checksumming the entire chunk.  If the chunk differs on the source
270           and destination, checksums each chunk's rows individually to find
271           the rows that differ.
273           It is efficient when the column has sufficient cardinality to make
274           the chunks end up about the right size.
276           The initial per-chunk checksum is quite small and results in
277           minimal network traffic and memory consumption.  If a chunk's rows
278           must be examined, only the primary key columns and a checksum are
279           sent over the network, not the entire row.  If a row is found to be
280           different, the entire row will be fetched, but not before.
282       Nibble
283           Finds an index and ascends the index in fixed-size nibbles of
284           "--chunk-size" rows, using a non-backtracking algorithm (see mk-
285           archiver for more on this algorithm).  It is very similar to
286           "Chunk", but instead of pre-calculating the boundaries of each
287           piece of the table based on index cardinality, it uses "LIMIT" to
288           define each nibble's upper limit, and the previous nibble's upper
289           limit to define the lower limit.
291           It works in steps: one query finds the row that will define the
292           next nibble's upper boundary, and the next query checksums the
293           entire nibble.  If the nibble differs between the source and
294           destination, it examines the nibble row-by-row, just as "Chunk"
295           does.
297       GroupBy
298           Selects the entire table grouped by all columns, with a COUNT(*)
299           column added.  Compares all columns, and if they're the same,
300           compares the COUNT(*) column's value to determine how many rows to
301           insert or delete into the destination.  Works on tables with no
302           primary key or unique index.
304       Stream
305           Selects the entire table in one big stream and compares all
306           columns.  Selects all columns.  Much less efficient than the other
307           algorithms, but works when there is no suitable index for them to
308           use.
310       Future Plans
311           Possibilities for future algorithms are TempTable (what I
312           originally called bottom-up in earlier versions of this tool),
313           DrillDown (what I originally called top-down), and GroupByPrefix
314           (similar to how SqlYOG Job Agent works).  Each algorithm has
315           strengths and weaknesses.  If you'd like to implement your favorite
316           technique for finding differences between two sources of data on
317           possibly different servers, I'm willing to help.  The algorithms
318           adhere to a simple interface that makes it pretty easy to write
319           your own.


322       Bidirectional syncing is a new, experimental feature.  To make it work
323       reliably there are a number of strict limitations:
325         * only works when syncing one server to other independent servers
326         * does not work in any way with replication
327         * requires that the table(s) are chunkable with the Chunk algorithm
328         * is not N-way, only bidirectional between two servers at a time
329         * does not handle DELETE changes
331       For example, suppose we have three servers: c1, r1, r2.  c1 is the
332       central server, a pseudo-master to the other servers (viz. r1 and r2
333       are not slaves to c1).  r1 and r2 are remote servers.  Rows in table
334       foo are updated and inserted on all three servers and we want to
335       synchronize all the changes between all the servers.  Table foo has
336       columns:
338         id    int PRIMARY KEY
339         ts    timestamp auto updated
340         name  varchar
342       Auto-increment offsets are used so that new rows from any server do not
343       create conflicting primary key (id) values.  In general, newer rows, as
344       determined by the ts column, take precedence when a same but differing
345       row is found during the bidirectional sync.  "Same but differing" means
346       that two rows have the same primary key (id) value but different values
347       for some other column, like the name column in this example.  Same but
348       differing conflicts are resolved by a "conflict".  A conflict compares
349       some column of the competing rows to determine a "winner".  The winning
350       row becomes the source and its values are used to update the other row.
352       There are subtle differences between three columns used to achieve
353       bidirectional syncing that you should be familiar with: chunk column
354       ("--chunk-column"), comparison column(s) ("--columns"), and conflict
355       column ("--conflict-column").  The chunk column is only used to chunk
356       the table; e.g. "WHERE id >= 5 AND id < 10".  Chunks are checksummed
357       and when chunk checksums reveal a difference, the tool selects the rows
358       in that chunk and checksums the "--columns" for each row.  If a column
359       checksum differs, the rows have one or more conflicting column values.
360       In a traditional unidirectional sync, the conflict is a moot point
361       because it can be resolved simply by updating the entire destination
362       row with the source row's values.  In a bidirectional sync, however,
363       the "--conflict-column" (in accordance with other "--conflict-*"
364       options list below) is compared to determine which row is "correct" or
365       "authoritative"; this row becomes the "source".
367       To sync all three servers completely, two runs of mk-table-sync are
368       required.  The first run syncs c1 and r1, then syncs c1 and r2
369       including any changes from r1.  At this point c1 and r2 are completely
370       in sync, but r1 is missing any changes from r2 because c1 didn't have
371       these changes when it and r1 were synced.  So a second run is needed
372       which syncs the servers in the same order, but this time when c1 and r1
373       are synced r1 gets r2's changes.
375       The tool does not sync N-ways, only bidirectionally between the first
376       DSN given on the command line and each subsequent DSN in turn.  So the
377       tool in this example would be ran twice like:
379         mk-table-sync --bidirectional h=c1 h=r1 h=r2
381       The "--bidirectional" option enables this feature and causes various
382       sanity checks to be performed.  You must specify other options that
383       tell mk-table-sync how to resolve conflicts for same but differing
384       rows.  These options are:
386         * L<"--conflict-column">
387         * L<"--conflict-comparison">
388         * L<"--conflict-value">
389         * L<"--conflict-threshold">
390         * L<"--conflict-error">  (optional)
392       Use "--print" to test this option before "--execute".  The printed SQL
393       statements will have comments saying on which host the statement would
394       be executed if you used "--execute".
396       Technical side note: the first DSN is always the "left" server and the
397       other DSNs are always the "right" server.  Since either server can
398       become the source or destination it's confusing to think of them as
399       "src" and "dst".  Therefore, they're generically referred to as left
400       and right.  It's easy to remember this because the first DSN is always
401       to the left of the other server DSNs on the command line.


404       The following are the exit statuses (also called return values, or
405       return codes) when mk-table-sync finishes and exits.
407          STATUS  MEANING
408          ======  =======================================================
409          0       Success.
410          1       Internal error.
411          2       At least one table differed on the destination.
412          3       Combination of 1 and 2.


415       Specify at least one of "--print", "--execute", or "--dry-run".
417       "--where" and "--replicate" are mutually exclusive.
419       This tool accepts additional command-line arguments.  Refer to the
420       "SYNOPSIS" and usage information for details.
422       --algorithms
423           type: string; default: Chunk,Nibble,GroupBy,Stream
425           Algorithm to use when comparing the tables, in order of preference.
427           For each table, mk-table-sync will check if the table can be synced
428           with the given algorithms in the order that they're given.  The
429           first algorithm that can sync the table is used.  See "ALGORITHMS".
431       --ask-pass
432           Prompt for a password when connecting to MySQL.
434       --bidirectional
435           Enable bidirectional sync between first and subsequent hosts.
437           See "BIDIRECTIONAL SYNCING" for more information.
439       --[no]bin-log
440           default: yes
442           Log to the binary log ("SET SQL_LOG_BIN=1").
444           Specifying "--no-bin-log" will "SET SQL_LOG_BIN=0".
446       --buffer-in-mysql
447           Instruct MySQL to buffer queries in its memory.
449           This option adds the "SQL_BUFFER_RESULT" option to the comparison
450           queries.  This causes MySQL to execute the queries and place them
451           in a temporary table internally before sending the results back to
452           mk-table-sync.  The advantage of this strategy is that mk-table-
453           sync can fetch rows as desired without using a lot of memory inside
454           the Perl process, while releasing locks on the MySQL table (to
455           reduce contention with other queries).  The disadvantage is that it
456           uses more memory on the MySQL server instead.
458           You probably want to leave "--[no]buffer-to-client" enabled too,
459           because buffering into a temp table and then fetching it all into
460           Perl's memory is probably a silly thing to do.  This option is most
461           useful for the GroupBy and Stream algorithms, which may fetch a lot
462           of data from the server.
464       --[no]buffer-to-client
465           default: yes
467           Fetch rows one-by-one from MySQL while comparing.
469           This option enables "mysql_use_result" which causes MySQL to hold
470           the selected rows on the server until the tool fetches them.  This
471           allows the tool to use less memory but may keep the rows locked on
472           the server longer.
474           If this option is disabled by specifying "--no-buffer-to-client"
475           then "mysql_store_result" is used which causes MySQL to send all
476           selected rows to the tool at once.  This may result in the results
477           "cursor" being held open for a shorter time on the server, but if
478           the tables are large, it could take a long time anyway, and use all
479           your memory.
481           For most non-trivial data sizes, you want to leave this option
482           enabled.
484           This option is disabled when "--bidirectional" is used.
486       --charset
487           short form: -A; type: string
489           Default character set.  If the value is utf8, sets Perl's binmode
490           on STDOUT to utf8, passes the mysql_enable_utf8 option to
491           DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL.  Any
492           other value sets binmode on STDOUT without the utf8 layer, and runs
493           SET NAMES after connecting to MySQL.
495       --[no]check-master
496           default: yes
498           With "--sync-to-master", try to verify that the detected master is
499           the real master.
501       --[no]check-privileges
502           default: yes
504           Check that user has all necessary privileges on source and
505           destination table.
507       --[no]check-slave
508           default: yes
510           Check whether the destination server is a slave.
512           If the destination server is a slave, it's generally unsafe to make
513           changes on it.  However, sometimes you have to; "--replace" won't
514           work unless there's a unique index, for example, so you can't make
515           changes on the master in that scenario.  By default mk-table-sync
516           will complain if you try to change data on a slave.  Specify
517           "--no-check-slave" to disable this check.  Use it at your own risk.
519       --[no]check-triggers
520           default: yes
522           Check that no triggers are defined on the destination table.
524           Triggers were introduced in MySQL v5.0.2, so for older versions
525           this option has no effect because triggers will not be checked.
527       --chunk-column
528           type: string
530           Chunk the table on this column.
532       --chunk-index
533           type: string
535           Chunk the table using this index.
537       --chunk-size
538           type: string; default: 1000
540           Number of rows or data size per chunk.
542           The size of each chunk of rows for the "Chunk" and "Nibble"
543           algorithms.  The size can be either a number of rows, or a data
544           size.  Data sizes are specified with a suffix of k=kibibytes,
545           M=mebibytes, G=gibibytes.  Data sizes are converted to a number of
546           rows by dividing by the average row length.
548       --columns
549           short form: -c; type: array
551           Compare this comma-separated list of columns.
553       --config
554           type: Array
556           Read this comma-separated list of config files; if specified, this
557           must be the first option on the command line.
559       --conflict-column
560           type: string
562           Compare this column when rows conflict during a "--bidirectional"
563           sync.
565           When a same but differing row is found the value of this column
566           from each row is compared according to "--conflict-comparison",
567           "--conflict-value" and "--conflict-threshold" to determine which
568           row has the correct data and becomes the source.  The column can be
569           any type for which there is an appropriate "--conflict-comparison"
570           (this is almost all types except, for example, blobs).
572           This option only works with "--bidirectional".  See "BIDIRECTIONAL
573           SYNCING" for more information.
575       --conflict-comparison
576           type: string
578           Choose the "--conflict-column" with this property as the source.
580           The option affects how the "--conflict-column" values from the
581           conflicting rows are compared.  Possible comparisons are one of
582           these MAGIC_comparisons:
584             newest|oldest|greatest|least|equals|matches
587             ==========  =========================================================
588             newest      Newest temporal L<"--conflict-column"> value
589             oldest      Oldest temporal L<"--conflict-column"> value
590             greatest    Greatest numerical L<"--conflict-column"> value
591             least       Least numerical L<"--conflict-column"> value
592             equals      L<"--conflict-column"> value equal to L<"--conflict-value">
593             matches     L<"--conflict-column"> value matching Perl regex pattern
594                         L<"--conflict-value">
596           This option only works with "--bidirectional".  See "BIDIRECTIONAL
597           SYNCING" for more information.
599       --conflict-error
600           type: string; default: warn
602           How to report unresolvable conflicts and conflict errors
604           This option changes how the user is notified when a conflict cannot
605           be resolved or causes some kind of error.  Possible values are:
607             * warn: Print a warning to STDERR about the unresolvable conflict
608             * die:  Die, stop syncing, and print a warning to STDERR
610           This option only works with "--bidirectional".  See "BIDIRECTIONAL
611           SYNCING" for more information.
613       --conflict-threshold
614           type: string
616           Amount by which one "--conflict-column" must exceed the other.
618           The "--conflict-threshold" prevents a conflict from being resolved
619           if the absolute difference between the two "--conflict-column"
620           values is less than this amount.  For example, if two
621           "--conflict-column" have timestamp values "2009-12-01 12:00:00" and
622           "2009-12-01 12:05:00" the difference is 5 minutes.  If
623           "--conflict-threshold" is set to "5m" the conflict will be
624           resolved, but if "--conflict-threshold" is set to "6m" the conflict
625           will fail to resolve because the difference is not greater than or
626           equal to 6 minutes.  In this latter case, "--conflict-error" will
627           report the failure.
629           This option only works with "--bidirectional".  See "BIDIRECTIONAL
630           SYNCING" for more information.
632       --conflict-value
633           type: string
635           Use this value for certain "--conflict-comparison".
637           This option gives the value for "equals" and "matches"
638           "--conflict-comparison".
640           This option only works with "--bidirectional".  See "BIDIRECTIONAL
641           SYNCING" for more information.
643       --databases
644           short form: -d; type: hash
646           Sync only this comma-separated list of databases.
648           A common request is to sync tables from one database with tables
649           from another database on the same or different server.  This is not
650           yet possible.  "--databases" will not do it, and you can't do it
651           with the D part of the DSN either because in the absence of a table
652           name it assumes the whole server should be synced and the D part
653           controls only the connection's default database.
655       --defaults-file
656           short form: -F; type: string
658           Only read mysql options from the given file.  You must give an
659           absolute pathname.
661       --dry-run
662           Analyze, decide the sync algorithm to use, print and exit.
664           Implies "--verbose" so you can see the results.  The results are in
665           the same output format that you'll see from actually running the
666           tool, but there will be zeros for rows affected.  This is because
667           the tool actually executes, but stops before it compares any data
668           and just returns zeros.  The zeros do not mean there are no changes
669           to be made.
671       --engines
672           short form: -e; type: hash
674           Sync only this comma-separated list of storage engines.
676       --execute
677           Execute queries to make the tables have identical data.
679           This option makes mk-table-sync actually sync table data by
680           executing all the queries that it created to resolve table
681           differences.  Therefore, the tables will be changed!  And unless
682           you also specify "--verbose", the changes will be made silently.
683           If this is not what you want, see "--print" or "--dry-run".
685       --explain-hosts
686           Print connection information and exit.
688           Print out a list of hosts to which mk-table-sync will connect, with
689           all the various connection options, and exit.
691       --float-precision
692           type: int
694           Precision for "FLOAT" and "DOUBLE" number-to-string conversion.
695           Causes FLOAT and DOUBLE values to be rounded to the specified
696           number of digits after the decimal point, with the ROUND() function
697           in MySQL.  This can help avoid checksum mismatches due to different
698           floating-point representations of the same values on different
699           MySQL versions and hardware.  The default is no rounding; the
700           values are converted to strings by the CONCAT() function, and MySQL
701           chooses the string representation.  If you specify a value of 2,
702           for example, then the values 1.008 and 1.009 will be rounded to
703           1.01, and will checksum as equal.
705       --[no]foreign-key-checks
706           default: yes
708           Enable foreign key checks ("SET FOREIGN_KEY_CHECKS=1").
710           Specifying "--no-foreign-key-checks" will "SET
711           FOREIGN_KEY_CHECKS=0".
713       --function
714           type: string
716           Which hash function you'd like to use for checksums.
718           The default is "CRC32".  Other good choices include "MD5" and
719           "SHA1".  If you have installed the "FNV_64" user-defined function,
720           "mk-table-sync" will detect it and prefer to use it, because it is
721           much faster than the built-ins.  You can also use MURMUR_HASH if
722           you've installed that user-defined function.  Both of these are
723           distributed with Maatkit.  See mk-table-checksum for more
724           information and benchmarks.
726       --help
727           Show help and exit.
729       --[no]hex-blob
730           default: yes
732           HEX() "BLOB", "TEXT" and "BINARY" columns.
734           When row data from the source is fetched to create queries to sync
735           the data (i.e. the queries seen with "--print" and executed by
736           "--execute"), binary columns are wrapped in HEX() so the binary
737           data does not produce an invalid SQL statement.  You can disable
738           this option but you probably shouldn't.
740       --host
741           short form: -h; type: string
743           Connect to host.
745       --ignore-columns
746           type: Hash
748           Ignore this comma-separated list of column names in comparisons.
750           This option causes columns not to be compared.  However, if a row
751           is determined to differ between tables, all columns in that row
752           will be synced, regardless.  (It is not currently possible to
753           exclude columns from the sync process itself, only from the
754           comparison.)
756       --ignore-databases
757           type: Hash
759           Ignore this comma-separated list of databases.
761       --ignore-engines
762           type: Hash; default: FEDERATED,MRG_MyISAM
764           Ignore this comma-separated list of storage engines.
766       --ignore-tables
767           type: Hash
769           Ignore this comma-separated list of tables.
771           Table names may be qualified with the database name.
773       --[no]index-hint
774           default: yes
776           Add FORCE/USE INDEX hints to the chunk and row queries.
778           By default "mk-table-sync" adds a FORCE/USE INDEX hint to each SQL
779           statement to coerce MySQL into using the index chosen by the sync
780           algorithm or specified by "--chunk-index".  This is usually a good
781           thing, but in rare cases the index may not be the best for the
782           query so you can suppress the index hint by specifying
783           "--no-index-hint" and let MySQL choose the index.
785           This does not affect the queries printed by "--print"; it only
786           affects the chunk and row queries that "mk-table-sync" uses to
787           select and compare rows.
789       --lock
790           type: int
792           Lock tables: 0=none, 1=per sync cycle, 2=per table, or 3=globally.
794           This uses "LOCK TABLES".  This can help prevent tables being
795           changed while you're examining them.  The possible values are as
796           follows:
798             VALUE  MEANING
799             =====  =======================================================
800             0      Never lock tables.
801             1      Lock and unlock one time per sync cycle (as implemented
802                    by the syncing algorithm).  This is the most granular
803                    level of locking available.  For example, the Chunk
804                    algorithm will lock each chunk of C<N> rows, and then
805                    unlock them if they are the same on the source and the
806                    destination, before moving on to the next chunk.
807             2      Lock and unlock before and after each table.
808             3      Lock and unlock once for every server (DSN) synced, with
809                    C<FLUSH TABLES WITH READ LOCK>.
811           A replication slave is never locked if "--replicate" or
812           "--sync-to-master" is specified, since in theory locking the table
813           on the master should prevent any changes from taking place.  (You
814           are not changing data on your slave, right?)  If "--wait" is given,
815           the master (source) is locked and then the tool waits for the slave
816           to catch up to the master before continuing.
818           If "--transaction" is specified, "LOCK TABLES" is not used.
819           Instead, lock and unlock are implemented by beginning and
820           committing transactions.  The exception is if "--lock" is 3.
822           If "--no-transaction" is specified, then "LOCK TABLES" is used for
823           any value of "--lock". See "--[no]transaction".
825       --lock-and-rename
826           Lock the source and destination table, sync, then swap names.  This
827           is useful as a less-blocking ALTER TABLE, once the tables are
828           reasonably in sync with each other (which you may choose to
829           accomplish via any number of means, including dump and reload or
830           even something like mk-archiver).  It requires exactly two DSNs and
831           assumes they are on the same server, so it does no waiting for
832           replication or the like.  Tables are locked with LOCK TABLES.
834       --password
835           short form: -p; type: string
837           Password to use when connecting.
839       --pid
840           type: string
842           Create the given PID file.  The file contains the process ID of the
843           script.  The PID file is removed when the script exits.  Before
844           starting, the script checks if the PID file already exists.  If it
845           does not, then the script creates and writes its own PID to it.  If
846           it does, then the script checks the following: if the file contains
847           a PID and a process is running with that PID, then the script dies;
848           or, if there is no process running with that PID, then the script
849           overwrites the file with its own PID and starts; else, if the file
850           contains no PID, then the script dies.
852       --port
853           short form: -P; type: int
855           Port number to use for connection.
857       --print
858           Print queries that will resolve differences.
860           If you don't trust "mk-table-sync", or just want to see what it
861           will do, this is a good way to be safe.  These queries are valid
862           SQL and you can run them yourself if you want to sync the tables
863           manually.
865       --recursion-method
866           type: string
868           Preferred recursion method used to find slaves.
870           Possible methods are:
872             METHOD       USES
873             ===========  ================
874             processlist  SHOW PROCESSLIST
875             hosts        SHOW SLAVE HOSTS
877           The processlist method is preferred because SHOW SLAVE HOSTS is not
878           reliable.  However, the hosts method is required if the server uses
879           a non-standard port (not 3306).  Usually mk-table-sync does the
880           right thing and finds the slaves, but you may give a preferred
881           method and it will be used first.  If it doesn't find any slaves,
882           the other methods will be tried.
884       --replace
885           Write all "INSERT" and "UPDATE" statements as "REPLACE".
887           This is automatically switched on as needed when there are unique
888           index violations.
890       --replicate
891           type: string
893           Sync tables listed as different in this table.
895           Specifies that "mk-table-sync" should examine the specified table
896           to find data that differs.  The table is exactly the same as the
897           argument of the same name to mk-table-checksum.  That is, it
898           contains records of which tables (and ranges of values) differ
899           between the master and slave.
901           For each table and range of values that shows differences between
902           the master and slave, "mk-table-checksum" will sync that table,
903           with the appropriate "WHERE" clause, to its master.
905           This automatically sets "--wait" to 60 and causes changes to be
906           made on the master instead of the slave.
908           If "--sync-to-master" is specified, the tool will assume the server
909           you specified is the slave, and connect to the master as usual to
910           sync.
912           Otherwise, it will try to use "SHOW PROCESSLIST" to find slaves of
913           the server you specified.  If it is unable to find any slaves via
914           "SHOW PROCESSLIST", it will inspect "SHOW SLAVE HOSTS" instead.
915           You must configure each slave's "report-host", "report-port" and
916           other options for this to work right.  After finding slaves, it
917           will inspect the specified table on each slave to find data that
918           needs to be synced, and sync it.
920           The tool examines the master's copy of the table first, assuming
921           that the master is potentially a slave as well.  Any table that
922           shows differences there will NOT be synced on the slave(s).  For
923           example, suppose your replication is set up as A->B, B->C, B->D.
924           Suppose you use this argument and specify server B.  The tool will
925           examine server B's copy of the table.  If it looks like server B's
926           data in table "test.tbl1" is different from server A's copy, the
927           tool will not sync that table on servers C and D.
929       --set-vars
930           type: string; default: wait_timeout=10000
932           Set these MySQL variables.  Immediately after connecting to MySQL,
933           this string will be appended to SET and executed.
935       --socket
936           short form: -S; type: string
938           Socket file to use for connection.
940       --sync-to-master
941           Treat the DSN as a slave and sync it to its master.
943           Treat the server you specified as a slave.  Inspect "SHOW SLAVE
944           STATUS", connect to the server's master, and treat the master as
945           the source and the slave as the destination.  Causes changes to be
946           made on the master.  Sets "--wait" to 60 by default, sets "--lock"
947           to 1 by default, and disables "--[no]transaction" by default.  See
948           also "--replicate", which changes this option's behavior.
950       --tables
951           short form: -t; type: hash
953           Sync only this comma-separated list of tables.
955           Table names may be qualified with the database name.
957       --timeout-ok
958           Keep going if "--wait" fails.
960           If you specify "--wait" and the slave doesn't catch up to the
961           master's position before the wait times out, the default behavior
962           is to abort.  This option makes the tool keep going anyway.
963           Warning: if you are trying to get a consistent comparison between
964           the two servers, you probably don't want to keep going after a
965           timeout.
967       --[no]transaction
968           Use transactions instead of "LOCK TABLES".
970           The granularity of beginning and committing transactions is
971           controlled by "--lock".  This is enabled by default, but since
972           "--lock" is disabled by default, it has no effect.
974           Most options that enable locking also disable transactions by
975           default, so if you want to use transactional locking (via "LOCK IN
976           SHARE MODE" and "FOR UPDATE", you must specify "--transaction"
977           explicitly.
979           If you don't specify "--transaction" explicitly "mk-table-sync"
980           will decide on a per-table basis whether to use transactions or
981           table locks.  It currently uses transactions on InnoDB tables, and
982           table locks on all others.
984           If "--no-transaction" is specified, then "mk-table-sync" will not
985           use transactions at all (not even for InnoDB tables) and locking is
986           controlled by "--lock".
988           When enabled, either explicitly or implicitly, the transaction
989           isolation level is set "REPEATABLE READ" and transactions are
990           started "WITH CONSISTENT SNAPSHOT".
992       --trim
993           TRIM() "VARCHAR" columns in "BIT_XOR" and "ACCUM" modes.  Helps
994           when comparing MySQL 4.1 to >= 5.0.
996           This is useful when you don't care about the trailing space
997           differences between MySQL versions which vary in their handling of
998           trailing spaces. MySQL 5.0 and later all retain trailing spaces in
999           "VARCHAR", while previous versions would remove them.
1001       --[no]unique-checks
1002           default: yes
1004           Enable unique key checks ("SET UNIQUE_CHECKS=1").
1006           Specifying "--no-unique-checks" will "SET UNIQUE_CHECKS=0".
1008       --user
1009           short form: -u; type: string
1011           User for login if not current user.
1013       --verbose
1014           short form: -v; cumulative: yes
1016           Print results of sync operations.
1018           See "OUTPUT" for more details about the output.
1020       --version
1021           Show version and exit.
1023       --wait
1024           short form: -w; type: time
1026           How long to wait for slaves to catch up to their master.
1028           Make the master wait for the slave to catch up in replication
1029           before comparing the tables.  The value is the number of seconds to
1030           wait before timing out (see also "--timeout-ok").  Sets "--lock" to
1031           1 and "--[no]transaction" to 0 by default.  If you see an error
1032           such as the following,
1034             MASTER_POS_WAIT returned -1
1036           It means the timeout was exceeded and you need to increase it.
1038           The default value of this option is influenced by other options.
1039           To see what value is in effect, run with "--help".
1041           To disable waiting entirely (except for locks), specify "--wait" 0.
1042           This helps when the slave is lagging on tables that are not being
1043           synced.
1045       --where
1046           type: string
1048           "WHERE" clause to restrict syncing to part of the table.
1050       --[no]zero-chunk
1051           default: yes
1053           Add a chunk for rows with zero or zero-equivalent values.  The only
1054           has an effect when "--chunk-size" is specified.  The purpose of the
1055           zero chunk is to capture a potentially large number of zero values
1056           that would imbalance the size of the first chunk.  For example, if
1057           a lot of negative numbers were inserted into an unsigned integer
1058           column causing them to be stored as zeros, then these zero values
1059           are captured by the zero chunk instead of the first chunk and all
1060           its non-zero values.


1063       These DSN options are used to create a DSN.  Each option is given like
1064       "option=value".  The options are case-sensitive, so P and p are not the
1065       same option.  There cannot be whitespace before or after the "=" and if
1066       the value contains whitespace it must be quoted.  DSN options are
1067       comma-separated.  See the maatkit manpage for full details.
1069       •   A
1071           dsn: charset; copy: yes
1073           Default character set.
1075       •   D
1077           dsn: database; copy: yes
1079           Database containing the table to be synced.
1081       •   F
1083           dsn: mysql_read_default_file; copy: yes
1085           Only read default options from the given file
1087       •   h
1089           dsn: host; copy: yes
1091           Connect to host.
1093       •   p
1095           dsn: password; copy: yes
1097           Password to use when connecting.
1099       •   P
1101           dsn: port; copy: yes
1103           Port number to use for connection.
1105       •   S
1107           dsn: mysql_socket; copy: yes
1109           Socket file to use for connection.
1111       •   t
1113           copy: yes
1115           Table to be synced.
1117       •   u
1119           dsn: user; copy: yes
1121           User for login if not current user.


1124       You can download Maatkit from Google Code at
1125       <http://code.google.com/p/maatkit/>, or you can get any of the tools
1126       easily with a command like the following:
1128          wget http://www.maatkit.org/get/toolname
1129          or
1130          wget http://www.maatkit.org/trunk/toolname
1132       Where "toolname" can be replaced with the name (or fragment of a name)
1133       of any of the Maatkit tools.  Once downloaded, they're ready to run; no
1134       installation is needed.  The first URL gets the latest released version
1135       of the tool, and the second gets the latest trunk code from Subversion.


1138       The environment variable MKDEBUG enables verbose debugging output in
1139       all of the Maatkit tools:
1141          MKDEBUG=1 mk-....


1144       You need Perl, DBI, DBD::mysql, and some core packages that ought to be
1145       installed in any reasonably new version of Perl.


1148       For a list of known bugs see:
1149       <http://www.maatkit.org/bugs/mk-table-sync>.
1151       Please use Google Code Issues and Groups to report bugs or request
1152       support: <http://code.google.com/p/maatkit/>.  You can also join
1153       #maatkit on Freenode to discuss Maatkit.
1155       Please include the complete command-line used to reproduce the problem
1156       you are seeing, the version of all MySQL servers involved, the complete
1157       output of the tool when run with "--version", and if possible,
1158       debugging output produced by running with the "MKDEBUG=1" environment
1159       variable.


1162       This program is copyright 2007-2011 Baron Schwartz.  Feedback and
1163       improvements are welcome.
1169       This program is free software; you can redistribute it and/or modify it
1170       under the terms of the GNU General Public License as published by the
1171       Free Software Foundation, version 2; OR the Perl Artistic License.  On
1172       UNIX and similar systems, you can issue `man perlgpl' or `man
1173       perlartistic' to read these licenses.
1175       You should have received a copy of the GNU General Public License along
1176       with this program; if not, write to the Free Software Foundation, Inc.,
1177       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.


1180       Baron Schwartz


1183       This tool is part of Maatkit, a toolkit for power users of MySQL.
1184       Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
1185       primary code contributors.  Both are employed by Percona.  Financial
1186       support for Maatkit development is primarily provided by Percona and
1187       its clients.


1190       My work is based in part on Giuseppe Maxia's work on distributed
1191       databases, <http://www.sysadminmag.com/articles/2004/0408/> and code
1192       derived from that article.  There is more explanation, and a link to
1193       the code, at <http://www.perlmonks.org/?node_id=381053>.
1195       Another programmer extended Maxia's work even further.  Fabien Coelho
1196       changed and generalized Maxia's technique, introducing symmetry and
1197       avoiding some problems that might have caused too-frequent checksum
1198       collisions.  This work grew into pg_comparator,
1199       <http://www.coelho.net/pg_comparator/>.  Coelho also explained the
1200       technique further in a paper titled "Remote Comparison of Database
1201       Tables" (<http://cri.ensmp.fr/classement/doc/A-375.pdf>).
1203       This existing literature mostly addressed how to find the differences
1204       between the tables, not how to resolve them once found.  I needed a
1205       tool that would not only find them efficiently, but would then resolve
1206       them.  I first began thinking about how to improve the technique
1207       further with my article <http://tinyurl.com/mysql-data-diff-algorithm>,
1208       where I discussed a number of problems with the Maxia/Coelho "bottom-
1209       up" algorithm.  After writing that article, I began to write this tool.
1210       I wanted to actually implement their algorithm with some improvements
1211       so I was sure I understood it completely.  I discovered it is not what
1212       I thought it was, and is considerably more complex than it appeared to
1213       me at first.  Fabien Coelho was kind enough to address some questions
1214       over email.
1216       The first versions of this tool implemented a version of the
1217       Coelho/Maxia algorithm, which I called "bottom-up", and my own, which I
1218       called "top-down."  Those algorithms are considerably more complex than
1219       the current algorithms and I have removed them from this tool, and may
1220       add them back later.  The improvements to the bottom-up algorithm are
1221       my original work, as is the top-down algorithm.  The techniques to
1222       actually resolve the differences are also my own work.
1224       Another tool that can synchronize tables is the SQLyog Job Agent from
1225       webyog.  Thanks to Rohit Nadhani, SJA's author, for the conversations
1226       about the general techniques.  There is a comparison of mk-table-sync
1227       and SJA at <http://tinyurl.com/maatkit-vs-sqlyog>
1229       Thanks to the following people and organizations for helping in many
1230       ways:
1232       The Rimm-Kaufman Group <http://www.rimmkaufman.com/>, MySQL AB
1233       <http://www.mysql.com/>, Blue Ridge InternetWorks
1234       <http://www.briworks.com/>, Percona <http://www.percona.com/>, Fabien
1235       Coelho, Giuseppe Maxia and others at MySQL AB, Kristian Koehntopp
1236       (MySQL AB), Rohit Nadhani (WebYog), The helpful monks at Perlmonks, And
1237       others too numerous to mention.


1240       This manual page documents Ver 1.0.31 Distrib 7540 $Revision: 7476 $.
1244perl v5.38.0                      2023-07-20                  MK-TABLE-SYNC(1)