1MK-TABLE-SYNC(1) User Contributed Perl Documentation MK-TABLE-SYNC(1)
2
3
4
6 mk-table-sync - Synchronize MySQL table data efficiently.
7
9 Usage: mk-table-sync [OPTION...] DSN [DSN...]
10
11 mk-table-sync synchronizes data efficiently between MySQL tables.
12
13 This tool changes data, so for maximum safety, you should back up your
14 data before you use it. When synchronizing a server that is a
15 replication slave with the --replicate or --sync-to-master methods, it
16 always makes the changes on the replication master, never the
17 replication slave directly. This is in general the only safe way to
18 bring a replica back in sync with its master; changes to the replica
19 are usually the source of the problems in the first place. However,
20 the changes it makes on the master should be no-op changes that set the
21 data to their current values, and actually affect only the replica.
22 Please read the detailed documentation that follows to learn more about
23 this.
24
25 Sync db.tbl on host1 to host2:
26
27 mk-table-sync --execute h=host1,D=db,t=tbl h=host2
28
29 Sync all tables on host1 to host2 and host3:
30
31 mk-table-sync --execute host1 host2 host3
32
33 Make slave1 have the same data as its replication master:
34
35 mk-table-sync --execute --sync-to-master slave1
36
37 Resolve differences that mk-table-checksum found on all slaves of
38 master1:
39
40 mk-table-sync --execute --replicate test.checksum master1
41
42 Same as above but only resolve differences on slave1:
43
44 mk-table-sync --execute --replicate test.checksum \
45 --sync-to-master slave1
46
47 Sync master2 in a master-master replication configuration, where
48 master2's copy of db.tbl is known or suspected to be incorrect:
49
50 mk-table-sync --execute --sync-to-master h=master2,D=db,t=tbl
51
52 Note that in the master-master configuration, the following will NOT do
53 what you want, because it will make changes directly on master2, which
54 will then flow through replication and change master1's data:
55
56 # Don't do this in a master-master setup!
57 mk-table-sync --execute h=master1,D=db,t=tbl master2
58
60 The following section is included to inform users about the potential
61 risks, whether known or unknown, of using this tool. The two main
62 categories of risks are those created by the nature of the tool (e.g.
63 read-only tools vs. read-write tools) and those created by bugs.
64
65 With great power comes great responsibility! This tool changes data,
66 so it is a good idea to back up your data. It is also very powerful,
67 which means it is very complex, so you should run it with the
68 "--dry-run" option to see what it will do, until you're familiar with
69 its operation. If you want to see which rows are different, without
70 changing any data, use "--print" instead of "--execute".
71
72 Be careful when using mk-table-sync in any master-master setup.
73 Master-master replication is inherently tricky, and it's easy to make
74 mistakes. You need to be sure you're using the tool correctly for
75 master-master replication. See the "SYNOPSIS" for the overview of the
76 correct usage.
77
78 Also be careful with tables that have foreign key constraints with "ON
79 DELETE" or "ON UPDATE" definitions because these might cause unintended
80 changes on the child tables.
81
82 In general, this tool is best suited when your tables have a primary
83 key or unique index. Although it can synchronize data in tables
84 lacking a primary key or unique index, it might be best to synchronize
85 that data by another means.
86
87 At the time of this release, there is a potential bug using
88 "--lock-and-rename" with MySQL 5.1, a bug detecting certain
89 differences, a bug using ROUND() across different platforms, and a bug
90 mixing collations.
91
92 The authoritative source for updated information is always the online
93 issue tracking system. Issues that affect this tool will be marked as
94 such. You can see a list of such issues at the following URL:
95 <http://www.maatkit.org/bugs/mk-table-sync>.
96
97 See also "BUGS" for more information on filing bugs and getting help.
98
100 mk-table-sync does one-way and bidirectional synchronization of table
101 data. It does not synchronize table structures, indexes, or any other
102 schema objects. The following describes one-way synchronization.
103 "BIDIRECTIONAL SYNCING" is described later.
104
105 This tool is complex and functions in several different ways. To use
106 it safely and effectively, you should understand three things: the
107 purpose of "--replicate", finding differences, and specifying hosts.
108 These three concepts are closely related and determine how the tool
109 will run. The following is the abbreviated logic:
110
111 if DSN has a t part, sync only that table:
112 if 1 DSN:
113 if --sync-to-master:
114 The DSN is a slave. Connect to its master and sync.
115 if more than 1 DSN:
116 The first DSN is the source. Sync each DSN in turn.
117 else if --replicate:
118 if --sync-to-master:
119 The DSN is a slave. Connect to its master, find records
120 of differences, and fix.
121 else:
122 The DSN is the master. Find slaves and connect to each,
123 find records of differences, and fix.
124 else:
125 if only 1 DSN and --sync-to-master:
126 The DSN is a slave. Connect to its master, find tables and
127 filter with --databases etc, and sync each table to the master.
128 else:
129 find tables, filtering with --databases etc, and sync each
130 DSN to the first.
131
132 mk-table-sync can run in one of two ways: with "--replicate" or
133 without. The default is to run without "--replicate" which causes mk-
134 table-sync to automatically find differences efficiently with one of
135 several algorithms (see "ALGORITHMS"). Alternatively, the value of
136 "--replicate", if specified, causes mk-table-sync to use the
137 differences already found by having previously ran mk-table-checksum
138 with its own "--replicate" option. Strictly speaking, you don't need
139 to use "--replicate" because mk-table-sync can find differences, but
140 many people use "--replicate" if, for example, they checksum regularly
141 using mk-table-checksum then fix differences as needed with mk-table-
142 sync. If you're unsure, read each tool's documentation carefully and
143 decide for yourself, or consult with an expert.
144
145 Regardless of whether "--replicate" is used or not, you need to specify
146 which hosts to sync. There are two ways: with "--sync-to-master" or
147 without. Specifying "--sync-to-master" makes mk-table-sync expect one
148 and only slave DSN on the command line. The tool will automatically
149 discover the slave's master and sync it so that its data is the same as
150 its master. This is accomplished by making changes on the master which
151 then flow through replication and update the slave to resolve its
152 differences. Be careful though: although this option specifies and
153 syncs a single slave, if there are other slaves on the same master,
154 they will receive via replication the changes intended for the slave
155 that you're trying to sync.
156
157 Alternatively, if you do not specify "--sync-to-master", the first DSN
158 given on the command line is the source host. There is only ever one
159 source host. If you do not also specify "--replicate", then you must
160 specify at least one other DSN as the destination host. There can be
161 one or more destination hosts. Source and destination hosts must be
162 independent; they cannot be in the same replication topology. mk-
163 table-sync will die with an error if it detects that a destination host
164 is a slave because changes are written directly to destination hosts
165 (and it's not safe to write directly to slaves). Or, if you specify
166 "--replicate" (but not "--sync-to-master") then mk-table-sync expects
167 one and only one master DSN on the command line. The tool will
168 automatically discover all the master's slaves and sync them to the
169 master. This is the only way to sync several (all) slaves at once
170 (because "--sync-to-master" only specifies one slave).
171
172 Each host on the command line is specified as a DSN. The first DSN (or
173 only DSN for cases like "--sync-to-master") provides default values for
174 other DSNs, whether those other DSNs are specified on the command line
175 or auto-discovered by the tool. So in this example,
176
177 mk-table-sync --execute h=host1,u=msandbox,p=msandbox h=host2
178
179 the host2 DSN inherits the "u" and "p" DSN parts from the host1 DSN.
180 Use the "--explain-hosts" option to see how mk-table-sync will
181 interpret the DSNs given on the command line.
182
184 If you specify the "--verbose" option, you'll see information about the
185 differences between the tables. There is one row per table. Each
186 server is printed separately. For example,
187
188 # Syncing h=host1,D=test,t=test1
189 # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
190 # 0 0 3 0 Chunk 13:00:00 13:00:17 2 test.test1
191
192 Table test.test1 on host1 required 3 "INSERT" statements to synchronize
193 and it used the Chunk algorithm (see "ALGORITHMS"). The sync operation
194 for this table started at 13:00:00 and ended 17 seconds later (times
195 taken from "NOW()" on the source host). Because differences were
196 found, its "EXIT STATUS" was 2.
197
198 If you specify the "--print" option, you'll see the actual SQL
199 statements that the script uses to synchronize the table if "--execute"
200 is also specified.
201
202 If you want to see the SQL statements that mk-table-sync is using to
203 select chunks, nibbles, rows, etc., then specify "--print" once and
204 "--verbose" twice. Be careful though: this can print a lot of SQL
205 statements.
206
207 There are cases where no combination of "INSERT", "UPDATE" or "DELETE"
208 statements can resolve differences without violating some unique key.
209 For example, suppose there's a primary key on column a and a unique key
210 on column b. Then there is no way to sync these two tables with
211 straightforward UPDATE statements:
212
213 +---+---+ +---+---+
214 | a | b | | a | b |
215 +---+---+ +---+---+
216 | 1 | 2 | | 1 | 1 |
217 | 2 | 1 | | 2 | 2 |
218 +---+---+ +---+---+
219
220 The tool rewrites queries to "DELETE" and "REPLACE" in this case. This
221 is automatically handled after the first index violation, so you don't
222 have to worry about it.
223
225 Synchronizing a replication master and slave safely is a non-trivial
226 problem, in general. There are all sorts of issues to think about,
227 such as other processes changing data, trying to change data on the
228 slave, whether the destination and source are a master-master pair, and
229 much more.
230
231 In general, the safe way to do it is to change the data on the master,
232 and let the changes flow through replication to the slave like any
233 other changes. However, this works only if it's possible to REPLACE
234 into the table on the master. REPLACE works only if there's a unique
235 index on the table (otherwise it just acts like an ordinary INSERT).
236
237 If your table has unique keys, you should use the "--sync-to-master"
238 and/or "--replicate" options to sync a slave to its master. This will
239 generally do the right thing. When there is no unique key on the
240 table, there is no choice but to change the data on the slave, and mk-
241 table-sync will detect that you're trying to do so. It will complain
242 and die unless you specify "--no-check-slave" (see
243 "--[no]check-slave").
244
245 If you're syncing a table without a primary or unique key on a master-
246 master pair, you must change the data on the destination server.
247 Therefore, you need to specify "--no-bin-log" for safety (see
248 "--[no]bin-log"). If you don't, the changes you make on the
249 destination server will replicate back to the source server and change
250 the data there!
251
252 The generally safe thing to do on a master-master pair is to use the
253 "--sync-to-master" option so you don't change the data on the
254 destination server. You will also need to specify "--no-check-slave"
255 to keep mk-table-sync from complaining that it is changing data on a
256 slave.
257
259 mk-table-sync has a generic data-syncing framework which uses different
260 algorithms to find differences. The tool automatically chooses the
261 best algorithm for each table based on indexes, column types, and the
262 algorithm preferences specified by "--algorithms". The following
263 algorithms are available, listed in their default order of preference:
264
265 Chunk
266 Finds an index whose first column is numeric (including date and
267 time types), and divides the column's range of values into chunks
268 of approximately "--chunk-size" rows. Syncs a chunk at a time by
269 checksumming the entire chunk. If the chunk differs on the source
270 and destination, checksums each chunk's rows individually to find
271 the rows that differ.
272
273 It is efficient when the column has sufficient cardinality to make
274 the chunks end up about the right size.
275
276 The initial per-chunk checksum is quite small and results in
277 minimal network traffic and memory consumption. If a chunk's rows
278 must be examined, only the primary key columns and a checksum are
279 sent over the network, not the entire row. If a row is found to be
280 different, the entire row will be fetched, but not before.
281
282 Nibble
283 Finds an index and ascends the index in fixed-size nibbles of
284 "--chunk-size" rows, using a non-backtracking algorithm (see mk-
285 archiver for more on this algorithm). It is very similar to
286 "Chunk", but instead of pre-calculating the boundaries of each
287 piece of the table based on index cardinality, it uses "LIMIT" to
288 define each nibble's upper limit, and the previous nibble's upper
289 limit to define the lower limit.
290
291 It works in steps: one query finds the row that will define the
292 next nibble's upper boundary, and the next query checksums the
293 entire nibble. If the nibble differs between the source and
294 destination, it examines the nibble row-by-row, just as "Chunk"
295 does.
296
297 GroupBy
298 Selects the entire table grouped by all columns, with a COUNT(*)
299 column added. Compares all columns, and if they're the same,
300 compares the COUNT(*) column's value to determine how many rows to
301 insert or delete into the destination. Works on tables with no
302 primary key or unique index.
303
304 Stream
305 Selects the entire table in one big stream and compares all
306 columns. Selects all columns. Much less efficient than the other
307 algorithms, but works when there is no suitable index for them to
308 use.
309
310 Future Plans
311 Possibilities for future algorithms are TempTable (what I
312 originally called bottom-up in earlier versions of this tool),
313 DrillDown (what I originally called top-down), and GroupByPrefix
314 (similar to how SqlYOG Job Agent works). Each algorithm has
315 strengths and weaknesses. If you'd like to implement your favorite
316 technique for finding differences between two sources of data on
317 possibly different servers, I'm willing to help. The algorithms
318 adhere to a simple interface that makes it pretty easy to write
319 your own.
320
322 Bidirectional syncing is a new, experimental feature. To make it work
323 reliably there are a number of strict limitations:
324
325 * only works when syncing one server to other independent servers
326 * does not work in any way with replication
327 * requires that the table(s) are chunkable with the Chunk algorithm
328 * is not N-way, only bidirectional between two servers at a time
329 * does not handle DELETE changes
330
331 For example, suppose we have three servers: c1, r1, r2. c1 is the
332 central server, a pseudo-master to the other servers (viz. r1 and r2
333 are not slaves to c1). r1 and r2 are remote servers. Rows in table
334 foo are updated and inserted on all three servers and we want to
335 synchronize all the changes between all the servers. Table foo has
336 columns:
337
338 id int PRIMARY KEY
339 ts timestamp auto updated
340 name varchar
341
342 Auto-increment offsets are used so that new rows from any server do not
343 create conflicting primary key (id) values. In general, newer rows, as
344 determined by the ts column, take precedence when a same but differing
345 row is found during the bidirectional sync. "Same but differing" means
346 that two rows have the same primary key (id) value but different values
347 for some other column, like the name column in this example. Same but
348 differing conflicts are resolved by a "conflict". A conflict compares
349 some column of the competing rows to determine a "winner". The winning
350 row becomes the source and its values are used to update the other row.
351
352 There are subtle differences between three columns used to achieve
353 bidirectional syncing that you should be familiar with: chunk column
354 ("--chunk-column"), comparison column(s) ("--columns"), and conflict
355 column ("--conflict-column"). The chunk column is only used to chunk
356 the table; e.g. "WHERE id >= 5 AND id < 10". Chunks are checksummed
357 and when chunk checksums reveal a difference, the tool selects the rows
358 in that chunk and checksums the "--columns" for each row. If a column
359 checksum differs, the rows have one or more conflicting column values.
360 In a traditional unidirectional sync, the conflict is a moot point
361 because it can be resolved simply by updating the entire destination
362 row with the source row's values. In a bidirectional sync, however,
363 the "--conflict-column" (in accordance with other "--conflict-*"
364 options list below) is compared to determine which row is "correct" or
365 "authoritative"; this row becomes the "source".
366
367 To sync all three servers completely, two runs of mk-table-sync are
368 required. The first run syncs c1 and r1, then syncs c1 and r2
369 including any changes from r1. At this point c1 and r2 are completely
370 in sync, but r1 is missing any changes from r2 because c1 didn't have
371 these changes when it and r1 were synced. So a second run is needed
372 which syncs the servers in the same order, but this time when c1 and r1
373 are synced r1 gets r2's changes.
374
375 The tool does not sync N-ways, only bidirectionally between the first
376 DSN given on the command line and each subsequent DSN in turn. So the
377 tool in this example would be ran twice like:
378
379 mk-table-sync --bidirectional h=c1 h=r1 h=r2
380
381 The "--bidirectional" option enables this feature and causes various
382 sanity checks to be performed. You must specify other options that
383 tell mk-table-sync how to resolve conflicts for same but differing
384 rows. These options are:
385
386 * L<"--conflict-column">
387 * L<"--conflict-comparison">
388 * L<"--conflict-value">
389 * L<"--conflict-threshold">
390 * L<"--conflict-error"> (optional)
391
392 Use "--print" to test this option before "--execute". The printed SQL
393 statements will have comments saying on which host the statement would
394 be executed if you used "--execute".
395
396 Technical side note: the first DSN is always the "left" server and the
397 other DSNs are always the "right" server. Since either server can
398 become the source or destination it's confusing to think of them as
399 "src" and "dst". Therefore, they're generically referred to as left
400 and right. It's easy to remember this because the first DSN is always
401 to the left of the other server DSNs on the command line.
402
404 The following are the exit statuses (also called return values, or
405 return codes) when mk-table-sync finishes and exits.
406
407 STATUS MEANING
408 ====== =======================================================
409 0 Success.
410 1 Internal error.
411 2 At least one table differed on the destination.
412 3 Combination of 1 and 2.
413
415 Specify at least one of "--print", "--execute", or "--dry-run".
416
417 "--where" and "--replicate" are mutually exclusive.
418
419 This tool accepts additional command-line arguments. Refer to the
420 "SYNOPSIS" and usage information for details.
421
422 --algorithms
423 type: string; default: Chunk,Nibble,GroupBy,Stream
424
425 Algorithm to use when comparing the tables, in order of preference.
426
427 For each table, mk-table-sync will check if the table can be synced
428 with the given algorithms in the order that they're given. The
429 first algorithm that can sync the table is used. See "ALGORITHMS".
430
431 --ask-pass
432 Prompt for a password when connecting to MySQL.
433
434 --bidirectional
435 Enable bidirectional sync between first and subsequent hosts.
436
437 See "BIDIRECTIONAL SYNCING" for more information.
438
439 --[no]bin-log
440 default: yes
441
442 Log to the binary log ("SET SQL_LOG_BIN=1").
443
444 Specifying "--no-bin-log" will "SET SQL_LOG_BIN=0".
445
446 --buffer-in-mysql
447 Instruct MySQL to buffer queries in its memory.
448
449 This option adds the "SQL_BUFFER_RESULT" option to the comparison
450 queries. This causes MySQL to execute the queries and place them
451 in a temporary table internally before sending the results back to
452 mk-table-sync. The advantage of this strategy is that mk-table-
453 sync can fetch rows as desired without using a lot of memory inside
454 the Perl process, while releasing locks on the MySQL table (to
455 reduce contention with other queries). The disadvantage is that it
456 uses more memory on the MySQL server instead.
457
458 You probably want to leave "--[no]buffer-to-client" enabled too,
459 because buffering into a temp table and then fetching it all into
460 Perl's memory is probably a silly thing to do. This option is most
461 useful for the GroupBy and Stream algorithms, which may fetch a lot
462 of data from the server.
463
464 --[no]buffer-to-client
465 default: yes
466
467 Fetch rows one-by-one from MySQL while comparing.
468
469 This option enables "mysql_use_result" which causes MySQL to hold
470 the selected rows on the server until the tool fetches them. This
471 allows the tool to use less memory but may keep the rows locked on
472 the server longer.
473
474 If this option is disabled by specifying "--no-buffer-to-client"
475 then "mysql_store_result" is used which causes MySQL to send all
476 selected rows to the tool at once. This may result in the results
477 "cursor" being held open for a shorter time on the server, but if
478 the tables are large, it could take a long time anyway, and use all
479 your memory.
480
481 For most non-trivial data sizes, you want to leave this option
482 enabled.
483
484 This option is disabled when "--bidirectional" is used.
485
486 --charset
487 short form: -A; type: string
488
489 Default character set. If the value is utf8, sets Perl's binmode
490 on STDOUT to utf8, passes the mysql_enable_utf8 option to
491 DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any
492 other value sets binmode on STDOUT without the utf8 layer, and runs
493 SET NAMES after connecting to MySQL.
494
495 --[no]check-master
496 default: yes
497
498 With "--sync-to-master", try to verify that the detected master is
499 the real master.
500
501 --[no]check-privileges
502 default: yes
503
504 Check that user has all necessary privileges on source and
505 destination table.
506
507 --[no]check-slave
508 default: yes
509
510 Check whether the destination server is a slave.
511
512 If the destination server is a slave, it's generally unsafe to make
513 changes on it. However, sometimes you have to; "--replace" won't
514 work unless there's a unique index, for example, so you can't make
515 changes on the master in that scenario. By default mk-table-sync
516 will complain if you try to change data on a slave. Specify
517 "--no-check-slave" to disable this check. Use it at your own risk.
518
519 --[no]check-triggers
520 default: yes
521
522 Check that no triggers are defined on the destination table.
523
524 Triggers were introduced in MySQL v5.0.2, so for older versions
525 this option has no effect because triggers will not be checked.
526
527 --chunk-column
528 type: string
529
530 Chunk the table on this column.
531
532 --chunk-index
533 type: string
534
535 Chunk the table using this index.
536
537 --chunk-size
538 type: string; default: 1000
539
540 Number of rows or data size per chunk.
541
542 The size of each chunk of rows for the "Chunk" and "Nibble"
543 algorithms. The size can be either a number of rows, or a data
544 size. Data sizes are specified with a suffix of k=kibibytes,
545 M=mebibytes, G=gibibytes. Data sizes are converted to a number of
546 rows by dividing by the average row length.
547
548 --columns
549 short form: -c; type: array
550
551 Compare this comma-separated list of columns.
552
553 --config
554 type: Array
555
556 Read this comma-separated list of config files; if specified, this
557 must be the first option on the command line.
558
559 --conflict-column
560 type: string
561
562 Compare this column when rows conflict during a "--bidirectional"
563 sync.
564
565 When a same but differing row is found the value of this column
566 from each row is compared according to "--conflict-comparison",
567 "--conflict-value" and "--conflict-threshold" to determine which
568 row has the correct data and becomes the source. The column can be
569 any type for which there is an appropriate "--conflict-comparison"
570 (this is almost all types except, for example, blobs).
571
572 This option only works with "--bidirectional". See "BIDIRECTIONAL
573 SYNCING" for more information.
574
575 --conflict-comparison
576 type: string
577
578 Choose the "--conflict-column" with this property as the source.
579
580 The option affects how the "--conflict-column" values from the
581 conflicting rows are compared. Possible comparisons are one of
582 these MAGIC_comparisons:
583
584 newest|oldest|greatest|least|equals|matches
585
586 COMPARISON CHOOSES ROW WITH
587 ========== =========================================================
588 newest Newest temporal L<"--conflict-column"> value
589 oldest Oldest temporal L<"--conflict-column"> value
590 greatest Greatest numerical L<"--conflict-column"> value
591 least Least numerical L<"--conflict-column"> value
592 equals L<"--conflict-column"> value equal to L<"--conflict-value">
593 matches L<"--conflict-column"> value matching Perl regex pattern
594 L<"--conflict-value">
595
596 This option only works with "--bidirectional". See "BIDIRECTIONAL
597 SYNCING" for more information.
598
599 --conflict-error
600 type: string; default: warn
601
602 How to report unresolvable conflicts and conflict errors
603
604 This option changes how the user is notified when a conflict cannot
605 be resolved or causes some kind of error. Possible values are:
606
607 * warn: Print a warning to STDERR about the unresolvable conflict
608 * die: Die, stop syncing, and print a warning to STDERR
609
610 This option only works with "--bidirectional". See "BIDIRECTIONAL
611 SYNCING" for more information.
612
613 --conflict-threshold
614 type: string
615
616 Amount by which one "--conflict-column" must exceed the other.
617
618 The "--conflict-threshold" prevents a conflict from being resolved
619 if the absolute difference between the two "--conflict-column"
620 values is less than this amount. For example, if two
621 "--conflict-column" have timestamp values "2009-12-01 12:00:00" and
622 "2009-12-01 12:05:00" the difference is 5 minutes. If
623 "--conflict-threshold" is set to "5m" the conflict will be
624 resolved, but if "--conflict-threshold" is set to "6m" the conflict
625 will fail to resolve because the difference is not greater than or
626 equal to 6 minutes. In this latter case, "--conflict-error" will
627 report the failure.
628
629 This option only works with "--bidirectional". See "BIDIRECTIONAL
630 SYNCING" for more information.
631
632 --conflict-value
633 type: string
634
635 Use this value for certain "--conflict-comparison".
636
637 This option gives the value for "equals" and "matches"
638 "--conflict-comparison".
639
640 This option only works with "--bidirectional". See "BIDIRECTIONAL
641 SYNCING" for more information.
642
643 --databases
644 short form: -d; type: hash
645
646 Sync only this comma-separated list of databases.
647
648 A common request is to sync tables from one database with tables
649 from another database on the same or different server. This is not
650 yet possible. "--databases" will not do it, and you can't do it
651 with the D part of the DSN either because in the absence of a table
652 name it assumes the whole server should be synced and the D part
653 controls only the connection's default database.
654
655 --defaults-file
656 short form: -F; type: string
657
658 Only read mysql options from the given file. You must give an
659 absolute pathname.
660
661 --dry-run
662 Analyze, decide the sync algorithm to use, print and exit.
663
664 Implies "--verbose" so you can see the results. The results are in
665 the same output format that you'll see from actually running the
666 tool, but there will be zeros for rows affected. This is because
667 the tool actually executes, but stops before it compares any data
668 and just returns zeros. The zeros do not mean there are no changes
669 to be made.
670
671 --engines
672 short form: -e; type: hash
673
674 Sync only this comma-separated list of storage engines.
675
676 --execute
677 Execute queries to make the tables have identical data.
678
679 This option makes mk-table-sync actually sync table data by
680 executing all the queries that it created to resolve table
681 differences. Therefore, the tables will be changed! And unless
682 you also specify "--verbose", the changes will be made silently.
683 If this is not what you want, see "--print" or "--dry-run".
684
685 --explain-hosts
686 Print connection information and exit.
687
688 Print out a list of hosts to which mk-table-sync will connect, with
689 all the various connection options, and exit.
690
691 --float-precision
692 type: int
693
694 Precision for "FLOAT" and "DOUBLE" number-to-string conversion.
695 Causes FLOAT and DOUBLE values to be rounded to the specified
696 number of digits after the decimal point, with the ROUND() function
697 in MySQL. This can help avoid checksum mismatches due to different
698 floating-point representations of the same values on different
699 MySQL versions and hardware. The default is no rounding; the
700 values are converted to strings by the CONCAT() function, and MySQL
701 chooses the string representation. If you specify a value of 2,
702 for example, then the values 1.008 and 1.009 will be rounded to
703 1.01, and will checksum as equal.
704
705 --[no]foreign-key-checks
706 default: yes
707
708 Enable foreign key checks ("SET FOREIGN_KEY_CHECKS=1").
709
710 Specifying "--no-foreign-key-checks" will "SET
711 FOREIGN_KEY_CHECKS=0".
712
713 --function
714 type: string
715
716 Which hash function you'd like to use for checksums.
717
718 The default is "CRC32". Other good choices include "MD5" and
719 "SHA1". If you have installed the "FNV_64" user-defined function,
720 "mk-table-sync" will detect it and prefer to use it, because it is
721 much faster than the built-ins. You can also use MURMUR_HASH if
722 you've installed that user-defined function. Both of these are
723 distributed with Maatkit. See mk-table-checksum for more
724 information and benchmarks.
725
726 --help
727 Show help and exit.
728
729 --[no]hex-blob
730 default: yes
731
732 "HEX()" "BLOB", "TEXT" and "BINARY" columns.
733
734 When row data from the source is fetched to create queries to sync
735 the data (i.e. the queries seen with "--print" and executed by
736 "--execute"), binary columns are wrapped in HEX() so the binary
737 data does not produce an invalid SQL statement. You can disable
738 this option but you probably shouldn't.
739
740 --host
741 short form: -h; type: string
742
743 Connect to host.
744
745 --ignore-columns
746 type: Hash
747
748 Ignore this comma-separated list of column names in comparisons.
749
750 This option causes columns not to be compared. However, if a row
751 is determined to differ between tables, all columns in that row
752 will be synced, regardless. (It is not currently possible to
753 exclude columns from the sync process itself, only from the
754 comparison.)
755
756 --ignore-databases
757 type: Hash
758
759 Ignore this comma-separated list of databases.
760
761 --ignore-engines
762 type: Hash; default: FEDERATED,MRG_MyISAM
763
764 Ignore this comma-separated list of storage engines.
765
766 --ignore-tables
767 type: Hash
768
769 Ignore this comma-separated list of tables.
770
771 Table names may be qualified with the database name.
772
773 --[no]index-hint
774 default: yes
775
776 Add FORCE/USE INDEX hints to the chunk and row queries.
777
778 By default "mk-table-sync" adds a FORCE/USE INDEX hint to each SQL
779 statement to coerce MySQL into using the index chosen by the sync
780 algorithm or specified by "--chunk-index". This is usually a good
781 thing, but in rare cases the index may not be the best for the
782 query so you can suppress the index hint by specifying
783 "--no-index-hint" and let MySQL choose the index.
784
785 This does not affect the queries printed by "--print"; it only
786 affects the chunk and row queries that "mk-table-sync" uses to
787 select and compare rows.
788
789 --lock
790 type: int
791
792 Lock tables: 0=none, 1=per sync cycle, 2=per table, or 3=globally.
793
794 This uses "LOCK TABLES". This can help prevent tables being
795 changed while you're examining them. The possible values are as
796 follows:
797
798 VALUE MEANING
799 ===== =======================================================
800 0 Never lock tables.
801 1 Lock and unlock one time per sync cycle (as implemented
802 by the syncing algorithm). This is the most granular
803 level of locking available. For example, the Chunk
804 algorithm will lock each chunk of C<N> rows, and then
805 unlock them if they are the same on the source and the
806 destination, before moving on to the next chunk.
807 2 Lock and unlock before and after each table.
808 3 Lock and unlock once for every server (DSN) synced, with
809 C<FLUSH TABLES WITH READ LOCK>.
810
811 A replication slave is never locked if "--replicate" or
812 "--sync-to-master" is specified, since in theory locking the table
813 on the master should prevent any changes from taking place. (You
814 are not changing data on your slave, right?) If "--wait" is given,
815 the master (source) is locked and then the tool waits for the slave
816 to catch up to the master before continuing.
817
818 If "--transaction" is specified, "LOCK TABLES" is not used.
819 Instead, lock and unlock are implemented by beginning and
820 committing transactions. The exception is if "--lock" is 3.
821
822 If "--no-transaction" is specified, then "LOCK TABLES" is used for
823 any value of "--lock". See "--[no]transaction".
824
825 --lock-and-rename
826 Lock the source and destination table, sync, then swap names. This
827 is useful as a less-blocking ALTER TABLE, once the tables are
828 reasonably in sync with each other (which you may choose to
829 accomplish via any number of means, including dump and reload or
830 even something like mk-archiver). It requires exactly two DSNs and
831 assumes they are on the same server, so it does no waiting for
832 replication or the like. Tables are locked with LOCK TABLES.
833
834 --password
835 short form: -p; type: string
836
837 Password to use when connecting.
838
839 --pid
840 type: string
841
842 Create the given PID file. The file contains the process ID of the
843 script. The PID file is removed when the script exits. Before
844 starting, the script checks if the PID file already exists. If it
845 does not, then the script creates and writes its own PID to it. If
846 it does, then the script checks the following: if the file contains
847 a PID and a process is running with that PID, then the script dies;
848 or, if there is no process running with that PID, then the script
849 overwrites the file with its own PID and starts; else, if the file
850 contains no PID, then the script dies.
851
852 --port
853 short form: -P; type: int
854
855 Port number to use for connection.
856
857 --print
858 Print queries that will resolve differences.
859
860 If you don't trust "mk-table-sync", or just want to see what it
861 will do, this is a good way to be safe. These queries are valid
862 SQL and you can run them yourself if you want to sync the tables
863 manually.
864
865 --recursion-method
866 type: string
867
868 Preferred recursion method used to find slaves.
869
870 Possible methods are:
871
872 METHOD USES
873 =========== ================
874 processlist SHOW PROCESSLIST
875 hosts SHOW SLAVE HOSTS
876
877 The processlist method is preferred because SHOW SLAVE HOSTS is not
878 reliable. However, the hosts method is required if the server uses
879 a non-standard port (not 3306). Usually mk-table-sync does the
880 right thing and finds the slaves, but you may give a preferred
881 method and it will be used first. If it doesn't find any slaves,
882 the other methods will be tried.
883
884 --replace
885 Write all "INSERT" and "UPDATE" statements as "REPLACE".
886
887 This is automatically switched on as needed when there are unique
888 index violations.
889
890 --replicate
891 type: string
892
893 Sync tables listed as different in this table.
894
895 Specifies that "mk-table-sync" should examine the specified table
896 to find data that differs. The table is exactly the same as the
897 argument of the same name to mk-table-checksum. That is, it
898 contains records of which tables (and ranges of values) differ
899 between the master and slave.
900
901 For each table and range of values that shows differences between
902 the master and slave, "mk-table-checksum" will sync that table,
903 with the appropriate "WHERE" clause, to its master.
904
905 This automatically sets "--wait" to 60 and causes changes to be
906 made on the master instead of the slave.
907
908 If "--sync-to-master" is specified, the tool will assume the server
909 you specified is the slave, and connect to the master as usual to
910 sync.
911
912 Otherwise, it will try to use "SHOW PROCESSLIST" to find slaves of
913 the server you specified. If it is unable to find any slaves via
914 "SHOW PROCESSLIST", it will inspect "SHOW SLAVE HOSTS" instead.
915 You must configure each slave's "report-host", "report-port" and
916 other options for this to work right. After finding slaves, it
917 will inspect the specified table on each slave to find data that
918 needs to be synced, and sync it.
919
920 The tool examines the master's copy of the table first, assuming
921 that the master is potentially a slave as well. Any table that
922 shows differences there will NOT be synced on the slave(s). For
923 example, suppose your replication is set up as A->B, B->C, B->D.
924 Suppose you use this argument and specify server B. The tool will
925 examine server B's copy of the table. If it looks like server B's
926 data in table "test.tbl1" is different from server A's copy, the
927 tool will not sync that table on servers C and D.
928
929 --set-vars
930 type: string; default: wait_timeout=10000
931
932 Set these MySQL variables. Immediately after connecting to MySQL,
933 this string will be appended to SET and executed.
934
935 --socket
936 short form: -S; type: string
937
938 Socket file to use for connection.
939
940 --sync-to-master
941 Treat the DSN as a slave and sync it to its master.
942
943 Treat the server you specified as a slave. Inspect "SHOW SLAVE
944 STATUS", connect to the server's master, and treat the master as
945 the source and the slave as the destination. Causes changes to be
946 made on the master. Sets "--wait" to 60 by default, sets "--lock"
947 to 1 by default, and disables "--[no]transaction" by default. See
948 also "--replicate", which changes this option's behavior.
949
950 --tables
951 short form: -t; type: hash
952
953 Sync only this comma-separated list of tables.
954
955 Table names may be qualified with the database name.
956
957 --timeout-ok
958 Keep going if "--wait" fails.
959
960 If you specify "--wait" and the slave doesn't catch up to the
961 master's position before the wait times out, the default behavior
962 is to abort. This option makes the tool keep going anyway.
963 Warning: if you are trying to get a consistent comparison between
964 the two servers, you probably don't want to keep going after a
965 timeout.
966
967 --[no]transaction
968 Use transactions instead of "LOCK TABLES".
969
970 The granularity of beginning and committing transactions is
971 controlled by "--lock". This is enabled by default, but since
972 "--lock" is disabled by default, it has no effect.
973
974 Most options that enable locking also disable transactions by
975 default, so if you want to use transactional locking (via "LOCK IN
976 SHARE MODE" and "FOR UPDATE", you must specify "--transaction"
977 explicitly.
978
979 If you don't specify "--transaction" explicitly "mk-table-sync"
980 will decide on a per-table basis whether to use transactions or
981 table locks. It currently uses transactions on InnoDB tables, and
982 table locks on all others.
983
984 If "--no-transaction" is specified, then "mk-table-sync" will not
985 use transactions at all (not even for InnoDB tables) and locking is
986 controlled by "--lock".
987
988 When enabled, either explicitly or implicitly, the transaction
989 isolation level is set "REPEATABLE READ" and transactions are
990 started "WITH CONSISTENT SNAPSHOT".
991
992 --trim
993 "TRIM()" "VARCHAR" columns in "BIT_XOR" and "ACCUM" modes. Helps
994 when comparing MySQL 4.1 to >= 5.0.
995
996 This is useful when you don't care about the trailing space
997 differences between MySQL versions which vary in their handling of
998 trailing spaces. MySQL 5.0 and later all retain trailing spaces in
999 "VARCHAR", while previous versions would remove them.
1000
1001 --[no]unique-checks
1002 default: yes
1003
1004 Enable unique key checks ("SET UNIQUE_CHECKS=1").
1005
1006 Specifying "--no-unique-checks" will "SET UNIQUE_CHECKS=0".
1007
1008 --user
1009 short form: -u; type: string
1010
1011 User for login if not current user.
1012
1013 --verbose
1014 short form: -v; cumulative: yes
1015
1016 Print results of sync operations.
1017
1018 See "OUTPUT" for more details about the output.
1019
1020 --version
1021 Show version and exit.
1022
1023 --wait
1024 short form: -w; type: time
1025
1026 How long to wait for slaves to catch up to their master.
1027
1028 Make the master wait for the slave to catch up in replication
1029 before comparing the tables. The value is the number of seconds to
1030 wait before timing out (see also "--timeout-ok"). Sets "--lock" to
1031 1 and "--[no]transaction" to 0 by default. If you see an error
1032 such as the following,
1033
1034 MASTER_POS_WAIT returned -1
1035
1036 It means the timeout was exceeded and you need to increase it.
1037
1038 The default value of this option is influenced by other options.
1039 To see what value is in effect, run with "--help".
1040
1041 To disable waiting entirely (except for locks), specify "--wait" 0.
1042 This helps when the slave is lagging on tables that are not being
1043 synced.
1044
1045 --where
1046 type: string
1047
1048 "WHERE" clause to restrict syncing to part of the table.
1049
1050 --[no]zero-chunk
1051 default: yes
1052
1053 Add a chunk for rows with zero or zero-equivalent values. The only
1054 has an effect when "--chunk-size" is specified. The purpose of the
1055 zero chunk is to capture a potentially large number of zero values
1056 that would imbalance the size of the first chunk. For example, if
1057 a lot of negative numbers were inserted into an unsigned integer
1058 column causing them to be stored as zeros, then these zero values
1059 are captured by the zero chunk instead of the first chunk and all
1060 its non-zero values.
1061
1063 These DSN options are used to create a DSN. Each option is given like
1064 "option=value". The options are case-sensitive, so P and p are not the
1065 same option. There cannot be whitespace before or after the "=" and if
1066 the value contains whitespace it must be quoted. DSN options are
1067 comma-separated. See the maatkit manpage for full details.
1068
1069 · A
1070
1071 dsn: charset; copy: yes
1072
1073 Default character set.
1074
1075 · D
1076
1077 dsn: database; copy: yes
1078
1079 Database containing the table to be synced.
1080
1081 · F
1082
1083 dsn: mysql_read_default_file; copy: yes
1084
1085 Only read default options from the given file
1086
1087 · h
1088
1089 dsn: host; copy: yes
1090
1091 Connect to host.
1092
1093 · p
1094
1095 dsn: password; copy: yes
1096
1097 Password to use when connecting.
1098
1099 · P
1100
1101 dsn: port; copy: yes
1102
1103 Port number to use for connection.
1104
1105 · S
1106
1107 dsn: mysql_socket; copy: yes
1108
1109 Socket file to use for connection.
1110
1111 · t
1112
1113 copy: yes
1114
1115 Table to be synced.
1116
1117 · u
1118
1119 dsn: user; copy: yes
1120
1121 User for login if not current user.
1122
1124 You can download Maatkit from Google Code at
1125 <http://code.google.com/p/maatkit/>, or you can get any of the tools
1126 easily with a command like the following:
1127
1128 wget http://www.maatkit.org/get/toolname
1129 or
1130 wget http://www.maatkit.org/trunk/toolname
1131
1132 Where "toolname" can be replaced with the name (or fragment of a name)
1133 of any of the Maatkit tools. Once downloaded, they're ready to run; no
1134 installation is needed. The first URL gets the latest released version
1135 of the tool, and the second gets the latest trunk code from Subversion.
1136
1138 The environment variable MKDEBUG enables verbose debugging output in
1139 all of the Maatkit tools:
1140
1141 MKDEBUG=1 mk-....
1142
1144 You need Perl, DBI, DBD::mysql, and some core packages that ought to be
1145 installed in any reasonably new version of Perl.
1146
1148 For a list of known bugs see:
1149 <http://www.maatkit.org/bugs/mk-table-sync>.
1150
1151 Please use Google Code Issues and Groups to report bugs or request
1152 support: <http://code.google.com/p/maatkit/>. You can also join
1153 #maatkit on Freenode to discuss Maatkit.
1154
1155 Please include the complete command-line used to reproduce the problem
1156 you are seeing, the version of all MySQL servers involved, the complete
1157 output of the tool when run with "--version", and if possible,
1158 debugging output produced by running with the "MKDEBUG=1" environment
1159 variable.
1160
1162 This program is copyright 2007-2011 Baron Schwartz. Feedback and
1163 improvements are welcome.
1164
1165 THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
1166 WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
1167 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1168
1169 This program is free software; you can redistribute it and/or modify it
1170 under the terms of the GNU General Public License as published by the
1171 Free Software Foundation, version 2; OR the Perl Artistic License. On
1172 UNIX and similar systems, you can issue `man perlgpl' or `man
1173 perlartistic' to read these licenses.
1174
1175 You should have received a copy of the GNU General Public License along
1176 with this program; if not, write to the Free Software Foundation, Inc.,
1177 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
1178
1180 Baron Schwartz
1181
1183 This tool is part of Maatkit, a toolkit for power users of MySQL.
1184 Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
1185 primary code contributors. Both are employed by Percona. Financial
1186 support for Maatkit development is primarily provided by Percona and
1187 its clients.
1188
1190 My work is based in part on Giuseppe Maxia's work on distributed
1191 databases, <http://www.sysadminmag.com/articles/2004/0408/> and code
1192 derived from that article. There is more explanation, and a link to
1193 the code, at <http://www.perlmonks.org/?node_id=381053>.
1194
1195 Another programmer extended Maxia's work even further. Fabien Coelho
1196 changed and generalized Maxia's technique, introducing symmetry and
1197 avoiding some problems that might have caused too-frequent checksum
1198 collisions. This work grew into pg_comparator,
1199 <http://www.coelho.net/pg_comparator/>. Coelho also explained the
1200 technique further in a paper titled "Remote Comparison of Database
1201 Tables" (<http://cri.ensmp.fr/classement/doc/A-375.pdf>).
1202
1203 This existing literature mostly addressed how to find the differences
1204 between the tables, not how to resolve them once found. I needed a
1205 tool that would not only find them efficiently, but would then resolve
1206 them. I first began thinking about how to improve the technique
1207 further with my article <http://tinyurl.com/mysql-data-diff-algorithm>,
1208 where I discussed a number of problems with the Maxia/Coelho "bottom-
1209 up" algorithm. After writing that article, I began to write this tool.
1210 I wanted to actually implement their algorithm with some improvements
1211 so I was sure I understood it completely. I discovered it is not what
1212 I thought it was, and is considerably more complex than it appeared to
1213 me at first. Fabien Coelho was kind enough to address some questions
1214 over email.
1215
1216 The first versions of this tool implemented a version of the
1217 Coelho/Maxia algorithm, which I called "bottom-up", and my own, which I
1218 called "top-down." Those algorithms are considerably more complex than
1219 the current algorithms and I have removed them from this tool, and may
1220 add them back later. The improvements to the bottom-up algorithm are
1221 my original work, as is the top-down algorithm. The techniques to
1222 actually resolve the differences are also my own work.
1223
1224 Another tool that can synchronize tables is the SQLyog Job Agent from
1225 webyog. Thanks to Rohit Nadhani, SJA's author, for the conversations
1226 about the general techniques. There is a comparison of mk-table-sync
1227 and SJA at <http://tinyurl.com/maatkit-vs-sqlyog>
1228
1229 Thanks to the following people and organizations for helping in many
1230 ways:
1231
1232 The Rimm-Kaufman Group <http://www.rimmkaufman.com/>, MySQL AB
1233 <http://www.mysql.com/>, Blue Ridge InternetWorks
1234 <http://www.briworks.com/>, Percona <http://www.percona.com/>, Fabien
1235 Coelho, Giuseppe Maxia and others at MySQL AB, Kristian Koehntopp
1236 (MySQL AB), Rohit Nadhani (WebYog), The helpful monks at Perlmonks, And
1237 others too numerous to mention.
1238
1240 This manual page documents Ver 1.0.31 Distrib 7540 $Revision: 7476 $.
1241
1242
1243
1244perl v5.28.1 2011-06-08 MK-TABLE-SYNC(1)