1MK-TABLE-SYNC(1) User Contributed Perl Documentation MK-TABLE-SYNC(1)
2
3
4
6 mk-table-sync - Synchronize MySQL tables efficiently.
7
9 This tool changes data, so for maximum safety, you should back up your
10 data before you use it.
11
12 To sync db.tbl1 from host1 to host2:
13
14 mk-table-sync --execute u=user,p=pass,h=host1,D=db,t=tbl host2
15
16 Sync all tables in host1 to host2 and host3:
17
18 mk-table-sync --execute host1 host2 host3
19
20 Resolve differences mk-table-checksum found on this master's slaves:
21
22 mk-table-sync --execute --replicate test.checksum master1
23
24 Make slave1 have the same data as its replication master:
25
26 mk-table-sync --execute --sync-to-master slave1
27
28 Ditto, resolving differences mk-table-checksum found:
29
30 mk-table-sync --execute --sync-to-master \
31 --replicate test.checksum slave1
32
33 Sync server2 in a master-master replication configuration, where
34 server2's copy of db1.tbl1 is known or suspected to be incorrect:
35
36 mk-table-sync --execute --sync-to-master h=server2,D=db1,t=tbl1
37
38 Note that in the master-master configuration, the following will NOT do
39 what you want, because it will make changes directly on server2, which
40 will then flow through replication and change server1's data:
41
42 # Don't do this in a master-master setup!
43 mk-table-sync --execute h=server1,D=db1,t=tbl1 h=server2
44
46 The following section is included to inform users about the potential
47 risks, whether known or unknown, of using this tool. The two main
48 categories of risks are those created by the nature of the tool (e.g.
49 read-only tools vs. read-write tools) and those created by bugs.
50
51 With great power comes great responsibility! This tool changes data,
52 so it is a good idea to back up your data. It is also very powerful,
53 which means it is very complex, so you should run it with the
54 "--dry-run" option to see what it will do, until you're familiar with
55 its operation. If you want to see which rows are different, without
56 changing any data, use "--print" instead of "--execute".
57
58 Be careful when using mk-table-sync in any master-master setup.
59 Master-master replication is inherently tricky, and it's easy to make
60 mistakes. You need to be sure you're using the tool correctly for
61 master-master replication. See the "SYNOPSIS" for the overview of the
62 correct usage.
63
64 Also be careful with tables that have foreign key constraints with "ON
65 DELETE" or "ON UPDATE" definitions because these might cause unintended
66 changes on the child tables.
67
68 In general, this tool is best suited when your tables have a primary
69 key or unique index. Although it can synchronize data in tables
70 lacking a primary key or unique index, it might be best to sychronize
71 that data by another means.
72
73 At the time of this release, there is a potential bug using
74 "--lock-and-rename" with MySQL 5.1, a bug detecting certain
75 differences, a bug using ROUND() across different platforms, and a bug
76 mixing collations.
77
78 The authoritative source for updated information is always the online
79 issue tracking system. Issues that affect this tool will be marked as
80 such. You can see a list of such issues at the following URL:
81 http://www.maatkit.org/bugs/mk-table-sync
82 <http://www.maatkit.org/bugs/mk-table-sync>.
83
84 See also "BUGS" for more information on filing bugs and getting help.
85
87 WARNING this tool is unfinished and could perform slowly. The Chunk
88 algorithm is great when it can be used, and so is Nibble, but otherwise
89 GroupBy is the default choice and it may not perform very well. Please
90 run with "--dry-run" before subjecting your servers to this tool, and
91 make backups of your data!
92
93 This tool is designed to do one-way and bidirectional synchronization
94 of data. It finds differences efficiently with one of several
95 algorithms (see "ALGORITHMS"). It makes changes on the destination
96 table(s) so it matches the source.
97
98 It does not synchronize table structures, indexes, or any other schema
99 changes. It synchronizes only data.
100
101 It can operate through replication by comparing a slave with its master
102 and making changes on the master. These changes will flow through
103 replication and correct any differences found on the slave.
104
105 It accepts a list of DSNs (see the "--help" output) to tell it where
106 and how to connect.
107
108 There are many ways to invoke it. The following is the abbreviated
109 logic:
110
111 if DSN has a t part, sync only that table:
112 if 1 DSN:
113 if --sync-to-master:
114 The DSN is a slave. Connect to its master and sync.
115 if more than 1 DSN:
116 The first DSN is the source. Sync each DSN in turn.
117 else if --replicate:
118 if --sync-to-master:
119 The DSN is a slave. Connect to its master, find records
120 of differences, and fix.
121 else:
122 The DSN is the master. Find slaves and connect to each,
123 find records of differences, and fix.
124 else:
125 if only 1 DSN and --sync-to-master:
126 The DSN is a slave. Connect to its master, find tables and
127 filter with --databases etc, and sync each table to the master.
128 else:
129 find tables, filtering with --databases etc, and sync each
130 DSN to the first.
131
132 If you're confused about how it the DSNs are interpreted, use the
133 "--explain-hosts" option and it will tell you.
134
136 If you specify the "--verbose" option, you'll see information about the
137 differences between the tables. There is one row per table. Each
138 server is printed separately. For example,
139
140 # Syncing h=host1,D=test,t=test1
141 # DELETE REPLACE INSERT UPDATE ALGORITHM EXIT DATABASE.TABLE
142 # 0 0 3 0 Chunk 2 test.test1
143
144 Table test.test1 on host1 required 3 "INSERT" statements to synchronize
145 and it used the Chunk algorithm (see "ALGORITHMS"). Because
146 differences were found, its "EXIT STATUS" was 2.
147
148 If you specify the "--print" option, you'll see the actual SQL
149 statements that the script uses to synchronize the table if "--execute"
150 is also specified.
151
152 If you want to see the SQL statements that mk-table-sync is using to
153 select chunks, nibbles, rows, etc., then specify "--print" once and
154 "--verbose" twice. Be careful though: this can print a lot of SQL
155 statements.
156
157 There are cases where no combination of "INSERT", "UPDATE" or "DELETE"
158 statements can resolve differences without violating some unique key.
159 For example, suppose there's a primary key on column a and a unique key
160 on column b. Then there is no way to sync these two tables with
161 straightforward UPDATE statements:
162
163 +---+---+ +---+---+
164 | a | b | | a | b |
165 +---+---+ +---+---+
166 | 1 | 2 | | 1 | 1 |
167 | 2 | 1 | | 2 | 2 |
168 +---+---+ +---+---+
169
170 The tool rewrites queries to "DELETE" and "REPLACE" in this case. This
171 is automatically handled after the first index violation, so you don't
172 have to worry about it.
173
175 Synchronizing a replication master and slave safely is a non-trivial
176 problem, in general. There are all sorts of issues to think about,
177 such as other processes changing data, trying to change data on the
178 slave, whether the destination and source are a master-master pair, and
179 much more.
180
181 In general, the safe way to do it is to change the data on the master,
182 and let the changes flow through replication to the slave like any
183 other changes. However, this works only if it's possible to REPLACE
184 into the table on the master. REPLACE works only if there's a unique
185 index on the table (otherwise it just acts like an ordinary INSERT).
186
187 If your table has unique keys, you should use the "--sync-to-master"
188 and/or "--replicate" options to sync a slave to its master. This will
189 generally do the right thing. When there is no unique key on the
190 table, there is no choice but to change the data on the slave, and mk-
191 table-sync will detect that you're trying to do so. It will complain
192 and die unless you specify "--no-check-slave" (see
193 "--[no]check-slave").
194
195 If you're syncing a table without a primary or unique key on a master-
196 master pair, you must change the data on the destination server.
197 Therefore, you need to specify "--no-bin-log" for safety (see
198 "--[no]bin-log"). If you don't, the changes you make on the
199 destination server will replicate back to the source server and change
200 the data there!
201
202 The generally safe thing to do on a master-master pair is to use the
203 "--sync-to-master" option so you don't change the data on the
204 destination server. You will also need to specify "--no-check-slave"
205 to keep mk-table-sync from complaining that it is changing data on a
206 slave.
207
209 This tool has a generic data-syncing framework, within which it is
210 possible to use any number of different algorithms to actually find
211 differences. It chooses the best algorithm automatically. While I
212 plan to add more algorithms in the future, the following are
213 implemented now:
214
215 Chunk
216 Finds an index whose first column is numeric (including date and
217 time types), and divides the column's range of values into chunks
218 of approximately "--chunk-size" rows. Syncs a chunk at a time by
219 checksumming the entire chunk. If the chunk differs on the source
220 and destination, checksums each chunk's rows individually to find
221 the rows that differ.
222
223 It is efficient when the column has sufficient cardinality to make
224 the chunks end up about the right size.
225
226 The initial per-chunk checksum is quite small and results in
227 minimal network traffic and memory consumption. If a chunk's rows
228 must be examined, only the primary key columns and a checksum are
229 sent over the network, not the entire row. If a row is found to be
230 different, the entire row will be fetched, but not before.
231
232 Nibble
233 Finds an index and ascends the index in fixed-size nibbles of
234 "--chunk-size" rows, using a non-backtracking algorithm (see mk-
235 archiver for more on this algorithm). It is very similar to
236 "Chunk", but instead of pre-calculating the boundaries of each
237 piece of the table based on index cardinality, it uses "LIMIT" to
238 define each nibble's upper limit, and the previous nibble's upper
239 limit to define the lower limit.
240
241 It works in steps: one query finds the row that will define the
242 next nibble's upper boundary, and the next query checksums the
243 entire nibble. If the nibble differs between the source and
244 destination, it examines the nibble row-by-row, just as "Chunk"
245 does.
246
247 GroupBy
248 Selects the entire table grouped by all columns, with a COUNT(*)
249 column added. Compares all columns, and if they're the same,
250 compares the COUNT(*) column's value to determine how many rows to
251 insert or delete into the destination. Works on tables with no
252 primary key or unique index.
253
254 Stream
255 Selects the entire table in one big stream and compares all
256 columns. Selects all columns. Much less efficient than the other
257 algorithms, but works when there is no suitable index for them to
258 use.
259
260 Future Plans
261 Possibilities for future algorithms are TempTable (what I
262 originally called bottom-up in earlier versions of this tool),
263 DrillDown (what I originallly called top-down), and GroupByPrefix
264 (similar to how SqlYOG Job Agent works). Each algorithm has
265 strengths and weaknesses. If you'd like to implement your favorite
266 technique for finding differences between two sources of data on
267 possibly different servers, I'm willing to help. The algorithms
268 adhere to a simple interface that makes it pretty easy to write
269 your own.
270
272 Bidirectional syncing is a new, experimental feature. To make it work
273 reliably there are a number of strict limitations:
274
275 * only works when syncing one server to other independent servers
276 * does not work in any way with replication
277 * requires that the table(s) are chunkable with the Chunk algorithm
278 * is not N-way, only bidirectional between two servers at a time
279 * does not handle DELETE changes
280
281 For example, suppose we have three servers: c1, r1, r2. c1 is the
282 central server, a pseudo-master to the other servers (viz. r1 and r2
283 are not slaves to c1). r1 and r2 are remote servers. Rows in table
284 foo are updated and inserted on all three servers and we want to
285 synchronize all the changes between all the servers. Table foo has
286 columns:
287
288 id int PRIMARY KEY
289 ts timestamp auto updated
290 name varchar
291
292 Auto-increment offsets are used so that new rows from any server do not
293 create conflicting primary key (id) values. In general, newer rows, as
294 determined by the ts column, take precedence when a same but differing
295 row is found during the bidirectional sync. "Same but differing" means
296 that two rows have the same primary key (id) value but different values
297 for some other column, like the name column in this example. Same but
298 differing conflicts are resolved by a "conflict". A conflict compares
299 some column of the competing rows to determine a "winner". The
300 winnning row becomes the source and its values are used to update the
301 other row.
302
303 There are subtle differences between three columns used to achieve
304 bidirectional syncing that you should be familiar with: chunk column
305 ("--chunk-column"), comparison column(s) ("--columns"), and conflict
306 column ("--conflict-column"). The chunk column is only used to chunk
307 the table; e.g. "WHERE id >= 5 AND id < 10". Chunks are checksummed
308 and when chunk checksums reveal a difference, the tool selects the rows
309 in that chunk and checksums the "--columns" for each row. If a column
310 checksum differs, the rows have one or more conflicting column values.
311 In a traditional unidirectional sync, the conflict is a moot point
312 because it can be resolved simply by updating the entire destination
313 row with the source row's values. In a bidirectional sync, however,
314 the "--conflict-column" (in accordance with other "--conflict-*"
315 options list below) is compared to determine which row is "correct" or
316 "authoritative"; this row becomes the "source".
317
318 To sync all three servers completely, two runs of mk-table-sync are
319 required. The first run syncs c1 and r1, then syncs c1 and r2
320 including any changes from r1. At this point c1 and r2 are completely
321 in sync, but r1 is missing any changes from r2 because c1 didn't have
322 these changes when it and r1 were synced. So a second run is needed
323 which syncs the servers in the same order, but this time when c1 and r1
324 are synced r1 gets r2's changes.
325
326 The tool does not sync N-ways, only bidirectionally between the first
327 DSN given on the command line and each subsequent DSN in turn. So the
328 tool in this example would be ran twice like:
329
330 mk-table-sync --bidirectional h=c1 h=r1 h=r2
331
332 The "--bidirectional" option enables this feature and causes various
333 sanity checks to be performed. You must specify other options that
334 tell mk-table-sync how to resolve conflicts for same but differing
335 rows. These options are:
336
337 * L<"--conflict-column">
338 * L<"--conflict-comparison">
339 * L<"--conflict-value">
340 * L<"--conflict-threshold">
341 * L<"--conflict-error"> (optional)
342
343 Use "--print" to test this option before "--execute". The printed SQL
344 statements will have comments saying on which host the statment would
345 be executed if you used "--execute".
346
347 Technical side note: the first DSN is always the "left" server and the
348 other DSNs are always the "right" server. Since either server can
349 become the source or destination it's confusing to think of them as
350 "src" and "dst". Therefore, they're generically referred to as left
351 and right. It's easy to remember this because the first DSN is always
352 to the left of the other server DSNs on the command line.
353
355 Exit status is as follows:
356
357 STATUS MEANING
358 ====== =======================================================
359 0 Success.
360 1 Internal error.
361 2 At least one table differed on the destination.
362 3 Combination of 1 and 2.
363
365 Specify at least one of "--print", "--execute", or "--dry-run".
366
367 "--where" and "--replicate" are mutually exclusive.
368
369 --algorithms
370 type: string; default: Chunk,Nibble,GroupBy,Stream
371
372 Algorithm to use when comparing the tables, in order of preference.
373
374 For each table, mk-table-sync will check if the table can be synced
375 with the given algorithms in the order that they're given. The
376 first algorithm that can sync the table is used. See "ALGORITHMS".
377
378 --ask-pass
379 Prompt for a password when connecting to MySQL.
380
381 --bidirectional
382 Enable bidirectional sync between first and subsequent hosts.
383
384 See "BIDIRECTIONAL SYNCING" for more information.
385
386 --[no]bin-log
387 default: yes
388
389 Log to the binary log ("SET SQL_LOG_BIN=1").
390
391 Specifying "--no-bin-log" will "SET SQL_LOG_BIN=0".
392
393 --buffer-in-mysql
394 Instruct MySQL to buffer queries in its memory.
395
396 This option adds the "SQL_BUFFER_RESULT" option to the comparison
397 queries. This causes MySQL to execute the queries and place them
398 in a temporary table internally before sending the results back to
399 mk-table-sync. The advantage of this strategy is that mk-table-
400 sync can fetch rows as desired without using a lot of memory inside
401 the Perl process, while releasing locks on the MySQL table (to
402 reduce contention with other queries). The disadvantage is that it
403 uses more memory on the MySQL server instead.
404
405 You probably want to leave "--[no]buffer-to-client" enabled too,
406 because buffering into a temp table and then fetching it all into
407 Perl's memory is probably a silly thing to do. This option is most
408 useful for the GroupBy and Stream algorithms, which may fetch a lot
409 of data from the server.
410
411 --[no]buffer-to-client
412 default: yes
413
414 Fetch rows one-by-one from MySQL while comparing.
415
416 This option enables "mysql_use_result" which causes MySQL to hold
417 the selected rows on the server until the tool fetches them. This
418 allows the tool to use less memory but may keep the rows locked on
419 the server longer.
420
421 If this option is disabled by specifying "--no-buffer-to-client"
422 then "mysql_store_result" is used which causes MySQL to send all
423 selected rows to the tool at once. This may result in the results
424 "cursor" being held open for a shorter time on the server, but if
425 the tables are large, it could take a long time anyway, and use all
426 your memory.
427
428 For most non-trivial data sizes, you want to leave this option
429 enabled.
430
431 This option is disabled when "--bidirectional" is used.
432
433 --charset
434 short form: -A; type: string
435
436 Default character set. If the value is utf8, sets Perl's binmode
437 on STDOUT to utf8, passes the mysql_enable_utf8 option to
438 DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any
439 other value sets binmode on STDOUT without the utf8 layer, and runs
440 SET NAMES after connecting to MySQL.
441
442 --[no]check-master
443 default: yes
444
445 With "--sync-to-master", try to verify that the detected master is
446 the real master.
447
448 --[no]check-privileges
449 default: yes
450
451 Check that user has all necessary privileges on source and
452 destination table.
453
454 --[no]check-slave
455 default: yes
456
457 Check whether the destination server is a slave.
458
459 If the destination server is a slave, it's generally unsafe to make
460 changes on it. However, sometimes you have to; "--replace" won't
461 work unless there's a unique index, for example, so you can't make
462 changes on the master in that scenario. By default mk-table-sync
463 will complain if you try to change data on a slave. Specify
464 "--no-slave-check" to disable this check. Use it at your own risk.
465
466 --[no]check-triggers
467 default: yes
468
469 Check that no triggers are defined on the destination table.
470
471 Triggers were introduced in MySQL v5.0.2, so for older versions
472 this option has no effect because triggers will not be checked.
473
474 --chunk-column
475 type: string
476
477 Chunk the table on this column.
478
479 --chunk-index
480 type: string
481
482 Chunk the table using this index.
483
484 --chunk-size
485 type: string; default: 1000
486
487 Number of rows or data size per chunk.
488
489 The size of each chunk of rows for the "Chunk" and "Nibble"
490 algorithms. The size can be either a number of rows, or a data
491 size. Data sizes are specified with a suffix of k=kibibytes,
492 M=mebibytes, G=gibibytes. Data sizes are converted to a number of
493 rows by dividing by the average row length.
494
495 --columns
496 short form: -c; type: array
497
498 Compare this comma-separated list of columns.
499
500 --config
501 type: Array
502
503 Read this comma-separated list of config files; if specified, this
504 must be the first option on the command line.
505
506 --conflict-column
507 type: string
508
509 Compare this column when rows conflict during a "--bidirectional"
510 sync.
511
512 When a same but differing row is found the value of this column
513 from each row is compared according to "--conflict-comparison",
514 "--conflict-value" and "--conflict-threshold" to determine which
515 row has the correct data and becomes the source. The column can be
516 any type for which there is an appropriate "--conflict-comparison"
517 (this is almost all types except, for example, blobs).
518
519 This option only works with "--bidirectional". See "BIDIRECTIONAL
520 SYNCING" for more information.
521
522 --conflict-comparison
523 type: string
524
525 Choose the "--conflict-column" with this property as the source.
526
527 The option affects how the "--conflict-column" values from the
528 conflicting rows are compared. Possible comparisons are one of
529 these MAGIC_comparisons:
530
531 newest|oldest|greatest|least|equals|matches
532
533 COMPARISON CHOOSES ROW WITH
534 ========== =========================================================
535 newest Newest temporal L<"--conflict-column"> value
536 oldest Oldest temporal L<"--conflict-column"> value
537 greatest Greatest numerical L<"--conflict-column"> value
538 least Least numerical L<"--conflict-column"> value
539 equals L<"--conflict-column"> value equal to L<"--conflict-value">
540 matches L<"--conflict-column"> value matching Perl regex pattern
541 L<"--conflict-value">
542
543 This option only works with "--bidirectional". See "BIDIRECTIONAL
544 SYNCING" for more information.
545
546 --conflict-error
547 type: string; default: warn
548
549 How to report unresolvable conflicts and conflict errors
550
551 This option changes how the user is notified when a conflict cannot
552 be resolved or causes some kind of error. Possible values are:
553
554 * warn: Print a warning to STDERR about the unresolvable conflict
555 * die: Die, stop syncing, and print a warning to STDERR
556
557 This option only works with "--bidirectional". See "BIDIRECTIONAL
558 SYNCING" for more information.
559
560 --conflict-threshold
561 type: string
562
563 Amount by which one "--conflict-column" must exceed the other.
564
565 The "--conflict-threshold" prevents a conflict from being resolved
566 if the absolute difference between the two "--conflict-column"
567 values is less than this amount. For example, if two
568 "--conflict-column" have timestamp values "2009-12-01 12:00:00" and
569 "2009-12-01 12:05:00" the difference is 5 minutes. If
570 "--conflict-threshold" is set to "5m" the conflict will be
571 resolved, but if "--conflict-threshold" is set to "6m" the conflict
572 will fail to resolve because the difference is not greater than or
573 equal to 6 minutes. In this latter case, "--conflict-error" will
574 report the failure.
575
576 This option only works with "--bidirectional". See "BIDIRECTIONAL
577 SYNCING" for more information.
578
579 --conflict-value
580 type: string
581
582 Use this value for certain "--conflict-comparison".
583
584 This option gives the value for "equals" and "matches"
585 "--conflict-comparison".
586
587 This option only works with "--bidirectional". See "BIDIRECTIONAL
588 SYNCING" for more information.
589
590 --databases
591 short form: -d; type: hash
592
593 Sync only this comma-separated list of databases.
594
595 A common request is to sync tables from one database with tables
596 from another database on the same or different server. This is not
597 yet possible. "--databases" will not do it, and you can't do it
598 with the D part of the DSN either because in the absence of a table
599 name it assumes the whole server should be synced and the D part
600 controls only the connection's default database.
601
602 --defaults-file
603 short form: -F; type: string
604
605 Only read mysql options from the given file. You must give an
606 absolute pathname.
607
608 --dry-run
609 Analyze, decide the sync algorithm to use, print and exit.
610
611 Implies "--verbose" so you can see the results. The results are in
612 the same output format that you'll see from actually running the
613 tool, but there will be zeros for rows affected. This is because
614 the tool actually executes, but stops before it compares any data
615 and just returns zeros. The zeros do not mean there are no changes
616 to be made.
617
618 --engines
619 short form: -e; type: hash
620
621 Sync only this comma-separated list of storage engines.
622
623 --execute
624 Execute queries to make the tables have identical data.
625
626 This option makes mk-table-sync actually sync table data by
627 executing all the queries that it created to resolve table
628 differences. Therefore, the tables will be changed! And unless
629 you also specify "--verbose", the changes will be made silently.
630 If this is not what you want, see "--print" or "--dry-run".
631
632 --explain-hosts
633 Print connection information and exit.
634
635 Print out a list of hosts to which mk-table-sync will connect, with
636 all the various connection options, and exit.
637
638 --float-precision
639 type: int
640
641 Precision for "FLOAT" and "DOUBLE" column comparisons.
642
643 If you specify this option, FLOAT and DOUBLE columns will be
644 rounded to the specified number of digits after the decimal point
645 for the checksum. This can avoid mismatches due to different
646 floating-point representations of the same values on different
647 MySQL versions and hardware.
648
649 --[no]foreign-key-checks
650 default: yes
651
652 Enable foreign key checks ("SET FOREIGN_KEY_CHECKS=1").
653
654 Specifying "--no-foreign-key-checks" will "SET
655 FOREIGN_KEY_CHECKS=0".
656
657 --function
658 type: string
659
660 Which hash function you'd like to use for checksums.
661
662 The default is "CRC32". Other good choices include "MD5" and
663 "SHA1". If you have installed the "FNV_64" user-defined function,
664 "mk-table-sync" will detect it and prefer to use it, because it is
665 much faster than the built-ins. You can also use MURMUR_HASH if
666 you've installed that user-defined function. Both of these are
667 distributed with Maatkit. See mk-table-checksum for more
668 information and benchmarks.
669
670 --help
671 Show help and exit.
672
673 --[no]hex-blob
674 default: yes
675
676 "HEX()" "BLOB", "TEXT" and "BINARY" columns.
677
678 When row data from the source is fetched to create queries to sync
679 the data (i.e. the queries seen with "--print" and exeucted by
680 "--execute"), binary columns are wrapped in HEX() so the binary
681 data does not produce an invalid SQL statement. You can disable
682 this option but you probably shouldn't.
683
684 --host
685 short form: -h; type: string
686
687 Connect to host.
688
689 --ignore-columns
690 type: Hash
691
692 Ignore this comma-separated list of column names in comparisons.
693
694 This option causes columns not to be compared. However, if a row
695 is determined to differ between tables, all columns in that row
696 will be synced, regardless. (It is not currently possible to
697 exclude columns from the sync process itself, only from the
698 comparison.)
699
700 --ignore-databases
701 type: Hash
702
703 Ignore this comma-separated list of databases.
704
705 --ignore-engines
706 type: Hash; default: FEDERATED,MRG_MyISAM
707
708 Ignore this comma-separated list of storage engines.
709
710 --ignore-tables
711 type: Hash
712
713 Ignore this comma-separated list of tables.
714
715 Table names may be qualified with the database name.
716
717 --[no]index-hint
718 default: yes
719
720 Add FORCE/USE INDEX hints to the chunk and row queries.
721
722 By default "mk-table-sync" adds a FORCE/USE INDEX hint to each SQL
723 statement to coerce MySQL into using the index chosen by the sync
724 algorithm or specified by "--chunk-index". This is usually a good
725 thing, but in rare cases the index may not be the best for the
726 query so you can suppress the index hint by specifying
727 "--no-index-hint" and let MySQL choose the index.
728
729 This does not affect the queries printed by "--print"; it only
730 affects the chunk and row queries that "mk-table-sync" uses to
731 select and compare rows.
732
733 --lock
734 type: int
735
736 Lock tables: 0=none, 1=per sync cycle, 2=per table, or 3=globally.
737
738 This uses "LOCK TABLES". This can help prevent tables being
739 changed while you're examining them. The possible values are as
740 follows:
741
742 VALUE MEANING
743 ===== =======================================================
744 0 Never lock tables.
745 1 Lock and unlock one time per sync cycle (as implemented
746 by the syncing algorithm). This is the most granular
747 level of locking available. For example, the Chunk
748 algorithm will lock each chunk of C<N> rows, and then
749 unlock them if they are the same on the source and the
750 destination, before moving on to the next chunk.
751 2 Lock and unlock before and after each table.
752 3 Lock and unlock once for every server (DSN) synced, with
753 C<FLUSH TABLES WITH READ LOCK>.
754
755 A replication slave is never locked if "--replicate" or
756 "--sync-to-master" is specified, since in theory locking the table
757 on the master should prevent any changes from taking place. (You
758 are not changing data on your slave, right?) If "--wait" is given,
759 the master (source) is locked and then the tool waits for the slave
760 to catch up to the master before continuing.
761
762 If "--transaction" is specified, "LOCK TABLES" is not used.
763 Instead, lock and unlock are implemented by beginning and
764 committing transactions. The exception is if "--lock" is 3.
765
766 If "--no-transaction" is specified, then "LOCK TABLES" is used for
767 any value of "--lock". See "--[no]transaction".
768
769 --lock-and-rename
770 Lock the source and destination table, sync, then swap names. This
771 is useful as a less-blocking ALTER TABLE, once the tables are
772 reasonably in sync with each other (which you may choose to
773 accomplish via any number of means, including dump and reload or
774 even something like mk-archiver). It requires exactly two DSNs and
775 assumes they are on the same server, so it does no waiting for
776 replication or the like. Tables are locked with LOCK TABLES.
777
778 --password
779 short form: -p; type: string
780
781 Password to use when connecting.
782
783 --pid
784 type: string
785
786 Create the given PID file. The file contains the process ID of the
787 script. The PID file is removed when the script exits. Before
788 starting, the script checks if the PID file already exists. If it
789 does not, then the script creates and writes its own PID to it. If
790 it does, then the script checks the following: if the file contains
791 a PID and a process is running with that PID, then the script dies;
792 or, if there is no process running with that PID, then the script
793 overwrites the file with its own PID and starts; else, if the file
794 contains no PID, then the script dies.
795
796 --port
797 short form: -P; type: int
798
799 Port number to use for connection.
800
801 --print
802 Print queries that will resolve differences.
803
804 If you don't trust "mk-table-sync", or just want to see what it
805 will do, this is a good way to be safe. These queries are valid
806 SQL and you can run them yourself if you want to sync the tables
807 manually.
808
809 --recursion-method
810 type: string
811
812 Preferred recursion method used to find slaves.
813
814 Possible methods are:
815
816 METHOD USES
817 =========== ================
818 processlist SHOW PROCESSLIST
819 hosts SHOW SLAVE HOSTS
820
821 The processlist method is preferred because SHOW SLAVE HOSTS is not
822 reliable. However, the hosts method is required if the server uses
823 a non-standard port (not 3306). Usually mk-table-sync does the
824 right thing and finds the slaves, but you may give a preferred
825 method and it will be used first. If it doesn't find any slaves,
826 the other methods will be tried.
827
828 --replace
829 Write all "INSERT" and "UPDATE" statements as "REPLACE".
830
831 This is automatically switched on as needed when there are unique
832 index violations.
833
834 --replicate
835 type: string
836
837 Sync tables listed as different in this table.
838
839 Specifies that "mk-table-sync" should examine the specified table
840 to find data that differs. The table is exactly the same as the
841 argument of the same name to mk-table-checksum. That is, it
842 contains records of which tables (and ranges of values) differ
843 between the master and slave.
844
845 For each table and range of values that shows differences between
846 the master and slave, "mk-table-checksum" will sync that table,
847 with the appropriate "WHERE" clause, to its master.
848
849 This automatically sets "--wait" to 60 and causes changes to be
850 made on the master instead of the slave.
851
852 If "--sync-to-master" is specified, the tool will assume the server
853 you specified is the slave, and connect to the master as usual to
854 sync.
855
856 Otherwise, it will try to use "SHOW PROCESSLIST" to find slaves of
857 the server you specified. If it is unable to find any slaves via
858 "SHOW PROCESSLIST", it will inspect "SHOW SLAVE HOSTS" instead.
859 You must configure each slave's "report-host", "report-port" and
860 other options for this to work right. After finding slaves, it
861 will inspect the specified table on each slave to find data that
862 needs to be synced, and sync it.
863
864 The tool examines the master's copy of the table first, assuming
865 that the master is potentially a slave as well. Any table that
866 shows differences there will NOT be synced on the slave(s). For
867 example, suppose your replication is set up as A->B, B->C, B->D.
868 Suppose you use this argument and specify server B. The tool will
869 examine server B's copy of the table. If it looks like server B's
870 data in table "test.tbl1" is different from server A's copy, the
871 tool will not sync that table on servers C and D.
872
873 --set-vars
874 type: string; default: wait_timeout=10000
875
876 Set these MySQL variables. Immediately after connecting to MySQL,
877 this string will be appended to SET and executed.
878
879 --socket
880 short form: -S; type: string
881
882 Socket file to use for connection.
883
884 --sync-to-master
885 Treat the DSN as a slave and sync it to its master.
886
887 Treat the server you specified as a slave. Inspect "SHOW SLAVE
888 STATUS", connect to the server's master, and treat the master as
889 the source and the slave as the destination. Causes changes to be
890 made on the master. Sets "--wait" to 60 by default, sets "--lock"
891 to 1 by default, and disables "--[no]transaction" by default. See
892 also "--replicate", which changes this option's behavior.
893
894 --tables
895 short form: -t; type: hash
896
897 Sync only this comma-separated list of tables.
898
899 Table names may be qualified with the database name.
900
901 --timeout-ok
902 Keep going if "--wait" fails.
903
904 If you specify "--wait" and the slave doesn't catch up to the
905 master's position before the wait times out, the default behavior
906 is to abort. This option makes the tool keep going anyway.
907 Warning: if you are trying to get a consistent comparision between
908 the two servers, you probably don't want to keep going after a
909 timeout.
910
911 --[no]transaction
912 Use transactions instead of "LOCK TABLES".
913
914 The granularity of beginning and committing transactions is
915 controlled by "--lock". This is enabled by default, but since
916 "--lock" is disabled by default, it has no effect.
917
918 Most options that enable locking also disable transactions by
919 default, so if you want to use transactional locking (via "LOCK IN
920 SHARE MODE" and "FOR UPDATE", you must specify "--transaction"
921 explicitly.
922
923 If you don't specify "--transaction" explicitly "mk-table-sync"
924 will decide on a per-table basis whether to use transactions or
925 table locks. It currently uses transactions on InnoDB tables, and
926 table locks on all others.
927
928 If "--no-transaction" is specified, then "mk-table-sync" will not
929 use transactions at all (not even for InnoDB tables) and locking is
930 controlled by "--lock".
931
932 When enabled, either explicitly or implicitly, the transaction
933 isolation level is set "REPEATABLE READ" and transactions are
934 started "WITH CONSISTENT SNAPSHOT".
935
936 --trim
937 "TRIM()" "VARCHAR" columns in "BIT_XOR" and "ACCUM" modes. Helps
938 when comparing MySQL 4.1 to >= 5.0.
939
940 This is useful when you don't care about the trailing space
941 differences between MySQL versions which vary in their handling of
942 trailing spaces. MySQL 5.0 and later all retain trailing spaces in
943 "VARCHAR", while previous versions would remove them.
944
945 --[no]unique-checks
946 default: yes
947
948 Enable unique key checks ("SET UNIQUE_CHECKS=1").
949
950 Specifying "--no-unique-checks" will "SET UNIQUE_CHECKS=0".
951
952 --user
953 short form: -u; type: string
954
955 User for login if not current user.
956
957 --verbose
958 short form: -v; cumulative: yes
959
960 Print results of sync operations.
961
962 See "OUTPUT" for more details about the output.
963
964 --version
965 Show version and exit.
966
967 --wait
968 short form: -w; type: time
969
970 How long to wait for slaves to catch up to their master.
971
972 Make the master wait for the slave to catch up in replication
973 before comparing the tables. The value is the number of seconds to
974 wait before timing out (see also "--timeout-ok"). Sets "--lock" to
975 1 and "--[no]transaction" to 0 by default. If you see an error
976 such as the following,
977
978 MASTER_POS_WAIT returned -1
979
980 It means the timeout was exceeded and you need to increase it.
981
982 The default value of this option is influenced by other options.
983 To see what value is in effect, run with "--help".
984
985 To disable waiting entirely (except for locks), specify "--wait" 0.
986 This helps when the slave is lagging on tables that are not being
987 synced.
988
989 --where
990 type: string
991
992 "WHERE" clause to restrict syncing to part of the table.
993
994 --[no]zero-chunk
995 default: yes
996
997 Add a chunk for rows with zero or zero-equivalent values. The only
998 has an effect when "--chunk-size" is specified. The purpose of the
999 zero chunk is to capture a potentially large number of zero values
1000 that would imbalance the size of the first chunk. For example, if
1001 a lot of negative numbers were inserted into an unsigned integer
1002 column causing them to be stored as zeros, then these zero values
1003 are captured by the zero chunk instead of the first chunk and all
1004 its non-zero values.
1005
1007 These DSN options are used to create a DSN. Each option is given like
1008 "option=value". The options are case-sensitive, so P and p are not the
1009 same option. There cannot be whitespace before or after the "=" and if
1010 the value contains whitespace it must be quoted. DSN options are
1011 comma-separated. See the maatkit manpage for full details.
1012
1013 · A
1014
1015 dsn: charset; copy: yes
1016
1017 Default character set.
1018
1019 · D
1020
1021 dsn: database; copy: yes
1022
1023 Database containing the table to be synced.
1024
1025 · F
1026
1027 dsn: mysql_read_default_file; copy: yes
1028
1029 Only read default options from the given file
1030
1031 · h
1032
1033 dsn: host; copy: yes
1034
1035 Connect to host.
1036
1037 · p
1038
1039 dsn: password; copy: yes
1040
1041 Password to use when connecting.
1042
1043 · P
1044
1045 dsn: port; copy: yes
1046
1047 Port number to use for connection.
1048
1049 · S
1050
1051 dsn: mysql_socket; copy: yes
1052
1053 Socket file to use for connection.
1054
1055 · t
1056
1057 copy: yes
1058
1059 Table to be synced.
1060
1061 · u
1062
1063 dsn: user; copy: yes
1064
1065 User for login if not current user.
1066
1068 You can download Maatkit from Google Code at
1069 <http://code.google.com/p/maatkit/>, or you can get any of the tools
1070 easily with a command like the following:
1071
1072 wget http://www.maatkit.org/get/toolname
1073 or
1074 wget http://www.maatkit.org/trunk/toolname
1075
1076 Where "toolname" can be replaced with the name (or fragment of a name)
1077 of any of the Maatkit tools. Once downloaded, they're ready to run; no
1078 installation is needed. The first URL gets the latest released version
1079 of the tool, and the second gets the latest trunk code from Subversion.
1080
1082 The environment variable MKDEBUG enables verbose debugging output in
1083 all of the Maatkit tools:
1084
1085 MKDEBUG=1 mk-....
1086
1088 You need Perl, DBI, DBD::mysql, and some core packages that ought to be
1089 installed in any reasonably new version of Perl.
1090
1092 For a list of known bugs see: http://www.maatkit.org/bugs/mk-table-sync
1093 <http://www.maatkit.org/bugs/mk-table-sync>.
1094
1095 Please use Google Code Issues and Groups to report bugs or request
1096 support: <http://code.google.com/p/maatkit/>. You can also join
1097 #maatkit on Freenode to discuss Maatkit.
1098
1099 Please include the complete command-line used to reproduce the problem
1100 you are seeing, the version of all MySQL servers involved, the complete
1101 output of the tool when run with "--version", and if possible,
1102 debugging output produced by running with the "MKDEBUG=1" environment
1103 variable.
1104
1106 This program is copyright 2007-2010 Baron Schwartz. Feedback and
1107 improvements are welcome.
1108
1109 THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
1110 WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
1111 MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1112
1113 This program is free software; you can redistribute it and/or modify it
1114 under the terms of the GNU General Public License as published by the
1115 Free Software Foundation, version 2; OR the Perl Artistic License. On
1116 UNIX and similar systems, you can issue `man perlgpl' or `man
1117 perlartistic' to read these licenses.
1118
1119 You should have received a copy of the GNU General Public License along
1120 with this program; if not, write to the Free Software Foundation, Inc.,
1121 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
1122
1124 Baron Schwartz
1125
1127 This tool is part of Maatkit, a toolkit for power users of MySQL.
1128 Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
1129 primary code contributors. Both are employed by Percona. Financial
1130 support for Maatkit development is primarily provided by Percona and
1131 its clients.
1132
1134 My work is based in part on Giuseppe Maxia's work on distributed
1135 databases, <http://www.sysadminmag.com/articles/2004/0408/> and code
1136 derived from that article. There is more explanation, and a link to
1137 the code, at <http://www.perlmonks.org/?node_id=381053>.
1138
1139 Another programmer extended Maxia's work even further. Fabien Coelho
1140 changed and generalized Maxia's technique, introducing symmetry and
1141 avoiding some problems that might have caused too-frequent checksum
1142 collisions. This work grew into pg_comparator,
1143 <http://www.coelho.net/pg_comparator/>. Coelho also explained the
1144 technique further in a paper titled "Remote Comparison of Database
1145 Tables" (http://cri.ensmp.fr/classement/doc/A-375.pdf
1146 <http://cri.ensmp.fr/classement/doc/A-375.pdf>).
1147
1148 This existing literature mostly addressed how to find the differences
1149 between the tables, not how to resolve them once found. I needed a
1150 tool that would not only find them efficiently, but would then resolve
1151 them. I first began thinking about how to improve the technique
1152 further with my article http://tinyurl.com/mysql-data-diff-algorithm
1153 <http://tinyurl.com/mysql-data-diff-algorithm>, where I discussed a
1154 number of problems with the Maxia/Coelho "bottom-up" algorithm. After
1155 writing that article, I began to write this tool. I wanted to actually
1156 implement their algorithm with some improvements so I was sure I
1157 understood it completely. I discovered it is not what I thought it
1158 was, and is considerably more complex than it appeared to me at first.
1159 Fabien Coelho was kind enough to address some questions over email.
1160
1161 The first versions of this tool implemented a version of the
1162 Coelho/Maxia algorithm, which I called "bottom-up", and my own, which I
1163 called "top-down." Those algorithms are considerably more complex than
1164 the current algorithms and I have removed them from this tool, and may
1165 add them back later. The improvements to the bottom-up algorithm are
1166 my original work, as is the top-down algorithm. The techniques to
1167 actually resolve the differences are also my own work.
1168
1169 Another tool that can synchronize tables is the SQLyog Job Agent from
1170 webyog. Thanks to Rohit Nadhani, SJA's author, for the conversations
1171 about the general techniques. There is a comparison of mk-table-sync
1172 and SJA at http://tinyurl.com/maatkit-vs-sqlyog
1173 <http://tinyurl.com/maatkit-vs-sqlyog>
1174
1175 Thanks to the following people and organizations for helping in many
1176 ways:
1177
1178 The Rimm-Kaufman Group <http://www.rimmkaufman.com/>, MySQL AB
1179 <http://www.mysql.com/>, Blue Ridge InternetWorks
1180 <http://www.briworks.com/>, Percona <http://www.percona.com/>, Fabien
1181 Coelho, Giuseppe Maxia and others at MySQL AB, Kristian Koehntopp
1182 (MySQL AB), Rohit Nadhani (WebYog), The helpful monks at Perlmonks, And
1183 others too numerous to mention.
1184
1186 This manual page documents Ver 1.0.30 Distrib 6839 $Revision: 6832 $.
1187
1188
1189
1190perl v5.12.1 2010-08-01 MK-TABLE-SYNC(1)