1MK-PARALLEL-RESTORE(1)User Contributed Perl DocumentationMK-PARALLEL-RESTORE(1)
2
3
4
6 mk-parallel-restore - Load files into MySQL in parallel.
7
9 mk-parallel-restore /path/to/files
10 mk-parallel-restore --tab /path/to/files
11
13 The following section is included to inform users about the potential
14 risks, whether known or unknown, of using this tool. The two main
15 categories of risks are those created by the nature of the tool (e.g.
16 read-only tools vs. read-write tools) and those created by bugs.
17
18 mk-parallel-restore is not a backup program! It is only for fast data
19 imports, for purposes such as loading test data into a system quickly.
20 Do not use mk-parallel-restore for backups. mk-parallel-restore
21 inserts data unless you use the "--dry-run" option.
22
23 At the time of this release, there is a bug that prevents huge
24 statements from being printed when an error is encountered and a bug
25 applying "--[no]foreign-key-checks" when truncating tables.
26
27 The authoritative source for updated information is always the online
28 issue tracking system. Issues that affect this tool will be marked as
29 such. You can see a list of such issues at the following URL:
30 http://www.maatkit.org/bugs/mk-parallel-restore
31 <http://www.maatkit.org/bugs/mk-parallel-restore>.
32
33 See also "BUGS" for more information on filing bugs and getting help.
34
36 mk-parallel-restore is a way to load SQL or delimited-file dumps into
37 MySQL in parallel at high speed. It is especially designed for
38 restoring files dumped by mk-parallel-dump. It automatically detects
39 whether a file contains SQL or delimited data from the filename
40 extension, and either shells out to "mysql" or executes "LOAD DATA
41 INFILE" with the file. On UNIX-like systems, it will even make a FIFO
42 to decompress gzipped files for "LOAD DATA INFILE".
43
44 By default it discovers all files in the directory you specify on the
45 command line. It uses the file's parent directory as the database name
46 and the file's name (up to the first dot) as the table name. It can
47 deal with files named like the following:
48
49 dir/tbl.sql
50 dir/tbl.txt
51 dir/tbl.csv
52
53 It is also happy with files that look like this, where "EXT" is one of
54 the extensions just listed.
55
56 dir/tbl.EXT.000
57 dir/tbl.EXT.000.gz
58
59 By default, it loads "SQL" files first, if they exist, then loads "CSV"
60 or "TXT" files next, in order of the numbers in the filename extension
61 as just shown. This makes it easy for you to reload a table's
62 definition followed by its data, in case you dumped them into separate
63 files (as happens with "mysqldump"'s "--tab" option). See mk-parallel-
64 dump for details on how data is dumped.
65
66 Exit status is 0 if everything went well, 1 if any files failed, and
67 any other value indicates an internal error.
68
70 Output depends on verbosity. When "--dry-run" is given, output
71 includes commands that would be executed.
72
73 When "--verbose" is 0, there is normally no output unless there's an
74 error.
75
76 When "--verbose" is 1, there is one line of output for the entire job,
77 showing how many tables were processed, how many files were loaded with
78 what status, how much time elapsed, and how much time the parallel load
79 jobs added up to. If any files were skipped, the filenames are printed
80 to the output.
81
82 When "--verbose" is 2, there's one line of output per table, showing
83 extra data such as how many threads were running when each table
84 finished loading:
85
86 DATABASE TABLE FILES TIME STATUS THREADS
87 sakila language 2 0.07 0 2
88 sakila film_actor 2 0.07 0 2
89 sakila actor 2 0.06 0 2
90 sakila payment 2 0.07 0 2
91 sakila transport_backup 2 0.05 0 2
92 sakila country 2 0.08 0 2
93 sakila film 2 0.05 0 2
94 sakila rental 2 0.07 0 2
95
97 User-contributed benchmarks are welcome. See
98 http://www.paragon-cs.com/wordpress/?p=52 <http://www.paragon-
99 cs.com/wordpress/?p=52> for one user's experiences.
100
102 --ask-pass
103 Prompt for a password when connecting to MySQL.
104
105 --[no]atomic-resume
106 default: yes
107
108 Treat chunks as atomic when resuming restore.
109
110 By default "mk-parallel-restore" resumes restoration from the first
111 chunk that is missing all its rows. For dumps of transactionally-
112 safe tables (InnoDB), it cannot happen that a chunk is only
113 partially restored. Therefore, restoring from the first missing
114 chunk is safe.
115
116 However, for dumps of non-transactionally safe tables, it is
117 possible that a chunk can be only partially restored. In such
118 cases, the chunk will wrongly appear to be fully restored.
119 Therefore, you must specify "--no-atomic-resume" so that the
120 partially restored chunk is fully restored.
121
122 --base-dir
123 type: string
124
125 Directory where FIFO files will be created.
126
127 --[no]biggest-first
128 default: yes
129
130 Restore the biggest tables first for highest concurrency.
131
132 --[no]bin-log
133 default: yes
134
135 Enable binary logging ("SET SQL_LOG_BIN=1").
136
137 Restore operations are replicated by default (SQL_LOG_BIN=1) except
138 for "--tab" restores which are not replicated by default
139 (SQL_LOG_BIN=0). This prevents large loads from being logged to
140 the server's binary log.
141
142 The value given on the command line overrides the defaults.
143 Therefore, specifying "--bin-log" with "--tab" will allow the
144 "--tab" restore to replicate.
145
146 --bulk-insert-buffer-size
147 type: int
148
149 Set bulk_insert_buffer_size before each "LOAD DATA INFILE".
150
151 Has no effect without "--tab".
152
153 --charset
154 short form: -A; type: string; default: BINARY
155
156 Sets the connection, database, and "LOAD DATA INFILE" character
157 set.
158
159 The default is "BINARY", which is the safest value to use for "LOAD
160 DATA INFILE". Has no effect without "--tab".
161
162 --[no]commit
163 default: yes
164
165 Commit after each file.
166
167 --config
168 type: Array
169
170 Read this comma-separated list of config files; if specified, this
171 must be the first option on the command line.
172
173 --create-databases
174 Create databases if they don't exist.
175
176 --[no]create-tables
177 default: yes
178
179 Create tables.
180
181 See also "--[no]drop-tables".
182
183 --csv
184 Files are in CSV format (implies "--tab").
185
186 Changes "--tab" options so the following "LOAD DATA INFILE"
187 statement is used:
188
189 LOAD DATA INFILE <filename> INTO TABLE <table>
190 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
191 LINES TERMINATED BY '\n';
192
193 --database
194 short form: -D; type: string
195
196 Load all files into this database.
197
198 Overrides the database which is normally specified by the directory
199 in which the files live. Does not specify a default database for
200 the connection.
201
202 --databases
203 short form: -d; type: hash
204
205 Restore only this comma-separated list of databases.
206
207 --databases-regex
208 type: string
209
210 Restore only databases whose names match this regex.
211
212 --decompress
213 type: string; default: gzip -d -c
214
215 Command used to decompress and print .gz files to STDOUT (like
216 zcat).
217
218 --defaults-file
219 short form: -F; type: string
220
221 Only read mysql options from the given file. You must give an
222 absolute pathname.
223
224 --[no]disable-keys
225 default: yes
226
227 Execute "ALTER TABLE DISABLE KEYS" before each MyISAM table.
228
229 This option only works with MyISAM tables.
230
231 --[no]drop-tables
232 default: yes
233
234 Execute "DROP TABLE IF EXISTS" before creating each table.
235
236 --dry-run
237 Print commands instead of executing them.
238
239 --fast-index
240 Do InnoDB plugin fast index creation by restoring secondary indexes
241 after data.
242
243 This option only works with InnoDB tables and the InnoDB plugin.
244
245 --[no]fifo
246 default: yes
247
248 Stream files into a FIFO for "--tab".
249
250 Load compressed tab-separated files by piping them into a FIFO and
251 using the FIFO with "LOAD DATA INFILE", instead of by decompressing
252 the files on disk. Sets "--umask" to 0.
253
254 --[no]foreign-key-checks
255 default: yes
256
257 Set "FOREIGN_KEY_CHECKS=1" before "LOAD DATA INFILE".
258
259 --help
260 Show help and exit.
261
262 --host
263 short form: -h; type: string
264
265 Connect to host.
266
267 --ignore
268 Adds the "IGNORE" modifier to "LOAD DATA INFILE".
269
270 --ignore-databases
271 type: Hash
272
273 Ignore this comma-separated list of databases.
274
275 --ignore-tables
276 type: Hash
277
278 Ignore this comma-separated list of table names.
279
280 Table names may be qualified with the database name.
281
282 --local
283 Uses the "LOCAL" option to "LOAD DATA INFILE".
284
285 If you enable this option, the files are read locally by the client
286 library, not by the server.
287
288 --[no]lock-tables
289 Lock tables before "LOAD DATA INFILE".
290
291 --[no]no-auto-value-on-0
292 default: yes
293
294 Set SQL "NO_AUTO_VALUE_ON_ZERO".
295
296 --only-empty-databases
297 Restore only to empty databases.
298
299 By default mk-parallel-restore will restore tables into a database
300 so long as it exists (or is created by "--create-databases"). This
301 option is a safety feature that prevents any tables from being
302 restored into a database that already has tables even if those
303 tables are the same ones being restored. If you specify this
304 option, every database must have zero tables.
305
306 This implicitly disables "--[no]resume". "--create-databases" will
307 work if the database doesn't already exist and it creates it.
308
309 The databases are checked after all filters ("--databases", etc.)
310
311 --password
312 short form: -p; type: string
313
314 Password to use when connecting.
315
316 --pid
317 type: string
318
319 Create the given PID file. The file contains the process ID of the
320 script. The PID file is removed when the script exits. Before
321 starting, the script checks if the PID file already exists. If it
322 does not, then the script creates and writes its own PID to it. If
323 it does, then the script checks the following: if the file contains
324 a PID and a process is running with that PID, then the script dies;
325 or, if there is no process running with that PID, then the script
326 overwrites the file with its own PID and starts; else, if the file
327 contains no PID, then the script dies.
328
329 --port
330 short form: -P; type: int
331
332 Port number to use for connection.
333
334 --progress
335 Display progress messages.
336
337 Progress is displayed each time a table finishes loading. Progress
338 is calculated by measuring the size of each file to be loaded, and
339 assuming all bytes are created equal. The output is the completed
340 and total size, the percent completed, estimated time remaining,
341 and estimated completion time.
342
343 --quiet
344 short form: -q
345
346 Sets "--verbose" to 0.
347
348 --replace
349 Adds the "REPLACE" modifier to "LOAD DATA INFILE".
350
351 --[no]resume
352 default: yes
353
354 Resume the restore from a previously incomplete restore.
355
356 By default, "mk-parallel-restore" checks each table's chunks for
357 existing rows and restores only from the point where a previous
358 restore stopped. Specify --no-resume to disable restore resumption
359 and fully restores every table.
360
361 Restore resumption does not work with tab-separated files or dumps
362 that were not chunked.
363
364 --set-vars
365 type: string; default: wait_timeout=10000
366
367 Set these MySQL variables. Immediately after connecting to MySQL,
368 this string will be appended to SET and executed.
369
370 --socket
371 short form: -S; type: string
372
373 Socket file to use for connection.
374
375 --tab
376 Load tab-separated files with "LOAD DATA INFILE".
377
378 This is similar to what "mysqlimport" does, but more flexible.
379
380 The following options are enabled unless they are specifically
381 disabled on the command line:
382
383 L<"--commit">
384 L<"--[no]disable-keys">
385 L<"--[no]no-auto-value-on-0">
386
387 And the following options are disabled ("--no-bin-log", etc.)
388 unless they are specifically enabled on the command line:
389
390 L<"--[no]bin-log">
391 L<"--[no]unique-checks">
392 L<"--[no]foreign-key-checks">
393
394 --tables
395 short form: -t; type: hash
396
397 Restore only this comma-separated list of table names.
398
399 Table names may be qualified with the database name.
400
401 --tables-regex
402 type: string
403
404 Restore only tables whose names match this regex.
405
406 --threads
407 type: int; default: 2
408
409 Specifies the number of parallel processes to run.
410
411 The default is 2 (this is mk-parallel-restore after all -- 1 is not
412 parallel). On GNU/Linux machines, the default is the number of
413 times 'processor' appears in /proc/cpuinfo. On Windows, the
414 default is read from the environment. In any case, the default is
415 at least 2, even when there's only a single processor.
416
417 --truncate
418 Run "TRUNCATE TABLE" before "LOAD DATA INFILE".
419
420 This will delete all rows from a table before loading the first
421 tab-delimited file into it.
422
423 --umask
424 type: string
425
426 Set the program's "umask" to this octal value.
427
428 This is useful when you want created files (such as FIFO files) to
429 be readable or writable by other users (for example, the MySQL
430 server itself).
431
432 --[no]unique-checks
433 default: yes
434
435 Set "UNIQUE_CHECKS=1" before "LOAD DATA INFILE".
436
437 --user
438 short form: -u; type: string
439
440 User for login if not current user.
441
442 --verbose
443 short form: -v; cumulative: yes; default: 1
444
445 Verbosity; can specify multiple times.
446
447 Repeatedly specifying it increments the verbosity. Default is 1 if
448 not specified. See "OUTPUT".
449
450 --version
451 Show version and exit.
452
453 --wait
454 short form: -w; type: time; default: 5m
455
456 Wait limit when server is down.
457
458 If the MySQL server crashes during loading, waits until the server
459 comes back and then continues with the rest of the files.
460 "mk-parallel-restore" will check the server every second until this
461 time is exhausted, at which point it will give up and exit.
462
464 These DSN options are used to create a DSN. Each option is given like
465 "option=value". The options are case-sensitive, so P and p are not the
466 same option. There cannot be whitespace before or after the "=" and if
467 the value contains whitespace it must be quoted. DSN options are
468 comma-separated. See the maatkit manpage for full details.
469
470 · A
471
472 dsn: charset; copy: yes
473
474 Default character set.
475
476 · D
477
478 dsn: database; copy: yes
479
480 Default database.
481
482 · F
483
484 dsn: mysql_read_default_file; copy: yes
485
486 Only read default options from the given file
487
488 · h
489
490 dsn: host; copy: yes
491
492 Connect to host.
493
494 · p
495
496 dsn: password; copy: yes
497
498 Password to use when connecting.
499
500 · P
501
502 dsn: port; copy: yes
503
504 Port number to use for connection.
505
506 · S
507
508 dsn: mysql_socket; copy: yes
509
510 Socket file to use for connection.
511
512 · u
513
514 dsn: user; copy: yes
515
516 User for login if not current user.
517
519 You can download Maatkit from Google Code at
520 <http://code.google.com/p/maatkit/>, or you can get any of the tools
521 easily with a command like the following:
522
523 wget http://www.maatkit.org/get/toolname
524 or
525 wget http://www.maatkit.org/trunk/toolname
526
527 Where "toolname" can be replaced with the name (or fragment of a name)
528 of any of the Maatkit tools. Once downloaded, they're ready to run; no
529 installation is needed. The first URL gets the latest released version
530 of the tool, and the second gets the latest trunk code from Subversion.
531
533 The environment variable "MKDEBUG" enables verbose debugging output in
534 all of the Maatkit tools:
535
536 MKDEBUG=1 mk-....
537
539 You need Perl, DBI, DBD::mysql, and some core packages that ought to be
540 installed in any reasonably new version of Perl.
541
543 For list of known bugs see
544 http://www.maatkit.org/bugs/mk-parallel-restore
545 <http://www.maatkit.org/bugs/mk-parallel-restore>.
546
547 Please use Google Code Issues and Groups to report bugs or request
548 support: <http://code.google.com/p/maatkit/>. You can also join
549 #maatkit on Freenode to discuss Maatkit.
550
551 Please include the complete command-line used to reproduce the problem
552 you are seeing, the version of all MySQL servers involved, the complete
553 output of the tool when run with "--version", and if possible,
554 debugging output produced by running with the "MKDEBUG=1" environment
555 variable.
556
558 This program is copyright 2007-2010 Baron Schwartz. Feedback and
559 improvements are welcome.
560
561 THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
562 WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
563 MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
564
565 This program is free software; you can redistribute it and/or modify it
566 under the terms of the GNU General Public License as published by the
567 Free Software Foundation, version 2; OR the Perl Artistic License. On
568 UNIX and similar systems, you can issue `man perlgpl' or `man
569 perlartistic' to read these licenses.
570
571 You should have received a copy of the GNU General Public License along
572 with this program; if not, write to the Free Software Foundation, Inc.,
573 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
574
576 See also mk-parallel-dump.
577
579 Baron Schwartz
580
582 This tool is part of Maatkit, a toolkit for power users of MySQL.
583 Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
584 primary code contributors. Both are employed by Percona. Financial
585 support for Maatkit development is primarily provided by Percona and
586 its clients.
587
589 This manual page documents Ver 1.0.23 Distrib 6839 $Revision: 6831 $.
590
591
592
593perl v5.12.1 2010-08-01 MK-PARALLEL-RESTORE(1)