1MK-PARALLEL-RESTORE(1)User Contributed Perl DocumentationMK-PARALLEL-RESTORE(1)
2
3
4

NAME

6       mk-parallel-restore - Load files into MySQL in parallel.
7

SYNOPSIS

9         mk-parallel-restore /path/to/files
10         mk-parallel-restore --tab /path/to/files
11

RISKS

13       The following section is included to inform users about the potential
14       risks, whether known or unknown, of using this tool.  The two main
15       categories of risks are those created by the nature of the tool (e.g.
16       read-only tools vs. read-write tools) and those created by bugs.
17
18       mk-parallel-restore is not a backup program!  It is only for fast data
19       imports, for purposes such as loading test data into a system quickly.
20       Do not use mk-parallel-restore for backups.  mk-parallel-restore
21       inserts data unless you use the "--dry-run" option.
22
23       At the time of this release, there is a bug that prevents huge
24       statements from being printed when an error is encountered and a bug
25       applying "--[no]foreign-key-checks" when truncating tables.
26
27       The authoritative source for updated information is always the online
28       issue tracking system.  Issues that affect this tool will be marked as
29       such.  You can see a list of such issues at the following URL:
30       http://www.maatkit.org/bugs/mk-parallel-restore
31       <http://www.maatkit.org/bugs/mk-parallel-restore>.
32
33       See also "BUGS" for more information on filing bugs and getting help.
34

DESCRIPTION

36       mk-parallel-restore is a way to load SQL or delimited-file dumps into
37       MySQL in parallel at high speed.  It is especially designed for
38       restoring files dumped by mk-parallel-dump.  It automatically detects
39       whether a file contains SQL or delimited data from the filename
40       extension, and either shells out to "mysql" or executes "LOAD DATA
41       INFILE" with the file.  On UNIX-like systems, it will even make a FIFO
42       to decompress gzipped files for "LOAD DATA INFILE".
43
44       By default it discovers all files in the directory you specify on the
45       command line.  It uses the file's parent directory as the database name
46       and the file's name (up to the first dot) as the table name.  It can
47       deal with files named like the following:
48
49         dir/tbl.sql
50         dir/tbl.txt
51         dir/tbl.csv
52
53       It is also happy with files that look like this, where "EXT" is one of
54       the extensions just listed.
55
56         dir/tbl.EXT.000
57         dir/tbl.EXT.000.gz
58
59       By default, it loads "SQL" files first, if they exist, then loads "CSV"
60       or "TXT" files next, in order of the numbers in the filename extension
61       as just shown.  This makes it easy for you to reload a table's
62       definition followed by its data, in case you dumped them into separate
63       files (as happens with "mysqldump"'s "--tab" option).  See mk-parallel-
64       dump for details on how data is dumped.
65
66       Exit status is 0 if everything went well, 1 if any files failed, and
67       any other value indicates an internal error.
68

OUTPUT

70       Output depends on verbosity.  When "--dry-run" is given, output
71       includes commands that would be executed.
72
73       When "--verbose" is 0, there is normally no output unless there's an
74       error.
75
76       When "--verbose" is 1, there is one line of output for the entire job,
77       showing how many tables were processed, how many files were loaded with
78       what status, how much time elapsed, and how much time the parallel load
79       jobs added up to.  If any files were skipped, the filenames are printed
80       to the output.
81
82       When "--verbose" is 2, there's one line of output per table, showing
83       extra data such as how many threads were running when each table
84       finished loading:
85
86         DATABASE TABLE            FILES  TIME STATUS THREADS
87         sakila   language             2  0.07      0       2
88         sakila   film_actor           2  0.07      0       2
89         sakila   actor                2  0.06      0       2
90         sakila   payment              2  0.07      0       2
91         sakila   transport_backup     2  0.05      0       2
92         sakila   country              2  0.08      0       2
93         sakila   film                 2  0.05      0       2
94         sakila   rental               2  0.07      0       2
95

SPEED OF PARALLEL LOADING

97       User-contributed benchmarks are welcome.  See
98       http://www.paragon-cs.com/wordpress/?p=52 <http://www.paragon-
99       cs.com/wordpress/?p=52> for one user's experiences.
100

OPTIONS

102       --ask-pass
103           Prompt for a password when connecting to MySQL.
104
105       --[no]atomic-resume
106           default: yes
107
108           Treat chunks as atomic when resuming restore.
109
110           By default "mk-parallel-restore" resumes restoration from the first
111           chunk that is missing all its rows.  For dumps of transactionally-
112           safe tables (InnoDB), it cannot happen that a chunk is only
113           partially restored.  Therefore, restoring from the first missing
114           chunk is safe.
115
116           However, for dumps of non-transactionally safe tables, it is
117           possible that a chunk can be only partially restored.  In such
118           cases, the chunk will wrongly appear to be fully restored.
119           Therefore, you must specify "--no-atomic-resume" so that the
120           partially restored chunk is fully restored.
121
122       --base-dir
123           type: string
124
125           Directory where FIFO files will be created.
126
127       --[no]biggest-first
128           default: yes
129
130           Restore the biggest tables first for highest concurrency.
131
132       --[no]bin-log
133           default: yes
134
135           Enable binary logging ("SET SQL_LOG_BIN=1").
136
137           Restore operations are replicated by default (SQL_LOG_BIN=1) except
138           for "--tab" restores which are not replicated by default
139           (SQL_LOG_BIN=0).  This prevents large loads from being logged to
140           the server's binary log.
141
142           The value given on the command line overrides the defaults.
143           Therefore, specifying "--bin-log" with "--tab" will allow the
144           "--tab" restore to replicate.
145
146       --bulk-insert-buffer-size
147           type: int
148
149           Set bulk_insert_buffer_size before each "LOAD DATA INFILE".
150
151           Has no effect without "--tab".
152
153       --charset
154           short form: -A; type: string; default: BINARY
155
156           Sets the connection, database, and "LOAD DATA INFILE" character
157           set.
158
159           The default is "BINARY", which is the safest value to use for "LOAD
160           DATA INFILE".  Has no effect without "--tab".
161
162       --[no]commit
163           default: yes
164
165           Commit after each file.
166
167       --config
168           type: Array
169
170           Read this comma-separated list of config files; if specified, this
171           must be the first option on the command line.
172
173       --create-databases
174           Create databases if they don't exist.
175
176       --[no]create-tables
177           default: yes
178
179           Create tables.
180
181           See also "--[no]drop-tables".
182
183       --csv
184           Files are in CSV format (implies "--tab").
185
186           Changes "--tab" options so the following "LOAD DATA INFILE"
187           statement is used:
188
189              LOAD DATA INFILE <filename> INTO TABLE <table>
190              FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
191              LINES TERMINATED BY '\n';
192
193       --database
194           short form: -D; type: string
195
196           Load all files into this database.
197
198           Overrides the database which is normally specified by the directory
199           in which the files live.  Does not specify a default database for
200           the connection.
201
202       --databases
203           short form: -d; type: hash
204
205           Restore only this comma-separated list of databases.
206
207       --databases-regex
208           type: string
209
210           Restore only databases whose names match this regex.
211
212       --decompress
213           type: string; default: gzip -d -c
214
215           Command used to decompress and print .gz files to STDOUT (like
216           zcat).
217
218       --defaults-file
219           short form: -F; type: string
220
221           Only read mysql options from the given file.  You must give an
222           absolute pathname.
223
224       --[no]disable-keys
225           default: yes
226
227           Execute "ALTER TABLE DISABLE KEYS" before each MyISAM table.
228
229           This option only works with MyISAM tables.
230
231       --[no]drop-tables
232           default: yes
233
234           Execute "DROP TABLE IF EXISTS" before creating each table.
235
236       --dry-run
237           Print commands instead of executing them.
238
239       --fast-index
240           Do InnoDB plugin fast index creation by restoring secondary indexes
241           after data.
242
243           This option only works with InnoDB tables and the InnoDB plugin.
244
245       --[no]fifo
246           default: yes
247
248           Stream files into a FIFO for "--tab".
249
250           Load compressed tab-separated files by piping them into a FIFO and
251           using the FIFO with "LOAD DATA INFILE", instead of by decompressing
252           the files on disk.  Sets "--umask" to 0.
253
254       --[no]foreign-key-checks
255           default: yes
256
257           Set "FOREIGN_KEY_CHECKS=1" before "LOAD DATA INFILE".
258
259       --help
260           Show help and exit.
261
262       --host
263           short form: -h; type: string
264
265           Connect to host.
266
267       --ignore
268           Adds the "IGNORE" modifier to "LOAD DATA INFILE".
269
270       --ignore-databases
271           type: Hash
272
273           Ignore this comma-separated list of databases.
274
275       --ignore-tables
276           type: Hash
277
278           Ignore this comma-separated list of table names.
279
280           Table names may be qualified with the database name.
281
282       --local
283           Uses the "LOCAL" option to "LOAD DATA INFILE".
284
285           If you enable this option, the files are read locally by the client
286           library, not by the server.
287
288       --[no]lock-tables
289           Lock tables before "LOAD DATA INFILE".
290
291       --[no]no-auto-value-on-0
292           default: yes
293
294           Set SQL "NO_AUTO_VALUE_ON_ZERO".
295
296       --only-empty-databases
297           Restore only to empty databases.
298
299           By default mk-parallel-restore will restore tables into a database
300           so long as it exists (or is created by "--create-databases").  This
301           option is a safety feature that prevents any tables from being
302           restored into a database that already has tables even if those
303           tables are the same ones being restored.  If you specify this
304           option, every database must have zero tables.
305
306           This implicitly disables "--[no]resume".  "--create-databases" will
307           work if the database doesn't already exist and it creates it.
308
309           The databases are checked after all filters ("--databases", etc.)
310
311       --password
312           short form: -p; type: string
313
314           Password to use when connecting.
315
316       --pid
317           type: string
318
319           Create the given PID file.  The file contains the process ID of the
320           script.  The PID file is removed when the script exits.  Before
321           starting, the script checks if the PID file already exists.  If it
322           does not, then the script creates and writes its own PID to it.  If
323           it does, then the script checks the following: if the file contains
324           a PID and a process is running with that PID, then the script dies;
325           or, if there is no process running with that PID, then the script
326           overwrites the file with its own PID and starts; else, if the file
327           contains no PID, then the script dies.
328
329       --port
330           short form: -P; type: int
331
332           Port number to use for connection.
333
334       --progress
335           Display progress messages.
336
337           Progress is displayed each time a table finishes loading.  Progress
338           is calculated by measuring the size of each file to be loaded, and
339           assuming all bytes are created equal.  The output is the completed
340           and total size, the percent completed, estimated time remaining,
341           and estimated completion time.
342
343       --quiet
344           short form: -q
345
346           Sets "--verbose" to 0.
347
348       --replace
349           Adds the "REPLACE" modifier to "LOAD DATA INFILE".
350
351       --[no]resume
352           default: yes
353
354           Resume the restore from a previously incomplete restore.
355
356           By default, "mk-parallel-restore" checks each table's chunks for
357           existing rows and restores only from the point where a previous
358           restore stopped.  Specify --no-resume to disable restore resumption
359           and fully restores every table.
360
361           Restore resumption does not work with tab-separated files or dumps
362           that were not chunked.
363
364       --set-vars
365           type: string; default: wait_timeout=10000
366
367           Set these MySQL variables.  Immediately after connecting to MySQL,
368           this string will be appended to SET and executed.
369
370       --socket
371           short form: -S; type: string
372
373           Socket file to use for connection.
374
375       --tab
376           Load tab-separated files with "LOAD DATA INFILE".
377
378           This is similar to what "mysqlimport" does, but more flexible.
379
380           The following options are enabled unless they are specifically
381           disabled on the command line:
382
383              L<"--commit">
384              L<"--[no]disable-keys">
385              L<"--[no]no-auto-value-on-0">
386
387           And the following options are disabled ("--no-bin-log", etc.)
388           unless they are specifically enabled on the command line:
389
390              L<"--[no]bin-log">
391              L<"--[no]unique-checks">
392              L<"--[no]foreign-key-checks">
393
394       --tables
395           short form: -t; type: hash
396
397           Restore only this comma-separated list of table names.
398
399           Table names may be qualified with the database name.
400
401       --tables-regex
402           type: string
403
404           Restore only tables whose names match this regex.
405
406       --threads
407           type: int; default: 2
408
409           Specifies the number of parallel processes to run.
410
411           The default is 2 (this is mk-parallel-restore after all -- 1 is not
412           parallel).  On GNU/Linux machines, the default is the number of
413           times 'processor' appears in /proc/cpuinfo.  On Windows, the
414           default is read from the environment.  In any case, the default is
415           at least 2, even when there's only a single processor.
416
417       --truncate
418           Run "TRUNCATE TABLE" before "LOAD DATA INFILE".
419
420           This will delete all rows from a table before loading the first
421           tab-delimited file into it.
422
423       --umask
424           type: string
425
426           Set the program's "umask" to this octal value.
427
428           This is useful when you want created files (such as FIFO files) to
429           be readable or writable by other users (for example, the MySQL
430           server itself).
431
432       --[no]unique-checks
433           default: yes
434
435           Set "UNIQUE_CHECKS=1" before "LOAD DATA INFILE".
436
437       --user
438           short form: -u; type: string
439
440           User for login if not current user.
441
442       --verbose
443           short form: -v; cumulative: yes; default: 1
444
445           Verbosity; can specify multiple times.
446
447           Repeatedly specifying it increments the verbosity.  Default is 1 if
448           not specified.  See "OUTPUT".
449
450       --version
451           Show version and exit.
452
453       --wait
454           short form: -w; type: time; default: 5m
455
456           Wait limit when server is down.
457
458           If the MySQL server crashes during loading, waits until the server
459           comes back and then continues with the rest of the files.
460           "mk-parallel-restore" will check the server every second until this
461           time is exhausted, at which point it will give up and exit.
462

DSN OPTIONS

464       These DSN options are used to create a DSN.  Each option is given like
465       "option=value".  The options are case-sensitive, so P and p are not the
466       same option.  There cannot be whitespace before or after the "=" and if
467       the value contains whitespace it must be quoted.  DSN options are
468       comma-separated.  See the maatkit manpage for full details.
469
470       ·   A
471
472           dsn: charset; copy: yes
473
474           Default character set.
475
476       ·   D
477
478           dsn: database; copy: yes
479
480           Default database.
481
482       ·   F
483
484           dsn: mysql_read_default_file; copy: yes
485
486           Only read default options from the given file
487
488       ·   h
489
490           dsn: host; copy: yes
491
492           Connect to host.
493
494       ·   p
495
496           dsn: password; copy: yes
497
498           Password to use when connecting.
499
500       ·   P
501
502           dsn: port; copy: yes
503
504           Port number to use for connection.
505
506       ·   S
507
508           dsn: mysql_socket; copy: yes
509
510           Socket file to use for connection.
511
512       ·   u
513
514           dsn: user; copy: yes
515
516           User for login if not current user.
517

DOWNLOADING

519       You can download Maatkit from Google Code at
520       <http://code.google.com/p/maatkit/>, or you can get any of the tools
521       easily with a command like the following:
522
523          wget http://www.maatkit.org/get/toolname
524          or
525          wget http://www.maatkit.org/trunk/toolname
526
527       Where "toolname" can be replaced with the name (or fragment of a name)
528       of any of the Maatkit tools.  Once downloaded, they're ready to run; no
529       installation is needed.  The first URL gets the latest released version
530       of the tool, and the second gets the latest trunk code from Subversion.
531

ENVIRONMENT

533       The environment variable "MKDEBUG" enables verbose debugging output in
534       all of the Maatkit tools:
535
536          MKDEBUG=1 mk-....
537

SYSTEM REQUIREMENTS

539       You need Perl, DBI, DBD::mysql, and some core packages that ought to be
540       installed in any reasonably new version of Perl.
541

BUGS

543       For list of known bugs see
544       http://www.maatkit.org/bugs/mk-parallel-restore
545       <http://www.maatkit.org/bugs/mk-parallel-restore>.
546
547       Please use Google Code Issues and Groups to report bugs or request
548       support: <http://code.google.com/p/maatkit/>.  You can also join
549       #maatkit on Freenode to discuss Maatkit.
550
551       Please include the complete command-line used to reproduce the problem
552       you are seeing, the version of all MySQL servers involved, the complete
553       output of the tool when run with "--version", and if possible,
554       debugging output produced by running with the "MKDEBUG=1" environment
555       variable.
556

COPYRIGHT, LICENSE AND WARRANTY

558       This program is copyright 2007-2010 Baron Schwartz.  Feedback and
559       improvements are welcome.
560
561       THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
562       WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
563       MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
564
565       This program is free software; you can redistribute it and/or modify it
566       under the terms of the GNU General Public License as published by the
567       Free Software Foundation, version 2; OR the Perl Artistic License.  On
568       UNIX and similar systems, you can issue `man perlgpl' or `man
569       perlartistic' to read these licenses.
570
571       You should have received a copy of the GNU General Public License along
572       with this program; if not, write to the Free Software Foundation, Inc.,
573       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.
574

SEE ALSO

576       See also mk-parallel-dump.
577

AUTHOR

579       Baron Schwartz
580

ABOUT MAATKIT

582       This tool is part of Maatkit, a toolkit for power users of MySQL.
583       Maatkit was created by Baron Schwartz; Baron and Daniel Nichter are the
584       primary code contributors.  Both are employed by Percona.  Financial
585       support for Maatkit development is primarily provided by Percona and
586       its clients.
587

VERSION

589       This manual page documents Ver 1.0.23 Distrib 6839 $Revision: 6831 $.
590
591
592
593perl v5.12.1                      2010-08-01            MK-PARALLEL-RESTORE(1)
Impressum