GIT-FAST-IMPORT(1) Git Manual GIT-FAST-IMPORT(1)

NAME
git-fast-import - Backend for fast Git data importers

SYNOPSIS
frontend | git fast-import [<options>]

DESCRIPTION
This program is usually not what the end user wants to run directly.
Most end users want to use one of the existing frontend programs, each
of which parses a specific type of foreign source and feeds the
contents stored there to git fast-import.
17
fast-import reads a mixed command/data stream from standard input and
writes one or more packfiles directly into the current repository. When
EOF is received on standard input, fast-import writes out updated
branch and tag refs, fully updating the current repository with the
newly imported data.
23
24 The fast-import backend itself can import into an empty repository (one
25 that has already been initialized by git init) or incrementally update
26 an existing populated repository. Whether or not incremental imports
27 are supported from a particular foreign source depends on the frontend
28 program in use.
29
31 --force
32 Force updating modified existing branches, even if doing so would
33 cause commits to be lost (as the new commit does not contain the
34 old commit).
35
36 --quiet
37 Disable all non-fatal output, making fast-import silent when it is
38 successful. This option disables the output shown by --stats.
39
40 --stats
41 Display some basic statistics about the objects fast-import has
42 created, the packfiles they were stored into, and the memory used
43 by fast-import during this run. Showing this output is currently
44 the default, but can be disabled with --quiet.
45
46 Options for Frontends
47 --cat-blob-fd=<fd>
48 Write responses to get-mark, cat-blob, and ls queries to the file
49 descriptor <fd> instead of stdout. Allows progress output intended
50 for the end-user to be separated from other output.
51
52 --date-format=<fmt>
53 Specify the type of dates the frontend will supply to fast-import
54 within author, committer and tagger commands. See “Date Formats”
55 below for details about which formats are supported, and their
56 syntax.
57
58 --done
59 Terminate with error if there is no done command at the end of the
60 stream. This option might be useful for detecting errors that cause
61 the frontend to terminate before it has started to write a stream.
62
63 Locations of Marks Files
64 --export-marks=<file>
65 Dumps the internal marks table to <file> when complete. Marks are
66 written one per line as :markid SHA-1. Frontends can use this file
67 to validate imports after they have been completed, or to save the
68 marks table across incremental runs. As <file> is only opened and
69 truncated at checkpoint (or completion) the same path can also be
70 safely given to --import-marks.
71
72 --import-marks=<file>
73 Before processing any input, load the marks specified in <file>.
74 The input file must exist, must be readable, and must use the same
75 format as produced by --export-marks. Multiple options may be
76 supplied to import more than one set of marks. If a mark is defined
77 to different values, the last file wins.
78
79 --import-marks-if-exists=<file>
80 Like --import-marks but instead of erroring out, silently skips the
81 file if it does not exist.
82
83 --[no-]relative-marks
84 After specifying --relative-marks the paths specified with
85 --import-marks= and --export-marks= are relative to an internal
86 directory in the current repository. In git-fast-import this means
87 that the paths are relative to the .git/info/fast-import directory.
88 However, other importers may use a different location.
89
Relative and non-relative marks may be combined by interweaving
--[no-]relative-marks with the --(import|export)-marks= options.
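
As a rough illustration of the marks file format described above (one
":markid SHA-1" entry per line, with the last file winning when a mark is
defined more than once), a frontend might read marks files back like this.
The function names parse_marks and merge_marks are hypothetical, and
skipping blank lines is an assumption rather than part of the format.

```python
def parse_marks(text):
    """Parse the text of a marks file into {mark id: object name}."""
    marks = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: tolerate blank lines
        mark, sha = line.split()
        if not mark.startswith(":"):
            raise ValueError(f"not a mark line: {line!r}")
        marks[int(mark[1:])] = sha
    return marks


def merge_marks(*tables):
    """Combine mark tables in order; a later table overrides an earlier one."""
    merged = {}
    for table in tables:
        merged.update(table)
    return merged


first = parse_marks(":1 " + "a" * 40 + "\n:2 " + "b" * 40 + "\n")
second = parse_marks(":2 " + "c" * 40 + "\n")
combined = merge_marks(first, second)  # mark 2 now maps to the "c" object
```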
92
93 Performance and Compression Tuning
94 --active-branches=<n>
95 Maximum number of branches to maintain active at once. See “Memory
96 Utilization” below for details. Default is 5.
97
98 --big-file-threshold=<n>
99 Maximum size of a blob that fast-import will attempt to create a
100 delta for, expressed in bytes. The default is 512m (512 MiB). Some
101 importers may wish to lower this on systems with constrained
102 memory.
103
104 --depth=<n>
105 Maximum delta depth, for blob and tree deltification. Default is
106 50.
107
108 --export-pack-edges=<file>
109 After creating a packfile, print a line of data to <file> listing
110 the filename of the packfile and the last commit on each branch
111 that was written to that packfile. This information may be useful
112 after importing projects whose total object set exceeds the 4 GiB
113 packfile limit, as these commits can be used as edge points during
114 calls to git pack-objects.
115
116 --max-pack-size=<n>
117 Maximum size of each output packfile. The default is unlimited.
118
119 fastimport.unpackLimit
120 See git-config(1)
121
The design of fast-import allows it to import large projects with
minimal memory usage and processing time. Assuming the frontend is
able to keep up with fast-import and feed it a constant stream of
data, imports of projects holding 10+ years of history and containing
100,000+ individual commits generally complete in just 1-2 hours on
quite modest (~$2,000 USD) hardware.
129
130 Most bottlenecks appear to be in foreign source data access (the source
131 just cannot extract revisions fast enough) or disk IO (fast-import
132 writes as fast as the disk will take the data). Imports will run faster
133 if the source data is stored on a different drive than the destination
134 Git repository (due to less IO contention).
135
137 A typical frontend for fast-import tends to weigh in at approximately
138 200 lines of Perl/Python/Ruby code. Most developers have been able to
139 create working importers in just a couple of hours, even though it is
140 their first exposure to fast-import, and sometimes even to Git. This is
141 an ideal situation, given that most conversion tools are throw-away
142 (use once, and never look back).
143
145 Like git push or git fetch, imports handled by fast-import are safe to
146 run alongside parallel git repack -a -d or git gc invocations, or any
147 other Git operation (including git prune, as loose objects are never
148 used by fast-import).
149
fast-import does not lock the branch or tag refs it is actively
importing. After the import, during its ref update phase, fast-import
tests each existing branch ref to verify the update will be a
fast-forward update (the commit stored in the ref is contained in the
new history of the commit to be written). If the update is not a
fast-forward update, fast-import will skip updating that ref and
instead print a warning message. fast-import will always attempt to
update all branch refs, and does not stop on the first failure.
158
159 Branch updates can be forced with --force, but it’s recommended that
160 this only be used on an otherwise quiet repository. Using --force is
161 not necessary for an initial import into an empty repository.
162
164 fast-import tracks a set of branches in memory. Any branch can be
165 created or modified at any point during the import process by sending a
166 commit command on the input stream. This design allows a frontend
167 program to process an unlimited number of branches simultaneously,
168 generating commits in the order they are available from the source
169 data. It also simplifies the frontend programs considerably.
170
171 fast-import does not use or alter the current working directory, or any
172 file within it. (It does however update the current Git repository, as
173 referenced by GIT_DIR.) Therefore an import frontend may use the
174 working directory for its own purposes, such as extracting file
175 revisions from the foreign source. This ignorance of the working
176 directory also allows fast-import to run very quickly, as it does not
177 need to perform any costly file update operations when switching
178 between branches.
179
181 With the exception of raw file data (which Git does not interpret) the
182 fast-import input format is text (ASCII) based. This text based format
183 simplifies development and debugging of frontend programs, especially
184 when a higher level language such as Perl, Python or Ruby is being
185 used.
186
187 fast-import is very strict about its input. Where we say SP below we
188 mean exactly one space. Likewise LF means one (and only one) linefeed
189 and HT one (and only one) horizontal tab. Supplying additional
190 whitespace characters will cause unexpected results, such as branch
191 names or file names with leading or trailing spaces in their name, or
192 early termination of fast-import when it encounters unexpected input.
193
194 Stream Comments
195 To aid in debugging frontends fast-import ignores any line that begins
196 with # (ASCII pound/hash) up to and including the line ending LF. A
197 comment line may contain any sequence of bytes that does not contain an
198 LF and therefore may be used to include any detailed debugging
199 information that might be specific to the frontend and useful when
200 inspecting a fast-import data stream.
201
202 Date Formats
203 The following date formats are supported. A frontend should select the
204 format it will use for this import by passing the format name in the
205 --date-format=<fmt> command-line option.
206
207 raw
208 This is the Git native format and is <time> SP <offutc>. It is also
209 fast-import’s default format, if --date-format was not specified.
210
211 The time of the event is specified by <time> as the number of
212 seconds since the UNIX epoch (midnight, Jan 1, 1970, UTC) and is
213 written as an ASCII decimal integer.
214
The local offset is specified by <offutc> as a positive or negative
offset from UTC. For example EST (which is 5 hours behind UTC)
would be expressed in <offutc> by “-0500” while UTC is “+0000”. The
local offset does not affect <time>; it is used only as an
advisement to help formatting routines display the timestamp.
220
221 If the local offset is not available in the source material, use
222 “+0000”, or the most common local offset. For example many
223 organizations have a CVS repository which has only ever been
224 accessed by users who are located in the same location and time
225 zone. In this case a reasonable offset from UTC could be assumed.
226
227 Unlike the rfc2822 format, this format is very strict. Any
228 variation in formatting will cause fast-import to reject the value.
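
A minimal sketch of producing a raw-format date from a frontend written in
Python follows. The helper name raw_date and its choice of taking the offset
in minutes are assumptions made for illustration only.

```python
def raw_date(epoch_seconds, offset_minutes=0):
    """Format a timestamp as '<time> SP <offutc>' for --date-format=raw.

    epoch_seconds is seconds since the UNIX epoch; offset_minutes is the
    local offset from UTC in minutes (negative when behind UTC).
    """
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"{int(epoch_seconds)} {sign}{hours:02d}{minutes:02d}"


print(raw_date(1170778938, -300))  # prints "1170778938 -0500"
print(raw_date(0))                 # prints "0 +0000"
```

Because the offset is advisory only, <time> is passed through unchanged
whatever offset is given.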
229
230 rfc2822
231 This is the standard email format as described by RFC 2822.
232
233 An example value is “Tue Feb 6 11:22:18 2007 -0500”. The Git parser
234 is accurate, but a little on the lenient side. It is the same
235 parser used by git am when applying patches received from email.
236
237 Some malformed strings may be accepted as valid dates. In some of
238 these cases Git will still be able to obtain the correct date from
239 the malformed string. There are also some types of malformed
240 strings which Git will parse wrong, and yet consider valid.
241 Seriously malformed strings will be rejected.
242
243 Unlike the raw format above, the time zone/UTC offset information
244 contained in an RFC 2822 date string is used to adjust the date
245 value to UTC prior to storage. Therefore it is important that this
246 information be as accurate as possible.
247
248 If the source material uses RFC 2822 style dates, the frontend
249 should let fast-import handle the parsing and conversion (rather
250 than attempting to do it itself) as the Git parser has been well
251 tested in the wild.
252
253 Frontends should prefer the raw format if the source material
254 already uses UNIX-epoch format, can be coaxed to give dates in that
255 format, or its format is easily convertible to it, as there is no
256 ambiguity in parsing.
257
258 now
259 Always use the current time and time zone. The literal now must
260 always be supplied for <when>.
261
This is a toy format. The current time and time zone of this system
are always copied into the identity string at the time it is being
created by fast-import. There is no way to specify a different time
or time zone.
266
267 This particular format is supplied as it’s short to implement and
268 may be useful to a process that wants to create a new commit right
269 now, without needing to use a working directory or git
270 update-index.
271
272 If separate author and committer commands are used in a commit the
273 timestamps may not match, as the system clock will be polled twice
274 (once for each command). The only way to ensure that both author
275 and committer identity information has the same timestamp is to
276 omit author (thus copying from committer) or to use a date format
277 other than now.
278
279 Commands
280 fast-import accepts several commands to update the current repository
281 and control the current import process. More detailed discussion (with
282 examples) of each command follows later.
283
284 commit
285 Creates a new branch or updates an existing branch by creating a
286 new commit and updating the branch to point at the newly created
287 commit.
288
289 tag
290 Creates an annotated tag object from an existing commit or branch.
291 Lightweight tags are not supported by this command, as they are not
292 recommended for recording meaningful points in time.
293
294 reset
295 Reset an existing branch (or a new branch) to a specific revision.
296 This command must be used to change a branch to a specific revision
297 without making a commit on it.
298
299 blob
300 Convert raw file data into a blob, for future use in a commit
301 command. This command is optional and is not needed to perform an
302 import.
303
304 checkpoint
305 Forces fast-import to close the current packfile, generate its
306 unique SHA-1 checksum and index, and start a new packfile. This
307 command is optional and is not needed to perform an import.
308
309 progress
310 Causes fast-import to echo the entire line to its own standard
311 output. This command is optional and is not needed to perform an
312 import.
313
314 done
315 Marks the end of the stream. This command is optional unless the
316 done feature was requested using the --done command-line option or
317 feature done command.
318
319 get-mark
320 Causes fast-import to print the SHA-1 corresponding to a mark to
321 the file descriptor set with --cat-blob-fd, or stdout if
322 unspecified.
323
324 cat-blob
325 Causes fast-import to print a blob in cat-file --batch format to
326 the file descriptor set with --cat-blob-fd or stdout if
327 unspecified.
328
329 ls
330 Causes fast-import to print a line describing a directory entry in
331 ls-tree format to the file descriptor set with --cat-blob-fd or
332 stdout if unspecified.
333
334 feature
335 Enable the specified feature. This requires that fast-import
336 supports the specified feature, and aborts if it does not.
337
338 option
Specify any of the options listed under OPTIONS that do not change
stream semantics to suit the frontend’s needs. This command is
optional and is not needed to perform an import.
342
343 commit
344 Create or update a branch with a new commit, recording one logical
345 change to the project.
346
'commit' SP <ref> LF
mark?
('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
'committer' (SP <name>)? SP LT <email> GT SP <when> LF
data
('from' SP <commit-ish> LF)?
('merge' SP <commit-ish> LF)*
(filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
LF?
356
357 where <ref> is the name of the branch to make the commit on. Typically
358 branch names are prefixed with refs/heads/ in Git, so importing the CVS
359 branch symbol RELENG-1_0 would use refs/heads/RELENG-1_0 for the value
360 of <ref>. The value of <ref> must be a valid refname in Git. As LF is
361 not valid in a Git refname, no quoting or escaping syntax is supported
362 here.
363
364 A mark command may optionally appear, requesting fast-import to save a
365 reference to the newly created commit for future use by the frontend
366 (see below for format). It is very common for frontends to mark every
367 commit they create, thereby allowing future branch creation from any
368 imported commit.
369
370 The data command following committer must supply the commit message
371 (see below for data command syntax). To import an empty commit message
372 use a 0 length data. Commit messages are free-form and are not
373 interpreted by Git. Currently they must be encoded in UTF-8, as
374 fast-import does not permit other encodings to be specified.
375
376 Zero or more filemodify, filedelete, filecopy, filerename,
377 filedeleteall and notemodify commands may be included to update the
378 contents of the branch prior to creating the commit. These commands may
379 be supplied in any order. However it is recommended that a
380 filedeleteall command precede all filemodify, filecopy, filerename and
381 notemodify commands in the same commit, as filedeleteall wipes the
382 branch clean (see below).
383
384 The LF after the command is optional (it used to be required).
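
The ordering of sub-commands within a commit can be sketched in Python as
below. This is an illustration under stated assumptions, not a reference
implementation: the function name commit_command and its parameters are
hypothetical, identities are passed in already formatted, and the message is
sent with the exact byte count form of the data command
('data' SP <count> LF <raw>), which is described under data.

```python
def commit_command(ref, committer, message, mark=None, author=None,
                   from_=None, merges=(), changes=()):
    """Assemble one commit command as bytes, in the order the grammar requires.

    committer/author are '<name> LT <email> GT SP <when>' strings without the
    leading keyword; changes are complete file command lines such as
    'M 100644 :1 README' (paths already quoted where necessary).
    """
    msg = message.encode("utf-8")  # commit messages must be UTF-8
    parts = [f"commit {ref}\n".encode("utf-8")]
    if mark is not None:
        parts.append(f"mark :{mark}\n".encode("utf-8"))
    if author is not None:
        parts.append(f"author {author}\n".encode("utf-8"))
    parts.append(f"committer {committer}\n".encode("utf-8"))
    # The count is the number of bytes, not characters.
    parts.append(b"data %d\n" % len(msg) + msg + b"\n")
    if from_ is not None:
        parts.append(f"from {from_}\n".encode("utf-8"))
    for commit_ish in merges:
        parts.append(f"merge {commit_ish}\n".encode("utf-8"))
    for change in changes:
        parts.append(change.encode("utf-8") + b"\n")
    return b"".join(parts)


stream = commit_command(
    "refs/heads/main",
    "Com M Itter <cm@example.com> 1170778938 -0500",
    "initial import",
    mark=2,
    changes=["M 100644 :1 README"],
)
```

Building the command as bytes keeps the byte count in data exact even when
the message contains non-ASCII characters.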
385
386 author
387 An author command may optionally appear, if the author information
388 might differ from the committer information. If author is omitted
389 then fast-import will automatically use the committer’s information
390 for the author portion of the commit. See below for a description
391 of the fields in author, as they are identical to committer.
392
393 committer
394 The committer command indicates who made this commit, and when they
395 made it.
396
397 Here <name> is the person’s display name (for example “Com M
398 Itter”) and <email> is the person’s email address
399 (“cm@example.com”). LT and GT are the literal less-than (\x3c) and
400 greater-than (\x3e) symbols. These are required to delimit the
401 email address from the other fields in the line. Note that <name>
402 and <email> are free-form and may contain any sequence of bytes,
403 except LT, GT and LF. <name> is typically UTF-8 encoded.
404
405 The time of the change is specified by <when> using the date format
406 that was selected by the --date-format=<fmt> command-line option.
407 See “Date Formats” above for the set of supported formats, and
408 their syntax.
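
The author, committer and tagger lines share one layout, so a frontend can
build them with a single helper. The sketch below assumes a hypothetical
function name identity_line and simply rejects the bytes that the format
does not allow in <name> and <email>.

```python
def identity_line(command, name, email, when):
    """Build "author", "committer" or "tagger" lines.

    The name is optional; when it is empty only '<email>' is written.
    """
    for field in (name, email):
        if any(ch in field for ch in "<>\n"):
            raise ValueError("name and email may not contain LT, GT or LF")
    name_part = f" {name}" if name else ""
    return f"{command}{name_part} <{email}> {when}\n"


line = identity_line("committer", "Com M Itter", "cm@example.com",
                     "1170778938 -0500")
# line == "committer Com M Itter <cm@example.com> 1170778938 -0500\n"
```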
409
410 from
411 The from command is used to specify the commit to initialize this
412 branch from. This revision will be the first ancestor of the new
413 commit. The state of the tree built at this commit will begin with
414 the state at the from commit, and be altered by the content
415 modifications in this commit.
416
417 Omitting the from command in the first commit of a new branch will
418 cause fast-import to create that commit with no ancestor. This
419 tends to be desired only for the initial commit of a project. If
420 the frontend creates all files from scratch when making a new
421 branch, a merge command may be used instead of from to start the
422 commit with an empty tree. Omitting the from command on existing
423 branches is usually desired, as the current commit on that branch
424 is automatically assumed to be the first ancestor of the new
425 commit.
426
427 As LF is not valid in a Git refname or SHA-1 expression, no quoting
428 or escaping syntax is supported within <commit-ish>.
429
430 Here <commit-ish> is any of the following:
431
432 · The name of an existing branch already in fast-import’s
433 internal branch table. If fast-import doesn’t know the name,
434 it’s treated as a SHA-1 expression.
435
436 · A mark reference, :<idnum>, where <idnum> is the mark number.
437
438 The reason fast-import uses : to denote a mark reference is
439 this character is not legal in a Git branch name. The leading :
440 makes it easy to distinguish between the mark 42 (:42) and the
441 branch 42 (42 or refs/heads/42), or an abbreviated SHA-1 which
442 happened to consist only of base-10 digits.
443
444 Marks must be declared (via mark) before they can be used.
445
446 · A complete 40 byte or abbreviated commit SHA-1 in hex.
447
448 · Any valid Git SHA-1 expression that resolves to a commit. See
449 “SPECIFYING REVISIONS” in gitrevisions(7) for details.
450
451 · The special null SHA-1 (40 zeros) specifies that the branch is
452 to be removed.
453
454 The special case of restarting an incremental import from the
455 current branch value should be written as:
456
from refs/heads/branch^0
458
459
460 The ^0 suffix is necessary as fast-import does not permit a branch
461 to start from itself, and the branch is created in memory before
462 the from command is even read from the input. Adding ^0 will force
463 fast-import to resolve the commit through Git’s revision parsing
464 library, rather than its internal branch table, thereby loading in
465 the existing value of the branch.
466
467 merge
468 Includes one additional ancestor commit. The additional ancestry
469 link does not change the way the tree state is built at this
470 commit. If the from command is omitted when creating a new branch,
471 the first merge commit will be the first ancestor of the current
472 commit, and the branch will start out with no files. An unlimited
473 number of merge commands per commit are permitted by fast-import,
474 thereby establishing an n-way merge.
475
476 Here <commit-ish> is any of the commit specification expressions
477 also accepted by from (see above).
478
479 filemodify
480 Included in a commit command to add a new file or change the
481 content of an existing file. This command has two different means
482 of specifying the content of the file.
483
484 External data format
485 The data content for the file was already supplied by a prior
486 blob command. The frontend just needs to connect it.
487
'M' SP <mode> SP <dataref> SP <path> LF

Here <dataref> must usually be either a mark reference
(:<idnum>) set by a prior blob command, or a full 40-byte SHA-1
of an existing Git blob object. If <mode> is 040000 then
<dataref> must be the full 40-byte SHA-1 of an existing Git
tree object or a mark reference set with --import-marks.
495
496 Inline data format
497 The data content for the file has not been supplied yet. The
498 frontend wants to supply it as part of this modify command.
499
'M' SP <mode> SP 'inline' SP <path> LF
data
502
503 See below for a detailed description of the data command.
504
505 In both formats <mode> is the type of file entry, specified in
506 octal. Git only supports the following modes:
507
508 · 100644 or 644: A normal (not-executable) file. The majority of
509 files in most projects use this mode. If in doubt, this is what
510 you want.
511
512 · 100755 or 755: A normal, but executable, file.
513
514 · 120000: A symlink, the content of the file will be the link
515 target.
516
517 · 160000: A gitlink, SHA-1 of the object refers to a commit in
518 another repository. Git links can only be specified by SHA or
519 through a commit mark. They are used to implement submodules.
520
521 · 040000: A subdirectory. Subdirectories can only be specified by
522 SHA or through a tree mark set with --import-marks.
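
A small sketch of emitting an external-format filemodify line, checking the
mode against the list above, might look like this. The names FILE_MODES and
filemodify_line are hypothetical, and the path is assumed to be canonical
and already quoted where quoting is required.

```python
# Modes accepted by fast-import, as listed above.
FILE_MODES = {"100644", "644", "100755", "755", "120000", "160000", "040000"}


def filemodify_line(mode, dataref, path):
    """Build "'M' SP <mode> SP <dataref> SP <path> LF" for external data."""
    if mode not in FILE_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    return f"M {mode} {dataref} {path}\n"


line = filemodify_line("100644", ":1", "README")
# line == "M 100644 :1 README\n"
```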
523
524 In both formats <path> is the complete path of the file to be added
525 (if not already existing) or modified (if already existing).
526
527 A <path> string must use UNIX-style directory separators (forward
528 slash /), may contain any byte other than LF, and must not start
529 with double quote (").
530
531 A path can use C-style string quoting; this is accepted in all
532 cases and mandatory if the filename starts with double quote or
533 contains LF. In C-style quoting, the complete name should be
534 surrounded with double quotes, and any LF, backslash, or double
535 quote characters must be escaped by preceding them with a backslash
536 (e.g., "path/with\n, \\ and \" in it").
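
The quoting rule can be sketched as a small Python helper. The name
quote_path and the always parameter are assumptions for illustration;
always=True covers cases such as a filecopy or filerename source path that
contains a space.

```python
def quote_path(path, always=False):
    """Return path as written in a command, C-style quoted when required.

    Quoting is mandatory if the path starts with a double quote or
    contains LF; pass always=True to quote unconditionally.
    """
    if not (always or path.startswith('"') or "\n" in path):
        return path
    escaped = (path.replace("\\", "\\\\")  # escape backslashes first
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"'


print(quote_path("src/main.c"))               # prints src/main.c
print(quote_path("my file.txt", always=True)) # prints "my file.txt"
```

Backslashes are escaped before the other characters so that the
backslashes introduced by later replacements are not escaped a second time.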
537
538 The value of <path> must be in canonical form. That is it must not:
539
540 · contain an empty directory component (e.g. foo//bar is
541 invalid),
542
543 · end with a directory separator (e.g. foo/ is invalid),
544
545 · start with a directory separator (e.g. /foo is invalid),
546
547 · contain the special component . or .. (e.g. foo/./bar and
548 foo/../bar are invalid).
549
550 The root of the tree can be represented by an empty string as
551 <path>.
552
553 It is recommended that <path> always be encoded using UTF-8.
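
The canonical-form rules for <path> given above can be checked before a
path is written to the stream. The following is a minimal sketch; the
function name is_canonical_path is hypothetical.

```python
def is_canonical_path(path):
    """Return True if path satisfies the canonical-form rules for <path>."""
    if path == "":
        return True  # the empty string denotes the root of the tree
    # An empty component covers 'foo//bar', a trailing '/' and a leading '/'.
    return all(part not in ("", ".", "..") for part in path.split("/"))


for candidate in ("foo/bar", "foo//bar", "foo/", "/foo", "foo/../bar"):
    print(candidate, is_canonical_path(candidate))
```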
554
555 filedelete
556 Included in a commit command to remove a file or recursively delete
557 an entire directory from the branch. If the file or directory
558 removal makes its parent directory empty, the parent directory will
559 be automatically removed too. This cascades up the tree until the
560 first non-empty directory or the root is reached.
561
'D' SP <path> LF
563
564 here <path> is the complete path of the file or subdirectory to be
565 removed from the branch. See filemodify above for a detailed
566 description of <path>.
567
568 filecopy
Recursively copies an existing file or subdirectory to a different
location within the branch. The source file or directory must
exist. If the destination exists it will be completely replaced by
the content copied from the source.

'C' SP <path> SP <path> LF
575
576 here the first <path> is the source location and the second <path>
577 is the destination. See filemodify above for a detailed description
578 of what <path> may look like. To use a source path that contains SP
579 the path must be quoted.
580
581 A filecopy command takes effect immediately. Once the source
582 location has been copied to the destination any future commands
583 applied to the source location will not impact the destination of
584 the copy.
585
586 filerename
Renames an existing file or subdirectory to a different location
within the branch. The source file or directory must exist. If
the destination exists it will be replaced by the source.

'R' SP <path> SP <path> LF
592
593 here the first <path> is the source location and the second <path>
594 is the destination. See filemodify above for a detailed description
595 of what <path> may look like. To use a source path that contains SP
596 the path must be quoted.
597
598 A filerename command takes effect immediately. Once the source
599 location has been renamed to the destination any future commands
600 applied to the source location will create new files there and not
601 impact the destination of the rename.
602
Note that a filerename is the same as a filecopy followed by a
filedelete of the source location. There is a slight performance
advantage to using filerename, but the advantage is so small that
it is never worth trying to convert a delete/add pair in source
material into a rename for fast-import. This filerename command is
provided just to simplify frontends that already have rename
information and don’t want to bother with decomposing it into a
filecopy followed by a filedelete.
611
612 filedeleteall
613 Included in a commit command to remove all files (and also all
614 directories) from the branch. This command resets the internal
615 branch structure to have no files in it, allowing the frontend to
616 subsequently add all interesting files from scratch.
617
'deleteall' LF
619
620 This command is extremely useful if the frontend does not know (or
621 does not care to know) what files are currently on the branch, and
622 therefore cannot generate the proper filedelete commands to update
623 the content.
624
625 Issuing a filedeleteall followed by the needed filemodify commands
626 to set the correct content will produce the same results as sending
627 only the needed filemodify and filedelete commands. The
628 filedeleteall approach may however require fast-import to use
629 slightly more memory per active branch (less than 1 MiB for even
630 most large projects); so frontends that can easily obtain only the
631 affected paths for a commit are encouraged to do so.
632
633 notemodify
Included in a commit <notes_ref> command to add a new note
annotating a <commit-ish> or change the content of an existing
annotation. Internally it is similar to filemodify 100644 on the
<commit-ish> path (possibly split into subdirectories). It’s not
advised to use any other commands to write to the <notes_ref> tree
except filedeleteall to delete all existing notes in this tree.
This command has two different means of specifying the content of
the note.
642
643 External data format
644 The data content for the note was already supplied by a prior
645 blob command. The frontend just needs to connect it to the
646 commit that is to be annotated.
647
'N' SP <dataref> SP <commit-ish> LF
649
650 Here <dataref> can be either a mark reference (:<idnum>) set by
651 a prior blob command, or a full 40-byte SHA-1 of an existing
652 Git blob object.
653
654 Inline data format
655 The data content for the note has not been supplied yet. The
656 frontend wants to supply it as part of this modify command.
657
'N' SP 'inline' SP <commit-ish> LF
data
660
661 See below for a detailed description of the data command.
662
663 In both formats <commit-ish> is any of the commit specification
664 expressions also accepted by from (see above).
665
666 mark
667 Arranges for fast-import to save a reference to the current object,
668 allowing the frontend to recall this object at a future point in time,
669 without knowing its SHA-1. Here the current object is the object
670 creation command the mark command appears within. This can be commit,
671 tag, and blob, but commit is the most common usage.
672
'mark' SP ':' <idnum> LF
674
675 where <idnum> is the number assigned by the frontend to this mark. The
676 value of <idnum> is expressed as an ASCII decimal integer. The value 0
677 is reserved and cannot be used as a mark. Only values greater than or
678 equal to 1 may be used as marks.
679
680 New marks are created automatically. Existing marks can be moved to
681 another object simply by reusing the same <idnum> in another mark
682 command.
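
Frontends commonly hand out mark numbers from a simple counter. The sketch
below shows one way to do that in Python; the names mark_line and
MarkAllocator are hypothetical, and starting the counter at 1 reflects the
rule that 0 is reserved.

```python
import itertools


def mark_line(idnum):
    """Build "'mark' SP ':' <idnum> LF", rejecting the reserved value 0."""
    if not isinstance(idnum, int) or idnum < 1:
        raise ValueError("mark ids must be integers >= 1; 0 is reserved")
    return f"mark :{idnum}\n"


class MarkAllocator:
    """Hand out fresh mark ids: 1, 2, 3, ..."""

    def __init__(self):
        self._counter = itertools.count(1)

    def new(self):
        return next(self._counter)


marks = MarkAllocator()
first = marks.new()       # 1
line = mark_line(first)   # "mark :1\n"
```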
683
684 tag
685 Creates an annotated tag referring to a specific commit. To create
686 lightweight (non-annotated) tags see the reset command below.
687
'tag' SP <name> LF
'from' SP <commit-ish> LF
'tagger' (SP <name>)? SP LT <email> GT SP <when> LF
data
692
693 where <name> is the name of the tag to create.
694
695 Tag names are automatically prefixed with refs/tags/ when stored in
696 Git, so importing the CVS branch symbol RELENG-1_0-FINAL would use just
697 RELENG-1_0-FINAL for <name>, and fast-import will write the
698 corresponding ref as refs/tags/RELENG-1_0-FINAL.
699
700 The value of <name> must be a valid refname in Git and therefore may
701 contain forward slashes. As LF is not valid in a Git refname, no
702 quoting or escaping syntax is supported here.
703
704 The from command is the same as in the commit command; see above for
705 details.
706
707 The tagger command uses the same format as committer within commit;
708 again see above for details.
709
710 The data command following tagger must supply the annotated tag message
711 (see below for data command syntax). To import an empty tag message use
712 a 0 length data. Tag messages are free-form and are not interpreted by
713 Git. Currently they must be encoded in UTF-8, as fast-import does not
714 permit other encodings to be specified.
715
716 Signing annotated tags during import from within fast-import is not
717 supported. Trying to include your own PGP/GPG signature is not
718 recommended, as the frontend does not (easily) have access to the
719 complete set of bytes which normally goes into such a signature. If
720 signing is required, create lightweight tags from within fast-import
721 with reset, then create the annotated versions of those tags offline
722 with the standard git tag process.
723
724 reset
725 Creates (or recreates) the named branch, optionally starting from a
726 specific revision. The reset command allows a frontend to issue a new
727 from command for an existing branch, or to create a new branch from an
728 existing commit without creating a new commit.
729
730 'reset' SP <ref> LF
731 ('from' SP <commit-ish> LF)?
732 LF?
733
734 For a detailed description of <ref> and <commit-ish> see above under
735 commit and from.
736
737 The LF after the command is optional (it used to be required).
738
739 The reset command can also be used to create lightweight
740 (non-annotated) tags. For example:
741
reset refs/tags/938
from :938
744
745 would create the lightweight tag refs/tags/938 referring to whatever
746 commit mark :938 references.
747
748 blob
749 Requests writing one file revision to the packfile. The revision is not
750 connected to any commit; this connection must be formed in a subsequent
751 commit command by referencing the blob through an assigned mark.
752
753 'blob' LF
754 mark?
755 data
756
The mark command is optional here as some frontends have chosen to
generate the Git SHA-1 for the blob on their own, and feed that
directly to commit. This is typically more work than it’s worth,
however, as marks are inexpensive to store and easy to use.
761
762 data
763 Supplies raw data (for use as blob/file content, commit messages, or
764 annotated tag messages) to fast-import. Data can be supplied using an
765 exact byte count or delimited with a terminating line. Real frontends
766 intended for production-quality conversions should always use the exact
767 byte count format, as it is more robust and performs better. The
768 delimited format is intended primarily for testing fast-import.
769
770 Comment lines appearing within the <raw> part of data commands are
771 always taken to be part of the body of the data and are therefore never
772 ignored by fast-import. This makes it safe to import any file/message
773 content whose lines might start with #.
774
775 Exact byte count format
776 The frontend must specify the number of bytes of data.
777
778 'data' SP <count> LF
779 <raw> LF?
780
781 where <count> is the exact number of bytes appearing within <raw>.
782 The value of <count> is expressed as an ASCII decimal integer. The
783 LF on either side of <raw> is not included in <count> and will not
784 be included in the imported data.
785
786 The LF after <raw> is optional (it used to be required) but
787 recommended. Always including it makes debugging a fast-import
788 stream easier as the next command always starts in column 0 of the
789 next line, even if <raw> did not end with an LF.
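
A minimal Python sketch of the exact byte count format follows. The function names format_data and format_text are hypothetical. The point it illustrates is that <count> must be computed on the encoded bytes, not on the number of characters.

```python
def format_data(raw: bytes) -> bytes:
    """Wrap raw in a 'data' command using the exact byte count format."""
    # <count> is the number of bytes in <raw>; the LF that follows the
    # count and the optional LF after <raw> are not counted.
    return b"data %d\n" % len(raw) + raw + b"\n"


def format_text(text: str) -> bytes:
    """Encode text first, so that bytes (not characters) are counted."""
    return format_data(text.encode("utf-8"))
```

Because the length is stated up front, content whose lines start with # or that lacks a final LF passes through unchanged.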
790
791 Delimited format
792 A delimiter string is used to mark the end of the data. fast-import
793 will compute the length by searching for the delimiter. This format
794 is primarily useful for testing and is not recommended for real
795 data.
796
797 'data' SP '<<' <delim> LF
798 <raw> LF
799 <delim> LF
800 LF?
801
802 where <delim> is the chosen delimiter string. The string <delim>
803 must not appear on a line by itself within <raw>, as otherwise
804 fast-import will think the data ends earlier than it really does.
The LF immediately trailing <raw> is part of <raw>. This is one of
the limitations of the delimited format: it is impossible to supply
a data chunk which does not have an LF as its last byte.
808
809 The LF after <delim> LF is optional (it used to be required).
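
The restrictions of the delimited format can be checked before a chunk is written. The Python sketch below is a hypothetical test helper (format_delimited_data is not part of fast-import); it conservatively refuses any chunk that does not end in LF or that contains the delimiter on a line by itself.

```python
def format_delimited_data(raw: bytes, delim: bytes = b"EOF") -> bytes:
    """Wrap raw in a delimited 'data' command (for test streams only)."""
    if not delim or b"\n" in delim:
        raise ValueError("delimiter must be a non-empty single line")
    if not raw.endswith(b"\n"):
        # The LF before the delimiter line is part of <raw>, so a chunk
        # without a final LF cannot be expressed in this format.
        raise ValueError("delimited data must end with LF")
    if delim in raw.split(b"\n"):
        raise ValueError("delimiter appears on a line by itself in the data")
    return b"data <<" + delim + b"\n" + raw + delim + b"\n"
```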
810
811 checkpoint
812 Forces fast-import to close the current packfile, start a new one, and
813 to save out all current branch refs, tags and marks.
814
815 'checkpoint' LF
816 LF?
817
818 Note that fast-import automatically switches packfiles when the current
819 packfile reaches --max-pack-size, or 4 GiB, whichever limit is smaller.
820 During an automatic packfile switch fast-import does not update the
821 branch refs, tags or marks.
822
823 As a checkpoint can require a significant amount of CPU time and disk
824 IO (to compute the overall pack SHA-1 checksum, generate the
825 corresponding index file, and update the refs) it can easily take
826 several minutes for a single checkpoint command to complete.
827
828 Frontends may choose to issue checkpoints during extremely large and
829 long running imports, or when they need to allow another Git process
access to a branch. However, given that a 30 GiB Subversion repository
can be loaded into Git through fast-import in about 3 hours, explicit
checkpointing may not be necessary.
833
834 The LF after the command is optional (it used to be required).
835
836 progress
837 Causes fast-import to print the entire progress line unmodified to its
838 standard output channel (file descriptor 1) when the command is
839 processed from the input stream. The command otherwise has no impact on
840 the current import, or on any of fast-import’s internal state.
841
842 'progress' SP <any> LF
843 LF?
844
845 The <any> part of the command may contain any sequence of bytes that
846 does not contain LF. The LF after the command is optional. Callers may
847 wish to process the output through a tool such as sed to remove the
848 leading part of the line, for example:
849
frontend | git fast-import | sed 's/^progress //'
851
852 Placing a progress command immediately after a checkpoint will inform
853 the reader when the checkpoint has been completed and it can safely
854 access the refs that fast-import updated.
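
The checkpoint-then-progress pattern could look like the following Python sketch. Both function names and the token argument are hypothetical; the reader side assumes it is given fast-import's standard output as a binary stream and simply skips any other lines it sees there.

```python
from typing import BinaryIO


def checkpoint_with_notice(token: str) -> bytes:
    """Return a checkpoint command followed by a progress marker."""
    if "\n" in token:
        raise ValueError("progress text must not contain LF")
    return f"checkpoint\nprogress {token}\n".encode("utf-8")


def wait_for_notice(fast_import_stdout: BinaryIO, token: str) -> bool:
    """Read lines until the marker appears; False if the stream ends first."""
    expected = f"progress {token}\n".encode("utf-8")
    for line in fast_import_stdout:
        if line == expected:
            # The progress line is printed when it is processed, which
            # happens after the preceding checkpoint.
            return True
    return False
```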
855
856 get-mark
857 Causes fast-import to print the SHA-1 corresponding to a mark to stdout
858 or to the file descriptor previously arranged with the --cat-blob-fd
859 argument. The command otherwise has no impact on the current import;
860 its purpose is to retrieve SHA-1s that later commits might want to
861 refer to in their commit messages.
862
863 'get-mark' SP ':' <idnum> LF
864
865 This command can be used anywhere in the stream that comments are
866 accepted. In particular, the get-mark command can be used in the middle
867 of a commit but not in the middle of a data command.
868
869 See “Responses To Commands” below for details about how to read this
870 output safely.
871
872 cat-blob
873 Causes fast-import to print a blob to a file descriptor previously
874 arranged with the --cat-blob-fd argument. The command otherwise has no
875 impact on the current import; its main purpose is to retrieve blobs
876 that may be in fast-import’s memory but not accessible from the target
877 repository.
878
879 'cat-blob' SP <dataref> LF
880
881 The <dataref> can be either a mark reference (:<idnum>) set previously
882 or a full 40-byte SHA-1 of a Git blob, preexisting or ready to be
883 written.
884
885 Output uses the same format as git cat-file --batch:
886
887 <sha1> SP 'blob' SP <size> LF
888 <contents> LF
889
890 This command can be used anywhere in the stream that comments are
891 accepted. In particular, the cat-blob command can be used in the middle
892 of a commit but not in the middle of a data command.
893
894 See “Responses To Commands” below for details about how to read this
895 output safely.
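
A frontend might read a cat-blob response along the lines of the Python sketch below. The function name is hypothetical, and the sketch assumes stream is a buffered binary file object connected to the descriptor given with --cat-blob-fd, so that read(n) returns n bytes unless the stream ends.

```python
from typing import BinaryIO, Tuple


def read_cat_blob_response(stream: BinaryIO) -> Tuple[str, bytes]:
    """Read one '<sha1> SP blob SP <size> LF <contents> LF' response."""
    header = stream.readline()
    fields = header.rstrip(b"\n").split(b" ")
    if len(fields) != 3 or fields[1] != b"blob":
        raise ValueError(f"unexpected cat-blob header: {header!r}")
    size = int(fields[2])
    # The contents are raw bytes and may contain LF, so read by size
    # rather than by line.
    contents = stream.read(size)
    if len(contents) != size:
        raise EOFError("stream ended inside blob contents")
    if stream.read(1) != b"\n":
        raise ValueError("missing LF after blob contents")
    return fields[0].decode("ascii"), contents
```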
896
897 ls
898 Prints information about the object at a path to a file descriptor
899 previously arranged with the --cat-blob-fd argument. This allows
900 printing a blob from the active commit (with cat-blob) or copying a
901 blob or tree from a previous commit for use in the current one (with
902 filemodify).
903
904 The ls command can be used anywhere in the stream that comments are
905 accepted, including the middle of a commit.
906
907 Reading from the active commit
908 This form can only be used in the middle of a commit. The path
909 names a directory entry within fast-import’s active commit. The
910 path must be quoted in this case.
911
912 'ls' SP <path> LF
913
914 Reading from a named tree
915 The <dataref> can be a mark reference (:<idnum>) or the full
916 40-byte SHA-1 of a Git tag, commit, or tree object, preexisting or
917 waiting to be written. The path is relative to the top level of the
918 tree named by <dataref>.
919
920 'ls' SP <dataref> SP <path> LF
921
922 See filemodify above for a detailed description of <path>.
923
924 Output uses the same format as git ls-tree <tree> -- <path>:
925
926 <mode> SP ('blob' | 'tree' | 'commit') SP <dataref> HT <path> LF
927
928 The <dataref> represents the blob, tree, or commit object at <path> and
929 can be used in later get-mark, cat-blob, filemodify, or ls commands.
930
931 If there is no file or subtree at that path, git fast-import will
932 instead report
933
934 missing SP <path> LF
935
936 See “Responses To Commands” below for details about how to read this
937 output safely.
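
The two possible ls responses can be told apart as in the Python sketch below. LsEntry and parse_ls_response are hypothetical names; the path is returned exactly as fast-import printed it, and no unquoting is attempted.

```python
from typing import NamedTuple, Optional


class LsEntry(NamedTuple):
    mode: str
    kind: str      # 'blob', 'tree' or 'commit'
    dataref: str
    path: str      # as printed by fast-import, not unquoted


def parse_ls_response(line: bytes) -> Optional[LsEntry]:
    """Parse one ls response line; return None for a 'missing' report."""
    text = line.rstrip(b"\n").decode("utf-8", "surrogateescape")
    if text.startswith("missing "):
        return None
    meta, tab, path = text.partition("\t")
    fields = meta.split(" ")
    if not tab or len(fields) != 3 or fields[1] not in ("blob", "tree", "commit"):
        raise ValueError(f"unexpected ls response: {line!r}")
    return LsEntry(fields[0], fields[1], fields[2], path)
```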
938
939 feature
940 Require that fast-import supports the specified feature, or abort if it
941 does not.
942
943 'feature' SP <feature> ('=' <argument>)? LF
944
945 The <feature> part of the command may be any one of the following:
946
947 date-format, export-marks, relative-marks, no-relative-marks, force
948 Act as though the corresponding command-line option with a leading
949 -- was passed on the command line (see OPTIONS, above).
950
951 import-marks, import-marks-if-exists
Like --import-marks except in three respects: first, only one
"feature import-marks" or "feature import-marks-if-exists" command
is allowed per stream; second, an --import-marks= or
--import-marks-if-exists command-line option overrides any of these
"feature" commands in the stream; third, "feature
import-marks-if-exists", like the corresponding command-line option,
silently skips a nonexistent file.
959
960 get-mark, cat-blob, ls
961 Require that the backend support the get-mark, cat-blob, or ls
962 command respectively. Versions of fast-import not supporting the
963 specified command will exit with a message indicating so. This lets
964 the import error out early with a clear message, rather than
965 wasting time on the early part of an import before the unsupported
966 command is detected.
967
968 notes
969 Require that the backend support the notemodify (N) subcommand to
970 the commit command. Versions of fast-import not supporting notes
971 will exit with a message indicating so.
972
973 done
Error out if the stream ends without a done command. Without this
feature, errors causing the frontend to end abruptly at a
convenient point in the stream can go undetected. This may occur,
for example, if an import frontend dies in mid-operation without
sending SIGTERM or SIGKILL to its subordinate git fast-import
instance.
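
A frontend could emit its feature requirements at the top of the stream as in this Python sketch. The names FEATURES and format_features are hypothetical; the set of accepted names is copied from the list above, and the one-per-stream rule for the import-marks features is enforced on the frontend side.

```python
from typing import Iterable, Optional, Tuple

# Feature names taken from the list in this section.
FEATURES = frozenset({
    "date-format", "export-marks", "relative-marks", "no-relative-marks",
    "force", "import-marks", "import-marks-if-exists", "get-mark",
    "cat-blob", "ls", "notes", "done",
})


def format_features(features: Iterable[Tuple[str, Optional[str]]]) -> bytes:
    """Turn (name, argument-or-None) pairs into 'feature' commands."""
    lines = []
    import_marks_seen = False
    for name, argument in features:
        if name not in FEATURES:
            raise ValueError(f"unknown feature: {name}")
        if name in ("import-marks", "import-marks-if-exists"):
            if import_marks_seen:
                raise ValueError("only one import-marks feature per stream")
            import_marks_seen = True
        suffix = "" if argument is None else f"={argument}"
        lines.append(f"feature {name}{suffix}\n")
    return "".join(lines).encode("utf-8")
```

A stream that starts with the output of format_features([("done", None)]) must end with a done command, otherwise fast-import reports an error.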
980
981 option
982 Processes the specified option so that git fast-import behaves in a way
983 that suits the frontend’s needs. Note that options specified by the
984 frontend are overridden by any options the user may specify to git
985 fast-import itself.
986
987 'option' SP <option> LF
988
989 The <option> part of the command may contain any of the options listed
990 in the OPTIONS section that do not change import semantics, without the
991 leading -- and is treated in the same way.
992
Option commands must be the first commands on the input (not counting
feature commands); giving an option command after any non-option
command is an error.
996
997 The following command-line options change import semantics and may
998 therefore not be passed as option:
999
1000 · date-format
1001
1002 · import-marks
1003
1004 · export-marks
1005
1006 · cat-blob-fd
1007
1008 · force
1009
1010 done
If the done feature is not in use, the done command is treated as if
EOF had been read. This can be used to tell fast-import to finish early.
1013
1014 If the --done command-line option or feature done command is in use,
1015 the done command is mandatory and marks the end of the stream.
1016
Responses To Commands
1018 New objects written by fast-import are not available immediately. Most
1019 fast-import commands have no visible effect until the next checkpoint
1020 (or completion). The frontend can send commands to fill fast-import’s
1021 input pipe without worrying about how quickly they will take effect,
1022 which improves performance by simplifying scheduling.
1023
1024 For some frontends, though, it is useful to be able to read back data
1025 from the current repository as it is being updated (for example when
1026 the source material describes objects in terms of patches to be applied
1027 to previously imported objects). This can be accomplished by connecting
1028 the frontend and fast-import via bidirectional pipes:
1029
mkfifo fast-import-output
frontend <fast-import-output |
git fast-import >fast-import-output
1033
1034 A frontend set up this way can use progress, get-mark, ls, and cat-blob
1035 commands to read information from the import in progress.
1036
1037 To avoid deadlock, such frontends must completely consume any pending
1038 output from progress, ls, get-mark, and cat-blob before performing
1039 writes to fast-import that might block.
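
The write, flush, then read discipline can be sketched in Python as follows. MarkReader is a hypothetical class; it takes the two ends of the bidirectional connection as binary file objects, and it assumes that the response to get-mark is a single LF-terminated line holding the SHA-1.

```python
from typing import BinaryIO


class MarkReader:
    """Ask fast-import for mark values over a pair of pipes (sketch)."""

    def __init__(self, to_fast_import: BinaryIO, from_fast_import: BinaryIO):
        self._out = to_fast_import
        self._in = from_fast_import

    def get_mark(self, idnum: int) -> str:
        self._out.write(b"get-mark :%d\n" % idnum)
        # Push the request out of our own buffer, otherwise we would wait
        # for an answer to a question fast-import never received.
        self._out.flush()
        # Read the whole response before writing anything else, so that
        # neither side ends up blocked on a full pipe.
        line = self._in.readline()
        if not line.endswith(b"\n"):
            raise EOFError("fast-import closed its output early")
        return line[:-1].decode("ascii")
```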
1040
1042 If fast-import is supplied invalid input it will terminate with a
1043 non-zero exit status and create a crash report in the top level of the
1044 Git repository it was importing into. Crash reports contain a snapshot
1045 of the internal fast-import state as well as the most recent commands
that led up to the crash.
1047
1048 All recent commands (including stream comments, file changes and
1049 progress commands) are shown in the command history within the crash
1050 report, but raw file data and commit messages are excluded from the
1051 crash report. This exclusion saves space within the report file and
1052 reduces the amount of buffering that fast-import must perform during
1053 execution.
1054
1055 After writing a crash report fast-import will close the current
1056 packfile and export the marks table. This allows the frontend developer
1057 to inspect the repository state and resume the import from the point
1058 where it crashed. The modified branches and tags are not updated during
1059 a crash, as the import did not complete successfully. Branch and tag
1060 information can be found in the crash report and must be applied
1061 manually if the update is needed.
1062
1063 An example crash:
1064
$ cat >in <<END_OF_INPUT
# my very first test commit
commit refs/heads/master
committer Shawn O. Pearce <spearce> 19283 -0400
# who is that guy anyway?
data <<EOF
this is my commit
EOF
M 644 inline .gitignore
data <<EOF
.gitignore
EOF
M 777 inline bob
END_OF_INPUT

$ git fast-import <in
fatal: Corrupt mode: M 777 inline bob
fast-import: dumping crash report to .git/fast_import_crash_8434

$ cat .git/fast_import_crash_8434
fast-import crash report:
fast-import process: 8434
parent process : 1391
at Sat Sep 1 00:58:12 2007

fatal: Corrupt mode: M 777 inline bob

Most Recent Commands Before Crash
---------------------------------
# my very first test commit
commit refs/heads/master
committer Shawn O. Pearce <spearce> 19283 -0400
# who is that guy anyway?
data <<EOF
M 644 inline .gitignore
data <<EOF
* M 777 inline bob

Active Branch LRU
-----------------
active_branches = 1 cur, 5 max

pos clock name
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1) 0 refs/heads/master

Inactive Branches
-----------------
refs/heads/master:
status : active loaded dirty
tip commit : 0000000000000000000000000000000000000000
old tree : 0000000000000000000000000000000000000000
cur tree : 0000000000000000000000000000000000000000
commit clock: 0
last pack :

-------------------
END OF CRASH REPORT
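
A wrapper script could look for crash reports after a failed import, as in the Python sketch below. Both function names are hypothetical, and the fast_import_crash_<pid> file name pattern and the "fatal: " line are taken from the example above rather than from any documented guarantee.

```python
from pathlib import Path
from typing import List, Optional


def find_crash_reports(git_dir: str) -> List[Path]:
    """List crash report files found directly inside git_dir."""
    return sorted(Path(git_dir).glob("fast_import_crash_*"))


def fatal_message(report_text: str) -> Optional[str]:
    """Return the text of the first 'fatal: ' line, or None if absent."""
    for line in report_text.splitlines():
        line = line.strip()
        if line.startswith("fatal: "):
            return line[len("fatal: "):]
    return None
```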
1123
1125 The following tips and tricks have been collected from various users of
1126 fast-import, and are offered here as suggestions.
1127
1128 Use One Mark Per Commit
1129 When doing a repository conversion, use a unique mark per commit (mark
1130 :<n>) and supply the --export-marks option on the command line.
1131 fast-import will dump a file which lists every mark and the Git object
1132 SHA-1 that corresponds to it. If the frontend can tie the marks back to
1133 the source repository, it is easy to verify the accuracy and
1134 completeness of the import by comparing each Git commit to the
1135 corresponding source revision.
1136
1137 Coming from a system such as Perforce or Subversion this should be
1138 quite simple, as the fast-import mark can also be the Perforce
1139 changeset number or the Subversion revision number.
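
Verification against the exported marks might start from something like the Python sketch below. It assumes the marks file holds one ":markid SHA-1" pair per line, as described for --export-marks, and that marks equal source revision numbers; the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_marks(lines: Iterable[str]) -> Dict[int, str]:
    """Map each mark number to the object name recorded for it."""
    marks: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        mark, sha1 = line.split(" ", 1)
        if not mark.startswith(":"):
            raise ValueError(f"not a mark line: {line!r}")
        marks[int(mark[1:])] = sha1
    return marks


def missing_revisions(marks: Dict[int, str],
                      source_revisions: Iterable[int]) -> List[int]:
    """Source revisions for which no mark (and so no commit) was exported."""
    return sorted(set(source_revisions) - set(marks))
```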
1140
1141 Freely Skip Around Branches
1142 Don’t bother trying to optimize the frontend to stick to one branch at
1143 a time during an import. Although doing so might be slightly faster for
1144 fast-import, it tends to increase the complexity of the frontend code
1145 considerably.
1146
The branch LRU built into fast-import tends to behave very well, and
1148 the cost of activating an inactive branch is so low that bouncing
1149 around between branches has virtually no impact on import performance.
1150
1151 Handling Renames
When importing a renamed file or directory, simply delete the old
name(s) and modify the new name(s) during the corresponding commit. Git
performs rename detection after the fact, rather than explicitly during
a commit.
1156
1157 Use Tag Fixup Branches
Some other SCM systems let the user create a tag from multiple files
which are not from the same commit/changeset, or create tags which
are a subset of the files available in the repository.
1161
1162 Importing these tags as-is in Git is impossible without making at least
1163 one commit which “fixes up” the files to match the content of the tag.
1164 Use fast-import’s reset command to reset a dummy branch outside of your
1165 normal branch space to the base commit for the tag, then commit one or
1166 more file fixup commits, and finally tag the dummy branch.
1167
For example, since all normal branches are stored under refs/heads/,
name the tag fixup branch TAG_FIXUP. This way it is impossible for the
fixup branch used by the importer to have namespace conflicts with real
branches imported from the source (the name TAG_FIXUP is not
refs/heads/TAG_FIXUP).
1173
1174 When committing fixups, consider using merge to connect the commit(s)
1175 which are supplying file revisions to the fixup branch. Doing so will
1176 allow tools such as git blame to track through the real commit history
1177 and properly annotate the source files.
1178
After fast-import terminates, the frontend will need to run rm
.git/TAG_FIXUP to remove the dummy branch.
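
The reset, fixup commit, tag sequence could be generated as in the following Python sketch. tag_fixup_stream and its parameters are hypothetical; the sketch writes a single fixup commit with inline file data, assumes paths that need no quoting, and leaves out the optional merge lines.

```python
from typing import Mapping


def _data(raw: bytes) -> bytes:
    """Exact byte count 'data' command with the recommended trailing LF."""
    return b"data %d\n" % len(raw) + raw + b"\n"


def tag_fixup_stream(tag: str, base: str, who: str, when: str,
                     fixups: Mapping[str, bytes]) -> bytes:
    """Emit reset + one fixup commit + annotated tag via TAG_FIXUP.

    base is a <commit-ish> such as ':938'; who looks like 'Name <email>'.
    """
    ident = f"{who} {when}".encode("utf-8")
    out = [
        # Point the dummy branch at the commit the tag is based on.
        b"reset TAG_FIXUP\n",
        b"from " + base.encode("utf-8") + b"\n\n",
        # Commit the file contents that make the tree match the tag.
        b"commit TAG_FIXUP\n",
        b"committer " + ident + b"\n",
        _data(f"Fix up files for tag {tag}\n".encode("utf-8")),
    ]
    for path, content in sorted(fixups.items()):
        out.append(b"M 644 inline " + path.encode("utf-8") + b"\n")
        out.append(_data(content))
    out += [
        # Finally tag the tip of the dummy branch.
        b"tag " + tag.encode("utf-8") + b"\n",
        b"from TAG_FIXUP\n",
        b"tagger " + ident + b"\n",
        _data(f"Tag {tag}\n".encode("utf-8")),
    ]
    return b"".join(out)
```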
1181
1182 Import Now, Repack Later
As soon as fast-import completes, the Git repository is completely valid
1184 and ready for use. Typically this takes only a very short time, even
1185 for considerably large projects (100,000+ commits).
1186
However, repacking the repository is necessary to improve data locality
and access performance. It can also take hours on extremely large
projects (especially if -f and a large --window parameter are used).
1190 Since repacking is safe to run alongside readers and writers, run the
1191 repack in the background and let it finish when it finishes. There is
1192 no reason to wait to explore your new Git project!
1193
1194 If you choose to wait for the repack, don’t try to run benchmarks or
1195 performance tests until repacking is completed. fast-import outputs
1196 suboptimal packfiles that are simply never seen in real use situations.
1197
1198 Repacking Historical Data
1199 If you are repacking very old imported data (e.g. older than the last
1200 year), consider expending some extra CPU time and supplying --window=50
1201 (or higher) when you run git repack. This will take longer, but will
1202 also produce a smaller packfile. You only need to expend the effort
1203 once, and everyone using your project will benefit from the smaller
1204 repository.
1205
1206 Include Some Progress Messages
1207 Every once in a while have your frontend emit a progress message to
1208 fast-import. The contents of the messages are entirely free-form, so
1209 one suggestion would be to output the current month and year each time
1210 the current commit date moves into the next month. Your users will feel
1211 better knowing how much of the data stream has been processed.
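
The month-by-month suggestion could be implemented along the lines of this Python sketch. progress_by_month is a hypothetical name, commit times are assumed to be seconds since the epoch, and months are computed in UTC for simplicity.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator


def progress_by_month(commit_times: Iterable[int]) -> Iterator[bytes]:
    """Yield a progress command whenever the commit date changes month."""
    last_month = None
    for timestamp in commit_times:
        date = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        month = (date.year, date.month)
        if month != last_month:
            yield f"progress {date.year:04d}-{date.month:02d}\n".encode("ascii")
            last_month = month
```

A real frontend would interleave these lines with the commit commands they describe rather than produce them separately.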
1212
1214 When packing a blob fast-import always attempts to deltify against the
1215 last blob written. Unless specifically arranged for by the frontend,
1216 this will probably not be a prior version of the same file, so the
1217 generated delta will not be the smallest possible. The resulting
1218 packfile will be compressed, but will not be optimal.
1219
1220 Frontends which have efficient access to all revisions of a single file
1221 (for example reading an RCS/CVS ,v file) can choose to supply all
1222 revisions of that file as a sequence of consecutive blob commands. This
1223 allows fast-import to deltify the different file revisions against each
1224 other, saving space in the final packfile. Marks can be used to later
1225 identify individual file revisions during a sequence of commit
1226 commands.
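
One way to arrange for consecutive revisions of the same file is to group the blobs by path before emitting them, as in the Python sketch below. blob_stream_by_file is a hypothetical helper; it assumes all revisions fit in memory and that the order of revisions within a path is the order in which they should be written.

```python
from typing import Dict, Iterable, List, Tuple


def blob_stream_by_file(
    revisions: Iterable[Tuple[str, bytes]],
) -> Tuple[bytes, Dict[Tuple[str, int], int]]:
    """Emit all revisions of each file back to back, one mark per blob.

    Returns the stream and a map from (path, revision index) to mark, so
    that later commit commands can refer to the blobs.
    """
    by_path: Dict[str, List[bytes]] = {}
    for path, content in revisions:
        by_path.setdefault(path, []).append(content)

    stream: List[bytes] = []
    marks: Dict[Tuple[str, int], int] = {}
    next_mark = 1
    for path, contents in by_path.items():
        for index, content in enumerate(contents):
            stream.append(b"blob\nmark :%d\ndata %d\n" % (next_mark, len(content)))
            stream.append(content + b"\n")
            marks[(path, index)] = next_mark
            next_mark += 1
    return b"".join(stream), marks
```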
1227
1228 The packfile(s) created by fast-import do not encourage good disk
1229 access patterns. This is caused by fast-import writing the data in the
1230 order it is received on standard input, while Git typically organizes
1231 data within packfiles to make the most recent (current tip) data appear
1232 before historical data. Git also clusters commits together, speeding up
1233 revision traversal through better cache locality.
1234
1235 For this reason it is strongly recommended that users repack the
1236 repository with git repack -a -d after fast-import completes, allowing
1237 Git to reorganize the packfiles for faster data access. If blob deltas
1238 are suboptimal (see above) then also adding the -f option to force
1239 recomputation of all deltas can significantly reduce the final packfile
1240 size (30-50% smaller can be quite typical).
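
A wrapper that runs the recommended repack after an import might build its command line as in this Python sketch. repack_command is a hypothetical helper; it only assembles the argument list from the options mentioned in this manual and leaves running it to the caller.

```python
from typing import List, Optional


def repack_command(recompute_deltas: bool = False,
                   window: Optional[int] = None) -> List[str]:
    """Build a 'git repack -a -d' command line with optional tuning."""
    command = ["git", "repack", "-a", "-d"]
    if recompute_deltas:
        command.append("-f")  # force recomputation of all deltas
    if window is not None:
        command.append(f"--window={window}")
    return command
```

The resulting list can be handed to subprocess.run(..., check=True) from within the imported repository.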
1241
1243 There are a number of factors which affect how much memory fast-import
1244 requires to perform an import. Like critical sections of core Git,
1245 fast-import uses its own memory allocators to amortize any overheads
1246 associated with malloc. In practice fast-import tends to amortize any
1247 malloc overheads to 0, due to its use of large block allocations.
1248
1249 per object
1250 fast-import maintains an in-memory structure for every object written
1251 in this execution. On a 32 bit system the structure is 32 bytes, on a
1252 64 bit system the structure is 40 bytes (due to the larger pointer
1253 sizes). Objects in the table are not deallocated until fast-import
1254 terminates. Importing 2 million objects on a 32 bit system will require
1255 approximately 64 MiB of memory.
1256
1257 The object table is actually a hashtable keyed on the object name (the
1258 unique SHA-1). This storage configuration allows fast-import to reuse
1259 an existing or already written object and avoid writing duplicates to
1260 the output packfile. Duplicate blobs are surprisingly common in an
1261 import, typically due to branch merges in the source.
1262
1263 per mark
1264 Marks are stored in a sparse array, using 1 pointer (4 bytes or 8
1265 bytes, depending on pointer size) per mark. Although the array is
1266 sparse, frontends are still strongly encouraged to use marks between 1
1267 and n, where n is the total number of marks required for this import.
1268
1269 per branch
1270 Branches are classified as active and inactive. The memory usage of the
1271 two classes is significantly different.
1272
1273 Inactive branches are stored in a structure which uses 96 or 120 bytes
1274 (32 bit or 64 bit systems, respectively), plus the length of the branch
1275 name (typically under 200 bytes), per branch. fast-import will easily
1276 handle as many as 10,000 inactive branches in under 2 MiB of memory.
1277
1278 Active branches have the same overhead as inactive branches, but also
1279 contain copies of every tree that has been recently modified on that
branch. For example, if the subtree include has not been modified since
the branch became active, its contents will not be loaded into memory,
but if the subtree src has been modified by a commit since the branch
became active, then its contents will be loaded into memory.
1284
1285 As active branches store metadata about the files contained on that
1286 branch, their in-memory storage size can grow to a considerable size
1287 (see below).
1288
1289 fast-import automatically moves active branches to inactive status
1290 based on a simple least-recently-used algorithm. The LRU chain is
1291 updated on each commit command. The maximum number of active branches
1292 can be increased or decreased on the command line with
1293 --active-branches=.
1294
1295 per active tree
1296 Trees (aka directories) use just 12 bytes of memory on top of the
memory required for their entries (see “per active file entry” below). The
1298 cost of a tree is virtually 0, as its overhead amortizes out over the
1299 individual file entries.
1300
1301 per active file entry
1302 Files (and pointers to subtrees) within active trees require 52 or 64
1303 bytes (32/64 bit platforms) per entry. To conserve space, file and tree
1304 names are pooled in a common string table, allowing the filename
1305 “Makefile” to use just 16 bytes (after including the string header
1306 overhead) no matter how many times it occurs within the project.
1307
1308 The active branch LRU, when coupled with the filename string pool and
1309 lazy loading of subtrees, allows fast-import to efficiently import
1310 projects with 2,000+ branches and 45,114+ files in a very limited
1311 memory footprint (less than 2.7 MiB per active branch).
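
The per-object, per-mark and per-inactive-branch figures above can be combined into a rough estimate, as in the Python sketch below. estimate_table_bytes is a hypothetical helper; it ignores active branches, trees and file entries, and it assumes marks are numbered densely from 1.

```python
from typing import Dict


def estimate_table_bytes(objects: int, marks: int, inactive_branches: int,
                         branch_name_len: int,
                         pointer_bits: int = 64) -> Dict[str, int]:
    """Rough memory estimate in bytes, using the sizes quoted above."""
    if pointer_bits not in (32, 64):
        raise ValueError("pointer_bits must be 32 or 64")
    is_32 = pointer_bits == 32
    per_object = 32 if is_32 else 40   # object table entry
    per_mark = 4 if is_32 else 8       # one pointer per mark
    per_branch = 96 if is_32 else 120  # inactive branch, without its name
    return {
        "objects": objects * per_object,
        "marks": marks * per_mark,
        "inactive_branches": inactive_branches * (per_branch + branch_name_len),
    }
```

For 2 million objects on a 32 bit system this gives 64,000,000 bytes for the object table (about 61 MiB), in line with the rough figure quoted under “per object”.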
1312
1314 Sending SIGUSR1 to the git fast-import process ends the current
1315 packfile early, simulating a checkpoint command. The impatient operator
1316 can use this facility to peek at the objects and refs from an import in
1317 progress, at the cost of some added running time and worse compression.
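
From a Python wrapper the signal could be sent as sketched below. request_checkpoint is a hypothetical helper; pid is assumed to be the process ID of a running git fast-import, and the kill parameter exists only so that the function can be exercised without a real process.

```python
import os
import signal
from typing import Callable


def request_checkpoint(pid: int,
                       kill: Callable[[int, int], None] = os.kill) -> None:
    """Send SIGUSR1 to a running fast-import (hypothetical helper)."""
    kill(pid, signal.SIGUSR1)
```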
1318
1320 git-fast-export(1)
1321
1323 Part of the git(1) suite
1324
1325
1326
1327Git 2.20.1 12/15/2018 GIT-FAST-IMPORT(1)