REPOSURGEON(1) Development Tools REPOSURGEON(1)

NAME
reposurgeon - surgical operations on repositories

SYNOPSIS
reposurgeon [command...]

DESCRIPTION
12 The purpose of reposurgeon is to enable risky operations that VCSes
13 (version-control systems) don't want to let you do, such as (a) editing
14 past comments and metadata, (b) excising commits, (c) coalescing and
15 splitting commits, (d) removing files and subtrees from repo history,
16 (e) merging or grafting two or more repos, and (f) cutting a repo in
17 two by cutting a parent-child link, preserving the branch structure of
18 both child repos.
19
20 A major use of reposurgeon is to assist a human operator to perform
21 higher-quality conversions among version control systems than can be
22 achieved with fully automated converters.
23
24 The original motivation for reposurgeon was to clean up artifacts
25 created by repository conversions. It was foreseen that the tool would
26 also have applications when code needs to be removed from repositories
27 for legal or policy reasons.
28
29 To keep reposurgeon simple and flexible, it normally does not do its
30 own repository reading and writing. Instead, it relies on being able to
31 parse and emit the command streams created by git-fast-export and read
32 by git-fast-import. This means that it can be used on any
33 version-control system that has both fast-export and fast-import
34 utilities. The git-import stream format also implicitly defines a
35 common language of primitive operations for reposurgeon to speak.
36
Fully supported systems (those for which reposurgeon can both read and
write repositories) include git, hg, bzr, svn, darcs, bk, RCS, and SRC.
For a complete list, with dependencies and technical notes, type prefer
at the reposurgeon prompt.
41
Writing to the file-oriented systems RCS and SRC is done via
rcs-fast-import(1) and has some serious limitations because those
systems cannot represent all the metadata in a git-fast-export stream.
Consult that tool's documentation for details and partial workarounds.
46
47 Writing Subversion repositories also has some significant limitations,
48 discussed in the section on Working With Subversion.
49
Fossil repository files can be read in using the --format=fossil option
of the read command and written out with the --format=fossil option of
the write command. Ignore patterns are not translated in either
direction.
53
CVS is supported for read only, not write. For CVS, reposurgeon must be
run from within a repository directory (one with a CVSROOT
subdirectory). Each module becomes a subdirectory in the reposurgeon
representation of the change history.
58
59 In order to deal with version-control systems that do not have
60 fast-export equivalents, reposurgeon can also host extractor code that
61 reads repositories directly. For each version-control system supported
62 through an extractor, reposurgeon uses a small amount of knowledge
63 about the system's command-line tools to (in effect) replay repository
64 history into an input stream internally. Repositories under systems
65 supported through extractors can be read by reposurgeon, but not
66 modified by it. In particular, reposurgeon can be used to move a
67 repository history from any VCS supported by an extractor to any VCS
68 supported by a normal importer/exporter pair.
69
70 Mercurial repository reading is implemented with an extractor class;
71 writing is handled with the stock "hg fastimport" command. A test
72 extractor exists for git, but is normally disabled in favor of the
73 regular exporter.
74
75 For guidance on the pragmatics of repository conversion, see the DVCS
76 Migration HOWTO[1].
77
79 reposurgeon is a sharp enough tool to cut you. It takes care not to
80 ever write a repository in an actually inconsistent state, and will
81 terminate with an error message rather than proceed when its internal
82 data structures are confused. However, there are lots of things you can
83 do with it - like altering stored commit timestamps so they no longer
84 match the commit sequence - that are likely to cause havoc after you're
85 done. Proceed with caution and check your work.
86
87 Also note that, if your DVCS does the usual thing of making commit IDs
88 a cryptographic hash of content and parent links, editing a
89 publicly-accessible repository with this tool would be a bad idea. All
90 of the surgical operations in reposurgeon will modify the hash chains.
91
92 Please also see the notes on system-specific issues under the section
93 called “LIMITATIONS AND GUARANTEES”.
94
96 The program can be run in one of two modes, either as an interactive
97 command interpreter or in batch mode to execute commands given as
98 arguments on the reposurgeon invocation line. The only differences
99 between these modes are (1) the interactive one begins by turning on
100 the 'verbose 1' option, (2) in batch mode all errors (including
101 normally recoverable errors in selection-set syntax) are fatal, and (3)
102 each command-line argument beginning with “--” has that stripped off
103 (which, in particular means that --help and --version will work as
104 expected). Also, in interactive mode, Ctrl-P and Ctrl-N will be
105 available to scroll through your command history and tab completion of
106 both command keywords and name arguments (wherever that makes semantic
107 sense) is available.
108
109 A git-fast-import stream consists of a sequence of commands which must
110 be executed in the specified sequence to build the repo; to avoid
111 confusion with reposurgeon commands we will refer to the stream
112 commands as events in this documentation. These events are implicitly
113 numbered from 1 upwards. Most commands require specifying a selection
114 of event sequence numbers so reposurgeon will know which events to
115 modify or delete.
116
117 For all the details of event types and semantics, see the git-fast-
118 import(1) manual page; the rest of this paragraph is a quick start for
119 the impatient. Most events in a stream are commits describing revision
120 states of the repository; these group together under a single change
121 comment one or more fileops (file operations), which usually point to
122 blobs that are revision states of individual files. A fileop may also
123 be a delete operation indicating that a specified previously-existing
124 file was deleted as part of the version commit; there are a couple of
125 other special fileop types of lesser importance.
126
127 Commands to reposurgeon consist of a command keyword, sometimes
128 preceded by a selection set, sometimes followed by whitespace-separated
129 arguments. It is often possible to omit the selection-set argument and
130 have it default to something reasonable.
131
132 Here are some motivating examples. The commands will be explained in
133 more detail after the description of selection syntax.
134
135 :15 edit ;; edit the object associated with mark :15
136
137 edit ;; edit all editable objects
138
139 29..71 list ;; list summary index of events 29..71
140
141 236..$ list ;; List events from 236 to the last
142
143 <#523> inspect ;; Look for commit #523; they are numbered
144 ;; 1-origin from the beginning of the repository.
145
146 <2317> inspect ;; Look for a tag with the name 2317, a tip commit
147 ;; of a branch named 2317, or a commit with legacy ID
148 ;; 2317. Inspect what is found. A plain number is
149 ;; probably a legacy ID inherited from a Subversion
150 ;; revision number.
151
152 /regression/ list ;; list all commits and tags with comments or
153 ;; committer headers or author headers containing
154 ;; the string "regression"
155
156 1..:97 & =T delete ;; delete tags from event 1 to mark 97
157
158 [Makefile] inspect ;; Inspect all commits with a file op touching Makefile
159 ;; and all blobs referred to in a fileop
160 ;; touching Makefile.
161
162 :46 tip ;; Display the branch tip that owns commit :46.
163
164 @dsc(:55) list ;; Display all commits with ancestry tracing to :55
165
166 @min([.gitignore]) remove .gitignore delete
167 ;; Remove the first .gitignore fileop in the repo.
168
169 SELECTION SYNTAX
A selection set is ordered; that is, any given element may occur only
once, and the set is ordered by when its members were first added.
172
173 The selection-set specification syntax is an expression-oriented
174 minilanguage. The most basic term in this language is a location. The
175 following sorts of primitive locations are supported:
176
177 event numbers
178 A plain numeric literal is interpreted as a 1-origin event-sequence
179 number.
180
181 marks
182 A numeric literal preceded by a colon is interpreted as a mark; see
183 the import stream format documentation for explanation of the
184 semantics of marks.
185
186 tag and branch names
187 The basename of a branch (including branches in the refs/tags
188 namespace) refers to its tip commit. The name of a tag is
189 equivalent to its mark (that of the tag itself, not the commit it
190 refers to). Tag and branch locations are bracketed with < > (angle
191 brackets) to distinguish them from command keywords.
192
193 legacy IDs
If the contents of name brackets (< >) do not match a tag or
branch name, the interpreter next searches legacy IDs of commits.
This is especially useful when you have imported a Subversion dump;
it means that commits made from it can be referred to by their
corresponding Subversion revision numbers.
199
200 commit numbers
201 A numeric literal within name brackets (< >) preceded by # is
202 interpreted as a 1-origin commit-sequence number.
203
204 reset@ names
205 A name with the prefix 'reset@' refers to the latest reset with a
206 basename matching the part after the @. Usually there is only one
207 such reset.
208
209 $
210 Refers to the last event.
211
212 These may be grouped into sets in the following ways:
213
214 ranges
215 A range is two locations separated by "..", and is the set of
216 events beginning at the left-hand location and ending at the
217 right-hand location (inclusive).
218
219 lists
220 Comma-separated lists of locations and ranges are accepted, with
221 the obvious meaning.
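
As an illustration of how ranges and lists combine, here is a minimal
Python sketch. It is not reposurgeon's own parser: the function name
expand_selection and its last_event parameter are hypothetical, and only
plain event numbers, "$", and ".." ranges are handled.

```python
def expand_selection(spec, last_event):
    """Expand a comma-separated list of event numbers and ranges.

    Hypothetical helper: marks, names, and other location types
    are deliberately not handled.
    """
    def location(token):
        token = token.strip()
        # "$" refers to the last event
        return last_event if token == "$" else int(token)

    selected = []
    for term in spec.split(","):
        if ".." in term:
            left, right = term.split("..", 1)
            # ranges are inclusive at both ends
            members = range(location(left), location(right) + 1)
        else:
            members = [location(term)]
        for event in members:
            if event not in selected:  # an element may occur only once
                selected.append(event)
    return selected


print(expand_selection("29..31,236..$,30", last_event=238))
# [29, 30, 31, 236, 237, 238]
```

The repeated 30 is dropped because it is already a member, and the
result keeps the order in which members were first added.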
222
223 There are some other ways to construct event sets:
224
225 visibility sets
226 A visibility set is an expression specifying a set of event types.
227 It will consist of a leading equal sign, followed by type letters.
228 These are the type letters:
229
┌──┬─────────────────────┬───────────────────────┐
│B │ blobs               │ Most default          │
│  │                     │ selection sets        │
│  │                     │ exclude blobs; they   │
│  │                     │ have to be            │
│  │                     │ manipulated through   │
│  │                     │ the commits they      │
│  │                     │ are attached to.      │
├──┼─────────────────────┼───────────────────────┤
│C │ commits             │                       │
├──┼─────────────────────┼───────────────────────┤
│D │ all-delete commits  │ These are artifacts   │
│  │                     │ produced by some      │
│  │                     │ older                 │
│  │                     │ repository-conversion │
│  │                     │ tools.                │
├──┼─────────────────────┼───────────────────────┤
│H │ head (branch tip)   │                       │
│  │ commits             │                       │
├──┼─────────────────────┼───────────────────────┤
│O │ orphaned            │                       │
│  │ (parentless)        │                       │
│  │ commits             │                       │
├──┼─────────────────────┼───────────────────────┤
│U │ commits with        │                       │
│  │ callouts as parents │                       │
├──┼─────────────────────┼───────────────────────┤
│Z │ commits with no     │                       │
│  │ fileops             │                       │
├──┼─────────────────────┼───────────────────────┤
│M │ merge               │                       │
│  │ (multi-parent)      │                       │
│  │ commits             │                       │
├──┼─────────────────────┼───────────────────────┤
│F │ fork (multi-child)  │                       │
│  │ commits             │                       │
├──┼─────────────────────┼───────────────────────┤
│L │ commits with        │                       │
│  │ unclean multi-line  │                       │
│  │ comments (without a │                       │
│  │ separating empty    │                       │
│  │ line after the      │                       │
│  │ first)              │                       │
├──┼─────────────────────┼───────────────────────┤
│I │ commits for which   │                       │
│  │ metadata cannot be  │                       │
│  │ decoded to UTF-8    │                       │
├──┼─────────────────────┼───────────────────────┤
│T │ tags                │                       │
├──┼─────────────────────┼───────────────────────┤
│R │ resets              │                       │
├──┼─────────────────────┼───────────────────────┤
│P │ Passthrough         │ All event types       │
│  │                     │ simply passed         │
│  │                     │ through, including    │
│  │                     │ comments, progress    │
│  │                     │ commands, and         │
│  │                     │ checkpoint commands.  │
├──┼─────────────────────┼───────────────────────┤
│N │ Legacy IDs          │ Any string matching a │
│  │                     │ cookie (legacy-ID)    │
│  │                     │ format.               │
└──┴─────────────────────┴───────────────────────┘
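
To make the idea of filtering by type letter concrete, here is a small
Python sketch. It is only a model under stated assumptions: the event
list is invented, only the letters B, C, T and R are covered, and it
assumes that several letters select the union of their types.

```python
# Hypothetical event records: (event number, event type).
events = [(1, "blob"), (2, "commit"), (3, "tag"), (4, "commit"), (5, "reset")]

# Only four of the type letters are modelled in this sketch.
TYPE_LETTERS = {"B": "blob", "C": "commit", "T": "tag", "R": "reset"}


def visibility(letters):
    """Return event numbers whose type matches any of the given letters."""
    wanted = {TYPE_LETTERS[letter] for letter in letters}
    return [number for number, kind in events if kind in wanted]


def in_range(first, last):
    """Return event numbers in the inclusive range first..last."""
    return [number for number, _ in events if first <= number <= last]


# Rough analogue of "1..4 & =T": the tags among events 1 through 4.
tags_in_range = [n for n in in_range(1, 4) if n in visibility("T")]
print(tags_in_range)      # [3]
print(visibility("TR"))   # [3, 5]
```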
293
294 references
295 A reference name (bracketed by angle brackets) resolves to a single
296 object, either a commit or tag.
297
┌──────────────┬────────────────────────────┐
│ type         │ interpretation             │
├──────────────┼────────────────────────────┤
│ tag name     │ annotated tag with that    │
│              │ name                       │
├──────────────┼────────────────────────────┤
│ branch name  │ the branch tip commit      │
├──────────────┼────────────────────────────┤
│ legacy ID    │ commit with that legacy ID │
├──────────────┼────────────────────────────┤
│assigned name │ name equated to a          │
│              │ selection by assign        │
└──────────────┴────────────────────────────┘
311 Note that if an annotated tag and a branch have the same name foo,
312 <foo> will resolve to the tag rather than the branch tip commit.
313
314 dates and action stamps
315 A date or action stamp in angle brackets resolves to a selection
316 set of all matching commits.
317
┌───────────────────────────┬───────────────────────────┐
│ type                      │ interpretation            │
├───────────────────────────┼───────────────────────────┤
│ RFC3339 timestamp         │ commit or tag with that   │
│                           │ time/date                 │
├───────────────────────────┼───────────────────────────┤
│ action stamp              │ commits or tags with that │
│ (timestamp!email)         │ timestamp and author (or  │
│                           │ committer if no author).  │
├───────────────────────────┼───────────────────────────┤
│yyyy-mm-dd part of RFC3339 │ all commits and tags with │
│timestamp                  │ that date                 │
└───────────────────────────┴───────────────────────────┘
331 To refine the match to a single commit, use a 1-origin index suffix
332 separated by '#'. Thus "<2000-02-06T09:35:10Z>" can match multiple
333 commits, but "<2000-02-06T09:35:10Z#2>" matches only the second in
334 the set.
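
The effect of the '#' index suffix can be sketched in a few lines of
Python. This is an illustration only: resolve_date_reference and the
list of (event number, timestamp) pairs are hypothetical, and the
sketch handles exact timestamp matches, not action stamps or
date-only references.

```python
def resolve_date_reference(reference, commits):
    """Resolve the text found inside < > for a timestamp reference.

    commits is a hypothetical list of (event number, timestamp string)
    pairs in event order.
    """
    stamp, separator, index = reference.partition("#")
    matches = [event for event, when in commits if when == stamp]
    if not separator:
        return matches                   # no suffix: every match
    return [matches[int(index) - 1]]     # the suffix is 1-origin


commits = [
    (4, "2000-02-06T09:35:10Z"),
    (9, "2000-02-06T09:35:10Z"),
    (12, "2001-01-01T00:00:00Z"),
]
print(resolve_date_reference("2000-02-06T09:35:10Z", commits))    # [4, 9]
print(resolve_date_reference("2000-02-06T09:35:10Z#2", commits))  # [9]
```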
335
336 text search
337 A text search expression is a Python regular expression surrounded
338 by forward slashes (to embed a forward slash in it, use a Python
339 string escape such as \x2f).
340
341 A text search normally matches against the comment fields of
342 commits and annotated tags, or against their author/committer
343 names, or against the names of tags; also the text of passthrough
344 objects.
345
346 The scope of a text search can be changed with qualifier letters
347 after the trailing slash. These are as follows:
348
┌───────┬───────────────────────────┐
│letter │ interpretation            │
├───────┼───────────────────────────┤
│ a     │ author name in commit     │
├───────┼───────────────────────────┤
│ b     │ branch name in commit;    │
│       │ also matches blobs        │
│       │ referenced by commits on  │
│       │ matching branches, and    │
│       │ tags which point to       │
│       │ commits on matching       │
│       │ branches.                 │
├───────┼───────────────────────────┤
│ c     │ comment text of commit or │
│       │ tag                       │
├───────┼───────────────────────────┤
│ r     │ committish reference in   │
│       │ tag or reset              │
├───────┼───────────────────────────┤
│ p     │ text in passthrough       │
├───────┼───────────────────────────┤
│ t     │ tagger in tag             │
├───────┼───────────────────────────┤
│ n     │ name of tag               │
├───────┼───────────────────────────┤
│ B     │ blob content              │
└───────┴───────────────────────────┘
376 Multiple qualifier letters can add more search scopes.
377
378 (The “b” qualifier replaces the branchset syntax in earlier
379 versions of reposurgeon.)
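
Because a text search is delimited by forward slashes, a literal slash
inside the expression has to be written as an escape. The Python
sketch below shows that the \x2f escape mentioned above matches a
forward slash when the pattern is compiled with the standard re module.
The comment strings are invented for the example.

```python
import re

# A pattern as it might appear between the slashes of a text search;
# \x2f stands in for a literal forward slash.
pattern = re.compile(r"src\x2fmain")

comments = ["Move src/main.c into place", "Fix regression in parser"]
matching = [text for text in comments if pattern.search(text)]
print(matching)   # ['Move src/main.c into place']
```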
380
381 paths
A "path expression" enclosed in square brackets resolves to the set
of all commits and blobs related to a path matching the given
expression. The path expression itself is either a path literal or
a regular expression surrounded by slashes. Immediately after the
trailing / of a path regexp you can put any number of the following
characters which act as flags: 'a', 'c', 'D', 'M', 'R', 'C', 'N'.
388
389 By default, a path is related to a commit if the latter has a
390 fileop that touches that file path - modifies that change it,
391 deletes that remove it, renames and copies that have it as a source
392 or target. When the 'c' flag is in use the meaning changes: the
393 paths related to a commit become all paths that would be present in
394 a checkout for that commit.
395
396 A path literal matches a commit if and only if the path literal is
397 exactly one of the paths related to the commit (no prefix or suffix
398 operation is done). In particular a path literal won't match if it
399 corresponds to a directory in the chosen repository.
400
401 A regular expression matches a commit if it matches any path
402 related to the commit anywhere in the path. You can use '^' or '$'
403 if you want the expression to only match at the beginning or end of
404 paths. When the 'a' flag is in use, the path expression selects
405 commits whose every path matches the regular expression. This is
406 not always a subset of commits selected without the 'a' flag
407 because it also selects commits with no related paths (e.g. empty
408 commits, deletealls and commits with empty trees). If you want to
409 avoid those, you can use e.g. '[/regex/] & [/regex/a]'.
410
The flags 'D', 'M', 'R', 'C', 'N' restrict match checking to the
corresponding fileop types. Note that this means an 'a' match is
easier (not harder) to achieve. These are no-ops when used with
'c'.
415
A path literal or regular expression matches a blob if it matches
any path that appeared in a modification fileop that referred to
that blob. To select purely matching blobs or matching commits,
compose a path expression with =B or =C.
420
421 If you need to embed '[^/]' into your regular expression (e.g. to
422 express "all characters but a slash") you can use a Python string
423 escape such as \x2f.
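
The difference between the default path match and the 'a' flag can be
modelled with Python's any() and all(). This is only a sketch: the
commit names and path lists are invented, and real matching works on
fileops rather than on plain lists of strings.

```python
import re


def matches_any(regex, paths):
    """Default behaviour: some related path matches."""
    return any(re.search(regex, path) for path in paths)


def matches_all(regex, paths):
    """'a' flag behaviour: every related path matches.

    all() is True for an empty list, which is why commits with no
    related paths are selected too.
    """
    return all(re.search(regex, path) for path in paths)


# Hypothetical commits and the paths related to each.
commits = {
    "docs-only": ["doc/intro.txt", "doc/faq.txt"],
    "mixed": ["doc/intro.txt", "src/main.c"],
    "empty": [],
}
regex = r"^doc/"

any_set = {name for name, paths in commits.items() if matches_any(regex, paths)}
all_set = {name for name, paths in commits.items() if matches_all(regex, paths)}

print(sorted(any_set))            # ['docs-only', 'mixed']
print(sorted(all_set))            # ['docs-only', 'empty']
print(sorted(any_set & all_set))  # ['docs-only']
```

The last line corresponds to the '[/regex/] & [/regex/a]' idiom: the
intersection drops the commit with no related paths.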
424
425 function calls
426 The expression language has named special functions. The sequence
427 for a named function is “@” followed by a function name, followed
428 by an argument in parentheses. Presently the following functions
429 are defined:
430
┌─────┬────────────────────────────┐
│name │ interpretation             │
├─────┼────────────────────────────┤
│min  │ minimum member of a        │
│     │ selection set              │
├─────┼────────────────────────────┤
│max  │ maximum member of a        │
│     │ selection set              │
├─────┼────────────────────────────┤
│amp  │ nonempty selection set     │
│     │ becomes all objects, empty │
│     │ set is returned empty      │
├─────┼────────────────────────────┤
│par  │ all parents of commits in  │
│     │ the argument set           │
├─────┼────────────────────────────┤
│chn  │ all children of commits in │
│     │ the argument set           │
├─────┼────────────────────────────┤
│dsc  │ all commits descended from │
│     │ the argument set (argument │
│     │ set included)              │
├─────┼────────────────────────────┤
│anc  │ all commits from which     │
│     │ the argument set is        │
│     │ descended (argument set    │
│     │ included)                  │
├─────┼────────────────────────────┤
│pre  │ events before the argument │
│     │ set; empty if the argument │
│     │ set includes the first     │
│     │ event.                     │
├─────┼────────────────────────────┤
│suc  │ events after the argument  │
│     │ set; empty if the argument │
│     │ set includes the last      │
│     │ event.                     │
├─────┼────────────────────────────┤
│srt  │ sort the argument set by   │
│     │ event number.              │
└─────┴────────────────────────────┘
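
A rough Python model of the pre and suc functions follows. It rests on
an assumption that "before" and "after" are measured from the lowest
and highest member of the argument set. The function names mirror the
table; the event numbers are invented.

```python
def pre(selection, all_events):
    """Events before the lowest member of the selection (assumed reading)."""
    first = min(selection)
    return [event for event in all_events if event < first]


def suc(selection, all_events):
    """Events after the highest member of the selection (assumed reading)."""
    last = max(selection)
    return [event for event in all_events if event > last]


all_events = list(range(1, 11))   # a repository with ten events
selection = [4, 6, 7]

print(pre(selection, all_events))   # [1, 2, 3]
print(suc(selection, all_events))   # [8, 9, 10]
print(pre([1, 5], all_events))      # [] because the set includes event 1
```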
472
473 Set expressions may be combined with the operators | and &; these are,
474 respectively, set union and intersection. The | has lower precedence
475 than intersection, but you may use parentheses '(' and ')' to group
476 expressions in case there is ambiguity (this replaces the curly
477 brackets used in older versions of the syntax).
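
Python's built-in sets happen to use the same two operators with the
same relative precedence, so they make a convenient model. The sketch
ignores the ordering of selection sets, and the event numbers are
invented.

```python
tags = {3, 7, 12}
early = set(range(1, 10))   # events 1 through 9
merges = {7, 40}

# & binds tighter than |, so this is merges | (early & tags).
print(sorted(merges | early & tags))     # [3, 7, 40]

# Parentheses force the union to happen first.
print(sorted((merges | early) & tags))   # [3, 7]
```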
478
479 Any set operation may be followed by '?' to add the set members'
480 neighbors and referents. This extends the set to include the parents
481 and children of all commits in the set, and the referents of any tags
482 and resets in the set. Each blob reference in the set is replaced by
483 all commit events that refer to it. The '?' can be repeated to extend
484 the neighborhood depth. The result of a '?' extension is sorted so the
485 result is in ascending order.
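
One step of the '?' extension can be sketched in Python for the
commit-only case. The parents mapping and the function name are
hypothetical, and tags, resets and blobs are left out of the model.

```python
def extend_neighbors(selection, parents):
    """Add the parents and children of every commit in the selection.

    parents maps a commit's event number to the list of its parents'
    event numbers (a hypothetical structure for this sketch).
    """
    children = {}
    for child, parent_list in parents.items():
        for parent in parent_list:
            children.setdefault(parent, []).append(child)

    extended = set(selection)
    for event in selection:
        extended.update(parents.get(event, []))
        extended.update(children.get(event, []))
    return sorted(extended)   # the result is in ascending order


# 2 <- 4 <- {6, 8} <- 10 (10 is a merge of 6 and 8)
parents = {2: [], 4: [2], 6: [4], 8: [4], 10: [6, 8]}

once = extend_neighbors([4], parents)
twice = extend_neighbors(once, parents)
print(once)    # [2, 4, 6, 8]
print(twice)   # [2, 4, 6, 8, 10]
```

Applying the function twice models a repeated '?', which reaches one
step further into the graph.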
486
487 Do set negation with prefix ~; it has higher precedence than & and |
488 but lower than ?
489
490 IMPORT AND EXPORT
491 reposurgeon can hold multiple repository states in core. Each has a
492 name. At any given time, one may be selected for editing. Commands in
493 this group import repositories, export them, and manipulate the in-core
494 list and the selection.
495
496 read [--format=fossil] [directory|-|<infile]
With a directory-name argument, this command attempts to read in
the contents of a repository in any supported version-control
system under that directory; read with no arguments does this in
the current directory. If input is redirected from a plain file, it
will be read in as a fast-import stream or Subversion dumpfile.
With an argument of “-”, this command reads a fast-import stream or
Subversion dumpfile from standard input (this will be useful in
filters constructed with command-line arguments).
505
506 If the contents is a fast-import stream, any "cvs-revision"
507 property on a commit is taken to be a newline-separated list of CVS
508 revision cookies pointing to the commit, and used for reference
509 lifting.
510
511 If the contents is a fast-import stream, any "legacy-id" property
512 on a commit is taken to be a legacy ID token pointing to the
513 commit, and used for reference-lifting.
514
515 If the read location is a git repository and contains a
516 .git/cvsauthors file (such as is left in place by git cvsimport -A)
517 that file will be read in as if it had been given to the authors
518 read command.
519
520 If the read location is a directory, and its repository
521 subdirectory has a file named legacy-map, that file will be read as
522 though passed to a legacy read command.
523
If the read location is a file and the --format=fossil option is
used, the file is interpreted as a Fossil repository.

The --preserve option is interpreted in a way dependent on the type
of the incoming repository or stream. Presently it only affects the
processing of Subversion repositories; see the section called
“WORKING WITH SUBVERSION” for details.
531
532 The just-read-in repo is added to the list of loaded repositories
533 and becomes the current one, selected for surgery. If it was read
534 from a plain file and the file name ends with one of the extensions
535 .fi or .svn, that extension is removed from the load list name.
536
537 Note: this command does not take a selection set.
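
The rule for deriving the load-list name from a plain file can be
sketched as follows. The helper name load_list_name is hypothetical,
and the sketch covers only the extension rule, not the disambiguating
suffixes or reads from standard input.

```python
import os


def load_list_name(path):
    """Basename of the source, minus a trailing .fi or .svn extension."""
    base = os.path.basename(path)
    root, extension = os.path.splitext(base)
    return root if extension in (".fi", ".svn") else base


print(load_list_name("dumps/project.fi"))    # project
print(load_list_name("dumps/project.svn"))   # project
print(load_list_name("dumps/project.txt"))   # project.txt
```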
538
539 write [--legacy] [--format=fossil] [--noincremental] [--callout]
540 [>outfile|-]
541 Dump selected events as a fast-import stream representing the
542 edited repository; the default selection set is all events. Where
543 to dump to is standard output if there is no argument or the
544 argument is '-', or the target of an output redirect.
545
546 Alternatively, if there is no redirect and the argument names a
547 directory, the repository is rebuilt into that directory, with any
548 selection set being ignored; if that target directory is nonempty
549 its contents are backed up to a save directory.
550
If the write location is a file and the --format=fossil option is
used, the file is written in Fossil repository format.
553
554 With the --legacy option, the Legacy-ID of each commit is appended
555 to its commit comment at write time. This option is mainly useful
556 for debugging conversion edge cases.
557
558 If you specify a partial selection set such that some commits are
559 included but their parents are not, the output will include
560 incremental dump cookies for each branch with an origin outside the
561 selection set, just before the first reference to that branch in a
562 commit. An incremental dump cookie looks like "refs/heads/foo^0"
563 and is a clue to export-stream loaders that the branch should be
564 glued to the tip of a pre-existing branch of the same name. The
565 --noincremental option suppresses this behavior.
566
567 When you specify a partial selection set, including a commit object
568 forces the inclusion of every blob to which it refers and every tag
569 that refers to it.
570
Specifying a partial selection may cause a situation in which some
parent marks in merges don't correspond to commits present in the
dump. When this happens and the --callout option was specified, the
write code replaces the merge mark with a callout, the action stamp
of the parent commit; otherwise the parent mark is omitted.
Importers will fail when reading a stream dump with callouts; it is
intended to be used by the graft command.
578
579 Specifying a write selection set with gaps in it is allowed but
580 unlikely to lead to good results if it is loaded by an importer.
581
Property extensions will be omitted from the output if the
importer for the preferred repository type cannot digest them.
584
585 Note: to examine small groups of commits without the progress
586 meter, use inspect.
587
588 choose [reponame]
589 Choose a named repo on which to operate. The name of a repo is
590 normally the basename of the directory or file it was loaded from,
591 but repos loaded from standard input are "unnamed". reposurgeon
592 will add a disambiguating suffix if there have been multiple reads
593 from the same source.
594
595 With no argument, lists the names of the currently stored
596 repositories and their load times. The second column is '*' for the
597 currently selected repository, '-' for others.
598
599 drop [reponame]
600 Drop a repo named by the argument from reposurgeon's list, freeing
601 the memory used for its metadata and deleting on-disk blobs. With
602 no argument, drops the currently chosen repo.
603
604 rename reponame
605 Rename the currently chosen repo; requires an argument. Won't do it
606 if there is already one by the new name.
607
608 REBUILDS IN PLACE
reposurgeon can rebuild an altered repository in place. Untracked files
are normally saved and restored when the contents of the new repository
are checked out (but see the documentation of the “preserve” command
for a caveat).
613
614 rebuild [directory]
615 Rebuild a repository from the state held by reposurgeon. This
616 command does not take a selection set.
617
618 The single argument, if present, specifies the target directory in
619 which to do the rebuild; if the repository read was from a repo
620 directory (and not a git-import stream), it defaults to that
621 directory. If the target directory is nonempty its contents are
622 backed up to a save directory. Files and directories on the
623 repository's preserve list are copied back from the backup
624 directory after repo rebuild. The default preserve list depends on
625 the repository type, and can be displayed with the stats command.
626
627 If reposurgeon has a nonempty legacy map, it will be written to a
628 file named legacy-map in the repository subdirectory as though by a
629 legacy write command. (This will normally be the case for
630 Subversion and CVS conversions.)
631
632 preserve [file...]
633 Add (presumably untracked) files or directories to the repo's list
634 of paths to be restored from the backup directory after a rebuild.
635 Each argument, if any, is interpreted as a pathname. The current
636 preserve list is displayed afterwards.
637
638 It is only necessary to use this feature if your version-control
639 system lacks a command to list files under version control. Under
640 systems with such a command (which include git and hg), all files
641 that are neither beneath the repository dot directory nor under
642 reposurgeon temporary directories are preserved automatically.
643
644 unpreserve [file...]
Remove (presumably untracked) files or directories from the repo's
list of paths to be restored from the backup directory after a
rebuild. Each argument, if any, is interpreted as a pathname. The
current preserve list is displayed afterwards.
649
650 TIMEQUAKES AND TIMEBUMPS
Modifying a repository so every commit in it has a unique timestamp is
often a useful thing to do, so that every commit has a unique action
stamp that can be referred to in surgical commands.
654
655 timequake
Attempt to hack committer and author time stamps in the selection
set (defaulting to all commits in the repository) to be unique.
Works by identifying collisions between parent and child, then
incrementing child timestamps so they no longer coincide. Won't
touch commits with multiple parents.
661
662 Because commits are checked in ascending order, this logic will
663 normally do the right thing on chains of three or more commits with
664 identical timestamps.
665
666 Any timestamp collisions left after this operation are probably
667 cross-branch and have to be individually dealt with using
668 'timebump' commands.
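
The collision-bumping pass described above might be modelled like
this in Python. It is a sketch under assumptions, not the tool's code:
commits are plain dictionaries, parents are given as list indices, and
a child is treated as colliding when its original timestamp equals its
parent's original timestamp, a detail chosen here so that a chain of
identical stamps is spread out.

```python
def timequake(commits):
    """Make parent/child timestamps distinct along single-parent links.

    commits is a hypothetical list of dicts in ascending event order,
    each with 'parents' (list of indices into commits) and 'time'
    (integer seconds).
    """
    original = [commit["time"] for commit in commits]
    for index, commit in enumerate(commits):
        if len(commit["parents"]) != 1:
            continue   # roots have no parent; merges are left alone
        parent = commit["parents"][0]
        if original[index] == original[parent]:
            # place the child one second after its (possibly bumped) parent
            commit["time"] = commits[parent]["time"] + 1
    return commits


history = [
    {"parents": [], "time": 100},
    {"parents": [0], "time": 100},
    {"parents": [1], "time": 100},
    {"parents": [0, 1], "time": 100},   # merge commit, not touched
]
print([commit["time"] for commit in timequake(history)])
# [100, 101, 102, 100]
```

The untouched merge commit still collides with the root, which is the
kind of leftover that a later timebump would have to resolve.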
669
670 timebump [seconds]
Bump the committer and author timestamps of commits in the
selection set (defaulting to empty) by one second. With a following
integer argument, bump by that many seconds instead. The argument
may be negative.
674
675 Those of you twitchy about "rewriting history" should bear in mind that
676 the commit stamps in many older repositories were never very reliable
677 to begin with.
678
CVS in particular is notorious for shipping client-side timestamps with
timezone and DST issues (as opposed to UTC) that don't necessarily
compare well with stamps from different clients of the same CVS server.
Thus, inducing a timequake in a CVS repo seldom produces effects
anywhere near as large as the measurement noise of the repository's
own timestamps.
685
686 Subversion was somewhat better about this, as commits were stamped at
687 the server, but older Subversion repositories often have sections that
688 predate the era of ubiquitous NTP time.
689
690 INFORMATION AND REPORTS
691 Commands in this group report information about the selected
692 repository.
693
694 The output of these commands can individually be redirected to a named
695 output file. Where indicated in the syntax, you can prefix the output
696 filename with “>” and give it as a following argument. If you use “>>”
697 the file is opened for append rather than write.
698
699 list [>outfile]
700 This is the main command for identifying the events you want to
701 modify. It lists commits in the selection set by event sequence
702 number with summary information. The first column is raw event
703 numbers, the second a timestamp in local time. If the repository
704 has legacy IDs, they will be displayed in the third column. The
705 leading portion of the comment follows.
706
707 stamp [>outfile]
708 Alternative form of listing that displays full action stamps,
709 usable as references in selections. Supports > redirection.
710
711 tip [>outfile]
712 Display the branch tip names associated with commits in the
713 selection set. These will not necessarily be the same as their
714 branch fields (which will often be tag names if the repo contains
715 either annotated or lightweight tags).
716
717 If a commit is at a branch tip, its tip is its branch name. If it
718 has only one child, its tip is the child's tip. If it has multiple
719 children, then if there is a child with a matching branch name its
720 tip is the child's tip. Otherwise this function throws a
721 recoverable error.
722
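The tip rule above can be read as a short recursive procedure. The Python sketch below is illustrative only: the Commit class and its fields are hypothetical stand-ins, not reposurgeon's internal data model, and it assumes that "at a branch tip" means "has no children".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Commit:
    """Hypothetical stand-in for a commit; not reposurgeon's own class."""
    branch: str
    children: List["Commit"] = field(default_factory=list)


def tip(commit: Commit) -> str:
    """Resolve a commit's branch tip name using the rules described above."""
    if not commit.children:
        # At a branch tip: the tip is the commit's own branch name.
        return commit.branch
    if len(commit.children) == 1:
        # Exactly one child: inherit that child's tip.
        return tip(commit.children[0])
    # Several children: follow the one whose branch name matches, if any.
    for child in commit.children:
        if child.branch == commit.branch:
            return tip(child)
    # No matching child: this corresponds to the recoverable error case.
    raise ValueError("cannot determine a unique tip for this commit")
```
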
723 tags [>outfile]
724 Display tags and resets: three fields, an event number and a type
725 and a name. Branch tip commits associated with tags are also
726 displayed with the type field 'commit'. Supports > redirection.
727
728 stats [repo-name...] [>outfile]
729 Report size statistics and import/export method information about
730 named repositories, or with no argument the currently chosen
731 repository.
732
733 count [>outfile]
734 Report a count of items in the selection set. Default set is
735 everything in the currently-selected repo. Supports > redirection.
736
737 inspect [>outfile]
738 Dump a fast-import stream representing selected events to standard
739 output. Just like a write, except (1) the progress meter is
740 disabled, and (2) there is an identifying header before each event
741 dump.
742
743 graph [>outfile]
744 Emit a visualization of the commit graph in the DOT markup language
745 used by the graphviz tool suite. This can be fed as input to the
746 main graphviz rendering program dot(1), which will yield a viewable
747 image. Supports > redirection.
748
749 You may find a script like this useful:
750
751 graph $1 >/tmp/foo$$
752 shell dot </tmp/foo$$ -Tpng | display -; rm /tmp/foo$$
753
754 You can substitute in your own preferred image viewer, of course.
755
756 sizes [>outfile]
757 Print a report on data volume per branch; takes a selection set,
758 defaulting to all events. The numbers tally the size of
759 uncompressed blobs, commit and tag comments, and other metadata
760 strings (a blob is counted each time a commit points at it).
761
762 The numbers are not an exact measure of storage size: they are
763 intended mainly as a way to get information on how to efficiently
764 partition a repository that has become large enough to be unwieldy.
765
766 Supports > redirection.
767
768 lint [>outfile]
769 Look for DAG and metadata configurations that may indicate a
770 problem. Presently checks for: (1) mid-branch deletes, (2)
771 disconnected commits, (3) parentless commits, (4) the existence of
772 multiple roots, (5) committer and author IDs that don't look
773 well-formed as DVCS IDs, (6) multiple child links with identical
774 branch labels descending from the same commit, (7) time and
775 action-stamp collisions.
776
777 Options to issue only partial reports are supported; "lint
778 --options" or "lint -?" lists them.
779
780 The options and output format of this command are unstable; they
781 may change without notice as more sanity checks are added.
782
783 when >timespec
784 Interconvert between git timestamps (integer Unix time plus TZ) and
785 RFC3339 format. Takes one argument, autodetects the format. Useful
786 when eyeballing export streams. Also accepts any other supported
787 date format and converts to RFC3339.
788
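For readers who want to check a conversion by hand, the two formats handled by the when command can be related as in the following Python sketch. It is not reposurgeon's code: the function names are hypothetical, the RFC3339 side is assumed to be written in UTC with a trailing Z, and the timezone part of the git stamp is simply dropped.

```python
from datetime import datetime, timezone


def git_to_rfc3339(stamp: str) -> str:
    """Convert 'seconds-since-epoch +hhmm' to an RFC3339 UTC string."""
    seconds, _zone = stamp.split()
    # The integer part is already UTC; the zone only records the
    # committer's local offset, so this sketch ignores it.
    moment = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def rfc3339_to_git(stamp: str) -> str:
    """Convert an RFC3339 UTC string ending in 'Z' to a git timestamp."""
    moment = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    seconds = int(moment.replace(tzinfo=timezone.utc).timestamp())
    return f"{seconds} +0000"
```
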
789 SURGICAL OPERATIONS
790 These are the operations the rest of reposurgeon is designed to
791 support.
792
793 squash [policy...]
794 Combine or delete commits in a selection set of events. The default
795 selection set for this command is empty. Has no effect on events
796 other than commits unless the --delete policy is selected; see the
797 'delete' command for discussion.
798
799 Normally, when a commit is squashed, its file operation list (and
800 any associated blob references) gets either prepended to the
801 beginning of the operation list of each of the commit's children or
802 appended to the operation list of each of the commit's parents.
803 Then children of a deleted commit get it removed from their parent
804 set and its parents added to their parent set.
805
806 The analogous operation is performed on commit comments, so no
807 comment text is ever outright discarded. Exception: comments
808 consisting of "*** empty log message ***", as generated by CVS,
809 are ignored.
810
811 The default is to squash forward, modifying children; but see the
812 list of policy modifiers below for how to change this.
813
814 Warning
815 It is easy to get the bounds of a squash command wrong, with
816 confusing and destructive results. Beware thinking you can
817 squash on a selection set to merge all commits except the last
818 one into the last one; what you will actually do is to merge
819 all of them to the first commit after the selected set.

820 Normally, any tag pointing to a combined commit will also be pushed
821 forward. But see the list of policy modifiers below for how to
822 change this.
823
824 Following all operation moves, every one of the altered file
825 operation lists is reduced to a shortest normalized form. The
826 normalized form detects various combinations of modification,
827 deletion, and renaming and simplifies the operation sequence as
828 much as it can without losing any information.
829
830 After canonicalization, a file op list may still end up containing
831 multiple M operations on the same file. Normally the tool utters a
832 warning when this occurs but does not try to resolve it.
833
834 The following modifiers change these policies:
835
836 --delete
837 Simply discards all file ops and tags associated with deleted
838 commit(s).
839
840 --coalesce
841 Discard all M operations (and associated blobs) except the
842 last.
843
844 --pushback
845 Append fileops to parents, rather than prepending to children.
846
847 --pushforward
848 Prepend fileops to children. This is the default; it can be
849 specified in a lift script for explicitness about intentions.
850
851 --tagforward
852 With the "--tagforward" modifier, any tag on the deleted commit
853 is pushed forward to the first child rather than being deleted.
854 This is the default; it can be specified for explicitness.
855
856 --tagback
857 With the "--tagback" modifier, any tag on the deleted commit is
858 pushed backward to the first parent rather than being deleted.
859
860 --quiet
861 Suppresses warning messages about deletion of commits with
862 non-delete fileops.
863
864 --complain
865 The opposite of quiet. Can be specified for explicitness.
866
867 --empty-only
868 Complain if a squash operation modifies a nonempty comment.
869
870 Under any of these policies except “--delete”, deleting a commit
871 that has children does not back out the changes made by that
872 commit, as they will still be present in the blobs attached to
873 versions past the end of the deletion set. All a delete does when
874 the commit has children is lose the metadata information about when
875 and by whom those changes were actually made; after the delete any
876 such changes will be attributed to the first undeleted children of
877 the deleted commits. It is expected that this command will be
878 useful mainly for removing commits mechanically generated by
879 repository converters such as cvs2svn.
880
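The default push-forward behavior of squash can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the Commit class and squash_forward function are hypothetical, fileops are plain strings, and the later normalization of the operation lists is omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity comparison avoids recursing through links
class Commit:
    """Hypothetical commit record used only for this illustration."""
    name: str
    fileops: List[str] = field(default_factory=list)
    parents: List["Commit"] = field(default_factory=list)
    children: List["Commit"] = field(default_factory=list)


def squash_forward(victim: Commit) -> None:
    """Push a commit's fileops onto its children, then unlink it."""
    for child in victim.children:
        # Prepend the victim's fileops to the child's operation list.
        child.fileops = victim.fileops + child.fileops
        # Replace the victim in the child's parent set with its parents.
        index = child.parents.index(victim)
        child.parents[index:index + 1] = victim.parents
    for parent in victim.parents:
        # Parents now point directly at the victim's children.
        index = parent.children.index(victim)
        parent.children[index:index + 1] = victim.children
    victim.parents, victim.children = [], []
```

Note that the child's content is unchanged by this operation; only the attribution of the squashed commit's changes moves, which is the point made in the paragraph above.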
881 delete [policy...]
882 Delete a selection set of events. The default selection set for
883 this command is empty. On a set of commits, this is equivalent to a
884 squash with the --delete flag. It unconditionally deletes tags,
885 resets, and passthroughs; blobs can be removed only as a side
886 effect of deleting every commit that points at them.
887
888 divide parent [child]
889 Attempt to partition a repo by cutting the parent-child link
890 between two specified commits (they must be adjacent). Does not
891 take a general selection set. It is only necessary to specify the
892 parent commit, unless it has multiple children in which case the
893 child commit must follow (separate it with a comma).
894
895 If the repo was named 'foo', you will normally end up with two
896 repos named 'foo-early' and 'foo-late' (option and feature events
897 at the beginning of the early segment will be duplicated onto the
898 beginning of the late one). But if the commit graph would remain
899 connected through another path after the cut, the behavior changes.
900 In this case, if the parent and child were on the same branch
901 'qux', the branch segments are renamed 'qux-early' and 'qux-late'
902 but the repo is not divided.
903
904 expunge [--notagify] [path | /regexp/]...
905 Expunge files from the selected portion of the repo history; the
906 default is the entire history. The arguments to this command may be
907 paths or Python regular expressions matching paths (regexps must be
908 marked by being surrounded with //).
909
910 All filemodify (M) operations and delete (D) operations involving a
911 matched file in the selected set of events are disconnected from
912 the repo and put in a removal set. Renames are followed as the tool
913 walks forward in the selection set; each triggers a warning
914 message. If a selected file is a copy (C) target, the copy will be
915 deleted and a warning message issued. If a selected file is a copy
916 source, the copy target will be added to the list of paths to be
917 deleted and a warning issued.
918
919 After file expunges have been performed, any commits with no
920 remaining file operations will be removed, and any tags pointing to
921 them. By default each deleted commit is replaced with a tag of the
922 form 'emptycommit-ident' on the preceding commit unless
923 “--notagify” is specified as an argument. Commits with deleted
924 fileops pointing both in and outside the path set are not deleted,
925 but are cloned into the removal set.
926
927 The removal set is not discarded. It is assembled into a new
928 repository named after the old one with the suffix "-expunges"
929 added. Thus, this command can be used to carve a repository into
930 sections by file path matches.
931
932 tagify [--canonicalize] [--tipdeletes] [--tagify-merges]
933 Search for empty commits and turn them into tags. Takes an optional
934 selection set argument defaulting to all commits. For each commit
935 in the selection set, turn it into a tag with the same message and
936 author information if it has no fileops. By default merge commits
937 are not considered, even if they have no fileops (thus no tree
938 differences with their first parent). To change that, use the
939 --tagify-merges option.
940
941 The name of the generated tag will be 'emptycommit-ident', where
942 ident is generated from the legacy ID of the deleted commit, or
943 from its mark, or from its index in the repository, with a
944 disambiguation suffix if needed.
945
946 With the --canonicalize option, tagify tries harder to detect trivial
947 commits by first ensuring that all fileops of selected commits will
948 have an actual effect when processed by fast-import.
949
950 With the --tipdeletes option, tagify also considers branch tips with only
951 deleteall fileops to be candidates for tagification. The
952 corresponding tags get names of the form 'tipdelete-branchname'
953 rather than the default 'emptycommit-ident'.
954
955 With the --tagify-merges option, tagify also tagifies merge commits that
956 have no fileops. When this is done the merge link is moved to the
957 tagified commit's parent.
958
959 coalesce [--debug|--changelog] [timefuzz]
960 Scan the selection set for runs of commits with identical comments
961 close to each other in time (this is a common form of scar tissue
962 in repository up-conversions from older file-oriented
963 version-control systems). Merge these cliques by deleting all but
964 the last commit, in order; fileops from the deleted commits are
965 pushed forward to that last one.
966
967 The optional timefuzz argument, if present, is a maximum time
968 separation in seconds; the default is 90 seconds.
969
970 The default selection set for this command is =C, all commits.
971 Occasionally you may want to restrict it, for example to avoid
972 coalescing unrelated cliques of "*** empty log message ***" commits
973 from CVS lifts.
974
975 With the --debug option, show messages about mismatches.
976
977 With the --changelog option, any commit with a comment containing
978 the string 'empty log message' (such as is generated by CVS) and
979 containing exactly one file operation modifying a path ending in
980 ChangeLog is treated specially. Such ChangeLog commits are
981 considered to match any commit before them by content, and will
982 coalesce with it if the committer matches and the commit separation
983 is small enough. This option handles a convention used by Free
984 Software Foundation projects.
985
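The clique detection described for coalesce can be approximated as follows. This Python sketch is a simplification under stated assumptions: find_cliques is a hypothetical name, commits are reduced to (time, comment) pairs, only adjacent commits are compared, the time bound is treated as inclusive, and the --changelog special case is not modeled.

```python
from typing import List, Tuple


def find_cliques(commits: List[Tuple[int, str]],
                 timefuzz: int = 90) -> List[List[int]]:
    """Group indices of adjacent commits with equal comments and close times.

    Each commit is a (unix_time, comment) pair, given in event order.
    """
    cliques: List[List[int]] = []
    run = [0] if commits else []
    for i in range(1, len(commits)):
        prev_time, prev_comment = commits[i - 1]
        this_time, this_comment = commits[i]
        if (this_comment == prev_comment
                and abs(this_time - prev_time) <= timefuzz):
            run.append(i)          # same clique as the previous commit
        else:
            if len(run) > 1:
                cliques.append(run)
            run = [i]              # start a new candidate run
    if len(run) > 1:
        cliques.append(run)
    return cliques
```

Each returned clique would then be merged by deleting all but its last member, with fileops pushed forward to that last commit.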
986 split {at|by} item
987 The first argument is required to be a commit location; the second
988 is a preposition which indicates which splitting method to use. If
989 the preposition is 'at', then the third argument must be an integer
990 1-origin index of a file operation within the commit. If it is
991 'by', then the third argument must be a pathname to be
992 prefix-matched (pathname match is done first).
993
994 The commit is copied and inserted into a new position in the event
995 sequence, immediately following itself; the duplicate becomes the
996 child of the original, and replaces it as parent of the original's
997 children. Commit metadata is duplicated; the new commit then gets a
998 new mark. If the new commit has a legacy ID, the suffix '.split' is
999 appended to it.
1000
1001 Finally, some file operations - starting at the one matched or
1002 indexed by the split argument - are moved forward from the original
1003 commit into the new one. Legal indices are 2-n, where n is the
1004 number of file operations in the original commit.
1005
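The index arithmetic for "split at" is easy to get wrong by one, so here is a minimal Python sketch of it. The function name split_at and the representation of fileops as strings are assumptions for illustration; the 'by' variant is not shown.

```python
from typing import List, Tuple


def split_at(fileops: List[str], index: int) -> Tuple[List[str], List[str]]:
    """Split a fileop list at a 1-origin index.

    Returns (ops kept by the original commit, ops moved to the new commit).
    """
    if not 2 <= index <= len(fileops):
        # Index 1 would leave the original commit with no fileops at all.
        raise ValueError("legal indices are 2 through the number of fileops")
    return fileops[:index - 1], fileops[index - 1:]
```

For a commit with three fileops, an index of 2 keeps the first op in the original commit and moves the second and third into the new one.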
1006 add {D path | M perm mark path | R source target | C source target}
1007 To a specified commit, add a specified fileop.
1008
1009 For a D operation to be valid there must be an M operation for the
1010 path in the commit's ancestry. For an M operation to be valid, the
1011 'perm' part must be a token ending with 755 or 644 and the 'mark'
1012 must refer to a blob that precedes the commit location. For an R or
1013 C operation to be valid, there must be an M operation for the
1014 source in the commit's ancestry.
1015
1016 remove [index | path | deletes] [to commit]
1017 From a specified commit, remove a specified fileop. The op must be
1018 one of (a) the keyword “deletes”, (b) a file path, (c) a file path
1019 preceded by an op type set (some subset of the letters DMRCN), or
1020 (d) a 1-origin numeric index. The “deletes” keyword selects all D
1021 fileops in the commit; the others select one each.
1022
1023 If the “to” clause is present, the removed op is appended to the
1024 commit specified by the following singleton selection set. This
1025 option cannot be combined with “deletes”.
1026
1027 Note that this command does not attempt to scavenge blobs even if
1028 the deleted fileop might be the only reference to them. This
1029 behavior may change in a future release.
1030
1031 blob
1032 Create a blob at mark :1 after renumbering other marks starting
1033 from :2. Data is taken from stdin, which may be a here-doc. This
1034 can be used with the add command to patch synthetic data into a
1035 repository.
1036
1037 renumber
1038 Renumber the marks in a repository, from :1 up to :<n> where <n> is
1039 the count of the last mark. Just in case an importer ever cares
1040 about mark ordering or gaps in the sequence.
1041
1042 A side effect of this command is to clean up stray "done"
1043 passthroughs that may have entered the repository via graft
1044 operations. After a renumber, the repository will have at most one
1045 "done" and it will be at the end of the events.
1046
1047 dedup
1048 Deduplicate blobs in the selection set. If multiple blobs in the
1049 selection set have the same SHA1, throw away all but the first, and
1050 change fileops referencing them to instead reference the (kept)
1051 first blob.
1052
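The dedup rule (keep the first blob with a given SHA1, redirect references to it) can be sketched in Python as below. The names and data layout are hypothetical, and the sketch hashes raw blob content, which is an assumption; reposurgeon may compute its hashes differently.

```python
import hashlib
from typing import Dict, List


def dedup_blobs(blobs: Dict[str, bytes], fileop_marks: List[str]) -> List[str]:
    """Drop duplicate blobs and return fileop marks rewritten to the keepers.

    blobs maps mark -> content in event order; duplicates are removed in place.
    """
    first_seen: Dict[str, str] = {}   # SHA1 hex digest -> first mark seen
    remap: Dict[str, str] = {}        # duplicate mark -> kept mark
    for mark, content in list(blobs.items()):
        digest = hashlib.sha1(content).hexdigest()
        if digest in first_seen:
            remap[mark] = first_seen[digest]
            del blobs[mark]
        else:
            first_seen[digest] = mark
    # Fileops that pointed at a discarded blob now point at the kept one.
    return [remap.get(mark, mark) for mark in fileop_marks]
```
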
1053 msgout [>outfile]
1054 Emit a file of messages in RFC2822 format representing the contents
1055 of repository metadata. Takes a selection set; members of the set
1056 other than commits, annotated tags, and passthroughs are ignored
1057 (that is, presently, blobs and resets).
1058
1059 The output from this command can optionally be redirected to a
1060 named output file. Prefix the filename with “>” and give it as a
1061 following argument.
1062
1063 May have an option --filter, followed by = and a /-enclosed regular
1064 expression. If this is given, only headers with names matching it
1065 are emitted. In this context the name of the header includes its
1066 trailing colon.
1067
1068 msgin [--create] [--empty-only] [<infile] [--changed >outfile]
1069 Accept a file of messages in RFC2822 format representing the
1070 contents of the metadata in selected commits and annotated tags.
1071 Takes no selection set. If there is an argument it will be taken as
1072 the name of a message file to read from; with no argument, or an
1073 argument of '-', it reads from standard input. Supports < redirection.
1074
1075 Users should be aware that modifying an Event-Number or Event-Mark
1076 field will change which event the update from that message is
1077 applied to. This is unlikely to have good results.
1078
1079 The header CheckText, if present, is examined to see if the comment
1080 text of the associated event begins with it. If not, the item
1081 modification is aborted. This helps ensure that you are landing
1082 updates on the events you intend.
1083
1084 If the “--create” modifier is present, new tags and commits will be
1085 appended to the repository. In this case it is an error for a tag
1086 name to match any existing tag name. Commit objects are created with
1087 no fileops. If Committer-Date or Tagger-Date fields are not present
1088 they are filled in with the time at which this command is executed.
1089 If Committer or Tagger fields are not present, reposurgeon will
1090 attempt to deduce the user's git-style identity and fill it in. If
1091 a singleton commit set was specified for commit creations, the new
1092 commits are made children of that commit.
1093
1094 Otherwise, if the Event-Number and Event-Mark fields are absent,
1095 the msgin logic will attempt to match the commit or tag first by
1096 Legacy-ID, then by a unique committer ID and timestamp pair.
1097
1098 If output is redirected and the modifier “--changed” appears, a
1099 minimal set of modifications actually made is written to the output
1100 file in a form that can be fed back in. Supports > redirection.
1101
1102 If the option “--empty-only” is given, this command will throw a
1103 recoverable error if it tries to alter a message body that is
1104 neither empty nor consists of the CVS empty-comment marker.
1105
1106 setfield attribute value
1107 In the selected objects (defaulting to none) set every instance of
1108 a named field to a string value. The string may be quoted to
1109 include whitespace, and use backslash escapes interpreted by the
1110 Python string-escape codec, such as \n and \t.
1111
1112 Attempts to set nonexistent attributes are ignored. Valid values
1113 for the attribute are internal Python field names; in particular,
1114 for commits, “comment” and “branch” are legal. Consult the source
1115 code for other interesting values.
1116
1117 The special fieldnames 'author', 'commitdate' and 'authdate' apply
1118 only to commits in the range. The latter two set attribution
1119 dates. The former sets the author's name and email address
1120 (assuming the value can be parsed for both), copying the committer
1121 timestamp. The author's timezone may be deduced from the email
1122 address.
1123
1124 setperm 100644|100755|120000 path...
1125 For the selected objects (defaulting to none) take the first
1126 argument as an octal literal describing permissions. All subsequent
1127 arguments are paths. For each M fileop in the selection set and
1128 exactly matching one of the paths, patch the permission field to
1129 the first argument value.
1130
1131 append [--rstrip] [>text]
1132 Append text to the comments of commits and tags in the specified
1133 selection set. The text is the first token of the command and may
1134 be a quoted string. C-style escape sequences in the string are
1135 interpreted using Python's string-escape codec.
1136
1137 If the option --rstrip is given, the comment is right-stripped
1138 before the new text is appended.
1139
1140 filter [--shell|--regex|--replace|--dedos]
1141 Run blobs, commit comments, or tag comments in the selection set
1142 through the filter specified on the command line.
1143
1144 In any mode other than --dedos, attempting to specify a selection
1145 set including both blobs and non-blobs (that is, commits or tags)
1146 throws an error. Inline content in commits is filtered when the
1147 selection set contains (only) blobs and the commit is within the
1148 range bounded by the earliest and latest blob in the specification.
1149
1150 When filtering blobs, if the command line contains the magic cookie
1151 '%PATHS%' it is replaced with a space-separated list of all paths
1152 that reference the blob.
1153
1154 With --shell, the remainder of the line specifies a filter as a
1155 shell command. Each blob or comment is presented to the filter on
1156 standard input; the content is replaced with whatever the filter
1157 emits to standard output.
1158
1159 With --regex, the remainder of the line is expected to be a Python
1160 regular expression substitution written as /from/to/ with from and
1161 to being passed as arguments to the standard re.sub() function and
1162 it applied to modify the content. Actually, any non-space character
1163 will work as a delimiter in place of the /; this makes it easier to
1164 use / in patterns. Ordinarily only the first such substitution is
1165 performed; putting 'g' after the slash replaces globally, and a
1166 numeric literal gives the maximum number of substitutions to
1167 perform. Other flags available restrict substitution scope - 'c'
1168 for comment text only, 'C' for committer name only, 'a' for author
1169 names only. Note that parsing of a --regex argument will be
1170 confused by any substring consisting of whitespace followed by #;
1171 use "\s" rather than whitespace to avoid this.
1172
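The /from/to/ syntax and its count flags map naturally onto re.sub(). The Python sketch below shows one plausible reading; apply_regex_filter is a hypothetical helper, it does not handle a delimiter escaped inside the pattern, and it ignores the scope flags 'c', 'C' and 'a'.

```python
import re


def apply_regex_filter(spec: str, text: str) -> str:
    """Apply a <delim>from<delim>to<delim>flags substitution to text."""
    delimiter = spec[0]                # any non-space character may be used
    parts = spec.split(delimiter)
    if len(parts) != 4:
        raise ValueError("expected <delim>from<delim>to<delim>flags")
    _, pattern, replacement, flags = parts
    if "g" in flags:
        count = 0                      # re.sub treats 0 as "replace all"
    else:
        digits = "".join(ch for ch in flags if ch.isdigit())
        count = int(digits) if digits else 1   # default: first match only
    return re.sub(pattern, replacement, text, count=count)
```

Choosing a delimiter other than / (for example a colon) lets a pattern such as a/b be written without escaping.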
1173 With --replace, the behavior is like --regex but the expressions
1174 are not interpreted as regular expressions. (This is slightly
1175 faster.)
1176
1177 With --dedos, DOS/Windows-style \r\n line terminators are replaced
1178 with \n.
1179
1180 transcode codec
1181 Transcode blobs, commit comments and committer/author names, or tag
1182 comments and tag committer names in the selection set to UTF-8 from
1183 the character encoding specified on the command line.
1184
1185 Attempting to specify a selection set including both blobs and
1186 non-blobs (that is, commits or tags) throws an error. Inline
1187 content in commits is filtered when the selection set contains
1188 (only) blobs and the commit is within the range bounded by the
1189 earliest and latest blob in the specification.
1190
1191 The encoding argument must name one of the codecs known to the
1192 Python standard codecs library. In particular, 'latin1' is a valid
1193 codec name.
1194
1195 Errors in this command are fatal, because an error may leave
1196 repository objects in a damaged state.
1197
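At the byte level, transcoding one piece of text amounts to a decode followed by an encode. The following Python sketch illustrates that step only; the function name is hypothetical and it says nothing about how reposurgeon walks the selection set.

```python
def transcode(data: bytes, codec: str) -> bytes:
    """Re-encode bytes from the named codec to UTF-8."""
    # bytes.decode raises UnicodeDecodeError for undecodable input and
    # LookupError for an unknown codec name; a caller following the
    # policy above would treat either as fatal.
    return data.decode(codec).encode("utf-8")
```
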
1198 The theory behind the design of this command is that the repository
1199 might contain a mixture of encodings used to enter commit metadata
1200 by different people at different times. After using =I to identify
1201 metadata containing non-Unicode high bytes in text, a human must
1202 use context to identify which particular encodings were used in
1203 particular event spans and compose appropriate transcode commands
1204 to fix them up.
1205
1206 edit
1207 Report the selection set of events to a tempfile as msgout does,
1208 call an editor on it, and update from the result as msgin does. If
1209 you do not specify an editor name as second argument, it will be
1210 taken from the $EDITOR variable in your environment. If $EDITOR is
1211 not set, /usr/bin/editor will be used as a fallback if it exists as
1212 a symlink to your default editor, as is the case on Debian, Ubuntu
1213 and their derivatives.
1214
1215 Normally this command ignores blobs because msgout does. However,
1216 if you specify a selection set consisting of a single blob, your
1217 editor will be called directly on the blob file.
1218
1219 Supports < and > redirection.
1220
1221 timeoffset offset [timezone]
1222 Apply a time offset to all time/date stamps in the selected set. An
1223 offset argument is required; it may be in the form [+-]ss,
1224 [+-]mm:ss or [+-]hh:mm:ss. The leading sign is required to
1225 distinguish it from a selection expression.
1226
1227 Optionally you may also specify another argument in the form
1228 [+-]hhmm, a timezone literal to apply. To apply a timezone without
1229 an offset, use an offset literal of +0 or -0.
1230
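The three accepted offset forms reduce to a signed number of seconds. A minimal Python sketch of that parsing follows; parse_offset is a hypothetical name and the optional timezone literal is not handled.

```python
def parse_offset(text: str) -> int:
    """Convert [+-]ss, [+-]mm:ss or [+-]hh:mm:ss to signed seconds."""
    if not text or text[0] not in "+-":
        # The sign is mandatory, as it is for the command itself.
        raise ValueError("offset must begin with + or -")
    sign = -1 if text[0] == "-" else 1
    fields = [int(part) for part in text[1:].split(":")]
    if not 1 <= len(fields) <= 3:
        raise ValueError("expected ss, mm:ss or hh:mm:ss")
    seconds = 0
    for value in fields:
        # Each further field shifts the running total up by a factor of 60.
        seconds = seconds * 60 + value
    return sign * seconds
```
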
1231 unite [--prune] reponame...
1232 Unite repositories. Name any number of loaded repositories; they
1233 will be united into one union repo and removed from the load list.
1234 The union repo will be selected.
1235
1236 The root of each repo (other than the oldest repo) will be grafted
1237 as a child to the last commit in the dump with a preceding commit
1238 date. This will produce a union repository with one branch for each
1239 part. Running last to first, duplicate tag and branch names will be
1240 disambiguated using the source repository name (thus, recent
1241 duplicates will get priority over older ones). After all grafts,
1242 marks will be renumbered.
1243
1244 The name of the new repo will be the names of all parts
1245 concatenated, separated by '+'. It will have no source directory or
1246 preferred system type.
1247
1248 With the option --prune, at each join D operations for every
1249 existing ancestral file will be prepended to the root commit, which
1250 is then canonicalized using the rules for squashing; the effect will
1251 be that only files with properly matching M, R, and C operations in
1252 the root survive.
1253
1254 graft [--prune] reponame
1255 For when unite doesn't give you enough control. This command may
1256 have either of two forms, selected by the size of the selection
1257 set. The first argument is always required to be the name of a
1258 loaded repo.
1259
1260 If the selection set is of size 1, it must identify a single commit
1261 in the currently chosen repo; in this case the named repo's root
1262 will become a child of the specified commit. If the selection set
1263 is empty, the named repo must contain one or more callouts matching
1264 commits in the currently chosen repo.
1265
1266 Labels and branches in the named repo are prefixed with its name;
1267 then it is grafted to the selected one. Any other callouts in the
1268 named repo are also resolved in the context of the currently chosen
1269 one. Finally, the named repo is removed from the load list.
1270
1271 With the option --prune, prepend a deleteall operation into the
1272 root of the grafted repository.
1273
1274 path [source] rename [--force] [target]
1275 Rename a path in every fileop of every selected commit. The default
1276 selection set is all commits. The first argument is interpreted as
1277 a Python regular expression to match against paths; the second may
1278 contain back-reference syntax.
1279
1280 Ordinarily, if the target path already exists in the fileops, or is
1281 visible in the ancestry of the commit, this command throws an
1282 error. With the --force option, these checks are skipped.
1283
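Since the source is a Python regular expression and the target may contain back-references, a rename behaves much like re.sub() applied to each path. The sketch below is an approximation under assumptions: rename_paths is a hypothetical helper, and the collision check looks only at the given path list, not at the commit's ancestry.

```python
import re
from typing import List


def rename_paths(paths: List[str], source: str, target: str,
                 force: bool = False) -> List[str]:
    """Rewrite paths matching the source regexp, refusing collisions."""
    existing = set(paths)
    result = []
    for path in paths:
        new_path = re.sub(source, target, path)
        if new_path != path and new_path in existing and not force:
            raise ValueError(f"rename target {new_path!r} already exists")
        result.append(new_path)
    return result
```

For example, a source of ^src/(.*) with a target of lib/\1 moves everything under src/ to lib/ while leaving other paths alone.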
1284 paths [{sub|sup}] [dirname] [>outfile]
1285 Takes a selection set. Without a modifier, list all paths touched
1286 by fileops in the selection set (which defaults to the entire
1287 repo). This reporting variant does >-redirection.
1288
1289 With the 'sub' modifier, take a second argument that is a directory
1290 name and prepend it to every path. With the 'sup' modifier, strip
1291 any directory argument from the start of the path if it appears
1292 there; with no argument, strip the first directory component from
1293 every path.
1294
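The effect of the 'sub' and 'sup' modifiers on individual paths can be sketched in Python as follows. The function names are hypothetical, and leaving a bare filename untouched when there is no directory to strip is an assumption.

```python
from typing import List, Optional


def paths_sub(paths: List[str], dirname: str) -> List[str]:
    """Prepend a directory name to every path."""
    return [f"{dirname}/{path}" for path in paths]


def paths_sup(paths: List[str], dirname: Optional[str] = None) -> List[str]:
    """Strip a leading directory, or the first component if none is named."""
    result = []
    for path in paths:
        if dirname is None:
            # Drop the first component; a bare filename is left alone.
            _head, sep, tail = path.partition("/")
            result.append(tail if sep else path)
        elif path.startswith(dirname + "/"):
            result.append(path[len(dirname) + 1:])
        else:
            result.append(path)
    return result
```
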
1295 merge
1296 Create a merge link. Takes a selection set argument, ignoring all
1297 but the lowest (source) and highest (target) members. Creates a
1298 merge link from the highest member (child) to the lowest (parent).
1299
1300 unmerge
1301 Linearize a commit. Takes a selection set argument, which must
1302 resolve to a single commit, and removes all its parents except for
1303 the first.
1304
1305 It is equivalent to reparent --rebase first_parent,commit, where
1306 commit is the same selection set as used with unmerge and
1307 first_parent is a set resolving commit's first parent (see the
1308 reparent command below).
1309
1310 The main interest of the unmerge is that you don't have to find and
1311 specify the first parent yourself, saving time and avoiding errors
1312 when nearby surgery would make a manual first parent argument
1313 stale.
1314
1315 reparent [options...] [policy]
1316 Changes the parent list of a commit. Takes a selection set, zero or
1317 more option arguments, and an optional policy argument.
1318
1319 Selection set:
1320 The selection set must resolve to one or more commits. The
1321 selected commit with the highest event number (not necessarily
1322 the last one selected) is the commit to modify. The remainder
1323 of the selected commits, if any, become its parents: the
1324 selected commit with the lowest event number (which is not
1325 necessarily the first one selected) becomes the first parent,
1326 the selected commit with second lowest event number becomes the
1327 second parent, and so on. All original parent links are
1328 removed. Examples:
1329
1330 # this makes 17 the parent of 33
1331 17,33 reparent
1332
1333 # this also makes 17 the parent of 33
1334 33,17 reparent
1335
1336 # this makes 33 a root (parentless) commit
1337 33 reparent
1338
1339 # this makes 33 an octopus merge commit. its first parent
1340 # is commit 15, second parent is 17, and third parent is 22
1341 22,33,15,17 reparent
1342
1343 Options:
1344
1345 --use-order
1346 Use the selection order to determine which selected commit
1347 is the commit to modify and which are the parents (and if
1348 there are multiple parents, their order). The last selected
1349 commit (not necessarily the one with the highest event
1350 number) is the commit to modify, the first selected commit
1351 (not necessarily the one with the lowest event number)
1352 becomes the first parent, the second selected commit
1353 becomes the second parent, and so on. Examples:
1354
1355 # this makes 33 the parent of 17
1356 33,17 reparent --use-order
1357
1358 # this makes 17 an octopus merge commit. its first parent
1359 # is commit 22, second parent is 33, and third parent is 15
1360 22,33,15,17 reparent --use-order
1361
1362 Warning: with this option, it is possible to produce a
1363 repository graph in which parents precede children. This
1364 will produce a fatal error when the repository state is
1365 written out, so don't do that.
1366
1367 Policy:
1368 By default, the manifest of the reparented commit is computed
1369 before modifying it; a deleteall and some fileops are prepended
1370 so that the manifest stays unchanged even when the first parent
1371 has been changed. This behavior can be changed by specifying a
1372 policy flag:
1373
1374 --rebase
1375 Inhibits the default behavior: no deleteall is issued and
1376 the tree contents of all descendants can be modified as a
1377 result.
1378
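The two ways reparent interprets its selection set, by event number or by selection order with --use-order, can be summarized in a few lines of Python. The function reparent_plan is a hypothetical illustration working on bare event numbers; it only decides which commit is modified and in what order the parents are attached.

```python
from typing import List, Tuple


def reparent_plan(selection: List[int],
                  use_order: bool = False) -> Tuple[int, List[int]]:
    """Return (commit to modify, ordered parent list) for a selection."""
    if not selection:
        raise ValueError("selection must contain at least one commit")
    # Default: sort by event number. With use_order: keep the order given.
    ordered = list(selection) if use_order else sorted(selection)
    # The last entry is the commit to modify; the rest become its parents.
    return ordered[-1], ordered[:-1]
```

Applied to the examples above, the selection 22,33,15,17 yields commit 33 with parents 15, 17, 22 by default, and commit 17 with parents 22, 33, 15 under --use-order.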
1379 reorder [--quiet]
1380 Re-order a contiguous range of commits.
1381
1382 Older revision control systems tracked change history on a per-file
1383 basis, rather than as a series of atomic changesets, which often
1384 made it difficult to determine the relationships between changes.
1385 Some tools which convert a history from one revision control system
1386 to another attempt to infer changesets by comparing file commit
1387 comment and time-stamp against those of other nearby commits, but
1388 such inference is a heuristic and can easily fail. In the best
case, when inference fails, commits in the resulting conversion
which should have been coalesced into a single changeset instead
end up as a contiguous range of separate commits. This situation
can typically be repaired easily enough with the coalesce
1393 or squash commands. However, in the worst case, numerous commits
1394 from several different topics, each of which should have been one
1395 or more distinct changesets, may end up interleaved in an
1396 apparently chaotic fashion. To deal with such cases, the commits
1397 need to be re-ordered, so that those pertaining to each particular
1398 topic are clumped together, and then possibly squashed into one or
1399 more changesets pertaining to each topic. This command, reorder,
1400 can help with the first task; the squash command with the second.
1401
Selected commits are re-arranged in the order specified; for
instance: ":7,:5,:9,:3 reorder". The specified commit range must be
contiguous, and each commit in it must be accounted for after
re-ordering. Thus, for example, ':5' cannot be omitted from
":7,:5,:9,:3 reorder". (To drop a commit, use the delete or squash
command.) The selected commits must represent a linear history;
however, the lowest-numbered commit being re-ordered may have
multiple parents, and the highest-numbered may have multiple
children.
1410
1411 Re-ordered commits and their immediate descendants are inspected
1412 for rudimentary fileops inconsistencies. Warns if re-ordering
1413 results in a commit trying to delete, rename, or copy a file before
1414 it was ever created. Likewise, warns if all of a commit's fileops
1415 become no-ops after re-ordering. Other fileops inconsistencies may
1416 arise from re-ordering, both within the range of affected commits
1417 and beyond; for instance, moving a commit which renames a file
1418 ahead of a commit which references the original name. Such
1419 anomalies can be discovered via manual inspection and repaired with
1420 the add and remove (and possibly path) commands. Warnings can be
1421 suppressed with --quiet.
1422
1423 In addition to adjusting their parent/child relationships,
1424 re-ordering commits also re-orders the underlying events since
1425 ancestors must appear before descendants, and blobs must appear
1426 before commits which reference them. This means that events within
1427 the specified range will have different event numbers after the
1428 operation.
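
The contiguity rule for reorder can be sketched as follows. This is a
minimal illustration, assuming commits are identified by mark strings
and that the repository's commits are available as a simple list; the
function name reorder and its error messages are hypothetical, and the
fileop consistency checks described above are not modelled.

```python
def reorder(commit_sequence, requested):
    """Return the commit sequence after re-ordering, or raise ValueError.

    commit_sequence: all commits (here, mark strings) in repository order.
    requested: the selected commits in the order the user wants them.
    Hypothetical helper; it checks only the contiguity rule.
    """
    if not requested or len(set(requested)) != len(requested):
        raise ValueError("selection must be non-empty with no repeats")
    positions = sorted(commit_sequence.index(c) for c in requested)
    lo, hi = positions[0], positions[-1]
    if positions != list(range(lo, hi + 1)):
        skipped = [c for c in commit_sequence[lo:hi + 1] if c not in requested]
        raise ValueError("range is not contiguous; unaccounted for: %s" % skipped)
    # Splice the requested order into the slot the range occupied.
    return commit_sequence[:lo] + list(requested) + commit_sequence[hi + 1:]
```

With commits :1, :3, :5, :7, :9, :11 in that order, requesting
:7,:5,:9,:3 succeeds, while requesting :7,:9,:3 is rejected because :5
lies inside the range but is not accounted for.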
1429
1430 branch branchname {rename|delete} [arg]
Rename or delete a branch (and any associated resets). First
argument must be an existing branch name; second argument must be
one of the verbs 'rename' or 'delete'. The branchname may use
backslash escapes interpreted by the Python string-escape codec,
such as \s.
1435
1436 For a 'rename', the third argument may be any token that is a
1437 syntactically valid branch name (but not the name of an existing
1438 branch).
1439
For either name, if it does not contain a '/', the prefix
'refs/heads/' is prepended.
1442
1443 tag tagname {create|move|rename|delete} [arg]
1444 Create, move, rename, or delete a tag.
1445
1446 Creation is a special case. First argument is a name, which must
1447 not be an existing tag. Takes a singleton event second argument
1448 which must point to a commit. A tag object pointing to the commit
1449 is created and inserted just after the last tag in the repo (or
1450 just after the last commit if there are no tags). The tagger,
1451 committish, and comment fields are copied from the commit's
1452 committer, mark, and comment fields.
1453
1454 Otherwise, first argument must be an existing tag name; second
1455 argument must be one of the verbs “move”, “rename”, or “delete”.
1456
1457 For a “move”, a third argument must be a singleton selection set.
1458 For a “rename”, the third argument may be any token that is a
1459 syntactically valid tag name (but not the name of an existing tag).
1460 For a “delete”, no third argument is required.
1461
The name portion of a delete may be a regexp wrapped in //; if so,
all objects of the specified type with names matching the regexp
are deleted. This is useful for mass deletion of junk tags such as
CVS branch-root tags.
1466
1467 The tagname may use backslash escapes interpreted by the Python
1468 string-escape codec, such as \s.
1469
The behavior of this command is complex because features which
present as tags may be any of three things: (1) true tag objects;
(2) lightweight tags, which are actually sequences of commits with
a common branchname beginning with “refs/tags” (in this case the
tag is considered to point to the last commit in the sequence); or
(3) reset objects. These may occur in combination; in fact, stream
exporters from systems with annotation tags commonly express each
of these as a true tag object (1) pointing at the tip commit of a
sequence (2) in which the basename of the common branch field is
identical to the tag name. An exporter that generates
lightweight-tagged commit sequences (2) may or may not generate
resets pointing at their tip commits.
1482
1483 This command tries to handle all combinations in a natural way by
1484 doing up to three operations on any true tag, commit sequence, and
1485 reset matching the source name. In a rename, all are renamed
1486 together. In a delete, any matching tag or reset is deleted; then
1487 matching branch fields are changed to match the branch of the
unique descendant of the tagged commit, if there is one. When a tag
1489 is moved, no branch fields are changed and a warning is issued.
1490
1491 Attempts to delete a lightweight tag may fail with the message
1492 “couldn't determine a unique successor”. When this happens, the tag
1493 is on a commit with multiple children that have different branch
1494 labels. There is a hole in the specification of git fast-import
1495 streams that leaves it uncertain how branch labels can be safely
1496 reassigned in this case; rather than do something risky,
1497 reposurgeon throws a recoverable error.
1498
1499 reset resetname {create|move|rename|delete} [arg]
Create, move, rename, or delete a reset. Create is a special case;
it requires a singleton selection which is the associated commit
for the reset, takes as a first argument the name of the reset
(which must not exist), and ends with the keyword create.
1504
1505 In the other modes, the first argument must match an existing reset
1506 name; second argument must be one of the verbs “move”, “rename”, or
1507 “delete”.
1508
1509 The reset name may use backslash escapes interpreted by the Python
1510 string-escape codec, such as \s.
1511
For a “move”, a third argument must be a singleton selection set.
For a “rename”, the third argument may be any token that matches a
syntactically valid reset name (but not the name of an existing
reset). For a “delete”, no third argument is required.
1516
1517 For either name, if it does not contain a “/” the prefix “heads/”
1518 is prepended. If it does not begin with “refs/”, “refs/” is
1519 prepended.
1520
An argument matches a reset's name if it is either the entire
reference (refs/heads/FOO or refs/tags/FOO for some value of FOO),
the basename (e.g. FOO), or a suffix of the form heads/FOO or
tags/FOO. An unqualified basename is assumed to refer to a head.
1525
When a reset is renamed, commit branch fields matching the reset
are renamed with it to match. When a reset is deleted, matching
branch fields are changed to match the branch of the unique
descendant of the tip commit of the associated branch, if there is
one. When a reset is moved, no branch fields are changed.
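
The name-qualification and matching rules for resets can be expressed
compactly. The sketch below is an illustration only; the function names
qualify_reset_name and reset_matches are hypothetical, and the approach
of matching by qualifying the argument first is one convenient way to
get the documented behavior rather than a description of reposurgeon's
internals.

```python
def qualify_reset_name(name):
    """Expand a short reset name to a full reference, per the rules above."""
    if "/" not in name:
        name = "heads/" + name      # a bare basename is taken to be a head
    if not name.startswith("refs/"):
        name = "refs/" + name
    return name


def reset_matches(ref, arg):
    """True if `arg` names the reset whose full reference is `ref`."""
    return qualify_reset_name(arg) == ref
```

Under these rules FOO, heads/FOO, and refs/heads/FOO all match
refs/heads/FOO, while refs/tags/FOO is matched only by tags/FOO or the
full reference.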
1531
1532 debranch source-branch [target-branch]
1533 Takes one or two arguments which must be the names of source and
1534 target branches; if the second (target) argument is omitted it
1535 defaults to refs/heads/master. Any trailing segment of a branch
1536 name is accepted as a synonym for it; thus master is the same as
1537 refs/heads/master. Does not take a selection set.
1538
1539 The history of the source branch is merged into the history of the
1540 target branch, becoming the history of a subdirectory with the name
1541 of the source branch. Any resets of the source branch are removed.
1542
1543 strip [blobs|reduce]
1544 Reduce the selected repository to make it a more tractable test
1545 case. Use this when reporting bugs.
1546
1547 With the modifier 'blobs', replace each blob in the repository with
1548 a small, self-identifying stub, leaving all metadata and DAG
1549 topology intact. This is useful when you are reporting a bug, for
1550 reducing large repositories to test cases of manageable size.
1551
1552 A selection set is effective only with the 'blobs' option,
1553 defaulting to all blobs. The 'reduce' mode always acts on the
1554 entire repository.
1555
1556 With the modifier 'reduce', perform a topological reduction that
1557 throws out uninteresting commits. If a commit has all file
1558 modifications (no deletions or copies or renames) and has exactly
1559 one ancestor and one descendant, then it may be boring. To be fully
1560 boring, it must also not be referred to by any tag or reset.
1561 Interesting commits are not boring, or have a non-boring parent or
1562 non-boring child.
1563
1564 With no modifiers, this command strips blobs.
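
The "boring commit" test used by the 'reduce' modifier can be sketched
as below. This is a minimal illustration under stated assumptions: the
Commit class and its field names are invented for the example, fileops
are represented by their fast-import operation letters, and "one
ancestor and one descendant" is read as exactly one parent and one
child.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Commit:
    """Hypothetical stand-in for a commit event."""
    fileops: list                                   # e.g. ["M", "M", "D"]
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)
    referenced: bool = False                        # target of a tag or reset


def is_boring(c):
    """Only modifications, one parent, one child, no tag or reset."""
    return (all(op == "M" for op in c.fileops)
            and len(c.parents) == 1
            and len(c.children) == 1
            and not c.referenced)


def is_interesting(c):
    """Not boring, or adjacent to a commit that is not boring."""
    return (not is_boring(c)
            or any(not is_boring(p) for p in c.parents)
            or any(not is_boring(k) for k in c.children))
```

In a linear chain of five modification-only commits, only the middle one
fails is_interesting: the root and tip are not boring, and their
neighbors are kept because they touch a non-boring commit.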
1565
1566 ignores [rename] [translate] [defaults]
1567 Intelligent handling of ignore-pattern files. This command fails if
1568 no repository has been selected or no preferred write type has been
1569 set for the repository. It does not take a selection set.
1570
1571 If the rename modifier is present, this command attempts to rename
1572 all ignore-pattern files to whatever is appropriate for the
1573 preferred type - e.g. .gitignore for git, .hgignore for hg, etc.
1574 This option does not cause any translation of the ignore files it
1575 renames.
1576
1577 If the translate modifier is present, syntax translation of each
1578 ignore file is attempted. At present, the only transformation the
1579 code knows is to prepend a 'syntax: glob' header if the preferred
1580 type is hg.
1581
If the defaults modifier is present, the command attempts to
prepend the default ignore patterns of the preferred type to all
ignore files. If no ignore file is created by the first commit,
that commit will be modified to create one containing the defaults.
This command will error out on preferred types that have no default
ignore patterns (git and hg, in particular). It will also error out
when it knows the import tool has already set default patterns.
1589
1590 attribution [selection] {show | set | delete | prepend | append} [args]
1591 Inspect, modify, add, and remove commit and tag attributions.
1592
1593 Attributions upon which to operate are selected in much the same
1594 way as events are selected, as described in SELECTION SYNTAX.
1595 selection is an expression composed of 1-origin
1596 attribution-sequence numbers, '$' for last attribution, '..'
1597 ranges, comma-separated items, '(...)' grouping, set operations '|'
1598 union, '&' intersection, and '~' negation, and function calls
1599 @min(), @max(), @amp(), @pre(), @suc(), @srt(). Attributions can
1600 also be selected by visibility set '=C' for committers, '=A' for
1601 authors, and '=T' for taggers. Finally, /regex/ will attempt to
1602 match the Python regular expression regex against an attribution
1603 name and email address; '/n' limits the match to only the name, and
1604 '/e' to only the email address.
1605
1606 With the exception of show, all actions require an explicit event
1607 selection upon which to operate. Available actions are:
1608
1609 [selection] [show] [>file]
1610 Inspect the selected attributions of the specified events
1611 (commits and tags). The show keyword is optional. If no
1612 attribution selection expression is given, defaults to all
1613 attributions. If no event selection is specified, defaults to
1614 all events. Supports > redirection.
1615
1616 selection set name [email] [date]
1617 selection set [name] email [date]
1618 selection set [name] [email] date
1619 Assign name, email, date to the selected attributions. As a
1620 convenience, if only some fields need to be changed, the others
1621 can be omitted. Arguments name, email, and date can be given in
1622 any order.
1623
1624 [selection] delete
1625 Delete the selected attributions. As a convenience, deletes all
1626 authors if selection is not given. It is an error to delete the
1627 mandatory committer and tagger attributions of commit and tag
1628 events, respectively.
1629
1630 [selection] prepend name [email] [date]
1631 [selection] prepend [name] email [date]
1632 Insert a new attribution before the first attribution named by
1633 selection. The new attribution has the same type (committer,
1634 author, or tagger) as the one before which it is being
1635 inserted. Arguments name, email, and date can be given in any
1636 order.
1637
1638 If name is omitted, an attempt is made to infer it from email
1639 by trying to match email against an existing attribution of the
1640 event, with preference given to the attribution before which
1641 the new attribution is being inserted. Similarly, email is
1642 inferred from an existing matching name. Likewise, for date.
1643
1644 As a convenience, if selection is empty or not specified a new
1645 author is prepended to the author list.
1646
1647 It is presently an error to insert a new committer or tagger
1648 attribution. To change a committer or tagger, use set instead.
1649
1650 [selection] append name [email] [date]
1651 [selection] append [name] email [date]
1652 Insert a new attribution after the last attribution named by
1653 selection. The new attribution has the same type (committer,
1654 author, or tagger) as the one after which it is being inserted.
1655 Arguments name, email, and date can be given in any order.
1656
1657 If name is omitted, an attempt is made to infer it from email
1658 by trying to match email against an existing attribution of the
1659 event, with preference given to the attribution after which the
1660 new attribution is being inserted. Similarly, email is inferred
1661 from an existing matching name. Likewise, for date.
1662
1663 As a convenience, if selection is empty or not specified a new
1664 author is appended to the author list.
1665
1666 It is presently an error to insert a new committer or tagger
1667 attribution. To change a committer or tagger, use set instead.
1668
1669 REFERENCE LIFTING
1670 This group of commands is meant for fixing up references in commits
1671 that are in the format of older version control systems. The general
1672 workflow is this: first, go over the comment history and change all
old-fashioned commit references into machine-parseable cookies. Then,
automatically turn the machine-parseable cookies into action stamps. The
1675 point of dividing the process this way is that the first part is hard
1676 for a machine to get right, while the second part is prone to errors
1677 when a human does it.
1678
A Subversion cookie is a comment substring of the form [[SVN:ddddd]]
(example: [[SVN:2355]]), with the revision read directly via the
Subversion exporter, deduced from git-svn metadata, or matching a
$Revision$ header embedded in blob data for the filename.
1683
A CVS cookie is a comment substring of the form
[[CVS:filename:revision]] (example: [[CVS:src/README:1.23]]), with the
revision matching a CVS $Id$ or $Revision$ header embedded in blob data
for the filename.
1688
A mark cookie is of the form [[:dddd]] and is simply a reference to the
specified mark. You may want to hand-patch this in when one of the
previous forms is inconvenient.
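
The three cookie forms above are regular enough to recognize with a
single pattern. The sketch below is illustrative only; the pattern and
the function name find_cookies are not taken from reposurgeon, and the
CVS branch assumes the filename contains no ':' or ']'.

```python
import re

# One alternative per cookie form described above.
COOKIE = re.compile(
    r"\[\[(?:SVN:(?P<svn>\d+)"
    r"|CVS:(?P<file>[^:\]]+):(?P<rev>[0-9.]+)"
    r"|(?P<mark>:\d+))\]\]"
)


def find_cookies(comment):
    """Return the cookies in a comment, in order of appearance."""
    found = []
    for m in COOKIE.finditer(comment):
        if m.group("svn"):
            found.append(("SVN", m.group("svn")))
        elif m.group("file"):
            found.append(("CVS", m.group("file"), m.group("rev")))
        else:
            found.append(("mark", m.group("mark")))
    return found
```

An old-style reference such as "r2355" is deliberately not matched;
turning it into [[SVN:2355]] is the manual first step of the workflow.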
1692
1693 An action stamp is an RFC3339 timestamp, followed by a '!', followed by
1694 an author email address (author is preferred rather than committer
1695 because that timestamp is not changed when a patch is replayed on to a
1696 branch, but the code to make a stamp for a commit will fall back to the
1697 committer if no author field is present). It attempts to refer to a
1698 commit without being VCS-specific. Thus, instead of "commit 304a53c2"
1699 or "r2355", "2011-10-25T15:11:09Z!fred@foonly.com".
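
An action stamp is simple enough to build and take apart by hand. The
following sketch assumes timestamps are normalized to UTC and written
with a trailing 'Z', as in the example above; the function names
make_stamp and parse_stamp are hypothetical.

```python
from datetime import datetime, timezone

STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def make_stamp(when, email):
    """Build an action stamp from a timezone-aware datetime and an email."""
    return when.astimezone(timezone.utc).strftime(STAMP_FORMAT) + "!" + email


def parse_stamp(stamp):
    """Split an action stamp into (UTC datetime, email)."""
    timestamp, sep, email = stamp.partition("!")
    if not sep:
        raise ValueError("not an action stamp: %r" % stamp)
    when = datetime.strptime(timestamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return when, email
```

Because the timestamp has one-second resolution, two stamps built this
way compare equal whenever the same address is attributed at the same
second.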
1700
The following git aliases allow git to work directly with action
stamps. Append them to your ~/.gitconfig; if you already have an [alias]
section, leave off the first line.
1704
1705
[alias]
    # git stamp <commit-ish> - print a reposurgeon-style action stamp
    stamp = show -s --format='%cI!%ce'

    # git scommit <stamp> <rev-list-args> - list most recent commit that matches <stamp>.
    # Must also specify a branch to search or --all, after these arguments.
    scommit = "!f(){ d=${1%%!*}; a=${1##*!}; arg=\"--until=$d -1\"; if [ $a != $1 ]; then arg=\"$arg --committer=$a\"; fi; shift; git rev-list $arg ${1:+\"$@\"}; }; f"

    # git scommits <stamp> <rev-list-args> - as above, but list all matching commits.
    scommits = "!f(){ d=${1%%!*}; a=${1##*!}; arg=\"--until=$d --after $d\"; if [ $a != $1 ]; then arg=\"$arg --committer=$a\"; fi; shift; git rev-list $arg ${1:+\"$@\"}; }; f"

    # git smaster <stamp> - list most recent commit on master that matches <stamp>.
    smaster = "!f(){ git scommit \"$1\" master --first-parent; }; f"
    smasters = "!f(){ git scommits \"$1\" master --first-parent; }; f"

    # git shs <stamp> - show the commits on master that match <stamp>.
    shs = "!f(){ stamp=$(git smasters $1); shift; git show ${stamp:?not found} $*; }; f"

    # git slog <stamp> <log-args> - start git log at <stamp> on master
    slog = "!f(){ stamp=$(git smaster $1); shift; git log ${stamp:?not found} $*; }; f"

    # git sco <stamp> - check out most recent commit on master that matches <stamp>.
    sco = "!f(){ stamp=$(git smaster $1); shift; git checkout ${stamp:?not found} $*; }; f"
1729
1730
1731 There is a rare case in which an action stamp will not refer uniquely
1732 to one commit. It is theoretically possible that the same author might
1733 check in revisions on different branches within the one-second
1734 resolution of the timestamps in a fast-import stream. There is nothing
1735 to be done about this; tools using action stamps need to be aware of
1736 the possibility and throw a warning when it occurs.
1737
1738 In order to support reference lifting, reposurgeon internally builds a
1739 legacy-reference map that associates revision identifiers in older
version-control systems with commits. The contents of this map come
from three places: (1) cvs2svn:rev properties if the repository was
read from a Subversion dump stream, (2) $Id$ and $Revision$ headers in
repository files, and (3) the .git/cvs-revisions file created by git
cvsimport.
1745
1746 The detailed sequence for lifting possible references is this: first,
1747 find possible CVS and Subversion references with the references or =N
1748 visibility set; then replace them with equivalent cookies; then run
1749 references lift to turn the cookies into action stamps (using the
1750 information in the legacy-reference map) without having to do the
1751 lookup by hand.
1752
1753 references [list|edit|lift] [>outfile]
1754 With the modifier 'list', list commit and tag comments for strings
1755 that might be CVS- or Subversion-style revision identifiers. This
1756 will be useful when you want to replace them with equivalent
1757 cookies that can automatically be translated into VCS-independent
1758 action stamps. This reporting command supports >-redirection. It is
1759 equivalent to '=N list'.
1760
1761 With the modifier 'edit', edit the set where revision IDs are
1762 found. This version of the command supports < and > redirection.
1763 This is equivalent to '=N edit'.
1764
1765 With the modifier "lift", attempt to resolve Subversion and CVS
1766 cookies in comments into action stamps using the legacy map. An
1767 action stamp is a timestamp/email/sequence-number combination
1768 uniquely identifying the commit associated with that blob, as
1769 described in the section called “TRANSLATION STYLE”.
1770
1771 It is not guaranteed that every such reference will be resolved, or
1772 even that any at all will be. Normally all references in history
1773 from a Subversion repository will resolve, but CVS references are
1774 less likely to be resolvable.
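
Conceptually, lifting is a table lookup: each cookie is a key into the
legacy-reference map, and a successful lookup replaces the cookie with
the action stamp of the commit it names. The sketch below illustrates
that idea only. It assumes the map is a plain dictionary keyed by
strings such as "SVN:2355" and holding ready-made stamps, which is a
simplification invented for the example; the function name
lift_references is likewise hypothetical.

```python
import re

LEGACY_COOKIE = re.compile(r"\[\[((?:SVN|CVS):[^\]]+)\]\]")


def lift_references(comment, legacy_map):
    """Replace resolvable cookies with action stamps; leave the rest alone."""
    def resolve(match):
        # Unresolved cookies are kept verbatim so they can be fixed later.
        return legacy_map.get(match.group(1), match.group(0))
    return LEGACY_COOKIE.sub(resolve, comment)
```

Leaving unresolved cookies in place mirrors the caveat above: not every
reference is guaranteed to resolve, and the survivors remain easy to
find.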
1775
1776 CHANGELOGS
1777 CVS and Subversion do not have separated notions of committer and
1778 author for changesets; when lifted to a VCS that does, like git, their
1779 one author field is used for both.
1780
1781 However, if the project used the FSF ChangeLog convention, many
1782 changesets will include a ChangeLog modification listing an author for
1783 the commit. In the common case that the changeset was derived from a
1784 patch and committed by a project maintainer, but the ChangeLog entry
1785 names the actual author, this information can be recovered.
1786
Use the "changelogs" command. This takes neither arguments nor a
selection set. It mines the ChangeLog files for authorship data.
1789
1790 It assumes such files have the basename 'ChangeLog', and that they are
1791 in the format used by FSF projects: entry header lines begin with
1792 YYYY-MM-DD and are followed by a fullname/address. When a ChangeLog
1793 file modification is found in a clique, the entry header at or before
1794 the section changed since its last revision is parsed and the address
1795 is inserted as the commit author.
1796
1797 If the entry header contains an email address but no name, a name will
1798 be filled in if possible by looking for the address in author map
1799 entries.
1800
1801 In accordance with FSF policy for ChangeLogs, any date in an
1802 attribution header is discarded and the committer date is used.
1803 However, if the name is an author-map alias with an associated
1804 timezone, that zone is used.
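
The entry-header format described above can be recognized with a short
regular expression. The following is a sketch under assumptions, not the
parser reposurgeon uses: the pattern, the function name
attribution_from_header, and the names_by_email dictionary (standing in
for the author map) are all invented for illustration.

```python
import re

# YYYY-MM-DD, then an optional full name, then <address>.
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.*?)\s*<([^>]+)>\s*$")


def attribution_from_header(line, names_by_email):
    """Return (name, email) for a ChangeLog entry header, else None."""
    m = HEADER.match(line)
    if m is None:
        return None
    _date, name, email = m.groups()   # the header date is discarded
    if not name:
        # Fall back to a known name for this address, if there is one.
        name = names_by_email.get(email, "")
    return name, email
```

Body lines of a ChangeLog entry, which begin with whitespace rather than
a date, do not match and yield None.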
1805
1806 The command reports statistics on how many commits were altered.
1807
1808 RELEASE TARBALLS
1809 When converting a legacy repository, it sometimes happens that there
1810 are archived releases of the project surviving from before the date of
1811 the repository's initial commit. It may be desirable to insert those
1812 releases at the front of the repository history.
1813
To do this, use the "incorporate" command. This command takes a
single argument naming a tarball, the content of which is to be
inserted as a commit. It may be a gzipped or bzipped tarball. The
1817 initial segment of each path is assumed to be a version directory and
1818 stripped off. The number of segments stripped off can be set with the
1819 option --strip=n, n defaulting to 1.
1820
1821 Takes a singleton selection set. Normally inserts before that commit;
1822 with the option --after, insert after it. The default selection set is
1823 the very first commit of the repository.
1824
1825 The option --date can be used to set the commit date. It takes an
1826 argument, which is expected to be an RFC3339 timestamp.
1827
1828 The generated commit has a committer field (the invoking user) and gets
1829 as its commit date the modification time of the newest file in the
1830 tarball (not the mod time of the tarball itself). No author field is
1831 generated. A comment recording the tarball name is generated.
1832
1833 Note that the import stream generated by this command is - while
1834 correct - not optimal, and may in particular contain duplicate blobs.
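
Two details of the description above, stripping leading path segments
and dating the commit by the newest file in the tarball, can be
illustrated with Python's standard tarfile module. This is a sketch
only; strip_segments, tarball_summary, and demo_tarball are hypothetical
names, and demo_tarball exists purely to make the example
self-contained.

```python
import io
import tarfile


def strip_segments(path, n=1):
    """Drop the first n '/'-separated segments of a tarball member path."""
    return "/".join(path.split("/")[n:])


def tarball_summary(fileobj, strip=1):
    """Return (stripped file paths, newest member mtime) for a tarball."""
    # Mode "r:*" lets tarfile detect gzip, bzip2, or no compression.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        files = [m for m in tar.getmembers() if m.isfile()]
    paths = [strip_segments(m.name, strip) for m in files]
    newest = max((m.mtime for m in files), default=None)
    return paths, newest


def demo_tarball():
    """Build a tiny gzipped tarball in memory, purely for demonstration."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, mtime in (("foo-1.0/README", 100), ("foo-1.0/src/main.c", 200)):
            info = tarfile.TarInfo(name)
            info.size = 1
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(b"x"))
    buf.seek(0)
    return buf
```

The newest member's modification time, not the tarball's own, is what
the sketch reports, in line with the commit-date rule above.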
1835
1836 VARIABLES, MACROS AND EXTENSIONS
1837 Occasionally you will need to issue a large number of complex surgical
1838 commands of very similar form, and it's convenient to be able to
1839 package that form so you don't need to do a lot of error-prone typing.
1840 For those occasions, reposurgeon supports simple forms of named
1841 variables and macro expansion.
1842
1843 assign [name]
1844 Compute a leading selection set and assign it to a symbolic name.
1845 It is an error to assign to a name that is already assigned, or to
1846 any existing branch name. Assignments may be cleared by sequence
1847 mutations (though not ordinary deletions); you will see a warning
1848 when this occurs.
1849
With no selection set and no name, list all assignments.
1851
1852 If the option --singleton is given, the assignment will throw an
1853 error if the selection set is not a singleton.
1854
1855 Use this to optimize out location and selection computations that
1856 would otherwise be performed repeatedly, e.g. in macro calls.
1857
1858 unassign name
1859 Unassign a symbolic name. Throws an error if the name is not
1860 assigned.
1861
1862 names [>outfile]
1863 List the names of all known branches and tags. Tells you what
1864 things are legal within angle brackets and parentheses.
1865
1866 define name body
1867 Define a macro. The first whitespace-separated token is the name;
1868 the remainder of the line is the body, unless it is “{”, which
1869 begins a multi-line macro terminated by a line beginning with “}”.
1870
1871 A later “do” call can invoke this macro.
1872
1873 The command “define” by itself without a name or body produces a
1874 macro list.
1875
1876 do name arguments...
1877 Expand and perform a macro. The first whitespace-separated token is
1878 the name of the macro to be called; remaining tokens replace {0},
1879 {1}... in the macro definition. Tokens may contain whitespace if
1880 they are string-quoted; string quotes are stripped. Macros can call
1881 macros.
1882
1883 If the macro expansion does not itself begin with a selection set,
1884 whatever set was specified before the "do" keyword is available to
1885 the command generated by the expansion.
1886
1887 undefine name
1888 Undefine the named macro.
1889
1890 Here's an example to illustrate how you might use this. In CVS
1891 repositories of projects that use the GNU ChangeLog convention, a very
1892 common pre-conversion artifact is a commit with the comment "*** empty
1893 log message ***" that modifies only a ChangeLog entry explaining the
1894 commit immediately previous to it. The following
1895
define changelog <{0}> & /empty log message/ squash --pushback
do changelog 2012-08-14T21:51:35Z
do changelog 2012-08-08T22:52:14Z
do changelog 2012-08-07T04:48:26Z
do changelog 2012-08-08T07:19:09Z
do changelog 2012-07-28T18:40:10Z
1902
1903 is equivalent to the more verbose
1904
<2012-08-14T21:51:35Z> & /empty log message/ squash --pushback
<2012-08-08T22:52:14Z> & /empty log message/ squash --pushback
<2012-08-07T04:48:26Z> & /empty log message/ squash --pushback
<2012-08-08T07:19:09Z> & /empty log message/ squash --pushback
<2012-07-28T18:40:10Z> & /empty log message/ squash --pushback
1910
1911 but you are less likely to make difficult-to-notice errors typing the
1912 first version.
1913
1914 (Also note how the text regexp acts as a failsafe against the
1915 possibility of typing a wrong date that doesn't refer to a commit with
1916 an empty comment. This was a real-world example from the CVS-to-git
1917 conversion of groff.)
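
The substitution that "do" performs can be modelled in a few lines of
Python. This is a rough sketch rather than reposurgeon's implementation:
the function name expand_macro is hypothetical, shell-style quoting via
shlex is an assumption about how string-quoted tokens are split, and
leaving an unmatched placeholder untouched is a choice made for the
example.

```python
import re
import shlex


def expand_macro(body, call_args):
    """Replace {0}, {1}, ... in a macro body with tokens from call_args."""
    tokens = shlex.split(call_args)      # quotes group words and are stripped

    def substitute(match):
        index = int(match.group(1))
        # Placeholders with no corresponding token are left as they are.
        return tokens[index] if index < len(tokens) else match.group(0)

    # A single pass, so a token that itself contains "{1}" is not re-expanded.
    return re.sub(r"\{(\d+)\}", substitute, body)
```

Applied to the changelog macro above with the argument
2012-08-14T21:51:35Z, this produces the first line of the verbose form.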
1918
1919 script filename [arg...]
1920 Takes a filename and optional following arguments. Reads each line
1921 from the file and executes it as a command.
1922
1923 During execution of the script, the script name replaces the string
1924 $0 and the optional following arguments (if any) replace the
1925 strings $1, $2 ... $n in the script text. This is done before
1926 tokenization, so the $1 in a string like “foo$1bar” will be
1927 expanded. Additionally, $$ is expanded to the current process ID
1928 (which may be useful for scripts that use tempfiles).
1929
1930 Within scripts (and only within scripts) reposurgeon accepts a
1931 slightly extended syntax: First, a backslash ending a line signals
1932 that the command continues on the next line. Any number of
1933 consecutive lines thus escaped are concatenated, without the ending
1934 backslashes, prior to evaluation. Second, a command that takes an
1935 input filename argument can instead take literal following data in
1936 the syntax of a shell here-document. That is: if the filename is
1937 replaced by "<<EOF", all following lines in the script up to a
1938 terminating line consisting only of "EOF" will be read, placed in a
1939 temporary file, and that file fed to the command and afterwards
1940 deleted. EOF may be replaced by any string. Backslashes have no
1941 special meaning while reading a here-document.
1942
Scripts may have comments. Any line beginning with a '#' is
ignored. If a line has a trailing portion that begins with one or
more whitespace characters followed by '#', that trailing portion
is ignored.
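
The preprocessing steps described for scripts (argument substitution,
line continuation, and comment stripping) can be sketched as follows.
This is an approximation for illustration: the function name
preprocess_script is hypothetical, the order in which the steps are
applied is an assumption, references to arguments that were not supplied
are simply left in place, and here-documents are not handled.

```python
import os
import re


def preprocess_script(text, script_name, args):
    """Return the list of command lines a script would yield."""
    def substitute(match):
        token = match.group(1)
        if token == "$":
            return str(os.getpid())          # $$ is the process ID
        index = int(token)
        if index == 0:
            return script_name               # $0 is the script name
        return args[index - 1] if index <= len(args) else match.group(0)

    # Substitution happens on raw text, before any tokenization.
    text = re.sub(r"\$(\$|\d+)", substitute, text)

    commands, pending = [], ""
    for line in text.splitlines():
        if line.endswith("\\"):              # continuation: join with next line
            pending += line[:-1]
            continue
        line, pending = pending + line, ""
        if line.startswith("#"):             # whole-line comment
            continue
        line = re.sub(r"\s+#.*$", "", line)  # trailing comment
        if line.strip():
            commands.append(line)
    if pending.strip():
        commands.append(pending)
    return commands
```

Because substitution is textual, a string such as out.$2 expands in
place, which is the behavior the foo$1bar example above describes.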
1947
1948 ARTIFACT REMOVAL
1949 Some commands automate fixing various kinds of artifacts associated
1950 with repository conversions from older systems.
1951
1952 authors [read|write] [<filename] [>filename]
1953 Apply or dump author-map information for the specified selection
1954 set, defaulting to all events.
1955
1956 Lifts from CVS and Subversion may have only usernames local to the
1957 repository host in committer and author IDs. DVCSes want email
1958 addresses (net-wide identifiers) and complete names. To supply the
1959 map from one to the other, an authors file is expected to consist
1960 of lines each beginning with a local user ID, followed by a '='
1961 (possibly surrounded by whitespace) followed by a full name and
1962 email address, optionally followed by a timezone offset field.
1963 Thus:
1964
ferd = Ferd J. Foonly <foonly@foo.com> America/New_York
1966
1967 An authors file may also contain lines of this form
1968
+ Ferd J. Foonly <foonly@foobar.com> America/Los_Angeles
1970
These are interpreted as aliases for the last preceding '=' entry
that may appear in ChangeLog files. When such an alias is matched
on a ChangeLog attribution line, the author attribution for the
commit is mapped to the basename, but the timezone is used as is.
This accommodates people with past addresses (possibly at different
locations), unifying such aliases in metadata so searches and
statistical aggregation will work better.
1978
1979 An authors file may have comment lines beginning with '#'; these
1980 are ignored.
1981
1982 When an authors file is applied, email addresses in committer and
1983 author metadata for which the local ID matches between < and @ are
1984 replaced according to the mapping (this handles git-svn lifts).
1985 Alternatively, if the local ID is the entire address, this is also
1986 considered a match (this handles what git-cvsimport and cvs2git
1987 do). If a timezone was specified in the map entry, that person's
1988 author and committer dates are mapped to it.
1989
1990 With the 'read' modifier, or no modifier, apply author mapping data
1991 (from standard input or a <-redirected file). May be useful if you
1992 are editing a repo or dump created by cvs2git or by git-svn invoked
1993 without -A.
1994
With the 'write' modifier, write a mapping file that could be
interpreted by authors read, with entries for each unique
committer, author, and tagger (to standard output or a >-redirected
mapping file). This may be helpful as a start on building an
authors file, though each part to the right of an equals sign will
need editing.
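
The authors-file format described above is easy to parse. The sketch
below shows one way to do so and to apply the local-ID matching rule; it
is an illustration under assumptions, not reposurgeon's parser. The
function names parse_authors and map_address, the regular expressions,
and the returned data layout are all invented for the example.

```python
import re

ENTRY = re.compile(r"^(\S+)\s*=\s*(.+?)\s*<([^>]+)>\s*(\S+)?\s*$")
ALIAS = re.compile(r"^\+\s*(.+?)\s*<([^>]+)>\s*(\S+)?\s*$")


def parse_authors(text):
    """Return (authors, aliases) parsed from authors-file text.

    authors: local ID -> (full name, email, timezone or None)
    aliases: (name, email) -> (local ID of preceding '=' entry, timezone or None)
    """
    authors, aliases, last_id = {}, {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):       # blank or comment line
            continue
        alias = ALIAS.match(line)
        if alias and last_id is not None:
            aliases[(alias.group(1), alias.group(2))] = (last_id, alias.group(3))
            continue
        entry = ENTRY.match(line)
        if entry is None:
            raise ValueError("malformed authors line: %r" % line)
        last_id = entry.group(1)
        authors[last_id] = (entry.group(2), entry.group(3), entry.group(4))
    return authors, aliases


def map_address(address, authors):
    """Map 'ferd@host' or bare 'ferd' to (full name, email), else None."""
    local_id = address.split("@", 1)[0]
    entry = authors.get(local_id)
    return None if entry is None else entry[:2]
```

Taking the part before the '@' covers both matching cases described
above: an address whose local part is the ID, and an address that
consists of the ID alone.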
2001
2002 branchify [path-set]
2003 Specify the list of directories to be treated as potential branches
2004 (to become tags if there are no modifications after the creation
2005 copies) when analyzing a Subversion repo. This list is ignored when
2006 the --nobranch read option is used. It defaults to the 'standard
2007 layout' set of directories, plus any unrecognized directories in
2008 the repository root.
2009
2010 With no arguments, displays the current branchification set.
2011
2012 An asterisk at the end of a path in the set means 'all immediate
2013 subdirectories of this path, unless they are part of another
2014 (longer) path in the branchify set'.
2015
2016 Note that the branchify set is a property of the reposurgeon
2017 interpreter, not of any individual repository, and will persist
2018 across Subversion dumpfile reads. This may lead to unexpected
2019 results if you forget to re-set it.
2020
2021 branchify_map [/regex/branch/...]
2022 Specify the list of regular expressions used for mapping the svn
2023 branches that are detected by branchify. If none of the expressions
2024 match, the default behaviour applies. This maps a branch to the name
2025 of the last directory, except for trunk and “*” which are mapped to
2026 master and root.
2027
2028 With no arguments the current regex replacement pairs are shown.
2029 Passing 'reset' will clear the mapping.
2030
2031 The branchify command will match each branch name against regex1
2032 and, if it matches, rewrite its branch name to branch1. If not, it
2033 will try regex2 and so forth until it either finds a matching regex
2034 or there are no regexes left. The regular expressions should be in
2035 Python's[2] format. The branch name can use backreferences (see
2036 the re.sub function in the Python documentation).
2037
2038 Note that the regular expressions are appended to 'refs/' without
2039 either the needed 'heads/' or 'tags/'. This allows for choosing the
2040 right kind of branch type.
2041
2042 While the syntax template above uses slashes, any first character
2043 will be used as a delimiter (and you will need to use a different
2044 one in the common case that the paths contain slashes).
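
The first-match-wins rewriting described above might look like this
in Python. It is a sketch under assumptions, not the program's actual
implementation: parse_spec and map_branch are hypothetical names, the
branch names fed to it ("branches/dev" and so on) are guesses at what
gets matched, and re.subn is used here simply because it reports
whether a substitution happened.

```python
import re


def parse_spec(spec):
    """Split a '/regex/branch/' argument; the first character is the delimiter."""
    fields = spec.split(spec[0])
    if len(fields) != 4 or fields[0] or fields[3]:
        raise ValueError("malformed mapping: %r" % spec)
    return fields[1], fields[2]


def map_branch(name, pairs):
    """Return 'refs/' plus the rewritten name for the first regex that matches.

    Returns None when nothing matches, meaning the default naming applies.
    """
    for regex, replacement in pairs:
        result, count = re.subn(regex, replacement, name)
        if count:
            return "refs/" + result
    return None


# A ':' delimiter is used because the paths themselves contain slashes.
pairs = [
    parse_spec(r":^branches/release-(.*)$:tags/release-\1:"),
    parse_spec(r":^branches/(.*)$:heads/\1:"),
]
print(map_branch("branches/release-1.0", pairs))  # refs/tags/release-1.0
print(map_branch("branches/dev", pairs))          # refs/heads/dev
print(map_branch("trunk", pairs))                 # None
```

Note how the replacement text supplies 'heads/' or 'tags/' itself,
which is what lets a mapping choose the kind of reference.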
2045
2046 You must give this command before the Subversion repository read it
2047 is supposed to affect!
2048
2049 Note that the branchify_map set is a property of the reposurgeon
2050 interpreter, not of any individual repository, and will persist
2051 across Subversion dumpfile or repository reads. This may lead to
2052 unexpected results if you forget to re-set it.
2053
2054 EXAMINING TREE STATES
2055 manifest [regular expression] [>outfile]
2056 Takes an optional selection set argument defaulting to all commits,
2057 and an optional Python regular expression. For each commit in the
2058 selection set, print the mapping of all paths in that commit tree
2059 to the corresponding blob marks, mirroring what files would be
2060 created in a checkout of the commit. If a regular expression is
2061 given, only print "path -> mark" lines for paths matching it. This
2062 command supports > redirection.
2063
2064 checkout directory
2065 Takes a selection set which must resolve to a single commit, and a
2066 second argument. The second argument is interpreted as a directory
2067 name. The state of the code tree at that commit is materialized
2068 beneath the directory.
2069
2070 diff [>outfile]
2071 Display the difference between commits. Takes a selection-set
2072 argument which must resolve to exactly two commits. Supports output
2073 redirection.
2074
2075 HOUSEKEEPING
2076 These are backed up by the following housekeeping commands, none of
2077 which take a selection set:
2078
2079 help
2080 Get help on the interpreter commands. Optionally follow with
2081 whitespace and a command name; with no argument, lists all
2082 commands. '?' also invokes this.
2083
2084 shell
2085 Execute the shell command given in the remainder of the line. '!'
2086 also invokes this.
2087
2088 prefer [repotype]
2089 With no arguments, describe capabilities of all supported systems.
2090 With an argument (which must be the name of a supported system)
2091 this has two effects:
2092
2093 First, if there are multiple repositories in a directory you do a
2094 read on, reposurgeon will read the preferred one (otherwise it will
2095 complain that it can't choose among them).
2096
2097 Secondly, this will change reposurgeon's preferred type for output.
2098 This means that when you do a write to a directory, it will build a repo
2099 of the preferred type rather than its original type (if it had
2100 one).
2101
2102 If no preferred type has been explicitly selected, reading in a
2103 repository (but not a fast-import stream) will implicitly set the
2104 preferred type to the type of that repository.
2105
2106 In older versions of reposurgeon this command changed the type of
2107 the selected repository, if there is one. That behavior interacted
2108 badly with attempts to interpret legacy IDs and has been removed.
2109
2110 sourcetype [repotype]
2111 Report (with no arguments) or select (with one argument) the
2112 current repository's source type. This type is normally set at
2113 repository-read time, but may remain unset if the source was a
2114 stream file.
2115
2116 The source type affects the interpretation of legacy IDs (for
2117 purposes of the =N visibility set and the 'references' command) by
2118 controlling the regular expressions used to recognize them. If no
2119 preferred output type has been set, it may also change the output
2120 format of stream files made from the repository.
2121
2122 The source type is reliably set whenever a live repository is read,
2123 or when a Subversion stream or Fossil dump is interpreted, but not
2124 necessarily by other stream files. Streams generated by
2125 cvs-fast-export(1) using the --reposurgeon option are detected as CVS.
2126 In some other cases, the source system is detected from the presence
2127 of magic $-headers in contents blobs.
2128
2129 INSTRUMENTATION
2130 A few commands have been implemented primarily for debugging and
2131 regression-testing purposes, but may be useful in unusual
2132 circumstances.
2133
2134 The output of most of these commands can individually be redirected to
2135 a named output file. Where indicated in the syntax, you can prefix the
2136 output filename with “>” and give it as a following argument.
2137
2138 index [>outfile]
2139 Display four columns of info on objects in the selection set: their
2140 number, their type, the associated mark (or '-' if no mark) and a
2141 summary field varying by type. For a branch or tag it's the
2142 reference; for a commit it's the commit branch; for a blob it's the
2143 repository path of the file in the blob.
2144
2145 The default selection set for this command is =CTRU, all objects
2146 except blobs.
2147
2148 resolve [label-text...]
2149 Does nothing but resolve a selection-set expression and echo the
2150 resulting event-number set to standard output. The remainder of the
2151 line after the command is used as a label for the output.
2152
2153 Implemented mainly for regression testing, but may be useful for
2154 exploring the selection-set language.
2155
2156 attribution selection resolve [>outfile] [label-text...]
2157 Does nothing but resolve an attribution selection-set expression
2158 for the selected events and echo the resulting attribution-number
2159 set to standard output. The remainder of the line after the command
2160 is used as a label for the output.
2161
2162 Implemented mainly for regression testing, but may be useful for
2163 exploring the selection-set language.
2164
2165 verbose [n]
2166 'verbose 1' enables the progress meter and messages, 'verbose 0'
2167 disables them. Higher levels of verbosity are available but
2168 intended for developers only.
2169
2170 quiet [on | off]
2171 Without an argument, this command requests a report of the quiet
2172 boolean; with the argument 'on' or 'off' it is changed. When quiet
2173 is on, time-varying report fields which would otherwise cause
2174 spurious failures in regression testing are suppressed.
2175
2176 relax
2177 Normally, a command error aborts the execution of an enclosing
2178 script. The relax command suppresses this behavior. It is useful
2179 when writing regression tests that exercise failure cases.
2180
2181 print output-text...
2182 Does nothing but ship its argument line to standard output. Useful
2183 in regression tests.
2184
2185 echo [number]
2186 'echo 1' causes each reposurgeon command to be echoed to standard
2187 output just before its output. This can be useful in constructing
2188 regression tests that are easily checked by eyeball.
2189
2190 version [version...]
2191 With no argument, display the program version and the list of VCSes
2192 directly supported. With argument, declare the major version
2193 (single digit) or full version (major.minor) under which the
2194 enclosing script was developed. The program will error out if the
2195 major version has changed (which means the surgical language is not
2196 backwards compatible).
2197
2198 It is good practice to start your lift script with a version
2199 requirement, especially if you are going to archive it for later
2200 reference.
2201
2202 prompt [format...]
2203 Set the command prompt format to the value of the command line;
2204 with an empty command line, display it. The prompt format is
2205 evaluated in Python after each command with the following
2206 dictionary substitutions:
2207
2208 chosen
2209 The name of the selected repository, or None if none is
2210 currently selected.
2211
2212 Thus, one useful format might be 'rs[%(chosen)s]%% '.
2213
2214 More format items may be added in the future. The default prompt
2215 corresponds to the format 'reposurgeon%% '. The format line is
2216 evaluated with shell quoting of tokens, so that spaces can be
2217 included.
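
The substitution is ordinary Python %-formatting against a
dictionary. A minimal sketch follows; render_prompt is a hypothetical
helper name, and only the 'chosen' key comes from the description
above.

```python
def render_prompt(fmt, chosen=None):
    """Expand a prompt format using %-style dictionary substitution."""
    return fmt % {"chosen": chosen}


print(repr(render_prompt("rs[%(chosen)s]%% ", "myrepo")))  # 'rs[myrepo]% '
print(repr(render_prompt("reposurgeon%% ")))               # 'reposurgeon% '
```

The doubled %% is needed because a single % would be read as the
start of a substitution.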
2218
2219 history
2220 List the commands you have entered this session.
2221
2222 legacy [read|write] [<filename] [>filename]
2223 Apply or list legacy-reference information. Does not take a
2224 selection set. The 'read' variant reads from standard input or a
2225 <-redirected filename; the 'write' variant writes to standard
2226 output or a >-redirected filename.
2227
2228 A legacy-reference file maps reference cookies to (committer,
2229 commit-date, sequence-number) triples; these in turn (should)
2230 uniquely identify a commit. The format is two whitespace-separated
2231 fields: the cookie followed by an action stamp identifying the
2232 commit.
2233
2234 It should not normally be necessary to use this command. The legacy
2235 map is automatically preserved through repository reads and
2236 rebuilds, being stored in the file legacy-map under the repository
2237 subdirectory.
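
As an illustration of the two-field format, here is a small Python
sketch that splits one line of such a file. The layout assumed for
the action stamp (timestamp, '!', committer address, then an optional
':' and sequence number) and the "SVN:2317" cookie are assumptions
made for the example; parse_legacy_line is a hypothetical name.

```python
def parse_legacy_line(line):
    """Split one legacy-map line into (cookie, date, committer, sequence)."""
    cookie, stamp = line.split()
    date, _, rest = stamp.partition("!")
    committer, sep, seq = rest.partition(":")
    # The sequence number is optional; None means it was not given.
    return cookie, date, committer, int(seq) if sep else None


print(parse_legacy_line("SVN:2317 2011-10-24T13:05:07Z!esr@thyrsus.com"))
print(parse_legacy_line("SVN:2318 2011-10-24T13:05:07Z!esr@thyrsus.com:2"))
```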
2238
2239 set [option]
2240 Turn on an option flag. With no arguments, list all options.
2241
2242 Most options are described in conjunction with the specific
2243 operations that they modify. One of general interest is
2244 “compressblobs”; this enables compression on the blob files in the
2245 internal representation reposurgeon uses for editing repositories.
2246 With this option, reading and writing of repositories is slower,
2247 but editing a repository requires less (sometimes much less) disk
2248 space.
2249
2250 clear [option]
2251 Turn off an option flag. With no arguments, list all options.
2252
2253 profile
2254 Enable profiling. Profile statistics are dumped to the path given
2255 as argument. Must be one of the initial command-line arguments, and
2256 gathers statistics only on code executed via '-'.
2257
2258 timing
2259 Display statistics on phase timing and memory usage in repository
2260 analysis. Mainly of interest to developers trying to speed up the
2261 program.
2262
2263 exit
2264 Exit, reporting the time. Included here because, while EOT will
2265 also cleanly exit the interpreter, this command reports elapsed
2266 time since start.
2267
2269 reposurgeon uses a built-in extractor class to perform extractions from
2270 Mercurial repositories.
2271
2272 Mercurial branches are exported as branches in the exported repository
2273 and tags are exported as tags. By default, bookmarks are ignored. You
2274 can specify explicit handling for bookmarks by setting
2275 reposurgeon.bookmarks in your .hg/hgrc. Set the value to the prefix
2276 that reposurgeon should use for bookmarks.
2277
2278 For example, if your bookmarks represent branches, put this at the
2279 bottom of your .hg/hgrc:
2280
2281 [reposurgeon]
2282 bookmarks=heads/
2283
2284 If you do that, it's your responsibility to ensure that branch names do
2285 not conflict with bookmark names. You can add a prefix like
2286 bookmarks=heads/feature- to disambiguate as necessary.
2287
2289 reposurgeon can read Subversion dumpfiles or edit a Subversion
2290 repository (and you must point it at a repository, not a checkout
2291 directory).
2292
2293 READING SUBVERSION REPOSITORIES
2294 Certain optional modifiers on the read command change its behavior when
2295 reading Subversion repositories:
2296
2297 --nobranch
2298 Suppress branch analysis.
2299
2300 --preserve
2301 Never discard metadata. In particular, preserve branch-creation
2302 commits (and their metadata) in full rather than turning commits
2303 for empty branches into bare gitspace resets. Also, preserve
2304 branches and tags with following tip deletes rather than nuking
2305 them; the tip deletes become tags.
2306
2307 --ignore-properties
2308 Suppress read-time warnings about discarded property settings.
2309
2310 --user-ignores
2311 Don't generate .gitignore files from svn:ignore properties.
2312 Instead, just pass through .gitignore files found in the history.
2313
2314 --use-uuid
2315 If the --use-uuid read option is set, the repository's UUID will be
2316 used as the hostname when faking up email addresses, a la git-svn.
2317 Otherwise, addresses will be generated the way git-cvsimport does
2318 it, simply copying the username into the address field.
2319
2320 --noignores
2321 Do not fill in an equivalent of default Subversion ignore patterns.
2322
2323 These modifiers can go anywhere in any order on the read command line
2324 after the read verb. They must be whitespace-separated.
2325
2326 It is also possible to embed a magic comment in a Subversion stream
2327 file to set these options. Prefix a space-separated list of them with
2328 the magic comment " # reposurgeon-read-options:"; the leading space is
2329 required. This may be useful when synthesizing test loads; in
2330 particular, a stream file that does not set up a standard
2331 trunk/branches/tags directory layout can use this to perform a mapping
2332 of all commits onto the master branch that the git importer will
2333 accept.
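
Recognizing such a line is straightforward. The Python sketch below
is illustrative only; read_options is a hypothetical name, and where
in the stream the comment may appear is not something this example
settles.

```python
MAGIC = " # reposurgeon-read-options:"


def read_options(line):
    """Return the option list from a magic comment line, or [] otherwise."""
    if not line.startswith(MAGIC):
        return []  # includes the case where the leading space is missing
    return line[len(MAGIC):].split()


print(read_options(" # reposurgeon-read-options: --nobranch --preserve\n"))
print(read_options("# reposurgeon-read-options: --nobranch\n"))
```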
2334
2335 Here are the rules used for mapping subdirectories in a Subversion
2336 repository to branches:
2337
2338 1. At any given time there is a set of eligible paths and path
2339 wildcards which declare potential branches. See the documentation
2340 of the branchify command for how to alter this set, which initially
2341 consists of {trunk, tags/*, branches/*, and '*'}.
2342
2343 2. A repository is considered "flat" if it has no directory that
2344 matches a path or path wildcard in the branchify set. All commits
2345 in a flat repository are assigned to branch master, and what would
2346 have been branch structure becomes directory structure. In this
2347 case, we're done; all the other rules apply to non-flat repos.
2348
2349 If you give the option --nobranch when reading a Subversion
2350 repository, branch analysis is skipped and the repository is
2351 treated as though flat (left as a linear sequence of commits on
2352 refs/heads/master). This may be useful if your repository
2353 configuration is highly unusual and you need to do your own branch
2354 surgery. Note that this option will disable partitioning of mixed
2355 commits.
2356
2357 3. If "trunk" is eligible, it always becomes the master branch.
2358
2359 4. If an element of the branchify set ends with *, each immediate
2360 subdirectory of it is considered a potential branch. If '*' is in
2361 the branchify set (which is true by default) all top-level
2362 directories other than /trunk, /tags, and /branches are also
2363 considered potential branches.
2364
2365 5. Files in the top-level directory are assigned to a synthetic branch
2366 named 'root'.
2367
2368 6. Each potential branch is checked to see if it has commits on it
2369 after the initial creation or copy. If there are such commits, it
2370 becomes a branch. If not, it may become a tag in order to preserve
2371 the commit metadata (see the description of the --preserve option
2372 below). In all cases, the name of any created tag or branch is the
2373 basename of the directory.
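
The naming side of these rules can be sketched in Python as follows.
This is a simplified model under stated assumptions, not the branch
analyzer itself: it only maps a file path to a name, it does not
decide whether a potential branch ends up as a branch or a tag, it
does not treat files sitting directly in a container directory such
as tags/ specially, and branch_dir and branch_name are hypothetical
names.

```python
DEFAULT_BRANCHIFY = ("trunk", "tags/*", "branches/*", "*")


def branch_dir(path, branchify=DEFAULT_BRANCHIFY):
    """Return the potential-branch directory containing a file path, or None."""
    parts = path.split("/")
    entries = [entry.split("/") for entry in branchify]
    # Try longer entries first so that "tags/*" takes precedence over "*".
    for pattern in sorted(entries, key=len, reverse=True):
        if len(parts) <= len(pattern):
            continue  # the path has no file beneath a directory this deep
        candidate = parts[:len(pattern)]
        if not all(p in ("*", c) for p, c in zip(pattern, candidate)):
            continue
        # A directory that is the container of a longer entry (such as
        # "tags" for "tags/*") is not itself a branch.
        if any(len(e) > len(candidate) and e[:len(candidate)] == candidate
               for e in entries):
            continue
        return "/".join(candidate)
    return None


def branch_name(path, branchify=DEFAULT_BRANCHIFY):
    """Map a file path to a branch name under the default naming rules."""
    directory = branch_dir(path, branchify)
    if directory is None:
        return "root"                 # rule 5: top-level files
    if directory == "trunk":
        return "master"               # rule 3
    return directory.split("/")[-1]   # rule 6: basename of the directory


for path in ("trunk/a.c", "branches/foo/src/a.c", "vendor/lib/x.c", "README"):
    print(path, "->", branch_name(path))
```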
2374
2375 Branch-creation operations with no following commits are treated
2376 differently depending on whether or not the --preserve option is on. If
2377 it is off (the default) the branch creation becomes an empty gitspace
2378 branch represented by a reset operation; any comment on the commit is
2379 issued with a warning. If --preserve is on, the comment metadata is
2380 preserved in an empty commit attached to the branchpoint.
2381
2382 Otherwise, each commit that only creates or deletes directories (in
2383 particular, copy commits for tags and branches, and commits that only
2384 change properties) will be transformed into a tag named after the tag
2385 or branch, containing the date/author/comment metadata from the commit.
2386
2387 Subversion branch deletions are turned into deletealls, clearing the
2388 fileset of the import-stream branch. When a branch finishes with a
2389 deleteall at its tip, the deleteall is transformed into a tag. This
2390 rule cleans up after aborted branch renames.
2391
2392 Occasionally (and usually by mistake) a branchy Subversion repository
2393 will contain revisions that touch multiple branches. These are handled
2394 by partitioning them into multiple import-stream commits, one on each
2395 affected branch. The Legacy-ID of such a split commit will have a
2396 pseudo-decimal part - for example, if Subversion revision 2317 touches
2397 three branches, the three generated commits will have IDs 2317.1,
2398 2317.2, and 2317.3.
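
The numbering scheme amounts to the following; split_legacy_ids is a
hypothetical helper written only to illustrate it.

```python
def split_legacy_ids(revision, branch_count):
    """Return the legacy IDs for a revision partitioned across branches."""
    if branch_count <= 1:
        return [str(revision)]  # an unsplit commit keeps its plain number
    return ["%d.%d" % (revision, n) for n in range(1, branch_count + 1)]


print(split_legacy_ids(2317, 3))  # ['2317.1', '2317.2', '2317.3']
```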
2399
2400 The svn:executable and svn:special properties are translated into
2401 permission settings in the input stream; svn:executable becomes 100755
2402 and svn:special becomes 120000 (indicating a symlink; the blob contents
2403 will be the path to which the symlink should resolve).
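
In Python terms the translation is roughly as below. The fallback
mode 100644 is the usual import-stream mode for an ordinary file;
letting svn:special take precedence when both properties are present
is an assumption of this sketch, as is the name import_stream_mode.

```python
def import_stream_mode(properties):
    """Return the import-stream mode string for a node's Subversion properties."""
    if "svn:special" in properties:
        return "120000"   # symlink; blob contents hold the link target
    if "svn:executable" in properties:
        return "100755"
    return "100644"       # ordinary non-executable file


print(import_stream_mode({"svn:executable": "*"}))
print(import_stream_mode({"svn:special": "*"}))
print(import_stream_mode({}))
```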
2404
2405 Any cvs2svn:rev properties generated by cvs2svn are incorporated into
2406 the internal map used for reference-lifting, then discarded.
2407
2408 Normally, per-directory svn:ignore properties become .gitignore files.
2409 Actual .gitignore files in a Subversion directory are presumed to have
2410 been created by git-svn users separately from native Subversion ignore
2411 properties and discarded with a warning. It is up to the user to merge
2412 the content of such files into the target repository by hand. But this
2413 behavior is inverted by the --user-ignores option; if that is on,
2414 .gitignore files are passed through and Subversion svn:ignore
2415 properties are discarded.
2416
2417 (Regardless of the setting of the --user-ignores option, .cvsignore
2418 files found in Subversion repositories always become .gitignores in the
2419 translation. The assumption is that these date from before a CVS-to-SVN
2420 lift and should be preserved to affect behavior when browsing that
2421 section of the repository.)
2422
2423 svn:mergeinfo properties are interpreted. Any svn:mergeinfo property on
2424 a revision A with a merge source range ending in revision B produces a
2425 merge link such that B becomes a parent of A.
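
To make the rule concrete, this Python sketch extracts the last
revision of each range from an svn:mergeinfo value (lines of the form
path:revision-list, with ranges written N-M). It only lists candidate
parent revisions; whether every range actually yields a merge link is
not something the example claims. merge_parents is a hypothetical
name.

```python
def merge_parents(mergeinfo):
    """Return (source path, last revision) for each range in svn:mergeinfo."""
    parents = []
    for line in mergeinfo.splitlines():
        if not line.strip():
            continue
        path, _, revisions = line.rpartition(":")
        for item in revisions.split(","):
            # A trailing '*' marks a non-inheritable range; drop it.
            end = item.rstrip("*").split("-")[-1]
            parents.append((path, int(end)))
    return parents


print(merge_parents("/branches/foo:3-7,9\n/trunk:12"))
```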
2426
2427 All other Subversion properties are discarded. (This may change in a
2428 future release.) The property for which this is most likely to cause
2429 semantic problems is svn:eol-style. However, since property-change-only
2430 commits get turned into annotated tags, the translated tags will retain
2431 information about setting changes.
2432
2433 The sub-second resolution on Subversion commit dates is discarded; Git
2434 wants integer timestamps only.
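
The effect on a date can be shown with a few lines of Python. The
example assumes the usual svn:date layout with a fractional-seconds
field and a trailing Z; svn_date_to_epoch is a hypothetical name.

```python
import calendar
from datetime import datetime


def svn_date_to_epoch(svn_date):
    """Convert an svn:date string to integer seconds since the epoch (UTC)."""
    parsed = datetime.strptime(svn_date, "%Y-%m-%dT%H:%M:%S.%fZ")
    # timegm() works on whole seconds, so the microseconds are dropped.
    return calendar.timegm(parsed.timetuple())


print(svn_date_to_epoch("1970-01-01T00:00:01.999999Z"))  # 1
```

Two commits made within the same second therefore end up with
identical timestamps.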
2435
2436 Because fast-import format cannot represent an empty directory, empty
2437 directories in Subversion repositories will be lost in translation.
2438
2439 Normally, Subversion local usernames are mapped in the style of
2440 git-cvsimport; thus user "foo" becomes "foo <foo>", which is sufficient to
2441 pacify git and other systems that require email addresses. With the
2442 --use-uuid read option, usernames are mapped in the git-svn style, with
2443 the repository's UUID used as a fake domain in the email address. Both
2444 forms can be remapped to real addresses using the authors read command.
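
The two mapping styles differ only in the address part, as this
Python sketch shows. The function name and the sample UUID are made
up for the illustration.

```python
def fake_attribution(username, uuid=None):
    """Build a placeholder 'name <address>' string for a Subversion username."""
    if uuid is None:
        # Default style: the username doubles as the address.
        return "%s <%s>" % (username, username)
    # git-svn style: the repository UUID serves as a fake domain.
    return "%s <%s@%s>" % (username, username, uuid)


print(fake_attribution("foo"))
print(fake_attribution("foo", "f5c0ffee-0000-0000-0000-000000000000"))
```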
2445
2446 Reading a Subversion stream enables writing of the legacy map as
2447 'legacy' passthroughs when the repo is written to a stream file.
2448
2449 reposurgeon tries hard to silently do the right thing, but there are
2450 Subversion edge cases in which it emits warnings because a human may
2451 need to intervene and perform fixups by hand. Here are the less obvious
2452 messages it may emit:
2453
2454 user-generated .gitignore
2455 This message means reposurgeon has found a .gitignore file in
2456 the Subversion repository it is analyzing. This probably happened
2457 because somebody was using git-svn as a live gateway, and created
2458 ignores which may or may not be congruent with those in the
2459 generated .gitignore files that the Subversion ignore properties
2460 will be translated into. You'll need to make a policy decision
2461 about which set of ignores to use in the conversion, and possibly
2462 set the --user-ignores option on read to pass through user-created
2463 .gitignore files; in that case this warning will not be emitted.
2464
2465 can't connect nonempty branch XXXX to origin
2466 This is a serious error. reposurgeon has been unable to find a
2467 link from a specified branch to the trunk (master) branch. The
2468 commit graph will not be fully connected and will need manual
2469 repair.
2470
2471 permission information may be lost
2472 A Subversion node change on a file sets or clears properties, but
2473 no ancestor can be found for this file. Executable or symlink
2474 position may be set wrongly on later revisions of this file.
2475 Subversion user-defined properties may also be scrambled or lost.
2476 Usually this error can be ignored.
2477
2478 properties set
2479 reposurgeon has detected a setting of a user-defined property, or
2480 the Subversion property svn:externals. These properties cannot be
2481 expressed in an import stream; the user is notified in case this is
2482 a showstopper for the conversion or some corrective action is
2483 required, but normally this error can be ignored. This warning is
2484 suppressed by the --ignore-properties option.
2485
2486 branch links detected by file ops only
2487 Branch links are normally deduced by examining Subversion directory
2488 copy operations. A common user error (making a branch with a
2489 non-Subversion directory copy and then doing an svn add on the
2490 contents) can defeat this. While reposurgeon should detect and cope
2491 with most such copies correctly, you should examine the commit
2492 graph to check that the branch is rooted at the correct place.
2493
2494 could not tagify root commit
2495 The earliest commit in your Subversion repository has file
2496 operations, rather than being a pure directory creation. This
2497 probably means your Subversion dump file is malformed, or you may
2498 have attempted to lift from an incremental dump. Proceed with
2499 caution.
2500
2501 deleting parentless tip delete
2502 This message may be triggered by a Subversion branch move followed
2503 by a re-creation under the source name. Check near the indicated
2504 revision to make sure the renamed branch is connected to master.
2505
2506 mid-branch deleteall
2507 A deleteall operation has been found in the middle of a branch
2508 history. This usually indicates that a Subversion tag or branch was
2509 created by mistake, and someone later tried to undo the error by
2510 deleting the tag/branch directory before recreating it with a copy
2511 operation. Examine the topology near the deleteall closely; it may
2512 need hand-hacking. It is fairly likely that both (a) the
2513 reposurgeon translation will be different from what other
2514 translators (such as git-svn) produce, and (b) it will not be
2515 immediately obvious which is right.
2516
2517 lookback for XXX failed, not making branch link
2518 Branch analysis failed, probably due to a set of file copies that
2519 reposurgeon thought it should interpret as a botched branch
2520 creation but couldn't deduce a history for. This is a warning;
2521 check how the directory XXX is converted; it may need post-editing
2522 into a branch.
2523
2524 WRITING SUBVERSION REPOSITORIES
2525 reposurgeon has support for writing Subversion repositories. Due to
2526 mismatches between the ontology of Subversion and that of git import
2527 streams, this support has some significant limitations and bugs.
2528
2529 In summary, Subversion repository histories do not round-trip through
2530 reposurgeon editing. File content changes are preserved but some
2531 metadata is unavoidably lost. Furthermore, writing out a DVCS history
2532 in Subversion also loses significant portions of its metadata. Details
2533 follow.
2534
2535 Writing a Subversion repository or dump stream discards author
2536 information, the committer's name, and the hostname part of the commit
2537 address; only the commit timestamp and the local part of the
2538 committer's email address are preserved, the latter becoming the
2539 Subversion author field. However, reading a Subversion repository and
2540 writing it out again will preserve the author fields.
2541
2542 Import-stream timestamps have 1-second granularity. The sub-second
2543 parts of Subversion commit timestamps will be lost on their way through
2544 reposurgeon.
2545
2546 Empty directories aren't represented in import streams. Consequently,
2547 reading and writing Subversion repositories preserves file content, but
2548 not empty directories. It is also not guaranteed that, after editing a
2549 Subversion repository, the sequence of directory creations and
2550 deletions relative to other operations will be identical; the only
2551 guarantee is that enclosing directories will be created before any
2552 files in them are.
2553
2554 When reading a Subversion repository, reposurgeon discards the special
2555 directory-copy nodes associated with branch creations. These can't be
2556 recreated if and when the repository is written back out to Subversion;
2557 rather, each branch copy node from the original translates into a
2558 branch creation plus the first set of file modifications on the branch.
2559
2560 When reading a Subversion repository, reposurgeon also automatically
2561 breaks apart mixed-branch commits. These are not re-united if the
2562 repository is written back out.
2563
2564 When writing to a Subversion repository, all lightweight tags become
2565 Subversion tag copies with empty log comments, named for the tag
2566 basename. The committer name and timestamp are copied from the commit
2567 the tag points to. The distinction between heads and tags is lost.
2568
2569 Because of the preceding two points, it is not guaranteed that even
2570 revision numbers will be stable when a Subversion repository is read in
2571 and then written out!
2572
2573 Subversion repositories are always written with a standard
2574 (trunk/tags/branches) layout. Thus, a repository with a nonstandard
2575 shape that has been analyzed by reposurgeon won't be written out with
2576 the same shape.
2577
2578 When writing a Subversion repository, branch merges are translated into
2579 svn:mergeinfo properties in the simplest possible way - as an
2580 svn:mergeinfo property of the translated merge commit listing the merge
2581 source revisions.
2582
2583 Subversion has a concept of "flows"; that is, named segments of history
2584 corresponding to files or directories that are created when the path is
2585 added, cloned when the path is copied, and deleted when the path is
2586 deleted. This information is not preserved in import streams or the
2587 internal representation that reposurgeon uses. Thus, after editing, the
2588 flow boundaries of a Subversion history may be arbitrarily changed.
2589
2591 reposurgeon recognizes how supported VCSes represent file ignores (CVS
2592 .cvsignore files lurking untranslated in older Subversion repositories,
2593 Subversion ignore properties, .gitignore/.hgignore/.bzrignore files in
2594 other systems) and moves ignore declarations among these containers on
2595 repo input and output. This will be sufficient if the ignore patterns
2596 are exact filenames.
2597
2598 Translation may not, however, be perfect when the ignore patterns are
2599 Unix glob patterns or regular expressions. This compatibility table
2600 describes which patterns will translate; “plain” indicates a plain
2601 filename with no glob or regexp syntax or negation.
2602
2603 RCS has no ignore files or patterns and is therefore not included in
2604 the table.
2605
2606┌───────┬────────────┬──────────┬────────────┬────────────┬──────────────┬───────┬──────────┬─────────┐
2607│       │ from CVS   │ from svn │ from git   │ from hg    │ from bzr     │ from  │ from SRC │ from bk │
2608│       │            │          │            │            │              │ darcs │          │         │
2609├───────┼────────────┼──────────┼────────────┼────────────┼──────────────┼───────┼──────────┼─────────┤
2610│       │            │          │            │            │              │       │          │         │
2611│ to    │ all        │ all      │ all        │ all        │ all          │ plain │ all      │ all     │
2612│ CVS   │            │          │ except     │            │ except       │       │          │         │
2613│       │            │          │ !-prefixed │            │ RE:-         │       │          │         │
2614│       │            │          │ but        │            │ and          │       │          │         │
2615│       │            │          │ nonempty   │            │ !-prefixed   │       │          │         │
2616├───────┼────────────┼──────────┼────────────┼────────────┼──────────────┼───────┼──────────┼─────────┤
2617│       │ all except │          │            │            │              │       │          │         │
2618│ to    │ !          │ all      │ all except │ all        │ all except   │ plain │ all      │ all     │
2619│ svn   │            │          │ !-prefixed │            │ RE:- and     │       │          │         │
2620│       │            │          │            │            │ !-prefixed   │       │          │         │
2621├───────┼────────────┼──────────┼────────────┼────────────┼──────────────┼───────┼──────────┼─────────┤
2622│       │            │          │            │            │              │       │          │         │
2623│ to    │ all        │ all      │ all        │ all        │ all except   │ plain │ all      │ all     │
2624│ git   │            │          │            │ except     │ RE:-prefixed │       │          │         │
2625│       │            │          │            │ !-prefixed │              │       │          │         │
2626├───────┼────────────┼──────────┼────────────┼────────────┼──────────────┼───────┼──────────┼─────────┤
2627│       │            │          │            │            │              │       │          │         │
2628│ to    │ all        │ all      │ all except │ all        │ all except   │ plain │ all      │ all     │
2629│ hg    │ except     │          │ !-prefixed │            │ RE:- and     │       │          │         │
2630│       │ !          │          │            │            │ !-prefixed   │       │          │         │
2631├───────┼────────────┼──────────┼────────────┼────────────┼──────────────┼───────┼──────────┼─────────┤
2632│       │            │          │            │            │              │       │          │         │
2633│ to    │ all        │ all      │ all        │ all        │ all          │ plain │ all      │ all     │
2634│ bzr   │            │          │            │            │              │       │          │         │
2635├───────┼────────────┼──────────┼────────────┼────────────┼──────────────┼───────┼──────────┼─────────┤
2636│       │            │          │            │            │              │       │          │         │
2637│ to    │ plain      │ plain    │ plain      │ plain      │ plain        │ all   │ all      │ all     │
2638│ darcs │            │          │            │            │              │       │          │         │
2639├───────┼────────────┼──────────┼────────────┼────────────┼──────────────┼───────┼──────────┼─────────┤
2640│       │            │          │            │            │              │       │          │         │
2641│ to    │ all        │ all      │ all except │ all        │ all except   │ plain │ all      │ all     │
2642│ SRC   │ except     │          │ !-prefixed │            │ RE:- and     │       │          │         │
2643│       │ !          │          │            │            │ !-prefixed   │       │          │         │
2644└───────┴────────────┴──────────┴────────────┴────────────┴──────────────┴───────┴──────────┴─────────┘
2645
The hg rows and columns of the table describe compatibility with hg's
glob syntax rather than its default regular-expression syntax. When
writing to an hg repository from any other kind, reposurgeon prepends
a "syntax: glob" line to the output .hgignore.
2650
After converting a CVS, SVN, or BitKeeper repository, check for and
remove $-cookies in the head revision(s) of the files. The full
Subversion set is $Date:, $Revision:, $Author:, $HeadURL:, and $Id:.
CVS uses $Author:, $Date:, $Header:, $Id:, $Log:, $Revision:, and also
(rarely) $Locker:, $Name:, $RCSfile:, $Source:, and $State:.
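
Checking for leftover cookies is easy to script. The following is a
minimal sketch and not part of reposurgeon; the names KEYWORDS,
COOKIE_RE, and find_cookies are illustrative assumptions, and the
pattern is built only from the keyword names listed above.

```python
import re

# Keyword names taken from the Subversion and CVS lists above.
KEYWORDS = (
    "Author", "Date", "Header", "HeadURL", "Id", "Locker", "Log",
    "Name", "RCSfile", "Revision", "Source", "State",
)

# Matches both unexpanded ($Id$) and expanded ($Id: foo.c 1.2 $) cookies.
COOKIE_RE = re.compile(r"\$(%s)(?::[^$\n]*)?\$" % "|".join(KEYWORDS))


def find_cookies(text):
    """Return (line_number, cookie) pairs for every $-cookie in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in COOKIE_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Run it over the checked-out head revision of each file; any hit is a
candidate for manual cleanup.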
2657
2658 When you need to specify a commit, use the action-stamp format that
2659 references lift generates when it can resolve an SVN or CVS reference
2660 in a comment. It is best that you not vary from this format, even in
2661 trivial ways like omitting the 'Z' or changing the 'T' or '!' or ':'.
2662 Making action stamps uniform and machine-parseable will have good
2663 consequences for future repository-browsing tools.
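
To illustrate why uniformity matters, here is a minimal sketch of a
strict formatter and parser. It assumes an action stamp is an RFC 3339
UTC timestamp, a '!', and an email address; that layout is inferred
from the characters named above, and the function names are
hypothetical rather than anything reposurgeon exports.

```python
import re
from datetime import datetime, timezone

# Assumed layout: RFC 3339 UTC timestamp, '!', then an email address.
STAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)!(\S+@\S+)$")


def make_action_stamp(when, email):
    """Format an aware datetime and an email address as an action stamp."""
    utc = when.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ") + "!" + email


def parse_action_stamp(stamp):
    """Return (datetime, email), or raise ValueError on any deviation."""
    match = STAMP_RE.match(stamp)
    if match is None:
        raise ValueError("not a well-formed action stamp: %r" % stamp)
    when = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    return when.replace(tzinfo=timezone.utc), match.group(2)
```

A parser this strict rejects a stamp with the 'Z' omitted, which is
exactly the kind of trivial variation the advice above warns against.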
2664
2665 Sometimes, in converting a repository, you may need to insert an
2666 explanatory comment - for example, if metadata has been garbled or
2667 missing and you need to point to that fact. It's helpful for
2668 repository-browsing tools if there is a uniform syntax for this that is
2669 highly unlikely to show up in repository comments. We recommend
2670 enclosing translation notes in [[ ]]. This has the advantage of being
2671 visually similar to the [ ] traditionally used for editorial comments
2672 in text.
2673
2674 It is good practice to include, in the comment for the root commit of
2675 the repository, a note dating and attributing the conversion work and
2676 explaining these conventions. Example:
2677
2678 [[This repository was converted from Subversion to git on 2011-10-24 by
2679 Eric S. Raymond <esr@thyrsus.com>. Here and elsewhere, conversion notes
2680 are enclosed in double square brackets. Junk commits generated by
2681 cvs2svn have been removed, commit references have been mapped into a
2682 uniform VCS-independent syntax, and some comments edited into
2683 summary-plus-continuation form.]]
2684
It is also good practice to include a generated tag at the point of
conversion. For example:
2687
2688 msgin --create <<EOF
2689 Tag-Name: git-conversion
2690
2691 Marks the spot at which this repository was converted from Subversion to git.
2692 EOF
2693
2695 define lastchange {
2696 @max(=B & [/ChangeLog/] & /{0}/B)? list
2697 }
2698
List the last commit that refers to a ChangeLog file containing a
specified string. (The trick here is that ? extends the singleton set
consisting of the last eligible ChangeLog blob to its set of referring
commits, and list only notices the commits.)
2703
2705 The event-stream parser in “reposurgeon” supports some extended syntax.
2706 Exporters designed to work with “reposurgeon” may have a --reposurgeon
2707 option that enables emission of extended syntax; notably, this is true
2708 of cvs-fast-export(1). The remainder of this section describes these
2709 syntax extensions. The properties they set are (usually) preserved and
2710 re-output when the stream file is written.
2711
2712 The token “#reposurgeon” at the start of a comment line in a
2713 fast-import stream signals reposurgeon that the remainder is an
2714 extension command to be interpreted by “reposurgeon”.
2715
2716 One such extension command is implemented: #sourcetype, which behaves
2717 identically to the reposurgeon sourcetype command. An exporter for a
2718 version-control system named “frobozz” could, for example, say
2719
2720 #reposurgeon sourcetype frobozz
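
An exporter or test harness might recognize such lines as follows.
This is a minimal sketch; the function name parse_extension is
hypothetical, and splitting the remainder on whitespace is an
assumption about how the command words are separated.

```python
def parse_extension(line):
    """Return the words of a '#reposurgeon' extension command, else None."""
    prefix = "#reposurgeon"
    if not line.startswith(prefix):
        return None
    rest = line[len(prefix):]
    if rest and not rest[0].isspace():
        return None  # e.g. '#reposurgeonfoo' is an ordinary comment
    return rest.split()
```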
2721
2722 Within a commit, a magic comment of the form “#legacy-id” declares a
2723 legacy ID from the stream file's source version-control system.
2724
Also accepted is the bzr syntax for setting per-commit properties.
While parsing commit syntax, a line beginning with the token “property”
must continue with a whitespace-separated property-name token. If it is
then followed by a newline, it is taken to set that boolean-valued
property to true. Otherwise it must be followed by a numeric token
specifying a data length, a space, the data itself (which may contain
newlines), and a terminating newline. For example:
2732
2733 commit refs/heads/master
2734 mark :1
2735 committer Eric S. Raymond <esr@thyrsus.com> 1289147634 -0500
2736 data 16
2737 Example commit.
2738
2739 property legacy-id 2 r1
2740 M 644 inline README
2741
2742 Unlike other extensions, bzr properties are only preserved on stream
2743 output if the preferred type is bzr, because any importer other than
2744 bzr's will choke on them.
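
The property syntax described above can be parsed with a few lines of
code. The following is a minimal sketch, not reposurgeon's own parser;
the function name parse_property is hypothetical, and it assumes that
tokens are separated by single spaces and that the length counts
bytes.

```python
def parse_property(stream, pos=0):
    """Parse one 'property' line from bytes `stream` starting at `pos`.

    Returns (name, value, next_pos); value is True for a boolean property.
    """
    if not stream.startswith(b"property ", pos):
        raise ValueError("no property line at offset %d" % pos)
    eol = stream.index(b"\n", pos)
    fields = stream[pos:eol].split(b" ", 3)
    name = fields[1].decode("utf-8")
    if len(fields) == 2:
        # Name followed directly by a newline: boolean property, set true.
        return name, True, eol + 1
    length = int(fields[2])
    # Data begins after "property ", the name, a space, the length, a space.
    start = pos + len(b"property ") + len(fields[1]) + 1 + len(fields[2]) + 1
    value = stream[start:start + length]
    if stream[start + length:start + length + 1] != b"\n":
        raise ValueError("property data not terminated by a newline")
    return name, value.decode("utf-8"), start + length + 1
```

Because the data is read by length rather than by line, a value with
embedded newlines is handled the same way as a one-line value.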
2745
2747 In versions before 3.23, “prefer” changed the repository type as well
2748 as the preferred output format.
2749
2750 In versions before 3.0, the general command syntax put the command verb
2751 first, then the selection set (if any) then modifiers (VSO). It has
2752 changed to optional selection set first, then command verb, then
2753 modifiers (SVO). The change made parsing simpler, allowed abolishing
2754 some noise keywords, and recapitulates a successful design pattern in
2755 some other Unix tools - notably sed(1).
2756
2757 In versions before 3.0, path expressions only matched commits, not
2758 commits and the associated blobs as well. The names of the “a” and “c”
2759 flags were different.
2760
2761 In reposurgeon versions before 3.0, the delete command had the
2762 semantics of squash; also, the policy flags did not require a “--”
2763 prefix. The “--delete” flag was named “obliterate”.
2764
2765 In reposurgeon versions before 3.0, read and write optionally took file
2766 arguments rather than requiring redirects (and the write command never
2767 wrote into directories). This was changed in order to allow these
2768 commands to have modifiers. These modifiers replaced several global
2769 options that no longer exist.
2770
2771 In reposurgeon versions before 3.0, the earliest factor in a unite
2772 command always kept its tag and branch names unaltered. The new rule
2773 for resolving name conflicts, giving priority to the latest factor,
2774 produces more natural behavior when uniting two repositories end to
2775 end; the master branch of the second (later) one keeps its name.
2776
In reposurgeon versions before 3.0, the tagify command expected
policies as trailing arguments to alter its behaviour. The new syntax
uses similarly named options with leading dashes, which can appear
anywhere after the tagify command.
2781
In versions before 2.9, the syntax of "authors", "legacy", "list", and
what are now "msg{in|out}" was different (and "legacy" was "fossils").
They took plain filename arguments rather than using redirect < and >.
2785
2786 In versions before 4.0, msgin and msgout were named mailbox_in and
2787 mailbox_out.
2788
2790 Guarantee: In DVCses that use commit hashes, editing with reposurgeon
2791 never changes the hash of a commit object unless (a) you edit the
2792 commit, or (b) it is a descendant of an edited commit in a VCS that
2793 includes parent hashes in the input of a child object's hash (git and
2794 hg both do this).
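
Clause (b) follows from hash chaining, which a toy model makes
concrete. The sketch below is not git's or hg's actual object format;
commit_hash and chain are hypothetical helpers that hash only the
parent IDs and the commit message.

```python
import hashlib


def commit_hash(message, parent_hashes):
    """Toy commit ID: SHA-1 over the parent IDs followed by the message."""
    digest = hashlib.sha1()
    for parent in parent_hashes:
        digest.update(parent.encode("ascii"))
    digest.update(message.encode("utf-8"))
    return digest.hexdigest()


def chain(messages):
    """Hash a linear history; each commit's parent is the previous one."""
    hashes = []
    for message in messages:
        parents = hashes[-1:]  # empty list for the root commit
        hashes.append(commit_hash(message, parents))
    return hashes
```

Editing the second of three commits leaves the first hash alone but
changes the second and third, since each child's input includes its
parent's hash.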
2795
2796 Guarantee: reposurgeon only requires main memory proportional to the
2797 size of a repository's metadata history, not its entire content
2798 history. (Exception: the data from inline content is held in memory.)
2799
2800 Guarantee: In the worst case, reposurgeon makes its own copy of every
2801 content blob in the repository's history and thus uses intermediate
2802 disk space approximately equal to the size of a repository's content
2803 history. However, when the repository to be edited is presented as a
2804 stream file, reposurgeon requires no or only very little extra disk
2805 space to represent it; the internal representation of content blobs is
2806 a (seek-offset, length) pair pointing into the stream file.
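
The (seek-offset, length) representation can be sketched as follows.
This is an illustration of the idea only; the class BlobRef and the
function demo are hypothetical names, not reposurgeon internals.

```python
import os
import tempfile


class BlobRef:
    """A content blob represented as (seek-offset, length) into a file."""

    def __init__(self, path, offset, length):
        self.path, self.offset, self.length = path, offset, length

    def read(self):
        """Fetch the blob's bytes from the stream file on demand."""
        with open(self.path, "rb") as stream:
            stream.seek(self.offset)
            return stream.read(self.length)


def demo():
    """Write a tiny stream file and read one blob back through a BlobRef."""
    payload = b"blob\nmark :1\ndata 6\nhello\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        offset = payload.index(b"hello")
        return BlobRef(tmp.name, offset, 6).read()
    finally:
        os.unlink(tmp.name)
```

Only the two integers and the path live in memory; the content stays
in the stream file until something asks for it.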
2807
2808 Guarantee: reposurgeon never modifies the contents of a repository it
2809 reads, nor deletes any repository. The results of surgery are always
2810 expressed in a new repository.
2811
Guarantee: Any line in a fast-import stream that is not a part of a
command reposurgeon parses and understands will be passed through
unaltered. At present the set of potential passthroughs is known to
include the progress, options, and checkpoint commands as well as
comments led by #.
2817
2818 Guarantee: All reposurgeon operations either preserve all repository
2819 state they are not explicitly told to modify or warn you when they
2820 cannot do so.
2821
2822 Guarantee: reposurgeon handles the bzr commit-properties extension,
2823 correctly passing through property items including those with embedded
2824 newlines. (Such properties are also editable in the message-box
2825 format.)
2826
2827 Limitation: Because reposurgeon relies on other programs to generate
2828 and interpret the fast-import command stream, it is subject to bugs in
2829 those programs.
2830
2831 Limitation: bzr suffers from deep confusion over whether its unit of
2832 work is a repository or a floating branch that might have been cloned
2833 from a repo or created from scratch, and might or might not be destined
2834 to be merged to a repo one day. Its exporter only works on branches,
2835 but its importer creates repos. Thus, a rebuild operation will produce
2836 a subdirectory structure that differs from what you expect. Look for
2837 your content under the subdirectory 'trunk'.
2838
Limitation: Under git, signed tags are imported verbatim. However, any
operation that modifies any commit upstream of the target of the tag
will invalidate it.
2842
2843 Limitation: Stock git (at least as of version 1.7.3.2) will choke on
2844 property extension commands. Accordingly, reposurgeon omits them when
2845 rebuilding a repo with git type.
2846
2847 Limitation: Converting an hg repo that uses bookmarks (not branches) to
2848 git can lose information; the branch ref that git assigns to each
2849 commit may not be the same as the hg bookmark that was active when the
2850 commit was originally made under hg. Unfortunately, this is a real
2851 ontological mismatch, not a problem that can be fixed by cleverness in
2852 reposurgeon.
2853
2854 Limitation: Converting an hg repo that uses branches to git can lose
2855 information because git does not store an explicit branch as part of
2856 commit metadata, but colors commits with branch or tag names on the fly
2857 using a specific coloring algorithm, which might not match the explicit
2858 branch assignments to commits in the original hg repo. Reposurgeon
2859 preserves the hg branch information when reading an hg repo, so it is
2860 available from within reposurgeon itself, but there is no way to
2861 preserve it if the repo is written to git.
2862
2863 Limitation: While the Subversion read-side support is in good shape,
2864 the write-side support is more of a sketch or proof-of-concept than a
2865 robust implementation; it only works on very simple cases and does not
2866 round-trip. It may improve in future releases.
2867
2868 Limitation: Not all BitKeeper versions have the fast-import and
2869 fast-export commands that reposurgeon requires. They are present back
2870 to the 7.3 opensource version.
2871
2872 Limitation: reposurgeon may misbehave under a filesystem which smashes
2873 case in filenames, or which nominally preserves case but maps names
2874 differing only by case to the same filesystem node (Mac OS X behaves
2875 like this by default). Problems will arise if any two paths in a repo
2876 differ by case only. To avoid the problem on a Mac, do all your surgery
2877 on an HFS+ file system formatted with case sensitivity specifically
2878 enabled.
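
Before operating on such a filesystem, it can be worth checking
whether a repository's path list contains any case-only collisions.
A minimal sketch follows; case_collisions is a hypothetical helper,
and Python's casefold() only approximates the folding rules of any
particular filesystem.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that differ only by letter case."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return sorted(sorted(g) for g in groups.values() if len(set(g)) > 1)
```

An empty result means no two paths in the list differ by case only.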
2879
2880 Limitation: If whitespace followed by # appears in a string or regexp
2881 command argument, it will be misinterpreted as the beginning of a
2882 line-ending comment and screw up parsing.
2883
2884 Guarantee: As version-control systems add support for the fast-import
2885 format, their repositories will become editable by reposurgeon.
2886
Limitations described above are unlikely to change. Do "help bugs" at
the reposurgeon prompt to see up-to-date information on reposurgeon
bugs and internal problems that are expected to be fixed in some future
release.
2891
2893 reposurgeon relies on importers and exporters associated with the VCSes
2894 it supports.
2895
2896 git
2897 Core git supports both export and import.
2898
2899 bzr
2900 Requires bzr plus the bzr-fast-import plugin.
2901
2902 hg
2903 Requires core hg, the hg-fastimport plugin, and the third-party
2904 hg-fast-export.py script.
2905
2906 svn
2907 Stock Subversion commands support export and import.
2908
2909 darcs
2910 Stock darcs commands support export and import.
2911
2912 CVS
2913 Requires cvs-fast-export. Note that the quality of CVS lifts may be
2914 poor, with individual lifts requiring serious hand-hacking. This is
2915 due to inherent problems with CVS's file-oriented model.
2916
2917 RCS
2918 Requires cvs-fast-export (yes, that's not a typo; cvs-fast-export
2919 handles RCS collections as well). The caveat for CVS applies.
2920
2922 It is expected that reposurgeon will be extended with more deletion
2923 policies. Policy authors may need to know more about how a commit's
2924 file operation sequence is reduced to normal form after operations from
2925 deleted commits are prepended to it.
2926
Recall that each commit has a list of file operations, each an M
(modify), D (delete), R (rename), C (copy), or 'deleteall' (delete all
files). Only M operations have associated blobs. Normally there is only
one M operation per individual file in a commit's operation list.
2931
To understand how the reduction process works, it's enough to
understand the case where all the operations in the list are working on
the same file. Sublists of operations referring to different files
don't affect each other, and reducing them can be thought of as separate
operations. Also, a "deleteall" acts as a D for everything and cancels
all operations before it in the list.
2938
2939 The reduction process walks through the list from the beginning looking
2940 for adjacent pairs of operations it can compose. The following table
2941 describes all possible cases and all but one of the reductions.
2942
┌───────────────────────────┬────────────────────────────────────────────────┐
│ M + D → D                 │ If a file is modified then deleted, the result │
│                           │ is as though it had been deleted. If the M was │
│                           │ the only modify for the file, it's removed     │
│                           │ too.                                           │
├───────────────────────────┼────────────────────────────────────────────────┤
│ M a + R a b → R a b + M b │ The purpose of this transformation is to push  │
│                           │ renames toward the beginning of the list,      │
│                           │ where they may become adjacent to another R or │
│                           │ C they can be composed with. If the M is the   │
│                           │ only modify operation for this file, the       │
│                           │ rename is dropped.                             │
├───────────────────────────┼────────────────────────────────────────────────┤
│ M a + C a b               │ No reduction.                                  │
├───────────────────────────┼────────────────────────────────────────────────┤
│ M b + R a b → nothing     │ Should be impossible, and may indicate         │
│                           │ repository corruption.                         │
├───────────────────────────┼────────────────────────────────────────────────┤
│ M b + C a b → nothing     │ The copy undoes the modification.              │
├───────────────────────────┼────────────────────────────────────────────────┤
│ D + M → M                 │ If a file is deleted and modified, the result  │
│                           │ is as though the deletion had not taken place  │
│                           │ (because M operations store entire files, not  │
│                           │ deltas).                                       │
├───────────────────────────┼────────────────────────────────────────────────┤
│ D + {D|R|C}               │ These cases should be impossible and would     │
│                           │ suggest the repository has been corrupted.     │
├───────────────────────────┼────────────────────────────────────────────────┤
│ R a b + D a               │ Should never happen, and is another case that  │
│                           │ would suggest repository corruption.           │
├───────────────────────────┼────────────────────────────────────────────────┤
│ R a b + D b → D a         │ The delete removes the just-renamed file.      │
├───────────────────────────┼────────────────────────────────────────────────┤
│ {R|C} + M                 │ No reduction.                                  │
├───────────────────────────┼────────────────────────────────────────────────┤
│ R a b + R b c → R a c     │ The b terms have to match for these operations │
│                           │ to have made sense when they lived in separate │
│                           │ commits; if they don't, it indicates           │
│                           │ repository corruption.                         │
├───────────────────────────┼────────────────────────────────────────────────┤
│ R a b + C b c             │ No reduction.                                  │
├───────────────────────────┼────────────────────────────────────────────────┤
│ C a b + D a → R a b       │ Copy followed by delete of the source is a     │
│                           │ rename.                                        │
├───────────────────────────┼────────────────────────────────────────────────┤
│ C a b + D b → nothing     │ This delete undoes the copy.                   │
├───────────────────────────┼────────────────────────────────────────────────┤
│ C a b + R a c             │ No reduction.                                  │
├───────────────────────────┼────────────────────────────────────────────────┤
│ C a b + R b c → C a c     │ Copy followed by a rename of the target        │
│                           │ reduces to a single copy.                      │
├───────────────────────────┼────────────────────────────────────────────────┤
│ C + C                     │ No reduction.                                  │
└───────────────────────────┴────────────────────────────────────────────────┘
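
Several rows of the table can be turned into a small pairwise reducer.
The following is a minimal sketch, not reposurgeon's implementation.
Operations are modeled as tuples such as ("M", path) and
("R", source, target), which is an assumption of this example. It
covers only the rows that yield a single operation or nothing, and it
ignores the "only modify" caveats and the M + R reordering.

```python
def compose(first, second):
    """Compose two adjacent file operations.

    Returns the replacement list, or None when the table gives no
    reduction (or the rule is not covered by this sketch).
    """
    kind1, kind2 = first[0], second[0]
    if kind1 == "M" and kind2 == "D" and first[1] == second[1]:
        return [second]                          # M + D -> D
    if kind1 == "D" and kind2 == "M" and first[1] == second[1]:
        return [second]                          # D + M -> M
    if kind1 == "R" and kind2 == "D" and first[2] == second[1]:
        return [("D", first[1])]                 # R a b + D b -> D a
    if kind1 == "R" and kind2 == "R" and first[2] == second[1]:
        return [("R", first[1], second[2])]      # R a b + R b c -> R a c
    if kind1 == "C" and kind2 == "D" and first[1] == second[1]:
        return [("R", first[1], first[2])]       # C a b + D a -> R a b
    if kind1 == "C" and kind2 == "D" and first[2] == second[1]:
        return []                                # C a b + D b -> nothing
    if kind1 == "C" and kind2 == "R" and first[2] == second[1]:
        return [("C", first[1], second[2])]      # C a b + R b c -> C a c
    return None


def reduce_ops(ops):
    """Repeatedly compose adjacent pairs until no covered rule applies."""
    ops = list(ops)
    i = 0
    while i + 1 < len(ops):
        result = compose(ops[i], ops[i + 1])
        if result is None:
            i += 1
        else:
            ops[i:i + 2] = result
            i = max(i - 1, 0)  # the new op may compose with its predecessor
    return ops
```

Every covered rule shortens the list, so the loop always terminates.
For instance, C a b followed by R b c and then D a reduces first to
C a c and then to R a c.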
3055
3057 This section will become relevant only if reposurgeon or something
3058 underneath it in the software and hardware stack crashes while in the
3059 middle of writing out a repository, in particular if the target
3060 directory of the rebuild is your current directory.
3061
3062 The tool has two conflicting objectives. On the one hand, we never want
3063 to risk clobbering a pre-existing repo. On the other hand, we want to
3064 be able to run this tool in a directory with a repo and modify it in
3065 place.
3066
3067 We resolve this dilemma by playing a game of three-directory monte.
3068
3069 1. First, we build the repo in a freshly-created staging directory. If
3070 your target directory is named /path/to/foo, the staging directory
3071 will be a peer named /path/to/foo-stageNNNN, where NNNN is a cookie
3072 derived from reposurgeon's process ID.
3073
3074 2. We then make an empty backup directory. This directory will be
3075 named /path/to/foo.~N~, where N is incremented so as not to
3076 conflict with any existing backup directories. reposurgeon never,
3077 under any circumstances, ever deletes a backup directory.
3078
3079 So far, all operations are safe; the worst that can happen up to
3080 this point if the process gets interrupted is that the staging and
3081 backup directories get left behind.
3082
3083 3. The critical region begins. We first move everything in the target
3084 directory to the backup directory.
3085
3086 4. Then we move everything in the staging directory to the target.
3087
3088 5. We finish off by restoring untracked files in the target directory
3089 from the backup directory. That ends the critical region.
3090
3091 During the critical region, all signals that can be ignored are
3092 ignored.
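
The sequence above can be sketched in a few lines. This is an
illustration under stated assumptions, not reposurgeon's code: the
function names are hypothetical, build stands in for whatever
populates the staging directory, the emptied staging directory is
assumed to be removed afterward, and step 5 and the signal handling
are omitted.

```python
import os
import shutil


def next_backup_dir(target):
    """Return the first /path/to/foo.~N~ name that does not exist yet."""
    n = 1
    while os.path.exists("%s.~%d~" % (target, n)):
        n += 1
    return "%s.~%d~" % (target, n)


def move_contents(src, dst):
    """Move every entry in directory src into directory dst."""
    for name in os.listdir(src):
        shutil.move(os.path.join(src, name), os.path.join(dst, name))


def swap_in(target, build):
    """Build into a staging directory, then swap it into target.

    `build` is a caller-supplied function (hypothetical) that populates
    the staging directory. Returns the backup directory's path.
    """
    staging = "%s-stage%d" % (target, os.getpid())
    os.mkdir(staging)                # step 1: fresh staging directory
    build(staging)
    backup = next_backup_dir(target)
    os.mkdir(backup)                 # step 2: empty backup directory
    move_contents(target, backup)    # step 3: critical region begins
    move_contents(staging, target)   # step 4
    os.rmdir(staging)                # assumption: drop the empty staging dir
    return backup
```

If the process dies between steps 3 and 4, the old contents are intact
in the backup directory and the new ones in the staging directory, so
nothing is lost.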
3093
3095 Returns 1 on fatal error, 0 otherwise. In batch mode all errors are
3096 fatal.
3097
bzr(1), cvs(1), darcs(1), git(1), hg(1), rcs(1), svn(1), bk(1).
3100
3102 Eric S. Raymond <esr@thyrsus.com>; project page at
3103 http://www.catb.org/~esr/reposurgeon.
3104
3106 1. DVCS Migration HOWTO
3107 http://www.catb.org/esr/dvcs-migration-guide.html
3108
3109 2. Python's
3110 http://docs.python.org/2/library/re.html
3111
3112
3113
3114reposurgeon 01/30/2020 REPOSURGEON(1)