REPOCUTTER(1)                                                    REPOCUTTER(1)

NAME

6       repocutter - surgical and filtering operations on Subversion dump files
7

SYNOPSIS

9       repocutter [-q] [-d n] [-i 'filename'] [-r 'selection'] 'subcommand'
10

DESCRIPTION

       This program does surgical and filtering operations on Subversion dump
       files. While it is not as flexible as reposurgeon(1), it can perform
       Subversion-specific transformations that reposurgeon cannot, and can be
       useful for processing Subversion repositories into a form suitable for
       conversion. Also, it supports the version 3 dumpfile format, which
       reposurgeon does not.
18
19       In most commands, the -r (or --range) option limits the selection of
20       revisions over which an operation will be performed. Usually other
21       revisions will be passed through unaltered, except in the select and
22       deselect commands for which the option controls which revisions will be
23       passed through. A selection consists of one or more comma-separated
24       ranges. A range may consist of an integer revision number or the
25       special name HEAD for the head revision. Or it may be a colon-separated
26       pair of integers, or an integer followed by a colon followed by HEAD.
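
       The selection syntax described above can be sketched as a small
       parser. This is an illustration only, not repocutter's own code (which
       is written in Go); the names parse_selection and selected, and the
       head parameter standing for the head revision number, are assumptions
       made for the example.

```python
def parse_selection(selection, head):
    """Expand a selection such as '3,5:7,10:HEAD' into (low, high) pairs."""

    def resolve(token):
        # HEAD stands for the head revision; anything else must be an integer.
        return head if token == "HEAD" else int(token)

    ranges = []
    for part in selection.split(","):
        if ":" in part:
            # A colon-separated pair: integer:integer or integer:HEAD.
            low, high = part.split(":", 1)
            ranges.append((int(low), resolve(high)))
        else:
            # A single revision number (or HEAD) is a one-element range.
            rev = resolve(part)
            ranges.append((rev, rev))
    return ranges


def selected(revision, ranges):
    """Return True if the revision falls inside any of the ranges."""
    return any(low <= revision <= high for low, high in ranges)
```

       With head=12, the selection '3,5:7,10:HEAD' covers revision 3,
       revisions 5 through 7, and revisions 10 through 12.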
27
       If the output stream contains copyfrom references to missing revisions,
       repocutter silently patches each copy source by stepping it backwards
       to the most recent previous revision that exists.
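
       A minimal sketch of that backward step, assuming the revisions
       surviving in the output are available as a sorted list. The function
       name step_back and the list representation are hypothetical and do not
       reflect repocutter's internals.

```python
from bisect import bisect_right


def step_back(copyfrom_rev, present):
    """Return the latest revision <= copyfrom_rev found in `present`.

    `present` must be a sorted list of the revision numbers that exist in
    the output stream. Returns None when no such revision exists.
    """
    index = bisect_right(present, copyfrom_rev)
    return present[index - 1] if index else None
```

       For example, if revisions 3 and 4 have been dropped from a stream that
       still holds 1, 2, 5 and 6, a copy source of revision 4 is stepped back
       to revision 2.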
31
32       (Older versions of this tool, before 4.30, treated -r as an implied
33       selection filter rather than passing through unselected revisions
34       unaltered. If you have old scripts using repocutter they may need
35       modification.)
36
37       Normally, each subcommand produces a progress spinner on standard
38       error; each turn means another revision has been filtered. The -q (or
39       --quiet) option suppresses this. Quiet mode is set when output is
40       redirected to a file or pipe.
41
42       The -d option enables debug messages on standard error. It takes an
43       integer debug level. These messages are probably only of interest to
44       repocutter developers.
45
46       The -i option sets the input source to a specified filename. This is
47       primarily useful when running the program under a debugger. When this
48       option is not present the program expects to read a stream from
49       standard input.
50
       Generally, if you need to use this program at all, you will find that
       you need to pipe your dump file through multiple instances of it doing
       one kind of operation each. This is not as expensive as it sounds; with
       the exception of the reduce subcommand, the working set of this program
       is bounded by the size of the largest single blob plus its
       metadata. It does not need to hold the entire repo metadata in memory.
57
58       The -f/-fixed option disables regexp compilation of PATTERN arguments,
59       treating them as literal strings.
60
61       The -t option sets a tag to be included in error and warning messages.
62       This will be useful for determining which stage of a multistage
63       repocutter pipeline failed.
64
65       There are a few other command-specific options described under
66       individual commands.
67
68       In the command descriptions, PATTERN arguments are regular expressions
69       to match pathnames, constrained so that each match must be a path
70       segment or a sequence of path segments; that is, the left end must be
71       either at the start of path or immediately following a /, and the right
72       end must precede a / or be at end of string. With a leading ^ the match
73       is constrained to be a leading sequence of the pathname; with a
74       trailing $, a trailing one.
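
       The path-segment constraint on PATTERN arguments can be illustrated as
       follows. This sketch uses Python's re module with lookaround
       assertions purely to show the matching rule; repocutter itself uses Go
       regular expressions, which have no lookaround, so its implementation
       necessarily differs. The helper names are hypothetical.

```python
import re


def segment_regex(pattern):
    """Wrap a pattern so it can only match whole path segments."""
    # Left end: start of path or just after a '/'.
    # Right end: just before a '/' or at end of string.
    return re.compile(r"(?:^|(?<=/))(?:" + pattern + r")(?=/|$)")


def matches(pattern, path):
    """Return True if the pattern matches one or more whole segments."""
    return segment_regex(pattern).search(path) is not None
```

       Under this rule 'trunk' matches 'project/trunk/file' but not
       'project/trunks/file', and '^trunk' matches only paths whose first
       segment is trunk.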
75
76       The following subcommands are available:
77
78       select
79           The 'select' subcommand selects a range and permits only revisions
80           and nodes in that range to pass to standard output. A range
81           beginning with 0 includes the dumpfile header. Mergeinfo properties
82           in all revisions are updated so they no longer refer to omitted
83           revisions.
84
           Warning: this subcommand is not guaranteed to produce a valid dump
           that can be read by reposurgeon. In particular, it may delete a
           revision that is referenced in a later copy-from operation, which
           will crash reposurgeon.
88
89       deselect
90           The 'deselect' subcommand selects a range and permits only
91           revisions and nodes NOT in that range to pass to standard output.
92           Any mergeinfo properties in other revisions are updated so they no
93           longer refer to dropped revisions.
94
           Warning: this subcommand is not guaranteed to produce a valid dump
           that can be read by reposurgeon. In particular, it may delete a
           revision that is referenced in a later copy-from operation, which
           will crash reposurgeon.
98
99       see
100           Render a very condensed report on the repository node structure,
101           mainly useful for examining strange and pathological repositories.
102           File content is ignored. You get one line per repository operation,
103           reporting the revision, operation type, file path, and the copy
104           source (if any). Directory paths are distinguished by a trailing
105           slash. The 'copy' operation is really an 'add' with a directory
106           source and target; the display name is changed to make them easier
107           to see. This report can be restricted by a selection set.
108
109       renumber
110           Renumber all revisions, patching Node-copyfrom headers as required.
111           Any selection option is ignored. Takes no arguments. The -b option
112           can be used to set the base to renumber from, defaulting to 0.
113
114       count
           The 'count' subcommand lists the last revision number in the input
           stream. This is normally the revision count, but may not be if the
           stream has omitted revisions.
118
119       log
120           Generate a log report, same format as the output of svn log on a
121           repository, to standard output.
122
123       setlog
124           Replace the log entries in the input dumpfile with the
125           corresponding entries in the LOGFILE, which should be in the format
126           of an svn log output. Replacements may be restricted to a specified
127           range.
128
129       propdel
130           Delete the property PROPNAME. May be restricted by a revision
131           selection. You may specify multiple properties to be deleted.
132
133       proprename
134           Rename the property OLDNAME to NEWNAME. May be restricted by a
135           revision selection. You may specify multiple properties to be
136           renamed.
137
138       propset
139           Set the property PROPNAME to PROPVAL.
140
           May be restricted by a revision selection. Note that specifying
           only a revision will cause the property to be set on the revision
           properties and on all nodes in the revision; you’ll probably want
           to specify a node index.
145
146           You may specify multiple property settings.
147
148       propclean
149           Every path with a suffix matching one of SUFFIXES gets a property
150           turned off. The default property is svn::Another property may be
151           set with the -p option.
152
153       expunge
154           Delete all operations with Node-path or Node-copyfrom-path headers
155           matching specified Golang regular expressions (opposite of 'sift').
156           Any revision left with no Node records after this filtering has its
157           Revision record dropped as well. Mergeinfo properties in all
158           revisions are updated so they no longer refer to dropped revisions.
159
           Warning: this subcommand is not guaranteed to produce a valid dump
           that can be read by reposurgeon. In particular, it may delete a
           revision that is referenced in a later copy-from operation, which
           will crash reposurgeon.
163
164       sift
165           Delete all operations with either Node-path or Node-copyfrom-path
166           headers not matching specified Golang regular expressions (opposite
167           of 'expunge'). Any revision left with no Node records after this
168           filtering has its Revision record removed as well. Mergeinfo
169           properties in all revisions are updated so they no longer refer to
170           dropped revisions.
171
172           This transform can be restricted by a selection set.
173
           Warning: this subcommand is not guaranteed to produce a valid dump
           that can be read by reposurgeon. In particular, it may delete a
           revision that is referenced in a later copy-from operation, which
           will crash reposurgeon.
177
178       closure
179           The 'closure' subcommand computes the transitive closure of a path
180           set under the relation 'copies from' - that is, with the smallest
181           set of additional paths such that every copy-from source is in the
182           set.
183
184       pathlist
185           List all distinct node-paths in the stream, once each, in the order
186           first encountered.
187
188       pathrename
           Modify Node-path headers, Node-copyfrom-path headers, and
           svn:mergeinfo properties matching the specified Golang regular
           expression FROM; replace with TO. TO may contain Golang-style
           backreferences (${1}, ${2} etc.; the curly brackets are not
           optional) to parenthesized portions of FROM.
193
194           Matches are constrained so that each match must be a path segment
195           or a sequence of path segments; that is, the left end must be
196           either at the start of path or immediately following a /, and the
197           right end must precede a / or be at end of string. With a leading ^
198           the match is constrained to be a leading sequence of the pathname;
199           with a trailing $, a trailing one.
200
201           Multiple FROM/TO pairs may be specified and are applied in order.
202           This transform can be restricted by a selection set.
203
           All mergeinfo properties are updated in accordance with the path
           renames.
206
207       setpath
208           In the specified revisions, replace the Node-path with the
209           specified PATH. Does not alter mergeinfo properties as a side
210           effect.
211
212       setcopyfrom
213           In the specified revisions, replace the Node-copyfrom-path with the
214           specified PATH. Does not alter mergeinfo properties as a side
215           effect. Terminates with error if any selected node is not a copy.
216
217       pop
218           Pop initial segment off each path matching PATTERN - by default,
219           all paths.
220
221           May be useful after a sift command to turn a dump from a subproject
222           stripped from a dump for a multiple-project repository into the
223           normal form with trunk/tags/branches at the top level.
224
           This transform cannot be restricted by a selection set, as it is
           not possible to guarantee that copyfrom paths and mergeinfo
           properties will be modified consistently in the presence of that
           kind of restriction.
229
230           Mergeinfo properties in all revisions are updated, as well as path
231           and copyfrom parts.
232
233       push
           Push an initial segment onto each matching path. Normally used to
           add a "trunk" prefix to every path in a flat repository. The -s
           option can be used to set a different initial segment.
237
           This transform cannot be restricted by a selection set, as it is
           not possible to guarantee that copyfrom paths and mergeinfo
           properties will be modified consistently in the presence of that
           kind of restriction.
242
           Mergeinfo properties in all revisions are updated to refer to the
           new pathnames.
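
           The effect of pop and push on a single pathname can be sketched
           as below. The function names are hypothetical, and leaving a
           single-segment path unchanged under pop is an assumption of this
           sketch rather than documented behaviour.

```python
def pop_segment(path):
    """Remove the initial segment of a path: 'a/b/c' -> 'b/c'."""
    _, separator, rest = path.partition("/")
    # Assumption: a path with only one segment is returned unchanged.
    return rest if separator else path


def push_segment(path, segment="trunk"):
    """Prepend an initial segment to a path: 'b/c' -> 'trunk/b/c'."""
    return segment + "/" + path
```

           Popping 'project/trunk/a.c' gives 'trunk/a.c'; pushing onto
           'src/a.c' with the default segment gives 'trunk/src/a.c'.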
245
246       filecopy
247           For each node in the revision range, stash the current version of
248           the node-path’s content. For each later file copy operation with
249           that source, replace the file copy with an explicit add/change
250           using the stashed content.
251
252           You can use this operation to sever links from obsolete branches or
253           non-conformable directories in a multiproject repository so the
254           unwanted content can be expunged without changing the content of
255           later revisions.
256
257           If a PATTERN argument is provided, only replace copies with an
258           explicit add/change when the source node path matches PATTERN.
259
260           With the -n flag, only the basename is required to match PATTERN if
261           it is provided. Otherwise, with -n and no PATTERN, require a match
262           of source to target on basename only rather than the full path.
263           This may be required in order to extract filecopies from branches.
264
265           Restricting the range holds down the memory requirement of this
266           tool, which in the worst (and default) 1:$ case will keep a copy of
267           every blob in the repository until it’s done processing the stream.
268
269       skipcopy
270           Replace the source revision and path of a copy at the upper end of
271           the selection with the source revisions and path of a copy at the
272           lower end. Fails unless both revisions are copies. Used to remove
273           an unwanted intermediate copy or copies, cleaning up the history.
274
275       swap
276           Swap the top two elements of each pathname in every revision in the
277           selection set. Useful following a sift operation for straightening
278           out a common form of multi-project repository. If a PATTERN
279           argument is given, only paths matching it are swapped.
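
           A sketch of the path transformation swap performs, with a
           hypothetical function name. Returning paths of fewer than two
           segments unchanged is an assumption of the example.

```python
def swap_top(path):
    """Exchange the first two segments: 'proj/trunk/a.c' -> 'trunk/proj/a.c'."""
    parts = path.split("/")
    if len(parts) < 2:
        # Assumption: nothing to swap on a single-segment path.
        return path
    parts[0], parts[1] = parts[1], parts[0]
    return "/".join(parts)
```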
280
281       swapsvn
282           Like swap, but is aware of Subversion structure. Used for
283           transforming multiproject repositories into a standard layout with
284           trunk, tags, and branches at the top level.
285
           Fires when the second component of a matching path is "trunk",
           "branches", or "tags", or the path consists of a single segment
           that is a top-level project directory; all paths for which this is
           not so are passed through unaltered.
290
291           Top-level project directories with properties or comments make this
292           command die (return status 1) with an error message on stderr;
293           otherwise these directories are silently discarded.
294
           Otherwise, swaps "trunk" and the top-level (project) directory
           straight up. For tags and branches, the following two components
           are swapped to the top. Thus, "foo/branches/release23" becomes
           "branches/release23/foo", putting the project directory beneath the
           branch.
300
301           Also fires when an entire project directory is copied; this is
302           transformed into a copy of trunk and copies of each subbranch and
303           tag that exists.
304
           After the swap, there are attempts to recognize spans of copies
           into branch directories, and copies into tag subdirectories that
           are parallel in all top-level (project) directories. These are
           coalesced into single copies in the inverted structure. No attempt
           is made to coalesce deletes; the user must manually trim unneeded
           branches.
311
           Accordingly, copies with three-segment sources and three-segment
           targets are transformed; for tags/ and branches/ paths the last
           segment (the subdirectory below the branch name) is dropped.
           Following copies are skipped.
316
317           This has two minor negative consequences. One is that metadata
318           belonging to all deletes or copies after the first one in a
319           coalesced span is lost. The other is that branches and tags local
320           to individual project directories are promoted to global branches
321           and tags across the entire transformed repository; no content is
322           lost this way.
323
324           Parallel rename sequences are also coalesced.
325
326           If a PATTERN argument is given, only paths matching the pattern are
327           swapped.
328
329           Note that the result of swapping does not have initial
330           trunk/branches/tags directory creations and can thus not be fed
331           directly to svnload. reposurgeon copes with this, but Subversion
332           will not.
333
           Mergeinfo properties are updated to use the swapped path names.
335
336           This transform can be restricted by a selection set.
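
           The basic path rewriting rule of swapsvn (trunk swapped straight
           up; the branch or tag name carried to the top along with
           "branches" or "tags") can be sketched as follows. This covers only
           the per-path mapping; the coalescing of parallel copies, the
           handling of project directory copies, and the treatment of
           top-level project directories are not modelled. The function name
           is hypothetical.

```python
SVN_DIRS = ("trunk", "branches", "tags")


def swapsvn_path(path):
    """Map 'foo/branches/release23/x' to 'branches/release23/foo/x'."""
    parts = path.split("/")
    if len(parts) < 2 or parts[1] not in SVN_DIRS:
        # The rule does not fire; the path passes through unaltered.
        return path
    if parts[1] == "trunk":
        # Swap "trunk" and the project directory straight up.
        return "/".join(["trunk", parts[0]] + parts[2:])
    if len(parts) < 3:
        # Assumption of this sketch: a bare 'foo/branches' is left alone.
        return path
    # Move "branches"/"tags" and the branch name above the project directory.
    return "/".join([parts[1], parts[2], parts[0]] + parts[3:])
```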
337
338       swapcheck
339           List directory prefixes of anomalous paths that would confuse
340           swapsvn. This includes any single-segment path other than
341           trunk/tags/branches or a project copy operation, any path with two
342           or more segments in which the second is not trunk/tags/branches,
343           and any path in which trunk/tags/branches occurs more than one
344           segment down from the root.
345
346           Each report line has two fields; the first is the earliest revision
347           containing a path with the prefix given, and the second is the
348           prefix. Once a particular path prefix has been recognized and
349           reported as anomalous, later paths with that prefix are not
350           reported.
351
352           If feeding a Subversion dump to this subcommand doesn’t produce an
353           empty report, you can expect swapsvn to produce an invalid dump
354           that will confuse and possibly crash reposurgeon. The remedy for
355           this is a set of pathrenames and/or deselections that yields paths
356           conformable to being swapped into a regular Subversion structure.
357
358       replace
359           Perform a regular expression search/replace on blob content. The
360           first character of the argument (normally /) is treated as the end
361           delimiter for the regular-expression and replacement parts. This
362           transform can be restricted by a selection set.
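
           One reading of the argument format is DELIM regexp DELIM
           replacement DELIM, for example /foo/bar/. The sketch below parses
           that form and applies it to blob content. It is an illustration
           under that assumption, uses Python regular expressions rather
           than Go ones, and its function names are hypothetical.

```python
import re


def parse_replace(argument):
    """Split '/regexp/replacement/' using its first character as delimiter."""
    delimiter = argument[0]
    fields = argument[1:].split(delimiter)
    # Expect exactly: regexp, replacement, and an empty tail after the
    # closing delimiter.
    if len(fields) != 3 or fields[2] != "":
        raise ValueError("expected DELIM regexp DELIM replacement DELIM")
    return fields[0], fields[1]


def apply_replace(argument, blob):
    """Apply the search/replace described by `argument` to bytes content."""
    pattern, replacement = parse_replace(argument)
    return re.sub(pattern.encode(), replacement.encode(), blob)
```

           Choosing a delimiter other than / is convenient when the regular
           expression itself contains slashes, as in ',a/b,c,'.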
363
364       strip
365           Replace content with unique generated cookies on all node paths
366           matching the specified regular expressions; if no expressions are
367           given, match all paths.
368
369           This command is useful for reducing the bulk of a stream without
370           touching its metadata, so you can do test conversions more quickly.
371
372       hash
373           Replace content with hash on all node paths matching the specified
374           regular expressions; if no expressions are given, match all paths.
375
376       obscure
377           Replace path segments and committer IDs with arbitrary but
378           consistent names in order to obscure them. The replacement
379           algorithm is tuned to make the replacements readily distinguishable
380           by eyeball. This transform can be restricted by a selection set.
381
382       reduce
           Strip revisions out of a dump so the only parts left are those
           likely to be relevant to a conversion problem. This is done by
           dropping every node that consists of a change on a file and has no
           property settings. Mergeinfo properties in all revisions are
           updated so they no longer refer to dropped revisions.
388
389       testify
390           Replace commit timestamps with a monotonically increasing clock
391           tick starting at the Unix epoch and advancing by 10 seconds per
392           commit. Replace all attributions with 'fred'. Discard the
393           repository UUID. Use this to neutralize procedurally-generated
394           streams so they can be compared. This transform can be restricted
395           by a selection set.
396
       debug
398           Set the debug level to the specified value on the selected
399           revisions. Setting debugging enables diagnostics to standard error,
400           and suppresses the progress baton for the entire run in order not
401           to step on any diagnostics that might be emitted.
402
403           For the meaning of the debug levels, see the source code. This
404           option is probably only of interest to repocutter developers.
405
406       version
407           Report major and minor repocutter version.
408

HISTORY

       Under the name "svncutter", an ancestor of this program traveled in the
       'contrib/' directory of the Subversion distribution. It had functional
       overlap with reposurgeon(1) because it was directly ancestral to that
       code. It was moved to the reposurgeon(1) distribution in January 2016.
       This program was ported from Python to Go in August 2018, at which time
       the obsolete "squash" command was retired. The syntax of regular
       expressions in the pathrename command changed at that time.
417
418       The reason for the partial functional overlap between repocutter and
419       reposurgeon is that repocutter was first written earlier and became a
420       testbed for some of the design concepts in reposurgeon. After
421       reposurgeon was written, the author learned that it could not naturally
422       support some useful operations very specific to Subversion, and
423       enhanced repocutter to do those.
424

RETURN VALUES

426       Normally 0. Can be 1 if repocutter sees an ill-formed dump, or if the
427       output stream contains any copyfrom references to missing revisions.
428

BUGS

430       There is one regression since the Python version: repocutter no longer
431       recognizes Macintosh-style line endings consisting of a carriage return
432       only. This may be addressed in a future version.
433

SEE ALSO

435       reposurgeon(1).
436

EXAMPLE

438       Suppose you have a Subversion repository with the following
439       semi-pathological structure:
440
441           Directory1/ (with unrelated content)
442           Directory2/ (with unrelated content)
443           TheDirIWantToMigrate/
444                           branches/
445                                          crazy-feature/
446                                                          UnrelatedApp1/
447                                                          TheAppIWantToMigrate/
448                           tags/
449                                          v1.001/
450                                                          UnrelatedApp1/
451                                                          UnrelatedApp2/
452                                                          TheAppIWantToMigrate/
453                           trunk/
454                                          UnrelatedApp1/
455                                          UnrelatedApp2/
456                                          TheAppIWantToMigrate/
457
458       You want to transform the dump file so that TheAppIWantToMigrate can be
459       subject to a regular branchy lift. A way to dissect out the code of
460       interest would be with the following series of filters applied:
461
           repocutter expunge '^Directory1' '^Directory2'
           repocutter pathrename '^TheDirIWantToMigrate/' ''
           repocutter expunge '^branches/crazy-feature/UnrelatedApp1/'
           repocutter pathrename 'branches/crazy-feature/TheAppIWantToMigrate/' 'branches/crazy-feature/'
           repocutter expunge '^tags/v1.001/UnrelatedApp1/'
           repocutter expunge '^tags/v1.001/UnrelatedApp2/'
           repocutter pathrename '^tags/v1.001/TheAppIWantToMigrate/' 'tags/v1.001/'
           repocutter expunge '^trunk/UnrelatedApp1/'
           repocutter expunge '^trunk/UnrelatedApp2/'
           repocutter pathrename '^trunk/TheAppIWantToMigrate/' 'trunk/'
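
       Because each repocutter instance reads a dump on standard input and
       writes one on standard output, such a series of filters can be chained
       into a single shell pipeline. The helper below only assembles the
       command line and does not run anything; build_pipeline is a
       hypothetical name, and adding -q to every stage is a choice made for
       this sketch.

```python
import shlex


def build_pipeline(stages):
    """Join repocutter stages into one shell pipeline string.

    Each stage is a list such as ['expunge', '^Directory1'].
    """
    commands = [shlex.join(["repocutter", "-q", *stage]) for stage in stages]
    return " | ".join(commands)


if __name__ == "__main__":
    print(build_pipeline([
        ["expunge", "^Directory1", "^Directory2"],
        ["pathrename", "^TheDirIWantToMigrate/", ""],
    ]))
```

       The resulting string can be run with the original dump redirected to
       its standard input and the filtered dump redirected from its standard
       output.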
472

LIMITATIONS

       The sift and expunge operations can produce output dumps that are
       invalid. The problem is copyfrom operations (Subversion branch and tag
       creations). If an included revision includes a copyfrom reference to an
       excluded one, the reference target won’t be in the emitted dump; it
       won’t load correctly in Subversion, and while reposurgeon has fallback
       logic that backs down to the latest existing revision before the
       missing one, this expedient is fragile. The revision number in a
       copyfrom header pointing to a missing revision will be zero. Attempts
       to be clever about this won’t work; the problem is inherent in the data
       model of Subversion.
484

AUTHOR

486       Eric S. Raymond esr@thyrsus.com. This tool is distributed with
487       reposurgeon; see the project page
488       <http://www.catb.org/~esr/reposurgeon>.
489
490
491
                                  2023-02-28                     REPOCUTTER(1)