1REPOCUTTER(1)                                                    REPOCUTTER(1)
2
3
4

NAME

6       repocutter - surgical and filtering operations on Subversion dump files
7

SYNOPSIS

9       repocutter [-q] [-d n] [-i 'filename'] [-r 'selection'] 'subcommand'
10

DESCRIPTION

12       This program does surgical and filtering operations on Subversion dump
13       files. While it is not as flexible as reposurgeon(1), it can perform
14       Subversion-specific transformations that reposurgeon cannot, and can be
15       useful for processing Subversion repositories into a form suitable for
16       conversion. Also, it supports the version 3 dumpfile format, which
17       reposurgeon does not.
18
19       In most commands, the -r (or --range) option limits the selection of
20       revisions over which an operation will be performed. Usually other
21       revisions will be passed through unaltered, except in the select and
22       deselect commands for which the option controls which revisions will be
23       passed through. A selection consists of one or more comma-separated
24       ranges. A range may consist of an integer revision number or the
25       special name HEAD for the head revision. Or it may be a colon-separated
26       pair of integers, or an integer followed by a colon followed by HEAD.
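
       The selection syntax above can be illustrated with a short Python
       sketch. It is not repocutter's own parser: the function names are
       hypothetical, the head revision is passed in by the caller, and
       ranges are assumed to be inclusive at both ends.

```python
def parse_selection(spec, head):
    """Parse a selection such as '2:5,9,12:HEAD' into (low, high) pairs."""
    def value(token):
        token = token.strip()
        # HEAD stands for the head revision, supplied by the caller.
        return head if token == "HEAD" else int(token)

    ranges = []
    for part in spec.split(","):
        if ":" in part:
            low, high = part.split(":", 1)
            ranges.append((value(low), value(high)))
        else:
            # A single revision is treated as a one-element range.
            ranges.append((value(part), value(part)))
    return ranges


def selected(revision, ranges):
    """True if the revision lies inside any (low, high) pair (assumed inclusive)."""
    return any(low <= revision <= high for low, high in ranges)


ranges = parse_selection("2:5,9,12:HEAD", head=20)
print(ranges)
print([r for r in range(0, 21) if selected(r, ranges)])
```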
27
28       (Older versions of this tool, before 4.30, treated -r as an implied
29       selection filter rather than passing through unselected revisions
30       unaltered. If you have old scripts using repocutter they may need
31       modification.)
32
33       Normally, each subcommand produces a progress spinner on standard
34       error; each turn means another revision has been filtered. The -q (or
35       --quiet) option suppresses this.
36
37       The -d option enables debug messages on standard error. It takes an
38       integer debug level. These messages are probably only of interest to
39       repocutter developers.
40
41       The -i option sets the input source to a specified filename. This is
42       primarily useful when running the program under a debugger. When this
43       option is not present the program expects to read a stream from
44       standard input.
45
46       Generally, if you need to use this program at all, you will find that
47       you need to pipe your dump file through multiple instances of it doing
48       one kind of operation each. This is not as expensive as it sounds; with
49       the exception of the reduce subcommand, the working set of this program
50       is bounded by the size of the largest single blob plus its
51       metadata. It does not need to hold the entire repo metadata in memory.
52
53       The -f/-fixed option disables regexp compilation of PATTERN arguments,
54       treating them as literal strings.
55
56       The -t option sets a tag to be included in error messages. This will be
57       useful for determining which stage of a multistage repocutter pipeline
58       failed.
59
60       There are a few other command-specific options described under
61       individual commands.
62
63       In the command descriptions, PATTERN arguments are regular expressions
64       to match pathnames, constrained so that each match must be a path
65       segment or a sequence of path segments; that is, the left end must be
66       either at the start of path or immediately following a /, and the right
67       end must precede a / or be at end of string. With a leading ^ the match
68       is constrained to be a leading sequence of the pathname; with a
69       trailing $, a trailing one.
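
       The segment-boundary constraint can be approximated in Python as
       below. This is only an illustration of the rule as described here:
       repocutter uses Golang regular expressions and its actual mechanism
       may differ, and the helper names are hypothetical.

```python
import re


def segment_pattern(pattern):
    """Wrap a pattern so that matches begin and end on path-segment boundaries."""
    # Left end: start of path or just after a '/'.
    # Right end: just before a '/' or at end of string.
    return re.compile(r"(?:^|(?<=/))(?:" + pattern + r")(?=/|$)")


def matches(pattern, path):
    return segment_pattern(pattern).search(path) is not None


for pattern, path in [
    ("foo", "a/foo/b"),        # whole segment: matches
    ("foo", "foobar/baz"),     # only part of a segment: no match
    ("^trunk", "proj/trunk"),  # leading ^ anchors at the start of the path
    ("src$", "trunk/src"),     # trailing $ anchors at the end of the path
]:
    print(pattern, path, matches(pattern, path))
```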
70
71       The following subcommands are available:
72
73       select
74           The 'select' subcommand selects a range and permits only revisions
75           and nodes in that range to pass to standard output. A range
76           beginning with 0 includes the dumpfile header. Mergeinfo properties
77           in all revisions are updated so they no longer refer to omitted
78           revisions.
79
80       deselect
81           The 'deselect' subcommand selects a range and permits only
82           revisions and nodes NOT in that range to pass to standard output.
83           Any mergeinfo properties in other revisions are updated so they no
84           longer refer to dropped revisions.
85
86       see
87           Render a very condensed report on the repository node structure,
88           mainly useful for examining strange and pathological repositories.
89           File content is ignored. You get one line per repository operation,
90           reporting the revision, operation type, file path, and the copy
91           source (if any). Directory paths are distinguished by a trailing
92           slash. The 'copy' operation is really an 'add' with a directory
93           source and target; the display name is changed to make them easier
94           to see. This report can be restricted by a selection set.
95
96       renumber
97           Renumber all revisions, patching Node-copyfrom headers as required.
98           Any selection option is ignored. Takes no arguments. The -b option
99           can be used to set the base to renumber from, defaulting to 0.
100
101       log
102           Generate a log report, same format as the output of svn log on a
103           repository, to standard output.
104
105       setlog
106           Replace the log entries in the input dumpfile with the
107           corresponding entries in the LOGFILE, which should be in the format
108           of an svn log output. Replacements may be restricted to a specified
109           range.
110
111       propdel
112           Delete the property PROPNAME. May be restricted by a revision
113           selection. You may specify multiple properties to be deleted.
114
115       proprename
116           Rename the property OLDNAME to NEWNAME. May be restricted by a
117           revision selection. You may specify multiple properties to be
118           renamed.
119
120       propset
121           Set the property PROPNAME to PROPVAL.
122
123           May be restricted by a revision selection. Note that specifying
124           only a revision will cause the property to be set on the revision
125           properties and on all nodes in the revision; you’ll probably want
126           to specify a node index.
127
128           You may specify multiple property settings.
129
130       propclean
131           Every path with a suffix matching one of SUFFIXES gets a property
132           turned off. The default property is svn:executable. Another property
133           may be set with the -p option.
134
135       expunge
136           Delete all operations with Node-path or Node-copyfrom-path headers
137           matching specified Golang regular expressions (opposite of 'sift').
138           Any revision left with no Node records after this filtering has its
139           Revision record dropped as well. Mergeinfo properties in all
140           revisions are updated so they no longer refer to dropped revisions.
141
142       sift
143           Delete all operations with either Node-path or Node-copyfrom-path
144           headers not matching specified Golang regular expressions (opposite
145           of 'expunge'). Any revision left with no Node records after this
146           filtering has its Revision record removed as well. Mergeinfo
147           properties in all revisions are updated so they no longer refer to
148           dropped revisions.
149
150           This transform can be restricted by a selection set.
151
152       closure
153           The 'closure' subcommand computes the transitive closure of a path
154           set under the relation 'copies from' - that is, with the smallest
155           set of additional paths such that every copy-from source is in the
156           set.
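
           The closure computation can be sketched as a simple worklist
           loop. This is an illustration, not the tool's code: the
           copy_sources mapping (path to the set of paths it was copied
           from) is a hypothetical input that would have to be gathered
           from the Node-copyfrom-path headers of a dump.

```python
def copy_closure(paths, copy_sources):
    """Return paths plus every path reachable through 'copies from' links."""
    closure = set(paths)
    pending = list(paths)
    while pending:
        path = pending.pop()
        for source in copy_sources.get(path, ()):
            if source not in closure:
                # A newly found copy source may itself have copy sources.
                closure.add(source)
                pending.append(source)
    return closure


copy_sources = {
    "tags/t1": {"branches/b1"},
    "branches/b1": {"trunk"},
}
print(sorted(copy_closure({"tags/t1"}, copy_sources)))
```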
157
158       pathlist
159           List all distinct node-paths in the stream, once each, in the order
160           first encountered.
161
162       pathrename
163           Modify Node-path headers, Node-copyfrom-path headers, and
164           svn:mergeinfo properties matching the Golang regular expression FROM;
165           replace with TO. TO may contain Golang-style backreferences (${1}, ${2}
166           etc - curly brackets not optional) to parenthesized portions of FROM.
167
168           Matches are constrained so that each match must be a path segment
169           or a sequence of path segments; that is, the left end must be
170           either at the start of path or immediately following a /, and the
171           right end must precede a / or be at end of string. With a leading ^
172           the match is constrained to be a leading sequence of the pathname;
173           with a trailing $, a trailing one.
174
175           Multiple FROM/TO pairs may be specified and are applied in order.
176           This transform can be restricted by a selection set.
177
178           All mergeinfo properties are updated in accordance with the path
179           renames.
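
           The effect of ordered FROM/TO pairs with ${1}-style
           backreferences can be imitated in Python as follows. This is a
           rough approximation for illustration only: Python's regular
           expression dialect differs from Golang's, the helper names are
           hypothetical, and the segment-boundary handling is a guess at
           the rule described above rather than repocutter's code.

```python
import re


def go_to_python_template(template):
    """Rewrite Go-style ${1} references into Python's \\g<1> form."""
    escaped = template.replace("\\", "\\\\")  # keep literal backslashes literal
    return re.sub(r"\$\{(\d+)\}", r"\\g<\1>", escaped)


def pathrename(path, pairs):
    """Apply each (FROM, TO) pair in order, matching whole path segments only."""
    for pattern, template in pairs:
        # Non-capturing wrappers keep the numbering of groups in FROM intact.
        bounded = r"(?:^|(?<=/))(?:" + pattern + r")(?=/|$)"
        path = re.sub(bounded, go_to_python_template(template), path)
    return path


print(pathrename("projects/foo/trunk/main.c",
                 [("^projects/([^/]+)/trunk", "trunk/${1}")]))
print(pathrename("old/lib/x.c", [("^old", "new"), ("^new/lib", "lib")]))
```

           The second call shows why order matters: the second pair only
           matches because the first pair has already rewritten the path.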
180
181       setpath
182           In the specified revisions, replace the Node-path with the
183           specified PATH. Does not alter mergeinfo properties as a side
184           effect.
185
186       setcopyfrom
187           In the specified revisions, replace the Node-copyfrom-path with the
188           specified PATH. Does not alter mergeinfo properties as a side
189           effect. Terminates with error if any selected node is not a copy.
190
191       pop
192           Pop initial segment off each path matching PATTERN - by default,
193           all paths.
194
195           May be useful after a sift command to turn a dump from a subproject
196           stripped from a dump for a multiple-project repository into the
197           normal form with trunk/tags/branches at the top level.
198
199           This transform cannot be restricted by a selection set, as it is
200           not possible to guarantee that copyfrom paths and mergeinfo
201           properties will be modified consistently in the presence of that
202           kind of restriction.
203
204           Mergeinfo properties in all revisions are updated, as well as path
205           and copyfrom parts.
206
207       push
208           Push an initial segment onto each matching path. Normally used to
209           add a "trunk" prefix to every path in a flat repository. The -s
210           option can be used to set a different initial segment.
211
212           This transform cannot be restricted by a selection set, as it is
213           not possible to guarantee that copyfrom paths and mergeinfo
214           properties will be modified consistently in the presence of that
215           kind of restriction.
216
217           Mergeinfo properties in all revisions are updated to refer to the
218           new pathnames.
219
220       filecopy
221           For each node in the revision range, stash the current version of
222           the node-path’s content. For each later file copy operation with
223           that source, replace the file copy with an explicit add/change
224           using the stashed content.
225
226           With the -f flag and a BASENAME argument, require the source
227           basename to be as specified. Otherwise, with -f and no BASENAME,
228           require a match of source to target on basename only rather than
229           the full path. This may be required in order to extract filecopies
230           from branches.
231
232           Restricting the range holds down the memory requirement of this
233           tool, which in the worst (and default) 1:$ case will keep a copy of
234           every blob in the repository until it’s done processing the stream.
235
236       skipcopy
237           Replace the source revision and path of a copy at the upper end of
238           the selection with the source revisions and path of a copy at the
239           lower end. Fails unless both revisions are copies. Used to remove
240           an unwanted intermediate copy or copies.
241
242       swap
243           Swap the top two elements of each pathname in every revision in the
244           selection set. Useful following a sift operation for straightening
245           out a common form of multi-project repository. If a PATTERN
246           argument is given, only paths matching it are swapped.
247
248       swapsvn
249           Like swap, but is aware of Subversion structure. Used for
250           transforming multiproject repositories into a standard layout with
251           trunk, tags, and branches at the top level.
252
253           Fires when the second component of a matching path is "trunk",
254           "branches", or "tags", or the path consists of a single segment
255           that is a top-level project directory; all other paths are passed
256           through unaltered.
257
258           Top-level project directories with properties or comments make this
259           command die (return status 1) with an error message on stderr;
260           otherwise these directories are silently discarded.
261
262           Otherwise, swaps "trunk" and the top-level (project) directory
263           straight up. For tags and branches, the following two components
264           are swapped to the top. Thus, "foo/branches/release23" becomes
265           "branches/release23/foo", putting the project directory beneath the
266           branch.
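
           The basic path rewriting described so far can be sketched in
           Python as below. This covers only the per-path swap; the
           handling of bare project directories, whole-project copies,
           and coalescing is not modeled, and the function name is
           hypothetical.

```python
def swapsvn_path(path):
    """Move trunk, branches/NAME, or tags/NAME above the project directory."""
    parts = path.split("/")
    if len(parts) >= 2 and parts[1] == "trunk":
        # project/trunk/rest -> trunk/project/rest
        return "/".join(["trunk", parts[0]] + parts[2:])
    if len(parts) >= 3 and parts[1] in ("branches", "tags"):
        # project/branches/name/rest -> branches/name/project/rest
        return "/".join([parts[1], parts[2], parts[0]] + parts[3:])
    # Anything else is left alone in this sketch.
    return path


for path in ("foo/trunk/src/main.c", "foo/branches/release23", "foo/tags/v1/README"):
    print(path, "->", swapsvn_path(path))
```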
267
268           Also fires when an entire project directory is copied; this is
269           transformed into a copy of trunk and copies of each subbranch and
270           tag that exists.
271
272           After the swap, there are attempts to recognize spans of copies
273           into branch directories, and copies into tag subdirectories that
274           are parallel in all top-level (project) directories. These are
275           coalesced into single copies in the inverted structure. No attempt
276           is made to coalesce deletes; the user must manually trim unneeded
277           branches.
278
279           Accordingly, copies with three-segment sources and three-segment
280           targets are transformed; for tags/ and branches/ paths the last
281           segment (the subdirectory below the branch name) is dropped.
282           Following copies are skipped.
283
284           This has two minor negative consequences. One is that metadata
285           belonging to all deletes or copies after the first one in a
286           coalesced span is lost. The other is that branches and tags local
287           to individual project directories are promoted to global branches
288           and tags across the entire transformed repository; no content is
289           lost this way.
290
291           Parallel rename sequences are also coalesced.
292
293           If a PATTERN argument is given, only paths matching the pattern are
294           swapped.
295
296           Note that the result of swapping does not have initial
297           trunk/branches/tags directory creations and can thus not be fed
298           directly to svnadmin load. reposurgeon copes with this, but Subversion
299           will not.
300
301           Mergeinfo properties are updated to use the swapped path names.
302
303           This transform can be restricted by a selection set.
304
305       replace
306           Perform a regular expression search/replace on blob content. The
307           first character of the argument (normally /) is treated as the end
308           delimiter for the regular-expression and replacement parts. This
309           transform can be restricted by a selection set.
310
311       strip
312           Replace content with unique generated cookies on all node paths
313           matching the specified regular expressions; if no expressions are
314           given, match all paths.
315
316           This command is useful for reducing the bulk of a stream without
317           touching its metadata, so you can do test conversions more
318           quickly.
319
320       obscure
321           Replace path segments and committer IDs with arbitrary but
322           consistent names in order to obscure them. The replacement
323           algorithm is tuned to make the replacements readily distinguishable
324           by eyeball. This transform can be restricted by a selection set.
325
326       reduce
327           Strip revisions out of a dump so the only parts left are those likely
328           to be relevant to a conversion problem. This is done by dropping
329           every node that consists of a change on a file and has no property
330           settings. Mergeinfo properties in all revisions are updated so they
331           no longer refer to dropped revisions.
332
333       testify
334           Replace commit timestamps with a monotonically increasing clock
335           tick starting at the Unix epoch and advancing by 10 seconds per
336           commit. Replace all attributions with 'fred'. Discard the
337           repository UUID. Use this to neutralize procedurally-generated
338           streams so they can be compared. This transform can be restricted
339           by a selection set.
340
341       version
342           Report major and minor repocutter version.
343

HISTORY

345       Under the name "svncutter", an ancestor of this program traveled in the
346       'contrib/' directory of the Subversion distribution. It had functional
347       overlap with reposurgeon(1) because it was directly ancestral to that
348       code. It was moved to the reposurgeon(1) distribution in January 2016.
349       This program was ported from Python to Go in August 2018, at which time
350       the obsolete "squash" command was retired. The syntax of regular
351       expressions in the pathrename command changed at that time.
352
353       The reason for the partial functional overlap between repocutter and
354       reposurgeon is that repocutter was first written earlier and became a
355       testbed for some of the design concepts in reposurgeon. After
356       reposurgeon was written, the author learned that it could not naturally
357       support some useful operations very specific to Subversion, and
358       enhanced repocutter to do those.
359

BUGS

361       There is one regression since the Python version: repocutter no longer
362       recognizes Macintosh-style line endings consisting of a carriage return
363       only. This may be addressed in a future version.
364

SEE ALSO

366       reposurgeon(1).
367

EXAMPLE

369       Suppose you have a Subversion repository with the following
370       semi-pathological structure:
371
372           Directory1/ (with unrelated content)
373           Directory2/ (with unrelated content)
374           TheDirIWantToMigrate/
375                           branches/
376                                          crazy-feature/
377                                                          UnrelatedApp1/
378                                                          TheAppIWantToMigrate/
379                           tags/
380                                          v1.001/
381                                                          UnrelatedApp1/
382                                                          UnrelatedApp2/
383                                                          TheAppIWantToMigrate/
384                           trunk/
385                                          UnrelatedApp1/
386                                          UnrelatedApp2/
387                                          TheAppIWantToMigrate/
388
389       You want to transform the dump file so that TheAppIWantToMigrate can be
390       subject to a regular branchy lift. A way to dissect out the code of
391       interest would be with the following series of filters applied:
392
393           repocutter expunge '^Directory1' '^Directory2'
394           repocutter pathrename '^TheDirIWantToMigrate/' ''
395           repocutter expunge '^branches/crazy-feature/UnrelatedApp1/'
396           repocutter pathrename 'branches/crazy-feature/TheAppIWantToMigrate/' 'branches/crazy-feature/'
397           repocutter expunge '^tags/v1.001/UnrelatedApp1/'
398           repocutter expunge '^tags/v1.001/UnrelatedApp2/'
399           repocutter pathrename '^tags/v1.001/TheAppIWantToMigrate/' 'tags/v1.001/'
400           repocutter expunge '^trunk/UnrelatedApp1/'
401           repocutter expunge '^trunk/UnrelatedApp2/'
402           repocutter pathrename '^trunk/TheAppIWantToMigrate/' 'trunk/'
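
       Since each filter reads a dump on standard input and writes one on
       standard output, the stages are meant to be chained with pipes. The
       Python sketch below assembles such a pipeline as a shell command
       string. The helper name and the file names in.svn and out.svn are
       hypothetical; only the -q option and subcommands documented above
       are used.

```python
import shlex


def build_pipeline(stages, infile, outfile):
    """Join repocutter stages into one shell pipeline, quoting every argument."""
    commands = [shlex.join(["repocutter", "-q"] + list(stage)) for stage in stages]
    # The first stage reads the input dump; the last writes the result.
    commands[0] += " <" + shlex.quote(infile)
    commands[-1] += " >" + shlex.quote(outfile)
    return " | ".join(commands)


stages = [
    ["expunge", "^Directory1", "^Directory2"],
    ["pathrename", "^TheDirIWantToMigrate/", ""],
    ["expunge", "^trunk/UnrelatedApp1/"],
]
print(build_pipeline(stages, "in.svn", "out.svn"))
```

       The resulting string can be pasted into a shell or saved in a
       script; the remaining stages of the example are added to the list
       in the same way.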
403

LIMITATIONS

405       The sift and expunge operations can produce output dumps that are
406       invalid. The problem is copyfrom operations (Subversion branch and tag
407       creations). If an included revision includes a copyfrom reference to an
408       excluded one, the reference target won’t be in the emitted dump; it
409       won’t load correctly in Subversion, and while reposurgeon has fallback
410       logic that backs down to the latest existing revision before the
411       missing one, this expedient is fragile. The revision number in a
412       copyfrom header pointing to a missing revision will be zero. Attempts
413       to be clever about this won’t work; the problem is inherent in the data
414       model of Subversion.
415

AUTHOR

417       Eric S. Raymond <esr@thyrsus.com>. This tool is distributed with
418       reposurgeon; see the project page
419       <http://www.catb.org/~esr/reposurgeon>.
420
421
422
423                                  2022-04-21                     REPOCUTTER(1)