REPOCUTTER(1)                                                    REPOCUTTER(1)

NAME

       repocutter - surgical and filtering operations on Subversion dump files

SYNOPSIS

       repocutter [-q] [-d n] [-i 'filename'] [-r 'selection'] 'subcommand'

DESCRIPTION

       This program does surgical and filtering operations on Subversion dump
       files. While it is not as flexible as reposurgeon(1), it can perform
       Subversion-specific transformations that reposurgeon cannot, and can be
       useful for processing Subversion repositories into a form suitable for
       conversion. Also, it supports the version 3 dumpfile format, which
       reposurgeon does not.

       In all commands, the -r (or --range) option limits the selection of
       revisions over which an operation will be performed. Usually other
       revisions will be passed through unaltered, except in the select and
       deselect commands, for which the option controls which revisions will
       be passed through. A selection consists of one or more comma-separated
       ranges. A range may consist of an integer revision number or the
       special name HEAD for the head revision. Or it may be a colon-separated
       pair of integers, or an integer followed by a colon followed by HEAD.
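
       As an illustration of the selection syntax, here is a minimal Python
       sketch of how such a string could be interpreted. It is not
       repocutter's own parser; the function names and the explicit head
       argument are assumptions made for the example.

```python
def parse_selection(spec, head):
    """Turn a selection like '1:10,15,20:HEAD' into inclusive (low, high) pairs.

    `head` is the number of the head revision; passing it in explicitly is an
    assumption of this sketch.
    """
    def endpoint(token):
        return head if token == "HEAD" else int(token)

    ranges = []
    for part in spec.split(","):
        if ":" in part:
            low, high = part.split(":", 1)
            ranges.append((endpoint(low), endpoint(high)))
        else:
            # A single revision is a range of length one.
            ranges.append((endpoint(part), endpoint(part)))
    return ranges


def selected(revision, ranges):
    """True if the revision falls inside any range of the selection."""
    return any(low <= revision <= high for low, high in ranges)
```

       With a head revision of 30, the selection 1:10,15,20:HEAD covers
       revisions 1 through 10, revision 15, and revisions 20 through 30.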

       (Older versions of this tool, before 4.30, treated -r as an implied
       selection filter rather than passing through unselected revisions
       unaltered. If you have old scripts using repocutter they may need
       modification.)

       Normally, each subcommand produces a progress spinner on standard
       error; each turn means another revision has been filtered. The -q (or
       --quiet) option suppresses this.

       The -d option enables debug messages on standard error. It takes an
       integer debug level. These messages are probably only of interest to
       repocutter developers.

       The -i option sets the input source to a specified filename. This is
       primarily useful when running the program under a debugger. When this
       option is not present the program expects to read a stream from
       standard input.

       Generally, if you need to use this program at all, you will find that
       you need to pipe your dump file through multiple instances of it doing
       one kind of operation each. This is not as expensive as it sounds; with
       the exception of the reduce subcommand, the working set of this program
       is bounded by the size of the largest single blob plus its metadata.
       It does not need to hold the entire repo metadata in memory.

       The -t option sets a tag to be included in error messages. This is
       useful for determining which stage of a multistage repocutter pipeline
       failed.
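
       A multistage pipeline with one tag per stage might be assembled as in
       the following Python sketch. The helper function and the tag names
       (stage1, stage2, ...) are hypothetical; only the -q and -t options and
       the subcommand names come from this page.

```python
import shlex


def build_pipeline(stages):
    """Join (subcommand, arguments) pairs into a single shell pipeline.

    Each stage is tagged with -t so an error message identifies its stage.
    The tag naming scheme is an assumption of this sketch.
    """
    commands = []
    for number, (subcommand, args) in enumerate(stages, start=1):
        words = ["repocutter", "-q", "-t", f"stage{number}", subcommand, *args]
        commands.append(" ".join(shlex.quote(word) for word in words))
    return " | ".join(commands)


stages = [
    ("expunge", ["^Directory1", "^Directory2"]),
    ("pathrename", ["^TheDirIWantToMigrate/", ""]),
]
print(build_pipeline(stages))
```

       The first stage of the resulting command line reads the dump from
       standard input, and the last stage writes the result to standard
       output.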

       The following subcommands are available:

       select
           The 'select' subcommand selects a range and permits only revisions
           in that range to pass to standard output. A range beginning with 0
           includes the dumpfile header.

       deselect
           The 'deselect' subcommand selects a range and permits only
           revisions NOT in that range to pass to standard output.

       see
           Render a very condensed report on the repository node structure,
           mainly useful for examining strange and pathological repositories.
           File content is ignored. You get one line per repository operation,
           reporting the revision, operation type, file path, and the copy
           source (if any). Directory paths are distinguished by a trailing
           slash. The 'copy' operation is really an 'add' with a directory
           source and target; the display name is changed to make them easier
           to see. This report can be restricted by a selection set.

       renumber
           Renumber all revisions, patching Node-copyfrom headers as required.
           Any selection option is ignored. Takes no arguments. The -b option
           can be used to set the base to renumber from, defaulting to 0.
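
           The idea behind renumbering can be sketched in Python as below.
           This is an illustration, not the tool's code: it assumes that
           revisions are renumbered consecutively in stream order, and the
           function names are invented for the example.

```python
def renumber_map(revisions, base=0):
    """Map each original revision number to a gap-free number from `base`.

    Consecutive numbering in stream order is an assumption of this sketch.
    """
    return {old: base + index for index, old in enumerate(revisions)}


def patch_copyfrom(copyfrom_rev, mapping):
    """Rewrite a Node-copyfrom-rev value so it points at the new number."""
    return mapping[copyfrom_rev]


mapping = renumber_map([0, 1, 4, 7])
print(mapping)
print(patch_copyfrom(4, mapping))
```

           A copy that referred to old revision 4 must refer to new revision
           2 after renumbering, which is why the copyfrom headers need
           patching.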

       log
           Generate a log report, same format as the output of svn log on a
           repository, to standard output.

       setlog
           Replace the log entries in the input dumpfile with the
           corresponding entries in the LOGFILE, which should be in the format
           of an svn log output. Replacements may be restricted to a specified
           range.

       propdel
           Delete the property PROPNAME. May be restricted by a revision
           selection. You may specify multiple properties to be deleted.

       proprename
           Rename the property OLDNAME to NEWNAME. May be restricted by a
           revision selection. You may specify multiple properties to be
           renamed.

       propset
           Set the property PROPNAME to PROPVAL. May be restricted by a
           revision selection. You may specify multiple property settings.

       expunge
           Delete all operations with Node-path headers matching specified
           Golang regular expressions (opposite of 'sift'). Any revision left
           with no Node records after this filtering has its Revision record
           removed as well.

       sift
           Delete all operations with Node-path headers not matching specified
           Golang regular expressions (opposite of 'expunge'). Any revision
           left with no Node records after this filtering has its Revision
           record removed as well. This transform can be restricted by a
           selection set.
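
           The complementary behavior of expunge and sift can be sketched in
           Python as follows. Python's re module stands in for Go's regexp
           package here; the two agree on simple anchored patterns like
           these. Unanchored (search-style) matching and the function names
           are assumptions of the sketch.

```python
import re


def matches_any(path, patterns):
    """True if any pattern matches somewhere in the path (assumed semantics)."""
    return any(re.search(pattern, path) for pattern in patterns)


def expunge(paths, patterns):
    """Drop every path that matches one of the patterns."""
    return [path for path in paths if not matches_any(path, patterns)]


def sift(paths, patterns):
    """Keep only the paths that match one of the patterns."""
    return [path for path in paths if matches_any(path, patterns)]


paths = [
    "Directory1/a.c",
    "trunk/UnrelatedApp1/x.c",
    "trunk/TheAppIWantToMigrate/main.c",
]
print(expunge(paths, ["^Directory1", "^trunk/UnrelatedApp1/"]))
print(sift(paths, ["^trunk/"]))
```

           For any pattern list, every path ends up in exactly one of the
           two results.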

       closure
           The 'closure' subcommand computes the transitive closure of a path
           set under the relation 'copies from' - that is, the path set
           extended by the smallest set of additional paths such that every
           copy-from source is in the set.
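
           A transitive closure of this kind can be computed by iterating to
           a fixed point, as in the Python sketch below. It matches paths
           exactly and takes the copies as (target, source) pairs; both are
           simplifying assumptions, and the real subcommand may treat
           directory prefixes differently.

```python
def closure(paths, copies):
    """Extend `paths` until every copy-from source of a member is a member.

    `copies` is an iterable of (target_path, source_path) pairs. Exact path
    matching is an assumption of this sketch.
    """
    copies = list(copies)
    result = set(paths)
    changed = True
    while changed:
        changed = False
        for target, source in copies:
            if target in result and source not in result:
                result.add(source)
                changed = True
    return result


copies = [
    ("tags/v1", "branches/release"),
    ("branches/release", "trunk"),
    ("scratch", "junk"),
]
print(sorted(closure({"tags/v1"}, copies)))
```

           Starting from tags/v1, the closure pulls in branches/release and
           then trunk, but not the unrelated junk path.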

       pathrename
           Modify Node-path headers, Node-copyfrom-path headers, and
           svn:mergeinfo properties matching the specified Golang regular
           expression FROM; replace with TO. TO may contain Golang-style
           backreferences (${1}, ${2} etc - curly brackets not optional) to
           parenthesized portions of FROM. Multiple FROM/TO pairs may be
           specified and are applied in order. This transform can be
           restricted by a selection set.
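
           The effect of FROM/TO pairs on a single path can be approximated
           in Python as below. Python writes backreferences as \g<1> rather
           than ${1}, so the sketch translates the Go-style form first; that
           translation and the function name are artifacts of the example,
           not part of repocutter.

```python
import re


def pathrename(path, pairs):
    """Apply each (FROM, TO) pair to the path, in order."""
    for pattern, replacement in pairs:
        # Translate Go-style ${N} into Python's \g<N> (sketch-only step).
        py_replacement = re.sub(r"\$\{(\d+)\}", r"\\g<\1>", replacement)
        path = re.sub(pattern, py_replacement, path)
    return path


print(pathrename("trunk/TheAppIWantToMigrate/main.c",
                 [("^trunk/TheAppIWantToMigrate/", "trunk/")]))
print(pathrename("projects/foo/trunk/a.c",
                 [(r"^projects/([^/]+)/trunk/", "trunk/${1}/")]))
```

           Because the pairs are applied in order, a later pair sees the
           path as already rewritten by the earlier ones.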

       pop
           Pop initial segment off each path. May be useful after a sift
           command to turn a dump from a subproject stripped from a dump for a
           multiple-project repository into the normal form with
           trunk/tags/branches at the top level. This transform can be
           restricted by a selection set.

       split
           Transform every stream operation with Node-path PATH in the path
           list into three operations on PATH/trunk, PATH/branches, and
           PATH/tags. If the operation is a copy, this assumes that the same
           structure exists under the source directory and mutates
           Node-copyfrom headers accordingly. This transform can be restricted
           by a selection set.

       swap
           Swap the top two elements of each pathname in every revision in the
           selection set. Useful following a sift operation for straightening
           out a common form of multi-project repository. If a PATTERN
           argument is given, only paths matching the pattern are swapped.
           This transform can be restricted by a selection set.
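
           On a single pathname, the swap amounts to the following Python
           sketch. Leaving one-element paths unchanged and using
           search-style pattern matching are assumptions; the page does not
           say how the tool treats those cases.

```python
import re


def swap(path, pattern=None):
    """Swap the top two elements of a path, optionally only if it matches."""
    if pattern is not None and not re.search(pattern, path):
        return path
    parts = path.split("/")
    if len(parts) < 2:
        # Nothing to swap; leaving the path alone is an assumption.
        return path
    parts[0], parts[1] = parts[1], parts[0]
    return "/".join(parts)


print(swap("foo/trunk/src/a.c"))
print(swap("foo/trunk/src/a.c", pattern="^bar/"))
```

           The first call moves trunk above the project directory foo; the
           second leaves the path alone because it does not match the
           pattern.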

       swapsvn
           Like swap, but is aware of Subversion structure. Used for
           transforming multiproject repositories into a standard layout with
           trunk, tags, and branches at the top level.

           Requires that the second component of each matching path be
           "trunk", "branches", or "tags"; terminates with an error if this
           is not so. Swaps "trunk" and the top-level (project) directory
           straight up. For tags and branches, the following two components
           are swapped to the top. Thus, "foo/branches/release23" becomes
           "branches/release23/foo", putting the project directory beneath the
           branch.

           After the swap, the command attempts to recognize spans of
           deletes, copies into branch directories, and copies into tag
           subdirectories that are parallel in all top-level (project)
           directories. These are coalesced into single deletes or copies in
           the inverted structure.

           Accordingly, deletes and copies with three-segment sources and
           three-segment targets are transformed; for tags/ and branches/
           paths the last segment (the subdirectory below the branch name) is
           dropped, while for trunk/ paths the last two segments are dropped
           leaving only trunk/. Following duplicate deletes and copies are
           skipped.

           This has two minor negative consequences. One is that metadata
           belonging to all deletes or copies after the first one in a
           coalesced span is lost. The other is that branches and tags local
           to individual project directories are promoted to global branches
           and tags across the entire transformed repository; no content is
           lost this way.

           Parallel rename sequences are also coalesced.

           If a PATTERN argument is given, only paths matching the pattern are
           swapped.

           This transform can be restricted by a selection set.
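
           The path mapping described above, without the coalescing of
           parallel deletes and copies, can be sketched in Python like this.
           The function name is invented, and paths that name a bare
           branches or tags directory are deliberately not modelled.

```python
def swapsvn(path):
    """Map PROJECT/trunk/... and PROJECT/{branches,tags}/NAME/... paths.

    Only the path mapping is sketched; coalescing is not modelled.
    """
    parts = path.strip("/").split("/")
    if len(parts) < 2 or parts[1] not in ("trunk", "branches", "tags"):
        raise ValueError(f"second component is not trunk/branches/tags: {path}")
    project, kind = parts[0], parts[1]
    if kind == "trunk":
        # Swap trunk and the project directory straight up.
        return "/".join(["trunk", project] + parts[2:])
    if len(parts) < 3:
        raise NotImplementedError("bare branches/tags directory not modelled")
    # Move the branch or tag name to the top, project directory beneath it.
    return "/".join([kind, parts[2], project] + parts[3:])


print(swapsvn("foo/branches/release23"))
print(swapsvn("foo/trunk/src/a.c"))
```

           The first call reproduces the example in the text:
           foo/branches/release23 becomes branches/release23/foo.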

       replace
           Perform a regular expression search/replace on blob content. The
           first character of the argument (normally /) is treated as the end
           delimiter for the regular-expression and replacement parts. This
           transform can be restricted by a selection set.
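
           The delimiter convention can be illustrated with the Python
           sketch below. It assumes the argument has the shape
           /regexp/replacement/ with a closing delimiter, works on text
           rather than raw blob bytes, and inserts the replacement
           literally; all three are assumptions of the example.

```python
import re


def parse_replace(argument):
    """Split DELIM regexp DELIM replacement DELIM into its two parts."""
    delimiter = argument[0]
    parts = argument.split(delimiter)
    if len(parts) != 4 or parts[3] != "":
        raise ValueError("expected DELIM regexp DELIM replacement DELIM")
    return parts[1], parts[2]


def replace_content(content, argument):
    """Apply the search/replace to a piece of text content."""
    pattern, replacement = parse_replace(argument)
    # The lambda keeps the replacement literal (an assumption of the sketch).
    return re.sub(pattern, lambda match: replacement, content)


print(replace_content("Copyright 2001 Foo", "/2001/2021/"))
print(parse_replace("|a/b|c/d|"))
```

           Choosing another first character, such as | in the second call,
           allows slashes to appear in either part.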

       strip
           Replace content with unique generated cookies on all node paths
           matching the specified regular expressions; if no expressions are
           given, match all paths. Useful when you need to examine a
           particularly complex node structure. This transform can be
           restricted by a selection set.

       obscure
           Replace path segments and committer IDs with arbitrary but
           consistent names in order to obscure them. The replacement
           algorithm is tuned to make the replacements readily distinguishable
           by eyeball. This transform can be restricted by a selection set.

       reduce
           Strip revisions out of a dump so the only parts left are those
           likely to be relevant to a conversion problem. A revision is
           interesting if it either (a) contains any operation that is not a
           plain file modification - any directory operation, or any add, or
           any delete, or any copy, or any operation on properties - or (b) it
           is referenced by a later copy operation. Any commit that is neither
           interesting nor has interesting neighbors is dropped.

           Because the 'interesting' status of a commit is not known for sure
           until all future commits have been checked for copy operations,
           this command requires an input file. It cannot operate on standard
           input. The reduced dump is emitted to standard output.
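
           The selection rule can be sketched in Python as below. The
           dictionary layout for operations is hypothetical, and "neighbors"
           is taken to mean the immediately preceding and following
           revisions, which is an assumption.

```python
def is_plain_modification(op):
    """A plain file modification: a file change with no props and no copy."""
    return (op["kind"] == "file" and op["action"] == "change"
            and not op.get("props") and op.get("copyfrom") is None)


def reduce_revisions(revisions):
    """Return the revision numbers that survive the reduction.

    `revisions` maps a revision number to a list of operation dicts with
    hypothetical keys: kind, action, props, copyfrom.
    """
    interesting = set()
    for rev, ops in revisions.items():
        if any(not is_plain_modification(op) for op in ops):
            interesting.add(rev)                 # criterion (a)
        for op in ops:
            if op.get("copyfrom") is not None:
                interesting.add(op["copyfrom"])  # criterion (b)
    ordered = sorted(revisions)
    kept = []
    for index, rev in enumerate(ordered):
        # The revision itself plus its immediate neighbors (assumed meaning).
        window = ordered[max(index - 1, 0):index + 2]
        if any(candidate in interesting for candidate in window):
            kept.append(rev)
    return kept
```

           Criterion (b) is what forces a second look at earlier revisions:
           a revision that seemed droppable can become interesting when a
           later copy refers to it.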

       testify
           Replace commit timestamps with a monotonically increasing clock
           tick starting at the Unix epoch and advancing by 10 seconds per
           commit. Replace all attributions with 'fred'. Discard the
           repository UUID. Use this to neutralize procedurally-generated
           streams so they can be compared. This transform can be restricted
           by a selection set.

       version
           Report major and minor repocutter version.

HISTORY

       Under the name "svncutter", an ancestor of this program traveled in the
       'contrib/' directory of the Subversion distribution. It had functional
       overlap with reposurgeon(1) because it was directly ancestral to that
       code. It was moved to the reposurgeon(1) distribution in January 2016.
       This program was ported from Python to Go in August 2018, at which time
       the obsolete "squash" command was retired. The syntax of regular
       expressions in the pathrename command changed at that time.

       The reason for the partial functional overlap between repocutter and
       reposurgeon is that repocutter was first written earlier and became a
       testbed for some of the design concepts in reposurgeon. After
       reposurgeon was written, the author learned that it could not naturally
       support some useful operations very specific to Subversion, and
       enhanced repocutter to do those.

BUGS

       There is one regression since the Python version: repocutter no longer
       recognizes Macintosh-style line endings consisting of a carriage return
       only. This may be addressed in a future version.

SEE ALSO

       reposurgeon(1).

EXAMPLE

       Suppose you have a Subversion repository with the following
       semi-pathological structure:

           Directory1/ (with unrelated content)
           Directory2/ (with unrelated content)
           TheDirIWantToMigrate/
               branches/
                   crazy-feature/
                       UnrelatedApp1/
                       TheAppIWantToMigrate/
               tags/
                   v1.001/
                       UnrelatedApp1/
                       UnrelatedApp2/
                       TheAppIWantToMigrate/
               trunk/
                   UnrelatedApp1/
                   UnrelatedApp2/
                   TheAppIWantToMigrate/

       You want to transform the dump file so that TheAppIWantToMigrate can be
       subject to a regular branchy lift. A way to dissect out the code of
       interest would be with the following series of filters applied:

           repocutter expunge '^Directory1' '^Directory2'
           repocutter pathrename '^TheDirIWantToMigrate/' ''
           repocutter expunge '^branches/crazy-feature/UnrelatedApp1/'
           repocutter pathrename 'branches/crazy-feature/TheAppIWantToMigrate/' 'branches/crazy-feature/'
           repocutter expunge '^tags/v1.001/UnrelatedApp1/'
           repocutter expunge '^tags/v1.001/UnrelatedApp2/'
           repocutter pathrename '^tags/v1.001/TheAppIWantToMigrate/' 'tags/v1.001/'
           repocutter expunge '^trunk/UnrelatedApp1/'
           repocutter expunge '^trunk/UnrelatedApp2/'
           repocutter pathrename '^trunk/TheAppIWantToMigrate/' 'trunk/'

LIMITATIONS

       The sift and expunge operations can produce output dumps that are
       invalid. The problem is copyfrom operations (Subversion branch and tag
       creations). If an included revision includes a copyfrom reference to an
       excluded one, the reference target won’t be in the emitted dump; it
       won’t load correctly in Subversion, and while reposurgeon has fallback
       logic that backs down to the latest existing revision before the
       missing one, this expedient is fragile. The revision number in a
       copyfrom header pointing to a missing revision will be zero. Attempts
       to be clever about this won’t work; the problem is inherent in the data
       model of Subversion.
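
       Whether a filtered dump is affected can be checked ahead of time with
       a scan like the Python sketch below. The tuple layout and the function
       name are hypothetical; extracting the revision numbers and copyfrom
       references from a real dump is left out.

```python
def dangling_copyfroms(emitted, copies):
    """List copies in emitted revisions whose source revision was excluded.

    `emitted` holds the revision numbers that survive filtering; `copies`
    holds (revision, path, copyfrom_revision) tuples. Both are hypothetical
    inputs for this sketch.
    """
    emitted = set(emitted)
    return [
        (rev, path, source)
        for rev, path, source in copies
        if rev in emitted and source not in emitted
    ]


copies = [
    (5, "branches/release", 3),
    (8, "tags/v1", 5),
]
print(dangling_copyfroms({1, 2, 5, 8}, copies))
```

       An empty result means every copyfrom reference in the emitted
       revisions still has its target in the dump.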

AUTHOR

       Eric S. Raymond <esr@thyrsus.com>. This tool is distributed with
       reposurgeon; see the project page
       <http://www.catb.org/~esr/reposurgeon>.


                                  2021-10-08                     REPOCUTTER(1)