1REPOCUTTER(1) REPOCUTTER(1)
2
3
4
6 repocutter - surgical and filtering operations on Subversion dump files
7
9 repocutter [-q] [-d n] [-i 'filename'] [-r 'selection'] 'subcommand'
10
12 This program does surgical and filtering operations on Subversion dump
13 files. While it is not as flexible as reposurgeon(1), it can perform
14 Subversion-specific transformations that reposurgeon cannot, and can be
15 useful for processing Subversion repositories into a form suitable for
16 conversion. Also, it supports the version 3 dumpfile format, which
17 reposurgeon does not.
18
19 In all commands, the -r (or --range) option limits the selection of
20 revisions over which an operation will be performed. Usually other
21 revisions will be passed through unaltered, except in the select and
22 deselect commands for which the option controls which revisions will
23 be passed through. A selection consists of one or more comma-separated
24 ranges. A range may consist of an integer revision number or the
25 special name HEAD for the head revision. Or it may be a colon-separated
26 pair of integers, or an integer followed by a colon followed by HEAD.
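
The selection syntax described above can be modeled with a short sketch. This is an illustration only, not repocutter's own parser: the function names are hypothetical, the head revision number is passed in by the caller, and ranges are assumed to be inclusive at both ends.

```python
def parse_selection(spec, head):
    """Parse a selection such as '3,7:10,12:HEAD' into (low, high) pairs.

    `head` is the number of the head revision; how repocutter itself
    resolves HEAD is not modeled here.
    """
    def revision(token):
        return head if token == "HEAD" else int(token)

    ranges = []
    for part in spec.split(","):
        low, sep, high = part.partition(":")
        first = revision(low)
        # A bare revision number is treated as a one-element range.
        last = revision(high) if sep else first
        ranges.append((first, last))
    return ranges


def selected(rev, ranges):
    """True if revision number `rev` falls inside any range (inclusive)."""
    return any(low <= rev <= high for low, high in ranges)
```

With a head revision of 20, the selection '3,7:10,12:HEAD' covers revision 3, revisions 7 through 10, and revisions 12 through 20.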
27
28 (Older versions of this tool, before 4.30, treated -r as an implied
29 selection filter rather than passing through unselected revisions
30 unaltered. If you have old scripts using repocutter they may need
31 modification.)
32
33 Normally, each subcommand produces a progress spinner on standard
34 error; each turn means another revision has been filtered. The -q (or
35 --quiet) option suppresses this.
36
37 The -d option enables debug messages on standard error. It takes an
38 integer debug level. These messages are probably only of interest to
39 repocutter developers.
40
41 The -i option sets the input source to a specified filename. This is
42 primarily useful when running the program under a debugger. When this
43 option is not present the program expects to read a stream from
44 standard input.
45
46 Generally, if you need to use this program at all, you will find that
47 you need to pipe your dump file through multiple instances of it doing
48 one kind of operation each. This is not as expensive as it sounds; with
49 the exception of the reduce subcommand, the working set of this program
50 is bounded by the size of the largest single blob plus its
51 metadata. It does not need to hold the entire repo metadata in memory.
52
53 The -t option sets a tag to be included in error messages. This will be
54 useful for determining which stage of a multistage repocutter pipeline
55 failed.
56
57 The following subcommands are available:
58
59 select
60 The 'select' subcommand selects a range and permits only revisions
61 in that range to pass to standard output. A range beginning with 0
62 includes the dumpfile header.
63
64 deselect
65 The 'deselect' subcommand selects a range and permits only
66 revisions NOT in that range to pass to standard output.
67
68 see
69 Render a very condensed report on the repository node structure,
70 mainly useful for examining strange and pathological repositories.
71 File content is ignored. You get one line per repository operation,
72 reporting the revision, operation type, file path, and the copy
73 source (if any). Directory paths are distinguished by a trailing
74 slash. The 'copy' operation is really an 'add' with a directory
75 source and target; the display name is changed to make them easier
76 to see. This report can be restricted by a selection set.
77
78 renumber
79 Renumber all revisions, patching Node-copyfrom headers as required.
80 Any selection option is ignored. Takes no arguments. The -b option
81 can be used to set the base to renumber from, defaulting to 0.
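
A minimal sketch of the renumbering idea, assuming a simplified model in which each revision is a pair of its number and the list of copy-source revision numbers it references. The function name and data layout are hypothetical, and every copy source is assumed to be present in the stream.

```python
def renumber(revisions, base=0):
    """revisions: list of (old_number, [copyfrom_rev or None, ...]) in dump order.

    Returns the same structure with contiguous numbers starting at `base`
    and with copy-source references patched to the new numbering.
    """
    mapping = {old: base + i for i, (old, _) in enumerate(revisions)}
    result = []
    for old, copyfroms in revisions:
        # Assumes each referenced source revision exists in the input.
        patched = [mapping[c] if c is not None else None for c in copyfroms]
        result.append((mapping[old], patched))
    return result
```

In this model, revisions numbered 0, 3 and 7 become 0, 1 and 2, and a copy from old revision 3 is rewritten to point at new revision 1.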
82
83 log
84 Generate a log report, same format as the output of svn log on a
85 repository, to standard output.
86
87 setlog
88 Replace the log entries in the input dumpfile with the
89 corresponding entries in the LOGFILE, which should be in the format
90 of an svn log output. Replacements may be restricted to a specified
91 range.
92
93 propdel
94 Delete the property PROPNAME. May be restricted by a revision
95 selection. You may specify multiple properties to be deleted.
96
97 proprename
98 Rename the property OLDNAME to NEWNAME. May be restricted by a
99 revision selection. You may specify multiple properties to be
100 renamed.
101
102 propset
103 Set the property PROPNAME to PROPVAL. May be restricted by a
104 revision selection. You may specify multiple property settings.
105
106 expunge
107 Delete all operations with Node-path headers matching specified
108 Golang regular expressions (opposite of 'sift'). Any revision left
109 with no Node records after this filtering has its Revision record
110 removed as well.
111
112 sift
113 Delete all operations with Node-path headers not matching specified
114 Golang regular expressions (opposite of 'expunge'). Any revision
115 left with no Node records after this filtering has its Revision
116 record removed as well. This transform can be restricted by a
117 selection set.
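
The complementary behavior of 'expunge' and 'sift' can be sketched as one filter with a flag. This is a simplified model, not the tool's implementation: revisions are reduced to lists of node paths, the function name is hypothetical, and Python's re module stands in for Go regular expressions (the two syntaxes agree for simple patterns like the ones shown but differ in places).

```python
import re


def filter_revisions(revisions, patterns, keep_matching):
    """revisions: list of (number, [node_path, ...]).

    keep_matching=False models 'expunge' (matching paths are deleted);
    keep_matching=True models 'sift' (non-matching paths are deleted).
    Revisions left with no paths are dropped entirely.
    """
    regexes = [re.compile(p) for p in patterns]
    output = []
    for number, paths in revisions:
        kept = [p for p in paths
                if any(r.search(p) for r in regexes) == keep_matching]
        if kept:
            output.append((number, kept))
    return output
```

Applying the same pattern list with the flag flipped partitions the node paths: every path survives exactly one of the two filters.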
118
119 closure
120 The 'closure' subcommand computes the transitive closure of a path
121 set under the relation 'copies from' - that is, with the smallest
122 set of additional paths such that every copy-from source is in the
123 set.
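
A transitive closure of this kind can be computed with a simple worklist. The sketch below assumes copy operations are available as (source, target) path pairs and matches paths by exact equality; the function name is hypothetical, and the handling of files beneath copied directories is not modeled.

```python
def closure(paths, copies):
    """paths: initial iterable of paths.
    copies: list of (source_path, target_path) copy operations.

    Returns the smallest superset of `paths` that also contains the copy
    source of every member, followed transitively.
    """
    result = set(paths)
    pending = list(result)
    while pending:
        current = pending.pop()
        for source, target in copies:
            if target == current and source not in result:
                result.add(source)
                pending.append(source)
    return result
```

Starting from a tag that was copied from a branch that was in turn copied from trunk, the closure contains all three paths.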
124
125 pathrename
126 Modify Node-path headers, Node-copyfrom-path headers, and
127 svn:mergeinfo properties matching the Golang regular expression
128 FROM; replace with TO. TO may contain Golang-style backreferences
129 (${1}, ${2} etc - curly brackets not optional) to parenthesized
130 portions of FROM. Multiple FROM/TO pairs may be specified and are
131 applied in order. This transform can be restricted by a selection set.
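
The ordered application of FROM/TO pairs can be illustrated on a single path. This sketch uses Python's re module rather than Go's, so each ${N} backreference is translated to Python's group syntax first; the function name is hypothetical, and replacing every match (rather than only the first) is an assumption.

```python
import re


def pathrename(path, pairs):
    """Apply each (FROM, TO) pair to `path`, in the order given."""
    for pattern, replacement in pairs:
        # Escape backslashes so TO is otherwise taken literally by re.sub.
        template = replacement.replace("\\", "\\\\")
        # Translate ${N} into Python's numbered group reference.
        template = re.sub(r"\$\{(\d+)\}", r"\\g<\1>", template)
        path = re.sub(pattern, template, path)
    return path
```

For example, the pair ('^([^/]+)/([^/]+)/', '${2}/${1}/') exchanges the first two segments of a path, and a later pair sees the output of the earlier one.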
132
133 pop
134 Pop initial segment off each path. May be useful after a sift
135 command to turn a dump from a subproject stripped from a dump for a
136 multiple-project repository into the normal form with
137 trunk/tags/branches at the top level. This transform can be
138 restricted by a selection set.
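
On a single path, popping the initial segment amounts to the following. The function name is hypothetical, and leaving a one-segment path unchanged is an assumption of this sketch; the text above does not say what the tool does in that case.

```python
def pop(path):
    """Drop the first segment of a slash-separated path."""
    head, sep, rest = path.partition("/")
    # Assumption: a path with only one segment is returned as-is.
    return rest if sep else path
```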
139
140 split
141 Transform every stream operation with Node-path PATH in the path
142 list into three operations on PATH/trunk, PATH/branches, and
143 PATH/tags. This operation assumes if the operation is a copy that
144 structure exists under the source directory and also mutates
145 Node-copyfrom headers accordingly. This transform can be restricted
146 by a selection set.
147
148 swap
149 Swap the top two elements of each pathname in every revision in the
150 selection set. Useful following a sift operation for straightening
151 out a common form of multi-project repository. If a PATTERN
152 argument is given, only paths matching the pattern are swapped.
153 This transform can be restricted by a selection set.
154
155 swapsvn
156 Like swap, but is aware of Subversion structure. Used for
157 transforming multiproject repositories into a standard layout with
158 trunk, tags, and branches at the top level.
159
160 Requires that the second component of each matching path be
161 "trunk", "branches", or "tags"; terminates with an error if this is
162 not so. Swaps "trunk" and the top-level (project) directory
163 straight up. For tags and branches, the following two components
164 are swapped to the top. Thus, "foo/branches/release23" becomes
165 "branches/release23/foo", putting the project directory beneath the
166 branch.
167
168 After the swap, swapsvn attempts to recognize spans of deletes, copies
169 into branch directories, and copies into tag subdirectories that
170 are parallel in all top-level (project) directories. These are
171 coalesced into single deletes or copies in the inverted structure.
172
173 Accordingly, deletes and copies with three-segment sources and
174 three-segment targets are transformed; for tags/ and branches/
175 paths the last segment (the subdirectory below the branch name) is
176 dropped, while for trunk/ paths the last two segments are dropped
177 leaving only trunk/. Following duplicate deletes and copies are
178 skipped.
179
180 This has two minor negative consequences. One is that metadata
181 belonging to all deletes or copies after the first one in a
182 coalesced span is lost. The other is that branches and tags local
183 to individual project directories are promoted to global branches
184 and tags across the entire transformed repository; no content is
185 lost this way.
186
187 Parallel rename sequences are also coalesced.
188
189 If a PATTERN argument is given, only paths matching the pattern are
190 swapped.
191
192 This transform can be restricted by a selection set.
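
The path rule that swapsvn applies before any coalescing can be sketched as below. Only the cases spelled out in the description are modeled (project/trunk/... and project/branches-or-tags/NAME/...); the function name is hypothetical, and the coalescing of parallel deletes, copies and renames is not attempted.

```python
def swapsvn_path(path):
    """Move trunk, branches/NAME or tags/NAME above the project directory."""
    parts = path.split("/")
    if len(parts) < 2 or parts[1] not in ("trunk", "branches", "tags"):
        raise ValueError("unexpected layout: " + path)
    project, kind, rest = parts[0], parts[1], parts[2:]
    if kind == "trunk":
        # trunk and the project directory are swapped straight up.
        return "/".join(["trunk", project] + rest)
    if not rest:
        # A bare project/branches or project/tags path is outside what
        # this sketch models.
        raise ValueError("no branch or tag name in: " + path)
    name, below = rest[0], rest[1:]
    # branches/NAME (or tags/NAME) goes to the top, the project beneath it.
    return "/".join([kind, name, project] + below)
```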
193
194 replace
195 Perform a regular expression search/replace on blob content. The
196 first character of the argument (normally /) is treated as the end
197 delimiter for the regular-expression and replacement parts. This
198 transform can be restricted by a selection set.
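
The delimiter convention can be illustrated by splitting an argument into its two parts. This is a guess at the shape of the parsing, not the tool's code: the function name is hypothetical and escaped delimiters inside either part are not handled.

```python
def parse_replace_argument(argument):
    """Split an argument such as '/foo/bar/' into (pattern, replacement).

    The first character is taken as the delimiter that ends each part.
    """
    if not argument:
        raise ValueError("empty argument")
    delimiter = argument[0]
    fields = argument[1:].split(delimiter)
    if len(fields) < 2:
        raise ValueError("missing replacement part: " + argument)
    return fields[0], fields[1]
```

Choosing a different first character, as in '#a/b#c#', allows slashes to appear inside the pattern.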
199
200 strip
201 Replace content with unique generated cookies on all node paths
202 matching the specified regular expressions; if no expressions are
203 given, match all paths. Useful when you need to examine a
204 particularly complex node structure. This transform can be
205 restricted by a selection set.
206
207 obscure
208 Replace path segments and committer IDs with arbitrary but
209 consistent names in order to obscure them. The replacement
210 algorithm is tuned to make the replacements readily distinguishable
211 by eyeball. This transform can be restricted by a selection set.
212
213 reduce
214 Strip revisions out of a dump so the only parts left are those likely
215 to be relevant to a conversion problem. A revision is interesting
216 if it either (a) contains any operation that is not a plain file
217 modification - any directory operation, or any add, or any delete,
218 or any copy, or any operation on properties - or (b) is
219 referenced by a later copy operation. Any commit that is neither
220 interesting nor has interesting neighbors is dropped.
221
222 Because the 'interesting' status of a commit is not known for sure
223 until all future commits have been checked for copy operations,
224 this command requires an input file. It cannot operate on standard
225 input. The reduced dump is emitted to standard output.
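
The two-pass nature of this selection can be seen in a small model. The data layout and function name here are hypothetical, and "neighbors" is read as the immediately preceding and following revisions, which is an assumption. The first pass collects every copy source, which is why the whole input must be available before anything is emitted.

```python
def reduce_revisions(revisions):
    """revisions: list of (number, [op, ...]) in order; each op is a dict
    with keys 'kind', 'action', 'props_changed' and 'copyfrom_rev'.

    Returns the numbers of the revisions that would be kept.
    """
    def plain_modification(op):
        return (op["kind"] == "file" and op["action"] == "change"
                and not op["props_changed"] and op["copyfrom_rev"] is None)

    # Pass 1: every revision referenced as a copy source anywhere.
    copy_sources = {op["copyfrom_rev"] for _, ops in revisions for op in ops
                    if op["copyfrom_rev"] is not None}
    interesting = [
        number in copy_sources or any(not plain_modification(op) for op in ops)
        for number, ops in revisions
    ]
    # Pass 2: keep a revision if it or an adjacent revision is interesting.
    kept = []
    for i, (number, _) in enumerate(revisions):
        if any(interesting[max(i - 1, 0):i + 2]):
            kept.append(number)
    return kept
```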
226
227 testify
228 Replace commit timestamps with a monotonically increasing clock
229 tick starting at the Unix epoch and advancing by 10 seconds per
230 commit. Replace all attributions with 'fred'. Discard the
231 repository UUID. Use this to neutralize procedurally-generated
232 streams so they can be compared. This transform can be restricted
233 by a selection set.
234
235 version
236 Report major and minor repocutter version.
237
239 Under the name "svncutter", an ancestor of this program traveled in the
240 'contrib/' directory of the Subversion distribution. It had functional
241 overlap with reposurgeon(1) because it was directly ancestral to that
242 code. It was moved to the reposurgeon(1) distribution in January 2016.
243 This program was ported from Python to Go in August 2018, at which time
244 the obsolete "squash" command was retired. The syntax of regular
245 expressions in the pathrename command changed at that time.
246
247 The reason for the partial functional overlap between repocutter and
248 reposurgeon is that repocutter was first written earlier and became a
249 testbed for some of the design concepts in reposurgeon. After
250 reposurgeon was written, the author learned that it could not naturally
251 support some useful operations very specific to Subversion, and
252 enhanced repocutter to do those.
253
255 There is one regression since the Python version: repocutter no longer
256 recognizes Macintosh-style line endings consisting of a carriage return
257 only. This may be addressed in a future version.
258
260 reposurgeon(1).
261
263 Suppose you have a Subversion repository with the following
264 semi-pathological structure:
265
266 Directory1/ (with unrelated content)
267 Directory2/ (with unrelated content)
268 TheDirIWantToMigrate/
269 branches/
270 crazy-feature/
271 UnrelatedApp1/
272 TheAppIWantToMigrate/
273 tags/
274 v1.001/
275 UnrelatedApp1/
276 UnrelatedApp2/
277 TheAppIWantToMigrate/
278 trunk/
279 UnrelatedApp1/
280 UnrelatedApp2/
281 TheAppIWantToMigrate/
282
283 You want to transform the dump file so that TheAppIWantToMigrate can be
284 subject to a regular branchy lift. A way to dissect out the code of
285 interest would be with the following series of filters applied:
286
287 repocutter expunge '^Directory1' '^Directory2'
288 repocutter pathrename '^TheDirIWantToMigrate/' ''
289 repocutter expunge '^branches/crazy-feature/UnrelatedApp1/'
290 repocutter pathrename 'branches/crazy-feature/TheAppIWantToMigrate/' 'branches/crazy-feature/'
291 repocutter expunge '^tags/v1.001/UnrelatedApp1/'
292 repocutter expunge '^tags/v1.001/UnrelatedApp2/'
293 repocutter pathrename '^tags/v1.001/TheAppIWantToMigrate/' 'tags/v1.001/'
294 repocutter expunge '^trunk/UnrelatedApp1/'
295 repocutter expunge '^trunk/UnrelatedApp2/'
296 repocutter pathrename '^trunk/TheAppIWantToMigrate/' 'trunk/'
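
Since each filter reads standard input and writes standard output, the stages can be chained into one pipe. The sketch below only builds the command line as a string and does not run anything; the file names repo.svn and filtered.svn and the tag names stage1, stage2, ... are placeholders, and only the first three stages are listed for brevity.

```python
import shlex

# The first three filter stages from the list above.
STAGES = [
    ["expunge", "^Directory1", "^Directory2"],
    ["pathrename", "^TheDirIWantToMigrate/", ""],
    ["expunge", "^branches/crazy-feature/UnrelatedApp1/"],
]


def pipeline(stages, source="repo.svn", target="filtered.svn"):
    """Build a shell pipeline with one repocutter instance per stage."""
    commands = []
    for number, stage in enumerate(stages, 1):
        # -q silences the spinner; -t tags error messages with the stage.
        command = ["repocutter", "-q", "-t", "stage%d" % number] + stage
        commands.append(shlex.join(command))
    commands[0] += " <" + shlex.quote(source)
    commands[-1] += " >" + shlex.quote(target)
    return " | ".join(commands)
```

Tagging each stage with -t makes it possible to tell from an error message which instance in the pipe failed.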
297
299 The sift and expunge operations can produce output dumps that are
300 invalid. The problem is copyfrom operations (Subversion branch and tag
301 creations). If an included revision includes a copyfrom reference to an
302 excluded one, the reference target won’t be in the emitted dump; it
303 won’t load correctly in Subversion, and while reposurgeon has fallback
304 logic that backs down to the latest existing revision before the
305 missing one, this expedient is fragile. The revision number in a
306 copyfrom header pointing to a missing revision will be zero. Attempts
307 to be clever about this won’t work; the problem is inherent in the data
308 model of Subversion.
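
Before filtering, it can be useful to check which copy operations would be left dangling. The following sketch assumes the set of surviving revision numbers and the list of copy operations are already known; the function name and data layout are hypothetical.

```python
def dangling_copies(surviving, copies):
    """surviving: revision numbers that will be kept.
    copies: list of (revision, copyfrom_rev) pairs from the original dump.

    Returns the copies in kept revisions whose source revision is dropped.
    """
    kept = set(surviving)
    return [(rev, src) for rev, src in copies if rev in kept and src not in kept]
```

An empty result means every surviving copy still has its source revision in the output.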
309
311 Eric S. Raymond <esr@thyrsus.com>. This tool is distributed with
312 reposurgeon; see the project page
313 <http://www.catb.org/~esr/reposurgeon>.
314
315
316
317 2021-10-08 REPOCUTTER(1)