REPOCUTTER(1) REPOCUTTER(1)

NAME
repocutter - surgical and filtering operations on Subversion dump files

SYNOPSIS
repocutter [-q] [-d n] [-i 'filename'] [-r 'selection'] 'subcommand'

DESCRIPTION
This program does surgical and filtering operations on Subversion dump
files. While it is not as flexible as reposurgeon(1), it can perform
Subversion-specific transformations that reposurgeon cannot, and can be
useful for processing Subversion repositories into a form suitable for
conversion. Also, it supports the version 3 dumpfile format, which
reposurgeon does not.

In most commands, the -r (or --range) option limits the selection of
revisions over which an operation will be performed. Usually other
revisions will be passed through unaltered, except in the select and
deselect commands, for which the option controls which revisions will
be passed through. A selection consists of one or more comma-separated
ranges. A range may consist of an integer revision number or the
special name HEAD for the head revision. Or it may be a colon-separated
pair of integers, or an integer followed by a colon followed by HEAD.
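The selection grammar above is small enough to sketch. The Go fragment below is an illustration, not repocutter's own parser: parseSelection, revRange, and contains are hypothetical names, HEAD is resolved to a head revision the caller supplies, and input checking is deliberately loose (a backwards range such as HEAD:5 is accepted and simply matches nothing).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// revRange is a closed interval of revision numbers.
type revRange struct{ lo, hi int }

// parseSelection parses a selection such as "0:5,7,10:HEAD" into ranges.
// HEAD is resolved to the head revision supplied by the caller.
// Illustrative sketch only; this is not repocutter's own parser.
func parseSelection(spec string, head int) ([]revRange, error) {
	parseRev := func(s string) (int, error) {
		if s == "HEAD" {
			return head, nil
		}
		return strconv.Atoi(s)
	}
	var ranges []revRange
	for _, part := range strings.Split(spec, ",") {
		bounds := strings.SplitN(part, ":", 2)
		lo, err := parseRev(bounds[0])
		if err != nil {
			return nil, fmt.Errorf("bad revision %q: %w", bounds[0], err)
		}
		hi := lo
		if len(bounds) == 2 {
			if hi, err = parseRev(bounds[1]); err != nil {
				return nil, fmt.Errorf("bad revision %q: %w", bounds[1], err)
			}
		}
		ranges = append(ranges, revRange{lo, hi})
	}
	return ranges, nil
}

// contains reports whether revision rev lies in any range of the selection.
func contains(sel []revRange, rev int) bool {
	for _, r := range sel {
		if rev >= r.lo && rev <= r.hi {
			return true
		}
	}
	return false
}

func main() {
	sel, err := parseSelection("0:5,7,10:HEAD", 12)
	if err != nil {
		panic(err)
	}
	fmt.Println(sel, contains(sel, 7), contains(sel, 8))
	// [{0 5} {7 7} {10 12}] true false
}
```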

(Older versions of this tool, before 4.30, treated -r as an implied
selection filter rather than passing through unselected revisions
unaltered. If you have old scripts using repocutter they may need
modification.)

Normally, each subcommand produces a progress spinner on standard
error; each turn means another revision has been filtered. The -q (or
--quiet) option suppresses this.

The -d option enables debug messages on standard error. It takes an
integer debug level. These messages are probably only of interest to
repocutter developers.

The -i option sets the input source to a specified filename. This is
primarily useful when running the program under a debugger. When this
option is not present the program expects to read a stream from
standard input.

Generally, if you need to use this program at all, you will find that
you need to pipe your dump file through multiple instances of it doing
one kind of operation each. This is not as expensive as it sounds; with
the exception of the reduce and filecopy subcommands, the working set of
this program is bounded by the size of the largest single blob plus its
metadata. It does not need to hold the entire repo metadata in memory.

The -f/-fixed option disables regexp compilation of PATTERN arguments,
treating them as literal strings.

The -t option sets a tag to be included in error messages. This is
useful for determining which stage of a multistage repocutter pipeline
failed.

There are a few other command-specific options described under
individual commands.

In the command descriptions, PATTERN arguments are regular expressions
to match pathnames, constrained so that each match must be a path
segment or a sequence of path segments; that is, the left end must be
either at the start of path or immediately following a /, and the right
end must precede a / or be at end of string. With a leading ^ the match
is constrained to be a leading sequence of the pathname; with a
trailing $, a trailing one.
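Go's regexp package (RE2 syntax) has no lookbehind, so one way to impose this segment constraint is to wrap each PATTERN in boundary groups before compiling it. This is a hypothetical sketch (segmentRegexp is an invented name), not necessarily how repocutter does it. Two assumptions are worth flagging: a PATTERN that already ends in / is treated as ending at a segment boundary, so that patterns like '^TheDirIWantToMigrate/' from the example at the end of this page still match; and because the wrapper consumes neighboring slashes, a search/replace built on it would have to restore them.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// segmentRegexp compiles pattern so that a match can only cover whole path
// segments: it must start at the beginning of the path or just after a "/",
// and end just before a "/" or at the end of the path. A leading ^ pins the
// match to the start of the path, a trailing $ to the end. A pattern that
// already ends in "/" is treated as segment-terminated. Sketch only: an
// escaped trailing \$ is not handled.
func segmentRegexp(pattern string) (*regexp.Regexp, error) {
	left, right := `(?:^|/)`, `(?:/|$)`
	if strings.HasPrefix(pattern, "^") {
		left, pattern = "^", pattern[1:]
	}
	if strings.HasSuffix(pattern, "$") {
		right, pattern = "$", strings.TrimSuffix(pattern, "$")
	} else if strings.HasSuffix(pattern, "/") {
		right = ""
	}
	return regexp.Compile(left + "(?:" + pattern + ")" + right)
}

func main() {
	trunk, err := segmentRegexp("trunk")
	if err != nil {
		panic(err)
	}
	for _, p := range []string{"trunk/foo.c", "proj/trunk", "trunkated/foo.c", "mytrunk"} {
		fmt.Println(p, trunk.MatchString(p))
	}
	// trunk/foo.c true
	// proj/trunk true
	// trunkated/foo.c false
	// mytrunk false
}
```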

The following subcommands are available:

select
The 'select' subcommand selects a range and permits only revisions
and nodes in that range to pass to standard output. A range
beginning with 0 includes the dumpfile header. Mergeinfo properties
in all revisions are updated so they no longer refer to omitted
revisions.

deselect
The 'deselect' subcommand selects a range and permits only
revisions and nodes NOT in that range to pass to standard output.
Any mergeinfo properties in other revisions are updated so they no
longer refer to dropped revisions.

see
Render a very condensed report on the repository node structure,
mainly useful for examining strange and pathological repositories.
File content is ignored. You get one line per repository operation,
reporting the revision, operation type, file path, and the copy
source (if any). Directory paths are distinguished by a trailing
slash. The 'copy' operation is really an 'add' with a directory
source and target; the display name is changed to make them easier
to see. This report can be restricted by a selection set.

renumber
Renumber all revisions, patching Node-copyfrom headers as required.
Any selection option is ignored. Takes no arguments. The -b option
can be used to set the base to renumber from, defaulting to 0.
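The bookkeeping renumber needs is a map from old to new revision numbers, filled in as revisions stream past; because a copy source is always an earlier revision, one pass suffices. A minimal Go sketch over hypothetical node and revision types (real dump records carry many more headers). Mapping a copy source that is absent from the stream to 0 is this sketch's own choice.

```go
package main

import "fmt"

// node is a hypothetical, minimal stand-in for a dump node record.
type node struct {
	path        string
	isCopy      bool
	copyfromRev int // meaningful only when isCopy is true
}

// revision is a hypothetical stand-in for a dump revision record.
type revision struct {
	number int
	nodes  []node
}

// renumber gives revisions consecutive numbers starting at base and patches
// each copy's source revision to the new number of the revision it named.
// Copy sources precede their copies in a dump, so one pass suffices.
// A source absent from the stream maps to 0 (the map's zero value).
func renumber(revs []revision, base int) {
	remap := make(map[int]int)
	for i := range revs {
		newNum := base + i
		remap[revs[i].number] = newNum
		revs[i].number = newNum
		for j := range revs[i].nodes {
			n := &revs[i].nodes[j]
			if n.isCopy {
				n.copyfromRev = remap[n.copyfromRev]
			}
		}
	}
}

func main() {
	revs := []revision{
		{number: 0},
		{number: 3, nodes: []node{{path: "trunk"}}},
		{number: 7, nodes: []node{{path: "branches/b1", isCopy: true, copyfromRev: 3}}},
	}
	renumber(revs, 0)
	fmt.Println(revs[2].number, revs[2].nodes[0].copyfromRev) // 2 1
}
```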

log
Generate a log report, in the same format as the output of svn log on a
repository, to standard output.

setlog
Replace the log entries in the input dumpfile with the
corresponding entries in the LOGFILE, which should be in the format
of an svn log output. Replacements may be restricted to a specified
range.

propdel
Delete the property PROPNAME. May be restricted by a revision
selection. You may specify multiple properties to be deleted.

proprename
Rename the property OLDNAME to NEWNAME. May be restricted by a
revision selection. You may specify multiple properties to be
renamed.

propset
Set the property PROPNAME to PROPVAL.

May be restricted by a revision selection. Note that specifying
only a revision will cause the property to be set on the revision
properties and on all nodes in the revision; you’ll probably want
to specify a node index.

You may specify multiple property settings.

propclean
Every path with a suffix matching one of SUFFIXES gets a property
turned off. The default property is svn::Another property may be
set with the -p option.

expunge
Delete all operations with Node-path or Node-copyfrom-path headers
matching specified Golang regular expressions (opposite of 'sift').
Any revision left with no Node records after this filtering has its
Revision record dropped as well. Mergeinfo properties in all
revisions are updated so they no longer refer to dropped revisions.

sift
Delete all operations with either Node-path or Node-copyfrom-path
headers not matching specified Golang regular expressions (opposite
of 'expunge'). Any revision left with no Node records after this
filtering has its Revision record removed as well. Mergeinfo
properties in all revisions are updated so they no longer refer to
dropped revisions.

This transform can be restricted by a selection set.

closure
The 'closure' subcommand computes the transitive closure of a path
set under the relation 'copies from'; that is, it adds the smallest
set of additional paths such that every copy-from source is in the
set.

pathlist
List all distinct node-paths in the stream, once each, in the order
first encountered.
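Listing each path once in first-seen order takes a seen-set alongside an output slice, since Go map iteration order is unspecified. A minimal sketch, with distinctInOrder as a hypothetical name and the paths assumed to have been extracted from Node-path headers already:

```go
package main

import "fmt"

// distinctInOrder returns each path once, in the order it was first seen.
// The map answers "seen before?"; the slice preserves the order.
func distinctInOrder(paths []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, p := range paths {
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	return out
}

func main() {
	fmt.Println(distinctInOrder([]string{"trunk", "trunk/a.c", "trunk", "branches", "trunk/a.c"}))
	// [trunk trunk/a.c branches]
}
```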

pathrename
Modify Node-path headers, Node-copyfrom-path headers, and
svn::expression FROM; replace with TO. TO may contain Golang-style
backreferences (${1}, ${2} etc - curly brackets not optional) to
parenthesized portions of FROM.

Matches are constrained so that each match must be a path segment
or a sequence of path segments; that is, the left end must be
either at the start of path or immediately following a /, and the
right end must precede a / or be at end of string. With a leading ^
the match is constrained to be a leading sequence of the pathname;
with a trailing $, a trailing one.

Multiple FROM/TO pairs may be specified and are applied in order.
This transform can be restricted by a selection set.

All mergeinfo properties are updated in accordance with the path
renames.

setpath
In the specified revisions, replace the Node-path with the
specified PATH. Does not alter mergeinfo properties as a side
effect.

setcopyfrom
In the specified revisions, replace the Node-copyfrom-path with the
specified PATH. Does not alter mergeinfo properties as a side
effect. Terminates with error if any selected node is not a copy.

pop
Pop initial segment off each path matching PATTERN - by default,
all paths.

May be useful after a sift command to turn a dump from a subproject
stripped from a dump for a multiple-project repository into the
normal form with trunk/tags/branches at the top level.

This transform cannot be restricted by a selection set, as it is
not possible to guarantee that copyfrom paths and mergeinfo
properties will be modified consistently in the presence of that
kind of restriction.

Mergeinfo properties in all revisions are updated, as well as path
and copyfrom parts.

push
Push an initial segment onto each matching path. Normally used to
add a "trunk" prefix to every path in a flat repository. The -s
option can be used to set a different initial segment.

This transform cannot be restricted by a selection set, as it is
not possible to guarantee that copyfrom paths and mergeinfo
properties will be modified consistently in the presence of that
kind of restriction.

Mergeinfo properties in all revisions are updated to refer to the
new pathnames.
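The core path arithmetic of pop and push is shown below as a hypothetical Go sketch (popSegment and pushSegment are invented names). Running the same function over Node-path, Node-copyfrom-path, and every path inside mergeinfo values is what keeps them consistent, and that consistency is what a selection-set restriction would break. What happens to a path with only one segment is left open here, because the text above does not specify it.

```go
package main

import (
	"fmt"
	"strings"
)

// popSegment drops the first segment of a path. ok is false when the path
// has a single segment, so nothing would remain; how repocutter treats such
// nodes is not modeled here.
func popSegment(path string) (rest string, ok bool) {
	_, rest, ok = strings.Cut(path, "/")
	return rest, ok
}

// pushSegment prepends segment to a path. push uses "trunk" by default;
// the -s option would choose another segment.
func pushSegment(segment, path string) string {
	if path == "" {
		return segment
	}
	return segment + "/" + path
}

func main() {
	rest, ok := popSegment("myproject/trunk/src/main.c")
	fmt.Println(rest, ok)                           // trunk/src/main.c true
	fmt.Println(pushSegment("trunk", "src/main.c")) // trunk/src/main.c
}
```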

filecopy
For each node in the revision range, stash the current version of
the node-path’s content. For each later file copy operation with
that source, replace the file copy with an explicit add/change
using the stashed content.

With the -f flag and a BASENAME argument, require the source
basename to be as specified. Otherwise, with -f and no BASENAME,
require a match of source to target on basename only rather than
the full path. This may be required in order to extract filecopies
from branches.

Restricting the range holds down the memory requirement of this
tool, which in the worst (and default) 1:$ case will keep a copy of
every blob in the repository until it’s done processing the stream.
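The stash-and-substitute idea behind filecopy, as a minimal Go sketch over a hypothetical, simplified fileNode type. It keeps only the latest content per path and, like the description above, ignores which revision a copy names. The -f basename matching, deletes, and revision ranges are not modeled. The stash map is exactly the memory cost the previous paragraph warns about.

```go
package main

import "fmt"

// fileNode is a hypothetical, simplified file node. A non-empty copyfrom
// marks a file copy; content is only meaningful for non-copy nodes.
type fileNode struct {
	action   string // "add" or "change"
	path     string
	copyfrom string
	content  string
}

// expandFileCopies remembers the latest content seen for each path and turns
// every later copy from a remembered path into a plain node carrying that
// content. Copies from paths with no stashed content pass through untouched.
func expandFileCopies(nodes []fileNode) []fileNode {
	stash := make(map[string]string)
	out := make([]fileNode, 0, len(nodes))
	for _, n := range nodes {
		if n.copyfrom != "" {
			content, ok := stash[n.copyfrom]
			if !ok {
				out = append(out, n)
				continue
			}
			n.copyfrom, n.content = "", content
		}
		stash[n.path] = n.content
		out = append(out, n)
	}
	return out
}

func main() {
	nodes := []fileNode{
		{action: "add", path: "trunk/README", content: "hello"},
		{action: "change", path: "trunk/README", content: "hello v2"},
		{action: "add", path: "branches/b1/README", copyfrom: "trunk/README"},
	}
	for _, n := range expandFileCopies(nodes) {
		fmt.Printf("%s %s %q copyfrom=%q\n", n.action, n.path, n.content, n.copyfrom)
	}
}
```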

skipcopy
Replace the source revision and path of a copy at the upper end of
the selection with the source revision and path of a copy at the
lower end. Fails unless both revisions are copies. Used to remove
an unwanted intermediate copy or copies.

swap
Swap the top two elements of each pathname in every revision in the
selection set. Useful following a sift operation for straightening
out a common form of multi-project repository. If a PATTERN
argument is given, only paths matching it are swapped.

swapsvn
Like swap, but is aware of Subversion structure. Used for
transforming multiproject repositories into a standard layout with
trunk, tags, and branches at the top level.

Fires when the second component of a matching path is "trunk",
"branches", or "tags", or the path consists of a single segment
that is a top-level project directory; all paths for which this is
not so are passed through unaltered.

Top-level project directories with properties or comments make this
command die (return status 1) with an error message on stderr;
otherwise these directories are silently discarded.

Otherwise, swaps "trunk" and the top-level (project) directory
straight up. For tags and branches, the following two components
are swapped to the top. Thus, "foo/branches/release23" becomes
"branches/release23/foo", putting the project directory beneath the
branch.

Also fires when an entire project directory is copied; this is
transformed into a copy of trunk and copies of each subbranch and
tag that exists.

After the swap, there are attempts to recognize spans of copies
into branch directories, and copies into tag subdirectories that
are parallel in all top-level (project) directories. These are
coalesced into single copies in the inverted structure. No attempt
is made to coalesce deletes; the user must manually trim unneeded
branches.

Accordingly, copies with three-segment sources and three-segment
targets are transformed; for tags/ and branches/ paths the last
segment (the subdirectory below the branch name) is dropped, and
following copies are skipped.

This has two minor negative consequences. One is that metadata
belonging to all deletes or copies after the first one in a
coalesced span is lost. The other is that branches and tags local
to individual project directories are promoted to global branches
and tags across the entire transformed repository; no content is
lost this way.

Parallel rename sequences are also coalesced.

If a PATTERN argument is given, only paths matching the pattern are
swapped.

Note that the result of swapping does not have initial
trunk/branches/tags directory creations and therefore cannot be fed
directly to svnload. reposurgeon copes with this, but Subversion
will not.

Mergeinfo properties are updated to use the swapped path names.

This transform can be restricted by a selection set.
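The path rules stated for swap and swapsvn can be sketched as pure functions on a single path. This Go fragment is hypothetical (swapTop and swapsvnPath are invented names) and covers only the per-path mapping. It treats every single-segment path as a project directory, passes bare proj/branches and proj/tags paths through unchanged, and does not model copy coalescing, the property check that makes swapsvn die, or PATTERN restriction.

```go
package main

import (
	"fmt"
	"strings"
)

// swapTop exchanges the first two segments of a path, as swap does.
func swapTop(path string) string {
	parts := strings.SplitN(path, "/", 3)
	if len(parts) < 2 {
		return path
	}
	parts[0], parts[1] = parts[1], parts[0]
	return strings.Join(parts, "/")
}

// swapsvnPath applies the swapsvn path rules to one path. keep is false for
// a bare top-level project directory, which swapsvn discards. Paths whose
// second segment is not trunk, branches, or tags pass through unaltered.
func swapsvnPath(path string) (newPath string, keep bool) {
	parts := strings.Split(path, "/")
	if len(parts) == 1 {
		return "", false
	}
	switch parts[1] {
	case "trunk":
		// proj/trunk/rest -> trunk/proj/rest
		parts[0], parts[1] = parts[1], parts[0]
	case "branches", "tags":
		if len(parts) >= 3 {
			// proj/branches/name/rest -> branches/name/proj/rest
			parts = append([]string{parts[1], parts[2], parts[0]}, parts[3:]...)
		}
	}
	return strings.Join(parts, "/"), true
}

func main() {
	fmt.Println(swapTop("proj/trunk/src/a.c"))         // trunk/proj/src/a.c
	fmt.Println(swapsvnPath("foo/branches/release23")) // branches/release23/foo true
	fmt.Println(swapsvnPath("foo/trunk/README"))       // trunk/foo/README true
}
```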

replace
Perform a regular expression search/replace on blob content. The
first character of the argument (normally /) is treated as the end
delimiter for the regular-expression and replacement parts. This
transform can be restricted by a selection set.
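A hypothetical Go sketch of how such a delimited argument can be split and applied; parseReplace is an invented name. It assumes a single-byte delimiter, requires the closing delimiter, and does not handle escaped delimiters. Picking a first character other than / avoids escaping when the expression itself contains slashes, as the second call in main shows.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// parseReplace splits an argument such as "/foo/bar/" into a compiled
// regular expression and a replacement, using the argument's first
// character as the delimiter. Escaped delimiters are not supported.
func parseReplace(arg string) (*regexp.Regexp, string, error) {
	if arg == "" {
		return nil, "", errors.New("empty replace argument")
	}
	delim := arg[:1]
	parts := strings.Split(arg[1:], delim)
	if len(parts) != 3 || parts[2] != "" {
		return nil, "", fmt.Errorf("malformed replace argument %q", arg)
	}
	re, err := regexp.Compile(parts[0])
	if err != nil {
		return nil, "", err
	}
	return re, parts[1], nil
}

func main() {
	re, repl, err := parseReplace("/Copyright 20[0-9][0-9]/Copyright 2022/")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", re.ReplaceAll([]byte("Copyright 2019 Example Corp."), []byte(repl)))
	// Copyright 2022 Example Corp.

	// A different first character avoids escaping slashes in paths.
	re, repl, err = parseReplace("|usr/local|opt|")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", re.ReplaceAll([]byte("see usr/local/bin"), []byte(repl)))
	// see opt/bin
}
```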

strip
Replace content with unique generated cookies on all node paths
matching the specified regular expressions; if no expressions are
given, match all paths.

This command is useful for reducing the bulk of a stream without
touching its metadata, so you can do test conversions more
quickly.

obscure
Replace path segments and committer IDs with arbitrary but
consistent names in order to obscure them. The replacement
algorithm is tuned to make the replacements readily distinguishable
by eyeball. This transform can be restricted by a selection set.
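The essential property here is consistency: the same real name must always produce the same substitute, which a memoizing map provides. The Go sketch below is hypothetical; its seg1, seg2, ... and user1, ... naming scheme is invented and is not repocutter's eyeball-friendly algorithm. Obscuring each path segment separately keeps shared prefixes shared.

```go
package main

import (
	"fmt"
	"strings"
)

// obscurer hands out stable placeholder names: the same input always gets
// the same output. The prefix-plus-counter scheme is hypothetical.
type obscurer struct {
	prefix string
	names  map[string]string
}

func newObscurer(prefix string) *obscurer {
	return &obscurer{prefix: prefix, names: make(map[string]string)}
}

// name returns the placeholder for orig, allocating one on first sight.
func (o *obscurer) name(orig string) string {
	if n, ok := o.names[orig]; ok {
		return n
	}
	n := fmt.Sprintf("%s%d", o.prefix, len(o.names)+1)
	o.names[orig] = n
	return n
}

// obscurePath obscures each segment independently, so paths that share a
// prefix still share it afterwards.
func obscurePath(o *obscurer, path string) string {
	segs := strings.Split(path, "/")
	for i, s := range segs {
		segs[i] = o.name(s)
	}
	return strings.Join(segs, "/")
}

func main() {
	paths := newObscurer("seg")
	authors := newObscurer("user")
	fmt.Println(obscurePath(paths, "widget/src/main.c"))  // seg1/seg2/seg3
	fmt.Println(obscurePath(paths, "widget/docs"))        // seg1/seg4
	fmt.Println(authors.name("esr"), authors.name("esr")) // user1 user1
}
```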

reduce
Strip revisions out of a dump so that the only parts left are those
likely to be relevant to a conversion problem. This is done by
dropping every node that consists of a change on a file and has no
property settings. Mergeinfo properties in all revisions are updated
so they no longer refer to dropped revisions.

testify
Replace commit timestamps with a monotonically increasing clock
tick starting at the Unix epoch and advancing by 10 seconds per
commit. Replace all attributions with 'fred'. Discard the
repository UUID. Use this to neutralize procedurally-generated
streams so they can be compared. This transform can be restricted
by a selection set.
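The synthetic clock is easy to reproduce when checking testified output. A hypothetical Go sketch (testifyDate and testifyProps are invented names) that writes dates in Subversion's svn:date form, YYYY-MM-DDTHH:MM:SS.ffffffZ. Whether repocutter's first commit lands on the epoch itself or 10 seconds after it is an assumption here; this sketch starts at the epoch. UUID removal is not shown.

```go
package main

import (
	"fmt"
	"time"
)

// svnDateLayout is the Go time layout for Subversion's svn:date format,
// e.g. 2022-04-21T10:00:00.000000Z.
const svnDateLayout = "2006-01-02T15:04:05.000000Z"

// testifyDate gives commit n (counting from 0) a date 10*n seconds after
// the Unix epoch, in UTC.
func testifyDate(n int) string {
	t := time.Unix(0, 0).UTC().Add(time.Duration(n) * 10 * time.Second)
	return t.Format(svnDateLayout)
}

// testifyProps rewrites one revision's properties in place: a synthetic
// svn:date and, where an author is recorded, the fixed author "fred".
func testifyProps(n int, props map[string]string) {
	props["svn:date"] = testifyDate(n)
	if _, ok := props["svn:author"]; ok {
		props["svn:author"] = "fred"
	}
}

func main() {
	props := map[string]string{"svn:author": "esr", "svn:date": "2022-04-21T10:00:00.000000Z", "svn:log": "fix"}
	testifyProps(3, props)
	fmt.Println(props["svn:author"], props["svn:date"]) // fred 1970-01-01T00:00:30.000000Z
}
```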

version
Report major and minor repocutter version.

HISTORY
Under the name "svncutter", an ancestor of this program traveled in the
'contrib/' directory of the Subversion distribution. It had functional
overlap with reposurgeon(1) because it was directly ancestral to that
code. It was moved to the reposurgeon(1) distribution in January 2016.
This program was ported from Python to Go in August 2018, at which time
the obsolete "squash" command was retired. The syntax of regular
expressions in the pathrename command changed at that time.

The reason for the partial functional overlap between repocutter and
reposurgeon is that repocutter was written first and became a
testbed for some of the design concepts in reposurgeon. After
reposurgeon was written, the author learned that it could not naturally
support some useful operations very specific to Subversion, and
enhanced repocutter to do those.

BUGS
There is one regression since the Python version: repocutter no longer
recognizes Macintosh-style line endings consisting of a carriage return
only. This may be addressed in a future version.

SEE ALSO
reposurgeon(1).

EXAMPLE
Suppose you have a Subversion repository with the following
semi-pathological structure:

    Directory1/ (with unrelated content)
    Directory2/ (with unrelated content)
    TheDirIWantToMigrate/
        branches/
            crazy-feature/
                UnrelatedApp1/
                TheAppIWantToMigrate/
        tags/
            v1.001/
                UnrelatedApp1/
                UnrelatedApp2/
                TheAppIWantToMigrate/
        trunk/
            UnrelatedApp1/
            UnrelatedApp2/
            TheAppIWantToMigrate/

You want to transform the dump file so that TheAppIWantToMigrate can be
subject to a regular branchy lift. A way to dissect out the code of
interest would be to apply the following series of filters:

    repocutter expunge '^Directory1' '^Directory2'
    repocutter pathrename '^TheDirIWantToMigrate/' ''
    repocutter expunge '^branches/crazy-feature/UnrelatedApp1/'
    repocutter pathrename 'branches/crazy-feature/TheAppIWantToMigrate/' 'branches/crazy-feature/'
    repocutter expunge '^tags/v1.001/UnrelatedApp1/'
    repocutter expunge '^tags/v1.001/UnrelatedApp2/'
    repocutter pathrename '^tags/v1.001/TheAppIWantToMigrate/' 'tags/v1.001/'
    repocutter expunge '^trunk/UnrelatedApp1/'
    repocutter expunge '^trunk/UnrelatedApp2/'
    repocutter pathrename '^trunk/TheAppIWantToMigrate/' 'trunk/'

LIMITATIONS
The sift and expunge operations can produce output dumps that are
invalid. The problem is copyfrom operations (Subversion branch and tag
creations). If an included revision includes a copyfrom reference to an
excluded one, the reference target won’t be in the emitted dump; it
won’t load correctly in Subversion, and while reposurgeon has fallback
logic that backs down to the latest existing revision before the
missing one, this expedient is fragile. The revision number in a
copyfrom header pointing to a missing revision will be zero. Attempts
to be clever about this won’t work; the problem is inherent in the data
model of Subversion.

AUTHOR
Eric S. Raymond <esr@thyrsus.com>. This tool is distributed with
reposurgeon; see the project page
<http://www.catb.org/~esr/reposurgeon>.

2022-04-21 REPOCUTTER(1)