REPOCUTTER(1) REPOCUTTER(1)

6 repocutter - surgical and filtering operations on Subversion dump files
7
9 repocutter [-q] [-d n] [-i 'filename'] [-r 'selection'] 'subcommand'
10
This program does surgical and filtering operations on Subversion dump
files. While it is not as flexible as reposurgeon(1), it can perform
Subversion-specific transformations that reposurgeon cannot, and can be
useful for processing Subversion repositories into a form suitable for
conversion. Also, it supports the version 3 dumpfile format, which
reposurgeon does not.
18
19 In most commands, the -r (or --range) option limits the selection of
20 revisions over which an operation will be performed. Usually other
21 revisions will be passed through unaltered, except in the select and
22 deselect commands for which the option controls which revisions will be
23 passed through. A selection consists of one or more comma-separated
24 ranges. A range may consist of an integer revision number or the
25 special name HEAD for the head revision. Or it may be a colon-separated
26 pair of integers, or an integer followed by a colon followed by HEAD.
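
The selection grammar above can be sketched in a few lines of Python.
This is an illustrative model only, not repocutter's own parser; the
names parse_selection and selected are hypothetical, and the caller is
assumed to supply the head revision number.

```python
def parse_selection(spec, head):
    """Parse a selection such as '3,7:12,40:HEAD' into (low, high) pairs."""
    def endpoint(token):
        # HEAD is the only symbolic name the text mentions.
        return head if token == "HEAD" else int(token)

    ranges = []
    for part in spec.split(","):
        if ":" in part:
            # A pair: integer, colon, then an integer or HEAD.
            low, high = part.split(":", 1)
            ranges.append((int(low), endpoint(high)))
        else:
            # A single revision is a range of length one.
            rev = endpoint(part)
            ranges.append((rev, rev))
    return ranges


def selected(rev, ranges):
    """True if revision number rev falls inside any parsed range."""
    return any(low <= rev <= high for low, high in ranges)
```

For example, parse_selection("3,7:12,40:HEAD", 50) yields three ranges,
and selected(10, ...) is true for them because 10 lies within 7:12.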
27
If the output stream contains copyfrom references to missing revisions,
repocutter silently patches each copy source by stepping it backwards to
the most recent previous revision that exists.
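
The stepping-back rule can be pictured as a search over the revisions
that survive into the output. The sketch below assumes those revision
numbers are available as a sorted list; step_back is a hypothetical
name, and how repocutter actually tracks surviving revisions is not
described here.

```python
import bisect


def step_back(copyfrom_rev, emitted_revs):
    """Return the latest emitted revision <= copyfrom_rev, or None.

    emitted_revs must be sorted in ascending order.
    """
    i = bisect.bisect_right(emitted_revs, copyfrom_rev)
    return emitted_revs[i - 1] if i > 0 else None
```

With emitted revisions 1, 2, 5 and 9, a copy source of revision 7 would
be stepped back to 5, while a source of 5 is left alone.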
31
32 (Older versions of this tool, before 4.30, treated -r as an implied
33 selection filter rather than passing through unselected revisions
34 unaltered. If you have old scripts using repocutter they may need
35 modification.)
36
37 Normally, each subcommand produces a progress spinner on standard
38 error; each turn means another revision has been filtered. The -q (or
39 --quiet) option suppresses this. Quiet mode is set when output is
40 redirected to a file or pipe.
41
42 The -d option enables debug messages on standard error. It takes an
43 integer debug level. These messages are probably only of interest to
44 repocutter developers.
45
46 The -i option sets the input source to a specified filename. This is
47 primarily useful when running the program under a debugger. When this
48 option is not present the program expects to read a stream from
49 standard input.
50
Generally, if you need to use this program at all, you will find that
you need to pipe your dump file through multiple instances of it doing
one kind of operation each. This is not as expensive as it sounds; with
the exception of the reduce subcommand, the working set of this program
is bounded by the size of the largest single blob plus its
metadata. It does not need to hold the entire repo metadata in memory.
57
58 The -f/-fixed option disables regexp compilation of PATTERN arguments,
59 treating them as literal strings.
60
61 The -t option sets a tag to be included in error and warning messages.
62 This will be useful for determining which stage of a multistage
63 repocutter pipeline failed.
64
65 There are a few other command-specific options described under
66 individual commands.
67
68 In the command descriptions, PATTERN arguments are regular expressions
69 to match pathnames, constrained so that each match must be a path
70 segment or a sequence of path segments; that is, the left end must be
71 either at the start of path or immediately following a /, and the right
72 end must precede a / or be at end of string. With a leading ^ the match
73 is constrained to be a leading sequence of the pathname; with a
74 trailing $, a trailing one.
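
The segment constraint can be modelled by wrapping the user's pattern
in boundary assertions. The sketch below uses Python's re module as a
stand-in; Go's regexp package has no lookaround, so repocutter must
enforce the rule some other way, and the helper names here are
hypothetical. The sketch takes the stated rule literally and makes no
special provision for patterns that themselves end in a slash.

```python
import re


def segment_regex(pattern):
    """Wrap pattern so that a match must cover whole path segments."""
    # Left end: start of path or just after a slash.
    # Right end: just before a slash or at end of string.
    return re.compile(r"(?:^|(?<=/))(?:" + pattern + r")(?=/|$)")


def segment_match(pattern, path):
    """True if pattern matches one or more complete segments of path."""
    return segment_regex(pattern).search(path) is not None
```

Under this model "trunk" matches project/trunk/main.c but not
project/trunks/main.c, and "^trunk" does not match project/trunk.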
75
76 The following subcommands are available:
77
78 select
79 The 'select' subcommand selects a range and permits only revisions
80 and nodes in that range to pass to standard output. A range
81 beginning with 0 includes the dumpfile header. Mergeinfo properties
82 in all revisions are updated so they no longer refer to omitted
83 revisions.
84
Warning: this subcommand is not guaranteed to produce a valid dump
that can be read by reposurgeon. In particular, it may delete a
revision that is referenced in a later copy-from operation, which
will crash reposurgeon.
88
89 deselect
90 The 'deselect' subcommand selects a range and permits only
91 revisions and nodes NOT in that range to pass to standard output.
92 Any mergeinfo properties in other revisions are updated so they no
93 longer refer to dropped revisions.
94
Warning: this subcommand is not guaranteed to produce a valid dump
that can be read by reposurgeon. In particular, it may delete a
revision that is referenced in a later copy-from operation, which
will crash reposurgeon.
98
99 see
100 Render a very condensed report on the repository node structure,
101 mainly useful for examining strange and pathological repositories.
102 File content is ignored. You get one line per repository operation,
103 reporting the revision, operation type, file path, and the copy
104 source (if any). Directory paths are distinguished by a trailing
105 slash. The 'copy' operation is really an 'add' with a directory
106 source and target; the display name is changed to make them easier
107 to see. This report can be restricted by a selection set.
108
109 renumber
110 Renumber all revisions, patching Node-copyfrom headers as required.
111 Any selection option is ignored. Takes no arguments. The -b option
112 can be used to set the base to renumber from, defaulting to 0.
113
114 count
The 'count' subcommand lists the last revision number in the input
stream. This is normally the revision count, but may not be if the
stream has omitted revisions.
118
119 log
120 Generate a log report, same format as the output of svn log on a
121 repository, to standard output.
122
123 setlog
124 Replace the log entries in the input dumpfile with the
125 corresponding entries in the LOGFILE, which should be in the format
126 of an svn log output. Replacements may be restricted to a specified
127 range.
128
129 propdel
130 Delete the property PROPNAME. May be restricted by a revision
131 selection. You may specify multiple properties to be deleted.
132
133 proprename
134 Rename the property OLDNAME to NEWNAME. May be restricted by a
135 revision selection. You may specify multiple properties to be
136 renamed.
137
138 propset
139 Set the property PROPNAME to PROPVAL.
140
May be restricted by a revision selection. Note that specifying
only a revision will cause the property to be set on the revision
properties and on all nodes in the revision; you’ll probably want
to specify a node index.
145
146 You may specify multiple property settings.
147
148 propclean
Every path with a suffix matching one of SUFFIXES gets a property
turned off. The default property is svn:executable. Another property
may be set with the -p option.
152
153 expunge
154 Delete all operations with Node-path or Node-copyfrom-path headers
155 matching specified Golang regular expressions (opposite of 'sift').
156 Any revision left with no Node records after this filtering has its
157 Revision record dropped as well. Mergeinfo properties in all
158 revisions are updated so they no longer refer to dropped revisions.
159
Warning: this subcommand is not guaranteed to produce a valid dump
that can be read by reposurgeon. In particular, it may delete a
revision that is referenced in a later copy-from operation, which
will crash reposurgeon.
163
164 sift
165 Delete all operations with either Node-path or Node-copyfrom-path
166 headers not matching specified Golang regular expressions (opposite
167 of 'expunge'). Any revision left with no Node records after this
168 filtering has its Revision record removed as well. Mergeinfo
169 properties in all revisions are updated so they no longer refer to
170 dropped revisions.
171
172 This transform can be restricted by a selection set.
173
Warning: this subcommand is not guaranteed to produce a valid dump
that can be read by reposurgeon. In particular, it may delete a
revision that is referenced in a later copy-from operation, which
will crash reposurgeon.
177
178 closure
179 The 'closure' subcommand computes the transitive closure of a path
180 set under the relation 'copies from' - that is, with the smallest
181 set of additional paths such that every copy-from source is in the
182 set.
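
The closure computation can be sketched as a worklist over copy
records. The sketch assumes the copies have already been extracted as
(target, source) path pairs and compares paths exactly; copy_closure is
a hypothetical name, and any prefix matching on directories that the
real subcommand may do is not modelled.

```python
def copy_closure(paths, copies):
    """Close a path set under 'copies from'.

    copies is an iterable of (target_path, source_path) pairs.
    """
    # Index the copy sources by their target for quick lookup.
    sources = {}
    for target, source in copies:
        sources.setdefault(target, set()).add(source)

    closed = set(paths)
    pending = list(closed)
    while pending:
        path = pending.pop()
        for source in sources.get(path, ()):
            if source not in closed:
                # A newly added source may itself be a copy target.
                closed.add(source)
                pending.append(source)
    return closed
```

Starting from a tag that was copied from a branch that was copied from
trunk, the result contains all three paths and nothing unrelated.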
183
184 pathlist
185 List all distinct node-paths in the stream, once each, in the order
186 first encountered.
187
188 pathrename
Modify Node-path headers, Node-copyfrom-path headers, and
svn:mergeinfo properties matching the specified Golang regular
expression FROM; replace with TO. TO may contain Golang-style
backreferences (${1}, ${2} etc - curly brackets not optional) to
parenthesized portions of FROM.
193
194 Matches are constrained so that each match must be a path segment
195 or a sequence of path segments; that is, the left end must be
196 either at the start of path or immediately following a /, and the
197 right end must precede a / or be at end of string. With a leading ^
198 the match is constrained to be a leading sequence of the pathname;
199 with a trailing $, a trailing one.
200
201 Multiple FROM/TO pairs may be specified and are applied in order.
202 This transform can be restricted by a selection set.
203
All mergeinfo properties are updated in accordance with the path
renames.
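
The effect of FROM/TO pairs with ${N} backreferences can be modelled
as follows. This is a sketch in Python rather than Go; it expands the
${N} references by hand because Python's own replacement syntax
differs, it applies the whole-segment constraint literally, and the
function name is hypothetical.

```python
import re


def pathrename(path, pairs):
    """Apply (FROM, TO) pairs in order; TO may use ${N} backreferences."""
    for pattern, template in pairs:
        # A match must begin and end on path-segment boundaries.
        regex = re.compile(r"(?:^|(?<=/))(?:" + pattern + r")(?=/|$)")

        def expand(match, template=template):
            # Substitute each ${N} with the text of capture group N.
            return re.sub(r"\$\{(\d+)\}",
                          lambda ref: match.group(int(ref.group(1))) or "",
                          template)

        path = regex.sub(expand, path)
    return path
```

For instance, the pair ('^old/([^/]+)', 'new/${1}') turns
old/app/file.c into new/app/file.c, and later pairs see the result of
earlier ones.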
206
207 setpath
208 In the specified revisions, replace the Node-path with the
209 specified PATH. Does not alter mergeinfo properties as a side
210 effect.
211
212 setcopyfrom
213 In the specified revisions, replace the Node-copyfrom-path with the
214 specified PATH. Does not alter mergeinfo properties as a side
215 effect. Terminates with error if any selected node is not a copy.
216
217 pop
218 Pop initial segment off each path matching PATTERN - by default,
219 all paths.
220
221 May be useful after a sift command to turn a dump from a subproject
222 stripped from a dump for a multiple-project repository into the
223 normal form with trunk/tags/branches at the top level.
224
This transform cannot be restricted by a selection set, as it is
not possible to guarantee that copyfrom paths and mergeinfo
properties will be modified consistently in the presence of that
kind of restriction.
229
230 Mergeinfo properties in all revisions are updated, as well as path
231 and copyfrom parts.
232
233 push
Push an initial segment onto each matching path. Normally used to
add a "trunk" prefix to every path in a flat repository. The -s
option can be used to set a different initial segment.
237
This transform cannot be restricted by a selection set, as it is
not possible to guarantee that copyfrom paths and mergeinfo
properties will be modified consistently in the presence of that
kind of restriction.
242
Mergeinfo properties in all revisions are updated to refer to the
new pathnames.
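
On a single pathname, pop and push are inverse operations. The sketch
below shows only the path arithmetic; the function names are
hypothetical, PATTERN filtering is left out, and returning an empty
string when a one-segment path is popped is an assumption rather than
documented behavior.

```python
def pop(path):
    """Drop the initial segment: 'proj/trunk/a.c' -> 'trunk/a.c'."""
    head, sep, rest = path.partition("/")
    # Assumption: a one-segment path pops to the empty string.
    return rest if sep else ""


def push(path, segment="trunk"):
    """Prepend a segment; 'trunk' mirrors the documented default."""
    return segment + "/" + path
```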
245
246 filecopy
247 For each node in the revision range, stash the current version of
248 the node-path’s content. For each later file copy operation with
249 that source, replace the file copy with an explicit add/change
250 using the stashed content.
251
252 You can use this operation to sever links from obsolete branches or
253 non-conformable directories in a multiproject repository so the
254 unwanted content can be expunged without changing the content of
255 later revisions.
256
257 If a PATTERN argument is provided, only replace copies with an
258 explicit add/change when the source node path matches PATTERN.
259
260 With the -n flag, only the basename is required to match PATTERN if
261 it is provided. Otherwise, with -n and no PATTERN, require a match
262 of source to target on basename only rather than the full path.
263 This may be required in order to extract filecopies from branches.
264
265 Restricting the range holds down the memory requirement of this
266 tool, which in the worst (and default) 1:$ case will keep a copy of
267 every blob in the repository until it’s done processing the stream.
268
269 skipcopy
270 Replace the source revision and path of a copy at the upper end of
271 the selection with the source revisions and path of a copy at the
272 lower end. Fails unless both revisions are copies. Used to remove
273 an unwanted intermediate copy or copies, cleaning up the history.
274
275 swap
276 Swap the top two elements of each pathname in every revision in the
277 selection set. Useful following a sift operation for straightening
278 out a common form of multi-project repository. If a PATTERN
279 argument is given, only paths matching it are swapped.
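
The path transformation performed by swap amounts to exchanging the
first two segments. A minimal sketch, with a hypothetical function name
and the assumption that one-segment paths pass through unchanged:

```python
def swap(path):
    """Exchange the top two segments of a pathname."""
    parts = path.split("/")
    if len(parts) >= 2:
        parts[0], parts[1] = parts[1], parts[0]
    # Assumption: a path with a single segment is returned as-is.
    return "/".join(parts)
```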
280
281 swapsvn
282 Like swap, but is aware of Subversion structure. Used for
283 transforming multiproject repositories into a standard layout with
284 trunk, tags, and branches at the top level.
285
Fires when the second component of a matching path is "trunk",
"branches", or "tags", or the path consists of a single segment
that is a top-level project directory; passes through all paths for
which this is not so unaltered.
290
291 Top-level project directories with properties or comments make this
292 command die (return status 1) with an error message on stderr;
293 otherwise these directories are silently discarded.
294
Otherwise, swaps "trunk" and the top-level (project) directory
straight up. For tags and branches, the following two components
are swapped to the top. Thus, "foo/branches/release23" becomes
"branches/release23/foo", putting the project directory beneath the
branch.
300
301 Also fires when an entire project directory is copied; this is
302 transformed into a copy of trunk and copies of each subbranch and
303 tag that exists.
304
After the swap, there are attempts to recognize spans of copies
into branch directories, and copies into tag subdirectories that
are parallel in all top-level (project) directories. These are
coalesced into single copies in the inverted structure. No attempt
is made to coalesce deletes; the user must manually trim unneeded
branches.
311
Accordingly, copies with three-segment sources and three-segment
targets are transformed; for tags/ and branches/ paths the last
segment (the subdirectory below the branch name) is dropped.
Following copies are skipped.
316
317 This has two minor negative consequences. One is that metadata
318 belonging to all deletes or copies after the first one in a
319 coalesced span is lost. The other is that branches and tags local
320 to individual project directories are promoted to global branches
321 and tags across the entire transformed repository; no content is
322 lost this way.
323
324 Parallel rename sequences are also coalesced.
325
326 If a PATTERN argument is given, only paths matching the pattern are
327 swapped.
328
329 Note that the result of swapping does not have initial
330 trunk/branches/tags directory creations and can thus not be fed
331 directly to svnload. reposurgeon copes with this, but Subversion
332 will not.
333
Mergeinfo properties are updated to use the swapped path names.
335
336 This transform can be restricted by a selection set.
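
The basic path rule of swapsvn, ignoring everything else the
subcommand does, can be sketched as below. Only the rewriting of
individual multi-segment paths is modelled; discarding of top-level
project directories, whole-project copies, and coalescing of parallel
copies are left out, and swapsvn_path is a hypothetical name.

```python
def swapsvn_path(path):
    """Move trunk/branches/tags above the project directory."""
    parts = path.split("/")
    if len(parts) >= 2 and parts[1] == "trunk":
        # project/trunk/... -> trunk/project/...
        return "/".join(["trunk", parts[0]] + parts[2:])
    if len(parts) >= 3 and parts[1] in ("branches", "tags"):
        # project/branches/NAME/... -> branches/NAME/project/...
        return "/".join([parts[1], parts[2], parts[0]] + parts[3:])
    # Anything else passes through unaltered.
    return path
```

This reproduces the example in the text: foo/branches/release23 becomes
branches/release23/foo.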
337
338 swapcheck
339 List directory prefixes of anomalous paths that would confuse
340 swapsvn. This includes any single-segment path other than
341 trunk/tags/branches or a project copy operation, any path with two
342 or more segments in which the second is not trunk/tags/branches,
343 and any path in which trunk/tags/branches occurs more than one
344 segment down from the root.
345
346 Each report line has two fields; the first is the earliest revision
347 containing a path with the prefix given, and the second is the
348 prefix. Once a particular path prefix has been recognized and
349 reported as anomalous, later paths with that prefix are not
350 reported.
351
352 If feeding a Subversion dump to this subcommand doesn’t produce an
353 empty report, you can expect swapsvn to produce an invalid dump
354 that will confuse and possibly crash reposurgeon. The remedy for
355 this is a set of pathrenames and/or deselections that yields paths
356 conformable to being swapped into a regular Subversion structure.
357
358 replace
359 Perform a regular expression search/replace on blob content. The
360 first character of the argument (normally /) is treated as the end
361 delimiter for the regular-expression and replacement parts. This
362 transform can be restricted by a selection set.
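
The delimiter convention can be illustrated with a small parser. The
sketch assumes the argument has the shape /REGEX/REPLACEMENT/ with a
closing delimiter after each part, which is one reading of the
description above; the function names are hypothetical and Python's re
module stands in for Go's regexp.

```python
import re


def parse_replace_arg(arg):
    """Split '/REGEX/REPLACEMENT/' using its first character as delimiter."""
    if not arg:
        raise ValueError("empty argument")
    delim = arg[0]
    fields = arg[1:].split(delim)
    # Expect REGEX, REPLACEMENT, and an empty tail after the last delimiter.
    if len(fields) != 3 or fields[2] != "":
        raise ValueError("expected %sREGEX%sREPLACEMENT%s" % (delim, delim, delim))
    return fields[0], fields[1]


def replace_blob(arg, blob):
    """Apply the parsed search/replace to blob content given as bytes."""
    pattern, replacement = parse_replace_arg(arg)
    return re.sub(pattern.encode(), replacement.encode(), blob)
```

Choosing a different first character, as in '#a/b#c#', lets the
regular expression itself contain slashes.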
363
364 strip
365 Replace content with unique generated cookies on all node paths
366 matching the specified regular expressions; if no expressions are
367 given, match all paths.
368
369 This command is useful for reducing the bulk of a stream without
370 touching its metadata, so you can do test conversions more quickly.
371
372 hash
373 Replace content with hash on all node paths matching the specified
374 regular expressions; if no expressions are given, match all paths.
375
376 obscure
377 Replace path segments and committer IDs with arbitrary but
378 consistent names in order to obscure them. The replacement
379 algorithm is tuned to make the replacements readily distinguishable
380 by eyeball. This transform can be restricted by a selection set.
381
382 reduce
Strip revisions out of a dump so the only parts left are those likely
to be relevant to a conversion problem. This is done by dropping
every node that consists of a change on a file and has no property
settings. Mergeinfo properties in all revisions are updated so they
no longer refer to dropped revisions.
388
389 testify
390 Replace commit timestamps with a monotonically increasing clock
391 tick starting at the Unix epoch and advancing by 10 seconds per
392 commit. Replace all attributions with 'fred'. Discard the
393 repository UUID. Use this to neutralize procedurally-generated
394 streams so they can be compared. This transform can be restricted
395 by a selection set.
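
The synthetic clock can be sketched as follows. The sketch assumes the
first commit lands exactly on the epoch and that dates are written in
the usual svn:date layout with microseconds and a trailing Z;
tick_dates is a hypothetical name.

```python
from datetime import datetime, timezone


def tick_dates(count, step=10):
    """Timestamps for count commits, step seconds apart, from the epoch."""
    return [
        datetime.fromtimestamp(i * step, tz=timezone.utc)
        .strftime("%Y-%m-%dT%H:%M:%S.%fZ")
        for i in range(count)
    ]
```

Because the dates depend only on commit order, two streams generated
by different procedures get identical timestamps and can be compared
directly.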
396
debug
398 Set the debug level to the specified value on the selected
399 revisions. Setting debugging enables diagnostics to standard error,
400 and suppresses the progress baton for the entire run in order not
401 to step on any diagnostics that might be emitted.
402
403 For the meaning of the debug levels, see the source code. This
404 option is probably only of interest to repocutter developers.
405
406 version
407 Report major and minor repocutter version.
408
Under the name "svncutter", an ancestor of this program traveled in the
'contrib/' directory of the Subversion distribution. It had functional
overlap with reposurgeon(1) because it was directly ancestral to that
code. It was moved to the reposurgeon(1) distribution in January 2016.
This program was ported from Python to Go in August 2018, at which time
the obsolete "squash" command was retired. The syntax of regular
expressions in the pathrename command changed at that time.
417
418 The reason for the partial functional overlap between repocutter and
419 reposurgeon is that repocutter was first written earlier and became a
420 testbed for some of the design concepts in reposurgeon. After
421 reposurgeon was written, the author learned that it could not naturally
422 support some useful operations very specific to Subversion, and
423 enhanced repocutter to do those.
424
426 Normally 0. Can be 1 if repocutter sees an ill-formed dump, or if the
427 output stream contains any copyfrom references to missing revisions.
428
430 There is one regression since the Python version: repocutter no longer
431 recognizes Macintosh-style line endings consisting of a carriage return
432 only. This may be addressed in a future version.
433
435 reposurgeon(1).
436
438 Suppose you have a Subversion repository with the following
439 semi-pathological structure:
440
441 Directory1/ (with unrelated content)
442 Directory2/ (with unrelated content)
443 TheDirIWantToMigrate/
444 branches/
445 crazy-feature/
446 UnrelatedApp1/
447 TheAppIWantToMigrate/
448 tags/
449 v1.001/
450 UnrelatedApp1/
451 UnrelatedApp2/
452 TheAppIWantToMigrate/
453 trunk/
454 UnrelatedApp1/
455 UnrelatedApp2/
456 TheAppIWantToMigrate/
457
458 You want to transform the dump file so that TheAppIWantToMigrate can be
459 subject to a regular branchy lift. A way to dissect out the code of
460 interest would be with the following series of filters applied:
461
repocutter expunge '^Directory1' '^Directory2'
repocutter pathrename '^TheDirIWantToMigrate/' ''
repocutter expunge '^branches/crazy-feature/UnrelatedApp1/'
repocutter pathrename 'branches/crazy-feature/TheAppIWantToMigrate/' 'branches/crazy-feature/'
repocutter expunge '^tags/v1.001/UnrelatedApp1/'
repocutter expunge '^tags/v1.001/UnrelatedApp2/'
repocutter pathrename '^tags/v1.001/TheAppIWantToMigrate/' 'tags/v1.001/'
repocutter expunge '^trunk/UnrelatedApp1/'
repocutter expunge '^trunk/UnrelatedApp2/'
repocutter pathrename '^trunk/TheAppIWantToMigrate/' 'trunk/'
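
Since each filter reads a dump on standard input and writes one on
standard output, the series is meant to be chained with pipes. One way
to assemble such a chain from a list of stages is sketched below; it
only builds the command string and does not run anything, and
build_pipeline is a hypothetical helper, not part of repocutter.

```python
import shlex


def build_pipeline(stages):
    """Join repocutter stages into one shell pipeline string.

    Each stage is a list: the subcommand followed by its arguments.
    """
    return " | ".join(
        shlex.join(["repocutter"] + list(stage)) for stage in stages
    )
```

The resulting string can be handed to a shell with the original dump
on standard input; shlex.join takes care of quoting the patterns,
including the empty TO argument.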
472
The sift and expunge operations can produce output dumps that are
invalid. The problem is copyfrom operations (Subversion branch and tag
creations). If an included revision includes a copyfrom reference to an
excluded one, the reference target won’t be in the emitted dump; it
won’t load correctly in Subversion, and while reposurgeon has fallback
logic that backs down to the latest existing revision before the
missing one, this expedient is fragile. The revision number in a
copyfrom header pointing to a missing revision will be zero. Attempts
to be clever about this won’t work; the problem is inherent in the data
model of Subversion.
484
486 Eric S. Raymond esr@thyrsus.com. This tool is distributed with
487 reposurgeon; see the project page
488 <http://www.catb.org/~esr/reposurgeon>.
489

2023-02-28 REPOCUTTER(1)