REPOCUTTER(1)                                                    REPOCUTTER(1)

NAME
repocutter - surgical and filtering operations on Subversion dump files

SYNOPSIS
repocutter [-q] [-d] [-i 'filename'] [-r 'selection'] 'subcommand'

DESCRIPTION
This program does surgical and filtering operations on Subversion dump
files. While it is not as flexible as reposurgeon(1), it can perform
Subversion-specific transformations that reposurgeon cannot, and can be
useful for processing Subversion repositories into a form suitable for
conversion. Also, it supports the version 3 dumpfile format, which
reposurgeon does not.

In all commands, the -r (or --range) option limits the selection of
revisions over which an operation will be performed. Usually it behaves
like an implied select on the revision output range. A selection
consists of one or more comma-separated ranges. A range may be a single
integer revision number or the special name HEAD for the head revision.
It may also be a colon-separated pair of integers, or an integer
followed by a colon followed by HEAD.
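
As an illustration of the selection syntax, the sketch below wraps a
select call in a shell function. The function name keep_revisions and
the file names in the usage note are hypothetical; only the -q and -r
options and the select subcommand come from this page.

```shell
# keep_revisions SELECTION
# Pass only the selected revisions from standard input to standard output.
# keep_revisions is a hypothetical helper name, not part of repocutter.
keep_revisions() {
    # Options go before the subcommand, as in the synopsis.
    repocutter -q -r "$1" select
}
```

For example, keep_revisions '0:100,150,200:HEAD' <full.dump >subset.dump
combines a pair of integers, a single revision, and an integer:HEAD
range in one comma-separated selection.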

Normally, each subcommand produces a progress spinner on standard
error; each turn means another revision has been filtered. The -q (or
--quiet) option suppresses this.

The -d option enables debug messages on standard error. These are
probably only of interest to repocutter developers.

The -i option sets the input source to a specified filename. This is
primarily useful when running the program under a debugger. When this
option is not present the program expects to read a stream from
standard input.
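
The two input styles can be compared side by side. This is a minimal
sketch; the function names are hypothetical, and it assumes that both
forms produce the same report for the same dump file.

```shell
# see_from_file FILE: name the input file with -i.
see_from_file() {
    repocutter -q -i "$1" see
}

# see_from_stdin FILE: default behavior, read the dump from standard input.
see_from_stdin() {
    repocutter -q see <"$1"
}
```

The -q option is used in both so that the progress spinner does not
clutter a terminal while the report is being read.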

Generally, if you need to use this program at all, you will find that
you need to pipe your dump file through multiple instances of it, each
doing one kind of operation. This is not as expensive as it sounds; with
the exception of the reduce subcommand, the working set of this program
is bounded by the size of the largest single blob plus its metadata. It
does not need to hold the entire repo metadata in memory.

The following subcommands are available:

help
    Without arguments, list available commands. With a command-name
    argument, show detailed help for that subcommand.

deselect
    The 'deselect' subcommand selects a range and permits only
    revisions not in that range to pass to standard output.

expunge
    Delete all operations with Node-path headers matching specified Go
    regular expressions. Any revision left with no Node records after
    this filtering has its Revision record removed as well.

log
    Generate a log report, same format as the output of svn log on a
    repository, to standard output.

obscure
    Replace path segments and committer IDs with arbitrary but
    consistent names in order to obscure them. The replacement
    algorithm is tuned to make the replacements readily distinguishable
    by eyeball.

pathrename
    Modify Node-path and Node-copyfrom-path headers matching a
    specified regular expression; replace with a given string. The
    string may contain references to parenthesized portions of the
    pattern. Note that these must be Go-style references led by $, not
    by a backslash as in reposurgeon itself. See the embedded help for
    syntax details.
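
A Go-style group reference in a pathrename replacement might look like
the sketch below. The function name and the projects/NAME/trunk/ layout
are invented for illustration, and the sketch assumes the replacement
string follows the template rules of Go's regexp package, where ${1}
names the first parenthesized group.

```shell
# promote_trunk is a hypothetical helper; the directory layout is made up.
# It rewrites projects/NAME/trunk/... to trunk/NAME/...
promote_trunk() {
    # Single quotes stop the shell from expanding ${1} itself, so the
    # reference reaches repocutter intact.
    repocutter -q pathrename '^projects/([^/]+)/trunk/' 'trunk/${1}/'
}
```

In Go templates a bare $1 followed by a letter or digit is read as part
of a longer name, so the braced form ${1} is the safer spelling when the
reference is followed by more text.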

pop
    Pop the initial segment off each path. May be useful after a sift
    command, to turn a dump of a subproject extracted from a
    multiple-project repository into the normal form with
    trunk/tags/branches at the top level.
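
The sift-then-pop combination described above can be sketched as a
two-stage pipeline. The function name extract_subproject is
hypothetical, and the subproject name is interpolated into the regular
expression unescaped, so it should not contain regex metacharacters.

```shell
# extract_subproject NAME
# Keep only paths under NAME/, then drop that leading segment so that
# trunk/tags/branches end up at the top level.
# extract_subproject is a hypothetical helper name.
extract_subproject() {
    repocutter -q sift "^$1/" | repocutter -q pop
}
```

A call such as extract_subproject project1 <multi.dump >project1.dump
reads the multiple-project dump on standard input and writes the
single-project dump to standard output.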

propset
    Set a property to a value. May be restricted by a revision
    selection. You may specify multiple property settings. See the
    embedded help for syntax details.

propdel
    Delete the named property. May be restricted by a revision
    selection. You may specify multiple properties to be deleted. See
    the embedded help for syntax details.

proprename
    Rename a property. May be restricted by a revision selection. You
    may specify multiple properties to be renamed. See the embedded
    help for syntax details.

reduce
    Strip revisions out of a dump so the only parts left are those
    likely to be relevant to a conversion problem. See the embedded
    help for syntax details and the relevance filter.

renumber
    Renumber all revisions, patching Node-copyfrom headers as required.
    Any selection option is ignored. Takes no arguments. The -b option
    sets the base to renumber, defaulting to 0.

replace
    Perform a regular expression search/replace on blob content. The
    first character of the argument (normally /) is treated as the end
    delimiter for the regular-expression and replacement parts.
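
The delimiter convention can be illustrated with a hedged sketch. It
assumes the argument has the form /regexp/replacement/, as the
description above implies; the function name and the host names are
hypothetical.

```shell
# scrub_hostname is a hypothetical helper; the host names are made up.
# The leading / is the delimiter: it ends the regular-expression part
# and then the replacement part.
scrub_hostname() {
    repocutter -q replace '/build\.example\.com/build.invalid/'
}
```

When the pattern or replacement itself contains slashes, choosing a
different first character, for example ':/usr/local:/opt:', should
avoid having to escape them.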

see
    Render a very condensed report on the repository node structure,
    mainly useful for examining strange and pathological repositories.
    File content is ignored. You get one line per repository operation,
    reporting the revision, operation type, file path, and the copy
    source (if any). Directory paths are distinguished by a trailing
    slash. The 'copy' operation is really an 'add' with a directory
    source and target; the display name is changed to make them easier
    to see. Additionally, any property settings on a node are dumped
    immediately after it.

select
    The 'select' subcommand selects a range and permits only revisions
    in that range to pass to standard output. A range beginning with 0
    includes the dumpfile header.

setlog
    Replace the log entries in the input dumpfile with the
    corresponding entries in a specified file, which should be in the
    format of an svn log output. Replacements may be restricted to a
    specified range. See the embedded help for syntax details.

sift
    Delete all operations with Node-path headers not matching specified
    Go regular expressions (opposite of 'expunge'). Any revision left
    with no Node records after this filtering has its Revision record
    removed as well.

strip
    Replace content with unique generated cookies on all node paths
    matching the specified regular expressions; if no expressions are
    given, match all paths. Useful when you need to examine a
    particularly complex node structure.

swap
    Swap the top two components of every path. This is sometimes useful
    when converting a multi-project Subversion repository that has
    normal trunk/branch/tag structure under each top-level directory
    (of course the alternative is to break it into components using
    multiple strip operations).

testify
    Replace commit timestamps with a monotonically increasing clock
    tick starting at the Unix epoch and advancing by 10 seconds per
    commit. Replace all attributions with 'fred'. Discard the
    repository UUID. Use this to neutralize procedurally-generated
    streams so they can be compared.
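
One way to use testify for comparison is sketched below: neutralize two
dumps and compare the results byte for byte. The function name
same_after_testify is hypothetical, and the use of mktemp and cmp is an
assumption about the surrounding environment rather than anything
repocutter requires.

```shell
# same_after_testify A B
# Exit status 0 if the two dump files are identical once neutralized.
# same_after_testify is a hypothetical helper name. The variables a, b
# and status are plain globals because POSIX sh has no 'local'.
same_after_testify() {
    a=$(mktemp) && b=$(mktemp) || return 2
    repocutter -q testify <"$1" >"$a"
    repocutter -q testify <"$2" >"$b"
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Replacing cmp -s with diff would show where two streams diverge instead
of only reporting whether they do.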

HISTORY
Under the name "svncutter", an ancestor of this program traveled in the
'contrib/' directory of the Subversion distribution. It had functional
overlap with reposurgeon(1) because it was directly ancestral to that
code. It was moved to the reposurgeon(1) distribution in January 2016.
This program was ported from Python to Go in August 2018, at which time
the obsolete "squash" command was retired. The syntax of regular
expressions in the pathrename command changed at that time.

The reason for the partial functional overlap between repocutter and
reposurgeon is that repocutter was first written earlier and became a
testbed for some of the design concepts in reposurgeon. After
reposurgeon was written, the author learned that it could not naturally
support some useful operations very specific to Subversion, and
enhanced repocutter to do those.

BUGS
There is one regression since the Python version: repocutter no longer
recognizes Macintosh-style line endings consisting of a carriage return
only. This may be addressed in a future version.

SEE ALSO
reposurgeon(1).

EXAMPLE
Suppose you have a Subversion repository with the following
semi-pathological structure:

    Directory1/ (with unrelated content)
    Directory2/ (with unrelated content)
    TheDirIWantToMigrate/
        branches/
            crazy-feature/
                UnrelatedApp1/
                TheAppIWantToMigrate/
        tags/
            v1.001/
                UnrelatedApp1/
                UnrelatedApp2/
                TheAppIWantToMigrate/
        trunk/
            UnrelatedApp1/
            UnrelatedApp2/
            TheAppIWantToMigrate/

You want to transform the dump file so that TheAppIWantToMigrate can be
subject to a regular branchy lift. A way to dissect out the code of
interest would be with the following series of filters applied:

    repocutter expunge '^Directory1' '^Directory2'
    repocutter pathrename '^TheDirIWantToMigrate/' ''
    repocutter expunge '^branches/crazy-feature/UnrelatedApp1/'
    repocutter pathrename 'branches/crazy-feature/TheAppIWantToMigrate/' 'branches/crazy-feature/'
    repocutter expunge '^tags/v1.001/UnrelatedApp1/'
    repocutter expunge '^tags/v1.001/UnrelatedApp2/'
    repocutter pathrename '^tags/v1.001/TheAppIWantToMigrate/' 'tags/v1.001/'
    repocutter expunge '^trunk/UnrelatedApp1/'
    repocutter expunge '^trunk/UnrelatedApp2/'
    repocutter pathrename '^trunk/TheAppIWantToMigrate/' 'trunk/'
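
Since each filter reads a dump on standard input and writes one to
standard output, the series above can be chained with pipes. The sketch
below is one way to do that; the wrapper name dissect_app and the file
names in the usage note are hypothetical, and -q is added only to
silence the ten progress spinners.

```shell
# dissect_app is a hypothetical wrapper name. It reads the original dump
# on standard input and writes the filtered dump to standard output,
# running the filters from the list above in the same order.
dissect_app() {
    repocutter -q expunge '^Directory1' '^Directory2' |
    repocutter -q pathrename '^TheDirIWantToMigrate/' '' |
    repocutter -q expunge '^branches/crazy-feature/UnrelatedApp1/' |
    repocutter -q pathrename 'branches/crazy-feature/TheAppIWantToMigrate/' 'branches/crazy-feature/' |
    repocutter -q expunge '^tags/v1.001/UnrelatedApp1/' |
    repocutter -q expunge '^tags/v1.001/UnrelatedApp2/' |
    repocutter -q pathrename '^tags/v1.001/TheAppIWantToMigrate/' 'tags/v1.001/' |
    repocutter -q expunge '^trunk/UnrelatedApp1/' |
    repocutter -q expunge '^trunk/UnrelatedApp2/' |
    repocutter -q pathrename '^trunk/TheAppIWantToMigrate/' 'trunk/'
}
```

It would be run as dissect_app <original.dump >app-only.dump. Keeping
one operation per stage matches the advice in the description and makes
it easy to inspect the intermediate result of any stage with see.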

LIMITATIONS
The sift and expunge operations can produce output dumps that are
invalid. The problem is copyfrom operations (Subversion branch and tag
creations). If an included revision includes a copyfrom reference to an
excluded one, the reference target won’t be in the emitted dump, and
the dump won’t load correctly in either Subversion or reposurgeon. The
revision number in a copyfrom header pointing to a missing revision
will be zero. Attempts to be clever about this won’t work; the problem
is inherent in the data model of Subversion.
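
Because a dangling reference shows up as a zero revision, a filtered
dump can be screened for the problem before it is loaded. This is a
rough sketch: the function name is hypothetical, it relies on the
Node-copyfrom-rev header of the Subversion dump format, and it can
overcount if file content happens to contain an identical line.

```shell
# count_dangling_copies FILE
# Print the number of copyfrom headers whose revision is zero.
# count_dangling_copies is a hypothetical helper name.
count_dangling_copies() {
    # -a makes grep treat a dump with binary blobs as text (GNU and BSD
    # grep); -c prints the count of matching lines.
    grep -a -c '^Node-copyfrom-rev: 0$' "$1"
}
```

A nonzero count suggests that the sift or expunge patterns cut away a
copy source that a surviving branch or tag still refers to.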

AUTHOR
Eric S. Raymond <esr@thyrsus.com>. This tool is distributed with
reposurgeon; see the project page
<http://www.catb.org/~esr/reposurgeon>.

2021-01-12                                                       REPOCUTTER(1)