bup-split(1)                                                      bup-split(1)
2
3
4
NAME
bup-split - save individual files to bup backup sets
7
SYNOPSIS
bup split [-t] [-c] [-n name] COMMON_OPTIONS

bup split -b COMMON_OPTIONS

bup split <--noop [--copy]|--copy> COMMON_OPTIONS

COMMON_OPTIONS
[-r host:path] [-v] [-q] [-d seconds-since-epoch] [--bench]
[--max-pack-size=bytes] [-#] [--bwlimit=bytes]
[--max-pack-objects=n] [--fanout=count] [--keep-boundaries]
[--git-ids | filenames...]
20
DESCRIPTION
bup split concatenates the contents of the given files (or, if no
filenames are given, reads from stdin), splits the content into chunks
of around 8k using a rolling checksum algorithm, and saves the chunks
into a bup repository.  Chunks which have previously been stored are
not stored again (i.e., they are `deduplicated').

Because of the way the rolling checksum works, chunks tend to be very
stable across changes to a given file, including adding, deleting, and
changing bytes.

For example, if you use bup split to back up an XML dump of a database,
and the XML file changes slightly from one run to the next, nearly all
the data will still be deduplicated and the size of each backup after
the first will typically be quite small.
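A sketch of that pattern, assuming a PostgreSQL database and an already
initialized bup repository; the database name and dataset name here are
hypothetical:

```shell
# Nightly dump of a database named "mydb"; only the chunks that
# changed since the last run are actually stored in the repository.
pg_dump mydb > mydb.sql
bup split -n mydb-sql mydb.sql
```

Any program that writes its dump to stdout can pipe it straight into
bup split instead of going through a temporary file.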
36
Another technique is to pipe the output of the tar(1) or cpio(1)
programs to bup split.  When individual files in the tarball change
slightly or are added or removed, bup still processes the remainder of
the tarball efficiently.  (Note that bup save is usually a more
efficient way to accomplish this, however.)

To get the data back, use bup-join(1).
44
MODES
These options select the primary behavior of the command, with -n being
the most likely choice.

-n, --name=name
after creating the dataset, create a git branch named name so that it
can be accessed using that name.  If name already exists, the new
dataset will be considered a descendant of the old name.  (Thus, you
can continually create new datasets with the same name, and later view
the history of that dataset to see how it has changed over time.)  The
original data will also be available as a top-level file named "data"
in the VFS, accessible via bup fuse, bup ftp, etc.
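A minimal sketch of the -n round trip; the dataset name "mydata" and
the input file are arbitrary:

```shell
# Save a file as a dataset named "mydata", then restore the
# original bytes with bup join and compare them.
bup split -n mydata /etc/hosts
bup join mydata > hosts.restored
cmp /etc/hosts hosts.restored
```

Repeating the split with the same -n name records a new commit on that
branch, building up the dataset's history over time.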
58
-t, --tree
output the git tree id of the resulting dataset.

-c, --commit
output the git commit id of the resulting dataset.

-b, --blobs
output a series of git blob ids that correspond to the chunks in the
dataset.  Incompatible with -n, -t, and -c.
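For example, to list one blob id per chunk (the ids depend entirely on
the input, so none are shown here; the filename is a placeholder):

```shell
# Print the id of every chunk; handy for scripting or debugging.
bup split -b somefile.bin
```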
68
--noop
read the data and split it into blocks based on the "bupsplit" rolling
checksum algorithm, but don't do anything with the blocks.  This is
mostly useful for benchmarking.  Incompatible with -n, -t, -c, and -b.

--copy
like --noop, but also write the data to stdout.  This can be useful for
benchmarking the speed of read+bupsplit+write for large amounts of
data.  Incompatible with -n, -t, -c, and -b.
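A benchmarking sketch; bigfile.img is a placeholder for any large
input:

```shell
# Split-only speed, with timings printed to stderr:
bup split --noop --bench bigfile.img
# Read + split + write throughput, discarding the output:
bup split --copy bigfile.img > /dev/null
```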
77
OPTIONS
-r, --remote=host:path
save the backup set to the given remote server.  If path is omitted,
uses the default path on the remote server (you still need to include
the `:').  The connection to the remote server is made with SSH.  If
you'd like to specify which port, user, or private key to use for the
SSH connection, we recommend you use the ~/.ssh/config file.  Even
though the destination is remote, a local bup repository is still
required.

-d, --date=seconds-since-epoch
specify the date inscribed in the commit (seconds since 1970-01-01).

-q, --quiet
disable progress messages.

-v, --verbose
increase verbosity (can be used more than once).

--git-ids
stdin is a list of git object ids instead of raw data.  bup split will
read the contents of each named git object (if it exists in the bup
repository) and split it.  This might be useful for converting a git
repository with large binary files to use bup-style hashsplitting
instead.  This option is probably most useful when combined with
--keep-boundaries.
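One possible conversion sketch, assuming BUP_DIR points at (a clone of)
the git repository so that the listed objects are resolvable there; the
repository path and branch name are hypothetical:

```shell
# Make the source repository the bup repository for this run.
export BUP_DIR=/path/to/repo.git
# Feed object ids on stdin; bup reads each one and hashsplits it.
git --git-dir="$BUP_DIR" rev-list --objects HEAD | cut -d' ' -f1 |
    bup split --git-ids --keep-boundaries -n converted
```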
105
--keep-boundaries
if multiple filenames are given on the command line, they are normally
concatenated together as if the content all came from a single file.
That is, the set of blobs/trees produced is identical to what it would
have been if there had been a single input file.  However, if you use
--keep-boundaries, each file is split separately.  You still only get a
single tree or commit or series of blobs, but each blob comes from only
one of the files; the end of one of the input files always ends a blob.
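For instance (the file names are illustrative):

```shell
# Concatenated: a chunk may span the a.log/b.log boundary.
bup split -n logs a.log b.log
# Kept separate: the end of a.log always ends a blob.
bup split -n logs --keep-boundaries a.log b.log
```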
115
--bench
print benchmark timings to stderr.

--max-pack-size=bytes
never create git packfiles larger than the given number of bytes.
Default is 1 billion bytes.  Usually there is no reason to change this.

--max-pack-objects=numobjs
never create git packfiles with more than the given number of objects.
Default is 200 thousand objects.  Usually there is no reason to change
this.

--fanout=numobjs
when splitting very large files, try to keep the number of elements in
trees to an average of numobjs.

--bwlimit=bytes/sec
don't transmit more than bytes/sec bytes per second to the server.
This is good for making your backups not suck up all your network
bandwidth.  Use a suffix like k, M, or G to specify multiples of 1024,
1024*1024, or 1024*1024*1024, respectively.
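For example, to cap uploads to a remote server at one mebibyte per
second (the server name is hypothetical):

```shell
# 1M = 1024*1024 bytes/sec.
bup split -r myserver: -n mybackup --bwlimit=1M mydata.tar
```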
138
-#, --compress=#
set the compression level to # (a value from 0-9, where 9 is the
highest and 0 is no compression).  The default is 1 (fast, loose
compression).
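For example, trading CPU time for smaller packs (file and dataset names
are placeholders):

```shell
# Maximum compression instead of the default level 1.
bup split -9 -n mybackup mydata.tar
```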
143
EXAMPLES
$ tar -cf - /etc | bup split -r myserver: -n mybackup-tar
tar: Removing leading `/' from member names
Indexing objects: 100% (196/196), done.

$ bup join -r myserver: mybackup-tar | tar -tf - | wc -l
1961
151
SEE ALSO
bup-join(1), bup-index(1), bup-save(1), bup-on(1), ssh_config(5)
154
BUP
Part of the bup(1) suite.
157
AUTHOR
Avery Pennarun <apenwarr@gmail.com>.
160
161
162
Bup 0.29.2                        2018-10-20                      bup-split(1)