bup-split(1)                                                      bup-split(1)

NAME

       bup-split - save individual files to bup backup sets

SYNOPSIS

       bup split [-t] [-c] [-n name] COMMON_OPTIONS

       bup split -b COMMON_OPTIONS

       bup split <--noop [--copy]|--copy> COMMON_OPTIONS

       COMMON_OPTIONS
              [-r host:path] [-v] [-q] [-d seconds-since-epoch] [--bench]
              [--max-pack-size=bytes] [-#] [--bwlimit=bytes]
              [--max-pack-objects=n] [--fanout=count] [--keep-boundaries]
              [--git-ids | filenames...]

DESCRIPTION

       bup split concatenates the contents of the given files (or, if no
       filenames are given, reads from stdin), splits the content into chunks
       of around 8k on average using a rolling checksum algorithm, and saves
       the chunks into a bup repository.  Chunks that have already been stored
       are not stored again (i.e., they are `deduplicated').
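
       The deduplication step can be sketched in Python, assuming a toy
       in-memory store.  The ChunkStore class is hypothetical and is not
       bup's repository code; the sketch relies only on how git names a blob:
       the SHA-1 of the header "blob <size>\0" followed by the content.
       Identical chunks therefore always get identical ids, and a chunk whose
       id is already present need not be written again.

```python
from __future__ import annotations

import hashlib


def git_blob_id(data: bytes) -> str:
    """Object id git assigns to `data` stored as a blob.

    git hashes the header b"blob <size>\\0" followed by the raw bytes.
    """
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ChunkStore:
    """Hypothetical content-addressed store (not bup's real repository)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def save(self, chunk: bytes) -> tuple[str, bool]:
        """Store `chunk` unless present; return (id, True if newly written)."""
        oid = git_blob_id(chunk)
        if oid in self.objects:
            return oid, False          # deduplicated: nothing new is written
        self.objects[oid] = chunk
        return oid, True


store = ChunkStore()
chunks = [b"header", b"row 1", b"row 2", b"header"]
results = [store.save(c) for c in chunks]
print(len(store.objects))              # 3
print([new for _, new in results])     # [True, True, True, False]
```

       The repeated "header" chunk maps to the id that is already stored, so
       only three objects are kept for four chunks.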

       Because of the way the rolling checksum works, chunks tend to be very
       stable across changes to a given file, including adding, deleting, and
       changing bytes: only the chunks near an edit are affected.
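
       The stability property can be illustrated with a simplified
       content-defined chunker.  This is a sketch under stated assumptions,
       not bup's bupsplit code: it uses an rsync-style pair of rolling sums
       over a 64-byte window and, to keep the demo small, cuts whenever the
       low 6 bits of the weighted sum are all ones (chunks of about 64 bytes
       rather than around 8k).  Because each cut decision depends only on the
       bytes currently in the window, inserting bytes in the middle of the
       input changes only the chunks within about one window of the edit.

```python
from __future__ import annotations

import random

# Illustrative parameters, not bup's bupsplit constants.
WINDOW = 64   # rolling window in bytes
BITS = 6      # cut when the low BITS bits of s2 are all ones: ~64-byte chunks


def chunk(data: bytes, window: int = WINDOW, bits: int = BITS) -> list[bytes]:
    """Split data at content-defined boundaries found by a rolling checksum."""
    mask = (1 << bits) - 1
    ring = bytearray(window)   # the last `window` bytes, zero-filled at start
    s1 = s2 = 0                # s1: plain sum of the window, s2: weighted sum
    chunks, start = [], 0
    for i, add in enumerate(data):
        drop = ring[i % window]          # byte leaving the window
        ring[i % window] = add
        s1 += add - drop
        s2 += s1 - window * drop         # keeps s2 = sum of weight * byte
        if (s2 & mask) == mask:          # decision uses only window contents
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])      # trailing partial chunk
    return chunks


rng = random.Random(42)
original = bytes(rng.randrange(256) for _ in range(20_000))
p = 10_000                               # position of the edit
edited = original[:p] + b"a few inserted bytes" + original[p:]

old, new = chunk(original), chunk(edited)
new_set = set(new)
reused = sum(1 for c in old if c in new_set)
print(f"{reused} of {len(old)} chunks are unchanged after the insertion")
```

       For random input a cut occurs with probability 2**-BITS per byte, so
       the average chunk is about 2**BITS bytes; with 13 bits it would be
       about 8192 bytes.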

       For example, if you use bup split to back up an XML dump of a database,
       and the XML file changes slightly from one run to the next, nearly all
       the data will still be deduplicated, and each backup after the first
       will typically add very little new data to the repository.

       Another technique is to pipe the output of tar(1) or cpio(1) to bup
       split.  When individual files in the tarball change slightly or are
       added or removed, bup still processes the remainder of the tarball
       efficiently.  (However, bup save is usually a more efficient way to
       accomplish this.)

       To get the data back, use bup-join(1).

MODES

       These options select the primary behavior of the command, with -n being
       the most likely choice.

       -n, --name=name
              after creating the dataset, create a git branch named name so
              that it can be accessed using that name.  If name already
              exists, the new dataset will be recorded as a descendant of the
              dataset previously saved under that name.  (Thus, you can
              continually create new datasets with the same name, and later
              view the history of that dataset to see how it has changed over
              time.)  The original data will also be available as a top-level
              file named “data” in the VFS, accessible via bup fuse, bup ftp,
              etc.
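
              The effect of reusing a name can be modelled as a chain of
              commits, each pointing at the previous head of the same branch.
              In this sketch the Commit class, the refs table, and the tree
              ids are hypothetical stand-ins for real git objects and refs;
              it only shows why saving under an existing name makes the new
              dataset a descendant of the old one.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    tree: str                    # hypothetical id of the saved dataset's tree
    parent: Optional[Commit]     # previous head of the branch, or None


refs: dict[str, Commit] = {}     # hypothetical branch table: name -> head


def save_named(name: str, tree: str) -> Commit:
    """Record `tree` as a new commit on branch `name` (toy model of -n)."""
    commit = Commit(tree, refs.get(name))   # parent is the old head, if any
    refs[name] = commit
    return commit


def history(name: str) -> list[str]:
    """Walk branch `name` from newest to oldest, returning tree ids."""
    trees, c = [], refs.get(name)
    while c is not None:
        trees.append(c.tree)
        c = c.parent
    return trees


for tree in ["tree-monday", "tree-tuesday", "tree-wednesday"]:
    save_named("mybackup-tar", tree)
print(history("mybackup-tar"))  # ['tree-wednesday', 'tree-tuesday', 'tree-monday']
```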

       -t, --tree
              output the git tree id of the resulting dataset.

       -c, --commit
              output the git commit id of the resulting dataset.

       -b, --blobs
              output a series of git blob ids that correspond to the chunks in
              the dataset.  Incompatible with -n, -t, and -c.

       --noop read the data and split it into blocks based on the “bupsplit”
              rolling checksum algorithm, but don't do anything with the
              blocks.  This is mostly useful for benchmarking.  Incompatible
              with -n, -t, -c, and -b.

       --copy like --noop, but also write the data to stdout.  This can be
              useful for benchmarking the speed of read+bupsplit+write for
              large amounts of data.  Incompatible with -n, -t, -c, and -b.

OPTIONS

       -r, --remote=host:path
              save the backup set to the given remote server.  If path is
              omitted, uses the default path on the remote server (you still
              need to include the `:').  The connection to the remote server
              is made with SSH.  If you'd like to specify which port, user, or
              private key to use for the SSH connection, we recommend you use
              the ~/.ssh/config file.  Even though the destination is remote,
              a local bup repository is still required.
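
              The rule that the `:' is mandatory even when path is omitted
              can be expressed as a small parser.  parse_remote is a
              hypothetical helper, not bup's own code, and handles only the
              plain host:path form described here.

```python
from __future__ import annotations


def parse_remote(spec: str) -> tuple[str, str | None]:
    """Hypothetical parser for host:path; an empty path means the default.

    The colon is required even when the path is omitted.
    """
    host, sep, path = spec.partition(":")   # split at the first colon only
    if not sep or not host:
        raise ValueError(f"expected host:path, got {spec!r}")
    return host, (path or None)


print(parse_remote("myserver:"))           # ('myserver', None)
print(parse_remote("myserver:/srv/bup"))   # ('myserver', '/srv/bup')
```

              Splitting at the first colon means a host form that itself
              contains a colon would need different handling; the sketch does
              not attempt that.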

       -d, --date=seconds-since-epoch
              specify the date inscribed in the commit (seconds since
              1970-01-01 00:00:00 UTC).
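
              A value for -d can be computed from a calendar date.  This
              Python sketch is only an illustration, not part of bup; it
              converts midnight UTC on 2017-03-26 to seconds since the epoch.

```python
from datetime import datetime, timezone

# Midnight UTC on 2017-03-26, expressed as seconds since 1970-01-01 UTC.
when = datetime(2017, 3, 26, tzinfo=timezone.utc)
seconds = int(when.timestamp())
print(seconds)  # 1490486400
```

              The result could then be passed as -d 1490486400.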

       -q, --quiet
              disable progress messages.

       -v, --verbose
              increase verbosity (can be used more than once).

       --git-ids
              read a list of git object ids from stdin instead of raw data.
              bup split will read the contents of each named git object (if it
              exists in the bup repository) and split it.  This might be
              useful for converting a git repository with large binary files
              to use bup-style hashsplitting instead.  This option is probably
              most useful when combined with --keep-boundaries.

       --keep-boundaries
              if multiple filenames are given on the command line, they are
              normally concatenated together as if the content all came from a
              single file.  That is, the set of blobs/trees produced is
              identical to what it would have been if there had been a single
              input file.  However, if you use --keep-boundaries, each file is
              split separately.  You still only get a single tree or commit or
              series of blobs, but each blob comes from only one of the files;
              the end of one of the input files always ends a blob.
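
              The difference can be shown with a toy splitter.  To keep the
              example readable, toy_split (hypothetical) cuts after every
              newline instead of using the rolling checksum; split_files (also
              hypothetical) shows the two ways of feeding it multiple files.

```python
from __future__ import annotations

from typing import Iterable


def toy_split(data: bytes) -> list[bytes]:
    """Hypothetical stand-in splitter: cut after every newline."""
    chunks, start = [], 0
    for i, b in enumerate(data):
        if b == ord("\n"):
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def split_files(files: Iterable[bytes], keep_boundaries: bool) -> list[bytes]:
    files = list(files)
    if not keep_boundaries:
        # default: behave as if all input came from one concatenated file
        return toy_split(b"".join(files))
    # keep boundaries: split each file on its own so no chunk spans two files
    return [c for f in files for c in toy_split(f)]


inputs = [b"alpha\nbe", b"ta\ngamma\n"]
print(split_files(inputs, keep_boundaries=False))  # [b'alpha\n', b'beta\n', b'gamma\n']
print(split_files(inputs, keep_boundaries=True))   # [b'alpha\n', b'be', b'ta\n', b'gamma\n']
```

              By default the chunk "beta\n" spans both inputs; with
              --keep-boundaries the first file's tail "be" ends a blob of its
              own.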

       --bench
              print benchmark timings to stderr.

       --max-pack-size=bytes
              never create git packfiles larger than the given number of
              bytes.  The default is 1 billion bytes.  Usually there is no
              reason to change this.

       --max-pack-objects=numobjs
              never create git packfiles with more than the given number of
              objects.  The default is 200 thousand objects.  Usually there is
              no reason to change this.
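
              The two limits can be pictured as a writer that rolls over to a
              new pack whenever adding one more object would break either
              limit.  PackWriter is hypothetical and counts raw object bytes,
              whereas real packfiles hold compressed objects plus headers; in
              this sketch a single object larger than the size limit still
              gets a pack of its own.

```python
from __future__ import annotations

MAX_PACK_SIZE = 1_000_000_000   # default stated above: 1 billion bytes
MAX_PACK_OBJECTS = 200_000      # default stated above: 200 thousand objects


class PackWriter:
    """Hypothetical writer that opens a new pack when a limit would be hit."""

    def __init__(self, max_size: int = MAX_PACK_SIZE,
                 max_objects: int = MAX_PACK_OBJECTS) -> None:
        self.max_size = max_size
        self.max_objects = max_objects
        self.packs: list[list[bytes]] = [[]]   # finished packs plus the open one
        self.size = 0                          # raw bytes in the open pack

    def add(self, obj: bytes) -> None:
        current = self.packs[-1]
        too_many = len(current) >= self.max_objects
        too_big = len(current) > 0 and self.size + len(obj) > self.max_size
        if too_many or too_big:
            current = []                       # close the pack, start another
            self.packs.append(current)
            self.size = 0
        current.append(obj)
        self.size += len(obj)


writer = PackWriter(max_size=10, max_objects=3)
for obj in [b"aaaa", b"bbbb", b"cc", b"dddd", b"e"]:
    writer.add(obj)
print([len(pack) for pack in writer.packs])  # [3, 2]
```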

       --fanout=count
              when splitting very large files, try to keep the number of
              elements in trees to an average of count.
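
              A minimal sketch of bounded fanout, assuming fixed-size groups:
              build_tree (hypothetical) repeatedly groups entries into lists
              of at most count items until the top level fits.  The option
              asks for an average rather than a hard maximum, so bup's actual
              tree layout differs; this only shows how bounding the entries
              per tree turns a very long list of chunks into a shallow
              hierarchy.

```python
from __future__ import annotations


def build_tree(leaves: list[str], fanout: int) -> list:
    """Hypothetical: nest leaves into lists of at most `fanout` entries.

    Fixed-size grouping is a simplification chosen for this sketch.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level: list = list(leaves)
    while len(level) > fanout:
        # each pass adds one level of subtrees above the current one
        level = [level[i:i + fanout] for i in range(0, len(level), fanout)]
    return level


def flatten(node: list) -> list[str]:
    """Recover the leaves in their original order."""
    out: list[str] = []
    for child in node:
        out.extend(flatten(child) if isinstance(child, list) else [child])
    return out


leaves = [f"chunk{i}" for i in range(10)]
tree = build_tree(leaves, fanout=3)
print(tree)
```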

       --bwlimit=bytes/sec
              don't transmit more than bytes/sec bytes per second to the
              server.  This is useful for keeping your backups from using up
              all your network bandwidth.  Use a suffix like k, M, or G to
              specify multiples of 1024, 1024*1024, or 1024*1024*1024
              respectively.
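
              The suffix arithmetic can be checked with a small converter.
              parse_bwlimit is a hypothetical helper, not bup's own option
              parser, and accepts exactly the k, M, and G suffixes named
              above.

```python
from __future__ import annotations

# Multipliers for the suffixes described above (powers of 1024).
_MULTIPLIERS = {"": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_bwlimit(text: str) -> int:
    """Hypothetical: convert a value such as "500k" to bytes per second."""
    text = text.strip()
    suffix = text[-1:] if text[-1:] in ("k", "M", "G") else ""
    number = text[: len(text) - len(suffix)]
    if not number.isdigit():
        raise ValueError(f"invalid bandwidth limit: {text!r}")
    return int(number) * _MULTIPLIERS[suffix]


print(parse_bwlimit("500k"))  # 512000
print(parse_bwlimit("2M"))    # 2097152
```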

       -#, --compress=#
              set the compression level to # (a value from 0-9, where 9 is the
              highest and 0 is no compression).  The default is 1 (fast, loose
              compression).
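
              git stores objects zlib-compressed, so the trade-off behind -#
              can be explored with Python's zlib module.  This assumes, as
              the option text suggests, that the level follows zlib's 0-9
              scale; the sketch does not call bup.  Level 0 stores the data
              uncompressed (slightly larger than the input because of
              framing), while higher levels spend more CPU time searching for
              matches.

```python
import zlib

# Repetitive sample data, similar in spirit to a text dump.
chunk = b"INSERT INTO t VALUES (1, 'row');\n" * 256

sizes = {}
for level in (0, 1, 9):
    packed = zlib.compress(chunk, level)
    assert zlib.decompress(packed) == chunk   # lossless at every level
    sizes[level] = len(packed)

print(len(chunk), sizes)
```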

EXAMPLES

              $ tar -cf - /etc | bup split -r myserver: -n mybackup-tar
              tar: Removing leading `/' from member names
              Indexing objects: 100% (196/196), done.

              $ bup join -r myserver: mybackup-tar | tar -tf - | wc -l
              1961

SEE ALSO

       bup-join(1), bup-index(1), bup-save(1), bup-on(1), ssh_config(5)

BUP

       Part of the bup(1) suite.

AUTHORS

       Avery Pennarun <apenwarr@gmail.com>.

Bup 0.29.1                        2017-03-26                      bup-split(1)