datalad clone(1) General Commands Manual datalad clone(1)

datalad clone - obtain a dataset (copy) from a URL or local directory

datalad clone [-h] [-d DATASET] [-D DESCRIPTION] [--reckless
[auto|ephemeral|shared-...]] [--version] SOURCE [PATH] ...

The purpose of this command is to obtain a new clone (copy) of a
dataset and place it into a not-yet-existing or empty directory. As
such, CLONE provides a strict subset of the functionality offered by
`install`. Only a single dataset can be obtained, and immediate
recursive installation of subdatasets is not supported. However, once a
(super)dataset is installed via CLONE, any content, including
subdatasets, can be obtained by a subsequent `get` command.

Primary differences over a direct `git clone` call are 1) the automatic
initialization of a dataset annex (pure Git repositories are equally
supported); 2) automatic registration of the newly obtained dataset as
a subdataset (submodule), if a parent dataset is specified; 3) support
for additional resource identifiers (DataLad resource identifiers as
used on datasets.datalad.org, and RIA store URLs as used for
store.datalad.org, optionally in specific versions as identified by a
branch or a tag; see examples); and 4) automatic, configurable
generation of alternative access URLs for common cases (such as
appending '.git' to the URL in case accessing the base URL failed).

In case the clone is registered as a subdataset, the original URL
passed to CLONE is recorded in `.gitmodules` of the parent dataset in
addition to the resolved URL used internally for git-clone. This makes
it possible to preserve DataLad-specific URLs like ria+ssh://... for
subsequent calls to GET if the subdataset was locally removed later on.

URL mapping configuration

CLONE supports the transformation of URLs via (multi-part) substitution
specifications. A substitution specification is defined as a
configuration setting 'datalad.clone.url-substitute.<seriesID>' with a
string containing a match and substitution expression, each following
Python's regular expression syntax. Both expressions are concatenated to
a single string with an arbitrary delimiter character. The delimiter is
defined by prefixing the string with the delimiter. Prefix and delimiter
are stripped from the expressions (Example:
",^http://(.*)$,https://\1"). This setting can be defined multiple
times, using the same '<seriesID>'. Substitutions in a series will be
applied incrementally, in order of their definition. The first
substitution in such a series must match, otherwise no further
substitutions in a series will be considered. However, following the
first match all further substitutions in a series are processed,
regardless of whether intermediate expressions match or not.
Substitution series themselves have no particular order; each matching
series will result in a candidate clone URL. Consequently, the initial
match specification in a series should be as precise as possible to
prevent inflation of candidate URLs.

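The series semantics described above can be sketched in Python. This is
an illustrative sketch only, not DataLad's implementation: the function
names, the example series, and the example.org/example.com URLs are all
hypothetical, and the backreference is written as `\1`.

```python
import re


def parse_substitution(spec):
    """Split ',MATCH,REPLACEMENT' into (match, replacement).

    The first character of the string names the delimiter.
    """
    delimiter, rest = spec[0], spec[1:]
    parts = rest.split(delimiter)
    if len(parts) != 2:
        raise ValueError(f"expected exactly two expressions in {spec!r}")
    return parts[0], parts[1]


def apply_series(url, specs):
    """Apply one substitution series in order of definition.

    Returns the candidate URL, or None when the first expression
    of the series does not match.
    """
    for position, spec in enumerate(specs):
        match, replacement = parse_substitution(spec)
        url, hits = re.subn(match, replacement, url)
        if position == 0 and hits == 0:
            return None  # the series does not apply to this URL
        # Later expressions are processed whether or not they matched.
    return url


def candidate_urls(url, all_series):
    """Collect one candidate URL per matching series (no particular order)."""
    results = (apply_series(url, specs) for specs in all_series.values())
    return [candidate for candidate in results if candidate is not None]


# Hypothetical series, for illustration only. The second entry never
# matches; the third uses '|' as its delimiter instead of ','.
series = [
    r",^http://(.*)$,https://\1",
    r",no-such-text,unused",
    r"|example\.org|example.com",
]

print(apply_series("http://example.org/ds", series))
print(apply_series("ssh://example.org/ds", series))
```

Because every series whose first expression matches yields a candidate,
a loose initial pattern in several series multiplies the URLs that have
to be tried.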
SEEALSO

handbook:3-001 (http://handbook.datalad.org/symbols)
More information on Remote Indexed Archive (RIA) stores

Examples
Install a dataset from GitHub into the current directory::

   % datalad clone https://github.com/datalad-datasets/longnow-podcasts.git

Install a dataset into a specific directory::

   % datalad clone https://github.com/datalad-datasets/longnow-podcasts.git myfavpodcasts

Install a dataset as a subdataset into the current dataset::

   % datalad clone -d . https://github.com/datalad-datasets/longnow-podcasts.git

Install the main superdataset from datasets.datalad.org::

   % datalad clone ///

Install a dataset identified by a literal alias from store.datalad.org::

   % datalad clone ria+http://store.datalad.org#~hcp-openaccess

Install a dataset in a specific version as identified by a branch or
tag name from store.datalad.org::

   % datalad clone ria+http://store.datalad.org#76b6ca66-36b1-11ea-a2e6-f0d5bf7b5561@myidentifier

Install a dataset with group-write access permissions::

   % datalad clone http://example.com/dataset --reckless shared-group

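The two RIA store URLs in the examples above share one shape: a store
URL, a '#' separator, a dataset reference (an ID, or an alias marked
with '~'), and an optional '@' version. The sketch below splits such a
string into its parts. It is inferred from those two examples only; the
`RiaSource` type and `split_ria_url` function are hypothetical and are
not part of DataLad.

```python
from typing import NamedTuple, Optional


class RiaSource(NamedTuple):
    store_url: str          # URL of the RIA store itself
    dataset: str            # dataset ID, or alias without the leading '~'
    is_alias: bool          # True when the reference was written as '~alias'
    version: Optional[str]  # branch or tag name, if one was given


def split_ria_url(url):
    """Split 'ria+<store>#<dataset>[@<version>]' into its components."""
    prefix = "ria+"
    if not url.startswith(prefix):
        raise ValueError(f"not a RIA URL: {url!r}")
    store_url, separator, fragment = url[len(prefix):].partition("#")
    if not separator or not fragment:
        raise ValueError(f"missing dataset reference in {url!r}")
    # Only the part after '#' is searched for '@', so a user name in
    # the store URL (user@host) is left untouched.
    dataset, _, version = fragment.partition("@")
    is_alias = dataset.startswith("~")
    if is_alias:
        dataset = dataset[1:]
    return RiaSource(store_url, dataset, is_alias, version or None)


print(split_ria_url("ria+http://store.datalad.org#~hcp-openaccess"))
```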
SOURCE
    URL, DataLad resource identifier, local path, or instance of the
    dataset to be cloned. Constraints: value must be a string

PATH
    path to clone into. If no PATH is provided, a destination path
    is derived from the source URL, similar to git clone.

GIT CLONE OPTIONS
    Options to pass to git clone. Any argument specified after
    SOURCE and the optional PATH will be passed to git-clone. Note
    that not all options will lead to viable results. For example,
    '--single-branch' will not result in a functional annex
    repository because both a regular branch and the git-annex
    branch are required. Note that a version in a RIA URL takes
    precedence over '--branch'.

-h, --help, --help-np
    show this help message. --help-np forcefully disables the use of
    a pager for displaying the help message

-d DATASET, --dataset DATASET
    (parent) dataset to clone into. If given, the newly cloned
    dataset is registered as a subdataset of the parent. Also, if
    given, relative paths are interpreted as being relative to the
    parent dataset, and not relative to the working directory.
    Constraints: Value must be a Dataset or a valid identifier of a
    Dataset (e.g. a path) or value must be NONE

-D DESCRIPTION, --description DESCRIPTION
    short description to use for a dataset location. Its primary
    purpose is to help humans to identify a dataset copy (e.g.,
    "mike's dataset on lab server"). Note that when a dataset is
    published, this information becomes available on the remote
    side. Constraints: value must be a string or value must be NONE

--reckless [auto|ephemeral|shared-...]
    Obtain a dataset or subdataset and set it up in a potentially
    unsafe way for performance or access reasons. Use with care: any
    such dataset is marked as 'untrusted'. The reckless mode is
    stored in a dataset's local configuration under
    'datalad.clone.reckless', and is inherited by any of its
    subdatasets. Supported modes are:

    ['auto']: hard-link files between local clones. In-place
    modification in any clone will alter original annex content.

    ['ephemeral']: symlink annex to origin's annex, discard local
    availability info via git-annex-dead 'here', and declare this
    annex private. Shares an annex between origin and clone without
    git-annex being aware of it. In case of a change in origin you
    need to update the clone before you're able to save new content
    on your end. Alternative to 'auto' when hardlinks are not an
    option, or the number of consumed inodes needs to be minimized.
    Note that this mode can only be used with clones from non-bare
    repositories or a RIA store! Otherwise two different annex
    object tree structures (dirhashmixed vs dirhashlower) will be
    used simultaneously, and annex keys using the respective other
    structure will be inaccessible.

    ['shared-<mode>']: set up repository and annex permission to
    enable multi-user access. This disables the standard write
    protection of annex'ed files. <mode> can be any value supported
    by 'git init --shared=', such as 'group', or 'all'.

    Constraints: value must be one of (True, False, 'auto',
    'ephemeral') or value must start with 'shared-'

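The constraint stated for --reckless can be expressed as a small check.
This is a sketch of the rule as written above, not DataLad's own
validation code; `check_reckless` and `shared_mode` are hypothetical
helper names.

```python
def check_reckless(value):
    """Return the value unchanged if it satisfies the stated constraint."""
    if isinstance(value, bool) or value in ("auto", "ephemeral"):
        return value
    if isinstance(value, str) and value.startswith("shared-"):
        return value
    raise ValueError(f"unsupported reckless mode: {value!r}")


def shared_mode(value):
    """Extract <mode> from a 'shared-<mode>' value."""
    return value[len("shared-"):]


mode = check_reckless("shared-group")
print(shared_mode(mode))
```

Note that the check does not verify that <mode> is a value accepted by
'git init --shared='; it only enforces the 'shared-' prefix, as the
constraint is worded.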
--version
    show the module and its version which provides the command

datalad is developed by The DataLad Team and Contributors
<team@datalad.org>.

datalad clone 0.19.3 2023-08-11 datalad clone(1)