datalad clone(1)            General Commands Manual           datalad clone(1)

NAME

       datalad clone - obtain a dataset (copy) from a URL or local directory

SYNOPSIS

       datalad clone [-h] [-d DATASET] [-D DESCRIPTION] [--reckless
              [auto|ephemeral|shared-...]] [--version] SOURCE [PATH] ...

DESCRIPTION

       The purpose of this command is to obtain a new clone (copy) of a
       dataset and place it into a not-yet-existing or empty directory. As
       such, CLONE provides a strict subset of the functionality offered by
       `install`. Only a single dataset can be obtained, and immediate
       recursive installation of subdatasets is not supported. However, once
       a (super)dataset is installed via CLONE, any content, including
       subdatasets, can be obtained by a subsequent `get` command.

       The primary differences from a direct `git clone` call are: 1) the
       automatic initialization of a dataset annex (pure Git repositories are
       equally supported); 2) automatic registration of the newly obtained
       dataset as a subdataset (submodule), if a parent dataset is specified;
       3) support for additional resource identifiers (DataLad resource
       identifiers as used on datasets.datalad.org, and RIA store URLs as
       used for store.datalad.org, optionally in specific versions as
       identified by a branch or a tag; see examples); and 4) automatic,
       configurable generation of alternative access URLs for common cases
       (such as appending '.git' to the URL in case accessing the base URL
       failed).

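       The fallback described in item 4 can be illustrated with a small
       Python sketch. It is not DataLad's implementation: the function names
       and the `try_clone` callback are hypothetical, and the real candidate
       generation is configurable and may produce different URLs.

```python
def candidate_urls(url):
    """Yield the URL as given, then a hypothetical '.git' fallback."""
    yield url
    if not url.endswith(".git"):
        yield url + ".git"


def clone_with_fallback(url, try_clone):
    """Return the first candidate for which try_clone(candidate) is true.

    try_clone is a caller-supplied stand-in for a real clone attempt.
    """
    for candidate in candidate_urls(url):
        if try_clone(candidate):
            return candidate
    return None


# Stand-in for a server that only serves the '.git' variant.
reachable = {"https://example.com/dataset.git"}
print(clone_with_fallback("https://example.com/dataset", reachable.__contains__))
```

       The base URL is always tried first, so the fallback only matters when
       the first attempt fails.
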
       In case the clone is registered as a subdataset, the original URL
       passed to CLONE is recorded in `.gitmodules` of the parent dataset in
       addition to the resolved URL used internally for git-clone. This
       preserves DataLad-specific URLs like ria+ssh://... for subsequent
       calls to GET if the subdataset is later removed locally.

       URL mapping configuration

       CLONE supports the transformation of URLs via (multi-part)
       substitution specifications. A substitution specification is defined
       as a configuration setting 'datalad.clone.url-substitute.<seriesID>'
       with a string containing a match and substitution expression, each
       following Python's regular expression syntax. Both expressions are
       concatenated to a single string with an arbitrary delimiter character.
       The delimiter is defined by prefixing the string with the delimiter.
       Prefix and delimiter are stripped from the expressions (Example:
       ",^http://(.*)$,https://\1"). This setting can be defined multiple
       times, using the same '<seriesID>'. Substitutions in a series will be
       applied incrementally, in order of their definition. The first
       substitution in such a series must match, otherwise no further
       substitutions in the series will be considered. However, following the
       first match, all further substitutions in a series are processed,
       regardless of whether intermediate expressions match or not.
       Substitution series themselves have no particular order; each matching
       series will result in a candidate clone URL. Consequently, the initial
       match specification in a series should be as precise as possible to
       prevent inflation of candidate URLs.

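       The rules above can be sketched in Python with the standard `re`
       module. This is a minimal illustration under stated assumptions, not
       DataLad's code: `parse_spec` and `apply_series` are hypothetical
       helper names, and matching is assumed to behave like `re.search`
       followed by `re.sub`.

```python
import re


def parse_spec(spec):
    """Split '<delim><match><delim><substitution>' into its two expressions."""
    delim = spec[0]  # the first character defines the delimiter
    match_expr, subst_expr = spec[1:].split(delim, 1)
    return match_expr, subst_expr


def apply_series(url, specs):
    """Apply one substitution series; return None if the first spec fails."""
    for index, spec in enumerate(specs):
        match_expr, subst_expr = parse_spec(spec)
        if index == 0 and not re.search(match_expr, url):
            # The first expression must match, else the series is skipped.
            return None
        # Later expressions are applied whether or not they match.
        url = re.sub(match_expr, subst_expr, url)
    return url


series = [r",^http://(.*)$,https://\1"]
print(apply_series("http://example.com/dataset", series))
```

       Each series that returns a URL contributes one candidate, which is why
       a precise first expression keeps the candidate list short.
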
       See also

       handbook:3-001 (http://handbook.datalad.org/symbols)
         More information on Remote Indexed Archive (RIA) stores

   Examples
       Install a dataset from GitHub into the current directory::

        % datalad clone https://github.com/datalad-datasets/longnow-podcasts.git

       Install a dataset into a specific directory::

        % datalad clone https://github.com/datalad-datasets/longnow-podcasts.git myfavpodcasts

       Install a dataset as a subdataset into the current dataset::

        % datalad clone -d . https://github.com/datalad-datasets/longnow-podcasts.git

       Install the main superdataset from datasets.datalad.org::

        % datalad clone ///

       Install a dataset identified by a literal alias from
       store.datalad.org::

        % datalad clone ria+http://store.datalad.org#~hcp-openaccess

       Install a dataset in a specific version as identified by a branch or
       tag name from store.datalad.org::

        % datalad clone ria+http://store.datalad.org#76b6ca66-36b1-11ea-a2e6-f0d5bf7b5561@myidentifier

       Install a dataset with group-write access permissions::

        % datalad clone http://example.com/dataset --reckless shared-group

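       The two RIA store URLs in the examples above share one shape: a store
       location, a '#', a dataset identifier or '~alias', and an optional
       '@version'. The following Python sketch splits such a URL into those
       parts. The function `split_ria_url` is hypothetical and only mirrors
       the shape shown in the examples; DataLad's own parsing may accept
       more forms.

```python
def split_ria_url(url):
    """Split 'ria+<store>#<dataset>[@<version>]' into its three parts."""
    if not url.startswith("ria+"):
        raise ValueError("not a RIA URL: " + url)
    store, sep, fragment = url[len("ria+"):].partition("#")
    if not sep or not fragment:
        raise ValueError("missing dataset identifier: " + url)
    # Split only the fragment, so an '@' in the store part is left alone.
    dataset, _, version = fragment.partition("@")
    return store, dataset, version or None


print(split_ria_url("ria+http://store.datalad.org#~hcp-openaccess"))
print(split_ria_url(
    "ria+http://store.datalad.org#76b6ca66-36b1-11ea-a2e6-f0d5bf7b5561@myidentifier"
))
```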

OPTIONS

       SOURCE URL, DataLad resource identifier, local path or instance of
              dataset to be cloned. Constraints: value must be a string

       PATH   path to clone into. If no PATH is provided, a destination path
              will be derived from the source URL, similar to git clone.

       GIT CLONE OPTIONS
              Options to pass to git clone. Any argument specified after
              SOURCE and the optional PATH will be passed to git-clone. Note
              that not all options will lead to viable results. For example,
              '--single-branch' will not result in a functional annex
              repository because both a regular branch and the git-annex
              branch are required. Note that a version in a RIA URL takes
              precedence over '--branch'.

       -h, --help, --help-np
              show this help message. --help-np forcefully disables the use of
              a pager for displaying the help message

       -d DATASET, --dataset DATASET
              (parent) dataset to clone into. If given, the newly cloned
              dataset is registered as a subdataset of the parent. Also, if
              given, relative paths are interpreted as being relative to the
              parent dataset, and not relative to the working directory.
              Constraints: Value must be a Dataset or a valid identifier of a
              Dataset (e.g. a path) or value must be NONE

       -D DESCRIPTION, --description DESCRIPTION
              short description to use for a dataset location. Its primary
              purpose is to help humans to identify a dataset copy (e.g.,
              "mike's dataset on lab server"). Note that when a dataset is
              published, this information becomes available on the remote
              side. Constraints: value must be a string or value must be NONE

       --reckless [auto|ephemeral|shared-...]
              Obtain a dataset or subdataset and set it up in a potentially
              unsafe way for performance or access reasons. Use with care:
              any such dataset is marked as 'untrusted'. The reckless mode is
              stored in a dataset's local configuration under
              'datalad.clone.reckless', and will be inherited by any of its
              subdatasets. Supported modes are:

              ['auto']: hard-link files between local clones. In-place
              modification in any clone will alter original annex content.

              ['ephemeral']: symlink annex to origin's annex, discard local
              availability info via git-annex-dead 'here', and declare this
              annex private. Shares an annex between origin and clone without
              git-annex being aware of it. In case of a change in origin you
              need to update the clone before you're able to save new content
              on your end. Alternative to 'auto' when hardlinks are not an
              option, or the number of consumed inodes needs to be minimized.
              Note that this mode can only be used with clones from non-bare
              repositories or a RIA store! Otherwise two different annex
              object tree structures (dirhashmixed vs dirhashlower) will be
              used simultaneously, and annex keys using the respective other
              structure will be inaccessible.

              ['shared-<mode>']: set up repository and annex permissions to
              enable multi-user access. This disables the standard write
              protection of annex'ed files. <mode> can be any value supported
              by 'git init --shared=', such as 'group' or 'all'.

              Constraints: value must be one of (True, False, 'auto',
              'ephemeral') or value must start with 'shared-'

       --version
              show the module and its version which provides the command

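       The constraint stated for --reckless can be written as a short check.
       This Python sketch only restates that constraint; the function name
       `is_valid_reckless` is hypothetical, and the check does not verify
       that the <mode> part is one that 'git init --shared=' accepts.

```python
def is_valid_reckless(value):
    """Check a value against the constraint stated for --reckless."""
    # Identity checks keep integers such as 1 or 0 from passing as booleans.
    if value is True or value is False:
        return True
    if value in ("auto", "ephemeral"):
        return True
    return isinstance(value, str) and value.startswith("shared-")


for candidate in ("auto", "ephemeral", "shared-group", "fast"):
    print(candidate, is_valid_reckless(candidate))
```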

AUTHORS

       datalad is developed by The DataLad Team and Contributors
       <team@datalad.org>.

datalad clone 0.19.3              2023-08-11                  datalad clone(1)