datalad install(1) General Commands Manual datalad install(1)

datalad install - install one or many datasets from remote URL(s) or
local PATH source(s).

datalad install [-h] [-s URL-OR-PATH] [-d DATASET] [-g]
    [-D DESCRIPTION] [-r] [-R LEVELS]
    [--reckless [auto|ephemeral|shared-...]] [-J NJOBS]
    [--branch BRANCH] [--version] [URL-OR-PATH ...]

This command creates local sibling(s) of existing dataset(s) from
(remote) locations specified as URL(s) or path(s). Optional recursion
into potential subdatasets, and download of all referenced data is
supported. The new dataset(s) can be optionally registered in an
existing superdataset by identifying it via the DATASET argument (the
new dataset's path needs to be located within the superdataset for
that).

If no explicit -s|--source option is specified, then all positional
URL-OR-PATH arguments are considered to be "sources" if they are URLs
or target locations if they are paths. If a target location path
corresponds to a submodule, the source location for it is figured out
from its record in the `.gitmodules`. If -s|--source is specified, then
a single optional positional PATH would be taken as the destination
path for that dataset.
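The source-versus-target rule for positional arguments can be sketched in shell. This is only an illustration: classify_arg is a hypothetical helper, and it assumes that an argument counts as a URL only when it contains "scheme://". DataLad's own detection is likely to recognize more forms (for example SSH-style locations).

```shell
# Hypothetical helper, not part of DataLad: guess how a positional
# URL-OR-PATH argument would be interpreted when -s|--source is absent.
# Assumption: only "scheme://..." is treated as a URL here.
classify_arg() {
    case "$1" in
        *://*) echo "source" ;;   # looks like a URL: an installation source
        *)     echo "target" ;;   # anything else: a local target path
    esac
}

classify_arg https://github.com/datalad-datasets/longnow-podcasts.git
classify_arg podcasts
```

The first call reports "source" and the second reports "target".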

It is possible to provide a brief description to label the dataset's
nature *and* location, e.g. "Michael's music on black laptop". This
helps humans to identify data locations in distributed scenarios. By
default an identifier comprised of user and machine name, plus path
will be generated.

When only partial dataset content shall be obtained, it is recommended
to use this command without the `get-data` flag, followed by a `get`
operation to obtain the desired data.
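The two-step workflow recommended above might look like the following sketch. The run wrapper is a hypothetical dry-run helper that only prints each command unless DRY_RUN=0 is set, and README.md is an assumed file name inside the dataset, used purely for illustration.

```shell
# Hypothetical dry-run wrapper: prints the command by default,
# executes it only when DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

url=https://github.com/datalad-datasets/longnow-podcasts.git
wanted=README.md   # assumption: some file tracked in the dataset

# Step 1: install the dataset without file content (no -g|--get-data).
run datalad install -s "$url" podcasts
# Step 2: obtain only the content that is actually needed.
run datalad get -d podcasts "podcasts/$wanted"
```

Setting DRY_RUN=0 would run the commands for real, which requires DataLad and network access.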

NOTE Power-user info: This command uses git clone, and git annex init
to prepare the dataset. Registering to a superdataset is performed
via a git submodule add operation in the discovered superdataset.
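As a rough illustration of the note above, the following sketch prints the Git-level commands that an installation approximately corresponds to. plan_install is a hypothetical helper, and the exact arguments are assumptions: DataLad performs additional bookkeeping that is not shown here.

```shell
# Hypothetical helper: print (not run) an approximate Git-level plan.
# Usage: plan_install URL PATH [SUPERDATASET]
plan_install() {
    url=$1
    path=$2
    super=${3:-}
    # When a superdataset is given, PATH is taken relative to it.
    target=${super:+$super/}$path
    echo "git clone $url $target"
    echo "git -C $target annex init"
    if [ -n "$super" ]; then
        # Register the fresh clone as a submodule of the superdataset.
        echo "git -C $super submodule add $url $path"
    fi
}

plan_install https://github.com/datalad-datasets/longnow-podcasts.git podcasts .
```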

Examples
Install a dataset from GitHub into the current directory::

   % datalad install https://github.com/datalad-datasets/longnow-podcasts.git

Install a dataset as a subdataset into the current dataset::

   % datalad install -d . --source='https://github.com/datalad-datasets/longnow-podcasts.git'

Install a dataset into 'podcasts' (not 'longnow-podcasts') directory,
and get all content right away::

   % datalad install --get-data -s https://github.com/datalad-datasets/longnow-podcasts.git podcasts

Install a dataset with all its subdatasets::

   % datalad install -r https://github.com/datalad-datasets/longnow-podcasts.git

URL-OR-PATH
       path/name of the installation target. If no PATH is provided a
       destination path will be derived from a source URL similar to
       git clone.
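How a destination path can be derived from a source URL is sketched below. derive_dest is a hypothetical helper that approximates the git clone convention (drop trailing slashes and a ".git" suffix, keep the last path component). It does not cover every form git clone accepts, such as scp-style locations.

```shell
# Hypothetical helper: approximate the directory name that would be
# derived from a source URL when no destination PATH is given.
derive_dest() {
    src=$1
    # Drop any trailing slashes.
    while [ "${src%/}" != "$src" ]; do
        src=${src%/}
    done
    src=${src%/.git}      # ".../repo/.git" -> ".../repo"
    src=${src%.git}       # ".../repo.git"  -> ".../repo"
    echo "${src##*/}"     # keep only the last path component
}

derive_dest https://github.com/datalad-datasets/longnow-podcasts.git
```

For the URL used in the examples this yields "longnow-podcasts", matching the directory name mentioned there.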

-h, --help, --help-np
       show this help message. --help-np forcefully disables the use of
       a pager for displaying the help message

-s URL-OR-PATH, --source URL-OR-PATH
       URL or local path of the installation source. Constraints: value
       must be a string or value must be NONE

-d DATASET, --dataset DATASET
       specify the dataset to perform the install operation on. If no
       dataset is given, an attempt is made to identify the dataset in
       a parent directory of the current working directory and/or the
       PATH given. Constraints: value must be a Dataset or a valid
       identifier of a Dataset (e.g. a path) or value must be NONE

-g, --get-data
       if given, obtain all data content too.

-D DESCRIPTION, --description DESCRIPTION
       short description to use for a dataset location. Its primary
       purpose is to help humans to identify a dataset copy (e.g.,
       "mike's dataset on lab server"). Note that when a dataset is
       published, this information becomes available on the remote
       side. Constraints: value must be a string or value must be NONE

-r, --recursive
       if set, recurse into potential subdatasets.

-R LEVELS, --recursion-limit LEVELS
       limit recursion into subdatasets to the given number of levels.
       Constraints: value must be convertible to type 'int' or value
       must be NONE

--reckless [auto|ephemeral|shared-...]
       Obtain a dataset or subdataset and set it up in a potentially
       unsafe way for performance or access reasons. Use with care: any
       dataset is marked as 'untrusted'. The reckless mode is stored in
       a dataset's local configuration under 'datalad.clone.reckless',
       and will be inherited by any of its subdatasets. Supported modes
       are:

       ['auto']: hard-link files between local clones. In-place
       modification in any clone will alter original annex content.

       ['ephemeral']: symlink annex to origin's annex, discard local
       availability info via git-annex-dead 'here', and declare this
       annex private. Shares an annex between origin and clone w/o
       git-annex being aware of it. In case of a change in origin you
       need to update the clone before you're able to save new content
       on your end. Alternative to 'auto' when hardlinks are not an
       option, or number of consumed inodes needs to be minimized. Note
       that this mode can only be used with clones from non-bare
       repositories or a RIA store! Otherwise two different annex
       object tree structures (dirhashmixed vs dirhashlower) will be
       used simultaneously, and annex keys using the respective other
       structure will be inaccessible.

       ['shared-<mode>']: set up repository and annex permission to
       enable multi-user access. This disables the standard write
       protection of annex'ed files. <mode> can be any value supported
       by 'git init --shared=', such as 'group', or 'all'.

       Constraints: value must be one of (True, False, 'auto',
       'ephemeral') or value must start with 'shared-'
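The string constraint on --reckless can be checked with a small shell function before a command is assembled in a script. is_valid_reckless is a hypothetical helper. It covers only the string values named above, since True and False are presumably relevant to the Python interface rather than the command line.

```shell
# Hypothetical helper: succeed if the value is an accepted --reckless
# mode string ('auto', 'ephemeral', or anything starting with 'shared-').
is_valid_reckless() {
    case "$1" in
        auto|ephemeral|shared-*) return 0 ;;
        *)                       return 1 ;;
    esac
}

for mode in auto ephemeral shared-group fast; do
    if is_valid_reckless "$mode"; then
        echo "$mode: accepted"
    else
        echo "$mode: rejected"
    fi
done
```

Note that this check does not verify that the part after 'shared-' is a value 'git init --shared=' would accept.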

-J NJOBS, --jobs NJOBS
       how many parallel jobs (where possible) to use. "auto"
       corresponds to the number defined by the
       'datalad.runtime.max-annex-jobs' configuration item. NOTE: This
       option can only parallelize input retrieval (get) and output
       recording (save). DataLad does NOT parallelize your scripts for
       you. Constraints: value must be convertible to type 'int' or
       value must be NONE or value must be one of ('auto',) [Default:
       'auto']

--branch BRANCH
       Clone source at this branch or tag. This option applies only to
       the top-level dataset, not to any subdatasets that may be cloned
       when installing recursively. Note that if the source is a RIA
       URL with a version, it takes precedence over this option.
       Constraints: value must be a string or value must be NONE

--version
       show the module and its version which provides the command

datalad is developed by The DataLad Team and Contributors
<team@datalad.org>.

datalad install 0.19.3 2023-08-11 datalad install(1)