datalad get(1) General Commands Manual datalad get(1)

NAME
datalad get - get any dataset content (files/directories/subdatasets).

SYNOPSIS
datalad get [-h] [-s LABEL] [-d PATH] [-r] [-R LEVELS] [-n]
            [-D DESCRIPTION] [--reckless [auto|ephemeral|shared-...]]
            [-J NJOBS] [--version] [PATH ...]

DESCRIPTION
This command only operates on dataset content. To obtain a new
independent dataset from some source use the CLONE command.

By default this command operates recursively within a dataset, but not
across potential subdatasets, i.e. if a directory is provided, all
files in the directory are obtained. Recursion into subdatasets is
supported too. If enabled, relevant subdatasets are detected and
installed in order to fulfill a request.

Known data locations for each requested file are evaluated and data are
obtained from some available location (according to git-annex
configuration and possibly assigned remote priorities), unless a
specific source is specified.

Getting subdatasets
Just as DataLad supports getting file content from more than one
location, the same is supported for subdatasets, including a ranking of
individual sources for prioritization.

The following location candidates are considered. For each candidate a
cost is given in parentheses; higher values indicate higher cost, and
thus lower priority:

- A datalad URL recorded in `.gitmodules` (cost 590). This allows for
  datalad URLs that require additional handling/resolution by datalad,
  like ria-schemes (ria+http, ria+ssh, etc.)

- A URL or absolute path recorded for git in `.gitmodules` (cost 600).

- URL of any configured superdataset remote that is known to have the
  desired submodule commit, with the submodule path appended to it.
  There can be more than one candidate (cost 650).

- In case `.gitmodules` contains a relative path instead of a URL, the
  URL of any configured superdataset remote that is known to have the
  desired submodule commit, with this relative path appended to it.
  There can be more than one candidate (cost 650).

- In case `.gitmodules` contains a relative path as a URL, the absolute
  path of the superdataset, appended with this relative path (cost 900).

Additional candidate URLs can be generated based on templates specified
as configuration variables with the pattern

`datalad.get.subdataset-source-candidate-<name>`

where NAME is an arbitrary identifier. If NAME starts with three digits
(e.g. '400myserver'), these are interpreted as a cost, and the
respective candidate is sorted into the generated candidate list
according to this cost. If no cost is given, a default of 700 is used.

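The cost rule described above can be illustrated with a minimal Python
sketch. This is not DataLad's implementation; the function name
`candidate_cost` is hypothetical, and only the three-digit prefix rule
and the default of 700 are taken from the text.

```python
import re

DEFAULT_COST = 700  # default cost stated in the text above


def candidate_cost(name: str) -> int:
    """Return the cost encoded in a candidate NAME (hypothetical helper).

    A NAME that starts with three digits carries its own cost;
    any other NAME falls back to the default.
    """
    match = re.match(r"\d{3}", name)
    if match:
        return int(match.group(0))
    return DEFAULT_COST


print(candidate_cost("400myserver"))  # 400
print(candidate_cost("myserver"))     # 700
```

Under this sketch, a candidate named '400myserver' would be ranked ahead
of the `.gitmodules` candidates (costs 590 and up), while an unnumbered
one would fall between the superdataset remotes (650) and the local
absolute path (900).
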
A template string assigned to such a variable can utilize the Python
format mini language and may reference a number of properties that are
inferred from the parent dataset's knowledge about the target
subdataset. Properties include any submodule property specified in the
respective `.gitmodules` record. For convenience, an existing
`datalad-id` record is made available under the shortened name ID.

Additionally, the URL of any configured remote that contains the
respective submodule commit is available as `remoteurl-<name>` property,
where NAME is the configured remote name.

Hence, such a template could be `http://example.org/datasets/{id}` or
`http://example.org/datasets/{path}`, where `{id}` and `{path}` would
be replaced by the `datalad-id` or PATH entry in the `.gitmodules`
record.

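How such a template expands can be sketched with plain Python string
formatting. This only illustrates the substitution, not DataLad's own
code: the helper `expand_template` and all values in the `submodule`
record (path, URL, and ID) are made up for the example.

```python
def expand_template(template: str, submodule: dict) -> str:
    """Fill a candidate URL template from a submodule record (sketch)."""
    props = dict(submodule)
    # Expose `datalad-id` under the shortened name `id`, as described above.
    if "datalad-id" in props:
        props["id"] = props["datalad-id"]
    return template.format_map(props)


# Hypothetical properties, as they might appear in a `.gitmodules` record.
submodule = {
    "path": "inputs/raw",
    "url": "../raw",
    "datalad-id": "11111111-2222-3333-4444-555555555555",
}

print(expand_template("http://example.org/datasets/{id}", submodule))
print(expand_template("http://example.org/datasets/{path}", submodule))
```

With these assumed values, the first template resolves to a URL ending
in the dataset ID and the second to one ending in `inputs/raw`.
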
If this config is committed in `.datalad/config`, a clone of a dataset
can look up any subdataset's URL according to such scheme(s),
irrespective of what URL is recorded in `.gitmodules`.

Lastly, all candidates are sorted according to their cost (lower values
first), and duplicate URLs are stripped, keeping the first
(lowest-cost) occurrence of each in the candidate list.

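The final ranking step can be sketched as follows. This is a simplified
illustration under the assumption that each candidate is a
`(cost, url)` pair; the function name `rank_candidates` and the example
URLs are hypothetical, and DataLad's internal data structures may
differ.

```python
def rank_candidates(candidates):
    """Sort (cost, url) pairs by cost and drop repeated URLs (sketch)."""
    seen = set()
    ranked = []
    # sorted() is stable, so candidates with equal cost keep input order.
    for cost, url in sorted(candidates, key=lambda c: c[0]):
        if url in seen:
            continue  # a cheaper entry for this URL was already kept
        seen.add(url)
        ranked.append((cost, url))
    return ranked


candidates = [
    (900, "/abs/path/super/sub"),
    (600, "https://example.org/sub.git"),
    (650, "https://example.org/super/sub"),
    (400, "https://example.org/sub.git"),  # same URL at a lower cost
]

for cost, url in rank_candidates(candidates):
    print(cost, url)
```

In this example the URL that appears twice is kept only once, at its
lowest cost of 400, and the remaining candidates follow in order of
increasing cost.
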
NOTE: Power-user info: This command uses `git annex get` to fulfill file
handles.

Examples
Get a single file::

  % datalad get <path/to/file>

Get contents of a directory::

  % datalad get <path/to/dir/>

Get all contents of the current dataset and its subdatasets::

  % datalad get . -r

Get (clone) a registered subdataset, but don't retrieve data::

  % datalad get -n <path/to/subds>

OPTIONS
PATH
    path/name of the requested dataset component. The component must
    already be known to a dataset. To add new components to a
    dataset use the ADD command. Constraints: value must be a string
    or value must be NONE

-h, --help, --help-np
    show this help message. --help-np forcefully disables the use of
    a pager for displaying the help message

-s LABEL, --source LABEL
    label of the data source to be used to fulfill requests. This
    can be the name of a dataset sibling or another known source.
    Constraints: value must be a string or value must be NONE

-d PATH, --dataset PATH
    specify the dataset to perform the get operation on, in which
    case PATH arguments are interpreted as being relative to this
    dataset. If no dataset is given, an attempt is made to identify
    a dataset for each input `path`. Constraints: Value must be a
    Dataset or a valid identifier of a Dataset (e.g. a path) or
    value must be NONE

-r, --recursive
    if set, recurse into potential subdatasets.

-R LEVELS, --recursion-limit LEVELS
    limit recursion into subdatasets to the given number of levels.
    Alternatively, 'existing' will limit recursion to subdatasets
    that already existed on the filesystem at the start of
    processing, and prevent new subdatasets from being obtained
    recursively. Constraints: value must be convertible to type
    'int' or value must be one of ('existing',) or value must be NONE

-n, --no-data
    do not obtain data for file handles. With this option, GET
    operations are limited to dataset handles: subdatasets are
    obtained (cloned), but no file content is retrieved.

-D DESCRIPTION, --description DESCRIPTION
    short description to use for a dataset location. Its primary
    purpose is to help humans to identify a dataset copy (e.g.,
    "mike's dataset on lab server"). Note that when a dataset is
    published, this information becomes available on the remote
    side. Constraints: value must be a string or value must be NONE

--reckless [auto|ephemeral|shared-...]
    Obtain a dataset or subdataset and set it up in a potentially
    unsafe way for performance or access reasons. Use with care: any
    dataset is marked as 'untrusted'. The reckless mode is stored in
    a dataset's local configuration under 'datalad.clone.reckless',
    and is inherited by any of its subdatasets. Supported modes are:

    ['auto']: hard-link files between local clones. In-place
    modification in any clone will alter original annex content.

    ['ephemeral']: symlink annex to origin's annex, discard local
    availability info via git-annex-dead 'here', and declare this
    annex private. Shares an annex between origin and clone without
    git-annex being aware of it. In case of a change in origin you
    need to update the clone before you're able to save new content
    on your end. Alternative to 'auto' when hardlinks are not an
    option, or the number of consumed inodes needs to be minimized.
    Note that this mode can only be used with clones from non-bare
    repositories or a RIA store! Otherwise two different annex object
    tree structures (dirhashmixed vs dirhashlower) will be used
    simultaneously, and annex keys using the respective other
    structure will be inaccessible.

    ['shared-<mode>']: set up repository and annex permission to
    enable multi-user access. This disables the standard write
    protection of annex'ed files. <mode> can be any value supported
    by 'git init --shared=', such as 'group', or 'all'.

    Constraints: value must be one of (True, False, 'auto',
    'ephemeral') or value must start with 'shared-'

-J NJOBS, --jobs NJOBS
    how many parallel jobs (where possible) to use. "auto"
    corresponds to the number defined by the
    'datalad.runtime.max-annex-jobs' configuration item. NOTE: This
    option can only parallelize input retrieval (get) and output
    recording (save). DataLad does NOT parallelize your scripts for
    you. Constraints: value must be convertible to type 'int' or
    value must be NONE or value must be one of ('auto',)
    [Default: 'auto']

--version
    show the module and its version which provides the command

AUTHORS
datalad is developed by The DataLad Team and Contributors
<team@datalad.org>.

datalad get 0.19.3 2023-08-11 datalad get(1)