datalad get(1)              General Commands Manual             datalad get(1)

NAME

       datalad get - get any dataset content (files/directories/subdatasets).

SYNOPSIS

       datalad get [-h] [-s LABEL] [-d PATH] [-r] [-R LEVELS] [-n]
              [-D DESCRIPTION] [--reckless [auto|ephemeral|shared-...]]
              [-J NJOBS] [--version] [PATH ...]

DESCRIPTION

       This command only operates on dataset content. To obtain a new
       independent dataset from some source use the CLONE command.

       By default this command operates recursively within a dataset, but not
       across potential subdatasets, i.e. if a directory is provided, all
       files in the directory are obtained. Recursion into subdatasets is
       supported too. If enabled, relevant subdatasets are detected and
       installed in order to fulfill a request.

       Known data locations for each requested file are evaluated and data are
       obtained from some available location (according to git-annex
       configuration and possibly assigned remote priorities), unless a
       specific source is specified.

   Getting subdatasets
       Just as DataLad supports getting file content from more than one
       location, the same is supported for subdatasets, including a ranking of
       individual sources for prioritization.

       The following location candidates are considered. For each candidate a
       cost is given in parentheses; higher values indicate higher cost, and
       thus lower priority:

       - A datalad URL recorded in `.gitmodules` (cost 590). This allows for
         datalad URLs that require additional handling/resolution by datalad,
         like ria-schemes (ria+http, ria+ssh, etc.).

       - A URL or absolute path recorded for git in `.gitmodules` (cost 600).

       - URL of any configured superdataset remote that is known to have the
         desired submodule commit, with the submodule path appended to it.
         There can be more than one candidate (cost 650).

       - In case `.gitmodules` contains a relative path instead of a URL, the
         URL of any configured superdataset remote that is known to have the
         desired submodule commit, with this relative path appended to it.
         There can be more than one candidate (cost 650).

       - In case `.gitmodules` contains a relative path as a URL, the absolute
         path of the superdataset, appended with this relative path (cost 900).

       Additional candidate URLs can be generated based on templates specified
       as configuration variables with the pattern

       `datalad.get.subdataset-source-candidate-<name>`

       where NAME is an arbitrary identifier. If NAME starts with three digits
       (e.g. '400myserver') these will be interpreted as a cost, and the
       respective candidate will be sorted into the generated candidate list
       according to this cost. If no cost is given, a default of 700 is used.
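
       The naming rule above can be sketched in Python. This is only an
       illustration of the described behavior, not DataLad's actual code; the
       helper name `split_cost` and the constant `DEFAULT_COST` are
       assumptions made for this example.

```python
import re

# Default cost stated in the text for candidates without a numeric prefix.
DEFAULT_COST = 700


def split_cost(name):
    """Return (cost, label) for the <name> part of a candidate variable.

    Illustrative helper (hypothetical), not part of the DataLad API.
    """
    # Three leading ASCII digits are read as the cost; the rest is the label.
    match = re.match(r"([0-9]{3})(.*)", name)
    if match:
        return int(match.group(1)), match.group(2)
    # No numeric prefix: fall back to the documented default cost.
    return DEFAULT_COST, name


print(split_cost("400myserver"))  # (400, 'myserver')
print(split_cost("myserver"))     # (700, 'myserver')
```

       With this reading, '400myserver' would rank ahead of every built-in
       candidate listed above, since the lowest built-in cost is 590.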

       A template string assigned to such a variable can utilize the Python
       format mini language and may reference a number of properties that are
       inferred from the parent dataset's knowledge about the target
       subdataset. Properties include any submodule property specified in the
       respective `.gitmodules` record. For convenience, an existing
       `datalad-id` record is made available under the shortened name ID.

       Additionally, the URL of any configured remote that contains the
       respective submodule commit is available as `remoteurl-<name>` property,
       where NAME is the configured remote name.

       Hence, such a template could be `http://example.org/datasets/{id}` or
       `http://example.org/datasets/{path}`, where `{id}` and `{path}` would
       be replaced by the `datalad-id` or PATH entry in the `.gitmodules`
       record.
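
       As a rough sketch of how such a template expands, the following Python
       snippet applies the two example templates to a made-up `.gitmodules`
       record. The helper `expand_template` and all record values are
       hypothetical; only the `{id}` and `{path}` placeholders and the
       `datalad-id` shortening come from the description above.

```python
def expand_template(template, gitmodules_record):
    """Expand a candidate URL template with submodule properties.

    Illustrative helper (hypothetical), not part of the DataLad API.
    """
    props = dict(gitmodules_record)
    # Mirror the documented convenience: expose `datalad-id` as `id`.
    if "datalad-id" in props:
        props["id"] = props["datalad-id"]
    # format_map() accepts keys that are not valid Python identifiers.
    return template.format_map(props)


# Made-up record values, for illustration only.
record = {
    "path": "inputs/raw",
    "url": "../raw",
    "datalad-id": "11111111-2222-3333-4444-555555555555",
}

by_id = expand_template("http://example.org/datasets/{id}", record)
by_path = expand_template("http://example.org/datasets/{path}", record)
print(by_id)
print(by_path)
```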

       If this config is committed in `.datalad/config`, a clone of a dataset
       can look up any subdataset's URL according to such scheme(s)
       irrespective of what URL is recorded in `.gitmodules`.

       Lastly, all candidates are sorted according to their cost (lower values
       first), and duplicate URLs are stripped, while preserving the first
       item in the candidate list.
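
       The final ranking step can be pictured with a short Python sketch. It
       assumes each candidate is a dict with `name`, `cost` and `url` keys;
       that data layout, the function name `rank_candidates` and the example
       URLs are assumptions for illustration, not DataLad's internal
       representation.

```python
def rank_candidates(candidates):
    """Sort candidates by cost and drop later duplicates of the same URL.

    Illustrative helper (hypothetical), not part of the DataLad API.
    """
    # sorted() is stable, so equal-cost candidates keep their input order.
    ranked = sorted(candidates, key=lambda cand: cand["cost"])
    seen_urls = set()
    unique = []
    for cand in ranked:
        if cand["url"] in seen_urls:
            continue  # a cheaper candidate already provides this URL
        seen_urls.add(cand["url"])
        unique.append(cand)
    return unique


# Made-up candidates, for illustration only.
candidates = [
    {"name": "local", "cost": 900, "url": "/data/super/sub"},
    {"name": "gitmodules", "cost": 600, "url": "https://example.org/sub"},
    {"name": "myserver", "cost": 400, "url": "https://mirror.example.org/sub"},
    {"name": "origin", "cost": 650, "url": "https://example.org/sub"},
]

names = [cand["name"] for cand in rank_candidates(candidates)]
print(names)  # ['myserver', 'gitmodules', 'local']
```

       The 'origin' entry is dropped because its URL already appears at the
       lower cost of 600.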

       NOTE   Power-user info: This command uses git annex get to fulfill file
              handles.

   Examples
       Get a single file:

        % datalad get <path/to/file>

       Get contents of a directory:

        % datalad get <path/to/dir/>

       Get all contents of the current dataset and its subdatasets:

        % datalad get . -r

       Get (clone) a registered subdataset, but don't retrieve data:

        % datalad get -n <path/to/subds>

OPTIONS

       PATH   path/name of the requested dataset component. The component must
              already be known to a dataset. To add new components to a
              dataset use the SAVE command. Constraints: value must be a
              string or value must be NONE

       -h, --help, --help-np
              show this help message. --help-np forcefully disables the use of
              a pager for displaying the help message

       -s LABEL, --source LABEL
              label of the data source to be used to fulfill requests. This
              can be the name of a dataset sibling or another known source.
              Constraints: value must be a string or value must be NONE

       -d PATH, --dataset PATH
              specify the dataset to perform the get operation on, in which
              case PATH arguments are interpreted as being relative to this
              dataset. If no dataset is given, an attempt is made to identify
              a dataset for each input `path`. Constraints: Value must be a
              Dataset or a valid identifier of a Dataset (e.g. a path) or
              value must be NONE

       -r, --recursive
              if set, recurse into potential subdatasets.

       -R LEVELS, --recursion-limit LEVELS
              limit recursion into subdatasets to the given number of levels.
              Alternatively, 'existing' will limit recursion to subdatasets
              that already existed on the filesystem at the start of
              processing, and prevent new subdatasets from being obtained
              recursively. Constraints: value must be convertible to type
              'int' or value must be one of ('existing',) or value must be
              NONE
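
       The constraint on LEVELS can be read as the following Python sketch.
       The function name `parse_recursion_limit` is hypothetical, and this is
       only one plausible reading of the stated constraint, not DataLad's
       own validation code.

```python
def parse_recursion_limit(value):
    """Normalize a --recursion-limit value per the stated constraint.

    Illustrative helper (hypothetical), not part of the DataLad API.
    """
    if value is None:
        return None        # no limit given
    if value == "existing":
        return "existing"  # only recurse into already present subdatasets
    # Anything else must be convertible to int; int() raises ValueError
    # for strings such as "two".
    return int(value)


print(parse_recursion_limit("2"))         # 2
print(parse_recursion_limit("existing"))  # existing
```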

       -n, --no-data
              do not obtain data for file handles. With this option, GET
              operations are limited to dataset handles, i.e. subdatasets are
              obtained (cloned), but no file content is retrieved.

       -D DESCRIPTION, --description DESCRIPTION
              short description to use for a dataset location. Its primary
              purpose is to help humans to identify a dataset copy (e.g.,
              "mike's dataset on lab server"). Note that when a dataset is
              published, this information becomes available on the remote
              side. Constraints: value must be a string or value must be NONE

       --reckless [auto|ephemeral|shared-...]
              Obtain a dataset or subdataset and set it up in a potentially
              unsafe way for performance or access reasons. Use with care;
              any dataset is marked as 'untrusted'. The reckless mode is
              stored in a dataset's local configuration under
              'datalad.clone.reckless', and will be inherited by any of its
              subdatasets. Supported modes are:

              ['auto']: hard-link files between local clones. In-place
              modification in any clone will alter original annex content.

              ['ephemeral']: symlink annex to origin's annex, discard local
              availability info via git-annex-dead 'here', and declare this
              annex private. Shares an annex between origin and clone w/o
              git-annex being aware of it. In case of a change in origin you
              need to update the clone before you're able to save new content
              on your end. Alternative to 'auto' when hardlinks are not an
              option, or number of consumed inodes needs to be minimized. Note
              that this mode can only be used with clones from non-bare
              repositories or a RIA store! Otherwise two different annex
              object tree structures (dirhashmixed vs dirhashlower) will be
              used simultaneously, and annex keys using the respective other
              structure will be inaccessible.

              ['shared-<mode>']: set up repository and annex permission to
              enable multi-user access. This disables the standard write
              protection of annex'ed files. <mode> can be any value supported
              by 'git init --shared=', such as 'group', or 'all'.

              Constraints: value must be one of (True, False, 'auto',
              'ephemeral') or value must start with 'shared-'
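
       The constraint on the reckless mode can be expressed as a small Python
       check. The function name `is_valid_reckless` is an assumption for this
       sketch; it mirrors only the constraint as worded above and does not
       verify that <mode> is a value 'git init --shared=' would accept.

```python
def is_valid_reckless(value):
    """Check a --reckless value against the stated constraint.

    Illustrative helper (hypothetical), not part of the DataLad API.
    """
    # True and False are explicitly allowed.
    if isinstance(value, bool):
        return True
    if not isinstance(value, str):
        return False
    # Either a named mode or anything starting with 'shared-'.
    return value in ("auto", "ephemeral") or value.startswith("shared-")


print(is_valid_reckless("shared-group"))  # True
print(is_valid_reckless("fast"))          # False
```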

       -J NJOBS, --jobs NJOBS
              how many parallel jobs (where possible) to use. "auto"
              corresponds to the number defined by the
              'datalad.runtime.max-annex-jobs' configuration item. NOTE: This
              option can only parallelize input retrieval (get) and output
              recording (save). DataLad does NOT parallelize your scripts for
              you. Constraints: value must be convertible to type 'int' or
              value must be NONE or value must be one of ('auto',) [Default:
              'auto']

       --version
              show the module and its version which provides the command

AUTHORS

       datalad is developed by The DataLad Team and Contributors
       <team@datalad.org>.

datalad get 0.19.3                2023-08-11                    datalad get(1)