datalad foreach-dataset(1)  General Commands Manual datalad foreach-dataset(1)

NAME

       datalad foreach-dataset - run a command or Python code on the  dataset
       and/or each of its sub-datasets.

SYNOPSIS

       datalad  foreach-dataset  [-h]  [--cmd-type  {auto|external|exec|eval}]
              [-d  DATASET]  [--state  {present|absent|any}]  [-r] [-R LEVELS]
              [--contains   PATH]    [--bottomup]    [-s]    [--output-streams
              {capture|pass-through|relpath}]  [--chpwd  {ds|pwd}] [--safe-to-
              consume  {auto|all-subds-done|superds-done|always}]  [-J  NJOBS]
              [--version] ...

DESCRIPTION

       This command provides a convenience for the cases where  no  dedicated
       DataLad command is provided to operate across the hierarchy of
       datasets. It is very similar to the `git submodule foreach`  command,
       with the following major differences:

       - by default (unless --subdatasets-only) it includes operation on the
       original dataset as well,

       - subdatasets can be traversed in bottom-up order,

       - it can execute commands in parallel (see the --jobs option),  while
       still accounting for the order, e.g. in bottom-up order a command  is
       executed in the super-dataset only after it has been executed in  all
       subdatasets.

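
       The bottom-up ordering constraint can be sketched in plain Python  (a
       minimal standalone illustration with made-up dataset names,  not  Da‐
       taLad internals):

```python
# Hypothetical dataset hierarchy: a super-dataset with nested subdatasets.
tree = {
    "super": ["super/sub1", "super/sub2"],
    "super/sub1": ["super/sub1/nested"],
    "super/sub2": [],
    "super/sub1/nested": [],
}

def bottomup(ds):
    # Recurse into subdatasets first, then append the dataset itself,
    # so every dataset appears only after all of its subdatasets.
    order = []
    for sub in tree[ds]:
        order.extend(bottomup(sub))
    order.append(ds)
    return order

order = bottomup("super")  # "super" is the last entry
```

       With parallel jobs, this ordering means the super-dataset's command is
       only scheduled once all of its subdatasets' commands have finished.
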
       Additional notes:

       - for execution of "external" commands we use the environment used to
       execute external git and git-annex commands.

   Command format
       --cmd-type external: a few placeholders are supported in the  command
       via the Python format specification:

       - "{pwd}" will be replaced with the full path of the current  working
       directory.

       - "{ds}" and "{refds}" will provide instances of the dataset currently
       operated on and the reference "context" dataset which was provided via
       the ``dataset`` argument.

       - "{tmpdir}" will be replaced with the full path of a temporary direc‐
       tory.

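
       The placeholder expansion can be illustrated with plain Python string
       formatting (a minimal sketch of the substitution mechanism with a hy‐
       pothetical example command, not DataLad's actual implementation):

```python
import os
import tempfile

# A command template as it could be passed to foreach-dataset
# (hypothetical example command, not taken from the DataLad docs).
cmd = "tar -C {pwd} -cf {tmpdir}/backup.tar ."

# Plain str.format() substitution standing in for the values DataLad
# supplies; the '{ds}' and '{refds}' placeholders are not used here.
filled = cmd.format(pwd=os.getcwd(), tmpdir=tempfile.gettempdir())
```
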
   Examples
       Aggressively git clean all datasets, running 5 parallel jobs:

        % datalad foreach-dataset -r -J 5 git clean -dfx


OPTIONS

       COMMAND
              command for execution. A leading '--' can be used to  disambig‐
              uate this command from the preceding options to  DataLad.  For
              --cmd-type exec or eval, only a single command argument (Python
              code) is supported.

       -h, --help, --help-np
              show this help message. --help-np forcefully disables the use of
              a pager for displaying the help message.

       --cmd-type {auto|external|exec|eval}
              type of the command. 'external': to be run in a child  process
              using the dataset's runner; 'exec': Python source code to
              execute using 'exec()', no value returned; 'eval': Python
              source code to evaluate using 'eval()', the return value is
              placed into the 'result' field. 'auto': if used via the Python
              API and `cmd` is a Python function, 'eval' is used; otherwise
              'external' is assumed. Constraints: value must be one of
              ('auto', 'external', 'exec', 'eval') [Default: 'auto']
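
              The difference between 'exec' and 'eval' can be sketched  with
              plain Python (a standalone illustration of the exec()/eval()
              semantics; 'ds_path' is a made-up stand-in, not the actual ob‐
              ject the command receives):

```python
# 'eval': the command is a single Python expression; its value is
# captured (foreach-dataset places it into the 'result' field).
record = {"result": eval("ds_path.count('/')",
                         {"ds_path": "/data/super/sub1"})}

# 'exec': the command is a statement block; nothing is returned, but
# the statements can have side effects in the execution namespace.
namespace = {"ds_path": "/data/super/sub1"}
exec("parts = ds_path.split('/')", namespace)
```
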

       -d DATASET, --dataset DATASET
              specify the dataset to operate on. If no dataset is given,  an
              attempt is made to identify the dataset  based  on  the  input
              and/or the current working directory. Constraints: value must
              be a Dataset or a valid identifier of a Dataset (e.g. a  path)
              or value must be NONE

       --state {present|absent|any}
              indicate which (sub)datasets to consider: either  only  locally
              present, absent, or any of those two kinds.  Constraints: value
              must be one of ('present', 'absent', 'any') [Default: 'present']

       -r, --recursive
              if set, recurse into potential subdatasets.

       -R LEVELS, --recursion-limit LEVELS
              limit recursion into subdatasets to the given number of levels.
              Constraints: value must be convertible to type 'int'  or  value
              must be NONE

       --contains PATH
              limit to the subdatasets containing the given path. If a  root
              path of a subdataset is given, the last considered dataset will
              be the subdataset itself. This option  can  be  given  multiple
              times, in which case datasets that contain  any  of  the  given
              paths will be considered. Constraints: value must be a string or
              value must be NONE

       --bottomup
              whether to report subdatasets in bottom-up  order  along  each
              branch in the dataset tree, rather than top-down.

       -s, --subdatasets-only
              whether to exclude the top-level dataset. It is implied  if  a
              non-empty CONTAINS is used.

       --output-streams {capture|pass-through|relpath},  --o-s  {capture|pass-
       through|relpath}
              how to handle outputs. 'capture': capture outputs  from  'cmd'
              and return them in the result record ('stdout', 'stderr');
              'pass-through': pass them through to the screen (and thus they
              are absent from the returned record); 'relpath': prefix  each
              line of captured output with a relative path  (similarly  to
              what grep does) and write it to stdout and stderr. In 'relpath'
              mode, the path is relative to the top of the dataset if DATASET
              is specified, and otherwise relative to the current directory.
              Constraints: value must be one of ('capture',  'pass-through',
              'relpath') [Default: 'pass-through']

       --chpwd {ds|pwd}
              with 'ds', change the working directory to the top of each  da‐
              taset before executing the command; with 'pwd', do not  change
              the working directory.

       --safe-to-consume {auto|all-subds-done|superds-done|always}
              important only in the case of parallel execution  (jobs  greater
              than 1). 'all-subds-done' instructs to not consider a superdata‐
              set until the command has finished execution in all of its sub‐
              datasets (it is the value used for 'auto' if traversal is  bot‐
              tom-up). 'superds-done' instructs to not process  subdatasets
              until the command has finished in the super-dataset (it is the
              value used for 'auto' if traversal is not bottom-up, which  is
              the default). With 'always' there is no ordering constraint be‐
              tween sub- and super-datasets. Constraints: value must  be  one
              of ('auto', 'all-subds-done', 'superds-done', 'always')  [De‐
              fault: 'auto']

       -J NJOBS, --jobs NJOBS
              how many parallel jobs (where possible) to use. "auto"  corre‐
              sponds to the number defined by the 'datalad.runtime.max-annex-
              jobs' configuration item. NOTE: this option can only parallelize
              input retrieval (get) and output recording (save). DataLad does
              NOT parallelize your scripts for you. Constraints: value  must
              be convertible to type 'int' or value must be NONE or value must
              be one of ('auto',)

       --version
              show the module which provides the command, and its version.

AUTHORS

        datalad is developed by The DataLad Team and Contributors
       <team@datalad.org>.


datalad foreach-dataset 0.19.3    2023-08-11        datalad foreach-dataset(1)