datalad foreach-dataset(1) General Commands Manual datalad foreach-dataset(1)

NAME
datalad foreach-dataset - run a command or Python code on the dataset
and/or each of its sub-datasets.

SYNOPSIS
datalad foreach-dataset [-h] [--cmd-type {auto|external|exec|eval}]
[-d DATASET] [--state {present|absent|any}] [-r] [-R LEVELS]
[--contains PATH] [--bottomup] [-s]
[--output-streams {capture|pass-through|relpath}] [--chpwd {ds|pwd}]
[--safe-to-consume {auto|all-subds-done|superds-done|always}]
[-J NJOBS] [--version] ...

DESCRIPTION
This command is a convenience for cases where no dedicated DataLad
command is provided to operate across a hierarchy of datasets. It is
very similar to the `git submodule foreach` command, with the following
major differences:

- by default (unless --subdatasets-only is given), it also operates on
  the original dataset,
- subdatasets can be traversed in bottom-up order,
- it can execute commands in parallel (see the --jobs option) while
  still respecting the order, e.g. in bottom-up order the command is
  executed in a superdataset only after it has been executed in all of
  its subdatasets.

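The bottom-up ordering described above can be illustrated without
DataLad itself. The following is a minimal Python sketch, assuming a
hypothetical hierarchy given as a plain dictionary; the dataset names
and the `bottom_up` helper are illustrative only and are not part of
the DataLad API.

```python
def bottom_up(tree, root):
    """Yield dataset names so that each dataset follows all of its subdatasets."""
    for child in tree.get(root, []):
        # Descend first, so deeper datasets are yielded earlier.
        yield from bottom_up(tree, child)
    yield root


# Hypothetical hierarchy: dataset -> list of its direct subdatasets.
tree = {"super": ["sub1", "sub2"], "sub1": ["sub1/deep"]}
order = list(bottom_up(tree, "super"))
print(order)
```

In this sketch the superdataset is always last, which mirrors the
guarantee that a command runs in a superdataset only after it has run
in all of its subdatasets.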
Additional notes:

- for execution of "external" commands we use the environment used to
  execute external git and git-annex commands.

Command format
--cmd-type external: A few placeholders are supported in the command
via Python format specification:

- "{pwd}" will be replaced with the full path of the current working
  directory.
- "{ds}" and "{refds}" will provide instances of the dataset currently
  operated on and of the reference "context" dataset that was provided
  via the `dataset` argument.
- "{tmpdir}" will be replaced with the full path of a temporary
  directory.

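To make the placeholder mechanism concrete, here is a small Python
sketch of format-style substitution. It assumes the placeholders behave
like ordinary `str.format` fields; `FakeDataset` and `expand` are
hypothetical stand-ins written for this illustration, not DataLad
classes or functions, and the paths are invented.

```python
import os
import tempfile


class FakeDataset:
    """Hypothetical stand-in for a dataset instance; only `path` is modeled."""

    def __init__(self, path):
        self.path = path


def expand(cmd, ds, refds, tmpdir):
    # str.format fills the named fields; because {ds} and {refds} are
    # objects, attribute access such as {ds.path} works as well.
    return cmd.format(pwd=os.getcwd(), ds=ds, refds=refds, tmpdir=tmpdir)


refds = FakeDataset("/data/study")
ds = FakeDataset("/data/study/sub-01")
with tempfile.TemporaryDirectory() as tmpdir:
    expanded = expand("du -sh {ds.path} > {tmpdir}/size.txt", ds, refds, tmpdir)
print(expanded)
```

Under plain `str.format` semantics, literal braces in a command would
have to be doubled (`{{` and `}}`) to survive substitution.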
Examples
Aggressively git clean all datasets, running 5 parallel jobs:

% datalad foreach-dataset -r -J 5 git clean -dfx

OPTIONS
COMMAND
command for execution. A leading '--' can be used to disambiguate
this command from the preceding options to DataLad. For --cmd-type
exec or eval only a single command argument (Python code) is
supported.

-h, --help, --help-np
show this help message. --help-np forcefully disables the use of a
pager for displaying the help message

--cmd-type {auto|external|exec|eval}
type of the command. 'external': to be run in a child process using
the dataset's runner; 'exec': Python source code to execute using
'exec()', no value returned; 'eval': Python source code to evaluate
using 'eval()', return value is placed into the 'result' field.
'auto': if used via the Python API and `cmd` is a Python function, it
will use 'eval', and otherwise will assume 'external'. Constraints:
value must be one of ('auto', 'external', 'exec', 'eval') [Default:
'auto']

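The difference between 'exec' and 'eval' follows the Python built-ins
of the same names. The sketch below is illustrative only: `run_python`
and the shape of the returned record are assumptions made for this
example, not DataLad internals. It shows why only 'eval' can populate a
'result' field.

```python
def run_python(code, cmd_type, namespace):
    """Run Python source and return a record, loosely mimicking the two modes."""
    record = {"status": "ok"}
    if cmd_type == "eval":
        # eval() evaluates a single expression and returns its value.
        record["result"] = eval(code, {}, namespace)
    elif cmd_type == "exec":
        # exec() runs statements and always returns None, so no result is stored.
        exec(code, {}, namespace)
    else:
        raise ValueError(f"unsupported cmd_type: {cmd_type!r}")
    return record


# A plain string stands in for the dataset object here.
names = {"ds": "/data/study/sub-01"}
eval_record = run_python("len(ds)", "eval", dict(names))
exec_record = run_python("n = len(ds)", "exec", dict(names))
print(eval_record, exec_record)
```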
-d DATASET, --dataset DATASET
specify the dataset to operate on. If no dataset is given, an attempt
is made to identify the dataset based on the input and/or the current
working directory. Constraints: Value must be a Dataset or a valid
identifier of a Dataset (e.g. a path) or value must be NONE

--state {present|absent|any}
indicate which (sub)datasets to consider: either only locally
present, absent, or any of those two kinds. Constraints: value must
be one of ('present', 'absent', 'any') [Default: 'present']

-r, --recursive
if set, recurse into potential subdatasets.

-R LEVELS, --recursion-limit LEVELS
limit recursion into subdatasets to the given number of levels.
Constraints: value must be convertible to type 'int' or value must be
NONE

--contains PATH
limit to the subdatasets containing the given path. If a root path of
a subdataset is given, the last considered dataset will be the
subdataset itself. This option can be given multiple times, in which
case datasets that contain any of the given paths will be considered.
Constraints: value must be a string or value must be NONE

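The selection rule for --contains can be sketched as a path test. This
is a simplified illustration under stated assumptions:
`dataset_contains` is a hypothetical helper, paths are treated as
relative POSIX paths, and the dataset and file names are invented.

```python
from pathlib import PurePosixPath


def dataset_contains(ds_path, target):
    """True if `target` is the dataset root itself or lies underneath it."""
    ds = PurePosixPath(ds_path)
    target = PurePosixPath(target)
    return target == ds or ds in target.parents


# Hypothetical subdatasets and two --contains values.
subdatasets = ["sub-01", "sub-01/anat", "sub-02", "sub-03"]
targets = ["sub-01/anat/T1w.nii.gz", "sub-02"]

# A dataset is kept if it contains any of the given paths.
selected = [
    s for s in subdatasets if any(dataset_contains(s, t) for t in targets)
]
print(selected)
```

Giving the root path of a subdataset ("sub-02" here) selects that
subdataset itself, matching the behavior described for this option.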
--bottomup
whether to report subdatasets in bottom-up order along each branch in
the dataset tree, and not top-down.

-s, --subdatasets-only
whether to exclude the top-level dataset. It is implied if a
non-empty CONTAINS is used.

--output-streams {capture|pass-through|relpath}, --o-s {capture|pass-through|relpath}
ways to handle outputs. 'capture': capture outputs from 'cmd' and
return them in the record ('stdout', 'stderr'); 'pass-through': pass
outputs through to the screen (and thus absent from the returned
record); 'relpath': prefix captured output with the relative path
(similar to what grep does) and write to stdout and stderr. In
'relpath', the relative path is relative to the top of the dataset if
DATASET is specified, and otherwise relative to the current
directory. Constraints: value must be one of ('capture',
'pass-through', 'relpath') [Default: 'pass-through']

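As an illustration of the 'relpath' mode, the sketch below prefixes
each captured line with a dataset's relative path, in the spirit of
grep's file name prefix. The `prefix_with_relpath` helper and the
": " separator are assumptions made for this example; the exact output
format DataLad produces is not specified here.

```python
import os


def prefix_with_relpath(output, ds_path, top):
    """Prefix every line of `output` with `ds_path` relative to `top`."""
    rel = os.path.relpath(ds_path, top)
    return "\n".join(f"{rel}: {line}" for line in output.splitlines())


# Hypothetical captured stdout from one subdataset.
prefixed = prefix_with_relpath(
    "file1\nfile2", "/data/study/sub-01", "/data/study"
)
print(prefixed)
```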
--chpwd {ds|pwd}

--safe-to-consume {auto|all-subds-done|superds-done|always}
Important only in the case of parallel (jobs greater than 1)
execution. 'all-subds-done' instructs to not consider a superdataset
until the command has finished execution in all of its subdatasets
(it is the value in case of 'auto' if traversal is bottom-up).
'superds-done' instructs to not process subdatasets until the command
has finished in the superdataset (it is the value in case of 'auto'
if traversal is not bottom-up, which is the default). With 'always'
there is no constraint on whether to execute in a sub- or
superdataset first. Constraints: value must be one of ('auto',
'all-subds-done', 'superds-done', 'always') [Default: 'auto']

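The scheduling constraint can be read as a readiness test applied
before a dataset is handed to a worker. The following Python sketch
shows that reading under stated assumptions: `resolve_policy`,
`safe_to_consume`, and the dictionary-based hierarchy are hypothetical
and do not reflect DataLad's actual implementation.

```python
def resolve_policy(policy, bottomup):
    """Map 'auto' to the concrete policy implied by the traversal order."""
    if policy == "auto":
        return "all-subds-done" if bottomup else "superds-done"
    return policy


def safe_to_consume(ds, policy, done, parent, children):
    """Return True if `ds` may be processed now, given the finished set `done`."""
    if policy == "all-subds-done":
        # Wait until every direct subdataset has finished.
        return all(child in done for child in children.get(ds, []))
    if policy == "superds-done":
        # Wait until the superdataset (if any) has finished.
        sup = parent.get(ds)
        return sup is None or sup in done
    if policy == "always":
        return True
    raise ValueError(f"unknown policy: {policy!r}")


# Hypothetical hierarchy with one superdataset and two subdatasets.
parent = {"sub1": "super", "sub2": "super"}
children = {"super": ["sub1", "sub2"]}
policy = resolve_policy("auto", bottomup=True)
print(policy, safe_to_consume("super", policy, {"sub1"}, parent, children))
```

Checking only direct subdatasets is sufficient in this sketch because
each subdataset is itself held back until its own subdatasets are done.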
-J NJOBS, --jobs NJOBS
how many parallel jobs (where possible) to use. "auto" corresponds to
the number defined by the 'datalad.runtime.max-annex-jobs'
configuration item. NOTE: This option can only parallelize input
retrieval (get) and output recording (save). DataLad does NOT
parallelize your scripts for you. Constraints: value must be
convertible to type 'int' or value must be NONE or value must be one
of ('auto',)

AUTHORS
datalad is developed by The DataLad Team and Contributors
<team@datalad.org>.

datalad foreach-dataset 0.19.3 2023-08-11 datalad foreach-dataset(1)