datalad run(1)          General Commands Manual          datalad run(1)

NAME
datalad run - run an arbitrary shell command and record its impact on a
dataset.

SYNOPSIS
datalad run [-h] [-d DATASET] [-i PATH] [-o PATH]
[--expand {inputs|outputs|both}]
[--assume-ready {inputs|outputs|both}] [--explicit] [-m MESSAGE]
[--sidecar {yes|no}] [--dry-run {basic|command}] [-J NJOBS]
[--version] ...

DESCRIPTION
It is recommended to craft the command such that it can run in the root
directory of the dataset that the command will be recorded in. However,
as long as the command is executed somewhere underneath the dataset
root, the exact location will be recorded relative to the dataset root.

If the executed command did not alter the dataset in any way, no record
of the command execution is made.

If the given command errors, a CommandError exception with the same
exit code will be raised, and no modifications will be saved. By
default, command execution will not be attempted when an error occurred
during input or output preparation. This default "stop" behavior can be
overridden via --on-failure ....

In the presence of subdatasets, the full dataset hierarchy will be
checked for unsaved changes prior to command execution, and changes in
any dataset will be saved after execution. Any modification of
subdatasets is also saved in their respective superdatasets to capture a
comprehensive record of the entire dataset hierarchy state. The
associated provenance record is duplicated in each modified
(sub)dataset, although it is only fully interpretable and re-executable
in the actual top-level superdataset. For this reason the provenance
record contains the dataset ID of that superdataset.

Command format
A few placeholders are supported in the command via Python format
specification. "{pwd}" will be replaced with the full path of the
current working directory. "{dspath}" will be replaced with the full
path of the dataset that run is invoked on. "{tmpdir}" will be replaced
with the full path of a temporary directory. "{inputs}" and "{outputs}"
represent the values specified by --input and --output. If multiple
values are specified, the values will be joined by a space. The order of
the values will match the order given on the command line, with any
globs expanded in alphabetical order (like bash). Individual values can
be accessed with an integer index (e.g., "{inputs[0]}").
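
The expansion described above can be approximated with Python's
str.format. This is only a minimal sketch, not DataLad's
implementation: the SpaceJoined class, the expand() helper, and the
file paths are hypothetical names introduced here for illustration.

```python
class SpaceJoined(list):
    """Hypothetical list that renders as its items joined by a space."""

    def __str__(self):
        return " ".join(self)


def expand(command, **placeholders):
    """Fill the placeholders of a command template (illustrative helper)."""
    return command.format(**placeholders)


# Made-up paths, standing in for values given via --input and --output.
inputs = SpaceJoined(["data/a.dat", "data/b.dat"])
outputs = SpaceJoined(["out/result.txt"])

# "{inputs}" renders all values joined by a space.
expanded_all = expand("cat {inputs} > {outputs}",
                      inputs=inputs, outputs=outputs)

# "{inputs[0]}" selects a single value by integer index.
expanded_first = expand("head {inputs[0]}", inputs=inputs, outputs=outputs)

print(expanded_all)
print(expanded_first)
```

Because the index is resolved by the format machinery itself, no extra
parsing is needed to support "{inputs[0]}" in such a scheme.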

Note that the representation of the inputs or outputs in the formatted
command string depends on whether the command is given as a list of
arguments or as a string (quotes surrounding the command). The
concatenated list of inputs or outputs will be surrounded by quotes when
the command is given as a list but not when it is given as a string.
This means that the string form is required if you need to pass each
input as a separate argument to a preceding script (i.e., write the
command as "./script {inputs}", quotes included). The string form should
also be used if the input or output paths contain spaces or other
characters that need to be escaped.
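
The practical consequence of this quoting difference can be sketched
with Python's shlex module, which tokenizes a string the way a POSIX
shell would. This illustrates the effect as described above and is an
assumption about the resulting command string, not DataLad's own code;
the file names are made up.

```python
import shlex

inputs = ["a.txt", "b.txt"]
joined = " ".join(inputs)

# List form (as described above): the joined value is quoted as one word.
list_form = "./script {}".format(shlex.quote(joined))

# String form: the joined value is inserted without surrounding quotes.
string_form = "./script {}".format(joined)

# Tokenize both the way a POSIX shell would.
list_tokens = shlex.split(list_form)
string_tokens = shlex.split(string_form)

print(list_tokens)    # script receives ONE argument
print(string_tokens)  # script receives one argument PER input
```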

To escape a brace character, double it (i.e., "{{" or "}}").
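
Brace doubling follows the usual Python format rules, as this short
sketch shows. The awk command and the file name are made-up examples.

```python
# Doubled braces survive formatting as literal single braces, while
# "{inputs}" is still treated as a placeholder.
template = "awk '{{print $1}}' {inputs}"
expanded = template.format(inputs="data/table.txt")

print(expanded)
```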

Custom placeholders can be added as configuration variables under
"datalad.run.substitutions". As an example:

Add a placeholder "name" with the value "joe"::

% datalad configuration --scope branch set datalad.run.substitutions.name=joe
% datalad save -m "Configure name placeholder" .datalad/config

Access the new placeholder in a command::

% datalad run "echo my name is {name} >me"
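
How such configuration variables could turn into placeholders can be
sketched as follows. This is an assumption about the mechanism, not
DataLad's implementation: collect_substitutions() is a hypothetical
helper, and the configuration is modelled as a plain dictionary.

```python
PREFIX = "datalad.run.substitutions."


def collect_substitutions(config):
    """Map 'datalad.run.substitutions.<name>' entries to {<name>: value}."""
    return {
        key[len(PREFIX):]: value
        for key, value in config.items()
        if key.startswith(PREFIX)
    }


# Stand-in for the dataset configuration; the second key is unrelated
# and must be ignored.
config = {
    "datalad.run.substitutions.name": "joe",
    "user.name": "Someone Else",
}

placeholders = collect_substitutions(config)
expanded = "echo my name is {name} >me".format(**placeholders)

print(expanded)
```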

Examples
Run an executable script and record the impact on a dataset::

% datalad run -m 'run my script' 'code/script.sh'

Run a command and specify a directory as a dependency for the run. The
contents of the dependency will be retrieved prior to running the
script::

% datalad run -m 'run my script' -i 'data/*' 'code/script.sh'

Run an executable script and specify output files of the script to be
unlocked prior to running the script::

% datalad run -m 'run my script' -i 'data/*' -o 'output_dir/*' \
'code/script.sh'

Specify multiple inputs and outputs::

% datalad run -m 'run my script' -i 'data/*' -i 'datafile.txt' \
-o 'output_dir/*' -o 'outfile.txt' 'code/script.sh'

Use ** to match any file at any directory depth recursively. A single *
does not match files within matched directories::

% datalad run -m 'run my script' -i 'data/**/*.dat' \
-o 'output_dir/**' 'code/script.sh'
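
The difference between * and ** can be explored with Python's glob
module. This sketch uses Python's globbing rules, which are assumed here
to be analogous to the behaviour described above; it is not DataLad's
own expansion code. The directory layout is made up, and expand() is a
hypothetical helper.

```python
import glob
import os
import tempfile


def make_tree(root):
    """Create data/a.dat and data/sub/b.dat below root."""
    os.makedirs(os.path.join(root, "data", "sub"))
    for rel in ("data/a.dat", "data/sub/b.dat"):
        with open(os.path.join(root, *rel.split("/")), "w") as handle:
            handle.write("x")


def expand(root, pattern):
    """Expand a glob relative to root and return sorted relative paths."""
    matches = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(
        os.path.relpath(match, root).replace(os.sep, "/")
        for match in matches
    )


with tempfile.TemporaryDirectory() as root:
    make_tree(root)
    # A single * matches the entries of data/ but does not descend.
    single = expand(root, "data/*")
    # ** matches files at any directory depth.
    recursive = expand(root, "data/**/*.dat")

print(single)
print(recursive)
```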

OPTIONS
COMMAND
command for execution. A leading '--' can be used to disambiguate this
command from the preceding options to DataLad.
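
When a call is assembled programmatically, always inserting the '--'
separator keeps options of the wrapped command (such as grep's -c
below) apart from DataLad's own. The sketch only builds an argument
list and does not execute anything. build_run_argv() is a hypothetical
helper, and the grep command and paths are made up; the flags are the
ones listed in the synopsis.

```python
def build_run_argv(command, inputs=(), outputs=(), message=None,
                   dry_run=None):
    """Assemble an argument list for 'datalad run' (illustrative only)."""
    argv = ["datalad", "run"]
    if message is not None:
        argv += ["-m", message]
    for path in inputs:
        argv += ["-i", path]
    for path in outputs:
        argv += ["-o", path]
    if dry_run is not None:
        argv += ["--dry-run", dry_run]
    # Everything after '--' belongs to the command to be executed.
    argv.append("--")
    argv += list(command)
    return argv


argv = build_run_argv(
    ["grep", "-c", "pattern", "data/log.txt"],
    inputs=["data/log.txt"],
    message="count matches",
    dry_run="basic",
)

print(argv)
```

Such a list could then be handed to a process launcher of choice.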

-h, --help, --help-np
show this help message. --help-np forcefully disables the use of a pager
for displaying the help message

-d DATASET, --dataset DATASET
specify the dataset to record the command results in. An attempt is made
to identify the dataset based on the current working directory. If a
dataset is given, the command will be executed in the root directory of
this dataset. Constraints: Value must be a Dataset or a valid identifier
of a Dataset (e.g. a path) or value must be NONE

-i PATH, --input PATH
A dependency for the run. Before running the command, the content for
this relative path will be retrieved. A value of "." means "run datalad
get .". The value can also be a glob. This option can be given more than
once.

-o PATH, --output PATH
Prepare this relative path to be an output file of the command. A value
of "." means "run datalad unlock ." (and will fail if some content isn't
present). For any other value, if the content of this file is present,
unlock the file. Otherwise, remove it. The value can also be a glob.
This option can be given more than once.
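
The preparation rules for --output can be summarized as a small
decision function. This is a sketch of the logic as worded above, not
DataLad's implementation: plan_output_preparation() and its return
labels are hypothetical.

```python
def plan_output_preparation(path, content_present):
    """Return the preparation step for one --output value (illustrative).

    "unlock-all": the value "." stands for "datalad unlock ."
    "unlock":     content is present, so the file gets unlocked
    "remove":     content is not present, so the file gets removed
    """
    if path == ".":
        return "unlock-all"
    return "unlock" if content_present else "remove"


print(plan_output_preparation(".", True))
print(plan_output_preparation("out/result.txt", True))
print(plan_output_preparation("out/result.txt", False))
```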

--expand {inputs|outputs|both}
Expand globs when storing inputs and/or outputs in the commit message.
Constraints: value must be one of ('inputs', 'outputs', 'both')

--assume-ready {inputs|outputs|both}
Assume that inputs do not need to be retrieved and/or outputs do not
need to be unlocked or removed before running the command. This option
allows you to avoid the expense of these preparation steps if you know
that they are unnecessary. Constraints: value must be one of ('inputs',
'outputs', 'both')

--explicit
Consider the specification of inputs and outputs to be explicit. Don't
warn if the repository is dirty, and only save modifications to the
listed outputs.

-m MESSAGE, --message MESSAGE
a description of the state or the changes made to a dataset.
Constraints: value must be a string or value must be NONE

--sidecar {yes|no}
By default, the configuration variable 'datalad.run.record-sidecar'
determines whether a record with information on a command's execution is
placed into a separate record file instead of the commit message
(default: off). This option can be used to override the configured
behavior on a case-by-case basis. Sidecar files are placed into the
dataset's '.datalad/runinfo' directory (customizable via the
'datalad.run.record-directory' configuration variable). Constraints:
value must be NONE or value must be convertible to type bool

--dry-run {basic|command}
Do not run the command; just display details about the command
execution. A value of "basic" reports a few important details about the
execution, including the expanded command and expanded inputs and
outputs. "command" displays the expanded command only. Note that input
and output globs underneath an uninstalled dataset will be left
unexpanded because no subdatasets will be installed for a dry run.
Constraints: value must be one of ('basic', 'command')

-J NJOBS, --jobs NJOBS
how many parallel jobs (where possible) to use. "auto" corresponds to
the number defined by the 'datalad.runtime.max-annex-jobs' configuration
item. NOTE: This option can only parallelize input retrieval (get) and
output recording (save). DataLad does NOT parallelize your scripts for
you. Constraints: value must be convertible to type 'int' or value must
be NONE or value must be one of ('auto',)

--version
show the module and its version which provides the command

AUTHORS
datalad is developed by The DataLad Team and Contributors
<team@datalad.org>.

datalad run 0.19.3             2023-08-11                datalad run(1)