datalad create-sibling-ria(1)   General Commands Manual   datalad create-sibling-ria(1)

NAME
       datalad create-sibling-ria - creates a sibling to a dataset in a RIA
       store

SYNOPSIS
       datalad create-sibling-ria [-h] -s NAME [-d DATASET]
              [--storage-name NAME] [--alias ALIAS] [--post-update-hook]
              [--shared {false|true|umask|group|all|world|everybody|0xxx}]
              [--group GROUP] [--storage-sibling MODE] [--existing MODE]
              [--new-store-ok] [--trust-level TRUST-LEVEL] [-r] [-R LEVELS]
              [--no-storage-sibling]
              [--push-url ria+<ssh|file>://<host>[/path]] [--version]
              ria+<ssh|file|http(s)>://<host>[/path]

DESCRIPTION
       Communication with a dataset in a RIA store is implemented via two
       siblings: a regular Git remote (repository sibling) and a git-annex
       special remote for data transfer (storage sibling), with the former
       having a publication dependency on the latter. By default, the name
       of the storage sibling is derived from the repository sibling's name
       by appending "-storage".

       The store's base path is expected to not exist, be an empty
       directory, or be a valid RIA store.

   Notes
       *RIA URL format*
       Interactions with new or existing RIA stores require RIA URLs to
       identify the store or specific datasets inside of it.

       The general structure of a RIA URL pointing to a store takes the form
       ``ria+[scheme]://<storelocation>`` (e.g.,
       ``ria+ssh://[user@]hostname:/absolute/path/to/ria-store`` or
       ``ria+file:///absolute/path/to/ria-store``).

       The general structure of a RIA URL pointing to a dataset in a store
       (for example, for cloning) takes a similar form, but appends either
       the dataset's UUID or a "~" symbol followed by the dataset's alias
       name: ``ria+[scheme]://<storelocation>#<dataset-UUID>`` or
       ``ria+[scheme]://<storelocation>#~<aliasname>``. In addition, a
       specific version identifier can be appended to the URL with an
       additional "@" symbol:
       ``ria+[scheme]://<storelocation>#<dataset-UUID>@<dataset-version>``,
       where ``dataset-version`` refers to a branch or tag.
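
The URL forms described above can be taken apart with a small regular expression. The following is only an illustrative sketch, not DataLad's own parser: the names `RiaUrl` and `parse_ria_url` are hypothetical, and accepting "@<version>" after an alias (and not only after a UUID) is an assumption made here for simplicity.

```python
import re
from typing import NamedTuple, Optional


class RiaUrl(NamedTuple):
    """Components of a RIA URL (hypothetical helper type)."""
    scheme: str                 # e.g. "ssh", "file", "https"
    store: str                  # store location without the "ria+" prefix
    dataset_id: Optional[str]   # UUID given after "#", if any
    alias: Optional[str]        # alias given after "#~", if any
    version: Optional[str]      # branch or tag given after "@", if any


# "ria+<scheme>://<storelocation>[#<UUID>|#~<alias>][@<version>]"
_RIA_RE = re.compile(
    r"^ria\+(?P<scheme>[a-z]+)://(?P<location>[^#]*)"
    r"(?:#(?P<ident>[^@]+)(?:@(?P<version>.+))?)?$"
)


def parse_ria_url(url: str) -> RiaUrl:
    """Split a RIA URL into store location, dataset identifier and version."""
    match = _RIA_RE.match(url)
    if match is None:
        raise ValueError(f"not a RIA URL: {url!r}")
    ident = match.group("ident")
    is_alias = ident is not None and ident.startswith("~")
    return RiaUrl(
        scheme=match.group("scheme"),
        store=f"{match.group('scheme')}://{match.group('location')}",
        dataset_id=None if (ident is None or is_alias) else ident,
        alias=ident[1:] if is_alias else None,
        version=match.group("version"),
    )
```

Because the store location is matched up to the first "#", a "user@" part in an SSH location is not confused with the "@" that introduces a version.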

       *RIA store layout*
       A RIA store is a directory tree with a dedicated subdirectory for
       each dataset in the store. The subdirectory name is constructed from
       the DataLad dataset ID, e.g.
       ``124/68afe-59ec-11ea-93d7-f0d5bf7b5561``, where the first three
       characters of the ID are used for an intermediate subdirectory in
       order to mitigate file system limitations for stores containing a
       large number of datasets.
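
The mapping from a dataset ID to its subdirectory can be sketched in a few lines. The function name `dataset_subdir` is hypothetical and the code only illustrates the naming rule stated above; it is not taken from DataLad.

```python
from pathlib import PurePosixPath


def dataset_subdir(dataset_id: str) -> PurePosixPath:
    """Return the store-relative directory for a dataset ID.

    The first three characters form an intermediate directory, the
    remainder of the ID names the dataset directory inside it.
    """
    if len(dataset_id) <= 3:
        raise ValueError("dataset ID is too short to be split")
    return PurePosixPath(dataset_id[:3]) / dataset_id[3:]


# ID from the example above, with the separator removed
print(dataset_subdir("12468afe-59ec-11ea-93d7-f0d5bf7b5561"))
```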

       By default, a dataset in a RIA store consists of two components: a
       Git repository (for all dataset contents stored in Git) and a storage
       sibling (for dataset content stored in git-annex).

       It is possible to selectively disable either component using
       ``--storage-sibling only`` or ``--storage-sibling off``,
       respectively. If neither component is disabled, a dataset's
       subdirectory in a RIA store contains a standard bare Git repository
       and an ``annex/`` subdirectory inside of it. The latter holds a
       git-annex object store and constitutes the storage sibling.
       Disabling the standard Git remote (``--storage-sibling only``)
       results in no bare Git repository being created; disabling the
       storage sibling (``--storage-sibling off``) results in no ``annex/``
       subdirectory.

       Optionally, there can be a further subdirectory ``archives`` with
       (compressed) 7z archives of annex objects. The storage remote is able
       to pull annex objects from these archives if it cannot find them in
       the regular annex object store. This feature can be useful for
       storing large collections of rarely changing data on systems that
       limit the number of files that can be stored.

       Each dataset directory also contains a ``ria-layout-version`` file
       that identifies the data organization (as, for example, described
       above).

       Lastly, there is a global ``ria-layout-version`` file at the store's
       base path that identifies where dataset subdirectories themselves are
       located. At present, this file must contain a single line stating the
       version (currently "1"). This line MUST end with a newline character.
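
The requirement on the global ``ria-layout-version`` file (a single line that ends with a newline) is easy to get wrong when the file is written by hand. The sketch below shows one way to write and validate such a file; the helper names are hypothetical, and in normal use the command itself creates this file when a new store is set up.

```python
from pathlib import Path

LAYOUT_FILE = "ria-layout-version"


def write_store_layout_version(store_root: Path, version: str = "1") -> Path:
    """Write the store-level layout version as one newline-terminated line."""
    store_root.mkdir(parents=True, exist_ok=True)
    target = store_root / LAYOUT_FILE
    # newline="\n" avoids platform-specific line ending translation
    with open(target, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(f"{version}\n")
    return target


def read_store_layout_version(store_root: Path) -> str:
    """Read the version line, rejecting files that violate the format."""
    with open(store_root / LAYOUT_FILE, encoding="utf-8", newline="") as fh:
        content = fh.read()
    if not content.endswith("\n") or content.count("\n") != 1:
        raise ValueError("expected exactly one line ending with a newline")
    return content[:-1]
```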

       It is possible to define an alias for an individual dataset in a
       store by placing a symlink to the dataset location into an ``alias/``
       directory in the root of the store. This enables dataset access via
       URLs of the format
       ``ria+<protocol>://<storelocation>#~<aliasname>``.
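
What the ``--alias`` option sets up can be pictured as follows. This is a minimal sketch for a local store on a POSIX file system; the function name `add_alias` is hypothetical, and the use of a relative symlink target is an assumption of this example, not something stated above.

```python
import os
from pathlib import Path


def add_alias(store_root: Path, dataset_id: str, alias: str) -> Path:
    """Create <store>/alias/<alias> pointing at the dataset directory."""
    alias_dir = store_root / "alias"
    alias_dir.mkdir(parents=True, exist_ok=True)
    link = alias_dir / alias
    # Assumption: a relative target keeps the link valid if the store moves
    target = Path("..") / dataset_id[:3] / dataset_id[3:]
    os.symlink(target, link)
    return link
```

Calling `add_alias` a second time with the same alias raises `FileExistsError`, so an existing alias is never silently redirected.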

       Compared to standard git-annex object stores, the ``annex/``
       subdirectories used as storage siblings follow a different layout
       naming scheme ('dirhashmixed' instead of 'dirhashlower'). This is
       mostly a technical detail, but it also serves as a reminder to
       git-annex power users to refrain from running git-annex commands
       directly in-store, as this can cause severe damage due to the layout
       difference. Interactions should be handled via the ORA special remote
       instead.

       *Error logging*
       To enable error logging at the remote end, append a pipe symbol and
       an "l" to the version number in ria-layout-version (like so:
       ``1|l``, followed by the required trailing newline).

       Error logging will create files in an "error_log" directory whenever
       the git-annex special remote (storage sibling) raises an exception,
       storing the Python traceback of it. The log files are named according
       to the scheme ``<dataset id>.<annex uuid of the remote>.log``, showing
       "who" ran into this issue with which dataset. Because logging can
       potentially leak personal data (local file paths, for example), it
       can be disabled client-side by setting the configuration variable
       ``annex.ora-remote.<storage-sibling-name>.ignore-remote-config``.
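
The logging flag and the log file naming scheme can be illustrated with two small helpers. Both names are hypothetical, and treating everything after the pipe symbol as a set of single-letter flags is an assumption of this sketch; only the "l" flag is described above.

```python
from typing import Tuple


def parse_layout_version(line: str) -> Tuple[str, bool]:
    """Split a ria-layout-version line into (version, logging_enabled)."""
    version, _, flags = line.rstrip("\n").partition("|")
    # Assumption: flags are single letters; "l" requests error logging
    return version, "l" in flags


def error_log_name(dataset_id: str, remote_uuid: str) -> str:
    """Build a log file name: <dataset id>.<annex uuid of the remote>.log"""
    return f"{dataset_id}.{remote_uuid}.log"
```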

OPTIONS
       ria+<ssh|file|http(s)>://<host>[/path]
              URL identifying the target RIA store and access protocol. If
              ``--push-url`` is given in addition, this URL is used for read
              access only. Otherwise it is used for write access too, and to
              create the repository sibling in the RIA store. Note that
              HTTP(S) is currently valid for consumption only, so
              ``--push-url`` must be provided with it. Constraints: value
              must be a string or value must be NONE

       -h, --help, --help-np
              show this help message. --help-np forcefully disables the use
              of a pager for displaying the help message

       -s NAME, --name NAME
              Name of the sibling. With RECURSIVE, the same name will be
              used to label all the subdatasets' siblings. Constraints:
              value must be a string or value must be NONE

       -d DATASET, --dataset DATASET
              specify the dataset to process. If no dataset is given, an
              attempt is made to identify the dataset based on the current
              working directory. Constraints: Value must be a Dataset or a
              valid identifier of a Dataset (e.g. a path) or value must be
              NONE

       --storage-name NAME
              Name of the storage sibling (git-annex special remote). Must
              not be identical to the sibling name. If not specified,
              defaults to the sibling name plus '-storage' suffix. If only a
              storage sibling is created, this setting is ignored, and the
              primary sibling name is used. Constraints: value must be a
              string or value must be NONE

       --alias ALIAS
              Alias for the dataset in the RIA store. Adds the necessary
              symlink so that this dataset can be cloned from the RIA store
              using the given ALIAS instead of its ID. With --recursive,
              only the top dataset will be aliased. Constraints: value must
              be a string or value must be NONE

       --post-update-hook
              Enable Git's default post-update-hook for the created sibling.
              This is useful when the sibling is made accessible via a "dumb
              server" that requires running 'git update-server-info' to let
              Git interact properly with it.

       --shared {false|true|umask|group|all|world|everybody|0xxx}
              If given, configures the permissions in the RIA store for
              multi-user access. Possible values for this option are
              identical to those of `git init --shared` and are described in
              its documentation. Constraints: value must be a string or
              value must be convertible to type bool or value must be NONE

       --group GROUP
              Filesystem group for the repository. Specifying the group is
              crucial when --shared=group. Constraints: value must be a
              string or value must be NONE

       --storage-sibling MODE
              By default, an ORA storage sibling and a Git repository
              sibling are created (on). Alternatively, creation of the
              storage sibling can be disabled (off), or only a storage
              sibling can be created, without a Git sibling (only). In the
              latter mode, no Git installation is required on the target
              host. Constraints: value must be one of ('only',) or value
              must be convertible to type bool or value must be NONE
              [Default: True]

       --existing MODE
              Action to perform if a (storage) sibling is already configured
              under the given name and/or a target already exists. In this
              case, the dataset can be skipped ('skip'), the existing target
              repository can be forcefully re-initialized and the sibling
              (re-)configured ('reconfigure'), or the command can be
              instructed to fail ('error'). Constraints: value must be one
              of ('skip', 'error', 'reconfigure') [Default: 'error']

       --new-store-ok
              When set, a new store will be created, if necessary.
              Otherwise, a sibling will only be created if the URL points to
              an existing RIA store.

       --trust-level TRUST-LEVEL
              specify a trust level for the storage sibling. If not
              specified, the default git-annex trust level is used. 'trust'
              should be used with care (see the git-annex-trust man page).
              Constraints: value must be one of ('trust', 'semitrust',
              'untrust')

       -r, --recursive
              if set, recurse into potential subdatasets.

       -R LEVELS, --recursion-limit LEVELS
              limit recursion into subdatasets to the given number of
              levels. Constraints: value must be convertible to type 'int'
              or value must be NONE

       --no-storage-sibling
              This option is deprecated. Use '--storage-sibling off'
              instead.

       --push-url ria+<ssh|file>://<host>[/path]
              URL identifying the target RIA store and access protocol for
              write access to the storage sibling. If given, this will also
              be used for creation of the repository sibling in the RIA
              store. Constraints: value must be a string or value must be
              NONE

       --version
              show the module and its version which provides the command

AUTHORS
       datalad is developed by The DataLad Team and Contributors
       <team@datalad.org>.

datalad create-sibling-ria 0.19.3   2023-08-11   datalad create-sibling-ria(1)