CONDA-INDEX(1) conda-index CONDA-INDEX(1)



conda-index - conda-index

conda index, formerly part of conda-build. Create repodata.json for
collections of conda packages.

The conda_index command operates on a channel directory. A channel
directory contains a noarch subdirectory at a minimum and will almost
always contain other subdirectories named for conda's supported
platforms (linux-64, win-64, osx-64, etc.). A channel directory cannot
have the same name as a supported platform. Place each package into the
platform subdirectory it was built for. conda-index extracts metadata
from these packages to generate index.html, repodata.json, etc. with
summaries of the packages' metadata. conda then uses the metadata to
solve dependencies before doing an install.
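The layout rules above can be checked before indexing. The following is a minimal sketch, not part of conda-index: the function name check_channel_dir and the KNOWN_PLATFORMS set are hypothetical, and the set is only an illustrative subset of conda's platform names.

```python
import tempfile
from pathlib import Path

# Illustrative subset only (assumption); conda supports more platforms.
KNOWN_PLATFORMS = {"noarch", "linux-64", "win-64", "osx-64", "osx-arm64"}


def check_channel_dir(channel_dir):
    """Return the platform subdirs found, or raise if the layout looks wrong."""
    channel_dir = Path(channel_dir)
    # A channel directory cannot have the same name as a supported platform.
    if channel_dir.name in KNOWN_PLATFORMS:
        raise ValueError(f"channel directory is named like a platform: {channel_dir.name}")
    subdirs = sorted(
        p.name for p in channel_dir.iterdir() if p.is_dir() and p.name in KNOWN_PLATFORMS
    )
    # A noarch subdirectory is required at a minimum.
    if "noarch" not in subdirs:
        raise ValueError("channel directory has no noarch subdirectory")
    return subdirs


# Build a throwaway channel layout and check it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my-channel"
    for name in ("noarch", "linux-64"):
        (root / name).mkdir(parents=True)
    found = check_channel_dir(root)

print(found)
```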

By default, the metadata is output to the same directory tree as the
channel directory, but it can be output to a separate tree with the
--output <output> parameter. The metadata cache is always placed with
the packages, in .cache folders under each platform subdirectory.

After conda-index has finished, its output can be used as a channel,
for example conda install -c file:///path/to/output ..., or, more
typically, it would be placed on a web server.

    python -m conda_index <path to channel directory>


Note: conda index (instead of python -m conda_index) may find the
legacy conda-build index.

    python -m conda_index --verbose --threads=1 <path to channel directory>


    conda create -n conda-index "python >=3.9" conda conda-build "pip >=22"

    git clone https://github.com/conda/conda-index.git
    pip install -e conda-index[test]

    cd conda-index
    pytest


• Approximately 2.2x faster conda package extraction, by extracting
  just the metadata to streams instead of extracting packages to a
  temporary directory; closes the package early if all metadata has
  been found.

• No longer reads the existing repodata.json. Always loads from cache.

• Uses a sqlite metadata cache that is orders of magnitude faster than
  the old many-tiny-files cache.

• The first time conda index runs, it will convert the existing
  file-based .cache to a sqlite3 database .cache/cache.db. This takes
  about ten minutes per subdir for conda-forge. (If this is
  interrupted, delete cache.db to start over, or packages will be
  re-extracted into the cache.) sqlite3 must be compiled with the JSON1
  extension. JSON1 is built into SQLite by default as of SQLite version
  3.38.0 (2022-02-22).

• Each subdir (osx-64, linux-64, etc.) has its own cache.db;
  conda-forge's 1.2 TB osx-64 subdir has a single 2.4 GB cache.db.
  Storing the cache in fewer files saves time since there is a per-file
  wait to open each of the many tiny .json files in the old-style
  .cache/.

• cache.db is highly compressible, like the text metadata: 2.4 GB
  compresses to 88 MB with zstd.

• No longer caches paths.json (only used to create post_install.json
  and not referenced later in the indexing process). Saves 90% disk
  space in .cache.

• Updated Python and dependency requirements.

• Mercilessly culls less-used features.

• Formatted with black.
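The JSON1 requirement mentioned in the list above can be probed from Python's standard sqlite3 module. This is a minimal sketch under the assumption that the Python interpreter links against the same SQLite library conda-index will use; the helper name sqlite_has_json1 is hypothetical.

```python
import sqlite3


def sqlite_has_json1():
    """Return True if the linked SQLite library provides the JSON functions."""
    conn = sqlite3.connect(":memory:")
    try:
        # json_extract() only exists when JSON support is compiled in.
        conn.execute("SELECT json_extract('{\"a\": 1}', '$.a')").fetchone()
        return True
    except sqlite3.OperationalError:
        # Raised as "no such function" when JSON support is missing.
        return False
    finally:
        conn.close()


print("SQLite", sqlite3.sqlite_version, "JSON1 available:", sqlite_has_json1())
```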

This version of conda-index continues indexing packages from other
subdirs while the main thread is writing a repodata.json.

All current_repodata.json files are generated in parallel. This may use
a lot of RAM if repodata.json has tens of thousands of entries.

Command-line interface
python -m conda_index
    python -m conda_index [OPTIONS] DIR

Options

--output <output>
    Output repodata to given directory.

--subdir <subdir>
    Subdir to index. Accepts multiple.

-n, --channel-name <channel_name>
    Customize the channel name listed in each channel's index.html.

--patch-generator <patch_generator>
    Path to Python file that outputs metadata patch instructions
    from its _patch_repodata function or a .tar.bz2/.conda file
    which contains a patch_instructions.json file for each subdir.

--channeldata, --no-channeldata
    Generate channeldata.json.

    Default
        False

--rss, --no-rss
    Write rss.xml (only if --channeldata is enabled).

    Default
        True

--bz2, --no-bz2
    Write repodata.json.bz2.

    Default
        False

--zst, --no-zst
    Write repodata.json.zst.

    Default
        False

--run-exports, --no-run-exports
    Write run_exports.json.

    Default
        False

-m, --current-index-versions-file <current_index_versions_file>
    YAML file containing name of package as key, and list of
    versions as values. The current_repodata.json will contain the
    newest from this series of versions. For example:

        python:
          - 3.8
          - 3.9

    will keep python 3.8.X and 3.9.Y in the current_repodata.json,
    instead of only the very latest python version.

--threads <threads>

    Default
        3

--verbose
    Enable debug logging.

Arguments

DIR Required argument

conda_index
conda_index.index
This module provides the main entry point to create indexes from
collections of conda packages.

conda_index.index.update_index(dir_path, output_dir=None,
check_md5=False, channel_name=None, patch_generator=None,
threads: int | None = 3, verbose=False, progress=False, subdirs=None,
warn=True, current_index_versions=None, debug=False, write_bz2=True,
write_zst=False, write_run_exports=False)
    High-level interface to ChannelIndex. Index all subdirs under
    dir_path. Output to output_dir, or under the input directory if
    output_dir is not given. Writes updated channeldata.json.

    The input dir_path should at least contain a directory named
    noarch. The path tree therein is treated as a full channel,
    with a level of subdirs, each subdir having an update to
    repodata.json. The full channel will also have a channeldata.json
    file.
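A call to update_index might look like the following. This is a minimal sketch that uses only keyword arguments listed in the signature above; the wrapper name index_channel is hypothetical, and the import is deferred so the sketch can be loaded on a machine where conda-index is not installed.

```python
def index_channel(channel_dir, output_dir=None, threads=1):
    """Index every subdir under channel_dir using the documented update_index()."""
    # Deferred import: conda-index is only required when the function is called.
    from conda_index.index import update_index

    update_index(
        channel_dir,
        output_dir=output_dir,  # None writes next to the packages
        threads=threads,        # 1 keeps log output easy to follow
        verbose=True,
        write_bz2=False,        # skip repodata.json.bz2
    )


# Example call (the path is a placeholder):
# index_channel("/path/to/channel")
```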

class conda_index.index.ChannelIndex(channel_root, channel_name,
subdirs=None, threads: int | None = 3, deep_integrity_check=False,
debug=False, output_root=None,
cache_class=<class 'conda_index.index.sqlitecache.CondaIndexCache'>,
write_bz2=False, write_zst=False, write_run_exports=False,
compact_json=True)
    Class implementing update_index. Allows for more fine-grained
    control of output.

    See the implementation of conda_index.cli for usage.

    index(patch_generator, verbose=False, progress=False,
    current_index_versions=None)
        Examine all changed packages under self.channel_root,
        updating index.html for each subdir.

    update_channeldata(rss=False)
        Update channeldata based on re-reading output
        repodata.json and existing channeldata.json. Call after
        index() if channeldata is needed.

Database schema
Standalone conda-index uses a per-subdir sqlite database to track
package metadata, unlike the older version which used millions of tiny
.json files. The new strategy is much faster because we don't have to
pay for many individual stat() or open() calls.

The whole schema looks like this:

    <subdir>/.cache % sqlite3 cache.db
    SQLite version 3.41.2 2023-03-22 11:56:21
    Enter ".help" for usage hints.
    sqlite> .schema
    CREATE TABLE about (path TEXT PRIMARY KEY, about BLOB);
    CREATE TABLE index_json (path TEXT PRIMARY KEY, index_json BLOB);
    CREATE TABLE recipe (path TEXT PRIMARY KEY, recipe BLOB);
    CREATE TABLE recipe_log (path TEXT PRIMARY KEY, recipe_log BLOB);
    CREATE TABLE run_exports (path TEXT PRIMARY KEY, run_exports BLOB);
    CREATE TABLE post_install (path TEXT PRIMARY KEY, post_install BLOB);
    CREATE TABLE icon (path TEXT PRIMARY KEY, icon_png BLOB);
    CREATE TABLE stat (
        stage TEXT NOT NULL DEFAULT 'indexed',
        path TEXT NOT NULL,
        mtime NUMBER,
        size INTEGER,
        sha256 TEXT,
        md5 TEXT,
        last_modified TEXT,
        etag TEXT
    );
    CREATE UNIQUE INDEX idx_stat ON stat (path, stage);
    CREATE INDEX idx_stat_stage ON stat (stage, path);


    sqlite> select stage, path from stat where path like 'libcurl%';
    fs|libcurl-7.84.0-hc6d1d07_0.conda
    fs|libcurl-7.86.0-h0f1d93c_0.conda
    fs|libcurl-7.87.0-h0f1d93c_0.conda
    fs|libcurl-7.88.1-h0f1d93c_0.conda
    fs|libcurl-7.88.1-h9049daf_0.conda
    indexed|libcurl-7.84.0-hc6d1d07_0.conda
    indexed|libcurl-7.86.0-h0f1d93c_0.conda
    indexed|libcurl-7.87.0-h0f1d93c_0.conda
    indexed|libcurl-7.88.1-h0f1d93c_0.conda
    indexed|libcurl-7.88.1-h9049daf_0.conda


Most of these tables store JSON-format metadata extracted from each
package.

    sqlite> select * from index_json where path = 'libcurl-7.88.1-h9049daf_0.conda';
    libcurl-7.88.1-h9049daf_0.conda|{"build":"h9049daf_0","build_number":0,"depends":["krb5 >=1.20.1,<1.21.0a0","libnghttp2 >=1.51.0,<2.0a0","libssh2 >=1.10.0,<2.0a0","libzlib >=1.2.13,<1.3.0a0","openssl >=3.0.8,<4.0a0"],"license":"curl","license_family":"MIT","name":"libcurl","subdir":"osx-arm64","timestamp":1676918523934,"version":"7.88.1","md5":"c86bbee944bb640609670ce722fba9a4","sha256":"37b8d58c05386ac55d1d8e196c90b92b0a63f3f1fe2fa916bf5ed3e1656d8e14","size":321706}


To track whether a package is indexed in the cache or not, conda-index
uses a table named stat. The main point of this table is to assign a
stage value to each artifact filename. There are usually two stages:
'fs', which is called the upstream stage, and 'indexed'. 'fs' means that
the artifact is now available in the set of packages (assumed by default
to be the local filesystem). 'indexed' means that the entry already
exists in the database (same filename, same timestamp, same hash), and
its package metadata has been extracted to the index_json etc. tables.
Paths in 'fs' but not in 'indexed' need to be unpacked to have their
metadata added to the database. Paths in 'indexed' but not in 'fs' will
be ignored and left out of repodata.json.
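The two set differences described above can be illustrated with an in-memory database. This is a sketch only: it uses a reduced stat table (three of the documented columns), made-up package filenames, and a hypothetical helper only_in(); it is not the query conda-index itself runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the stat table from the schema above.
conn.execute(
    "CREATE TABLE stat (stage TEXT NOT NULL DEFAULT 'indexed', path TEXT NOT NULL, mtime NUMBER)"
)
conn.executemany(
    "INSERT INTO stat (stage, path, mtime) VALUES (?, ?, ?)",
    [
        ("fs", "a-1.0-0.conda", 100),       # on disk and already indexed
        ("fs", "b-1.0-0.conda", 100),       # on disk, not yet indexed
        ("indexed", "a-1.0-0.conda", 100),
        ("indexed", "c-1.0-0.conda", 100),  # indexed, but gone from disk
    ],
)


def only_in(stage, other):
    """Paths recorded under `stage` that have no row under `other`."""
    rows = conn.execute(
        "SELECT path FROM stat WHERE stage = ? "
        "AND path NOT IN (SELECT path FROM stat WHERE stage = ?) ORDER BY path",
        (stage, other),
    )
    return [row[0] for row in rows]


needs_extraction = only_in("fs", "indexed")  # must be unpacked and cached
ignored = only_in("indexed", "fs")           # left out of repodata.json
print(needs_extraction, ignored)
```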

First, conda-index adds all files in a subdir to the upstream stage.
This involves a listdir() and stat() for each file in the index. The
default upstream stage is named fs, but this step is designed to be
overridden by subclassing CondaIndexCache() and replacing the
save_fs_state() and changed_packages() methods. By overriding
CondaIndexCache() it is possible to index without calling stat() on
each package, or without even having all packages stored on the
indexing machine.

Next, conda-index looks for all changed_packages(): paths in the
upstream (fs) stage that don't exist in, or have a different
modification time than those in, the indexed stage.
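The comparison described above can be sketched as a self-join on the stat table. The query below is an assumption written for illustration (including the aliases up and idx and the sample rows); the actual changed_packages() implementation may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stat (stage TEXT NOT NULL DEFAULT 'indexed', path TEXT NOT NULL, mtime NUMBER)"
)
conn.executemany(
    "INSERT INTO stat (stage, path, mtime) VALUES (?, ?, ?)",
    [
        ("fs", "same-1.0-0.conda", 100),
        ("indexed", "same-1.0-0.conda", 100),     # unchanged
        ("fs", "new-1.0-0.conda", 200),           # never indexed
        ("fs", "touched-1.0-0.conda", 300),
        ("indexed", "touched-1.0-0.conda", 250),  # mtime differs
    ],
)

# Upstream rows with no indexed counterpart, or with a different mtime.
CHANGED_SQL = """
SELECT up.path
FROM stat AS up
LEFT JOIN stat AS idx
  ON idx.path = up.path AND idx.stage = 'indexed'
WHERE up.stage = :upstream_stage
  AND (idx.path IS NULL OR idx.mtime != up.mtime)
ORDER BY up.path
"""

changed = [row[0] for row in conn.execute(CHANGED_SQL, {"upstream_stage": "fs"})]
print(changed)
```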

Finally, a join between the upstream stage, usually 'fs', and the
index_json table yields a basic repodata_from_packages.json without any
repodata patches.

    SELECT path, index_json FROM stat JOIN index_json USING (path) WHERE stat.stage = :upstream_stage
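The join can be run from Python with a named parameter for the upstream stage. This sketch uses reduced tables and invented sample rows, and it collects the result into a plain filename-to-metadata dict; the real repodata layout has more structure than shown here.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE index_json (path TEXT PRIMARY KEY, index_json BLOB);
    CREATE TABLE stat (stage TEXT NOT NULL DEFAULT 'indexed', path TEXT NOT NULL, mtime NUMBER);
    """
)
conn.executemany(
    "INSERT INTO index_json (path, index_json) VALUES (?, ?)",
    [
        ("pkg-1.0-0.conda", json.dumps({"name": "pkg", "version": "1.0"})),
        ("gone-1.0-0.conda", json.dumps({"name": "gone", "version": "1.0"})),
    ],
)
conn.executemany(
    "INSERT INTO stat (stage, path) VALUES (?, ?)",
    [
        ("fs", "pkg-1.0-0.conda"),
        ("indexed", "pkg-1.0-0.conda"),
        ("indexed", "gone-1.0-0.conda"),  # cached, but no longer upstream
    ],
)

rows = conn.execute(
    "SELECT path, index_json FROM stat JOIN index_json USING (path) "
    "WHERE stat.stage = :upstream_stage",
    {"upstream_stage": "fs"},
)
# Only packages present in the upstream stage make it into the result.
packages = {path: json.loads(blob) for path, blob in rows}
print(packages)
```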


The steps to create repodata.json, including any repodata patches, and
to create current_repodata.json with only the latest versions of each
package, are similar to pre-sqlite3 conda-index.

The other cached metadata tables are used to create channeldata.json.

conda

conda




Dec 05, 2023 CONDA-INDEX(1)