conda-index(1)

CONDA-INDEX(1)                    conda-index                   CONDA-INDEX(1)

NAME

       conda-index - conda-index

       conda index, formerly part of conda-build. Create repodata.json for
       collections of conda packages.

       The conda_index command operates on a channel directory. A channel
       directory contains a noarch subdirectory at a minimum and will almost
       always contain other subdirectories named for conda's supported
       platforms: linux-64, win-64, osx-64, etc. A channel directory cannot
       have the same name as a supported platform. Place each package into
       the platform subdirectory it was built for. Conda-index extracts
       metadata from these packages to generate index.html, repodata.json,
       etc., with summaries of the packages' metadata. Conda then uses that
       metadata to solve dependencies before doing an install.

       By default, the metadata is output to the same directory tree as the
       channel directory, but it can be output to a separate tree with the
       --output <output> parameter. The metadata cache is always placed with
       the packages, in .cache folders under each platform subdirectory.

       After conda-index has finished, its output can be used as a channel
       directly (conda install -c file:///path/to/output ...), or, more
       typically, placed on a web server.

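       The layout rules above can be checked before indexing. The following
       is a minimal sketch, not part of conda-index: check_channel_dir and
       KNOWN_SUBDIRS are hypothetical names, and the platform set is only an
       illustrative subset of what conda supports.

```python
from pathlib import Path
import tempfile

# Illustrative subset of platform names (an assumption); conda supports more.
KNOWN_SUBDIRS = {"noarch", "linux-64", "win-64", "osx-64"}


def check_channel_dir(channel_dir):
    """Return a list of problems found with a channel directory layout."""
    root = Path(channel_dir)
    problems = []
    # A channel directory must not be named like a platform subdirectory.
    if root.name in KNOWN_SUBDIRS:
        problems.append(f"channel directory is named like a platform: {root.name}")
    # noarch is the one subdirectory every channel needs.
    if not (root / "noarch").is_dir():
        problems.append("missing required noarch subdirectory")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    channel = Path(tmp) / "my-channel"
    (channel / "noarch").mkdir(parents=True)
    (channel / "linux-64").mkdir()
    print(check_channel_dir(channel))  # an empty list means no problems found
```

       A check like this only validates directory names; it does not verify
       that each package sits in the subdirectory it was built for.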

RUN NORMALLY

          python -m conda_index <path to channel directory>

       Note that conda index (instead of python -m conda_index) may find the
       legacy conda-build index command.

RUN FOR DEBUGGING

          python -m conda_index --verbose --threads=1 <path to channel directory>

CONTRIBUTING

          conda create -n conda-index "python >=3.9" conda conda-build "pip >=22"
          conda activate conda-index

          git clone https://github.com/conda/conda-index.git
          pip install -e "conda-index[test]"

          cd conda-index
          pytest

SUMMARY OF CHANGES FROM THE PREVIOUS CONDA-BUILD INDEX VERSION

       • Approximately 2.2x faster conda package extraction, by extracting
         just the metadata to streams instead of extracting packages to a
         temporary directory; closes the package early if all metadata has
         been found.

       • No longer reads the existing repodata.json; always loads from the
         cache.

       • Uses a sqlite metadata cache that is orders of magnitude faster than
         the old many-tiny-files cache.

       • The first time conda index runs, it will convert the existing
         file-based .cache to a sqlite3 database .cache/cache.db. This takes
         about ten minutes per subdir for conda-forge. (If this is
         interrupted, delete cache.db to start over, or packages will be
         re-extracted into the cache.) sqlite3 must be compiled with the
         JSON1 extension. JSON1 is built into SQLite by default as of SQLite
         version 3.38.0 (2022-02-22).

       • Each subdir (osx-64, linux-64, etc.) has its own cache.db;
         conda-forge’s 1.2TB osx-64 subdir has a single 2.4GB cache.db.
         Storing the cache in fewer files saves time, since there is a
         per-file wait to open each of the many tiny .json files in the
         old-style .cache/.

       • cache.db is highly compressible, like the text metadata: 2.4GB →
         zstd → 88MB.

       • No longer caches paths.json (only used to create post_install.json
         and not referenced later in the indexing process). Saves 90% disk
         space in .cache.

       • Updated Python and dependency requirements.

       • Mercilessly culls less-used features.

       • Formatted with black.

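       The streaming extraction described in the first item can be
       illustrated with the standard library alone. This is a sketch under
       stated assumptions, not the conda-index implementation: it uses
       tarfile in stream mode as a stand-in, handles only the .tar.bz2
       format, and read_index_json and make_fake_package are hypothetical
       helper names.

```python
import io
import json
import tarfile


def read_index_json(package_fileobj):
    """Stream a .tar.bz2 package and return its parsed info/index.json."""
    # "r|bz2" reads the archive as a forward-only stream: nothing is
    # written to disk and members are visited in archive order.
    with tarfile.open(fileobj=package_fileobj, mode="r|bz2") as tar:
        for member in tar:
            if member.name == "info/index.json":
                data = tar.extractfile(member).read()
                return json.loads(data)  # stop early; later members are never read
    raise KeyError("info/index.json not found")


def make_fake_package():
    """Build a tiny in-memory .tar.bz2 that mimics a conda package."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name, payload in [
            ("info/index.json", json.dumps({"name": "demo", "version": "1.0"}).encode()),
            ("lib/payload.bin", b"\0" * 4096),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


print(read_index_json(make_fake_package()))
```

       Returning as soon as the metadata member is found is what avoids
       decompressing the rest of the archive.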

PARALLELISM

       This version of conda-index continues indexing packages from other
       subdirs while the main thread is writing a repodata.json.

       All current_repodata.json are generated in parallel. This may use a lot
       of RAM if repodata.json has tens of thousands of entries.

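       The general pattern of worker threads extracting while the main
       thread consumes finished results can be sketched as follows. This
       illustrates the idea only and is not how conda-index is structured
       internally; extract_subdir and index_all are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_subdir(subdir):
    """Hypothetical stand-in for the per-subdir extraction work."""
    return subdir, {"info": {"subdir": subdir}, "packages": {}}


def index_all(subdirs, threads=3):
    written = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(extract_subdir, s) for s in subdirs]
        # The main thread handles each result as soon as it is ready,
        # while workers keep extracting the remaining subdirs.
        for future in as_completed(futures):
            subdir, repodata = future.result()
            written.append(subdir)  # a real indexer would write repodata.json here
    return written


print(sorted(index_all(["noarch", "linux-64", "osx-64"])))
```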
   Command-line interface
   python -m conda_index
          python -m conda_index [OPTIONS] DIR

       Options

       --output <output>
              Output repodata to given directory.

       --subdir <subdir>
              Subdir to index. Accepts multiple.

       -n, --channel-name <channel_name>
              Customize the channel name listed in each channel's index.html.

       --patch-generator <patch_generator>
              Path to Python file that outputs metadata patch instructions
              from its _patch_repodata function or a .tar.bz2/.conda file
              which contains a patch_instructions.json file for each subdir.

       --channeldata, --no-channeldata
              Generate channeldata.json.

              Default
                     False

       --rss, --no-rss
              Write rss.xml (Only if --channeldata is enabled).

              Default
                     True

       --bz2, --no-bz2
              Write repodata.json.bz2.

              Default
                     False

       --zst, --no-zst
              Write repodata.json.zst.

              Default
                     False

       --run-exports, --no-run-exports
              Write run_exports.json.

              Default
                     False

       -m, --current-index-versions-file <current_index_versions_file>
              YAML file containing name of package as key, and list of
              versions as values. The current_index.json will contain the
              newest from this series of versions. For example:

              python:

                     • 3.8

                     • 3.9

              will keep python 3.8.X and 3.9.Y in the current_index.json,
              instead of only the very latest python version.

       --threads <threads>

              Default
                     3

       --verbose
              Enable debug logging.

       Arguments

       DIR    Required argument

   conda_index
   conda_index.index
       This module provides the main entry point to create indexes from
       collections of conda packages.

       conda_index.index.update_index(dir_path, output_dir=None,
       check_md5=False, channel_name=None, patch_generator=None, threads: int
       | None = 3, verbose=False, progress=False, subdirs=None, warn=True,
       current_index_versions=None, debug=False, write_bz2=True,
       write_zst=False, write_run_exports=False)
              High-level interface to ChannelIndex. Index all subdirs under
              dir_path. Output to output_dir, or under the input directory if
              output_dir is not given. Writes updated channeldata.json.

              The input dir_path should at least contain a directory named
              noarch. The path tree therein is treated as a full channel,
              with a level of subdirs, each subdir having an update to
              repodata.json. The full channel will also have a
              channeldata.json file.

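       A call to update_index might look like the sketch below. It uses only
       keyword arguments listed in the signature above; the wrapper name
       index_channel and the choice of values (one thread, .zst output
       instead of .bz2) are assumptions made for illustration.

```python
from pathlib import Path


def index_channel(channel_dir, output_dir=None):
    """Index every subdir under channel_dir (hypothetical wrapper)."""
    channel_dir = Path(channel_dir)
    # The documentation requires at least a noarch subdirectory.
    if not (channel_dir / "noarch").is_dir():
        raise ValueError(f"{channel_dir} has no noarch subdirectory")

    # Imported late so this module still loads where conda-index is absent.
    from conda_index.index import update_index

    update_index(
        str(channel_dir),
        output_dir=output_dir,  # None writes next to the packages
        threads=1,              # single-threaded, as in the debugging recipe
        write_bz2=False,
        write_zst=True,
    )
```

       Calling index_channel("/path/to/channel") would then index the channel
       in place, provided conda-index is installed in the environment.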
       class conda_index.index.ChannelIndex(channel_root, channel_name,
       subdirs=None, threads: int | None = 3, deep_integrity_check=False,
       debug=False, output_root=None, cache_class=<class
       'conda_index.index.sqlitecache.CondaIndexCache'>, write_bz2=False,
       write_zst=False, write_run_exports=False, compact_json=True)
              Class implementing update_index. Allows for more fine-grained
              control of output.

              See the implementation of conda_index.cli for usage.

              index(patch_generator, verbose=False, progress=False,
              current_index_versions=None)
                     Examine all changed packages under self.channel_root,
                     updating index.html for each subdir.

              update_channeldata(rss=False)
                     Update channeldata based on re-reading output
                     repodata.json and existing channeldata.json. Call after
                     index() if channeldata is needed.

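       The call order described above, index() first and
       update_channeldata() afterwards only when channeldata is wanted, can
       be sketched as follows. The wrapper build_channel and its
       with_channeldata flag are hypothetical; the constructor and method
       arguments are taken from the signatures above.

```python
def build_channel(channel_root, channel_name, subdirs=None, with_channeldata=False):
    """Run ChannelIndex in the documented order (hypothetical wrapper)."""
    # Imported late so this module still loads where conda-index is absent.
    from conda_index.index import ChannelIndex

    channel_index = ChannelIndex(
        channel_root,
        channel_name,
        subdirs=subdirs,
        threads=1,
        write_zst=True,
    )
    # Index changed packages first, without a repodata patch generator.
    channel_index.index(patch_generator=None)
    # channeldata is derived from the freshly written repodata, so it comes last.
    if with_channeldata:
        channel_index.update_channeldata(rss=False)
    return channel_index
```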
   Database schema
       Standalone conda-index uses a per-subdir sqlite database to track
       package metadata, unlike the older version which used millions of tiny
       .json files. The new strategy is much faster because we don't have to
       pay for many individual stat() or open() calls.

       The whole schema looks like this:

          <subdir>/.cache % sqlite3 cache.db
          SQLite version 3.41.2 2023-03-22 11:56:21
          Enter ".help" for usage hints.
          sqlite> .schema
          CREATE TABLE about (path TEXT PRIMARY KEY, about BLOB);
          CREATE TABLE index_json (path TEXT PRIMARY KEY, index_json BLOB);
          CREATE TABLE recipe (path TEXT PRIMARY KEY, recipe BLOB);
          CREATE TABLE recipe_log (path TEXT PRIMARY KEY, recipe_log BLOB);
          CREATE TABLE run_exports (path TEXT PRIMARY KEY, run_exports BLOB);
          CREATE TABLE post_install (path TEXT PRIMARY KEY, post_install BLOB);
          CREATE TABLE icon (path TEXT PRIMARY KEY, icon_png BLOB);
          CREATE TABLE stat (
                          stage TEXT NOT NULL DEFAULT 'indexed',
                          path TEXT NOT NULL,
                          mtime NUMBER,
                          size INTEGER,
                          sha256 TEXT,
                          md5 TEXT,
                          last_modified TEXT,
                          etag TEXT
                      );
          CREATE UNIQUE INDEX idx_stat ON stat (path, stage);
          CREATE INDEX idx_stat_stage ON stat (stage, path);

          sqlite> select stage, path from stat where path like 'libcurl%';
          fs|libcurl-7.84.0-hc6d1d07_0.conda
          fs|libcurl-7.86.0-h0f1d93c_0.conda
          fs|libcurl-7.87.0-h0f1d93c_0.conda
          fs|libcurl-7.88.1-h0f1d93c_0.conda
          fs|libcurl-7.88.1-h9049daf_0.conda
          indexed|libcurl-7.84.0-hc6d1d07_0.conda
          indexed|libcurl-7.86.0-h0f1d93c_0.conda
          indexed|libcurl-7.87.0-h0f1d93c_0.conda
          indexed|libcurl-7.88.1-h0f1d93c_0.conda
          indexed|libcurl-7.88.1-h9049daf_0.conda

       Most of these tables store json-format metadata extracted from each
       package.

          select * from index_json where path = 'libcurl-7.88.1-h9049daf_0.conda';
          libcurl-7.88.1-h9049daf_0.conda|{"build":"h9049daf_0","build_number":0,"depends":["krb5 >=1.20.1,<1.21.0a0","libnghttp2 >=1.51.0,<2.0a0","libssh2 >=1.10.0,<2.0a0","libzlib >=1.2.13,<1.3.0a0","openssl >=3.0.8,<4.0a0"],"license":"curl","license_family":"MIT","name":"libcurl","subdir":"osx-arm64","timestamp":1676918523934,"version":"7.88.1","md5":"c86bbee944bb640609670ce722fba9a4","sha256":"37b8d58c05386ac55d1d8e196c90b92b0a63f3f1fe2fa916bf5ed3e1656d8e14","size":321706}

       To track whether a package is indexed in the cache or not, conda-index
       uses a table named stat. The main point of this table is to assign a
       stage value to each artifact filename: usually 'fs', which is called
       the upstream stage, and 'indexed'. 'fs' means that the artifact is now
       available in the set of packages (assumed by default to be the local
       filesystem). 'indexed' means that the entry already exists in the
       database (same filename, same timestamp, same hash), and its package
       metadata has been extracted to the index_json etc. tables. Paths in
       'fs' but not in 'indexed' need to be unpacked to have their metadata
       added to the database. Paths in 'indexed' but not in 'fs' will be
       ignored and left out of repodata.json.

       First, conda-index adds all files in a subdir to the upstream stage.
       This involves a listdir() and stat() for each file in the index. The
       default upstream stage is named fs, but this step is designed to be
       overridden by subclassing CondaIndexCache() and replacing the
       save_fs_state() and changed_packages() methods. By overriding
       CondaIndexCache() it is possible to index without calling stat() on
       each package, or without even having all packages stored on the
       indexing machine.

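       The first step can be approximated outside conda-index with an
       in-memory database. This is a sketch under stated assumptions:
       save_fs_stage is a hypothetical standalone function, not the real
       save_fs_state() method, and the table is a reduced version of the
       stat table shown in the schema.

```python
import os
import sqlite3
import tempfile

# Reduced version of the stat table from the schema (hash columns omitted).
STAT_SCHEMA = """
CREATE TABLE stat (
    stage TEXT NOT NULL DEFAULT 'indexed',
    path TEXT NOT NULL,
    mtime NUMBER,
    size INTEGER
);
CREATE UNIQUE INDEX idx_stat ON stat (path, stage);
"""


def save_fs_stage(conn, subdir_path):
    """Replace the 'fs' stage with the package files currently on disk."""
    rows = []
    for entry in os.scandir(subdir_path):
        if entry.is_file() and entry.name.endswith((".conda", ".tar.bz2")):
            st = entry.stat()
            rows.append((entry.name, st.st_mtime, st.st_size))
    with conn:  # one transaction: clear the stage, then refill it
        conn.execute("DELETE FROM stat WHERE stage = 'fs'")
        conn.executemany(
            "INSERT INTO stat (stage, path, mtime, size) VALUES ('fs', ?, ?, ?)",
            rows,
        )
    return len(rows)


with tempfile.TemporaryDirectory() as subdir:
    for name in ("a-1.0-0.conda", "b-2.0-0.tar.bz2", "repodata.json"):
        with open(os.path.join(subdir, name), "wb") as f:
            f.write(b"x")
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAT_SCHEMA)
    print(save_fs_stage(conn, subdir))  # counts only the two package files
    conn.close()
```

       An alternative upstream stage would fill the same rows from another
       source, such as a remote listing, instead of calling stat() locally.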
       Next, conda-index looks for all changed_packages(): paths in the
       upstream (fs) stage that don't exist in the indexed stage, or that
       have a different modification time there.

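       One way to express that comparison is a self-join on the stat table.
       The query below is an illustrative guess at the logic, not the SQL
       that conda-index actually runs; the table is reduced to three columns
       and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stat (stage TEXT NOT NULL, path TEXT NOT NULL, mtime NUMBER)")
conn.executemany(
    "INSERT INTO stat VALUES (?, ?, ?)",
    [
        ("fs", "same-1.0-0.conda", 100),
        ("indexed", "same-1.0-0.conda", 100),
        ("fs", "touched-1.0-0.conda", 250),      # mtime differs from the cache
        ("indexed", "touched-1.0-0.conda", 200),
        ("fs", "new-1.0-0.conda", 300),          # never indexed
        ("indexed", "gone-1.0-0.conda", 50),     # no longer upstream
    ],
)

# Upstream paths with no 'indexed' row, or with a different mtime.
CHANGED = """
SELECT fs.path
FROM stat AS fs
LEFT JOIN stat AS cached
    ON cached.path = fs.path AND cached.stage = 'indexed'
WHERE fs.stage = 'fs'
  AND (cached.path IS NULL OR cached.mtime != fs.mtime)
ORDER BY fs.path
"""

changed = [row[0] for row in conn.execute(CHANGED)]
print(changed)
```

       Only the new and touched packages are selected for extraction; the
       unchanged package is skipped and the removed one never appears.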
       Finally, a join between the upstream stage, usually 'fs', and the
       index_json table yields a basic repodata_from_packages.json without
       any repodata patches.

          SELECT path, index_json FROM stat JOIN index_json USING (path) WHERE stat.stage = :upstream_stage

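       The query above can be exercised from Python with made-up rows. This
       is a sketch, not the conda-index code path: the tables are reduced to
       the columns the query touches, and splitting results into "packages"
       and "packages.conda" by file extension is an assumption about how the
       rows would be arranged into repodata.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE index_json (path TEXT PRIMARY KEY, index_json BLOB);
    CREATE TABLE stat (stage TEXT NOT NULL, path TEXT NOT NULL);
    """
)
conn.execute(
    "INSERT INTO index_json VALUES (?, ?)",
    ("demo-1.0-0.conda", json.dumps({"name": "demo", "version": "1.0"})),
)
conn.execute(
    "INSERT INTO index_json VALUES (?, ?)",
    ("old-0.1-0.tar.bz2", json.dumps({"name": "old", "version": "0.1"})),
)
# Only demo is still present upstream; old is cached but no longer in 'fs'.
conn.execute("INSERT INTO stat VALUES ('fs', 'demo-1.0-0.conda')")
conn.execute("INSERT INTO stat VALUES ('indexed', 'demo-1.0-0.conda')")
conn.execute("INSERT INTO stat VALUES ('indexed', 'old-0.1-0.tar.bz2')")

rows = conn.execute(
    "SELECT path, index_json FROM stat JOIN index_json USING (path) "
    "WHERE stat.stage = :upstream_stage",
    {"upstream_stage": "fs"},
)

repodata = {"packages": {}, "packages.conda": {}}
for path, blob in rows:
    key = "packages.conda" if path.endswith(".conda") else "packages"
    repodata[key][path] = json.loads(blob)
print(json.dumps(repodata, sort_keys=True))
```

       The cached entry for the package that is no longer upstream stays in
       index_json but is left out of the result, matching the stage rules.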
       The steps to create repodata.json, including any repodata patches, and
       to create current_repodata.json with only the latest versions of each
       package, are similar to pre-sqlite3 conda-index.

       The other cached metadata tables are used to create channeldata.json.

AUTHOR

       conda

COPYRIGHT

       conda

                                 Dec 05, 2023                   CONDA-INDEX(1)