gitformat-chunk(5)

GITFORMAT-CHUNK(5)                Git Manual                GITFORMAT-CHUNK(5)

NAME

       gitformat-chunk - Chunk-based file formats

SYNOPSIS

       Used by gitformat-commit-graph(5) and the "MIDX" format (see the pack
       format documentation in gitformat-pack(5)).

DESCRIPTION

       Some file formats in Git use a common concept of "chunks" to describe
       sections of the file. This allows structured access to a large file by
       scanning a small "table of contents" for the remaining data. This
       common format is used by the commit-graph and multi-pack-index files.
       See the multi-pack-index format in gitformat-pack(5) and the
       commit-graph format in gitformat-commit-graph(5) for how they use the
       chunks to describe structured data.

       A chunk-based file format begins with some header information custom to
       that format. That header should include enough information to identify
       the file type, format version, and number of chunks in the file. From
       this information, a reader can determine the start of the chunk-based
       region.

       The chunk-based region starts with a table of contents describing where
       each chunk starts and ends. This consists of (C+1) rows of 12 bytes
       each, where C is the number of chunks. Consider the following table:

           | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
           |--------------------|------------------------|
           | ID[0]              | OFFSET[0]              |
           | ...                | ...                    |
           | ID[C-1]            | OFFSET[C-1]            |
           | 0x00000000         | OFFSET[C]              |

       Each row consists of a 4-byte chunk identifier (ID) and an 8-byte
       offset. Each integer is stored in network byte order (big-endian).

       The chunk identifier ID[i] is a label for the data stored within this
       file from OFFSET[i] (inclusive) to OFFSET[i+1] (exclusive). Thus, the
       size of the i-th chunk is equal to the difference between OFFSET[i+1]
       and OFFSET[i]. This requires that the chunk data appears contiguously
       in the same order as the table of contents.

       The chunk ID of the final entry in the table of contents must be four
       zero bytes. This confirms that the table of contents is ending and
       provides the offset for the end of the chunk-based data.

       Note: The chunk-based format expects that the file contains at least a
       trailing hash after OFFSET[C].

       Functions for working with chunk-based file formats are declared in
       chunk-format.h. Using these functions provides extra checks that assist
       developers when creating new file formats.
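
       The table-of-contents layout described in this section can be decoded
       with a few lines of C. The following is a minimal standalone sketch,
       not the code in chunk-format.h: the names load_be32(), load_be64() and
       find_chunk() are hypothetical, and the caller is assumed to have
       checked that the buffer holds all (nr_chunks + 1) rows.

```c
#include <stddef.h>
#include <stdint.h>

#define TOC_ROW_SIZE 12 /* 4-byte chunk ID + 8-byte offset */

/* Read a 32-bit integer stored in network byte order. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a 64-bit integer stored in network byte order. */
static uint64_t load_be64(const unsigned char *p)
{
	return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}

/*
 * Hypothetical helper: look up the chunk labelled `id` in a table of
 * contents that starts at `toc` and describes `nr_chunks` chunks.
 * On success, store the chunk's offset and size and return 0.
 * Return -1 if the chunk is absent or the table is malformed.
 */
static int find_chunk(const unsigned char *toc, uint32_t nr_chunks,
		      uint32_t id, uint64_t *offset, uint64_t *size)
{
	uint32_t i;

	/* The final row must carry the all-zero terminating ID. */
	if (load_be32(toc + (size_t)nr_chunks * TOC_ROW_SIZE) != 0)
		return -1;

	for (i = 0; i < nr_chunks; i++) {
		const unsigned char *row = toc + (size_t)i * TOC_ROW_SIZE;
		uint64_t start = load_be64(row + 4);
		/* The next row's offset marks the end of this chunk. */
		uint64_t end = load_be64(row + TOC_ROW_SIZE + 4);

		if (end < start)
			return -1; /* chunks must appear in table order */
		if (load_be32(row) == id) {
			*offset = start;
			*size = end - start;
			return 0;
		}
	}
	return -1;
}
```

       Because each chunk's size is derived from the following row, the
       terminating row is what makes the size of the last chunk computable.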

WRITING CHUNK-BASED FILE FORMATS

       To write a chunk-based file format, create a struct chunkfile by
       calling init_chunkfile() with a struct hashfile pointer. The caller
       is responsible for opening the hashfile and writing header information
       so the file format is identifiable before the chunk-based format
       begins.

       Then, call add_chunk() for each chunk to be written. This populates
       the chunkfile with information about the order and size of each chunk
       to write. Provide a chunk_write_fn function pointer to perform the
       write of the chunk data upon request.

       Call write_chunkfile() to write the table of contents to the hashfile
       followed by each of the chunks. This will verify that each chunk wrote
       the expected amount of data so the table of contents is correct.

       Finally, call free_chunkfile() to clear the struct chunkfile data. The
       caller is responsible for finalizing the hashfile by writing the
       trailing hash and closing the file.
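
       To illustrate why the chunk sizes must be known before anything is
       written, here is a standalone sketch of how a table of contents could
       be laid out from a list of chunk IDs and sizes. It does not call the
       chunk-format.h functions; struct chunk_desc, store_be32(),
       store_be64() and write_toc() are hypothetical names. It also assumes
       that offsets count from the start of the file and that chunk data
       begins immediately after the table of contents.

```c
#include <stddef.h>
#include <stdint.h>

#define TOC_ROW_SIZE 12 /* 4-byte chunk ID + 8-byte offset */

/* Hypothetical description of one chunk to be written. */
struct chunk_desc {
	uint32_t id;
	uint64_t size;
};

/* Store a 32-bit integer in network byte order. */
static void store_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Store a 64-bit integer in network byte order. */
static void store_be64(unsigned char *p, uint64_t v)
{
	store_be32(p, (uint32_t)(v >> 32));
	store_be32(p + 4, (uint32_t)v);
}

/*
 * Fill `out` with the (nr + 1)-row table of contents for `chunks`.
 * `toc_start` is the file offset at which the table itself begins,
 * that is, the number of header bytes already written.
 * Returns the file offset at which the chunk data ends.
 */
static uint64_t write_toc(unsigned char *out, uint64_t toc_start,
			  const struct chunk_desc *chunks, size_t nr)
{
	/* Assumed layout: the first chunk follows the table directly. */
	uint64_t cur = toc_start + (uint64_t)(nr + 1) * TOC_ROW_SIZE;
	size_t i;

	for (i = 0; i < nr; i++) {
		store_be32(out, chunks[i].id);
		store_be64(out + 4, cur);
		out += TOC_ROW_SIZE;
		cur += chunks[i].size;
	}

	/* Terminating row: zero ID plus the end offset of the last chunk. */
	store_be32(out, 0);
	store_be64(out + 4, cur);
	return cur;
}
```

       If a chunk later writes more or fewer bytes than the size recorded
       here, every following offset is wrong, which is the kind of mismatch
       the verification in write_chunkfile() is described as catching.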

READING CHUNK-BASED FILE FORMATS

       To read a chunk-based file format, the file must be opened as a
       memory-mapped region. The chunk-format API expects that the entire file
       is mapped as a contiguous memory region.

       Initialize a struct chunkfile pointer with init_chunkfile(NULL).

       After reading the header information from the beginning of the file,
       including the chunk count, call read_table_of_contents() to populate
       the struct chunkfile with the list of chunks, their offsets, and their
       sizes.

       Extract the data for each chunk using pair_chunk() or read_chunk():

       •   pair_chunk() sets a given pointer to the location inside the
           memory-mapped file corresponding to that chunk’s offset. If the
           chunk does not exist, then the pointer is not modified.

       •   read_chunk() takes a chunk_read_fn function pointer and calls it
           with the appropriate initial pointer and size information. The
           function is not called if the chunk does not exist. Use this method
           to read chunks if you need to perform immediate parsing or if you
           need to execute logic based on the size of the chunk.

       After calling these functions, call free_chunkfile() to clear the
       struct chunkfile data. This will not close the memory-mapped region.
       Callers are expected to own that data for as long as the pointers into
       the region are needed.
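
       The difference between the two access styles can be sketched without
       the real API. The code below is a standalone illustration only:
       struct toc_entry, pair_entry(), read_entry() and check_u32_array()
       are hypothetical stand-ins, not the declarations in chunk-format.h,
       and the return values (0 for success, 1 for a missing chunk) are
       chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded table-of-contents entry. */
struct toc_entry {
	uint32_t id;
	const unsigned char *start; /* points into the mapped file */
	size_t size;
};

/* Callback type modelled on the description of chunk_read_fn. */
typedef int (*read_fn)(const unsigned char *start, size_t size, void *data);

static const struct toc_entry *lookup(const struct toc_entry *toc,
				      size_t nr, uint32_t id)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (toc[i].id == id)
			return &toc[i];
	return NULL;
}

/* Pointer style: *p is left untouched when the chunk is missing. */
static int pair_entry(const struct toc_entry *toc, size_t nr, uint32_t id,
		      const unsigned char **p)
{
	const struct toc_entry *e = lookup(toc, nr, id);

	if (!e)
		return 1;
	*p = e->start;
	return 0;
}

/* Callback style: fn is not called when the chunk is missing. */
static int read_entry(const struct toc_entry *toc, size_t nr, uint32_t id,
		      read_fn fn, void *data)
{
	const struct toc_entry *e = lookup(toc, nr, id);

	if (!e)
		return 1;
	return fn(e->start, e->size, data);
}

/* Example callback: accept only chunks made of whole 4-byte records. */
static int check_u32_array(const unsigned char *start, size_t size,
			   void *data)
{
	(void)start;
	if (size % 4)
		return -1;
	*(size_t *)data = size / 4;
	return 0;
}
```

       The pointer style suits optional chunks whose contents are parsed
       later, while the callback style lets a format reject a chunk of
       unexpected size at load time.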

EXAMPLES

       These file formats use the chunk-format API, and can be used as
       examples for future formats:

       •   commit-graph: see write_commit_graph_file() and
           parse_commit_graph() in commit-graph.c for how the chunk-format API
           is used to write and parse the commit-graph file format documented
           in gitformat-commit-graph(5).

       •   multi-pack-index: see write_midx_internal() and
           load_multi_pack_index() in midx.c for how the chunk-format API is
           used to write and parse the multi-pack-index file format documented
           in the multi-pack-index file format section of gitformat-pack(5).

GIT

       Part of the git(1) suite

Git 2.39.1                        2023-01-13                GITFORMAT-CHUNK(5)