CRUSHTOOL(8) Ceph CRUSHTOOL(8)

crushtool - CRUSH map manipulation tool

crushtool ( -d map | -c map.txt | --build --num_osds numosds
layer1 ... | --test ) [ -o outfile ]

crushtool is a utility that lets you create, compile, decompile and
test CRUSH map files.

CRUSH is a pseudo-random data distribution algorithm that efficiently
maps input values (which, in the context of Ceph, correspond to
Placement Groups) across a heterogeneous, hierarchically structured
device map. The algorithm was originally described in detail in the
following paper (although it has evolved somewhat since then):

    http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

The tool has four modes of operation.

--compile|-c map.txt
    will compile a plaintext map.txt into a binary map file.

--decompile|-d map
    will take the compiled map and decompile it into a plaintext
    source file, suitable for editing.

--build --num_osds {num-osds} layer1 ...
    will create a map with the given layer structure. See below for a
    detailed explanation.

--test
    will perform a dry run of a CRUSH mapping for a range of input
    values [--min-x,--max-x] (default [0,1023]) which can be thought
    of as simulated Placement Groups. See below for a more detailed
    explanation.

Unlike other Ceph tools, crushtool does not accept generic options such
as --debug-crush from the command line. They can, however, be provided
via the CEPH_ARGS environment variable. For instance, to silence all
output from the CRUSH subsystem:

    CEPH_ARGS="--debug-crush 0" crushtool ...

The test mode will use the input crush map (as specified with -i map)
and perform a dry run of CRUSH mapping or random placement (if
--simulate is set). On completion, two kinds of reports can be created:

1) The --show-... options output human-readable information on stderr.

2) The --output-csv option creates CSV files that are documented by the
   --help-output option.

Note: Each Placement Group (PG) has an integer ID which can be obtained
from ceph pg dump (for example, PG 2.2f means pool id 2, PG id 47,
because the part after the dot is hexadecimal). The pool and PG IDs are
combined by a function to get a value which is given to CRUSH to map it
to OSDs. crushtool does not know about PGs or pools; it only runs
simulations by mapping values in the range [--min-x,--max-x].
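The PG ID notation in the note can be unpacked with plain shell arithmetic. The following is only a sketch: the helper names pg_pool and pg_seed are invented for this example and are not part of crushtool or ceph, and it assumes the part after the dot is hexadecimal, as in the 2.2f example.

```shell
# Hypothetical helpers (not part of Ceph): split a PG ID such as "2.2f".
pg_pool() {
    # Everything before the first dot is the pool id (decimal).
    printf '%s\n' "${1%%.*}"
}

pg_seed() {
    # Everything after the dot is the PG number, written in hexadecimal.
    printf '%d\n' "$(( 0x${1#*.} ))"
}

pg_pool 2.2f    # prints 2
pg_seed 2.2f    # prints 47
```

Note that neither number is what crushtool --test iterates over; the x values in [--min-x,--max-x] are simply simulated inputs.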

--show-statistics
    Displays a summary of the distribution. For instance:

        rule 1 (metadata) num_rep 5 result size == 5: 1024/1024

    shows that rule 1, which is named metadata, successfully mapped
    1024 values to result size == 5 devices when trying to map them
    to num_rep 5 replicas. When it fails to provide the required
    mapping, presumably because the number of tries must be
    increased, a breakdown of the failures is displayed. For instance:

        rule 1 (metadata) num_rep 10 result size == 8: 4/1024
        rule 1 (metadata) num_rep 10 result size == 9: 93/1024
        rule 1 (metadata) num_rep 10 result size == 10: 927/1024

    shows that although num_rep 10 replicas were required, 4 out of
    1024 values (4/1024) were mapped to only result size == 8 devices.
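A breakdown like the one above can be condensed into a single shortfall figure with awk. This is a minimal sketch, not a crushtool feature: count_short_mappings is a hypothetical helper, the field positions are inferred from the sample lines (rule names without spaces), and it assumes the input covers one rule and one num_rep value.

```shell
# Sum the values that were mapped to fewer devices than num_rep requested.
# Reads --show-statistics lines on stdin (layout assumed from the sample).
count_short_mappings() {
    awk '$1 == "rule" && $4 == "num_rep" {
             size = $9; sub(/:$/, "", size)     # "8:" -> "8"
             split($10, frac, "/")              # "4/1024" -> 4 and 1024
             if (size + 0 < $5 + 0) short += frac[1]
             total = frac[2]
         }
         END { printf "%d/%d\n", short, total }'
}

count_short_mappings <<'EOF'
rule 1 (metadata) num_rep 10 result size == 8: 4/1024
rule 1 (metadata) num_rep 10 result size == 9: 93/1024
rule 1 (metadata) num_rep 10 result size == 10: 927/1024
EOF
# prints 97/1024
```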

--show-mappings
    Displays the mapping of each value in the range
    [--min-x,--max-x]. For instance:

        CRUSH rule 1 x 24 [11,6]

    shows that value 24 is mapped to devices [11,6] by rule 1.

    One of the following is required when using the --show-mappings
    option:

    a. --num-rep

    b. both --min-rep and --max-rep

    --num-rep stands for "number of replicas". It indicates the number
    of replicas in a pool and is used to specify an exact number of
    replicas (for example, --num-rep 5). --min-rep and --max-rep are
    used together to specify a range of replicas (for example,
    --min-rep 1 --max-rep 10).
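Mapping lines are easy to post-process, for example to see how often each device is chosen. The sketch below assumes lines shaped exactly like the sample above; count_device_hits is a hypothetical helper, and the second input line is invented for illustration.

```shell
# Count how often each device appears in --show-mappings output (stdin).
# Assumes lines shaped like: "CRUSH rule 1 x 24 [11,6]".
count_device_hits() {
    awk '$1 == "CRUSH" && $2 == "rule" {
             list = $6
             gsub(/\[/, "", list); gsub(/\]/, "", list)   # strip the brackets
             n = split(list, devs, ",")
             for (i = 1; i <= n; i++) hits[devs[i]]++
         }
         END { for (d in hits) print d, hits[d] }' | sort -n
}

# The second line below is made-up sample data.
count_device_hits <<'EOF'
CRUSH rule 1 x 24 [11,6]
CRUSH rule 1 x 25 [6,3]
EOF
# prints one "device count" pair per line, sorted by device id

# With a real map (2>&1 in case the report is written to stderr):
# crushtool -i mymap --test --show-mappings --num-rep 2 2>&1 | count_device_hits
```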

--show-bad-mappings
    Displays which values failed to be mapped to the required number
    of devices. For instance:

        bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

    shows that when rule 1 was required to map value 781 to 7 devices,
    it could map only six: [8,10,2,11,6,9].
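To make the shortfall explicit, the requested and obtained device counts can be compared line by line. This is a sketch under the assumption that every line follows the sample layout above; report_shortfall is a hypothetical helper name.

```shell
# For each "bad mapping" line on stdin, report how many devices are missing.
report_shortfall() {
    awk '$1 == "bad" && $2 == "mapping" {
             list = $10
             gsub(/\[/, "", list); gsub(/\]/, "", list)   # strip the brackets
             got = split(list, devs, ",")                  # devices actually mapped
             printf "rule %s x %s: wanted %d, got %d, missing %d\n",
                    $4, $6, $8, got, $8 - got
         }'
}

echo 'bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]' | report_shortfall
# prints: rule 1 x 781: wanted 7, got 6, missing 1
```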

--show-utilization
    Displays the expected and actual utilization for each device,
    for each number of replicas. For instance:

        device 0: stored : 951 expected : 853.333
        device 1: stored : 963 expected : 853.333
        ...

    shows that device 0 stored 951 values and was expected to store
    853.333. Implies --show-statistics.
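The stored and expected figures are easier to compare as a relative deviation. The sketch below assumes the line layout of the sample above; utilization_deviation is a hypothetical helper, not a crushtool option.

```shell
# Print each device's deviation from its expected share, in percent.
# Reads --show-utilization lines on stdin (layout assumed from the sample).
utilization_deviation() {
    awk '$1 == "device" && $3 == "stored" {
             id = $2; sub(/:$/, "", id)        # "0:" -> "0"
             stored = $5 + 0; expected = $8 + 0
             if (expected > 0)                 # skip devices with no expected share
                 printf "device %s: %+.1f%%\n", id,
                        (stored - expected) / expected * 100
         }'
}

utilization_deviation <<'EOF'
device 0: stored : 951 expected : 853.333
device 1: stored : 963 expected : 853.333
EOF
# prints:
# device 0: +11.4%
# device 1: +12.9%
```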

--show-utilization-all
    Displays the same as --show-utilization but does not suppress
    output when the weight of a device is zero. Implies
    --show-statistics.

--show-choose-tries
    Displays how many attempts were needed to find a device mapping.
    For instance:

        0: 95224
        1: 3745
        2: 2225
        ..

    shows that 95224 mappings succeeded without retries, 3745
    mappings succeeded after one retry, etc. There are as many rows
    as the value of the --set-choose-total-tries option.
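The histogram can be reduced to a single figure: how many mappings needed at least one retry. This is a sketch that assumes "tries: count" rows as in the sample; retry_share is a hypothetical helper, and rows that do not match that shape (such as the elided ones) are ignored.

```shell
# Summarize --show-choose-tries rows read from stdin.
retry_share() {
    awk 'NF == 2 && $1 ~ /^[0-9]+:$/ {
             total += $2
             if ($1 + 0 > 0) retried += $2     # rows other than "0:" needed retries
         }
         END { printf "%d of %d mappings needed at least one retry\n",
                      retried, total }'
}

retry_share <<'EOF'
0: 95224
1: 3745
2: 2225
EOF
# prints: 5970 of 101194 mappings needed at least one retry
```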

--output-csv
    Creates CSV files (in the current directory) containing
    information documented by --help-output. The files are named after
    the rule used when collecting the statistics. For instance, if the
    rule 'metadata' is used, the CSV files will be:

        metadata-absolute_weights.csv
        metadata-device_utilization.csv
        ...

    The first line of each file briefly explains the column layout.
    For instance:

        metadata-absolute_weights.csv
        Device ID, Absolute Weight
        0,1
        ...

--output-name NAME
    Prepend NAME to the file names generated when --output-csv is
    specified. For instance, --output-name FOO will create files:

        FOO-metadata-absolute_weights.csv
        FOO-metadata-device_utilization.csv
        ...

The --set-... options can be used to modify the tunables of the input
crush map. The input crush map is modified in memory. For example:

    $ crushtool -i mymap --test --show-bad-mappings
    bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

could be fixed by increasing the choose-total-tries as follows:

    $ crushtool -i mymap --test \
        --show-bad-mappings --set-choose-total-tries 500
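Rather than guessing a single value, one can try a few candidates and stop at the first that leaves no bad mappings. The sketch below uses only options mentioned in this page; find_min_tries is a hypothetical helper, and the candidate values in the usage line are arbitrary.

```shell
# Hypothetical helper: try increasing --set-choose-total-tries values until
# crushtool reports no bad mappings for the given compiled map.
# Usage: find_min_tries MAP TRIES...
find_min_tries() {
    command -v crushtool >/dev/null 2>&1 || return 2   # crushtool not available
    map=$1; shift
    for tries in "$@"; do
        # Count "bad mapping" lines; capture both streams to be safe.
        bad=$(crushtool -i "$map" --test --show-bad-mappings \
                  --set-choose-total-tries "$tries" 2>&1 |
              grep -c '^bad mapping' || true)
        if [ "$bad" -eq 0 ]; then
            echo "$tries"
            return 0
        fi
    done
    return 1    # none of the candidates was enough
}

# Example (requires crushtool and a compiled map named mymap):
# find_min_tries mymap 50 100 200 500
```

The helper only counts "bad mapping" lines, so it cannot tell a clean run from one in which crushtool failed for another reason, such as an unreadable map file.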

The build mode will generate hierarchical maps. The first argument
specifies the number of devices (leaves) in the CRUSH hierarchy. Each
layer describes how the layer (or devices) preceding it should be
grouped.

Each layer consists of:

    bucket ( uniform | list | tree | straw | straw2 ) size

The bucket is the type of the buckets in the layer (e.g. "rack"). Each
bucket name will be built by appending a unique number to the bucket
string (e.g. "rack0", "rack1"...).

The second component is the type of bucket: straw should be used most
of the time.

The third component is the maximum size of the bucket. A size of zero
means a bucket of infinite capacity.

Suppose we have two rows with two racks each and 20 nodes per rack.
Suppose each node contains 4 storage devices for Ceph OSD Daemons. This
configuration allows us to deploy 320 Ceph OSD Daemons. Let's assume a
42U rack with 2U nodes, leaving an extra 2U for a rack switch.

To reflect our hierarchy of devices, nodes, racks and rows, we would
execute the following:

    $ crushtool -o crushmap --build --num_osds 320 \
           node straw 4 \
           rack straw 20 \
           row straw 2 \
           root straw 0
    # id    weight  type name       reweight
    -87     320     root root
    -85     160         row row0
    -81     80              rack rack0
    -1      4                   node node0
    0       1                       osd.0   1
    1       1                       osd.1   1
    2       1                       osd.2   1
    3       1                       osd.3   1
    -2      4                   node node1
    4       1                       osd.4   1
    5       1                       osd.5   1
    ...
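The number of buckets each layer produces follows from repeated division, rounding up, with a size of zero collapsing everything into one bucket. The sketch below works this out for the example; layer_counts is a hypothetical helper, and it takes the same "name algorithm size" triples as --build but ignores the algorithm.

```shell
# Hypothetical helper: given the --num_osds value followed by
# "name algorithm size" triples, print how many buckets each layer needs.
layer_counts() {
    n=$1; shift
    while [ $# -ge 3 ]; do
        name=$1; size=$3; shift 3     # the algorithm (second item) is not needed
        if [ "$size" -eq 0 ]; then
            n=1                       # size 0: one bucket holds the whole layer below
        else
            n=$(( (n + size - 1) / size ))   # divide, rounding up
        fi
        echo "$name $n"
    done
}

layer_counts 320 node straw 4 rack straw 20 row straw 2 root straw 0
# prints:
# node 80
# rack 4
# row 2
# root 1
```

The total of 80 + 4 + 2 + 1 = 87 buckets is consistent with the root bucket's id of -87 in the sample output.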

CRUSH rules are created so the generated crushmap can be tested. They
are the same rules as the ones created by default when creating a new
Ceph cluster. They can be further edited with:

    # decompile
    crushtool -d crushmap -o map.txt

    # edit
    emacs map.txt

    # recompile
    crushtool -c map.txt -o crushmap

The reclassify function allows users to transition from older maps that
maintain parallel hierarchies for OSDs of different types to a modern
CRUSH map that makes use of the device class feature. For more
information, see
https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes.

See
https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t
for sample crushtool --test commands and the output they produce.

crushtool is part of Ceph, a massively scalable, open-source,
distributed storage system. Please refer to the Ceph documentation at
https://docs.ceph.com for more information.

ceph(8), osdmaptool(8)

John Wilkins, Sage Weil, Loic Dachary

2010-2023, Inktank Storage, Inc. and contributors. Licensed under
Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0)

dev Nov 15, 2023 CRUSHTOOL(8)