CRUSHTOOL(8)                          Ceph                        CRUSHTOOL(8)

NAME
       crushtool - CRUSH map manipulation tool

SYNOPSIS
       crushtool ( -d map | -c map.txt | --build --num_osds numosds
                   layer1 ... | --test ) [ -o outfile ]

DESCRIPTION
       crushtool is a utility that lets you create, compile, decompile and
       test CRUSH map files.

       CRUSH is a pseudo-random data distribution algorithm that efficiently
       maps input values (which, in the context of Ceph, correspond to
       Placement Groups) across a heterogeneous, hierarchically structured
       device map. The algorithm was originally described in detail in the
       following paper (although it has evolved somewhat since then):

              http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

       The tool has four modes of operation.

       --compile|-c map.txt
              will compile a plaintext map.txt into a binary map file.

       --decompile|-d map
              will take the compiled map and decompile it into a plaintext
              source file, suitable for editing.

       --build --num_osds {num-osds} layer1 ...
              will create a map with the given layer structure. See below
              for a detailed explanation.

       --test will perform a dry run of a CRUSH mapping for a range of
              input values [--min-x,--max-x] (default [0,1023]) which can
              be thought of as simulated Placement Groups. See below for a
              more detailed explanation.

       Unlike other Ceph tools, crushtool does not accept generic options
       such as --debug-crush from the command line. They can, however, be
       provided via the CEPH_ARGS environment variable. For instance, to
       silence all output from the CRUSH subsystem:

              CEPH_ARGS="--debug-crush 0" crushtool ...

RUNNING TESTS WITH --TEST
       The test mode will use the input crush map (as specified with -i
       map) and perform a dry run of CRUSH mapping or random placement (if
       --simulate is set). On completion, two kinds of reports can be
       created. 1) The --show-... options output human-readable information
       on stderr. 2) The --output-csv option creates CSV files that are
       documented by the --help-output option.

       Note: Each Placement Group (PG) has an integer ID which can be
       obtained from ceph pg dump (for example PG 2.2f means pool id 2, PG
       id 0x2f, i.e. 47). The pool and PG IDs are combined by a function to
       get a value which is given to CRUSH to map it to OSDs. crushtool
       does not know about PGs or pools; it only runs simulations by
       mapping values in the range [--min-x,--max-x].

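       For example, a dry run of ten simulated input values against a
       compiled map could look like the following (the map file name
       crushmap is only illustrative; every option used here is documented
       on this page):

              crushtool -i crushmap --test --show-mappings --min-x 0 --max-x 9
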
       --show-statistics
              Displays a summary of the distribution. For instance:

                   rule 1 (metadata) num_rep 5 result size == 5:   1024/1024

              shows that rule 1, which is named metadata, successfully
              mapped 1024 values to result size == 5 devices when trying to
              map them to num_rep 5 replicas. When it fails to provide the
              required mapping, presumably because the number of tries must
              be increased, a breakdown of the failures is displayed. For
              instance:

                   rule 1 (metadata) num_rep 10 result size == 8:   4/1024
                   rule 1 (metadata) num_rep 10 result size == 9:   93/1024
                   rule 1 (metadata) num_rep 10 result size == 10:  927/1024

              shows that although num_rep 10 replicas were required, 4 out
              of 1024 values (4/1024) were mapped to result size == 8
              devices only.

       --show-mappings
              Displays the mapping of each value in the range
              [--min-x,--max-x]. For instance:

                   CRUSH rule 1 x 24 [11,6]

              shows that value 24 is mapped to devices [11,6] by rule 1.

       --show-bad-mappings
              Displays which values failed to be mapped to the required
              number of devices. For instance:

                   bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

              shows that when rule 1 was required to map 7 devices, it
              could map only six: [8,10,2,11,6,9].

       --show-utilization
              Displays the expected and actual utilization for each device,
              for each number of replicas. For instance:

                   device 0: stored : 951 expected : 853.333
                   device 1: stored : 963 expected : 853.333
                   ...

              shows that device 0 stored 951 values and was expected to
              store 853. Implies --show-statistics.

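              The expected value is proportional to the device weight. As
              an illustration only (the device count and replica count
              below are assumed, not stated on this page), 1024 input
              values mapped to num_rep 10 replicas across 12 devices of
              equal weight would give:

                   expected = 1024 * 10 / 12 = 853.333
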
       --show-utilization-all
              Displays the same as --show-utilization but does not suppress
              output when the weight of a device is zero. Implies
              --show-statistics.

       --show-choose-tries
              Displays how many attempts were needed to find a device
              mapping. For instance:

                   0:      95224
                   1:       3745
                   2:       2225
                   ..

              shows that 95224 mappings succeeded without retries, 3745
              mappings succeeded with one retry, etc. There are as many
              rows as the value of the --set-choose-total-tries option.

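              A breakdown like the one above can be produced with, for
              example (the map name crushmap is illustrative; both options
              are documented on this page):

                   crushtool -i crushmap --test --show-choose-tries --set-choose-total-tries 500
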
       --output-csv
              Creates CSV files (in the current directory) containing
              information documented by --help-output. The files are named
              after the rule used when collecting the statistics. For
              instance, if the rule 'metadata' is used, the CSV files will
              be:

                   metadata-absolute_weights.csv
                   metadata-device_utilization.csv
                   ...

              The first line of each file briefly explains the column
              layout. For instance:

                   metadata-absolute_weights.csv
                   Device ID, Absolute Weight
                   0,1
                   ...

       --output-name NAME
              Prepends NAME to the file names generated when --output-csv
              is specified. For instance, --output-name FOO will create
              files:

                   FOO-metadata-absolute_weights.csv
                   FOO-metadata-device_utilization.csv
                   ...

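              As a concrete illustration, the CSV reports could be produced
              for a compiled map with (the map name crushmap is a
              placeholder; only options documented on this page are used):

                   crushtool -i crushmap --test --output-csv --output-name FOO
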
       The --set-... options can be used to modify the tunables of the
       input crush map. The input crush map is modified in memory. For
       example:

              $ crushtool -i mymap --test --show-bad-mappings
              bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

       could be fixed by increasing the choose-total-tries as follows:

              $ crushtool -i mymap --test \
                    --show-bad-mappings --set-choose-total-tries 500

BUILDING A MAP
       The build mode will generate hierarchical maps. The first argument
       specifies the number of devices (leaves) in the CRUSH hierarchy.
       Each layer describes how the layer (or devices) preceding it should
       be grouped.

       Each layer consists of:

              bucket ( uniform | list | tree | straw | straw2 ) size

       The first component, bucket, is the name of the bucket type in the
       layer (e.g. "rack"). Each bucket name will be built by appending a
       unique number to the bucket string (e.g. "rack0", "rack1"...).

       The second component is the bucket algorithm (uniform, list, tree,
       straw or straw2): straw should be used most of the time.

       The third component is the maximum size of the bucket. A size of
       zero means a bucket of infinite capacity.

EXAMPLE
       Suppose we have two rows with two racks each and 20 nodes per rack.
       Suppose each node contains 4 storage devices for Ceph OSD Daemons.
       This configuration allows us to deploy 320 Ceph OSD Daemons. Let's
       assume a 42U rack with 2U nodes, leaving an extra 2U for a rack
       switch.

       To reflect our hierarchy of devices, nodes, racks and rows, we would
       execute the following:

              $ crushtool -o crushmap --build --num_osds 320 \
                     node straw 4 \
                     rack straw 20 \
                     row straw 2 \
                     root straw 0
              # id    weight  type name       reweight
              -87     320     root root
              -85     160         row row0
              -81     80              rack rack0
              -1      4                   node node0
              0       1                       osd.0   1
              1       1                       osd.1   1
              2       1                       osd.2   1
              3       1                       osd.3   1
              -2      4                   node node1
              4       1                       osd.4   1
              5       1                       osd.5   1
              ...

       CRUSH rules are created so the generated crushmap can be tested.
       They are the same rules as the ones created by default when creating
       a new Ceph cluster. They can be further edited with:

              # decompile
              crushtool -d crushmap -o map.txt

              # edit
              emacs map.txt

              # recompile
              crushtool -c map.txt -o crushmap
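
       The recompiled map can then be re-checked with the test mode
       described above before it is put to use, for example (only options
       documented on this page are used; the map name matches the example
       above):

              crushtool -i crushmap --test --show-bad-mappings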

RECLASSIFY
       The reclassify function allows users to transition from older maps
       that maintain parallel hierarchies for OSDs of different types to a
       modern CRUSH map that makes use of the device class feature. For
       more information, see
       https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes.

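       As an illustration, the guide linked above walks through invocations
       roughly of the following shape (the options and bucket names are
       taken from that guide rather than from this page and may vary
       between releases; consult the guide for the authoritative syntax):

              crushtool -i original --reclassify \
                    --set-subtree-class default hdd \
                    --reclassify-root default hdd \
                    --reclassify-bucket %-ssd ssd default \
                    -o adjusted

              # preview how many mappings would change
              crushtool -i original --compare adjusted
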
EXAMPLE OUTPUT FROM --TEST
       See
       https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t
       for sample crushtool --test commands and the output they produce.

AVAILABILITY
       crushtool is part of Ceph, a massively scalable, open-source,
       distributed storage system. Please refer to the Ceph documentation
       at http://ceph.com/docs for more information.

SEE ALSO
       ceph(8), osdmaptool(8)

AUTHOR
       John Wilkins, Sage Weil, Loic Dachary

COPYRIGHT
       2010-2022, Inktank Storage, Inc. and contributors. Licensed under
       Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0)



dev                               Jun 22, 2022                    CRUSHTOOL(8)